Talk:Cryptanalysis/Draft

Potential reorganization
I am thinking of a re-organisation here, along the lines below, with the topics we currently have under "Mathematical cryptanalysis" going under the last item:
 * Attacks on the system
 * Practical cryptanalysis
 * Traffic analysis
 * Side channel attacks
 * Bypassing authentication
 * Guessing secrets
 * Dictionary attacks on passwords
 * Random number weaknesses
 * Small keys
 * Attacks on the ciphers

Things like man-in-the-middle would then turn up in two places, first under "Bypassing authentication" because if you can do that then you don't have to break the actual encryption, and second under "Attacks on the ciphers" for details of attacks on different authentication mechanisms since those details are much the same as other attacks on RSA, block ciphers or whatever. Sandy Harris 01:51, 17 October 2008 (UTC)


 * Should social engineering be under guessing, or its own category? For that matter, where does one put the people who write their keys on their desk calendar?


 * "Social engineering" and "shoulder surfing" would be categories, perhaps subheads under practical cryptanalysis. Sandy Harris 07:18, 17 October 2008 (UTC)


 * Side channel, I assume, covers TEMPEST/HIJACK/TEAPOT/NONSTOP, timing analysis on plaintext, acoustic cryptanalysis, and Operation RAFTER (a specific case of getting the received text off the intermediate frequency).


 * Could you define "attacks on the ciphers"?


 * I think this is going somewhere interesting, but not sure where it is yet.


 * If you would, see if we can agree on some of the more specific attacks (e.g., on authentication) in communications security. I am also open to a better name for that article. Howard C. Berkowitz 05:51, 17 October 2008 (UTC)


 * By "attacks on the ciphers" (chosen mainly to contrast with "attacks on the system") I meant what is now called "mathematical cryptanalysis" and might be called "cryptanalysis proper". The article introduction refers to it as "classic cryptanalysis". Not sure what the best title would be.


 * Somewhere up in the opening/overview part I'd want to say that, while this is a real threat, it may not be the main threat in many cases. Quote Anderson about banking systems: "the threat model commonly used by cryptosystem designers was wrong: most frauds were not caused by cryptanalysis or other technical attacks, but by implementation errors and management failures." Or Schneier's intro to Secrets and Lies, where he says that in some ways writing "Applied Cryptography" was a mistake: too much technology, not enough attention to other issues.


 * Side channel certainly covers Tempest and RAFTER (new to me). I'm not sure if differential fault analysis and timing attacks go there or under "attacks on the ciphers"; I lean toward the latter since they aim at finding the keys rather than just reading material. Sandy Harris 07:18, 17 October 2008 (UTC)

Orientation
Given that many readers won't have much background, I think we need a section that considers both early manual methods before they were pure guesswork (and maybe a little there), and then materials, some declassified, from the 1920s and 1930s.

Let me offer what is mostly outline, which would go before specific attacks. ...said Howard C. Berkowitz (talk)

Preparatory
While it is not necessary to be fluent in a language to cryptanalyze it, languages have different statistical properties and it helps to know the language. Of course, if there is a chance of guessing probable words, knowledge of the language is more important, and knowledge of the area of application (e.g., smuggling liquor or describing radar) can reveal even more. In real-world rather than puzzle cryptanalysis, knowledge of the circumstances in which the ciphertext was collected can be informative. William Friedman proposed four basic steps:


 * 1) The determination of the language employed in the plain-text version.
 * 2) The determination of the general system of cryptography employed.
 * 3) The reconstruction of the specific key in the case of a cipher system, or the reconstruction, partial or complete, of the code book, in the case of a code system; or both, in the case of an enciphered code system
 * 4) The reconstruction or establishment of the plain text

These steps are usually performed in the order given above, although occasionally the second step may precede the first.

Basic methods
The very earliest cryptanalysis seems to have been inspired guessing about the plaintext, or perhaps attacks against known and fairly weak manual systems. In the 20th century, mathematical and statistical techniques came into use.

The most basic is frequency analysis. When the letter frequencies of the ciphertext, graphed as count against letter, do not follow the curve characteristic of the language, substitution is likely in use. Simple frequency analysis becomes increasingly ineffective as keys become more complex, even in simple polyalphabetic substitution on a monographic cipher, such as the Vigenère method. Frequency analysis is almost useless against pure transposition systems, other than confirming that the system is simple transposition: in that case the letter-frequency curve matches, and the counts of letters in the ciphertext match those of the natural language.
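As a rough illustration of the point about substitution versus transposition, here is a minimal sketch in Python. The function names (`letter_counts`, `frequency_profile`, `caesar`) are made up for this example, and the sample text is far too short for real analysis; it only shows that transposition leaves the letter counts untouched, while a simple substitution keeps the shape of the curve but moves the peaks to other letters.

```python
from collections import Counter
import string

def letter_counts(text):
    """Count A-Z occurrences, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)

def frequency_profile(text):
    """Counts sorted from most to least frequent, with the letters dropped.

    Two texts with the same profile have the same shape of frequency
    curve, even if different letters sit at the peaks.
    """
    return sorted(letter_counts(text).values(), reverse=True)

def caesar(text, shift):
    """Constant-shift substitution, used here only to produce test data."""
    shifted = string.ascii_uppercase[shift:] + string.ascii_uppercase[:shift]
    return text.upper().translate(str.maketrans(string.ascii_uppercase, shifted))

plain = "ATTACK AT TWO TODAY"
substituted = caesar(plain, 3)   # monoalphabetic substitution
transposed = plain[::-1]         # a crude transposition: reverse the text

print(letter_counts(plain).most_common(3))
print(letter_counts(substituted).most_common(3))
print(letter_counts(transposed) == letter_counts(plain))
```

On real traffic one would compare the profile against published frequency tables for the suspected language rather than against a known plaintext.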

Monoalphabetic substitution
The most basic form, such as the Caesar cipher, uses a single key alphabet; the same cleartext letter is always represented by the same ciphertext letter. Against this, basic frequency analysis is possible. Monoalphabetic keys, in order of increasing complexity, include:
 * Constant shift monoalphabetic substitution, in which each plaintext letter is shifted a fixed number of letter positions to the right
 * Keyword-based mixed, where the unique letters of a keyword (e.g., CRYPTO) form the first letters of a key alphabet, with the rest following in alphabetical order: CRYPTOABDEFGHIJKLMNQSUVWXZ
 * Completely random, e.g.: QPWOEIRUTYALSKDJFHGZMXNCBV
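The three kinds of key alphabet listed above can be sketched in a few lines of Python. This is only an illustration; the helper names (`constant_shift_alphabet`, `keyword_alphabet`, `random_alphabet`, `encipher`) are invented here. The one hard requirement is that a key alphabet be a permutation of all 26 letters, with none repeated and none missing.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def constant_shift_alphabet(shift):
    """Caesar-style key: the plain alphabet rotated by `shift` places."""
    shift %= 26
    return ALPHABET[shift:] + ALPHABET[:shift]

def keyword_alphabet(keyword):
    """Unique keyword letters first, then the unused letters in order."""
    seen = []
    for ch in keyword.upper() + ALPHABET:
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def random_alphabet(seed=None):
    """A shuffled alphabet; `seed` only makes the example repeatable."""
    letters = list(ALPHABET)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

def encipher(plaintext, key_alphabet):
    """Replace each plaintext letter by the key letter in the same position."""
    return plaintext.upper().translate(str.maketrans(ALPHABET, key_alphabet))

print(constant_shift_alphabet(1))
print(keyword_alphabet("CRYPTO"))
print(encipher("ATTACK", constant_shift_alphabet(3)))
```

Note that Python's `random` module is not suitable for generating real keys; it is used here only to show the idea of an unstructured alphabet.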

Increasing the complexity of monoalphabetics
Cryptanalysis, even of fairly simple ciphers, became more difficult when two things happened:
 * encrypted text no longer followed the word spacing of the plaintext, so the analyst lost the insight of knowing that a particular cipher symbol was at the beginning, in the middle, or at the end of a word.
 * padding began being used, so that X's or perhaps meaningless digraphs were inserted to make the message length always an integral multiple of the number of cipher alphabets.
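Both measures amount to simple preprocessing of the plaintext, which might be sketched as follows. The names `prepare` and `groups` and the choice of X as the padding letter are assumptions for the example, not a description of any particular historical system.

```python
def prepare(text, period, pad="X"):
    """Drop spaces and punctuation, then pad to a multiple of `period`."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    shortfall = -len(letters) % period   # 0 when already a multiple
    return letters + pad * shortfall

def groups(text, size):
    """Split prepared text into fixed-size groups for transmission."""
    return [text[i:i + size] for i in range(0, len(text), size)]

prepared = prepare("ATTACK AT DAWN", 5)
print(prepared)
print(" ".join(groups(prepared, 5)))
```

After this step the word boundaries are gone, and the final group no longer reveals the true message length.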

Basic polyalphabetic substitutions
The simplest polyalphabetic substitution used multiple constant shift keys. For a period of 4, the number of different encryptions before the sequence of keys repeats, an example would be:

 Key 1: BCDEFGHIJKLMNOPQRSTUVWXYZA
 Key 2: CDEFGHIJKLMNOPQRSTUVWXYZAB
 Key 3: DEFGHIJKLMNOPQRSTUVWXYZABC
 Key 4: EFGHIJKLMNOPQRSTUVWXYZABCD

Even with a method that was a short-period polyalphabetic substitution, cryptographers might use different techniques for putting letters into the encryption system, and for taking them off. Assume there are four alphabets, 1-4, all Caesar variants (+1, +2, +3, +4), not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

Consider the example below, ATTACK AT TWO TODAY, with spaces suppressed. The initial encryption works by writing out the cleartext in rows under the four alphabets:

 Cleartext   1 2 3 4
 Row A       A T T A
 Row B       C K A T
 Row C       T W O T
 Row D       O D A Y

Result of polyalphabetic substitution (four Caesar-based alphabets used):

 Ciphertext  1 2 3 4
 Row A       B V W E
 Row B       D M D X
 Row C       U Y R X
 Row D       P F D C

Mixing in basic transposition
Even with a method that was fundamentally polyalphabetic substitution, cryptographers might use different techniques for putting letters into the encryption system, and for taking them off. Assume there are four alphabets, 1-4, not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

In the example above, taking off the letters in groups of four, from left to right and row by row, would give BVWE DMDX UYRX PFDC

Only a slight modification, sending all the odd rows first and then the even, would give: BVWE UYRX DMDX PFDC

While this is a trivial example, simple frequency analysis is no longer enough to decrypt; even given the key alphabets, simple decryption yields nonsense. A system simple enough to respond to hand analysis was often still a challenge, especially when the encryptor used mixed alphabets and changed them frequently.
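The two-stage scheme (substitute under four Caesar alphabets, then choose an order in which to take off the rows) can be checked mechanically. The sketch below assumes the +1, +2, +3, +4 shifts described earlier; `encrypt_rows` and `take_off` are names invented for the illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_letter(ch, shift):
    """Move one letter `shift` places to the right, wrapping at Z."""
    return ALPHABET[(ALPHABET.index(ch) + shift) % 26]

def encrypt_rows(plaintext, shifts):
    """Write the letters in rows as wide as the period and shift each
    column by its own amount; returns one ciphertext string per row."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    period = len(shifts)
    rows = []
    for start in range(0, len(letters), period):
        row = letters[start:start + period]
        rows.append("".join(shift_letter(ch, shifts[i]) for i, ch in enumerate(row)))
    return rows

def take_off(rows, order):
    """Transmit the rows in the given order (0-based row indices)."""
    return " ".join(rows[i] for i in order)

rows = encrypt_rows("ATTACK AT TWO TODAY", [1, 2, 3, 4])
print(take_off(rows, [0, 1, 2, 3]))   # rows in their natural order
print(take_off(rows, [0, 2, 1, 3]))   # odd rows (A, C) first, then even (B, D)
```

Running a worked example through code like this is also a cheap way to catch arithmetic slips in hand-enciphered examples.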

Nonperiodic key alphabets
Ciphers were not always "geometric" or strictly periodic. An aperiodic cipher might have only 4 cipher alphabets in the key, but if the order of their use is 1-2-4-3 on the first cycle and 2-3-4-1 on the next, reconstructing the key becomes more difficult. Using the key alphabets in a different order, here 4-3-1-2, ATTACK AT TWO TODAY becomes, with spaces suppressed:

Result of polyalphabetic substitution:

 Ciphertext  4 3 1 2
 Row A       E W U C
 Row B       G N B V
 Row C       X Z P V
 Row D       S G B A

Adding the simple transposition
Using the row-by-row model, the ciphertext of the first would be: EWUC GNBV XZPV SGBA but, with even rows first, GNBV SGBA EWUC XZPV
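A sketch of the same idea with a changeable order of alphabets follows. Here a "schedule" (a term made up for this example) lists the shift order to use on each successive row and is reused cyclically, so a single entry gives a fixed order such as 4-3-1-2, while two entries give the 1-2-4-3 then 2-3-4-1 alternation mentioned above. The alphabets are again assumed to be the Caesar shifts +1 to +4.

```python
import string

ALPHABET = string.ascii_uppercase

def shift(ch, amount):
    """Move one letter `amount` places to the right, wrapping at Z."""
    return ALPHABET[(ALPHABET.index(ch) + amount) % 26]

def encipher_rows(plaintext, schedule):
    """`schedule` is a list of shift orders, one per row, reused cyclically."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    width = len(schedule[0])
    rows = []
    for row_number, start in enumerate(range(0, len(letters), width)):
        order = schedule[row_number % len(schedule)]
        row = letters[start:start + width]
        rows.append("".join(shift(ch, s) for ch, s in zip(row, order)))
    return rows

def even_rows_first(rows):
    """Rows B, D, ... followed by rows A, C, ..."""
    return rows[1::2] + rows[0::2]

fixed_order = encipher_rows("ATTACK AT TWO TODAY", [[4, 3, 1, 2]])
alternating = encipher_rows("ATTACK AT TWO TODAY", [[1, 2, 4, 3], [2, 3, 4, 1]])
print(" ".join(fixed_order))
print(" ".join(even_rows_first(fixed_order)))
print(" ".join(alternating))
```

With the alternating schedule the same column no longer always falls under the same alphabet, which is what defeats a straightforward column-by-column frequency count.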

Mathematical cryptanalysis
Mathematical cryptanalysis emerged roughly in the 1920s, principally from William Friedman and his colleagues. These still used fairly basic mathematical techniques. A significant increase in cryptanalytic power came when techniques such as group theory were applied against the Enigma machine and other early machine ciphers; these are discussed later.

Basic approaches
The first two tests mentioned both address the problem of determining the number of key alphabets used in a polyalphabetic cipher. Both were designed at a time when polyalphabetic substitution was primarily a manual process, so they work best when there are a relatively small number of key alphabets, an algorithm in which the key alphabets are used in a straightforward repeating sequence (e.g., 12341234, not 42231431), and a relatively large amount of text encrypted in them.

All depend on the reality that there is high redundancy in written human languages. In English, the trigraph "the" is the most common, so the cryptanalyst can reasonably guess that the most frequent repeated ciphertext trigraph represents those three plaintext letters, which have fallen under the same key alphabets repeatedly in a suitably long volume of text.

The purpose of these methods is to calculate what is variously called the "keyword length", the "cryptoperiod", or the number of key alphabets.

Kasiski test
The test is best known from the work of Friedrich Kasiski in 1863; subsequent research shows it may have been independently invented by Charles Babbage in 1844, although not publicized. Formally, it is a ciphertext-only attack, supplemented by linguistic knowledge of the cleartext language.

The analyst searches through the ciphertext and finds all repeated sequences of at least three letters. Assume that JWR repeats 15 letters after its first appearance, then 20 letters later, then 15, then 45. This does not immediately give insight.

The next step, however, is mathematical: the cryptanalyst factors the separations, and discovers that they all have 5 as a prime factor. This suggests that 5 key alphabets are in use, and that THE has been encrypted by the same three in each cryptoperiod. As yet, all that can be said is that the three are consecutive: they could be, within the period, 123, 234, 345, 451, or 512. Now, there is a choice. Should the Kasiski test be repeated looking for a different repeated string, or should the analyst start reconstructing the three potential key alphabets?
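The search-and-factor procedure can be sketched as below. The function names are invented for the illustration, and taking a single greatest common divisor is a simplification: in real traffic some repetitions are accidental, so an analyst would tabulate the factors of all separations and look for the one that dominates, rather than trust one gcd.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_sequences(ciphertext, length=3):
    """Map each sequence of `length` letters that occurs more than once
    to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

def separations(positions):
    """Distances between successive occurrences."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]

def common_factor(distances):
    """Greatest common divisor of all the distances."""
    return reduce(gcd, distances)

# The separations quoted in the text: 15, 20, 15, 45.
print(common_factor([15, 20, 15, 45]))
```

A result of 5 is only a candidate period; it is worth confirming with a second repeated sequence or with a column-by-column frequency check.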

Actually, it may be more appropriate to check the assumption that this is a polyalphabetic substitution with a period of 5. If this is being done manually, start at the first letter of the ciphertext, and write it all out in columns of 5. If the assumption is correct, the frequency distribution in each column should have the same shape as the frequency distribution of letters in plaintext. The ciphertext letter suspected to represent the "E" at the end of the trigraph should be the most frequent letter in its column.
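Writing the text out in columns is easy to mechanize. In the sketch below, `columns` and `most_frequent_per_column` are names made up for the example, and `guess_caesar_shift` additionally assumes that each key alphabet is a plain Caesar shift and that E is the most common plaintext letter; neither assumption holds for mixed alphabets or for short texts.

```python
from collections import Counter

def columns(ciphertext, period):
    """Column i holds every letter enciphered with key alphabet i."""
    return [ciphertext[i::period] for i in range(period)]

def most_frequent_per_column(ciphertext, period):
    """The single most common letter in each column."""
    return [Counter(col).most_common(1)[0][0] for col in columns(ciphertext, period)]

def guess_caesar_shift(column):
    """Shift that would turn plaintext E into the column's top letter.
    Only meaningful if the column's alphabet is a constant shift."""
    top = Counter(column).most_common(1)[0][0]
    return (ord(top) - ord("E")) % 26

print(columns("ABCDEABCDEAB", 5))
print(most_frequent_per_column("ABCDEABCDEAB", 5))
```

If the per-column counts look as flat as those of the whole ciphertext, the assumed period is probably wrong and another candidate should be tried.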

New heading
I've moved in some stuff from block cipher. I think we need a general heading whose subsections are Practical cryptanalysis, side channel attacks, traffic analysis, and attacks via host security weaknesses (key loggers, etc.). Not sure what to call it, though. "Non-mathematical cryptanalysis"? Sandy Harris 04:11, 16 April 2009 (UTC)

What next?
I've done some re-organisation and added some text, but I'm now blocked. I'm not at all sure the structure I have so far is right and I am sure it is incomplete.

The material above under "Orientation" and "Mathematical cryptanalysis" is good, though it is also incomplete. Should it go into this article? Or History of cryptography? See also my comment at Talk:Cryptology. Sandy Harris 15:13, 1 February 2010 (UTC)


 * What about things like "unicity distance", "work factor", ... There is a lot of quite solid theory that this article should either cover or link to. I do not know enough to write that well and am not sure how the structure of either this article or the whole group of related articles ought to work. But those & the stuff above definitely need to go in somewhere. Sandy Harris 16:12, 8 February 2010 (UTC)


 * Interesting; I know about unicity distance, but learned it in the study of error correction and only later found its cryptographic applications. In the cryptologic context, I have no idea what a "work factor" may be but would be happy to learn.


 * Unicity distance probably should be in its own article, as it's significant in areas such as Redundant Arrays of Inexpensive Disks and other topics such as error-correcting codes. I do have Shannon's book somewhere and it's discussed in some of my graduate texts in discrete mathematics, so I could probably take a practitioner's draft. It would need to be edited and extended by a mathematician. Actually, I should be reviewing this area myself for some hopefully paid work regarding fault tolerance in cloud computing.


 * Do you know if the index of coincidence has been put on a more solid foundation than William Friedman's intuition?


 * Warning: possible tangential thought: if I were going to pick an area for a tutorial, this would be one. At least in my society, there is a great attraction to cryptology, beginning in puberty--Freud wrote about it, as I recall. I do have a few books, hopefully not in storage, directed at the beginner, such as Gaines and Sinkov. Are there good "seekrit code" articles on the web, or would our developing one improve our hit count? Don't worry, I wouldn't discuss how a few easy technical measures could make life much more difficult for drug traffickers and terrorists, although it might be fun to recount some of Elizebeth Friedman's testimony against the rum-runners. Howard C. Berkowitz 16:35, 8 February 2010 (UTC)


 * "Work factor" is a measure of the effort required to break a cipher by various methods, e.g., for brute force against a block cipher with an n-bit key, the average work factor is 2^(n-1). I think the term is from Shannon and he proves some things about it. Sandy Harris 17:29, 8 February 2010 (UTC)


 * I have his information theory book in hard copy; maybe your reference is in the mathematical theory of cryptology, which I have somewhere but it's probably easier to download than find. will be offline for at least a few hours; I am alternating being medical advocate and interpreter for a four-legged and a two-legged family member. Today, it's human hematology. --Howard C. Berkowitz 17:35, 8 February 2010 (UTC)

My feeling here is that, while both the article and Howard's stuff above have some good text, the overall outline (both this article & some of the structure it needs to link to) is a mess. I am not at all sure how to sort it out. Sandy Harris 02:28, 24 April 2010 (UTC)

Steganography and covert channels
Sandy, do you consider a covert channel a subset of steganography, or a separate topic? I read your definition of steganography to refer to hidden information that "looks like" the covering information. In contrast, a covert channel (e.g., varying inter-symbol time) is not "in" the information but "beside" the information.

Since the steganographic information is not necessarily itself encrypted, but masked, I'm wondering if extracting it is strictly cryptanalytic. It's definitely part of communications intelligence. There are some very blurred areas, such as frequency-hopping or spread spectrum clear information transfer that can be received only if the receiver is synchronized to the crypto-like transmission changing scheme. --Howard C. Berkowitz 04:18, 9 February 2010 (UTC)


 * I'd say the distinction is blurry and the stego & c.c. text (do we have cc text? where?) should link to each other. I agree with your distinction; stego alters the message, puts data in it, e.g. least significant bits, while c.c. transmits the original message unaltered, but modulated by another signal. e.g. your timing example. However, there are intermediate cases, like hiding a message in non-printing characters so the visible message is unchanged, as in c.c., but the data on the wire is changed, as in stego.


 * If extracting stego'd data is not cryptanalysis, is it steganalysis? Sandy Harris 07:51, 9 February 2010 (UTC)


 * I added text in the stego section. Sandy Harris 10:06, 9 February 2010 (UTC)


 * And is the practitioner of steganalysis a stegosaurus? :-) There is a term, which I can't remember at the moment, for specialists in finding secret inks; perhaps it should be revised. I can do something on CC.


 * Adding to the blur are areas such as inadvertent covert channels, such as van Eck radiation (TEMPEST) and related areas such as HIJACK (code word) and NONSTOP (code word). One could consider passive bugs, in which ambient audio modulates microwaves or lasers, sort-of-covert-channels (e.g., TEAPOT). Many of these, as well as things such as Operation RAFTER, fall under radiofrequency MASINT. --Howard C. Berkowitz 14:56, 9 February 2010 (UTC)


 * Perhaps we should add a section, or just a sentence(?), under Cryptanalysis with links to those? Or cover them in a covert channels article? When you get beyond crypto and net security into intelligence work, or even out of the digital realm and into analog, I have almost no expertise. Sandy Harris 15:32, 9 February 2010 (UTC)


 * There's also related text at Cryptanalysis. I'm not sure how all this needs to be organised. Sandy Harris 15:35, 9 February 2010 (UTC)

Toward Approval
Currently, this article is nominated for approval by User:Howard C. Berkowitz. There is a large single edit that is obviously content related that begins this article and one or two that could be considered such throughout so it needs two more editors on board for a three editor endorsement for approval before we'll be able to lock the version. D. Matt Innis 12:43, 13 June 2010 (UTC)


 * Today is June 19th, 2010 and we still do not have two more editors on board for this article. I'll wait until the end of the day to give it a chance to get up to approval standards. D. Matt Innis 12:55, 19 June 2010 (UTC)


 * OK -- I don't know if Sandy has left for the summer, so let me ping some potential other Editors. Howard C. Berkowitz 17:34, 19 June 2010 (UTC)


 * Hayford has begun to lock the article, but I returned the status to 1 until you've got the editors on board. If they agree we can return it to 0.  If they need changes, make them to the draft and I'll just bump up the version number once everyone is agreed. D. Matt Innis 18:20, 19 June 2010 (UTC)


 * I had the intention to help out with the approval, but I discovered some broken links in the references and asked Sandy to first fix them. --Peter Schmitt 22:20, 19 June 2010 (UTC)


 * How about giving yourselves a few more days to get things together. It appears that you might be close; maybe 2 or 3 more days will be all that you need. D. Matt Innis 23:42, 19 June 2010 (UTC)

(undent) Sandy, are you still around? If not, I can probably fix the links. We will need a third editor besides Peter. Howard C. Berkowitz 00:04, 20 June 2010 (UTC)


 * Peter's edits are merely copy edits so you do have the option of Peter using the single editor approval if he feels he can. D. Matt Innis 00:13, 20 June 2010 (UTC)


 * Matt, if Peter would be the primary nominator, might I then be a secondary approver as an honor to Sandy? Joe has said he has no problem with this sort of two-author approval. Howard C. Berkowitz 00:42, 20 June 2010 (UTC)


 * Sure, there can be as many supporters as want to add their endorsement. Only three can go into the template at this point, though. D. Matt Innis 03:03, 20 June 2010 (UTC)