Talk:Cryptanalysis/Draft: Difference between revisions

From Citizendium
Jump to navigation Jump to search
imported>Howard C. Berkowitz
imported>Howard C. Berkowitz
(Kasiski in progress--need some preparatory text about mixed key alphabets)
Line 125: Line 125:
==Mathematical cryptanalysis==
==Mathematical cryptanalysis==
Mathematical cryptanalysis emerged roughly in the 1920s, principally from [[William Friedman]] and his colleagues. These still used fairly basic mathematical techniques. A significant increase in cryptanalytic power came when techniques such as [[group theory]] were applied against the [[Enigma machine]] and other early machine ciphers; these are discussed later.
Mathematical cryptanalysis emerged roughly in the 1920s, principally from [[William Friedman]] and his colleagues. These still used fairly basic mathematical techniques. A significant increase in cryptanalytic power came when techniques such as [[group theory]] were applied against the [[Enigma machine]] and other early machine ciphers; these are discussed later.
===Basic approaches===
The first two tests mentioned both address the problem of determining the number of key alphabets used in a polyalphabetic cipher. Both were designed at a time when polyalphabetic substitution was primarily a manual process, so they work best when there are both a relatively small number of key alphabets. an algorithm in which the key alphabets are used in a straightforward repeating sequence (e.g., 12341234, not 42231431) and a relatively large amount of text encrypted in them.
All depend on the reality that there is a high redundancy in written human languages. In English, the trigraph "the" is most common, and the cryptanalyst can safely assume that the most frequent trigraph may well be those three plaintext letters, which have rotated under the same key alphabets in a suitably long volume of text.
The purpose of these methods is to calculate what is variously called the "keyword length", the "cryptoperiod", or the number of key alphabets.
====Kasiski test====
====Kasiski test====
Best known from the work of Friedrich Kasiski in 1863, subsequent research shows it may have been independently invented by [[Charles Babbage]] in 1844, although not publicized. Formally, it is a known ciphertext attack, but supplemented by linguistic knowledge of the cleartext language.
The analyst searches through the text and finds all repetitions of at least 3. Assume that JWR repeats 15 letters after its first appearance, then 20 letters later, then 15, then 45. This does not immediately give insight.
The next step, however, is mathematical: the cryptanalyst [[Unique factorization|factors]] the separations, and discovers the all have 5 as a prime factor. This suggests that 5 key alphabets are in use, and that THE has been encrypted by the same three in each cryptoperiod. As yet, all that can be said is that the three are consecutive: they could be, within the period, 123, 234, 345, 451, or 352. Now, there is a choice. Should the Kasiski test be repeated looking for a different repeated string, or should the analyst start reconstructing the three potential key alphabets?
Actually, it may be more appropriate to check the assumption that this is a polyalphabettic substitution with a period of 5. If this is being done manually, start at the first letter of the ciphertext, and write it all out, in columns of 5. If the assumption is correct, the frequency distribution in each column should correspond to the frequency distribution of letters in plaintext. The ciphertext, suspected to be the "E" at the end of the trigraph, should be the most frequent letter in the column.
====Index of coincidence====
====Index of coincidence====
====Kappa test====
====Kappa test====

Revision as of 11:45, 19 October 2008

I am thinking of a re-organisation here, along the lines:

  • Attacks on the system
    • Practical cryptanalysis
    • Traffic analysis
    • Side channel attacks
    • Bypassing authentication
    • Guessing secrets
      • Dictionary attacks on passwords
      • Random number weaknesses
      • Small keys
  • Attacks on the ciphers

Then the topics we currently have under "Mathematical cryptanalysis".

Things like man-in-the-middle would then turn up in two places, first under "Bypassing authentication" because if you can do that then you don't have to break the actual encryption, and second under "Attacks on the ciphers" for details of attacks on different authentication mechanisms since those details are much the same as other attacks on RSA, block ciphers or whatever. Sandy Harris 01:51, 17 October 2008 (UTC)

Should social engineering be under guessing, or its own category? For that matter, where does one put the people who write their keys on their desk calendar?
"Social engineering" and "shoulder surfing" would be categories, perhaps subheads under practical cryptanalysis. Sandy Harris 07:18, 17 October 2008 (UTC)
Side channel, I assume. covers TEMPEST/HIJACK/TEAPOT/NONSTOP, timing analysis on plaintext, acoustic cryptanalysis, Operation RAFTER (specific case of getting the received text off the intermediate frequency)
Could you define "attacks on the ciphers"?
I think this is going somewhere interesting, but not sure where it is yet.
If you would, see if we can agree on some of the more specific (e.g.,) authentication attacks in communications security. I am also open to a better name for that article. Howard C. Berkowitz 05:51, 17 October 2008 (UTC)
By "attacks on the ciphers" (chosen mainly to contrast with "attacks on the system") I meant what is now called "mathematical cryptanalysis" and might be called "cryptanalysis proper". The article introduction refers to it as "classic cryptanalyis". Not sure what the best title would be.
Somewhere up in the opening/overview part I'd want to say that, while this is a real threat, it may not be the main threat in many cases. Quote Anderson [1] about banking sytems "the threat model commonly used by cryptosystem designers was wrong: most frauds were not caused by cryptanalysis or other technical attacks, but by implementation errors and management failures." or Schneier's intro to Secrets and Lies where he says in some ways writing "Applied Cryptography" was a mistake; too much technology, not enough attention to other issues.
Side channel certainly covers Tempest and RAFTER (new to me). I'm not sure if differential fault analysis and timing attacks go there or under "attacks on the ciphers"; I lean toward the latter since they aim at finding the keys rather than just reading material. Sandy Harris 07:18, 17 October 2008 (UTC)

Orientation

Given that many readers won't have much background, I think we need a section that considers both early manual methods before they were pure guesswork (and maybe a little there), and then materials, some declassified, from the 1920s and 1930s.

Let me offer what is mostly outline, which would go before specific attacks:

Preparatory

While it is not necessary to be fluent in a language to cryptanalyze it, languages have different statistical properties and it helps to know the language. Of course, if there is a change of guessing probable words, knowledge of the language is more important, and knowledge of the area of application (e.g., smuggling liquor or describing radar) can reveal even more. In a real-world rather than puzzle cryptanalysis, knowledge of the circumstances in which the ciphertext was collected can be informative. William Friedman proposed four basic steps:[1]

  1. The determination of the language employed in the plain-text version.
  2. The determination of the general system of cryptography employed.
  3. The reconstruction of the specific key in the case of a cipher system, or the reconstruction, partial or complete, of the code book, in the case of a code system; or both, in the case of an enciphered code system
  4. The reconstruction or establishment of the plain text

These are usually done in the order in which they are given above and in which they usually are performed in the solution of cryptograms, although occasionally the second step may precede the first.

Basic methods

The very earliest cryptanalysis seems to have been inspired guessing about the plaintext, or perhaps attacks against a known and fairly weak manual systems. In 20th century, mathematical and statistical techniques.

The most basic is frequency analysis.[2]. When the frequency of letters, graphed by frequency against count, do not follow the curve characteristic to the language, [3] which becomes increasingly effective when keys become more complex, even in simply polyalphabetic substitution on a monographic cipher, such as the Vigenere method. Frequency analysis is almost useless against pure transposition systems, other than than confirming the system is simple transposition if the frequency of letter curve matches, and the counts of letters in cleartest match the letters in the natural language.

Monoalphabetic substitution

The most basic form, such as the Caesar cipher, uses a single key alphabet; the same cleartext letter was always represented by the same ciphertext letter. Against this, basic frequency analysis is possible. The monoalphabetic keys of increasing complexity involve:

  • Constant shift monoalphabetic solution, in which each plaintext letter is shifted a fixed number of letter positions to the right
  • Keyword-based mixed, where the unique letters of a keyword (e.g., CRYPTO) form the first letters of a key alphabet, with the rest following in alphabetical order: CRYPTOABDEFGHIJKLMQSUVW
  • Completely random: QPWOEIROTIYUSAGFHGKJLMZNXBCV
Increasing the complexity of monoalphabetics

Encryption, even of fairly simple ciphers, became more difficult when two things happened:

  • encrypted text did not follow the spacing of plaintext, so that the insight of knowing that a particular cipher symbol was at the beginning, in the middle, or of the word did not help.[4]
  • padding began being used, so that X's or perhaps meaningless digraphs were inserted, so the message was always an integral multiple of the number of cipher alphabets

Basic polyalphabetic substitutions

The simplest polyalphabetic substitution used multiple constant shift keys. For a period of 4, the number of different encryptions before the sequence of keys repeats, an example would be:

Key 1: BCDEFGHIJKLMNOPQRSTUVWXYZA
Key 2: CDEFGHIJKLMNOPQRSTUVWXYZAB
Key 3: DEFGHIJKLMNOPQRSTUVWXYZABC
Key 4: EFGHIJKLMNOPQRSTUVWXYZABCD

Even with a method that was a short-period polyalphabetic solution, cryptographers might use different techniques for putting letters into the encryption system, and removing them. Assume there are four alphabets, 1-4, all Caesar variants (+1, +2, +3, +4), not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

Contrast the example below, of ATTACK AT TWO TODAY, with spaces suppresed. The initial encryption works by

Writing out the cleartext,

       Cleartext
       1 2 3 4
Row A  A T T A
Row B  C K A T
Row C  T W O T
Row D  O D A Y
       Result of polyalphabetic substitution (four Caesar based  used)
       1 2 3 4
Row A  B V W D
Row B  D M D X
Row C  U X R D
Row D  P F C B
Mixing in basic transposition

Even with a method that was fundamentally polyalphabetic solution, cryptographers might use different techniques for putting letters into the encryption system, and removing them. Assume there are four alphabets, 1-4, not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

In the example below, taking off the letters in groups of four, from left to right, would give

      BVWD DMDX UXRD PFCB

Only a slight modification of sending all the odd rows first and then the even would give:

      BVWD UXRD DMDX PFCB

While this is a trivial example, simple frequency analysis is no longer enough to decrypt; even given the key alphabets, simple decryption yields nonsense. Still, a system simple enough to respond to hand analysis often was still a challenge, especially when the encryptor used mixed alphabets and changed them frequently

Nonperiodic key alphabets

Ciphers were not always "geometric" or strictly periodic. An aperiodic cipher might only have 4 cipheralphabets in the key, but, if the order of their use is 1-2-4-3 on the first cycle but 2-3-4-1 on the next, reconstructing the key becomes more difficult. Using a nonperiodic key alphabets, ATTACK AT TWO TODAY, becomes, with spaces suppressed:

       Result of polyalphabetic substitution
       4 3 1 2
Row A  E W U F
Row B  G O B V
Row C  W Z P V
Row D  S H B A
Adding the simple transposition

Using the row-by-row model, the ciphertext of the first would be:

EWUF GOBV WZPV SHBA

but, with even-first,

GOBV SHBA EWUF WZPV

Mathematical cryptanalysis

Mathematical cryptanalysis emerged roughly in the 1920s, principally from William Friedman and his colleagues. These still used fairly basic mathematical techniques. A significant increase in cryptanalytic power came when techniques such as group theory were applied against the Enigma machine and other early machine ciphers; these are discussed later.

Basic approaches

The first two tests mentioned both address the problem of determining the number of key alphabets used in a polyalphabetic cipher. Both were designed at a time when polyalphabetic substitution was primarily a manual process, so they work best when there are both a relatively small number of key alphabets. an algorithm in which the key alphabets are used in a straightforward repeating sequence (e.g., 12341234, not 42231431) and a relatively large amount of text encrypted in them.

All depend on the reality that there is a high redundancy in written human languages. In English, the trigraph "the" is most common, and the cryptanalyst can safely assume that the most frequent trigraph may well be those three plaintext letters, which have rotated under the same key alphabets in a suitably long volume of text.

The purpose of these methods is to calculate what is variously called the "keyword length", the "cryptoperiod", or the number of key alphabets.

Kasiski test

Best known from the work of Friedrich Kasiski in 1863, subsequent research shows it may have been independently invented by Charles Babbage in 1844, although not publicized. Formally, it is a known ciphertext attack, but supplemented by linguistic knowledge of the cleartext language.

The analyst searches through the text and finds all repetitions of at least 3. Assume that JWR repeats 15 letters after its first appearance, then 20 letters later, then 15, then 45. This does not immediately give insight.

The next step, however, is mathematical: the cryptanalyst factors the separations, and discovers the all have 5 as a prime factor. This suggests that 5 key alphabets are in use, and that THE has been encrypted by the same three in each cryptoperiod. As yet, all that can be said is that the three are consecutive: they could be, within the period, 123, 234, 345, 451, or 352. Now, there is a choice. Should the Kasiski test be repeated looking for a different repeated string, or should the analyst start reconstructing the three potential key alphabets?

Actually, it may be more appropriate to check the assumption that this is a polyalphabettic substitution with a period of 5. If this is being done manually, start at the first letter of the ciphertext, and write it all out, in columns of 5. If the assumption is correct, the frequency distribution in each column should correspond to the frequency distribution of letters in plaintext. The ciphertext, suspected to be the "E" at the end of the trigraph, should be the most frequent letter in the column.

Index of coincidence

Kappa test

References

  1. Friedman, William F. (1938), Military Cryptanalysis, vol. I: Monoalphabetic Substitution Systems, Signal Intelligence Section, Plans and Training Division, U.S. War Department., p. 7-11
  2. Friedman-I, p. 11-17
  3. Friedman-I, pp.18-26
  4. Gaines, Helen (1939), Cryptanalysis,pp. 69-72