Cryptanalysis

The goal of cryptanalysis is to find some weakness or insecurity in a cryptographic scheme, thus permitting its subversion. It is an essential part of communications intelligence. Cryptanalysis might be undertaken by a malicious attacker attempting to subvert a system, or by the system's designer (or others) attempting to evaluate whether a system has vulnerabilities. In modern practice, however, quality cryptographic algorithms and protocols have usually been carefully examined, and for many there are proofs that establish the practical security of the system (at least under clear, and hopefully reasonable, assumptions).

It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message. That is, an enemy who intercepts an encrypted message has provably no better chance of guessing the contents than an enemy who only knows the length of the message.

Even two-time use of the keys can lead to compromise, as shown by the VENONA project that allowed cryptanalysis of Soviet espionage traffic, in which a one-time pad was used more than once.
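The danger of reusing a pad can be seen in a few lines. The following is a minimal sketch that assumes an XOR-based pad and two made-up messages; the actual Soviet system differed in its mechanics, so this illustrates the principle only.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two hypothetical messages of equal length, for illustration only.
p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT TEN"

pad = secrets.token_bytes(len(p1))  # truly random pad, wrongly used twice

c1 = xor_bytes(p1, pad)
c2 = xor_bytes(p2, pad)

# The pad cancels out: c1 XOR c2 equals p1 XOR p2, and no key is needed.
leak = xor_bytes(c1, c2)

# An attacker who knows or guesses p1 recovers p2 immediately.
recovered_p2 = xor_bytes(leak, p1)
print(recovered_p2)
```

Even without knowing either message, the attacker holds the XOR of two plaintexts, which is far from random and can be attacked with guessed words.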

Any cipher except a one-time pad can be broken with enough computational effort (by brute force attack if nothing else), but the amount of effort needed to break a cipher may be exponentially dependent on the key size, as compared to the effort needed to use the cipher. In such cases, effective security can still be achieved if some conditions (e.g., key size) are such that the effort ('work factor' in Shannon's terms) is beyond the ability of any adversary.

Non-mathematical methods
Before discussing classic cryptanalysis, be aware that mathematical cryptanalysis is not the only way to access protected content.

Practical cryptanalysis
Practical cryptanalysis is a euphemism for using physical or social means to compromise the cryptosystem, such as clandestinely breaking into a communications center and copying the keys, or placing a hidden video camera in position to record passwords as they are typed in, or a host of other such methods. One variant is referred to as rubber hose cryptanalysis: beating, torturing or threatening someone to get him or her to reveal keys.

Any of the techniques of espionage (bribery, coercion, blackmail, deception and so on) may be used to obtain keys. In general, these methods work against the people and organisations involved, looking for human weaknesses or poor security procedures. Social engineering, deceiving the personnel who work with cryptosystems into providing valuable information, may be the most productive attack of all.

Details of these methods and defenses against them are beyond our scope here; see information security.

Computer vulnerabilities
For computer-based security systems, host security is a critical prerequisite. No cryptographic system can be secure if the underlying computer is not. Even systems generally thought to be secure, such as IPsec or PGP, are trivially easy to subvert for an enemy who has already subverted the machine they run on. For example, an enemy with unfettered access to a PGP user's machine can copy the secret key file and install a keystroke logger that captures the PGP passphrase. Once he has the key and the passphrase to unlock it, PGP security is utterly lost.

For some systems, host security may be an impossible goal. Consider a Digital Rights Management system whose design goal is to protect content against the owner of the computer or DVD player it runs on. If that owner has full control over his device then the goal is not achievable.

For further discussion, see computer security.

Traffic analysis
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis, and can be quite useful to an alert adversary. Encrypting messages does not prevent this; an enemy may be able to gain useful information from the timing, size, source and destination of traffic, even if he cannot read the message contents.

Classifying attacks
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways.

The attacker's objective
There are two main types of attack where the attacker must perform cryptanalysis; he has to defeat some cryptographic mechanism in order to conduct the attack. In a passive attack, the attacker only eavesdrops, tries to read data without authorisation. Generally, this requires defeating a cipher; often the objective is to decrypt material without the key. In an active attack, the attacker is not just an eavesdropper. He may create, forge, alter, replace, block or reroute messages. Generally, this requires defeating some cryptographic authentication mechanism. Sometimes the attacker must defeat a cipher as well.

Two other classes of attack can, in general, be conducted without cryptanalysis. In a denial of service attack, the attacker attempts to disrupt communication. Often this can be done without attacking the cryptography, but breaking the crypto may allow additional disruptions. In traffic analysis the attacker attempts to infer useful information from the source, destination, timing and other characteristics of messages, without reading the actual content. In general, he need not break cryptography to conduct such attacks. Of course, if the attacker does manage to defeat whatever cryptography is in play, the attack becomes more dangerous.

Theoretical versus practical security
There are several attacks that will in theory break any symmetric cipher (brute force attack, code book attack and algebraic attack), but all real ciphers are designed to resist them, so in nearly all cases these attacks are utterly useless in practice because they require the attacker to deploy astronomically large resources. For example, a code book attack on a 128-bit block cipher such as AES does not become useful until the attacker collects 2^64 blocks (2^68 bytes, or 2^71 bits) of material encrypted with a single key. At a gigabit a second, transmitting that much data would take roughly 75,000 years; it seems reasonable to assume the user would change keys before that. Also, the attacker is likely to have trouble storing that much data; if he uses one-terabyte drives then he needs about 300 million of them. It is clear, then, that while AES (or any other block cipher) is theoretically vulnerable to a codebook attack, this is of little concern in practice.
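The code book figures can be checked with a short calculation. This sketch assumes 16 bytes per 128-bit block, a link speed of exactly 10^9 bits per second, a 365.25-day year and drives holding exactly 10^12 bytes; all of these constants are illustrative choices.

```python
BLOCK_BITS = 128                       # AES block size

blocks_needed = 2 ** (BLOCK_BITS // 2)             # birthday bound on blocks
total_bytes = blocks_needed * (BLOCK_BITS // 8)    # 16 bytes per block
total_bits = total_bytes * 8

LINK_BITS_PER_SECOND = 10 ** 9         # assumed: one gigabit per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = total_bits / LINK_BITS_PER_SECOND / SECONDS_PER_YEAR

DRIVE_BYTES = 10 ** 12                 # assumed: one-terabyte drives
drives = total_bytes / DRIVE_BYTES

print(f"blocks needed: 2^{blocks_needed.bit_length() - 1}")
print(f"bytes needed:  2^{total_bytes.bit_length() - 1}")
print(f"transfer time: about {years:,.0f} years")
print(f"drives needed: about {drives:,.0f}")
```

Changing `BLOCK_BITS` to 64 shows why older 64-bit block ciphers need much more frequent re-keying than a 128-bit cipher does.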

There are also some ciphers that are provably secure against some attacks, at least when properly used. For example, a one-time pad is provably secure in a rather strong sense and Serge Vaudenay's work on de-correlation theory shows how to construct ciphers that are provably secure against a broad class of attacks. However, it does not necessarily follow that such ciphers will be secure in practice. In at least one case (see VENONA) a one-time pad system was broken in practice, because it was not used correctly, and some of Vaudenay's ciphers such as DFC have been broken by attacks that went outside the assumptions of the security proofs.

Some cryptographic techniques derive their security from the difficulty of a mathematical problem: the RSA algorithm from integer factorisation, the Diffie-Hellman protocol from the discrete logarithm problem, and other systems from various elliptic curve problems. In all cases the underlying problem is thought to be hard; no known method solves it efficiently, so the cryptosystem is thought to be secure. In fact, it is demonstrably secure against all known methods of solving that problem. However, in all cases there is no proof that no efficient method exists. Discovery of an efficient solution for any of these problems would render all the cryptosystems based on it worthless.

Resources the attacker has
Another way to classify attacks is by what an attacker knows, what data is available to him.

Ciphertext-only
The ciphertext-only attack is the case where the cryptanalyst has access only to the ciphertext. Modern cryptosystems are generally effectively immune to ciphertext-only attacks.

Pure ciphertext-only attacks are rare in practice because the analyst is often able to guess some plaintext. This converts a ciphertext-only situation into a known plaintext attack; see next section.

Known plaintext
In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs).

Sometimes it is enough for the attacker to have partial knowledge of the plaintext, perhaps that it is ASCII text with the top bit of every byte zero, or that it is radar data in a known format. This gives him something to go on, a way to check if a decryption or partial decryption is correct; that may be all he needs.

Often the attacker can guess some plaintext. In British World War II ULTRA codebreaking, such guesses were known as "cribs". Many messages contain fixed text like dates or formal phrases like "your humble and obedient servant", and various systems such as compression algorithms or email handlers insert fixed-format headers; all these are free gifts to the cryptanalyst. In war, names of enemy officers, bases, ships or units (or their codenames) are good guesses, also perhaps words like "order" and "ammunition". An intelligence organisation that knows the enemy well may have additional cribs available, looking for "congratulations" to a promoted officer, "happy birthday" to a general, and so on.

Language structure may also provide cribs. Consider ordinary English text, where about one in seven characters are spaces, and "the" and "of" are the most common words. Suppose the cipher uses 64-bit (8 character) blocks. The chance that one of them encodes the 8-character string " of the " is significant. If the cipher user (foolishly) sends large volumes of data with the same key and the attacker has the determination and the (huge) resources to test them all, this crib is almost certain to break the cipher eventually. More plausibly, an attacker may be able to use text statistics as an entry point: space is the most common character, "e" the most common letter, "q" is often followed by "u", and so on.

Generally, if a true known plaintext attack (where the attacker actually knows some plaintext) is feasible, then variants based on guessed plaintext or on partial knowledge of plaintext will be more difficult, but not prohibitively so. Suppose there is a known plaintext attack that breaks the cipher at reasonable cost, but the attacker has only some guessed plaintext that has a 10% chance of being right. That gives him a one-in-ten chance of solving the cipher on the first try. If he has many such guessed cribs available, he is almost certain to solve it eventually at some cost not horrifically more than the cost of a pure known plaintext attack.
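The "almost certain eventually" claim follows from simple probability. The sketch below assumes each guessed crib is right independently with the same probability; that independence is an assumption made here for illustration and is not stated in the text.

```python
def chance_of_success(p_correct: float, n_cribs: int) -> float:
    """Probability that at least one of n independent guesses is right."""
    return 1.0 - (1.0 - p_correct) ** n_cribs

def expected_attempts(p_correct: float) -> float:
    """Average number of guessed cribs tried before one is right."""
    return 1.0 / p_correct

for n in (1, 10, 20, 50):
    print(f"{n:3d} cribs at 10% each: {chance_of_success(0.10, n):.1%}")

# Each attempt costs one run of the known plaintext attack, so the total
# cost is on the order of expected_attempts times the cost of one run.
print(f"expected attempts: {expected_attempts(0.10):.0f}")
```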

Or suppose he only knows the plaintext is ASCII; the top bit of every byte is zero. Suppose we are dealing with a block cipher that has 64-bit blocks. If there is a feasible known plaintext attack, then an enemy who knows only 64 bits of plaintext in a single block can break the cipher. However, if the data is known ASCII and the enemy has intercepted N blocks, then he knows that 8N bits of the plaintext are zero. Whether this lets him break the cipher or not is an extremely complex question depending on all the details of the cipher and on any additional knowledge the attacker may have. However, if you are trying to keep the data secure, you should guess "yes" and choose a cipher that is secure against known plaintext attacks.

In general, '''if there is an effective known plaintext attack on the cipher, then the cipher must be considered insecure'''.

A number of attacks require known (including guessed) plaintext to work:
 * a brute force attack tries all possible keys; you need to know one block of plaintext so you can tell when you have found the right key
 * a meet-in-the-middle attack finds a middle value in two ways, by half-encrypting a block of known plaintext and half-decrypting the matching ciphertext, and searches for matching "middle" results; this is much more efficient than brute force but is not applicable to most ciphers
 * an algebraic attack writes the cipher operations as equations in some algebraic system, usually Boolean, then plugs in known values for plaintext and ciphertext and solves for the key. Depending on various details, this may need anywhere from one to a few dozen plaintexts
 * a code book attack requires huge numbers of known plaintexts, at least 2^(blocksize/2) before it becomes useful.
 * linear cryptanalysis and differential cryptanalysis are often very efficient in terms of the attacker's effort, significantly better than brute force. However, they require large numbers of known or chosen plaintexts.

All these should be completely impractical against any well-designed cipher, properly used. An important usage precaution is to re-key often enough to prevent code book, linear and differential attacks; this is standard practice.
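The role of known or partly known plaintext in a brute force search can be sketched as follows. The cipher here is a hypothetical toy (a SHA-256 digest of a 16-bit key used as a keystream), chosen only so that the search finishes in a moment; the key, the message and the function names are all made up for illustration.

```python
import hashlib

KEY_BITS = 16  # deliberately tiny so every key can be tried

def keystream(key: int) -> bytes:
    """Hypothetical toy cipher: 32 bytes of SHA-256 output as keystream."""
    return hashlib.sha256(key.to_bytes(2, "big")).digest()

def toy_crypt(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt up to 32 bytes by XOR with the toy keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key)))

def brute_force(ciphertext: bytes, looks_right) -> list:
    """Try every key; keep those whose trial decryption passes the check."""
    return [k for k in range(2 ** KEY_BITS)
            if looks_right(toy_crypt(ciphertext, k))]

SECRET_KEY = 0xBEEF
plaintext = b"ORDER MORE AMMUNITION BY TUESDAY"
ciphertext = toy_crypt(plaintext, SECRET_KEY)

# Case 1: the attacker knows (or has guessed) the first eight bytes.
crib = plaintext[:8]
by_crib = brute_force(ciphertext, lambda p: p.startswith(crib))

# Case 2: the attacker knows only that the plaintext is ASCII,
# so the top bit of every byte must be zero.
by_ascii = brute_force(ciphertext, lambda p: all(b < 0x80 for b in p))

print([hex(k) for k in by_crib])
print([hex(k) for k in by_ascii])
```

Without some such check the attacker has no way to tell which of the 65,536 trial decryptions is the right one. A weaker check may let a few wrong keys through, which is why both searches return a list of candidates rather than a single key.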

Chosen plaintext
In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is the gardening used by the British during WWII.

Linear cryptanalysis and differential cryptanalysis can use either chosen plaintexts or a larger number of known plaintexts. Generally, both numbers are very large, larger than 2^(blocksize/2), so reasonably frequent re-keying prevents these attacks.

Chosen ciphertext
In a chosen-ciphertext attack, the cryptanalyst may choose ciphertexts and learn their corresponding plaintexts. Also important, often overwhelmingly so, are mistakes, generally in the design or use of one of the protocols involved; see ULTRA for some historical examples of this.

Related key attack
Using two or more related keys for different messages, different links, or different sessions may give a cryptanalyst an entry point.

The best-known failure of this type is in the WEP protocol used in wireless networking. WEP generates keys for different connections by concatenating a connection-specific initialisation value with another secret value, and this creates a weakness. See, for example, "Breaking 104 bit WEP in less than 60 seconds".
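The key construction can be sketched in a few lines. This example assumes the usual WEP layout, in which a 24-bit initialisation value sent in the clear is placed in front of the shared secret to form the RC4 key; the secret and the initialisation values below are made up, and the integrity checksum WEP also uses is left out.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n bytes of RC4 keystream for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key scheduling
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):                         # output generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

IV_BYTES = 3  # 24-bit initialisation value, visible to any eavesdropper

def wep_packet_key(iv: bytes, secret: bytes) -> bytes:
    """Per-packet RC4 key: public IV followed by the shared secret."""
    if len(iv) != IV_BYTES:
        raise ValueError("WEP uses a 3-byte initialisation value")
    return iv + secret

secret = bytes.fromhex("0102030405")           # hypothetical 40-bit secret
key_a = wep_packet_key(b"\x00\x00\x01", secret)
key_b = wep_packet_key(b"\x00\x00\x02", secret)

# The keys differ only in a prefix the attacker can read off the air.
print(key_a.hex(), key_b.hex())
print(rc4_keystream(key_a, 8).hex())
print(rc4_keystream(key_b, 8).hex())
```

Every packet key shares the same secret suffix and begins with a known prefix; that relationship between keys is what the published attacks exploit.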

Side channel attacks
While pure cryptanalysis uses plaintext and ciphertext, looking for weaknesses in the algorithms themselves, a side-channel attack looks at some other aspect of the behaviour of a cryptographic device which may reveal information of value to the analyst.

For example, any electrical device handling fast-changing signals will produce electromagnetic radiation. An enemy might listen to the radiation from a computer or from crypto hardware. For the defenders, there are standards for limiting such radiation; see TEMPEST and protected distribution system. If the device emits sound, that is another side channel that might give an attack. Timing attacks make inferences from the length of time cryptographic operations take. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to break a cipher that is otherwise resistant to analysis. For example, the time required for a shift or rotation operation may depend on the distance shifted, and on some computers the time required for a multiplication operation depends on the number of ones in the binary representation of one operand. An analyst who gets timing data may therefore be able to infer something useful. Power analysis has also been used, in much the same way as timing. The two may be combined.
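The PIN case can be illustrated with a comparison routine that stops at the first wrong character. In this sketch the number of characters compared stands in for elapsed time, so the result is deterministic; the PIN and the function names are hypothetical.

```python
def leaky_equal(guess: str, secret: str):
    """Early-exit comparison. Returns (match, steps), where steps is the
    number of characters compared and stands in for running time."""
    steps = 0
    if len(guess) != len(secret):
        return False, steps
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return True, steps

def recover_pin(oracle, length: int, alphabet: str = "0123456789"):
    """Recover the secret one position at a time from the step counts."""
    known = ""
    guesses = 0
    for pos in range(length):
        for ch in alphabet:
            guess = (known + ch).ljust(length, alphabet[0])
            ok, steps = oracle(guess)
            guesses += 1
            # A correct character lets the comparison run past position pos.
            if ok or steps > pos + 1:
                known += ch
                break
    return known, guesses

SECRET_PIN = "802715"  # hypothetical
pin, guesses = recover_pin(lambda g: leaky_equal(g, SECRET_PIN),
                           len(SECRET_PIN))
print(pin, "recovered in", guesses, "guesses")
```

The attack needs at most 10 guesses per digit, 60 in all for a six-digit PIN, rather than up to a million. In Python, `hmac.compare_digest` is the standard-library comparison intended to avoid this kind of leak.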

These attacks may be used against devices such as smartcards or against systems implemented on computers. Any cryptographic primitive (block cipher, stream cipher, public key or cryptographic hash) can be attacked this way.

Differential fault analysis attacks a cipher embedded in a smartcard or other device. Apply stress (heat, mechanical stress, radiation, ...) to the device until it begins to make errors; with the right stress level, most will be single-bit errors. Comparing correct and erroneous output gives the cryptanalyst a window into cipher internals. This attack is extremely powerful; "we can extract the full DES key from a sealed tamper-resistant DES encryptor by analyzing between 50 and 200 ciphertexts generated from unknown but related plaintexts".

See information security for discussion of defenses.

The attacker's methods
There are three passive attacks that will in theory break any cipher; variants of these work for either block ciphers or stream ciphers:
 * brute force attack: try all possible keys
 * algebraic attack: write the cipher as a system of equations and solve for the key
 * code book attack: collect all possible plaintext/ciphertext pairs for a block cipher, or the entire pseudorandom stream until it starts repeating for a stream cipher

However, all of those attacks are spectacularly impractical against real ciphers. Brute force and algebraic attacks require the attacker to do far too much work. For a code book attack, he needs far too much data: a huge collection of intercepts, all encrypted with the same key. If the cipher user changes keys at reasonable intervals, a code book attack is impossible.

A meet-in-the-middle attack is quite effective if it can be used, but it cannot be used against most ciphers.
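The classic case where the attack does apply is double encryption with two independent keys. The sketch below uses a hypothetical, deliberately weak toy cipher with 32-bit blocks and 16-bit keys; the cipher, the keys and the plaintext values are all made up for illustration.

```python
MASK32 = 0xFFFFFFFF
KEY_BITS = 16  # tiny on purpose; real keys are far larger

def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32

def rotr32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK32

def toy_encrypt(block: int, key: int) -> int:
    """Hypothetical toy cipher, NOT secure: four add-rotate-xor rounds."""
    x = block
    for rnd in range(4):
        x = (x + key * 0x9E3779B1 + rnd) & MASK32
        x = rotl32(x, 7)
        x ^= key
    return x

def toy_decrypt(block: int, key: int) -> int:
    """Inverse of toy_encrypt: undo each round in reverse order."""
    x = block
    for rnd in reversed(range(4)):
        x ^= key
        x = rotr32(x, 7)
        x = (x - key * 0x9E3779B1 - rnd) & MASK32
    return x

def double_encrypt(block: int, k1: int, k2: int) -> int:
    return toy_encrypt(toy_encrypt(block, k1), k2)

def meet_in_the_middle(pairs):
    """Find (k1, k2) candidates from known plaintext/ciphertext pairs."""
    (p0, c0), rest = pairs[0], pairs[1:]
    middle = {}
    for k1 in range(2 ** KEY_BITS):            # half-encrypt the plaintext
        middle.setdefault(toy_encrypt(p0, k1), []).append(k1)
    found = []
    for k2 in range(2 ** KEY_BITS):            # half-decrypt the ciphertext
        for k1 in middle.get(toy_decrypt(c0, k2), []):
            # Use the remaining pairs to weed out chance matches.
            if all(double_encrypt(p, k1, k2) == c for p, c in rest):
                found.append((k1, k2))
    return found

K1, K2 = 0x1A2B, 0xC3D4                        # hypothetical secret keys
pairs = [(p, double_encrypt(p, K1, K2)) for p in (0x01234567, 0x89ABCDEF)]
candidates = meet_in_the_middle(pairs)
print([(hex(a), hex(b)) for a, b in candidates])
```

The search costs about 2^17 cipher operations plus a table of 2^16 entries, against 2^32 trials for a naive search over both keys. This is why doubling the number of keys does not double the effective key length.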

Strategies against symmetric cryptosystems
Cryptanalysis of symmetric-key techniques typically involves looking for efficient attacks against block ciphers or stream ciphers. Against an ideal cipher, there is no attack better than brute force.

For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, before chances are better than even that the key will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations. This is a considerable improvement on brute force attacks.
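The comparison can be made concrete with a little arithmetic. This sketch uses the 56-bit DES key and 8-byte DES block, and takes the linear cryptanalysis figures from the paragraph above; the variable names are illustrative.

```python
DES_KEY_BITS = 56
DES_BLOCK_BYTES = 8

keyspace = 2 ** DES_KEY_BITS
brute_force_trials = keyspace // 2     # about half the keys on average
linear_plaintexts = 2 ** 43            # known plaintexts needed
linear_operations = 2 ** 43            # DES operations needed

work_ratio = brute_force_trials // linear_operations
data_bytes = linear_plaintexts * DES_BLOCK_BYTES

print(f"brute force: 2^{brute_force_trials.bit_length() - 1} trials")
print(f"linear:      2^{linear_operations.bit_length() - 1} operations")
print(f"work ratio:  {work_ratio} times fewer cipher operations")
print(f"data needed: {data_bytes // 2 ** 40} TiB of known plaintext")
```

The saving in computation is bought with a very large data requirement, all of it encrypted under one key, which is why regular re-keying defeats the attack in practice.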

See also the stream cipher article.

Strategies against asymmetric cryptosystems
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (the RSA cryptosystem is based on a problem related to factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently. For instance, the best algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.

Vulnerabilities of cryptographic primitives
Much of the theoretical work in cryptography concerns cryptographic primitives (algorithms with basic cryptographic properties) and their relationship to other cryptographic problems. For example, a one-way function is a function intended to be easy to compute but hard to invert. In a very general sense, for any cryptographic application to be secure (if based on such computational feasibility assumptions), one-way functions must exist. However, if one-way functions exist, this implies that P ≠ NP. Since the P versus NP problem is currently unsolved, we don't know if one-way functions exist. If they do, however, we can build other cryptographic tools from them. For instance, if one-way functions exist, then secure pseudorandom generators and secure pseudorandom functions exist.

Other cryptographic primitives include cipher algorithms themselves, one-way permutations, trapdoor permutations, etc.

Vulnerabilities of cryptographic protocols
In many cases, cryptographic techniques involve back and forth communication among two or more parties in space or across time (e.g., cryptographically protected backup data). The term cryptographic protocol captures this general idea. Cryptographic protocols have been developed for a wide range of problems, including relatively simple ones like interactive proofs, secret sharing, and zero-knowledge, and much more complex ones like electronic cash and secure multiparty computation.

When the security of a cryptographic system fails, it is rare that the vulnerability leading to the breach will have been in a quality cryptographic primitive. Instead, weaknesses are often mistakes in the protocol design (often due to inadequate design procedures or less than thoroughly informed designers), in the implementation (e.g., a software bug), in a failure of the assumptions on which the design was based (e.g., proper training of those who will be using the system), or some other human error. Many cryptographic protocols have been designed and analyzed using ad hoc methods. Methods for formally analyzing the security of protocols, based on techniques from mathematical logic (see for example BAN logic), and more recently from concrete security principles, have been the subject of research for the past few decades. Unfortunately, these tools are cumbersome and not widely used for complex designs.