Dictionary attack

From Citizendium
Jump to navigation Jump to search
This article is a stub and thus not approved.
Main Article
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
This editable Main Article is under development and subject to a disclaimer.

A dictionary attack is an attack on a password or other user authentication system where the system being attacked stores an encrypted form of the passwords. The attacker encrypts an entire dictionary of possible passwords, and then searches for each actual encrypted password in the dictionary. If anyone's password is in the dictionary, then the attacker can break into that account. The technique is fairly widely used, both by actual attackers and by systems administrators who want to check if users are using weak passwords which make the system vulnerable.

The attacker's methods

The advantage to the attacker is that he does most of his work offline. He can do the whole dictionary encryption at his leisure before even approaching the target system. It is usually possible to use a single encrypted dictionary against many targets; the same encrypted dictionary is useful against all machines using the same password encryption algorithm, which often means all machines running the same operating system. Also, a single dictionary attack generally checks all passwords on a system. This makes it worthwhile for the attacker to expend considerable resources on a large dictionary, both time to compute all the encrypted forms and space to store them.

The resources needed are not huge. One common algorithm for password encryption is SHA-1 which gives a 160-bit or 20-byte hash, so with a million-word dictionary the attacker needs 20 megabytes to store the hashes. A password test cannot take more than about a second without inconveniencing users, so an enemy can certainly encrypt a million-word dictionary in under a million seconds, about 12 days, using a single machine about as fast as the target systems. In many cases the password algorithm is faster than that. It may also be faster on the attacker's machine than on the target; consider an attacker with a good PC trying to break into an embedded system with a limited CPU. In some cases, an attacker may deploy larger resources; the dictionary calculation is easily parallelized if the attacker has access to multiple machines. The attacker may also use more time; in some cases, it might not be unreasonable to take six months building a dictionary.

If a plausible attacker has huge resources (e.g. a major government or a criminal group that controls a large botnet) and the target is of great value, either because it is very widely deployed (e.g. Microsoft Windows or Android phones) or because it is used in critical applications (e.g. the control computers for Iranian nuclear reactors targeted by the Stuxnet attack), then it is not unreasonable to consider an attack involving thousands of CPUs running for months to set up a dictionary with billions of candidate passwords. If you are trying to keep such a system secure, the better-safe-than-sorry assumption is that dictionaries suitable for attacking it have already been built, possibly by multiple attackers. If they get your password file, they will almost certainly break some passwords.

It is quite common to add additional things to the dictionaries, Nearly all password-cracking dictionaries include a list of women's names; these have been shown to be very common password choices. Many dictionaries also add names of movie, TV or comic book characters, cities, and so on. This catches the user who imagines that "Gandalf" or "Isphahan" is obscure enough to be a good password. Entire foreign language dictionaries may also be added. A word in Spanish or Hindi is not a good password either. Nor are common phrases; "hello,world" is a dreadful choice. Some attackers' dictionaries even contain initials from common phrases to catch users who create passwords that way, so "ouatiagfa" from "Once upon a time in a galaxy far away" may not be a good password either.

The attacker can use variants on dictionary words as well. If the dictionary has "wombat", he might try "Wombat", "w0mbat" and "tabmow" as well, and perhaps also wombat1, wombat2, ... Generating such variants is easily automated. There is a trade-off he can make, doing more work and using more storage versus improving the chances of success.

The resource constraints are not tight. With a dictionary size of a few tens of millions of words — say half a dozen languages plus lists of names and geographic terms — all you need to build the attack dictionary is a reasonable computer and a few weeks to work on it. To actually conduct the attack you need access to encrypted passwords, though, and that is often harder.

For an attack against a single computer system, getting access to encrypted passwords is generally difficult. However, against network authentication systems it may be easier; if an attacker can intercept passwords, then he can test them against a dictionary. This is the basis of some attacks against the WPA system used in some wireless networks.


Back in the 1970s, Unix introduced the idea of adding salt to passwords and today nearly all systems do that. This is a random number that is added to the password before encryption. The salt need not be secret — in fact it is usually stored in clear text with the user name and encrypted password — it just needs to be random and different for each password. An attacker who is building a dictionary does not know the salt, so he or she is forced to try all possibilities. For a 12-bit salt, as used in the original Unix method, each word in the attacker's dictionary gives 4096 possibilities. If encrypting the unsalted dictionary would need a few hours and a few megabytes of storage, he now needs months and gigabytes. Modern systems often use much larger salt. A minor but desirable side effect of this is that if a user uses the same password on several systems (not a good idea, but fairly common) then, because the salt is different, the encrypted forms will be different on each system.

If the attacker can copy the password file from the target, then he can do all the comparisons between the dictionary and the encrypted passwords offline as well. This was possible against early Unix systems; the password file was readable by any user and it contained all the encrypted passwords. The primary defense against dictionary attacks is to prevent the enemy from getting the password list. For example, on modern Unix systems, the data is partitioned; there is still a world-readable /etc/passwd file so anyone can look up user names, home directories, etc. but the actual password data is in a "shadow" password file which only root (the system administrator) can read; other operating systems use similar techniques.

However, this is not a complete defense against dictionary attacks. It protects the local machine quite nicely; only an enemy who already has administrative privileges can try the attack, and that enemy already has more privileges than the attack is likely to give him. However, it may threaten other machines. Consider an enemy who breaks into machine A, acquires admin privileges, grabs the password file, runs a dictionary attack, and acquires half a dozen user passwords. A look at the users' files or the system logs may tell him what other machines those users have accounts on, and if they have re-used the same password, then he can get into those machines.

Another defense is to make the password encryption algorithm slow. If encrypting a password takes only a millisecond, then encrypting a million-word dictionary takes 1000 seconds, under 20 minutes. Using an algorithm that needs a second, so that a million-word dictionary needs several days, is far from a complete defense — some attackers might use many machines in parallel and others might have months to construct an attack — but at least it makes the attacker work harder. Normal block ciphers and cryptographic hashes are quite fast; they have to be for many of their applications. It is therefore common to iterate them many times in a password application. The original Unix password algorithm iterated DES 25 times and the current Linux algorithm iterates MD5 1000 times. There is also an algorithm called Bcrypt, based on Blowfish and used in OpenBSD, where the number of iterations is controlled by a tuneable parameter.[1]

If the attacker cannot get the password file for an offline attack, then his only dictionary-based methods are to try repeated logins or, from a user account, to make repeated tries to elevate his privileges to those of an administrator. This is inherently much slower than an offline attack — trying all of a million possible passwords is far more expensive than taking a known user password and checking for a match in a million-entry table. It is also much easier for defenders to detect; with good logging or an intrusion detection system, or alert system administrators, an attack that tests many passwords is quite likely to be noticed. Also, on some systems, every time a user logs on there is a little notice about when and where he or she last logged in; this allows an alert user to detect compromise of the account. As a general rule, these direct password-guessing attacks are impractical; they take too long and the risk of detection is too high.

Various methods can be used to slow such attacks. One technique is to just use a slow password algorithm. Another is to add a deliberate delay in login processing, ideally an exponentially increasing delay on successive login attempts — say half a second on the first attempt and doubling for each later attempt. It is more effective to apply this to all attempts from the same terminal or IP address, rather than just attempts to log in to the same account. The inconvenience to legitimate users is small, but any delay makes testing many passwords more difficult and exponential increase makes trying a large number of them effectively impossible.

Another defense is to set up the system to automatically disable an account after some number of failed login attempts. In most cases, it is better to just log failed login attempts and set up something that alerts administrators if there are too many. Disabling accounts prevents password-guessing attacks, but it opens the system up to a form of denial of service attack. An attacker with a list of valid user IDs, which can be easily obtained for many systems, need only set up a program that tries to log in to all of them enough times with an arbitrary improbable password. Suppose a login attempt takes 10 seconds and an account is disabled after three bad attempts; a single attacking computer can then disable two accounts a minute or 120 an hour. If the target is a company with a 5-day work week, then someone running the attack over the weekend has about 60 hours at his disposal; utter chaos will ensue on Monday morning. If the attacker has even a small botnet to use, then he can cause chaos more quickly. If the botnet has 500 zombies (compromised machines), then it can disable 1000 accounts a minute, enough to cause real problems even if the system administrators are quick to notice and block the attack.

Using challenge-response protocol blocks dictionary attacks. In such a protocol, the system sends a random challenge on each login attempt. To gain access, the other player must produce a response that depends in some fairly complex way on both the password and the challenge.


  1. Niels Provos and David Mazières (1999), A Future-Adaptable Password Scheme