Dictionary attack

A dictionary attack is an attack on a password or other user authentication system where the system being attacked stores an encrypted form of the passwords. The attacker encrypts an entire dictionary, builds a large table of encrypted candidate passwords, then compares the actual encrypted passwords to that. If anyone's password is in the dictionary, then he can break into that account. The technique is fairly widely used, both by actual attackers and by systems administrators who want to check if users are using weak passwords which make the system vulnerable.

The advantage to the attacker is that he does most of his work offline. He can do the whole dictionary encryption at his leisure before even approaching the target system. It is usually possible to use a single encrypted dictionary against many targets; the same encrypted dictionary is useful against all machines using the same password encryption algorithm, which often means all machines running the same operating system. This makes it worthwhile for him to expend considerable resources on a large dictionary, both time to compute all the encrypted forms and space to store them.

The resources needed are not huge. A password test cannot take more than about a second without inconveniencing users, so an enemy can certainly encrypt a million-word dictionary in under a million seconds, about 12 days, using a single machine about as fast as the target systems. It might be much less if the password algorithm is fast or the attacker deploys larger resources. One common algorithm for password encryption is SHA-1 which gives a 160-bit or 20-byte hash, so with a million-word dictionary the attacker needs 20 megabytes to store the hashes.

If he can copy the password file from the target, he may be able to do all the comparisons offline as well. This was possible against early Unix systems; the password file was readable by any user and it contained all the encrypted passwords. On modern Unix systems, the data is partitioned; there is still a world-readable /etc/passwd file so anyone can look up user names, etc. but the actual password data is in a "shadow" password file which only root (the system administrator) can read.

Back in the 1970s, Unix introduced the idea of adding salt to passwords and today nearly all systems do that. This is a random number, constant for each system, that is added to the password before encryption. The attacker, not knowing the salt, is forced to try all possibilities. For a 12-bit salt, as used in the original Unix method, each word in the attacker's dictionary gives 4096 possibilities. If encrypting the unsalted dictionary would need a few hours and a few megabytes of storage, he now needs months and gigabytes. Modern systems often use much larger salt. A minor but desirable side effect of this is that if a user uses the same password on several systems (not a good idea, but fairly common) then, because the salt is different, the encrypted forms will be different on each system.

It is quite common to add additional things to the dictionaries, Nearly all password-cracking dictionaries contain comprehensive lists of women's names; these have been shown to be very common password choices. Many dictionaries also add names of movie, TV or comic book characters, cities, and so on. This catches the user who imagines that "Gandalf" or "Isphahan" is obscure enough to be a good password. Entire foreign language dictionaries may also be added. A word in Spanish or Hindi is not a good password either. Nor are common phrases; "hello,world" is a dreadful choice. Some attackers' dictionaries even contain initials from common phrases to catch users who create passwords that way, so "ouatiagfa" from "Once upon a time in a galaxy far away" may not be a good password either.

The attacker can use variants on dictionary words as well. If the dictionary has "wombat", he might try "Wombat", "w0mbat" and "tabmow" as well, and perhaps also wombat1, wombat2, ... Generating such variants is easily automated. There is a trade-off he can make, doing more work and using more storage versus improving the chances of success.