Random number generator

While defining true randomness can drift from computer science and mathematics into philosophy, Bruce Schneier has a reasonable criterion for a source of random numbers: "It cannot be reliably reproduced." This article focuses first on the idea of randomness, rather than on how the random or pseudo-random sequence is produced. For properly selected applications, it may be possible to be adequately random with a technique that does not depend on a true random physical process. In other cases, it may be practical to use a combination of physical and computational methods.

A truly random one-time pad may be generated with a combination of measurement of a random physical phenomenon (e.g., thermal noise or radioactive disintegrations), which is then captured by a computer and put into high-density storage. An expert in the physical phenomenon being measured needs to be consulted to determine if postprocessing is needed.

There are a wide range of applications for random or pseudo-random numbers, with various degrees of randomness. A computer game can give an apparently different scenario whenever played, with some simple fairly random physical inputs, such as selected bits from the computer's time of day clock, and, perhaps, the time between the last several keystrokes or mouse movements. In other cases, such as extremely critical cryptography, only the best uncorrelated physical random sequences will do. See related articles for examples of applications.

Random sequences from physical phenomena
Real-world physical phenomena often exhibit randomness, although some it is possible to be random in the short term, yet may exhibit some unexpected patterns or self-similarity. The number of cosmic rays striking an object at a given second is as random a process as is known. The number of disintegration events, in a radioactive material, is often considered random, but consider that if the amount of radioisotope is finite, as more and more half-lives pass, there will be fewer disintegrations. There is no simple answer to the question of whether a downward trend in disintegrations makes that sequence nonrandom. "Many natural sources of random bits may be defective in that they produce biased output bits (so that the probability of a one is different than the probability of a zero), or bits which are correlated with each other."

There are, however, methods that can be used to postprocess such a source to remove the correlation.

One widespread technique is due to John von Neumann. Take the input bits in pairs, discard 11 or 00 pairs, output one if the input is 10 and zero if the input is 01. As long as the input bits are independent, this eliminates bias. For example, suppose the input is 90% ones. The chance of a 10 sequence is .9*.1 while that of a 01 sequence is .1*.9; the two are exactly equal so the output is unbiased. However, this does not hold if the input bits are correlated. Also, in the presence of large bias, the technique can be inefficient. In our example with 90% bias, 81% of pairs are 11 and 1% are 00; these are thrown away.

Another common technique is to apply a cryptographic hash to the input. Suppose we again have 90% bias and we estimate that there is at least .2 bits of randomness per input byte. Hash 800 input bytes and you should have 800*.2=160 bits of randomness. If your hash is SHA with 160-bit output, the whole hash output should be unbiased and adequately random. In practice, one might hash 1000 bytes to give a safety margin, and use only part of the hash output as random output to avoid giving an enemy enough information to determine the internal state of the hash.

Manual generation
In the past, one-time pads were generated by typists who were told to type randomly, but, in practice, had patterns. Limited to manual methods, a not-unreasonable way is to put numbered balls in a bowl, having a blindfolded person take it out, the number recorded, the ball returned to the bowl, and the bowl shaken thorougly before taking the next ball.

This is a point mostly of historical interest, but typist patterns did cause weakness in some one-time pads. It is hard to imagine why manual generation would be useful today.

Pseudorandom number generators
In practical computing, it is convenient to use a pseudorandom number generator (PRNG), which produces a sufficiently varying sequence that, for example, a computer game driven with that generator appears to produce different characteristics each time that it is played. Nevertheless, most PRNGs that are operating system utilities, such as UNIX, will eventually repeat the sequence over a sufficiently long period of time. In addition, the programmer may have the choice of giving an explicit initialization value, which may be called a seed or a nonce, to the PRNG.

Most general-purpose PRNGs will produce the same sequence as long as they are given the same seed, which can even be useful for some software development purposes (see pitfalls in computer simulation, or for such things as being sure that a series of psychological research volunteers all see the same set of events. A given PRNG, however, may be told to use some reasonably random physical event in the computer as the seed, such that the seed is unpredictable.

Donald Knuth, an authority on PRNGs, quoted John von Neumann in a tongue-in-cheek but realistic view of the limits of PRNGs :

It is possible to write very good and very bad pseudorandom number generators. Known pitfalls need to be avoided, both in initialization and in computing the next number.

Known bad ideas
Any PRNG must be initialised with a seed. If you have any concern about someone attacking your generator, such seeds must be truly random. Using something like your computer's clock or your process ID, or even a combination of such things, makes it dangerously easy for an enemy to guess the value and break your generator. Using another PRNG only moves the problem without solving it, since that PRNG must also be intialised.

Several common types of PRNG &mdash; such as linear congruential generators or linear feedback shift registers &mdash; cannot be used directly in cryptographic applications. With well-chosen parameters, they have reasonable statistical properties and are useful for such things as controlling Monte Carlo simulations. However, in the presence of an adversary &mdash; someone who wants to break your cryptosystem or cheat at your game &mdash; they are inadequate, easily broken. They might be used as components in some more complex generator, but they cannot safely be used alone in such applications.

"Cryptanalytic Attacks on Pseudorandom Number Generators" by Kelsey, Schneier, Wagner, and Hall enumerates attacks and assesses vulnerabilities in widely used generators.

Linear congruential
Knuth illustrates the most commonly used PRNG, which he attributes to D.H. Lehmer in 1949. This is an easy generator to implement in software and is very widely used. For example, the rand function in most C libraries works this way.

A linear congruential generator creates successive outputs xn for n = 1,2, 3... using the formula: xi = a * xi-1 + b modulo m The seed is the initial value x0.

Choice of the constants a, b and the modulus m is critical; bad choices can give very weak generators.

The period of such a generator is at most m-1. a, b and m are normally chosen so that this maximum is achieved.

OFB and CTR Sequences
Encryption devices can be used as practical and strong PRNGs, if the output applies some type of masking so an adversary cannot know the full internal state of the generator.

Shift registers
Many random number generators based on shift registers with feedback are possible. These are commonly used in stream ciphers; we describe some in the stream cipher article.

Blum Blum Shub
This is the strongest known generator, with the advantage of relative simplicity, but the disadvantage of being computationally intensive. If it is used to generate numbers that are not needed frequently, it may be useful.

The possibility also exists that one or more processors could be dedicated to BBS computation.

The possibility of pseudo-random methods that are adequately random
Some research indicates that in carefully selected situations, a pseudo-random number generator, which meets a number of cryptographic criteria, may be adequately random for security. The proofs are complex and not universally accepted, but may be true for a useful number of situations.

Random numbers that follow particular statistical distributions
In many applications, the exact sequence is desired to be as random as possible, but the sequence may be under additional constraints to meet specific statistical requirements.

See generating values for normal random variables for an example of generating random numbers that fit a desired distribution.

Testing for Randomness
It is entirely reasonable to have individual numbers recur in a random sequence &mdash; a random sequence may even have a "run" of a number such as 100 354 972 972 972 155 579. Indeed, when typists have been asked to type randomly, they may intuitively avoid runs and produce a less random sequence.

Knuth presents an extensive range of tests, of which a few are mentioned here. It is entirely reasonable to have individual numbers recur in a random sequence &mdash; a random sequence may even have a "run" of a number such as 100 354 972 972 972 155 579. Indeed, when typists have been asked to type randomly, they may intuitively avoid runs and produce a less random sequence.It is entirely reasonable to have individual numbers recur in a random sequence &mdash; a random sequence may even have a "run" of a number such as 100 354 972 972 972 155 579. Indeed, when typists have been asked to type randomly, they may intuitively avoid runs and produce a less random sequence.

Computer simulation
If it is desired to do repeated runs of a simulation, either for software testing or to evaluate variable components under repeatable condition, a PRNG that will reuse the same seed to produce the same sequence is useful.

Cryptography
Cryptographic one-time pads preferably are generated from physical random phenomena. For high-volume ciphers, however, reproducibility is needed, so, while the initialization variables tend to be truly random, a key generator for a stream or block cipher uses a demonstrably strong pseudo-random number generator.