~~
Better random numbers for javascript, v. 0.9
~~~~
Update: 2011-11-09
~~~~
I keep this page since it is currently very often reached from a Hacker News discussion and there is a thread about the XSLT.
But actually, it would be better to go to the new wiki. This page will soon simply redirect there.
~~
ECMAScript provides a standard function, Math.random, which according to the specifications (ECMA-262, 3rd edition, 15.8.2.14) "returns a number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments."
This is good enough for its intended use, viz., to provide an easy way for script authors to write cute animations, games, etc. However, if you want to use it for more serious programming, there are several problems.
Why Math.random should not be used for serious programming
~~
No guarantee of quality
~~~~
"Using an implementation-dependent algorithm or strategy" means that you don't usually know how the "random" numbers are made, and that you cannot examine the code to get an impression of how "random" they are likely to be, even if you know a lot about PRNGs. You essentially have to trust the vendor. As a matter of fact, most Math.random implementations are not good. J R Stockton's Merlyn site provides some proofs of that fact.
~~
No standard way to repeat sequences
~~
In most implementations if not in all, Math.random's generator is silently seeded at startup by some supposedly random value, usually the present time to the millisecond or whatever other granularity the system provides.
This means that there is no easy way to repeat a series of pseudo-random numbers in order, e.g., to determine what went wrong in a simulation. It also means that you cannot give your code to a colleague and expect that she will find what you want to show. Anything that uses Math.random is inherently not portable.
~~
Security
~~
On the other hand, the weakness of many implementations mean that a sequence of outputs of Math.random can often be predicted, even when the intent was that it could not. This may present security risks. Math.random should never be used to generate separators in POST requests, Version 4 UUIDs, session keys, etc.
If one wants to do things like that in javascript, Math.random is simply not good enough. Actually, even a cryptographically secure PRNG would not by itself be enough, you would also need some means of collecting truly random entropy to seed it.
I shall address that problem separately. For now, let us concentrate on making PRNGs with better statistical properties and longer periods, which can be seeded either by something that is repeatable, or by, e.g., the current time if less predictable output is desired. Cryptological security and true, physical randomness are separate issues.
~~
Why it is difficult to write good PRNGs in javascript
~~
Many families of excellent PRNGs have been conceived for other languages. However, javascript's notion of Numbers make most of them less than ideal for our purpose. They can be ported (javascript, like any other decent programming language, is Turing complete), but at a heavy price in speed.
Only 53 significant bits in multiplications
~~~~
Many present algorithms assume exact multiplications on 32-bit unsigned integers, yielding a 64-bit result. Some even multiply 64-bit integers, yielding a 128-bit result. If you want to emulate that in javascript, you have to write multi-precision routines. That is certainly possible, but hardly efficient.
Even worse, most algorithms based on multiplications assume that overflow results in truncation and discard of the high bits, effectively giving modular arithmetic for free. This works fine on unsigned integers, but javascript Numbers are always "double-precision 64-bit binary format IEEE 754 values". Which means that in the case of overflow, it is the low order bits who are discarded.
As an example, let us examine the best-known PRNG of all: the venerable LCG, on 32-bit unsigned integers. If you use Marsaglia's "easy to remember" multiplier 69069, everything works nicely: 69069 being less than 221, x = 69069 * x + 1 >>> 0 will indeed cycle through all the possible values. But if you choose a bigger multiplier modulo 232 with good figures of merit in the spectral test, say, 1103515245 which is used in ANSI C and the glibc library, the result of the multiplication will almost never fit in 53 bits. Low order bits are discarded, which ruins the basis for the algorithm: its operation modulo 232.
~~
Limited and inefficient bit operations
~~
Javascript has the usual set of bitwise operations: AND, OR, XOR, shifts, etc. However, they only apply to 32-bit integers, and since javascript Numbers are always floating point, it means that the Numbers must be converted internally to integers before the operation, and back to floats again when it is done. Smart implementations may attempt to optimise, but on all the platforms I have tried, the results remain slow.
~~
Altogether, precious few of the usual tools are well adapted to javascript. Some ingenuity is required if one wants good performance.
A collection of better PRNGs
~~
I present several PRNGs that should have much better statistical properties than the built-in Math.random. They have been taken from, or inspired by, well-known sources on the subject, and they have been extensively tested. They share a common interface, but the actual algorithms differ, so that one can choose the best for one's platform and needs.
Individual PRNGs in javascript can be viewed and downloaded in their respective entries below. The entire collection of javascript PRNGs can be downloaded here, and the corresponding C versions here.
~~~~
The C versions are what is used for the tests. Those consume about 238 numbers and take more than 24 hours even in C; using javascript is hardly an option.
They are completely equivalent to the javascript versions, as can be seen from the sources and checked by experiments. The only difference in output is when one uses non-ASCII arguments. (For the C versions, the interpretation of such arguments depends on the platform.)
~~
Common interface
~~
We shal take Alea as example, but the other PRNGs - MRG32k3a, KISS07, etc - share the same interface.
Simple usage:
~~~~
var random = [new] Alea([...]);
Calling Alea (or any of the others) with new is unnecessary, but harmless. The optional arguments may be of any number and nature.
The call returns a function, random, that is functionally equivalent to Math.random. That is, successive calls of random() return Number values with positive sign, greater than or equal to 0 but less than 1, chosen pseudo-randomly with approximately uniform distribution over that range. (Hopefully, with a much better approximation of that uniform distribution.)
Example:
~~~~
var random = new Alea('my', 3, 'seeds');
random(); // returns 0.30802189325913787
random(); // returns 0.5190450621303171
random(); // returns 0.43635262292809784
Any implementation of ECMAScript should yield exactly those values.
~~
The internal state of random, and hence the sequence of pseudo-random numbers it returns, is determined by the arguments to Alea. Two functions returned by calls to Alea with the same argument values will return exactly the same sequence of pseudo-random numbers. String and Number arguments should provide repeatable output across platforms. Objects provide repeatable output on the same platform, but not necessarily on others. (The point of using Objects would be to provide thoroughly unpredictable output. As an extreme example, one could try Alea(document) or Alea(this) in a browser.)
If you call Alea(), that is, with no arguments, a single argument of +new Date() is silently assumed. This provides an easy means to provide somewhat unpredictable numbers, like Math.random does.
Example:
~~
var random = Alea();
random(); // returns 0.6198398587293923
random(); // returns 0.8385338634252548
random(); // returns 0.3644848605617881
Your return values should be completely different. (But see below how you can reproduce mine on your computer.)
~~
The generated numbers should be "more random" than in most implementations of Math.random. Any sequence of any pick of the returned bits should pass any known test, unless where mentioned. (Please send me a mail if you have evidence that this is not true. But please remember that "p's happen": if you repeat any test a million times, some of the results are bound to have very suspicious p-values.)
The specification of Math.random says nothing about the precision of the returned fraction. Here, it is guaranteed that the precision is at least 32 bits.
~~
Very often, to get the full attainable precision of 53 bits, the main generator has to be called twice, which takes more time. It would be wasteful to insist on doing that systematically, since a 32 bit resolution is enough in most cases. If you really need 53 bits, the function provides a method described below to get them.
~~
More advanced usage:
~~
Functions being Objects in javascript, the returned function also has two methods, uint32 and fract53, and two properties, version and args.
uint32 is a function that takes no arguments and returns an unsigned random integer in the range [0, 232[.
Example:
~~~~
var intRandom = Alea('').uint32;
intRandom(); // returns 715789690
intRandom(); // returns 2091287642
intRandom(); // returns 486307
Any implementation of ECMAScript should yield exactly those values.
~~
To obtain an integer in [0, n[, one may simply take the remainder modulo n. With some generators, this is faster than using the default function in Math.floor(random() * n) or its clever variants, random() * n | 0 and random() * n >>> 0.
fract53 is a function that takes no arguments and returns a 53-bit fraction in [0, 1[. It is usually slower than the main function, though.
Example:
~~
var rand53 = Alea('').fract53;
rand53(); // returns 0.16665777435687268
rand53(); // returns 0.00011322738143160205
rand53(); // returns 0.17695781631176488
Any implementation of ECMAScript should yield exactly those values.
~~
version is a String that indicates the nature and the version of the generator, like 'Alea 0.9'. (For now, all the generators have a version number of 0.9; those on first versions of this page had a version number of 0.5. The versions may eventually reach 1.0 when they have been quite thoroughly tested, inspected by experts, etc.)
args is an Array of the arguments that were given to Alea. Thus, if you called it without an argument and it assumed +new Date(), you may recover the time stamp that was used in order to repeat the sequence.
Example, assuming random is the function created above without arguments:
~~
var seed = random.args; // seed = 1277182878230 - I wrote the example on 22 June 2010 shortly after 7AM in France (summer time), exactly at 2010-06-22T07:01:18.230+02:00, 1277182878230 milliseconds after the Unix epoch.
random = Alea(1277182878230);
random(); // returns 0.6198398587293923
random(); // returns 0.8385338634252548
random(); // returns 0.3644848605617881
Any implementation of ECMAScript should yield exactly those values.
~~
Common implementation details
~~
Regardless of the differences in their algorithms, internal state, etc, all the generators provided here share a common implementation pattern. First, the internal state is declared. Then, it is seeded. Then, the main function for the algorithm is presented. Finally, the function-object to be returned is built and returned.
Seeding and the Mash function
~~~~
The arguments to Alea (and the other provided generators) are always converted into a String which is hashed by the means of a suitable hash function. (You may test it here.)
That hash function, with its initialised internal state, is provided by another function called Mash, which must therefore be included in any application that uses any of the provided functions.
(Alternatively, if you decide to use only one PRNG, you may simply copy Mash into your copy of the relevant file.)
Each element of the internal state is altered by a hash of each argument. (The hash function preserves its own internal state, so that each of the alterations is likely to be different.) The exact nature of the alteration depends on the nature of the state element - if it is an integer, the hash is simply added, if it is a single bit, it is XOR-ed, if it is a fraction between 0 and 1, the hash is subtracted and the element is incremented by 1 if it has become negative, etc.
Finally, whenever the PRNG algorithm places constraints on certain elements of its internal state, the last stage of the seeding process makes sure that these constraints are met.
~~
At least two random functions, not just one
~~
All the generators provide two functions, the main function which returns 32-bit or 53-bit fractions, and its uint32 method which returns 32-bit unsigned integers. They also provide the fract53 method which is guaranteed to return 53-bit fractions; it may or may not be the same as the main function.
In most algorithms ported from other languages, the algorithm is actually implemented in the uint32 method. (Operating on 53-bit numbers without ever losing a bit to overflow leaves very few possibilities. 32-bit unsigned integers provide handy bit operations in addition to reducing the problems of unintended overflow; most PRNGs operate on integers for that reason.) The "main" container function calls that integer function and multiplies the result by 232, the fract53 method calls it twice, once for its 32 high bits and once again for the 21 lowest.
In the generators of the Lagged Fibonacci family, the PRNG algorithm directly provides the 53-bit fractions. The uint32 method multiplies by 232 and returns the floored result. The fract53 method is the same as the main container function.
In Alea and Kybos, the main PRNG algorithm provides 32-bit fractions. Both uint32 and fract53 are derived.
~~
The generators
~~
MRG32k3a
~~~~
One of Pierre L'Ecuyer's Combined Multiple Recursive PRNGs. It is remarkable for its ingenious use of 53-bit integer arithmetic (originally implemented in C doubles) which makes it very suitable for javascript, as well as for the care in the choice of multipliers and moduli. The period is about 2191, and it passes - of course - L'Ecuyers own TestU01 BigCrush battery of tests. (The Mersenne Twister fails Crush.)
This is surely the most reputable of all the generators presented here. It has been quite extensively tested and it is widely used. It is copyrighted but free for non-commercial uses. Commercial users must request written permission from Professor L'Ecuyer.
~~
Xorshift03
~~
George Marsaglia's 2003 version of his xorshift family of PRNGs. They are among the fastest available in integer arithmetic, but they become much slower in javascript. Also, most variants tend to fail L'Ecuyer's tests which are rather more stringent than Marsaglia's own.
This version, however, passes BigCrush, mainly because it contains a final multiplication of two components. Its period is about 2160.
~~
KISS07
~~
George Marsaglia's 2007 version of his KISS family of PRNGs.
It is deceptively simple. It merely adds the outputs of three small generators which individually would not have the slightest chance of passing any kind of randomness test. Yet, the result passes BigCrush. The period is about 2121.
Its xorshift component makes it rather slow, though. It remains a very attractive generator, because its combination of three completely different algorithms with different periods makes it likely that any bias from one of them is blurred by the two others.
~~
LFIB4
~~
Marsaglia's LFIB4, adapted to subtraction modulo 1 on 53-bit fractions. (One cannot add two 53-bit fractions in [0, 1[ in IEEE double precision without risking the loss of a bit to overflow. One can subtract them, though, and add 1 if the result is negative - the sign bit serves as a temporary 54-th bit.)
The period is very close to 2308, and its combination of four "taps" instead of two makes it more robust than the usual lagged Fibonacci generators. It also makes it just a little slower, although much faster in javascript than integer-based algorithms.
~~
LFib
~~
This is a L(253, 255, 52, -) generator - I have never understood why people insist on 55, 24, especially since one often uses an array of 256 anyway, in order to take advantage of C and similar languages' implicit modulo 256 arithmetic on unsigned char indices. The period is very close to 2307.
LFib and LFIB4 are the fastest of the generators presented here to provide full 53-bit resolution, since that is what they provide natively. However, the simple linear dependency between their last bits makes me somewhat reluctant to recommend them wholeheartedly, even though they pass BigCrush. They fail Michael Niedermayer's tests. All of the other PRNGs presented here pass.
~~
Alea
~~
This generator implements my variation on Marsaglia's Multiply-with-carry theme, adapted to javascript's quaint notion of numbers: the carries are exactly the integer parts of Numbers with exactly 32 bits of fractional part.
Such generators depend crucially on their multiplier, a. It must be less than 221, so that the result of the multiplication fits in 53 bits, and for an array of n 32-bit numbers, a * 232n - 1 must be a safe prime. The period is the corresponding Germain prime, a * 232n - 1 - 1.
The one presented here uses n = 3: just 3 32-bit fractions, which means that one may use three rotating variables without walking through an Array. (Table lookup is rather expensive, time-wise.) The period is close to 2116, it passes BigCrush, and it is the fastest javascript PRNG I know that does so.
I expected such generators with any n up to 256 (or even beyond, if one wants monstrously long periods and can find the appropriate multipliers) to be faster than those relying on integer arithmetics, which they are. But they also turn out to be faster than the lagged Fibonacci generators if one does not insist on 53 bits, much to my surprise.
I therefore propose them as the PRNGs of choice in javascript. I must however confess to the bias induced by more personal involvement in those generators than in the others, which are simple ports in the case of MRG32k3a, Xorshift03 and KISS07, and rather straightforward adaptations in the case of the lagged Fibonaccis.
~~
Kybos
~~
All the generators presented so far have quite well-defined mathematical properties which have been extensively studied. This gives good reasons to consider them satisfactory, but it also raises a suspicion - aren't they too well-behaved? Isn't their elegant mathematical structure by itself a sign that they are not random? Except possibly for KISS07 which combines three quite different ideas, they could be victims of the Mersenne Twister syndrome - a theoretically wonderful algorithm, with actual proofs of good distribution in all dimensions up to a large number and a huge period, which turns out to fail tests that other, less sophisticated but more robust generators pass easily.
Kybos combines Alea with a variant of the Bays-Durham shuffle, using the ultimate non linear tool: table lookup. The original shuffle rarely improves PRNGs in a noticeable way, but the variant used here, in which the "random" numbers in the lookup table are incremented by those from the original generator instead of being replaced, makes some terrible generators pass very stringent tests.
The problem is that one cannot say much about the period or other aspects of the resulting numbers. Which may be what one would expect of anything that is indeed random, but it doesn't make mathematicians happy.
I mainly propose Kybos because it has an extra method, addNoise, which allows one to alter the state of the lookup table using true, physical randomness, without changing anything in the state of the Alea generator that provides the guarantee of a suitably long period and good distribution. The lookup table essentially serves as an entropy pool. Its size of 256 bits makes it suitable for the generation of 128-bit UUIDs.
~~