
This happens very easily.

For example, it happens if you run a simulation with random numbers and compute random()%NUM where NUM > RAND_MAX.

The trick is that RAND_MAX varies between platforms *cough* Visual C *cough*.



random()%NUM is usually wrong even if NUM < RAND_MAX.

random() is often a linear congruential generator (LCG: http://en.wikipedia.org/wiki/Linear_congruential_generator) for speed and simplicity. An LCG is a multiply, an add, and a modulus (the modulus is usually implicit from the machine word size). That means its low bits are highly predictable and not random at all.

    X(n+1) = (a * X(n) + c) mod m

Assume m is a power of 2, since the modulus is usually implemented via machine word wrap-around. If c is relatively prime to m (which is needed for the generator to cover the whole range of m), then c is odd. a-1 is normally a multiple of 4 when m is a power of two, so a is odd too.

So if X(n) is odd, X(n+1) will be even (o*o + o => o + o => e), and X(n+2) will be odd (o*e + o => e + o => o), and so on: the lowest bit simply alternates, with zero randomness.

So if you're trying to simulate coin flips and use %2, you will get a 1,0,1,0... sequence.
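The parity argument above can be checked with a small sketch. The multiplier and increment below are a commonly quoted textbook pair, and the modulus 2^31 is an assumption chosen for illustration; this is a toy LCG, not the implementation of any particular library's rand() or random().

```python
# Toy LCG with a power-of-two modulus (all three constants are
# illustrative assumptions, not taken from a specific C library).
A = 1103515245  # multiplier: odd, and A - 1 is a multiple of 4
C = 12345       # increment: odd, so relatively prime to M
M = 2 ** 31     # modulus: a power of two


def lcg_states(seed, count):
    """Return `count` successive raw states produced after `seed`."""
    states = []
    x = seed
    for _ in range(count):
        x = (A * x + C) % M
        states.append(x)
    return states


# Taking the raw state modulo 2 keeps only the lowest bit.
low_bits = [x % 2 for x in lcg_states(seed=1, count=10)]
print(low_bits)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Any seed gives the same strict alternation; only the starting parity changes.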


It would often be wrong even if random() produced perfectly random numbers; random()%NUM, for most values of NUM, will be biased.

For example, if RAND_MAX is 255 (256 possible values), random()%10 yields each of 0, 1, 2, 3, 4, and 5 with probability 26/256, but each of 6, 7, 8, and 9 with probability only 25/256.
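The counts in that example can be verified by enumerating every possible generator output. This sketch assumes an idealized generator that returns each value in 0..255 exactly equally often; RAND_MAX and NUM are just the numbers from the example.

```python
from collections import Counter

RAND_MAX = 255  # hypothetical generator range 0..RAND_MAX, as in the example
NUM = 10

# Count how many of the 256 equally likely raw values land on each residue.
counts = Counter(value % NUM for value in range(RAND_MAX + 1))

for residue in range(NUM):
    print(residue, counts[residue])
```

Residues 0 through 5 each get 26 raw values and residues 6 through 9 each get 25, because 256 = 25 * 10 + 6.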

That's why good libraries have a 'next(maxValue)' function that is more complex than random()%NUM. For example, see lines 251-268 of http://developer.classpath.org/doc/java/util/Random-source.h....
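One common way to remove the modulo bias is rejection sampling: throw away raw values from the incomplete last block and draw again. The sketch below is a generic illustration of that idea, not a transcription of the linked Java source; `bounded`, `rand_func`, and `rand_max` are hypothetical names.

```python
import random


def bounded(rand_func, rand_max, num):
    """Return a uniform integer in [0, num).

    Assumes rand_func() returns integers uniformly distributed on
    [0, rand_max].
    """
    span = rand_max + 1
    if not 0 < num <= span:
        raise ValueError("num must be in 1..rand_max + 1")
    # Largest multiple of num that fits in the raw range.
    limit = span - (span % num)
    while True:
        value = rand_func()
        if value < limit:       # accept only values from complete blocks
            return value % num  # every residue is now equally likely


rng = random.Random(0)
print(bounded(lambda: rng.randrange(256), 255, 10))
```

This only fixes the bias. It still uses the low bits of the raw value, so it does not help with a generator whose low bits are weak.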


I just tested this quickly (on Mac OS 10.9.2); it is not a 1,0,1,0 sequence. (It is repeatable since I'm not seeding.)

On Linux, same thing. It even gives the same sequence as the Mac OS version.

(it's 100 numbers, "1,0,1,1...0,1," no \n at the end)

    ./rt | md5sum
    7a5a5a0758ca83c95b21906be6052666


@barrkel made a small mistake, confusing rand() and random().

rand() is the earliest C random number generator. Its low-order bits (back in the day) went through a predictable sequence, so rand() & 0x1 was a bad source of random bits.

I don't think that rand() was specified so fully as to make this behavior required, but typical implementations exhibited it, so you could not use rand() for any serious work.

random() came later and does not use an LCG, which fixed this problem, so you would not see it if your code calls random(), whose man page says:

  The difference [between random() and rand()] is that rand() produces a much less
  random sequence -- in fact, the low dozen bits generated by rand go through a
  cyclic pattern.  All of the bits generated by random() are usable.  For
  example, `random()&01' will produce a random binary value.

Typically, because of this screwup, people use a third-party generator, like the Mersenne Twister.


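An LCG can be used for serious work, depending on your definition of serious. You need to use multiplication instead of modulus when reducing the output to a range, so that the result depends on the high bits rather than the low ones.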
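A minimal sketch of that idea, assuming a toy LCG with a 31-bit power-of-two modulus (the constants and the names `reduce_mod` and `reduce_mul` are illustrative, not from any library): scaling by multiplication and shifting keeps the high bits, while `%` keeps the low ones.

```python
from itertools import islice

# Toy LCG parameters, assumed for illustration only.
A, C, BITS = 1103515245, 12345, 31
M = 1 << BITS


def lcg(seed):
    """Yield successive raw states of the toy LCG."""
    x = seed
    while True:
        x = (A * x + C) % M
        yield x


def reduce_mod(raw, num):
    """Range reduction by modulus: depends on the low bits of raw."""
    return raw % num


def reduce_mul(raw, num, bits=BITS):
    """Range reduction by multiplication: floor(raw * num / 2**bits)."""
    return (raw * num) >> bits


raws = list(islice(lcg(1), 16))
print([reduce_mod(r, 2) for r in raws])  # lowest bit of each state
print([reduce_mul(r, 2) for r in raws])  # highest bit of each state
```

The multiplicative reduction is still slightly biased when the range does not divide 2**BITS, so it addresses the weak low bits rather than the bias problem.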


My comments were not specific to C. They were a general statement about the random number functions provided by runtime libraries across all languages: a risk to be aware of.

There are other mitigations. For example, Java's RNG uses an LCG, but returns the high 32 bits and uses a 48-bit modulus to counter this weakness.
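A rough Python sketch of that scheme, using the multiplier, addend, and 48-bit mask documented for java.util.Random. The class name `JavaStyleLCG` is hypothetical, and unlike Java's `next(int)` this version returns an unsigned value rather than a signed 32-bit int.

```python
MULTIPLIER = 0x5DEECE66D
ADDEND = 0xB
MASK = (1 << 48) - 1  # keeps the state at 48 bits


class JavaStyleLCG:
    """48-bit LCG that exposes only the top bits of its state."""

    def __init__(self, seed):
        # The initial scramble follows the documented behaviour of
        # java.util.Random.setSeed.
        self.state = (seed ^ MULTIPLIER) & MASK

    def next_bits(self, bits):
        """Advance the state and return its top `bits` bits (unsigned)."""
        self.state = (self.state * MULTIPLIER + ADDEND) & MASK
        return self.state >> (48 - bits)


rng = JavaStyleLCG(42)
for _ in range(8):
    top_bit = rng.next_bits(1)
    # The hidden low bit of the state still alternates; callers only
    # ever see the high bits.
    print(rng.state & 1, top_bit)
```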


If you are a computational scientist who is using random() for any part of your simulation, you are an idiot.


Apparently people didn't like what I said. But I stand by it. If you are using the built-in random number generator in your simulations you are doing something VERY stupid.

The built-in random() is not a good source of random numbers for scientific purposes. I've heard too many stories about how people don't think about the random number generator, only to have it bite them in the ass.


> The built-in random() is not a good source of random numbers for scientific purposes.

Unless you are going to be more specific about when it's not good, this is not generally true.

Scientific purposes often include Monte Carlo methods, and random() in Python uses a Mersenne Twister - a perfectly good match.
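As a small illustration of that kind of use, here is a hedged sketch of a Monte Carlo estimate of pi with Python's standard `random` module. The function name `estimate_pi`, the sample count, and the seed are arbitrary choices for the example.

```python
import random


def estimate_pi(samples, seed):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # explicit seed makes the run reproducible
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


print(estimate_pi(100_000, seed=12345))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the simulation's stream separate from any other code that draws random numbers.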

Be careful about calling people idiots.


You're probably getting downvotes for how you said it, and that you provided no explanation for what you said. A phrasing that would not clash with HN culture: "If you are a computational scientist who is using random() for any part of your simulation, you are making an enormous mistake."


1 - You don't know what the purpose of the simulation is or what a good source is for every specific case.

2 - You're assuming people don't test their generators, blindly accept whatever results they give, and/or don't compare results against known cases.

3 - You say "If you do X you're an idiot" and provide exactly ZERO alternatives to it. That doesn't look very credible.


It's because you labeled these people as idiots. How about telling the truth and admitting that they may just be uninformed and making a mistake? One can be intelligent and, at the same time, make mistakes due to lack of experience, lack of training, lack of <whatever>.



