When random numbers are too random: Low discrepancy sequences (2017) (demofox.org)
132 points by ibobev on Jan 7, 2024 | hide | past | favorite | 69 comments


I'm reminded of a funny story from Edward Fredkin:

"Just a funny story about random numbers: in the early days of computers people wanted random numbers for Monte Carlo simulations and stuff like that, and so a great big wonderful computer was being designed at MIT's Lincoln Laboratory. It was the largest, fastest computer in the world, called the TX-2, and was to have every bell and whistle possible: a display screen that was very fancy, and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully, with radioactive material and Geiger counters, and so on. So whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations: it turned into a great disaster, and everyone ended up hating it. No one writing a program could debug it, because it never ran the same way twice. This was a bit of an exaggeration, but as a result everybody decided that random number generators of the traditional kind, i.e. shift-register-sequence generators and so on, were much better. So that idea got abandoned, and I don't think it has ever reappeared."


physical entropy sources are crucial to cryptographic security; failing to use them properly has been a major source of insecurity in applied cryptography, most notably the debian openssl bug but also in some other cases. linux has had /dev/random and /dev/urandom for this for decades and has more recently added a getrandom() call

also, there are other intentional sources of nondeterminism in current programming environments, such as aslr and the random hash salt python and perl apply to thwart hash-collision dos attacks


There is the wall of lava lamps approach too: https://blog.cloudflare.com/lavarand-in-production-the-nitty...


You're of course unlikely to get the same read, but I find it crazy that their source of cryptographic entropy is on public display from the street!


agreed, it means the fluid-dynamics chaos in the lava lamps is useless for their cryptographic purposes


Not really. You can combine the public random source with a secret key. Since you assume they can keep a random source secret, they can equally keep this key secret. In fact, it's sometimes preferable to get rid of the hardware source completely and rely only on a secret key and a state, obtaining a design that is more resistant to faults.


I seem to recall early Windows using mouse movement as an entropy source, to the point where reads would block when entropy was exhausted and the mouse wasn't moving. Then other things like network RX timing were added to the mix. But what's a system to do in the absence of all such physical inputs?


I half remember a case where a program (some sort of install or copy, I think) would complete faster while you shook your mouse around.

I will look for the article.

This was different from the old gpg prompts that instructed to move the mouse around when generating keys.

update: it was probably this effect. https://retrocomputing.stackexchange.com/a/11535


PuTTYgen waits for you to move the mouse across a portion of the screen enough times when generating keys[1], sampling the mouse position for entropy.

[1]: https://the.earth.li/~sgtatham/putty/0.80/htmldoc/Chapter8.h...


Here’s the particular explanation I remember reading a few years back: https://retrocomputing.stackexchange.com/a/11535


I'm an embedded systems programmer. We use LFSRs like everyone else. But I always make a point to feed true sources of entropy into the RNG.

Stuff like this

   if (any input changed)
      RNG_seed ^= sysclk;
It's helped on numerous occasions where two or more identical systems co-exist (share a bus, for example). Without entropy, the systems' RNGs are always in sync, which causes all sorts of problems.

For example, on a CAN network bus collision you're supposed to back off a random amount of time. If the colliding systems' RNGs are in sync, they keep colliding over and over endlessly.

Entropy will kick you in the ass if you don't think about it. Which is why I always make sure to include it.

You can always turn this off in DEBUG builds.
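
To illustrate the idea, here's a Python sketch (not real firmware; the width and taps are just a textbook 16-bit maximal-length LFSR, and the timer value is made up):

```python
class LfsrRng:
    """16-bit maximal-length Galois LFSR; identical seeds give identical streams."""
    def __init__(self, seed=0xACE1):
        self.state = (seed & 0xFFFF) or 1  # an LFSR must never be all-zero

    def mix_entropy(self, sysclk):
        # The "if (any input changed) RNG_seed ^= sysclk;" trick:
        # fold a free-running timer capture into the state.
        self.state ^= sysclk & 0xFFFF
        if self.state == 0:
            self.state = 1

    def next_bit(self):
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0xB400  # taps 16,14,13,11: 65535-step period
        return lsb

# Two identical nodes on the same bus stay in lockstep forever...
a, b = LfsrRng(), LfsrRng()
assert [a.next_bit() for _ in range(64)] == [b.next_bit() for _ in range(64)]

# ...until one of them folds in a timer value at an input edge.
b.mix_entropy(0x1234)  # hypothetical sysclk capture
assert [a.next_bit() for _ in range(64)] != [b.next_bit() for _ in range(64)]
```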


That makes sense. I could be wrong but I just have this sneaking suspicion that some systems literally have no inputs, though. Nothing received from the outside world... no HCI or sensors of any kind. But maybe such a simple system isn't likely to need much randomness? In your CAN bus example, pretty much anything transmitting onto the bus is doing so because it knows something interesting, as opposed to expressing something deterministic from its own little stored routine?


not necessarily, it might be transmitting a heartbeat so you can detect that that part of the wiring harness is still turned on. i've seen embedded systems with literally no inputs, like the minipov2, but never one that transmits on a network


So if you wanted a bunch of those to transmit onto the same bus/network, and use exponential backoff to resolve collisions, they would just keep colliding forever, right? Because even if they tried to seed their delay with randomness, they'd all do so the same way every time!


i think so, if they didn't have any other sources of randomness; even the traffic on the can bus would give them all the same 'random' bits if they started at the same time


I was recently waiting on a linux machine to finish a big encrypted file transfer. As soon as I ssh'd in it drastically sped up. My best guess is that it was starved for entropy and logging in gave some.


Desktop / server / mobile chips typically have hardware support for collecting entropy which can be used.


> physical entropy sources are crucial to cryptographic security

If this were true, then no one would use VMs and cloud instances.


It's my understanding that most clouds (AWS/GCP/Azure) have hardware RNGs available to the VM through something like virtio-rng for exactly this reason. Even if it's not universal (I have no idea), it's certainly something that's available.


as another commenter has pointed out, physical entropy sources can be provided to vms, but sadly the other implied premise of your syllogism is also false: use of insecure cryptography is in fact widespread


> I don’t think it has ever reappeared.

It resurfaced, much later.

https://en.wikipedia.org/wiki/RDRAND

The main application is cryptography. I am not aware of Monte Carlo simulations that use real random numbers. Plus, true RNGs are often biased, so in the end, you still have a PRNG, the true RNG only being used to inject some entropy.


> I am not aware of Monte Carlo simulations that use real random numbers.

Ray tracing? Any sampling bias can be incredibly obvious there.


RT should use low discrepancy sequences


Yeah. Sampling bias is fixed with Cranley-Patterson rotation. For real-time rendering, though, blue noise is the thing to use: at low sample counts you can't converge, so you should hide the error perceptually instead.
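
For the curious, a minimal sketch of what Cranley-Patterson rotation does, using a base-2 van der Corput sequence as a stand-in low-discrepancy sequence:

```python
import random

def van_der_corput(n, base=2):
    """n-th term of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while n:
        n, r = divmod(n, base)
        denom *= base
        q += r / denom
    return q

def cranley_patterson(samples, offset):
    """Shift every sample by the same random offset, wrapping on [0, 1).
    Pairwise (toroidal) spacing is preserved, so the low discrepancy of
    the sequence survives, but the correlation between sample sets with
    different offsets is broken."""
    return [(s + offset) % 1.0 for s in samples]

samples = [van_der_corput(i) for i in range(16)]
shifted = cranley_patterson(samples, random.random())
```

In a renderer, each pixel would get its own offset, so neighbouring pixels see decorrelated, noise-like error instead of the same structured pattern.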


> So that idea got abandoned, and I don’t think it has ever reappeared.

I have no idea how old your story is... but in any case, there has been an update: Cloudflare is now using lava lamps as "random numbers as a service" [1], and the idea isn't new either: SGI ran it in the late 90s [2].

[1] https://blog.cloudflare.com/randomness-101-lavarand-in-produ...

[2] https://en.wikipedia.org/wiki/Lavarand


"HotBits: Genuine random numbers, generated by radioactive decay", since 1996.

https://www.fourmilab.ch/hotbits/


Sadly:

    Consequently, the radioactively-generated HotBits server has been retired as of the end of 2022 (more precisely, at 00:00 UTC on 2023-01-01).
https://www.fourmilab.ch/hotbits/retired.html

The OG John Walker merch is still available though ...

https://www.fourmilab.ch/evilempire/


D'oh! Missed that when I was looking for the page. Thanks!

Still, it shows the idea did reappear. :)


It depends. For numerical simulations, you’ll want to generate true random numbers to check the robustness of your results. So using a CSPRNG/true random to obtain the seed is very useful. For debugging, you pick an injected seed to start with (+ print the production seed if a specific run had a problem and you want to debug). For cryptographic applications you want as secure a random seed into your CSPRNG as possible.
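
Concretely, something like this (a sketch, with made-up names):

```python
import secrets
import random

def make_rng(seed=None):
    """Injected seed -> reproducible debugging run; no seed -> draw a
    fresh one from the OS CSPRNG, and report it so that a bad
    production run can be replayed later."""
    if seed is None:
        seed = secrets.randbits(64)
    print(f"run seed: {seed:#018x}")  # log it for post-mortem debugging
    return random.Random(seed), seed

rng, seed = make_rng()      # production: fresh, unpredictable seed
replay, _ = make_rng(seed)  # debugging: replay the exact same stream
assert [rng.random() for _ in range(5)] == [replay.random() for _ in range(5)]
```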


I was once speaking with a group of AAA game developers at Steam Dev Days, discussing how to balance random interactions with gameplay. I think everyone with me had worked on larger teams, and they discussed complex solutions to create balance. My solution was much simpler and was similar to the random walk example in the article: the random chance of being grabbed by a zombie was only rolled again X seconds after the last time you were grabbed.

The group of AAA devs laughed it off as not being feasible but our game had market validation (10m+ downloads). It made the game fun and eliminated the "I can't believe I got randomly grabbed twice within 5 seconds" type of interactions that make you want to quit.

We also observed a lot of panicked players jumping, twisting and doing everything but hitting shove (the 'V' key) - especially streamers. So we added support for being able to jump repeatedly/shake to get out of a grab.

Random is too random when it comes to gameplay enjoyment for sure.

This is a very good write-up on solutions to the random placement (2D) and random interaction over time (1D) problems without resorting to less savory solutions like I made.
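
For what it's worth, the grab logic amounted to something like this (a simplified sketch; the constants and names here are made up for illustration):

```python
import random

GRAB_COOLDOWN = 5.0  # hypothetical: no grab rolls for this long after a grab
GRAB_CHANCE = 0.02   # hypothetical per-tick chance once rolls resume

class GrabTimer:
    def __init__(self):
        self.last_grab = float("-inf")

    def try_grab(self, now, rng=random):
        # Inside the cooldown window the roll never happens at all, so
        # "grabbed twice within 5 seconds" is impossible by construction.
        if now - self.last_grab < GRAB_COOLDOWN:
            return False
        if rng.random() < GRAB_CHANCE:
            self.last_grab = now
            return True
        return False
```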


Same with Tetris: don't randomly select the next tetromino (piece) independently each time. Then you can get the same piece multiple times in a row, or long streaks without ever seeing a certain tetromino. Instead, make a "bag" holding one of each tetromino, shuffle it, and use that as the next 7 pieces. Still random, but it's bounded, and as you say not "too random".
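
A quick sketch of the 7-bag generator:

```python
import random

PIECES = "IJLOSTZ"

def bag_generator(rng=random):
    """Deal tetrominoes in shuffled bags of 7. Each piece appears exactly
    once per bag, so at most 12 other pieces can appear between two
    occurrences of the same piece (first in one bag, last in the next)."""
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag

gen = bag_generator()
first_bag = [next(gen) for _ in range(7)]
assert sorted(first_bag) == sorted(PIECES)  # one of each, in some order
```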


Just don't lie to players. In Tetris it is well-known that a bag is used and that is part of the game rules that everyone but the most casual players will know about. Other games, unfortunately, secretly filter randomness behind the scenes to make Gambler's fallacy real, pretending that random events are independent when they're not.


> The group of AAA devs laughed it off as not being feasible but our game had market validation (10m+ downloads).

I would be very surprised if they actually said "feasible", and wonder if you happened to paraphrase it as such. Your solution is indeed great if you only need to ensure that grabs don't happen too frequently, and it's no different from the temporary invulnerability after taking damage found in classic game design (i.e. a validated strategy), but it's also not very flexible beyond that point. I'm sure they had many other things to worry about as well.


This topic reminds me of how Dota does it, which may be common but was the first time I had read about it. When you see "X% chance to happen", the true chance is much lower but increases with each miss. So on shorter time frames, where you'd be paying close attention to the frequency, it'll be correct. But you're also very unlikely to go a long time without it happening, or to have it happen many times in a row.
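
A sketch of that mechanism, often called a pseudo-random distribution (PRD): a constant C is chosen per nominal percentage, the effective chance grows with each consecutive failure, and it resets on success. The C below is illustrative, not Dota's actual table:

```python
import random

def prd_roll(fail_count, c=0.05, rng=random):
    """Effective chance grows linearly with consecutive failures."""
    return rng.random() < c * (fail_count + 1)

def simulate(trials, c=0.05, rng=random):
    """Return the gap (number of attempts) between successive procs."""
    gaps, fails = [], 0
    for _ in range(trials):
        if prd_roll(fails, c, rng):
            gaps.append(fails + 1)
            fails = 0
        else:
            fails += 1
    return gaps

# With c = 0.05 the chance reaches 100% by the 20th consecutive failure,
# so a drought can never exceed 20 attempts.
assert max(simulate(100_000)) <= 20
```

Back-to-back procs become correspondingly rare, since every streak restarts at only C.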


Fallout developer Tim on randomness.

https://youtu.be/DqL9R5PqE20


Heh, I read your sentence as "I was once speaking with a group of AA game developers at Steam Dev Days".

Recovering alcoholics and game developers ... seems something that goes well together :)


I found that the Physically Based Rendering book[1] has some nice material on low discrepancy sequences in chapter 8, specifically 8.6 and 8.7. It also contains some neat comparisons of how it impacts rendering.

[1]: https://pbr-book.org/4ed/contents


It's fun to see quasi-Monte Carlo turn up on Hacker News again. I work in this area. I suspect that there are more applications of it to be found. If you're interested in a deep dive, I've posted some material at the link below.

https://artowen.su.domains/mc/practicalqmc.pdf


Reminds me of the Spotify shuffle algorithm blog post:

https://engineering.atspotify.com/2014/02/how-to-shuffle-son...


Although it's not the case here, I quite dislike whenever this article is brought up, because it's usually used to shut down people who say Spotify is "not random enough", ignoring that: 1. Regardless of intention, the randomness is subjectively broken according to a lot of users (including myself: 3 songs in my playlist became my top-5 most-listened songs solely because they were always "randomly" selected first). 2. This article is old af and may not even be relevant anymore.


Although saying 'our users [are all wrong and stupid and have] fallen victims to Gambler’s fallacy' (obviously my addition in square brackets) is a bad look, the rest of the article details how they adapted the algorithm to make it seem more random even though actually it is less random.

There are no winners here: the randomness purists will say 'well, if shuffle plays all the songs in alphabetical order, or all your Metallica songs first, or some other order, they're all perfectly legal permutations and as likely as each other'.

Then the majority, folks who just want to listen to a variety of music, who will say 'how can it possibly be random if all the Metallica songs are together or Pantera songs are together, or Christina Aguilera songs are together?'

I think in the end Spotify did the right thing here by opting to be less mathematically purist, and doing what users actually want from a shuffle.

There is a separate (and maybe unrelated) issue which is that dynamically generated playlists seem to heavily favour well-known songs, which may be what you're referring to.

But yeah, this article is coming up on ten years old now. I would be surprised if the shuffle algorithm is still the same.


Whenever this is posted, it sort of reminds me of the IRS and one method they use to detect fraud.

They observed that sometimes, the amounts in the tax returns were too “random” though not in exactly the same sense as the article.

For example, people manually filling out bad tax returns would never use an amount like $5,000 because the number didn’t feel random enough.

This type of analysis is always fascinating and I still see new applications from time to time.


I think you're referring to Benford's Law [0] and related analysis? It doesn't apply in every context, in particular it assumes values are naturally distributed and have varying numbers of digits. But it is quite effective in those contexts where it is appropriate!

[0]: https://en.m.wikipedia.org/wiki/Benford%27s_law
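
Where it does apply, it's easy to see empirically. Powers of 2 are a textbook Benford-distributed sequence (a quick sketch):

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's predicted frequency of leading digit d."""
    return math.log10(1 + 1 / d)

# Leading digits of 2^1 .. 2^1000
leading = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))

for d in range(1, 10):
    observed = leading[d] / 1000
    # Observed frequencies track log10(1 + 1/d) closely.
    assert abs(observed - benford_expected(d)) < 0.02
```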


There's also Benford's Law, which is related to the growth of numbers, which is used to detect fraud.

https://en.m.wikipedia.org/wiki/Benford%27s_law



This article was written before this one:

https://extremelearning.com.au/unreasonable-effectiveness-of...

You should definitely read that one instead. It's such a great method (and article) that I'll even forgive their use of "unreasonable effectiveness".
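
The heart of that article fits in a few lines: an additive recurrence x_i = frac(i * alpha), which is most evenly distributing when alpha is the reciprocal golden ratio. A sketch:

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def additive_recurrence(n, alpha=1 / PHI):
    """First n points of the low-discrepancy sequence x_i = frac(i * alpha)."""
    return [(i * alpha) % 1.0 for i in range(n)]

pts = sorted(additive_recurrence(100))
gaps = [b - a for a, b in zip(pts, pts[1:])]

# By the three-distance theorem the gaps between neighbours take at most
# three distinct values, so the points never clump or leave big holes
# the way uniform random points do.
assert len({round(g, 9) for g in gaps}) <= 3
assert max(gaps) < 0.02  # ~ln(n)/n, roughly 0.05, is typical for 100 random points
```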


Came here to recommend that great post too.

Also, the graphics are much prettier there if you like that kind of thing (which I do).


Discussed at the time (not much, but gwern):

When Random Numbers Are Too Random: Low Discrepancy Sequences - https://news.ycombinator.com/item?id=14439705 - May 2017 (1 comment)


I thought gwern was a word all of a sudden


A totally different but very nice approach to generating uniform-ish node distributions is described in this paper:

https://www.sciencedirect.com/science/article/pii/S089812211...

Useful for PDE and geometry.


Interesting article. The difference between the square root of two and pi sequences is surprising to me; is there some intuitive reason why pi looks so much more regular?


This talks about that more deeply and is by the same author (me) :) https://blog.demofox.org/2020/07/26/irrational-numbers/


BTW, you probably never found the irrationality measure to be very useful, because it's well known that almost all real numbers have irrationality measure 2.


this is very interesting, thanks


For a more comprehensive explanation of this behavior, look into the subject of Diophantine approximation; root 2, phi and other quadratic irrationals are badly approximable numbers (phi being the king of those), and their ability to "evade" rationals for a long time means their multiples are free to scatter around the interval more flexibly than better approximable ones. Diophantine approximation formalizes this heuristic; a good introduction is Khinchin's classic Continued Fractions.


In a similar vein, there's this video explaining why phi is somehow "more irrational" than other numbers. It looks like a special case of the same kind of analysis you're talking about.

https://m.youtube.com/watch?v=WL_Yzbo1ha4&pp=QAFIAQ%3D%3D


if there's a good rational approximation to some number x = n/m + ε where n and m are integers and ε is small, then every m multiples you get back to almost the same place; only the residual difference ε remains. so you get m clusters of points that are each spread apart by ε. after enough multiples, these clusters grow big enough to overlap, and then you start to see clusters corresponding to the next approximant

the first few approximants of pi using http://canonical.org/~kragen/sw/netbook-misc-devel/contfrac.... are 22/7, 333/106, 355/113, and 103993/33102. the large jump from 22/7 to 333/106 means that ε is tiny in π = 22/7 + ε. so you see 7 cleanly defined groups, and that's all you'll see for a long time, until you have several hundred samples, at which point i think you'll have 106 or 113 clusters, depending on how you squint, and that will remain the case for tens of thousands of samples

by contrast the first few approximants of √2 are 3/2, 7/5, 17/12, 41/29, 99/70, 239/169, 577/408, etc. you never get a large jump from one denominator to the next

btw does anyone know how to wring these out of pari/gp? vecsort(vecsort([bestappr(Pi, i) | i <- [1..10000]],,8), (a, b) -> denominator(a) - denominator(b)) gave me [3, 13/4, 16/5, 19/6, 22/7, 333/106, 355/113], and i don't really know what to make of that. also it's obviously not a good way to get approximants like π ≈ 4272943/1360120. i don't know how to use pari/gp very well


If you are willing to use Sage, here is how to get the list of convergents of pi:

    pcf = continued_fraction(pi)
    pcv = pcf.convergents()
    
    pcv[10] # prints your approximant
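
And without Sage, the same convergents can be built in plain Python from the (well-known) leading continued-fraction terms of pi, via the standard recurrence h_k = a_k * h_(k-1) + h_(k-2):

```python
from fractions import Fraction

PI_CF = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3]  # leading CF terms of pi

def convergents(cf):
    h_prev, h = 0, 1  # numerators:   h_{-2}, h_{-1}
    k_prev, k = 1, 0  # denominators: k_{-2}, k_{-1}
    out = []
    for a in cf:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        out.append(Fraction(h, k))
    return out

cv = convergents(PI_CF)
assert cv[1] == Fraction(22, 7)
assert cv[3] == Fraction(355, 113)
assert cv[10] == Fraction(4272943, 1360120)  # the approximant mentioned above
```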


thanks!


7 times pi is very close to a whole number (~21.99), so the pattern will repeat with a very slight shift every 7 iterations.


pi cubed is 31

almost certainly unrelated


Calculator on my Mac says you're close.


Not directly related to the phenomenon in the article, Numberphile has a video[1] that goes into why pi has a surprising amount of regularity despite being transcendental.

[1]: https://www.youtube.com/watch?v=sj8Sg8qnjOg


Writing the square root of two as a continued fraction leads to a periodic sequence of coefficients. More generally, a continued fraction is eventually periodic if and only if the number it represents is a quadratic irrational, that is, a linear combination of a rational number and the square root of a rational number [1]. It doesn't exactly answer your question, but in the sense of continued fractions at least, the square root of two is more structured than pi, and less random.

Edit: the golden ratio is also a quadratic irrational, so this intuition is wrong in the end!

[1]: https://en.m.wikipedia.org/wiki/Periodic_continued_fraction#...
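
A little sketch that makes the contrast visible, using the classical algorithm for the continued fraction of a square root:

```python
import math

def sqrt_cf_terms(n, count):
    """First `count` continued-fraction terms of sqrt(n), via the
    classical surd recurrence (exact integer arithmetic throughout)."""
    a0 = math.isqrt(n)
    if a0 * a0 == n:
        return [a0]  # perfect square: rational, terminating
    m, d, a = 0, 1, a0
    terms = [a0]
    while len(terms) < count:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        terms.append(a)
    return terms

# sqrt(2) = [1; 2, 2, 2, ...]: periodic from the first step,
# while pi's terms [3; 7, 15, 1, 292, ...] follow no known pattern.
assert sqrt_cf_terms(2, 8) == [1, 2, 2, 2, 2, 2, 2, 2]
assert sqrt_cf_terms(3, 7) == [1, 1, 2, 1, 2, 1, 2]
```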


I had this problem helping 4 kids play a game where one is secretly randomly selected: the same person got selected more than half the time. But since the game hinges on guessing who the picked person is, it needs to be random!


Interestingly, Old School RuneScape recently released some new bosses with a new drop mechanic. Historically a rare drop could mean a 1/5000 dice roll: some people get very lucky, but others can get very, very unlucky. The new mechanic is to have 3 more-common invisible "drops", and on the 3rd one you get the item. The result compresses the distribution: it's more unlikely to get very lucky or very unlucky.

Time will tell how this impacts the game's enjoyment.


Perhaps related: when I see simulated starry skies, they always look unconvincing. It's weird that we don't have a useful, realistic way of simulating the distribution of stars.


I always liked this one when it crops up

https://en.wikipedia.org/wiki/Lavarand


These are also called quasirandom numbers. Besides games, another use case is hyperparameter search for neural networks.

https://github.com/google-research/tuning_playbook?tab=readm...



