I thought I’d put out a PSA about the dangers of using bash’s convenient, built-in source of random numbers: $RANDOM.
No, this isn’t the usual lecture about using a cryptographically secure random number generator. There are lots of situations where you just need a random blob and you’re not worried about malicious attacks. No, this is about why, even in those situations, you need to consider whether $RANDOM is random enough.
For instance, I was just using it to generate unique filenames in a bash loop. I just wanted to be able to generate filenames without worrying about collisions.
However, I overestimated the entropy provided by $RANDOM and underestimated the birthday paradox. I know $RANDOM only gives you a number from 0-32767 (15 bits of entropy), and I know about the birthday paradox, but it’s surprising what the combination of those two can result in.
I was only generating 45 filenames, but I actually encountered a collision. Only 45 numbers from 0-32767 and two are the same? How?!
Well, it’s more likely than you think. Specifically, it’s 3% likely.* Still rare, but likely enough to be plausible that I encountered it by chance.
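If you'd rather check that 3% figure empirically than take it on faith, here's a minimal simulation sketch. It assumes an idealized uniform generator (Python's `random` module, not bash's actual $RANDOM), and the function name, run count, and seed are illustrative choices rather than anything from the original experiment.

```python
import random


def simulate_collision_rate(trials=45, choices=2**15, runs=20_000, seed=0):
    """Estimate how often `trials` uniform draws from `choices` values
    contain at least one duplicate."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    collisions = 0
    for _ in range(runs):
        draws = [rng.randrange(choices) for _ in range(trials)]
        # A set drops duplicates, so a shorter set means a collision occurred.
        if len(set(draws)) < trials:
            collisions += 1
    return collisions / runs


if __name__ == "__main__":
    print(f"estimated collision rate: {simulate_collision_rate():.3f}")
```

The estimate should land close to 0.03, with some run-to-run noise if you change the seed.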
To give you a sense of how few trials you need to perform to have a chance of encountering a collision, I calculated the probabilities for a range of trials:

All you need is to choose 214 numbers and you’re more likely than not to find a duplicate. It’s hard to see the detail on the low end, so here’s a zoom:
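For anyone who wants the numbers themselves, here's a rough sketch that tabulates the collision probability for a few trial counts and searches for the point where it crosses 50%. It assumes perfectly uniform picks and uses a running product instead of factorials; the helper names and the sample trial counts are made up for illustration.

```python
def collision_prob(choices, trials):
    """Probability of at least one duplicate among `trials` uniform picks
    from `choices` possible values."""
    p_unique = 1.0
    for k in range(trials):
        # The (k+1)-th pick must avoid the k values already taken.
        p_unique *= (choices - k) / choices
    return 1 - p_unique


def first_trials_above(choices, threshold):
    """Smallest number of trials whose collision probability exceeds `threshold`."""
    trials = 1
    while collision_prob(choices, trials) <= threshold:
        trials += 1
    return trials


if __name__ == "__main__":
    choices = 2**15  # $RANDOM yields 0-32767
    for trials in (12, 45, 100, 214, 500):
        print(f"{trials:4d} trials: {collision_prob(choices, trials):.1%}")
    print("first above 50%:", first_trials_above(choices, 0.5))
```

For 2**15 choices the search returns 214, matching the figure quoted above.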

A better alternative
So if you’re trying to generate unique blobs, and you’re doing it more than a dozen or so times (even 12 has a 0.2% chance), here’s a still-simple, but better alternative:
head -c 15 /dev/urandom | base64 | tr -d /
This uses urandom, an industrial-strength source of randomness. It reads 15 random bytes from it and encodes them into an ASCII blob using base64. This will give you something like dGTA1BHDBEuj5tlQei0v. Still pretty concise, but it contains 120 bits of entropy, enough to make collisions vanishingly unlikely for any number of files you could realistically generate.
Update: I realized base64 can include a slash, which isn’t great if you’re generating filenames! So I added the tr -d / above, which is a quick solution that’ll just remove any slash characters. It’s unlikely to remove much entropy, but I’d certainly like to hear better solutions.
Update 2: I thought of an even simpler solution:
mktemp -u XXXXXXXXXXXXXXXXXXXX
This avoids the issue with slashes and always gives you 20 random characters. Each character is drawn from only 62 possibilities, compared to the 63 above, but I doubt you wanted + anyway. And according to this Stack Exchange answer, it also gets its randomness from /dev/urandom. Also, if you’re trying to create unique filenames, you can just cut out the middleman and create them with mktemp!
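To put the 62-versus-63 character difference in perspective, here's a quick back-of-the-envelope sketch of the entropy involved. It assumes every character is chosen uniformly and independently (an assumption about mktemp, not something verified here), and `entropy_bits` is just an illustrative helper name.

```python
from math import log2


def entropy_bits(alphabet_size, length):
    """Bits of entropy in `length` symbols drawn uniformly and independently
    from an alphabet of `alphabet_size` symbols."""
    return length * log2(alphabet_size)


if __name__ == "__main__":
    # $RANDOM: a single value out of 2**15 possibilities.
    print(f"$RANDOM:            {entropy_bits(2**15, 1):.1f} bits")
    # 15 raw bytes from /dev/urandom, before any characters are stripped.
    print(f"15 urandom bytes:   {entropy_bits(256, 15):.1f} bits")
    # 20 characters from a 62-symbol alphanumeric alphabet.
    print(f"20 mktemp chars:    {entropy_bits(62, 20):.1f} bits")
```

Under those assumptions the mktemp name carries about 119 bits, barely less than the 120 bits in the 15 urandom bytes and vastly more than the 15 bits from $RANDOM.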
Calculating probabilities yourself
The formula to calculate this is surprisingly hard to find. As usual, the Wikipedia article on a math topic is useless. But I found a pretty simple equation here.
The probability of a collision when picking T numbers between 1 and C is:
$$P(T, C) = 1 - \frac{C!}{(C - T)! \, C^{T}}$$
You can calculate the probability yourself with this Python 3 function:
from math import factorial

def collision_prob(choices, trials):
    # Probability that all the picks are distinct (no collision).
    inverse = factorial(choices) / (factorial(choices-trials) * choices**trials)
    return 1 - inverse
For example, with 45 trials of 2**15 choices, you get the probability I mentioned earlier: 0.03.
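The factorials in that function get enormous once the number of choices grows, so for something like a 120-bit space a shortcut helps. Here's a minimal sketch of the standard birthday approximation, 1 - exp(-T(T-1)/(2C)). It's an approximation rather than the exact formula, and it's only close when T is much smaller than C; the function name is invented for this example.

```python
from math import expm1


def collision_prob_approx(choices, trials):
    """Birthday approximation: 1 - exp(-T(T-1) / (2C)).

    Accurate when `trials` is much smaller than `choices`."""
    exponent = trials * (trials - 1) / (2 * choices)
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    return -expm1(-exponent)


if __name__ == "__main__":
    # 45 picks from $RANDOM's 2**15 values.
    print(f"{collision_prob_approx(2**15, 45):.4f}")
    # A million picks from a 120-bit space.
    print(f"{collision_prob_approx(2**120, 10**6):.3e}")
```

With 45 trials of 2**15 choices it agrees with the exact answer to about three decimal places, and for a million 120-bit blobs it gives a probability so small that it's effectively zero.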
*Assuming perfect randomness from $RANDOM. In reality, it uses a linear congruential pseudorandom function which is sort of a lowest common denominator of programming languages. Don’t get me wrong, it’s pretty good for basic uses like this, but it may have properties which make these sorts of collisions even more likely than if it were perfectly random. I say “may” because I’m no mathematician. Don’t rely on that statement if you’re doing something that matters.