/dev/urandom Question

Linux_Kidd · 11-22-2017, 09:13 AM

CentOS 6.8 (32-bit, running in VirtualBox)

so, i needed some random numbers 10 digits wide

i ran this

Code:

cat /dev/urandom | tr -dc '0-9' | fold -w 10 | head -n 1000000 >> rand.out

a million rand numbers 10 digits wide.

the rand pool is 10 billion values (0000000000 through 9999999999), and out of a million rand numbers i get 51 dupes.

1 million out of 10 billion seems like a small amount, and 51 dupes in 1 million seems like a lot.

is this a urandom randomness issue, or am i seeing dupes in my final output due to tr?

teckk · 11-22-2017, 12:29 PM

Examples:

1 million random numbers

Code:

for i in {1..1000000}; do echo "$(((RANDOM % 10) * 1234567891))"; done >> rand.txt

Not so great

Code:

uniq -d rand.txt | wc -l
89820
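
A note on the counting step: `uniq -d` only compares adjacent lines, so on an unsorted file it reports repeated runs rather than every duplicated value, and the count can differ from what you get after sorting. A minimal sketch with made-up sample data (the function name `count_dupes` is just for illustration):

```shell
# Count how many distinct values occur more than once in a file.
# Sorting first puts equal lines next to each other so uniq -d can see them.
count_dupes() {
    sort "$1" | uniq -d | wc -l | tr -d ' '
}

# Made-up sample: 7 appears twice, but never on adjacent lines.
tmp=$(mktemp)
printf '7\n3\n7\n5\n' > "$tmp"

unsorted=$(uniq -d "$tmp" | wc -l | tr -d ' ')   # adjacent-only comparison
sorted=$(count_dupes "$tmp")                      # sorted comparison
echo "unsorted: $unsorted, sorted: $sorted"
rm -f "$tmp"
```

On this sample the unsorted count is 0 and the sorted count is 1.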

Code:

for i in {1..1000000}; do echo "$(((RANDOM % 123) * 123456789))"; done >> rand.txt

A little better

Code:

uniq -d rand.txt | wc -l
7915

2.5 million random numbers

Code:

for i in $(od -An -v -tu4 -N10000000 /dev/urandom); do echo "$i"; done >> rand.txt

Much better

Code:

uniq -d rand.txt | wc -l
706

You could make them all unique with

Code:

sort -u rand.txt > rand2.txt

uniq -d rand2.txt | wc -l
0

That leaves 2499283 unique numbers.

You could then randomize that list with

Code:

sort -R rand2.txt > rand3.txt

Linux_Kidd · 11-22-2017, 01:58 PM

thanks teckk for the reply.

i did remove all numbers from my list that were dupes (51x2, as each dupe had a twin), so i removed 102 numbers and now my list is completely unique.

what system did you run that on?

but my question still remains: do i see dupes from urandom (doesn't seem likely), or do they get created when using fold (i meant tr or fold in my 1st post)?

teckk · 11-22-2017, 04:00 PM

You'll get duplicate numbers in any random set of numbers.
Try a few ways.

Examples:

Code:

while :; do echo "${RANDOM: -1}"; sleep .5; done

while :; do r=$RANDOM; echo "$r" | cut -c "${#r}"; sleep .5; done

while :; do shuf -i 1-10 -n 1; sleep .5; done

while :; do tr -cd 0-9 < /dev/urandom | head -c 1; sleep .5; done

while :; do grep -m1 -ao '[0-9]' /dev/urandom | head -n1; sleep .5; done

while :; do echo $RANDOM | awk '{print substr($0,length,1)}'; sleep .5; done

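# seed from $RANDOM: plain srand() uses the clock in whole seconds, so runs 0.5 s apart can print the same number
while :; do awk -v min=1 -v max=10 -v seed="$RANDOM" 'BEGIN{srand(seed); print int(min+rand()*(max-min+1))}'; sleep .5; done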
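
On the tr/fold part of the question, one way to check whether those tools could be inventing dupes is to feed them a fixed input and look at what comes out. A small sketch, using a made-up input string: `tr -dc '0-9'` only deletes the non-digit bytes and `fold -w` only inserts line breaks, so neither adds digits of its own.

```shell
# Made-up fixed input: letters mixed with the digits 1 2 3 4 5 6 7 8 9 0 1 2.
input='a1b2c3d4e5f6g7h8i9j0k1l2'

# tr -dc '0-9' deletes every byte that is not a digit.
digits=$(printf '%s' "$input" | tr -dc '0-9')
echo "$digits"

# fold -w 4 breaks the digit stream into lines of 4 characters.
printf '%s' "$digits" | fold -w 4
echo
```

The digits come out in their original order (123456789012), just wrapped at the chosen width.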

michaelk · 11-22-2017, 04:56 PM

I don't claim to be an expert and only have a fundamental understanding of how /dev/random works.

In a nutshell, pseudo-random number generators are not totally random. If you generate enough numbers, the sequence will eventually repeat, and they typically use a seed so that the sequence is not the same each time. I would say that is why the first method has the most duplicates, in addition to its small range.

As to why /dev/urandom generates duplicates, it may be due to running out of entropy because of the large amount of numbers being generated.

https://en.wikipedia.org/wiki/Pseudo...mber_generator
https://stackoverflow.com/questions/...ropy-pool-work
https://lwn.net/Articles/261804/

sundialsvcs · 11-22-2017, 08:54 PM

It is actually normal to wind up with a few duplicate numbers if you generate enough of them, and especially if the numeric range of the numbers is small. "51 dupes in a million" is perfectly fine.

What you don't want – and will not get, unless you reset the seed – is a duplicated sequence.
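
To illustrate the point about seeds, here is a small sketch using awk's `srand()` with an explicit seed (the seed values 42 and 43 and the function name `three_rands` are arbitrary choices for the example). The same seed reproduces the same sequence; a different seed normally gives a different one.

```shell
# Print three pseudo-random numbers from awk, seeded with the given value.
three_rands() {
    awk -v seed="$1" 'BEGIN { srand(seed); print rand(), rand(), rand() }'
}

a=$(three_rands 42)
b=$(three_rands 42)   # same seed as a
c=$(three_rands 43)   # different seed

if [ "$a" = "$b" ]; then echo "seed 42 twice: same sequence"; fi
if [ "$a" != "$c" ]; then echo "seed 43: different sequence"; fi
```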

Beryllos · 11-23-2017, 11:23 PM

By this test, your numbers are random. With perfectly random numbers, we expect about 50 duplicates. This is an example of the "Birthday Problem."

http://mathworld.wolfram.com/BirthdayProblem.html

Here's how it works: The probability of any two numbers matching is 1 in 10 billion. That sounds pretty low, but the set of one million numbers defines roughly 500 billion pairs.

(Pair the first number with any of the next 999999 numbers, then pair the second number with any of the remaining 999998 numbers, and so on, until the second-to-last number is paired with the last number. This gives a total of 1000000*999999/2=499999500000 pairs.)

We can estimate the number of duplicates by multiplying the number of pairs sampled by the probability of one pair matching: (1000000*999999/2)*(1/10000000000)=49.99995.
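
The same arithmetic can be done on the command line. A minimal sketch using awk (the function name `expected_dupes` is made up for this example); it takes the number of values generated and the size of the pool:

```shell
# Expected number of matching pairs among n values drawn uniformly
# from a pool of m possibilities: n*(n-1)/2 pairs, each matching
# with probability 1/m.
expected_dupes() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.5f\n", n * (n - 1) / 2 / m }'
}

expected_dupes 1000000 10000000000   # prints 49.99995
```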

Because it is random, the exact count will vary -- it is possible, though not likely, to have no matches, and likewise it is possible but extremely unlikely that all 1 million numbers will be the same -- but the average number of duplicates will be about 50.
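
The estimate can also be checked empirically at a smaller scale. A rough sketch, assuming awk's `rand()` is good enough for the purpose: draw 10000 values from a pool of 1000000, which gives about the same expected count (10000*9999/2 / 1000000 = 49.995). The sizes and the function name `simulate_dupes` are choices made for this example, not something from the thread.

```shell
# Draw n uniform integers in [0, m) and count how many draws repeat
# a value that was already seen earlier in the run.
simulate_dupes() {
    awk -v n="$1" -v m="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        dupes = 0
        for (i = 0; i < n; i++) {
            x = int(rand() * m)
            if (x in seen) dupes++
            seen[x] = 1
        }
        print dupes
    }'
}

# Seed from the clock so each run differs; the count should land near 50.
simulate_dupes 10000 1000000 "$(date +%s)"
```

Running it a few times shows the count moving around 50 from run to run, which is the kind of variation described above.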