I did remove all the numbers from my list that were dupes (51x2, as each dupe had a twin), so I removed 102 numbers and now my list is completely unique.
what system did you run that on?
But my question still remains: do I see dupes from urandom (doesn't seem likely), or do they get created when using fold (I meant tr or fold in my 1st post)?
I don't claim to be an expert and only have a fundamental understanding of how /dev/random works.
In a nutshell, pseudo-random number generators are not totally random. If you generate enough numbers, the sequence will eventually repeat, and they typically use a seed so that the sequence is not the same each time. I would say that is why the first method has the most duplicates, in addition to its small range.
As to why /dev/urandom generates duplicates, it may be due to running out of entropy, caused by the large amount of numbers being generated.
It is actually normal to wind up with a few duplicate numbers if you generate enough of them, and especially if the numeric range of the numbers is small. "51 dupes in a million" is perfectly fine.
What you don't want – and will not get, unless you reset the seed – is a duplicated sequence.
Here's how it works: The probability of any one pair of numbers matching is 1 in 10 billion. That sounds pretty low, but the set of one million numbers defines roughly 500 billion pairs.
(Pair the first number with any of the next 999999 numbers, then pair the second number with any of the remaining 999998 numbers, and so on, until the second-to-last number is paired with the last number. This gives a total of 1000000*999999/2=499999500000 pairs.)
We can estimate the number of duplicates by multiplying the number of pairs sampled by the probability of one pair matching: (1000000*999999/2)*(1/10000000000)=49.99995.
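The arithmetic above is easy to double-check with a few lines of Python. This is only a sketch: the function name expected_duplicate_pairs is made up for illustration, and the range of 10**10 possible values is simply the "1 in 10 billion" figure from the explanation.

```python
from math import comb

def expected_duplicate_pairs(n, range_size):
    """Expected number of matching pairs among n uniform draws
    from range_size equally likely values (hypothetical helper)."""
    pairs = comb(n, 2)          # n*(n-1)/2 distinct pairs
    return pairs / range_size   # each pair matches with probability 1/range_size

if __name__ == "__main__":
    # Number of pairs in a list of one million numbers
    print(comb(1_000_000, 2))
    # Estimated duplicates, assuming 10**10 possible values
    print(expected_duplicate_pairs(1_000_000, 10**10))
```

Changing the second argument shows how quickly the estimate grows when the range of the numbers gets smaller.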
Because it is random, the exact count will vary -- it is possible, though not likely, to have no matches, and likewise it is possible but extremely unlikely that all 1 million numbers will be the same -- but the average number of duplicates will be about 50.
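To see that variation for yourself, here is a rough simulation sketch. It assumes Python's built-in generator as a stand-in for /dev/urandom (random.SystemRandom could be swapped in to draw from the OS source instead), and the helper names count_duplicate_pairs and simulate are invented for this example.

```python
import random
from collections import Counter

def count_duplicate_pairs(values):
    """Count matching pairs: a value seen c times contributes c*(c-1)/2 pairs."""
    counts = Counter(values)
    return sum(c * (c - 1) // 2 for c in counts.values())

def simulate(n, range_size, rng):
    """Draw n uniform integers in [0, range_size) and count matching pairs."""
    return count_duplicate_pairs(rng.randrange(range_size) for _ in range(n))

if __name__ == "__main__":
    rng = random.Random()  # assumption: Python's PRNG, not /dev/urandom itself
    for trial in range(5):
        # One million numbers out of 10 billion possible values
        print(trial, simulate(1_000_000, 10**10, rng))
```

Each trial should print a different count, scattered around 50, without any need to blame tr or fold.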
Last edited by Beryllos; 11-23-2017 at 11:47 PM.
Reason: corrected a small inaccuracy in the explanation