[SOLVED] Need a fast von Neumann randomness extractor
I tuned the radio to a noisy channel. This is the hardest part, because noise doesn't always mean randomness.
Maybe you should hash before the von Neumann extractor, since the distribution of 0s and 1s in the upper bits might differ from that of the lower bits. I would expect that to be the case whether or not there is clipping during A/D conversion. I only suggest that because it seems like the VN extractor only works well if the data is close to random in the first place. I guess if you're hashing anywhere in there, it might not make a difference where that happens.
Kevin Barry
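For anyone following along: the von Neumann extractor itself is simple. Read the input bits in non-overlapping pairs; a 01 pair emits a 0, a 10 pair emits a 1, and 00 or 11 pairs are discarded. Here is a minimal sketch in C (the function name and the MSB-first bit packing are my own choices, not the OP's vne tool):

```c
#include <stddef.h>
#include <stdint.h>

/* Von Neumann extractor: consume input bits (MSB-first) in
   non-overlapping pairs.  01 -> emit 0, 10 -> emit 1, 00/11 -> discard.
   Returns the number of output bits written into out (packed MSB-first);
   out must be large enough to hold inbits/2 bits. */
static size_t vn_extract(const uint8_t *in, size_t inbits, uint8_t *out)
{
    size_t outbits = 0;
    size_t i;

    for (i = 0; i + 1 < inbits; i += 2) {
        int b0 = (in[i / 8] >> (7 - i % 8)) & 1;
        int b1 = (in[(i + 1) / 8] >> (7 - (i + 1) % 8)) & 1;

        if (b0 == b1)
            continue;  /* 00 or 11: discard the pair */
        if (b0)        /* 10 -> emit 1 */
            out[outbits / 8] |= (uint8_t)(1u << (7 - outbits % 8));
        else           /* 01 -> emit 0 */
            out[outbits / 8] &= (uint8_t)~(1u << (7 - outbits % 8));
        outbits++;
    }
    return outbits;
}
```

If the input bits are independent with a fixed bias P(1) = p, each kept pair is equally likely to be 01 or 10, so the output is unbiased; on average only a fraction p(1 - p) of the input bits survive, which is at best 1 output bit per 4 input bits.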
Thanks for this thread! I learned something here, you guys lit up my week!!!!
Just let me/us know where your "eagle" landed, please
@ sycamorex - it would seem to be a way to find patterns in randomness... didn't know that hissing on the radio was good for something; goes to prove you can always learn something!
Excuse my ignorance, but I'm totally confused and don't really understand much of the thread. Is the OP trying to extract truly random data or find patterns in apparent randomness?
yeah, I've been following this thread and wondering why exactly you're doing it. Is it anything to do with encryption or generating keys?
Quote:
Originally Posted by sycamorex
Excuse my ignorance, but I'm totally confused and don't really understand much of the thread. Is the OP trying to extract truly random data or find patterns in apparent randomness?
I want to do both, actually. But my goal is to make a TRNG, a true random number generator. I probably will not achieve it, but may come close.
Reading that made me want to try to do something like that.
Technically, I don't have too much use for it, I just want a proof of concept for myself. I trust radioactive decay (hotbits) more than radio, but I have tried samples from random.org and they seem to be of good quality. So then, I wanted to do it myself so I don't have to rely on sites that may go down or have daily limits (hotbits).
Ok, it all makes sense now. Thank you for the explanation.
I've tried using the hash before the extractor, but it doesn't make much difference. Either way, both must be used to produce good random output; neither will work well alone. Using the hash first produces slightly smaller output, which is a good thing in theory.
Some more theory:
I am using a short-wave radio for this because HF is prone to atmospheric and solar interference, which means there's plenty of noise and potential randomness.
As long as we are discussing it, I found something very interesting in my quest.
Here is a procedure to determine whether a file is random, and how random.
Code:
The procedure to determine if a file is random or not:

1) Run rawtopgm on the file and view the result in GIMP.
   Horizontal stripe patterns in the image, or patterns or
   major spikes on the histogram                          => not random
   Static-noise image and an irregular or flat histogram  => go to 2
2) Run ent on the file.
   ent fails   => not random
   ent passes  => random
EDIT: I removed incorrect assumptions. I will do more testing, but I need more hotbits.
The diehard and dieharder tests seem quite useless and time consuming.
Last edited by H_TeXMeX_H; 04-14-2012 at 08:42 AM.
It is hard for me to believe what I just posted, but EDIT: it is actually false. Sorry, I must do more tests on hotbits to fix these unreliable and size-dependent tests.
Last edited by H_TeXMeX_H; 04-13-2012 at 10:08 AM.
Quote:
Originally Posted by H_TeXMeX_H
But my goal is to make a TRNG, a true random number generator. I probably will not achieve it, but may come close.
Since you're already using radio, why not build a USB hardware random number generator? (Or buy one?)
For example, you could start with a $16 Teensy++, which you talk to over USB. It has a native USB interface, and is very easy to program using gcc-avr. Then, use one or more hardware random bit generators as the source of random bits, applying the von Neumann extractor on the Teensy to avoid any bias in the result. With more than one generator, you can increase the generated bit rate. You can easily run a PRNG like Xorshift on the Teensy (or several!), and perturb its state continuously using the hardware random bits, making it more or less a true TRNG. You can easily provide both TRNG and T/PRNG outputs in the same device. The hardware parts should cost about $25 total, including a nice tin enclosure. The circuit generates EM noise, so a metal enclosure is a must.
The data rate is small enough that Teensy could easily continuously encrypt the output using a cryptographically secure algorithm, if you want truly random bits.
If you use the raw HID interface (the one used by USB mice, keyboards, joysticks, gamepads), you are limited to 512000 bits per second (1000 packets, 64 bytes each, per second; more than enough for several hardware bits), but you don't need any drivers. On any OS.
This might be a rather interesting project. A kernel driver to feed the random bits from the above hardware USB device to the entropy pool would be very simple, even including some randomness checks on the bits. Or, one could tell the OpenSSL library to obtain initial PRNG state bits from the USB device, for example from a simple userspace application providing it via a pipe (using the HID interface to the USB device).
Quote:
Originally Posted by H_TeXMeX_H
I've tried using the hash before the extractor, but it doesn't make much difference. Either way, both must be used to produce good random output; neither will work well alone. Using the hash first produces slightly smaller output, which is a good thing in theory.
Randomness is such a poorly-defined concept. For a probability distribution over a continuous vector space described by a continuous function, given a vector there is usually some neighborhood of vectors nearby that are considered "pretty much the same". With binary data, however, the assumption is that any difference in bit values in essence gives you something completely different. The goal of randomness for binary data is therefore to make all possible combinations of 0s and 1s of a given length seem equally likely. In reality, however, if the bits are independent Bernoulli random variables with p=0.5, then the number of 1s over N bits has a Binomial(N, 0.5) distribution. Since the variance is linear with respect to N, as the number of bits increases the confidence interval around 0.5*N becomes vanishingly narrow relative to N itself. Since this should also apply to all members of the power set of N bits, that implies that the probability of any discernible order to the bits should decrease as the length of the given subset increases. This obviously isn't possible, since the set containing all of the 1 elements, and all of that set's subsets, are in the power set of N bits. Additionally, there are 2^N elements in the power set, so even for a "modest" sample such as 64 bits it's intractable to verify that the desired randomness exists.
Much like the concept of hashing, where in general there are more sets of data of a given length with a given hash value than there are possible hash values, you can't account for all sources of order in the data. You need to determine the ways that order within the data can be useful to an adversary, which will vary by application, and attempt to make its exploitation disadvantageous. The best way to do that is to obscure how the data is transformed into the "publicly-observable final product", which will prevent inference of the most epistemically-useful ways to attack the data.
Kevin Barry
How about using a Xorshift PRNG, and using the random/non-random data to perturb its state continuously?
Here is a very slow example. Compile it, then run with a single parameter describing how many bytes it outputs per input byte. It will consume the first 16 bytes of standard input to prime the Xorshift generator state, then it will regularly use the input to perturb the generator state. Since Xorshift is supposed to pass the diehard tests, I wonder if the output from this is sufficiently random for you?
Note that I'm not at all sure whether this kind of perturbation (directly xor'ing the input into the generator state) makes sense, and I have not checked whether the output is random or not.
If you don't want to perturb the state at all, you can redirect standard input from /dev/zero, in which case the output is the same as the Wikipedia example code, but with 65536 first values discarded.
Code:
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

static uint32_t xorshift_state[4] = { 0U, 0U, 0U, 0U };

static inline uint32_t xorshift(const uint32_t perturb)
{
    uint32_t temp;

    temp = xorshift_state[0] ^ (xorshift_state[0] << 11U);
    xorshift_state[0] = xorshift_state[1];
    xorshift_state[1] = xorshift_state[2];
    xorshift_state[2] = xorshift_state[3];
    temp ^= (temp >> 8U) ^ xorshift_state[3] ^ (xorshift_state[3] >> 19U);
    xorshift_state[3] = temp ^ perturb;

    return temp;
}

static uint32_t fgetu32(FILE *const in)
{
    uint32_t value;

    errno = (fread(&value, sizeof value, 1, in) == 1) ? 0 : ENOENT;
    return value;
}

static void fputu32(uint32_t const value, FILE *const out)
{
    fwrite(&value, sizeof value, 1, out);
}

int main(int argc, char *argv[])
{
    uint32_t value;
    int rate, i;
    char dummy;

    if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n"
                "Usage: %s [-h|--help]\n"
                "       %s rate\n"
                "\n"
                "This reads binary input, emitting\n"
                "rate pseudorandom bytes for each input byte.\n"
                "Initial 16 bytes are read for Xorshift state.\n"
                "\n"
                , argv[0], argv[0]);
        return 0;
    }

    if (sscanf(argv[1], "%d %c", &rate, &dummy) != 1 || rate < 1) {
        fprintf(stderr, "%s: Invalid rate.\n", argv[1]);
        return 1;
    }

    /* Prime the Xorshift generator from input. */
    xorshift_state[0] = fgetu32(stdin);
    xorshift_state[1] = fgetu32(stdin);
    xorshift_state[2] = fgetu32(stdin);
    xorshift_state[3] = fgetu32(stdin);
    if (errno)
        return 1;

    /* Since a zero state generates zeros, make sure state is nonzero. */
    if (!xorshift_state[0] && !xorshift_state[1] &&
        !xorshift_state[2] && !xorshift_state[3]) {
        /* Use Wikipedia example state. */
        xorshift_state[0] = 123456789U;
        xorshift_state[1] = 362436069U;
        xorshift_state[2] = 521288629U;
        xorshift_state[3] = 88675123U;
    }

    /* Initial state reflects input too much. Skip. */
    for (i = 0; i < 65536; i++)
        (void)xorshift(0U);

    /* Convert rate to the number of unperturbed words per perturbed word. */
    rate--;

    while (1) {
        value = fgetu32(stdin);
        if (errno)
            return 0;
        fputu32(xorshift(value), stdout);
        i = rate;
        while (i-- > 0)
            fputu32(xorshift(0U), stdout);
    }
}
Edited to add: The above is easily feasible on a Teensy, or just about any ARM microcontroller. The 32-bit shifts may need a bit of trickery on 16- or 8-bit microcontrollers -- rotating each 8-bit unit separately. It should be fast enough to provide at least some thousands of random bits per second.
Last edited by Nominal Animal; 04-13-2012 at 11:37 AM.
I have finally found a good way to extract random data from a radio signal. After lots of tests I have come to the conclusion that the best way to extract the random data is like this:
1) Use the von Neumann extractor first.
2) Then use the Whirlpool hash with 512-bit (64-byte) input (I removed the option to choose other input sizes because it doesn't help).
3) Split the file into 1 MB chunks.
4) Check each chunk with a modified version of ent, posted below.
5) Concatenate the chunks that passed into the final file.
Using this method the extraction ratio is 10:1. So for 100 MB input you will get 10 MB of good quality random data out.
So here's the modified version of ent. You'll need the rest of the ent sources; this is just ent.c:
Code:
/*
   ENT --  Entropy calculation and analysis of putative
           random sequences.

   Designed and implemented by John "Random" Walker in May 1985.

   Multiple analyses of random sequences added in December 1985.

   Bit stream analysis added in September 1997.

   Terse mode output, getopt() command line processing,
   optional stdin input, and HTML documentation added in
   October 1998.

   Documentation for the -t (terse output) option added
   in July 2006.

   Replaced table look-up for chi square to probability
   conversion with algorithmic computation in January 2008.

   For additional information and the latest version,
   see http://www.fourmilab.ch/random/
*/
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#else
#include <unistd.h>
#endif
#include "iso8859.h"
#include "randtest.h"
#define UPDATE "January 28th, 2008"
#define FALSE 0
#define TRUE 1
#ifdef M_PI
#define PI M_PI
#else
#define PI 3.14159265358979323846
#endif
extern double pochisq(const double ax, const int df);
/* HELP -- Print information on how to call */

static void help(void)
{
    fprintf(stderr, "ent --  Calculate entropy of file.  Call");
    fprintf(stderr, "\n        with ent [options] [input-file]");
    fprintf(stderr, "\n");
    fprintf(stderr, "\n        Options:   -b   Treat input as a stream of bits");
    fprintf(stderr, "\n                   -c   Print occurrence counts");
    fprintf(stderr, "\n                   -f   Fold upper to lower case letters");
    fprintf(stderr, "\n                   -t   Terse output in CSV format");
    fprintf(stderr, "\n                   -u   Print this message\n");
    fprintf(stderr, "\nBy John Walker");
    fprintf(stderr, "\n   http://www.fourmilab.ch/");
    fprintf(stderr, "\n   %s\n", UPDATE);
}

/* GETOPT -- Dumb version of getopt for brain-dead Windows. */

#ifdef _WIN32
static int optind = 1;

static int getopt(int argc, char *argv[], char *opts)
{
    static char *opp = NULL;
    int o;

    while (opp == NULL)
    {
        if ((optind >= argc) || (*argv[optind] != '-'))
        {
            return -1;
        }
        opp = argv[optind] + 1;
        optind++;
        if (*opp == 0)
        {
            opp = NULL;
        }
    }
    o = *opp++;
    if (*opp == 0)
    {
        opp = NULL;
    }
    return strchr(opts, o) == NULL ? '?' : o;
}
#endif

/* Main program */

int main(int argc, char *argv[])
{
    int i, oc, opt;
    long ccount[256];      /* Bins to count occurrences of values */
    long totalc = 0;       /* Total character count */
    char *samp;
    double montepi, chip, scc, ent, mean, chisq, compression, montec;
    FILE *fp = stdin;
    unsigned int overall;
    int counts = FALSE,    /* Print character counts */
        fold = FALSE,      /* Fold upper to lower */
        binary = FALSE,    /* Treat input as a bitstream */
        terse = FALSE;     /* Terse (CSV format) output */

    while ((opt = getopt(argc, argv, "bcftu?BCFTU")) != -1)
    {
        switch (toISOlower(opt))
        {
            case 'b':
                binary = TRUE;
                break;
            case 'c':
                counts = TRUE;
                break;
            case 'f':
                fold = TRUE;
                break;
            case 't':
                terse = TRUE;
                break;
            case '?':
            case 'u':
                help();
                return 0;
        }
    }

    if (optind < argc)
    {
        if (optind != (argc - 1))
        {
            fprintf(stderr, "Duplicate file name.\n");
            help();
            return 2;
        }
        if ((fp = fopen(argv[optind], "rb")) == NULL)
        {
            fprintf(stderr, "Cannot open file %s\n", argv[optind]);
            return 2;
        }
    }

#ifdef _WIN32
    /** Warning!  On systems which distinguish text mode and
        binary I/O (MS-DOS, Macintosh, etc.) the modes in the open
        statement for "fp" should have forced the input file into
        binary mode.  But what if we're reading from standard
        input?  Well, then we need to do a system-specific tweak
        to make sure it's in binary mode.  While we're at it,
        let's set the mode to binary regardless of however fopen
        set it.

        The following code, conditional on _WIN32, sets binary
        mode using the method prescribed by Microsoft Visual C 7.0
        ("Monkey C"); this may require modification if you're
        using a different compiler or release of Monkey C.  If
        you're porting this code to a different system which
        distinguishes text and binary files, you'll need to add
        the equivalent call for that system. */
    _setmode(_fileno(fp), _O_BINARY);
#endif

    samp = binary ? "bit" : "byte";
    memset(ccount, 0, sizeof ccount);

    /* Initialise for calculations */
    rt_init(binary);

    /* Scan input file and count character occurrences */
    while ((oc = fgetc(fp)) != EOF)
    {
        unsigned char ocb;

        if (fold && isISOalpha(oc) && isISOupper(oc))
        {
            oc = toISOlower(oc);
        }
        ocb = (unsigned char) oc;
        totalc += binary ? 8 : 1;
        if (binary)
        {
            int b;
            unsigned char ob = ocb;

            for (b = 0; b < 8; b++)
            {
                ccount[ob & 1]++;
                ob >>= 1;
            }
        }
        else
        {
            ccount[ocb]++;     /* Update counter for this bin */
        }
        rt_add(&ocb, 1);
    }
    fclose(fp);

    /* Complete calculation and return sequence metrics */
    rt_end(&ent, &chisq, &mean, &montepi, &scc);

    /* Calculate probability of observed distribution occurring from
       the results of the Chi-Square test */
    chip = pochisq(chisq, (binary ? 1 : 255));

    /* Print bin counts if requested */
    if (counts)
    {
        if (terse)
        {
            fprintf(stderr, "2,Value,Occurrences,Fraction\n");
        }
        else
        {
            fprintf(stderr, "Value Char Occurrences Fraction\n");
        }
        for (i = 0; i < (binary ? 2 : 256); i++)
        {
            if (terse)
            {
                fprintf(stderr, "3,%d,%ld,%f\n", i, ccount[i], ((double) ccount[i] / totalc));
            }
            else
            {
                if (ccount[i] > 0)
                {
                    fprintf(stderr, "%3d %c %10ld %f\n", i, (!isISOprint(i) || isISOspace(i)) ? ' ' : i, ccount[i], ((double) ccount[i] / totalc));
                }
            }
        }
        if (!terse)
        {
            fprintf(stderr, "\nTotal: %10ld %f\n\n", totalc, 1.0);
        }
    }

    /* Print calculated results */
    overall = 1;
    compression = ((100 * ((binary ? 1 : 8) - ent) / (binary ? 1.0 : 8.0)));
    montec = 100.0 * (fabs(PI - montepi) / PI);
    if (!terse)
    {
        if (ent > (binary ? 0.999999 : 7.9998))
        {
            fprintf(stderr, "PASS ");
        }
        else
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        fprintf(stderr, "Entropy = %f bits/%s\n", ent, samp);

        if (compression < (binary ? 0.000025 : 0.0025))
        {
            fprintf(stderr, "PASS ");
        }
        else
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        fprintf(stderr, "Compression = %f %%\n", compression);

        if (chip < 0.05 || chip > 0.95)
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        else
        {
            fprintf(stderr, "PASS ");
        }
        fprintf(stderr, "Chi-exceed = %f %%\n", chip * 100);

        if (mean < (binary ? 0.4997 : 127.3) || mean > (binary ? 0.5003 : 127.7))
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        else
        {
            fprintf(stderr, "PASS ");
        }
        fprintf(stderr, "Mean = %f\n", mean);

        if (montec > 0.3)
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        else
        {
            fprintf(stderr, "PASS ");
        }
        fprintf(stderr, "Monte-Carlo = %f %%\n", montec);

        if (fabs(scc) > (binary ? 0.001 : 0.002))
        {
            fprintf(stderr, "FAIL ");
            overall = 0;
        }
        else
        {
            fprintf(stderr, "PASS ");
        }
        fprintf(stderr, "SCC = %f\n", scc);

        if (0 == overall)
        {
            fprintf(stderr, "\nFAIL overall\n");
            return 1;
        }
        else
        {
            fprintf(stderr, "\nPASS overall\n");
        }
    }
    else
    {
        fprintf(stderr, "Entropy,Compression,Chi-exceed,Mean,Monte-Carlo,SCC\n");
        fprintf(stderr, "%f,%f,%f,%f,%f,%f\n", ent, compression, chip * 100, mean, montec, scc);
    }
    return 0;
}
And the script to extract randomness:
Code:
#!/bin/sh
# Extract randomness from a file.
# Depends on vne, whash, ent (modified version), split, etc.

temp=/tmp/randextr

if test $# != 2
then
    echo "Usage: $(basename "$0") infile outfile"
    exit 1
fi
if test -e "$2"
then
    echo "$2 exists"
    exit 1
fi

current="$(pwd)"
mkdir "$temp" || exit 1

# start extraction
vne < "$1" | whash > "$temp/vh"
cd "$temp" || exit 1
if split -d -b 1M -a 5 vh
then
    for i in x*
    do
        echo "$i"
        if ent "$i"
        then
            cat "$i" >> "$current/$2"
        fi
    done
else
    echo "Split failed"
    exit 1
fi
cd "$current"
rm -rf "$temp"
exit 0
I can get radio data at 44.0 kB/s, so that means I can get about 4 kB/s of good quality random data.
Here's an example of data quality:
Code:
bash-4.1$ ent compiled
PASS Entropy = 7.999979 bits/byte
PASS Compression = 0.000256 %
PASS Chi-exceed = 76.254451 %
PASS Mean = 127.489548
PASS Monte-Carlo = 0.014112 %
PASS SCC = 0.000082
PASS overall
This is for an 8 MB file.
Last edited by H_TeXMeX_H; 04-18-2012 at 09:10 AM.
Reason: final update, terse option fixed