LinuxQuestions.org - [SOLVED] Need a fast von Neumann randomness extractor

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - Need a fast von Neumann randomness extractor (https://www.linuxquestions.org/questions/programming-9/need-a-fast-von-neumann-randomness-extractor-938949/)

Need a fast von Neumann randomness extractor

What would be the most efficient way of coding this:
https://en.wikipedia.org/wiki/Bernou...ess_extraction

Code:

input        output

00        discard

01        0

10        1

11        discard

I know bash and some C.

The problem is that this maybe very inefficient. I mean how do I work with two bits at a time. I don't even think I can do this in C efficiently.

Any hints on how to go about this ?

I need it because I want to extract randomness from radio signals, and although hashing the input works well, when I look at the histogram I see that it is not level and that there are many values at the two extremes, maybe due to non-uniformity. I was thinking this extractor would help ... I'm no expert tho, but I will try it.

One thing I have done is use GIMP on a generated pnm and tried to change the histogram by hand. It has worked quite well, but would like to do it a different way if possible.

EDIT: The idea is from:
http://www.entropykey.co.uk/res/down...xplanation.pdf
Basically the von Neumann extractor first, and then the hash (which I have working already).

Hmm, a quick thought...
I'd use Perl to get around it, use a for loop, but step per two, then get bit (n) and bit (n-1) and compare.

A quick table:

Quote:

0 AND NOT 0 = 0 dump
1 AND NOT 1 = 0 dump
1 AND NOT 0 = 1 keep
0 AND NOT 1 = 1 keep

If oyu'd like, I'll try some Perl on it...my Perl is somewhat (very) rusty, but I'd like to try... ;)

Thor

Ok, I think this code works, but do check it:

PHP Code:




// von Neumann extractor
#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE sizeof(int)*8 // in bits

int main (void)
{
    // ! init
    unsigned int input;
    unsigned int counter;
    unsigned int array[WORDSIZE]; // array of bit masks
    
    // init
    unsigned int output = 0;
    unsigned int done = 0;
    unsigned int acounter = 1;
    
    // initialize array
    for (counter = 0; counter < WORDSIZE; counter++)
    {
        array[counter] = acounter;
        acounter *= 2;
    }
    
    fread(&input, sizeof(int), 1, stdin);
    
    while (!feof(stdin))
    {
        for (counter = 0; counter < WORDSIZE; counter += 2)
        {
            if ((input & array[counter]) ^ ((input & array[counter+1]) >> 1))
            {
                output |= (input & array[done]);
                done++;
                if (done == WORDSIZE)
                {
                    fwrite(&output, sizeof(int), 1, stdout);
                    output = 0;
                    done = 0;
                }
            }
        }
        fread(&input, sizeof(int), 1, stdin);
    }    
    
    return 0;
}

I can see that it does something useful, but I can't be 100% sure that it does what I want. It took 2-3 hours to write.

EDIT: I've done more testing and it works very well, especially after hashing. So I do raw -> extractor -> hash, and the output is very much random.

However, I do want someone to look over the code and see if it does what I think it does. I've never coded stuff like this before.

I wrote a slightly optimized version. It uses a 256-byte translation table (small enough to stay in CPU cache) to map each input byte to 0 to 4 output bits. Using a single core on Athlon II X4 640 (3GHz) it translates about 200 MB/s (1600 Mbit/s). (Tested using a 256MB file containing all byte values in sequence.)

This is not the fastest I can do, but going faster gets less portable and progressively harder to read and to maintain.

Depending on how you use the data, you probably want to tweak the input and output buffer sizes.

Code:

#include <unistd.h>

#include <errno.h>

#include <poll.h>



/* Buffer sizes. */

#define  IN_BUFSIZ    65536

#define OUT_BUFSIZ    65536



/* Milliseconds to block at most, if input/output nonblocking. */

#define  IN_MILLISECS 10000

#define OUT_MILLISECS 1000



/* Four high bits denote the number of bits used (0-4, 0/16/32/64),

 * and four low bits are the bit pattern.

*/

static const unsigned char bitmap[256] = {

    0, /*  0: 0 bits, ---- */

    16, /*  1: 1 bits, ---0 */

    17, /*  2: 1 bits, ---1 */

    0, /*  3: 0 bits, ---- */

    16, /*  4: 1 bits, --0- */

    32, /*  5: 2 bits, --00 */

    33, /*  6: 2 bits, --01 */

    16, /*  7: 1 bits, --0- */

    17, /*  8: 1 bits, --1- */

    34, /*  9: 2 bits, --10 */

    35, /*  10: 2 bits, --11 */

    17, /*  11: 1 bits, --1- */

    0, /*  12: 0 bits, ---- */

    16, /*  13: 1 bits, ---0 */

    17, /*  14: 1 bits, ---1 */

    0, /*  15: 0 bits, ---- */

    16, /*  16: 1 bits, -0-- */

    32, /*  17: 2 bits, -0-0 */

    33, /*  18: 2 bits, -0-1 */

    16, /*  19: 1 bits, -0-- */

    32, /*  20: 2 bits, -00- */

    48, /*  21: 3 bits, -000 */

    49, /*  22: 3 bits, -001 */

    32, /*  23: 2 bits, -00- */

    33, /*  24: 2 bits, -01- */

    50, /*  25: 3 bits, -010 */

    51, /*  26: 3 bits, -011 */

    33, /*  27: 2 bits, -01- */

    16, /*  28: 1 bits, -0-- */

    32, /*  29: 2 bits, -0-0 */

    33, /*  30: 2 bits, -0-1 */

    16, /*  31: 1 bits, -0-- */

    17, /*  32: 1 bits, -1-- */

    34, /*  33: 2 bits, -1-0 */

    35, /*  34: 2 bits, -1-1 */

    17, /*  35: 1 bits, -1-- */

    34, /*  36: 2 bits, -10- */

    52, /*  37: 3 bits, -100 */

    53, /*  38: 3 bits, -101 */

    34, /*  39: 2 bits, -10- */

    35, /*  40: 2 bits, -11- */

    54, /*  41: 3 bits, -110 */

    55, /*  42: 3 bits, -111 */

    35, /*  43: 2 bits, -11- */

    17, /*  44: 1 bits, -1-- */

    34, /*  45: 2 bits, -1-0 */

    35, /*  46: 2 bits, -1-1 */

    17, /*  47: 1 bits, -1-- */

    0, /*  48: 0 bits, ---- */

    16, /*  49: 1 bits, ---0 */

    17, /*  50: 1 bits, ---1 */

    0, /*  51: 0 bits, ---- */

    16, /*  52: 1 bits, --0- */

    32, /*  53: 2 bits, --00 */

    33, /*  54: 2 bits, --01 */

    16, /*  55: 1 bits, --0- */

    17, /*  56: 1 bits, --1- */

    34, /*  57: 2 bits, --10 */

    35, /*  58: 2 bits, --11 */

    17, /*  59: 1 bits, --1- */

    0, /*  60: 0 bits, ---- */

    16, /*  61: 1 bits, ---0 */

    17, /*  62: 1 bits, ---1 */

    0, /*  63: 0 bits, ---- */

    16, /*  64: 1 bits, 0--- */

    32, /*  65: 2 bits, 0--0 */

    33, /*  66: 2 bits, 0--1 */

    16, /*  67: 1 bits, 0--- */

    32, /*  68: 2 bits, 0-0- */

    48, /*  69: 3 bits, 0-00 */

    49, /*  70: 3 bits, 0-01 */

    32, /*  71: 2 bits, 0-0- */

    33, /*  72: 2 bits, 0-1- */

    50, /*  73: 3 bits, 0-10 */

    51, /*  74: 3 bits, 0-11 */

    33, /*  75: 2 bits, 0-1- */

    16, /*  76: 1 bits, 0--- */

    32, /*  77: 2 bits, 0--0 */

    33, /*  78: 2 bits, 0--1 */

    16, /*  79: 1 bits, 0--- */

    32, /*  80: 2 bits, 00-- */

    48, /*  81: 3 bits, 00-0 */

    49, /*  82: 3 bits, 00-1 */

    32, /*  83: 2 bits, 00-- */

    48, /*  84: 3 bits, 000- */

    64, /*  85: 4 bits, 0000 */

    65, /*  86: 4 bits, 0001 */

    48, /*  87: 3 bits, 000- */

    49, /*  88: 3 bits, 001- */

    66, /*  89: 4 bits, 0010 */

    67, /*  90: 4 bits, 0011 */

    49, /*  91: 3 bits, 001- */

    32, /*  92: 2 bits, 00-- */

    48, /*  93: 3 bits, 00-0 */

    49, /*  94: 3 bits, 00-1 */

    32, /*  95: 2 bits, 00-- */

    33, /*  96: 2 bits, 01-- */

    50, /*  97: 3 bits, 01-0 */

    51, /*  98: 3 bits, 01-1 */

    33, /*  99: 2 bits, 01-- */

    50, /* 100: 3 bits, 010- */

    68, /* 101: 4 bits, 0100 */

    69, /* 102: 4 bits, 0101 */

    50, /* 103: 3 bits, 010- */

    51, /* 104: 3 bits, 011- */

    70, /* 105: 4 bits, 0110 */

    71, /* 106: 4 bits, 0111 */

    51, /* 107: 3 bits, 011- */

    33, /* 108: 2 bits, 01-- */

    50, /* 109: 3 bits, 01-0 */

    51, /* 110: 3 bits, 01-1 */

    33, /* 111: 2 bits, 01-- */

    16, /* 112: 1 bits, 0--- */

    32, /* 113: 2 bits, 0--0 */

    33, /* 114: 2 bits, 0--1 */

    16, /* 115: 1 bits, 0--- */

    32, /* 116: 2 bits, 0-0- */

    48, /* 117: 3 bits, 0-00 */

    49, /* 118: 3 bits, 0-01 */

    32, /* 119: 2 bits, 0-0- */

    33, /* 120: 2 bits, 0-1- */

    50, /* 121: 3 bits, 0-10 */

    51, /* 122: 3 bits, 0-11 */

    33, /* 123: 2 bits, 0-1- */

    16, /* 124: 1 bits, 0--- */

    32, /* 125: 2 bits, 0--0 */

    33, /* 126: 2 bits, 0--1 */

    16, /* 127: 1 bits, 0--- */

    17, /* 128: 1 bits, 1--- */

    34, /* 129: 2 bits, 1--0 */

    35, /* 130: 2 bits, 1--1 */

    17, /* 131: 1 bits, 1--- */

    34, /* 132: 2 bits, 1-0- */

    52, /* 133: 3 bits, 1-00 */

    53, /* 134: 3 bits, 1-01 */

    34, /* 135: 2 bits, 1-0- */

    35, /* 136: 2 bits, 1-1- */

    54, /* 137: 3 bits, 1-10 */

    55, /* 138: 3 bits, 1-11 */

    35, /* 139: 2 bits, 1-1- */

    17, /* 140: 1 bits, 1--- */

    34, /* 141: 2 bits, 1--0 */

    35, /* 142: 2 bits, 1--1 */

    17, /* 143: 1 bits, 1--- */

    34, /* 144: 2 bits, 10-- */

    52, /* 145: 3 bits, 10-0 */

    53, /* 146: 3 bits, 10-1 */

    34, /* 147: 2 bits, 10-- */

    52, /* 148: 3 bits, 100- */

    72, /* 149: 4 bits, 1000 */

    73, /* 150: 4 bits, 1001 */

    52, /* 151: 3 bits, 100- */

    53, /* 152: 3 bits, 101- */

    74, /* 153: 4 bits, 1010 */

    75, /* 154: 4 bits, 1011 */

    53, /* 155: 3 bits, 101- */

    34, /* 156: 2 bits, 10-- */

    52, /* 157: 3 bits, 10-0 */

    53, /* 158: 3 bits, 10-1 */

    34, /* 159: 2 bits, 10-- */

    35, /* 160: 2 bits, 11-- */

    54, /* 161: 3 bits, 11-0 */

    55, /* 162: 3 bits, 11-1 */

    35, /* 163: 2 bits, 11-- */

    54, /* 164: 3 bits, 110- */

    76, /* 165: 4 bits, 1100 */

    77, /* 166: 4 bits, 1101 */

    54, /* 167: 3 bits, 110- */

    55, /* 168: 3 bits, 111- */

    78, /* 169: 4 bits, 1110 */

    79, /* 170: 4 bits, 1111 */

    55, /* 171: 3 bits, 111- */

    35, /* 172: 2 bits, 11-- */

    54, /* 173: 3 bits, 11-0 */

    55, /* 174: 3 bits, 11-1 */

    35, /* 175: 2 bits, 11-- */

    17, /* 176: 1 bits, 1--- */

    34, /* 177: 2 bits, 1--0 */

    35, /* 178: 2 bits, 1--1 */

    17, /* 179: 1 bits, 1--- */

    34, /* 180: 2 bits, 1-0- */

    52, /* 181: 3 bits, 1-00 */

    53, /* 182: 3 bits, 1-01 */

    34, /* 183: 2 bits, 1-0- */

    35, /* 184: 2 bits, 1-1- */

    54, /* 185: 3 bits, 1-10 */

    55, /* 186: 3 bits, 1-11 */

    35, /* 187: 2 bits, 1-1- */

    17, /* 188: 1 bits, 1--- */

    34, /* 189: 2 bits, 1--0 */

    35, /* 190: 2 bits, 1--1 */

    17, /* 191: 1 bits, 1--- */

    0, /* 192: 0 bits, ---- */

    16, /* 193: 1 bits, ---0 */

    17, /* 194: 1 bits, ---1 */

    0, /* 195: 0 bits, ---- */

    16, /* 196: 1 bits, --0- */

    32, /* 197: 2 bits, --00 */

    33, /* 198: 2 bits, --01 */

    16, /* 199: 1 bits, --0- */

    17, /* 200: 1 bits, --1- */

    34, /* 201: 2 bits, --10 */

    35, /* 202: 2 bits, --11 */

    17, /* 203: 1 bits, --1- */

    0, /* 204: 0 bits, ---- */

    16, /* 205: 1 bits, ---0 */

    17, /* 206: 1 bits, ---1 */

    0, /* 207: 0 bits, ---- */

    16, /* 208: 1 bits, -0-- */

    32, /* 209: 2 bits, -0-0 */

    33, /* 210: 2 bits, -0-1 */

    16, /* 211: 1 bits, -0-- */

    32, /* 212: 2 bits, -00- */

    48, /* 213: 3 bits, -000 */

    49, /* 214: 3 bits, -001 */

    32, /* 215: 2 bits, -00- */

    33, /* 216: 2 bits, -01- */

    50, /* 217: 3 bits, -010 */

    51, /* 218: 3 bits, -011 */

    33, /* 219: 2 bits, -01- */

    16, /* 220: 1 bits, -0-- */

    32, /* 221: 2 bits, -0-0 */

    33, /* 222: 2 bits, -0-1 */

    16, /* 223: 1 bits, -0-- */

    17, /* 224: 1 bits, -1-- */

    34, /* 225: 2 bits, -1-0 */

    35, /* 226: 2 bits, -1-1 */

    17, /* 227: 1 bits, -1-- */

    34, /* 228: 2 bits, -10- */

    52, /* 229: 3 bits, -100 */

    53, /* 230: 3 bits, -101 */

    34, /* 231: 2 bits, -10- */

    35, /* 232: 2 bits, -11- */

    54, /* 233: 3 bits, -110 */

    55, /* 234: 3 bits, -111 */

    35, /* 235: 2 bits, -11- */

    17, /* 236: 1 bits, -1-- */

    34, /* 237: 2 bits, -1-0 */

    35, /* 238: 2 bits, -1-1 */

    17, /* 239: 1 bits, -1-- */

    0, /* 240: 0 bits, ---- */

    16, /* 241: 1 bits, ---0 */

    17, /* 242: 1 bits, ---1 */

    0, /* 243: 0 bits, ---- */

    16, /* 244: 1 bits, --0- */

    32, /* 245: 2 bits, --00 */

    33, /* 246: 2 bits, --01 */

    16, /* 247: 1 bits, --0- */

    17, /* 248: 1 bits, --1- */

    34, /* 249: 2 bits, --10 */

    35, /* 250: 2 bits, --11 */

    17, /* 251: 1 bits, --1- */

    0, /* 252: 0 bits, ---- */

    16, /* 253: 1 bits, ---0 */

    17, /* 254: 1 bits, ---1 */

    0, /* 255: 0 bits, ---- */

};



/* Write all of the data. Return 0 if success, errno otherwise.

*/

static int writeall(const int fd, unsigned char *const data, size_t const size)

{

    unsigned char *const ends = data + size;

    unsigned char      *next = data;

    struct pollfd        list[2];

    ssize_t              bytes;

    int                  saved_errno;



    saved_errno = errno;



    while (next < ends) {



        bytes = write(fd, next, (size_t)(ends - next));



        if (bytes > (ssize_t)0) {

            next += bytes;

            continue;

        }



        /* Library error? */

        if (bytes != (ssize_t)-1)

            return errno = EIO;



        /* Interrupted by a signal? */

        if (errno == EINTR)

            continue;



        /* Nonblocking output? */

        if (errno == EAGAIN || errno == EWOULDBLOCK) {

            list[0].fd = fd;

            list[0].events = POLLOUT | POLLERR | POLLHUP;

            list[0].revents = 0;

            poll((struct pollfd *)&list, 1, OUT_MILLISECS);

            continue;

        }



        /* Write error. */

        return errno;

    }



    /* Success. */

    errno = saved_errno;

    return 0;

}



/* Read some input, up to the buffer full.

 * Return number of bytes read, or -1 with errno set.

*/

static ssize_t readany(const int fd, unsigned char *const data, size_t const size)

{

    struct pollfd list[2];

    ssize_t      bytes;

    int          saved_errno;



    saved_errno = errno;



    while (1) {



        bytes = read(fd, data, size);



        /* End of input, or more data read? */

        if (bytes >= (ssize_t)0) {

            errno = saved_errno;

            return bytes;

        }



        /* Library error? */

        if (bytes != (ssize_t)-1) {

            errno = EIO;

            return (ssize_t)-1;

        }



        /* Interrupted by a signal? */

        if (errno == EINTR)

            continue;



        /* Nonblocking input, and should block? */

        if (errno == EAGAIN || errno == EWOULDBLOCK) {

            list[0].fd = fd;

            list[0].events = POLLIN | POLLPRI;

            list[0].revents = 0;



            poll((struct pollfd *)&list, 1, IN_MILLISECS);

            continue;

        }



        /* I/O error. */

        return (ssize_t)-1;

    }

}



int main(void)

{

    const int            in_fd = STDIN_FILENO;

    unsigned char        in_data[IN_BUFSIZ];



    const int            out_fd = STDOUT_FILENO;

    unsigned char        out_data[OUT_BUFSIZ];

    unsigned char      *out_next = out_data;

    unsigned char *const out_ends = out_data + sizeof out_data;

    unsigned int        out_bits = 0U;

    unsigned int        out_part = 0U;



    ssize_t              bytes;

    int                  result = 0;

    int                  exitcode = 0;



    while (1) {



        bytes = readany(in_fd, in_data, sizeof in_data);



        /* End of input? */

        if (bytes == (ssize_t)0)

            break;



        /* Read error? */

        if (bytes == (ssize_t)-1) {

            exitcode = 1;

            break;

        }



        /* We did receive 'bytes' bytes. */

        {

            unsigned char      *in_next = in_data;

            unsigned char *const in_ends = in_data + bytes;



            while (in_next < in_ends) {

                const unsigned char byte = bitmap[*(in_next++)];

                const unsigned int  part = byte & 15U; /* Bit pattern */

                const unsigned int  bits = byte >> 4U; /* Number of bits */



                /* Add the bits to current temporary part. */

                out_part <<= bits;  /* Shift existing bits up, */

                out_part |= part;  /* to make room for the new bits, */

                out_bits += bits;  /* and add the new bit count to the total. */



                /* Full byte? */

                if (out_bits >= 8U) {

                    /* Save the highest/oldest bits. */

                    *(out_next++) = out_part >> (out_bits - 8U);

                    out_bits -= 8;

                    out_part &= (1U << out_bits) - 1U;



                    /* Time to flush output buffer? */

                    if (out_next >= out_ends) {

                        result = writeall(out_fd, out_data, (size_t)(out_next - out_data));

                        if (result)

                            break;



                        out_next = out_data;

                    }

                }

            }

        }



        /* Write error? */

        if (result)

            break;

    }



    /* Flush output buffer? */

    if (!result && out_next > out_data)

        result = writeall(out_fd, out_data, (size_t)(out_next - out_data));



    /* Note write error in the exit code. */

    if (result)

        exitcode |= 2;



    return exitcode;

}

The core of the code is rather simple: with in_data pointing to the buffer containing bytes bytes,

Code:

            unsigned char      *in_next = in_data;

            unsigned char *const in_ends = in_data + bytes;



            while (in_next < in_ends) {

                const unsigned char byte = bitmap[*(in_next++)];

                const unsigned int  part = byte & 15U; /* Bit pattern */

                const unsigned int  bits = byte >> 4U; /* Number of bits */



                /* Add the bits to current temporary part. */

                out_part <<= bits;  /* Shift existing bits up, */

                out_part |= part;  /* to make room for the new bits, */

                out_bits += bits;  /* and add the new bit count to the total. */



                /* Full byte? */

                if (out_bits >= 8U) {

                    /* Save the highest/oldest bits. */

                    *(out_next++) = out_part >> (out_bits - 8U);

                    out_bits -= 8;

                    out_part &= (1U << out_bits) - 1U;

Above, part contains the actual bits, bits the number of valid (low) bits in part. A larger accumulator, out_part with out_bits low bits in it collects the bit groups, then stores full bytes into a buffer. Note that highest bits are the oldest in out_part.

The bitmask array maps each input byte into two nibbles (high and low four bits):(the number of) bits and part. I used a trivial generator program to compute the array. (Note: And I got it wrong on the first edit. This is the fixed version.)

When a full byte is extracted, we need to extract the high bits (since they are the oldest bits), and keep the excess (low/new) bits intact. Masking the input (the last line) is not really necessary, since we do shift the value before adding the new bits (i.e. new bits are always added to clear bits) -- in other words the results are the same if you remove the masking altogether -- but I like to be careful.

The program will work with both blocking and non-blocking standard input and output.

Let me know if you want it in a "library" form you can embed in your own code.

I'll take a look at your own code next, H_TeXMeX_H.

Edit: H_TeXMeX_H, your code should work perfectly, as far as I can tell. I can see nothing wrong in it. (I did not test it, though.)

Your code will be limited by stdio.h (fread() and fwrite()) speed; that is why I used low-level I/O instead. You can alleviate that by reading and writing larger buffers at once, or you too can switch to low-level I/O.

Also, my tests indicate that on an Athlon II X4 640, doing bit pairs at a time takes about 5x to 6x the time the lookup takes, thus incurring > 5x slowdown compared to my lookup method.

Note: This is the fixed version, which only works with architectures that have 8-bit unsigned chars; the code does not check that, though. Look at my post #10 in this thread for a better example code. It is slightly faster, too, and much more portable.

Wow, that's way past my abilities. Many thanks for taking the time to write it. I will try it right away :)

Ok, I have fixed my code and it works properly now, there was a mistake which I missed:

PHP Code:




// von Neumann extractor
#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE sizeof(int)*8 // in bits

int main (void)
{
    // ! init
    unsigned int input;
    unsigned int counter;
    unsigned int array[WORDSIZE]; // array of bit masks
    
    // init
    unsigned int output = 0;
    unsigned int done = 0;
    unsigned int acounter = 1;
    
    // initialize array
    for (counter = 0; counter < WORDSIZE; counter++)
    {
        array[counter] = acounter;
        acounter *= 2;
    }
    
    while (fread(&input, sizeof(int), 1, stdin))
    {
        for (counter = 0; counter < WORDSIZE; counter += 2)
        {
            if ((input & array[counter]) ^ ((input & array[counter+1]) >> 1))
            {
                if (counter > done)
                {
                    output |= ((input & array[counter]) >> (counter-done));
                }
                else if (counter < done)
                {
                    output |= ((input & array[counter]) << (done-counter));
                }
                else
                {
                    output |= (input & array[counter]);
                }
                done++;
                if (done == WORDSIZE)
                {
                    fwrite(&output, sizeof(int), 1, stdout);
                    output = 0;
                    done = 0;
                }
            }
        }
    }    
    
    return 0;
}

My code now works better than I could have hoped. The output is random as is, without any hashing, but even better with hashing.

However, there is something unusual...

@ Nominal Animal

Using your code I don't get the same result. This is strange. I tested results from both and mine produces more random output. I'm not sure of the cause. Yours is most certainly the fastest. I will try to investigate into the cause of this discrepancy.

Quote:

Originally Posted by H_TeXMeX_H (Post 4649292)

@ Nominal Animal

Using your code I don't get the same result. This is strange.

Is it just because we build the result bytes in opposite orders? In my code, highest bit in each output byte is the earliest from input; in yours, the lowest bit is. I'll write some test cases to check my code. I often make mistakes, so it is more than likely there is a bug there.

Edit: Oh, yes, I mangled the output bit order. Fixed now; both my codes yield the same output, the one I outline in post #9 in this thread.

Oh, ok, the output order may be it. I'll run some more tests.

Yes, there was a bug on my code (in the bitmask array and how it was used). It's fixed now.

Here is the table of bits you should get from eight-bit bytes, with - indicating discarded bit pair, zero 01 and one 10 bit pairs, with most significant bits left. First row corresponds to values 0 to 15, and last row to values 240 to 255:

Code:

---- ---0 ---1 ---- --0- --00 --01 --0- --1- --10 --11 --1- ---- ---0 ---1 ----

-0-- -0-0 -0-1 -0-- -00- -000 -001 -00- -01- -010 -011 -01- -0-- -0-0 -0-1 -0--

-1-- -1-0 -1-1 -1-- -10- -100 -101 -10- -11- -110 -111 -11- -1-- -1-0 -1-1 -1--

---- ---0 ---1 ---- --0- --00 --01 --0- --1- --10 --11 --1- ---- ---0 ---1 ----

0--- 0--0 0--1 0--- 0-0- 0-00 0-01 0-0- 0-1- 0-10 0-11 0-1- 0--- 0--0 0--1 0---

00-- 00-0 00-1 00-- 000- 0000 0001 000- 001- 0010 0011 001- 00-- 00-0 00-1 00--

01-- 01-0 01-1 01-- 010- 0100 0101 010- 011- 0110 0111 011- 01-- 01-0 01-1 01--

0--- 0--0 0--1 0--- 0-0- 0-00 0-01 0-0- 0-1- 0-10 0-11 0-1- 0--- 0--0 0--1 0---

1--- 1--0 1--1 1--- 1-0- 1-00 1-01 1-0- 1-1- 1-10 1-11 1-1- 1--- 1--0 1--1 1---

10-- 10-0 10-1 10-- 100- 1000 1001 100- 101- 1010 1011 101- 10-- 10-0 10-1 10--

11-- 11-0 11-1 11-- 110- 1100 1101 110- 111- 1110 1111 111- 11-- 11-0 11-1 11--

1--- 1--0 1--1 1--- 1-0- 1-00 1-01 1-0- 1-1- 1-10 1-11 1-1- 1--- 1--0 1--1 1---

---- ---0 ---1 ---- --0- --00 --01 --0- --1- --10 --11 --1- ---- ---0 ---1 ----

-0-- -0-0 -0-1 -0-- -00- -000 -001 -00- -01- -010 -011 -01- -0-- -0-0 -0-1 -0--

-1-- -1-0 -1-1 -1-- -10- -100 -101 -10- -11- -110 -111 -11- -1-- -1-0 -1-1 -1--

---- ---0 ---1 ---- --0- --00 --01 --0- --1- --10 --11 --1- ---- ---0 ---1 ----

In hexadecimal, the above zero and one bits correspond to 64 bytes,

Code:

42 dd 08 04 53 42 de 96  f7 f7 42 dd 08 04 53 42

01 00 08 24 64 04 53 52  2a 6c ed 4d 08 04 53 42

de 96 f7 f7 a5 a4 4c b5  76 96 f7 f6 6e fd ff df

de 96 f7 f7 42 dd 08 04  53 42 de 96 f7 f7 42 dd

The byte values are a very good test vector, which you can generate in e.g. Bash using

Code:

for ((i=0; i<256; i++)) ; do h=$(printf '%02x' $i) ; printf "\x$h" ; done

Here is my von Neumann randomness extractor, which provides the output described in my post above.

Instead of hardcoding the array, I precalculate it at the beginning of the program. This way the code should compile on architectures with different byte size (as long as it is at most the size of an unsigned int, as I use an unsigned int bit accumulator).

The precalculator, init_random(), simply takes each possible input value (unsigned char) and converts it bit pair by bit pair. It consumes and saves ascending bits, starting with the lowest bits first. The value loop is in descending order, but since each value is a self-contained unit, this does not matter. While calculating bit pairs is slow compared to using a lookup, doing it once for each possible input value does not take appreciable time, and allows portable code. Therefore it is worthwhile.

You might notice in the inner loop that I handle the bit accumulator has more than a word and bit accumulator has exactly a word cases separately. It turns out that on my architecture and gcc-4.6.1, this is ten percent faster.

This version is roughly 5% faster on my architecture than my previous code in this thread, handling about 218000000 bytes of input per second on an Athlon II X4 640.

Code:

#include <unistd.h>

#include <limits.h>    /* For CHAR_BIT */

#include <poll.h>

#include <errno.h>



/* Input and output buffer sizes */

#define  OUTPUT_CHARS          65536

#define  INPUT_CHARS          65536



/* Maximum polling durations in milliseconds */

#define  OUTPUT_POLL_MILLISECS  10000

#define  INPUT_POLL_MILLISECS  10000



/* Number of random bit in each unsigned char */

static unsigned char random_bits[1 << CHAR_BIT];



/* Random bit patterns for each unsigned char */

static unsigned char random_pattern[1 << CHAR_BIT];



/* Initialize the above randomness arrays.

*/

static void init_random(void)

{

    unsigned int    i = 1U << CHAR_BIT;

    unsigned char  bit_pattern, bit_count, left;



    while (i-->0U) {



        left = i;

        bit_pattern = 0U;

        bit_count = 0U;



        while (left)

            if ((left & 3U) == 1U) {

                bit_count++;

                left >>= 2U;



            } else

            if ((left & 3U) == 2U) {

                bit_pattern |= 1U << bit_count;

                bit_count++;

                left >>= 2U;



            } else

                left >>= 2U;



        random_bits[i] = bit_count;

        random_pattern[i] = bit_pattern;

    }

}



/* Output buffer. */

static int                  output_descriptor = STDOUT_FILENO;

static unsigned char        output_data[OUTPUT_CHARS];

static unsigned char        *output_next = output_data;

static unsigned char *const  output_ends = output_data + sizeof output_data;



/* Returns zero if success, nonzero errno otherwise.

*/

static int output_flush(void)

{

    struct pollfd              list[1];

    const unsigned char        *head = output_data;

    const unsigned char *const  tail = output_next;

    ssize_t                    bytes;



    output_next = output_data;



    while (head < tail) {



        bytes = write(output_descriptor, head, (size_t)(tail - head));



        if (bytes > (ssize_t)0) {

            head += bytes;

            continue;



        } else

        if (bytes != (ssize_t)-1) {

            return errno = EIO;



        } else

        if (errno == EINTR) {

            /* Interrupted by a signal. */

            continue;



        } else

        if (errno == EAGAIN || errno == EWOULDBLOCK) {

            /* Nonblocking I/O. Block until output is possible. */

            list[0].fd = output_descriptor;

            list[0].events = POLLOUT | POLLERR | POLLHUP;

            list[0].revents = 0;

            /* Ignore poll result, */

            poll((struct pollfd *)&list, 1, OUTPUT_POLL_MILLISECS);

            /* since we retry write anyway. */

            continue;

        } else {

            /* Write error. */

            return errno;

        }

    }



    /* Success. */

    return 0;

}



/* Helper function to read any data.

 * Returns the number of bytes actually read,

 * or zero with errno zero if end of input,

 * or zero with errno set if read error.

*/

static size_t readany(const int descriptor, void *const data, const size_t size)

{

    struct pollfd  list[0];

    ssize_t        bytes;



    while (1) {



        bytes = read(descriptor, data, size);



        if (bytes > (ssize_t)0) {

            return (size_t)bytes;



        } else

        if (bytes == (ssize_t)0) {

            errno = 0;

            return (size_t)0;



        } else

        if (bytes != (ssize_t)-1) {

            errno = EIO;

            return (size_t)0;



        } else

        if (errno == EINTR) {

            /* Interrupted by a signal; retry */

            continue;



        } else

        if (errno == EAGAIN || errno == EWOULDBLOCK) {

            /* Nonblocking I/O; poll */

            list[0].fd = descriptor;

            list[0].events = POLLIN | POLLPRI;

            list[0].revents = 0;

            /* Ignore poll() return values, */

            poll((struct pollfd *)&list, 1, INPUT_POLL_MILLISECS);

            /* since we retry the read anyway. */

            continue;



        } else {

            /* Read error; reason already in errno. */

            return (size_t)0;

        }

    }

}



int main(void)

{

    unsigned char  input_data[INPUT_CHARS];

    size_t          input_size;



    int            curr_bits = 0;

    unsigned int    curr_pattern = 0U;



    int            exit_status = 0;



    init_random();



    while (1) {



        input_size = readany(STDIN_FILENO, input_data, sizeof input_data);

        if (!input_size) {

            if (errno)

                exit_status |= 1; /* For read error */

            break;

        }



        {

            const unsigned char      *head = input_data;

            const unsigned char *const tail = input_data + input_size;



            while (head < tail) {

                const int          bits = random_bits[*head];

                const unsigned int pattern = random_pattern[*head];



                head++;



                if (!bits)

                    continue;

                    

                curr_pattern = (curr_pattern << bits) | pattern;

                curr_bits += bits;



                if (curr_bits > CHAR_BIT) {



                    *(output_next++) = curr_pattern >> (curr_bits - CHAR_BIT);

                    curr_pattern &= (1U << (curr_bits - CHAR_BIT)) - 1U;

                    curr_bits -= CHAR_BIT;



                    if (output_next >= output_ends)

                        if (output_flush()) {

                            exit_status |= 2;

                            break;

                        }



                } else

                if (curr_bits == CHAR_BIT) {



                    *(output_next++) = curr_pattern;

                    curr_pattern = 0U;

                    curr_bits = 0U;



                    if (output_next >= output_ends)

                        if (output_flush()) {

                            exit_status |= 2;

                            break;

                        }



                }

            }



            if (exit_status)

                break;



        }

    }



    if (output_flush())

        exit_status |= 2;



    return exit_status;

}

Thanks, I think that fixed it. It now seems to generate equivalent randomness to mine. It is about 7 times faster, so many thanks again :)

I will mark this solved.

Here's another take on it, if you're interested. It's shorter than the last one Nominal Animal posted, but about 7% slower. I'm pretty sure the difference is due to more stack usage (i.e. less can be taken care of in registers.)

Code:

#include <stdio.h>

#include <stdlib.h>

#include <limits.h>

#include <unistd.h>

#include <stdint.h>





int main(int argc, char *argv[])

{

        const unsigned int buffer_size = sysconf(_SC_PAGESIZE);

        unsigned char *input_buffer = malloc(buffer_size), *output_buffer = malloc(buffer_size);

        ssize_t read_size, output_size = 0;

        unsigned char *byte;

        int I;



        typedef uint64_t working_type;

        working_type input = 0, output = 0, bit = 1;



        while ((read_size = read(STDIN_FILENO, input_buffer, buffer_size)) > 0)

        for (byte = input_buffer; byte < input_buffer + read_size; byte += sizeof input)

        for (I = 0, input = *(working_type*) byte; I < sizeof input * CHAR_BIT; I += 2, input >>= 2)

        if ((input & 3) == 1 || (input & 3) == 2) //(using '(input & 3) % 3' looks cleaner, but it's slower)

        {

        if (input & 1) output |= bit;

        if (!(bit <<= 1))

        {

        *(working_type*) (output_buffer + output_size) = output;

        if ((output_size += sizeof output) >= buffer_size)

          {

        write(STDOUT_FILENO, output_buffer, buffer_size);

        output_size = 0;

          }

        output = 0;

        bit    = 1;

        }

        }



        write(STDOUT_FILENO, output_buffer, output_size);



        return 0;

}

Kevin Barry

edit: Actually, it takes 3.5x longer when reading from a file and dumping to /dev/null. Sorry about the false claim!

Thanks for the other take. I will try to understand it as well :)

I do not understand one thing:

Using the test sequence (0-255), the output for mine and ta0kira:

Code:

0x00000000: DD42F7F7 96DE4253 0408DD42 F7F796DE    ÝB÷÷.ÞBS..ÝB÷÷.Þ

0x00000010: DFFFFD6E F6F79676 B54CA4A5 F7F796DE    ßÿýnö÷.vµL¤¥÷÷.Þ

0x00000020: 42530408 4DED6C2A 52530464 24080001    BS..Míl*RS.d$...

0x00000030: 42530408 DD42F7F7 96DE4253 0408DD42    BS..ÝB÷÷.ÞBS..ÝB

Output for Nominal Animal:

Code:

0x00000000: 42DD0804 5342DE96 F7F742DD 08045342    BÝ..SBÞ.÷÷BÝ..SB

0x00000010: 01000824 64045352 2A6CED4D 08045342    ...$d.SR*líM..SB

0x00000020: DE96F7F7 A5A44CB5 7696F7F6 6EFDFFDF    Þ.÷÷¥¤Lµv.÷önýÿß

0x00000030: DE96F7F7 42DD0804 5342DE96 F7F742DD    Þ.÷÷BÝ..SBÞ.÷÷BÝ

How does this difference appear ? It doesn't seem to be reversed completely, only parts of it. It is hard to understand. I know all of these work just fine, but I can't understand the difference.

Quote:

Originally Posted by H_TeXMeX_H (Post 4650371)

Thanks for the other take. I will try to understand it as well :)

I ran mine differently and it's slower than I thought (see my edit.)

Quote:

Originally Posted by H_TeXMeX_H (Post 4650371)

How does this difference appear ? It doesn't seem to be reversed completely, only parts of it. It is hard to understand. I know all of these work just fine, but I can't understand the difference.

I think the difference is where the new bit is placed in the output. I put the new bit in the next-lowest bit of a 64-bit integer. If I'd used a 32-bit or 8-bit then the output would be different. It would also be different if I shifted the old value and placed the new bit in bit 1. I think Nominal Animal did one or more of those things differently.
Kevin Barry

I see, so it really depends on many things. Still the output is equivalent, so I can use it.

In case anyone is wondering what I'm doing with all this.

I hooked up a radio headphone jack to my mic jack using an audio recording cable, something like:
http://www.sonicelectronix.com/item_...he-AUX3BL.html

I tuned the radio to a noisy channel. This is the hardest part, because noise doesn't always mean randomness. I did tests on the output using ENT and this script:

Code:

#!/bin/sh



if test $# != 1

then

        echo "Usage: $(basename $0) input"

        echo 'input: input file'

        exit 1

fi



side="$(stat -c '%s' "$1" | awk '{ print int($1^(1/2)) }')"



rawtopgm "$side" "$side" "$1"



exit 0

Once I have a decent channel, I have to extract randomness as much as possible. The results vary.

As per the schema I posted in the OP, I first run a von Neumann extractor, any of these will work just fine, but performance differs. Nominal Animal's code is unsurpassed.

Then I run a hash on blocks of that output. This was hard to code, probably because I'm not that good at C. However, I have coded a small add-on to the whirlpool hash reference implementation. This is the part that I changed:

Code:

int main(void) {

        

        // ! init

        struct NESSIEstruct w;

        u8 digest[DIGESTBYTES];

        

        // ! free

        unsigned char * buffer;

        

        // malloc

        buffer = (unsigned char*) malloc (DIGESTBYTES);

        if ( buffer == NULL ) 

        { 

                fprintf(stderr, "ERROR: not enough memory !\n"); 

                exit(2); 

        }

        

        // hash

        while (fread(buffer, DIGESTBYTES, 1, stdin))

        {

                NESSIEinit(&w);

                NESSIEadd(buffer, DIGESTBITS, &w); // in bits

                NESSIEfinalize(&w, digest);

                

                fwrite(digest, DIGESTBYTES, 1, stdout);

        }

        

        // free

        free(buffer);

        

        return 0;

}

The output is quite random, and I would say may be good enough to use as such, at least according to the tests. You should do your own tests tho, because there's a lot of variables here.

I can just pipe the output from the radio like this:

Code:

rrec | vnemed | whash > output

rrec is just a wrapper for 'arecord -c 1 -r 44100 -t raw', the rest are the extractor and whirlpool hash.

yeah, I've been following this thread and wondering why exactly you're doing it:). Is it anything to do with encryption or generating keys?

Quote:

Originally Posted by H_TeXMeX_H (Post 4650415)

I hooked up a radio headphone jack to my mic jack using an audio recording cable, something like:
http://www.sonicelectronix.com/item_...he-AUX3BL.html

I tuned the radio to a noisy channel. This is the hardest part, because noise doesn't always mean randomness.

Maybe you should hash before the von Neuman extractor since the distribution of 0s and 1s in the upper bits might differ from that of the lower bits. I would expect that to be the case whether or not there is clipping during A/D conversion. I only suggest that because it seems like the VN extractor only works well if the data is close to random in the first place. I guess if you're hashing anywhere in there, it might not make a difference where that happens.
Kevin Barry

Thanks for this thread! I learned something here, you guys lit up my week!!!!

:D

Just let me/us know where your "eagle" landed, please :)
@ sycamorex - it would seem to be a way to find patterns in randomness...did'nt know that hissing on the radio was good for something, goes to prove you can always learn something!!!

Thanks

Thor

Quote:

Originally Posted by Thor_2.0 (Post 4650532)

@ sycamorex - it would seem to be a way to find patterns in randomness...did'nt know that hissing on the radio was good for something, goes to prove you can always learn something!!!

Excuse my ignorance but I'm totally confused and don't really understand much from the thread. Is the OP trying to extract a truly random data or find patterns in apparent randomness?

Quote:

Originally Posted by sycamorex (Post 4650422)

yeah, I've been following this thread and wondering why exactly you're doing it:). Is it anything to do with encryption or generating keys?

Quote:

Originally Posted by sycamorex (Post 4650537)

Excuse my ignorance but I'm totally confused and don't really understand much from the thread. Is the OP trying to extract a truly random data or find patterns in apparent randomness?

I want to do both actually. But, my goal is to make a TRNG, true random number generator. I probably will not achieve it, but may come close.

I should have linked this before:
http://www.random.org/history/

Reading that made me want to try to do something like that.

Technically, I don't have too much use for it, I just want a proof of concept for myself. I trust radioactive decay (hotbits) more than radio, but I have tried samples from random.org and they seem to be of good quality. So then, I wanted to do it myself so I don't have to rely on sites that may go down or have daily limits (hotbits).

Quote:

Originally Posted by H_TeXMeX_H (Post 4650794)

Ok, it all makes sense now. Thank you for the explanation.

I've tried using the hash before the extractor, but it doesn't make much difference. Either way, both must be used to produce good random output, neither will work well alone. Using the hash first produces slightly smaller output, which is a good thing in theory.

Some more theory:

I am using a short-wave radio for this, because this is HF radio, and is prone to atmospheric and solar interference. This means that there's plenty of noise and potential randomness.

Also see:
http://en.wikipedia.org/wiki/Noise_%28radio%29
http://en.wikipedia.org/wiki/Cosmic_noise

As long as we are discussing it, I found something very interesting in my quest.

Here are algorithms to determine if a file is random and how random.

Code:

The procedure to determine if a file is random or not:



1) Run rawtopgm, view in GIMP



Horizontal stripe patterns on image + patterns or major spikes on histogram => non random



Static noise image + histogram is irregular or flat => 2



2) Run ent



not random => not random

random => random

EDIT: I removed incorrect assumptions. I will do more testing, but I need more hotbits.

The diehard and dieharder tests seem quite useless and time consuming.

It is hard for me to believe what I just posted, but it is EDIT: actually false. Sorry, I must do more tests on hotbits to fix these unreliable and size dependent tests.

Quote:

Originally Posted by H_TeXMeX_H (Post 4650794)

But, my goal is to make a TRNG, true random number generator. I probably will not achieve it, but may come close.

Since you're already using radio, why not build an USB hardware random number generator? (Or buy one?)

For example, you could start with a $16 Teensy++ to interface to via USB. It has a native USB interface, and is very easy to program using gcc-avr. Then, use a hardware random bit generator (one or more) as a source of random bits, using the von Neumann extractor in the Teensy to avoid any bias in the result. If you have more than one, you can increase the generated bit rate. You can easily write a PRNG like Xorshift on the Teensy (or several!), and perturb its state continuously using the random bits, making it more or less a true TRNG. You can easily provide both TRNG and T/PRNG in the same device. The hardware parts should cost about $25 total, including a nice tin enclosure. The circuit generates EM noise, so a metal enclosure is a must.

The data rate is small enough that Teensy could easily continuously encrypt the output using a cryptographically secure algorithm, if you want truly random bits.

If you use the raw HID interface (the one used by USB mice, keyboards, joysticks, gamepads), you are limited to 512000 bits per second (1000 packets, 64 bytes each, per second; more than enough for several hardware bits), but you don't need any drivers. On any OS.

This might be rather interesting project. A kernel driver to feed the random bits from the above hardware USB device to the entropy pool would be very simple, even including some randomness checks on the bits. Or, one could tell the OpenSSL library to obtain initial PRNG state bits from the USB device, for example from a simple userspace application providing it via a pipe (using the HID interface to the USB device).

Quote:

Originally Posted by H_TeXMeX_H (Post 4650856)

Randomness is such a poorly-defined concept. For a probability distribution over a continuous vector space described by a continuous function, given a vector there is usually some neighborhood of vectors nearby that are considered "pretty much the same". With binary data, however, the assumption is that any difference in bit values in essence gives you something completely different. The goal of randomness for binary data is to therefore make all possible combinations of 0s and 1s of a given length seem equally likely. In reality, however, if the bits are independent Bernoulli random variables with p=0.5 then the number of 1s over N bits has a distribution of Binomial(N,0.5). Since the variance is linear wrt N, as the number of bits increases the confidence interval around 0.5*N converges to 0.5*N itself. Since this should also apply to all members of the power set of N bits, that implies that the probability of any discernible order to the bits should decrease as the length of the given subset increases. This obviously isn't possible, since the set containing all of the 1 elements, and all of that set's subsets, are in the power set of N bits. Additionally, there are 2^N elements in the power set, so even for a "modest" sample such as 64 bits it's intractable to verify that the desired randomness exists.

Much like the concept of hashing, where in general there are more sets of data of a given length with a given hash value than there are possible hash values, you can't account for all sources of order in the data. You need to determine the ways that order within the data can be useful to an adversary, which will vary by application, and attempt to make its exploitation disadvantageous. The best way to do that is to obscure how the data is transformed into the "publicly-observable final product", which will prevent inference of the most epistemically-useful ways to attack the data.
Kevin Barry

I suppose I could buy one, but I am reluctant to buy any of those because they don't provide any sample data that it generates.

From the above tests I only trust /dev/random and hotbits (uses geiger counter). Other methods may just produce pseudorandom data.

EDIT: Wrong info removed.

How about using a Xorshift PRNG, and using the random/non-random data to perturb its state continouously?

Here is a very slow example. Compile it, then run with a single parameter describing how many bytes it outputs per input byte. It will consume the first 16 bytes of standard input to prime the Xorshift generator state, then it will regularly use the input to perturb the generator state. Since Xorshift is supposed to pass the diehard tests, I wonder if the output from this is sufficiently random for you?

Note that I'm not at all sure whether this kind of perturbation (directly xor'ing the input into the generator state) makes sense, and I have not checked whether the output is random or not.

If you don't want to perturb the state at all, you can redirect standard input from /dev/zero, in which case the output is the same as the Wikipedia example code, but with 65536 first values discarded.

Code:

#include <unistd.h>

#include <stdint.h>

#include <string.h>

#include <stdio.h>

#include <errno.h>



static uint32_t      xorshift_state[4] = { 0U, 0U, 0U, 0U };



static inline uint32_t xorshift(const uint32_t perturb)

{

    uint32_t    temp;



    temp = xorshift_state[0] ^ (xorshift_state[0] << 11U);

    xorshift_state[0] = xorshift_state[1];

    xorshift_state[1] = xorshift_state[2];

    xorshift_state[2] = xorshift_state[3];

    temp ^= (temp >> 8U) ^ xorshift_state[3] ^ (xorshift_state[3] >> 19U);

    xorshift_state[3] = temp ^ perturb;



    return temp;

}



static uint32_t fgetu32(FILE *const in)

{

    uint32_t    value;

    errno = (fread(&value, sizeof value, 1, in) == 1) ? 0 : ENOENT;

    return value;

}



static void fputu32(uint32_t const value, FILE *const out)

{

    fwrite(&value, sizeof value, 1, out);

}



int main(int argc, char *argv[])

{

    uint32_t    value;

    int        rate, i;

    char        dummy;



    if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {

        fprintf(stderr, "\n"

                        "Usage: %s [-h|--help]\n"

                        "      %s rate\n"

                        "\n"

                        "This reads binary input, emitting\n"

                        "rate pseudorandom bytes for each input byte.\n"

                        "Initial 16 bytes are read for Xorshift state.\n"

                        "\n"

                      , argv[0], argv[0]);

        return 0;

    }



    if (sscanf(argv[1], "%d %c", &rate, &dummy) != 1 || rate < 1) {

        fprintf(stderr, "%s: Invalid rate.\n", argv[1]);

        return 1;

    }



    /* Prime the Xorshift generator from input. */

    xorshift_state[0] = fgetu32(stdin);    

    xorshift_state[1] = fgetu32(stdin);    

    xorshift_state[2] = fgetu32(stdin);    

    xorshift_state[3] = fgetu32(stdin);

    if (errno)

        return 1;



    /* Since a zero state generates zeros, make sure state is nonzero. */

    if (!xorshift_state[0] && !xorshift_state[1] &&

        !xorshift_state[2] && !xorshift_state[3]) {

        /* Use Wikipedia example state. */

        xorshift_state[0] = 123456789U;

        xorshift_state[1] = 362436069U;

        xorshift_state[2] = 521288629U;

        xorshift_state[3] = 88675123U;

    }



    /* Initial state reflects input too much. Skip. */

    for (i = 0; i < 65536; i++)

        (void)xorshift(0U);



    /* Convert rate to the number of unperturbed words per perturbed word. */

    rate--;



    while (1) {



        value = fgetu32(stdin);

        if (errno)

            return 0;



        fputu32(xorshift(value), stdout);



        i = rate;

        while (i-->0)

            fputu32(xorshift(0U), stdout);

    }

}

Edited to add: The above is easily feasible on a Teensy, or just about any ARM microcontroller. The 32-bit shifts may need a bit trickery on 16- or 8-bit microcontrollers -- rotating each 8-bit unit separately. It should be fast enough to provide at least some thousands of random bits per second.

I have finally found a good way to extract random data from a radio signal. After lots of tests I have come to the conclusion that the best way to extract the random data is like this:

1) Use the von Neumann extractor first.
2) Second use the whirlpool hash with 512-bit = 64 byte input (I removed the option to choose other input byte sizes because it doesn't help).
3) Split the file into 1 MB chunks.
4) Check each chunk with a modified version of ent, posted below.
5) Concatenate the chunks that passed into the final file.

Using this method the extraction ratio is 10:1. So for 100 MB input you will get 10 MB of good quality random data out.

So here's the modified version of ent, you'll need the rest of it, this is just ent.c:

Code:

/*

        ENT  --  Entropy calculation and analysis of putative

                random sequences.



        Designed and implemented by John "Random" Walker in May 1985.



        Multiple analyses of random sequences added in December 1985.



        Bit stream analysis added in September 1997.



        Terse mode output, getopt() command line processing,

        optional stdin input, and HTML documentation added in

        October 1998.

        

        Documentation for the -t (terse output) option added

        in July 2006.

        

        Replaced table look-up for chi square to probability

        conversion with algorithmic computation in January 2008.



        For additional information and the latest version,

        see http://www.fourmilab.ch/random/



*/



#include <stdio.h>

#include <string.h>

#include <ctype.h>

#include <math.h>

#ifdef _WIN32

#include <fcntl.h>

#include <io.h>

#else

#include <unistd.h>

#endif



#include "iso8859.h"

#include "randtest.h"



#define UPDATE  "January 28th, 2008"



#define FALSE 0

#define TRUE  1



#ifdef M_PI

#define PI        M_PI

#else

#define PI        3.14159265358979323846

#endif



extern double pochisq(const double ax, const int df);



/*  HELP  --  Print information on how to call        */



static void help(void)

{

        fprintf(stderr,"ent --  Calculate entropy of file.  Call");

        fprintf(stderr,"\n        with ent [options] [input-file]");

        fprintf(stderr,"\n");

        fprintf(stderr,"\n        Options:  -b  Treat input as a stream of bits");

        fprintf(stderr,"\n                  -c  Print occurrence counts");

        fprintf(stderr,"\n                  -f  Fold upper to lower case letters");

        fprintf(stderr,"\n                  -t  Terse output in CSV format");

        fprintf(stderr,"\n                  -u  Print this message\n");

        fprintf(stderr,"\nBy John Walker");

        fprintf(stderr,"\n  http://www.fourmilab.ch/");

        fprintf(stderr,"\n  %s\n", UPDATE);

}



/*  GETOPT  --        Dumb version of getopt for brain-dead Windows.  */



#ifdef _WIN32        

static int optind = 1;



static int getopt(int argc, char *argv[], char *opts)

{

    static char *opp = NULL;

    int o;

    

    while (opp == NULL) 

    {

                if ((optind >= argc) || (*argv[optind] != '-')) 

        {

                        return -1;

                }

                opp = argv[optind] + 1;

                optind++;

                if (*opp == 0) 

                {

                        opp = NULL;

                }        

        }

    o = *opp++;

    if (*opp == 0) 

    {

                opp = NULL;

    }

    return strchr(opts, o) == NULL ? '?' : o;

}

#endif



/*  Main program  */



int main(int argc, char *argv[])

{

        int i, oc, opt;

        long ccount[256];              /* Bins to count occurrences of values */

        long totalc = 0;              /* Total character count */

        char *samp;

        double montepi, chip, scc, ent, mean, chisq, compression, montec;

        FILE *fp = stdin;

        unsigned int overall;

        int counts = FALSE,              /* Print character counts */

                fold = FALSE,              /* Fold upper to lower */

                binary = FALSE,              /* Treat input as a bitstream */

                terse = FALSE;              /* Terse (CSV format) output */



        while ((opt = getopt(argc, argv, "bcftu?BCFTU")) != -1) 

        {

                switch (toISOlower(opt)) 

            {

                        case 'b':

                                binary = TRUE;

                    break;



            case 'c':

                                counts = TRUE;

                    break;



            case 'f':

                                fold = TRUE;

                    break;



            case 't':

                                terse = TRUE;

                    break;



            case '?':

            case 'u':

                                help();

                    return 0;

            }

        }

        if (optind < argc) 

        {

          if (optind != (argc - 1)) 

          {

                        fprintf(stderr,"Duplicate file name.\n");

                        help();

                        return 2;

          }

      if ((fp = fopen(argv[optind], "rb")) == NULL) 

      {

                        fprintf(stderr,"Cannot open file %s\n", argv[optind]);

                        return 2;

          }

        }

        

#ifdef _WIN32



            /** Warning!  On systems which distinguish text mode and

                binary I/O (MS-DOS, Macintosh, etc.) the modes in the open

                statement for "fp" should have forced the input file into

                binary mode.  But what if we're reading from standard

                input?  Well, then we need to do a system-specific tweak

                to make sure it's in binary mode.  While we're at it,

                let's set the mode to binary regardless of however fopen

                set it.



                The following code, conditional on _WIN32, sets binary

                mode using the method prescribed by Microsoft Visual C 7.0

                ("Monkey C"); this may require modification if you're

                using a different compiler or release of Monkey C.        If

                you're porting this code to a different system which

                distinguishes text and binary files, you'll need to add

                the equivalent call for that system. */



            _setmode(_fileno(fp), _O_BINARY);

#endif



        samp = binary ? "bit" : "byte";

        memset(ccount, 0, sizeof ccount);



        /* Initialise for calculations */



        rt_init(binary);



        /* Scan input file and count character occurrences */



        while ((oc = fgetc(fp)) != EOF) 

        {

          unsigned char ocb;



          if (fold && isISOalpha(oc) && isISOupper(oc)) 

          {

                        oc = toISOlower(oc);

          }

          ocb = (unsigned char) oc;

          totalc += binary ? 8 : 1;

          if (binary) 

          {

                        int b;

                        unsigned char ob = ocb;



                        for (b = 0; b < 8; b++) 

                        {

                        ccount[ob & 1]++;

                        ob >>= 1;

                        }

                } 

                else 

                {

                        ccount[ocb]++;              /* Update counter for this bin */

                }

                rt_add(&ocb, 1);

        }

        fclose(fp);



        /* Complete calculation and return sequence metrics */



        rt_end(&ent, &chisq, &mean, &montepi, &scc);



        /* Calculate probability of observed distribution occurring from

          the results of the Chi-Square test */



    chip = pochisq(chisq, (binary ? 1 : 255));



        /* Print bin counts if requested */



        if (counts) 

        {

                if (terse) 

                {

                        fprintf(stderr,"2,Value,Occurrences,Fraction\n");

                } 

                else 

                {

                        fprintf(stderr,"Value Char Occurrences Fraction\n");

                }

                for (i = 0; i < (binary ? 2 : 256); i++) 

                {

                        if (terse) 

                        {

                                fprintf(stderr,"3,%d,%ld,%f\n", i, ccount[i], ((double) ccount[i] / totalc));

                        } 

                        else 

                        {

                                if (ccount[i] > 0) 

                                {

                                        fprintf(stderr,"%3d  %c  %10ld  %f\n", i, (!isISOprint(i) || isISOspace(i)) ? ' ' : i, ccount[i], ((double) ccount[i] / totalc));

                                }

                        }

                }

                if (!terse) 

                {

              fprintf(stderr,"\nTotal:    %10ld  %f\n\n", totalc, 1.0);

                }

        }



        /* Print calculated results */

        overall = 1;

        

        compression = ((100 * ((binary ? 1 : 8) - ent) / (binary ? 1.0 : 8.0)));

        montec = 100.0 * (fabs(PI - montepi) / PI);



        if (!terse) 

        {

                if (ent > (binary ? 0.999999 : 7.9998))

                {

                        fprintf(stderr,"PASS ");

                }

                else

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                }

                fprintf(stderr,"Entropy = %f bits/%s\n", ent, samp);

                if (compression < (binary ? 0.000025 : 0.0025))

                {

                        fprintf(stderr,"PASS ");

                }

                else

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                }

                fprintf(stderr,"Compression = %f %%\n", compression);

                if (chip < 0.05 || chip > 0.95) 

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                } 

                else 

                {

                        fprintf(stderr,"PASS ");

                }

                fprintf(stderr,"Chi-exceed = %f %%\n", chip * 100);

                if (mean < (binary ? 0.4997 : 127.3) || mean > (binary ? 0.5003 : 127.7))

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                }

                else

                {

                        fprintf(stderr,"PASS ");

                }

                fprintf(stderr,"Mean = %f\n", mean);

                if (montec > 0.3)

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                }

                else

                {

                        fprintf(stderr,"PASS ");

                }

                fprintf(stderr,"Monte-Carlo = %f %%\n", montec);

                if (sqrt(pow(scc,2)) > (binary ? 0.001 : 0.002))

                {

                        fprintf(stderr,"FAIL ");

                        overall = 0;

                }

                else

                {

                        fprintf(stderr,"PASS ");

                }

                fprintf(stderr,"SCC = %f\n", scc);

                if (0 == overall)

                {

                        fprintf(stderr,"\nFAIL overall\n");

                        return 1;

                }

                else

                {

                        fprintf(stderr,"\nPASS overall\n");

                }

        }

        else

        {

                fprintf(stderr, "Entropy,Compression,Chi-exceed,Mean,Monte-Carlo,SCC\n");

                fprintf(stderr, "%f,%f,%f,%f,%f,%f\n", ent, compression, chip * 100, mean, montec, scc);

        }

        return 0;

}

And the script to extract randomness:

Code:

#!/bin/sh

# extract randomness in file

# depends on vne, whash, ent (modified version), split, etc.



temp=/tmp/randextr



if test $# != 2

then

        echo "Usage: $(basename $0) infile outfile"

        exit 1

fi



if test -e "$2"

then

        echo "$2 exists"

        exit 1

fi



current="$(pwd)"

mkdir "$temp" || exit 1



# start extraction



vne < "$1" | whash > "$temp/vh"

cd "$temp"

if split -d -b 1M -a 5 vh

then

        for i in x*

        do

                echo "$i"

                if ent "$i"

                then

                        cat "$i" >> "$current/$2"

                fi

        done

else

        echo "Split failed"

        exit 1

fi



rm -rf "$temp"



exit 0

I can get radio data at 44.0 kB/s, so that means I can get about 4 kB/s of good quality random data.

Here's an example of data quality:

Code:

bash-4.1$ ent compiled 

PASS Entropy = 7.999979 bits/byte

PASS Compression = 0.000256 %

PASS Chi exceed = 76.254451 %

PASS Mean = 127.489548

PASS Monte Carlo = 0.014112 %

PASS SCC = 0.000082



PASS overall

this is for an 8 MB file.

I have an even better solution now, in fact it is exactly what I wanted. I edited ENT some more and now I can make a pipe that outputs random data and only random data (as checked by ent). I had to make all printfs output to stderr, which should be fine, and stdout will output only random data in 1MiB chunks (edit this in the source, but you must take into account the changes that must be made to the threshold parameters, because I have found that using ent on small files in unreliable, and that 1MiB is a reasonable size to check).

The modified ent code (I call it entpipe):

Code:

/*

        ENT  --  Entropy calculation and analysis of putative

                random sequences.



        Designed and implemented by John "Random" Walker in May 1985.



        Multiple analyses of random sequences added in December 1985.



        Bit stream analysis added in September 1997.



        Terse mode output, getopt() command line processing,

        optional stdin input, and HTML documentation added in

        October 1998.

        

        Documentation for the -t (terse output) option added

        in July 2006.

        

        Replaced table look-up for chi square to probability

        conversion with algorithmic computation in January 2008.



        For additional information and the latest version,

        see http://www.fourmilab.ch/random/



*/



#include <stdio.h>

#include <string.h>

#include <ctype.h>

#include <math.h>

#include <stdlib.h>

#ifdef _WIN32

#include <fcntl.h>

#include <io.h>

#else

#include <unistd.h>

#endif



#include "iso8859.h"

#include "randtest.h"



#define UPDATE  "January 28th, 2008"



#define FALSE 0

#define TRUE  1



#ifdef M_PI

#define PI        M_PI

#else

#define PI        3.14159265358979323846

#endif



#define CHUNK 1024*1024



extern double pochisq(const double ax, const int df);



/*  HELP  --  Print information on how to call        */



static void help(void)

{

        fprintf(stderr,"ent --  Calculate entropy of file.  Call");

        fprintf(stderr,"\n        with ent [options] [input-file]");

        fprintf(stderr,"\n");

        fprintf(stderr,"\n        Options:  -b  Treat input as a stream of bits");

        fprintf(stderr,"\n                  -c  Print occurrence counts");

        fprintf(stderr,"\n                  -f  Fold upper to lower case letters");

        fprintf(stderr,"\n                  -t  Terse output in CSV format");

        fprintf(stderr,"\n                  -u  Print this message\n");

        fprintf(stderr,"\nBy John Walker");

        fprintf(stderr,"\n  http://www.fourmilab.ch/");

        fprintf(stderr,"\n  %s\n", UPDATE);

}



/*  GETOPT  --        Dumb version of getopt for brain-dead Windows.  */



#ifdef _WIN32        

static int optind = 1;



static int getopt(int argc, char *argv[], char *opts)

{

    static char *opp = NULL;

    int o;

    

    while (opp == NULL) 

    {

                if ((optind >= argc) || (*argv[optind] != '-')) 

        {

                        return -1;

                }

                opp = argv[optind] + 1;

                optind++;

                if (*opp == 0) 

                {

                        opp = NULL;

                }        

        }

    o = *opp++;

    if (*opp == 0) 

    {

                opp = NULL;

    }

    return strchr(opts, o) == NULL ? '?' : o;

}

#endif



/*  Main program  */



int main(int argc, char *argv[])

{

        // ! init

        int oc, opt;

        long ccount[256];              /* Bins to count occurrences of values */

        long totalc = 0;              /* Total character count */

        char *samp;

        double montepi, chip, scc, ent, mean, chisq, compression, montec;

        unsigned int i, overall;

        

        // init

        FILE *fp = stdin;

        unsigned char * buffer;

        unsigned int overglobal = 1;

        int counts = FALSE,              /* Print character counts */

                fold = FALSE,              /* Fold upper to lower */

                binary = FALSE,              /* Treat input as a bitstream */

                terse = FALSE;              /* Terse (CSV format) output */



        // parse argv

        while ((opt = getopt(argc, argv, "bcftu?BCFTU")) != -1) 

        {

                switch (toISOlower(opt)) 

            {

                        case 'b':

                                binary = TRUE;

                    break;



            case 'c':

                                counts = TRUE;

                    break;



            case 'f':

                                fold = TRUE;

                    break;



            case 't':

                                terse = TRUE;

                    break;



            case '?':

            case 'u':

                                help();

                    return 0;

            }

        }

        if (optind < argc) 

        {

          if (optind != (argc - 1)) 

          {

                        fprintf(stderr,"Duplicate file name.\n");

                        help();

                        return 2;

          }

      if ((fp = fopen(argv[optind], "rb")) == NULL) 

      {

                        fprintf(stderr,"Cannot open file %s\n", argv[optind]);

                        return 2;

          }

        }

        

#ifdef _WIN32



            /** Warning!  On systems which distinguish text mode and

                binary I/O (MS-DOS, Macintosh, etc.) the modes in the open

                statement for "fp" should have forced the input file into

                binary mode.  But what if we're reading from standard

                input?  Well, then we need to do a system-specific tweak

                to make sure it's in binary mode.  While we're at it,

                let's set the mode to binary regardless of however fopen

                set it.



                The following code, conditional on _WIN32, sets binary

                mode using the method prescribed by Microsoft Visual C 7.0

                ("Monkey C"); this may require modification if you're

                using a different compiler or release of Monkey C.        If

                you're porting this code to a different system which

                distinguishes text and binary files, you'll need to add

                the equivalent call for that system. */



            _setmode(_fileno(fp), _O_BINARY);

#endif



        samp = binary ? "bit" : "byte";

        

        // malloc

        buffer = (unsigned char*) malloc (CHUNK);

        if ( buffer == NULL ) 

        { 

                fprintf(stderr, "ERROR: not enough memory !\n"); 

                exit(2); 

        }

        

        // terse header

        if (terse) 

        {

                fprintf(stderr, "Entropy,Compression,Chi-exceed,Mean,Monte-Carlo,SCC\n");

        }

        

        // parse

        while (fread(buffer, CHUNK, 1, fp))

        {

                memset(ccount, 0, sizeof ccount);



                /* Initialise for calculations */



                rt_init(binary);



                /* Scan input file and count character occurrences */



                for (i=0; i < CHUNK; i++)

                {

                        oc = buffer[i];

                

                        unsigned char ocb;



                        if (fold && isISOalpha(oc) && isISOupper(oc)) 

                        {

                                oc = toISOlower(oc);

                        }

                        ocb = (unsigned char) oc;

                        totalc += binary ? 8 : 1;

                        if (binary) 

                        {

                                int b;

                                unsigned char ob = ocb;



                                for (b = 0; b < 8; b++) 

                                {

                                        ccount[ob & 1]++;

                                        ob >>= 1;

                                }

                        } 

                        else 

                        {

                                ccount[ocb]++;              /* Update counter for this bin */

                        }

                        rt_add(&ocb, 1);

                }



                /* Complete calculation and return sequence metrics */



                rt_end(&ent, &chisq, &mean, &montepi, &scc);



                /* Calculate probability of observed distribution occurring from

                the results of the Chi-Square test */



                chip = pochisq(chisq, (binary ? 1 : 255));



                /* Print bin counts if requested */



                if (counts) 

                {

                        if (terse) 

                        {

                                fprintf(stderr,"2,Value,Occurrences,Fraction\n");

                        } 

                        else 

                        {

                                fprintf(stderr,"Value Char Occurrences Fraction\n");

                        }

                        for (i = 0; i < (binary ? 2 : 256); i++) 

                        {

                                if (terse) 

                                {

                                        fprintf(stderr,"3,%d,%ld,%f\n", i, ccount[i], ((double) ccount[i] / totalc));

                                } 

                                else 

                                {

                                        if (ccount[i] > 0) 

                                        {

                                                fprintf(stderr,"%3d  %c  %10ld  %f\n", i, (!isISOprint(i) || isISOspace(i)) ? ' ' : i, ccount[i], ((double) ccount[i] / totalc));

                                        }

                                }

                        }

                        if (!terse) 

                        {

                                fprintf(stderr,"\nTotal:    %10ld  %f\n\n", totalc, 1.0);

                        }

                }



                /* Print calculated results */

                overall = 1;

                compression = ((100 * ((binary ? 1 : 8) - ent) / (binary ? 1.0 : 8.0)));

                montec = 100.0 * (fabs(PI - montepi) / PI);



                if (!terse) 

                {

                        if (ent > (binary ? 0.999999 : 7.9998))

                        {

                                fprintf(stderr,"PASS ");

                        }

                        else

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        }

                        fprintf(stderr,"Entropy = %f bits/%s\n", ent, samp);

                        if (compression < (binary ? 0.000025 : 0.0025))

                        {

                                fprintf(stderr,"PASS ");

                        }

                        else

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        }

                        fprintf(stderr,"Compression = %f %%\n", compression);

                        if (chip < 0.05 || chip > 0.95) 

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        } 

                        else 

                        {

                                fprintf(stderr,"PASS ");

                        }

                        fprintf(stderr,"Chi-exceed = %f %%\n", chip * 100);

                        if (mean < (binary ? 0.4997 : 127.3) || mean > (binary ? 0.5003 : 127.7))

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        }

                        else

                        {

                                fprintf(stderr,"PASS ");

                        }

                        fprintf(stderr,"Mean = %f\n", mean);

                        if (montec > 0.3)

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        }

                        else

                        {

                                fprintf(stderr,"PASS ");

                        }

                        fprintf(stderr,"Monte-Carlo = %f %%\n", montec);

                        if (sqrt(pow(scc,2)) > (binary ? 0.001 : 0.002))

                        {

                                fprintf(stderr,"FAIL ");

                                overall = 0;

                        }

                        else

                        {

                                fprintf(stderr,"PASS ");

                        }

                        fprintf(stderr,"SCC = %f\n", scc);

                        if (0 == overall)

                        {

                                fprintf(stderr,"\nFAIL overall\n");

                                overglobal = 0;

                        }

                        else

                        {

                                fprintf(stderr,"\nPASS overall\n");

                                fwrite(buffer, CHUNK, 1, stdout);

                        }

                }

                else

                {

                        fprintf(stderr, "%f,%f,%f,%f,%f,%f\n", ent, compression, chip * 100, mean, montec, scc);

                }

        }

        if (!terse)

        {

                if (0 == overglobal)

                {

                        fprintf(stderr,"\nFAIL globally\n");

                        return 1;

                }

                else

                {

                        fprintf(stderr,"\nPASS globally\n");

                }

        }

        return 0;

}

Do NOT use the terse option if you want it to check the parameters. I have fixed the terse option to print in CSV.

This is a script to run it all:

Code:

#!/bin/sh



arecord -c 1 -r 48000 -t raw | vne | whash | entpipe -b | entpipe



exit 0

And here is a larger file that I produced earlier today, it is 88 MiB:

Code:

bash-4.1$ ent random 

PASS Entropy = 7.999999 bits/byte

PASS Compression = 0.000017 %

PASS Chi exceed = 86.332861 %

PASS Mean = 127.502506

PASS Monte Carlo = 0.019739 %

PASS SCC = -0.000104



PASS overall

I would say that is very very good random data :)

I've also had great fun programming in C. This is probably the most I've done since high school. I'm just glad Linux seg faults instead of crashing like the only Window$ computers used to.