counting in python

sfzombie13 · 01-09-2015, 11:33 PM

i have a problem, actually, it is research that i am conducting. i want to use python (because i am trying to learn it) to count numbers, from 0xffffffffffff to 0xffffffffffffffffffff and print them in a list, in order, in hex only (1A4D instead of 0x1A4D) with capital letters to a text file, separated by commas. it does not matter how big the file is going to be, as long as they are separated by commas (or anything really, they are going to be used as input for another program).

i have some code that finally counts, then prints the numbers to a file, but it prints them in decimal, not hex, and i can't figure out how to get the separators in. some have told me that it cannot work due to being too much data, something like millions of years to calculate or using 800 8-terabyte hard drives for everyone on earth to store the file. now, i am not a mathematician, but when i take 12 f's from 20 f's, that leaves 8 f's which convert to just under 4.3 billion numbers. i don't know how many it could write to the file per second, but it would have to be at least 5 to be reasonable. that would be (by my math) about 27 years or so. not millions, but by using more than one computer, it should be feasible. anyway, here is the code i have so far:

Code:

#!/usr/bin/python

#count in hex from 12 f's to 20 f's and write to comma 
#delimited file to create a dictionary

def count_hex():
    x = 0xffffffffffff
    while x <= 0xffffffffffffffffffff:
        x += 0x1
        s = str(x)
        s.upper()
        with open("dictionary.txt", "a") as diction:
            diction.write(s)

count_hex()

any help is appreciated. i just need to get the code to work. i may be able to get access to a cluster for the actual work, as long as the file is not too big, meaning no more than a few terabytes.

astrogeek · 01-10-2015, 12:08 AM

Quote:

Originally Posted by sfzombie13

i have some code that finally counts, then prints the numbers to a file, but it prints them in decimal, not hex, and i can't figure out how to get the separators in. some have told me that it cannot work due to being too much data, something like millions of years to calculate or using 800 8-terabyte hard drives for everyone on earth to store the file. now, i am not a mathematician, but when i take 12 f's from 20 f's, that leaves 8 f's which convert to just under 4.3 billion numbers. i don't know how many it could write to the file per second, but it would have to be at least 5 to be reasonable. that would be (by my math) about 27 years or so. not millions, but by using more than one computer, it should be feasible. anyway, here is the code i have so far:

I think your math is not correct...

20 f's = 2^80 = 1.20892581961 x10^24
12 f's = 2^48 = 2.81474976711 x10^14

The difference (i.e. the numbers that you want to count) = 1.20892581933 x10^24

Note that these only differ by the last two places at this precision...

By my math, at 5 per second, that is 7.6617... x10^15 years... roughly 10^6 times the age of the universe...

If you stored them as 10-byte unsigned integer values with no physical separator byte that becomes 1.20892581933 x10^25 bytes of storage...

I don't know how many cores your processor has, or what your hard drive capacity is, but I suspect you may be a wee bit optimistic!

Done in haste, but I think this is correct within the precision given!
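
For anyone who wants to check these figures, here is a minimal sketch using exact integer arithmetic. It assumes Python 3, and the variable names are only illustrative. It reproduces the count and the 10-bytes-per-value storage figure, rounded to the same precision as above.

```python
# Assumes Python 3, where integers have arbitrary precision.
low = 0xFFFFFFFFFFFF             # 12 f's, i.e. 2**48 - 1
high = 0xFFFFFFFFFFFFFFFFFFFF    # 20 f's, i.e. 2**80 - 1

count = high - low               # how many values lie between the two
raw_bytes = count * 10           # 10 bytes per value, no separators

# Print in scientific notation with 11 decimal places, as in the post.
for label, value in [("20 f's", high), ("12 f's", low),
                     ("difference", count), ("raw bytes", raw_bytes)]:
    print(f"{label:>10}: {value:.11e}")
```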

dugan · 01-11-2015, 10:58 PM

Code:

def count_hex():
    x = 0xffffffffffff
    while x <= 0xffffffffffffffffffff:
        x += 0x1
        with open("dictionary.txt", "a") as diction:
            diction.write(hex(x))
            diction.write(',')
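
Note that hex() keeps the "0x" prefix and gives lowercase digits, so it does not quite match the "1A4D instead of 0x1A4D" requirement. A minimal sketch of one way to get bare uppercase hex, assuming Python 3 (to_upper_hex is a made-up helper name, not something from the code above):

```python
# Assumes Python 3. The "X" format code produces uppercase hex with no prefix.
def to_upper_hex(n):
    """Return n as uppercase hexadecimal text without the '0x' prefix."""
    return format(n, "X")

# A few sample values around the start of the requested range.
sample = [0x1A4D, 0xFFFFFFFFFFFF, 0xFFFFFFFFFFFF + 1]
line = ",".join(to_upper_hex(n) for n in sample)

print(hex(0x1A4D))   # what hex() gives, for comparison
print(line)          # comma-separated, uppercase, no prefix
```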

sfzombie13 · 01-19-2015, 09:31 AM

astrogeek: i was counting the actual decimal number difference. i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal. when i took a number of that size and put it into a gedit file, it was 32 bytes. i then added another number of equal length and came up with 64 bytes. then i deduced that it takes around 32 bytes to store a number of the largest value, and this is where i stopped. i need to sit down and finish the equation, but ran out of time. however, simply the fact that i can put the digits into a document and save them, shows me that it is possible for the computer to handle the size, and possibly save them. i know there is a disconnect between theory and practice, i just don't know where it lies.

dugan: thanx for the help, i will try this later.

pan64 · 01-19-2015, 10:45 AM

just a comment, and probably I'm wrong, but this structure:

Code:

        with open("dictionary.txt", "a") as diction:
            diction.write(hex(x))
            diction.write(',')

will open and close the file handle for each and every number you write, which is a huge overhead and will make execution much longer. But it is not really important, because 10^15 or 10^16 years is exactly the same for me (not to speak of the life of the hardware you use).
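
A rough sketch of the alternative, with the file opened once outside the loop. This assumes Python 3; write_hex_range and the sample file name are hypothetical, and the demo range is deliberately tiny since the full range is not practical to write.

```python
import os
import tempfile

def write_hex_range(path, start, stop):
    """Write start..stop (inclusive) as comma-separated uppercase hex."""
    with open(path, "w") as out:           # opened once, not once per number
        for i, x in enumerate(range(start, stop + 1)):
            if i:                          # comma between values, none at the end
                out.write(",")
            out.write(format(x, "X"))      # uppercase hex, no "0x" prefix

# Tiny demo range only; the sample path is an illustrative choice.
sample_path = os.path.join(tempfile.gettempdir(), "dictionary_sample.txt")
write_hex_range(sample_path, 0xFFFFFFFFFFFF, 0xFFFFFFFFFFFF + 3)
```

This version also writes the starting value itself and stops exactly at the upper bound.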

ntubski · 01-19-2015, 10:49 AM

Quote:

Originally Posted by sfzombie13

i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal.

Maybe your hex converter uses 32 bit numbers internally so it only goes up to 2^32 - 1 = 4 294 967 295 = 0xFFFF FFFF.

Quote:

when i took a number of that size and put it into a gedit file, it was 32 bytes.

That's the size of the text representation of the number in decimal (a raw binary representation would be smaller, and so would hex text). It also includes any whitespace you added, e.g. by pressing <enter>.
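
A small sketch of both points, assuming Python 3. The masking line only illustrates what a converter limited to 32 bits might do; whether the converter in question works that way is a guess.

```python
# Assumes Python 3.
twenty_fs = 0xFFFFFFFFFFFFFFFFFFFF

# Keep only the low 32 bits, as a 32-bit-limited tool might (an assumption).
truncated = twenty_fs & 0xFFFFFFFF

decimal_text = str(twenty_fs)           # decimal text representation
hex_text = format(twenty_fs, "X")       # hex text representation
raw = twenty_fs.to_bytes(10, "big")     # raw binary: 80 bits = 10 bytes

print("truncated to 32 bits:", truncated)
print("decimal text length :", len(decimal_text))
print("hex text length     :", len(hex_text))
print("raw byte length     :", len(raw))
```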

SoftSprocket · 01-19-2015, 11:59 AM

My calculations put you over 300 TB of data (ascii) tops. The limiting factor will be hard drive speed. I think a fast hard drive these days is likely 200 MB/sec, or 0.0002 TB/sec. On a system that could manage that size of data, and with those numbers, I come up with under 18 days.

i.e., this many numbers:

Code:

>>> 0xffffffffffffffffffff - 0xffffffffffff
1208925819333154197995520L

Largest number (25 decimal digits), and the total bytes required at 25 bytes per number:

Code:

>>> 0xffffffffffffffffffff
1208925819614629174706175L
>>> 1208925819333154197995520L * 25
30223145483328854949888000L

That looks like 300 TB to me.

I don't see the advantage of writing them to a file and loading them vs. generating them when you need them.
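
Something like the following is what I mean by generating them on demand. It is only a sketch, assuming Python 3; hex_values is a made-up name, and how the consuming program would actually take its input is not known.

```python
from itertools import islice

def hex_values(start, stop):
    """Lazily yield start..stop (inclusive) as uppercase hex strings."""
    x = start
    while x <= stop:
        yield format(x, "X")
        x += 1

# Nothing is stored: values are produced one at a time as they are requested.
values = hex_values(0xFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFFFFFF)
first_three = list(islice(values, 3))
print(first_three)
```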

astrogeek · 01-19-2015, 03:06 PM

Quote:

Originally Posted by sfzombie13

astrogeek: i was counting the actual decimal number difference. i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal. when i took a number of that size and put it into a gedit file, it was 32 bytes. i then added another number of equal length and came up with 64 bytes. then i deduced that it takes around 32 bytes to store a number of the largest value, and this is where i stopped. i need to sit down and finish the equation, but ran out of time. however, simply the fact that i can put the digits into a document and save them, shows me that it is possible for the computer to handle the size, and possibly save them. i know there is a disconnect between theory and practice, i just don't know where it lies.

The main disconnect here is that your first calculation is not correct! Your hex converter lied to you! Everything that followed was wrong!

20 f's is NOT 4.3 odd billion in decimal! 4.3 odd billion is the limit of 32-bit unsigned integers, so your hex converter simply truncated your 20 f's down to ffff ffff and did not tell you it was doing so! It was probably written by someone who also ignored the importance of the math!

ffff ffff ffff ffff ffff = 1,208,925,819,614,629,174,706,175 or...
1 septillion, 208 sextillion, 925 quintillion, 819 quadrillion, 614 trillion, 629 billion, 174 million, 706 thousand, 175.

But in fairness to your hex converter, did your math teacher not teach you to check your results?! This is an obvious error on the order of 10^15 overflow! If you aspire to be a researcher then these things should be obvious to you!

Quote:

Originally Posted by SoftSprocket

My calculations put you over 300 TB of data (ascii) tops...

Code:

>>> 0xffffffffffffffffffff
1208925819614629174706175L
>>> 1208925819333154197995520L * 25
30223145483328854949888000L

That looks like 300 TB to me.

Your number may be correct, but your "tera" is in the wrong place!

30,223,145,483,328,854,949,888,000
..looks like
30,223,145,483,328 TB to me!

That is 30+ Tera-Tera-Bytes!

This is not rocket science, this is basic math, and it is basic computer math to boot (pun accidental)!
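
The unit conversion can be checked in a couple of lines. A minimal sketch, assuming Python 3 and decimal terabytes (1 TB = 10^12 bytes); the variable names are illustrative.

```python
# Assumes Python 3 and decimal units: 1 TB = 10**12 bytes.
count = 0xFFFFFFFFFFFFFFFFFFFF - 0xFFFFFFFFFFFF
total_bytes = count * 25             # 25 bytes per number, as quoted above

TB = 10**12
whole_tb = total_bytes // TB         # whole terabytes
tera_tb = total_bytes // TB**2       # whole "tera-terabytes"

print(f"{total_bytes:,} bytes")
print(f"{whole_tb:,} TB")
print(f"{tera_tb} tera-terabytes")
```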

I re-read the original problem conditions to be sure I had understood it correctly the first time, and I think I did...

Quote:

Originally Posted by sfzombie13

i have a problem, actually, it is research that i am conducting. i want to use python (because i am trying to learn it) to count numbers, from 0xffffffffffff to 0xffffffffffffffffffff and print them in a list, in order, in hex only (1A4D instead of 0x1A4D) with capital letters to a text file, separated by commas. it does not matter how big the file is going to be, as long as they are separated by commas (or anything really, they are going to be used as input for another program).

So you want to count from ffff ffff ffff to ffff ffff ffff ffff ffff, and store those numbers as comma separated text representation of hexadecimal values, to a text file.

So the number of numbers you want to count is the difference between those two, which is:

Code:

ffff ffff ffff ffff ffff - ffff ffff ffff = ffff ffff 0000 0000 0000 = 1,208,925,819,333,154,197,995,520

You can store those more compactly, but let's use your original requirement to store them as comma-separated ascii representations of hexadecimal values and use SoftSprocket's value of 25 bytes each.

We can see from our calculation above that this requires 30+ Tera-Tera-Bytes of storage.

Now, how long does it take?

Let's stay with your original guess of 5 numbers per second, which is...

Code:

1,208,925,819,333,154,197,995,520/5 = 241,785,163,866,000,000,000,000 seconds (with some rounding error on the low end)

31,557,600 seconds in a year, gives...

7,661,709,504,720,000 years

The universe is generally accepted to be 13,700,000,000 years old.

So with 30+ Tera-Tera-Bytes of storage and approximately a million times the age of the universe, you should be good to go!

See my first post...
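
The time estimate can be reproduced with a short sketch, assuming Python 3 and the same inputs as above (5 numbers per second, 31,557,600 seconds per year, 13.7 billion years for the age of the universe). The ratio it prints comes out at several hundred thousand, which is within an order of magnitude of the "million" figure.

```python
# Assumes Python 3; the constants are the ones quoted in the post.
count = 0xFFFFFFFFFFFFFFFFFFFF - 0xFFFFFFFFFFFF

NUMBERS_PER_SECOND = 5
SECONDS_PER_YEAR = 31_557_600            # 365.25 days
AGE_OF_UNIVERSE_YEARS = 13_700_000_000

seconds = count / NUMBERS_PER_SECOND
years = seconds / SECONDS_PER_YEAR
ratio = years / AGE_OF_UNIVERSE_YEARS    # multiples of the age of the universe

print(f"seconds: {seconds:.4e}")
print(f"years  : {years:.4e}")
print(f"ratio  : {ratio:,.0f} x age of the universe")
```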

SoftSprocket · 01-19-2015, 03:26 PM

Quote:

Originally Posted by astrogeek

Your number may be correct, but your "tera" is in the wrong place!

30,223,145,483,328,854,949,888,000
..looks like
30,223,145,483,328 TB to me!

That is 30+ Tera-Tera-Bytes!

Let's stay with your original guess of 5 numbers per second, which is...

Code:

1,208,925,819,333,154,197,995,520/5 = 241,785,163,866,000,000,000,000 seconds (with some rounding error on the low end)

31,557,600 seconds in a year, gives...

7,661,709,504,720,000 years

The universe is generally accepted to be 13,700,000,000 years old.

So with 30+ Tera-Tera-Bytes of storage and approximately a million times the age of the universe, you should be good to go!

See my first post...

Quite so - I checked my numbers again:
30223145483328 TB

or ~16,000,000 years of disk writing at current technology.

5 per second might have been correct for punch cards but a modern hard drive writes at a considerably faster rate ... not that it will help.
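
For comparison, a sketch of the write-time arithmetic under one explicit assumption: a single drive sustaining the 200 MB/sec figure mentioned earlier in the thread. The rate behind the estimate above is not stated, so the result of this sketch need not match it. It assumes Python 3, and years_to_write is a hypothetical helper.

```python
# Assumes Python 3 and decimal units (1 MB = 10**6 bytes).
SECONDS_PER_YEAR = 31_557_600    # 365.25 days

def years_to_write(n_bytes, bytes_per_second):
    """Years needed to write n_bytes at a constant rate."""
    return n_bytes / bytes_per_second / SECONDS_PER_YEAR

# 25 bytes per number, as used in the storage estimate.
total_bytes = (0xFFFFFFFFFFFFFFFFFFFF - 0xFFFFFFFFFFFF) * 25

# Assumed rates: one 200 MB/sec drive, and 5 numbers (125 bytes) per second.
single_drive = years_to_write(total_bytes, 200 * 10**6)
five_per_second = years_to_write(total_bytes, 5 * 25)

print(f"one 200 MB/sec drive: {single_drive:.3e} years")
print(f"5 numbers per second: {five_per_second:.3e} years")
```

Either way, the write rate only changes how many zeros the answer has.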