LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 01-09-2015, 11:33 PM   #1
sfzombie13
Member
 
Registered: Dec 2003
Location: wv
Distribution: slackware, lfs, kali, pentoo, centos
Posts: 168

Rep: Reputation: 18
counting in python


i have a problem, actually, it is research that i am conducting. i want to use python (because i am trying to learn it) to count numbers, from 0xffffffffffff to 0xffffffffffffffffffff and print them in a list, in order, in hex only (1A4D instead of 0x1A4D) with capital letters to a text file, separated by commas. it does not matter how big the file is going to be, as long as they are separated by commas (or anything really, they are going to be used as input for another program).

i have some code that finally counts, then prints the numbers to a file, but it prints them in decimal, not hex and i can't figure out how to get the separators in. some have told me that it cannot work due to being too much data, something like millions of years to calculate or using 800 8 terabyte hard drives for everyone on earth to store the file. now, i am not a mathematician, but when i take 12 f's from 20 f's, that leaves 8 f's which convert to just under 4.3 billion numbers. i don't know how many it could write to the file per second, but it would have to be at least 5 to be reasonable. that would be (by my math) about 27 years or so. not millions, but by using more than one computer, it should be feasible. anyway, here is the code i have so far:
Code:
#!/usr/bin/python

#count in hex from 12 f's to 20 f's and write to comma 
#delimited file to create a dictionary

def count_hex():
    x = 0xffffffffffff
    while x <= 0xffffffffffffffffffff:
        x += 0x1
        s = str(x)
        s.upper()
        with open("dictionary.txt", "a") as diction:
            diction.write(s)

count_hex()
any help is appreciated. i just need to get the code to work, i may be able to get access to a cluster for the actual work, if the file is not too big, more than a few terabytes.

Last edited by sfzombie13; 01-09-2015 at 11:34 PM. Reason: typo
 
Old 01-10-2015, 12:08 AM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194
Quote:
Originally Posted by sfzombie13
i have some code that finally counts, then prints the numbers to a file, but it prints them in decimal, not hex and i can't figure out how to get the separators in. some have told me that it cannot work due to being too much data, something like millions of years to calculate or using 800 8 terabyte hard drives for everyone on earth to store the file. now, i am not a mathematician, but when i take 12 f's from 20 f's, that leaves 8 f's which convert to just under 4.3 billion numbers. i don't know how many it could write to the file per second, but it would have to be at least 5 to be reasonable. that would be (by my math) about 27 years or so. not millions, but by using more than one computer, it should be feasible. anyway, here is the code i have so far:
I think your math is not correct...

20 f's = 2^80 = 1.20892581961 x10^24
12 f's = 2^48 = 2.81474976711 x10^14

The difference (i.e. the numbers that you want to count) = 1.20892581933 x10^24

Note that these only differ by the last two places at this precision...

By my math, at 5 per second, that is 7.6617... x10^15 years... roughly 10^6 times the age of the universe...

If you stored them as 10-byte unsigned integer values with no physical separator byte that becomes 1.20892581933 x10^25 bytes of storage...

I don't know how many cores your processor has, or what your hard drive capacity is, but I suspect you may be a wee bit optimistic!

Done in haste, but I think this is correct within the precision given!
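
The estimate above can be re-checked with exact integer arithmetic. This is only a rough sketch: the names `START`, `END`, `RATE` and `SECONDS_PER_YEAR` are made up for illustration, the rate is the 5 numbers per second guessed in the question, and a year is assumed to be 365.25 days.

```python
# Sanity check of the count and the run time, using Python's big integers.
START = 0xffffffffffff            # 12 f's
END = 0xffffffffffffffffffff      # 20 f's
RATE = 5                          # numbers written per second (the original guess)
SECONDS_PER_YEAR = 31557600       # assumed: 365.25 days

count = END - START               # the difference used in the estimate above
seconds = count // RATE
years = seconds // SECONDS_PER_YEAR

print("numbers to write: %.11e" % count)
print("years at %d per second: %.4e" % (RATE, years))
```

Counting both endpoints would add one more number, which makes no difference at this scale.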

Last edited by astrogeek; 01-10-2015 at 12:26 AM. Reason: typs, tpos, typos...
 
Old 01-11-2015, 10:58 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,222

Rep: Reputation: 5319
Code:
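def count_hex():
    x = 0xffffffffffff
    while x <= 0xffffffffffffffffffff:
        with open("dictionary.txt", "a") as diction:
            # '%X' gives uppercase hex without the 0x prefix
            diction.write('%X' % x)
            diction.write(',')
        # increment after writing so the first value is not skipped
        x += 0x1

count_hex()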
 
Old 01-19-2015, 09:31 AM   #4
sfzombie13
Member
 
Registered: Dec 2003
Location: wv
Distribution: slackware, lfs, kali, pentoo, centos
Posts: 168

Original Poster
Rep: Reputation: 18
astrogeek: i was counting the actual decimal number difference. i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal. when i took a number of that size and put it into a gedit file, it was 32 bytes. i then added another number of equal length and came up with 64 bytes. then i deduced that it takes around 32 bytes to store a number of the largest value, and this is where i stopped. i need to sit down and finish the equation, but ran out of time. however, simply the fact that i can put the digits into a document and save them, shows me that it is possible for the computer to handle the size, and possibly save them. i know there is a disconnect between theory and practice, i just don't know where it lies.

dugan: thanx for the help, i will try this later.
 
Old 01-19-2015, 10:45 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,830

Rep: Reputation: 7308
just a comment, and probably I'm wrong, but this structure:
Code:
        with open("dictionary.txt", "a") as diction:
            diction.write(hex(x))
            diction.write(',')
will open and close the file handle for each and every number you want to write, which is a huge overhead and will make execution much longer. But it is not really important, because 10^15 or 10^16 years is exactly the same to me (not to speak of the lifetime of the hardware you use).
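
For what it's worth, a minimal sketch of the open-once variant could look like the following. The function name `write_hex_range` and the file name `dictionary_sample.txt` are hypothetical, and the range is deliberately tiny so it finishes instantly.

```python
def write_hex_range(path, start, stop):
    """Write start..stop (inclusive) as uppercase hex, comma separated."""
    # Open the file a single time; the with block closes it at the end.
    with open(path, "w") as out:
        x = start
        while x <= stop:
            out.write("%X," % x)   # "%X" gives uppercase hex, no 0x prefix
            x += 1

# Tiny demonstration range, nowhere near the full 12-to-20 f's span.
write_hex_range("dictionary_sample.txt", 0xffffffffffff, 0xffffffffffff + 3)
```

Keeping the file open lets Python buffer the writes instead of paying for an open and close on every number, though it does nothing about the size of the full range.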
 
Old 01-19-2015, 10:49 AM   #6
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by sfzombie13
i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal.
Maybe your hex converter uses 32 bit numbers internally so it only goes up to 2^32 - 1 = 4 294 967 295 = 0xFFFF FFFF.

Quote:
when i took a number of that size and put it into a gedit file, it was 32 bytes.
That's counting the size of the text (raw will be smaller) representation in decimal (hex will be smaller) of a number, it also includes any whitespace you added by e.g. pressing <enter>.
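
Both points are easy to check in Python 3, which has no 32 bit limit on integers. This sketch assumes the converter simply kept the low 32 bits, which is only a guess about how that tool behaves.

```python
n = 0xffffffffffffffffffff           # 20 f's

# Keeping only the low 32 bits reproduces the "4.3 billion" figure.
truncated = n & 0xFFFFFFFF
print(truncated)                     # 4294967295

# Size of the text representation, without any separator or newline.
decimal_len = len(str(n))            # 25 characters in decimal
hex_len = len("%X" % n)              # 20 characters in hex
print(decimal_len, hex_len)
```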
 
Old 01-19-2015, 11:59 AM   #7
SoftSprocket
Member
 
Registered: Nov 2014
Posts: 399

Rep: Reputation: Disabled
My calculations put you over 300 TB of data (ascii) tops. The limiting factor will be hd speed. I think a fast hard drive these days is likely 200 MB/sec. or 0.0002 TB/sec. On a system that could manage that size of data and with those numbers I come up with under 18 days.

i.e
This many numbers:
Code:
>>> 0xffffffffffffffffffff - 0xffffffffffff
1208925819333154197995520L
Largest number and bytes required for it:
Code:
>>> 0xffffffffffffffffffff
1208925819614629174706175L
>>> 1208925819333154197995520L * 25
30223145483328854949888000L
That looks like 300 TB to me.

I don't see the advantage of writing them to load them vs. generating them when you need them.
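
Generating on demand could be sketched with a generator, roughly like this. The name `hex_values` is made up for illustration, and how the consuming program would actually read the values is not known from this thread.

```python
def hex_values(start, stop):
    """Yield start..stop (inclusive) as uppercase hex strings, one at a time."""
    x = start
    while x <= stop:
        yield "%X" % x
        x += 1

# Consume only as many values as needed; nothing is written to disk.
for value in hex_values(0xffffffffffff, 0xffffffffffff + 2):
    print(value)
```

Each value exists only while it is being used, so memory and disk usage stay constant no matter how far the count runs.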
 
Old 01-19-2015, 03:06 PM   #8
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194
Quote:
Originally Posted by sfzombie13
astrogeek: i was counting the actual decimal number difference. i took 20 f's and put them into a hex converter and came up with the 4.3 odd billion in decimal. when i took a number of that size and put it into a gedit file, it was 32 bytes. i then added another number of equal length and came up with 64 bytes. then i deduced that it takes around 32 bytes to store a number of the largest value, and this is where i stopped. i need to sit down and finish the equation, but ran out of time. however, simply the fact that i can put the digits into a document and save them, shows me that it is possible for the computer to handle the size, and possibly save them. i know there is a disconnect between theory and practice, i just don't know where it lies.
The main disconnect here is that your first calculation is not correct! Your hex converter lied to you! Everything that followed was wrong!

20 f's is NOT 4.3 odd billion in decimal! 4.3 odd billion is the limit of 32-bit unsigned integers, so your hex converter simply truncated your 20 f's down to ffff ffff and did not tell you it was doing so! It was probably written by someone who also ignored the importance of the math!

ffff ffff ffff ffff ffff = 1,208,925,819,614,629,174,706,175 or...
1 septillion, 208 sextillion, 925 quintillion, 819 quadrillion, 614 trillion, 629 billion, 174 million, 706 thousand, 175.

But in fairness to your hex converter, did your math teacher not teach you to check your results?! This is an obvious error on the order of 10^15 overflow! If you aspire to be a researcher then these things should be obvious to you!

Quote:
Originally Posted by SoftSprocket
My calculations put you over 300 TB of data (ascii) tops...
Code:
>>> 0xffffffffffffffffffff
1208925819614629174706175L
>>> 1208925819333154197995520L * 25
30223145483328854949888000L
That looks like 300 TB to me.
Your number may be correct, but your "tera" is in the wrong place!

30,223,145,483,328,854,949,888,000
..looks like
30,223,145,483,328 TB to me!

That is 30+ Tera-Tera-Bytes!

This is not rocket science, this is basic math, and it is basic computer math to boot (pun accidental)!

I re-read the original problem conditions to be sure I had understood it correctly the first time, and I think I did...

Quote:
Originally Posted by sfzombie13
i have a problem, actually, it is research that i am conducting. i want to use python (because i am trying to learn it) to count numbers, from 0xffffffffffff to 0xffffffffffffffffffff and print them in a list, in order, in hex only (1A4D instead of 0x1A4D) with capital letters to a text file, separated by commas. it does not matter how big the file is going to be, as long as they are separated by commas (or anything really, they are going to be used as input for another program).
So you want to count from ffff ffff ffff to ffff ffff ffff ffff ffff, and store those numbers as comma separated text representation of hexadecimal values, to a text file.

So the number of numbers you want to count is the difference between those two, which is:

Code:
ffff ffff ffff ffff ffff - ffff ffff ffff = ffff ffff 0000 0000 0000 = 1,208,925,819,333,154,197,995,520
You can store those more compactly, but let's use your original requirement to store them as comma separated ascii representations of hexadecimal values and use SoftSprocket's value of 25 bytes each.

We can see from our calculation above that this requires 30+ Tera-Tera-Bytes of storage.
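
The storage figure can be reproduced in a few lines of Python. This is a sketch under the same assumptions as above: 25 bytes per number (20 hex digits plus a comma would really be 21, which changes nothing of consequence) and a decimal terabyte of 10^12 bytes. The names `BYTES_EACH` and `TB` are illustrative only.

```python
count = 0xffffffffffffffffffff - 0xffffffffffff
BYTES_EACH = 25                    # SoftSprocket's per-number figure
TB = 10**12                        # decimal terabyte

total_bytes = count * BYTES_EACH
total_tb = total_bytes // TB
print(total_bytes)                 # 30223145483328854949888000
print(total_tb)                    # 30223145483328
```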

Now, how long does it take?

Let's stay with your original guess of 5 numbers per second, which is...

Code:
1,208,925,819,333,154,197,995,520/5 = 241,785,163,866,000,000,000,000 seconds (with some rounding error on the low end)

31,557,600 seconds in a year, gives...

7,661,709,504,720,000 years
The universe is generally accepted to be 13,700,000,000 years old.

So with 30+ Tera-Tera-Bytes of storage and approximately a million times the age of the universe, you should be good to go!

See my first post...

Last edited by astrogeek; 01-19-2015 at 03:10 PM.
 
Old 01-19-2015, 03:26 PM   #9
SoftSprocket
Member
 
Registered: Nov 2014
Posts: 399

Rep: Reputation: Disabled
Quote:
Originally Posted by astrogeek

Your number may be correct, but your "tera" is in the wrong place!

30,223,145,483,328,854,949,888,000
..looks like
30,223,145,483,328 TB to me!

That is 30+ Tera-Tera-Bytes!

Let's stay with your original guess of 5 numbers per second, which is...

Code:
1,208,925,819,333,154,197,995,520/5 = 241,785,163,866,000,000,000,000 seconds (with some rounding error on the low end)

31,557,600 seconds in a year, gives...

7,661,709,504,720,000 years
The universe is generally accepted to be 13,700,000,000 years old.

So with 30+ Tera-Tera-Bytes of storage and approximately a million times the age of the universe, you should be good to go!

See my first post...
Quite so - I checked my numbers again:
30223145483328 TB

or ~4.8 billion years of disk writing at current technology (200 MB/sec).

5 per second might have been correct for punch cards but a modern hard drive writes at a considerably faster rate ... not that it will help.
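
A rough sketch of the disk-write estimate, assuming the 200 MB/sec drive speed mentioned earlier in the thread, 25 bytes per number, and a 365.25 day year. The name `WRITE_RATE` is illustrative, and real sustained throughput would vary.

```python
total_bytes = (0xffffffffffffffffffff - 0xffffffffffff) * 25
WRITE_RATE = 200 * 10**6           # bytes per second (200 MB/sec, assumed)
SECONDS_PER_YEAR = 31557600        # assumed: 365.25 days

seconds = total_bytes // WRITE_RATE
years = seconds // SECONDS_PER_YEAR
print("%.2e years" % years)
```

Under those assumptions the result comes out in the billions of years for a single drive.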
 
  

