LinuxQuestions.org
Old 04-07-2009, 04:05 AM   #1
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Rep: Reputation: 15
very long array


guys, I want to make an array[1000000000] (one billion elements)
using C++ and g++, with 3 GB of total memory of which about 1.5 GB is in use,
kernel 2.6.27, FC10.

I tried to allocate it both statically and dynamically, but both failed:
statically, the compiler reported an error saying the size is too large;
dynamically, it throws std::bad_alloc at run time.

Any idea on how to solve that?
 
Old 04-07-2009, 04:09 AM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
What is the array's element type?
 
Old 04-07-2009, 04:28 AM   #3
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
long long
 
Old 04-07-2009, 08:47 AM   #4
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by deepinlife View Post
long long
Trivial calculation:

A long long is 8 bytes on 32-bit Linux.
So your array alone would need 8 GB.
 
Old 04-07-2009, 09:10 AM   #5
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by deepinlife View Post
long long
And what is your CPU/OS architecture, i.e. is it a 32-bit combo or a 64-bit one?
 
Old 04-07-2009, 09:35 AM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Don't allocate static data-structures of such an enormous size.

You will impose a horrific penalty on the virtual-memory subsystem, even if you have enough RAM to hold it all. Your program will fall to its knees (horribly...) and die (horribly).

What you are actually describing is a container of some sort that is capable of storing an arbitrary number of values (each of which is a long long) using an int as a key. In other words, given a particular key (call it "an array index" if you like...), this container must produce (or store) the corresponding long long value.

A very good implementation for this might be an SQL table. Or an "indexed file" with a nice, big in-memory hash buffer. You can find plenty of existing container-classes that will handle this requirement easily.

The advantages of this approach are too numerous to mention. The container will store as many values as your memory (and disk) capacity will permit, but it will not waste time or storage with values that do not exist. The actual memory-reference patterns will be ones that are favorable to virtual-storage implementations.

The last sentence of the preceding paragraph deserves special mention. The "delays" associated with an "inefficient" CPU instruction path (hash-lookups and so on) will never, in a hundred years, add up to a single second. However, the delays associated with virtual-storage consist of multiple milliseconds every time they occur, which can add up to minutes, hours, even days.

Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...

Last edited by sundialsvcs; 04-07-2009 at 09:37 AM.
 
Old 04-07-2009, 09:54 AM   #7
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by sundialsvcs View Post
Don't allocate static data-structures of such an enormous size.

You will impose a horrific penalty on the virtual-memory subsystem, even if you have enough RAM to hold it all. Your program will fall to its knees (horribly...) and die (horribly).

What you are actually describing is a container of some sort, that is capable of storing an arbitrary number of values (each of which is a long long) using an int as a key. In other words, given a particular key (call it "an array index" if you like...), this container must produce (or store) the corresponding long long value.

A very good implementation for this might be an SQL table. Or an "indexed file" with a nice, big in-memory hash buffer. You can find plenty of existing container-classes that will handle this requirement easily.

The advantages of this approach are too numerous to mention. The container will store as many values as your memory (and disk) capacity will permit, but it will not waste time or storage with values that do not exist. The actual memory-reference patterns will be ones that are favorable to virtual-storage implementations.

The last sentence of the preceding paragraph deserves special mention. The "delays" associated with an "inefficient" CPU instruction path (hash-lookups and so on) will never, in a hundred years, add up to a single second. However, the delays associated with virtual-storage consist of multiple milliseconds every time they occur, which can add up to minutes, hours, even days.

Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...
I completely agree with what you're saying, but I have a feeling the OP needs to understand much more basic things first, that's why I asked my next simple question about number of bits.
 
Old 04-07-2009, 11:11 AM   #8
charlitos
Member
 
Registered: Feb 2009
Posts: 51

Rep: Reputation: 16
Jesus, that's why they invented databases. Use them...
 
Old 04-07-2009, 11:24 AM   #9
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by charlitos View Post
jesus, thats why they invented databases. Use them...
Not necessarily - a properly organized directory-and-file structure tuned for the particular task will be faster; a database relies on a filesystem anyway and adds internal overhead of its own.

I.e. databases are good when the data structures they represent, and/or the queries to be made, change frequently.

OTOH, if one needs to, say, implement a disk-mapped array or hash, a direct implementation will be more efficient.
 
Old 04-07-2009, 10:25 PM   #10
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Quote:
Originally Posted by Sergei Steshenko View Post
I completely agree with what you're saying, but I have a feeling the OP needs to understand much more basic things first, that's why I asked my next simple question about number of bits.
I concur.
 
Old 04-07-2009, 10:29 PM   #11
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Quote:
Originally Posted by Sergei Steshenko View Post
Not necessarily - a properly organized directory-and-file structure tuned for the particular task will be faster; a database relies on a filesystem anyway and adds internal overhead of its own.

I.e. databases are good when the data structures they represent, and/or the queries to be made, change frequently.

OTOH, if one needs to, say, implement a disk-mapped array or hash, a direct implementation will be more efficient.
Once it has been (correctly) established in the programmer's mind that "a huge array" is not the appropriate strategy, many alternate strategies present themselves, from which an intelligent choice must be made. Hashes, directory/file structures, and databases are (all three...) plausible candidates; none of them is "a drop-dead winner."

All that can comfortably be said is: "a huge array" is "a drop-dead loser."
 
Old 04-08-2009, 02:56 AM   #12
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
Quote:
Once it has been (correctly) established in the programmer's mind that "a huge array" is not the appropriate strategy, many alternate strategies present themselves, from which an intelligent choice must be made.
That's what happened. I thought about something else and it worked.

Quote:
Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...
Thank you so much.

Quote:
And what is your CPU/OS architecture, i.e. is it a 32-bit combo or a 64-bit one?
uname -a gives i686 i686 i386.
I'm sorry, but will the architecture make a difference in calculating the memory usage?
 
Old 04-08-2009, 04:54 AM   #13
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by deepinlife View Post
That's what happened. I thought about something else and it worked.


Thank you so much.


uname -a gives i686 i686 i386.
I'm sorry, but will the architecture make a difference in calculating the memory usage?
It will make a difference in understanding process memory limits.

On a 32-bit machine, which is what you have, a process's address space is limited to 4 GB.

So an array with 1B elements, each of which is 4 bytes long, would already exhaust process memory completely. And since you want "long long" elements, which are 8 bytes long, you exceeded process memory twice over.
 
Old 04-08-2009, 06:00 AM   #14
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
Quote:
So an array with 1B elements, each of which is 4 bytes long, would already exhaust process memory completely. And since you want "long long" elements, which are 8 bytes long, you exceeded process memory twice over.
Thanks.
 
Old 04-08-2009, 03:41 PM   #15
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Let's say that, within this "key space" of 1 billion possible keys, on any particular run you actually top out at, say, five million. Let's say, furthermore, that the key distribution is "all over the place" - very random, very unpredictable.

Your "very large array" would flop over and die, horribly. The odds are nearly 100% that a page-fault would occur on every reference, because I said that "the key distribution is random." The virtual memory subsystem would quickly succumb to thrashing.

But what would a hash table do? Each of the key-values is translated (by some means) into a "bucket number," and only that "bucket" of the hash-table is searched (by some means) for the desired value. Although the key-distribution is random, the size of the hash-table is both reasonable and fixed. Storage is only being consumed for the actual values used, plus some amount of overhead, but the storage-management strategy is designed to minimize the number of page-faults.

Certainly, the CPU is being obliged to execute more instructions. But it does that at a rate of several hundred million instructions per second... so "who cares?" What we are avoiding is a page-fault, which takes many milliseconds per fault. You do the math.
 
  


