LinuxQuestions.org
Old 04-07-2009, 04:05 AM   #1
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Rep: Reputation: 15
very long array


guys, I want to make an array[1000000000] (one billion elements)
using C++ and g++, with 3 GB of total memory of which about 1.5 GB is in use,
kernel 2.6.27, FC10.

I tried to allocate it both statically and dynamically, but both failed:
statically, the compiler reported an error saying the size is too large;
dynamically, it throws std::bad_alloc at run time.

Any idea on how to solve that?
 
Old 04-07-2009, 04:09 AM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
What is the array's element type?
 
Old 04-07-2009, 04:28 AM   #3
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
long long
 
Old 04-07-2009, 08:47 AM   #4
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by deepinlife View Post
long long
Trivial calculation:

A long long is 8 bytes on 32-bit Linux.
So your array alone would need 8 GB.
 
Old 04-07-2009, 09:10 AM   #5
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by deepinlife View Post
long long
And what is your CPU/OS architecture, i.e. is it a 32-bit combo or a 64-bit one?
 
Old 04-07-2009, 09:35 AM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Don't allocate static data-structures of such an enormous size.

You will impose a horrific penalty on the virtual-memory subsystem, even if you have enough RAM to hold it all. Your program will fall to its knees (horribly...) and die (horribly).

What you are actually describing is a container of some sort that is capable of storing an arbitrary number of values (each of which is a long long) using an int as a key. In other words, given a particular key (call it "an array index" if you like...), this container must produce (or store) the corresponding long long value.

A very good implementation for this might be an SQL table. Or an "indexed file" with a nice, big in-memory hash buffer. You can find plenty of existing container-classes that will handle this requirement easily.

The advantages of this approach are too numerous to mention. The container will store as many values as your memory (and disk) capacity will permit, but it will not waste time or storage with values that do not exist. The actual memory-reference patterns will be ones that are favorable to virtual-storage implementations.

The last sentence of the preceding paragraph deserves special mention. The "delays" associated with an "inefficient" CPU instruction path (hash-lookups and so on) will never, in a hundred years, add up to a single second. However, the delays associated with virtual-storage consist of multiple milliseconds every time they occur, which can add up to minutes, hours, even days.

Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...

Last edited by sundialsvcs; 04-07-2009 at 09:37 AM.
 
Old 04-07-2009, 09:54 AM   #7
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by sundialsvcs View Post
Don't allocate static data-structures of such an enormous size.

You will impose a horrific penalty on the virtual-memory subsystem, even if you have enough RAM to hold it all. Your program will fall to its knees (horribly...) and die (horribly).

What you are actually describing is a container of some sort, that is capable of storing an arbitrary number of values (each of which is a long long) using an int as a key. In other words, given a particular key (call it "an array index" if you like...), this container must produce (or store) the corresponding long long value.

A very good implementation for this might be an SQL table. Or an "indexed file" with a nice, big in-memory hash buffer. You can find plenty of existing container-classes that will handle this requirement easily.

The advantages of this approach are too numerous to mention. The container will store as many values as your memory (and disk) capacity will permit, but it will not waste time or storage with values that do not exist. The actual memory-reference patterns will be ones that are favorable to virtual-storage implementations.

The last sentence of the preceding paragraph deserves special mention. The "delays" associated with an "inefficient" CPU instruction path (hash-lookups and so on) will never, in a hundred years, add up to a single second. However, the delays associated with virtual-storage consist of multiple milliseconds every time they occur, which can add up to minutes, hours, even days.

Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...
I completely agree with what you're saying, but I have a feeling the OP needs to understand much more basic things first, that's why I asked my next simple question about number of bits.
 
Old 04-07-2009, 11:11 AM   #8
charlitos
Member
 
Registered: Feb 2009
Posts: 51

Rep: Reputation: 16
Jesus, that's why they invented databases. Use them...
 
Old 04-07-2009, 11:24 AM   #9
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by charlitos View Post
jesus, thats why they invented databases. Use them...
Not necessarily - a properly organized directory-and-file structure tuned for the particular task will be faster; a database relies on a filesystem anyway and adds internal overhead of its own.

I.e. databases are good when the data structures they represent, and/or the queries to be made, change frequently.

OTOH, if one needs to, say, implement a disk-mapped array or hash, a direct implementation will be more efficient.
 
Old 04-07-2009, 10:25 PM   #10
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Quote:
Originally Posted by Sergei Steshenko View Post
I completely agree with what you're saying, but I have a feeling the OP needs to understand much more basic things first, that's why I asked my next simple question about number of bits.
I concur.
 
Old 04-07-2009, 10:29 PM   #11
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Quote:
Originally Posted by Sergei Steshenko View Post
Not necessarily - a properly organized directory-and-file structure tuned for the particular task will be faster; a database relies on a filesystem anyway and adds internal overhead of its own.

I.e. databases are good when the data structures they represent, and/or the queries to be made, change frequently.

OTOH, if one needs to, say, implement a disk-mapped array or hash, a direct implementation will be more efficient.
Once it has been (correctly) established in the programmer's mind that "a huge array" is not the appropriate strategy, many alternate strategies present themselves, from which an intelligent choice must be made. Hashes, directory/file structures, and databases are (all three...) plausible candidates; none of them is "a drop-dead winner."

All that can comfortably be said is: "a huge array" is "a drop-dead loser."
 
Old 04-08-2009, 02:56 AM   #12
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
Quote:
Once it has been (correctly) established in the programmer's mind that "a huge array" is not the appropriate strategy, many alternate strategies present themselves, from which an intelligent choice must be made.
That's what happened. I thought about something else and it worked.

Quote:
Always choose an implementation based on "what does it do," not "the implementation that I learned about long ago in school." Also, always choose an implementation that already exists, rather than writing a new one for yourself. You will find this to be sage advice...
Thank you so much.

Quote:
And what is your CPU/OS architecture, i.e. is it a 32-bit combo or a 64-bit one?
uname -a gives i686 i686 i386.
I'm sorry, but will the architecture make a difference in calculating the memory usage?
 
Old 04-08-2009, 04:54 AM   #13
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by deepinlife View Post
That's what happened. I thought about something else and it worked.


Thank you so much.


uname -a gives i686 i686 i386.
I'm sorry, but will the architecture make a difference in calculating the memory usage?
It will make a difference in understanding process memory limits.

On a 32-bit machine, which is what you have, a process's address space is limited to 4 GB.

So an array with 1B elements, each of which is 4 bytes long, would already exhaust process memory completely. And since you want "long long" elements, which are 8 bytes long, you exceeded process memory twice over.
 
Old 04-08-2009, 06:00 AM   #14
deepinlife
Member
 
Registered: Apr 2006
Posts: 78

Original Poster
Rep: Reputation: 15
Quote:
So an array with 1B elements, each of which is 4 bytes long, would already exhaust process memory completely. And since you want "long long" elements, which are 8 bytes long, you exceeded process memory twice over.
Thanks.
 
Old 04-08-2009, 03:41 PM   #15
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
Let's say that, within this "key space" of 1 billion possible keys, on any particular run you actually top out at, say, five million. Let's say, furthermore, that the key distribution is "all over the place" - very random, very unpredictable.

Your "very large array" would flop over and die, horribly. The odds are nearly 100% that a page-fault would occur on every reference, because I said that "the key distribution is random." The virtual memory subsystem would quickly succumb to thrashing.

But what would a hash table do? Each of the key-values is translated (by some means) into a "bucket number," and only that "bucket" of the hash-table is searched (by some means) for the desired value. Although the key-distribution is random, the size of the hash-table is both reasonable and fixed. Storage is only being consumed for the actual values used, plus some amount of overhead, but the storage-management strategy is designed to minimize the number of page-faults.

Certainly, the CPU is being obliged to execute more instructions. But it does that at a rate of several hundred million instructions per second... so "who cares?" What we are avoiding is a page-fault, which takes many milliseconds per fault. You do the math.
 
  


