LinuxQuestions.org
Help answer threads with 0 replies.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices



Reply
 
Search this Thread
Old 10-02-2008, 10:11 AM   #1
Mr Tummyfluff
LQ Newbie
 
Registered: Aug 2006
Location: Southampton
Distribution: Solaris 5.8, Red Hat Enterprise 3, OS X 10.4
Posts: 10

Rep: Reputation: 0
Question Array size in C++


I'm writing an algorithm to do a bit of signal processing. Nothing particularly complicated, just need to be able to handle a large volume of data and funny input file format.

Gist of it is, the huge amount of input is sytematically read a bit at a time a remapped into an array in memory, defined as:

Code:
float modl[nz][nx1][nx2];
All was working fine until I tried to read in a real world sized data set. Then I get a Segmentation Fault when I try to define an array greater than [3001]*[1047]*[1]. And it's definately faulting at the point of the array definition because I have a printf statement straight after which doesn't get run. I find this somewhat confusing, for two reasons:

1.If it was a memory allocation error I'd expect to recieve a malloc error rather than a Segmentation Fault.

2.Estimating the amount memory the array should require as sizeof(float)*[3001]*[1047]*[1], I find it to be in the 3 Mb ball park. Since I'm running on a machine with 4 Gb of memory this really shouldn't be a problem!

Yet, if I decrease the size of the array being defined, it works fine!

Anyone any ideas? All help appeciated greatly!
 
Old 10-02-2008, 11:23 AM   #2
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by Mr Tummyfluff View Post
1.If it was a memory allocation error I'd expect to recieve a malloc error rather than a Segmentation Fault.
You will receive a segmentation fault if you allocate the memory on the stack. See this poor guy's post for an example of the same problem.

Quote:
Originally Posted by Mr Tummyfluff View Post
2.Estimating the amount memory the array should require as sizeof(float)*[3001]*[1047]*[1], I find it to be in the 3 Mb ball park. Since I'm running on a machine with 4 Gb of memory this really shouldn't be a problem!
Your estimate is wrong. sizeof(float) is usually 4, so you would get

4 x 3001 x 1047 x 1 = 12,568,188 = 12M (in Microsoft math)

As you will see from the post mentioned above, the default stack limit is 8M.
 
Old 10-02-2008, 01:36 PM   #3
Mr Tummyfluff
LQ Newbie
 
Registered: Aug 2006
Location: Southampton
Distribution: Solaris 5.8, Red Hat Enterprise 3, OS X 10.4
Posts: 10

Original Poster
Rep: Reputation: 0
Thanks David you're a star..but can I beg one more question!

I hadn't realised there was a memory limit on how big an array you can define on the stack, so I'm now using calloc() to assign the memory and fill it with zeroes. When I want to write to that bit of memory, I shouldn't I just do as below?

Code:
float *modl = calloc ((nz*nx1*nx2),sizeof(float));

for(ix1=0; ix1<nx1; ix1++){
  for(ix2=0; ix2<nx2; ix2++){
    for(iz=0; iz<nz; iz++){
      modl[iz+nz*(ix1+nx2*ix1)] += data[it];
    }
  }
}
This compiles fine, but again gives a Segementation Fault. Sorry, bit of a C noob!

Mark

PS. Yeah, maybe I forgot to multiply by 4 for sizeof(float). Doh!
 
Old 10-02-2008, 03:38 PM   #4
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by Mr Tummyfluff View Post
Thanks David you're a star...but can I beg one more question!
I give the glory to God.

I stuck in some suggestions as comments in your code below:

Quote:
Originally Posted by Mr Tummyfluff View Post
Code:
float *modl = calloc ((nz*nx1*nx2),sizeof(float));

// Insert a check for a NULL return here, e.g.
if (!modl)
{
    printf("ERROR: Could not allocate memory\n");
    exit(1);
}

for(ix1=0; ix1<nx1; ix1++){
  for(ix2=0; ix2<nx2; ix2++){
    for(iz=0; iz<nz; iz++){
      int index = iz+nz*(ix1+nx2*ix1);
      // Print out your calculated index here to verify your math
      printf("index = %d\n");
      // Comment this out until you have verified your index math
//      modl[iz+nz*(ix1+nx2*ix1)] += data[it];
    }
  }
}

// Add some code here to free your memory, or you will leak 12M
free(modl);
Build it with those changes and check your index math first.
 
Old 10-02-2008, 03:58 PM   #5
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by Mr Tummyfluff View Post
And it's definately faulting at the point of the array definition because I have a printf statement straight after which doesn't get run.
This isn't a reliable indicator unless the buffer gets flushed right after the printf call. In theory, standard output is supposed to be line-buffered (meaning a flush is performed after \n,) but that isn't always the case. Just for future reference! Also, you might consider using an mmaped file to prevent page faults, possibly speeding things up a little.
ta0kira
 
Old 10-03-2008, 06:21 AM   #6
dmail
Member
 
Registered: Oct 2005
Posts: 970

Rep: Reputation: Disabled
std::vector anyone?

Last edited by dmail; 10-03-2008 at 06:23 AM.
 
Old 10-03-2008, 09:16 AM   #7
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by dmail View Post
std::vector anyone?
We had some problems at work using GCC's std::vector class. We could only add 4 elements before it threw an exception.
 
Old 10-03-2008, 09:22 AM   #8
dmail
Member
 
Registered: Oct 2005
Posts: 970

Rep: Reputation: Disabled
Quote:
Originally Posted by David1357 View Post
We had some problems at work using GCC's std::vector class. We could only add 4 elements before it threw an exception.
Maybe you could give some info (I don't know if you have an NDA), but I would really like to see code that does that. gcc version, element type and its structure (ie copy constructor, assignment operator, default constructor) code sample, what exception etc.

Last edited by dmail; 10-03-2008 at 09:29 AM.
 
Old 10-03-2008, 09:42 AM   #9
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by dmail View Post
Maybe you could give some info (I don't know if you have an NDA), but I would really like to see code that does that. gcc version, element type and its structure (ie copy constructor, assignment operator, default constructor) code sample, what exception etc.
Sorry, it was std::list, and the method was push_back. When we tried to push_back the fifth element, it threw an exception.
 
Old 10-03-2008, 10:07 AM   #10
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,455

Rep: Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172Reputation: 1172
What I'm wondering about is this... do you really have 12 million equal possibilities? Whenever this program runs for a while, is it really likely that every single one of those "[3001]*[1047]*[1]" buckets will be full?

Or could the actual data-distribution be sparse?

In the latter case, what if you consider the "address" of the input to be a 3-tuple, where the range of values in this particular case is ([0..3001], [0..1047], [0..1]) but might be somewhat smaller or larger the next time the program is run.

So, what you use for the storage in this case is a hash table. The values are stored in the hash-table using a 3-tuple as a hash key.

The advantage of this approach is that, as memory is allocated for the hash-table as it grows, that memory will tend to be fairly contiguous even though the distribution of key-values might not. Let's say for instance that you take-in about a hundred-thousand unique numbers in a particular run. The hash-table will have grown to about a hundred-thousand entries, stored in a few more-or-less contiguous megabytes of virtual storage, and it will have done this no matter how widely scattered the key-values might have been.

This will prevent the problem that would otherwise bring your program to its knees: thrashing. By pre-determining the location in virtual storage where each value is to be placed, you potentially force the application to suffer a page-fault for every hit, if the key-value distribution is widely scattered.

Thrashing can easily destroy your program, causing runs to take many hours or days... that ought to take seconds.

Bright(!) Idea... If you have any opportunity to exercise any sort of control over how the data is presented to you (i.e. if it's in a static file rather than coming in real-time), you can radically improve the situation by sorting the values first. Extract them from the funny input file, dump them into a temporary file in an easy-to-use format, and disk-sort that file.

In doing so, you just might eliminate the need for "random access" altogether: identical data-points are now adjacent in the file. Gaps can be found by comparing "this record" to "the previous one." And sorting is an unexpectedly-fast algorithm. A 12 or even 120-megabyte file is "no big deal."

"That's how they did it with punched cards," and the technique still works today!

Last edited by sundialsvcs; 10-03-2008 at 10:09 AM.
 
  


Reply

Tags
allocation, c++, memory, programming


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
how to find out an array size on c++ poeta_boy Programming 9 06-22-2004 02:28 PM
Size of an array in C Hady Programming 6 04-06-2004 06:33 AM
PERL: Size of an array of an Array inspleak Programming 2 03-10-2004 03:24 PM
size of an array Longinus Programming 8 03-08-2004 02:48 PM
getting size of an array in c++ schatoor Programming 8 11-06-2003 06:19 PM


All times are GMT -5. The time now is 04:03 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration