LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-21-2007, 06:28 AM   #1
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Rep: Reputation: 15
Huge binary file


I wrote a program that reads a binary file. Everything is OK, but if I try to read a binary file that is about 4.5 GB, I can't: "Segmentation fault". I use Qt and fopen. Any ideas?
 
Old 03-21-2007, 08:14 AM   #2
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
If you're trying to read the whole binary file into memory, you'll have a problem unless you're on a 64-bit machine with a lot of RAM/swap. Otherwise... we need more detail. If you can get a core dump (see ulimit -c), what's on top of the stack when it stops?
 
Old 03-21-2007, 11:20 PM   #3
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I just typed "ulimit -c" in the shell window and the result is "0". What does that mean? Can I read this file without working with swap?
 
Old 03-22-2007, 04:00 AM   #4
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
Sorry, I should have said "ulimit -c unlimited". This sets the maximum size that a core file can be. If yours was set to zero, it would never produce core files.

I still don't know whether your application is reading the file a buffer at a time, or trying to read the whole thing into memory. But if it's the latter, the simple fact that 4.5Gb is greater than the whole 32-bit address space means that you are forced to read it with a 64-bit program. Whether or not you need to configure swap depends on the amount of RAM you have - if you have (say) 6Gb of RAM, it won't use swap unless the system is busy with other very large tasks; if you have 2Gb RAM and 4Gb swap, it will use swap.

There may also be large memory-model issues with handling this much data in Linux that I'm not aware of.

If you're just reading the file (say) 1024 bytes at a time, you have a "bug". Much easier to fix :-)

HTH,
-nick
 
Old 03-22-2007, 04:49 AM   #5
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I entered your command in the shell, but as a result nothing changed. The error is still "Segmentation fault (core dumped)". I don't think there is really a right way yet, because I want to open my binary file and read its data. Here is the code, for example:
FILE *file;
int number = 5000;
if ((file = fopen("big_binary_file", "rb+")) == NULL)
    printf("File is not open\n");
else
    printf("File is open\n");

fseek(file, number, SEEK_SET); ... fread ... etc.
So, my program must open BIG binary files on other machines. Is there any way to read these files easily, or without loading them into memory?
 
Old 03-22-2007, 05:09 AM   #6
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
But now the error message indicates that a core file has been dumped. You should find a file called "core" in the current directory after the error. You can open that with gdb (gdb -c core myprog), and ask for a backtrace. That should tell you something about where the error occurred. Recompile the program with -g to get more information.

If you're just opening the file and seeking to a large offset, you don't actually read the whole file into memory. So ignore all that stuff about limits and swap space. It sounds like you have "a bug". Using gdb to analyse the core file will lead to "a fix" :-)

HTH,
-nick
 
Old 03-22-2007, 05:13 AM   #7
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
PS. The reason I originally thought the 4.5Gb error was to do with the 32-bit address space is that it's about the right value to cause problems.

But it might be that in your code, the file offset is a 32-bit (signed?) integer (the variable "number" above). In which case, that number can't hold a value big enough to represent the offset beyond 2Gb (signed) or 4Gb (unsigned). The result could be treated as a negative number, which will cause an fseek error, and if your program doesn't handle the error correctly you could get a SIGSEGV like this.

Just a guess :-)
 
Old 03-22-2007, 05:57 AM   #8
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I changed int to unsigned int.
Result from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x00b1287b in fseek () from /lib/tls/libc.so.6
 
Old 03-22-2007, 06:26 AM   #9
jlinkels
LQ Guru
 
Registered: Oct 2003
Location: Bonaire, Leeuwarden
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Posts: 5,195

Rep: Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043
The first step, it seems to me, is to find out exactly when the program crashes. That could give you a pointer to what causes the error.

I would start with a smaller file size and increase it gradually to see when you experience the segfault.

To do it as fast as possible, use a binary search method. That is:

Code:
set file size to 2 GB
if segfault {
  set file size to 1 GB
  if segfault {
    set file size to 0.5 GB
  } else {
    set file size to 1.5 GB
  }
} else {
  set file size to 3 GB
  ...
}
etc.

This should bring you in a minimum number of steps to the actual file size where the segfault happens. If that size turns out to be close to a nice round hex value like 0x80000000, you know where/what the trouble is.

You can create large files using the dd command.
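For example (assuming GNU dd; seeking past the end and writing nothing makes the file sparse, so it doesn't actually consume 4.5 GB of disk):

```shell
# Create a sparse 4.5 GB test file: position after 4608 MiB, write 0 blocks.
dd if=/dev/zero of=bigfile.bin bs=1M seek=4608 count=0
ls -lh bigfile.bin   # size reads ~4.5G; 'du -h bigfile.bin' shows ~0 blocks used
```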

jlinkels
 
Old 03-22-2007, 07:13 AM   #10
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
2 GB is the limit. :(
 
Old 03-22-2007, 07:33 AM   #11
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
OK, 2Gb fits with the limit of a signed 32-bit int offset.

I think this is a fundamental problem with the fseek interface - it's defined to take a long offset, but on most 32-bit systems a long is 32 bits, not 64. So you can't seek to an absolute position in a file greater than 2Gb.

You could seek absolutely up to 2Gb and then seek relatively from there, presumably(?). Ugly, I know. A quick Google for "fseek" and "beyond 2Gb" reveals that this is a known limitation.
 
Old 03-22-2007, 07:48 AM   #12
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
Thanks a lot, I'll try to come up with something...
 
Old 03-22-2007, 09:39 AM   #13
son_t
Member
 
Registered: Sep 2006
Posts: 49

Rep: Reputation: 15
Try using lseek and lseek64.
 
Old 03-22-2007, 10:16 AM   #14
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
Quote:
Originally Posted by son_t
Try using lseek and lseek64.
And note the comment in the lseek64 man page (regarding lseek) about:

#define _FILE_OFFSET_BITS 64
 
Old 03-22-2007, 10:19 AM   #15
nx5000
Senior Member
 
Registered: Sep 2005
Location: Out
Posts: 3,307

Rep: Reputation: 57
Code:
gcc -D_FILE_OFFSET_BITS=64 ...
 
  

