LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-21-2007, 06:28 AM   #1
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Rep: Reputation: 15
Huge binary file


I wrote a program that reads a binary file. Everything is OK, but if I try to read a binary file that is about 4.5 GB, I can't: "Segmentation fault". I use Qt and fopen. Any ideas?
 
Old 03-21-2007, 08:14 AM   #2
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
If you're trying to read the whole binary file into memory, you'll have a problem unless you're on a 64-bit machine with a lot of RAM/swap. Otherwise... we need more detail. If you can get a core dump (see ulimit -c), what's on top of the stack when it stops?
 
Old 03-21-2007, 11:20 PM   #3
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I just typed "ulimit -c" in the shell window and the result is "0". What does that mean? Can I read this file without working with swap?
 
Old 03-22-2007, 04:00 AM   #4
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
Sorry, I should have said "ulimit -c unlimited". This sets the maximum size that a core file can be. If yours was set to zero, it would never produce core files.

I still don't know whether your application is reading the file a buffer at a time, or trying to read the whole thing into memory. But if it's the latter, the simple fact that 4.5Gb is greater than the whole 32-bit address space means that you are forced to read it with a 64-bit program. Whether or not you need to configure swap depends on the amount of RAM you have - if you have (say) 6Gb of RAM, it won't use swap unless the system is busy with other very large tasks; if you have 2Gb RAM and 4Gb swap, it will use swap.

There may also be large memory-model issues with handling this much data in Linux that I'm not aware of.

If you're just reading the file (say) 1024 bytes at a time, you have a "bug". Much easier to fix :-)

HTH,
-nick
 
Old 03-22-2007, 04:49 AM   #5
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I entered your command in the shell, but as a result nothing changed. The error is still "Segmentation fault (core dumped)". I don't think there is really a right way yet, because I want to open my binary file and read its data. Here is the code, for example:
FILE *file;
int number = 5000;
if ((file = fopen("big_binary_file", "rb+")) == NULL)
    printf("File is not open\n");
else
    printf("File is open\n");

fseek(file, number, SEEK_SET); ... fread ... etc.
So, my program must open BIG binary files on other machines. Is there any way to read these files easily, or without loading them into memory?
 
Old 03-22-2007, 05:09 AM   #6
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
But now the error message indicates that a core file has been dumped. You should find a file called "core" in the current directory after the error. You can open that with gdb (gdb -c core myprog), and ask for a backtrace. That should tell you something about where the error occurred. Recompile the program with -g to get more information.

If you're just opening the file and seeking to a large offset, you don't actually read the whole file into memory. So ignore all that stuff about limits and swap space. It sounds like you have "a bug". Using gdb to analyse the core file will lead to "a fix" :-)

HTH,
-nick
 
Old 03-22-2007, 05:13 AM   #7
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
PS. The reason I originally thought the 4.5Gb error was to do with the 32-bit address space is that it's about the right value to cause problems.

But it might be that in your code, the file offset is a 32-bit (signed?) integer (the variable "number" above). In which case, that number can't hold a value big enough to represent the offset beyond 2Gb (signed) or 4Gb (unsigned). The result could be treated as a negative number, which will cause an fseek error, and if your program doesn't handle the error correctly you could get a SIGSEGV like this.

Just a guess :-)
 
Old 03-22-2007, 05:57 AM   #8
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
I changed int to unsigned int.
Result from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x00b1287b in fseek () from /lib/tls/libc.so.6
 
Old 03-22-2007, 06:26 AM   #9
jlinkels
LQ Guru
 
Registered: Oct 2003
Location: Bonaire, Leeuwarden
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Posts: 5,195

Rep: Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043Reputation: 1043
The first step, it seems to me, is to find out exactly when the program crashes. That could give you a pointer to what causes the error.

I would start with a smaller file size and increase it gradually to see when you experience the segfault.

To do it as fast as possible, use a binary search method. That is:

Code:
set file size to 2 GB
if segfault {
  set file size to 1 GB
  if segfault {
    set file size to 0.5 GB
  } else {
    set file size to 1.5 GB
  }
} else {
  set file size to 3 GB
  ...
}
etc.

This should bring you in a minimum number of steps to the actual file size where the segfault happens. If that size turns out to be close to a nice round hex value like 0x80000000, you know where/what the trouble is.

You can create large files using the dd command.
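For example (assuming GNU dd; seeking past the end and writing nothing makes the file sparse, so it doesn't actually consume 4.5 GB of disk):

```shell
# Create a sparse 4.5 GB test file: position after 4608 MiB, write 0 blocks.
dd if=/dev/zero of=bigfile.bin bs=1M seek=4608 count=0
ls -lh bigfile.bin   # size reads ~4.5G; 'du -h bigfile.bin' shows ~0 blocks used
```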

jlinkels
 
Old 03-22-2007, 07:13 AM   #10
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
2 GB is the limit. :(
 
Old 03-22-2007, 07:33 AM   #11
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
OK, 2Gb fits with the limit of a signed 32-bit int offset.

I think this is a fundamental problem with the fseek interface - it's defined to take a long offset, but on most 32-bit systems a long is 32 bits, not 64. So you can't seek to an absolute position in a file greater than 2Gb.

You could seek absolutely up to 2Gb and then seek relatively from there, presumably(?). Ugly, I know. A quick Google for "fseek" and "beyond 2Gb" reveals that this is a known limitation.
 
Old 03-22-2007, 07:48 AM   #12
dimah
LQ Newbie
 
Registered: May 2006
Posts: 28

Original Poster
Rep: Reputation: 15
Thanks a lot, I'll try to come up with something...
 
Old 03-22-2007, 09:39 AM   #13
son_t
Member
 
Registered: Sep 2006
Posts: 49

Rep: Reputation: 15
Try using lseek and lseek64.
 
Old 03-22-2007, 10:16 AM   #14
Nick_Battle
Member
 
Registered: Dec 2006
Location: Bracknell, UK
Distribution: SUSE 13.1
Posts: 159

Rep: Reputation: 33
Quote:
Originally Posted by son_t
Try using lseek and lseek64.
And note the comment in the lseek64 man page (regarding lseek) about:

#define _FILE_OFFSET_BITS 64
 
Old 03-22-2007, 10:19 AM   #15
nx5000
Senior Member
 
Registered: Sep 2005
Location: Out
Posts: 3,307

Rep: Reputation: 57
Code:
gcc -D_FILE_OFFSET_BITS=64 ...
 
  

