LinuxQuestions.org


VelocideX 06-06-2008 01:28 AM

GNU getline appears to choke with large file support (can't read >2GB)
 
Hi all,

I have compiled my program to enable large file support. That is, I pass -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64 to gcc.

I can open files larger than 2GB (whereas I could not when these options were not enabled).

I have been reading data from the text files using the GNU getline function. getline reads data fine before the 2GB mark, but immediately after that point it cannot read any data.

Does anyone know why this is, and how I can fix it? Is GNU getline compatible with LFS, or do I have to write my own equivalent routine?

Cheers :)

jschiwal 06-06-2008 04:01 AM

You might want to check what resource limits might be imposed. You aren't trying to read a binary file without returns, are you?

If your ulimit restricts memory to below 2GB, then getline's realloc call will probably fail.

jschiwal 06-06-2008 05:34 AM

I compiled this simple test program from the getline manpage. The "hardwired" file is a text file copy of the info bash manual cat'ed over & over until test.txt was 2.7GB. I'll have to let it run for a while to see if it reaches the end. I defined _FILE_OFFSET_BITS = 64 as per the feature_test_macros manpage.

Code:

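#define _GNU_SOURCE             /* expose getline() */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t, so fopen() can open files over 2GB */

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    FILE * fp;
    char * line = NULL;     /* getline() allocates and grows this buffer */
    size_t len = 0;
    ssize_t read;

    fp = fopen("test.txt", "r");
    if (fp == NULL)
        exit(EXIT_FAILURE);
    while ((read = getline(&line, &len, fp)) != -1) {
        /* read is an ssize_t, so the matching conversion is %zd */
        printf("Retrieved line of length %zd :\n", read);
        printf("%s", line);
    }
    free(line);             /* free(NULL) is a no-op, so no check is needed */
    fclose(fp);
    return EXIT_SUCCESS;
}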


VelocideX 06-06-2008 06:31 AM

Quote:

Originally Posted by jschiwal (Post 3176440)
You might want to check what resource limits might be imposed. You aren't trying to read a binary file without returns, are you?

If your ulimit restricts memory to below 2GB, then getline's realloc call will probably fail.

It's not a binary file. It's a text file that I produced as the output of a Fortran simulation. Each line is about 40,000 characters. There are PLENTY of line returns.

I've checked ulimit and there are no restrictions that I can see that would affect it. realloc shouldn't matter because the string being filled is only about 40kB, and each string is discarded. There are a few other dynamic vectors allocated, but their memory usage is only about 300MB (I have 3.5GB memory).

It's suspicious that getline fails as soon as the file position hits 2^31.

jschiwal - thanks for taking the time to compile a test routine. My routine checks the return value of getline, and it's never < 0 (it prints an error message and then dumps if it is), which is strange.

jschiwal 06-06-2008 06:36 AM

I checked and the test program is still running.
Please note this from the feature_test_macros manpage:
Quote:

_LARGEFILE64_SOURCE
Expose definitions for the alternative API specified by the LFS
(Large File Summit) as a "transitional extension" to the Single
UNIX Specification. (See http://opengroup.org/platform/lfs.html.)
The alternative API consists of a set of new
objects (i.e., functions and types) whose names are suffixed
with "64" (e.g., off64_t versus off_t, lseek64() versus lseek(),
etc.). New programs should not employ this interface; instead
_FILE_OFFSET_BITS=64 should be employed.

I compiled my example with "#define _FILE_OFFSET_BITS 64" to test the getline() function and not the _LARGEFILE64_SOURCE feature test macro.

jschiwal 06-06-2008 10:24 AM

Update:
Hours after starting the little demo program it finished reading the 2.7GB text file. EXIT_SUCCESS!


All times are GMT -5.