LinuxQuestions.org

hedpe 02-08-2006 08:46 PM

what could cause this fwrite() to seg fault?
 
Hey guys,

I am getting a segmentation fault which is backtraced to this block of code, generated by the fwrite() call.

Code:

    /* Append new data to our temporary file */
    sprintf(filename, "%s.tmp", cl->cam->name);
    fd = fopen(filename, "a");

    if(fd == NULL)
      printf("Error opening file %s: %s", filename, strerror(errno));

    fwrite(data, sizeof(char), cl->in_cnt, fd);
    fclose(fd);

There is no "Error opening file" printed out on the screen, so fd is NOT null.

Here is my backtrace:
Code:

Program received signal SIGSEGV, Segmentation fault.
0x42062d67 in fwrite () from /lib/tls/libc.so.6
#0  0x42062d67 in fwrite () from /lib/tls/libc.so.6
#1  0x0804c042 in cam_data (c=0x804d980, cl=0x80829a0) at caching.c:1355
#2  0x080497cf in parse_command (c=0x804d980, cl=0x80829a0) at caching.c:307
#3  0x080491a2 in check_data (c=0x804d980) at caching.c:144
#4  0x08048e7b in main () at caching.c:52
#5  0x42015704 in __libc_start_main () from /lib/tls/libc.so.6

If any other information would prove to be helpful let me know and I can get it by making it crash again :) (easily done!)

Thanks!
George

paulsm4 02-09-2006 01:01 AM

1. You want to make sure "data" is either a valid buffer, or a pointer that's correctly initialized to point to a valid buffer, at the time "fwrite" is called ("fwrite" only reads from it, so the memory has to be readable)

2. "sizeof(char)", of course, always equals "1"

3. You should also check the value of "cl->in_cnt" when the crash occurs, and make sure that it's no larger than what's left in your buffer.

4. What's a "valid buffer"? Any array that you've declared locally, declared statically, or allocated via "malloc()" or "new" (and, of course, have *not* inadvertently freed or deleted before calling "fwrite()").

Step through the code in the debugger and use the "print" command (e.g. "p cl->in_cnt") and the "examine memory" command (e.g. "x/16 data") when the crash occurs to investigate these points.

'Hope that helps .. PSM

PS:
I assume you declared "fd" as "FILE *fd" ("fp" might have been a better choice; "fd" is generally for numeric "file descriptors" instead of stdio "file pointers").

I also assume that "filename" is a character array long enough to hold the actual filename you generated with "sprintf()".

If either assumption is incorrect, that, too, might cause fwrite() to crash...
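
Both assumptions in the PS can be enforced in the code itself. Here is a minimal sketch, not the original poster's code: the helper name open_tmp_for_append() and the 256-byte filename buffer are made up for illustration. It uses snprintf() so an over-long name is detected instead of overflowing the array, and it returns NULL on failure so the caller can skip fwrite() rather than hand it a NULL stream (undefined behavior, which can crash inside libc).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open "<name>.tmp" for appending.
 * Returns the stream, or NULL after reporting the problem on stderr. */
FILE *open_tmp_for_append(const char *name)
{
    char filename[256];   /* buffer size is an assumption */
    int n;
    FILE *fp;

    /* snprintf() never writes past the buffer; its return value tells us
     * whether the generated name had to be truncated. */
    n = snprintf(filename, sizeof filename, "%s.tmp", name);
    if (n < 0 || (size_t)n >= sizeof filename) {
        fprintf(stderr, "File name too long: %s.tmp\n", name);
        return NULL;
    }

    fp = fopen(filename, "a");
    if (fp == NULL)
        fprintf(stderr, "Error opening file %s: %s\n", filename, strerror(errno));
    return fp;
}
```

The caller would check the result for NULL and only then call fwrite() and fclose() on it.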

hedpe 02-09-2006 01:52 AM

thanks for the response paulsm4

I am going to add that extra debugging and print out the memory

unfortunately it takes a little while because the code is running on emulab, and I have to restart the experiment

I wish it would crash locally on my computer, but it only crashes when running on emulab, so i'm not sure of the problem yet

you are correct in assuming fd is declared as FILE *fd;

filename is definitely of sufficient size too

i suspect something is wrong with "data"

hedpe 02-09-2006 02:10 AM

i get this printed out about a hundred times before my seg fault:

Code:

data: 0x0x8082c5e      cl->in_buf: 0x0x8082c59 cl->in_cnt before: 512  after: 507
after this code mod:
Code:

    printf("data: 0x%p\tcl->in_buf: 0x%p\tcl->in_cnt before: %d", data, cl->in_buf, cl->in_cnt);
    cl->in_cnt = cl->in_cnt - (data - cl->in_buf); /* the number of bytes to write to the file will be less
                                                      after stripping out the "numbytes|" part of the data
                                                      so we recalculate subtracting pointers */
    printf("\tafter: %d\n", cl->in_cnt);
    fflush(stdout);


hedpe 02-11-2006 11:59 AM

bump

i've run it in valgrind successfully with no memory leaks or out of bounds errors, i have no clue what the problem is

paulsm4 02-11-2006 12:48 PM

Hi -

At first, this seemed like a really simple problem. But clearly, there's more here than superficially meets the eye.

Modest suggestions:
1. Please add the following instrumentation:

a) Outside of your function (if possible)
Code:

static unsigned long ct = 0;
b) Just before your "fwrite()":
Code:

fprintf (stderr, "data: %p, cl: %p, cl->in_cnt: %d, fp: %p, ct: %lu\n",
    (void *)data, (void *)cl, cl->in_cnt, (void *)fp, ct++);  /* %p for pointers, %lu for unsigned long */
  fwrite(data, sizeof(char), cl->in_cnt, fp);

c) Just after your "fclose()" (and after every other pointer you "fclose()" or "free()"):
Code:

  fclose (fp);
  fp = NULL;

2. The benefit of "fprintf (stderr)" is that stderr is unbuffered, so each message appears immediately. Frankly, "printf()/fflush (stdout)" doesn't always print out everything you need to see before a crash.

3. The "fp" vs "fd" stuff is just an idiom for differentiating between something you "open()" vs something you "fopen()". Just housekeeping. Please humor me.

4. If possible, it'd be interesting to run the SAME test inside of and outside of emulab (I don't know anything about emulab, so I really don't have any advice here).

5. Finally, see if there's any way to instrument your "data" buffer. Valgrind was an excellent idea. Perhaps you can put "canaries" - "sentinel values" - at the start and end of your data buffer and check them each time you read from/write to your buffer?
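
The "canaries" idea in point 5 could look roughly like this. It is only a sketch under assumptions: the struct layout, the 512-byte payload size, the sentinel value and the function names are all hypothetical, and the real "data" buffer may be laid out quite differently.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 512            /* assumed payload size */
#define CANARY   0xDEADBEEFu    /* arbitrary sentinel value */

/* Hypothetical layout: the payload sits between two sentinel words. */
struct guarded_buf {
    unsigned int head;
    char data[BUF_SIZE];
    unsigned int tail;
};

/* Set both sentinels and clear the payload. */
void guarded_init(struct guarded_buf *b)
{
    b->head = CANARY;
    memset(b->data, 0, sizeof b->data);
    b->tail = CANARY;
}

/* Returns 1 if both sentinels are intact, 0 if something overwrote one.
 * "where" is a free-form label so the log shows which check failed. */
int guarded_ok(const struct guarded_buf *b, const char *where)
{
    if (b->head != CANARY || b->tail != CANARY) {
        fprintf(stderr, "buffer corrupted at %s (head=0x%x, tail=0x%x)\n",
                where, b->head, b->tail);
        return 0;
    }
    return 1;
}
```

Calling guarded_ok() right after every read into the buffer and right before every fwrite() narrows down which step first tramples the memory.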

Feel free to contact me directly via e-mail, or continue posting to this LQ thread.

Good luck!

Yours .. PSM

hedpe 02-11-2006 01:05 PM

thanks for your constant suggestions, they are very helpful

what are you suggesting by the "ct" variable? To count the number of fwrites that complete?

paulsm4 02-11-2006 01:22 PM

Yes, exactly. You could also use "ct" to set a conditional breakpoint in gdb (if, for example, it succeeds 99 times, and you don't want to break until just before iteration #100).

hedpe 02-11-2006 01:53 PM

fprintf is definitely helping me see more than i had seen before:
Code:

data: 0x80a5602, cl: 0x80a53e4, cl->in_cnt: 507, fp: 0x83d62f0
data: 0x80a5602, cl: 0x80a53e4, cl->in_cnt: 0, fp: 0x83d62f0

with printfs i never saw an instance where cl->in_cnt was 0, there is a case which I need to look over more carefully, will report back, thanks for the suggestion :)
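
A likely explanation for the difference: stderr is unbuffered by default, while stdout is fully buffered when it is redirected to a file or pipe, so anything still sitting in the stdout buffer is lost when the process dies from a SIGSEGV. If printf() has to be used for debugging, one option is to switch stdout to unbuffered mode. A minimal sketch, where the helper name debug_unbuffer_stdout() is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: call once at the top of main(), before any output.
 * Returns 0 on success, nonzero if the mode could not be changed. */
int debug_unbuffer_stdout(void)
{
    /* _IONBF: every printf() is passed to the operating system immediately. */
    return setvbuf(stdout, NULL, _IONBF, 0);
}
```

This slows output down, so it is something to enable while chasing a crash rather than to leave on permanently.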

paulsm4 02-11-2006 03:37 PM

One other potential "gotcha" - if DOS/Windows is in the mix - is reading/writing binary data without doing an "fopen (myfile, "a+b")".

The "binary" attribute is a no-op on Linux ... but could cause you MUCH grief on DOS or Windows...

hedpe 02-12-2006 05:09 PM

I found the cause! Well, kind of: I know exactly what it's doing now, but I don't know why. We're switching over to a different piece of code, which has new and improved error handling thanks to your suggestion :)

Code:

[gnychis@caching ~]$ cat /local/logs/caching_startcmd.err
Error opening file camera15.tmp: Permission denied
[gnychis@caching Caching]$ ls -l camera15.tmp
-rw-r--r--    1 gnychis  SensorNets    7163 Feb 12 15:57 camera15.tmp

8O

I am actually not sure why it gets permission denied at all yet...

here is the code surrounding the error:
Code:

    /* Append new data to our temporary file */
    sprintf(filename, "%s.tmp", cl->cam->name);
    fp = fopen(filename, "a");

    if(fp == NULL) {
      fprintf(stderr, "Error opening file %s: %s", filename, strerror(errno));
      exit(-1);
    }

    fwrite(data, 1, cl->in_cnt, fp);
    fclose(fp);
    fp = NULL;

That error happened after the code had run for about 15 minutes, and there were about 15 other .tmp files being created and modified without a problem, so it's not something that just happened quickly; it works at times. The same process creates the file, too. There are also no race conditions on the files because it is not a threaded program.

if it were a disk space error, the error message would have said it was out of space, correct?
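
For what it's worth, a full disk is reported as errno ENOSPC ("No space left on device" on glibc), and it normally shows up on the write or the close rather than on fopen(), because stdio buffers the data until it is flushed. A sketch of checking both calls, assuming a hypothetical helper write_and_close() that is not part of the code above:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write len bytes and close the stream.
 * Returns 0 on success, or the errno value of the first failure. */
int write_and_close(FILE *fp, const char *data, size_t len)
{
    int err = 0;

    errno = 0;
    if (fwrite(data, 1, len, fp) != len)
        err = errno ? errno : EIO;

    /* Buffered data is flushed here, so a full disk or an NFS error
     * may only show up as a failing fclose(). */
    if (fclose(fp) != 0 && err == 0)
        err = errno ? errno : EIO;

    if (err == ENOSPC)
        fprintf(stderr, "out of disk space: %s\n", strerror(err));
    else if (err != 0)
        fprintf(stderr, "write failed: %s\n", strerror(err));
    return err;
}
```

On Linux this can be tried against /dev/full, which accepts an open for writing but fails every write with ENOSPC.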

hedpe 02-12-2006 10:02 PM

it turns out it was because my files were on NFS, and i suppose during some sort of NFS packet loss or data inconsistency problem it was seg faulting on fread(), fwrite() and fclose() calls ... oh well!

I'm doing the experiments in /tmp instead :-P


All times are GMT -5.