There is no "Error opening file" printed out on the screen, so fd is NOT null.
Here is my backtrace:
Code:
Program received signal SIGSEGV, Segmentation fault.
0x42062d67 in fwrite () from /lib/tls/libc.so.6
#0 0x42062d67 in fwrite () from /lib/tls/libc.so.6
#1 0x0804c042 in cam_data (c=0x804d980, cl=0x80829a0) at caching.c:1355
#2 0x080497cf in parse_command (c=0x804d980, cl=0x80829a0) at caching.c:307
#3 0x080491a2 in check_data (c=0x804d980) at caching.c:144
#4 0x08048e7b in main () at caching.c:52
#5 0x42015704 in __libc_start_main () from /lib/tls/libc.so.6
If any other information would prove to be helpful let me know and I can get it by making it crash again (easily done!)
1. You want to make sure "data" is either a valid buffer, or a pointer that's correctly initialized to point to a valid buffer, at the time "fwrite" is called. (fwrite() only reads from "data", so the memory has to be valid and readable for the full length you pass.)
2. "sizeof(char)", of course, is always "1".
3. You should also check the value of "cl->in_cnt" when the crash occurs, and make sure that it's no larger than the number of bytes actually available in your buffer starting at "data".
4. What's a "valid buffer"? Any array that you've declared locally, declared statically, or allocated via "malloc()" or "new" (and, of course, have *not* inadvertently freed or deleted before calling "fwrite()").
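To make points 1 and 3 concrete, here is a minimal sketch of a wrapper that checks the arguments before calling fwrite() and checks its return value afterwards. The name "checked_write" and the "buf_avail" parameter are hypothetical, not from the original program; you would pass whatever your code knows about the remaining size of the buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): validate the
 * arguments before handing them to fwrite(), and check its result.
 * Returns 0 on success, -1 on any failure. */
int checked_write(FILE *fp, const char *data, long count, size_t buf_avail)
{
    if (fp == NULL || data == NULL) {
        fprintf(stderr, "checked_write: NULL stream or buffer\n");
        return -1;
    }
    /* A negative or oversized count is the classic way to walk off a buffer. */
    if (count < 0 || (size_t)count > buf_avail) {
        fprintf(stderr, "checked_write: bad count %ld (buffer has %lu bytes)\n",
                count, (unsigned long)buf_avail);
        return -1;
    }
    /* fwrite() returns the number of items written; a short count is an error. */
    if (fwrite(data, 1, (size_t)count, fp) != (size_t)count) {
        perror("fwrite");
        return -1;
    }
    return 0;
}
```

This can't detect a dangling pointer, but it does catch a NULL pointer or a count that has gone negative or run past the end of the buffer.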
Step through the program in the debugger and, when the crash occurs, use the "print" command (e.g. "p cl->in_cnt") and the "x" (examine memory) command (e.g. "x/16 data") to investigate these points.
'Hope that helps .. PSM
PS:
I assume you declared "fd" as "FILE *fd" ("fp" might have been a better choice; "fd" is generally for numeric "file descriptors" instead of stdio "file pointers").
I also assume that "filename" is a character array long enough to hold the actual filename you generated with "sprintf()".
If either assumption is incorrect, that, too, might cause fwrite() to crash...
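For what it's worth, here is a rough sketch of both assumptions in code. It uses snprintf() instead of sprintf() so that a too-short filename buffer is reported rather than overrun. The function name "open_tmp_file" and the "dir/id.tmp" naming scheme are made up for illustration; I don't know how your program actually builds its filenames.

```c
#include <stdio.h>

/* Hypothetical helper: build "<dir>/<id>.tmp" into a caller-supplied buffer
 * and open it. The naming scheme is an assumption for illustration only.
 * Returns the open stream, or NULL on truncation or fopen() failure. */
FILE *open_tmp_file(const char *dir, int id, char *filename, size_t filename_size)
{
    /* snprintf() never writes past filename_size and returns the length it
     * would have needed, so truncation is detectable. */
    int n = snprintf(filename, filename_size, "%s/%d.tmp", dir, id);
    if (n < 0 || (size_t)n >= filename_size) {
        fprintf(stderr, "filename too long for a %lu-byte buffer\n",
                (unsigned long)filename_size);
        return NULL;
    }
    FILE *fp = fopen(filename, "wb");   /* "fp": a stdio FILE pointer */
    if (fp == NULL) {
        perror(filename);
    }
    return fp;
}
```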
printf("data: %p\tcl->in_buf: %p\tcl->in_cnt before: %d", (void *)data, (void *)cl->in_buf, cl->in_cnt);
/* The number of bytes to write to the file will be smaller after stripping
   out the "numbytes|" part of the data, so we recalculate it by
   subtracting pointers. */
cl->in_cnt = cl->in_cnt - (data - cl->in_buf);
printf("\tafter: %d\n", cl->in_cnt);
fflush(stdout);
c) Just after your "fclose()" (and after every other pointer you "fclose()" or "free()"):
Code:
fclose (fp);
fp = NULL;
2. The benefit of "fprintf (stderr)" is that stderr is unbuffered by default. Frankly, "printf()/fflush (stdout)" doesn't always print out everything you need to see.
3. The "fp" vs "fd" stuff is just an idiom for differentiating between something you "open()" vs something you "fopen()". Just housekeeping. Please humor me.
4. If possible, it'd be interesting to run the SAME test inside of and outside of emulab (I don't know anything about emulab, so I really don't have any advice here).
5. Finally, see if there's any way to instrument your "data" buffer. Valgrind was an excellent idea. Perhaps you can put "canaries" - "sentinel values" - at the start and end of your data buffer and check them each time you read from/write to your buffer?
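As a rough illustration of the "canaries" idea in point 5, here is one way it could be sketched. Everything here (the "guarded_buf" type, the function names, the sentinel value) is hypothetical and would have to be adapted to however your "data" buffer is really allocated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY 0xDEADBEEFu   /* arbitrary sentinel value chosen for this sketch */

/* Hypothetical guarded buffer: [canary][size bytes of payload][canary]. */
typedef struct {
    unsigned char *raw;   /* start of the whole allocation */
    size_t size;          /* usable payload size in bytes */
} guarded_buf;

/* Allocate the payload plus one canary on each side. Returns 0 or -1. */
int guarded_alloc(guarded_buf *g, size_t size)
{
    uint32_t c = CANARY;
    g->raw = malloc(size + 2 * sizeof c);
    if (g->raw == NULL) return -1;
    g->size = size;
    memcpy(g->raw, &c, sizeof c);                    /* leading canary */
    memcpy(g->raw + sizeof c + size, &c, sizeof c);  /* trailing canary */
    return 0;
}

/* Pointer to the usable payload area, just past the leading canary. */
unsigned char *guarded_data(const guarded_buf *g)
{
    return g->raw + sizeof(uint32_t);
}

/* Returns 1 if both canaries are intact, 0 if either was overwritten. */
int guarded_ok(const guarded_buf *g)
{
    uint32_t c = CANARY, head, tail;
    memcpy(&head, g->raw, sizeof head);
    memcpy(&tail, g->raw + sizeof c + g->size, sizeof tail);
    return head == c && tail == c;
}

void guarded_free(guarded_buf *g)
{
    free(g->raw);
    g->raw = NULL;
    g->size = 0;
}
```

Calling guarded_ok() right before each read from or write to the buffer (and printing to stderr when it fails) narrows down which operation trampled the memory.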
Feel free to contact me directly via e-mail, or continue posting to this LQ thread.
Yes, exactly. You could also use "ct" to set a conditional breakpoint in gdb (if, for example, it succeeds 99 times, and you don't want to break until just before iteration #100).
With printfs I never saw an instance where cl->in_cnt was 0. There is one case which I need to look over more carefully; I will report back. Thanks for the suggestion.
I found the cause! Well, kind of: I know exactly what it's doing now, but I don't know why. We're switching over to a different piece of code, which has new and improved error handling thanks to your suggestion.
That error happened after the code had run for about 15 minutes, and about 15 other .tmp files were being created and modified without a problem, so it's not something that happens right away; it works at times. The same process also creates the file. There are no race conditions on the files either, because it is not a threaded program.
If it were a disk space problem, the error returned would have been "out of space", correct?
It turns out it was because my files were on NFS. I suppose that during some sort of NFS packet loss or data inconsistency problem, it was segfaulting on fread(), fwrite() and fclose() calls... oh well!
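For anyone who finds this thread later: below is a minimal sketch of the kind of error handling mentioned above, where every stdio call has its return value checked so an I/O failure shows up as a message on stderr. The function name "write_file_checked" is hypothetical, and I can't promise this would have prevented the crashes on NFS; it only makes failures that stdio does report visible.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write len bytes to path, checking every stdio call.
 * Returns 0 on success, or -1 after reporting the failing step on stderr. */
int write_file_checked(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        fprintf(stderr, "fopen(%s): %s\n", path, strerror(errno));
        return -1;
    }
    int rc = 0;
    if (fwrite(data, 1, len, fp) != len) {
        fprintf(stderr, "fwrite(%s): %s\n", path, strerror(errno));
        rc = -1;
    }
    /* Buffered data is flushed here, so write errors that were deferred
     * (for example by a network filesystem) can surface at fclose(). */
    if (fclose(fp) != 0) {
        fprintf(stderr, "fclose(%s): %s\n", path, strerror(errno));
        rc = -1;
    }
    return rc;
}
```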