Recognizing (seemingly) random crashes
My application has (seemingly) random crashes now and then. Sometimes I can get it to run for 3 days straight without a crash, but occasionally it crashes after I have left it running for only a few hours. I don't know why it crashes, and I can't reproduce the crash on demand. How do I trap and debug the application to track down these random crashes? Any ideas? Thanks in advance.
|
Are you working in C or C++ under Linux? Is the crash a segmentation fault? I assume so, because that's the typical situation I would expect in this forum; tell me if this is not the case.
You should make sure to compile your app with the -g flag to gcc, and run the command ulimit -c 1024 in the same shell before launching your application. The next time your program crashes, it should leave a file (usually named 'core') containing a memory image of the crashed process. If the dump turns out truncated, ulimit -c unlimited lifts the size limit. You may then run the command gdb <your executable file> core, and it will show you where it crashed. Additionally, you may type the command bt inside gdb to see what was on the stack at the moment of the crash. It prints the chain of function calls that were active and the values of their parameters. Hope it helps. |
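Because a full-screen application is often started from a launcher or script where the ulimit setting is easy to forget, the same limit can also be raised from inside the program. This is only a minimal sketch, assuming Linux/POSIX; the helper name enable_core_dumps is made up for illustration and is not part of SDL or of the application discussed here.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE
#include <cstdio>          // std::perror

// Hypothetical helper: raise the soft core-file size limit of this process
// up to the hard limit, which is roughly what `ulimit -c` does in the shell.
// Returns true on success.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit(RLIMIT_CORE)");
        return false;
    }
    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit(RLIMIT_CORE)");
        return false;
    }
    return true;
}
```

Calling it once at the top of main() would be enough. If the hard limit itself is 0, no core file can be written this way and the limit has to be changed outside the program.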
Yes, I'm using C++ with the SDL library. I don't know whether it's a segmentation fault or not, because it freezes in SDL (or maybe the X server). My application has a loop with drawing and event-handling routines, and it recognizes the ESC key to exit the application. But when it crashes, I can't even exit with the ESC key. Unfortunately, the application runs in full screen, so the terminal console doesn't even show. Maybe I'll try using the core file. Thanks a lot.
|
Yeah, I've also had problems when running SDL applications in full-screen mode. It appears that keyboard input focus is set to no window so keyboard actions do nothing.
It's difficult to know whether your application actually exits or not. If your problem is a segmentation fault then it should terminate; otherwise you should suspect an infinite loop or something similar. I know, it's not easy because you see nothing on screen. Next time it happens, try switching to a virtual console by pressing Ctrl-Alt-F1; you should get a working text-mode terminal where you can use ps to see if your program is still running. You can also kill your program or the X server (it should restart automatically in most distributions), which may give you control back. Another good piece of advice is to read something about gdb or its front-end 'ddd'. You will need to set breakpoints and step through the code if you do indeed have an infinite loop. |
Thanks. |
Sorry to bump this pretty old thread of mine. The reason is that the random crash is still happening. Even worse, it now occasionally freezes.
The crash usually happens inside an SDL_Thread. The thread is for printing with LaTeX and CUPS while the main application does some sort of animation and progress bar. This is what's inside the thread's function:
Code:
int CMainApp::printNow(void *data)
Code:
void CMainApp::createTempLatex()
Code:
void CMainApp::createTempPostScript()
I don't know what's wrong with it, but this thread function randomly generates crashes. Often it works fine for a few printings, but every now and then it crashes, and the crash is always in this printing routine. The crash itself is not always a segmentation fault. This is what usually happens when the application crashes:
1. Back to the terminal with a segmentation fault.
2. It hangs. The animation and progress bar stop working, and so does the event handler. It's as if the whole system freezes.
3. Infinite loop. The event handler, animation, and progress bar keep working, but because the application needs the thread to finish before going to the next state, it stays in the current state (the printing state). This always happens if the thread is still busy in one of the system calls when SDL_WaitThread() has been called.
Can anybody give me some help here? How do you usually solve this kind of problem? Thanks in advance. |
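For the third case (the thread never finishes because one of its external commands hangs), one possible approach is to run the command with a time limit instead of a plain system() call, so that the printing thread always returns. The following is only a sketch, assuming Linux/POSIX and /bin/sh; run_with_timeout is a hypothetical name, and the return codes -1 and -2 are conventions chosen for this example.

```cpp
#include <sys/types.h>
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, execl, _exit, setpgid, usleep
#include <signal.h>     // kill, SIGKILL
#include <chrono>

// Hypothetical helper: run `command` through /bin/sh like system() does,
// but give up after `timeout`.
// Returns the exit status (0-255), -1 if the command could not be started
// or was ended by a signal, and -2 if it was killed because of the timeout.
int run_with_timeout(const char* command, std::chrono::milliseconds timeout) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: own process group, so the shell and anything it starts
        // can be killed together.
        setpgid(0, 0);
        execl("/bin/sh", "sh", "-c", command, static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    setpgid(pid, pid);  // also set it from the parent to avoid a race; errors ignored

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);  // poll without blocking
        if (r == pid) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0) return -1;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(-pid, SIGKILL);       // negative pid: the whole process group
            waitpid(pid, &status, 0);  // reap the child so no zombie is left
            return -2;
        }
        usleep(10 * 1000);  // wait 10 ms before polling again
    }
}
```

A call such as run_with_timeout("latex temp.tex", std::chrono::seconds(60)) would then report a hang as -2 instead of blocking forever; the file name and the 60 second limit are placeholders, not values from the application.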
I notice that you are not checking whether the system() calls are successful. If for some reason one of them fails (the latex call, for example), then the temp file may not be generated, which may cause a problem down the line...
When you get a segfault, which line does it occur in? |
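As an illustration of checking a system() call, here is a minimal sketch, assuming a POSIX system where the value returned by system() can be decoded with the wait macros. The function name run_checked is hypothetical.

```cpp
#include <cstdlib>      // std::system
#include <sys/wait.h>   // WIFEXITED, WEXITSTATUS
#include <iostream>
#include <string>

// Hypothetical helper: run a shell command and report whether it
// exited normally with status 0. Failures are logged to stderr.
bool run_checked(const std::string& command) {
    const int status = std::system(command.c_str());
    if (status == -1) {
        std::cerr << "could not start: " << command << '\n';
        return false;
    }
    if (!WIFEXITED(status)) {
        std::cerr << "ended abnormally (signal?): " << command << '\n';
        return false;
    }
    const int code = WEXITSTATUS(status);
    if (code != 0) {
        std::cerr << "exit status " << code << ": " << command << '\n';
        return false;
    }
    return true;
}
```

The printing thread could then stop at the first failed step, for example by returning early when run_checked() is false for the latex command, rather than going on to process a file that was never produced.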
I agree that checking errors is necessary. Not only may the system() calls fail, but so may the creation of the temp file.
In the book The Unix Programming Environment, Kernighan and Pike say that every single error generated by every single call that may fail must be checked. This greatly reduces debugging time, and also helps the user solve problems (e.g. disk full, file permissions, etc.). BTW, when you wrote:
Code:
fstream<<"\\begin{document}"<<endl;
Another thing that I would do is add some debug output:
Code:
#define DEBUG
Now I'll try to guess something :p I've experienced segmentation faults when trying to write to a file that hadn't been successfully opened. Good luck. |
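To make the two suggestions concrete (check that the temp file really opened, and add debug output), here is a small sketch. It is not the original code from this thread: the DEBUG_LOG macro, the function write_latex_file, and the document contents are all made up for illustration.

```cpp
#include <fstream>
#include <iostream>
#include <string>

#define DEBUG 1
#if DEBUG
// Hypothetical debug macro: prints file and line along with the message.
#define DEBUG_LOG(msg) (std::cerr << __FILE__ << ':' << __LINE__ << ": " << msg << std::endl)
#else
#define DEBUG_LOG(msg) ((void)0)
#endif

// Hypothetical helper: write a tiny LaTeX file and report whether
// both the open and the writes succeeded.
bool write_latex_file(const std::string& path, const std::string& body) {
    std::ofstream filestr(path.c_str());
    if (!filestr) {  // open failed: bad path, permissions, ...
        DEBUG_LOG("cannot open " << path);
        return false;
    }
    filestr << "\\documentclass{article}" << std::endl
            << "\\begin{document}" << std::endl
            << body << std::endl
            << "\\end{document}" << std::endl;
    filestr.close();
    if (!filestr) {  // a write or the close failed: disk full, I/O error, ...
        DEBUG_LOG("write failed for " << path);
        return false;
    }
    DEBUG_LOG("wrote " << path);
    return true;
}
```

Since the application runs full screen, redirecting stderr to a file when launching it (or logging to a file instead of std::cerr) would keep these messages available after a freeze.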
@nacio
Oops, right, that one is a typo. It should be filestr. Silly me :D
@graemef
Well, I forget which line, but I think it's definitely one of the lines in the thread function.
Yeah, both of you are right. Actually, a lot more of the code isn't error checked, mostly because I was sure it wouldn't generate an error (and because I've gotten a little lazy). The parts I did error check were the ones that changed a lot and were more error prone (e.g. loading the image assets for SDL, opening and manipulating data files, etc.).
Okay, maybe that would solve the seg fault issue. But what about the freezes / hangs? One of the random occurrences was a random freeze. I think it's not a memory leak issue, because I tested the application for 3 days straight, non-stop, and it worked fine. But when my boss tried it, the application froze after being on for only a few hours. The animation froze, the event handler didn't work, the application was just plain dead. What I can't figure out is what triggers the randomness. I did the same things my boss did. Now I am just trying to recreate the crash. Thanks a lot guys. |
Just a thought, but what processes are running when it freezes? Can you determine whether it is stuck in a process started by system()?
|
Well, just the other day it crashed again. Right after I ran it under valgrind, I tried executing it again. But it crashed before it had even entered the main loop (or it had entered it but froze). Just a blank screen. And because the keyboard wasn't working either, I just rebooted with the on/off switch. After that I tried it again, fresh from the new boot. The application worked for a while, but this morning I found it frozen again. This time it wasn't in a system call / another thread, but rather in the main GUI animation thread (the SDL GUI that runs before printing with the LaTeX system calls). And as usual, I just rebooted with the on/off switch. So I think the seg faults are always in the printing thread (especially in the LaTeX system calls), but the freezes can happen pretty much anywhere in the application.
BTW, can a freeze like that generate a core dump file? |
Actually, a "hang" might be even better for you than a core file.
GDB allows you to attach to a live process. Once you've attached, you can use "where" to get a traceback and determine exactly where the hang is occurring. Here are two links that explain further:
http://www-128.ibm.com/developerwork...ix-strace.html
http://www.network-theory.co.uk/docs...cintro_76.html
Hope that helps .. PSM |
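When nobody is around to attach a debugger at the moment of the hang, a frozen main loop can also be turned into a core dump automatically. The following watchdog is only a sketch under several assumptions: Linux/POSIX, core dumps enabled, and nothing else in the process (including the SDL build in use) relying on SIGALRM or alarm(). The names install_watchdog and pet_watchdog are hypothetical.

```cpp
#include <signal.h>   // sigaction, SIGALRM
#include <unistd.h>   // alarm
#include <cstdlib>    // std::abort

// Runs when the alarm expires, i.e. the main loop has not reported in.
// abort() raises SIGABRT, which produces a core dump if the limit allows it.
extern "C" void on_watchdog_timeout(int) {
    std::abort();
}

// Hypothetical helper: install the SIGALRM handler once at startup.
bool install_watchdog() {
    struct sigaction sa = {};
    sa.sa_handler = on_watchdog_timeout;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, nullptr) == 0;
}

// Hypothetical helper: call once per main-loop iteration to restart the
// countdown. Passing 0 disables the watchdog. Returns the number of seconds
// that were left on the previous countdown.
unsigned pet_watchdog(unsigned seconds) {
    return alarm(seconds);
}
```

With install_watchdog() called at startup and something like pet_watchdog(30) at the top of every loop iteration, a loop that stalls for more than 30 seconds ends in a core file that can be opened with gdb as described earlier in the thread. The 30 second value is a placeholder and must be longer than any legitimate pause in the loop.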
Thanks, will look at those links. I'm still a newb concerning bug hunting like this. Especially in Linux. Thanks a lot.
|