LinuxQuestions.org


g4j31a5 03-20-2007 08:35 PM

Recognizing (seemingly) random crashes
 
My application crashes (seemingly) at random now and then. Sometimes I can get it to run for 3 days straight without a crash, but occasionally it crashes after being left alone for only a few hours. I don't know why it crashes, and I can't reproduce the crash when I want to. How do I trap and debug the application to track down these random crashes? Any ideas? Thanks in advance.

nacio 03-21-2007 07:12 AM

Are you working in C or C++ under Linux? Is the crash a segmentation fault? I assume so, because that's the typical situation I would expect in this forum; tell me if this is not the case.

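You should make sure to compile your app with the -g flag to gcc (or g++), and run the command ulimit -c unlimited before launching your application (a small limit such as ulimit -c 1024 may truncate the core file). The next time your program crashes, it should leave a file named 'core' (or core.<pid>, depending on the system) containing debug info.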

You can then run the command gdb <your executable file> core, which will show you where it crashed. You can also type the command bt inside gdb to see what was on the stack at the moment of the crash. It prints the names of all the functions that were called and the values of their parameters.
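The ulimit setting can also be mirrored from inside the program, which helps when the app is started from a desktop launcher rather than from a shell. A minimal sketch, assuming Linux/POSIX; enableCoreDumps() is a hypothetical helper name, not part of SDL or gcc:

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE
#include <cassert>
#include <cerrno>
#include <cstring>         // std::strerror
#include <iostream>

// Hypothetical helper: raises the soft core-file size limit as far as the
// hard limit allows, the in-program counterpart of "ulimit -c unlimited".
// Returns false (and prints the reason) if either system call fails.
bool enableCoreDumps()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::cerr << "getrlimit: " << std::strerror(errno) << '\n';
        return false;
    }
    lim.rlim_cur = lim.rlim_max;   // unprivileged processes may raise soft up to hard
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::cerr << "setrlimit: " << std::strerror(errno) << '\n';
        return false;
    }
    return true;
}
```

Calling it at the top of main(), before SDL_Init(), would be the natural spot. It can only raise the soft limit up to the hard limit, so if the hard limit is 0 the limit has to be raised outside the program first.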

Hope it helps.

g4j31a5 03-21-2007 11:02 PM

Yes, I'm using C++ with the SDL library. I don't know whether it's a segmentation fault or not, because it freezes in SDL (or maybe in the X server). My application has a loop for drawing and event handling, and it recognizes the ESC key to exit the application. But when it crashes, I can't even exit with ESC. Unfortunately, the application runs in full screen, so the terminal console isn't even visible. Maybe I'll try using the core file. Thanks a lot.

nacio 03-22-2007 07:07 AM

Yeah, I've also had problems when running SDL applications in full-screen mode. It appears that keyboard input focus is set to no window so keyboard actions do nothing.

It's difficult to know whether your application actually exits or not. If your problem is a segmentation fault, then it should terminate; otherwise you should suspect an infinite loop or something similar. I know, it's not easy because you see nothing on screen.
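One way to tell a stuck main loop from a running one without seeing the screen is to have the loop record a "heartbeat" and let a separate thread log when it stops. A rough sketch, assuming C++11 threads; Heartbeat, startWatchdog and the log path are hypothetical names chosen for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>

// Records when the main loop last made progress. The main loop calls
// beat() once per iteration; any other thread may call stalled().
class Heartbeat {
public:
    Heartbeat() { beat(); }

    void beat() { last_.store(nowMs(), std::memory_order_relaxed); }

    // True if beat() has not been called for longer than `limit`.
    bool stalled(std::chrono::milliseconds limit) const {
        return nowMs() - last_.load(std::memory_order_relaxed) > limit.count();
    }

private:
    static long long nowMs() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }
    std::atomic<long long> last_{0};
};

// Starts a detached thread that appends a line to logPath whenever the main
// loop has been silent for longer than `limit`. hb must live until the
// process exits (e.g. a global or an object in main()).
void startWatchdog(const Heartbeat& hb, std::chrono::milliseconds limit, const char* logPath)
{
    std::thread([&hb, limit, logPath] {
        for (;;) {
            std::this_thread::sleep_for(limit);
            if (hb.stalled(limit)) {
                if (std::FILE* f = std::fopen(logPath, "a")) {
                    std::fprintf(f, "main loop stalled for more than %lld ms\n",
                                 static_cast<long long>(limit.count()));
                    std::fclose(f);
                }
            }
        }
    }).detach();
}
```

The main loop would call hb.beat() once per frame. If the log fills with stall messages, the main loop itself is stuck; if it stays quiet while the screen looks frozen, the loop is probably still running and the problem more likely lies in drawing or input handling.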

Next time it happens, try switching to a virtual console by pressing Ctrl-Alt-F1. You should get a working text-mode terminal where you can use ps to check whether your program is still running. You can also kill your program or the X server (which should restart automatically on most distributions); that may give you control back.

Another good piece of advice is to read up on gdb or its front-end 'ddd'. If you do indeed have an infinite loop, you will need to set breakpoints and step through the code.

g4j31a5 04-15-2007 11:21 PM


Sorry for the late response; I've been busy with some other issues. True, it should have terminated if it were a segmentation fault. But then again, if it were an infinite loop, the event handler should still be working. The crash always occurs while the application is waiting for an SDL thread to finish. Maybe I'll just have to try creating the core dump file.
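Since the trouble shows up while waiting for a thread, one option is to never block on it: check a "done" flag each frame and only join once it is set. The sketch below uses std::thread and std::atomic as stand-ins for SDL_Thread (an assumption made to keep the example self-contained); BackgroundJob is a hypothetical class:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Runs a job on a worker thread and lets the caller poll for completion
// each frame instead of blocking in a join (the std::thread counterpart
// of SDL_WaitThread).
class BackgroundJob {
public:
    template <typename F>
    explicit BackgroundJob(F job)
        : worker_([this, job]() mutable {
              result_ = job();
              done_.store(true, std::memory_order_release);
          })
    {}

    ~BackgroundJob() { if (worker_.joinable()) worker_.join(); }

    // Cheap, non-blocking check the main loop can call every frame.
    bool finished() const { return done_.load(std::memory_order_acquire); }

    // Call once finished() is true; the join then returns immediately.
    int result() { if (worker_.joinable()) worker_.join(); return result_; }

private:
    std::atomic<bool> done_{false};
    int result_ = 0;
    std::thread worker_;   // declared last so done_ and result_ exist before it starts
};
```

In the printing state the main loop would keep drawing and handling events, and move on only when finished() returns true. This keeps the GUI responsive, but it does not help if the job itself never finishes, for example if an external command it runs is stuck.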

Thanks.

g4j31a5 05-06-2007 10:46 PM

Sorry to bump this rather old thread of mine, but the random crash is still happening. Even worse, the application now occasionally freezes.

The crash usually happens inside an SDL_Thread. The thread handles printing with LaTeX and CUPS while the main application shows an animation and a progress bar. This is what's inside the thread's function:
Code:

int CMainApp::printNow(void *data)
{
    /// @todo implement me
  CMainApp *pMyApp=static_cast<CMainApp*>(data);
  pMyApp->createTempLatex(); //Create temporary latex file
  pMyApp->createTempPostScript(); //Create temporary postscript file
  pMyApp->printPostScriptFile(); //Print with CUPS
  return 1;
}

createTempLatex() simply writes out a LaTeX file, like this:

Code:

void CMainApp::createTempLatex()
{
  fstream filestr;
  filestr.open ("./temp-print.tex", fstream::out);
  fstream<<"\\begin{document}"<<endl;
  .....  //The body of the LaTeX file
  fstream<<"\\end{document}"<<endl;
  filestr.close();
}

Inside the createTempPostScript() function there are two system() calls: one converts the LaTeX file to a DVI file, and the other converts the DVI file to a PS file:

Code:

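void CMainApp::createTempPostScript()
{
    // Options go before the file name: latex stops reading options at the
    // file name, so a trailing -halt-on-error would not be treated as an option.
    // nonstopmode keeps latex from waiting for keyboard input after an error.
    system("latex -interaction=nonstopmode -halt-on-error temp-print.tex");
    system("dvips -Pcmz -t landscape -o temp-print.ps temp-print");
}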

printPostScriptFile() sends the PS file to the CUPS spooling queue for printing.

I don't know what's wrong with it, but this thread function crashes at random. It often works fine for a few print jobs, but every now and then it crashes, and the crash is always in this printing routine.

The crash itself is not always a segmentation fault. This is what usually happens when the application crashes:
1. It drops back to the terminal with a segmentation fault.
2. It hangs. The animation and progress bar stop, and the event handler stops responding too. It's as if the whole system freezes.
3. It loops forever. The event handler, animation, and progress bar keep working, but because the application needs the thread to finish before moving to the next state, it stays in the current (printing) state. This always happens when SDL_WaitThread() has been called while the thread is still busy in one of the system calls.

Can anybody give me some help here? How do you usually solve this kind of problem? Thanks in advance.
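Symptom 3 suggests that one of the external commands sometimes never returns. A sketch of running a command with a time limit, assuming a POSIX system; runWithTimeout() is a hypothetical helper that uses fork()/execvp()/waitpid() instead of system(), so no shell is involved and the child can be killed if it hangs:

```cpp
#include <sys/types.h>
#include <sys/wait.h>   // waitpid, WNOHANG, WIFEXITED, WEXITSTATUS
#include <signal.h>     // kill, SIGKILL
#include <unistd.h>     // fork, execvp, _exit
#include <cassert>
#include <cerrno>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

// Runs args[0] with the given arguments and waits at most `timeout`.
// Returns the exit status, -1 if it could not be run or did not exit
// normally, or -2 if it timed out and was killed.
int runWithTimeout(const std::vector<std::string>& args,
                   std::chrono::milliseconds timeout)
{
    if (args.empty()) return -1;

    // Build argv before fork() so the child only has to call execvp().
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;                 // could not create the child
    if (pid == 0) {                         // child: become the command
        execvp(argv[0], argv.data());
        _exit(127);                         // only reached if exec failed
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) break;                // child has exited
        if (r < 0 && errno != EINTR) return -1;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);             // give up on a stuck child
            waitpid(pid, &status, 0);       // reap it so no zombie is left
            return -2;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, runWithTimeout({"latex", "-interaction=nonstopmode", "-halt-on-error", "temp-print.tex"}, std::chrono::seconds(60)). Note that this sketch only kills the direct child; any helper programs the command starts itself are not killed.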

graemef 05-07-2007 09:29 PM

I notice that you are not checking whether the system() calls succeed. If, for example, the latex call fails, the temp file (the DVI file) may not be generated, which may cause a problem further down the line...
When you get a segfault, which line does it occur on?
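To follow up on checking the system() calls: system() returns -1 if it could not create the child process, and otherwise a wait status that the POSIX macros WIFEXITED and WEXITSTATUS decode. A minimal sketch; runChecked() is a hypothetical helper:

```cpp
#include <sys/wait.h>   // WIFEXITED, WEXITSTATUS
#include <cassert>
#include <cstdlib>      // std::system
#include <iostream>

// Runs cmd through the shell and reports failures instead of ignoring them.
// Returns true only if the command ran and exited with status 0.
bool runChecked(const char* cmd)
{
    int status = std::system(cmd);
    if (status == -1) {
        std::cerr << "could not start: " << cmd << '\n';
        return false;
    }
    if (!WIFEXITED(status)) {
        std::cerr << "terminated abnormally: " << cmd << '\n';
        return false;
    }
    if (WEXITSTATUS(status) != 0) {
        std::cerr << "exit status " << WEXITSTATUS(status) << ": " << cmd << '\n';
        return false;
    }
    return true;
}
```

createTempPostScript() could then return bool and stop as soon as latex fails, instead of running dvips on a DVI file that may not exist.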

nacio 05-08-2007 10:56 AM

I agree that checking errors is necessary. Not only can the system() calls fail, so can the creation of the temp file.
In his book The UNIX Programming Environment, Kernighan says every single error from every single call that may fail must be checked. This greatly reduces debugging time, and also helps the user solve problems (e.g. a full disk, file permissions, etc.).

BTW, when you wrote:
Code:

fstream<<"\\begin{document}"<<endl;
didn't you mean filestr instead of fstream?
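With the typo fixed, an error-checked version of the file writer might look like this. A sketch only: it is written as a hypothetical free function, writeTempLatex(), so it compiles on its own; `body` stands in for the elided LaTeX body, and the \documentclass line is an assumption since the original preamble was not shown:

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Writes a minimal LaTeX document to path and reports whether opening,
// writing and closing all succeeded.
bool writeTempLatex(const std::string& path, const std::string& body)
{
    std::ofstream filestr(path);
    if (!filestr) {
        std::cerr << "cannot open " << path << " for writing\n";
        return false;
    }
    filestr << "\\documentclass{article}\n"   // assumed preamble
            << "\\begin{document}\n"
            << body << '\n'
            << "\\end{document}\n";
    filestr.close();   // flushes; a failed write or close sets the stream's error state
    if (!filestr) {
        std::cerr << "error while writing " << path << '\n';
        return false;
    }
    return true;
}
```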

Another thing I would do is add some debug output:
Code:

#define DEBUG

int CMainApp::printNow(void *data)
{
    /// @todo implement me
  CMainApp *pMyApp=static_cast<CMainApp*>(data);
  pMyApp->createTempLatex(); //Create temporary latex file
#ifdef DEBUG
  cerr <<"Latex file created\n";
#endif
  pMyApp->createTempPostScript(); //Create temporary postscript file
#ifdef DEBUG
  cerr <<"PostScript file created\n";
#endif
  pMyApp->printPostScriptFile(); //Print with CUPS
#ifdef DEBUG
  cerr <<"File printed\n";
#endif
  return 1;
}

When you've finished debugging your application, just remove the #define line; that has the same effect as removing all the debug output. This way you can at least find out where your application gets caught in the infinite loop.
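A variation on the #ifdef idea is a macro that also records the file and line, so each message shows exactly where it came from. A sketch using only standard C++; TRACE, TRACE_TO and traceImpl are hypothetical names:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

#define DEBUG   // delete this line to compile all TRACE calls away

// Writes "file:line: message" to os. std::endl flushes, so the line is
// written out immediately even if the program crashes right afterwards.
inline void traceImpl(std::ostream& os, const char* file, int line,
                      const std::string& msg)
{
    os << file << ':' << line << ": " << msg << std::endl;
}

#ifdef DEBUG
#define TRACE_TO(os, msg) traceImpl((os), __FILE__, __LINE__, (msg))
#else
#define TRACE_TO(os, msg) ((void)0)
#endif

// Convenience form that writes to std::cerr.
#define TRACE(msg) TRACE_TO(std::cerr, msg)
```

Inside printNow() this would read TRACE("Latex file created");. Since the app runs full screen, starting it with stderr redirected to a file (for a hypothetical binary name, ./myapp 2> debug.log) lets you read the last message afterwards from a virtual console.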

Now I'll try a guess :p I've experienced segmentation faults when trying to write to a file that hadn't been opened successfully.

Good luck.

g4j31a5 05-09-2007 10:54 PM

@nacio
Oops, right, that one is a typo. It should be filestr. Silly me :D

@graemef
Well, I forget which one, but I'm sure it's one of the lines in the thread function.

Yeah, you're both right. Actually, a lot more of the code isn't error checked, mostly because I was sure it wouldn't generate an error (and because I've gotten a little lazy). The parts I did error check were the ones that changed a lot and were more error prone (e.g. loading the image assets for SDL, and opening and manipulating data files).

Okay, maybe that would solve the segfault issue. But what about the freezes / hangs? One of the random occurrences was a random freeze. I don't think it's a memory leak, because I've tested the application for 3 days straight, non-stop, and it worked fine. But when my boss tried it, the application froze after being on for only a few hours. The animation froze, the event handler didn't work; the application was just plain dead. What I can't figure out is what triggers the randomness, since I did exactly what my boss did. For now I'm just trying to reproduce the crash.

Thanks a lot guys.

graemef 05-10-2007 05:01 PM

Just a thought, but what processes are running when it freezes? Can you determine whether it is stuck in a process started by system()?

g4j31a5 05-13-2007 09:16 PM

Well, just the other day it crashed again. Right after running it under valgrind, I tried executing it again, but it crashed before it had even entered the main loop (or it had entered it but froze). Just a blank screen. Because the keyboard wasn't working either, I rebooted with the power switch. Then I tried again, fresh after the reboot. The application worked for a while, but this morning I found it frozen again. This time it wasn't in the system call / the other thread, but in the main GUI animation thread (the SDL GUI shown before printing with the LaTeX system calls). As usual, I rebooted with the power switch. So I think the segfaults are always in the printing thread (especially in the LaTeX system calls), but the freezes can happen pretty much anywhere in the application.
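Because the app runs full screen, a crash handler that writes a backtrace to a log file can be read afterwards from a virtual console. A sketch assuming glibc, whose <execinfo.h> provides backtrace() and backtrace_symbols_fd(); installCrashHandler(), writeBacktrace() and the log path are hypothetical. Strictly speaking backtrace() is not async-signal-safe, so treat this as a debugging aid rather than production code:

```cpp
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)
#include <fcntl.h>      // open
#include <signal.h>     // signal, raise
#include <unistd.h>     // write, lseek, close, unlink, STDERR_FILENO
#include <cassert>

// Where the handler writes; opened in advance, never inside the handler.
static int g_crashLogFd = STDERR_FILENO;

// Writes the current call stack, one frame per line, to fd.
void writeBacktrace(int fd)
{
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);
}

// On a fatal signal: log the stack, restore the default action and re-raise,
// so the process still terminates and a core file can still be written.
extern "C" void crashHandler(int sig)
{
    static const char msg[] = "fatal signal, backtrace:\n";
    ssize_t ignored = write(g_crashLogFd, msg, sizeof msg - 1);
    (void)ignored;
    writeBacktrace(g_crashLogFd);
    signal(sig, SIG_DFL);
    raise(sig);
}

// Opens logPath for appending and installs crashHandler for common crash signals.
bool installCrashHandler(const char* logPath)
{
    int fd = open(logPath, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    g_crashLogFd = fd;

    // The first backtrace() call may load libgcc; do it now, not in the handler.
    void* warmup[1];
    backtrace(warmup, 1);

    const int sigs[] = {SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT};
    for (int sig : sigs) signal(sig, crashHandler);
    return true;
}
```

Linking with -rdynamic usually makes function names appear in the log instead of bare addresses. The handler only runs on fatal signals, so it helps with the segfaults but not with the freezes.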

BTW, can a freeze like that generate a core dump file?

paulsm4 05-13-2007 11:19 PM

Actually, a "hang" might be even better for you than a core file.

GDB allows you to attach to a live process.

Once you've attached, you can use "where" to get a traceback and determine exactly where the hang is occurring.

Here are two links that explain further:

http://www-128.ibm.com/developerwork...ix-strace.html

http://www.network-theory.co.uk/docs...cintro_76.html

'Hope that helps .. PSM

g4j31a5 05-16-2007 03:45 AM

Thanks, I'll look at those links. I'm still a newbie at bug hunting like this, especially on Linux. Thanks a lot.


All times are GMT -5.