LinuxQuestions.org

12-27-2009, 01:56 AM   #1
aryan1
Member
 
Registered: Jul 2009
Posts: 50

Rep: Reputation: 16
Best methodology to debug a Linux daemon written in C++


Hi All,

I have a bunch of C++ applications which run as daemons in Linux.

Furthermore, since these daemons rely on each other, they use IPC (inter-process communication) to communicate with each other.

These daemons are compiled with the -g -O0 compiler flags and started in the right order by a shell script.

These daemons seem to be buggy and "sometimes" crash.

Another possible cause of the crashes is corrupted input data that they process.

To find the real cause, I tried debugging them with both gdb and Valgrind.

I used the --leak-check=yes and --log-file=foo.txt options with Valgrind.

However, Valgrind did not report any invalid read/write or uninitialized-variable errors that might have caused the crash. The FAQ on Valgrind's official website says this is inherent to how Valgrind works: it cannot exactly replicate the native execution environment.

What are the best debugger options to use with Valgrind or gdb to debug a Linux daemon and find out exactly what is happening in it at the time of the crash?

Should I attach Valgrind or gdb to the already running process, or is it OK to start the daemon under Valgrind in the shell script?

Thanks.
 
12-27-2009, 07:18 AM   #2
GooseYArd
Member
 
Registered: Jul 2009
Location: Reston, VA
Distribution: Slackware, Ubuntu, RHEL
Posts: 183

Rep: Reputation: 46
Quote:
Originally Posted by aryan1
Hi All,


Should I attach Valgrind or gdb to the already running process, or is it OK to start the daemon under Valgrind in the shell script?

Thanks.
I would probably save valgrind until after you've identified where the crash is happening.

Are the daemons set up to produce a core file when they crash? Rather than attach gdb to all the pids, it's probably easiest just to make sure that they dump core when they crash, and then load the core into gdb after the crash (gdb <daemon binary> <core file>). You'll be able to use bt to get a backtrace at the time of the crash, without actually having to start up another instance of the daemon.

Do your daemons run setuid as some other user? If so, it's a little tricky to get the kernel to dump core for you: it requires a slight modification to the code. There's a syscall called prctl (call it with the option PR_SET_DUMPABLE and the value 1) to get it to dump core.
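A minimal sketch of that modification, assuming the daemon changes its user or group ID early in main(); enable_core_dumps() is a hypothetical helper name. It raises the soft RLIMIT_CORE limit to the hard limit (the in-process counterpart of ulimit -c) and sets the dumpable flag back to 1, since per the prctl(2) man page a change of user or group ID resets that flag to the value in /proc/sys/fs/suid_dumpable, which is 0 by default.

```cpp
#include <sys/prctl.h>
#include <sys/resource.h>

// Hypothetical helper: call it early in main(), after any setuid()/setgid(),
// so that a later crash still leaves a core file behind.
// Returns true if both settings were applied.
bool enable_core_dumps() {
    // In-process counterpart of "ulimit -c": raise the soft core-file limit
    // to the hard limit (raising the hard limit itself needs privileges).
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) return false;

    // A change of user or group ID resets the "dumpable" flag to
    // /proc/sys/fs/suid_dumpable (0 by default); set it back to 1.
    return prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == 0;
}
```

If the hard limit is 0, the shell script (or a privileged parent) has to raise it before the daemon starts.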
 
12-27-2009, 11:37 AM   #3
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
If it doesn't crash under valgrind, try under gdb. When it crashes, you can use the 'bt' command and see which path it used. If it crashes only sometimes, however, it would be good to try to get a scenario when it crashes more often. For instance, try to provide incorrect input data.
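One way to act on that suggestion without hand-crafting bad input is a small mutation loop: take a known-good input, corrupt a few random bytes (sometimes truncating it as well), and pass each copy to the input handler. This is a sketch, assuming the daemon's input parsing can be called as a function from a separate test program; process_message() is a hypothetical placeholder for that function and fuzz_with_corruptions() is a hypothetical name. A fixed seed makes a crash reproducible, so the same run can be repeated under gdb or Valgrind.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the daemon's real input handler; replace it with
// the function that parses one record/message. It should survive any input.
bool process_message(const std::vector<std::uint8_t>& msg) {
    return !msg.empty() && msg[0] == 0x01;  // placeholder check only
}

// Feeds randomly corrupted copies of a known-good sample to process_message()
// and returns how many of them it accepted.
std::size_t fuzz_with_corruptions(const std::vector<std::uint8_t>& good_sample,
                                  unsigned seed, std::size_t iterations) {
    std::mt19937 rng(seed);  // same seed => same sequence of corrupted inputs
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<std::uint8_t> msg = good_sample;
        if (!msg.empty()) {
            // Overwrite between 1 and 4 randomly chosen bytes.
            std::uniform_int_distribution<std::size_t> pos(0, msg.size() - 1);
            std::uniform_int_distribution<int> byte(0, 255);
            int flips = 1 + static_cast<int>(rng() % 4);
            for (int f = 0; f < flips; ++f) {
                std::size_t p = pos(rng);
                msg[p] = static_cast<std::uint8_t>(byte(rng));
            }
        }
        // About one time in eight, also truncate to exercise length handling.
        if (rng() % 8 == 0) msg.resize(rng() % (msg.size() + 1));
        if (process_message(msg)) ++accepted;
    }
    return accepted;
}
```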
 
12-27-2009, 11:53 AM   #4
aryan1
Member
 
Registered: Jul 2009
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by Mara
If it doesn't crash under valgrind, try under gdb. When it crashes, you can use the 'bt' command and see which path it used.
Actually, I already tried it under gdb.

However, gdb seems not to be able to catch the operation which causes the application to crash; it says "program exited normally" instead, and "bt" produces no results.

Quote:
If it crashes only sometimes, however, it would be good to try to get a scenario when it crashes more often. For instance, try to provide incorrect input data.
I am not sure whether incorrect input data is the real cause of the bug; that's what I am trying to find out. Since the amount of test data is large, it is really difficult to come up with an incorrect data pattern.

Maybe the best way is to get the application to generate a core dump, and debug the core dump with either gdb or Valgrind.

What do you think?
 
12-27-2009, 12:00 PM   #5
aryan1
Member
 
Registered: Jul 2009
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by GooseYArd
Are the daemons set up to produce a core file when they crash?
Actually, I have never set up an application to produce core dumps. I did come across the "ulimit -c unlimited" command, which allows an application to produce core dumps, but the daemon still did not produce one.
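Since ulimit -c unlimited alone did not yield a core file, a diagnostic along these lines could report the usual preconditions from inside the running daemon, where its own limits, flags and working directory are what count. check_core_dump_setup() and CoreDumpReport are hypothetical names. The RLIMIT_CORE soft limit must be nonzero (and ulimit only affects processes started after it from the same shell), the process must be dumpable, and with a relative /proc/sys/kernel/core_pattern (the default is core) the file is written to the process's current directory, which after daemonizing is often "/". If core_pattern starts with |, the kernel pipes the core to a helper program instead of writing a file.

```cpp
#include <fstream>
#include <string>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical summary of the common reasons a process leaves no core file.
struct CoreDumpReport {
    bool limit_nonzero;        // RLIMIT_CORE soft limit > 0 ("ulimit -c")
    bool dumpable;             // PR_GET_DUMPABLE == 1 (reset by setuid etc.)
    bool cwd_writable;         // a relative core_pattern writes into the cwd
    std::string core_pattern;  // contents of /proc/sys/kernel/core_pattern
};

// Call from inside the daemon (after daemonizing) and log the result.
CoreDumpReport check_core_dump_setup() {
    CoreDumpReport r{};
    struct rlimit lim;
    r.limit_nonzero = getrlimit(RLIMIT_CORE, &lim) == 0 && lim.rlim_cur > 0;
    r.dumpable = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1;
    r.cwd_writable = access(".", W_OK) == 0;
    std::ifstream in("/proc/sys/kernel/core_pattern");
    std::getline(in, r.core_pattern);  // stays empty if /proc is unavailable
    return r;
}
```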

Quote:
Do your daemons run setuid as some other user?
My daemon first calls fork(), after which it calls setsid() and chroot("/"). As far as I know, these are the usual steps for creating a daemon.

Do you think my daemon needs the modification you described?
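A minimal sketch of that daemonizing sequence, written with the more common chdir("/") (chroot() changes the root directory, not the working directory); daemonize(), wants_foreground() and the --foreground option are hypothetical. It may also bear on the "program exited normally" message reported later in the thread: after fork() the parent calls exit(), and gdb follows the parent by default (follow-fork-mode parent) while the child runs on detached, so a debugger started on the binary sees a normal exit. Skipping the fork during debug runs, or typing set follow-fork-mode child in gdb before run, keeps the debugger on the process that does the work.

```cpp
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical check for a "--foreground" option meant for debugging runs.
bool wants_foreground(int argc, char** argv) {
    for (int i = 1; i < argc; ++i)
        if (std::strcmp(argv[i], "--foreground") == 0) return true;
    return false;
}

// Usual daemonizing steps; returns in the process that should keep working.
void daemonize(bool foreground) {
    if (foreground) return;                  // no fork: the debugger keeps this process
    pid_t pid = fork();
    if (pid < 0) std::exit(EXIT_FAILURE);
    if (pid > 0) std::exit(EXIT_SUCCESS);    // parent exits here: with default settings
                                             // this is the exit gdb reports
    if (setsid() < 0) std::exit(EXIT_FAILURE);     // child: new session, no controlling tty
    if (chdir("/") != 0) std::exit(EXIT_FAILURE);  // don't keep a directory busy
}
```

In main() this would be called as daemonize(wants_foreground(argc, argv)); and the debug line in the startup script could then become gdb --args ./daemon2 --foreground.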
 
12-27-2009, 12:10 PM   #6
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
Quote:
Originally Posted by aryan1
However, gdb seems not to be able to catch the operation which causes the application to crash; it says "program exited normally" instead, and "bt" produces no results.
That's very important! Are you sure it really crashes instead of just exiting? gdb's behavior suggests a plain exit() or something similar. The direction depends on the source code. Does it use assert() or a similar method? If so, is it possible to turn on some debugging output to show from which point and why it actually exits? Or just override assert() and friends and add some debugging output? If not, you may try to use on_exit() and print the state of some variables you suspect may be wrong.
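Both suggestions might look roughly like this; a sketch assuming glibc, where on_exit() is available as an extension. DebugState, its fields, install_exit_logger() and DAEMON_CHECK are hypothetical names. The handler reports the exit status and some suspect state whenever the process leaves through exit() (including a return from main()), and the macro replaces assert() with a check that records file, line and condition in both stderr and syslog before aborting. Neither runs when a signal such as SIGSEGV kills the process; that case is where a core file helps.

```cpp
#include <cstdio>
#include <cstdlib>   // on_exit() is a glibc extension; g++ on Linux defines _GNU_SOURCE
#include <syslog.h>

// Hypothetical state that is suspected to be wrong when the daemon exits.
struct DebugState {
    const char* daemon_name;
    long messages_processed;
};

DebugState g_debug_state = {"daemon2", 0};

// glibc calls this from exit() with the exit status and the registered
// argument; it does not run on _exit() or on death by signal.
void log_state_on_exit(int status, void* arg) {
    const DebugState* s = static_cast<const DebugState*>(arg);
    std::fprintf(stderr, "%s: exit(%d) after %ld messages\n",
                 s->daemon_name, status, s->messages_processed);
    syslog(LOG_ERR, "%s: exit(%d) after %ld messages",
           s->daemon_name, status, s->messages_processed);
}

// Returns 0 on success, like on_exit() itself.
int install_exit_logger() { return on_exit(log_state_on_exit, &g_debug_state); }

// assert() replacement that records where and why it failed, then aborts
// (SIGABRT leaves a core file if core dumps are enabled).
#define DAEMON_CHECK(cond)                                                   \
    do {                                                                     \
        if (!(cond)) {                                                       \
            std::fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__,      \
                         __LINE__, #cond);                                   \
            syslog(LOG_CRIT, "%s:%d: check failed: %s", __FILE__, __LINE__, \
                   #cond);                                                   \
            std::abort();                                                    \
        }                                                                    \
    } while (0)
```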
 
12-27-2009, 12:41 PM   #7
aryan1
Member
 
Registered: Jul 2009
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by Mara
That's very important! Are you sure it really crashes instead of just exiting?
Yes, I am sure. This daemon runs in an infinite loop. gdb says "program exited normally". However, when I run it using my shell script, it crashes.

Quote:
Does it use assert() or similar method? If so, is it possible to turn on some debug to show from which point and why it actually exits?
I use syslog to log some important steps in the application. However, in my experience so far, syslog is not ideal for providing the information needed to track down a bug: I send the log messages to the syslog daemon, and I do not know in detail how it handles them.

Let me give you more info on how I start my daemons.

Normally, I start them in order in a shell script as follows:

./daemon1
./daemon2
./daemon3

Starting the daemons this way causes a crash in daemon2 (daemon1, 2 and 3 use IPC to communicate with each other).

For debugging, I modify the above script into the following:

./daemon1
gdb daemon2 (or valgrind --leak-check=yes --log-file=test --trace-children=yes daemon2)
./daemon3

Neither gdb nor valgrind identifies any possibly bad operation that may cause a crash.

Somehow, the second way of starting the daemons creates an execution environment that is different from the native execution environment.
 
12-28-2009, 01:26 PM   #8
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
If that is the case, try to enable core dumps by using
ulimit -c unlimited
Maybe the core file will show something.
 
12-28-2009, 01:48 PM   #9
GooseYArd
Member
 
Registered: Jul 2009
Location: Reston, VA
Distribution: Slackware, Ubuntu, RHEL
Posts: 183

Rep: Reputation: 46
Have you tried setting a breakpoint in exit()? Unless the program is exiting via a syscall to exit, you should be able to break in libc exit() and get a backtrace there.

gdb 7.0 also has some insane capabilities you might like: you can do reverse (step-backward) debugging, or use "catch syscall" if the program is exiting via the exit syscall.
 
12-29-2009, 12:51 AM   #10
aryan1
Member
 
Registered: Jul 2009
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by GooseYArd
Have you tried setting a breakpoint in exit()? Unless the program is exiting via a syscall to exit, you should be able to break in libc exit() and get a backtrace there.
Based on the diagnostic messages the application prints out, I can say that it does not exit via a syscall to exit.

How can I set a breakpoint in libc exit()? How is libc exit() different from a syscall to exit?
 
12-29-2009, 07:36 AM   #11
GooseYArd
Member
 
Registered: Jul 2009
Location: Reston, VA
Distribution: Slackware, Ubuntu, RHEL
Posts: 183

Rep: Reputation: 46
b exit

will do the trick.
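To illustrate the difference asked about above: exit() is the C library function; it runs handlers registered with atexit()/on_exit(), flushes stdio buffers, and then makes the exit system call. _exit() skips that library cleanup and goes straight to the system call, so b exit stops on the first path but not the second, whereas gdb 7's catch syscall (e.g. catch syscall exit_group, the call glibc's _exit() uses on Linux) covers both. The sketch below, built around the hypothetical helper exit_handlers_run(), forks a child that leaves one way or the other and reports whether its atexit() handler ran.

```cpp
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

namespace {
int g_report_fd = -1;  // write end of a pipe back to the parent

// Registered with atexit(); reports through the pipe that it ran.
// write() is unbuffered, so the byte is sent even while the process exits.
void report_handler() {
    char c = 'x';
    ssize_t n = write(g_report_fd, &c, 1);
    (void)n;
}
}  // namespace

// Forks a child that registers report_handler() and then leaves through
// libc exit() or through _exit(); returns whether the handler ran.
bool exit_handlers_run(bool use_libc_exit) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {                        // child
        close(fds[0]);
        g_report_fd = fds[1];
        std::atexit(report_handler);
        if (use_libc_exit) std::exit(0);   // runs atexit() handlers, flushes stdio
        _exit(0);                          // straight to the exit system call
    }
    close(fds[1]);                         // parent keeps only the read end
    char c;
    bool ran = read(fds[0], &c, 1) == 1;   // 0 (EOF) if the child wrote nothing
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return ran;
}
```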
 
12-29-2009, 09:34 AM   #12
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Please do not be angry at this suggestion.

If you have a hunch that a daemon might be crashing because of invalid input data, and there's too much data to comb through, maybe you've already considered sketching out on paper just what you require of the data for it to be valid, showing those requirements to a local friend, and having him look over the parts of your code that check for data validity.

If not, then it's something to think about. Could save a lot of time.
 
12-30-2009, 05:01 AM   #13
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I personally think multi-process systems integrated across IPC are a nightmare to debug with debuggers. I've had my best luck with good-old fprintf to piece together the chain of events and data states resulting in a crash. One thing leads to another and eventually I get to the bottom of it. This is an effective method, even with hundreds of source files across 10 or 20 libraries and as many programs, so long as you know your sources well. Knowing where the crash is is only half of it.
Kevin Barry
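A minimal sketch of that fprintf approach for several cooperating daemons; trace_to() and TRACE are hypothetical names. Each line carries a timestamp, the process ID and the source location, so traces from daemon1, daemon2 and daemon3 can be merged and sorted by time afterwards, and the stream is flushed after every line so the last messages before a crash are not lost in a buffer.

```cpp
#include <cstdarg>
#include <cstdio>
#include <time.h>
#include <unistd.h>

// Hypothetical trace helper: one line per event, prefixed with a timestamp
// (seconds.microseconds), the PID and the source location.
void trace_to(FILE* out, const char* file, int line, const char* fmt, ...) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    std::fprintf(out, "%ld.%06ld [%d] %s:%d: ", static_cast<long>(ts.tv_sec),
                 static_cast<long>(ts.tv_nsec / 1000),
                 static_cast<int>(getpid()), file, line);
    va_list args;
    va_start(args, fmt);
    std::vfprintf(out, fmt, args);
    va_end(args);
    std::fputc('\n', out);
    std::fflush(out);  // flush every line so the last events survive a crash
}

#define TRACE(...) trace_to(stderr, __FILE__, __LINE__, __VA_ARGS__)
```

Once a daemon has closed its standard streams, trace_to() can be pointed at a per-daemon file opened with fopen() in append mode instead of stderr.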
 

All times are GMT -5.