Debugging C Programs Using GDB (Including core dumps!)

Posted 12-01-2015 at 02:32 PM by rtmistler
Updated 08-30-2018 at 11:43 AM by rtmistler

The "(Including core dumps!)" added to my title is actually the main reason I decided to write this entry.

So very many times I see new programmers who write something, it compiles, so they run it, and then they are stumped because their program ended in a segmentation violation and possibly a core file. Now they're stuck, not knowing why this occurred and having no real idea how to proceed.

Background - What Likely Led to This State/Situation

You started C programming. First example is Hello World. That works great, you've followed the example and template exactly, and you really would be hard pressed to not do that correctly, and the result is that it all worked.

The next problem is that once you start writing code to do real work, be that calculations, storing and processing data, drawing images, playing media, or other functions of a computer, you have not yet been given any example of how to debug the problems which will eventually arise from your experiments.

Common problems or oddities which you may encounter would be:
  1. Compilation errors
  2. Compilation warnings
  3. Unexplained results when you run your program
  4. Unexpected infinite loops when you run your program
  5. Crashes, core files, or just errors when you run your program

Expert Craftsmanship Does not Always Beget Perfect Results

A favorite gem from many, many years ago was where I was with a software team and our company had decided to proceed down a path of software design refinement where we adhered to a very strict process of design, review, update, design lower, review, update, code, review, unit test, integration test, and so forth. It was a very involved process, and with many persons who write software for a living, one can imagine that these review meetings became lengthy and contained many elevated tempers, because we were all critiquing each other's work as well as adapting to changes in the product specifications, ad nauseam. However tiresome it all was, people clearly felt they were measured on results, as well as on their contributions to the team during all this, and thus competitiveness at the career level also came into play.

After completing much of this very involved process, a particular engineer was in the lab unit testing their code, which had recently passed a detailed code review. They found a few bugs. They found more than a few bugs. Eventually they became frustrated and were heard to exclaim:
Quote:
I can't BELIEVE that there are THIS MANY bugs in this code!!! It was all reviewed!!!!
They then threw up their hands and left the lab to take a break.

The problem there was that this very talented engineer forgot that ultimately "they" are responsible for their code, their designs, and had gone down a path of thinking that "the process" would alleviate virtually any possibility of mistake or errors.

How Does Someone Proceed From This Point?

So now that you have the basics to write and compile a C program, what's all the rest of the stuff you need to know?

The short answer is volumes, but those volumes mostly concern how to program, how to design software, and so forth. My intention here is instead to equip you with diagnostic tools, methodologies, and thought processes so that you can write code, diagnose it, and do so with fewer blocking problems.

Merely improving your expertise in the following subjects will enhance your capabilities to debug your code.
  1. Using the compiler
  2. Coding defensively
  3. Using the debugger

Using the Compiler

The compiler is the first tool to read your code, and its job is to follow the rules of the language. The C programming language has a defined syntax, and the compiler translates what you've written in accordance with what it understands that syntax to be.

The preprocessor allows you to define special terms, or to selectively include or exclude code. For instance, you can remove an entire section of code by wrapping it in #if 0 and #endif. Sometimes you write a whole chunk of new code and find that suddenly your system no longer behaves anything like it did before. It may be obvious that your newest efforts are the likely problem, but how do you proceed? One method is to take the new code out using one or more #if 0 directives, and then slowly add it back until you zero in on the point where a new addition stops working properly with the system as a whole. Remember that there is a problem because you made a non-obvious error. You may have assumed something was in a certain state when it wasn't, and you may not realize it because you have written so much new code that some parts of your system are not yet fully initialized or in the state you anticipated.

Using compilation success is another method. Ever run into a case where you change one silly line and then suddenly the whole build is broken? Maybe not as simple as one line, perhaps you have a case where you make a change but then find that the change does not get into your final result. I have had the pleasure of working with very extensive sets of code where I've added print statements to illustrate my progress and give me some needed diagnosis at the point where I added the print statements, only to find that I never SEE my print statements in my output. Why? The answer became simple when a peer came over and put a very obvious error in my code right before, or at, my print statement. They changed it from "printf" to be "paintf". And guess what? The compile never broke. In that large body of code I was working with, there were very many preprocessor directives and circumstances were such that the particular code I was debugging because I "thought" it was a potential problem area, was not included in the build at all. While I had frustrated myself sufficiently, in about 2 seconds my peer made me look/feel like an idiot. End result is that this is one of the memories I recall vividly anytime I encounter a situation where something similar occurs. This may seem fundamental at the surface, but do realize that if you work with some extensive code, thousands of lines of source, and cases where a function exists in multiple places, then this situation can occur.

Paying attention to compiler warnings is my #1 recommendation for the world. Using GCC and Make, I keep around a working, "latest" Makefile where I maintain GCC compilation switches. I'm being lengthy enough in this blog, so I won't describe what they all do, I will however post them as my recommendation for your consideration. Note that one of them, -Werror causes warnings to be interpreted as errors and thus warnings will halt compilation from going any further. For me, warning free code as much as I can is key. I still make assumptions and mistakes, the fewer I can have deployed in my code, the better.
Code:
-O0 // This sets optimization off, not a forever recommendation, use selectively
-ggdb // This compiles for GDB debugging capabilities
-Wformat=2
-Werror
-Wall
-Wextra
-Wswitch-default
-Wswitch-enum
-Wstrict-prototypes
-Wmissing-prototypes
-Wmissing-declarations
-Wmissing-noreturn

Coding Defensively

I try to take a minimalist view of my code. No, I do not try to write code so brief that it is nearly unreadable. What I mean instead is that I like to ensure that I understand my list of #include files, why they are there, and whether or not I need them all. Further, and here is where it's important to understand that we're talking mainly about C programming under Linux, when you view the manual page for any given C function, the SYNOPSIS at the top shows the include files that function requires in order to compile properly. Further, if you need to link with a given library, like pthread or math, the manual page also explains this.

The other good part about reading the manual pages for library functions is that you see the expected behaviors and understand the possible return values. Thus I code in anticipation of all returns and will output debug information if I do encounter an error, also checking the errno value and printing out the errno string. It's not always intuitive exactly what you've done wrong when you get particular errno values, however being cognizant of them will at least start you down the path of understanding where you've done something wrong. For instance, on a socket connect() call you may get an error not because of the connect() arguments, but because you never created the socket properly. Being told "Bad file descriptor" (EBADF) when you appear to have a valid number is frustrating. Likely it was because you saved an old descriptor value from a former socket which you have since closed, or never paid any attention to the return value of the original socket creation call.

Preparing your code to detect and deal with error returns or unexpected input, also writing your code to test and qualify the inputs to each function, will go a long way to helping you diagnose problems as you begin to test a complex, and large design of code.

Don't have if statements without an else. Don't leave out default cases in switch statements, but also don't code the switch statement such that you check one case and then offload all other possibilities to the default case. Further, use an enumerated type for your switch variable and its case labels. Why? Because when you have an enumerated variable type and cases, the compiler will warn you when you leave out one or more cases, provided you use some of my suggested compilation flags (-Wswitch-enum in particular). Try to achieve single-entry/single-exit functions. And do use functions rather than one large monolith of code in a single file. There's no reason why you can't write in an object oriented manner by hand, without having gone through any form of object oriented analysis and design. If you think about your lowest level of functionality and write helper functions to abstract that interface, then your higher order functions become cleaner. You also have a better test situation, because you test and validate the lower functions first and get them correct, so that when you use them in the future you worry less about their behavior, provided you have tested them sufficiently.

A final "code defensively" recommendation is to use print, be that printk() for kernel modules, or printf() for normal code. If printing to the console or terminal is not an option, then create a logging function, or macro. For this I offer an example of something I've used many times in the past, which is a macro that accepts variable arguments like printf() and will put the information into a file versus to stdout.

The C source needs to declare the log file handle, but otherwise the H file contains the entire macro. Since I segment functional methods by file in this case, there was no need to extern or cross reference this macro. I chose instead for each component of the source to have its own log file, however one can make a universal logging function, or one can even choose to use the Linux system logger. This was merely my personal choice.

As declared in the C source:
Code:
FILE *appLog;
Macro declared in the H file:
Code:
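#define APP_LOG "/home/myuser/myapp/app.log"
/* do { ... } while (0) makes the macro act as a single statement, so it is safe inside if/else */
#define app_log(format, ...) do { appLog = fopen(APP_LOG, "a"); if (appLog != NULL) { fprintf(appLog, format, ##__VA_ARGS__); fclose(appLog); } } while (0)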
From this point you now use app_log() as if you'd use printf() and the result is everything "printed" gets sent to /home/myuser/myapp/app.log.

Using The Debugger

Once again, I will remind you that I'm talking about C and Linux, compiled with GCC. This should also work similarly with C++ code compiled with G++; C is just what I program in more commonly.

Long ago I answered a question with the following example code which has an intentional segmentation violation.
Code:
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int *bad_addr = NULL;  // Address is set to NULL or 0x00000000

    *bad_addr = 12345; // Assigning data to a bad address is a segmentation violation, note that this is Line 8 of the source file

    return 0;
}
Nothing special, in fact I didn't compile this with any of my extended flags, just used:
Code:
gcc -ggdb -o segv segv.c
Key Concept:

One needs to ensure that their terminal session will allow a core file dump to occur. Typical Linux terminal sessions do not have this enabled unless the system has been set up that way for all sessions. Running the ulimit -a command, you will see a report of all resource limits for the session. Core file size is one of these. Typically you'll see a value of zero, which means that core files will not be written. To change this for the terminal session you are in, use the first command shown below. Note also that on some distributions the kernel's core_pattern setting hands core dumps to a helper such as apport or systemd-coredump, in which case no file named core appears in the working directory. After ensuring that my terminal's ulimit -c setting was correct, I ran the executable:
Code:
ulimit -c unlimited
./segv
Segmentation fault (core dumped)
So what do I do now? Well I have a core file, and I've compiled this code using the GDB symbol flag.

So I can debug the core file using the following command:
Code:
gdb segv core
What I see is:
Code:
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/myuser/testcode/segv...done.
[New LWP 24497]
Core was generated by `./segv'.
Program terminated with signal 11, Segmentation fault.
#0  0x080483c4 in main (argc=1, argv=0xbf914914) at segv.c:8
8        *bad_addr = 12345;
(gdb)
As you can see, it's reporting a segmentation violation at segv.c:8, Line #8. My choices at this point include viewing the backtrace via the 'bt' command, and examining the value of bad_addr using the print command, 'p':
Code:
(gdb) bt
#0 0x080483c4 in main (argc=1, argv=0xbf914914) at segv.c:8
(gdb) p bad_addr
$1 = (int *) 0x0
At this point I have to use some personal intuition and programmer's knowledge and realize that I'm trying to assign the value 12345 to address 0x0 and this is disallowed.

Even if I don't know why, I've been told by the debugger that there's a problem at this line. Being on LQ, I could at least now post all that I've done and what I see, note that I'm stumped, and someone would be able to see the results of my labor and explain why the error occurred.

About Core File Limits

Earlier you noticed that the command "ulimit -c unlimited" was used to enable core file dumping for the given terminal session. Another way to edit this value is to change your /etc/security/limits.conf(5) file so that all sessions will support this. Personally I do not do this, I've never looked into the system level ramifications of doing this to all processes. There are examples shown in the manual page, and for those who may miss the value (because it is not shown in examples), NEGATIVE ONE (-1) designates unlimited. Note also that the wildcard STAR (*) will not designate the root login; hence why most examples show a line using the wildcard * and another line showing root. Here's an example of what I "might" do, but as I've already said, I do not do this and instead just type the command into the terminal at the time I'm running my program:
Code:
*             soft        core     -1
Now what if you wanted to use GDB to run this program and single step through it, or part of it? And what if the program requires arguments? Well, you can, and here are some helpful commands:

From the command line: gdb segv

From within GDB:
Code:
(gdb) set args <value> <value> etc
(gdb) b segv.c:6
Breakpoint 1 at 0x80483ba: file segv.c, line 6.
(gdb) r
Starting program: /home/myuser/testcode/segv

Breakpoint 1, main (argc=1, argv=0xbffff364) at segv.c:6
6        int *bad_addr = NULL;
(gdb) p bad_addr
$3 = (int *) 0x0
(gdb) s
8        *bad_addr = 12345;
(gdb) p bad_addr
$4 = (int *) 0x0
(gdb) s

Program received signal SIGSEGV, Segmentation fault.
0x080483c4 in main (argc=1, argv=0xbffff364) at segv.c:8
8        *bad_addr = 12345;
What the previous shows is me starting the debugger on the program. The program does not run when you enter the debugger; GDB only reads the symbols and waits at its prompt until told to run.

I chose to put in a breakpoint at my declaration line for bad_addr, which happens to be line #6 of my source file. This is the 'b' command.

Next I ran the program using the run command, 'r'. You see that it runs and hits my breakpoint. Good news is that the "bug" didn't occur prior to my breakpoint. This is a very simplified example, however in much larger code, with loops, function calls, etc, you will likely place your breakpoint at a suspect point in the code, to find yes or no whether or not the bug manifests itself earlier or later than your breakpoint.

Once at the breakpoint I chose to examine the variable bad_addr using the print command, 'p'. An important point to note here is that its value is not meaningful "at" the breakpoint, because that line of code has not yet run, so examining the variable there tells you nothing you can trust. Single stepping one line, using 's', caused that line to be executed, and thus I can now examine the variable and trust the outcome. You'll note that it is unremarkable. Some circumstance, most likely stack memory that simply happened to hold zero, led to bad_addr being NULL in advance of my declaration and assignment. Either way I can see that the pointer is 0x0. Unfortunately, at that point I have to rely on my knowledge and experience to realize that yes, this is an invalid pointer value. If I do not realize this, then any form of single stepping or continuing will eventually encounter the segmentation violation on line #8.

Summary

This is not all inclusive. I've abbreviated GDB commands, and only shown a few of them. Further, some commands take options, such as viewing a variable in binary or hexadecimal, or dumping a memory region because it contains an array of information. Note for instance that you can print a structure and the debugger will show you the form of that structure so you understand which elements hold what values. Another thing to know is that you can run GDB from within another program. The only one I know of is GNU Emacs, which is my personal preferred editor. It will actually perform visual debugging, by way of showing you the source code and an execution line indicator, and it also offers menus to control the debugger. Sometimes that's more work than it's worth. I use GDB to debug something fast, and thus I use the command line.

Happy coding, I hope that this helps.

Suggestions towards better organization or content are encouraged and I'll incorporate those which I agree do enhance this entry.