What do you use to analyse a C/C++ codebase?

dugan · 12-23-2015, 06:15 PM

What tools do you use to analyze and understand C and C++ projects? So far I've only been using cscope and ack (a grep-like tool).

NoStressHQ · 12-23-2015, 08:24 PM

Quote:

Originally Posted by dugan

What tools do you use to analyze and understand C and C++ projects? So far I've only been using cscope and ack (a grep-like tool).

It's hard to find a free tool for that. I use Doxygen sometimes; it needs some tweaking, but it can generate nice docs with basic class diagrams.

But in the end I often do "human crawling", as it's still more efficient: reading the code and taking notes on the important components. It's basic reverse engineering, but with the source code, which is still easier than disassembling.

Good luck.

Garry.

Ztcoracat · 12-23-2015, 08:33 PM

Maybe try NCC?

http://ask.slashdot.org/story/08/01/...rstanding-code
http://codist.tripod.com/

"Understand" is a static analysis tool for maintaining, measuring, and analyzing code:
http://alternativeto.net/software/codenavigator/

There are some tools on this page 'similar to grep'. Not sure if they are of any use.
http://www.gnu.org/software/global/links.html

Gave it my best-

HTH

ttk · 12-23-2015, 08:41 PM

I use find(1), grep(1), gdb(1) and sometimes make notes in a text editor, particularly to collapse excessive layers of abstraction -- I'll put a function or structure type name at indentation level 0, paste in the points of interest at level 1, and then follow the calls and put their points of interest at indentation level 2, and then follow their calls etc. So I end up with all of the points of interest and their dependencies on a single page, each line annotated with file names and notes. This usually gives me a much clearer idea of what's going on.

I'll also mark up code with comments as I discover things, else I'll forget them (if not that very session, then days or weeks or months later).

Since I learned programming via Wirth's data structure centric philosophy, I'll often start by finding the data types which interest me, and then grep around for everywhere those types are used. If I can understand the code as transformations of data, then remembering every detail of the code's implementation is less important, and I can think about the program in terms of intention.

This is a lot easier with C than with Python, which will often interpose ludicrous amounts of indirection between types and their use. Sometimes it's impossible to tell from looking at Python code what type a variable will reference, and I have to insert logging statements and actually run it to learn more. This is seldom necessary with C, and when it is, gdb works great for exploring runtime state.

Richard Cranium · 12-23-2015, 09:51 PM

Quote:

Originally Posted by ttk

This is a lot easier with C than with Python, which will often interpose ludicrous amounts of indirection between types and their use. Sometimes it's impossible to tell from looking at Python code what type a variable will reference, and I have to insert logging statements and actually run it to learn more. This is seldom necessary with C, and when it is, gdb works great for exploring runtime state.

Actually, Python's classes are little more than suggestions. It is absolutely possible to pick an object and replace the implementation of any or all of its methods with unique code.

Not many people actually do that, but it certainly can be done. I've done it (snip from some code that I wrote quite a while back)...

Code:

import ConfigParser  # Python 2 module name (configparser in Python 3)

config_file = ConfigParser.ConfigParser()
# The next line replaces the optionxform method with the str()
# function. Wild. That (BTW) makes the options lookup case sensitive.
config_file.optionxform = str

That actually makes unit testing and code coverage very important with Python code.

Richard Cranium · 12-23-2015, 09:58 PM

And to provide an actual comment for the OP (sorry for the derail above), you could use Emacs with CEDET and ECB (see https://www.logilab.org/blogentry/173886 for some discussion as well as http://cedet.sourceforge.net/ and http://ecb.sourceforge.net/screenshots/index.html). When I was still doing C++ development, I used ebrowse (https://www.gnu.org/software/emacs/m...ode/ebrowse/); however that was almost 16 years ago.

Eclipse appears to support C/C++ projects, as does NetBeans.

a4z · 12-24-2015, 01:53 AM

The documentation, if it exists; the samples that come with it, if they exist; an IDE with proper code navigation (nearly all have it these days, but Eclipse CDT is very good); a debugger; and asking the author.
And simply work with the code. It's always the same: at the beginning you think WTF, then after some pain it gets better, and if you don't give up, after some time and more pain you will start to understand it, or find out that it is an organically grown mass of lines of code that accidentally does something.

astrogeek · 12-24-2015, 02:49 AM

I am with these guys and gals...

Quote:

Originally Posted by NoStressHQ

...I often do "human crawling" as it's still more efficient (reading the code and taking notes on the important components). Basic reverse engineering but with the source code...

Quote:

Originally Posted by Ztcoracat

Gave it my best-

HTH

Which I will slightly transform into, "Give it my best"!

Quote:

Originally Posted by ttk

I use find(1), grep(1), gdb(1) and sometimes make notes in a text editor, particularly to collapse excessive layers of abstraction -- [description of notation methods] ...So I end up with all of the points of interest and their dependencies on a single page, each line annotated with file names and notes. This usually gives me a much clearer idea of what's going on.

I'll also mark up code with comments as I discover things, else I'll forget them (if not that very session, then days or weeks or months later).

...grep around... If I can understand the code as transformations of data, then remembering every detail of the code's implementation is less important, and I can think about the program in terms of intention.

NOTE: I do this but didn't know it was a method with a name!

Which I will summarize as follows: Generate well organized, useful notes to serve as a kind of model, comment code inline when you actually come to understand it, and try to understand what it does before worrying about how it does it!

Quote:

Originally Posted by a4z

...simply work with the code. It's always the same: at the beginning you think WTF, then after some pain it gets better, and if you don't give up, after some time and more pain you will start to understand it, or find out that it is an organically grown mass of lines of code that accidentally does something.

I never adapted to using an IDE and find the *nix built-ins preferable for most of my own purposes - so vim, find, grep, etc... and I make lots of notes!

For other people's code I usually try to work up a few simple annotated UML-type diagrams, mostly use case, class and sequence, sufficient to cover my area of interest - rarely more complete. This is the only way I can maintain continuity for more than a single session!

For my own projects I have found that it is always worthwhile to start with similar simple, but more complete, modeling diagrams and notes, and to keep them in sync with the real world as I write code.

For diagnostics, I mostly use an interactive style of debugging, inserting messages and breakpoints, plus valgrind to keep memory leaks and execution choke points under control. I use gdb at times but usually find I get by with my own eclectic collection of habits and methods.

My own most valuable tips:

* Organize project spaces from a shell, never a graphical interface, within a purpose built directory structure.
* Use a terminal mux like screen or tmux to allow efficient edit, search, reference operations using those wonderful *nix built-ins, always at your fingertips.
* Write your own makefiles, at least in early project phases - it has a terrific focusing effect! Later, if you use autotools or another makefile generator framework, you can make it do what you want instead of spending several days figuring out what it wants you to do!
* Write code and validate against your model in smallish, well defined increments - you never get lost that way!
* Write and maintain test cases and code in parallel with the project code - always!

I find these things put my code much more clearly in mind and speed development literally at every keystroke.

dugan · 12-24-2015, 11:55 AM

I've just discovered codequery and I'll be trying it out soon. I like the fact that it uses cscope as part of its backend.

dugan · 12-24-2015, 12:15 PM

Quote:

Originally Posted by Richard Cranium

That actually makes unit testing and code coverage very important with Python code.

Oh, absolutely. Having 100% unit test coverage but no asserts, for a Python codebase, is as good as compiling a static-language codebase.

It's also true that one of the tradeoffs with dynamically typed languages is that static-analysis tools are much less powerful. That's something I would consider when choosing between go and nodejs.

Richard Cranium · 12-24-2015, 01:28 PM

You can get the worst of both worlds with Groovy.

pan64 · 12-24-2015, 04:39 PM

Yes, mainly the documentation, and the original creators/owners can probably give you helpful answers. Anyway, it also depends on the size, the timeframe, the costs and the reasons.

tronayne · 12-26-2015, 09:57 AM

Because I'm old and been doing this for a long, long time I have a fondness for lint.

lint? Yeah, lint. Ain't no lint in Linux but there is splint. Oldie, goodie, freebie, works well enough for me: http://www.splint.org/

Builds clean in 32- and 64-bit Slackware, runs just fine in Slackware 64-bit 14.1.

Yammers at you about everything, just like Unix lint does.

Here's an example, a program for converting degrees, minutes, and seconds to decimal degrees (for map making):

Code:

cat dms2deg.c

#ident	"$Id: dms2deg.c,v 1.1.1.1 2009/10/07 17:59:37 trona Exp $"

/*
 *	Copyright (C) 2000-2009 Thomas Ronayne
 *
 *	This program is free software; you can redistribute it and/or
 *	modify it under the terms of version 2 of the GNU General
 *	Public License as published by the Free Software Foundation.
 *
 *	This program is distributed in the hope that it will be useful,
 *	but WITHOUT ANY WARRANTY; without even the implied warranty of
 *	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 *	General Public License for more details.
 *
 *	You should have received a copy of the GNU General Public
 *	License along with this program; if not, write to the Free
 *	Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
 *	MA 02111-1307, USA.
 *
 *	Name:		$Source: /usr/local/cvsroot/gnis/dms2deg.c,v $
 *	Purpose:	convert DDD MM SS to decimal degrees
 *	Version:	$Revision: 1.1.1.1 $
 *	Modified:	$Date: 2009/10/07 17:59:37 $
 *	Author:		T. N. Ronayne
 *	Date:		21 Jul 2009
 *	$Log: dms2deg.c,v $
 *	Revision 1.1.1.1  2009/10/07 17:59:37  trona
 *	initial installation Slackware 13.0
 *	
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include "gnis.h"

#ifndef	TRUE
#	define	TRUE	1
#endif
#ifndef	FALSE
#	define	FALSE	0
#endif

void	main	(int argc, char *argv [])
{
	char	where [2];		/* N S E W			*/
	int	c;			/* general-purpose		*/
	int	error = FALSE;		/* error flag			*/
	int	vopt = FALSE;		/* verbose option		*/
	double	degree = 0.0;		/* degree			*/
	double	minute = 0.0;		/* minute			*/
	double	second = 0.0;		/* second			*/
	time_t	t0 = (time_t) 0;	/* start time			*/
	time_t	t1 = (time_t) 0;	/* finish time			*/
	FILE	*in;

	/*	process the command line arguments			*/
	while ((c = getopt (argc, argv, "?d:m:s:w:v")) != EOF) {
		switch (c) {
		case '?':
			error = TRUE;
			break;
		case 'd':
			degree = strtod (optarg, (char **) NULL);
			break;
		case 'm':
			minute = strtod (optarg, (char **) NULL);
			break;
		case 's':
			second = strtod (optarg, (char **) NULL);
			break;
		case 'w':
			(void) strcpy (where, optarg);
			break;
		case 'v':
			vopt = TRUE;
			break;
		default:
			(void) fprintf (stderr, "getopt() bug\n");
			exit (EXIT_FAILURE);
		}
	}
	/*	any errors in the arguments, or a '?' entered...*/
	if (error) {
		(void) fprintf (stderr, "usage: %s [-v] argument...\n",
		    argv [0]);
		exit (EXIT_FAILURE);
	}
	/*	get a start time				*/
	if (time (&t0) < (time_t) 0)
		(void) fprintf (stderr,
		    "%s:\tcan't read system clock\n", argv [0]);
	(void) fprintf (stdout,
	     "%.0lf:%.0lf:%.0lf %s = %.8lf\n",
	     degree, minute, second, where,
	     dmsdeg (degree, minute, second, where[0]));
	/*	get a finish time			*/
	if (time (&t1) < (time_t) 0)
		(void) fprintf (stderr,
		    "%s:\tcan't read system clock\n", argv [0]);
	if (vopt)
		(void) fprintf (stderr,
		    "%s duration %g seconds\n",
		    argv [0], difftime (t1, t0));
	exit (EXIT_SUCCESS);
}
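The dmsdeg() function that does the actual conversion comes from gnis.h, which isn't shown in this post. As a rough sketch only, assuming the usual convention (decimal degrees = D + M/60 + S/3600, with south and west negative), it might look something like the following. The name dms_to_degrees and the handling of unknown hemisphere letters are assumptions for illustration, not the code from gnis.h:

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for dmsdeg() from gnis.h (not shown in the post).
 * Converts degrees, minutes, seconds to decimal degrees; 'S' and 'W'
 * (either case) give a negative result, anything else is left positive.
 */
double	dms_to_degrees	(double degree, double minute, double second, char where)
{
	double	result = degree + minute / 60.0 + second / 3600.0;

	switch (toupper ((unsigned char) where)) {
	case 'S':
	case 'W':
		return (-result);
	default:
		return (result);
	}
}
```

For example, 74 degrees 0 minutes 36 seconds west would come out as -74.01 under these assumptions.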

Here's what splint says about dms2deg.c:

Code:

splint dms2deg.c
Splint 3.1.2 --- 26 Dec 2015

dms2deg.c:36: Include file <unistd.h> matches the name of a POSIX library, but
    the POSIX library is not being used.  Consider using +posixlib or
    +posixstrictlib to select the POSIX library, or -warnposix to suppress this
    message.
  Header name matches a POSIX header, but the POSIX library is not selected.
  (Use -warnposixheaders to inhibit warning)
dms2deg.c:47:6: Function main declared to return void, should return int
  The function main does not match the expected type. (Use -maintype to inhibit
  warning)
dms2deg.c: (in function main)
dms2deg.c:58:8: Variable in shadows outer declaration
  An outer declaration is shadowed by the local declaration. (Use -shadow to
  inhibit warning)
   gnis.h:159:7: Previous definition of in: FILE *
dms2deg.c:87:6: Test expression for if not boolean, type int: error
  Test expression type is not boolean or int. (Use -predboolint to inhibit
  warning)
dms2deg.c:99:39: Array element where[0] used before definition
  An rvalue is used that may not be initialized to a value on some execution
  path. (Use -usedef to inhibit warning)
dms2deg.c:104:6: Test expression for if not boolean, type int: vopt
dms2deg.c:58:8: Variable in declared but not used
  A variable is declared but never used. Use /*@unused@*/ in front of
  declaration to suppress message. (Use -varuse to inhibit warning)

Finished checking --- 7 code warnings

Dang, I gotta clean that thing up.
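One possible direction for that cleanup, as a sketch rather than a finished fix: the "used before definition" warning comes from where[] only being filled in when -w is given, and the strcpy() into a two-byte buffer will overflow on anything longer than one letter. A small helper along these lines (parse_hemisphere is a hypothetical name, not something from the original program or gnis.h) could validate the argument before it is used:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the original program): validate the
 * -w argument and store its single hemisphere letter in *where.
 * Returns 1 on success and 0 on failure; *where is untouched on failure.
 */
int	parse_hemisphere	(const char *arg, char *where)
{
	if (arg == NULL || where == NULL)
		return (0);
	/* accept exactly one character out of N, S, E, W (either case) */
	if (strlen (arg) != 1 || strchr ("NSEWnsew", arg [0]) == NULL)
		return (0);
	*where = arg [0];
	return (1);
}
```

In main() that would presumably mean declaring a plain `char where = 'N';` (the default is an assumption), setting the error flag when parse_hemisphere (optarg, &where) returns 0, declaring main as `int main`, and dropping the unused `FILE *in`. Whether splint then stays quiet about the remaining `if (error)` style tests depends on which flags it is run with.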

Hope this helps some.

NoStressHQ · 12-26-2015, 10:49 AM

Quote:

Originally Posted by tronayne

Because I'm old and been doing this for a long, long time I have a fondness for lint.

I think the OP is asking about "symbolic analysis" to "browse" the source and "understand" it, not static code analysis to find bugs.

rtmistler · 12-30-2015, 08:06 AM

Eyeballs and fingers.

The larger a code base is, the more disorganized it is and the more inactive or unused code exists, or the more optional features exist.

Some very serious projects, like the Linux kernel, follow a typical organizational pattern.

If there's some huge code base that has no documentation, no general organization, then it's rather difficult and risky to consider using that.

I'm talking about a case where someone comes to me as a developer and suggests that we use some base of code to produce a serious result. The larger the project, the more time is expected to be put into it. Starting with some huge base of code that is unproven, undocumented, and not well enough organized to be figured out in a few hours of browsing is not worth anyone's time.

On the other hand, I've purchased source intended for use as part of a solution and in evaluating the product, or engaging it in use, the ones which have gone the best are the exact opposite of the negative things I'm talking about here. They are documented, they have test and validation harnesses, they have comments, and they follow a general organizational method.

Therefore, use whatever code viewer/organizer works best for you.

I'd also try and build it to see how many errors and warnings there are.