[SOLVED] C++ map segfaulting

MTK358 · 04-10-2011, 08:26 PM

I'm having yet another big problem in my interpreter project that I'm unable to figure out. In this function, the call to map.find() causes a segfault:

Code:

Table* Table::findParentWithVar(char *name)
{
	if (map.find(name) != map.end())   // <-- segfault here
		return this;
	
	std::vector<Table*>::iterator i;
	for (i=parents.begin(); i!=parents.end(); i++) {
		Table *temp = (*i)->findParentWithVar(name);
		if (temp)
			return temp;
	}
	
	return NULL;
}

I don't have the slightest clue why. Items were successfully added to the same map earlier, and it couldn't be a NULL pointer problem because the map is stored directly in the class, not as a pointer.
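For readers without the rest of the project, here is a minimal, self-contained sketch of the lookup this function performs: check this scope's map, then recurse into each parent scope in order. Table, CStringLess and the int values are hypothetical stand-ins (the real map holds LangObject* and uses the project's own comparator), and the keys are const char* rather than char*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical comparator: orders keys by the characters they point to,
// not by pointer value.
struct CStringLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical, minimal Table: a variable map plus the enclosing scopes.
// The mapped type is int only so the sketch compiles on its own.
struct Table {
    std::map<const char*, int, CStringLess> vars;
    std::vector<Table*> parents;

    // Returns this table if it defines `name`, otherwise the first parent
    // (searched depth-first, in order) that does, or NULL if none does.
    Table* findParentWithVar(const char* name) {
        if (vars.find(name) != vars.end())
            return this;
        for (std::vector<Table*>::iterator i = parents.begin(); i != parents.end(); ++i) {
            Table* found = (*i)->findParentWithVar(name);
            if (found)
                return found;
        }
        return NULL;
    }
};
```

One consequence worth keeping in mind: every level of the recursion calls find() on that table's own map, so an invalid Table* (whether in parents or passed in by the caller) tends to show up as a crash inside map::find() rather than at the point where the bad pointer was produced.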

Sergei Steshenko · 04-11-2011, 01:30 AM

You have to check your assumptions. For example, 'map.find(name)' looks like a function call. If so, then check the function address at a place where you are sure it is correct (e.g. just after creating the corresponding class instance) and just before the segfault. The addresses should be the same.

johnsfine · 04-11-2011, 07:21 AM

Quote:

Originally Posted by MTK358

it couldn't be a NULL pointer problem becasue the map is stored directly in the class, not as a pointer.

You should learn to use gdb to look at the details of a seg fault.

From what you showed it could easily be a NULL pointer problem: name could be null or this could be null.

It is also possible the map is corrupted by some previous bug.

MTK358 · 04-11-2011, 08:10 AM

Quote:

Originally Posted by johnsfine

From what you showed it could easily be a NULL pointer problem: name could be null or this could be null.

I did use GDB, and neither this nor name are NULL.

Quote:

Originally Posted by johnsfine

It is also possible the map is corrupted by some previous bug.

How would I find that out?

johnsfine · 04-11-2011, 08:25 AM

With GDB you should be able to see exactly where the seg fault occurs.

Are you using an optimized build or an ordinary debug build?

You told us where the seg fault occurs, but I don't know how exactly you checked that. Is it on the actual code you indicated or is it in some function called by that code?

In an optimized build, some functions called by that code might be inlined, so bugs such as a bad name pointer would seg fault apparently in that code rather than in called code.

If it is on that actual code, that should indicate that the this pointer is bad. If find were a virtual function (which I expect it isn't) then a vtable pointer might be bad or the find pointer in a vtable might be bad. But even the symptom of a bad vtable pointer is more likely caused by a bad this pointer.

I always look at the asm code around the point of the seg fault. You should learn how to display that asm code in GDB. If you don't know enough asm to learn anything from that asm code, you can post it here for help.

In x86_64, to understand this kind of seg fault, you also need to look at the general registers: rdi, rsi, rax, etc.

In x86 (32 bit) you need to look at the top several values on the stack.

dwhitney67 · 04-11-2011, 08:36 AM

As Sergei mentioned, re-verify your assumptions. For example, check the Table* that the iterator refers to, to make sure it is not NULL and does not point to a Table object that may already have been deleted.

Code:

Table *temp = (*i)->findParentWithVar(name);
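One common way to act on this advice in a debug build is a "magic number" field: the constructor sets it, the destructor overwrites it, and code asserts on it before using a pointer. The sketch below is a hypothetical addition, not part of the original Table class; the constants and looksValid() are made-up names. Reading a deleted object is itself undefined behaviour, so this is a heuristic that often catches stale or wrong-type pointers, not a guarantee.

```cpp
#include <cassert>
#include <cstdint>

namespace {
const std::uint32_t kTableAlive = 0x7AB1E0A1u;  // arbitrary "live object" marker
const std::uint32_t kTableDead  = 0xDEADBEEFu;  // written by the destructor
}

// Hypothetical debugging aid, not the thread's real Table.
class Table {
public:
    Table() : magic_(kTableAlive) {}
    ~Table() { magic_ = kTableDead; }

    // True only for a constructed, not-yet-destroyed Table (heuristically).
    bool looksValid() const { return magic_ == kTableAlive; }

private:
    // volatile so the optimizer cannot discard the destructor's store as dead.
    volatile std::uint32_t magic_;
};

// In the parent loop of findParentWithVar one could then write:
//   assert(*i != NULL && (*i)->looksValid());
```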

MTK358 · 04-11-2011, 08:41 AM

Quote:

Originally Posted by johnsfine

With GDB you should be able to see exactly where the seg fault occurs.

Are you using an optimized build or an ordinary debug build?

You told us where the seg fault occurs, but I don't know how exactly you checked that. Is it on the actual code you indicated or is it in some function called by that code?

I found the segfault using GDB and it was compiled with optimization turned off.

Here's something I just found out:

It was crashing when interpreting this AST:

Code:

new CallNode(new MemberNode(new IntegerNode(5) , "print"), new NodeCallParamList())

(MemberNode calls Table::get, which calls Table::getParentWithVar, which calls map::find(), and that's where the segfault happens)

But it doesn't crash with this one, which contains the part that crashes in the previous AST:

Code:

new MemberNode(new IntegerNode(5) , "print")

The problem is that I don't see anything wrong with CallNode that would cause this. All it does is call node->eval(scope) just like main() calls it in the second example.

Quote:

Originally Posted by johnsfine

In an optimized build, some functions called by that code might be inlined, so bugs such as a bad name pointer would seg fault apparently in that code rather than in called code.

If it is on that actual code, that should indicate that the this pointer is bad. If find were a virtual function (which I expect it isn't) then a vtable pointer might be bad or the find pointer in a vtable might be bad. But even the symptom of a bad vtable pointer is more likely caused by a bad this pointer.

I always look at the asm code around the point of the seg fault. You should learn how to display that asm code in GDB. If you don't know enough asm to learn anything from that asm code, you can post it here for help.

In x86_64, to understand this kind of seg fault, you also need to look at the general registers: rdi, rsi, rax, etc.

In x86 (32 bit) you need to look at the top several values on the stack.

I once played around with x86 (not x86_64, but it should be similar enough) assembler, but not very much, and I've forgotten a lot of it by now. And I don't know how to make GDB print it out.

dwhitney67 · 04-11-2011, 08:47 AM

Quote:

Originally Posted by MTK358

The problem is that I don't see anything wrong with CallNode that would cause this. All it does is call node->eval(scope) just like main() calls it in the second example.

Same here... I don't see anything wrong with CallNode, although I must admit my judgement is biased because I have no idea what you have implemented in that class' constructor.

MTK358 · 04-11-2011, 08:50 AM

Code:

class CallNode : public Node
	{
	public:
		CallNode(Node *funcNode, NodeCallParamList *param);
		~CallNode();
		LangObject* eval(Table *scope);
		
	private:
		NodeCallParamList *param;
		Node *funcNode;
	};

Code:

CallNode::CallNode(Node *funcNode, NodeCallParamList *param) {
	this->funcNode = (Node*) funcNode->getref();
	this->param = param;
}

CallNode::~CallNode() {
	funcNode->putref();
	delete param;
}

LangObject* CallNode::eval(Table *scope) {
	CallParamList *l = param->evaluateParameters(scope);
	Function *f = (Function*) funcNode->eval(scope); //LangObject::discardIfWrongType(funcNode->eval(scope), LangObject::FunctionType);
	if (f) {
		LangObject *result = f->call(l);
		l->putref();
		f->putref();
		return result;
	}
	//TODO throw error
	l->putref();
	// f is NULL on this path, so there is no reference to release
	return NULL;
}

dwhitney67 · 04-11-2011, 09:00 AM

It seems that you are using a variant of smart-pointers, however they don't seem so "smart" if you must call getref() and putref() when you want to increase and decrease the number of references. Is this a home-made smart-pointer class that you are using?

In your function eval(), you never check to see if the pointer to 'l' is valid. It seems like you are biased against developing "safe" code. This lax attitude may very well have placed you into your current predicament of checking for a NULL pointer or a memory corruption error.

Anyhow, consider using Boost's shared pointer; it is a lot easier to use than what you have.
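boost::shared_ptr was later standardized as std::shared_ptr in C++11. As a hedged sketch of what this suggestion could look like, here is an eval-like function in which ownership is carried by std::shared_ptr, so nothing has to be released by hand on any return path. LangObject, Function and callEval are simplified hypothetical stand-ins (the real classes and CallParamList are not shown in the thread). The sketch also swaps the C-style cast for std::dynamic_pointer_cast, which yields an empty pointer when the callee is not a Function.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified object model.
struct LangObject {
    virtual ~LangObject() {}
};

struct Function : LangObject {
    // Placeholder call: ignores the arguments and returns a fresh object.
    std::shared_ptr<LangObject> call(const std::vector<std::shared_ptr<LangObject> >& args) {
        (void)args;
        return std::make_shared<LangObject>();
    }
};

// In the spirit of CallNode::eval: the argument list and the function are
// released automatically when they go out of scope, on every return path.
std::shared_ptr<LangObject> callEval(const std::shared_ptr<LangObject>& callee,
                                     const std::vector<std::shared_ptr<LangObject> >& args) {
    std::shared_ptr<Function> f = std::dynamic_pointer_cast<Function>(callee);
    if (!f)
        return std::shared_ptr<LangObject>();  // not callable (an error could be reported here)
    return f->call(args);
}
```

After callEval returns, the temporary reference held by f is gone, so the callee's use_count is back to what the caller holds.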

johnsfine · 04-11-2011, 02:38 PM

Quote:

Originally Posted by MTK358

It was crashing when interpreting this AST:

What does "AST" mean?

Quote:

which calls Table::getParentWithVar, which calls map::find(), and that's where the segfault happens)

Is the contradiction with your earlier post that I marked in red a typo, a change, or something else?

The part in purple marks where the inherent ambiguity of English (plus my inherent belief that PEBKAC is most likely) leaves me guessing at what you really saw.

When you want us to know what a gdb backtrace showed you, it is usually best to just issue the bt command in gdb and then copy/paste the result into a CODE block in your post.

Quote:

But it doesn't crash with this one, which contains the part that crashes in the previous AST:

That tells us a whole lot less than you might think it should.

Quote:

I once played around with x86 (not x86_64, but it should be similar enough)

Doesn't quite answer which architecture your current C++ code was compiled to. But if I see any disassembly, I'll know anyway.

Quote:

I don't know how to make GDB print it out.

Various forms of the disas command. With no parameters, that shows you disassembly (reconstructed, not original assembly) code for whatever gdb thinks is the current function. From that you should find a block from several instructions before the failure point through a few instructions after it.

The command inf r (short for info registers) dumps the basic registers. For x86_64, that will include rax through rip that are interesting for C++ debugging, followed by a bunch of obscure registers only interesting for kernel debugging. For x86, the interesting ones are eax through eip, but most of the interesting stuff is usually on the stack rather than in registers.

The exact point of the failure is in the rip or eip register. It should be possible to match that against the addresses in disas output, but sometimes some further effort is required. I don't use gdb enough myself to know when to expect raw hex addresses (such as in rip or eip) vs. various symbol and offset forms in the bt output or the disas output. Usually I'd like to deal with all three together but one or more are in a different format requiring conversion.

MTK358 · 04-11-2011, 02:44 PM

Quote:

Originally Posted by dwhitney67

It seems that you are using a variant of smart-pointers, however they don't seem so "smart" if you must call getref() and putref() when you want to increase and decrease the number of references. Is this a home-made smart-pointer class that you are using?

In your function eval(), you never check to see if the pointer to 'l' is valid. It seems like you are biased against developing "safe" code. This lax attitude may very well have placed you into your current predicament of checking for a NULL pointer or a memory corruption error.

Anyhow, consider using Boost's shared pointer; it is a lot easier to use than what you have.

I have a RefcountObject class, and all objects that will be stored as variables in the interpreted language (and all members of those classes, unless the member's value won't be shared with other objects) are subclasses of it.
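For readers unfamiliar with this pattern (an intrusive reference count, as opposed to an external smart pointer), here is a hedged sketch of what such a base class commonly looks like. Only the names getref() and putref() come from the thread; everything else is assumed, including that a new object starts with one reference owned by its creator. The posted code does not settle whether the real class starts at 0 or 1.

```cpp
#include <cassert>

// Hypothetical intrusive reference-counted base class.
class RefcountObject {
public:
    RefcountObject() : refcount_(1) {}                          // creator owns one reference
    RefcountObject(const RefcountObject&) = delete;             // shared, never copied
    RefcountObject& operator=(const RefcountObject&) = delete;

    // Adds a reference and returns this, so a caller can write
    // `member = (Node*) other->getref();` as in CallNode's constructor.
    RefcountObject* getref() {
        ++refcount_;
        return this;
    }

    // Drops a reference; the object deletes itself when the count reaches
    // zero, so the caller must not touch the pointer afterwards.
    void putref() {
        if (--refcount_ == 0)
            delete this;
    }

    int refcount() const { return refcount_; }

protected:
    virtual ~RefcountObject() {}   // protected: destroy only through putref()

private:
    int refcount_;
};
```

With this design, one putref() too many (or a putref() on a pointer obtained without a matching getref()) deletes an object that is still in use, which is one way a map or a Table can end up being read after it was freed.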

MTK358 · 04-11-2011, 02:54 PM

Quote:

Originally Posted by johnsfine

What does "AST" mean?

http://en.wikipedia.org/wiki/Abstract_syntax_tree

The Node class represents an AST node, and each *Node class is a subclass of Node that implements a specific behavior.
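A hedged, stripped-down illustration of that design: an abstract Node with a virtual eval(), and subclasses that each implement one behaviour. IntegerNode mirrors a class named in the thread; AddNode is a hypothetical example. Here eval() returns int instead of taking a Table* scope and returning LangObject*, so the sketch compiles on its own.

```cpp
#include <cassert>

class Node {
public:
    virtual ~Node() {}
    virtual int eval() const = 0;   // each subclass supplies its own behaviour
};

// Leaf node: evaluates to the literal it was built with.
class IntegerNode : public Node {
public:
    explicit IntegerNode(int value) : value_(value) {}
    int eval() const { return value_; }
private:
    int value_;
};

// Interior node: owns its children and evaluates them recursively.
class AddNode : public Node {
public:
    AddNode(Node* lhs, Node* rhs) : lhs_(lhs), rhs_(rhs) {}
    AddNode(const AddNode&) = delete;             // owns raw pointers, so no copies
    AddNode& operator=(const AddNode&) = delete;
    ~AddNode() { delete lhs_; delete rhs_; }
    int eval() const { return lhs_->eval() + rhs_->eval(); }
private:
    Node* lhs_;
    Node* rhs_;
};
```

A tree is built with nested new expressions, as in the ASTs earlier in the thread, and evaluating it is a single virtual call on the root that recurses into the children.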

Quote:

Originally Posted by johnsfine

Is the contradiction to your earlier post that I marked in red a typo, or a change or what?

I don't really understand.

The segfault occurs within the call to map::find(), and map::find() is called by Table::getParentWithVar().

Quote:

Originally Posted by johnsfine

inherent ambiguity of English

http://en.wikipedia.org/wiki/Lojban

(I never learned it, but it might be a fun thing to try sometime)

Quote:

Originally Posted by johnsfine

When you want us to know what a gdb backtrace showed you, it is usually best to just issue the bt command in gdb and then copy/paste the result into a CODE block in your post.

Code:

(gdb) run
Starting program: /home/michael/Projects/lang/build/src/lang 
Creating integer 5
setting table entry "__internal_data__" to 0x602250
setting table entry "__op_plus__" to 0x602358
setting table entry "__op_minus__" to 0x602448
setting table entry "__op_times__" to 0x602538
setting table entry "__op_div__" to 0x602628
setting table entry "__comp_ne__" to 0x602718
setting table entry "print" to 0x602808                                                               
                                                                                                      
Program received signal SIGSEGV, Segmentation fault.                                                  
0x00007ffff7bce7d6 in std::_Rb_tree<char*, std::pair<char* const, lang::LangObject*>, std::_Select1st<std::pair<char* const, lang::LangObject*> >, lang::CStringComparisonClass, std::allocator<std::pair<char* const, lang::LangObject*> > >::_M_lower_bound (
    this=0x7ffff7bd2908, __x=0x8b48008b4820408b, __y=0x7ffff7bd2910, __k=@0x7fffffffd7c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1004
1004            if (!_M_impl._M_key_compare(_S_key(__x), __k))                                                   
(gdb) bt
#0  0x00007ffff7bce7d6 in std::_Rb_tree<char*, std::pair<char* const, lang::LangObject*>, std::_Select1st<std::pair<char* const, lang::LangObject*> >, lang::CStringComparisonClass, std::allocator<std::pair<char* const, lang::LangObject*> > >::_M_lower_bound (
    this=0x7ffff7bd2908, __x=0x8b48008b4820408b, __y=0x7ffff7bd2910, __k=@0x7fffffffd7c0)                               
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1004
#1  0x00007ffff7bd06f6 in std::_Rb_tree<char*, std::pair<char* const, lang::LangObject*>, std::_Select1st<std::pair<char* const, lang::LangObject*> >, lang::CStringComparisonClass, std::allocator<std::pair<char* const, lang::LangObject*> > >::find (
    this=0x7ffff7bd2908, __k=@0x7fffffffd7c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1519
#2  0x00007ffff7bd0209 in std::map<char*, lang::LangObject*, lang::CStringComparisonClass, std::allocator<std::pair<char* const, lang::LangObject*> > >::find (this=0x7ffff7bd2908, __x=@0x7fffffffd7c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_map.h:697
#3  0x00007ffff7bd3036 in lang::Table::findParentWithVar (this=0x7ffff7bd28e8, name=0x602070 "print")
    at /home/michael/Projects/lang/lib/table.cc:31
#4  0x00007ffff7bd3117 in lang::Table::get (this=0x7ffff7bd28e8, name=0x602070 "print") at /home/michael/Projects/lang/lib/table.cc:46
#5  0x00007ffff7bd2512 in lang::MemberNode::eval (this=0x602040, scope=0x602110) at /home/michael/Projects/lang/lib/nodes.cc:119
#6  0x00007ffff7bd2935 in lang::CallNode::eval (this=0x6020e0, scope=0x602110) at /home/michael/Projects/lang/lib/nodes.cc:186        
#7  0x0000000000400bcc in main () at /home/michael/Projects/lang/src/main.cc:21
(gdb)
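The backtrace also shows the map's type: a std::map keyed by char* with lang::CStringComparisonClass as the comparator, and the faulting line in stl_tree.h calls that comparator on _S_key(__x), the key stored in node __x. The thread does not show CStringComparisonClass, so the following is a hedged sketch of what such a comparator usually looks like (CStringLess and the typedef names are hypothetical, and the keys are const char*). It also shows why char* keys need one: the default std::less compares pointer values, so a lookup through a different buffer holding the same text misses.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <map>

// Orders C strings by content. Both arguments must be valid, NUL-terminated
// strings; if a node or key pointer is garbage, reading the key (or strcmp)
// faults, which is the kind of crash the backtrace shows.
struct CStringLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Keys compared by the characters they point to.
typedef std::map<const char*, int, CStringLess> MapByContent;

// Keys compared by address (the default std::less<const char*>).
typedef std::map<const char*, int, std::less<const char*> > MapByAddress;
```

With two separate buffers that both hold "print", a MapByContent finds the entry and a MapByAddress does not. In both cases the map stores only the pointer, so the string it points to must outlive the map entry.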

Quote:

Originally Posted by johnsfine

Various forms of the disas command. With no parameters, that shows you disassembly (reconstructed, not original assembly) code for whatever gdb thinks is the current function. From that you should find a block from several instructions before the failure point indicated by bt through a few instructions after it.

The command inf r dumps the basic registers. For x86_64, that will include rax through rip that are interesting for C++ debugging, followed by a bunch of obscure registers only interesting for kernel debugging. For x86, the interesting ones are eax through eip, but most of the interesting stuff is usually on the stack rather than in registers.

In which stack frame should I run them?

johnsfine · 04-11-2011, 03:24 PM

In your backtrace I see

Quote:

in std::_Rb_tree<...>::_M_lower_bound(
this=0x7ffff7bd2908, __x=0x8b48008b4820408b, __y=0x7ffff7bd2910, __k=@0x7fffffffd7c0)

I don't have the internals of GNU Rb_tree either memorized or handy. If I debug one of these myself, I just look at the disassembly and register values to confirm what I can only guess at from the above.

I assume all of the above items are pointers. this, __y and __k are valid pointers; __x is very much not a valid pointer.

I expect __x and __y are nodes within the existing map, this is the map itself, and __k is the name being looked up.

So __x being bad implies the map was corrupt before you got into the code with the actual crash.

It's always harder to debug something where the crash occurs as an after effect of a previous silent bug.

If it is a memory clobber bug (rather than something that specifically hits the map) that is harder still.

Assuming a map clobber bug (rather than a memory clobber bug), I would suggest writing or finding some map-testing function and calling it in asserts scattered through the code, to help find the point at which the map gets corrupted.
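A hedged sketch of such a map-testing function. It can only use std::map's public interface, so it checks what is visible from outside: non-null keys, strictly increasing order under the map's comparator, and a visited-element count that matches size(). CStringLess, VarMap and mapLooksSane are hypothetical names; the real map holds LangObject* values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>

// Hypothetical comparator standing in for the project's own.
struct CStringLess {
    bool operator()(const char* a, const char* b) const { return std::strcmp(a, b) < 0; }
};

// Placeholder value type; in the interpreter this would be LangObject*.
typedef std::map<const char*, void*, CStringLess> VarMap;

// Walks the whole map and checks invariants a corrupted tree is likely to
// break. A badly damaged tree may still crash inside this walk, but then the
// crash happens at the check nearest the corruption instead of at some later,
// unrelated find().
bool mapLooksSane(const VarMap& m) {
    CStringLess less;
    const char* prev = NULL;
    std::size_t visited = 0;
    for (VarMap::const_iterator it = m.begin(); it != m.end(); ++it) {
        if (it->first == NULL)
            return false;
        if (prev != NULL && !less(prev, it->first))   // keys must strictly increase
            return false;
        prev = it->first;
        if (++visited > m.size())                     // stop if the links form a cycle
            return false;
    }
    return visited == m.size();
}

// Typical use: after every operation suspected of damaging the map,
//   assert(mapLooksSane(map));
```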

MTK358 · 04-11-2011, 03:27 PM

What's Rb_tree?