LinuxQuestions.org
Old 04-17-2009, 03:18 PM   #1
dax2112rush
LQ Newbie
 
Registered: Apr 2009
Posts: 7

Rep: Reputation: 0
Problems debugging intermittent segmentation fault


Hi,

I am developing an application for my master's degree (not in computer science/engineering, don't worry :P) and have a problem debugging a segmentation fault that happens only every once in a while (maybe once every 10 minutes of use).

I have run the program in debug mode, and my problem is that I can read all the variables on the line GDB reports the app to have stopped on.

Code:
(gdb) info stack
#0  0x0817fa7e in Signal1D<float>::plot (this=0xb1426ef0, cr=@0xbfa74368, allocation={gobject_ = {x = 10, y = 0, width = 1668, height = 130}}, zoom=
      {y_start = 0.44753885269165039, y_end = -0.6947590708732605, x_start = 0, x_end = 12.431979166666666}, area=@0xbfa747e0) at ../Signal1D.h:222
...

(gdb) l 222
217					double xval = (double)i*1.0/(double)f_sampleRate;
218					T minVal = (T)(1.0/0.0);
219					T maxVal = (T)(-1.0/0.0);
220					for(uint16_t j=0; j<decimation_factor && i+j<index_end; j++)
221					{
222						if(f_values[i+j] > maxVal)
223							maxVal = f_values[i+j];
224						if(f_values[i+j] < minVal)
225							minVal = f_values[i+j];
226					}

(gdb) p f_values[i+j]
$2 = -0.00149934844

(gdb) p maxVal
$3 = 0.00983929168
T is a templated type (float in that case).

I'm quite surprised since I can read all variables. I also have disabled optimizations (-O0).

The assembly instruction being executed is

Code:
(gdb) x/i $pc
0x817fa7e <_ZN8Signal1DIfE4plotERN5Cairo6RefPtrINS1_7ContextEEEN3Gdk9RectangleE10ZoomStructR13_GdkRectangle+1930>:	flds   (%eax)

(gdb) p *(float*)$eax
Cannot access memory at address 0xaf85fff0
So the memory pointed to by eax is not accessible. However, I don't understand what it's trying to do, since I can read all the variables that should be needed.

Disassembly looks like

Code:
				for(uint16_t j=0; j<decimation_factor && i+j<index_end; j++)
 817fa53:	66 c7 45 f2 00 00    	movw   $0x0,-0xe(%ebp)
 817fa59:	e9 a9 00 00 00       	jmp    817fb07 <_ZN8Signal1DIfE4plotERN5Cairo6RefPtrINS1_7ContextEEEN3Gdk9RectangleE10ZoomStructR13_GdkRectangle+0x813>
				{
					if(f_values[i+j] > maxVal)
 817fa5e:	8b 55 0c             	mov    0xc(%ebp),%edx
 817fa61:	8b 45 0c             	mov    0xc(%ebp),%eax
 817fa64:	8b 00                	mov    (%eax),%eax
 817fa66:	83 e8 3c             	sub    $0x3c,%eax
 817fa69:	8b 00                	mov    (%eax),%eax
 817fa6b:	8d 04 02             	lea    (%edx,%eax,1),%eax
 817fa6e:	8b 50 04             	mov    0x4(%eax),%edx
 817fa71:	0f b7 45 f2          	movzwl -0xe(%ebp),%eax
 817fa75:	03 45 e4             	add    -0x1c(%ebp),%eax
 817fa78:	c1 e0 02             	shl    $0x2,%eax
 817fa7b:	8d 04 02             	lea    (%edx,%eax,1),%eax
 817fa7e:	d9 00                	flds   (%eax)
 817fa80:	d9 45 d8             	flds   -0x28(%ebp)
 817fa83:	d9 c9                	fxch   %st(1)
 817fa85:	da e9                	fucompp 
 817fa87:	df e0                	fnstsw %ax
 817fa89:	9e                   	sahf   
 817fa8a:	76 25                	jbe    817fab1 <_ZN8Signal1DIfE4plotERN5Cairo6RefPtrINS1_7ContextEEEN3Gdk9RectangleE10ZoomStructR13_GdkRectangle+0x7bd>
						maxVal = f_values[i+j];
 817fa8c:	8b 55 0c             	mov    0xc(%ebp),%edx
 817fa8f:	8b 45 0c             	mov    0xc(%ebp),%eax
 817fa92:	8b 00                	mov    (%eax),%eax
 817fa94:	83 e8 3c             	sub    $0x3c,%eax
 817fa97:	8b 00                	mov    (%eax),%eax
 817fa99:	8d 04 02             	lea    (%edx,%eax,1),%eax
 817fa9c:	8b 50 04             	mov    0x4(%eax),%edx
 817fa9f:	0f b7 45 f2          	movzwl -0xe(%ebp),%eax
 817faa3:	03 45 e4             	add    -0x1c(%ebp),%eax
 817faa6:	c1 e0 02             	shl    $0x2,%eax
 817faa9:	8d 04 02             	lea    (%edx,%eax,1),%eax
 817faac:	8b 00                	mov    (%eax),%eax
 817faae:	89 45 d8             	mov    %eax,-0x28(%ebp)
					if(f_values[i+j] < minVal)
I'm not comfortable with x86 asm and can't really draw any conclusions about what's wrong. If anybody could help me pinpoint the problem I'd be really happy.

Also, is it possible to find out what eax points to (in terms of a variable), or to get some kind of memory map that would help me figure out which library/program is located in that area of memory? I'm not very used to GDB yet.

TIA!

Ric
 
Old 04-17-2009, 04:44 PM   #2
dwhitney67
Senior Member
 
Registered: Jun 2006
Location: Maryland
Distribution: Kubuntu, Fedora, RHEL
Posts: 1,541

Rep: Reputation: 335Reputation: 335Reputation: 335Reputation: 335
Before pursuing a Master's degree, you may want to consider remedial math courses.

Since when is a divide by zero desirable?
Code:
...
218					T minVal = (T)(1.0/0.0);
219					T maxVal = (T)(-1.0/0.0);
...
 
Old 04-17-2009, 04:52 PM   #3
jf.argentino
Member
 
Registered: Apr 2008
Location: Toulon (France)
Distribution: FEDORA CORE
Posts: 493

Rep: Reputation: 50
Quote:
I'm not comfortable with x86 asm
nobody is...
By the way, I think you'd be better off using valgrind-memcheck for this kind of problem. It checks for invalid memory use, and most likely i+j is going out of the bounds of f_values.
And, since valgrind-memcheck has many options to set, try alleyoop, a nice GUI for it that is really straightforward to use.
Quote:
(1.0/0.0)
I think it's really dangerous, since I'm not sure how different compilers will handle this. You can use "Inf", but I can't remember whether you have to get it through a function. Even that is not really a good idea, as I think you can run into portability issues, so take a look in float.h (or maybe limits.h?) for something like DBL_MAX, which is the maximum value that can be represented by a double.
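
One portable way to seed a templated min/max search is std::numeric_limits from <limits>. The sketch below is only an illustration: the helper name min_max is hypothetical and is not part of the code posted in this thread, and lowest() assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <limits>
#include <utility>

// Hypothetical helper (not from the thread): returns {min, max} of values[0..count).
template <typename T>
std::pair<T, T> min_max(const T* values, std::size_t count)
{
    // Start from the extremes of T so that any real sample replaces them.
    // max() and lowest() are defined for integer and floating-point types alike.
    T minVal = std::numeric_limits<T>::max();
    T maxVal = std::numeric_limits<T>::lowest();
    for (std::size_t k = 0; k < count; ++k) {
        if (values[k] > maxVal) maxVal = values[k];
        if (values[k] < minVal) minVal = values[k];
    }
    return std::make_pair(minVal, maxVal);
}
```

For floating-point types only, std::numeric_limits<T>::infinity() returns a true infinity when std::numeric_limits<T>::has_infinity is true, which avoids writing 1.0/0.0 in the source.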
 
Old 04-17-2009, 04:58 PM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
I'm pretty sure these seven instructions just compute the value of the pointer f_values and store it in edx. I don't see the source for f_values, so I can't guess why that is so complicated.
Code:
 817fa5e:	8b 55 0c             	mov    0xc(%ebp),%edx
 817fa61:	8b 45 0c             	mov    0xc(%ebp),%eax
 817fa64:	8b 00                	mov    (%eax),%eax
 817fa66:	83 e8 3c             	sub    $0x3c,%eax
 817fa69:	8b 00                	mov    (%eax),%eax
 817fa6b:	8d 04 02             	lea    (%edx,%eax,1),%eax
 817fa6e:	8b 50 04             	mov    0x4(%eax),%edx
Then it promotes j to 32 bit and adds i
Code:
 817fa71:	0f b7 45 f2          	movzwl -0xe(%ebp),%eax
 817fa75:	03 45 e4             	add    -0x1c(%ebp),%eax
Then it computes the address of f_values[i+j]
Code:
 817fa78:	c1 e0 02             	shl    $0x2,%eax
 817fa7b:	8d 04 02             	lea    (%edx,%eax,1),%eax
But you determined that is an invalid address. It would be nice to know whether it is invalid because of an error in j (-0xe(%ebp)) or in i (-0x1c(%ebp)) or in f_values which is in edx at the point of failure.

So on the next failure, I would look at all of those things:
eax
edx
-0xe(%ebp)
-0x1c(%ebp)
i
j
f_values
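
The address arithmetic of those last instructions can be restated in C++, which may make it easier to work out which of the three inputs is off once the register values are known. This is only a sketch for a 32-bit target; the function names are hypothetical.

```cpp
#include <cstdint>

// Mirrors "shl $0x2,%eax; lea (%edx,%eax,1),%eax" on 32-bit x86:
// address = f_values + (i + j) * sizeof(float).
std::uint32_t element_address(std::uint32_t f_values, std::uint32_t i, std::uint16_t j)
{
    return f_values + ((i + j) << 2);
}

// Inverse: which element index does a faulting address imply for a given base?
// A very large or negative result suggests the base pointer, not i or j, is wrong.
std::int64_t implied_index(std::uint32_t fault_address, std::uint32_t f_values)
{
    return (static_cast<std::int64_t>(fault_address)
            - static_cast<std::int64_t>(f_values)) / 4;
}
```

Feeding the faulting eax together with the edx seen in the registers, and then with the f_values the debugger prints, shows whether the two bases agree.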

Quote:
Originally Posted by dax2112rush View Post
I can read all variables on the line GDB reports the app to have stopped on.
Never trust the values of variables reported by any debugger. If they happen to be correct, it is convenient that the debugger can display them. But when the debugger is confused, it really helps to be able to look at the register which the code just put the variable into and see what the value really is.

Quote:
Originally Posted by jf.argentino View Post
nobody is...
Nonsense. When I'm confused by what some C++ code really means, I look at the generated asm code. It is usually very understandable.

Last edited by johnsfine; 04-17-2009 at 05:05 PM.
 
Old 04-17-2009, 05:17 PM   #5
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
I think it is some kind of inheritance bug. Is f_values a member variable of a multiply inherited base class of the current class, or something like that?

Code:
 817fa5e:	8b 55 0c             	mov    0xc(%ebp),%edx
 817fa61:	8b 45 0c             	mov    0xc(%ebp),%eax
The above two instructions put a pointer to an object into both eax and edx.

Code:
 817fa64:	8b 00                	mov    (%eax),%eax
Next we get a pointer to the vtable of the object.

Code:
 817fa66:	83 e8 3c             	sub    $0x3c,%eax
Then compute a negative offset from the start of the vtable ???

I don't use bad constructs such as multiple inheritance of virtual classes enough to remember exactly which bad construct gives code like that.

Code:
 817fa69:	8b 00                	mov    (%eax),%eax
Then we get the offset of some related class stored there (the class that f_values is a member of).

Code:
 817fa6b:	8d 04 02             	lea    (%edx,%eax,1),%eax
Then we compute the address of the subobject of that class.

Code:
 817fa6e:	8b 50 04             	mov    0x4(%eax),%edx
Finally get the wrong value of f_values from that subobject.

Maybe the object simply has a wrong f_values in it, but more likely we navigated incorrectly, because the object relationship at run time wasn't the same as the object relationship at compile time (as would happen if the caller had done a static_cast to the wrong related class and the current object isn't the type it is supposed to be).
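
A minimal sketch of why a member of a virtual base needs that extra lookup. The class names here are hypothetical stand-ins, not the poster's real classes. With virtual inheritance, the subobject holding the member sits at an offset that depends on the complete object, so GCC reads that offset through the vtable at run time instead of using a compile-time constant.

```cpp
#include <cstddef>

struct DataBase { virtual ~DataBase() {} };

template <typename T>
struct Data : virtual DataBase {
    T* values;
    Data() : values(nullptr) {}
};

struct Base : virtual DataBase {
    int id;
    Base() : id(0) {}
};

template <typename T>
struct Signal : virtual Base, virtual Data<T> {
    // Reaching 'values' requires locating the virtual Data<T> subobject first.
    T* first_value_ptr() { return this->values; }
};

// Byte offset of the Data<T> subobject inside a complete Signal<T> object.
template <typename T>
std::ptrdiff_t data_offset(Signal<T>& s)
{
    Data<T>* sub = &s;  // implicit derived-to-virtual-base conversion
    return reinterpret_cast<char*>(sub) - reinterpret_cast<char*>(&s);
}
```

If the pointer used as "this" does not really point at a live object of the expected type, the offset read from the vtable is garbage and so is everything computed from it.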

Last edited by johnsfine; 04-17-2009 at 05:18 PM.
 
Old 04-18-2009, 12:27 PM   #6
dax2112rush
LQ Newbie
 
Registered: Apr 2009
Posts: 7

Original Poster
Rep: Reputation: 0
You are right, f_values is an inherited member variable. The class structure looks like this:

Code:
template <class T>
class Signal1D : public virtual SignalBase, public virtual SignalData<T>
{
public:

	using SignalData<T>::f_values;
...
}

class SignalBase: public virtual PlottableAndViewable, public virtual SignalDataBase ...


class SignalDataBase ...

template <class T>
class SignalData: public virtual SignalDataBase {
public:
...

	T* f_values;
};
Agreed, this is a very ugly class structure. It's my first real C++ project and I have made quite a few mistakes I would not make again. I was basically trying to achieve what I had seen in Java, where objects implement a few interfaces and the receiver casts to these interfaces, i.e. I wanted to have vectors of "signals" or of "plottable" objects without restricting myself to one specific implementation. I know there are probably better ways to do that, and I'll probably read a book on design patterns before starting my next project, but for this one I am stuck with my errors since too much code depends on them.

Unfortunately, with my home PC (x86_64 vs x86 at research lab), I am unable to reproduce the segmentation fault. Monday, I will further investigate based on the input you guys gave me and let you know if I was able to fix it!

As for Valgrind, I use it quite often, but since the slowdown is really huge and my bug isn't happening very often, I did not manage to trigger it with valgrind monitoring it. It could be useful though.

And FYI, the divisions by 0 are there to generate +inf and -inf. Since I had a templated type, I thought it would be the best way to do it, but I'll have a look at DBL_MAX (or any other constant in the standard headers), since I guess DBL_MAX will cast to the maximum value of integer types as well.

BTW, is there any good resource (tutorial, book, etc..) that you guys recommend to be able to learn about how those C++ constructs get translated to asm? It would be a great skill I'd like to have.

Thanks for all your replies, I'll give you feedback on Monday!

Ric

Last edited by dax2112rush; 04-18-2009 at 12:37 PM.
 
Old 04-18-2009, 01:31 PM   #7
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
The code it crashes in expects to be working on an object of type Signal1D<float>

The most likely cause of the crash is that it is not that kind of object.

1) It may have been that kind of object but has been deleted.

2) It may be a different kind of object that has reached this point in the code due to an incorrect cast.

So, you should examine the current object (*this) to see what it looks like.

Hopefully the debugger knows what "this" is in that context. If not, the asm code clearly shows "this" is stored at 0xc(%ebp)
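
For case 2, a run-time check with dynamic_cast at the call site can catch an object of the wrong type before it reaches code like this. This is a sketch with hypothetical class names. It cannot detect case 1, because using a pointer to a deleted object is undefined behaviour whatever cast is applied.

```cpp
struct Plottable { virtual ~Plottable() {} };

struct Signal : virtual Plottable {
    int samples;
    Signal() : samples(0) {}
};

struct Marker : virtual Plottable {};

// Returns true only if p points at a live object whose dynamic type is Signal
// (or derives from it); dynamic_cast yields a null pointer otherwise.
bool is_signal(Plottable* p)
{
    return dynamic_cast<Signal*>(p) != nullptr;
}
```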
 
Old 04-21-2009, 08:48 AM   #8
dax2112rush
LQ Newbie
 
Registered: Apr 2009
Posts: 7

Original Poster
Rep: Reputation: 0
It's fixed!

edx wasn't equal to f_values. I found out that f_values had been modified after its value was transferred to edx. I had forgotten that I had a method that could grow f_values (realloc the array), and this was happening during the call to the function it was crashing in (in two separate threads). A simple mutex solved the issue.
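
A minimal sketch of that kind of fix, assuming C++11's std::mutex and a hypothetical Samples class; the real code and the threading library actually used are not shown in this thread. The point is that the reader holds the same lock as the method that reallocs, so the array pointer cannot go stale in the middle of the loop.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the class that owns f_values.
class Samples {
public:
    Samples() : values_(nullptr), count_(0) {}
    ~Samples() { std::free(values_); }
    Samples(const Samples&) = delete;
    Samples& operator=(const Samples&) = delete;

    // Writer: realloc may move the array, so it must not overlap with a reader.
    bool append(float v) {
        std::lock_guard<std::mutex> lock(mutex_);
        float* grown = static_cast<float*>(
            std::realloc(values_, (count_ + 1) * sizeof(float)));
        if (!grown) return false;  // on failure the old block is still valid
        values_ = grown;
        values_[count_++] = v;
        return true;
    }

    // Reader: holds the lock for the whole scan. Returns {0, 0} when empty,
    // which is an arbitrary choice made for this sketch.
    std::pair<float, float> min_max() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return std::make_pair(0.0f, 0.0f);
        float lo = values_[0], hi = values_[0];
        for (std::size_t k = 1; k < count_; ++k) {
            if (values_[k] < lo) lo = values_[k];
            if (values_[k] > hi) hi = values_[k];
        }
        return std::make_pair(lo, hi);
    }

private:
    float* values_;
    std::size_t count_;
    mutable std::mutex mutex_;
};
```

Holding the lock across a long plot loop blocks the writer for that time; copying the needed range under the lock and plotting from the copy is one alternative if that becomes a problem.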

Thanks a lot!
At least I learned why values reported by GDB can't be trusted!
Ric
 
  

