SEGMENTATION FAULT using gcc 4.4.4 -O2 , works with gcc 4.1.0 -O2 or gcc 4.4.4 -O1
Hi,
I'm trying to update a relatively old piece of software so that it works on new 64-bit systems and with newer versions of gcc. Because the original software was written for 32-bit systems, I decided to use controlled data types that have the same size on both 64-bit and 32-bit machines, and I changed the code to use these newly defined types.
Here is the situation:
1- As I expected, everything works on a 32-bit machine.
2- With gcc 4.1.0 on a 64-bit machine, everything works.
3- With gcc 4.4.4 on a 64-bit machine, a segmentation fault occurs (optimization -O2).
4- With gcc 4.4.4 on a 64-bit machine and -O, everything works!
Here is the output of Valgrind:
==13412==
==13412==
==13412== Process terminating with default action of signal 11 (SIGSEGV)
==13412== Access not within mapped region at address 0x700000008
==13412== at 0x409F29: SysString::clear(Integral::CMODE) (sstr_03.cc:1623)
==13412== by 0x40B73D: SysString::assign(unsigned char, wchar_t const*) (sstr_03.cc:838)
==13412== by 0x403474: SysString::diagnose(Integral::DEBUG) (sstr_02.cc:221)
==13412== by 0x401E1C: main (in /usr/local/isip/tools/ifc/class/system/SysString/SysString.exe)
==13412== If you believe this happened as a result of a stack
==13412== overflow in your program's main thread (unlikely but
==13412== possible), you can try to increase the size of the
==13412== main thread stack using the --main-stacksize= flag.
==13412== The main thread stack size used in this run was 16777216.
==13412==
==13412== For counts of detected and suppressed errors, rerun with: -v
Why am I getting a segmentation fault in case 3?
tnx |
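For readers following along, here is a minimal sketch of what "controlled data types" might look like. The names byte8 and uint32 do appear later in the thread, but their real definitions are never shown, so the typedefs below are assumptions, and the helper functions are hypothetical. The point of the sketch is that fixed-width integers only fix integer sizes: pointers, long, and wchar_t still change between a 32-bit and a 64-bit build.

```cpp
#include <cassert>
#include <climits>
#include <stdint.h>

// Hypothetical aliases. The names byte8 and uint32 appear in the thread,
// but their real definitions are not shown, so these are assumptions.
typedef uint8_t  byte8;
typedef uint32_t uint32;

// Narrow an unsigned long to uint32, asserting that no bits are lost.
// unsigned long is 4 bytes on 32-bit Linux but 8 bytes on 64-bit Linux.
inline uint32 to_uint32(unsigned long v) {
  assert(v <= 0xFFFFFFFFUL);
  return static_cast<uint32>(v);
}

// True only when the aliases have the widths their names promise.
inline bool widths_ok() {
  return sizeof(byte8) * CHAR_BIT == 8 && sizeof(uint32) * CHAR_BIT == 32;
}

// True when a pointer can be stored in a uint32 without truncation.
// Holds on a typical 32-bit build, fails on a 64-bit one.
inline bool pointer_fits_in_uint32() {
  return sizeof(void*) <= sizeof(uint32);
}
```

Calling widths_ok() and pointer_fits_in_uint32() once at startup is a cheap way to see which old 32-bit assumptions no longer hold on the new machine.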
Hi -
As I'm sure you know, just because some code happens to run without crashing doesn't necessarily mean that the code is "correct". There could have been a latent bug there since Day One.
On the other hand (as the Valgrind output suggests), maybe you're getting a stack overflow. That is certainly worth instrumenting and looking for. It looks like the code in question is trying to emulate Windows MFC functionality (which, itself, is probably fraught with danger ;)).
STRONG SUGGESTION:
1. See if you can reproduce the problem with "-g".
2. If so, use GDB to check whether your input values are correct, your data structures are uncorrupted, and your stack is OK.
3. You might also be interested in using libsigsegv for your troubleshooting: http://savannah.gnu.org/projects/libsigsegv/
'Hope that helps .. PSM |
Quote:
Use -Wall -Wextra -Wformat=2 during compilation; sometimes the warnings produced by the compiler give a clue. |
Quote:
Actually, I have already tried different stack sizes and it does not help. This code is part of a bigger and relatively complex system :D
I have compiled with -g, and here is the result from gdb:
(gdb) r
Starting program: /usr/local/isip/tools/ifc/class/system/SysString/SysString.exe
diagnosing class SysString: testing required public methods...
<SysString::str1> value_d = (16 >= 16) "hello my name is"
<SysString::str2> value_d = (4 >= 4) "rjck"
<SysString::str3> value_d = (100 >= 0) ""
<SysString::str4> value_d = (4 >= 4) "rjck"
testing class-specific public methods: extensions to required methods...
Program received signal SIGSEGV, Segmentation fault.
SysString::clear (this=0x7f00000000, cmode_a=Integral::RESET) at sstr_03.cc:1623
1623 if (capacity_d > 0) {
Current language: auto; currently c++
(gdb)
I will work with libsigsegv to see whether it helps. Anyway, thanks for the reply. |
Quote:
To find out whether it is a strict-aliasing problem, replace the -O2 option with
-O2 -fno-strict-aliasing
If that fixes it, the problem was probably strict aliasing (though that wouldn't be certain). If -fno-strict-aliasing doesn't fix the problem, then the problem definitely wasn't strict aliasing.
If the problem is strict aliasing, it is best to find and fix that error in each place where it occurs in your code. But in large old programs that usually isn't practical, so -fno-strict-aliasing becomes a long-term part of your compile command. |
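As an illustration of what "find and fix that error" can mean in practice, here is a minimal sketch of one common strict-aliasing violation and a well-defined replacement. Nothing in it comes from the SysString sources; float_bits is a hypothetical helper, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstring>
#include <stdint.h>

// The aliasing violation to look for reads an object through an unrelated
// pointer type, for example:
//   uint32_t bits = *reinterpret_cast<uint32_t*>(&some_float);  // undefined
// At -O2 gcc may assume such pointers never refer to the same object.

// Well-defined alternative: copy the bytes instead of casting the pointer.
inline uint32_t float_bits(float f) {
  uint32_t u;
  assert(sizeof u == sizeof f);  // assumes a 32-bit float
  std::memcpy(&u, &f, sizeof u);
  return u;
}
```

Compilers generally turn a small fixed-size memcpy like this into a plain register move, so the fix usually costs nothing at run time.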
Quote:
Seg faults should be pretty easy to understand when you catch them this way in GDB.
The this pointer 0x7f00000000 looks a little improbable, but not definitely wrong. GDB commands can be used to examine the *this object and/or the contents of memory at 0x7f00000000 to see whether that pointer is wrong.
I don't know whether your Valgrind run used the same addresses as your GDB run. The faulting address reported by Valgrind, 0x700000008, seems quite unlikely given that line of code (a simple read of capacity_d) and the value of the this pointer reported by GDB.
If you post a bit more of the source of SysString::clear, that might make the problem obvious.
If you know any asm, it is very effective to look at some disassembly and register values in GDB at the point of the seg fault.
The seg fault means some address was bad. You need to figure out what address was bad, what the code was supposed to be doing with that address, and why it had a wrong value instead. All that should be pretty easy to find in GDB at the point of the seg fault. |
Quote:
Here is a snippet of the code:
Code:
// method: clear
Code:
// --------------------------------------------------------------
Actually the error tends to move: for example, if I comment out some part of the code, it appears somewhere else!
Output of gdb and backtrace:
(gdb) r
Starting program: /usr/local/isip/tools/ifc/class/system/SysString/SysString.exe
diagnosing class SysString: testing required public methods...
<SysString::str1> value_d = (16 >= 16) "hello my name is"
<SysString::str2> value_d = (4 >= 4) "rjck"
<SysString::str3> value_d = (100 >= 0) ""
<SysString::str4> value_d = (4 >= 4) "rjck"
testing class-specific public methods: extensions to required methods...
Program received signal SIGSEGV, Segmentation fault.
SysString::clear (this=0x7f00000000, cmode_a=Integral::RESET) at sstr_03.cc:1623
1623 if (capacity_d > 0) {
Current language: auto; currently c++
(gdb) backtrace
#0 SysString::clear (this=0x7f00000000, cmode_a=Integral::RESET) at sstr_03.cc:1623
#1 0x000000000040b73e in SysString::assign (this=0x7f00000000, arg_a=27 '\033', fmt_a=<value optimized out>) at sstr_03.cc:838
#2 0x0000000000403475 in SysString::diagnose (level_a=<value optimized out>) at sstr_02.cc:221
#3 0x0000000000401e1d in main ()
(gdb) |
Quote:
Try setting a breakpoint earlier and see where this pointer is coming from. |
Quote:
An error that moves like that is usually a memory clobber bug: the code with the actual bug uses some memory that doesn't belong to it. Then the error appears when the section of code that does own that memory uses it.
A memory clobber bug usually needs to be backtracked in two stages. First you need to follow the bad value (the this pointer in your example) back to the memory location where it was clobbered. Then you need to restart and set a data breakpoint to catch the real bug (in GDB I don't know how, nor even the correct terminology; I'm usually chasing such bugs in Visual Studio).
The info you posted makes it much more likely that the this pointer is bad (otherwise GDB is wrong about the line number, which is possible, but less likely). You also showed that the this pointer came through SysString::assign. So you should be looking in SysString::assign, or more likely the code that called it, for the point where this got clobbered. |
I have found something that might be related to the problem:
If I use gdb and put a breakpoint just before the segmentation fault occurs in sstr_02.cc (at line 220), examine the value of "value_d" (value_d is a pointer to unichar), then step into the assign function and examine "value_d" again, I see this:
(gdb) r
Starting program: /usr/local/isip/tools/ifc/class/system/SysString/SysString.exe
testing class SysString
diagnosing class SysString: testing required public methods...
<SysString::str1> value_d = (16 >= 16) "hello my name is"
<SysString::str2> value_d = (4 >= 4) "rjck"
<SysString::str3> value_d = (100 >= 0) ""
<SysString::str4> value_d = (4 >= 4) "rjck"
testing class-specific public methods: extensions to required methods...
Breakpoint 1, SysString::diagnose (level_a=<value optimized out>) at sstr_02.cc:221
(gdb) p num.value_d
$3 = (unichar *) 0x61fcb0
(gdb) s
SysString::assign (this=0x7f00000000, arg_a=27 '\033', fmt_a=0x41afd8) at sstr_03.cc:818
(gdb) p value_d
$4 = (unichar *) 0x0
(gdb)
On the 32-bit system it looks like this:
(gdb) r
Starting program: /home/amir/local/isip/tools/system-ifc/class/system/SysString/SysString.exe
testing class SysString
diagnosing class SysString: testing required public methods...
testing class-specific public methods: extensions to required methods...
Breakpoint 1, SysString::diagnose (level_a=Integral::BRIEF) at sstr_02.cc:221
(gdb) p num.value_d
$3 = (unichar *) 0x81672e8 L"27"
(gdb) s
SysString::assign (this=0xbfffeb84, arg_a=27 '\033', fmt_a=0x8063ac0 L"asdf = %u xyz") at sstr_03.cc:828
(gdb) p value_d
$4 = (unichar *) 0x81672e8 L"27"
(gdb)
As you can see, for some reason "value_d" is NULL in the first case, which is wrong. How could this happen? |
Quote:
Meanwhile, there is something strange in what you just provided. Can you explain this: In your 64 bit version line 221 in SysString::diagnose called a version of SysString::assign at line 818. But in your 32 bit version line 221 in SysString::diagnose called an apparently different version of SysString::assign at line 828. If you don't have a good explanation for that, post the area around each of those lines (around 221 in sstr_02.cc as well as around 818 through 828 in sstr_03.cc). |
Quote:
Here is the code:
bool8 SysString::assign(byte8 arg_a, const unichar* fmt_a) {  <--- Line 818

  // allocate a static buffer for printing
  //
  static char buf[MAX_LENGTH];
  static char fmt[MAX_LENGTH];
  static char* fmt_ptr;

  // check the arguments
  //
  if (fmt_a == (unichar*)NULL) {  <--- Line 828
    return Error::handle(name(), L"assign", Error::ARG, __FILE__, __LINE__);
  }

  SysString temp(fmt_a);
  temp.getBuffer((byte8*)fmt, MAX_LENGTH);
  fmt_ptr = fmt;

  // clear out the current value
  //
  clear(Integral::RESET);

  // create and possibly assign the string
  //
  if (sprintf(buf, fmt_ptr, (uint32)arg_a) > 0) {
    assign((byte8*)buf);
    return true;
  }

  // exit gracefully
  //
  return false;
}
I think it is a gdb issue that it shows line 828 instead of 818. |
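A side note on the listing above, offered as a sketch and not as a diagnosis: the function formats into a fixed-size static buffer with sprintf, which does not check the buffer size. Since a memory clobber is suspected elsewhere in the thread, a bounded variant is one way to rule this particular call out. The helper below is hypothetical and is not part of SysString; its name, the 64-byte buffer, and the std::string output are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format one unsigned value with a caller-supplied
// format string, refusing to write past the end of the local buffer.
inline bool format_uint(std::string& out, const char* fmt, unsigned value) {
  char buf[64];  // assumed size; the thread's MAX_LENGTH is not shown
  int n = std::snprintf(buf, sizeof buf, fmt, value);

  // n <= 0 means an error or empty output; n >= sizeof buf means truncation
  if (n <= 0 || static_cast<std::size_t>(n) >= sizeof buf) {
    return false;
  }
  out.assign(buf, static_cast<std::size_t>(n));
  return true;
}
```

With the format string seen in the 32-bit gdb session, format_uint(s, "asdf = %u xyz", 27u) leaves "asdf = 27 xyz" in s, while an over-long result is reported as a failure instead of overrunning the buffer.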
OK, now I see I misunderstood GDB output regarding 818 vs. 828. That is just a difference in the optimizer behavior of the two compiles.
I don't know how much to trust GDB regarding the values of this and value_d when stopped at line 818. Generally I don't trust any implausible variable values reported by GDB. GDB and/or the compiler are not very good at tracking which variables are in which registers and/or stack locations at which lines of the source code.
zirias expressed the opinion (that I mostly share) that 0x7f00000000 is an unreasonable value for this. You told me that value_d is a member of SysString, so at line 818 value_d should be equivalent to this->value_d, which (assuming this is invalid) should have produced
Cannot access memory at address
rather than
$4 = (unichar *) 0x0
If I were debugging it, I would poke around a bit more at that point to find out which, if any, of the apparently contradictory pieces of info represent the result of the bug you're looking for, and which represent wrong info displayed by GDB. At 818, and maybe one s further into that function, I would want to know the values of:
this
&value_d
this->value_d
If those don't start to add up to something consistent, I'd look at the disassembly of the code at that point and at register values, and also try looking directly at memory at address 0x7f00000000. |