LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-16-2010, 09:50 PM #31
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15

Quote:
Originally Posted by smeezekitty
Look closely: that code is assuming the memory location will always be:
Code:
4278168392
4278168392 is likely not memory that belongs to the program!
I just placed that value in there for display purposes when posting. That was the actual value as given by gdb while stepping through the code.
 
01-16-2010, 09:54 PM #32
smeezekitty
Senior Member
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339
Reputation: 231
Quote:
Originally Posted by SeymourButts
I just placed that value in there for display purposes when posting. That was the actual value as given by gdb while stepping through the code.
Ok LOL.
 
01-16-2010, 10:19 PM #33
johnsfine
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
Reputation: 1197
Quote:
Originally Posted by SeymourButts
I found the solution. The application was developed on a 32-bit Solaris server, but the Linux server is a 64-bit system.

By setting the "-m32" option on gcc, I was able to compile the application using 32-bit data type sizes.
Maybe you found a solution, but not the bug.

You said
Quote:
The ucPacket address goes from [0x7feffcaff] to [0xfffffffffeffcaff] on the return from this function.
I can't see anything in the code you quoted that could make that happen.

There are lots of errors common to the situation of porting code from 32 bit to 64 bit. But I just don't see any such errors in the code you posted.
 
01-16-2010, 10:27 PM #34
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15
Not sure myself. I was thinking it had something to do with the difference in sizes: long and pointers are 8 bytes on Linux and 4 bytes on Solaris.
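A quick way to confirm which sizes a given build actually uses is to ask the compiler. The sketch below is illustrative only; the function names are hypothetical, and it assumes the two usual gcc data models: ILP32 (int, long and pointers all 4 bytes, as on 32-bit Solaris or with gcc -m32) and LP64 (int 4 bytes, long and pointers 8 bytes, as on 64-bit Linux).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: names the data model this file was compiled for.
 * ILP32: int, long and void * are all 4 bytes (e.g. gcc -m32).
 * LP64:  int is 4 bytes, long and void * are 8 bytes (64-bit Linux). */
const char *data_model(void)
{
    /* The byte counts below assume 8-bit bytes. */
    assert(CHAR_BIT == 8);

    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Nonzero when an int has room for every bit of a pointer. Code that
 * stores pointers in ints silently depends on this; it holds under
 * ILP32 and fails under LP64. */
int int_can_hold_pointer(void)
{
    return sizeof(int) >= sizeof(void *);
}
```

Calling data_model() from the same source built once with -m32 and once without should report ILP32 and LP64 respectively on an x86-64 Linux machine that has the 32-bit libraries installed.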
 
01-17-2010, 04:51 AM #35
neonsignal
Senior Member
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54
Reputation: 360
Quote:
What value does OBJECT_ID_tag have in this context?
It evaluates to the value 0x06.

Quote:
it says there is a Memory Fault on the bolded build_tag() function below when the OBJECT_ID_tag is the first parm
Since OBJECT_ID_tag is just a number, it is more likely that the second parameter (the pointer) is the cause of the problem. However, since this particular invocation presumably just sets the tag in the packet, the failure has quite likely already occurred before this point and caused memory corruption (perhaps in the call to ucbuild_oid).
==EDIT== oops, only read the first page of the thread when I responded

Porting code like this is non-trivial, especially if (as johnsfine has deduced) you are moving from a 32 bit system to a 64 bit system. The reason the post seems strange is because your knowledge of C appears to be less than required for this sort of job. Sergei is just suggesting learning a bit more about the language before attempting it.

Last edited by neonsignal; 01-17-2010 at 04:55 AM.
 
01-17-2010, 08:50 AM #36
johnsfine
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
Reputation: 1197
Quote:
Originally Posted by SeymourButts
I was thinking it had something to do with the difference in sizes: long and pointers are 8 bytes on Linux and 4 bytes on Solaris.
Probably, but that doesn't mean much until you find the actual bug.

I've seen a lot of code where, for various reasons, someone casts a pointer to an int, then processes or stores it, then casts it back to a pointer. That works fine on all the platforms where int and pointer are the same size, but it breaks when they are different sizes.

You have a pointer that you say should equal 0x7feffcaff but it actually equals 0xfffffffffeffcaff. That is exactly what would happen if you cast the pointer to an int then back to a pointer. But the code you posted doesn't do that.

You said ucPacket becomes 0xfffffffffeffcaff on the return from ucbuild_oid. If that is true and you look just a little closer, you should be able to identify the bug.

You have the line
Code:
ucPacket = ucbuild_oid (ber_oid, OIDpollerid_length - 1, ucPacket);
In most debuggers, if you single step into ucbuild_oid and later single step out of it, you will still be on that line, indicating the function call has finished and the assignment hasn't happened yet. So at that moment, ucPacket should still hold its previous value and the register %rax should hold the value that is about to be assigned to ucPacket. Which of those is 0xfffffffffeffcaff?

The function returns with the line
Code:
return  build_len( op - p, p );
You should be able to single step up to or break at that line. What are p and op before that line executes? If you step over that line, I think the return will not occur (until you step again). At that point the value being returned is in the register %rax. What is it?
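The arithmetic behind the pointer-to-int observation can be checked without touching real pointers. The sketch below models, with 64-bit integers, a pointer that is stored in a 32-bit int and later converted back: the int keeps only the low 32 bits, and widening that signed int back to 64 bits copies bit 31 into the upper half. The function names are hypothetical, and it assumes gcc's documented behavior of keeping the low-order bits when an out-of-range value is converted to a signed integer type (the C standard leaves that implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: int i = (int)ptr; ... ptr = (char *)i;
 * on an LP64 system, done with integers so no invalid pointer is formed. */
uint64_t through_int(uint64_t addr)
{
    /* Step 1: only the low 32 bits fit in the int. Converting a value
     * above INT32_MAX to int32_t is implementation-defined; gcc keeps
     * the low-order bits (two's complement wraparound). */
    int32_t stored = (int32_t)(uint32_t)addr;

    /* Step 2: widening the signed int sign-extends, so a set bit 31
     * turns bits 32..63 into ones. */
    return (uint64_t)(int64_t)stored;
}

/* The two before/after pairs reported in this thread. */
void check_thread_addresses(void)
{
    assert(through_int(UINT64_C(0x7feffcaff)) == UINT64_C(0xfffffffffeffcaff));
    assert(through_int(UINT64_C(0x7fbfffbaff)) == UINT64_C(0xffffffffbfffbaff));
}
```

An address with zero upper bits and bit 31 clear, such as 0x400000, comes back unchanged, so this kind of bug can stay hidden until a pointer lands in higher memory.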
 
01-17-2010, 11:48 AM #37
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by johnsfine
You have a pointer that you say should equal 0x7feffcaff but it actually equals 0xfffffffffeffcaff. That is exactly what would happen if you cast the pointer to an int then back to a pointer. But the code you posted doesn't do that.

When a pointer from a 32-bit system with the MSB set is passed as-is to a 64-bit system, gcc sign-extends it. This is not a bug: the C99 standard leaves the result of such integer/pointer conversions implementation-defined, and gcc documents its behavior.

The sign extension is most likely not what is functionally needed in this case. I read about a similar problem somewhere else; the guy spent quite some time debugging such a 32-bit to 64-bit transition.

Well, my comment is a long shot, but who knows ...
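If a pointer genuinely has to travel through an integer, C99's uintptr_t (from <stdint.h>; optional in the standard but provided by glibc) is the integer type meant for that job: a void * converted to uintptr_t and back compares equal to the original, so neither truncation nor sign extension comes into play. A minimal sketch, where struct handle and both functions are hypothetical stand-ins for whatever int field the real code might use:

```c
#include <assert.h>
#include <stdint.h>

/* C11 compile-time check: refuse to build if uintptr_t could not hold
 * a pointer on this target. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be wide enough for a pointer");

/* Hypothetical record that carries a pointer in an integer field, the
 * role a plain int plays in the cast-to-int pattern. */
struct handle {
    uintptr_t value;
};

struct handle handle_from_pointer(void *p)
{
    struct handle h = { (uintptr_t)p };
    return h;
}

void *handle_to_pointer(struct handle h)
{
    /* C99 7.18.1.4 guarantees this compares equal to the void *
     * originally passed to handle_from_pointer. */
    return (void *)h.value;
}
```

On LP64 Linux, unsigned long happens to be the same width, but uintptr_t states the intent and stays correct under -m32 as well.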
 
01-18-2010, 07:43 AM #38
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15
Quote:
Originally Posted by johnsfine
The function returns with the line
Code:
return  build_len( op - p, p );
You should be able to single step up to or break at that line. What are p and op before that line executes? If you step over that line, I think the return will not occur (until you step again). At that point the value being returned is in the register %rax. What is it?
Given the function:
Code:
uchar* ucbuild_oid  ( register sid* x, register uint l, register uchar* p ){
   register ulong *sidp = x + l, val;
   register uchar *op = p;
   uchar *rc;

   while( l-- ){
      val = *--sidp;
      *--p = val & 0x7f;
      while( val >>= 7 )
         *--p = val | 0x80;
   }

   rc =  build_len( op - p, p );
   return (rc);
}
Using gdb, I stopped at the "rc = build_len(...)" line and printed out the memory address for variables op, p, and rc.

(gdb) print op
$1 = (uchar *) 0x7fbfffbb0b "\004\003215@Âÿ¿\177"
(gdb) print p
$2 = (uchar *) 0x7fbfffbb00 "+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177"
(gdb) print rc
$3 = (uchar *) 0x7fbfffc040 "1007::11.22.33.44:Ethernet:"

I stepped into "build_len(..)" function...
(gdb) step
build_len (x=11, p=0x7fbfffbb00 "+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177")

I stepped to the end of "build_len(...)" and printed variable p.

(gdb) print p
$5 = (uchar *) 0x7fbfffbb00 "+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177"

Stepped back into ucbuild_oid(...) at the line "return (rc);" and printed out rc.

(gdb) step
ucbuild_oid (x=0xb, l=4294967295, p=0x7fbfffbb00 "+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177") at ber.c:274
274 in ber.c
(gdb) print rc
$8 = (uchar *) 0x7fbfffbaff "\v+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177"

The next step took me back to the call to ucbuild_oid(..) function where I get the address out of bounds error for ucPacket.

(gdb) step
ucBuildSNMPv2Trap (ucPacket=0xffffffffbfffbaff <Address 0xffffffffbfffbaff out of bounds>,


The variable rc is what is returned to variable ucPacket.
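For readers following the encoding itself: ucbuild_oid writes each OID sub-identifier in BER base-128 form, working backwards from the end of the buffer, and then lets build_len prepend the length. The gdb output is consistent with that: op - p is 11, and the bytes at p begin with 0x2b 0x06 0x01 0x04 0x01 ("+\006\001\004\001"). The sketch below re-expresses the same backward loop with fixed-width types in place of the thread's sid, ulong and uchar typedefs (an assumption, since their definitions are not shown); the function names are hypothetical. It also keeps the walking pointer the same type as the input array. In the posted code x is a sid * while sidp is a ulong *, which only lines up if sid and ulong are the same size, and on LP64 ulong is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes one sub-identifier backwards so that it ends just before `end`
 * and returns its first byte. The final byte holds the low 7 bits with
 * bit 7 clear; each earlier byte holds the next 7 bits with bit 7 set
 * to mean "more bytes follow". Needs up to 5 bytes for a 32-bit value. */
uint8_t *ber_put_subid_backwards(uint32_t val, uint8_t *end)
{
    uint8_t *p = end;
    *--p = (uint8_t)(val & 0x7f);
    while (val >>= 7)
        *--p = (uint8_t)((val & 0x7f) | 0x80);
    return p;
}

/* Hypothetical counterpart of ucbuild_oid without the length prefix:
 * encodes subids[0..count-1] so they read in order from the returned
 * pointer up to `end`. The caller must leave 5 * count bytes of room. */
uint8_t *ber_put_oid_body_backwards(const uint32_t *subids, size_t count,
                                    uint8_t *end)
{
    const uint32_t *sp;
    uint8_t *p = end;

    assert(subids != NULL && end != NULL);
    sp = subids + count;          /* same element type as the input */
    while (count--)
        p = ber_put_subid_backwards(*--sp, p);
    return p;
}
```

For example, 311 = 2 * 128 + 55 is written as the two bytes 0x82 0x37.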
 
01-18-2010, 08:15 AM #39
johnsfine
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
Reputation: 1197
Quote:
Originally Posted by SeymourButts
The next step took me back to the call to ucbuild_oid(..) function where I get the address out of bounds error for ucPacket.

(gdb) step
ucBuildSNMPv2Trap (ucPacket=0xffffffffbfffbaff <Address 0xffffffffbfffbaff out of bounds>,
If I understand you correctly, that step did a lot more than one step. In fact, it probably just started running and continued to the failure point.

GDB does that to me a lot. I don't know GDB well enough to know a workaround. In other debuggers with similar problems, I always switch to assembler view and do single assembler instruction steps. In GDB, even single assembler steps can lose control and run all the way up to the failure.

For GDB to mess up the step at that point means it misunderstands the call stack at that point. In the heavily templated code that I work on, GDB just misunderstands the call stack a lot and it would be no surprise. But in code as simple as your code, GDB doesn't usually misunderstand the call stack. Maybe it misunderstood the call stack because some incorrect memory write had already overwritten the call stack.
 
01-18-2010, 10:06 AM #40
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15
I used gdb's stepi instead of step and got a slightly different result.

Code:
uchar* build_len( register uint  x, register uchar* p ){

   if( x > 127 ){
      register uchar len = 0x81;
      *--p = x;
      while( x >>= 8 ){
         *--p = x;
         len++;
      }
      *--p = len;
   } else {
      *--p = x;
   }

   return p; 
}
On the line "return p" of build_len() function, I printed out address of p and it was good.

(gdb) print p
$5 = (uchar *) 0x7fbfffbaff "\v+\006\001\004\001J\002¹\f\001\023\004\003215@Âÿ¿\177"

I did a stepi to get to line 180, which is the closing brace of the build_len() function.

(gdb) stepi
180 in ber.c

Did another stepi, and it appears that the address in p is out of bounds.

(gdb) stepi
0x00000000004100c7 in build_len (x=127, p=0xffffffff000003ef <Address 0xffffffff000003ef out of bounds>) at ber.c:180

Still not sure why, but at least the out-of-bounds address showed up while still in this function.
 
01-18-2010, 10:26 AM #41
johnsfine
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
Reputation: 1197
As you stepi through the epilogue of a function, the local variables go out of scope. GDB may not know the exact point a variable goes out of scope and may display garbage instead of the value. That doesn't mean anything is wrong.

I forget the GDB command for displaying a register. During or before the epilogue of the function, the register rax gets the return value. With an instruction such as return p; you should be able to see that rax gets the return value either before p gets clobbered or at the moment that the debugger stops understanding what p is.

There is also a GDB command that I forget for disassembling the instruction pointed to by rip (the next instruction that will execute). It is hard to have any idea what is going on with each stepi unless you have that disassembled instruction.
 
01-18-2010, 10:46 AM #42
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15
Break on "return p" call.

/* Print p */
(gdb) print p
$1 = (uchar *) 0x7fbfffbb0c "\003215@Âÿ¿\177"

/* Register rax */
(gdb) info registers
rax 0x3 3

Stepi to closing bracket in function. Print p and rax addresses. Looks good so far.

(gdb) print p
$4 = (uchar *) 0x7fbfffbb0c "\003215@Âÿ¿\177"

(gdb) info registers
rax 0x7fbfffbb0c 548682054412

Stepi again and see that p is out of bounds and rax still has a good address.

(gdb) stepi
0x00000000004100c7 in build_len (x=127, p=0x3ef <Address 0x3ef out of bounds>) at ber.c:180

(gdb) info registers
rax 0x7fbfffbb0c 548682054412
rbx 0x7fbfffbb0d 548682054413
rcx 0x353132 3486002
rdx 0x7fbfffbb0c 548682054412
rsi 0x7fbfffbb0d 548682054413
rdi 0x3 3
rbp 0x7fbfff9100 0x7fbfff9100
rsp 0x7fbfff90e8 0x7fbfff90e8
r8 0xfefefefefefefeff -72340172838076673
r9 0x0 0
r10 0x7fbfff9001 548682043393
r11 0x373d672650 237253371472
r12 0x3 3
r13 0x7b 123
r14 0x0 0
r15 0xffffffffffffffff -1
rip 0x4100c7 0x4100c7 <build_len+100>
eflags 0x307 775
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0


** When I step up to "return p", rip is...

rip 0x4100c2 0x4100c2 <build_len+95>


** Stepi again past "return p", rip is....

rip 0x4100c6 0x4100c6 <build_len+99>





(gdb) disassemble 0x4100c7
Dump of assembler code for function build_len:
0x0000000000410063 <build_len+0>: push %rbp
0x0000000000410064 <build_len+1>: mov %rsp,%rbp
0x0000000000410067 <build_len+4>: mov %edi,0xfffffffffffffffc(%rbp)
0x000000000041006a <build_len+7>: mov %rsi,0xfffffffffffffff0(%rbp)
0x000000000041006e <build_len+11>: cmpl $0x7f,0xfffffffffffffffc(%rbp)
0x0000000000410072 <build_len+15>: jbe 0x4100b4 <build_len+81>
0x0000000000410074 <build_len+17>: movb $0x81,0xffffffffffffffef(%rbp)
0x0000000000410078 <build_len+21>: decq 0xfffffffffffffff0(%rbp)
0x000000000041007c <build_len+25>: movzbl 0xfffffffffffffffc(%rbp),%eax
0x0000000000410080 <build_len+29>: mov 0xfffffffffffffff0(%rbp),%rdx
0x0000000000410084 <build_len+33>: mov %al,(%rdx)
0x0000000000410086 <build_len+35>: shrl $0x8,0xfffffffffffffffc(%rbp)
0x000000000041008a <build_len+39>: mov 0xfffffffffffffffc(%rbp),%eax
0x000000000041008d <build_len+42>: test %eax,%eax
0x000000000041008f <build_len+44>: je 0x4100a4 <build_len+65>
0x0000000000410091 <build_len+46>: decq 0xfffffffffffffff0(%rbp)
0x0000000000410095 <build_len+50>: movzbl 0xfffffffffffffffc(%rbp),%eax
0x0000000000410099 <build_len+54>: mov 0xfffffffffffffff0(%rbp),%rdx
0x000000000041009d <build_len+58>: mov %al,(%rdx)
0x000000000041009f <build_len+60>: incb 0xffffffffffffffef(%rbp)
0x00000000004100a2 <build_len+63>: jmp 0x410086 <build_len+35>
0x00000000004100a4 <build_len+65>: decq 0xfffffffffffffff0(%rbp)
0x00000000004100a8 <build_len+69>: movzbl 0xffffffffffffffef(%rbp),%eax
0x00000000004100ac <build_len+73>: mov 0xfffffffffffffff0(%rbp),%rdx
0x00000000004100b0 <build_len+77>: mov %al,(%rdx)
0x00000000004100b2 <build_len+79>: jmp 0x4100c2 <build_len+95>
0x00000000004100b4 <build_len+81>: decq 0xfffffffffffffff0(%rbp)
0x00000000004100b8 <build_len+85>: movzbl 0xfffffffffffffffc(%rbp),%eax
0x00000000004100bc <build_len+89>: mov 0xfffffffffffffff0(%rbp),%rdx
0x00000000004100c0 <build_len+93>: mov %al,(%rdx)
0x00000000004100c2 <build_len+95>: mov 0xfffffffffffffff0(%rbp),%rax
0x00000000004100c6 <build_len+99>: leaveq
0x00000000004100c7 <build_len+100>: retq
End of assembler dump.

Last edited by SeymourButts; 01-18-2010 at 11:46 AM.
 
01-18-2010, 03:44 PM #43
johnsfine
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
Reputation: 1197
Everything you posted there looks perfectly normal and correct. You stopped right at the ASM-level return instruction. Nothing hints at anything being wrong at that point. Those register values completely ruled out a lot of possibilities that seemed unlikely but possible before (such as the buffer not being big enough).

I think I might have misunderstood something you said earlier.
Does build_len return correctly into the body of ucbuild_oid?
The line return (rc); in ucbuild_oid might or might not have any ASM code associated with it. A stepi from the return instruction of build_len, if that worked correctly, might go directly to the epilogue inside ucbuild_oid, which likely includes a leaveq instruction, just like the epilogue of build_len that you posted.

The value of local variables in that epilogue also doesn't matter. Only the value of rax matters.

When you reach the retq instruction inside ucbuild_oid, another stepi (if it works) should reach the instruction where that rax value is finally stored into ucPacket.
 
01-19-2010, 08:20 AM #44
SeymourButts
LQ Newbie
Registered: Jan 2010
Posts: 28
Original Poster
Reputation: 15
Kind of interesting. I added a printf() just before the "return (rc);" call and it seemed to push back the point at which the address out of bounds occurs.

(gdb) print ucPacket
$5 = (unsigned char *) 0x7fbfffbb0b "\004\003215@Âÿ¿\177"

(gdb) info registers
rax 0x7fbfffbaff 548682054399
rbx 0x3ef 1007
rcx 0x413a69 4274793
rdx 0x373d8345f0 237255214576
rsi 0x400 1024
rdi 0x1 1
rbp 0x7fbfff9e40 0x7fbfff9e40
rsp 0x7fbfff9110 0x7fbfff9110
r8 0x0 0
r9 0x1 1
r10 0x1 1
r11 0x1 1
r12 0x7fbfffc040 548682055744
r13 0x7b 123
r14 0x0 0
r15 0xffffffffffffffff -1
rip 0x41103e 0x41103e <ucBuildSNMPv2Trap+3102>
eflags 0x302 770
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

(gdb) disassemble 0x41103e
..
0x000000000041103e <ucBuildSNMPv2Trap+3102>: cltq
..

(gdb) stepi
0x0000000000411040 263 in berencode.c

(gdb) print ucPacket
$6 = (unsigned char *) 0x7fbfffbb0b "\004\003215@Âÿ¿\177"

(gdb) info registers
rax 0xffffffffbfffbaff -1073759489
rbx 0x3ef 1007
rcx 0x413a69 4274793
rdx 0x373d8345f0 237255214576
rsi 0x400 1024
rdi 0x1 1
rbp 0x7fbfff9e40 0x7fbfff9e40
rsp 0x7fbfff9110 0x7fbfff9110
r8 0x0 0
r9 0x1 1
r10 0x1 1
r11 0x1 1
r12 0x7fbfffc040 548682055744
r13 0x7b 123
r14 0x0 0
r15 0xffffffffffffffff -1
rip 0x411040 0x411040 <ucBuildSNMPv2Trap+3104>
eflags 0x302 770
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0


(gdb) disassemble 0x411040
..
0x0000000000411040 <ucBuildSNMPv2Trap+3104>: mov %rax,0xfffffffffffffff8(%rbp)
..

(gdb) stepi
0x0000000000411040 264 in berencode.c

(gdb) print ucPacket
$8 = (unsigned char *) 0xffffffffbfffbaff <Address 0xffffffffbfffbaff out of bounds>

Last edited by SeymourButts; 01-19-2010 at 08:21 AM.
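The last dump pins the change to a single instruction: cltq sign-extends the 32-bit eax into the 64-bit rax, which turns 0x7fbfffbaff into 0xffffffffbfffbaff just before the mov stores it into ucPacket. gcc typically emits cltq at that spot when it treats a call's result as an int; a missing prototype for ucbuild_oid where berencode.c calls it is one possible reason, though nothing in the thread confirms that, and compiling with -Wall makes gcc warn about implicit function declarations. Until the cause is found, a cheap guard can stop the program at the first bad value instead of at a later memory fault. The sketch below is a heuristic with hypothetical names, assuming x86-64 Linux, where any address whose bits 31 to 63 are all ones lies in the kernel half of the address space and so can never be a valid user-space pointer:

```c
#include <assert.h>
#include <stdint.h>

/* Heuristic: nonzero when bits 31..63 of a 64-bit address are all ones,
 * the shape left by sign-extending a 32-bit value whose bit 31 is set
 * (what cltq did to rax above). On x86-64 Linux such an address is in
 * the kernel half, so a user-space pointer should never look like this. */
int looks_sign_extended(uint64_t addr)
{
    return (addr >> 31) == UINT64_C(0x1ffffffff);
}

/* Hypothetical guard for use right after a pointer-returning call, e.g.
 *   ucPacket = ucbuild_oid(ber_oid, OIDpollerid_length - 1, ucPacket);
 *   CHECK_RETURNED_POINTER(ucPacket);
 * It is compiled out when NDEBUG is defined, like any assert. */
#define CHECK_RETURNED_POINTER(p) \
    assert(!looks_sign_extended((uint64_t)(uintptr_t)(p)))
```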
 
  




All times are GMT -5.
