Not sure myself. I was thinking it had something to do with the difference in sizes, where long and pointers are 8 bytes on Linux but 4 bytes on Solaris.
What value does OBJECT_ID_tag have in this context?
It evaluates to the value 0x06.
Quote:
it says there is a Memory Fault on the bolded build_tag() function below when the OBJECT_ID_tag is the first parm
Since the OBJECT_ID is just a number, it is more likely that the second parameter (the pointer) is the cause of the problem. However, since this particular invocation is presumably just setting the tag in the packet, it is quite likely that the failure has already occurred prior to this point and has caused a corruption (perhaps in the call to ucbuild_oid).
==EDIT== oops, only read the first page of the thread when I responded
Porting code like this is non-trivial, especially if (as johnsfine has deduced) you are moving from a 32 bit system to a 64 bit system. The reason the post seems strange is because your knowledge of C appears to be less than required for this sort of job. Sergei is just suggesting learning a bit more about the language before attempting it.
Last edited by neonsignal; 01-17-2010 at 04:55 AM.
Quote:
I was thinking it had something to do with the difference in sizes, where long and pointers are 8 bytes on Linux but 4 bytes on Solaris.
Probably, but that doesn't mean much until you find the actual bug.
I've seen a lot of code where, for various reasons, someone casts a pointer to an int, then processes or stores it, then casts it back to a pointer. That works fine on all the platforms where int and pointer are the same size, but it breaks when they are different sizes.
You have a pointer that you say should equal 0x7feffcaff but it actually equals 0xfffffffffeffcaff. That is exactly what would happen if you cast the pointer to an int then back to a pointer. But the code you posted doesn't do that.
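The pattern above can be reproduced without touching real pointers. This is only a minimal sketch, not the code from the thread: the helper name through_int32 is hypothetical, and fixed-width integers stand in for the pointer so the result does not depend on the host. It also assumes GCC's behavior for the narrowing conversion, which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: simulate storing a 64-bit pointer value in a
 * 32-bit signed int and then widening it back to pointer size. */
uint64_t through_int32(uint64_t ptr_bits)
{
    /* Narrowing keeps only the low 32 bits. This conversion is
     * implementation-defined in ISO C; GCC reduces modulo 2^32. */
    int32_t narrowed = (int32_t)ptr_bits;

    /* Widening a signed value copies bit 31 into bits 32..63. */
    return (uint64_t)(int64_t)narrowed;
}
```

Feeding it 0x7feffcaff gives back 0xfffffffffeffcaff, the same bad value reported for ucPacket. A value that fits in 31 bits survives the round trip unchanged, which is why this kind of bug can stay hidden for a long time.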
You said ucPacket becomes 0xfffffffffeffcaff on the return from ucbuild_oid. If that is true and you look just a little closer, you should be able to identify the bug.
In most debuggers, if you single step into ucbuild_oid and later single step out of it, you will still be on that line, indicating the function call has finished and the assignment hasn't happened yet. So at that moment, ucPacket should still hold its previous value and the register %rax should hold the value that is about to be assigned to ucPacket. Which of those is 0xfffffffffeffcaff?
The function returns with the line
Code:
return build_len( op - p, p );
You should be able to single step up to or break at that line. What are p and op before that line executes? If you step over that line, I think the return will not occur (until you step again). At that point the value being returned is in the register %rax. What is it?
When a 32-bit pointer value with its most significant bit set is widened to 64 bits, gcc sign-extends it. This is not a bug: the C99 standard leaves such conversions implementation-defined, and the behavior is described in the gcc documentation.
The sign extension is most likely not what is functionally needed in this case. I read about a similar problem somewhere else, where someone spent quite some time debugging such a 32-bit to 64-bit transition.
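To make the signed versus unsigned difference concrete, here is a small sketch. The three function names are hypothetical, the 32-bit value is just the low half of the address discussed earlier in the thread, and the cast to int32_t again relies on GCC's wrap-around behavior. The last function shows the usual portable alternative, uintptr_t from <stdint.h>, which is wide enough to hold a pointer.

```c
#include <stdint.h>

/* Widen 32 bits through a signed type: bit 31 is copied upwards. */
uint64_t widen_signed(uint32_t low_bits)
{
    /* uint32_t -> int32_t is implementation-defined for large values;
     * GCC keeps the bit pattern. */
    return (uint64_t)(int64_t)(int32_t)low_bits;
}

/* Widen 32 bits through an unsigned type: the upper half is zero. */
uint64_t widen_unsigned(uint32_t low_bits)
{
    return (uint64_t)low_bits;
}

/* Round trip through uintptr_t, which can hold any object pointer. */
void *roundtrip_uintptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```

Neither widening recovers an address above 4 GB once the upper bits are gone: the signed path yields 0xfffffffffeffcaff and the unsigned path 0x00000000feffcaff. Only the uintptr_t round trip returns the original pointer.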
Well, my comment is a long shot, but who knows ...
Quote:
You should be able to single step up to or break at that line. What are p and op before that line executes? If you step over that line, I think the return will not occur (until you step again). At that point the value being returned is in the register %rax. What is it?
Given the function:
Code:
uchar* ucbuild_oid ( register sid* x, register uint l, register uchar* p ){
    register ulong *sidp = x + l, val;
    register uchar *op = p;
    uchar *rc;
    while( l-- ){
        val = *--sidp;
        *--p = val & 0x7f;
        while( val >>= 7 )
            *--p = val | 0x80;
    }
    rc = build_len( op - p, p );
    return (rc);
}
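For anyone following along, the inner loop above writes each sub-identifier in base-128 form while moving backwards through the buffer, so p has to start at the end of the space, not the beginning. The sketch below isolates that loop. The function name encode_subid_backwards is hypothetical, and uint32_t is an assumption that replaces the ulong of the original, whose size is exactly what differs between the two platforms.

```c
#include <stdint.h>

/* Hypothetical stand-in for the inner loop of ucbuild_oid: write one
 * sub-identifier in base-128 form, moving backwards from `end`.
 * Returns a pointer to the first (lowest-address) byte written. */
unsigned char *encode_subid_backwards(uint32_t val, unsigned char *end)
{
    unsigned char *p = end;

    /* The final byte of the encoding has its high bit clear. */
    *--p = (unsigned char)(val & 0x7f);

    /* Every earlier byte carries 7 more bits with the high bit set. */
    while (val >>= 7)
        *--p = (unsigned char)((val & 0x7f) | 0x80);

    return p;
}
```

Encoding 840 this way produces the two bytes 0x86 0x48, and the returned pointer sits two bytes below the starting one. That difference is what op - p measures before the call to build_len.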
Using gdb, I stopped at the "rc = build_len(...)" line and printed out the memory address for variables op, p, and rc.
The next step took me back to the call to ucbuild_oid(..) function where I get the address out of bounds error for ucPacket.
(gdb) step
ucBuildSNMPv2Trap (ucPacket=0xffffffffbfffbaff <Address 0xffffffffbfffbaff out of bounds>,
If I understand you correctly, that step did a lot more than one step. In fact, it probably just started running and continued to the failure point.
GDB does that to me a lot. I don't know GDB well enough to know any workaround. In other debuggers, with similar problems, I always switch to assembler view and do single assembler instruction steps. In GDB, even single assembler steps can lose control and run all the way up to the failure.
For GDB to mess up the step at that point means it misunderstands the call stack at that point. In the heavily templated code that I work on, GDB just misunderstands the call stack a lot and it would be no surprise. But in code as simple as your code, GDB doesn't usually misunderstand the call stack. Maybe it misunderstood the call stack because some incorrect memory write had already overwritten the call stack.
As you stepi through the epilogue of a function, the local variables go out of scope. GDB may not know the exact point a variable goes out of scope and may display garbage instead of the value. That doesn't mean anything is wrong.
I forget the GDB command for displaying a register. During or before the epilogue of the function, the register rax gets the return value. With a statement such as return p; you should be able to see that rax gets the return value either before p gets clobbered or at the moment that the debugger stops understanding what p is.
There is also a GDB command that I forget for disassembling the instruction pointed to by rip (the next instruction that will execute). It is hard to have any idea what is going on with each stepi unless you have that disassembled instruction.
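For reference, the GDB commands in question are standard ones, and the comment in the sketch below lists them next to a tiny function to practice on. The function advance is hypothetical and has nothing to do with the code being ported. The note about %rax assumes x86-64 Linux, as in this thread.

```c
#include <stddef.h>

/* Hypothetical practice target: returns a pointer n bytes past p.
 * On x86-64 Linux the returned pointer travels back in %rax. */
unsigned char *advance(unsigned char *p, size_t n)
{
    return p + n;
}

/*
 * Build with debug info and no optimization (gcc -g -O0), then in GDB:
 *
 *   break advance        stop on entry to the function
 *   run
 *   display/i $pc        show the next instruction after every step
 *   stepi                execute exactly one machine instruction
 *   x/i $pc              disassemble the instruction at rip once
 *   info registers rax   show rax (p/x $rax prints it in hex)
 *   finish               run until the function returns and print the
 *                        returned value
 */
```

Call advance from any small main, step to the ret instruction with stepi, and compare rax at that point with what the caller ends up storing.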
Everything you posted there looks perfectly normal and correct. You stopped right at the ASM level return instruction. Nothing hints at anything being wrong at that point. Those register values completely rule out a lot of possibilities that seemed unlikely but possible before (such as the buffer not being big enough).
I think I might have misunderstood something you said earlier.
Does build_len return correctly into the body of ucbuild_oid?
The line return (rc); in ucbuild_oid might or might not have any ASM code associated with it. A stepi from the return instruction of build_len, if that worked correctly, might go directly to the epilogue inside ucbuild_oid, which likely includes a leaveq instruction, just like the epilogue of build_len that you posted.
The value of local variables in that epilogue also doesn't matter. Only the value of rax matters.
When you reach the retq instruction inside ucbuild_oid, another stepi (if it works) should reach the instruction where that rax value is finally stored into ucPacket.
Kind of interesting. I added a printf() just before the "return (rc);" call and it seemed to push back the point at which the address out of bounds occurs.