LinuxQuestions.org
Old 04-23-2010, 04:36 PM   #1
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Rep: Reputation: 94
Slackware 64 - Static compilation broken !


Hi,

I have a weird bug: a segmentation fault occurs when the 'retq' instruction of my SIGALRM callback executes in a statically linked build.
It seems to happen only on Slackware.
Here is a simple test case: compiled shared, no problem; compiled static, it crashes.
Paste the following script into a file named "test-sigalrm-pack2.sh" and run it: it generates the C++ source and a simple build/test script. Then just launch the build script (tst-sigalrm-build).

Code:
#!/bin/sh
#test-sigalrm-pack2.sh
# 64bit sigalrm segmentation fault test case package...

echo " * Generating source..."
cat	>tst-sigalrm.cpp	<<TESTSRC
//tst-sigalrm.cpp
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>

typedef	void*	pvoid;

namespace{
	volatile	unsigned	int	alarmed	=0;
	struct	sigaction	action,oldAction;

	void	_onAlarmSignal(int	signal,siginfo_t* sigInfo,pvoid pUContext) {
		printf("Tick !\n");
		++alarmed;
	}

	void	_registerSignal() {
		action.sa_flags		=SA_SIGINFO;
		action.sa_sigaction	=_onAlarmSignal;
		action.sa_restorer	=NULL;
		sigemptyset(&action.sa_mask);

		sigaction(SIGALRM,&action,&oldAction);
	}

	void	_startTimer() {
		itimerval	value;
		value.it_interval.tv_sec	=0;
		value.it_interval.tv_usec	=100;
		value.it_value	=value.it_interval;
		setitimer(ITIMER_REAL,&value,NULL);
	}
}

int	main(int argc,const char **argv) {

	_registerSignal();
	_startTimer();

	do	;	while(alarmed<10);

	return	0;
}
TESTSRC

echo " * Generating build script..."
cat	>tst-sigalrm-build	<<TESTBUILD
#!/bin/sh

#Builds of the sigalrm test case:
g++ tst-sigalrm.cpp -o tst-sigalrm-shared
g++ -static tst-sigalrm.cpp -o tst-sigalrm-static

echo " * Shared run :"
./tst-sigalrm-shared
echo " * Static run :"
./tst-sigalrm-static
TESTBUILD
chmod a+x "tst-sigalrm-build"
I suspect some mismatch in the static libraries, something that uses 32-bit code somewhere, so that when 'retq' pops the return address it is totally wrong. That is only a guess: I have no real clue after several weeks of debugging with ddd, Google, the glibc mailing list and the LQ programming forums.

That's so stupid... I need the alarm to make my cursor blink!

Thanks

Garry.
 
Old 04-23-2010, 04:59 PM   #2
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
Forgot to mention...

Sorry, I forgot to mention: the same code worked fine on Slackware 32. I only ran into the problem after switching my system to Slackware 64 (I use the -current branch).
I had some feedback from Ubuntu 64 users who were able to run it without problems, but I have no guarantee at this time that they really tried static compilation.

Cheers.
 
Old 04-24-2010, 12:15 PM   #3
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,758

Rep: Reputation: 468
You may need '-fPIC' in the g++ options.
 
Old 04-25-2010, 04:39 AM   #4
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
Quote:
Originally Posted by gnashley
You may need '-fPIC' in the g++ options.
Hey, thank you.
I tried it, and it didn't change anything: it still crashes at exactly the same place for the same reason.

Anyway, I didn't believe that was it because, as I mentioned, the generated code is 64-bit (it's a retq seen in the debugger) and it works without problems in the shared model, which changes nothing but the glibc library variant used at link time. (And I checked the gcc target config, which is, as expected, x86_64 by default.)

So am I the only one who gets this piece of code crashing on slackware64-current?

Cheers

Garry.
 
Old 04-25-2010, 09:42 AM   #5
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 227
Quote:
So am I the only one who got this piece of code crashing on slackware64-current ?
I haven't tried this on -current, but out of curiosity I tried it on Slackware64 13 and it crashes the same way, with a segmentation fault. The shared lib version runs fine.

I'm afraid I don't know much about building statically, so I can't offer any guidance.
 
Old 04-25-2010, 03:01 PM   #6
GazL
Senior Member
 
Registered: May 2008
Posts: 3,392

Rep: Reputation: 918
I can confirm a segfault on 64-current.

I tried converting it to pure C and using gcc rather than g++ but it does exactly the same thing.
I also tried changing it to use the slightly simpler action.sa_handler invocation rather than using action.sa_sigaction but again, it gives exactly the same segfault issue when built statically.

Some sort of libc bug perhaps?
 
Old 04-25-2010, 08:44 PM   #7
GazL
Senior Member
 
Registered: May 2008
Posts: 3,392

Rep: Reputation: 918
I've done a little more digging on this one. I worked with my C version of the code, which is slightly different from the OP's but shows the same symptoms (I'm not much good with C++).

gazl-sig.c:
Code:
#include <sys/time.h>
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t alarmed;   /* written by the handler, polled by main */
struct sigaction action,oldaction;

void onAlarmSignal( int signal)
{
   printf("Tick!\n"); 
   ++alarmed;
}

void registerSignal()
{
   action.sa_handler=onAlarmSignal;
   sigemptyset(&action.sa_mask);

   sigaction(SIGALRM,&action,&oldaction);
}


void startTimer()
{
   struct itimerval value;
   value.it_interval.tv_sec=0;
   value.it_interval.tv_usec=100;
   value.it_value=value.it_interval;
   setitimer(ITIMER_REAL, &value, NULL);
}

int main( int argc, const char *argv[])
{
   alarmed=0;
   registerSignal();
   startTimer();

   do 
    ;
   while (alarmed <10);

   return 0;
}
Then I compiled it with:
gcc -Wall -O0 -g -static gazl-sig.c -o sig-static

Now for the interesting bit... gdb time.
Code:
(gdb) run
Starting program: /tmp/sig-static 
Tick!

Program received signal SIGSEGV, Segmentation fault.
0x00000000004002d1 in onAlarmSignal (signal=1) at gazl-sig.c:12
12      }
(gdb) disassemble
Dump of assembler code for function onAlarmSignal:
0x00000000004002ac <onAlarmSignal+0>:   push   %rbp
0x00000000004002ad <onAlarmSignal+1>:   mov    %rsp,%rbp
0x00000000004002b0 <onAlarmSignal+4>:   sub    $0x10,%rsp
0x00000000004002b4 <onAlarmSignal+8>:   mov    %edi,-0x4(%rbp)
0x00000000004002b7 <onAlarmSignal+11>:  mov    $0x46f824,%edi
0x00000000004002bc <onAlarmSignal+16>:  callq  0x4010c0 <puts>
0x00000000004002c1 <onAlarmSignal+21>:  mov    0x296009(%rip),%eax        # 0x6962d0 <alarmed>
0x00000000004002c7 <onAlarmSignal+27>:  add    $0x1,%eax
0x00000000004002ca <onAlarmSignal+30>:  mov    %eax,0x296000(%rip)        # 0x6962d0 <alarmed>
0x00000000004002d0 <onAlarmSignal+36>:  leaveq 
0x00000000004002d1 <onAlarmSignal+37>:  retq   
End of assembler dump.
The segfault is occurring on the retq instruction.

The next question is: where is it trying to return to? So let's set a breakpoint and examine the return address on the stack just before it tries to return:
Code:
(gdb) break *0x4002d1
Breakpoint 1 at 0x4002d1: file gazl-sig.c, line 12.
(gdb) run
Starting program: /tmp/sig-static 
Tick!

Breakpoint 1, 0x00000000004002d1 in onAlarmSignal (signal=1) at gazl-sig.c:12
12      }
(gdb) x/a $rsp
0x7fff1800d3f8: 0xf0000000fc0c748
(gdb) x 0xf0000000fc0c748
0xf0000000fc0c748:      Cannot access memory at address 0xf0000000fc0c748
(gdb)
I may be misinterpreting it, but it looks like it's trying to return to the middle of nowhere.

The shared lib version seems to work in a completely different manner and actually returns to some executable code on the stack (I guess that's how shared libraries work).

Anyway, that's about as far as I can go with this. I don't have the knowledge to dig any deeper.
 
Old 04-26-2010, 12:41 AM   #8
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
Thanks all for your feedback.

So, some simple facts:
- it happens only on 64-bit Slackware (the test ran successfully on Ubuntu 64)
- it happens only with static linking
- it happens around a system call (the kernel delivering the signal)
- it happens systematically, even with an empty handler function, so it is not a memory/stack/buffer overrun

My obvious guess is that the 'retq' does not have the same 'size' as the call (it was called by a 'call', not a 'callq'). That would explain the totally broken pointer.

I'm missing some 'underground' knowledge here, so I guess that the caller is the kernel itself. If I'm not mistaken, the kernel also supports 32-bit binaries, silently (this has nothing to do with external shared libraries; I'm talking from a static-linkage point of view). But to make the right call (invoking the signal handler as a 32-bit or a 64-bit handler), the kernel must get this information from somewhere.

Put simply: I think the 64-bit kernel can handle both 32-bit and 64-bit processes, and I think the statically linked binary might be tagged as 32-bit somewhere. But a dump of the ELF info still shows a 64-bit binary (ld does a good job). So it might happen when glibc registers the sigaction, or in something done directly by the badly built process that sends the kernel wrong information.

I guess most of this glue code is in glibc, and/or tightly coupled with some gcc crtX.o runtime.
I've looked at the 'gcc' package SlackBuild and it seemed all right: it should take care of 64-bit (and it does for shared libs) and, from a first look, it does what is expected. But I can't help suspecting that the static glibc libs were built with some wrong option.

So I have no new direction to look in. I have this guess but don't know how to prove or disprove it, and I don't know how to find the guilty one in that chain.

Is anybody who works on the Slackware x86_64 build around?

It might just need a SlackBuild 'hack'.

Thank you all for the support.

Cheers

Garry.

Last edited by NoStressHQ; 04-26-2010 at 12:55 AM. Reason: Some corrections and precisions.
 
Old 04-26-2010, 11:15 AM   #9
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 227
Just some more information to be going on with as this intrigues me.

I have turned the source into an Eclipse C++ project and added the appropriate -static linker flags in Eclipse. This builds a statically linked executable (as, just to be certain, "file my_alarm" confirms). The resulting binary segfaults as usual when run from the CLI but runs OK from within the Eclipse IDE! Hmm, strange.
 
Old 04-26-2010, 05:50 PM   #10
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
Quote:
Originally Posted by bgeddy
Just some more information to be going on with as this intrigues me. [...]
Thanks for the help. This makes me wonder whether there is something different between a process spawned from Eclipse and a CLI launch. I thought that the kernel somehow (ELF info?) gets the binary's 'bit size'. I assume the binary you're using is the same whether you launch it from Eclipse or from the CLI. So, is there a way for the calling process to tell the system which 'bit depth mode' the binary is in? Maybe the fork/exec pair copies Eclipse's 64-bit 'flags' to its child process and bypasses the ELF info baked into the binary? (Big guess here; the fork would explain it, but on the other hand the shell forks too when it launches a binary.)

This is an interesting new behavior, yet it's still a mystery!

Cheers

Garry.
 
Old 04-26-2010, 11:57 PM   #11
salemboot
Member
 
Registered: Mar 2007
Location: America
Distribution: Linux
Posts: 159

Rep: Reputation: 36
I was looking at your code.

Looks like you may be relying on the compiler to fix up your code.

You're writing C syntax for a C++ compiler.

#include <iostream>

using namespace std;

int main()
{
cout << "my message";
return 0;
}


Chances are the C compiler or glibc could be broken, though. I remember that, back in the day, there was a return error that needed patching when you upgraded GCC. EGCS or something.

This machine is Ubuntu 9.10, so its compiler version is


# gcc -v

Thread model: posix
gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)

Check your version and search Google for reports against that version of the compiler.

Last edited by salemboot; 04-27-2010 at 12:07 AM.
 
Old 04-28-2010, 02:49 PM   #12
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
I've run a test to check my guess. I thought the caller was making a 32-bit call to the 64-bit callback.

So I 'hacked' the callback this way:

Code:
	void	_onAlarmSignal(int	signal,siginfo_t* sigInfo,pvoid pUContext) {
		printf("Tick !\n");
		++alarmed;

		asm	(	"leaveq\n\t"
				"retw\n\t"
			);
	}
This was meant to force a 32-bit ret (strictly speaking, 'retw' asks for a 16-bit operand size; x86-64 has no 32-bit near 'ret'), but it doesn't fix the crash. So my guess was wrong: it's not related to a call size mismatch.

Does anyone else have a clue here?

Note: you can write this code in assembly and it will still crash. This is not a 'religious' C++ vs C syntax problem; it's about the static standard library build. It's a system programming problem, not an "I don't know how to write this code" problem. This is a bug test case and of course doesn't represent real-life code.

Thanks !

Cheers.

Garry.

---- In case it is useful ----
Target: x86_64-slackware-linux
Configured with: ../gcc-4.4.3/configure --prefix=/usr --libdir=/usr/lib64 --enable-shared --enable-bootstrap --enable-languages=ada,c,c++,fortran,java,objc --enable-threads=posix --enable-checking=release --with-system-zlib --with-python-dir=/lib64/python2.6/site-packages --disable-libunwind-exceptions --enable-__cxa_atexit --enable-libssp --with-gnu-ld --verbose --disable-multilib --target=x86_64-slackware-linux --build=x86_64-slackware-linux --host=x86_64-slackware-linux
Thread model: posix
gcc version 4.4.3 (GCC)
--------------------------------

Last edited by NoStressHQ; 04-28-2010 at 02:51 PM. Reason: Added gcc/platform infos...
 
Old 04-28-2010, 05:05 PM   #13
GazL
Senior Member
 
Registered: May 2008
Posts: 3,392

Rep: Reputation: 918
Ok, found out a little more.
By debugging the shared version of the program I've found that the return address on the stack
points to symbol __restore_rt in libc.so.6:
Code:
(gdb) c
Continuing.
Tick!

Breakpoint 2, 0x0000000000400601 in onAlarmSignal (signal=1) at gazl-sig.c:12
12      }
(gdb) x/a $rsp
0x7fff703940b8: 0x7fc42871d450 <__restore_rt>
(gdb) info shared
From                To                  Syms Read   Shared Object Library
0x00007fc428a5ba90  0x00007fc428a73ed4  Yes         /lib64/ld-linux-x86-64.so.2
0x00007fc4287087e0  0x00007fc42880a4a4  Yes         /lib64/libc.so.6
(gdb)
Now when we look at the return address in the statically linked program, we can see
that __restore_rt is at a different location than that of the return address on the top of the stack (which actually looks like an int to me rather than an address):
Code:
(gdb) c
Continuing.
Tick!

Breakpoint 2, 0x00000000004002d1 in onAlarmSignal (signal=1) at gazl-sig.c:12
12      }
(gdb) x/a $rsp
0x7fffa3a31a78: 0xf0000000fc0c748
(gdb) info address __restore_rt
Symbol "__restore_rt" is at 0x400af0 in a file compiled without debugging.
(gdb) info symbol __restore_rt
__restore_rt in section .text
Now, if I manually change that return address on the stack to point to __restore_rt, the program seems to continue correctly (for 1 iteration, at which point the stack has the wrong value again):
Code:
(gdb) set {long} 0x7fffa3a31a78 = __restore_rt
(gdb) x/a $rsp
0x7fffa3a31a78: 0x400af0 <__restore_rt>
(gdb) c
Continuing.
Tick!

Breakpoint 2, 0x00000000004002d1 in onAlarmSignal (signal=1) at gazl-sig.c:12
12      }
(gdb)
Quite why the return address on the stack isn't pointing at "__restore_rt" I have no idea, but that looks like it's what's going wrong.

Last edited by GazL; 04-28-2010 at 05:09 PM.
 
Old 04-28-2010, 08:36 PM   #14
bgeddy
Senior Member
 
Registered: Sep 2006
Location: Liverpool - England
Distribution: slackware64 13.37 and -current, Dragonfly BSD
Posts: 1,810

Rep: Reputation: 227
Yes, I figured the stack must somehow be getting messed up, trashing the return. I have not, however, been able to give this as much attention today as I had hoped, as my development environment got trashed and needs fixing (a long story; suffice it to say Eclipse can be an absolute nightmare). It would be nice to pinpoint what is causing the frame to get messed up like this.

Nice detective work, and thanks for keeping us posted!
 
Old 05-03-2010, 10:23 PM   #15
NoStressHQ
Member
 
Registered: Apr 2010
Location: Lausanne - Switzerland ( Bordeaux - France / Montreal - QC - Canada)
Distribution: Slackware Leet - 32/64bit
Posts: 280

Original Poster
Rep: Reputation: 94
Hi,

Thanks for the trace; that's indeed what I got too.

Meanwhile I tried to find some information.

restore_rt is a special address used by glibc (look in 'signal.c' for the appropriate architecture). I've read that when you do a kernel call, that symbol is 'inserted' on the stack as the return address for the signal handler. Sorry, I can't find where I read that, but you should be able to find this information by searching for keywords such as "signal" and "restore_rt".

Also, as I have a lot of statically compiled programs that don't require signals, I've found that trying to trace such a program with gdb makes gdb freeze quite quickly (you might need two source files and a call from main). So those programs work well (when there is no bug) because they don't use signals, but when I try to trace one (when there is a bug), gdb quickly freezes. At first I thought it was ddd, but CLI gdb does the same. (EDIT: After some more tests I'm not 100% sure about that; it seems gdb just takes ages sometimes, but it's still far longer than what I experienced on Slack32.)

I'm pretty sure that this thread talks about exactly the same problem (but with no solution): http://www.gossamer-threads.com/lists/openssh/dev/47519

So I still think that the static build of the glibc libraries is somehow broken (maybe in combination with the static build of gcc+gdb, and so on).

It seems that if we don't find it ourselves, we're stuck.

I sincerely think that even if static builds are not so common nowadays, they should work; there are quite a few situations that require them.

So we have to debug our Slackware64 build so as not to be put to shame by Ubuntu users.

Cheers

Garry.

Last edited by NoStressHQ; 05-04-2010 at 01:09 AM. Reason: Precision on GDB freezes / high latency.
 
  



Tags
gcc, glibc, linking, programming, slackware, static

