LinuxQuestions.org

Weird SIGALRM segmentation fault on 64bit linux... (https://www.linuxquestions.org/questions/programming-9/weird-sigalrm-segmentation-fault-on-64bit-linux-802592/)

NoStressHQ 04-17-2010 04:20 PM

Weird SIGALRM segmentation fault on 64bit linux...
 
Hi,

I've got a weird segmentation fault on the final 'retq' instruction of my alarm callback, as if the "calling pointer size" didn't match the 'q' of the 64-bit retq. I've been trying to understand this bug for a while and can't find a clue.
This code worked well on 32-bit Slackware/Ubuntu/Debian.
It now crashes on my 64-bit Slackware install.

I've written a small test case script for anyone who wants to try it:

Code:

#!/bin/sh
#test-sigalrm-pack.sh
# 64bit sigalrm segmentation fault test case package...

echo " * Generating source..."
cat        >tst-sigalrm.cpp        <<TESTSRC
//tst-sigalrm.cpp
#include <stdio.h>
#include <unistd.h>
#include <wait.h>
#include <signal.h>
#include <sys/time.h>

typedef        void*        pvoid;

namespace{
        volatile        unsigned        int        alarmed        =0;
        struct        sigaction        action,oldAction;

        void        _onAlarmSignal(int        signal,siginfo_t* sigInfo,pvoid pUContext) {
                printf("Tick !\n");
                ++alarmed;
        }

        void        _registerSignal() {
                action.sa_flags                =SA_SIGINFO;
                action.sa_sigaction        =_onAlarmSignal;
                action.sa_restorer        =NULL;
                sigemptyset(&action.sa_mask);

                sigaction(SIGALRM,&action,&oldAction);
        }

        void        _startTimer() {
                itimerval        value;
                value.it_interval.tv_sec        =0;
                value.it_interval.tv_usec        =100;
                value.it_value        =value.it_interval;
                setitimer(ITIMER_REAL,&value,NULL);
        }
}

int        main(int argc,const char **argv) {

        _registerSignal();
        _startTimer();

        do        ;        while(alarmed<10);

        return        0;
}
TESTSRC

echo " * Generating build script..."
cat        >tst-sigalrm-build        <<TESTBUILD
#!/bin/sh

#Custom build of the sigalrm test case:
echo " * Build source..."
cc -c -o "tst-sigalrm.o" -fpermissive -g3 -ggdb -w -D _DEBUG "tst-sigalrm.cpp"

#Custom link the test case:
#
#        In order to link, I first need to create this symlink on my system,
#        because most distros just forget about static linking.
#        If anybody has a better idea for this :)... (Something that could
#        work on any distro without 'hacking' the install...)
#
#        /usr/lib64/gcclib -> gcc/x86_64-slackware-linux/4.4.3
#
#
echo " * Linking..."
ld -static -L "/usr/lib64/" -o "tst-sigalrm" \\
        /usr/lib64/crt1.o /usr/lib64/crti.o \\
        /usr/lib64/gcclib/crtbegin.o  \\
        "tst-sigalrm.o" \\
        -L/usr/lib64/gcclib \\
        -\\( -lgcc -lstdc++ -lgcc_eh -lm -lc -\\) \\
        /usr/lib64/gcclib/crtend.o \\
        /usr/lib64/crtn.o

TESTBUILD
chmod a+x "tst-sigalrm-build"

Paste this script into a file (e.g. "test-sigalrm-pack.sh") and run it ($ sh test-sigalrm-pack.sh). In the current directory it will generate a .cpp file (the source) and a second script that performs the kind of link I need (a static link).
Also, in order to link you might need to create a symbolic link to your gcc library files (see the note in the build script). I don't know how to do that "universally" (without the symbolic link 'hack'), so ideas would be greatly appreciated! :)

Thank you.

Garry.

ntubski 04-17-2010 05:17 PM

I get no crash here:
Code:

~/tmp/test-sig-alarm$ uname -sm
Linux x86_64
~/tmp/test-sig-alarm$ gcc --version
gcc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

~/tmp/test-sig-alarm$ cat /etc/issue
Ubuntu 8.04.3 LTS \n \l

~/tmp/test-sig-alarm$ ./tst-sigalrm
Tick !
Tick !
Tick !
Tick !
Tick !
Tick !
Tick !
Tick !
Tick !
Tick !
~/tmp/test-sig-alarm$

You can link with just
Code:

g++ -static tst-sigalrm.o -o tst-sigalrm

NoStressHQ 04-17-2010 05:25 PM

Quote:

Originally Posted by ntubski (Post 3938841)
I get no crash here:
...
You can link with just
Code:

g++ -static tst-sigalrm.o -o tst-sigalrm

Hi, thank you for your quick reply.
I know about linking, but "the real situation" uses a build system that generates makefiles from project definitions, so I extracted the 'link line' from the generated makefile. If the problem comes from the link step, I need to know why so I can fix the build system. My compilation phases are separate. I'm not using this code for *that* useless program, of course :). What I mean is that I need a separate "ld" pass.

Cheers

Garry.

Sergei Steshenko 04-17-2010 05:44 PM

Quote:

Originally Posted by NoStressHQ (Post 3938802)
Hi,

I've got a weird segmentation fault on the ending 'retq' instruction of my alarm callback...
...

I would first suggest replacing

Code:

printf("Tick !\n");
with

Code:

fprintf(stderr, "Tick !\n");
in order to avoid any stdout buffering.


Also, add '-Wall -Wextra' to the compilation line.

ntubski 04-17-2010 06:51 PM

Quote:

Originally Posted by NoStressHQ (Post 3938844)
What I mean is that I need a separate "ld" pass.

I was going to say that the g++ command I posted just calls ld with the correct arguments, but actually it calls collect2 which then calls ld. Anyway, the test program doesn't crash whichever way I link it.

NoStressHQ 04-23-2010 04:48 PM

Thanks
 
Hey,

Thank you all for taking the time to test. Sorry, I was busy with another project and couldn't check sooner.

First of all, of course, anybody should know that stderr stuff and warnings are irrelevant to the problem.

For those who tried to understand and test, thank you. It's true that I can reproduce the problem with a much simpler compile line. I first suspected that the build system linked against the wrong crt/gcc libraries, but trying the simple "g++" command, I found that it works well with shared linking (no special option) and still crashes with static linking (-static), so I updated the test case...

If you don't want the whole script and still have the source somewhere, you can just try these:
Code:

g++ tst-sigalrm.cpp -o tst-sigalrm-shared
g++ -static tst-sigalrm.cpp -o tst-sigalrm-static

test-sigalrm-pack2.sh:
Code:

#!/bin/sh
#test-sigalrm-pack2.sh
# 64bit sigalrm segmentation fault test case package...

echo " * Generating source..."
cat        >tst-sigalrm.cpp        <<TESTSRC
//tst-sigalrm.cpp
#include <stdio.h>
#include <unistd.h>
#include <wait.h>
#include <signal.h>
#include <sys/time.h>

typedef        void*        pvoid;

namespace{
        volatile        unsigned        int        alarmed        =0;
        struct        sigaction        action,oldAction;

        void        _onAlarmSignal(int        signal,siginfo_t* sigInfo,pvoid pUContext) {
                printf("Tick !\n");
                ++alarmed;
        }

        void        _registerSignal() {
                action.sa_flags                =SA_SIGINFO;
                action.sa_sigaction        =_onAlarmSignal;
                action.sa_restorer        =NULL;
                sigemptyset(&action.sa_mask);

                sigaction(SIGALRM,&action,&oldAction);
        }

        void        _startTimer() {
                itimerval        value;
                value.it_interval.tv_sec        =0;
                value.it_interval.tv_usec        =100;
                value.it_value        =value.it_interval;
                setitimer(ITIMER_REAL,&value,NULL);
        }
}

int        main(int argc,const char **argv) {

        _registerSignal();
        _startTimer();

        do        ;        while(alarmed<10);

        return        0;
}
TESTSRC

echo " * Generating build script..."
cat        >tst-sigalrm-build        <<TESTBUILD
#!/bin/sh

#Builds of the sigalrm test case:
g++ tst-sigalrm.cpp -o tst-sigalrm-shared
g++ -static tst-sigalrm.cpp -o tst-sigalrm-static

echo " * Shared run :"
./tst-sigalrm-shared
echo " * Static run :"
./tst-sigalrm-static
TESTBUILD
chmod a+x "tst-sigalrm-build"

It just compiles with the simple g++ command, using -static for one of the two builds.

So, does anyone have a clue? Might it be a problem with Slackware64 only? Some static library built for the wrong "arch", or something like that?

Thanks

Garry.

Sergei Steshenko 04-23-2010 05:31 PM

Quote:

Originally Posted by NoStressHQ (Post 3945723)
Hey,

...
First of all, of course, anybody should know that stderr stuff and warnings are irrelevant to the problem.
...


Maybe.

My points are:
  1. it is not known exactly where the program crashes;
  2. output to stderr is unbuffered, so if it happens before the crash, there is a chance to see it;
  3. knowing exactly which statement was executed last before the crash might help in debugging the problem.

I.e. using stderr rather than stdout for diagnostic output is SOP, and I see no reason to change it.

...

What about '-Wall -Wextra' ?

NoStressHQ 04-23-2010 05:48 PM

Forum newbie != programming newbie :)
 
My points were:
1- I explained that it crashes on the 'retq' of the callback even with an empty callback. I traced it instruction by instruction, watching the stack and everything...
2- This is a simple test case; stderr and stdout are both buffered and just use different channels. And anyway, it's just there to show something... Again, it crashes even with an empty function...
3- See point 1.

Sorry if I 'sounded' rude; it's just that the question, as I understand it, is "far away" from your answer, which seems aimed at a programming student. No offense to students of course :), and no offense to you; I had the "I'm not a noob" reflex.

Learning is the path to follow...

Cheers.

Garry.

Sergei Steshenko 04-23-2010 05:58 PM

Quote:

Originally Posted by NoStressHQ (Post 3945771)
My points were:
1- I explained it crashes on the 'retq' of the callback even with an empty callback. I traced it instruction by instruction, watching the stack and everything...
...

What about '-Wall -Wextra' ?

NoStressHQ 04-23-2010 06:11 PM

Quote:

Originally Posted by Sergei Steshenko (Post 3945786)
What about '-Wall -Wextra' ?

It's that "in real life" I use "warnings as errors" and the maximum warning level, whatever the compiler...
I asked a precise question, "implicitly" explaining that I had traced it with a debugger... (Mentioning 'retq' implies you understand 'a bit' about how a CPU, an OS, a compiler and a debugger work)...

I wrote just a "test case" to show a sample of the crash... So explain to me how "warn all" could change anything for an empty function... And again, explain to me why it works with shared linkage and not with static linkage...

My point is simply:
"Don't try to correct what you think is wrong with the person asking a question; just answer the question."

You never know who is asking, what that person has done or what they know, so don't take them for a noob... And I think that if you understood better how a compiler works, assembly language, and how to use a debugger, you wouldn't even have talked about "warnings".

Sorry, again, I've spent several days (and maybe weeks) tracking this down, so the pedantic "have you tried warn all" is irritating me... :)

And again, thank you to those who took the time to test and answer the questions. I didn't mean to start any debate here :).

"Peace"

Garry.

Sergei Steshenko 04-23-2010 06:14 PM

I am not sure how valid the following words are:

http://stackoverflow.com/questions/1...in-the-handler :

Quote:

According to the standard, you're really not allowed to do much in a signal handler. All you are guaranteed to be able to do in the signal-handling function, without causing undefined behavior, is to call signal, and to assign a value to a volatile static object of type sig_atomic_t.
*printf is too much for my taste.

Sergei Steshenko 04-23-2010 06:20 PM

Quote:

Originally Posted by NoStressHQ (Post 3945800)
...
My point is simply:
"Don't try to correct what you think is wrong with the person asking a question; just answer the question."
...

You might be missing a number of points. For example, I know that with each new release 'gcc' gets more and more stringent WRT language compliance. So nobody needs to guess: it's better if the compiler always produces all the warnings it can. I.e. somebody else trying your example with the newest compiler might see a warning you do not get.

About answering questions: often, answering a question with a question is a good answer.

NoStressHQ 04-23-2010 06:27 PM

Quote:

Originally Posted by Sergei Steshenko (Post 3945805)
*printf is too much for my taste.

Sorry, but I still don't see the 'relevance'...

An empty function is 'too much'?

If it crashes on your machine, remove the printf and you'll see it still crashes...
If it doesn't crash on your machine, please just tell me "which system" you're on so I can pinpoint the guilty part of this bug.

I had this code working inside a framework of 30+ projects for more than a year on Slackware 32...

This code compiles and works on Slackware64 with SHARED linkage...

This code compiles and crashes on Slackware64 with STATIC linkage (at the very moment of 'popping' the return address of the kernel caller (?))... So I suspect a 32/64-bit mismatch... Nothing related to a 'race condition', 'timeout' or 'warning'...

Of course, by "this code" I mean this way of using SIGALRM... I don't printf in my 'real life' callback... :)

Again, I'm not asking for programming courses... I see the bug; I just want to find out why it happens and how to fix it.

By the way, thank you for taking the time to answer.

Cheers

Sergei Steshenko 04-23-2010 07:05 PM

Quote:

Originally Posted by NoStressHQ (Post 3945818)
...
Again, I'm not asking for programming courses... I see the bug; I just want to find out why it happens and how to fix it.
...

So you are advertising yourself as not a newbie.

I went through previous posts in this thread and I do not see the following info:
  1. Your OS version (just name);
  2. Your 'gcc' version;
  3. Your 'glibc' version;
  4. Your 'binutils' version.

Meanwhile, just by searching the web I see some 'retq'-related bugs. So maybe your combination of OS + 'gcc' + 'glibc' + 'binutils' versions is affected by such a bug.

An example of such a bug: http://sourceware.org/ml/binutils/2008-03/msg00111.html .

I.e. in order to resolve the issue I would try to use different (newer if available) versions of the above tools.

...

Why another thread:
http://www.linuxquestions.org/questi...broken-803845/ ?


All times are GMT -5.