LinuxQuestions.org

psionl0 10-17-2014 09:15 AM

Tracking down "Segmentation fault" in blassic
 
I have blassic installed on my 64-bit PC and 32-bit netbook.

On the PC it runs fine (using multilib) but on the netbook, every time I try to dimension a string array, it causes a segmentation fault and the terminal cursor disappears.

To find out what is going wrong, I edited the blassic.SlackBuild script so that it no longer strips the symbols, then rebuilt the package and upgraded to it.

When I ran gdb, this is what I got:
Code:

Reading symbols from /usr/bin/blassic...done.
(gdb) run
Starting program: /usr/bin/blassic
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Blassic 0.10.2
(C) 2001-2009 Julian Albo

Ok
dim a$(10)

Program received signal SIGSEGV, Segmentation fault.
0x0804e543 in __gnu_cxx::__exchange_and_add_dispatch(int*, int) [clone .constprop.317] ()
(gdb)

Where do I go from here?

the3dfxdude 10-17-2014 01:03 PM

Code:

bash-4.2$ gdb ./blassic
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-slackware-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/nnn/blassic/blassic-0.10.2/blassic...done.
(gdb) run
Starting program: /home/nnn/blassic/blassic-0.10.2/./blassic

Blassic 0.10.2
(C) 2001-2009 Julian Albo

Ok
dim a$(10)
Ok

Do you know more about how you compiled your program?

ntubski 10-17-2014 01:31 PM

Looking at the backtrace may be useful.

exchange_and_add sounds like a threading synchronization primitive to me.
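
For anyone following along, a backtrace can also be captured without typing into gdb by hand. This is only a sketch: `get_backtrace` and the `GDB` override variable are made-up names, and it assumes blassic will read the `dim` command from standard input when it is not attached to a terminal. If it does not, run gdb interactively and type `bt` after the crash.

```shell
# Sketch: run a binary under gdb in batch mode and print a backtrace
# when it crashes. GDB is a hypothetical override for the gdb binary.
get_backtrace() {
    # $1 = path to the binary (built without stripping symbols)
    # The BASIC statement is fed to the program on standard input.
    printf 'dim a$(10)\n' | "${GDB:-gdb}" -batch -ex run -ex bt "$1"
}

# Example (not run here):
#   get_backtrace /usr/bin/blassic
```

Frames only show function names unless the binary was built with `-g`.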

psionl0 10-17-2014 07:36 PM

Quote:

Originally Posted by the3dfxdude (Post 5255185)
Do you know more about how you compiled your program?

I haven't (so far) done much more than run the SlackBuild script without the stripping instructions. On the netbook in question, since I installed 32-bit Slackware, I didn't have to worry about multi-lib. As I pointed out above, blassic runs fine on one computer and segfaults on another.

This is not the only package that gives inconsistent results like this and I want to get to the bottom of it. If I can figure out what the problem is for one package then I might be able to figure it out for the others. Blassic got the short straw so I am starting off with that package.

@ntubski, I will see what backtrace does and post the results here.

Drakeo 10-17-2014 11:56 PM

Quote:

On the PC it runs fine (using multilib)
This is simple: when you built it, it was compiled against a 32-bit library.
Rename /usr/lib to /usr/lib~ and rebuild for 64-bit. If you're building for compat32, that is another road to look at. The reason it runs on multilib is that when you built it, it picked up some 32-bit libraries and linked against them.

Remember to switch it back after building.

psionl0 10-18-2014 02:08 AM

Quote:

Originally Posted by Drakeo (Post 5255420)
the reason it runs on multi-lib is when you built it it used some 32bit stuff and linked to it.

The reason it runs on the PC seems to be because I created the package while I was still on the LEET version of Slackware. The original blassic package was created on Aug 10 2012 and I upgraded the PC to 14.0 on 24 Dec 2012. When I got the netbook, I did a fresh install of 32 bit Slackware 14.0 and never had a prior version on it. I subsequently re-built the blassic package on the netbook.

When I created a symbol version of blassic for the PC, it too started seg-faulting. When I rebuilt the symbol-free version, it was also seg-faulting. However, when I upgrade to the original package, it runs fine.

@ntubski This is the backtrace I got on the netbook:
Code:

(gdb) run
Starting program: /usr/bin/blassic
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Blassic 0.10.2
(C) 2001-2009 Julian Albo

Ok
dim a$(10)

Program received signal SIGSEGV, Segmentation fault.
0x0804e543 in __gnu_cxx::__exchange_and_add_dispatch(int*, int) [clone .constprop.317] ()
(gdb) bt
#0  0x0804e543 in __gnu_cxx::__exchange_and_add_dispatch(int*, int) [clone .constprop.317] ()
#1  0x080b2cb0 in (anonymous namespace)::Array<std::string>::delref() [clone .isra.99] [clone .part.100] ()
#2  0x080b6344 in dimvarstring(std::string const&, Dimension const&) ()
#3  0x080a4b84 in RunnerLineImpl::do_DIM() ()
#4  0x0809af75 in RunnerLineImpl::execute() ()
#5  0x08089648 in Runner::runline(CodeLine const&) ()
#6  0x08089e98 in Runner::processline(std::string const&) ()
#7  0x0808dc3c in Runner::interactive() ()
#8  0x0804eb0f in main ()
(gdb)

And this is the backtrace I got on the PC:
Code:

(gdb) run
Starting program: /usr/bin/blassic-symb
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Blassic 0.10.2
(C) 2001-2009 Julian Albo

Ok
dim a$(10)

Program received signal SIGSEGV, Segmentation fault.
0x080bdfe0 in (anonymous namespace)::Array<std::string>::delref() [clone .isra.99] [clone .part.100] ()
(gdb) bt
#0  0x080bdfe0 in (anonymous namespace)::Array<std::string>::delref() [clone .isra.99] [clone .part.100] ()
#1  0x080c1b2d in dimvarstring(std::string const&, Dimension const&) ()
#2  0x080aeb14 in RunnerLineImpl::do_DIM() ()
#3  0x080a336e in RunnerLineImpl::execute_instruction() ()
#4  0x080a538c in RunnerLineImpl::execute() ()
#5  0x08091319 in Runner::runline(CodeLine const&) ()
#6  0x08091a10 in Runner::processline(std::string const&) ()
#7  0x080959d2 in Runner::interactive() ()
#8  0x0805021e in main ()
(gdb)

The two traces are not identical though I don't know why. In the PC version, execute() calls execute_instruction() which calls do_DIM(). However, in the netbook version execute() calls do_DIM directly.

The segfault in the PC version occurs at delref() in "anonymous namespace" (the linux-gate.so.1 bit) but in the netbook version, delref() manages to call __exchange_and_add_dispatch() before the segfault happens.

Now I'm stuck.

pan64 10-18-2014 02:09 AM

You could try ldd to check which libraries are in use.

psionl0 10-18-2014 02:25 AM

ldd on netbook:
Code:

$ ldd /usr/bin/blassic
        linux-gate.so.1 (0xffffe000)
        libSM.so.6 => /usr/lib/libSM.so.6 (0xb7782000)
        libICE.so.6 => /usr/lib/libICE.so.6 (0xb7769000)
        libX11.so.6 => /usr/lib/libX11.so.6 (0xb7633000)
        libncurses.so.5 => /lib/libncurses.so.5 (0xb75f1000)
        libdl.so.2 => /lib/libdl.so.2 (0xb75eb000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7504000)
        libm.so.6 => /lib/libm.so.6 (0xb74d8000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb74bb000)
        libc.so.6 => /lib/libc.so.6 (0xb7336000)
        libuuid.so.1 => /lib/libuuid.so.1 (0xb7332000)
        libxcb.so.1 => /usr/lib/libxcb.so.1 (0xb7310000)
        libXau.so.6 => /usr/lib/libXau.so.6 (0xb730d000)
        libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xb7307000)
        /lib/ld-linux.so.2 (0xb77a7000)

ldd on PC:
Code:

$ ldd /usr/bin/blassic-symb
        linux-gate.so.1 (0xffffe000)
        libSM.so.6 => /usr/lib/libSM.so.6 (0xf773c000)
        libICE.so.6 => /usr/lib/libICE.so.6 (0xf7725000)
        libX11.so.6 => /usr/lib/libX11.so.6 (0xf7608000)
        libncurses.so.5 => /lib/libncurses.so.5 (0xf75c6000)
        libdl.so.2 => /lib/libdl.so.2 (0xf75c0000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xf74da000)
        libm.so.6 => /lib/libm.so.6 (0xf74ae000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xf7494000)
        libc.so.6 => /lib/libc.so.6 (0xf730f000)
        libuuid.so.1 => /lib/libuuid.so.1 (0xf730b000)
        libxcb.so.1 => /usr/lib/libxcb.so.1 (0xf72f2000)
        libXau.so.6 => /usr/lib/libXau.so.6 (0xf72ef000)
        libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xf72ea000)
        /lib/ld-linux.so.2 (0xf776c000)

No difference that I can see.

moesasji 10-18-2014 02:50 AM

Quote:

Originally Posted by psionl0 (Post 5255063)
I have blassic installed on my 64-bit PC and 32-bit netbook.

Could you be a bit more specific on whether you indeed have multilib installed on the 64bit PC? Same question applies to the architecture of your netbook. Is that indeed x86?

This is because the gdb output mentions linux-gate.so.1, which implies that it is compiled as an x86 32-bit program, not a 64-bit one. See the vDSO names listed in the man page.

So if you are rebuilding to get symbols on the 64 bit machine you have to make sure you indeed build in a 32 bit environment on that machine. Is that the case?
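
A quick way to confirm what was actually built is to look at the ELF header of the binary. The `file` command reports this directly; the sketch below does the same check by hand. `elf_class` is a made-up helper name, and it does not validate the ELF magic bytes, so it is only meaningful on files already known to be ELF binaries.

```shell
# Sketch: report whether an ELF binary is 32-bit or 64-bit.
# The fifth byte of an ELF file (EI_CLASS) is 1 for 32-bit, 2 for 64-bit.
elf_class() {
    # Read one byte at offset 4 and print it as an unsigned decimal.
    case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo unknown ;;
    esac
}

# Example (not run here):
#   elf_class /usr/bin/blassic
```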

pan64 10-18-2014 02:52 AM

Are those libraries really identical? (Use file and/or md5sum to check.)
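
One way to do that comparison is to checksum every library that ldd resolves and diff the results from the two machines. This is a sketch only: `ldd_paths` is a made-up helper name and `libs.md5` is an arbitrary output file. The filter keeps only lines of the form `name => /path`, so linux-gate.so.1 and the dynamic loader line are skipped.

```shell
# Sketch: print the resolved library paths from ldd output read on stdin.
ldd_paths() {
    # Keep lines like "libm.so.6 => /lib/libm.so.6 (0x...)" and
    # print the third field, which is the resolved path.
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Example (not run here): run on each machine, then diff the two files.
#   ldd /usr/bin/blassic | ldd_paths | xargs md5sum > libs.md5
```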

psionl0 10-18-2014 04:05 AM

Quote:

Originally Posted by moesasji (Post 5255461)
Could you be a bit more specific on whether you indeed have multilib installed on the 64bit PC? Same question applies to the architecture of your netbook. Is that indeed x86?

This is because the gdb output mentions linux-gate.so.1, which implies that it is compiled as an x86 32-bit program, not a 64-bit one. See the vDSO names listed in the man page.

So if you are rebuilding to get symbols on the 64 bit machine you have to make sure you indeed build in a 32 bit environment on that machine. Is that the case?

Yes, my PC is 64-bit Slackware 14.0 with multilib installed. This means it has 32-bit AND 64-bit libraries. When I originally got blassic I had troubles running it as a 64-bit executable so, following the multi-lib instructions, I rebuilt it as a 32-bit executable.

I just compiled it as a 64-bit executable again but I still get seg-faults when I try to dimension a string array.

Quote:

Originally Posted by pan64 (Post 5255462)
Are those libraries really identical? (Use file and/or md5sum to check.)

They should be since they come from the same 32-bit Slackware 64 v14.0 distro but I will file/md5 them when I get time later tonight just to be sure.

moesasji 10-18-2014 05:51 AM

Given a segfault in a C++ part of the blassic code that was compiled with gcc, I would suspect that the blassic code itself has "issues" with more recent versions of gcc. The compiler is the key thing that has changed since you built it in 2012, and there have been a lot of changes on the C++ front since then.

That something compiles on a newer compiler doesn't necessarily mean that it actually works as expected.

knudfl 10-18-2014 06:38 AM

blassic-0.11.0: Segmentation fault.

blassic-0.10.2: OK with ./configure --disable-svgalib

Code:

$ blassic

Blassic 0.10.2
(C) 2001-2009 Julian Albo

Ok

( The option --disable-svgalib : Ref. Fedora 19 blassic.spec ).

psionl0 10-18-2014 08:18 AM

Quote:

Originally Posted by knudfl (Post 5255522)
blassic-0.10.2: OK with ./configure --disable-svgalib

( The option --disable-svgalib : Ref. Fedora 19 blassic.spec ).

Unfortunately, neither ./configure --disable-svgalib nor ./configure --enable-svgalib had any effect on the segmentation fault which only appears when I try to dimension a string array.

The website https://build.opensuse.org/package/v...c/blassic.spec included some "sed"s which I tried too but to no avail.

Thanks for the suggestion though.

ntubski 10-18-2014 10:51 AM

Quote:

Originally Posted by psionl0 (Post 5255447)
@ntubski This is the backtrace I got on the netbook:

Looking at var.cpp lines 386-394, I think the segfault is happening in delete (where __gnu_cxx::__exchange_and_add_dispatch is used to lock the heap), which indicates heap corruption.

So the next thing to try is to run under valgrind, but compile with debug info so you can get line numbers.
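
As a rough sketch of that step, assuming the source tree uses the usual ./configure and make flow and that valgrind is installed: `build_debug`, `run_valgrind` and the `VALGRIND` override variable are made-up names, and the flag choices are suggestions rather than anything blassic requires.

```shell
# Sketch: rebuild with debug info, then run the result under valgrind.
build_debug() {
    # -g adds line-number info; -O0 avoids inlining so frames stay visible.
    CXXFLAGS="-g -O0" ./configure && make
}

run_valgrind() {
    # --track-origins=yes reports where uninitialised values came from;
    # --num-callers=30 prints deeper stack traces than the default.
    "${VALGRIND:-valgrind}" --track-origins=yes --num-callers=30 "$@"
}

# Example (not run here), from the top of the source tree:
#   build_debug && run_valgrind ./blassic
```

Then type `dim a$(10)` at the Ok prompt. The first "Invalid write" or "Invalid free" that valgrind reports is usually closer to the real bug than the point where the segfault finally happens.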


All times are GMT -5.