LinuxQuestions.org
Old 02-12-2015, 04:56 PM   #16
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled

rtmistler, nothing after #9.
The binary was built on SUSE Linux Enterprise Server 10 SP2 (i586) (I didn't use the -ggdb flag)
and is running on Red Hat Enterprise Linux Server release 5.5.
 
Old 02-12-2015, 05:27 PM   #17
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
building machine:
SUSE Linux Enterprise Server 10 SP2 (i586), i686
(kernel 2.6.16.60-0.21-default)

running machine1:
CentOS release 4.8 (Final), x86_64
(kernel 2.6.9-89.ELlargesmp)

running machine2:
Red Hat Enterprise Linux Server release 5.5
(kernel 2.6.18-308.el5)
 
Old 02-13-2015, 07:33 AM   #18
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,788
Blog Entries: 13

Rep: Reputation: 4831
Are you just not familiar with GDB? Is that why you're reluctant to follow up on this properly?

I'll say this once more, but it's too difficult to provide assistance if you aren't able to try these suggestions properly.

You compile a program using the -ggdb flag in the gcc compile statement. That flag adds debug symbols to the executable. When you run the executable through the debugger, or when a core dump occurs, you will then be able to see symbols in the backtrace. The function names and line numbers come from the debug information in the binary itself; you only need the source file on the system where you are running GDB if you want it to list the source lines as well. From what you're saying, you built without the -ggdb flag, you're running the binary on two different systems, and you're probably not moving the source over when you copy the executable.

GDB will really work exactly as I've shown, but ... you have to do those very fundamental steps.

An added thing, because you'd be copying the binary and source over to those other systems: once you are in GDB and have loaded the binary or the core file, you may have to issue the "directory" command followed by a space and the fully qualified path of the directory holding the source. The debug information records where the source was on the system you compiled on, but after the move to a different system GDB doesn't know that directory hierarchy; the directory command is there so you can say, "my source is here". (The "symbol-file" command is a different thing: it loads the symbol table from an executable or a separate symbol file, not from source.)

You can obtain a core file on one of those other systems and then copy it back to your development system and debug it. However, it will be of little use to you if you don't compile with debug information (-g or -ggdb).

The commands to debug a program using GDB would be:
Code:
gdb executable-file-name
## Once at the GDB prompt, you can issue 'r' to run the program
## Once the program encounters a segmentation violation you can issue 'bt' to see the backtrace
## If your program needs arguments, use the 'set args' command in GDB to supply them before running
The commands to debug a core file using GDB would be:
Code:
gdb executable-file-name core
## 'core' here is the name of the core file; it is often literally 'core', though some systems name it core.<pid>
## Give its path if it is not in the directory where you're running the GDB command from
## Once at the GDB prompt, you can issue 'bt' to see the backtrace
If you find a different solution, great; glad you got it resolved. If you can't replicate these recommended steps as closely as possible, and instead put in only some fraction of the effort, then I'm sorry, but I will not be able to assist you further.
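For anyone wanting to practise those steps on something small first, here is a minimal sketch of a crash target in C. The file name crash_demo.c, the function names, and the build line in the comment are illustrative assumptions, not anything taken from the program discussed in this thread. count_chars() dereferences its argument without a NULL check, so calling it with NULL should stop in GDB with a segmentation fault, and 'bt' should then show count_chars called from count_first_arg.

```c
/* crash_demo.c: a deliberately fragile example for practising 'r' and 'bt' in GDB.
 * Hypothetical build line: gcc -ggdb -O0 crash_demo.c -o crash_demo
 */
#include <stddef.h>

/* Counts the characters in s. There is no NULL check on purpose:
 * passing NULL makes the first s[n] access fault. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Adds one more frame so the backtrace shows a call chain.
 * With no command-line argument, arg is NULL and count_chars() crashes. */
size_t count_first_arg(int argc, char **argv)
{
    const char *arg = (argc > 1) ? argv[1] : NULL;
    return count_chars(arg);
}
```

Called from a main() that forwards argc and argv, count_first_arg() returns the length of the first argument and crashes when the program is started without one, which gives 'r' followed by 'bt' something to show.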
 
Old 02-13-2015, 11:52 AM   #19
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
rtmistler, thanks for the fundamentals. However, I did use the -g flag; is there any difference?

-g Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging
information.

On most systems that use stabs format, -g enables use of extra debugging information that only GDB can use; this extra information makes
debugging work better in GDB but will probably make other debuggers crash or refuse to read the program. If you want to control for certain
whether to generate the extra information, use -gstabs+, -gstabs, -gxcoff+, -gxcoff, or -gvms (see below).

GCC allows you to use -g with -O. The shortcuts taken by optimized code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where you did not expect it; some statements may not be executed because they
compute constant results or their values were already at hand; some statements may execute in different places because they were moved out of
loops.

Nevertheless it proves possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.

The following options are useful when GCC is generated with the capability for more than one debugging format.

-ggdb
Produce debugging information for use by GDB. This means to use the most expressive format available (DWARF 2, stabs, or the native format if
neither of those are supported), including GDB extensions if at all possible.
 
Old 02-13-2015, 11:58 AM   #20
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
On one of the running machines, the backtrace was as below (some function names were replaced by xxx for confidentiality reasons).
I didn't find that it provided useful information for this specific crash.

GNU gdb Red Hat Linux (6.3.0.0-1.162.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".


warning: core file may not match specified executable file.
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xffffe000
Core was generated by `xxx'.
Program terminated with signal 11, Segmentation fault.
#0 0x0806472c in xxx()
(gdb) bt
#0 0x0806472c in xxx()
#1 0x08064a6d in xxx()
#2 0x08069897 in xxx()
#3 0x00000010 in ?? ()
#4 0x00000001 in ?? ()
#5 0x00000001 in ?? ()
#6 0x09a7a358 in ?? ()
#7 0x084784dc in possible_if_ids ()
#8 0xffffd3e8 in ?? ()
#9 0xffffd3f8 in ?? ()
#10 0xffffd3f8 in ?? ()
#11 0x083e4526 in decode_cleanup_pass ()
Previous frame inner to this frame (corrupt stack?)
 
Old 02-13-2015, 12:01 PM   #21
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
I may try valgrind if there is no other choice.
 
Old 02-13-2015, 12:21 PM   #22
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,788
Blog Entries: 13

Rep: Reputation: 4831
Yes, the particulars of the flag matter somewhat.

From the gcc man page:
Code:
       -ggdb
           Produce debugging information for use by GDB.  This means to use the most expressive format available (DWARF
           2, stabs, or the native format if neither of those are supported), including GDB extensions if at all
           possible.
Consider setting optimization off until you diagnose this.
Quote:
Originally Posted by IndianScorpion View Post
warning: core file may not match specified executable file.
### I consider this a pretty big red flag; you should not see this complaint.
### Did you recompile the binary at some point after this core was generated,
### so that it is no longer the binary the core came from?

Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xffffe000
Core was generated by `xxx'.
Program terminated with signal 11, Segmentation fault.
#0 0x0806472c in xxx()
(gdb) bt
#0 0x0806472c in xxx() ### Is this your function? What variables matter? Is anything set to NULL when it shouldn't be? You can use 'up' and 'down' to traverse the stack, select this frame or any other, and then examine variables and their values; you can also dump memory.
#1 0x08064a6d in xxx()
#2 0x08069897 in xxx()
#3 0x00000010 in ?? ()
#4 0x00000001 in ?? ()
#5 0x00000001 in ?? ()
#6 0x09a7a358 in ?? ()
#7 0x084784dc in possible_if_ids ()
#8 0xffffd3e8 in ?? ()
#9 0xffffd3f8 in ?? ()
#10 0xffffd3f8 in ?? ()
#11 0x083e4526 in decode_cleanup_pass ()
Previous frame inner to this frame (corrupt stack?)
It's not necessarily going to be something obvious, like "Oh ... it died in sprintf()!", that jumps out from the backtrace. You have to examine the details at the points in that backtrace where you can, that is, where it's your code. A good place to start is to assume that the library and system functions are not at fault and that your code did something bad, and then verify whether your code did in fact do something bad. As in my example in post #15, knowing that the SEGV occurred at a certain file and line is "helpful", but to really know what happened one had to examine the variable being manipulated to discern "Oh ... it's a NULL pointer", and that is the problem. How it became that value is the next question.

Last edited by rtmistler; 02-13-2015 at 12:25 PM.
 
Old 02-13-2015, 01:47 PM   #23
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
rtmistler, as I stated in post #1, the program (function xxx) has been running perfectly for 9 years on AIX and WINDOWS.
 
Old 02-16-2015, 09:03 AM   #24
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,788
Blog Entries: 13

Rep: Reputation: 4831
Quote:
Originally Posted by IndianScorpion View Post
rtmistler, as I stated in post #1, the program (function xxx) has been running perfectly for 9 years on AIX and WINDOWS.
The information also says that you compiled the program on the following system:
Quote:
Originally Posted by IndianScorpion View Post
the binary was built on 32bit
Linux xxx 2.6.16.60-0.21-default #1 Tue May 6 12:41:02 UTC 2008 i686 i686 i386 GNU/Linux
and running on 64 bit
Linux 2.6.18-308.el5
Therefore it was built on an OS whose kernel dates from May 6, 2008.

Originally you were using an outdated version of CentOS, and you also cited an out-of-date version of RHEL.

You mention Windows and AIX.

AIX, although kept up to date, does extend all the way back to the 1980s.

Saying you have something running on Windows is fine; however, most programs which run on both Windows and Unix/Linux will have conditional flags to alter the system library interfaces per the needs of those disparate operating systems.
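As a rough sketch of what such conditional compilation tends to look like in C: the wrapper name current_process_id() is hypothetical, while _WIN32, GetCurrentProcessId() and getpid() are the real platform pieces. The point is only that the same source takes a different path through the system libraries on each OS, so "it works on Windows" says little about the branch that Linux compiles.

```c
#ifdef _WIN32
#include <windows.h>   /* GetCurrentProcessId() */
#else
#include <unistd.h>    /* getpid() */
#endif

/* Hypothetical portable wrapper: one name for callers,
 * one platform-specific implementation selected at compile time. */
long current_process_id(void)
{
#ifdef _WIN32
    return (long)GetCurrentProcessId();
#else
    return (long)getpid();
#endif
}
```

A bug that lives only inside the #else branch would never have been exercised by the Windows build.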

What I'm getting is that you're not presently compiling the program and attempting to debug it, you're just saying that it's always worked before.

This is why you see things like "End of Life" and the fact that Windows XP is no longer supported by Microsoft.

Options are numerous, some I can think of are:
  1. Find older Linux versions which run on 32-bit machines or in 32-bit mode, and hope that these dice tosses eventually find you a combination of OS and platform which works. A.k.a. trial and error.
  2. Keep running it on Windows and AIX forever. I'm guessing that this is a problem; otherwise you would not be trying to run it on different variations of Linux.
  3. Re-compile it now and debug what's wrong with things at present, whatever those problems may be. But reasoning backwards from the fact that it used to work is not going to fix it; you'd probably be better off with method #1, doing trial and error.
Note also that re-compiling it now may entail code changes to interface with present system libraries, which may be different from what they were pre-2008.
 
Old 02-16-2015, 12:08 PM   #25
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
rtmistler, you are right, we may try to compile on the same LINUX distribution as the running machine. For now, we may still compile for 32 bits and run on 64 bits.
What are you talking about, just AIX/WINDOWS? The program must run on all platforms, of course including LINUX.
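Since the build is 32-bit and the hosts are 64-bit, it may help to have the program itself report which data model it was built for. The following is a minimal sketch; the function names are made up for illustration. With gcc on Linux, a -m32 build would normally report 32-bit pointers and 32-bit long, and a -m64 build 64 bits for both.

```c
#include <limits.h>
#include <stdio.h>

/* Width of a pointer in bits for this build (32 for -m32, 64 for -m64 on Linux). */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width of long in bits; it differs between 32-bit and 64-bit Linux builds. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Prints one line that can go into a startup log. */
void print_data_model(void)
{
    printf("pointer: %d bits, long: %d bits\n", pointer_bits(), long_bits());
}
```

Logging this once at startup on each machine makes it easy to confirm that every host is really running the same flavour of the binary.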
 
Old 02-16-2015, 12:30 PM   #26
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,788
Blog Entries: 13

Rep: Reputation: 4831
Quote:
Originally Posted by IndianScorpion View Post
rtmistler, you are right, we may try to compile on the same LINUX distribution as the running machine. For now, we may still compile for 32 bits and run on 64 bits.
What are you talking about, just AIX/WINDOWS? The program must run on all platforms, of course including LINUX.
You say it runs perfectly on AIX and Windows. Therefore for those operating systems, you're all set.

To run it on Linux, there's no guarantee that a compilation from 2008 or prior is going to work.

Further, there's no guarantee that a compilation from 2008 is going to work on Windows 7 or Windows 8.

I have to admit that I'm wondering why you didn't just try to re-compile it on the intended Linux machine right away; that's about the first thing I'd try. You're not contradicting my statement that it was last compiled about 2008, which is a long time ago. You're also not saying which version of Windows it runs on, or that the exact same binary file runs on both Windows and AIX. Software has a life, especially as operating systems evolve. The fact that it comes close to running is pretty good, and my bet is that the problem is minor and that once it's dealt with, things will be fine.
 
Old 02-16-2015, 12:36 PM   #27
ron7000
Member
 
Registered: Nov 2007
Location: CT
Posts: 248

Rep: Reputation: 26
my suggestion is to modify /etc/security/limits.conf and add at least the 4 lines below... the two hard and two soft.
I included the items available, this is from novell suse 11.x linux operating system.
other than that, without seeing the code, it could be something as simple as you using a 32-bit variable somewhere, or a 32-bit operating system, and you're calling for a memory address greater than 2^31 that can't be held in a signed 32-bit variable.
all you said was big memory issue... your definition of big might be different from mine.... that's what she said...

Code:
*       hard    stack   unlimited
*       soft    stack   unlimited
*       hard    nofile  10000
*       soft    nofile  10000

#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open files
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to
#        - rtprio - max realtime priority
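As far as I know, limits.conf is applied by PAM at login, so a process started some other way may not see those values. One way to check what the program actually gets is to ask from inside the process with getrlimit(). This is a minimal sketch; the helper names soft_limit_is_unlimited() and print_limit() are made up for illustration, while getrlimit(), RLIMIT_STACK, RLIMIT_NOFILE and RLIM_INFINITY are the standard pieces from <sys/resource.h>.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Prints one limit value, spelling out RLIM_INFINITY as "unlimited". */
static void print_value(rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("unlimited");
    else
        printf("%llu", (unsigned long long)value);
}

/* Returns 1 if the soft limit for `resource` is unlimited, 0 if it is finite,
 * and -1 if getrlimit() fails. */
int soft_limit_is_unlimited(int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? 1 : 0;
}

/* Prints the soft and hard limit for one resource.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_limit(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return -1;
    }
    printf("%-8s soft=", name);
    print_value(rl.rlim_cur);
    printf(" hard=");
    print_value(rl.rlim_max);
    printf("\n");
    return 0;
}
```

Calling print_limit("stack", RLIMIT_STACK) and print_limit("nofile", RLIMIT_NOFILE) early in the program shows whether the stack and nofile entries above took effect for that particular process.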
 
Old 02-16-2015, 01:12 PM   #28
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
@ron7000, that seems to be on the right track, although the ulimit is already set to unlimited here. I will double check /etc/security/limits.conf.
 
Old 02-16-2015, 01:17 PM   #29
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
@rtmistler, the reason I didn't re-compile it right away is that I thought it was against the philosophy of software to have to compile a program separately on each platform where it is used
(so many distributions are out there: CentOS, Red Hat, SuSE, Ubuntu...).
 
Old 02-16-2015, 01:29 PM   #30
IndianScorpion
LQ Newbie
 
Registered: Dec 2014
Posts: 24

Original Poster
Rep: Reputation: Disabled
What I am trying to do is prove that my running machine was wrongly configured (like m32, m64, ulimits, socket buffer size? most likely it is memory, file, or lock related) rather than prove that it was a LINUX bug.
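On the memory side, the 32-bit variable idea from post #27 can be checked with a few lines of C. This is only a sketch with made-up helper names, and nothing here shows that the real program does this. The backtrace in post #20 does contain values such as 0xffffd3e8, which are above 2^31, so any code that keeps an address in a signed 32-bit integer would see it change value or go negative.

```c
#include <stdint.h>

/* Returns 1 if `addr` can be stored in a signed 32-bit integer
 * without changing its value, 0 otherwise. */
int fits_in_int32(uintptr_t addr)
{
    return addr <= (uintptr_t)INT32_MAX;
}

/* Round-trips a pointer through uintptr_t, the integer type meant
 * for holding pointer values, and returns the recovered pointer. */
void *round_trip(void *p)
{
    uintptr_t as_integer = (uintptr_t)p;
    return (void *)as_integer;
}
```

If the code has places where a pointer is cast to int or long and later compared or cast back, switching those to uintptr_t is the usual fix; fits_in_int32() is just a cheap way to log whether real addresses on a given machine cross that line.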
 
  

