LinuxQuestions.org
Old 01-18-2012, 01:22 PM   #1
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Rep: Reputation: 30
Library with undefined symbols fails on one machine, but works fine with another?


My question is more of an educational request than a "please help me fix my problem" one. I have a binary that is built along with several shared libraries (.so). I can build and run the application fine on our build server, but when I try to do the same on my local machine I get the following error:

Code:
couldn't load file "lib/tcs/drivers/i400/TCStcl.so": lib/tcs/drivers/i400/TCStcl.so: undefined symbol: _ZTV4Line
First thing I did was run ldd on the library that has the undefined symbols.

Code:
$ ldd lib/tcs/drivers/i400/TCStcl.so 
	linux-vdso.so.1 =>  (0x00007fff4bab8000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f3d751a3000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f3d74f1f000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f3d74d09000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f3d74953000)
	/lib64/ld-linux-x86-64.so.2 (0x000000392fc00000)
I ran ldd on the same library on the build machine and got similar results. Next I used nm to list the symbols in the file, and sure enough the _ZTV4Line symbol was undefined (along with many other undefined symbols).

Code:
$ nm lib/tcs/drivers/i400/TCStcl.so 
.....
                 U _ZTV12Serializable
                 U _ZTV24TAO_Abstract_ServantBase
                 U _ZTV4Line
                 U _ZTV5Point
                 U _ZTVN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
.....
I then did the same thing on the build machine and, to my surprise, the symbol was undefined there as well. But how could this be? And why would an undefined symbol be a show-stopper run-time error on one system but not another?
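When reading nm output like the listing above, it can help to demangle the names first. A minimal sketch, assuming c++filt from GNU binutils is installed; the helper name demangle is made up for illustration:

```shell
# demangle NAME...: print each mangled C++ symbol next to its readable form.
# Assumes c++filt (part of GNU binutils) is on PATH; "demangle" is a
# hypothetical helper name, not an existing tool.
demangle() {
    for name in "$@"; do
        printf '%s -> %s\n' "$name" "$(c++filt "$name")"
    done
}

# Example: demangle _ZTV4Line _ZTV5Point
```

_ZTV4Line demangles to "vtable for Line". With GCC, a class's vtable is normally emitted in the object file that defines the class's first non-inline virtual member function, so an undefined vtable usually means that object file (or the library containing it) is missing from the link or not visible when the library is loaded.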


The two systems are using slightly different versions of some of the libraries they use (such as Boost and SWIG), and the g++ versions are 4.6.2 for my local machine and 4.3.2 for the build machine. Last month I did manage to get the application to run on my local system, but it required adding a lot of hacks to the makefile to remove all these undefined symbol errors. Most of the changes were adding a library flag to a library build target.


I'm trying to really understand the core of this problem right now. So if you have any thoughts on this that you'd like to share, or links to reference reading that can enlighten me about this issue, I'd greatly appreciate it. Thanks.
 
Old 01-18-2012, 02:47 PM   #2
Nermal
Member
 
Registered: Jan 2009
Distribution: Debian
Posts: 59
Blog Entries: 2

Rep: Reputation: 6
Check your LD_LIBRARY_PATH.

e.g.
Code:
env | grep LD_LIBRARY_PATH
It is possibly different on the two machines.
 
Old 01-18-2012, 03:39 PM   #3
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
It is different, but that's because the build machine installs our third-party libraries in a special location, whereas on my local machine they are all installed in the usual library paths (/usr/local/lib, etc.). Besides, I thought that LD_LIBRARY_PATH only matters during compilation time and not during run time?
 
Old 01-19-2012, 02:49 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 1,456

Rep: Reputation: 445
First of all, you should find out which shared library exports these symbols (use ldd, nm and gdb on the machine where it works).
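One way to run that search is sketched below. It assumes GNU nm and awk are available; find_definer is a hypothetical helper name, and the directories to scan are whatever the working machine actually uses.

```shell
# find_definer SYMBOL DIR...
# Prints every shared object under the given directories whose dynamic
# symbol table DEFINES the symbol (rather than merely referencing it).
# "find_definer" is an illustrative name, not an existing tool.
find_definer() {
    sym=$1
    shift
    find "$@" -type f -name '*.so*' 2>/dev/null | while IFS= read -r lib; do
        # -D reads the dynamic symbol table (what the run-time loader uses);
        # --defined-only drops the "U" entries. A trailing @VERSION suffix,
        # if nm prints one, is stripped before comparing.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                             END { exit !found }'; then
            printf '%s\n' "$lib"
        fi
    done
}

# Example: find_definer _ZTV4Line lib/
```

Running it on both machines and comparing the results should show whether the defining library exists on one and not the other, or exists on both but is only being found on one.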
 
Old 01-19-2012, 10:48 AM   #5
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
All of the undefined symbols are exported by the libraries that are built by this application. None of the third party libraries are posing any problems for me. That's why this is so strange. The hacks I had to do to the makefile to get it to work required me to include many of our libraries in "common" to the product, whereas on the build machine this is not required. That's why I find this so odd, that the Makefile would have to have all of these changes made when the compiler/library software versions are not terribly far apart.
 
Old 01-19-2012, 11:08 AM   #6
dwhitney67
Senior Member
 
Registered: Jun 2006
Location: Maryland
Distribution: Kubuntu, Fedora, RHEL
Posts: 1,494

Rep: Reputation: 327
Quote:
Originally Posted by R00ts
Besides, I thought that LD_LIBRARY_PATH only matters during compilation time and not during run time?
Actually, it is the other way around. LD_LIBRARY_PATH is used at run time by the dynamic loader; at link time the search path comes from -L options instead. If the library that your application depends on is not in the ldconfig cache, then you specify the path to this library using LD_LIBRARY_PATH.
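To check what the loader is actually being told to search, the variable can be split and each entry tested. This is a minimal sketch; list_ld_path is a hypothetical helper name.

```shell
# list_ld_path: print each LD_LIBRARY_PATH entry on its own line and flag
# directories that do not exist. Runs in a subshell so the caller's IFS
# and globbing settings are left untouched. Hypothetical helper name.
list_ld_path() (
    IFS=:
    set -f    # no filename expansion while splitting on ':'
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -z "$dir" ]; then
            # glibc treats an empty entry as the current directory
            echo "empty (current directory)"
        elif [ -d "$dir" ]; then
            echo "ok $dir"
        else
            echo "missing $dir"
        fi
    done
)
```

On glibc systems, running the program with LD_DEBUG=libs set in its environment makes the loader print the directories it tries for each library, which shows directly whether a given LD_LIBRARY_PATH entry is being consulted.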

Would it be possible to see the Makefile that you are relying on to build your application?

Last edited by dwhitney67; 01-19-2012 at 11:10 AM.
 
Old 01-19-2012, 11:37 AM   #7
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
Ah, thanks for clarifying that for me. No, unfortunately I don't think I can post the Makefile since it's proprietary (owned by my employer). It also happens to be about 4000 lines long, which isn't much fun, I assure you. It's pretty much custom built and doesn't use any of the automake/autoconf tools, and it's just a mess because we don't have a dedicated build engineer and things have been continually added to it over the past decade.
 
Old 01-19-2012, 04:38 PM   #8
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 1,456

Rep: Reputation: 445
Let's clarify: is any of these the problem?

1. On machine 'B' your libsomething.so does export symbol _ZTV4Line, but on machine 'P' it doesn't.
2. Or on machine 'B' the executable does refer to your libsomething.so, but on machine 'P' it doesn't.
3. Or on machine 'B' loader ld.so does find your libsomething.so, but on machine 'P' it doesn't.

PS: Did you run ldconfig after installing a new version of libsomething.so?

Last edited by NevemTeve; 01-20-2012 at 02:29 AM.
 
Old 01-20-2012, 03:09 PM   #9
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
1. No.
2. No.
3. No.

At least I think that's the case. I don't see ldconfig being called anywhere in these makefiles or anywhere else that I can tell. The libraries are not placed in a standard directory like /usr/lib, but are copied over to a workspace directory in the user's home. Should ldconfig be called? I've tried setting the LD_LIBRARY_PATH to include all of these custom library install locations...do I need to also call ldconfig after I do that, or call ldconfig at all? I'll have to read through the man page of ldconfig.

Sorry for all the questions. At my previous jobs we had build engineers who took care of all of these issues but now I'm working at a small company with a hacked up set of makefiles and I'm pretty much on my own here.


I've been trying to focus on just one of the undefined symbols that is causing me problems, and to work my way backward to figure out where the symbol is created and where it should be found. Maybe if I can get some help figuring out the problem with just this single symbol, I will be able to find a similar solution for the rest. So here's what I've found out.

The symbol I'm examining is called _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context. When I try to run the application, that symbol is undefined in the shared library SimDAQBoard.so. Looking through the code, I found that the symbol is defined in the file src/DAQBoard.cpp. There is a static library, lib/libcommon.a, where I think the symbol is defined.

Code:
$ nm ~/dev/src/common/lib/libcommon.a | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
0000000000003026 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
                 U _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
It's a bit odd to me that the symbol is listed there twice, once as undefined. But on the working build machine, libcommon.a gives the same output as above.

Now on to the Makefile. Like I said, it's a big file, but here are the parts relevant to this problem, along with line numbers:

Code:
# [Line 281] : Building the static library. DAQBoard.o is listed in $OBJS.
lib/libcommon.a: $(OBJS) $(CORBAOBJS)
	$(AR) rs $@ $(OBJS) $(CORBAOBJS)

# [Line 293] : Compiling the object file for DAQBoard.cpp, where the symbol is defined
src/.obj/DAQBoard.o: src/DAQBoard.cpp include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
	$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@

# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
	$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@

# [Line 396] : Building the shared library. Note that the static library lib/libcommon.a is included as one of the sources here.
lib/drivers/SimDAQBoard.so: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h lib/libcommon.a
	$(CPP) $(CFLAGS) $(INCLUDES) -fPIC -sharewd $< -o
So, from what I can tell, the symbol of interest is compiled in the object file DAQBoard.o, which is used to build the static library libcommon.a. Then libcommon.a is used as a source when building the shared library SimDAQBoard.so, where the undefined symbol error occurs. So it seems to me that the undefined symbol *should* be in SimDAQBoard.so and that this shared library should not need to search for this symbol elsewhere.


What I find most suspicious is that the symbol appears twice in libcommon.a, once as undefined. Is this something I should be concerned about, or not? And is the order of operations in the makefile correct, or do I need to fix the order of things?
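Two points about the rules quoted above may be relevant. First, nm on an archive lists each member object separately, so seeing the symbol once as T (in the member that defines it) and once as U (in a member that calls it) is expected. Second, in GNU make `$<` expands to the first prerequisite only. In the rule at line 396, then, only src/SimDAQBoard.cpp would reach the compiler, and lib/libcommon.a, although listed as a prerequisite, would never appear on the command line. A minimal sketch showing the difference between `$<` and `$^`, using throwaway placeholder files (demo.so, main.cpp) rather than the real build:

```shell
# demo_auto_vars: write a throwaway Makefile and print what $< and $^
# expand to for a rule with two prerequisites. All file names here are
# placeholders for illustration; requires make on PATH.
demo_auto_vars() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch main.cpp libcommon.a
    # printf turns \t into the literal tab make requires before a recipe.
    printf 'demo.so: main.cpp libcommon.a\n\t@echo "first: $<"\n\t@echo "all: $^"\n' > Makefile
    make -s demo.so
    cd / && rm -rf "$dir"
)
```

If the real Makefile behaves this way, naming lib/libcommon.a on the command line after `$<` would let the linker pull DAQBoard.o into SimDAQBoard.so. Using `$^` unfiltered would not be a good substitute here, because the prerequisite list also contains header files. Whether linking the archive into each driver is the intended design, rather than resolving the symbol from the main binary or another shared library at load time, is a separate question.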



Thanks for all your help so far, everyone. Even getting me to ask myself the right questions is a big help, since I don't have much experience with the build process.
 
Old 01-23-2012, 12:05 AM   #10
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 1,456

Rep: Reputation: 445
Code:
$(CPP) $(CFLAGS) $(INCLUDES) -fPIC -sharewd $< -o
Is this line real? I think it should be '-shared'

What does 'nm -D SimDAQBoard.so | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context' say?

Note: When you run an executable, shared libraries will be resolved from the standard places (see /etc/ld.so.conf), the places hardcoded into the executable (try readelf -d exename | grep RPATH), and places from LD_LIBRARY_PATH.
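A slightly broader version of that readelf check is sketched below, assuming GNU readelf; show_dynamic is a hypothetical helper name. Newer linkers may record the search path as RUNPATH rather than RPATH, which a plain grep for RPATH would not match, and the NEEDED entries show which libraries the loader will pull in automatically for a given file.

```shell
# show_dynamic FILE: print the dynamic-section entries that steer
# run-time library lookup for an executable or shared object.
# Hypothetical helper name; relies on readelf from GNU binutils.
show_dynamic() {
    readelf -d "$1" | grep -E '\((NEEDED|SONAME|RPATH|RUNPATH)\)'
}

# Example: show_dynamic lib/tcs/drivers/i400/TCStcl.so
```

If a library has undefined symbols but no NEEDED entry for whatever defines them, the loader will not fetch the definer on its own; the symbols must already be globally visible in the process when that library is loaded.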
 
Old 01-23-2012, 10:54 AM   #11
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
The w is a typo I accidentally inserted into the makefile when I was copying lines out of it. You're correct, it is 'shared'.

The result of your command shows the symbol as undefined.
Code:
$ nm -D SimDAQBoard.so | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
                 U _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
The readelf command result returned nothing that matched RPATH.


I'm not sure if this is relevant, but I should mention that this software uses CORBA as there is both C++ and Java code, and Tcl is a scripting language that controls the operation of this program. In fact at the beginning of the Tcl script I'm using it loads two shared libraries into the application.

Code:
load lib/common/drivers/Commontcl.so
load lib/tcs/drivers/i400/TCStcl.so
Something else is loading the SimDAQBoard.so (along with numerous other libraries), although I don't know where that is happening. I know it is though because if I explicitly load the library with a load command like the above, I get a message that it could not be loaded because of an undefined symbol.

Code:
couldn't load file "lib/common/drivers/SimDAQBoard.so": lib/common/drivers/SimDAQBoard.so: undefined symbol: _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
And when I don't explicitly load it in the Tcl script, I get something similar but different.

Code:
2012-01-23 16:29:45 ERROR Context - DAQM 40005: Unable to open driver : lib/common/drivers/SimDAQBoard.so: undefined symbol: _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
2012-01-23 16:29:45 ERROR Context - DAQM 41001: Unable to instantiate

Like I mentioned in the first post, I had previously gotten rid of these undefined symbol errors by hacking up the makefile with a bunch of statements, most of which simply added the static library (libcommon.a) to several of the libraries that were missing symbols. So I don't think the problem is that any particular library is not being found or loaded. I think the problem is strictly a compilation/linking one, due to a messy, unorganized makefile that people have been adding crap to and modifying over the last six years or so.
 
Old 01-23-2012, 01:32 PM   #12
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
Well, I went back to my previous hack and confirmed that, once again, the following change to the makefile removes the undefined symbol error:

Code:
# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
	$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@ -L$(MII_HOME)/src/common/lib -lcommon

# Adding "-L$(MII_HOME)/src/common/lib -lcommon" to the end of this line removes the undefined symbol error.
And now the results of nm on the two relevant libraries. (I should note that some other makefile changes were made, specifically removing one hack that seemed to be the cause of the double symbol listing found previously in libcommon.a.)

Code:
$ nm lib/common/drivers/SimDAQBoard.so | grep useMotorController
0000000000018212 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
000000000001a660 r _ZZN8DAQBoard18useMotorControllerEP15MotorControllerP7ContextE8__func__

$ nm ~/dev/src/common/lib/libcommon.a | grep useMotorController
0000000000003026 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
00000000000000b0 r _ZZN8DAQBoard18useMotorControllerEP15MotorControllerP7ContextE8__func__

Still, my office mate thinks this is not an acceptable solution (and I agree). We just need to design a better makefile I feel.
 
Old 01-23-2012, 02:16 PM   #13
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
Okay, diving into this further, I think we may be on to something. The static library libcommon.a is linked directly into the application's binary. However, for whatever reason, this particular symbol is not found in the binary. In fact, many of the symbols from libcommon.a are not found in there, while others are. So it seems we have a binary that is not importing all of the symbols from a static library that is used to build it. The question now becomes: why are those symbols not going into the binary? If we can figure this out, we may have found an answer.
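This matches how static archives normally behave: the linker copies in only those archive members (object files) that resolve a symbol still undefined at that point in the link, so members that nothing in the binary refers to are left out. A minimal sketch with throwaway C files (every name is hypothetical), assuming cc, ar and nm on a Linux system:

```shell
# demo_archive_pull: build a two-member static archive, link a program
# that calls only one of the functions, and report which symbols ended
# up in the binary. All file and function names are made up.
demo_archive_pull() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'int used_fn(void) { return 1; }\n'   > used.c
    printf 'int unused_fn(void) { return 2; }\n' > unused.c
    printf 'int used_fn(void);\nint main(void) { return used_fn() - 1; }\n' > main.c
    cc -c used.c unused.c main.c || exit 1
    ar rcs libdemo.a used.o unused.o || exit 1
    # Only used.o is extracted, because only used_fn is undefined in main.o.
    cc main.o libdemo.a -o app || exit 1
    for sym in used_fn unused_fn; do
        # -w keeps "used_fn" from matching inside "unused_fn"
        if nm app | grep -qw "$sym"; then
            echo "$sym: present"
        else
            echo "$sym: absent"
        fi
    done
    cd / && rm -rf "$dir"
)
```

With GNU ld, `-Wl,--whole-archive libcommon.a -Wl,--no-whole-archive` forces every member into the output, and `-rdynamic` asks the linker to put the executable's global symbols into its dynamic symbol table so that libraries loaded later can resolve against them. Whether either applies to this build is an assumption; differing linker defaults between the two distributions are one plausible source of the discrepancy.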
 
Old 01-23-2012, 02:28 PM   #14
dwhitney67
Senior Member
 
Registered: Jun 2006
Location: Maryland
Distribution: Kubuntu, Fedora, RHEL
Posts: 1,494

Rep: Reputation: 327
Quote:
Originally Posted by R00ts
We just need to design a better makefile I feel.
Can you please explain these Makefile statements (and I don't mean the -sharewd issue, which was covered earlier)?
Code:
# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
	$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@

# [Line 396] : Building the shared library. Note that the static library lib/libcommon.a is included as one of the sources here.
lib/drivers/SimDAQBoard.so: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h lib/libcommon.a
	$(CPP) $(CFLAGS) $(INCLUDES) -fPIC -sharewd $< -o
With the first statement, it seems the intent is to build an object file; however, shouldn't -fPIC be used for object files that are destined to be included in a shared-object library?

And AFAIK, the -fPIC is not necessary when building the actual shared-object library.

Which leads me to inquire about the second statement... there seem to be additional typos or missing characters, other than the "sharewd". Shouldn't a $@ follow the -o option? And if you are indeed building the shared-object library with this statement, why are CFLAGS and INCLUDES specified?

Last edited by dwhitney67; 01-23-2012 at 02:29 PM.
 
Old 01-24-2012, 12:26 PM   #15
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 545

Original Poster
Rep: Reputation: 30
I checked the Makefile and $(CFLAGS) actually includes -fPIC. So yes, it's included twice in some lines. I warned everyone this was a messy build, didn't I?


As for your questions on the second statement, I'm not sure why there is no $@ to be honest. I haven't had to mess around with Makefiles like this since my undergrad days over 8 years ago. My guess as to why CFLAGS and INCLUDES are specified on that line is that the person who wrote it was simply lazy.



Well, my office mate and I spent nearly the entire day yesterday trying to figure this out and made some progress, but not much. It seems that another library, CommonTcl.so, has those symbols defined, and if you recall, that is one of the two libraries that are explicitly loaded by the Tcl script that manages the application's startup. So the question has now become: "If CommonTcl.so has the symbol defined, and SimDAQBoard.so has the symbol undefined, why is SimDAQBoard.so not finding the symbol when it is loaded into the application?" (SimDAQBoard.so is loaded some time after CommonTcl.so is loaded.) We think that maybe the linker is behaving more strictly and/or minimally than it does on our older machines (which all work), and thus is not pulling in symbols that it does not detect a need for. From what we can tell, this symbol is not actually used anywhere in the code, although the scripting interface in the application can be used to invoke it, and does so successfully on the older systems.
 
  

