Library with undefined symbols fails on one machine, but works fine with another?
My question is more of an educational request rather than a "please help me fix my problem" one. What I have is a binary that is built along with several shared libraries (.so). I can build and run the application fine on our build server, but when I try to do the same on my local machine I get the following error:
I ran ldd on the same library on the build machine, and received similar results. Next I used nm to list the symbols in the file, and sure enough the _ZTV4Line symbol was undefined (along with many other undefined symbols).
Code:
$ nm lib/tcs/drivers/i400/TCStcl.so
.....
U _ZTV12Serializable
U _ZTV24TAO_Abstract_ServantBase
U _ZTV4Line
U _ZTV5Point
U _ZTVN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
.....
I then did the same thing on the build machine and, to my surprise, the symbol was undefined there as well. But how could this be? And why would an undefined symbol be a show-stopper run-time error on one system but not another?
The two systems are using slightly different versions of some of the libraries they use (such as Boost and SWIG), and the g++ versions are 4.6.2 for my local machine and 4.3.2 for the build machine. Last month I did manage to get the application to run on my local system, but it required adding a lot of hacks to the makefile to remove all these undefined symbol errors. Most of the changes were adding a library flag to a library build target.
I'm trying to really understand the core of this problem right now. So if you have any thoughts on this that you'd like to share, or links to reference reading that can enlighten me about this issue, I'd greatly appreciate it. Thanks.
It is different, but that's because the build machine installs our third-party libraries in a special location whereas on my local machine they are all installed in the usual library paths (/usr/local/lib, etc). Besides, I thought that LD_LIBRARY_PATH only matters during compilation time and not during run time?
All of the undefined symbols are exported by the libraries that are built by this application. None of the third party libraries are posing any problems for me. That's why this is so strange. The hacks I had to do to the makefile to get it to work required me to include many of our libraries in "common" to the product, whereas on the build machine this is not required. That's why I find this so odd, that the Makefile would have to have all of these changes made when the compiler/library software versions are not terribly far apart.
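One way to test the claim that every undefined symbol is exported by one of the application's own libraries is to compare nm listings directly. What follows is only a minimal sketch: the function name missing_symbols, the listing file names and path/to/provider.so are made up for illustration, and it assumes the default three-column output of GNU nm. Symbol version suffixes (such as @@CXXABI_1.3) are compared literally, so versioned symbols may need extra care.

```shell
# Usage (file names are placeholders):
#   nm -D lib/tcs/drivers/i400/TCStcl.so > needed.txt
#   nm -D path/to/provider.so            > provided.txt
#   missing_symbols needed.txt provided.txt
#
# Prints every symbol that is undefined (type U) in the first listing
# and not defined in the second listing.
missing_symbols() {
    awk '
        # First file read by awk is the provider listing: remember what it defines.
        FILENAME == ARGV[1] { if (NF == 3 && $2 != "U") defined[$3] = 1; next }
        # Second file is the listing with undefined symbols.
        NF == 2 && $1 == "U" && !($2 in defined) { print $2 }
    ' "$2" "$1"
}
```

An empty result would mean the provider covers everything the first library needs. It does not say whether the loader will actually look in that provider at run time.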
Besides, I thought that LD_LIBRARY_PATH only matters during compilation time and not during run time?
Actually, it is the other way around. LD_LIBRARY_PATH is used during run time. If the library that your application depends upon is not included within the ldconfig cache, then you specify the path to this library using LD_LIBRARY_PATH.
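To see which directory on LD_LIBRARY_PATH would supply a given library at run time, something like the following can help. It is a rough sketch only: find_in_ld_library_path is a made-up helper, and it imitates just the LD_LIBRARY_PATH step of the loader's search. It ignores RPATH/RUNPATH entries, the ldconfig cache and the default directories.

```shell
# Print the first path on LD_LIBRARY_PATH that contains the named file.
# $1: library file name, e.g. libsomething.so
find_in_ld_library_path() (
    # The body runs in a subshell so the IFS and globbing changes stay local.
    IFS=:
    set -f
    for dir in $LD_LIBRARY_PATH; do
        # An empty entry is treated here as the current directory.
        [ -n "$dir" ] || dir=.
        if [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    return 1
)
```

For example, find_in_ld_library_path libsomething.so prints the matching path, or returns non-zero if no directory on LD_LIBRARY_PATH has the file.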
Would it be possible to see the Makefile that you are relying on to build your application?
Last edited by dwhitney67; 01-19-2012 at 11:10 AM.
Ah, thanks for clarifying that for me. No, unfortunately I don't think I can post the Makefile since it's proprietary (owned by my employer). And it also happens to be about 4000 lines long, which isn't much fun I assure you. It's pretty much custom built and doesn't use any of the automake/autoconf tools, and it's just a mess because we don't have a dedicated build engineer and things have been continually added to it over the past decade.
1. On machine 'B' your libsomething.so does export symbol _ZTV4Line, but on machine 'P' it doesn't.
2. Or on machine 'B' the executable does refer to your libsomething.so, but on machine 'P' it doesn't.
3. Or on machine 'B' loader ld.so does find your libsomething.so, but on machine 'P' it doesn't.
PS: Did you run ldconfig after installing a new version of libsomething.so?
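Each of the three possibilities above can be checked from the command line. The commands in the comments are standard binutils/glibc tools; ./executable is a placeholder name, and unresolved_deps is a small made-up helper that assumes the usual "name => not found" lines in ldd output.

```shell
# Possibility 1: does the library export the symbol?
#   nm -D --defined-only libsomething.so | grep _ZTV4Line
# Possibility 2: does the executable list the library as a dependency?
#   readelf -d ./executable | grep NEEDED
# Possibility 3: can the loader find every dependency?
#   ldd ./executable | unresolved_deps

# Reads ldd output on stdin and prints the dependencies the loader
# could not locate.
unresolved_deps() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

Note that ldd only reports libraries named in NEEDED entries. A library that is opened later by the program itself will not show up there.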
At least I think that's the case. I don't see ldconfig being called anywhere in these makefiles or anywhere else that I can tell. The libraries are not placed in a standard directory like /usr/lib, but are copied over to a workspace directory in the user's home. Should ldconfig be called? I've tried setting the LD_LIBRARY_PATH to include all of these custom library install locations...do I need to also call ldconfig after I do that, or call ldconfig at all? I'll have to read through the man page of ldconfig.
Sorry for all the questions. At my previous jobs we had build engineers who took care of all of these issues but now I'm working at a small company with a hacked up set of makefiles and I'm pretty much on my own here.
I've been trying to focus on just one undefined symbol at the moment that is causing me problems and trying to work my way backward to figure out where the symbol is created, and where it should be found. Maybe if I can get some help to figure out the problem with just this single symbol, I will be able to figure out a similar solution to the rest. So here's what I've found out.
The symbol I'm examining is called _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context. When I try to run the application, that symbol is undefined in the shared library SimDAQBoard.so. Looking through the code, I found that the symbol is defined in the file src/DAQBoard.cpp. There is a static library, lib/libcommon.a, where I think the symbol is defined.
Code:
$ nm ~/dev/src/common/lib/libcommon.a | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
0000000000003026 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
U _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
It's a bit odd to me that the symbol is listed there twice, once as undefined. But on the working build machine libcommon.a shows the same output as above.
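Seeing a symbol twice in an archive is not alarming by itself. nm prints the symbol table of every object file stored in a .a, so one member can define the symbol (T) while another member merely references it (U). nm -A (--print-file-name) shows which member each line belongs to. The sketch below does the same on plain nm output. The helper name symbol_by_member is invented, and which members are involved in libcommon.a is not known from the thread.

```shell
# Reads `nm libfoo.a` output on stdin and prints "<member> <type>" for
# every member that mentions the given symbol.
# $1: exact (mangled) symbol name
symbol_by_member() {
    awk -v sym="$1" '
        # nm introduces each archive member with a "name.o:" header line.
        NF == 1 && /:$/ { member = substr($1, 1, length($1) - 1); next }
        # Symbol lines end with the symbol name; the type letter precedes it.
        $NF == sym      { print member, $(NF - 1) }
    '
}
```

It would be used as: nm ~/dev/src/common/lib/libcommon.a | symbol_by_member _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context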
Now on to the Makefile. Like I said it's a big file, but here are the relevant parts to this problem, along with line numbers:
Code:
# [Line 281] : Building the static library. DAQBoard.o is listed in $OBJS.
lib/libcommon.a: $(OBJS) $(CORBAOBJS)
$(AR) rs $@ $(OBJS) $(CORBAOBJS)
# [Line 293] : Compiling the object file for DAQBoard.cpp, where the symbol is created from
src/.obj/DAQBoard.o: src/DAQBoard.cpp include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@
# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@
# [Line 396] : Building the shared library. Note that the static library lib/libcommon.a is included as one of the sources here.
lib/drivers/SimDAQBoard.so: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h lib/libcommon.a
$(CPP) $(CFLAGS) $(INCLUDES) -fPIC -sharewd $< -o
So, from what I can tell, the symbol of interest is compiled in the object file DAQBoard.o, which is used to build the static library libcommon.a. Then libcommon.a is used as a source when building the shared library SimDAQBoard.so, where the undefined symbol error occurs. So it seems to me that the undefined symbol *should* be in SimDAQBoard.so and that this shared library should not need to search for this symbol elsewhere.
My greatest suspicion is that the symbol is found twice in libcommon.a, once as being undefined. Is this something I should be concerned about, or not? And is the order of operations in the makefile correct, or do I need to fix the order of things?
Thanks for all your help so far everyone. Even getting me to ask myself the right questions is a big help since I'm not well experienced in the build process.
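One detail in the line-396 rule as quoted is worth checking. In make, $< expands to the first prerequisite only, while $^ (a GNU make feature) expands to all of them. The throwaway experiment below shows the difference with a rule of the same shape. Everything in it (show_auto_vars, demo.cpp, demo.h, demo.so) is a made-up stand-in, and it assumes GNU make is installed.

```shell
# Creates a scratch directory with a one-rule makefile and prints what
# $< and $^ expand to for that rule.
show_auto_vars() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # Same shape as the quoted rule: a source file, a header and a
    # static library listed as prerequisites of a shared library.
    touch demo.cpp demo.h libcommon.a
    printf 'demo.so: demo.cpp demo.h libcommon.a\n\t@echo "first: $<"\n\t@echo "all: $^"\n' > Makefile
    make -s demo.so
    status=$?
    cd / && rm -rf "$dir"
    exit "$status"
)
```

With GNU make this prints "first: demo.cpp" and then "all: demo.cpp demo.h libcommon.a". If the real rule looks like the quoted one, listing lib/libcommon.a as a prerequisite only affects when the target is rebuilt. The archive is never handed to the linker, which would leave DAQBoard's symbols undefined in SimDAQBoard.so. Naming the archive on the command line, for example with $(filter %.cpp %.a,$^) in place of $<, is one possible fix. Since the quoted lines contain transcription typos, whether the real makefile has this shape is an assumption.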
What does 'nm -D SimDAQBoard.so | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context' say?
Note: When you run an executable, shared libraries will be resolved from the standard places (see /etc/ld.so.conf), the places hardcoded into the executable (try readelf -d exename | grep RPATH), and places from LD_LIBRARY_PATH.
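For reference, the glibc ld.so man page gives the search order as: DT_RPATH entries (only when the object has no DT_RUNPATH), then LD_LIBRARY_PATH, then DT_RUNPATH, then the ldconfig cache (/etc/ld.so.cache), then the default directories such as /lib and /usr/lib. Some toolchains write RUNPATH instead of RPATH, and a plain grep RPATH does not match that entry. The helper below picks up both. It is a sketch: rpath_entries is a made-up name, and it assumes the bracketed format that readelf -d normally prints.

```shell
# Reads `readelf -d <file>` output on stdin and prints lines such as
# "RPATH /some/dir" or "RUNPATH /some/dir:/other/dir".
rpath_entries() {
    sed -n \
        -e 's/.*(RPATH).*\[\(.*\)\].*/RPATH \1/p' \
        -e 's/.*(RUNPATH).*\[\(.*\)\].*/RUNPATH \1/p'
}
```

It would be used as: readelf -d exename | rpath_entries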
The w is a typo I accidentally inserted into the makefile when I was copying lines out of it. You're correct, it is 'shared'.
The result of your command shows the symbol as undefined.
Code:
$ nm -D SimDAQBoard.so | grep _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
U _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
The readelf command result returned nothing that matched RPATH.
I'm not sure if this is relevant, but I should mention that this software uses CORBA as there is both C++ and Java code, and Tcl is a scripting language that controls the operation of this program. In fact at the beginning of the Tcl script I'm using it loads two shared libraries into the application.
Something else is loading the SimDAQBoard.so (along with numerous other libraries), although I don't know where that is happening. I know it is though because if I explicitly load the library with a load command like the above, I get a message that it could not be loaded because of an undefined symbol.
And when I don't explicitly load it in the Tcl script, I get something similar but different.
Code:
2012-01-23 16:29:45 ERROR Context - DAQM 40005: Unable to open driver : lib/common/drivers/SimDAQBoard.so: undefined symbol: _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
2012-01-23 16:29:45 ERROR Context - DAQM 41001: Unable to instantiate
Like I mentioned in the first post, I had previously gotten rid of these undefined symbol errors by hacking up the makefile with a bunch of statements, most of which simply added the static library (libcommon.a) to several of these libraries that were missing symbols. So I don't think the problem is that any particular library is not being found or being loaded. I think the problem is strictly a compilation/linking one, due to a messy, unorganized makefile that people have been adding crap to and modifying over the last six years or so.
Well I resorted back to my previous hack and showed that once again, the following change to the makefile removes the undefined symbol error:
Code:
# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@ -L$(MII_HOME)/src/common/lib -lcommon
# Adding "-L$(MII_HOME)/src/common/lib -lcommon" to the end of this line removes the undefined symbol error.
And now the results of nm on the two relevant libraries. (I should note that some other makefile changes were made, specifically removing one hack that seemed to be the cause of the double symbol listing found previously in libcommon.a.)
Code:
$ nm lib/common/drivers/SimDAQBoard.so | grep useMotorController
0000000000018212 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
000000000001a660 r _ZZN8DAQBoard18useMotorControllerEP15MotorControllerP7ContextE8__func__
$ nm ~/dev/src/common/lib/libcommon.a | grep useMotorController
0000000000003026 T _ZN8DAQBoard18useMotorControllerEP15MotorControllerP7Context
00000000000000b0 r _ZZN8DAQBoard18useMotorControllerEP15MotorControllerP7ContextE8__func__
Still, my office mate thinks this is not an acceptable solution (and I agree). We just need to design a better makefile I feel.
Okay diving into this further, I think we may be on to something. The static library libcommon.a is linked in directly to the application's binary. However, it seems that for whatever reason, this particular symbol is not found in the binary. In fact, many of the symbols from libcommon.a are not found in there, while others are. So somehow, it seems we have a case of a binary that is not importing all of the symbols from a static library that is being used to build it. The question now becomes, why are those symbols not going into the binary? If we can figure this out, we may have found an answer.
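This matches how static libraries normally behave. The linker copies a member of a .a into the output only if that member defines a symbol that is still undefined at that point in the link, so object files nothing refers to are left out (GNU ld's --whole-archive option overrides this). Separately, a symbol that does end up in an executable is only visible to libraries loaded later at run time if it is in the executable's dynamic symbol table, which with GNU ld means linking with -rdynamic (--export-dynamic). The sketch below lists what is missing. The function name not_exported and bin/app are placeholders, not names from the thread.

```shell
# Usage (bin/app stands for the application binary):
#   nm lib/libcommon.a             > archive.txt
#   nm -D --defined-only bin/app   > exported.txt
#   not_exported archive.txt exported.txt
#
# Prints functions (type T) defined in the archive that are absent from
# the executable's dynamic symbol table.
not_exported() {
    awk '
        # First file read by awk is the dynamic symbol listing.
        FILENAME == ARGV[1] { if (NF == 3) exported[$3] = 1; next }
        # Second file is the archive listing.
        NF == 3 && $2 == "T" && !($3 in exported) { print $3 }
    ' "$2" "$1"
}
```

Any symbol it prints cannot be resolved from the executable by a driver library loaded at run time, whether or not the code was linked in.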
Can you please explain these Makefile statements (and I don't mean the -sharewd issue, which was covered earlier)?
Code:
# [Line 299] : Compiling the object file for SimDAQBoard, the same name of the library where the undefined symbol problem is found
src/.obj/SimDAQBoard.o: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h
$(CPP) $(CFLAGS) $(INCLUDES) -c $< -o $@
# [Line 396] : Building the shared library. Note that the static library lib/libcommon.a is included as one of the sources here.
lib/drivers/SimDAQBoard.so: src/SimDAQBoard.cpp include/SimDAQBoard.h include/DAQBoard.h include/Device.h include/Context.h include/Serializable.h lib/libcommon.a
$(CPP) $(CFLAGS) $(INCLUDES) -fPIC -sharewd $< -o
With the first statement, it seems the intent is to build an object file; however, shouldn't -fPIC be used for object files that are destined to be included in a shared-object library?
And AFAIK, -fPIC is not necessary on the command that builds the actual shared-object library, provided that command only links objects already compiled with -fPIC.
Which leads me to inquire about the second statement... there seem to be additional typos or missing characters, other than the "sharewd". Shouldn't a $@ follow the -o option? And if you are indeed building the shared-object library with this statement, why are CFLAGS and INCLUDES specified?
Last edited by dwhitney67; 01-23-2012 at 02:29 PM.
I checked the Makefile and $CFLAGS actually includes -fPIC. So yes, it's included twice in some lines. I warned everyone this was a messy build, didn't I?
As for your questions on the second statement, I'm not sure why there is no $@ to be honest. I haven't had to mess around with Makefiles like this since my undergrad days over 8 years ago. My guess as to why CFLAGS and INCLUDES are specified on that line is that the person who wrote it was simply lazy.
Well my office mate and I spent nearly the entire day yesterday trying to figure this out and made some progress, but not much. It seems that another library, CommonTcl.so, has those symbols defined, and if you recall that is one of the two libraries that are explicitly loaded by the Tcl script that manages the application's startup. So the question has now become "If CommonTcl.so has the symbol defined, and SimDAQBoard.so has the symbol undefined, why is SimDAQBoard.so not finding the symbol when it is loaded into the application?" (SimDAQBoard.so is loaded some time after CommonTcl.so is loaded). We think that maybe the linker is using stricter and/or more minimalist behaviour than it does on our older machines (which all work) and thus is not pulling in symbols that it does not detect a need for. From what we can tell, this symbol is not actually used anywhere in the code, although the scripting interface in the application can be used to invoke it, and does so successfully on the older systems.