LinuxQuestions.org


Operator 08-28-2015 08:31 AM

Error while loading shared libraries
 
I can't understand what's wrong... I have four RHEL 5 servers I'm administering. I went to run tracepath on Server 1 and it gave me the error

Code:

tracepath: error while loading shared libraries: libresolv.so.2: cannot open shared object file: No such file or directory
On Server 2, tracepath works fine. On both servers, libresolv.so.2 is located in /lib and is a symlink to /lib/libresolv-2.5.so.

Both .so files have the same owners and privileges. Both have the same md5 hash. Both servers' /etc/ld.so.conf files are identical. Both have identical running processes. Why can't Server 1 run tracepath / load libresolv.so.2?

rigor 08-28-2015 08:41 AM

Do all servers:
1) have the same hardware, CPU, etc.? E.g., are some 64-bit while others are 32-bit, or is there some other difference?
2) run the same point release of RHEL 5?
3) have an identical collection of packages installed?

Operator 08-28-2015 08:49 AM

All servers are running as virtual machines. They run on a cluster of 4 physical servers through vSphere.
They are all running the same version of RHEL 5.
I assume they have an identical collection of installed packages... how can I check?

Operator 08-28-2015 08:50 AM

Here's what's interesting relating to the servers' memory usage:
Server 1: 2774MB
Server 2: 13419MB
Server 3: 16409MB
Server 4: 10866MB

I suppose that doesn't mean too much though since Server 1 did just get restarted this morning (kernel panic).

Operator 08-28-2015 09:35 AM

No hope?

Operator 08-28-2015 10:37 AM

Added information: On Server 1, the result of

Code:

type tracepath
is

Code:

tracepath is /bin/tracepath
while the result on Server 2 is

Code:

tracepath is hashed (/bin/tracepath)
tracepath has the same owner and permissions on both servers and has the same md5 hash.

rigor 08-28-2015 02:40 PM

Pardon me if I go into too much detail, I don't know of what information you are aware.

My general suspicion is one of the following: a package is missing, a package deployment wasn't completely successful, the virtual machines have different architectures, or there is some esoteric problem with access permissions. All too often, when I try to run a program and get a message like the one you got, it doesn't say which form of the library the loader tried to load. E.g., if I'm running a 64-bit operating system that can also run 32-bit programs, was the attempt made to load a 64-bit library or a 32-bit one? If, as is often the case, most of what I'm running is 64-bit, I would first expect the library to be 64-bit. But if some programs are distributed only as 32-bit, a 32-bit version of the library will be needed as well. This command:
Code:

uname -a
run on each virtual machine should show you the architecture each machine is configured for. Since the servers are virtual machines, the question becomes whether they are supposed to be configured the same way: the same amount of memory allocated to each, the same number of CPUs or CPU usage limits, the same architecture, and the same collection of packages. If they are supposed to be conceptually identical, and if you are using standard Red Hat package management, then on Server 1, for example:
Code:

rpm -qa | sort  >  server1_package_list.txt
will get you a useful list of packages. You can get such a list from each server, gather all the lists together on a single machine, then:
Code:

diff  server1_package_list.txt  server2_package_list.txt
diff  server2_package_list.txt  server3_package_list.txt
diff  server3_package_list.txt  server4_package_list.txt

should not produce any differences if the machines are the same architecture and are supposed to have the same packages. I've worked at places where load was balanced across multiple machines that were supposed to be conceptually identical and provide the same services to customers; somehow an update failed to deploy to one machine and the failure wasn't caught. Customers would complain, but tests run through the load balancer would seem to succeed, because the balancer happened to connect to a machine that had the correct collection of packages. I used this method to check packages and so found the deployment failure; once a test was run directly against the one machine with the wrong collection of packages, the error was isolated.
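
A small sketch of the comparison described above, with one addition that is an assumption on my part: as far as I know, the rpm shipped with RHEL 5 prints name-version-release without the architecture, so a 32-bit and a 64-bit build of the same package look identical in a plain `rpm -qa` list. The query format below appends `%{ARCH}`. The function names and file names are illustrative only.

```shell
# List installed packages with their architecture, one per line, sorted.
list_packages() {
    rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | LC_ALL=C sort
}

# Show lines that appear in only one of two sorted package lists.
# "<" marks lines only in the first file, ">" lines only in the second.
compare_package_lists() {
    LC_ALL=C comm -23 "$1" "$2" | sed 's/^/< /'
    LC_ALL=C comm -13 "$1" "$2" | sed 's/^/> /'
}
```

Run `list_packages > server1_package_list.txt` on each server, copy the lists to one machine, then run `compare_package_lists server1_package_list.txt server2_package_list.txt`. No output means the two lists match.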

Additional info: Oops! Sorry, I missed that the md5 sum was the same on the two servers. If you run these two commands:
Code:

file /bin/tracepath
ldd /bin/tracepath

on Server 1 and Server 2, are the results on both Server 1 and Server 2 the same?
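
To go one step beyond comparing the output by eye, here is a rough sketch that pulls the resolved paths out of `ldd` and checks that each one is actually readable. The function names are made up for illustration. Note that if `file` reports a 64-bit binary, the paths that come back will usually be under /lib64 rather than /lib, so those are the copies worth checking.

```shell
# Read `ldd` output on stdin and print the resolved library paths,
# one per line (the part after "=>", or a bare absolute path).
ldd_paths() {
    awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }  # libfoo.so => /path (0x...)
        $1 ~ /^\//               { print $1 }        # /path/to/loader (0x...)
    '
}

# Report libraries that ldd could not resolve or that are not readable.
check_libs() {
    ldd "$1" | grep 'not found'
    ldd "$1" | ldd_paths | while read -r lib; do
        [ -r "$lib" ] || echo "unreadable: $lib"
    done
}
```

`check_libs /bin/tracepath` prints nothing when every library resolves and is readable by the current user.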

HTH.

Operator 08-31-2015 06:09 AM

Lots to process! From the top...

Quote:

Originally Posted by rigor (Post 5412600)
Pardon me if I go into too much detail...

No such thing. I was VERY pleasantly surprised to come in today and see a reply :-)

Okay, the result from uname -a on Server 1:
Code:

Linux [address] 2.6.18-404.el5 #1 SMP Sat Mar 7 04:14:13 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
Server 2:
Code:

Linux [address] 2.6.18-406.el5 #1 SMP Fri May 1 10:37:57 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
^^Slight differences there...

file /bin/tracepath and ldd /bin/tracepath on Server 1:
Code:

/bin/tracepath: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, stripped
linux-vdso.so.1 =>  (0x00007fff207fd000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00002b4889028000)
libc.so.6 => /lib64/libc.so.6 (0x00002b488923d000)
/lib64/ld-linux-x86-64.so.2 (0x0000003ac1000000)

Server 2:
Code:

/bin/tracepath: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, stripped
linux-vdso.so.1 =>  (0x00007fff81b62000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00002abf1656c000)
libc.so.6 => /lib64/libc.so.6 (0x00002abf16781000)
/lib64/ld-linux-x86-64.so.2 (0x0000003835e00000)

Also worth noting, I ran
Code:

setenforce 0
shutdown -r now
setenforce 1

and then the result of type tracepath was the same as on Server 2, rather than what it was in post 6 of this thread.

pan64 08-31-2015 08:39 AM

so what does it mean? Problem solved, or just hashed was not printed or ???
I would try strace tracepath if nothing else helped (on both hosts)
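
A hedged sketch of how the strace output could be narrowed down once strace is available. The helper name is invented for illustration, and `localhost` is just an arbitrary destination for tracepath.

```shell
# Keep only failed system calls that mention a given name.
# Reads strace output on stdin.
failed_lookups() {
    grep -F -- "$1" | grep -E ' = -1 E[A-Z]+'
}

# Example (needs strace installed):
#   strace -f -e trace=file tracepath localhost 2>&1 | failed_lookups libresolv
```

The errno on the failing line (for instance ENOENT versus EACCES) shows whether the file is really missing or only inaccessible, which the loader's "No such file or directory" message alone does not settle.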

Operator 08-31-2015 08:43 AM

Quote:

Originally Posted by pan64 (Post 5413649)
so what does it mean? Problem solved, or just hashed was not printed or ???
I would try strace tracepath if nothing else helped (on both hosts)

No, the problem was not solved. I'm just being a functional member of the forum and including information the helpful folks might need.

I tried strace. Bash said the command was not found.

pan64 08-31-2015 12:18 PM

It would be nice to see exactly what you tried and exactly what the error message was. Probably you hid some information....

Operator 08-31-2015 12:19 PM

Wow, that's kind of an attack, isn't it? No, I didn't hide a single thing from you except for the server's address. If you want something that's not here, ask me for it.

Operator 08-31-2015 12:24 PM

The ONLY thing that has changed since the start of this thread is that instead of Server 1 saying
Code:

tracepath is /bin/tracepath
now they both say
Code:

tracepath is hashed (/bin/tracepath)
.

tracepath still gives me the same result on Server 1 - cannot load shared library. Same as in post 1 of this thread.

pan64 08-31-2015 12:34 PM

"hashed" only means the shell has already found the command and remembered its location. For this problem it is meaningless.
You are administering that host, so you can probably install strace on it.
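
A quick way to see the hashing behaviour for yourself, using `ls` as a stand-in for any external command. The function name is made up; it runs a child bash so the current shell's hash table is left alone.

```shell
# Show how `type` output changes once bash has remembered a command.
demo_hash() {
    bash -c '
        hash -r            # forget all remembered command locations
        type ls            # reports the path found by searching PATH
        ls / >/dev/null    # running the command makes bash remember it
        type ls            # now reports "ls is hashed (...)"
    '
}
```

The first line printed is the plain path form and the second is the "hashed" form, for the same binary in both cases.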

What does locate libresolv return (on both hosts)?
How is LD_LIBRARY_PATH set?
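
A minimal sketch for answering the second question in a form that is easy to paste into the thread. The function name is invented for illustration.

```shell
# Print each LD_LIBRARY_PATH entry on its own line,
# or a note if the variable is unset or empty.
show_ld_library_path() {
    if [ -z "${LD_LIBRARY_PATH:-}" ]; then
        echo "LD_LIBRARY_PATH is unset or empty"
    else
        printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n'
    fi
}

# Related checks worth running on both hosts:
#   ldconfig -p | grep libresolv    # what the loader cache knows about it
#   ls -lZ /lib64/libresolv*        # owner, mode and SELinux context
```

Run `show_ld_library_path` in the same shell and as the same user that gets the error, since the variable can differ between login sessions.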

Operator 08-31-2015 12:37 PM

I'm one of the admins. I'll have to get permission from the lead to install it.
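
While waiting on that, glibc's dynamic loader can report its own library search without anything extra being installed, through the `LD_DEBUG` environment variable. This is only a sketch: the helper name is invented, `localhost` is an arbitrary destination, and the loader ignores `LD_DEBUG` for set-user-ID programs.

```shell
# Keep only the loader's debug lines that mention one library.
# Reads LD_DEBUG=libs output on stdin.
loader_trace_for() {
    grep -F -- "$1"
}

# Example:
#   LD_DEBUG=libs tracepath localhost 2>&1 | loader_trace_for libresolv
```

The "trying file=" lines list each path the loader attempted, so they show which copy of libresolv.so.2 Server 1 is failing to open.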


All times are GMT -5.