Linux - Server
This forum is for the discussion of Linux software used in a server-related context.
If you run 'rpm -qf /lib/libdl.so.2' you'll see the file belongs to the glibc package, which you could run 'rpm --verify' on to start with. Before you reinstall glibc it would be good to look at other things. Are there, or have there been, any other logged errors leading up to this?
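The two commands mentioned above can be sketched as follows. This is a minimal sketch assuming an RPM-based system such as RHEL or CentOS; the function names check_libdl and digest_changed are made up for illustration. In 'rpm --verify' output, a 5 as the third character of the flags field means the file's digest differs from what the RPM database recorded.

```shell
# Hypothetical helper: run the two checks from the post. Nothing executes
# until the function is called, and it needs an RPM-based system.
check_libdl() {
    rpm -qf /lib/libdl.so.2    # which package owns the library
    rpm --verify glibc         # compare installed files with the RPM database
}

# Hypothetical helper: succeed when one line of `rpm --verify` output reports
# a changed file digest, i.e. a "5" as the third character of the flags field.
digest_changed() {
    case "$1" in
        ??5*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Usage (on the affected machine):
#   rpm --verify glibc | while read -r line; do
#       digest_changed "$line" && echo "content changed: $line"
#   done
```

A changed digest on a library file is worth investigating before any reinstall; configuration files flagged this way are often just local edits.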
Anyway, instead of chasing what's wrong on your system, I think it's much simpler to just reinstall it.
I disagree strongly. About the only time errors like ELF header corruption could happen on GNU/Linux systems is when package contents are written to (as in update). The rest of the time the library file is accessed but not modified. If this was not due to an update then not knowing the source of corruption means it can occur again. Running GNU/Linux is all about performance, protecting assets and providing services in a continuous, stable and secure way so you should not deliberately neglect signals like that. Besides that the "re-install and all will be fine" mantra is reminiscent of working with products from this particular vendor founded to develop and sell BASIC interpreters for the Altair 8800 and doesn't solve anything. Work on the cause, not the symptoms.
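One way to check whether an update touched the library shortly before the failure is to look at the package history. A minimal sketch, assuming yum logs to /var/log/yum.log (the usual default on RHEL and CentOS); updates_for is a hypothetical helper name.

```shell
# Hypothetical helper: print yum log entries that mention a package.
# Arguments: package name, optional log path (assumed default: /var/log/yum.log).
updates_for() {
    pkg=$1
    log=${2:-/var/log/yum.log}
    grep -E "(Installed|Updated|Erased): .*${pkg}" "$log"
}

# Usage on the affected machine:
#   updates_for glibc
#   rpm -q --last glibc    # install time as recorded in the RPM database
```

If neither source shows a recent glibc update, the corruption came from somewhere else and is more likely to recur.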
The question was (as I see it) about making the system work again, not about investigation. There can be any number of reasons why it happened, from hardware failure to a rootkit to an occasional bug in some process running as root. While it's interesting HOW the system got into this state, it's not always possible to find out. Sure, knowing the cause has nothing to do with making the system work again, but it can help to avoid the next breakage.
While it's interesting HOW the system got into this state, it's not always possible to find out.
Sure, but there's a difference between actually trying to find the root cause and saying "oh, well, just reinstall whatever it is that's b0rken". If this was a production environment where an informed management decision was made (weighing all risks, consequences et cetera) to trade in diagnosis for uptime, then I would agree. Business requirements just bring a different type of "clarity" to things. But otherwise it is a perfect example of human nature to seek the path of least resistance (like by just reinstalling software). The point is there's nothing to be learnt from that approach and it does not solve anything.
I see two different problems: 1) Repair the damaged system. 2) Understand what caused the damage.
My initial post was related to the first problem: if a system is damaged, I usually find it simpler to just reinstall it than to fix problem after problem.
And how one proceeds with the repair is quite unrelated to the second problem, understanding what caused the damage. If investigating the problem is important, then the reinstallation can be done on a different hard drive (or a different computer).
Well, this has certainly turned into an interesting discussion.
Regarding the initial question, as a new guy to RHEL I really have no idea what caused it. Luckily this was a testing system rather than the actual production environment we are moving towards. I had applied updates that RH deemed worthwhile just prior to the crash. While I did bring the system back to life by rolling back the virtual server to an earlier instance, I still have nothing on the problem. Luckily, again, this was a testing environment (I needed to test badly enough to roll back to the pre-error state).
In the meantime I'm going to look into reading more documentation on RH in general, so that if this were to happen to our physical server I've got some direction on how to correct it.
For what it’s worth, here are my fix procedures using the CentOS 5.5 Live CD.
The CD works nicely with my cable modem.
WHAT WORKED.
1. Loaded Linux from the Live CD. Logged in as root (a must).
2. Edited the Live CD’s /etc/fstab file. Located the /dev/hdaN entry associated
with the hard drive of the damaged Linux home. Changed the ro (read-only)
parameter to rw (read-write).
Note: -N- is an integer
3. Invoked yum at the Live CD's command prompt:
yum --installroot=/mnt/disc/hdaN reinstall glibc
Note: -N- is the same integer as above.
I got a message complaining about libXmuu.so.1 being unable to link.
4. Rebooted Linux from the hard drive. There was no kernel panic, but a libXmuu.so.1 message reappeared:
/sbin/ldconfig: can’t link /usr/lib/libXmuu.so.1 to libXmuu.so.1.0.0
I replied -yes- to a deletion request.
5. I got into the desktop, but many of the menu items and desktop icons were unusable, so I repeated steps 1-3, using libXmuu.so.1 instead of glibc in the yum command. Yum found the appropriate package (libXmuu.so.1 is a link) and installed it. I was back up.
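Steps 2 and 3 above can be sketched as a small wrapper. This is an illustration, not the poster's exact procedure: it assumes the damaged root filesystem is already mounted, and it swaps the /etc/fstab edit for 'mount -o remount,rw', which should have the same effect for the current session. reinstall_into_root and the DRY_RUN variable are hypothetical names.

```shell
# Hypothetical wrapper around the yum call from step 3.
# Arguments: mount point of the damaged system, then one or more package names.
# With DRY_RUN set to a non-empty value the command is printed, not run.
reinstall_into_root() {
    root=$1
    shift
    if [ ! -d "$root" ]; then
        echo "not a directory: $root" >&2
        return 1
    fi
    ${DRY_RUN:+echo} yum --installroot="$root" reinstall "$@"
}

# Usage from the Live CD, as root (N is the partition number, as in the post):
#   mount -o remount,rw /mnt/disc/hdaN
#   DRY_RUN=1 reinstall_into_root /mnt/disc/hdaN glibc   # show the command first
#   reinstall_into_root /mnt/disc/hdaN glibc
```

Printing the command first is a cheap guard against pointing --installroot at the wrong mount point.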
WHAT DIDN’T WORK.
1. Manually repairing libdl.so.2. It’s a link to /lib/libdl-2.5.so in the same directory.
Copying/recreating the link didn’t work for me.
Note: I found its location by entering:
locate libdl.so.2
at the command prompt.
2. Booting GRUB in emergency mode. Followed section
26.4, "Booting into Emergency Mode", of the CentOS manual
to append emergency to the kernel line. Got the same kernel panic.
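Before recreating a link by hand, it can help to check where the link points and whether the target still looks like an ELF file. A minimal sketch: every ELF file begins with the four bytes 0x7f 'E' 'L' 'F', so a target that fails this check is damaged at the very start of its header. has_elf_magic is a hypothetical helper name, and passing the check does not prove the rest of the file is intact.

```shell
# Hypothetical helper: succeed when the file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F'. Symbolic links are followed, so the link target is read.
has_elf_magic() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Usage, with the path from the post (prefix it with the mount point when
# working from the Live CD):
#   ls -l /lib/libdl.so.2            # shows where the link points
#   has_elf_magic /lib/libdl.so.2 || echo "target is not a valid ELF file"
```

If the target itself fails the check, recreating the link cannot help, which would be consistent with the manual repair not working.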
Last edited by RootAround; 10-24-2010 at 12:22 AM.