Registered: Oct 2002
Distribution: Slackware, LFS, Gentoo
The tale of the broken libc, and how I recovered without rebooting
A couple hours ago, I was updating some stuff on a server, when suddenly all hell broke loose. The make install of the update I was applying, glibc 2.3.2, had bombed with an error saying something about a needed Glibc version not being found.
When I tried doing a make distclean, I got another Glibc version not found error, this time from make. As a starting point to figuring out the problem, I ran a simple ls /lib. That gave the same error!
At that point, it hit me that I had a very serious problem on my hands. The C library was busted, and the entire system was unusable.
I pondered solutions for a little while. Shutting down was to be avoided at all costs, as this was a production server that appeared to still be able to process requests, so long as the daemons were not stopped. I couldn't run any program that was dynamically linked, so simple solutions like SCP'ing libs from another box, or bringing them over on a CD, were impossible, as both scp and mount are dynamically linked. On a whim, I checked to see if the statically linked tools I had used to build the system (LFS 4.0) were still there. I cd'd to / and ran echo * to get a listing, since echo is a shell builtin and doesn't have to start a new program.
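For anyone stuck in the same spot, here is a rough sketch of the echo * trick stretched into a one-name-per-line listing. It assumes your shell has [ and printf as builtins (bash and dash do), so nothing new has to be exec'd. The function name ls_builtin is made up for illustration.

```shell
# List a directory using only shell builtins, so no dynamically
# linked program has to be started.
# ls_builtin is a hypothetical name, not a standard tool.
ls_builtin() {
    dir=${1:-.}
    # The second pattern picks up most dotfiles; a pattern that matches
    # nothing stays literal and is skipped by the existence check below.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Strip the leading directory part and print just the name.
        printf '%s\n' "${entry##*/}"
    done
}
```

Running ls_builtin / gives roughly what echo * gives from /, just with one entry per line.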
Sure enough, there was a "lfs-4.0-chapter5.tar.bz2" sitting right where I had left it all this time. Unfortunately, I needed both a working tar and bzip2 to get at it, neither of which I had. What I did have, though, was an SMB mount. So I went to another box, built a static tar and bzip2, put them into that exported directory, went back to the broken server, and extracted the archive.
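The extraction step probably looked something like the sketch below. This is an assumption, not a transcript of what I typed: the TAR and BZIP2 variables stand in for wherever the static binaries live on the share, and extract_tbz2 is a made-up helper name. It deliberately pipes bzip2 into tar instead of relying on tar knowing how to call bzip2 itself.

```shell
# Point these at the statically linked binaries (hypothetical defaults).
TAR=${TAR:-tar}
BZIP2=${BZIP2:-bzip2}

# Unpack a .tar.bz2 archive into a destination directory.
# extract_tbz2 is a hypothetical helper name.
extract_tbz2() {
    archive=$1
    dest=${2:-.}
    # Decompress to stdout, then let tar read the stream from stdin.
    "$BZIP2" -dc "$archive" | ( cd "$dest" && "$TAR" xf - )
}
```

With TAR and BZIP2 set to the copies on the SMB mount, extract_tbz2 /lfs-4.0-chapter5.tar.bz2 / would unpack the archive at the root.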
I then ran static/bin/ls just to see if it would work. It did! So I set my PATH to use these tools, and then began to consider how to proceed from here. I looked in /usr/src to see if I still had the source for the old version of glibc. I did, so I decided to try recompiling it. That went OK, but when it came time to make install, I realised that I had another problem: I had no way to get root access! I couldn't log in as root, because the login process still tried to use the (unusable) dynamic tools. su and sudo were dynamic, and even if I had a "su" in the static tools, it would be owned by me, so its suid bit would only give me my own privileges and would be pointless.
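Setting the PATH can be sketched like this. The /static/bin location is an assumption based on where the archive unpacked for me; adjust STATIC_BIN to wherever yours landed.

```shell
# Directory holding the statically linked tools (assumed location).
STATIC_BIN=${STATIC_BIN:-/static/bin}

# Put the static tools ahead of the broken dynamic ones.
PATH="$STATIC_BIN:$PATH"
export PATH

# Make the shell forget any command locations it has already cached,
# so the next lookup of ls, tar, etc. finds the static copies.
hash -r
```

Keeping the old PATH entries at the end means anything missing from the static set is still looked up in the usual places, even if it won't run until libc is fixed.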
Then I realised that, unlike before, I now had a working mount, so I could mount a CD! I went over to yet another box, one that had a CD-RW drive, and burned a CD containing a static 'su'. I brought it over, mounted it, became root, and installed the fresh glibc. Just as suddenly as the system had died, it sprang back to life! I rm -rf'd the static tools, set the PATH back to normal, and continued on with business as usual.
There was no downtime, so as far as everyone else can tell, nothing out of the ordinary happened. In the future, I will think twice before performing non-essential updates.