LinuxQuestions.org
Old 01-26-2009, 11:13 AM   #1
dglinder
LQ Newbie
 
Registered: Jan 2009
Posts: 6

Rep: Reputation: 0
RHEL 5.3 update ruined my system


I've got a pretty urgent Linux problem that I can't figure out, and it's very important for me at work because a critical system is down and there's a lot of pressure to get it fixed yesterday. It's starting to make me look bad and several projects are being held up. But I can't figure out what to do. After the RHEL 5.3 upgrade, the system refuses to boot.

The system in question is, unfortunately, 500 miles away, so I can't go to it and do my normal difficult-problem repair routine of "flail, panic, wave dead chickens, and keep trying stuff until something finally works just by random chance". Also (naturally) this is one of the few systems for which there is no remote console connection, so all work has to be done by getting someone in the data center on the phone and walking them through what I need done. All I have is a cell phone picture of the screen where it hangs.

On Friday morning the system was a RHEL 5.2 system running happily on Sun X2100 hardware. It was working fine with people doing lots of important development on it. But it was in the scheduled rotation for patching, so at the scheduled time I ran "yum update". It hadn't been patched in a long time, so a large number of packages were updated, something like 350. But the update seemed to go fine: yum downloaded and installed everything with no errors as far as I could tell. At least there were none at the end. Then I rebooted it, having changed nothing else except running the patch. Judging by the kernel version, the patch seems to have updated the system to RHEL 5.3.

Now, when the system boots, GRUB loads and asks which kernel I want as usual. Then the kernel image seems to load OK, but right after the ramdisk starts, the system just hangs. Nothing. Nada. I thought I'd try using one of the older kernels from within GRUB, but no matter which one I try, the result is the same.

After selecting the kernel to boot, the usual text comes up:

Filesystem type is ext2fs, partition type 0xfd
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/md1 serial console=ttys0,9600
[Linux-bzImage, setup=0x1e00, size=0x1cb41c]
initrd /initrd-2.6.18-128.el5.img
[Linux-initrd @ 0x37d4f000, 0x2a02e1 bytes]

At the bottom of the screen it says:

Kernel alive
kernel direct mapping tables up to (lots of numbers)

And it just hangs there after that forever. I haven't tried booting from the rescue CD yet - I'm sure I could, but I have no idea what I'd look at to try to fix this, and I don't want to walk someone through that process because I wouldn't know what to tell them to do after it was booted to the rescue CD and the local filesystems mounted.

I opened a ticket with Red Hat, but we only have basic web support. So far they have sent me one message saying that there's a known issue with the HP ILO driver and I should try booting with the option "noapic". I'm going to try that on Monday of course, but I have low hopes since this isn't an HP system and I don't see why an HP ILO driver would be trying to load.

I'm pretty pissed off at Red Hat. In all their sales literature they brag about the RHN update features and how wonderful and easy they are. Then in all the technical release notes for 5.3, they say "Um, you really should do this as a clean install instead of an upgrade, otherwise you're hosed." Of course there was nothing to indicate this using yum - it just looked like a run of the mill package update.

I've been Googling my ass off all weekend and I just can't find anything appropriate. This is one of those things that are hard to search for because the terms are so common - "Red Hat", "boot", "initrd", "hang", and so on. Plus, RHEL 5.3 is so new there really isn't much specifically about it yet. I really wish I had tested this on another system first, but I've never seen a Red Hat update brick a system like this before. Minor glitches, sure, but nothing like this.

I really have to get this fixed ASAP. If I can't fix it I face the rather unpleasant option of trying to get the developer's data off the system somehow and doing a full reinstall, which would set everything back by days. And if I try to get Red Hat to sort it out it could also take several days because of how slow the response is for the web-only Basic support level. Either of those options would make me look bad at a time when we all want to look really good to our employers. I'm not saying I'd get fired, but it would cost me a lot of lost reputation.

My deepest thanks to anyone who has an answer or even suggests something that points me in the right direction.

Thanks... :-(
 
Old 01-26-2009, 11:37 AM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Quick question (since I'm running only RHEL4): did the bump to 5.3 also include a kernel update? If so, have you tried simply booting from a previous kernel?

-------

Edit: Never mind. I see now that you did. Can you post your yum log following the update? I'm curious to see the packages that were updated.

Last edited by anomie; 01-26-2009 at 11:39 AM.
 
Old 01-26-2009, 11:46 AM   #3
dglinder
Quote:
Originally Posted by anomie
Can you post your yum log following the update? I'm curious to see the packages that were updated.
Ummm. I can't boot the system so I have no access to the logs. Today I will try to boot from the rescue CD and see if I can get networking and local drive access set up that way but so far I can't get anyone into the machine room.

I installed RHEL 5.2 on it around last October and at that time I ran a yum update so it was current until then. I do remember that it updated something like 370 packages, which I did think was a lot.
 
Old 01-26-2009, 11:49 AM   #4
anomie
For someone to (try to) help, you're going to need to be able to actually access the system. Post back at that time...
 
Old 01-26-2009, 11:51 AM   #5
anomie
By the way, since it sounds like you're able to access your grub boot menu, have you tried booting to single-user mode?
 
Old 01-26-2009, 01:27 PM   #6
dglinder
Quote:
Originally Posted by anomie
By the way, since it sounds like you're able to access your grub boot menu, have you tried booting to single-user mode?
Thanks to everyone for suggestions. It turns out that GRUB was in fact using the serial console redirect, so all the output was falling out the serial port instead of going to the screen. Booting without that option gives a lot more information.

It looks like the update hosed a filesystem somewhere. The two drives in the system are in a hardware mirror through a PCI card, and they are using LVM. The kernel goes through the normal "checking filesystems" stuff, and declares that /dev/mda, /dev/vg00/vol00 through vol04 are all clean. Then it gets to /dev/hda and says:

fsck.ext3: No medium found while trying to open /dev/hda
/dev/hda:
The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>

Then it falls to the "give root password for maintenance" thing.

I want to do what it says (boot from another superblock) but I'm terrified that fsck might corrupt data. There are no backups of this system yet (they were in the works) so if the data gets hosed that's it. Could booting from another superblock be destructive? Can running a normal fsck to fix errors ruin logical volumes or something?
 
Old 01-26-2009, 01:45 PM   #7
anomie
I would start by running the fsck on that filesystem, as directed. The operation is not risk-free, but as long as the filesystem is mounted read-only (or not at all) you will most likely be OK.
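A cautious first pass could look like the sketch below. It relies on e2fsck's -n option, which opens the filesystem read-only and answers "no" to every repair prompt, so problems are reported but nothing is written. The helper name readonly_fsck is hypothetical, and the mounted check assumes a Linux system with /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: read-only ext2/ext3 check of an unmounted device.
# Usage: readonly_fsck DEVICE [ALT_SUPERBLOCK]
readonly_fsck() {
    dev=$1
    alt_sb=${2:-}

    # Refuse to check a device that is listed as mounted.
    # Limitation: LVM volumes may appear under /dev/mapper/... in
    # /proc/mounts, so this simple name match can miss them.
    if [ -r /proc/mounts ] &&
       awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/mounts
    then
        echo "$dev is mounted; not checking" >&2
        return 1
    fi

    if [ -n "$alt_sb" ]; then
        # -b selects a backup superblock; -n keeps the run read-only.
        e2fsck -n -b "$alt_sb" "$dev"
    else
        # -n: open read-only and answer "no" to all questions.
        e2fsck -n "$dev"
    fi
}
```

Called as `readonly_fsck /dev/vg00/vol00`, it only reports. Passing 8193 as a second argument tries the backup superblock that the error message suggested, still without writing anything.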

Also, are you sure /dev/hda is under LVM? If that gets mounted to /boot, normally that would be an ext2/3 filesystem (not a physical volume).

One more note: This is your system and your job. As you know, you obviously should have had good backups in the first place. Only proceed with the level of risk you're comfortable with under the circumstances.
 
Old 01-26-2009, 11:47 PM   #8
cojo
Member
 
Registered: Feb 2003
Location: St. Louis
Distribution: RedHat 8
Posts: 262

Rep: Reputation: 31
dglinder,

As long as the filesystem is not mounted, fsck should be safe to run. My question is: why is it failing on /dev/hda? Do you have any IDE drives in this server? /dev/hd* names are usually assigned to IDE drives. Plus, if your system is using LVM, I would expect it to fail on a filesystem like /dev/vg00/lv00. Can you post the fstab from the system?
 
Old 01-26-2009, 11:56 PM   #9
anomie
And actually... now that I re-read this thread, /dev/hda usually refers to the whole drive (the first IDE device), not a partition on it. Are you sure that is exactly what the error message says (and not e.g. /dev/hda1)?
 
Old 01-27-2009, 12:31 PM   #10
digitalboy74
Member
 
Registered: Aug 2004
Location: Matrix
Distribution: slack currrent
Posts: 61

Rep: Reputation: 16
> fsck.ext3: No medium found while trying to open /dev/hda

Is that the CDRom drive?

It would seem so, since root is "md1". Since the boot-time fsck reported the filesystems as clean, it is likely something in the boot process that is off.

Last edited by digitalboy74; 01-27-2009 at 12:34 PM.
 
Old 01-27-2009, 03:25 PM   #11
dglinder
Quote:
Originally Posted by anomie
And actually... now that I re-read this thread, /dev/hda usually refers to the whole drive (the first IDE device), not a partition on it. Are you sure that is exactly what the error message says (and not e.g. /dev/hda1)?
Yes, I'm sure. And I'm REALLY, REALLY ANGRY at Red Hat. I finally fixed this problem when I was able to edit /etc/fstab and simply comment out the BRAINDEAD line they had added. This is the exact line that the update added to my /etc/fstab file:

/dev/hda /mnt ext3 defaults 1 2

As people have pointed out, /dev/hda isn't even a valid ext3 filesystem. No wonder the boot was failing. That line wasn't there before. Then it was.
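Why a single stray line can stop the boot, as a rough sketch: the sixth field of an fstab entry (fs_passno) tells the boot-time fsck whether to check that filesystem, and any non-zero value means it will be checked. The line above has a 2 in that field, so fsck tried to open /dev/hda and failed. The function name list_fsck_entries is made up for illustration. It only lists the entries that will be checked and does not test whether each device is readable.

```shell
#!/bin/sh
# Hypothetical helper: print the fstab entries that boot-time fsck will
# check, i.e. those whose sixth field (fs_passno) is non-zero.
# Usage: list_fsck_entries [FSTAB_PATH]   (defaults to /etc/fstab)
list_fsck_entries() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF >= 6 && $6 != 0 { print $1, $2, $3, "pass=" $6 }
    ' "${1:-/etc/fstab}"
}
```

Run against a copy of the broken file, the output would include a line for /dev/hda mounted on /mnt with pass=2, which is a quick way to spot an entry that does not belong before rebooting.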

Why? Why would Red Hat screw with my critical system files to do something so stupid? I can't think of one good reason. 1) They shouldn't be messing with that file in ANY case. EVER. 2) If they MUST mess with it, they should warn you in BIG RED FLASHING LETTERS about it, with full details and option to abort. And 3) If they MUST do it WITHOUT warning you, they should at least have the script that does it check to see if the filesystem added IS A VALID FSCKING FILESYSTEM. Grrr. Whoever wrote the script is such an amateur that it didn't even occur to them that maybe, just maybe, before they added a filesystem to /etc/fstab maybe they should verify that the filesystem they're adding is actually valid.

I have to say, this incident has made me start to think seriously about whether Red Hat, as a company, is really ready to support an enterprise-class OS. They clearly have almost no respect for the environment of their users. In all their sales literature about Red Hat, they make a big deal about how easy updates are - "just type yum update!" But if you read the release notes for the 5.3 update, they specifically warn that updating in place is not a good idea and they recommend a fresh install. Is there any warning that you're about to do a major update? No way. If you happen to type "yum update" one day after the update is released, that's it - you get upgraded in place, with no warning whatsoever about this major change. And if that isn't bad enough, the update will do absolutely hideously stupid things that break your system.

Sun doesn't pull this crap on me. They have respect that my environment is complex and not to be messed with. They don't go randomly editing my critical configuration files and adding lines that a junior sysadmin would realize were dangerous and broken. If this problem had destroyed any data I would now be calling a lawyer to sue the crap out of Red Hat for incompetence and negligence.
 
Old 01-27-2009, 03:52 PM   #12
anomie
@dglinder: This doesn't sound like a change that a Red Hat package update would make. Are you the only sysadmin on the server?

In any case, I suggest that you run a HIDS (I use aide) going forward so that you will immediately spot changes like this after upgrades, and during the normal course of operation.
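A much smaller stand-in for a full HIDS, assuming all you want is to notice edits to a handful of files across an update: keep a copy from before the update and diff it afterwards. This is not how aide works (aide keeps a database of checksums and file attributes). The function names and the snapshot directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: snapshot a config file before an update and
# show what changed afterwards.

# Usage: snapshot_config FILE SNAPSHOT_DIR
snapshot_config() {
    mkdir -p "$2" && cp -p "$1" "$2/$(basename "$1").before"
}

# Usage: report_config_changes FILE SNAPSHOT_DIR
# Prints a unified diff; returns 0 if unchanged, non-zero otherwise.
report_config_changes() {
    diff -u "$2/$(basename "$1").before" "$1"
}
```

For instance, `snapshot_config /etc/fstab /root/pre-update` before running yum update, then `report_config_changes /etc/fstab /root/pre-update` before the reboot, would show any added line prefixed with a plus sign.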

Glad you got it fixed.
 
Old 01-27-2009, 03:55 PM   #13
anomie
Quote:
Originally Posted by dglinder
If you happen to type "yum update" one day after the update is released, that's it - you get upgraded in place, with no warning whatsoever about this major change. And if that isn't bad enough, the update will do absolutely hideously stupid things that break your system.
Some additional thoughts on yum: If you just use yum update it'll require you to confirm new package installation. (You must have run yum -y update and overridden that protection.)

You might want to set up a nightly cronjob that runs yum check-update and emails you the results, so that you're aware when changes may be coming down the pipe.
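One way such a nightly job could be sketched: yum check-update exits with status 100 when updates are available, 0 when there are none, and 1 on error, so a small wrapper can stay silent on quiet nights and print only when there is something to read. Cron mails whatever a job prints, so no explicit mail command is needed. The function name, script path, and schedule are assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper around "yum check-update" for use from cron.
report_pending_updates() {
    # Capture the exit status without tripping "set -e" in the caller.
    output=$(yum -q check-update 2>&1) && status=0 || status=$?

    case "$status" in
        0)   ;;  # nothing pending: print nothing, so cron sends no mail
        100) printf 'Pending updates on %s:\n%s\n' "$(uname -n)" "$output" ;;
        *)   printf 'yum check-update failed (exit %s):\n%s\n' "$status" "$output" ;;
    esac
}
```

Saved as an executable script that ends with a call to report_pending_updates and run from root's crontab, for example with the entry `30 5 * * * /usr/local/sbin/yum-pending.sh`, it mails the list to the crontab owner or to the MAILTO address. Nothing gets installed; it only reports.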
 
Old 01-27-2009, 10:12 PM   #14
dglinder
Quote:
Originally Posted by anomie
@dglinder: This doesn't sound like a change that a Red Hat package update would make. Are you the only sysadmin on the server?
Yes, I'm sure. That's why I'm so mad, because it could not have been anything else. I'm the only sysadmin, and the system was just fine before. The only change was that I ran the update. The next command after "yum update" was "shutdown -r"
 
Old 01-27-2009, 10:13 PM   #15
dglinder
Quote:
Originally Posted by anomie
You might want to set up a nightly cronjob that runs yum check-update and emails you the results, so that you're aware when changes may be coming down the pipe.
You can be very certain that I'm never going to let Red Hat automatically install any updates again.
 
  

