Power failure - boot hangs (after enabling /etc/fstab swap [OK])
Yesterday, I had a power outage.
Because of the power failure, a CentOS 5.5 x86_64 test server (its UPS battery was not yet connected) restarted, and now the machine no longer boots properly.
It hangs right when "Enabling /etc/fstab swap [OK]" is displayed.
To me, it looks like it hangs at the "entering non-interactive startup" or "Init : Runlevel 3" step.
The X display shows an entirely blue screen instead of the CentOS login screen.
The other ttys don't help either: on every one of them I see a blinking cursor and cannot type anything.
The server is a Dell PowerEdge T610 with hardware RAID 1 and RAID 5.
I already booted "linux rescue" from the install DVD and ran fsck on VolGroup00, following "Running fsck in CentOS 5".
No errors were found.
I also tried commenting out the swap entry in /etc/fstab, but that doesn't help either.
Are there other things I can try to get the server booting correctly?
You said you did a linux rescue from a CD; have you tried booting into single-user mode?
The first step is to boot into single-user mode and look at your /var/log/messages and /var/log/dmesg log files to figure out exactly what is happening and where the error occurs.
Also, during the boot process there are actually several ttys running. You can switch between them with Alt+F5, Alt+F6, etc., and often these other terminals show detailed information about the boot process.
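If the messages log is long, a small filter can pull out the lines worth reading first. Below is a minimal Python 3 sketch, not part of the original advice: the function name `find_suspicious_lines` and the keyword list are my own assumptions. CentOS 5 ships a much older Python 2, so this is meant to be run on a copy of the log on another machine.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword list: plain case-insensitive substrings,
# not an official syslog severity filter.
DEFAULT_PATTERN = re.compile(r"error|fail|warn|timeout", re.IGNORECASE)


def find_suspicious_lines(lines: Iterable[str], pattern=DEFAULT_PATTERN) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line that matches `pattern`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `find_suspicious_lines(open("messages", errors="replace"))` on the copied file returns (line number, text) pairs. Substring matching will also flag harmless lines, so treat the result as a reading list, not a diagnosis.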
Indeed, as you guessed, I had only done a linux rescue from the CD and hadn't tried single-user mode yet.
Following your suggestion I tried it, but it doesn't change a thing.
Before doing that, I recovered the "messages.log" files.
Looking through them, I found some "error" lines.
I don't think that "error getting update info: Cannot find a valid baseurl for repo: addons" or "Jan 9 07:57:28 MyTestServer kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff81010435a470), AE_SUPPORT" has anything to do with it.
As far as I can tell, only the following lines could lead to a solution:
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: write access will be enabled during recovery.
Jan 9 07:57:28 MyTestServer kernel: kjournald starting. Commit interval 5 seconds
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: dm-0: orphan cleanup on readonly fs
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: dm-0: 17 orphan inodes deleted
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: recovery complete.
Jan 9 07:57:28 MyTestServer kernel: EXT3-fs: mounted filesystem with ordered data mode.
I ran another fsck on the LVM volume.
It gave me this:
WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.39 (29-May-2006)
Pass 1: checking inodes, blocks, and sizes
Pass 2: checking directory structure
Pass 3: checking directory connectivity
Pass 4: checking reference counts
Pass 5: checking group summary information
So fsck doesn't report any problems.
Switching ttys during boot doesn't tell me much more either.
If the issue is your swap partition, can you comment that line out of /etc/fstab in single-user mode and then boot the system? That seems like it should work here. Once it boots without the corrupted swap partition, you can create a new swap partition and add it back to your fstab.
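Commenting out the swap line is a one-line edit in any editor. As a sketch of the same step, here is a Python 3 helper, assuming the usual fstab layout where the third field is the filesystem type. The names `comment_out_swap` and `rewrite_fstab` are hypothetical, and in single-user mode the root filesystem may still be read-only, so it may need `mount -o remount,rw /` first.

```python
import shutil


def comment_out_swap(fstab_text: str) -> str:
    """Prefix '#' to every active fstab entry whose filesystem type is swap."""
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        # Blank lines and existing comments are passed through unchanged.
        is_entry = bool(fields) and not fields[0].startswith("#")
        if is_entry and len(fields) >= 3 and fields[2] == "swap":
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


def rewrite_fstab(path: str = "/etc/fstab") -> None:
    """Keep a .bak copy of the original file, then write the edited version."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(comment_out_swap(text))
```

Keeping the `.bak` copy means the original swap entry can be restored once a new swap partition is in place.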
Before trying your swap suggestion, I wanted to back up the data (even though it's only a test server, I'd like to keep my previous work :banghead:).
I went into single-user mode, but first did one last test to narrow down the problem.
I ran the "init 3" command from single-user mode.
That led me to something else: I saw the system start all the services up to and including "S98Avahi-daemon". The next one, where it stops, is "S99firstboot". Is that still related to the swap issue mentioned earlier?
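The order in which "init 3" starts services comes from the names of the S links in /etc/rc3.d: they are run sorted by name, so the two-digit number sets the order. The Python 3 sketch below only illustrates that ordering. The helper names and the sample listing are hypothetical, modeled on the script names in this thread, and Python's plain string sort approximates the shell glob order (the two agree in the C locale).

```python
from typing import Iterable, List, Optional


def start_order(names: Iterable[str]) -> List[str]:
    """Return the S* (start) scripts sorted by name, the order the rc glob runs them in."""
    return sorted(n for n in names if n.startswith("S"))


def script_after(names: Iterable[str], current: str) -> Optional[str]:
    """Name of the start script that runs right after `current`, or None if it is last."""
    order = start_order(names)
    position = order.index(current)  # raises ValueError if `current` is not an S script
    return order[position + 1] if position + 1 < len(order) else None


# Hypothetical listing modeled on the names mentioned in this thread.
rc3 = ["K01smartd", "S10network", "S98avahi-daemon", "S99firstboot", "S99local"]
print(script_after(rc3, "S98avahi-daemon"))  # prints S99firstboot
```

On the server itself, passing `os.listdir("/etc/rc3.d")` instead of the sample list would name the service that starts after the last one seen on screen.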
Firstboot can generally be disabled; I have it uninstalled on my production systems. If it is hanging on the firstboot service, I would turn it off with chkconfig and see whether the system boots. However, there should be some pertinent information in the messages log file about why the service failed to start.