LinuxQuestions.org
Old 06-02-2010, 10:12 AM   #1
cbrao
LQ Newbie
 
Registered: Jun 2010
Posts: 5

Rep: Reputation: Disabled
Filesystem & Boot problems after power failure


I am an end user who now needs to do some sys admin! My only real past experience is having successfully installed Slackware back in 1995-96.

I have RHEL WS 4 on an HP XW6200 dual xeon system with 2 SATA (~75 GB each) and 1 IDE (160 GB) HDDs and Nvidia graphics card. Stable OS, no issues since 2005 till about 5 weeks ago. Abrupt power down brought up EXT3-fs error, system was hung, could remote login but not do a clean shutdown. Had to hard re-boot. Bootup normally happens from SATA drive.

Should have paid more attention, but all seemed well. Every few days the same error was repeated. Had to run fsck a couple of times. Last week, when a user tried to login, system got hung, did not allow remote login, had to switch off power. On restart showed following errors (could not get all messages as they were scrolling too fast):

(some ext3-fs errors in /dev/sda6 and /dev/sda2)
SCSI error <0 0 0 0> return code....
FMK EOM current sda sense= 70 cc
ASC=e2 ASCQ=cf
end request: I/O error, dev sda ...

On re-boot, errors were :

ata1 : called with no error (80)!
mkrootdev : label/ not found
mount: error 2 mounting ext3
mount: error 2 mounting none
switch root: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Tried F10 (system set-up). Found IDE and 1 SATA drive but not the 2nd SATA. System seemed to be trying to bootup from IDE, which is not the boot disk. Hence it would get hung, no messages.

Tried bootup with recovery DVD (Ubuntu). Many bootup commands showed [FAILURE], then showed :

INIT Id "x" respawning too fast:disabled for 5 minutes.
INIT Id "x" respawning too fast:disabled for 5 minutes.
INIT Id "x" respawning too fast:disabled for 5 minutes.
ATA: abnormal status 0x80 on port 0x34 F7
ATA: abnormal status 0x80 on port 0x34 F7
ATA: abnormal status 0x80 on port 0x34 F7
ata1: command 0x35 timeout, stat 0x80 host_stat 0x21
ata1: status 0x80 {Busy}
SCSI error: <0 0 0 0> return code=0x8000002
FMK EOM ILI current sda : sense 70 f0
ASC=2 ASCQ=a0
end_request: I/O error, dev sda, sector 129842497
Buffer I/O error on device sda6, logical block 2394324
Lost page write due to I/O error on sda6
(similar messages continued endlessly)

When I tried to switch off the CPU, it gave :
ACPI-0691: ***Error: acpi_ev_gpe_dispatch: No handles or method for GPE[1F], disabling event

then earlier ata errors continued.

Opened up the CPU unit, tightened the connections, then both SATA disks were recognized, normal boot-up was initiated, showed unclean shutdown, /usr unclean, and went to Repair mode. e2fsck did not work. Ran fsck. Filesystems were treated as ext2-fs. Showed a lot of inode errors, which I should probably not have cleaned. Finally, got :

EXT3-fs error (device sda6): ext3_find_entry: bad entry in directory #163521: inode out of bounds - offset=1148, inode=163599, rec_len=16, name_len=5

Read a post saying that the partition should be re-mounted as ext2, but did not know how to do this.

e2fsck /dev/sda6 gave :

couldn't find ext2 superblock, trying backup blocks ..
ext3 recovery flag clear, but journal has data
Recovery flag not set in backup superblock, so running journal anyway.
/: recovering journal
Backup journal inode block information.

Pass1: checking inodes, blocks, and sizes
Root inode not a directory. Clear(y)?

That is where I am now. What should I do?
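
The "trying backup blocks" message above refers to the spare copies of the superblock that ext2/ext3 keeps in some block groups. As a rough sketch of where those copies usually sit, the script below lists candidate block numbers that could be passed to e2fsck -b. It assumes the common sparse_super layout (copies in group 1 and in groups that are powers of 3, 5 and 7), 4 KiB blocks and 32768 blocks per group. Those values and the function names are assumptions for illustration, not anything read from the disk in this thread; on a real system, dumpe2fs or mke2fs -n reports the actual locations.

```shell
#!/bin/sh
# Sketch: candidate backup superblock locations for an ext2/ext3 filesystem.
# ASSUMPTIONS: sparse_super layout, 4 KiB blocks, 32768 blocks per group.
# These are typical defaults, not values taken from the poster's disk.

# is_power_of BASE N: succeed if N equals BASE^k for some k >= 0.
is_power_of() {
    base=$1
    n=$2
    while [ "$n" -gt 1 ]; do
        [ $((n % base)) -eq 0 ] || return 1
        n=$((n / base))
    done
    [ "$n" -eq 1 ]
}

# backup_superblocks TOTAL_BLOCKS [BLOCKS_PER_GROUP]
# Prints one candidate block number per line.
backup_superblocks() {
    total=$1
    per_group=${2:-32768}
    # Number of block groups, rounded up.
    groups=$(( (total + per_group - 1) / per_group ))
    g=1
    while [ "$g" -lt "$groups" ]; do
        if is_power_of 3 "$g" || is_power_of 5 "$g" || is_power_of 7 "$g"; then
            # With 4 KiB blocks the copy sits at the first block of the group.
            echo $((g * per_group))
        fi
        g=$((g + 1))
    done
}

# Example: a hypothetical filesystem of 1,000,000 blocks (roughly 3.8 GiB).
backup_superblocks 1000000
```

Each printed number is only a candidate. A check that changes nothing, such as e2fsck -n -b 32768 -B 4096 on the partition, is a safer first test than letting e2fsck answer its own questions.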
 
Old 06-02-2010, 10:45 AM   #2
pingu
Senior Member
 
Registered: Jul 2004
Location: Skuttunge SWEDEN
Distribution: Debian preferably
Posts: 1,287

Rep: Reputation: 120Reputation: 120
Honestly, what you should do is call a Linux pro - this may disappoint you since you'll have to pay for it but I believe your skills are not enough.
This looks like a serious problem which handled wrong might ruin every piece of data you've got.
Done correctly - well, everything except broken hardware can be fixed.
Of course it depends on the server, what's it used for, how many are depending on it?

If you want to try yourself, here's my "analysis":
1) At least one partition on one disk - but probably more - is faulty.
It could be a physically damaged hard disk or just a corrupt filesystem, impossible to tell by now.
One reason for a corrupt filesystem is an abrupt power down.
On the other hand, a faulty system might cause a powerdown.
2) The system seems to boot up from wrong disk/partition.
This can be because one disk is physically damaged and therefore not seen.
It could also be that the disks are numbered in a different order - that happens with anything but IDE drives (AFAIK).

In order to help you along we need the following:
Post the system's /etc/fstab.
If you can't read that file, write down what you remember/know of the partitioning & mounting scheme, including info about what filesystem is used.
Information about the bootloader - lilo or grub?
Post the bootloader's config file.

Do NOT try to repair a filesystem if you don't know what you are doing! (Personally, I learned that the hard way, ruined a very pretty girl's disk completely.)

Or you could simply reinstall.
Depends of course on what it's used for, whether you can set it up again with the apps that were in use, and whether you know where the users' data is. If the system is properly set up you can remove the system without touching the users' data.

Edit: I forgot something very important:
Don't do anything from the system itself! Shut it down if it's running and don't start it again!
Boot from a live-cd, then check out your disks & partitions.
Mount the partitions you need read-only!
Here are a few commands:
# fdisk -l (shows the disks & partitions)
# mount /dev/sda1 /mnt/sda1 -o ro (mounts /dev/sda1 readonly on directory /mnt/sda1)
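
The two commands above can be wrapped in a small script that, by default, only prints what it would do. This is a minimal sketch: the run and inspect_readonly names, the DRY_RUN variable and the /mnt/<name> mount point are illustrative assumptions. It also adds the ext3 mount option noload, which asks the kernel not to replay the journal, because a plain read-only mount of ext3 may still write to the disk while recovering the journal.

```shell
#!/bin/sh
# Sketch: read-only inspection of one partition from a live CD.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# as root on the live system to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# inspect_readonly DEVICE: list partitions, then mount DEVICE read-only.
inspect_readonly() {
    dev=$1                  # e.g. /dev/sda1
    mnt=/mnt/${dev##*/}     # e.g. /mnt/sda1 (hypothetical mount point)
    run fdisk -l
    run mkdir -p "$mnt"
    # ro = read-only; noload = do not replay the ext3 journal on mount.
    run mount -t ext3 -o ro,noload "$dev" "$mnt"
}

inspect_readonly /dev/sda1
```

Printing first makes it easy to double-check the device name before anything touches a disk that may already be failing.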

Last edited by pingu; 06-02-2010 at 11:14 AM. Reason: Forgot something important
 
Old 06-02-2010, 01:18 PM   #3
DrLove73
Senior Member
 
Registered: Sep 2009
Location: Srbobran, Serbia
Distribution: CentOS 5.5 i386 & x86_64
Posts: 1,118
Blog Entries: 1

Rep: Reputation: 129Reputation: 129
^agreed. Call a professional in an attempt to save what can be saved. I hope you created backups every once in a while.

If it is a production server, I would ditch all of the hardware and buy new hardware, just to be on the safe side. It could be a good time to upgrade to RHEL 5.5.

Last edited by DrLove73; 06-02-2010 at 01:20 PM.
 
Old 06-03-2010, 08:54 AM   #4
cbrao
LQ Newbie
 
Registered: Jun 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
Thanks for all the suggestions.

Flip side : All data is safe thanks to automated daily backups
Flop side : The system that is down was the intermediary for data backups from all linux workstations, and now I have to fix that.

Flop : Can't boot from Ubuntu live CD - explicitly specified CDROM for boot-up, but control gets transferred to HDD and ends up with Kernel panic.

Will try to get professional help.
 
Old 06-04-2010, 03:41 AM   #5
DrLove73
Senior Member
 
Registered: Sep 2009
Location: Srbobran, Serbia
Distribution: CentOS 5.5 i386 & x86_64
Posts: 1,118
Blog Entries: 1

Rep: Reputation: 129Reputation: 129
Quote:
Originally Posted by cbrao
Flop : Can't boot from Ubuntu live CD - explicitly specified CDROM for boot-up, but control gets transferred to HDD and ends up with Kernel panic.
If you need a Live CD for repairs, please use SystemRescueCD or one of the CentOS LiveCDs. Using other distros on business data is not something I would be comfortable with.

Quote:
Originally Posted by cbrao
Will try to get professional help.
The BEST thing to do.
 
Old 06-06-2010, 09:07 AM   #6
cbrao
LQ Newbie
 
Registered: Jun 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
Knoppix was able to read all partitions except / and /usr. Errors shown for accessing / and /usr were: wrong fs type, bad option, bad superblock. Called an expert from HQ, whose SystemRescueCD didn't help gain access to /. As a last resort, he ran fsck -y, and said he lost /home. However, I have a backup on the server and can retrieve all but 4 files. Finally, he re-installed the OS on a new HDD. Thanks for all the inputs.
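
For anyone in a similar spot: before a last-resort fsck -y, one common precaution is to copy the damaged partition to an image file on a healthy disk, so a failed repair can be retried. The sketch below uses plain dd and demonstrates it on an ordinary temporary file rather than a real device. The image_device name and the example paths are assumptions for illustration, and a dedicated tool such as GNU ddrescue (not mentioned in this thread) is generally better suited to disks with many unreadable sectors.

```shell
#!/bin/sh
# image_device SRC DEST: copy SRC to DEST in 4 KiB blocks.
# conv=noerror keeps going after a read error; conv=sync pads each short or
# failed read with zeros so later data stays at the same offset in the image.
image_device() {
    dd if="$1" of="$2" bs=4096 conv=noerror,sync 2>/dev/null
}

# Harmless demonstration: an 8 KiB file stands in for a partition.
src=$(mktemp)
img=$(mktemp)
yes 'sample data' | head -c 8192 > "$src"
image_device "$src" "$img"

# The two checksums should match when no read errors occurred.
cksum < "$src"
cksum < "$img"
rm -f "$src" "$img"
```

On a real system the call would look something like image_device /dev/sda6 /mnt/backup/sda6.img, with the destination on a different, healthy disk that has enough free space (both paths are hypothetical). e2fsck can then be run against the image file instead of the original partition.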
 
  


All times are GMT -5.