LinuxQuestions.org
09-07-2010, 05:26 PM   #1
rockdiablo
corrupted drive- mechanical or logical?- fsck repair?


I was working on my RHEL 5.2 workstation yesterday when the OS became flaky. First I noticed that some software I had running was outputting an error that it was unable to write to its log file because permission was denied. I've never seen this output before; it would have been writing to my home dir, running under my user name. From an open terminal, I did "ls -al" and saw that the permissions of many of the files in my home dir were listed as "????????", while some were still "rwxrwxrwx". As well, many files were highlighted in the colors set for links and root privileges.

I tried to start a new terminal, and it failed. Then GNOME crashed. When I reset the machine, I got through GRUB and into the startup, and after finding the volumes, the startup failed with a kernel panic, giving several errors like:

ata1.00 error{unc}
I/O error on devdmo block 14-16
EXCEPTION EMASK 0x0
Mount: setuproot error mounting /dev/root on sysroot ext3
Mount: setuproot error mounting /proc on sysroot ext3
Mount: setuproot error mounting /sys on sysroot ext3
Mount: setuproot error mounting /proc on sysroot ext3
Mount: switchroot nount failed no such file or directory
Kernel Panic -not synching attempted to kill init

I don't have much experience with this stuff, but it looks obvious that something like the MBR, or wherever the partition information is stored, might have been corrupted.

What I don't understand is why I can get into GRUB (it's a dual-boot Windows Vista / RHEL machine). I'm guessing this means it's not a mechanical problem, because I can get RHEL to begin to boot (I think it's failing somewhere around the /etc/rc.d/rc.sysinit script), and I can also get Vista to bring up the initial Windows screen and a mouse pointer spanning both of my screens, which I think means it must have at least loaded the ATI drivers for my dual-head Radeon 4850. Windows hangs there, however.

I've tried the RHEL 5.2 rescue disc, and it doesn't recognize any Linux partitions.

I ran the system diagnostics out of the Dell BIOS and it came back with a failed HDD: Error code 0142, but from digging around a bit I've found that this is a very broad diagnosis.

My concern over it being a mechanical problem is that I'm not sure I want to run any further diagnostics, or any of the disk utility programs that I've seen listed here on LinuxQuestions, as they might damage it further, and there is some data that I would really like to get off this disk.


Any advice is appreciated. Thanks in advance for your help.

Last edited by rockdiablo; 09-08-2010 at 01:40 PM.
 
09-07-2010, 06:59 PM   #2
xeleema
Greetingz!

Sounds like you're in a pickle.
Quote:
Originally Posted by rockdiablo View Post
...From an open terminal, I did "ls -al" and saw that the permissions of many of the files in my home dir were listed as "????????", while some were still "rwxrwxrwx". As well, many files were highlighted in the colors set for links and root privileges.
I've seen this very same thing happen when data corruption has occurred on a hard disk. This isn't a good sign.
Quote:
Originally Posted by rockdiablo View Post
...startup failed with a Kernel panic:
giving several errors- like:
Code:
    ata1.00 error{unc}
    I/O error on devdmo block 14-16
    EXCEPTION EMASK 0x0 
    Mount: setuproot error mounting /dev/root on sysroot ext3
    Mount: setuproot error mounting /proc on sysroot ext3
    Mount: setuproot error mounting /sys on sysroot ext3    
    Mount: setuproot  error mounting /proc on sysroot ext3  
    Mount: switchroot nount failed no such file or directory
    Kernel Panic -not synching attempted to kill init
Problems when trying to boot off the disk could be related to a few bad sectors; however, you might have to completely reformat, fsck, then reload the operating system onto the disk.
Quote:
Originally Posted by rockdiablo View Post
...something like the MBR, or wherever the partition information is stored, might have been corrupted.
The MBR is just a small chunk up at the front of the disk (its first 512-byte sector). If that was hosed, there's a good chance GRUB wouldn't be able to "see" any partitions.
Quote:
Originally Posted by rockdiablo View Post
What I don't understand is why I can get into GRUB (it's a dual-boot Windows Vista / RHEL machine).
The boot loader is actually pretty tiny, stored in a few blocks at the beginning of the disk.
Quote:
Originally Posted by rockdiablo View Post
and I can also get Vista to bring up the initial Windows screen and a mouse pointer spanning both of my screens, which I think means it must have at least loaded the ATI drivers for my dual-head Radeon 4850. Windows hangs there, however.
Hm, if the problem is limited to only a ton of bad sectors, then you might be able to start Windows in Safe Mode and get to the System Event Log. That should give you some clues as to the exact problem.
Quote:
Originally Posted by rockdiablo View Post
I've tried the RHEL 5.2 rescue disc, and it doesn't recognize any Linux partitions.
Okay, now that could be a real problem. Either the hard drive is having problems with requests, so the I/O operations are timing out, or your partition table is/was corrupt.
Quote:
Originally Posted by rockdiablo View Post
...Error code 0142, but from digging around a bit I've found that this is a very broad diagnosis.
Yes, that does look like a "General Error".
Quote:
Originally Posted by rockdiablo View Post
My concern over it being a mechanical problem is that I'm not sure I want to run any further diagnostics, or any of the disk utility programs that I've seen listed here on LinuxQuestions, as they might damage it further, and there is some data that I would really like to get off this disk.
Well, I would suggest getting a Live Linux CD (such as Knoppix) and hoping for the best. Keep in mind that whatever has gone south with your hard drive seems to be corrupting data, so there are no guarantees that anything you're able to pull off would be of any actual use.

Now, as for determining if this is a "logical" or "mechanical" problem, I would do a read-test of the drive.

1) Get a Linux Live CD (like KNOPPIX).
2) Boot the Linux Live CD
3) Open two terminal windows.
A) In the first terminal window, you're going to 'tail -f /var/log/messages' (basically, just watch the syslog logs)
B) In the second terminal window, you're going to use a command to read the *entire* drive (this can take a while).
time dd if=/dev/sda of=/dev/null
NOTE: "sda" should be replaced with whatever device name your drive was assigned. Check the output of the 'dmesg' command if you're not sure. Don't use "sda1" or "sda2", just use the 'whole drive name' (sda, sdb, sdc, etc)
WARNING: Do not get "if=/dev/<drive_name>" and "of=/dev/null" wrong!
Those stand for "input file" and "output file" respectively. If you accidentally put "if=/dev/null" and "of=/dev/sda" you will DESTROY your chances of getting anything useful off the drive.

Now, while that "dd" command is running, do the following:

A) Keep your eye on the logs
B) Keep an ear out for any weird noises the hard drive makes.

If you start to see "Drive Seek" or "I/O" errors in the logs, the drive is mechanically failing.

Hear the drive make any clicks, chirps, squeaks, beeps it shouldn't be making? Mechanical failure.
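
The read test above can be wrapped in a small helper so the input and output arguments cannot be swapped by accident. This is only a sketch: the function name `read_test` is made up for illustration, and the 64k block size is an assumption chosen to read in larger chunks than dd's default. The output is always /dev/null, so nothing is ever written to the drive.

```shell
#!/bin/sh
# read_test DEVICE
# Read DEVICE from start to end and throw the data away.
# DEVICE is a placeholder for the whole-drive name (sda, sdb, ...).
read_test() {
    dev=${1:-}
    if [ -z "$dev" ] || [ ! -r "$dev" ]; then
        echo "usage: read_test /dev/<drive_name> (must exist and be readable)" >&2
        return 1
    fi
    # if= is the drive, of= is fixed to /dev/null, so this can only read.
    # dd stops at the first read error and returns a non-zero exit status.
    dd if="$dev" of=/dev/null bs=64k
}
```

Run it as root from the live CD, for example `read_test /dev/sda`, while the other terminal is tailing the logs. A non-zero exit status means dd hit a read error before reaching the end of the drive.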

If it's a mechanical failure, get the make/model/serial of the drive (it might be in the output of 'dmesg' somewhere) and pull up the drive's information online. I'm sure you'll want to replace the drive with the same type (i.e., same spindle speed/RPM and same interface type).

Hope it's under warranty!
 
09-07-2010, 07:11 PM   #3
Matir
In addition to xeleema's excellent advice, consider using smartmontools to access the drive's built in diagnostics, perform tests, and see more information than you really want about the drive.
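
As a rough sketch of what that could look like: `smartctl -A` prints the drive's attribute table, and the helper below (the name `smart_summary` is made up for illustration) filters it down to the three attributes most often associated with bad sectors. It assumes the usual ten-column layout of the attribute table, with the raw value in the last column, and that the drive reports these attribute names; not every drive does.

```shell
#!/bin/sh
# smart_summary: read `smartctl -A` output on stdin and print NAME=RAW_VALUE
# for the attributes that count reallocated or unreadable sectors.
# Assumes the standard attribute table, where column 2 is the attribute
# name and column 10 is the raw value.
smart_summary() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Typical use, as root, with /dev/sda as a placeholder device name:
#   smartctl -H /dev/sda                   # overall health verdict
#   smartctl -A /dev/sda | smart_summary   # bad-sector counters only
#   smartctl -t short /dev/sda             # start a short self-test
#   smartctl -l selftest /dev/sda          # read the self-test log afterwards
```

Non-zero raw values here generally point toward failing media rather than a purely logical problem, although the exact meaning of the raw values is vendor-specific.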
 
09-07-2010, 07:41 PM   #4
rockdiablo
Thanks a lot for your quick and thorough reply. I'll give this a shot in the morning, and post results.
 
09-08-2010, 11:24 AM   #5
rockdiablo
Thanks again for your responses.

I had an FC11 live USB stick lying around and I booted from that. I think the issue now is more whether the data is recoverable, and how, than whether the problem is physical or not. Before taking your suggestion of "dd"ing through the entire drive, I noticed a few things that I should add:

First, the FC11 distro has an automatic drive diagnostic, Palimpsest Disk Utility, which automatically popped up with an error.

Code:
Error mounting: mount exited with exit code 32: mount: wrong fs type, bad option, bad superblock on /dev/dm-2,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

That was while trying to mount the partition.

From dmesg | tail I get:

Code:
EXT3-fs: can't read group descriptor 17
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[liveuser@localhost ~]$ dmesg |tail -n 100
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        32 b8 ac bb 
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 850963643
ata1: EH complete
EXT3-fs: can't read group descriptor 17
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1.01: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        32 b8 ac bb 
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 850963643
ata1: EH complete
EXT3-fs: can't read group descriptor 17
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
And while watching the syslog via tail -f /var/log/messages:

Code:
Sep  8 13:01:09 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:09 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:09 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:09 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:09 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:09 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:09 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:10 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:10 localhost kernel: ata1: EH complete
Sep  8 13:01:13 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:13 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:13 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:13 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:13 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:13 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:13 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:13 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:13 localhost kernel: ata1: EH complete
Sep  8 13:01:16 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:16 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:16 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:16 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:16 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:16 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:16 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:16 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:16 localhost kernel: ata1: EH complete
Sep  8 13:01:19 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:19 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:19 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:19 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:19 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:19 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:19 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:19 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:19 localhost kernel: ata1: EH complete
Sep  8 13:01:22 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:22 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:22 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:22 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:22 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:22 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:22 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:22 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:22 localhost kernel: ata1: EH complete
Sep  8 13:01:25 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 13:01:25 localhost kernel: ata1.00: BMDMA stat 0x64
Sep  8 13:01:25 localhost kernel: ata1.00: cmd 25/00:08:b9:ac:b8/00:00:32:00:00/e0 tag 0 dma 4096 in
Sep  8 13:01:25 localhost kernel:         res 51/40:00:bb:ac:b8/40:00:32:00:00/00 Emask 0x9 (media error)
Sep  8 13:01:25 localhost kernel: ata1.00: status: { DRDY ERR }
Sep  8 13:01:25 localhost kernel: ata1.00: error: { UNC }
Sep  8 13:01:25 localhost kernel: ata1.00: configured for UDMA/133
Sep  8 13:01:25 localhost kernel: ata1.01: configured for UDMA/100
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Unhandled sense code
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep  8 13:01:25 localhost kernel: Descriptor sense data with sense descriptors (in hex):
Sep  8 13:01:25 localhost kernel:        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Sep  8 13:01:25 localhost kernel:        32 b8 ac bb 
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Sep  8 13:01:25 localhost kernel: end_request: I/O error, dev sda, sector 850963643
Sep  8 13:01:25 localhost kernel: ata1: EH complete
Sep  8 13:01:25 localhost kernel: EXT3-fs: can't read group descriptor 17
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep  8 13:01:25 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Sorry for the huge log. I'm just trying to figure out whether I can repair the disk and pull any info off of it, or if it's hopeless.

So what I know is:

1. The drive is failing and unable to reallocate bad blocks
2. The MBR and main partition table of the 750 GB main drive are readable
3. The 2 Windows partitions are mountable via the FC11 live env
4. The 350 GB RHEL ext3 partition is readable
5. The RHEL /boot partition is mountable via the FC11 env (I mounted it and was able to see all GRUB files, etc.)
6. The RHEL 314 GB LVM2 physical volume is recognized, though unmountable
7. There are I/O errors in the logs.
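
One thing the logs above do show is that every error points at the same spot. The last four bytes of the sense data ("32 b8 ac bb") are the failing sector number in hex, and they match the "sector 850963643" reported on the end_request line. The sketch below just does that arithmetic; the function names are made up for illustration, and the 512-byte sector size is taken from the "512-byte hardware sectors" line in the log.

```shell
#!/bin/sh
# lba_from_hex B3 B2 B1 B0
# Combine four hex bytes (most significant first) into a decimal sector number.
lba_from_hex() {
    echo $(( 0x$1$2$3$4 ))
}

# sector_to_bytes SECTOR
# Byte offset of a sector from the start of the drive, assuming 512-byte sectors.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

lba_from_hex 32 b8 ac bb      # prints 850963643
sector_to_bytes 850963643     # prints 435693385216
```

With 1465149168 sectors on the drive, that sector sits a bit under 60% of the way in. Note that it is counted from the start of the whole disk (sda), not from the start of any one partition.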

What I'm wondering now is whether a tool like fsck can fix this, or what might come out if I did a bit-for-bit copy via dd to another drive.

Any advice is appreciated. Thanks again.

Last edited by rockdiablo; 09-08-2010 at 12:07 PM.
 
09-08-2010, 01:37 PM   #6
xeleema
Greetingz!

Well, depending on where (what sectors) the drive is bad, this could be causing the kernel to freak out.
See if you have the "libata" module loading ("lsmod | grep libata"). If so, then you'll have to insert "options libata noacpi=1" somewhere.
I can't remember the specifics for Fedora Core 11, so look to see if you have an /etc/modprobe.conf or /etc/modprobe.d/options file.
That *might* help.

The "fsck" command is for damaged filesystems. If you have data corruption (like we suspect), then you're going to need to do that "dd" test first.

Have to make sure the drive itself is fine.

By the way, this wouldn't happen to be a 750GB IBM DeskStar, would it?

Last edited by xeleema; 09-08-2010 at 01:38 PM.
 
09-08-2010, 01:49 PM   #7
H_TeXMeX_H
If you want to recover any files from that drive, then boot a live CD and use ddrescue to clone the drive to another drive. Then you can use some recovery programs to recover files. Or, if you don't have a large enough drive, you can just try to recover files to another smaller drive right away, but the dying HDD may die completely in the process.
 
09-09-2010, 08:16 PM   #8
rockdiablo
Quote:
Well, depending on where (what sectors) the drive is bad, this could be causing the kernel to freak out.
See if you have the "libata" module loading ("lsmod | grep libata"). If so, then you'll have to insert "options libata noacpi=1" somewhere.
I can't remember the specifics for Fedora Core 11, so look to see if you have an /etc/modprobe.conf or /etc/modprobe.d/options file.
That *might* help.
Thanks, I will try this. I've been trying to figure out what to do in the worst-case scenario. I haven't wanted to start it up again in case it's dying quickly.


Quote:
By the way, this wouldn't happen to be a 750GB IBM DeskStar, would it?
I think it's a Western Digital.

I've looked into ddrescue and that seems like the way to go.

Quote:
if you have data corruption (like we suspect), then you're going to need to do that "dd" test first.
Would you still suggest running the "dd" test before running ddrescue? I am only hesitant because if it is failing quickly, I don't want to run it too hard before I try to retrieve the data.

I might be getting a little off topic here, but you guys seem to know what you're talking about, so I figured I'd run my plan of attack by you.

Also wondering if you could recommend a new 1 TB SATA HDD capable of working in a RAID 1 Array (want to avoid this problem again).


So I guess what I'll do is:

1. Disconnect the current failing HDD
2. Connect the new 1 TB HDD
3. Install RHEL 5.2 fresh on the new HDD and create an extra ~400GB ext3 /rescue partition.
4. Shut down, reconnect the failing HDD
5. Restart from a live FC11 USB, install ddrescue
6. Clone the partition with something like this (haven't figured out exactly what yet)
Code:
    # -f (--force) is needed when the output is a disk or partition rather than a regular file
    $ ddrescue -f -n /dev/sdax /dev/sdbx rescue.log
    $ ddrescue -f -r 3 /dev/sdax /dev/sdbx rescue.log
7. Restart back into the fresh install and run something like this (haven't figured out exactly what yet)
Code:
    $ fsck -v -f /dev/sdbx
    $ mount -t ext3 -o ro /dev/sdbx /mnt
8. Copy all relevant data (or entire filesystem) to a remote machine via scp (or dd)
9. Disconnect Failed drive
10. Connect second 1TB drive and re-install RHEL 5.2 with RAID 1.
11. Copy relevant files (filesystem) back from remote machine via scp (or dd)

I'm wondering if I need to do this from a live environment at all if I'm already going to install the temporary RHEL 5.2 onto the new drive; I could then skip step 5. From what I've seen, ddrescue can take a good amount of time (days), and it seems that if it fails, it needs to read from a logfile to resume. Without persistent storage on the FC11 USB, it seems like it might be better to go from the HDD install, if that's possible.

This may be the wrong way to go. The drive has Windows partitions on it and I wouldn't mind saving the entire thing, though I'm much more interested in what's on the Linux partition. If I were to use ddrescue to just clone the entire drive, and if I were lucky enough that the problem were just a few bad sectors at the beginning of the / partition, would I have a bootable machine (after running some repair tools on the new drive)?

Sorry for the lengthy post, but I'm obviously out of my element here, so any advice is appreciated.

Thanks again for your help.

Last edited by rockdiablo; 09-09-2010 at 08:38 PM.
 
  

