LinuxQuestions.org
01-25-2008, 02:14 AM   #1
paju
LQ Newbie
Registered: Jan 2005
Posts: 17

A8V + PATA + SATA: data corruption and unstable!


Hi,

Hopefully someone can give me some quick info and preferably a solution.

The situation is as follows:
-A8V mobo
-X2 4400+
-ATI RV280 gfx (9200 or something)
-1x 250GB PATA
-2x 500GB SATA

Installed PCLinuxOS 2007. It was unstable, and I thought that the problem was the distro itself.

Installed openSUSE 10.3. More stable, but it suddenly crashed (froze completely) without any warning or log messages. I removed OpenGL from xorg.conf to check whether that was the reason (it used to be with Fedora Core 4). It has not crashed since then, but I haven't been using YaST much either, and the crashes usually happened when using it. Crashes happened quite often (I've never had such an unstable system before). Edit: crashes happened both when only the SATA drives were connected and when the PATA and SATA drives were connected, so the PATA connection made no difference to stability in that respect.

But the worst thing happened yesterday. I mounted the PATA drive to copy data from the old server installation to the new one (SATA). A couple of small files were copied without problems. Then I started copying all emails (cp -a) and all hell broke loose. Quite a lot of errors went to the log (I don't recall exactly what they were and I'm not at home now to look at them, but if I remember correctly there were indications of timing issues, waiting etc.) and, worst of all, it broke the ext3 filesystem (even though I was only copying files!). Many of the folders on the drive turned into '????' marks.

I tried to boot the old distro to check that everything was still OK, but it failed to boot. I ran fsck but it was not able to fix everything, and I still could not boot the old distro. So something is really badly broken. I haven't had time yet to check the full damage, i.e. whether all data is lost or some of it can be recovered (I'm going to put the drive in another PC with a live distro and check how it works).

The differences between the old setup and the new one are: the kernel supports dual cores (in FC4 I had only one core in use) and there are now two SATA drives in addition to the single PATA drive.

What I'd like to know is whether this is a known problem with the A8V: a dual core problem, SATA problem, IRQ problem (when googling I found some issues regarding irqbalance in the 2.6.14 kernel) or some other HW problem. And if it is, is there a solution or should I get a new mobo (that's perfectly fine too as long as I get a stable system!).

Prompt help is greatly appreciated!

TIA!

Last edited by paju; 01-25-2008 at 02:38 AM.
 
01-25-2008, 09:05 AM   #2
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

Some further info from log:

Jan 24 23:22:03 xxx kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 24 23:22:03 xxx kernel: ata3.00: BMDMA stat 0x25
Jan 24 23:22:03 xxx kernel: ata3.00: cmd 25/00:08:25:82:49/00:00:19:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jan 24 23:22:03 xxx kernel: res 51/40:00:26:82:49/40:00:19:00:00/e0 Emask 0x9 (media error)
Jan 24 23:22:03 xxx kernel: ata3.00: configured for UDMA/100
Jan 24 23:22:03 xxx kernel: ata3: EH complete

The above was repeated 11 times. It finally ended with:

Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 24 23:22:46 xxx kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

At the next boot SMART gave the following logs:

Jan 25 00:42:43 xxx smartd[4219]: Device: /dev/sdc, 26 Currently unreadable (pending) sectors
Jan 25 00:42:43 xxx smartd[4219]: Sending warning via /usr/lib/smartmontools/smart-notify to root@localhost ...
Jan 25 00:42:43 xxx smartd[4219]: Warning via /usr/lib/smartmontools/smart-notify to root@localhost produced unexpected output (52 bytes) to STDOUT/STDERR: method return sender=:1.7 -> dest=:1.12 uint16 0
Jan 25 00:42:43 xxx smartd[4219]: Warning via /usr/lib/smartmontools/smart-notify to root@localhost: successful
Jan 25 00:42:43 xxx smartd[4219]: Device: /dev/sdc, 26 Offline uncorrectable sectors
Jan 25 00:42:43 xxx smartd[4219]: Sending warning via /usr/lib/smartmontools/smart-notify to root@localhost ...
Jan 25 00:42:43 xxx smartd[4219]: Warning via /usr/lib/smartmontools/smart-notify to root@localhost produced unexpected output (52 bytes) to STDOUT/STDERR: method return sender=:1.7 -> dest=:1.13 uint16 0
Jan 25 00:42:43 xxx smartd[4219]: Warning via /usr/lib/smartmontools/smart-notify to root@localhost: successful
Jan 25 00:42:43 xxx smartd[4353]: smartd has fork()ed into background mode. New PID=4353.
...
Jan 25 01:12:45 xxx smartd[4353]: Device: /dev/sdc, 26 Currently unreadable (pending) sectors
Jan 25 01:12:45 xxx smartd[4353]: Device: /dev/sdc, 26 Offline uncorrectable sectors
Jan 25 01:12:45 xxx smartd[4353]: Device: /dev/sdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 48 to 49
Jan 25 01:12:45 xxx smartd[4353]: Device: /dev/sdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 33
Jan 25 01:12:45 xxx smartd[4353]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 48 to 49
...
Jan 25 02:12:43 xxx smartd[4353]: Device: /dev/sda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 75 to 74
Jan 25 02:12:49 xxx smartd[4353]: Device: /dev/sdc, 26 Currently unreadable (pending) sectors
Jan 25 02:12:49 xxx smartd[4353]: Device: /dev/sdc, 26 Offline uncorrectable sectors
Jan 25 02:42:50 xxx smartd[4353]: Device: /dev/sdc, 26 Currently unreadable (pending) sectors
Jan 25 02:42:50 xxx smartd[4353]: Device: /dev/sdc, 26 Offline uncorrectable sectors
Jan 25 02:42:50 xxx smartd[4353]: Device: /dev/sdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 49 to 48
Jan 25 02:42:50 xxx smartd[4353]: Device: /dev/sdc, SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 32
Jan 25 02:42:50 xxx smartd[4353]: Device: /dev/sdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 49 to 48

The above indicates problems on the disk itself. However, I don't think that is what happened. I've been using the disk without a single failure (or even a hint of problems), and there have been no temperature problems either. The next step will be to check the content of the disk on another PC and see if something can be rescued.

But the question still remains: are there problems with the A8V mobo regarding PATA & SATA, or SATA alone (I don't know), and/or dual core?
 
01-25-2008, 09:35 AM   #3
kilgoretrout
Senior Member
Registered: Oct 2003
Posts: 2,988

The newer kernels have rolled ide/ata support into libata. You can tell if your kernel is doing this if your standard ide/pata drives are designated with sdx instead of the older hdx designation.

I have two pata hard drives on a Promise Ultra 100 controller card along with two sata drives on the onboard sata controllers (intel 965 motherboard) and have experienced problems identical to those you described on every kernel/distro that designates those ide drives as sdx. In short, it's a known kernel bug, at least with the chipset used by Promise and perhaps other ide controller chipsets. You can tell if you are having a similar problem by checking the transfer and read speeds on the ide hard drives using hdparm:

# hdparm -t /dev/sdx

On my system, I get incredibly slow speeds with my pata drives, about 3MB/sec. The sata drives function at normal speed. The odd thing is that right after booting up the ide drives appear to function normally at about 30MB/sec. However, after a few reads or writes to those ide drives, the speed drops dramatically to 3MB/sec and I start to get disk errors.

The only recent distro that works normally with these pata drives and the Promise controller is mandriva 2008. Mandriva appears to be the only recently released distro that has not enabled pata support on libata in their kernel. Instead, they have stuck with the legacy support for pata hard drives and both my pata drives are designated with the hdx notation in mandriva.

Last edited by kilgoretrout; 01-25-2008 at 09:40 AM.
 
01-25-2008, 10:14 AM   #4
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

That sounds like the cause. Result from the test:

/dev/sdc:
Timing buffered disk reads: read(2097152) returned 299008 bytes

Is there any solution to the problem other than Mandriva? I'm not very keen on installing a new distro again (it's easy to install the base distro, but the configuration part takes time as my Linux box acts as a server but also as a workstation).

Thanks for this anyway!
 
01-25-2008, 10:40 AM   #5
kilgoretrout
Senior Member
Registered: Oct 2003
Posts: 2,988

You would have to research it, but IIRC what you need to do is compile a custom kernel with the legacy pata support enabled. I believe that's how mandriva does it with their kernel. I'm not sure exactly which options to select in order to do that, but it would probably be easier/quicker to just install mandriva 2008. If you prefer, Slackware 12 has also stuck with the legacy pata support; my ide hard drives are designated hdx there and work fine.
You could also try contacting suse and see if they have an alternative precompiled kernel that will do this for you.
Other than that, if you just want to copy some data off your pata drives onto your sata drives, you could stick the pata drive(s) in a usb enclosure and copy the data off that way.
 
01-25-2008, 10:47 AM   #6
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

How come USB would solve this? Does it handle ATA drives in a completely different way then?

I was planning to use both SATA and PATA drives but it seems that I have to let go of this idea. Which leads to yet another partitioning session..

But I'll check what SuSE has to offer regarding the problem.
 
01-25-2008, 11:54 AM   #7
kilgoretrout
Senior Member
Registered: Oct 2003
Posts: 2,988

The problem is with the way libata handles various ide controllers. If you put the pata drive in a usb enclosure, that problem goes away as now everything is controlled by the usb_storage module on the linux end and the ide to usb chip/firmware in the enclosure. I've never had any problems with pata hard drives in usb enclosures with the new kernels.
 
01-25-2008, 02:38 PM   #8
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

I'm running Mandriva live now and it seems that I can rescue at least some of the files. Not all, but many anyway. Maybe I'll survive after all :-)
 
01-26-2008, 02:01 AM   #9
kilgoretrout
Senior Member
Registered: Oct 2003
Posts: 2,988

Good luck to you paju; I wish you well. May the gods smile upon your efforts.
 
02-07-2008, 01:59 PM   #10
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

Just to inform you: I managed to save a few hundred MBs. The superblock is lost and I haven't yet had time to check whether it can be recovered manually (I tried a couple of Windows tools but both failed). Currently somewhat over 100GB of data is lost. Not so nice..
 
02-07-2008, 03:20 PM   #11
kilgoretrout
Senior Member
Registered: Oct 2003
Posts: 2,988

Have you tried repairing the ext3 filesystem using e2fsck with some alternative superblock backup locations:

# e2fsck -fp -b <alternate superblock location> <device file>

The location of the superblock backups is determined by the block size, which is usually set to 4096 bytes or 4KB. For this block size the superblock backups are:

32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

Actually, those are the first blocks of the block groups that hold the superblock backups. With a 4KB block size the first data block is block 0, so the numbers can be used as they are (only filesystems with a 1KB block size are shifted by one, e.g. 8193), eg:

# e2fsck -fp -b 32768 <device file>

or:

# e2fsck -fp -b 98304 <device file>

etc. Also, make sure the partition is not mounted when you run e2fsck on it. The "p" switch in e2fsck will cause e2fsck to automatically repair the filesystem without prompting. If you don't want that, just drop the "p" switch from the above.
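
For reference, the list of backup locations can be reproduced with a small script. This is only a sketch, assuming a filesystem made with default mke2fs settings (sparse_super enabled, 8 * block size blocks per group), where the backups live in block groups 1 and powers of 3, 5 and 7; the function name is made up for illustration. The e2fsck man page gives 8193 as the first backup for 1KB blocks and 32768 for 4KB blocks, so with 4KB blocks the printed numbers are the values to hand to -b.

```sh
# Print candidate backup superblock locations (in filesystem blocks).
# Arguments: block size in bytes, highest block group number to consider.
# Assumes sparse_super and the default 8 * block_size blocks per group.
backup_superblocks() {
    block_size=$1
    max_group=$2
    blocks_per_group=$((block_size * 8))
    # The first data block is 1 for 1 KB blocks and 0 for larger block sizes.
    if [ "$block_size" -eq 1024 ]; then first=1; else first=0; fi
    groups="1"
    for base in 3 5 7; do
        g=$base
        while [ "$g" -le "$max_group" ]; do
            groups="$groups $g"
            g=$((g * base))
        done
    done
    # Sort the group numbers and convert each one to a block number.
    for g in $(printf '%s\n' $groups | sort -n); do
        if [ "$g" -le "$max_group" ]; then
            echo $((g * blocks_per_group + first))
        fi
    done
}

# 4 KB blocks, groups up to 81: the nine locations from 32768 to 2654208
backup_superblocks 4096 81
```

If the block size of the filesystem is unknown, running the function for 1024, 2048 and 4096 gives a candidate list to try one by one.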
 
02-11-2008, 03:00 PM   #12
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

Thanks for the tip but that did not work (can't read anything/much from the disk). I'm now playing with dd but that also failed. ddrescue will be my next bet :-)
 
02-18-2008, 03:24 PM   #13
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

Some help needed. According to ddrescue I managed to copy all but 163kB of the data from the HD (the errors are distributed over 255 places in very small chunks). However, e2fsck complains that it cannot find the superblock. None of the above addresses worked. Very strange. So, any other tips on how to get the image working?

The HD is pretty much "kaput" now. It is very difficult to start the PC with it connected. It is no longer seen as /dev/hda when connected on the fly during booting.

edit: it seems that the image file contains parts that are not filled. When using the od tool there are several lines starting with * (asterisk). How to fill these..

edit2: managed to "fill" those bad blocks but od still produces lines starting with *. I cannot check or mount the image. On the other hand it may not be possible just like that: there is a boot partition, then swap, and after that root (/), which fills the rest of the image. But how to check the exact positions of the partitions from the image?

Last edited by paju; 02-18-2008 at 04:00 PM.
 
02-24-2008, 05:48 AM   #14
paju
LQ Newbie
Registered: Jan 2005
Posts: 17
Original Poster

Now I'm finally able to get data out of the corrupted HD. This is a short guide for those who face the same problem in the future and don't have their backups in order.

To make sure that nothing happens to the working HD(s), disconnect them and use a live distro (this is probably not mandatory, but to be on the safe side I disconnected the other HDs). I used Mandriva 2008, which was quite nice for the task: easy to use, easy to get an NFS mount for a network drive, and it worked pretty well in general (the only problem was the gfx card: it was not possible to change the resolution).

1) Do not mount the broken HD read-write! Each access to a file (even a copy) causes a write operation to the HD, which breaks the HD further. If the HD is recognized by the distro at boot, it will be mounted read-write. Remount it read-only or simply unmount it. The drive doesn't have to be mounted. It just needs to be present as e.g. /dev/hda.

2) If it is possible to mount, then take the partition table information from the HD and save it for later use. If not, then don't panic :-)

3) If the drive is in bad condition and does not allow Linux to boot, then try connecting the HD during boot. Let the distro get through most of the initialization and services, and by the time it reaches the HAL daemon plug the power supply into the HD. This way the HD is used quite little during boot.

4) dd might be sufficient for the task, but in my case it turned out not to be. An HD in bad condition, i.e. one producing a lot of read errors (bad HW blocks etc.), changes its behaviour at some point during reading (this may depend on the HD manufacturer; in my case it was a Seagate 250GB IDE drive). The heads go to the park position and every command sent to the drive is answered with an error. No further data can be read from the drive until it has been powered down (remounting does not help because this happens at a very low level in the drive's firmware).

A very good tool for the task turned out to be ddrescue (v1.7 was used in this case): http://www.gnu.org/software/ddrescue/ddrescue.html. It has advanced functionality for getting most of the data off the HD. It keeps a log file, which allows the operation to be resumed after the HD changes its behaviour and the computer has to be rebooted. It can also split the damaged areas into smaller blocks and thus get all the valid data, leaving only a small fraction broken. There are many other features in this tool so I suggest reading the documentation before use.
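
A typical ddrescue run is done in two passes that share one log file, roughly as in the example from the ddrescue manual. The sketch below only prints the commands so they can be reviewed first; the function name and the file names hda.img and hda.log are assumptions for illustration, and the exact meaning of the flags differs between ddrescue versions, so check the documentation of the installed one.

```sh
# Print the two ddrescue passes for a source device, image file and log file.
# Nothing is executed here; copy the lines and run them as root once reviewed.
rescue_passes() {
    src=$1
    img=$2
    log=$3
    # Pass 1: grab everything that reads cleanly, without splitting bad areas (-n).
    echo "ddrescue -n $src $img $log"
    # Pass 2: return to the bad areas using direct disc access (-d), 3 retries (-r3).
    echo "ddrescue -d -r3 $src $img $log"
}

rescue_passes /dev/hda hda.img hda.log
```

Because both passes use the same log file, the same command can simply be repeated after the drive has locked up and the machine has been power-cycled; ddrescue continues from what the log says is still missing.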

If there are multiple partitions on the disk, then try to rescue only the specific partition to the image, unless you want to write the image to another HD or mount it using Windows tools. An image with multiple partitions and data corruption can be impossible to mount under Linux. I simply recovered the full image including the /boot, swap and / partitions, and this caused a lot of problems when mounting the image.

In my case getting the image of the HD took around a week. I made a mistake at the beginning by mounting the HD read-write, which caused a lot of problems. Also, every time the image was updated with more valid data the HD got into worse and worse condition. Now it is in such poor condition that it is no longer recognized if it is connected after the HAL daemon has started. It is not always recognized either when it is connected from the beginning of the boot (which takes a long time with the broken HD connected).

5) After ddrescue the image is ready for further use. It does not matter if the image does not contain all the data. Most likely it contains enough data to get something out. Usually there is more than a single partition on the drive, so the image cannot be mounted directly. If there is only a single partition, the image can be mounted directly via a loopback device (-o loop). It might be a good idea to run e2fsck on the image first.

If there are multiple partitions, the partition needs to be mounted using an offset. For this the partition table info is needed. If you managed to save it at the beginning, then calculate the byte offset (sector size * partition's start sector) and mount the partition.
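
The offset arithmetic can be sketched like this. The start sector 63 and the names hda.img and /mnt/rescue are only assumptions for illustration; take the real start sector from the saved partition table (for example from the output of fdisk -lu).

```sh
# Byte offset of a partition inside a whole-disk image:
# start sector * sector size (512 bytes unless the partition table says otherwise).
partition_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $((start_sector * sector_size))
}

# A first partition that starts at sector 63: 63 * 512 = 32256
partition_offset 63

# Hypothetical usage as root, read-only so that the image is not modified:
#   mount -o loop,ro,offset=$(partition_offset 63) hda.img /mnt/rescue
```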

If the partition table was not obtained or was not possible to get, then try the testdisk tool (can be found at least in the openSUSE 10.3 distro). With this tool it is possible to get the partition table from the image. It can also find superblock positions, which is vital information for getting the image mounted.

For further information about the mounting see the excellent info at http://edseek.com/~jasonb/articles/linux_loopback.html.

6) If there are problems with the superblock then it may not be possible to mount the image under Linux, as the mount tool does not support (or I'm blind) both an offset setting and backup superblock usage (giving a direct position for the backup superblock). The e2fsck tool doesn't support an offset setting, which makes it impossible to check the image if there are multiple partitions. It is therefore advisable to rescue partition by partition from the broken HD (this I learned after the full image was rescued). However, if the HD is in bad shape it might be necessary to simply get everything out and check the image later.
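
A possible way around both limits, not tried on this image: the mount man page documents an sb= option for ext2/ext3 that takes the superblock position in 1KB units, and losetup accepts -o with a byte offset, after which e2fsck can be pointed at the loop device instead of the image file. The sketch below only does the unit conversion; the function name, hda.img, OFFSET, /dev/loop0 and /mnt/rescue are placeholders, and whether this works depends on how badly the filesystem is damaged.

```sh
# Convert a backup superblock number (in filesystem blocks) into the
# 1 KB units that mount's sb= option expects.
sb_in_1k_units() {
    block=$1
    block_size=$2
    echo $((block * block_size / 1024))
}

# Backup superblock 32768 on a filesystem with 4 KB blocks: sb=131072
sb_in_1k_units 32768 4096

# Hypothetical usage as root (OFFSET is the partition's byte offset in the image):
#   mount -t ext3 -o loop,ro,offset=OFFSET,sb=131072 hda.img /mnt/rescue
# or attach the partition to a loop device and check it there without writing (-n):
#   losetup -o OFFSET /dev/loop0 hda.img
#   e2fsck -n -b 32768 /dev/loop0
```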

To mount an HD image which has broken superblocks I had to turn to Windows tools. For this I used Mount Image Pro. The trial version is fully functional for 30 days, which is probably sufficient. It can mount dd images, which is also the format that ddrescue produces. Naturally, mounting the image is not enough, as Windows has no idea what to do with an ext2/ext3 partition. It actually suggests formatting the partition when you try to open the mounted drive.

For recovering files from the mounted image I used the Stellar Phoenix Linux tool. This is a commercial tool but does not cost that much, especially if there is important data on the broken HD. It is able to scan the mounted image and show not only the directory tree but also the correct filenames, dates etc. It can take a long time to get the file structure out of the image (in my case 187GB of data, 498 000+ files and 68 000+ directories), so I recommend saving the scan information so that it can be used later to speed up opening the image. The reason for reopening the image is the instability of the tool set. I'm not sure where the problem is or what causes it, but I had several blue screens during the recovery of files, even while scanning the directory tree (I had to close tree nodes on the fly to avoid a blue screen, which hints at graphics card driver issues, but this is a long shot). The crashes might also come from the broken image, but I'm not sure.


At the time of writing I still have over 100GB of data to recover, but I'm confident that most (if not all) of it can be fully recovered. At least several hundred MBs of data got corrupted, but luckily nothing important. And several hundred MBs is nothing compared to the total of 187GB of data.

Lastly I'd like to thank kilgoretrout for the help! Even though those tips did not provide the final solution, they helped me get on the right track to reach it.

Hopefully this short guide can help someone with a similar problem. The above information can also be used for formats other than ext2/ext3. ddrescue doesn't care about the format of the HD/partition, so it should be possible to rescue any HD to an image.
 
  

