badblocks trouble

gsgleason · 03-01-2005, 10:05 AM

I posted this in the Slackware section, but this forum seems more appropriate, so here goes.

I am running Slackware 10.0 on a machine using software RAID 0 and 1 for various partitions, and it's been pretty good so far.

Here's my most recent issue.

I installed a new 250GB Maxtor IDE HDD as secondary master (on the same IDE chain as my CD-ROM), on which I created one big primary partition with an ext3 filesystem (I had tried ext2 with the same result).

When I used the -c option with mkfs.ext3 (or ext2) to check for bad blocks, the thing took forever!! It ran for 4 days before I cancelled the command with ^C.

So I removed and recreated the partition with fdisk /dev/hdc (I made one big primary partition).

I then made an ext3 filesystem without checking for bad blocks. That worked.

Then I ran e2fsck -v -c -p /dev/hdc1 to check the filesystem for bad blocks.

It's been sitting there for 2 days with no output (I thought -v would give me something) and a solid HD activity LED.

help!

Matir · 03-01-2005, 11:01 AM

Is "dmesg" showing anything? Also, with a drive this large, a bad block check could take a while, though days seems excessive.

jiml8 · 03-01-2005, 12:28 PM

It shouldn't take days under any circumstances, though with a drive that size you should expect to spend several hours on it.

What does your processor utilization look like? Is e2fsck running, or is it sleeping?

What brand is the drive? Have you tried the manufacturer's utilities for testing it?

Can you do ordinary disk I/O to it?
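The "several hours" figure above can be sanity-checked with simple arithmetic: a read-only scan has to read the whole disk once, so the time is roughly capacity divided by sustained read speed. The sketch below assumes decimal gigabytes and a sustained rate of 40 MB/s, which is only a guess for an IDE drive of this era; `estimate_scan_minutes` is a made-up helper name.

```shell
#!/bin/sh
# Rough lower bound for a single read-only pass over a disk.
# $1 = capacity in GB (decimal), $2 = ASSUMED sustained read rate in MB/s.
estimate_scan_minutes() {
    size_gb=$1
    rate_mb_s=$2
    # GB -> MB, divide by rate to get seconds, then convert to whole minutes.
    echo $(( size_gb * 1000 / rate_mb_s / 60 ))
}

# 250 GB at an assumed 40 MB/s
estimate_scan_minutes 250 40
```

Even if the real rate were several times lower than assumed, a healthy drive would finish well inside a day, so a scan that has run for days points to retries on failing reads rather than plain slowness.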

gsgleason · 03-01-2005, 12:46 PM

root 2022 1183 0 Feb26 tty1 00:00:00 e2fsck -v -c -p /dev/hdc1
root 2023 2022 0 Feb26 tty1 00:02:10 badblocks -b 4096 /dev/hdc1 6127

that's what I see for processes running.

I don't know how to look at processor utilization. The drive has had files on it. I had it mounted at /music with all my mp3 files on there. I tried to make a user with a home of /music, and it stopped working after that, so I decided to start over on it.

It is a Maxtor 7Y250P0 drive. I'm not aware of any manufacturer's tools for testing it.
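One way to answer the "running or sleeping" question is to look at the process state letter that `ps` reports in its STAT column. The listing above already hints at the answer: badblocks had used only 00:02:10 of CPU time since Feb26. The sketch below maps the first STAT letter to a plain description; `describe_state` is a hypothetical helper, and the PID in the usage comment is the one from the listing above.

```shell
#!/bin/sh
# Translate the first letter of a ps STAT code into a short description.
describe_state() {
    case $1 in
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        D*) echo "uninterruptible sleep, usually waiting on disk I/O" ;;
        T*) echo "stopped" ;;
        Z*) echo "zombie" ;;
        *)  echo "unknown" ;;
    esac
}

# On a live system (2023 is the badblocks PID from the listing above):
#   describe_state "$(ps -o stat= -p 2023)"
describe_state D
```

A process that sits in state D for long stretches while using almost no CPU is usually blocked on the device, which fits a drive that keeps retrying unreadable sectors. Running `top` gives the same state letter plus a live CPU percentage.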

Here's the output of dmesg:

root@slackbox:~# dmesg|more
16:01 (hdc), sector 364988544
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=364988609, high=21, low=12667073, sector=364988546
end_request: I/O error, dev 16:01 (hdc), sector 364988546
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=364988611, high=21, low=12667075, sector=364988548
end_request: I/O error, dev 16:01 (hdc), sector 364988548
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=364988613, high=21, low=12667077, sector=364988550
end_request: I/O error, dev 16:01 (hdc), sector 364988550
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=364988615, high=21, low=12667079, sector=364988552
end_request: I/O error, dev 16:01 (hdc), sector 364988552
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=364988618, high=21, low=12667082, sector=364988554
end_request: I/O error, dev 16:01 (hdc), sector 364988554
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }

blah blah blah

it goes on...

gsgleason · 03-01-2005, 01:04 PM

It keeps adding similar lines, one for every other sector.
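To see how far the errors have progressed, the failing sector numbers can be pulled out of the kernel log and converted to the 4096-byte blocks that badblocks was started with (`-b 4096`). This is a sketch under two assumptions: that the `sector` value in the `end_request` lines counts 512-byte sectors, and that it is relative to the start of /dev/hdc1. The function names and the `sample` variable are made up for illustration; the sample lines are copied from the dmesg output above.

```shell
#!/bin/sh
# Print the sector number from every "end_request: I/O error" line on stdin.
failed_sectors() {
    sed -n 's/.*end_request: I\/O error, dev .*, sector \([0-9][0-9]*\).*/\1/p'
}

# Convert a 512-byte sector number to a 4096-byte block number (8 sectors per block).
sector_to_block() {
    echo $(( $1 / 8 ))
}

# Two error lines and one status line taken from the dmesg output in this thread.
sample='end_request: I/O error, dev 16:01 (hdc), sector 364988546
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
end_request: I/O error, dev 16:01 (hdc), sector 364988548'

printf '%s\n' "$sample" | failed_sectors | while read -r s; do
    echo "sector $s -> block $(sector_to_block "$s")"
done
```

On a live system the input would come from `dmesg | failed_sectors | sort -n | uniq`. Under the same 512-byte assumption, sector 364988546 sits roughly 187 GB into the partition, so the scan had already covered most of the disk when it hit the bad area.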

Matir · 03-01-2005, 03:28 PM

With messages in dmesg like that, it looks like some sort of hardware issue: either a bad drive, bad controller, or even a bad cable. Given that the drive is brand new, I'd lean towards cabling or IDE controller. Though if the CD-ROM works, it could be the drive. How long is the cable? 18" is the limit for ATA.

jiml8 · 03-01-2005, 04:36 PM

I agree. Definite hardware problem.

The first thing I would check is to make sure the jumpers are correct. You do have one drive jumpered as master and the other as slave, right? Ordinarily the CDROM would be the slave. Alternatively, you do have a cable select cable in place, right?

After assuring myself that the jumpers/cable were correct, the next thing I would do is take a good close look into the connector shell. Any mashed pins? This one is easy to do, and can be fixed with some patience.

Once I was past these silly ones, then I would be taking a look at the HD or controller. I would, pro forma, change the cable anyway.

Matir · 03-01-2005, 07:45 PM

You might also want to consider using the hard drive on another computer to check if it gets the same results. This can help rule out cable and controller issues. If it's so new, be glad that it's still under warranty. I had a Seagate 120 GB SATA hard drive die on me recently: it was a mere 6 months old.

gsgleason · 03-01-2005, 08:41 PM

I did notice that the BIOS took a while to recognize the drive. In fact, I originally had it as secondary slave, and the BIOS wouldn't see it. I switched it to master and set the CD-ROM to slave, and it saw it then.

I just set it to single, removed the IDE cable from the CD-ROM, and am running the bad block check again. We'll see how it goes. If it's not done by tomorrow, I'll put it on my PCI ATA controller card and try it there. I swapped the cable, too, to be sure.
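For a rerun like this, one option is to call badblocks directly instead of going through e2fsck, so that progress is visible. The earlier process listing shows badblocks running with `-b 4096` but without `-s`, which is the flag that prints progress, and that would explain the silent screen. A minimal sketch, assuming the partition is unmounted and a read-only scan is wanted; `badblocks_cmd` is a hypothetical helper that only prints the command line rather than running it:

```shell
#!/bin/sh
# Build (but do not run) a read-only badblocks invocation with progress output.
#   -s  show progress while scanning
#   -v  verbose: report the number of bad blocks found
#   -b  block size in bytes (4096 matches the earlier e2fsck-spawned run)
badblocks_cmd() {
    dev=$1
    printf 'badblocks -sv -b 4096 %s\n' "$dev"
}

# Print the command for the partition discussed in this thread.
badblocks_cmd /dev/hdc1
```

Printing the command first makes it easy to double-check the device name before running it as root. Watching `dmesg` in a second terminal while the scan runs shows whether new I/O errors are still appearing.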

gsgleason · 03-01-2005, 09:16 PM

Thanks for the help. That dmesg thing helps a lot. I'll let you know how it goes.

jiml8 · 03-01-2005, 09:32 PM

Quote:

I had a Seagate 120 GB SATA hard drive die on me recently: it was a mere 6 months old.

I used to say that I had never had a seacrate product that didn't fail. But then I got a good buy on 2 50 Gig 7200 RPM 40 MB/Sec SCSI Barracuda drives and, even though they were seacrate, I couldn't pass on the price. I had never owned a seacrate SCSI drive; my experience with their IDEs and their tape drives was sufficient to turn me off totally to them, but some people I trust had told me their SCSI drives were OK.

I bought them intending to use them basically for scratchpad and non-critical storage, and that is how I am using them.

These drives - on every startup - report to the controller that they are approaching end-of-life due to accumulated failures. This happened on first startup when the drives were brand new (and they were sold as new surplus). When I first saw that, my immediate response was: "Huh. Seacrate. Should'a known."

So I exercised the hell out of them for days before putting them in service. I pulled the bad blocks list. I ran every diagnostic there was on them. Those drives have no errors, a small bad blocks list, and no evidence of any problems at all. So, I conclude the report of imminent failure is a firmware bug. Of course, such a bug naturally reflects very badly on seacrate.

Those drives have both been running 24/7 for a year now and both work fine.

So I have to say that these are literally the very first Seacrate products I have ever had that didn't fail quickly, and they keep insisting they are going to fail - which, because they are seacrate, is pretty humorous.

Moral of the story? Don't buy seacrate, except SCSI. If you must have IDE or SATA, then Western Digital and Maxtor both seem to have good reps. IBM Ultrastar is a good drive; IBM Deskstar is a worse POS than seacrates - a definite "avoid". The best IDE drive I have ever had is a little Fujitsu that I put into service in 1998 and it is still running. Every other IDE I have ever had has failed after just a few years at most.

Matir · 03-01-2005, 09:40 PM

I'm a big fan of Maxtors these days. In fact, virtually all of my boxes now run Maxtors (except the RMAed Seagate).

gsgleason · 03-03-2005, 09:34 PM

I tried the same drive on my PCI ATA controller card and got the same result.

So, I just happen to have a couple more Maxtor 250GB drives sitting around. I swapped one in and ran the same test. No problems. It was clear to me that the drive was FUBAR, so I went on Maxtor's website to get an RMA.

Part of the process involved using an application called "powermax" - it's a win32 application which creates a bootable floppy with a disk testing utility.

The idea is to run the first test, which is the "connection" test, then a 90 second test, and so on. The third test, which was the read-only test, failed and gave me an 8 digit failure code, which I used on the Maxtor website to get my RMA.

So all is well. Thank you to everyone who helped! I learned some new commands which will help me out! I appreciate it!

-Greg