LinuxQuestions.org


Israfel2000 02-04-2010 09:09 PM

Bad Sectors In HDD
 
I probably should not have posted this, since so many people have similar situations and probably get the same results, but...

Ok. I did a lot of research into this subject and it left me with more questions than answers. I just wanted to see if you guys can help me make the right decision before I go buying another HDD (even though I already bought TWO HDDs).

Here is the thing...

I have a Compaq CQ5110F desktop with Fedora 11 Linux installed. The program "Palimpsest Disk Utility" detected bad sectors on the Seagate HDD. So I tested it with the "Diagnostics Tools" by pressing F9 when I reboot the computer. It gives me this error:

Test: Failed

Error Code: BIOHD-8

Then I scanned it with the SMART test in the BIOS (pressing F10):

Name of HDD: Seagate Barracuda 7200
Capacity: 320GB
Target Disk: ST3320418AS
SMART Short Self-Test:
Estimated Time: 2 minutes
Completed without errors: Passed Test in less than a minute

SMART Extended Self-Test:
Estimated Time: 74 minutes
Completed without errors: Passed Test in less than 5 seconds

I get NO errors.

After I used it for a while, the HDD crashed after the system installed updates. Everything else boots up alright, but I can't get to the login screen. I re-installed, but it did the same thing after a while. So I bought another drive, this time a 1TB Hitachi.

This one didn't last me a week before Palimpsest reported bad sectors on it too.

I scanned it with the "Diagnostics Tools" (pressing F9):

Test: Passed

Great! It passed the scan but it made loud sounds during the scan.

Now for SMART in the BIOS (pressing F10):

Name of HDD: Hitachi HDE721
Capacity: 1000GB
Target Disk: HDE721010SLA330
SMART Short Self-Test:
Estimated Time: 1 minute
Completed without errors: Passed in less than one minute.

SMART Extended Self-Test:
Estimated Time: 235 minutes
*** Last time I scanned this HDD it took over 5 hours and I had to cancel the scan


Ok. Other than the loud sounds coming from the Hitachi HDD, I want to know whether the difference between the estimated time and the actual time of the BIOS (SMART) scans has anything to do with the two HDDs not working properly. You know, one scan taking much longer or much shorter than the estimated time? Is it even scanning the drive sector by sector?

I know I can't depend on Palimpsest because somewhere in this forum it says that it has bugs and is too sensitive and picks up the smallest errors.
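One way to cross-check Palimpsest is to read the drive's SMART attributes directly with smartctl from the smartmontools package. The sketch below is only an illustration: the helper name smart_raw is made up, /dev/sda is an assumed device name, and the parsing assumes the usual ATA attribute table printed by "smartctl -A", where the second column is the attribute name and the tenth is the raw value.

```shell
#!/bin/sh
# smart_raw NAME: print the raw value of one SMART attribute.
# Reads `smartctl -A` output on stdin. Assumes the usual ATA table layout:
# column 2 = attribute name, column 10 = raw value.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Typical use (needs root; /dev/sda is an assumption, adjust to your drive):
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
#   smartctl -t long /dev/sda       # start the drive's extended self-test
#   smartctl -l selftest /dev/sda   # read the self-test log once it finishes
```

A raw reallocated or pending sector count that keeps growing between checks is generally a stronger warning sign than a single notification.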

Is Fedora 11 so buggy that it eats up the HDD? Can I trust Fedora? Or should I just try another distro?

I can't just go out and buy another HDD. What if it's just a bug?

Thank you in advance.

jschiwal 02-04-2010 09:18 PM

Quote:

Great! It passed the scan but it made loud sounds during the scan.
It sounds to me like, while it passed this time, it had to do a number of rereads to pass. It may be heading south fast.

Israfel2000 02-04-2010 11:01 PM

:/

*Factory fault. Badly handled during shipping. Store clerk dropped it and gave it to me knowing it was the last one in stock.

These are the things that come to my mind knowing these things (HDDs) are too sensitive. Any type of shock can damage them.

jefro 02-05-2010 05:48 PM

Hard drives have areas go bad all the time. DOS just marks the area and moves on. It used to be that we would run the disks for a long time, until about 20 areas were bad. Then we would low-level format them and use them again. Norton used to make a utility that would shift the format a bit to try to clear that area.

If you want, you can try to get a low-level format tool and see how it works. Then spend 4 or 5 days running diags on the drive with random reads and writes and see how it does.

Or just chuck the thing and get that new shiny SSD you always wanted.

business_kid 02-06-2010 03:19 AM

Hard drives _do_ fail, but I am not happy to see two failing together. Possible causes would be heat, poor writing performance, or a dodgy motherboard. I had a box that would periodically lose logic level 0 on the IDE bus. I would save something - it would pause and say
hda - not mounted
hdb - not mounted
or similar nonsense. A three fingered salute would fail because I no longer had a root filesystem, so /sbin/shutdown could not run. Another box here threw 'disk failed' but miraculously stopped when I left the side off. The kernel could be done in by all these writing errors and simply be doing the wrong things. Had that too.

Edit: do the power supplies look good?
+12V +/- 1 Volt
+5V, +3.3V: +/- 0.2 Volts
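The tolerances above can be turned into a quick check. This is a minimal sketch: the function name in_tolerance is made up for illustration, and the readings have to come from somewhere you trust (the BIOS hardware monitor or a multimeter).

```shell
#!/bin/sh
# in_tolerance NOMINAL MEASURED TOLERANCE
# Exit status 0 if |MEASURED - NOMINAL| <= TOLERANCE, 1 otherwise.
# A tiny epsilon absorbs floating-point rounding at the exact boundary.
in_tolerance() {
    awk -v n="$1" -v m="$2" -v t="$3" 'BEGIN {
        d = m - n
        if (d < 0) d = -d
        exit !(d <= t + 1e-9)
    }'
}

# Usage with the limits quoted above (pass your own measured values):
#   in_tolerance 12  "$v12" 1   || echo "+12V rail out of range"
#   in_tolerance 5   "$v5"  0.2 || echo "+5V rail out of range"
#   in_tolerance 3.3 "$v33" 0.2 || echo "+3.3V rail out of range"
```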

Israfel2000 02-06-2010 10:44 AM

Well, the power supply seems to be ok. I'll have to take it to my local technician to make sure. I also have a battery backup in case of a power surge. The last computer I had literally burned out completely because I had no battery backup.
Some people tell me that it's common for some HDDs to make sounds while others are quiet, and that noisy ones don't necessarily go bad. I don't know what to believe or who to believe anymore.

I'll just have to use them until they truly break: until it becomes virtually impossible to write or read any data on the HDDs, or until the thing burns to a crisp. Take the juice out of them and get my money's worth. And keep backing up my data. Then buy a new one. :(

In any case I'll try a low level format tool. Thanks jefro.

jschiwal 02-06-2010 10:49 AM

You might try running badblocks to mark these blocks as bad. If you don't get any further failures, you may be OK.
You will probably want to monitor it for a time to make sure that the drive isn't degrading.

Use the badblocks option of e2fsck if you already have a filesystem on the disk.

Use the badblocks option of mke2fs for creating a new filesystem. Using -c twice will use a slower write/read test.

Maybe having a handful of bad blocks detected on a new drive is more the norm for modern 1 or 2 terabyte drives.

onebuck 02-07-2010 01:02 PM

Hi,

I suggest that you try the HDD manufacturer's diagnostic, not Compaq's.

'SystemRescueCd' & 'UBCD' are good LiveCD diagnostic tools that have hdd tools & utilities available.

The above links and others can be found at 'Slackware-Links'. More than just Slackware® links!

GoinEasy9 02-07-2010 01:20 PM

I've disabled disk notifications at startup; Palimpsest is too buggy to be taken seriously. It has nothing to do with Fedora. Go to System-->Preferences-->Startup Applications and uncheck Disk Notifications. I've been running Fedora for nine months since I first saw the Palimpsest notifications, and there is nothing wrong with my HDDs.
BTW - This bug/feature has been discussed many times in the fedoraforums also.

Israfel2000 02-07-2010 05:14 PM

Quote:

Originally Posted by jschiwal (Post 3854849)
You might try running badblocks to mark these blocks as bad. If you don't get any further failures, you may be OK.
You will probably want to monitor it for a time to make sure that the drive isn't degrading.

Use the badblocks option of e2fsck if you already have a filesystem on the disk.

Use the badblocks option of mke2fs for creating a new filesystem. Using -c twice will use a slower write/read test.

Maybe having a handful of bad blocks detected on a new drive is more the norm for modern 1 or 2 terabyte drives.


Ok. I ran e2fsck on my 320GB Seagate HDD. These are the results:

[root@localhost xxxxxxxxx]# e2fsck -n /dev/sda1
e2fsck 1.41.4 (27-Jan-2009)
Warning! /dev/sda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2340610, counted=2340605).
Fix? no

Free inodes count wrong (667329, counted=667326).
Fix? no


/dev/sda1: ********** WARNING: Filesystem still has errors **********

/dev/sda1: 121087/788416 files (0.2% non-contiguous), 808122/3148732 blocks
[root@localhost xxxxxxxxx]#



Now for the 1TB Hitachi HDD:

[root@master xxxxxxxxx]# e2fsck -n /dev/sda1
e2fsck 1.41.4 (27-Jan-2009)
Warning! /dev/sda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda1: clean, 42/51200 files, 28760/204800 blocks
[root@master xxxxxxxxx]#


For now, only the 320GB Seagate HDD has errors. The Hitachi 1TB HDD looks like it passed, unless I am doing this wrong (because I just started using e2fsck yesterday).
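Besides the printed messages, e2fsck also reports its result through the exit status, which is a sum of flag values documented in its man page. The sketch below decodes that status. The function name explain_fsck_status is made up for illustration, and the wording of each message is paraphrased rather than copied from e2fsck.

```shell
#!/bin/sh
# explain_fsck_status STATUS: describe an e2fsck exit status.
# The status is a bitwise OR of the flag values listed in e2fsck(8).
explain_fsck_status() {
    s=$1
    if [ "$s" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( s & 1 )) -ne 0 ]   && echo "file system errors corrected"
    [ $(( s & 2 )) -ne 0 ]   && echo "errors corrected, system should be rebooted"
    [ $(( s & 4 )) -ne 0 ]   && echo "file system errors left uncorrected"
    [ $(( s & 8 )) -ne 0 ]   && echo "operational error"
    [ $(( s & 16 )) -ne 0 ]  && echo "usage or syntax error"
    [ $(( s & 32 )) -ne 0 ]  && echo "canceled by user request"
    [ $(( s & 128 )) -ne 0 ] && echo "shared library error"
    return 0
}

# Usage, on an unmounted filesystem (/dev/sda1 as in the runs above):
#   e2fsck -n /dev/sda1
#   explain_fsck_status $?
```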

I'm not sure whether running "e2fsck -cc /dev/sda1" will erase my whole drive.
I will try it out though.

I'll keep in touch if anything. Thanks.


edit: I meant "mke2fs -cc /dev/sda1" :P

jschiwal 02-07-2010 05:28 PM

Don't run mke2fs if there is data on the drive. There is a -c option to e2fsck as well, and it serves the same purpose. Given once, it runs a read-only scan with badblocks; given twice (-cc), it performs a non-destructive read-write test. Either way, it adds any bad blocks it finds to the bad blocks inode to keep them from being used in the future.

From my post above:
Quote:

Use the badblocks option of e2fsck if you already have a filesystem on the disk.

onebuck 02-07-2010 05:34 PM

Hi,

You should never perform maintenance on a mounted filesystem. Boot your system with the install CD or a LiveCD or single mode. That way the filesystem will not be mounted so a safe fsck can be performed.
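A small guard can make that rule harder to break by accident. This is only a sketch under some assumptions: the function names is_mounted and safe_fsck are made up, the check simply looks for the device name in the first column of /proc/mounts, and on some systems the root filesystem is listed there under another name (for example /dev/root), in which case this check will miss it.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Exit status 0 if DEVICE appears in the first column of the mounts table.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# safe_fsck DEVICE [MOUNTS_FILE]
# Refuse to touch a mounted filesystem; otherwise force a check that
# includes a read-only bad block scan (e2fsck -f -c).
safe_fsck() {
    dev=$1
    if is_mounted "$dev" "${2:-/proc/mounts}"; then
        echo "refusing: $dev is mounted; boot a LiveCD or single mode first" >&2
        return 1
    fi
    e2fsck -f -c "$dev"
}

# Usage (as root, from a LiveCD or rescue environment):
#   safe_fsck /dev/sda1
```

The second argument exists only so the check can be tried against a copy of a mounts table instead of the live one.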


Israfel2000 02-08-2010 11:59 AM

Ok. I ran e2fsck and here are the results:

Note: I am only scanning the 320GB HDD since this is the one with the error messages.

[root@localhost xxxxxxxxx]# e2fsck -n -cc /dev/sda1
e2fsck 1.41.4 (27-Jan-2009)
/dev/sda1 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

/dev/sda1: recovering journal
/dev/sda1 is mounted; it's not safe to run badblocks!
/dev/sda1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Inode 146325, i_size is 66162901, should be 66174976. Fix? no

Inode 146325, i_blocks is 129408, should be 127944. Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(612520--612642)
Fix? no

Free blocks count wrong for group #18 (16394, counted=16376).
Fix? no

Free blocks count wrong (2325085, counted=2325067).
Fix? no


/dev/sda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda1: ***** REBOOT LINUX *****

/dev/sda1: ********** WARNING: Filesystem still has errors **********

/dev/sda1: 121107/788416 files (0.3% non-contiguous), 823647/3148732 blocks
[root@localhost xxxxxxxxx]#


From the looks of it, I don't think it fixed anything. So I rebooted the system, logged in, scanned it again, and got this:


[root@localhost xxxxxxxxx]# e2fsck -n /dev/sda1
e2fsck 1.41.4 (27-Jan-2009)
Warning! /dev/sda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2340883, counted=2340882).
Fix? no

Free inodes count wrong (667311, counted=667310).
Fix? no


/dev/sda1: ********** WARNING: Filesystem still has errors **********

/dev/sda1: 121105/788416 files (0.3% non-contiguous), 807849/3148732 blocks
[root@localhost xxxxxxxxx]#



I guess if I use other options, like "e2fsck -n -c /dev/sda1" and/or "e2fsck -p /dev/sda1", it just might fix/repair it, but I would definitely lose data. I don't know. You tell me, jschiwal. :)

onebuck 02-08-2010 12:13 PM

Hi,

You're asking for trouble!

DO NOT PERFORM A FSCK ON A MOUNTED FILESYSTEM!

Especially a system filesystem. You should use 'single' mode or boot with a LiveCD or install cd/dvd media then perform the maintenance on the desired filesystem.

jefro 02-08-2010 03:23 PM

I like the idea of running hd diags.

Might as well run memtest and maybe cputest.

When I see those errors, it is usually because of a crashed system followed by hard reboots, but it could be almost any issue.


All times are GMT -5.