LinuxQuestions.org

Linux - Hardware: Disk check takes too long to check (https://www.linuxquestions.org/questions/linux-hardware-18/disk-check-takes-too-long-to-check-510584/)

hemantm 12-14-2006 11:42 PM

Disk check takes too long to check
 
I have a SUSE 9.x box with an 80 GB IDE HDD and the following partition layout for
Linux (all partitions are ext3):

Filesystem  Size  Used Avail Use% Mounted on
/dev/hda5   4.0G  2.1G  1.7G  55% /
/dev/hda3  1012M   45M  916M   5% /boot
/dev/hda12  9.9G  5.6G  3.9G  60% /dumps
/dev/hda10  9.9G  3.2G  6.3G  34% /home
/dev/hda11  9.9G  6.9G  2.5G  74% /home1
/dev/hda15  5.4G   33M  5.1G   1% /local
/dev/hda8   2.0G  365M  1.6G  19% /tmp
/dev/hda6   8.9G  4.4G  4.1G  53% /usr
/dev/hda7   3.0G  148M  2.7G   6% /usr/local
/dev/hda9   2.0G  841M  1.1G  44% /var

Overall there are 15 partitions, including partitions of type 0x0b (FAT32) and
0x4f that hold other operating systems (not listed in the table above).

Every 60 days, when booting into SUSE (kernel 2.6.8.x), all ext3 partitions undergo a consistency check, most likely by fsck.
Lately, however, this check has been taking longer and longer. Initially, with nearly the same amount of data in all the
partitions, it took 5-10 minutes; this time it took more than
*10 hours*.

Any pointers to the probable cause, or possible solutions, would be appreciated.

Thanks in advance,
Hemant
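The periodic check described above is normally triggered by counters kept in each ext3 superblock (mount count, maximum mount count, last-checked time and check interval), which `tune2fs -l /dev/hda5` prints. As a minimal sketch, assuming the field labels printed by e2fsprogs' `tune2fs -l` and a C-locale date, the following Python parses that output and reports which trigger would force a check; the sample values are hypothetical, not taken from this system.

```python
from datetime import datetime, timedelta

# Hypothetical excerpt of `tune2fs -l` output (values invented for illustration).
SAMPLE = """\
Mount count:              3
Maximum mount count:      30
Last checked:             Thu Dec 14 10:00:00 2006
Check interval:           5184000 (2 months)
"""

def parse_tune2fs(text):
    """Collect 'Key: value' lines from tune2fs -l output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def check_reasons(fields, now):
    """Return why a periodic check would be forced at `now` (empty list if not)."""
    reasons = []
    mounts = int(fields["Mount count"])
    max_mounts = int(fields["Maximum mount count"])
    # A maximum of 0 or -1 disables the mount-count trigger.
    if max_mounts > 0 and mounts >= max_mounts:
        reasons.append("mount count")
    # "5184000 (2 months)": the first token is the interval in seconds; 0 disables it.
    interval = int(fields["Check interval"].split()[0])
    last = datetime.strptime(fields["Last checked"], "%a %b %d %H:%M:%S %Y")
    if interval > 0 and now >= last + timedelta(seconds=interval):
        reasons.append("check interval")
    return reasons
```

With the sample values, a boot on 13 Feb 2007 would report `["check interval"]`, since 5184000 seconds is 60 days. The limits themselves can be changed with `tune2fs -c` (mount count) and `tune2fs -i` (interval).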

randyding 12-14-2006 11:51 PM

It's really hard to say exactly, and I don't know what your specific problem is; however, I've always seen hard drives fail completely shortly after this kind of behavior starts.

I would back up all your data ASAP before doing any more troubleshooting.

jiml8 12-15-2006 06:01 PM

Yes. If fsck (or any routine disk check) is taking that long, it should be considered a very bad sign. You might want to check your error log (/var/log/errors or /var/log/messages) to see if there is disk I/O trouble.
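Searching a large log by eye is tedious. A minimal Python sketch that filters log lines for typical disk-trouble messages; the pattern list is an assumption covering IDE driver status/error lines, kernel "I/O error" reports and ext3 error messages, and is not exhaustive.

```python
import re

# Assumed, non-exhaustive patterns for disk trouble in 2.6-era kernel messages.
DISK_ERROR_PATTERNS = [
    re.compile(r"\bhd[a-z]: .*\b(status|error)=0x[0-9a-f]+", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"EXT3-fs error"),
]

def find_disk_errors(lines):
    """Return the lines that match any disk-error pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in DISK_ERROR_PATTERNS)]
```

Passing it an open file object for /var/log/messages (reading it usually requires root) returns the matching lines; an empty result only means none of these particular patterns appeared.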

hemantm 12-18-2006 10:41 PM

I looked through the various logs in /var/log but could not find anything useful. Of course, I have already taken a complete backup.

Is there a way to enable logging for fsck? I could not find any logging option in the fsck man page. I have observed that file system operations such as open, create, close, read/write, mv, cp, etc. seem to work fine during normal use.
Are there any other tools that can be used to verify whether there are any I/O issues with the IDE disk?

Thanks,
Hemant
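Without a logging option in fsck itself, one workaround is to capture its output. A minimal Python sketch, assuming you run the checker by hand on an unmounted partition: it runs a command, echoes its output, and appends each line with a timestamp to a log file, which ideally lives on a disk other than the suspect one. The function name `run_logged` is hypothetical.

```python
import subprocess
import sys
from datetime import datetime

def run_logged(cmd, log_path):
    """Run `cmd`, echo its combined output, and append it to log_path with timestamps.

    Returns the command's exit status.
    """
    with open(log_path, "a") as log, subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {line}")
            log.flush()  # keep the log current in case the machine hangs
            sys.stdout.write(line)
        return proc.wait()
```

For example, `run_logged(["e2fsck", "-f", "-n", "-v", "/dev/hda12"], "/mnt/otherdisk/fsck-hda12.log")` (the log path is hypothetical) forces a read-only, verbose check; the timestamps show which pass the time is spent in, and a nonzero exit status signals problems (see the e2fsck man page for the codes).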

randyding 12-18-2006 11:24 PM

You can run read-speed tests, though a good result doesn't prove the drive is still OK. P.S. Be very careful with hdparm.
Code:

# hdparm -t /dev/hdb

/dev/hdb:
 Timing buffered disk reads:  164 MB in  3.00 seconds =  54.58 MB/sec
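hdparm -t figures are most useful when the same drive is measured repeatedly over time; the /dev/hdb result here and the /dev/hda result later in the thread come from different drives. A minimal Python sketch, assuming the output line format shown above, that extracts the numbers so readings can be logged and compared:

```python
import re

HDPARM_RE = re.compile(
    r"Timing buffered disk reads:\s*(\d+)\s*MB in\s*([\d.]+)\s*seconds"
    r"\s*=\s*([\d.]+)\s*MB/sec")

def parse_hdparm_t(text):
    """Return (megabytes, seconds, mb_per_sec) from `hdparm -t` output, or None."""
    m = HDPARM_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))
```

A sharp drop in the rate for the same drive between runs is the kind of change worth investigating.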

You may want to install a second hard drive, partition and format it the same way, then use rsync to copy the data from the failing drive. Then just install the boot loader on the new drive and take out the old one.

Matir 12-19-2006 12:08 AM

I would pull out 'smartmontools' to check the SMART attributes of the drive: those are intended to give a warning if failure is imminent.

jiml8 12-19-2006 07:43 PM

Quote:

Originally Posted by hemantm
I looked through the various logs in /var/log but could not find anything useful. Of course, I have already taken a complete backup.

Is there a way to enable logging for fsck? I could not find any logging option in the fsck man page. I have observed that file system operations such as open, create, close, read/write, mv, cp, etc. seem to work fine during normal use.
Are there any other tools that can be used to verify whether there are any I/O issues with the IDE disk?

Thanks,
Hemant

I didn't think about this until I re-read my last post, but the problem with logging on a hard drive that is failing its disk check is that the logging might not work if it is writing to the disk that has the problem.

In the past, when I have experienced problems, I have sometimes found lots of error messages being written to the default console, which you can reach with CTRL-ALT-F1. Return to your X environment with CTRL-ALT-F7.

Matir 12-19-2006 07:53 PM

You can also look at the kernel logs by using 'dmesg'. The messages will generally look something like this:
Code:

hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
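The hex values in such messages are bit masks. A minimal Python sketch that decodes them; the status-bit names follow the ATA status register as the 2.6-era IDE driver prints them, while the error-bit names other than DriveStatusError are assumed labels based on the ATA error register layout and may not match the driver's exact wording.

```python
# ATA status register bits, named as the 2.6-era IDE driver prints them.
STATUS_BITS = [
    (0x80, "Busy"), (0x40, "DriveReady"), (0x20, "DeviceFault"),
    (0x10, "SeekComplete"), (0x08, "DataRequest"), (0x04, "CorrectedError"),
    (0x02, "Index"), (0x01, "Error"),
]
# ATA error register bits; names other than DriveStatusError are assumed labels.
ERROR_BITS = [
    (0x80, "BadCRC"), (0x40, "UncorrectableError"), (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"), (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"), (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]

def decode_bits(value, table):
    """List the names of the bits set in `value`, highest bit first."""
    return [name for bit, name in table if value & bit]

def decode_status(status):
    """Decode a status byte; while Busy is set the other bits are not valid."""
    if status & 0x80:
        return ["Busy"]
    return decode_bits(status, STATUS_BITS)
```

Status 0x51 is 0x40 + 0x10 + 0x01, and error 0x04 is the "aborted command" bit. On its own, an aborted drive_cmd can simply mean the drive rejected a command it does not support, so it is not necessarily media damage; read errors reporting UncorrectableError are more typical of bad sectors.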


hemantm 12-20-2006 09:38 AM

Quote:

Originally Posted by Matir
I would pull out 'smartmontools' to check the SMART attributes of the drive: those are intended to give a warning if failure is imminent.

Here is the smartctl output for the hard disk. I ran smartctl -A, then hdparm -t, then smartctl -A again on /dev/hda.
I can see that
Seek_Error_Rate changed from 52267894 to 52268146 and Hardware_ECC_Recovered changed from 8642647 to 8813922 across the hdparm -t run.

Is this an indication of a future hard disk failure?

---------------
# smartctl -A /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   061   060   006    Pre-fail  Always       -       8642647
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       52267894
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1768
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1107
194 Temperature_Celsius     0x0022   031   050   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   061   060   000    Old_age   Always       -       8642647
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

# hdparm -t /dev/hda

/dev/hda:
Timing buffered disk reads: 82 MB in 3.03 seconds = 27.07 MB/sec


# smartctl -A /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   061   060   006    Pre-fail  Always       -       8813922
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       52268146
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1768
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1107
194 Temperature_Celsius     0x0022   031   050   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   061   060   000    Old_age   Always       -       8813922
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
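The before/after comparison above can be automated. A minimal Python sketch, assuming the `smartctl -A` column layout shown here (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw value): it parses each snapshot and reports the rows that changed. Applied to the two snapshots above, only the raw values of Raw_Read_Error_Rate, Seek_Error_Rate and Hardware_ECC_Recovered differ, while every normalized VALUE and WORST figure stays the same.

```python
def parse_attributes(text):
    """Map ATTRIBUTE_NAME -> (VALUE, WORST, THRESH, RAW_VALUE) from `smartctl -A` text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID and carry at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            value, worst, thresh = (int(f) for f in fields[3:6])
            # Keep the raw value as text: its encoding is vendor-specific.
            attrs[fields[1]] = (value, worst, thresh, fields[9])
    return attrs

def diff_attributes(before, after):
    """Return {name: (old, new)} for every attribute whose columns changed."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}
```

Because raw values are vendor-specific, a rising raw number is harder to interpret than a normalized VALUE that falls toward its THRESH.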

Matir 12-20-2006 10:09 AM

Try using the -H option to smartctl to get an overall health assessment. Did you find any errors in dmesg?

hemantm 12-20-2006 10:29 AM

The output of smartctl -H is
--------------
# smartctl -H /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
--------------

The dmesg output does not show any errors related to the hard disk.
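The PASSED result from `smartctl -H` is the drive's own overall verdict, but it is consistent with the attribute tables posted earlier: none of the normalized VALUE or WORST figures is at or below its THRESH. As smartmontools presents it, an attribute is shown as failing now when VALUE is at or below THRESH, as having failed in the past when WORST is, and a threshold of 000 never trips; the raw values do not enter into it. A minimal Python sketch of that rule, assuming the `smartctl -A` column layout shown in this thread:

```python
def smart_failures(text):
    """Map attribute name -> 'FAILING_NOW' or 'In_the_past' for `smartctl -A` rows."""
    failures = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID and carry at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        value, worst, thresh = (int(f) for f in fields[3:6])
        if thresh == 0:
            continue  # a zero threshold never trips
        if value <= thresh:
            failures[fields[1]] = "FAILING_NOW"
        elif worst <= thresh:
            failures[fields[1]] = "In_the_past"
    return failures
```

An empty result for both snapshots posted earlier matches the PASSED verdict.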

Matir 12-20-2006 10:45 AM

What kind of data is stored in this filesystem? If it's a lot of small files, then the disk check might simply take longer; a more complex filesystem means more work for fsck.
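fsck's work grows with the number of inodes and directories it has to walk, so counting them gives a rough feel for how "complex" a filesystem is (`df -i` also reports inode usage per filesystem). A minimal Python sketch, assuming you point it at a mount point such as /home: it counts directories and files without descending into other mounted filesystems. Comparing such counts with the time of the earlier 5-10 minute runs would show whether the file population actually grew.

```python
import os

def count_entries(root):
    """Count directories and files under `root`, staying on root's filesystem."""
    root_dev = os.lstat(root).st_dev
    dirs = files = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += 1
        files += len(filenames)
        # Prune subdirectories that are mount points of other filesystems.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
    return dirs, files
```

The walk itself reads every directory, so on a suspect disk it is best run after the backup is safe.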

Electro 12-20-2006 12:15 PM

The fastest filesystems to check and repair are XFS and JFS, even when they are full. The slowest are EXT2 and EXT3. ReiserFS is the second slowest, and its repair tools are poor. I do not recommend using ReiserFS for anything, and since ReiserFS is not reliable, I suggest not using Reiser4 either.

S.M.A.R.T. only gives you an estimate of the hard drive's health. I suggest using the hard drive manufacturer's utility for a more thorough test.

I suggest backing up soon, in case future problems make the drive harder to access than it is now.

hemantm 12-21-2006 12:28 AM

Yes, I do have lots of small files on the ext3 partitions. Still, I am not convinced that fsck should take 10 hours to complete the disk check. Anyway, I will download the utilities from the Seagate website, check the disk with them, and post the results.

hemantm 12-21-2006 09:24 PM

I checked the hard disk using the tool available from Seagate (http://www.seagate.com/support/seatools/). It did not find any issues with my system, though it could only scan the VFAT partitions. However, if there were an I/O issue with the HDD, it should be independent of the file system and should have shown up on the FAT32 partitions as well.


All times are GMT -5.