LinuxQuestions.org

plisken 04-07-2013 03:26 PM

device I/O Errors hard disk error help
 
I'm getting a lot of errors in my syslog along the lines of:

I/O error dev 08:22 sector 12345678

Occasionally the system becomes totally unresponsive, and this has been getting progressively more common.
It's old hardware, PII era, running Slackware 9.1.
I have 5 disks with various mounts but no RAID, and I'm wondering how to diagnose which drive is causing the problem.

I've noticed these errors on 08:22 and 08:31, so I'm assuming that means it's more than one drive, which I guess could point to the SCSI controller?
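For anyone trying to read those numbers: a rough sketch of how a dev pair like 08:22 can be turned into a device name. It assumes the kernel prints major:minor in hexadecimal (as far as I know the 2.4-series kernels do) and that major 8 is the first SCSI disk block with 16 minors per disk. The function name `decode_dev` is made up for illustration.

```python
def decode_dev(dev: str) -> str:
    """Translate a 'MAJOR:MINOR' pair into a /dev/sdXN name.

    Assumption: both numbers are hexadecimal, and major 8 is the
    SCSI disk major with 16 minor numbers reserved per disk.
    """
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError(f"major {major} is not the SCSI disk major (8)")
    # Minor 0-15 belongs to sda, 16-31 to sdb, and so on;
    # the remainder is the partition number (0 means the whole disk).
    disk_index, partition = divmod(minor, 16)
    disk = "sd" + chr(ord("a") + disk_index)
    return f"/dev/{disk}{partition}" if partition else f"/dev/{disk}"


if __name__ == "__main__":
    for dev in ("08:22", "08:31"):
        print(dev, "->", decode_dev(dev))
```

Under those assumptions 08:22 would be /dev/sdc2 and 08:31 would be /dev/sdd1, which the fstab further down lists as /home and /BACKUP.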

Any tips or pointers would be greatly appreciated.

I've also got links to some images of these errors, should they be of any help:

https://www.dropbox.com/s/cry7kavlcg...2017.05.25.jpg
https://www.dropbox.com/s/ui5q1fa4qb...2017.06.12.jpg
https://www.dropbox.com/s/b7bxna2ee4...2017.26.39.jpg
https://www.dropbox.com/s/09e69f49o8...2017.34.22.jpg

/etc/fstab looks like
Code:

/dev/sda2        /                reiserfs    defaults         1  1
/dev/sda1        /boot            ext2        defaults         1  2
/dev/sdc2        /home            reiserfs    defaults         1  2
/dev/sdb1        /var             reiserfs    defaults         1  2
/dev/cdrom       /mnt/cdrom       iso9660     noauto,owner,ro  0  0
/dev/fd0         /mnt/floppy      auto        noauto,owner     0  0
devpts           /dev/pts         devpts      gid=5,mode=620   0  0
proc             /proc            proc        defaults         0  0
/dev/sdc1        swap             swap        defaults         0  0
/dev/sdd1        /BACKUP          reiserfs    defaults         1  0
/dev/sde1        /STORAGE         reiserfs    defaults         1  0
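To see which mounts share a physical disk, the fstab above can be grouped by drive. This is only a sketch: the helper name `mounts_by_disk` is made up, and it looks only at /dev/sdXN entries, on the assumption that each sdX is one physical disk.

```python
import re
from collections import defaultdict

# The SCSI disk entries from the fstab above (other lines are skipped anyway).
FSTAB = """\
/dev/sda2        /                reiserfs    defaults        1  1
/dev/sda1        /boot            ext2        defaults        1  2
/dev/sdc2        /home            reiserfs    defaults        1  2
/dev/sdb1        /var             reiserfs    defaults        1  2
/dev/sdc1        swap             swap        defaults        0  0
/dev/sdd1        /BACKUP          reiserfs    defaults        1  0
/dev/sde1        /STORAGE         reiserfs    defaults        1  0
"""


def mounts_by_disk(fstab_text: str) -> dict:
    """Group (partition, mount point) pairs by the whole disk they sit on."""
    disks = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # blank line, comment, or malformed entry
        match = re.fullmatch(r"/dev/(sd[a-z])\d+", fields[0])
        if match:
            disks[match.group(1)].append((fields[0], fields[1]))
    return dict(disks)


if __name__ == "__main__":
    for disk, mounts in sorted(mounts_by_disk(FSTAB).items()):
        print(disk, mounts)
```

Grouped this way, sdc turns out to hold both /home and the swap partition, while the other four disks carry one filesystem each apart from sda (/ and /boot).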

All help appreciated.

manu-tm 04-07-2013 04:55 PM

Have you tried fsck to find out which drives are problematic? (About checking the SCSI controller, I have no clue.)

TobiSGD 04-08-2013 05:17 AM

If you are able to log in as root you can use smartctl to display information about the different drives, including drive conditions.

plisken 04-08-2013 09:08 AM

Not sure if this means anything then.

Code:

smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: IBM      DDRS39130SUN9.0G Version: S98E
Serial number: 256988
Device type: disk
Local Time is: Mon Apr  8 14:08:31 2013 Local time zone must be set--see zic m
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error counter log:
        Errors Corrected      Total      Total   Correction     Gigabytes        Total
              delay:      [rereads/     errors    algorithm     processed  uncorrected
           minor | major  rewrites]  corrected  invocations  [10^9 bytes]       errors
read:          0       1          0          1            1         4.295            0
write:         0       0          0         85           85         4.295            0
verify:        0       0          0          0            0         4.295            0

Non-medium error count:        0
Device does not support Self Test logging

Code:

smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: IBM      DDRS39130SUN9.0G Version: S98E
Serial number: 3R2052
Device type: disk
Local Time is: Mon Apr  8 14:08:41 2013 Local time zone must be set--see zic m
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error counter log:
        Errors Corrected      Total      Total   Correction     Gigabytes        Total
              delay:      [rereads/     errors    algorithm     processed  uncorrected
           minor | major  rewrites]  corrected  invocations  [10^9 bytes]       errors
read:          0       1          0          1            1         4.295            0
write:         0       0          0          3            3         4.295            0

Non-medium error count:        0
Device does not support Self Test logging

Code:

smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: IBM      DDRS39130SUN9.0G Version: S98E
Serial number: 159571
Device type: disk
Local Time is: Mon Apr  8 14:08:58 2013 Local time zone must be set--see zic m
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error counter log:
        Errors Corrected      Total      Total   Correction     Gigabytes        Total
              delay:      [rereads/     errors    algorithm     processed  uncorrected
           minor | major  rewrites]  corrected  invocations  [10^9 bytes]       errors
read:          0     467          0        467          467         4.295            0
write:         0       0          0      15211        15211         4.295            0

Non-medium error count:        0
Device does not support Self Test logging

I've run reiserfsck on /home (all OK) and will do / and /var when I'm at the machine.

TobiSGD 04-08-2013 09:33 AM

Try it with
Code:

smartctl -a
to see the complete error status of the disks; sorry, I should have mentioned that. But judging from the short output, I would assume that the third disk is the faulty one.
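To make that comparison concrete, here is a small sketch that pulls the "total errors corrected" column out of the three error counter logs posted above and ranks the drives by serial number. The function names are made up, and it assumes every read/write/verify row has the seven numeric columns shown in those logs.

```python
# Rows copied from the three smartctl reports above, keyed by serial number.
REPORTS = {
    "256988": """\
read:          0        1        0        1          1          4.295          0
write:        0        0        0        85        85          4.295          0
verify:        0        0        0        0          0          4.295          0
""",
    "3R2052": """\
read:          0        1        0        1          1          4.295          0
write:        0        0        0        3          3          4.295          0
""",
    "159571": """\
read:          0      467        0      467        467          4.295          0
write:        0        0        0    15211      15211          4.295          0
""",
}


def corrected_errors(report: str) -> dict:
    """Return the 'total errors corrected' value per operation."""
    totals = {}
    for line in report.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() in ("read", "write", "verify"):
            fields = rest.split()
            if len(fields) == 7:  # assumed column count, see lead-in
                totals[label.strip()] = int(fields[3])  # 4th column
    return totals


def rank_drives(reports: dict) -> list:
    """Sort drives by summed corrected errors, highest first."""
    totals = {
        serial: sum(corrected_errors(text).values())
        for serial, text in reports.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for serial, total in rank_drives(REPORTS):
        print(f"{serial}: {total} corrected errors")
```

These are errors the drive recovered from, and all three drives report zero uncorrected errors and a health status of OK, so the ranking is a hint about where to look first rather than proof of a failing disk.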

plisken 04-09-2013 05:27 AM

The above is with the -a switch actually :(

The third drive above is /home, which I ran reiserfsck on, and it didn't show anything.

I was thinking of temporarily creating new mount points for /var and /home on a new drive to see if that helps, as they are both on their own drives at the moment. The only thing is, it can take a few months for this problem to show itself.


All times are GMT -5.