RAID degraded, partition missing from md0
Hey guys,
We're having a very weird issue at work. Our Ubuntu server has 6 drives, set up with RAID1 as follows:
/dev/md0, consisting of: /dev/sda1 /dev/sdb1
/dev/md1, consisting of: /dev/sda2 /dev/sdb2
/dev/md2, consisting of: /dev/sda3 /dev/sdb3
/dev/md3, consisting of: /dev/sdc1 /dev/sdd1
/dev/md4, consisting of: /dev/sde1 /dev/sdf1
As you can see, md0, md1 and md2 all use the same 2 drives (split into 3 partitions). I should also note that this is done via Ubuntu software RAID, not hardware RAID.
Today, the /dev/md0 RAID1 array shows as degraded - it is missing the /dev/sdb1 device. But since /dev/sdb1 is only a partition (and /dev/sdb2 and /dev/sdb3 are working fine), it's obviously not the whole drive that's gone AWOL; it seems the partition itself is missing. How is that even possible? And what could we do to fix it?
My output of cat /proc/mdstat:
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
Any help would be greatly appreciated! |
Hi,
it's not so unusual to have problems with just one partition on a disk. You can try to rebuild with the existing sdb, or you can replace sdb and then rebuild. See for example http://www.howtoforge.com/replacing_..._a_raid1_array for the latter option. However, before doing anything, make sure you are familiar with: https://raid.wiki.kernel.org/index.php/Linux_Raid

Evo2. |
Hi,
Code:
mdadm --assemble --scan

Evo2. |
Actually, let me clarify - if I do a:
Code:
mdadm --assemble --scan
Code:
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 |
I think it's better to stop the md device first. What is the output of mdadm --detail /dev/md0?
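Something along these lines (just a sketch - only stop the array if nothing on it is mounted; the assemble command is the one you already posted):
Code:
mdadm --detail /dev/md0
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1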
Thanks |
I can't stop the device :(
Also, the / root filesystem is mounted on md0. The output you requested is: Code:
/dev/md0: |
I think if it's showing "removed", the following command should recover it:
Code:
mdadm /dev/md0 -a /dev/sdb1
Thanks |
I am unable to see any error message above. Ideally, for replacing a device, I follow:
Code:
mdadm /dev/md0 -f /dev/sdb1
mdadm /dev/md0 -r /dev/sdb1
mdadm /dev/md0 -a /dev/sdb1
Thanks |
Quote:
Code:
mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument |
Got the following results:
Code:
root@lia:~# mdadm /dev/md0 -f /dev/sdb1 |
Hate to bump a thread, but I still need help with this. Any advice, anyone? :)
|
Hi,
mdadm doesn't seem to see /dev/sdb1 at all. I suggest you investigate its status with other tools, e.g. fdisk.

Evo2. |
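For example, a quick read-only look at the disk and its partition table (parted works just as well as fdisk here):
Code:
fdisk -l /dev/sdb
parted /dev/sdb print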
Does the command below show any output?
Quote:
|
Quote:
Code:
brw-rw---- 1 root disk 8, 17 Nov 8 08:33 sdb1 |
Check the /dev directory and see if the /dev/sdb1 device actually exists. If it doesn't, you'll need to recreate it with fdisk, parted or whatever tool you prefer to use to manage partitions.
If the device is missing but the partition seems to be there, try running partprobe then check the /dev directory again. |
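To illustrate the above, something along these lines (all harmless; partprobe just asks the kernel to re-read the partition table):
Code:
ls -l /dev/sdb*
fdisk -l /dev/sdb
partprobe /dev/sdb
ls -l /dev/sdb*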
The next step is to figure out why mdadm returns an error message when you try to reference /dev/sdb1. See what
Code:
mdadm --examine /dev/sdb1
has to say. According to /proc/mdstat (in your first post), /dev/md0 only has one member, /dev/sda1. As long as the /dev/sdb1 partition is valid and identical in size to /dev/sda1 (which fdisk -l /dev/sdb or parted /dev/sdb print should be able to confirm or deny), you should be able to re-add /dev/sdb1 with the following command:
Code:
mdadm --manage /dev/md0 --add /dev/sdb1
It would also be worth checking the drive's S.M.A.R.T. status while you're at it:
Code:
smartctl -a /dev/sdb |
mdadm --examine /dev/sdb1 gives the following:
Code:
mdadm: No md superblock detected on /dev/sdb1.
Code:
Model: ATA ST3000VX000-9YW1 (scsi)
Code:
Model: ATA ST3000VX000-9YW1 (scsi)
Code:
mdadm: add new device failed for /dev/sdb1 as 2: Invalid argument
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build)
|
The /dev/sdb device has 15 "pending" sectors, meaning it's waiting for a write command to reallocate those sectors. While 15 is not an alarmingly large number, the fact that they're all "pending" rather than "reallocated" suggests the defects may have appeared at approximately the same time, which could be an indication of drive failure. You should run badblocks -ns on /dev/sdb1 before proceeding, and check the S.M.A.R.T. status for /dev/sdb again when it's done.
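For reference, roughly what that would look like - badblocks -n is the non-destructive read-write test and -s just prints progress, but it will still hammer the disk for quite a while:
Code:
badblocks -ns /dev/sdb1
smartctl -a /dev/sdb | grep -Ei 'pending|reallocated'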
The "invalid argument" error is usually caused by a non-removed device. The "--add" command is only valid if the array is online and can be expanded, or if a device has been removed. However, the output from mdadm --detail /dev/md0 in post #8 does indeed show the second device as "removed". Strange. Could you port the output from: Code:
ls /sys/block/md0/md/ |
I can't run the badblocks at the moment, as it uses all the server resources and totally kills the network users logged onto it :/
Which log file specifically do you want me to check when I try to add the device back to md0? Output of ls /sys/block/md0/md/ is:
Code:
array_size layout reshape_position sync_max
|
Do a tail -f /var/log/messages in one terminal window while you attempt to add /dev/sdb1 to md0 in another.
The files in /sys/block/md0/md confirm that there's no reference from md0 to anything other than /dev/sda1. It should be possible to add another device/partition. |
I don't have a /var/log/messages, but I did do a tail on the syslog, and it showed the following while trying to add the partition back to md0:
Code:
Nov 15 08:38:25 lia kernel: [674827.954967] ata1: EH complete |
It seems the md driver ran into one of the bad sectors on the drive. If you can't run badblocks, try using dd to overwrite the partition with zeros:
Code:
dd if=/dev/zero of=/dev/sdb1 bs=8192 oflag=direct
The "oflag=direct" parameter bypasses the cache, and has the effect of slowing the process down significantly. With any luck, the other users won't notice anything. The real reason it's there, however, is to prevent cache management from doing read-ahead, as that would cause it to attempt to read the bad sectors, which in turn would cause dd to abort. |
Quote:
Other than that, there's nothing in particular you need to consider before attempting to add the partition to the RAID array again. |
Quote:
I just hope the recovery process completes without any issues. I'll let you know! One thing that strikes me as a bit weird, though: in all the arrays, the disks have IDs 0 and 1. But on md0, sda1 is ID 0, and the re-added sdb1 is ID 2, not ID 1. Does that make a difference? Output of cat /proc/mdstat:
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] |
Seems I spoke too soon. About 20% into the recovery process sdb1 failed again, and this time sdb2 in md1 also failed. Seems the whole sdb drive is busted.
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
Meh. |
Quote:
Assuming these are SATA drives, sdb is (most likely) the drive connected to the SATA port with the second lowest number that's in use. Since it's no longer part of the array, it will be the only inactive drive. If the drives have on-board activity LEDs (few do these days), you should be able to tell by just looking. You could try spinning the drive down with hdparm -Y. You should be able to hear it power down. |
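For example, assuming sdb is the one you want to single out - spin it down, listen, then wake it with a small read (just a sketch):
Code:
hdparm -Y /dev/sdb
# any read access should wake it again, e.g.:
dd if=/dev/sdb of=/dev/null bs=512 count=1 iflag=direct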
Quote:
I'll try the hdparm on Monday. Is there a way to power it back up? I might need to toggle it a few times to find the right one - there are 6 drives in that box :S
Also, before I power down the drive and replace it, I'll need to remove sdb1, sdb2 and sdb3 from md0, md1 and md2. Do I just do that normally, as in:
Code:
mdadm --manage /dev/md0 --fail /dev/sdb1 |
No, that's how you do it; first "--fail", then "--remove".
(And any kind of disk access should wake a sleeping drive, like running fdisk or parted, or dd'ing a few blocks to /dev/null.) |
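For the record, doing that for all three of sdb's partitions would look something like this (a sketch - double-check which partition belongs to which array against /proc/mdstat first):
Code:
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm --manage /dev/md2 --remove /dev/sdb3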
Now something very concerning started happening.
I wanted to install a package using apt-get. I got the following error: Code:
root@lia:~# apt-get install gdisk
Code:
root@lia:~# smartctl -a /dev/sdb
Code:
root@lia:~# smartctl -a /dev/sda
cat /proc/mdstat still shows:
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] |
The faulty drive may be blocking the controller. An emergency reboot may be in order here.
You also need to check the S.M.A.R.T. status of all remaining drives asap. (For instance, are you sure the rebuild failure was caused by a write error on /dev/sdb, and not a read error on /dev/sda?) |
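To expand on that, a quick way to eyeball all of the drives at once (read-only; adjust the device letters if any names have shifted):
Code:
for d in a b c d e f; do
  echo "=== /dev/sd$d ==="
  smartctl -H -A /dev/sd$d | grep -Ei 'overall|pending|reallocated'
done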
Quote:
The other drives:
sdc has 0 pending sectors.
sdd has 24 pending sectors, and shows "Error 244 occurred at disk power-on lifetime: 8689 hours (362 days + 1 hours)"
sde has 0 pending sectors, but also shows "Error 51 occurred at disk power-on lifetime: 8009 hours (333 days + 17 hours)"
sdf has 0 pending sectors.
This spells crisis to me :/ Of the 6 drives, 3 seem to be busted, one on each array - and I have no idea what's going on with sda. |
The reboot command (or the 3-fingered salute) should be used, if possible. Only when that fails should one resort to alternate strategies involving the SysRq key or the power button.
Have you been checking these arrays regularly? I run Code:
echo check > /sys/devices/virtual/block/<md device>/md/sync_action |
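For example, to kick one off on md0 and see whether it finds anything (mismatch_cnt should normally stay at 0 on a healthy mirror):
Code:
echo check > /sys/devices/virtual/block/md0/md/sync_action
cat /proc/mdstat
cat /sys/devices/virtual/block/md0/md/mismatch_cnt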
Quote:
EDIT: Just lost remote connection. Server is still up as it's still routing traffic, but I can't access it via SSH anymore. |
It would have been really great if someone could unplug the drive causing these bus errors, but the problem is we don't know with 100 % certainty that /dev/sdb is the culprit (although it's more than likely). Also, the drives aren't labeled.
Does this server have built-in remote access functionality, or do you have to rely on the OS? Edit: I guess you need the OS, parts of which are probably spewing "oops" messages at the console right now. |
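One way around the missing labels is to match serial numbers: pull the serial for /dev/sdb off the drive itself (read-only) and compare it to the sticker on each physical disk:
Code:
smartctl -i /dev/sdb | grep -i 'serial number'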
Make sure to bring a live CD (like, say, System Rescue CD) in case the system fails to boot. You could even set up an emergency NAT router with a CD/DVD like that.
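(The emergency NAT router is only a couple of commands once the live system is up - a rough sketch, with interface names assumed:)
Code:
# eth0 = WAN, eth1 = LAN - adjust to whatever the live system calls them
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT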
|
Sounds like a plan.
|
Also, any idea why we're seeing errors on 3 drives instead of 1 (refer to post #37)? Normally I'd suspect a RAID controller, but this is software raid.
|
Must be the drives. There's no way other hardware or software can make a drive report "pending sectors" via S.M.A.R.T. Media error is the only possibility.
|
Ok, I'm on the premises. I turned off the server (it was hanging with a lot of error messages, like you predicted). I removed sdb (I looked for the serial number on the drive casing, to match the serial number as reported by smartctl on sdb).
Booted up, and it's running now. But here's the really strange thing: Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
EDIT: Here are the md device details:
Code:
/dev/md0:
Code:
/dev/md1:
Code:
/dev/md2:
Code:
/dev/md3:
Code:
/dev/md4: |
...continued from previous post...
sda:
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build)
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build)
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build) |
...continued from previous post...
sdd:
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build)
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-29-generic] (local build) |
...continued from previous post...
As you can see from all the stats in the above 3 posts, the sdb device doesn't have the original sdb serial number. Seems sdf renamed itself to sdb. Bizarre... |
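(For what it's worth: the sdX names are handed out in detection order at boot, so they can shift when a drive disappears. The persistent links under /dev/disk/by-id are one way to keep track of which physical disk is which:)
Code:
ls -l /dev/disk/by-id/ | grep -v part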