Hello. I'm currently in the process of building a Linux software RAID 5 array and am getting a lot of conflicting information about whether or not one of the drives is bad.
I have 4 of these drives: Seagate Barracuda LP 1.5 TB (ST31500541AS)
I downloaded SeaTools (on Windows XP) and ran the "Long Generic" test on all 4 drives; they all passed.
I ran:
Code:
badblocks -wvs -o /root/badblocks.txt /dev/sde
And one of the drives did report some bad blocks, but upon inspecting the inside of the computer case, I found that its cables had come unplugged. After reseating them, I ran a second badblocks check on the same drive and it passed.
I have also run:
Code:
smartctl -t long /dev/sde
and, after the test finished, I ran:
Code:
smartctl -a /dev/sde
and can see the output:
Code:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 171 -
# 2 Short offline Completed without error 00% 157 -
Also, the output of "smartctl -a /dev/sde" includes "SMART overall-health self-assessment test result: PASSED", which is a good sign. However, unlike the other drives, /dev/sde has entries in its "SMART Error Log Version: 1" section; "smartctl -a /dev/sdd" and the rest all say "No Errors Logged". Here is an example of the errors on /dev/sde :
Code:
Error 2855 occurred at disk power-on lifetime: 88 hours (3 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 ff ff ff ef 00 08:01:03.097 READ DMA EXT
27 00 00 00 00 00 e0 00 08:01:03.096 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 08:01:03.093 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 08:01:03.090 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 08:01:03.060 READ NATIVE MAX ADDRESS EXT
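One detail that stood out to me (I may be misreading it): the failing LBA in every error entry is 0x0fffffff, which works out to exactly the 28-bit LBA maximum, and the errors were all logged around hour 88, well before the extended self-test that completed cleanly at hour 171. A quick sanity check of those numbers:

```shell
# 0x0fffffff in decimal -- matches the "268435455" in the error log
echo $(( 0x0fffffff ))        # prints 268435455

# ...and it is exactly the 28-bit LBA ceiling, 2^28 - 1
echo $(( (1 << 28) - 1 ))     # prints 268435455

# the errors were logged at 88 power-on hours; the clean extended
# self-test finished at 171 hours, i.e. 83 hours later
echo $(( 171 - 88 ))          # prints 83
```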
I followed along with http://www.linuxhomenetworking.com/w..._Software_RAID to create the RAID array.
So, I created one MS-DOS partition on each of these drives, using all of the available space, so that the output of "fdisk -l" looks like this for every drive:
Code:
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sde1 1 182401 1465136001 fd Linux raid autodetect
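Out of curiosity, I also checked that the partition size is consistent with the disk size (fdisk reports the partition as 1465136001 blocks of 1 KiB each):

```shell
# 1465136001 one-KiB blocks, converted to bytes -- just under the
# 1500301910016-byte disk (the difference is the partial last cylinder)
echo $(( 1465136001 * 1024 ))   # prints 1500299265024
```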
I have used this command to create the RAID array:
Code:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
The RAID array is currently building, so I am getting this output from "cat /proc/mdstat" :
Code:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[4] sdd1[2] sdc1[1] sdb1[0]
4395407808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[===>.................] recovery = 15.0% (221120564/1465135936) finish=805.6min speed=25734K/sec
unused devices: <none>
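As a sanity check on those mdstat numbers (assuming I have the RAID 5 math right), the array size matches 3 data disks' worth of the 1465135936-block members, and the finish estimate roughly matches the remaining blocks divided by the reported speed:

```shell
# RAID 5 usable size = (n-1) members' worth of blocks; matches the mdstat total
echo $(( 3 * 1465135936 ))      # prints 4395407808

# remaining blocks / speed (K/sec) -> seconds -> minutes,
# close to the reported finish=805.6min
echo $(( (1465135936 - 221120564) / 25734 / 60 ))   # prints 805
```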
I'm quite confused by all this. Why is sde1 listed as "sde1[4]" instead of "sde1[3]"? Why does it show "[4/3] [UUU_]" instead of "[4/4] [UUUU]"? If it is detecting a drive (sde) as failed, why does it not show an "(F)" beside it, as indicated by
https://www-304.ibm.com/support/docv...d=isg3T1011259 ?
And I guess the big question is: should I attempt to return this drive, or do I just need to clear that SMART error log so that software RAID will let me use the drive?
Thanks.