LinuxQuestions.org
Old 10-04-2011, 09:10 PM   #1
digdogger
Member
 
Registered: Mar 2007
Distribution: CentOS
Posts: 32

Rep: Reputation: 2
Question Do I have a bad drive or not?


Hello. I'm currently in the process of building a Linux software RAID 5 array and am getting a lot of conflicting information about whether one of the drives is bad.

I have 4 of these drives: Seagate Barracuda LP 1.5 TB (ST31500541AS)

I downloaded SeaTools (on Windows XP) and ran the "Long Generic" test on all 4 drives; they all passed.

I ran:

Code:
badblocks -wvs -o /root/badblocks.txt /dev/sde
And one of the drives did report some bad blocks, but upon inspecting the inside of the computer case, I found that the drive's cables had come unplugged. After reconnecting them, I ran a second badblocks check on the same drive and it passed.
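Since the errors vanished after the cables were re-seated, a read-only recheck is a reasonable sanity test. This is a hedged sketch, not from the thread: the log path is an example, and the actual badblocks invocation is commented out because it needs root and the real device.

```shell
# After re-seating cables, a read-only pass (no -w, so it is safe on
# a drive that already holds data) can confirm the earlier errors
# were transient. On a real system, run as root:
#   badblocks -sv -o /tmp/badblocks-recheck.txt /dev/sde

# badblocks only writes sector numbers to the log when it finds bad
# blocks, so an empty log means a clean pass. The empty file below
# stands in for a clean run's log:
log=/tmp/badblocks-recheck.txt
: > "$log"
if [ -s "$log" ]; then echo "bad blocks logged"; else echo "clean"; fi
```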

I have also run:

Code:
smartctl -t long /dev/sde
and, after the test finished, I ran:

Code:
smartctl -a /dev/sde
and can see the output:

Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       171         -
# 2  Short offline       Completed without error       00%       157         -
Also, the output of "smartctl -a /dev/sde" includes "SMART overall-health self-assessment test result: PASSED", which is a good sign. However, there is one difference from the other drives: in the "SMART Error Log Version: 1" section, /dev/sde has some errors listed, while the output for "smartctl -a /dev/sdd" and the other drives says "No Errors Logged". Here is an example of the errors on /dev/sde:

Code:
Error 2855 occurred at disk power-on lifetime: 88 hours (3 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      08:01:03.097  READ DMA EXT
  27 00 00 00 00 00 e0 00      08:01:03.096  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      08:01:03.093  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      08:01:03.090  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      08:01:03.060  READ NATIVE MAX ADDRESS EXT
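Comparing that error-log section across drives can be done mechanically with grep. This is a hypothetical helper, with a saved sample file standing in for live smartctl output:

```shell
# Hypothetical helper: count logged ATA errors in smartctl output.
# A saved sample file stands in for live output; on a real system
# you would first run `smartctl -a /dev/sdX > /tmp/sdX-smart.txt`.
count_errors() {
    grep -c '^Error [0-9]* occurred' "$1"
}

cat > /tmp/sde-smart.txt <<'EOF'
SMART Error Log Version: 1
Error 2855 occurred at disk power-on lifetime: 88 hours (3 days + 16 hours)
Error 2854 occurred at disk power-on lifetime: 88 hours (3 days + 16 hours)
EOF

count_errors /tmp/sde-smart.txt   # prints 2 for this sample
```

A drive whose log section says "No Errors Logged" would give a count of 0.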
I followed along with http://www.linuxhomenetworking.com/w..._Software_RAID to create the RAID array.

So I have created one MSDOS partition on each of these drives, using all the space on the drive, so that the output of fdisk -l looks like this for all four drives:

Code:
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      182401  1465136001   fd  Linux raid autodetect
I have used this command to create the RAID array:

Code:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
The RAID array is currently building, so I am getting this output from "cat /proc/mdstat" :

Code:
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[4] sdd1[2] sdc1[1] sdb1[0]
      4395407808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [===>.................]  recovery = 15.0% (221120564/1465135936) finish=805.6min speed=25734K/sec
      
unused devices: <none>
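As an aside, the per-device field in that output can be tallied mechanically. This is just an illustration with the status string hard-coded; on a live system it would be extracted from /proc/mdstat:

```shell
# Count in-sync (U) and out-of-sync (_) devices in an mdstat status
# field. The string is hard-coded here as an example.
status='[UUU_]'
up=$(printf '%s' "$status" | tr -cd 'U' | wc -c)
down=$(printf '%s' "$status" | tr -cd '_' | wc -c)
echo "$up in sync, $down not in sync"   # → 3 in sync, 1 not in sync
```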
I'm quite confused by all this. Why is sde1 listed as "sde1[4]" instead of "sde1[3]"? Why is it showing "[4/3] [UUU_]" instead of "[4/4] [UUUU]"? If it is detecting a drive (sde) as failed, why does it not show an (F) beside it, as indicated by https://www-304.ibm.com/support/docv...d=isg3T1011259 ?

And I guess the big question is: should I try to return this drive, or do I just need to clear that SMART error log so that software RAID will let me use it?

Thanks.
 
Old 10-04-2011, 10:17 PM   #2
rch
Member
 
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909

Rep: Reputation: 48
The _ represents a missing disk, not a failure. What does /var/log/messages say? What does mdadm --detail /dev/md0 return?
 
Old 10-05-2011, 04:02 PM   #3
digdogger
Member
 
Registered: Mar 2007
Distribution: CentOS
Posts: 32

Original Poster
Rep: Reputation: 2
Thanks for the reply, rch. The plot thickens, though. The RAID build process has completed, and now the output of "cat /proc/mdstat" is:

Code:
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[3] sdd1[2] sdc1[1] sdb1[0]
      4395407808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>
So it looks like everything is OK... perhaps I jumped the gun in making this post. To answer your questions:

/var/log/messages (removing everything that doesn't have to do with the RAID build):

Code:
Oct  4 16:34:34 velma kernel: md: bind<sdb1>
Oct  4 16:34:34 velma kernel: md: bind<sdc1>
Oct  4 16:34:34 velma kernel: md: bind<sdd1>
Oct  4 16:34:34 velma kernel: md: bind<sde1>
Oct  4 16:34:34 velma kernel: raid5: device sdd1 operational as raid disk 2
Oct  4 16:34:34 velma kernel: raid5: device sdc1 operational as raid disk 1
Oct  4 16:34:34 velma kernel: raid5: device sdb1 operational as raid disk 0
Oct  4 16:34:34 velma kernel: raid5: allocated 4262kB for md0
Oct  4 16:34:34 velma kernel: raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
Oct  4 16:34:34 velma kernel: RAID5 conf printout:
Oct  4 16:34:34 velma kernel:  --- rd:4 wd:3 fd:1
Oct  4 16:34:34 velma kernel:  disk 0, o:1, dev:sdb1
Oct  4 16:34:34 velma kernel:  disk 1, o:1, dev:sdc1
Oct  4 16:34:34 velma kernel:  disk 2, o:1, dev:sdd1
Oct  4 16:34:34 velma kernel: RAID5 conf printout:
Oct  4 16:34:34 velma kernel:  --- rd:4 wd:3 fd:1
Oct  4 16:34:34 velma kernel:  disk 0, o:1, dev:sdb1
Oct  4 16:34:34 velma kernel:  disk 1, o:1, dev:sdc1
Oct  4 16:34:34 velma kernel:  disk 2, o:1, dev:sdd1
Oct  4 16:34:34 velma kernel:  disk 3, o:1, dev:sde1
Oct  4 16:34:34 velma kernel: md: syncing RAID array md0
Oct  4 16:34:34 velma kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Oct  4 16:34:34 velma kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Oct  4 16:34:34 velma kernel: md: using 128k window, over a total of 1465135936 blocks.
Oct  5 08:39:39 velma kernel: md: md0: sync done.
Oct  5 08:39:39 velma kernel: RAID5 conf printout:
Oct  5 08:39:39 velma kernel:  --- rd:4 wd:4 fd:0
Oct  5 08:39:39 velma kernel:  disk 0, o:1, dev:sdb1
Oct  5 08:39:39 velma kernel:  disk 1, o:1, dev:sdc1
Oct  5 08:39:39 velma kernel:  disk 2, o:1, dev:sdd1
Oct  5 08:39:39 velma kernel:  disk 3, o:1, dev:sde1
And "mdadm --detail /dev/md0" :

Code:
/dev/md0:
        Version : 0.90
  Creation Time : Tue Oct  4 16:34:34 2011
     Raid Level : raid5
     Array Size : 4395407808 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Oct  5 08:39:39 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : def18c72:165523b5:6544cc2b:a8748b14
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
I found this: http://blog.ringerc.id.au/2010/04/us...cron-raid.html and I'm *definitely* going to follow the advice on that page and am running:

Code:
echo "check" > /sys/block/md0/md/sync_action
I'll be setting that up as a cron job too, and I highly recommend that anybody with a software RAID setup who stumbles across this post do the same.
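For reference, such a cron job might look like the following. This is a hypothetical /etc/cron.d entry; the schedule, filename, and array name are arbitrary choices, and some distributions ship their own raid-check script for this purpose:

```shell
# Hypothetical /etc/cron.d/raid-check entry: scrub /dev/md0 at 03:00
# on the first day of each month. Schedule and path are examples only.
# m  h  dom mon dow  user  command
0   3  1   *   *    root  echo check > /sys/block/md0/md/sync_action
```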

Does anybody know why, in /var/log/messages above, I have "RAID5 conf printout" twice in a row, but with different data below it? Does anybody know the meaning of "--- rd:4 wd:3 fd:1" ?

Thanks.
 
Old 10-05-2011, 04:30 PM   #4
rch
Member
 
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909

Rep: Reputation: 48
It means 3 working, 1 failed (rd = raid disks, wd = working disks, fd = failed disks). See how it changes to rd:4 wd:4 fd:0 once the sync completes; the fd:1 was the missing one.
 
Old 10-17-2011, 08:03 PM   #5
digdogger
Member
 
Registered: Mar 2007
Distribution: CentOS
Posts: 32

Original Poster
Rep: Reputation: 2
After a whole bunch of time-consuming trial and error, I eventually figured out the issue: a change of OS was what I needed. I was using CentOS 5.7, which uses kernel 2.6.18-274.3.1 as of Oct 17, 2011. I switched to Fedora 14, which uses kernel 2.6.35.14-97 as of Oct 17, 2011, and I think that is what made the difference. After switching to Fedora 14, still with the same hardware, I was able to build the RAID array and keep /proc/mdstat reporting "[UUUU]" even after running "echo check > /sys/block/md0/md/sync_action".
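One follow-up worth noting once the array is stable (my own suggestion, not something from the thread): persisting the array definition so it assembles consistently at boot. The commands below are a sketch, and the exact ARRAY line shown is illustrative:

```shell
# Append the array definition to mdadm's config (run as root on a
# real system; this needs the live array, so it is commented out):
#   mdadm --detail --scan >> /etc/mdadm.conf
# which appends an ARRAY line along the lines of:
#   ARRAY /dev/md0 level=raid5 num-devices=4 UUID=def18c72:165523b5:6544cc2b:a8748b14
```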
 
  

