LinuxQuestions.org
04-11-2006, 08:41 AM   #1
DJCF
LQ Newbie

Registered: Jan 2005
Posts: 12

help interpreting MDADM readouts


Hi all,

(Is this in the right place? Is it hardware or software related? Also, it's a repost: I first posted it in the hardware section, but got no replies.)

I have a Fedora Core 3 home server with three 320GB hard drives, which are fairly new -- only a month or so old. They are in a RAID-5 array, with partitions like this:

/dev/hda1 5GB root partition mounted as /
/dev/hdd1 and /dev/hdc1 are an 8GB Logical Volume Group thingumy mounted as /tmp
/dev/hda2 /dev/hdc3 and /dev/hdd3 are the 630 GB RAID-5 array mounted as /home

(There are some swap partitions there too, and some space is lost due to filesystem inefficiencies.)

This was all well and good, but late last night one of the hard drives (hdd, the secondary slave) started making loud clicking sounds at fairly regular intervals, about once a minute or so. Catting /proc/mdstat revealed that one of the drives was faulty, but there was nothing I could do about it until today. I turned off the server, reseated the cables etc., turned it back on, and the drive wasn't recognised by the BIOS. I put the drive into my own workstation and it "click"ed on startup, but was recognised by both the BIOS and by Suse (though I didn't try to mount it, obviously). So I put it back into the server; the BIOS recognised it OK, and it's been running for an hour or so with no clicking sounds. I don't think all is well, however. Perhaps you guys can help me make sense of these RAID readouts?

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 hdc3[1] hda2[0]
614903808 blocks level 5, 256k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>

There seem to be only hdc3 and hda2 in this array -- no sign of hdd3. And what does [3/2] mean? Shouldn't it be [2/3], because it is two out of three drives?

# mdadm --examine /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got 00000000)

What exactly does this mean?


# mdadm --query /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Wed Feb 15 14:35:22 2006
Raid Level : raid5
Array Size : 614903808 (586.42 GiB 629.66 GB)
Device Size : 307451904 (293.21 GiB 314.83 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Mar 3 14:39:02 2006
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 256K

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1      22        3        1      active sync   /dev/hdc3
       2       0        0       -1      removed
UUID : 8c8f0f62:9e69e701:409df450:89adf2fb
Events : 0.136964


This suggests to me that there are two drives in the array, not three -- we're missing HDD, right?

# mdadm --examine /dev/hda2

/dev/hda2:
Magic : a92b4efc
Version : 00.90.00
UUID : 8c8f0f62:9e69e701:409df450:89adf2fb
Creation Time : Wed Feb 15 14:35:22 2006
Raid Level : raid5
Device Size : 307451904 (293.21 GiB 314.83 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 0

Update Time : Fri Mar 3 14:39:24 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : 38c744bb - correct
Events : 0.136972

Layout : left-symmetric
Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       3        2        0      active sync   /dev/hda2
   0     0       3        2        0      active sync   /dev/hda2
   1     1      22        3        1      active sync   /dev/hdc3
   2     2       0        0        2      faulty removed


Why does /dev/hda2 appear in that list twice? Surely it should only appear once? And again, we're missing HDD, right?

# mdadm --examine /dev/hdc3
/dev/hdc3:
Magic : a92b4efc
Version : 00.90.00
UUID : 8c8f0f62:9e69e701:409df450:89adf2fb
Creation Time : Wed Feb 15 14:35:22 2006
Raid Level : raid5
Device Size : 307451904 (293.21 GiB 314.83 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 0

Update Time : Fri Mar 3 14:39:36 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : 38c744e9 - correct
Events : 0.136978

Layout : left-symmetric
Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     1      22        3        1      active sync   /dev/hdc3
   0     0       3        2        0      active sync   /dev/hda2
   1     1      22        3        1      active sync   /dev/hdc3
   2     2       0        0        2      faulty removed

Again, hdc3 appears twice (why?) and there is no sign of HDD.

Let's have a look for HDD...

# mdadm --examine /dev/hdd3
/dev/hdd3:
Magic : a92b4efc
Version : 00.90.00
UUID : 8c8f0f62:9e69e701:409df450:89adf2fb
Creation Time : Wed Feb 15 14:35:22 2006
Raid Level : raid5
Device Size : 307451904 (293.21 GiB 314.83 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0

Update Time : Thu Mar 2 18:30:44 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 38c60fdd - correct
Events : 0.133609

Layout : left-symmetric
Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2      22       67        2      active sync   /dev/hdd3
   0     0       3        2        0      active sync   /dev/hda2
   1     1      22        3        1      active sync   /dev/hdc3
   2     2      22       67        2      active sync   /dev/hdd3

Now this is strange: we now have hdd listed twice, along with the others.

What exactly is going on here? Is it, as I suspect, that I'm running on only two drives? If so, how can I make the array reintegrate hdd3? Or if I'm wrong and everything is OK, where have I gone wrong in interpreting the readouts?
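
One way to read the four --examine dumps above: each member partition carries its own copy of the superblock, with its own Events counter and Update Time. The "this" row describes the partition being examined, and the rows under it are the member list as that partition last recorded it, which is why every device seems to show up twice. A member whose Events counter is far behind the others has missed updates and is treated as stale. Below is a minimal Python sketch that compares the counters quoted above. The helper names are made up for this example, the "hi.lo" reading of the Events value is an assumption about how this mdadm version prints it, and the kernel's own logic is more involved than a simple subtraction.

```python
def events_to_int(text):
    """Convert an mdadm 'Events' value such as '0.136978' to one integer.

    Assumption: the value is printed as '<high 32 bits>.<low 32 bits>'.
    """
    hi, lo = text.split(".")
    return (int(hi) << 32) | int(lo)


def event_lag(events_by_device):
    """Return how far each device's counter is behind the newest one."""
    counts = {dev: events_to_int(ev) for dev, ev in events_by_device.items()}
    newest = max(counts.values())
    return {dev: newest - n for dev, n in counts.items()}


# 'Events' values copied from the three --examine outputs in this post
events = {
    "/dev/hda2": "0.136972",
    "/dev/hdc3": "0.136978",
    "/dev/hdd3": "0.133609",
}

lag = event_lag(events)
for dev in sorted(lag):
    print(dev, "is", lag[dev], "events behind")
```

The small gap between hda2 and hdc3 is presumably just because the commands were run a few seconds apart on a live array (their Update Times differ by 12 seconds). hdd3 is thousands of events behind and was last updated on Thu Mar 2, which fits it having been dropped from the array when the clicking started.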

Cheers,

Daniel
 
04-20-2006, 04:50 AM   #2
Emmanuel_uk
Senior Member

Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many.Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606

UU_
I think it means one HD is not part of the raid anymore
confirmed by
State : clean, degraded
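
On the [3/2] question: as far as I know, the first number is how many devices the array is supposed to have and the second is how many are currently active, so [3/2] with [UU_] reads as "3 expected, 2 up, third slot empty". A rough Python sketch of picking that apart, with the status line copied from post #1 (the function name parse_md_status is made up for this example):

```python
import re


def parse_md_status(line):
    """Extract the '[expected/active] [UU_]' pair from an mdstat status line."""
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
    if m is None:
        return None
    flags = m.group(3)
    return {
        "expected": int(m.group(1)),  # devices the array should have
        "active": int(m.group(2)),    # devices currently in use
        # 'U' marks a slot that is up, '_' a slot that is missing or failed
        "missing_slots": [i for i, c in enumerate(flags) if c == "_"],
    }


# Status line copied from the /proc/mdstat output in post #1
status = "614903808 blocks level 5, 256k chunk, algorithm 2 [3/2] [UU_]"
info = parse_md_status(status)
print(info)
```

Slot 2 is the one the --examine output lists as "faulty removed", i.e. where hdd3 used to be.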

AFAIK a 3-disk raid5 can keep working with only 2 HDs; that is the whole point of it
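
A toy illustration of why that works, assuming the usual RAID-5 scheme where the parity chunk is the XOR of the data chunks in the same stripe. This is a sketch of the idea only, not how the md driver is actually written, and the chunk contents are invented:

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe on a 3-disk RAID-5: two data chunks plus one parity chunk.
chunk_a = b"home dir"
chunk_b = b"data 123"
parity = xor_bytes(chunk_a, chunk_b)

# If the disk holding chunk_b drops out, XOR of the two survivors rebuilds it.
rebuilt_b = xor_bytes(chunk_a, parity)
print(rebuilt_b == chunk_b)
```

It is also why the Array Size in post #1 (614903808) is exactly twice the Device Size (307451904): one disk's worth of space goes to parity. With one disk already gone there is nothing left to rebuild from if a second one fails.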

Time to do some backups, buy a new HD, and "rebuild" the array

man mdadm (I have never rebuilt an array)
 
04-20-2006, 06:37 AM   #3
DJCF
LQ Newbie

Registered: Jan 2005
Posts: 12

Original Poster
Cheers for the help, looks like I have some work to do.

As I understand it, the third hard disk should be physically working fine, just not as part of the array. So I'll have to reintegrate it somehow. The persistent superblock (am I right?) will still be there, which will hamper my attempts to reintegrate it the "normal" way (tutorials, man pages, etc.).
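
For what it's worth, a hedged sketch of the re-add step. mdadm's manage mode has an --add option for hot-adding a device to a running array, and as far as I know a stale superblock from the same array does not by itself prevent that; the partition is simply rebuilt from the other two. The Python wrapper below is only an illustration: the function names are made up, it defaults to a dry run that prints the command, and actually running it needs root and a drive you trust.

```python
import subprocess


def add_member_command(array, member):
    """Build the argv for hot-adding a partition to an existing md array."""
    return ["mdadm", array, "--add", member]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


cmd = add_member_command("/dev/md0", "/dev/hdd3")
run(cmd)  # dry run: prints the command, changes nothing
```

After a real add, cat /proc/mdstat should show a recovery progress line while the array rebuilds onto hdd3.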

Cheers for your help,

Daniel
 
04-20-2006, 06:49 AM   #4
Emmanuel_uk
Senior Member

Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many.Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606

If the drive makes noises and your server is critical,
then why would you put that faulty drive back?
If you try to add that faulty drive back to the array, I do not know what can happen.
I suppose a new drive is needed

I played only with raid 0

clicking sound, 1 month, send back for refund (3 yr warranty)
 
04-20-2006, 07:09 AM   #5
DJCF
LQ Newbie

Registered: Jan 2005
Posts: 12

Original Poster
I think it's a 5 year warranty actually, so very cool! (It's not even 5 months old yet.)

It was clicking, but after restarting the server and plugging the hard drive back in, the clicking stopped and it seems to be working fine now. (Both Suse and Fedora can see it in /dev, and I can query it using mdadm.) So I was planning to try and simply re-add it. Good idea, or do you think I should send it back? If I send it back, wouldn't they most likely plug it into a test computer, discover that it "works" (no clicking, recognised by the OS and the BIOS) and send it back to me?

Cheers,

Daniel
 
04-20-2006, 07:21 AM   #6
Emmanuel_uk
Senior Member

Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many.Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606

You can install smartmontools and look into
the health parameters of the HD
(that said, only very recent kernels may support SMART on SATA)

http://smartmontools.sourceforge.net/
vendor will accept just a printout
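
As a sketch of what to look for in such a printout: smartctl -A /dev/hdd prints the drive's attribute table, and the raw values of the reallocated and pending sector counters are the ones usually worth quoting to a vendor. The Python below pulls those out of the table text. The sample table is MADE UP for illustration and is not from the drive in this thread; the set of watched attribute names is my own choice, and attribute names can differ between vendors.

```python
# Attributes that commonly point at failing media (an assumed, not exhaustive, list)
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def watched_raw_values(smartctl_output):
    """Return {attribute_name: raw_value} for watched rows of 'smartctl -A' output."""
    result = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            if fields[9].isdigit():
                result[fields[1]] = int(fields[9])
    return result


# MADE-UP sample in the layout smartctl -A uses; not real data from hdd.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

values = watched_raw_values(sample)
print(values)
```

Non-zero raw values here would be something concrete to show the vendor even if the drive has stopped clicking.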

There is probably a win utility from the vendor to access smart data

Be frank with vendor and tell them that the noise stopped on putting it back

Using it again: how much is losing all your data worth to you?
You know it is partly faulty... why do you want to use it again?

Batch effect:
check whether the serial numbers are consecutive;
if one HD failed, maybe the others will too

raid is no replacement for backups
 
  

