LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Server
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.

08-08-2009, 10:44 PM #1
DarkFlame
Member
 
Registered: Nov 2008
Location: San Antonio, TX, USA
Distribution: Ubuntu Server 8.10 & SAMBA 3.2.3
Posts: 158
Blog Entries: 1

Rep: Reputation: 30
HDD crash on RAID5 in Ubuntu Server 8.10


I'm running Ubuntu Server 8.10, headless, but I have PuTTY access.

The hardware is an Asus motherboard with an AMD 6400 dual-core processor and five HDDs: an 80 GB HDD holding the operating system, and four 250 GB HDDs in a RAID5 array.

APPARENTLY, two of the HDDs in the RAID5 array decided to crash simultaneously.

My FIRST task is to figure out WHICH two of my drives crashed.

Here is what I DO know about my drives:
  1. /dev/sda, case slot 1, SATA slot 5, 80 GB, model# WDC-WD800JD-00MS, serial# WMAM9CRJ6471
  2. /dev/sdb, case slot 2, SATA slot 6, 250 GB, model# WDC-WD2500AAKS-0, serial# WMART1755547
  3. /dev/sdc, case slot 3, SATA slot 1, 250 GB, model# WDC-WD2500AAKS-0, serial# WMART1760390
  4. /dev/sdd, case slot 4, SATA slot 3, 250 GB, model# WDC-WD2500AAKS-0, serial# WMAT15924203
  5. /dev/sde, case slot 5, SATA slot 2, 250 GB, model# WDC-WD2500AAKS-0, serial# WMAT15923873
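The list above ties kernel names to serial numbers, but kernel names such as /dev/sda can change between boots. In the fdisk and lshw output later in this thread, for example, the 80 GB OS disk shows up as /dev/sdd. A minimal sketch, assuming the server's udev creates /dev/disk/by-id links named like ata-<model>_<serial> (the function name is hypothetical), that prints which kernel name each serial currently maps to:

```shell
# Sketch: list whole-disk ATA links in a by-id directory and the kernel
# name each one currently points to.
# Assumption: udev's by-id layout, where links look like
#   ata-<MODEL>_<SERIAL> -> ../../sdX
list_disks_by_serial() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                 # glob matched nothing, or not a link
        case "$link" in
            *-part[0-9]*) continue ;;              # skip partition links, keep whole disks
        esac
        name=$(basename "$link")
        target=$(basename "$(readlink "$link")")   # ../../sdb -> sdb
        printf '%s -> %s\n' "$name" "$target"
    done
}
```

Run it as list_disks_by_serial (or pass another directory). The link naming is an assumption, so compare against ls -l /dev/disk/by-id on the actual machine before relying on it.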

I know that two of my drives have crashed because when I try to assemble the array, it returns the message:
Quote:
/dev/md/0 assembled from 2 drives - not enough to start the array.
HOW can I tell which of my four 250 GB HDDs have failed? I still have PuTTY working and can get to the server, but it appears that the OS drive is the only one functioning properly.

Any help is greatly appreciated.
 
08-08-2009, 11:46 PM #2
DarkFlame
Original Poster
Research turned up the "fdisk -l" command, which I ran; it returned the following:

Quote:
root@RCH-SERVER:/etc# fdisk -l

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x41413535

Device Boot Start End Blocks Id System
/dev/sda1 1 30401 244196001 fd Linux raid autodetect

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0002d3a2

Device Boot Start End Blocks Id System
/dev/sdb1 1 30401 244196001 fd Linux raid autodetect

Disk /dev/sdc: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00010f8f

Device Boot Start End Blocks Id System
/dev/sdc1 1 30401 244196001 fd Linux raid autodetect

Disk /dev/sdd: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005380d

Device Boot Start End Blocks Id System
/dev/sdd1 * 1 9327 74919096 83 Linux
/dev/sdd2 9328 9729 3229065 5 Extended
/dev/sdd5 9328 9729 3229033+ 82 Linux swap / Solaris

Disk /dev/sde: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000b3fcd

Device Boot Start End Blocks Id System
/dev/sde1 1 30401 244196001 fd Linux raid autodetect
root@RCH-SERVER:/etc#
So maybe I don't have an HDD failure, but I'm definitely unsure what's going on. I need to do something, but I don't know what. Thanks.
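One quick reading of the fdisk output: all four 250 GB disks still carry a single partition of type fd (Linux raid autodetect), so the partition tables look intact. That alone does not show whether the RAID superblocks inside those partitions agree with each other. A small sketch (the function name is made up for illustration) that pulls the type-fd partitions out of fdisk -l output:

```shell
# Print every partition that "fdisk -l" reports with partition type "fd"
# (Linux raid autodetect). Reads fdisk -l output on stdin.
# The Id column moves when the Boot column holds "*", so scan the fields.
raid_partitions_from_fdisk() {
    awk '$1 ~ /^\/dev\// {
        for (i = 2; i <= NF; i++)
            if ($i == "fd") { print $1; break }
    }'
}
```

For example, fdisk -l | raid_partitions_from_fdisk (as root). With the output quoted above, it should list /dev/sda1, /dev/sdb1, /dev/sdc1 and /dev/sde1.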
 
08-09-2009, 01:11 AM #3
DarkFlame
Original Poster
I'm still looking back through previous posts for something that might either solve this problem or at least turn up commands that gather enough info for somebody to offer some insight into a solution. Here's another bit of info:

I found the command "cat /proc/mdstat", and it returned the following:

Quote:
Personalities :
md0 : inactive sde1[0](S) sdc1[4](S) sdb1[2](S) sda1[1](S)
976783360 blocks super 1.0

unused devices: <none>
I'm not sure what it means, unless sde1 is inactive and the others are active.
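In that mdstat line, "inactive" describes md0 as a whole rather than sde1. Each member is written as name[number] followed by flags such as (S) for spare or (F) for faulty. As far as I understand md's reporting, an array that has not been started lists every member as (S), so four (S) entries do not by themselves mean four spare disks. A sketch (hypothetical helper name) that splits such lines into one row per member:

```shell
# Sketch: split /proc/mdstat array lines into one row per member:
#   <array> <state> <device> <index> <flag>
# flag is "spare" for (S), "faulty" for (F), otherwise "member".
parse_mdstat_members() {
    awk '/^md[0-9]+ :/ {
        array = $1; state = $3
        for (i = 4; i <= NF; i++) {
            p = index($i, "[")
            if (p == 0) continue              # e.g. the "raid5" personality word
            q = index($i, "]")
            dev = substr($i, 1, p - 1)
            num = substr($i, p + 1, q - p - 1)
            flag = "member"
            if ($i ~ /\(S\)/) flag = "spare"
            if ($i ~ /\(F\)/) flag = "faulty"
            print array, state, dev, num, flag
        }
    }' "${1:-/proc/mdstat}"
}
```

With the mdstat text above, it would print "md0 inactive sde1 0 spare", followed by matching rows for sdc1, sdb1 and sda1.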
 
08-09-2009, 01:38 AM #4
DarkFlame
Original Poster
Here's yet another one I found: the "lshw" command, which returned the following (I've deleted the large amount of output that doesn't appear pertinent to the current situation):

Quote:
*-storage
description: SATA controller
product: SB700/SB800 SATA Controller [IDE mode]
vendor: ATI Technologies Inc
physical id: 11
bus info: pci@0000:00:11.0
logical name: scsi0
logical name: scsi1
logical name: scsi2
version: 00
width: 32 bits
clock: 66MHz
capabilities: storage pm bus_master cap_list emulated
configuration: driver=ahci latency=64 module=ahci

*-disk:0
description: ATA Disk
product: WDC WD2500AAKS-0
vendor: Western Digital
physical id: 0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: 01.0
serial: WD-WMART1760390
size: 232GiB (250GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=41413535

*-volume
description: Linux raid autodetect partition
physical id: 1
bus info: scsi@0:0.0.0,1
logical name: /dev/sda1
capacity: 232GiB
capabilities: primary multi

*-disk:1
description: ATA Disk
product: WDC WD2500AAKS-0
vendor: Western Digital
physical id: 1
bus info: scsi@1:0.0.0
logical name: /dev/sdb
version: 01.0
serial: WD-WMAT15924203
size: 232GiB (250GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=0002d3a2

*-volume
description: Linux raid autodetect partition
physical id: 1
bus info: scsi@1:0.0.0,1
logical name: /dev/sdb1
capacity: 232GiB
capabilities: primary multi

*-disk:2
description: ATA Disk
product: WDC WD2500AAKS-0
vendor: Western Digital
physical id: 0.0.0
bus info: scsi@2:0.0.0
logical name: /dev/sdc
version: 01.0
serial: WD-WMAT15923873
size: 232GiB (250GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=00010f8f

*-volume
description: EXT3 volume
physical id: 1
bus info: scsi@2:0.0.0,1
logical name: /dev/sdc1
version: 1.0
serial: 585932a4-6e29-4fae-b0c5-b26c430f8b42
size: 846GiB
capabilities: primary multi journaled extended_attributes large_files huge_files recover ext3 ext2 initialized
configuration: created=2011-12-14 03:43:17 filesystem=ext3 label=(unprintable) lastmountpoint=(unprintable) modified=2009-08-07 19:11:26 mounted=2026-08-12 13:59:58 state=unknown


*-ide
description: IDE interface
product: SB700/SB800 IDE Controller
vendor: ATI Technologies Inc
physical id: 14.1
bus info: pci@0000:00:14.1
logical name: scsi4
logical name: scsi5
version: 00
width: 32 bits
clock: 66MHz
capabilities: ide msi bus_master cap_list emulated
configuration: driver=pata_atiixp latency=64 module=pata_atiixp

*-cdrom
description: DVD reader
product: ROM
vendor: 16X DVD-
physical id: 0
bus info: scsi@4:0.0.0
logical name: /dev/cdrom
logical name: /dev/dvd
logical name: /dev/scd0
logical name: /dev/sr0
version: 107G
capabilities: removable audio dvd
configuration: ansiversion=5 status=nodisc

*-disk:0 (This is my HDD that's dedicated to the OS)
description: ATA Disk
product: WDC WD800JD-00MS
vendor: Western Digital
physical id: 1
bus info: scsi@5:0.0.0
logical name: /dev/sdd
version: 10.0
serial: WD-WMAM9CRJ5825
size: 74GiB (80GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=0005380d

*-volume:0
description: EXT3 volume
vendor: Linux
physical id: 1
bus info: scsi@5:0.0.0,1
logical name: /dev/sdd1
logical name: /
logical name: /dev/.static/dev
version: 1.0
serial: 06c0dfc0-4fa7-4e53-a4b9-b41f698bb49e
size: 71GiB
capacity: 71GiB
capabilities: primary bootable journaled extended_attributes large_files huge_files recover ext3 ext2 initialized
configuration: created=2009-01-11 17:39:13 filesystem=ext3 modified=2009-08-08 22:02:37 mount.fstype=ext3 mount.options=ro,errors=remount-ro,data=ordered mounted=2009-08-08 22:02:37 state=mounted

*-volume:1
description: Extended partition
physical id: 2
bus info: scsi@5:0.0.0,2
logical name: /dev/sdd2
size: 3153MiB
capacity: 3153MiB
capabilities: primary extended partitioned partitioned:extended
*-logicalvolume
description: Linux swap / Solaris partition
physical id: 5
logical name: /dev/sdd5
capacity: 3153MiB
capabilities: nofs

*-disk:1
description: ATA Disk
product: WDC WD2500AAKS-0
vendor: Western Digital
physical id: 0.1.0
bus info: scsi@5:0.1.0
logical name: /dev/sde
version: 01.0
serial: WD-WMART1755547
size: 232GiB (250GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=000b3fcd

*-volume
description: EXT3 volume
vendor: Linux
physical id: 1
bus info: scsi@5:0.1.0,1
logical name: /dev/sde1
version: 1.0
serial: 5ad9e3a3-6e29-4f8e-b0c5-b26c430f8b42
size: 698GiB
capabilities: primary multi journaled extended_attributes large_files huge_files recover ext3 ext2 initialized
configuration: created=2008-12-07 18:52:07 filesystem=ext3 modified=2009-08-07 19:11:26 mounted=2009-08-07 19:11:26 state=unknown
So my 80 GB HDD has a different serial number from the one I documented, but that's not a big deal. I bought several of the 80 GB drives and set two of them up with the Linux OS. They are swappable; I documented one but not the other. They are identical, except that one was installed, configured, and then removed, while the other was installed, configured, and left in service. No upgrades, no configuration changes, and no new software installations have been made.

In fact, we left town for a week and I turned off the server. When we got back yesterday, I turned on the whole system (DSL modem, DSL router, the file server in question, workstations, and a network printer: a Xerox Phaser solid-ink printer that connects via Ethernet through its built-in print server). Everything worked fine for a full day. Then the server started to fail: first it wouldn't let me MAKE new folders on the drive (for the pictures we took), and then, when I rebooted, it didn't mount the drive properly.

It appears that all the HDDs are present and accounted for, including the four 250 GB drives in the RAID5 array. However, two volumes (/dev/sdc1 and /dev/sde1) are in state=unknown, so those are the ones I need to get mounted. THIS COULD BE A CLUE, but I'm too tired to think about it right now.

I'm through looking at this for tonight, and will pick up on it again tomorrow.

I have a feeling that the solution is fairly simple. But this plays into my theory that the simplest solutions require the most head-banging. I'm now going to go bang my head on my pillows!

Thanks, in advance, I hope! ;+)
 
08-09-2009, 11:08 AM #5
DarkFlame
Original Poster
OK, I may be getting closer. I ran "vim fstab" and got this:

root@RCH-SERVER:/etc# vim fstab
Quote:
# /etc/fstab: static file system information.
#
# COLUMN HEADINGS
# <file system> <mount point> <type> <options> <dump> <pass>
#
# MOUNT PROC
proc /proc proc defaults 0 0
#
# MOUNT UBUNTU OSS HDD = /dev/sda1
UUID=06c0dfc0-4fa7-4e53-a4b9-b41f698bb49e / ext3 relatime,errors=remount-ro 0 1
#
# MOUNT UBUNTU OSS SWAP = /dev/sda5
UUID=4081d67c-74af-49ed-9b87-a312983ada62 none swap sw 0 0
#
# MOUNT /NW-DATA HDD
UUID=5ad9e3a3-6e29-4f8e-b0c5-b26c430f8b42 /NW-DATA ext3 defaults 0 0
#
# MOUNT CD-DVD ROM DRIVE
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,exec,utf8 0 0
Maybe something in there is a clue to what's going on?
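Two details in this fstab line up with the lshw output. The /NW-DATA UUID (5ad9e3a3-6e29-4f8e-b0c5-b26c430f8b42) is the same value lshw reported as the serial of the 698 GiB EXT3 volume it found on /dev/sde1, and 698 GiB is about 3 x 232 GiB, the usable size of a four-disk RAID5 built from these drives. That suggests (an inference, not something the thread confirms) that lshw was reading the start of the array's filesystem on sde1, which /proc/mdstat lists as member [0], and that /NW-DATA belongs on /dev/md0 once the array runs. The comments also still call the OS disk /dev/sda1 while it now appears as /dev/sdd, which is why mounting by UUID is the safer choice. A sketch with hypothetical helper names for cross-checking fstab UUIDs against blkid output:

```shell
# List "<uuid> <mount point>" for each UUID= entry in an fstab-format file.
fstab_uuid_mounts() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1, $2 }' "${1:-/etc/fstab}"
}

# Print the device(s) whose blkid line carries the given filesystem UUID.
# Reads blkid output on stdin. The leading space keeps PARTUUID= from matching.
devices_with_uuid() {
    grep -F " UUID=\"$1\"" | cut -d: -f1
}
```

For example, run fstab_uuid_mounts to list the entries, then blkid | devices_with_uuid 5ad9e3a3-6e29-4f8e-b0c5-b26c430f8b42 to see which device currently presents that filesystem.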
 
08-09-2009, 10:24 PM #6
DarkFlame
Original Poster
Gosh, I'm flummoxed. I've tried loading Western Digital's diagnostics. I was able to burn the DOS-based ISO to a bootable CD, but it gives me the message:
Quote:
Unable to locate the License Agreement file, DLGLICE.TXT!!!
Please make sure that the License Agreement file is located
in the same path as DLGDIAG.EXE...
But the file is right there on the disc, so I can't even get the diagnostics to load.

If anyone has any ideas, I'd be most appreciative.
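If the vendor's DOS tool won't start, the smartmontools package (smartctl) can read the drives' own SMART data from the running server. A minimal sketch, assuming smartmontools is installed and the commands run as root; the helper names are hypothetical. The three attributes are standard ATA SMART names, and non-zero raw counts for them are commonly treated as a warning sign of failing media, not as proof:

```shell
# Print "<attribute> <raw value>" for a few media-error counters.
# Reads "smartctl -A /dev/sdX" output on stdin; RAW_VALUE is column 10.
smart_media_counters() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Overall health verdict plus the counters above, for each disk given.
# Requires root and smartmontools.
check_disks() {
    for disk in "$@"; do
        echo "== $disk"
        smartctl -H "$disk" | grep -i 'result'
        smartctl -A "$disk" | smart_media_counters
    done
}
```

For example: check_disks /dev/sda /dev/sdb /dev/sdc /dev/sde (the RAID disks as fdisk currently names them).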
 
08-10-2009, 01:34 AM #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,406

Rep: Reputation: 2396
This should help: http://linux.die.net/man/8/mdadm

start with

mdadm --detail

cat /proc/mdstat
 
08-10-2009, 07:19 AM #8
DarkFlame
Original Poster
Chrism01:

Thank you for the post. I've tried what I think you're telling me:

root@RCH-SERVER:/home/admiral# mdadm --detail
Quote:
mdadm: No devices given.

So, I tried (as posted previously):
root@RCH-SERVER:/home/admiral# cat /proc/mdstat
Quote:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sde1[0](S) sdc1[4](S) sdb1[2](S) sda1[1](S)
976783360 blocks super 1.0

unused devices: <none>

How do I make them active?

I went back & tried:
root@RCH-SERVER:/home/admiral# mdadm --assemble --scan
Quote:
mdadm: /dev/md/0 assembled from 2 drives - not enough to start the array.

So, I went back & tried:

root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sda1
Quote:
mdadm: /dev/sda1 does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdb1
Quote:
mdadm: /dev/sdb1 does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdc1
Quote:
mdadm: /dev/sdc1 does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdd1
Quote:
mdadm: /dev/sdd1 does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sde1
Quote:
mdadm: /dev/sde1 does not appear to be an md device

& just for grins:
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sda
Quote:
mdadm: /dev/sda does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdb
Quote:
mdadm: /dev/sdb does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdc
Quote:
mdadm: /dev/sdc does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sdd
Quote:
mdadm: /dev/sdd does not appear to be an md device
root@RCH-SERVER:/home/admiral# mdadm --detail /dev/sde
Quote:
mdadm: /dev/sde does not appear to be an md device
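These errors are expected: mdadm --detail reports on an md array device such as /dev/md0, while each member partition carries its own RAID superblock, which mdadm --examine reads. Comparing the Events counter in each member's superblock is a common way to see which member(s) fell behind; a member with a noticeably lower count was most likely dropped from the array earlier than the others. A sketch with hypothetical helper names, assuming it runs as root with mdadm installed:

```shell
# Print the Events counter from "mdadm --examine" output read on stdin.
examine_events() {
    awk -F: '/^ *Events *:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# For each member partition, print "<device> <events>", or "unknown"
# when no superblock could be read.
report_member_events() {
    for dev in "$@"; do
        ev=$(mdadm --examine "$dev" 2>/dev/null | examine_events)
        printf '%s %s\n' "$dev" "${ev:-unknown}"
    done
}
```

For example: report_member_events /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1. The full mdadm --examine output for each member (update time, array state) is worth posting alongside the counters.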

& I tried:
root@RCH-SERVER:/dev# mdadm --detail md0
Quote:
mdadm: md device md0 does not appear to be active.

So, how do I make md0 active?
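One commonly used approach, offered as a sketch rather than a recipe, is to stop the half-assembled md0 and reassemble it with mdadm's --force option, which lets mdadm accept members whose superblocks look out of date (it may rewrite their metadata to do so). Because that changes the disks, it is only worth trying after the members have been imaged. The helper names and the DRY_RUN variable are hypothetical; by default the functions only print the commands, and DRY_RUN=0 runs them:

```shell
# Hypothetical helper: print the command (default, DRY_RUN=1) or run it (DRY_RUN=0).
run_or_print() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop the inactive array, then try a forced assembly from the given members.
# Usage: force_assemble /dev/md0 <member partitions...>
force_assemble() {
    md="$1"; shift
    run_or_print mdadm --stop "$md" || return 1
    run_or_print mdadm --assemble --force "$md" "$@"
}
```

For example, force_assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1 lists all four partitions and leaves it to mdadm to decide which are usable; cat /proc/mdstat afterwards shows whether md0 came up, possibly degraded.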


I'm off to work now, will be back in about 9-10 hours, and will continue addressing this issue.

Thank you very much for your thoughts. I'll read the link thoroughly after I get back, but at first glance, I'm not sure I can figure out what I need from it.

Thank you!
David Labens
San Antonio, TX
 
08-10-2009, 11:05 AM #9
johanbach
LQ Newbie
 
Registered: Aug 2009
Posts: 1

Rep: Reputation: 0
I'm having a similar problem, although one of my drives did crash. I get the same mdadm error saying the array was assembled from only 2 drives (when I have 3 of 4), so the RAID can't be started. I'll be watching this thread to see if it helps me resolve my problem.

JB
 
08-10-2009, 05:28 PM #10
DarkFlame
Original Poster
root@RCH-SERVER:/dev# cat /proc/mdstat
Quote:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sde1[0](S) sdc1[4](S) sdb1[2](S) sda1[1](S)
976783360 blocks super 1.0

unused devices: <none>
Someone mentioned that "(S)" means "spare" drive.
How would one manually mount the drives in a four-disk array?
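The member partitions are not mounted individually; the filesystem lives on the array, so once /dev/md0 is running it is /dev/md0 that gets checked and mounted. A cautious sketch, assuming an ext3 filesystem (as lshw reports) and the /NW-DATA mount point from the fstab earlier in the thread. fsck.ext3 -n checks without changing anything, and mount -o ro avoids writes while data is copied off. Helper names and the DRY_RUN variable are hypothetical; the commands are only printed unless DRY_RUN=0:

```shell
# Hypothetical helper: print the command (default, DRY_RUN=1) or run it (DRY_RUN=0).
run_or_print() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Check and mount an already-running array without writing to it.
# Defaults: /dev/md0 and the /NW-DATA mount point from this server's fstab.
mount_array_readonly() {
    md="${1:-/dev/md0}"
    mnt="${2:-/NW-DATA}"
    run_or_print mdadm --detail "$md" || return 1   # is the array actually running?
    run_or_print fsck.ext3 -n "$md" || return 1     # read-only check; stop if it reports errors
    run_or_print mount -o ro "$md" "$mnt"           # mount read-only for copying data off
}
```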

Thanks!
 
08-10-2009, 08:10 PM #11
cbtshare
Member
 
Registered: Jul 2009
Posts: 619

Rep: Reputation: 42
Do you have hardware RAID?

If you do, run the RAID diagnostic tool and it will tell you which drive(s) are going bad.

Last edited by cbtshare; 08-10-2009 at 08:11 PM.
 
08-11-2009, 07:38 AM #12
DarkFlame
Original Poster
CBTSHARE,

Thank you for the response.

Quote:
Originally Posted by cbtshare
Do you have hardware RAID?

If you do, run the RAID diagnostic tool and it will tell you which drive(s) are going bad.
No, it's a software RAID, using mdadm. The problem is no longer figuring out which HDDs have gone bad, because both are marked "state=unknown" (/dev/sdc1 and /dev/sde1).

The problem has become one of trying to get them manually mounted because they all appear to be spares.

Last edited by DarkFlame; 08-11-2009 at 07:39 AM.
 
09-11-2009, 06:09 PM #13
DarkFlame
Original Poster
Final update: The only solution was, with the RAID5 array unmounted, to connect just the OS HDD, the damaged drive, and a brand-new blank drive, download and install ddrescue, and then run multiple copy passes to get every possible bit of data off the old, damaged drive. I copied from the damaged drive with it in every possible position, short of putting it in the freezer (freezing a drive can sometimes shift things inside just enough to get a good read). Once that was done, I reassembled the RAID5 array, still unmounted, had it recreate the data (fdisk with an option, I believe), and then mounted the drive.

Once done, I copied ALL the data to a 1 TB drive in my XP box and found that 3 pictures would not copy because their bytes were set to ZERO. There were 3 other pictures that had the exact same data. So, from what I can tell, out of 12,000 pictures PLUS a total of 250 GB of data, I lost less than 10 MB in six files. Not bad.

My choices were ddrescue or SpinRite; ddrescue runs on Linux, so that's the way I went.
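The thread doesn't record the exact ddrescue commands used, so here is a hedged sketch of the usual two-pass GNU ddrescue pattern. /dev/sdX (failing source) and /dev/sdY (new blank target) are placeholders; swapping them would overwrite the source, so check both against serial numbers first. -f is required because the output is a device, -n skips the slow scraping phase on the first pass, and the map file (name assumed) lets the second pass, with direct disc access (-d) and three retries (-r3), revisit only the bad areas. The helper names and DRY_RUN variable are hypothetical; by default the commands are only printed:

```shell
# Hypothetical helper: print the command (default, DRY_RUN=1) or run it (DRY_RUN=0).
run_or_print() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Two-pass clone of a failing disk onto a new one with GNU ddrescue.
# Usage: rescue_disk SOURCE TARGET [MAPFILE]
rescue_disk() {
    src="$1"; dst="$2"; map="${3:-rescue.map}"
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: rescue_disk SOURCE TARGET [MAPFILE] (source and target must differ)" >&2
        return 2
    fi
    run_or_print ddrescue -f -n "$src" "$dst" "$map" || return 1   # pass 1: easy areas first
    run_or_print ddrescue -f -d -r3 "$src" "$dst" "$map"           # pass 2: retry bad areas
}
```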

Case closed!
 
  



All times are GMT -5.
