LinuxQuestions.org
06-21-2009, 02:49 PM #1
GeneralDark
Hard drive crash in VMware?


Hello.
Feel free to move this topic to a more appropriate section. I don't really know where it belongs.

First of all, I'm going to describe "the whole picture" (sorry for my English).
I have an ESXi host with 4 hard drives: 1 for the ESXi OS and 3 for storage.
The 3 storage drives each hold exactly one .vmdk file (all of equal size), and nothing else.
The file server runs from drive 1 (the same drive ESXi is installed on), which means the Debian system is running on that drive (root drive).
The Debian system is then configured to use the 3 storage drives as a RAID 5 array (software RAID created in the Debian installer), and that partition is encrypted with LUKS (the root partition is encrypted as well, if that is of any concern).

The encrypted storage is formatted as ext3 and has several folders exported via SMB.
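The layering described above can be sketched as a small dependency model. This is a hypothetical Python illustration, not something from the thread: the member names sdb1, sdc1 and sdd1 come from the dmesg output later on, /dev/md0 and /dev/mapper/raid-crypt from the error messages, and the function names are made up. It captures one point that matters for the rest of the thread: if md0 cannot start, the LUKS mapping and the ext3 filesystem above it never appear, so the failure first surfaces as an fsck error about a missing /dev/mapper/raid-crypt.

```python
# Hypothetical model of the stack described above (names are illustrative):
# three member partitions -> md RAID 5 (/dev/md0) -> LUKS mapping -> ext3.
RAID_MEMBERS = {"sdb1", "sdc1", "sdd1"}
STACK = ["/dev/md0", "/dev/mapper/raid-crypt", "ext3 filesystem"]


def raid5_can_start(available):
    """RAID 5 stores one member's worth of parity, so it runs with at most one member missing."""
    present = RAID_MEMBERS & set(available)
    return len(present) >= len(RAID_MEMBERS) - 1


def first_unavailable_layer(available, luks_unlocked=True):
    """Return the lowest layer that cannot come up, or None if the whole stack is usable."""
    if not raid5_can_start(available):
        return STACK[0]   # nothing above md0 can be opened either
    if not luks_unlocked:
        return STACK[1]   # md0 runs, but the encrypted mapping was never created
    return None


print(first_unavailable_layer(["sdb1", "sdc1", "sdd1"]))  # None
print(first_unavailable_layer(["sdb1", "sdc1"]))          # None (degraded but usable)
print(first_unavailable_layer(["sdc1"]))                  # /dev/md0
```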

Earlier today I ran a successful check on both drives (both the OS drive and the RAID drive); it was forced because I had rebooted, and all was fine.

About an hour ago I was browsing the storage as usual when I suddenly noticed that a text file I was working on made the application (MS Word) hang. I reconnected the drive and everything seemed fine, except that the directory I had been working in was empty. I checked several other directories, and all of them were intact.

Well, I rebooted the system, thinking it might help with a fresh start.
Rebooting the system goes fine: I enter the password for the root drive and everything is fine until it's about to mount the RAID drive.
I get the following error message:
The superblock could not be read or does not describe a correct ext2 filesystem.

Checking the /var/log/fsck/checkfs log shows:
fsck.ext3: No such file or directory while trying to open /dev/mapper/raid-crypt

ls /dev/mapper indeed does not show this device.
fdisk -l shows that all the drives (root and storage) are OK, but reports that the /dev/dm-* devices don't contain a valid partition table.
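The partition-table message about /dev/dm-* is usually harmless on its own: device-mapper devices such as LUKS mappings normally hold a filesystem (or another layer) directly, with no partition table. One thing a DOS partition table must have is the 0x55 0xAA boot signature at bytes 510-511 of the first sector. The Python sketch below checks only that signature, so it is a simplified stand-in for part of what fdisk looks at, not a reimplementation of it; the function names are hypothetical.

```python
MBR_SIGNATURE_OFFSET = 510      # last two bytes of the first 512-byte sector
MBR_SIGNATURE = b"\x55\xaa"     # DOS/MBR boot signature


def has_dos_partition_signature(first_sector: bytes) -> bool:
    """True if the first sector ends with the 0x55 0xAA boot signature."""
    return first_sector[MBR_SIGNATURE_OFFSET:MBR_SIGNATURE_OFFSET + 2] == MBR_SIGNATURE


def check_device(path: str) -> bool:
    """Read-only look at the first sector of a device or image file (needs read access)."""
    with open(path, "rb") as dev:
        return has_dos_partition_signature(dev.read(512))
```

A LUKS mapping holding ext3 directly would normally return False here, which matches the fdisk message without implying any damage.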

I would like to point out that the LUKS passwords are intact and won't be a problem if the underlying issue can be fixed.

I have searched the forums, but I'm not sure the other topics would help me since this setup is a bit more complex.
Could anyone explain what has happened, and why? Advice on what I should do to save the data would be nice as well.
Help is really appreciated since I have everything on those drives.

Did I miss some important info? Just ask.

Last edited by GeneralDark; 06-21-2009 at 03:11 PM. Reason: Forgot some things.
 
06-23-2009, 05:26 AM #2
unSpawn
Quote:
Originally Posted by GeneralDark
Well, I rebooted the system, thinking it might help with a fresh start.
Rebooting the system goes fine: I enter the password for the root drive and everything is fine until it's about to mount the RAID drive.
I get the following error message:
The superblock could not be read or does not describe a correct ext2 filesystem.
Checking the /var/log/fsck/checkfs log shows:
fsck.ext3: No such file or directory while trying to open /dev/mapper/raid-crypt
ls /dev/mapper indeed does not show this device.
fdisk -l shows that all the drives (root and storage) are OK, but reports that the /dev/dm-* devices don't contain a valid partition table.
I would like to point out that the LUKS passwords are intact and won't be a problem if the underlying issue can be fixed.
If you boot your VM guest into a runlevel that gives you manual control over what gets mounted and how, are there any system messages that could indicate problems at the "hardware" level? What do the logs show when you read them back? Can you query, scan and examine all RAID components verbosely with mdadm?
 
06-29-2009, 12:46 PM #3
GeneralDark
Thanks for the advice. I booted a Gentoo live CD and saw the following in dmesg:
Code:
scsi2 : ioc0: LSI53C1030 B0, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=16
scsi 2:0:0:0: Direct-Access     VMware   Virtual disk     1.0  PQ: 0 ANSI: 2
 target2:0:0: Beginning Domain Validation
 target2:0:0: Domain Validation skipping write tests
 target2:0:0: Ending Domain Validation
 target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
scsi 2:0:1:0: Direct-Access     VMware   Virtual disk     1.0  PQ: 0 ANSI: 2
 target2:0:1: Beginning Domain Validation
 target2:0:1: Domain Validation skipping write tests
 target2:0:1: Ending Domain Validation
 target2:0:1: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
scsi 2:0:2:0: Direct-Access     VMware   Virtual disk     1.0  PQ: 0 ANSI: 2
 target2:0:2: Beginning Domain Validation
 target2:0:2: Domain Validation skipping write tests
 target2:0:2: Ending Domain Validation
 target2:0:2: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
scsi 2:0:3:0: Direct-Access     VMware   Virtual disk     1.0  PQ: 0 ANSI: 2
 target2:0:3: Beginning Domain Validation
 target2:0:3: Domain Validation skipping write tests
 target2:0:3: Ending Domain Validation
 target2:0:3: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
sd 2:0:0:0: [sda] 16777216 512-byte hardware sectors (8590 MB)
sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
sd 2:0:0:0: [sda] Cache data unavailable
sd 2:0:0:0: [sda] Assuming drive cache: write through
sd 2:0:0:0: [sda] 16777216 512-byte hardware sectors (8590 MB)
sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
sd 2:0:0:0: [sda] Cache data unavailable
sd 2:0:0:0: [sda] Assuming drive cache: write through
 sda: sda1 sda2 < sda5 >
sd 2:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:0: Attached scsi generic sg0 type 0
sd 2:0:1:0: [sdb] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:1:0: [sdb] Test WP failed, assume Write Enabled
sd 2:0:1:0: [sdb] Cache data unavailable
sd 2:0:1:0: [sdb] Assuming drive cache: write through
sd 2:0:1:0: [sdb] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:1:0: [sdb] Test WP failed, assume Write Enabled
sd 2:0:1:0: [sdb] Cache data unavailable
sd 2:0:1:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
sd 2:0:1:0: [sdb] Attached SCSI disk
sd 2:0:1:0: Attached scsi generic sg1 type 0
sd 2:0:2:0: [sdc] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:2:0: [sdc] Test WP failed, assume Write Enabled
sd 2:0:2:0: [sdc] Cache data unavailable
sd 2:0:2:0: [sdc] Assuming drive cache: write through
sd 2:0:2:0: [sdc] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:2:0: [sdc] Test WP failed, assume Write Enabled
sd 2:0:2:0: [sdc] Cache data unavailable
sd 2:0:2:0: [sdc] Assuming drive cache: write through
 sdc: sdc1
sd 2:0:2:0: [sdc] Attached SCSI disk
sd 2:0:2:0: Attached scsi generic sg2 type 0
sd 2:0:3:0: [sdd] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:3:0: [sdd] Test WP failed, assume Write Enabled
sd 2:0:3:0: [sdd] Cache data unavailable
sd 2:0:3:0: [sdd] Assuming drive cache: write through
sd 2:0:3:0: [sdd] 1951756452 512-byte hardware sectors (999299 MB)
sd 2:0:3:0: [sdd] Test WP failed, assume Write Enabled
sd 2:0:3:0: [sdd] Cache data unavailable
sd 2:0:3:0: [sdd] Assuming drive cache: write through
 sdd: sdd1
sd 2:0:3:0: [sdd] Attached SCSI disk
sd 2:0:3:0: Attached scsi generic sg3 type 0
I guess it looks normal?
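One quick sanity check on this log is to confirm that each reported capacity matches its sector count. The Python sketch below multiplies the sector count by the 512-byte sector size and expresses the result in decimal megabytes (10^6 bytes), rounded to the nearest MB, which appears to be how these kernel messages print it. The regular expression is written against the sd lines shown above and is an assumption about that format; the function names are hypothetical. sda works out to the roughly 8.6 GB system disk, and sdb, sdc and sdd to three identical roughly 1 TB storage disks.

```python
import re

# Matches lines such as:
# "sd 2:0:1:0: [sdb] 1951756452 512-byte hardware sectors (999299 MB)"
SD_LINE = re.compile(r"\[(sd\w+)\] (\d+) (\d+)-byte hardware sectors \((\d+) MB\)")


def decimal_mb(sectors, sector_size):
    """Capacity in decimal megabytes (10**6 bytes), rounded to the nearest MB."""
    return round(sectors * sector_size / 10**6)


def check_sizes(dmesg_text):
    """Yield (disk, computed_mb, reported_mb) for every matching sd capacity line."""
    for disk, sectors, size, reported in SD_LINE.findall(dmesg_text):
        yield disk, decimal_mb(int(sectors), int(size)), int(reported)


log = """\
sd 2:0:0:0: [sda] 16777216 512-byte hardware sectors (8590 MB)
sd 2:0:1:0: [sdb] 1951756452 512-byte hardware sectors (999299 MB)
"""
for disk, computed, reported in check_sizes(log):
    print(disk, computed, reported, "OK" if computed == reported else "MISMATCH")
```

Both lines print OK: 16777216 × 512 = 8,589,934,592 bytes and 1951756452 × 512 = 999,299,303,424 bytes.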

So I ran an ext3 check:
Code:
livecd ~ # fsck.ext3 /dev/sdb1
e2fsck 1.40.8 (13-Mar-2008)
fsck.ext3: Group descriptors look bad... trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdb1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

livecd ~ # fsck.ext3 /dev/sdc1
e2fsck 1.40.8 (13-Mar-2008)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdc1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

livecd ~ # fsck.ext3 /dev/sdd1
e2fsck 1.40.8 (13-Mar-2008)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdd1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
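These three failures are expected even on a healthy system, because sdb1, sdc1 and sdd1 are RAID members rather than filesystems. In this setup the ext3 filesystem only exists inside the decrypted /dev/mapper/raid-crypt, which sits on the LUKS container on /dev/md0, so the raw member partitions hold md metadata and encrypted data, never a readable ext3 superblock, and the alternate-superblock hint does not apply to them. The Python sketch below shows the signatures involved: an ext2/3/4 superblock starts 1024 bytes into the device and carries the magic number 0xEF53 (little-endian) at offset 0x38, and a LUKS header starts with the bytes "LUKS" 0xBA 0xBE. The function names are hypothetical; it only reads, and it is a rough probe rather than a replacement for tools like blkid.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024   # ext2/3/4 superblock starts 1024 bytes into the device
EXT_MAGIC_OFFSET = 0x38        # s_magic field within the superblock
EXT_MAGIC = 0xEF53             # stored little-endian
LUKS_MAGIC = b"LUKS\xba\xbe"   # LUKS header signature at byte 0


def identify(header: bytes) -> str:
    """Guess what the first few KiB of a block device contain."""
    if header.startswith(LUKS_MAGIC):
        return "LUKS container"
    pos = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(header) >= pos + 2:
        (magic,) = struct.unpack_from("<H", header, pos)
        if magic == EXT_MAGIC:
            return "ext2/3/4 filesystem"
    return "unknown (no ext or LUKS signature)"


def identify_device(path: str) -> str:
    """Read-only probe of a device or image file (needs read permission, e.g. root)."""
    with open(path, "rb") as dev:
        return identify(dev.read(4096))
```

On a member partition this may report a LUKS header or nothing recognizable, depending on the member's position in the array and the md metadata version; on an open /dev/mapper/raid-crypt it should report ext2/3/4.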
mdadm --detail /dev/md0 shows the following when run in the normal Debian system (sorry for the picture):
http://img33.imageshack.us/img33/8157/mdadm.jpg

/proc/mdstat tells me:
Code:
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc1[1]
      975876352 blocks
unused devices: <none>
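The md0 line above already answers part of the question that follows: the array is inactive and only one member, sdc1 (role number 1), is listed, so two of the three members were not assembled. A small parser for array lines of this shape might look like the Python sketch below. It handles only the simple format shown here (name, state, optional personality, members with role indices), the function names are hypothetical, and it assumes the expected members are the sdb1, sdc1 and sdd1 partitions seen in dmesg.

```python
import re

MEMBER = re.compile(r"^(\w+)\[(\d+)\]")  # e.g. "sdc1[1]" -> ("sdc1", "1")


def parse_md_line(line):
    """Parse a /proc/mdstat array line like 'md0 : inactive sdc1[1]'."""
    name, _, rest = line.partition(" : ")
    tokens = rest.split()
    state = tokens[0]                      # "active" or "inactive"
    members = {}
    for tok in tokens[1:]:
        m = MEMBER.match(tok)
        if m:                              # skips personality tokens such as "raid5"
            members[m.group(1)] = int(m.group(2))
    return name.strip(), state, members


def missing_members(expected, members):
    """Members that should belong to the array but are not listed."""
    return sorted(set(expected) - set(members))


name, state, members = parse_md_line("md0 : inactive sdc1[1]")
print(name, state, members)                                # md0 inactive {'sdc1': 1}
print(missing_members(["sdb1", "sdc1", "sdd1"], members))  # ['sdb1', 'sdd1']
```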
Also, I ran badblocks on all drives without errors.

Does this mean the first drive (sdb) is faulty? Is sdc the only working drive? Why don't the other two drives show up in mdadm? Did I miss something?

Grateful for any answers.
 
06-29-2009, 04:35 PM #4
unSpawn
Sorry, I was away for a bit. Your mdadm.jpg says your RAID is active, degraded and not started. mdadm doesn't seem to be able to access 2 of the 3 devices that make up your array. RAID5 can survive losing 1 of 3 devices, but losing 2 of 3 is fatal (AFAIK, I'm no expert).

If there's anything to test, I'd start the host OS only and begin testing at the lowest hardware level. Where applicable, use "dry run" mode to do reconnaissance, and write logs or use "tee" to gather information to post/attach. If that doesn't show failures, you might want to boot the VM guest OS in a controlled mode (single user, runlevel 1 or whatever the equivalent disaster mode is) and run mdadm in its scan, detail and examine modes with as much detail as possible, again to gather information to post/attach.

If *that* doesn't show errors (and I'd guess it does), you could try activating your degraded array with 'mdadm --assemble -f /dev/md0 missing /dev/sdc1 missing', but I doubt that'll work. OTOH, if you have even the faintest hint, gut feeling or any other kind of omen, common sense tells you to make backups. Sure, that hurts, and I don't even know if it will help in any way, but if the data is of value you would agree it's better to be safe than sorry, right?
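The rule of thumb above (one lost member out of three is survivable, two is fatal) follows from how RAID 5 stores parity: in each stripe, one block is the XOR of the others, so any single missing block can be recomputed from the survivors, but with two blocks gone there is one equation and two unknowns. The toy Python sketch below shows this on a single stripe of a three-member array. It is an illustration only: it ignores chunk size, parity rotation and the on-disk layout, which md handles internally, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus their XOR parity block (one RAID 5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe):
    """Rebuild a stripe in which missing blocks are None; only one may be missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} blocks missing: RAID 5 cannot recover this stripe")
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe = list(stripe)
        stripe[missing[0]] = xor_blocks(survivors)  # XOR of all survivors = lost block
    return stripe


stripe = make_stripe([b"hello!", b"world!"])     # 2 data blocks + 1 parity = 3 members
print(recover([stripe[0], None, stripe[2]])[1])  # b'world!'
try:
    recover([None, None, stripe[2]])
except ValueError as err:
    print(err)                                   # 2 blocks missing: RAID 5 cannot ...
```

The same XOR relation is what lets a degraded array with two healthy members recompute the contents of the third when it is re-added.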
 
06-30-2009, 10:46 AM #5
GeneralDark
Thanks for the advice. I managed to force-assemble the RAID array with 2 drives and then have it recalculate the third drive without a problem. Everything is back to normal; even the superblock is correct again.
Thanks a lot
 
06-30-2009, 11:34 AM #6
unSpawn
Well done!
 
  

