LinuxQuestions.org
Old 02-28-2011, 11:14 AM   #1
ckozler
LQ Newbie
 
Registered: Nov 2010
Posts: 5

Rep: Reputation: 0
Linux LVM & Bad Superblock(s)


Hi Guys,

This isn't so much a question about how to resolve a specific issue as it is a question to help me understand LVM better.

I understand what LVM is, does, and generally how it works and how to use it.

I had a server with a Logical Volume Group (LVG_STORE) containing three physical volumes and a single Logical Volume (LV_OPT). Recently, though, one of these three disks failed, which led to a number of issues.

Some more background information:

- The server had a total of 4 disks
- 1 disk -- sda -- two partitions, one for root and one for swap
- 3 disks -- sdb, sdc, and sdd (143 GB, 3 Gb/s SAS disks) -- in the Logical Volume
- sdb failed
- The RAID disk controller reported all disks OK
- After adjusting the entry in /etc/fstab I could still mount the volume manually and read certain parts of it, but a specific directory (probably sitting on the bad block on the disk) would always cause the disk to throw an error, after which the volume could no longer be mounted

After that change to fstab I was able to boot, do some manual work on the volume, and copy some data, but then I got this error:

Code:
Feb 28 10:43:18 demo1 kernel: Aborting journal on device dm-0.
Feb 28 10:43:18 demo1 kernel: ------------[ cut here ]------------
Feb 28 10:43:18 demo1 kernel: WARNING: at fs/buffer.c:1197 mark_buffer_dirty+0x23/0x72()
Feb 28 10:43:18 demo1 kernel: Modules linked in: joydev st ide_disk ide_cd_mod vboxnetadp(N) vboxnetflt(N) vboxdrv(N) ipv6 bonding binfmt_misc fuse loop dm_mod bnx2 rtc_cmos i2c_piix4 rtc_core serio_raw pcspkr rtc_lib shpchp button i2c_core sr_mod ses pci_hotplug sg cdrom enclosure usbhid hid ff_memless ohci_hcd sd_mod crc_t10dif ehci_hcd usbcore edd ext3 mbcache jbd fan ide_pci_generic serverworks ide_core ata_generic sata_svw pata_serverworks libata dock thermal processor thermal_sys hwmon aacraid scsi_mod
Feb 28 10:43:18 demo1 kernel: Supported: No
Feb 28 10:43:18 demo1 kernel: Pid: 5014, comm: umount Tainted: G          2.6.27.19-5-default #1
Feb 28 10:43:18 demo1 kernel: 
Feb 28 10:43:18 demo1 kernel: Call Trace:
Feb 28 10:43:18 demo1 kernel:  [<ffffffff8020da29>] show_trace_log_lvl+0x41/0x58
Feb 28 10:43:18 demo1 kernel:  [<ffffffff8049a3da>] dump_stack+0x69/0x6f
Feb 28 10:43:18 demo1 kernel:  [<ffffffff8023d562>] warn_on_slowpath+0x51/0x77
Feb 28 10:43:18 demo1 kernel:  [<ffffffff802d30c4>] mark_buffer_dirty+0x23/0x72
Feb 28 10:43:18 demo1 kernel:  [<ffffffffa00e1b55>] ext3_put_super+0x54/0x1ce [ext3]
Feb 28 10:43:18 demo1 kernel:  [<ffffffff802b33bf>] generic_shutdown_super+0x60/0xee
Feb 28 10:43:18 demo1 kernel:  [<ffffffff802b345a>] kill_block_super+0xd/0x1e
Feb 28 10:43:18 demo1 kernel:  [<ffffffff802b3518>] deactivate_super+0x60/0x79
Feb 28 10:43:18 demo1 kernel:  [<ffffffff802c7ebb>] sys_umount+0x87/0x91
Feb 28 10:43:18 demo1 kernel:  [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
Feb 28 10:43:18 demo1 kernel:  [<00007f1d146df1c7>] 0x7f1d146df1c7
Feb 28 10:43:18 demo1 kernel: 
Feb 28 10:43:18 demo1 kernel: ---[ end trace d044caa59498ad32 ]---
My dmesg & assorted logs are littered with

Code:
Feb 28 10:43:18 demo1 kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Feb 28 10:43:18 demo1 kernel: sd 0:0:1:0: [sdb] Sense Key : Hardware Error [current] 
Feb 28 10:43:18 demo1 kernel: sd 0:0:1:0: [sdb] Add. Sense: Internal target failure
Feb 28 10:43:18 demo1 kernel: end_request: I/O error, dev sdb, sector 384
Feb 28 10:43:18 demo1 kernel: Buffer I/O error on device dm-0, logical block 0
Feb 28 10:43:18 demo1 kernel: lost page write due to I/O error on dm-0

Now, I know it's pretty apparent that my drive died, but here is my question:


It was clear that sdb was the issue, but for some reason the entire Logical Volume could no longer be mounted once the disk threw an error. I tried everything from finding the backup superblocks and running e2fsck -b against /dev/mapper/LVG_STORE-LV_OPT, to backing up as much data as I could before I hit "X" amount of reads or hit that specific block on the disk, at which point the error was thrown and I was unable to do anything else.
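For reference, locating and trying a backup superblock on an ext3 filesystem generally looks something like the sketch below; the block number 32768 is only a typical value for a 4 KB block size, the real locations come from dumpe2fs, and the LV has to be unmounted while e2fsck runs.

Code:
# List the backup superblock locations recorded in the filesystem
dumpe2fs /dev/mapper/LVG_STORE-LV_OPT | grep -i "backup superblock"
# Point e2fsck at one of the backups (32768 is just an example)
e2fsck -b 32768 /dev/mapper/LVG_STORE-LV_OPT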

Why does a single failing disk cause the entire LVM volume to fail? This seems a little backwards, unless I am missing something.

Any information would help. Thank you.

Last edited by ckozler; 02-28-2011 at 11:15 AM.
 
Old 02-28-2011, 11:37 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Forget about the pv's - from a filesystem perspective, the lv is the equivalent of a partition in the non-LVM world.
Once it's broken, it's broken.

There ain't any substitute for backups.
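One way to see this: the filesystem only ever talks to the single device-mapper node, while LVM stitches that node together from the PVs underneath. A quick sketch with standard LVM2 tooling (names as in this thread):

Code:
# The LV shows up as one block device
ls -l /dev/mapper/LVG_STORE-LV_OPT
# Show which PVs back each segment of the LV
lvs -a -o +devices LVG_STORE
# The raw device-mapper table (one linear target per segment)
dmsetup table LVG_STORE-LV_OPT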
 
Old 02-28-2011, 11:40 AM   #3
ckozler
LQ Newbie
 
Registered: Nov 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00 View Post
Forget about the pv's - from a filesystem perspective, the lv is the equivalent of a partition in the non-LVM world.
Once it's broken, it's broken.

There ain't any substitute for backups.
Thank you for the response.

So is that the equivalent of saying that if a single disk goes bad in an LVM volume group of three disks, just about the entire LVM is unusable because of that one block?

Also, the only way I was able to use the other two disks again was to delete the entire LVG (LVG_STORE) and then re-create it with just the two good disks -- was this incorrect, or was there another way that I may have missed?
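For reference, LVM can usually drop a dead PV without recreating the whole VG. Something along these lines (a sketch only; --force discards any LV that was partly on the missing disk, so in this case it would still have taken LV_OPT with it -- the --partial activation is the gentler step for copying data off first):

Code:
# Activate the VG even though a PV is missing
vgchange -ay --partial LVG_STORE
# Remove the missing PV; --force also removes LVs that were (partly) on it
vgreduce --removemissing --force LVG_STORE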
 
Old 02-28-2011, 11:52 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Theoretically you're supposed to be able to replace the failed drive(s) if you have access to the meta-data (UUID primarily).
Mirrored RAID would be a possible solution.
I don't like LVM and don't use it except under orders.

Others will hopefully give you better answers - I'll see what I can find as well.
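The usual recipe for that, assuming the metadata backups under /etc/lvm/backup are intact, is roughly the sketch below; the UUID has to be the failed PV's old UUID as recorded in the backup file, and /dev/sdb here stands for the replacement disk:

Code:
# Recreate the PV label on the new disk with the old PV's UUID
pvcreate --uuid "<old-PV-UUID-from-backup-file>" --restorefile /etc/lvm/backup/LVG_STORE /dev/sdb
# Restore the VG metadata and reactivate
vgcfgrestore LVG_STORE
vgchange -ay LVG_STORE
# Then check the filesystem on the LV
e2fsck -f /dev/mapper/LVG_STORE-LV_OPT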
 
Old 02-28-2011, 12:00 PM   #5
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1297
Hello,

In my opinion you're confusing or mixing up LVM with RAID. LVM by default doesn't provide any data redundancy or fault tolerance, as far as I know. Of course it does offer features like snapshots and mirroring to provide the necessary 'backup' or redundancy. The way you set it up, with three disks in one VG and one LV in that VG holding all the space, without any RAID or mirroring to other physical devices, I think you're out of luck. I sincerely hope for you that I'm wrong and that someone will come along with a solution. I for one would copy that solution into my personal wiki for future use.
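For completeness, LVM mirroring is set up with the -m option, either at creation time or by converting an existing LV; a sketch (sizes are only examples, and the classic mirror target also wants a small log area on another device unless --mirrorlog core is used):

Code:
# New LV with one mirror copy
lvcreate -m 1 -L 100G -n LV_OPT LVG_STORE
# ...or convert an existing LV to a mirror
lvconvert -m 1 LVG_STORE/LV_OPT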

Kind regards,

Eric
 
Old 02-28-2011, 12:15 PM   #6
ckozler
LQ Newbie
 
Registered: Nov 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00 View Post
Theoretically you're supposed to be able to replace the failed drive(s) if you have access to the meta-data (UUID primarily).
Mirrored RAID would be a possible solution.
I don't like LVM and don't use it except under orders.

Others will hopefully give you better answers - I'll see what I can find as well.

I didn't have a spare SAS disk so I didn't even try (you could be completely right) -- the box was just a demo box and it was a good 'test' place to have an issue with LVM.

Quote:
Originally Posted by EricTRA View Post
Hello,

In my opinion you're confusing or mixing up LVM with RAID. LVM by default doesn't provide any data redundancy or fault tolerance, as far as I know. Of course it does offer features like snapshots and mirroring to provide the necessary 'backup' or redundancy. The way you set it up, with three disks in one VG and one LV in that VG holding all the space, without any RAID or mirroring to other physical devices, I think you're out of luck. I sincerely hope for you that I'm wrong and that someone will come along with a solution. I for one would copy that solution into my personal wiki for future use.

Kind regards,

Eric
Hi Eric,

I am not confusing them -- I know the difference between RAID (hardware & software) and LVM. The filesystem journal on the LV was recovered when the disk had first failed and I restarted the server; the data/journal was about 3 months old (still completely usable). It wasn't until recently that the disk got worse and would error out much more quickly.

I do believe you are correct in that, since I did not provide any RAID behind the LVM (ideal would have been 6 disks in RAID 1 -- resulting in 3 usable mirrored disks, which would then have been served up in the LVG), I am out of luck.
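As a rough sketch of that layout (sde, sdf and sdg below are hypothetical partner disks), the RAID 1 pairs get built first with mdadm and LVM sits on top of the md devices, so a single dead disk no longer takes out the LV:

Code:
# Three RAID 1 pairs out of six disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sde
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdf
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd /dev/sdg
# LVM on top of the mirrors
pvcreate /dev/md0 /dev/md1 /dev/md2
vgcreate LVG_STORE /dev/md0 /dev/md1 /dev/md2
lvcreate -l 100%FREE -n LV_OPT LVG_STORE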

Yes, I have stored this in my memory bank for future use -- if one physical disk fails in an LVG, the entire LVM becomes unusable, which seems correct now because the entire LVM is presented as a single block device to the OS (which explains why, at that specific file/directory, the disk would error out and the volume would unmount).

Thanks for all the responses, though -- if anything is wrong/incorrect or I have mis-worded something, it would be great for someone to correct me.

Thanks!


EDIT: Guess I should have done this -- http://www.google.com/search?sourcei...=LVM+snapshots
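For reference, a snapshot is just another LV carved out of free extents in the same VG, so it's useful for consistent backups but is no protection against a dying disk; basic usage is roughly (a sketch, with /mnt/snap and the 5G size as made-up examples):

Code:
# Needs free space in LVG_STORE to hold changed blocks
lvcreate -s -L 5G -n LV_OPT_snap /dev/LVG_STORE/LV_OPT
mkdir -p /mnt/snap
mount -o ro /dev/LVG_STORE/LV_OPT_snap /mnt/snap
# ... back up from /mnt/snap, then clean up ...
umount /mnt/snap
lvremove /dev/LVG_STORE/LV_OPT_snap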

Last edited by ckozler; 02-28-2011 at 12:20 PM.
 
  

