LinuxQuestions.org

dhmusil 09-09-2011 10:31 AM

RHEL 6.1 kickstart reboot fails with SAN disks attached
 
All,

We are in the process of testing RHEL 6.1 for deployment within our environment. We have stumbled on a strange bug/error/opportunity when building servers that can see SAN-attached storage. The machine appears to build correctly, but on reboot the server hangs or panics. If the SAN storage is removed (in our case, they pull it from the zone), the machine boots up perfectly.

I have looked online for the last few days to see if this is a known issue and have found nothing. I'm wondering if anyone else has seen this happen.

HARDWARE
Server: IBM 3650
Storage: EMC CLARiiON
HBA: Emulex

Initially I am hoping for a sanity check on this. (Verification that I haven't had any bad coffee...for the last 3 weeks)

Thank you kindly

anomie 09-09-2011 01:16 PM

So I think what you're saying is:
  • You deploy new server hardware
  • You attach it to your SAN (specific hardware noted above)
  • You fire off Kickstart, and it installs RHEL6 without trouble
  • Upon reboot (post-install), you get a "hang" or a kernel panic

Correct? If so, does /var/log/messages provide any clues? What happens just before a hang? What info does the panic provide?

(This might end up being a bug report to RH, but I'm curious about the details.)

dhmusil 09-12-2011 10:54 AM

You are correct Anomie.

I have added the console output from just prior to the machine's demise. Again, once the storage is removed, the machine comes back perfectly.

(We have also tried building the machine without the storage attached initially and then adding it later; same result.)

=== Console Output ===
.
.
.
sd 3:0:1:1: [sdd] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
sd 1:0:0:1: [sda] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
scsi 8:0:2:0: Direct-Access IBM-ESXS MBD2147RC SB18 PQ: 0 ANSI: 6
sd 3:0:0:1: [sdc] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
sd 1:0:1:1: [sdb] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
scsi 8:0:3:0: Direct-Access IBM-ESXS MBD2147RC SB18 PQ: 0 ANSI: 6
sd 3:0:1:1: [sdd] Write Protect is off
sd 3:0:0:1: [sdc] Write Protect is off
sd 1:0:1:1: [sdb] Write Protect is off
sd 3:0:1:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:1: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:1:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdd:
sdc:
sdb: unknown partition table
scsi 8:2:0:0: Direct-Access IBM ServeRAID-MR10i 1.40 PQ: 0 ANSI: 5
sd 8:2:0:0: [sde] 855465984 512-byte logical blocks: (437 GB/407 GiB)
sd 8:2:0:0: [sde] Write Protect is off
sd 8:2:0:0: [sde] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
sde: unknown partition table
sd 1:0:0:1: [sda] Write Protect is off
sd 3:0:1:1: [sdd] Attached SCSI disk
sd 1:0:0:1: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
unknown partition table
sd 3:0:0:1: [sdc] Attached SCSI disk
sde1 sde2 sde3 sde4
sd 8:2:0:0: [sde] Attached SCSI disk
sd 1:0:1:1: [sdb] Attached SCSI disk
sda: unknown partition table
sd 1:0:0:1: [sda] Attached SCSI disk
sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
device-mapper: multipath round-robin: version 1.0.0 loaded
dracut: Scanning devices sde2 sde4 for LVM logical volumes rootvg/root rootvg/swap
dracut: No volume groups found
dracut: Volume group "rootvg" not found
dracut: Skipping volume group rootvg
dracut Warning: No root device "block:/dev/mapper/rootvg-root" found

dracut Warning: LVM rootvg/root not found
dracut Warning: LVM rootvg/swap not found

dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
dracut Warning: Signal caught!

dracut Warning: LVM rootvg/root not found
dracut Warning: LVM rootvg/swap not found

dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-131.12.1.el6.x86_64 #1
Call Trace:
[<ffffffff814da648>] ? panic+0x78/0x143
[<ffffffff8106c452>] ? do_exit+0x852/0x860
[<ffffffff81174215>] ? fput+0x25/0x30
[<ffffffff8106c4b8>] ? do_group_exit+0x58/0xd0
[<ffffffff8106c547>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b

anomie 09-14-2011 05:32 PM

Quote:

Originally Posted by dhmusil
Code:

dracut: Scanning devices sde2 sde4  for LVM logical volumes rootvg/root rootvg/swap
dracut: No volume groups found
dracut: Volume group "rootvg" not found
dracut: Skipping volume group rootvg
dracut Warning: No root device "block:/dev/mapper/rootvg-root" found

dracut Warning: LVM rootvg/root not found
dracut Warning: LVM rootvg/swap not found


Bam. There's your problem. Detach from the SAN, boot the machine, and post the output from:
Code:

# pvdisplay
and
Code:

# grep '[^#a-z] filter' /etc/lvm/lvm.conf
You created the physical volumes on local disk (obviously), but I think what you're running into is a shift in device naming when your system sees the storage LUNs. As a result, LVM2 cannot find the logical volumes at boot time. Let's get all the facts and figure out how to fix it.
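
One rough way to see the naming shift in the console output posted earlier is to pair each sdX name with the SCSI host it was attached through. This is only a sketch: the function name attached_disks and the file name console.log are made up for illustration, and it assumes the console output has been saved to a file and that the lines look like the "Attached SCSI disk" messages above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any RHEL tool): read kernel console
# output on stdin and print "<device> host<N>" for every line of the form
#   sd H:C:T:L: [sdX] Attached SCSI disk
# where H is the SCSI host number. Output is sorted by device name.
attached_disks() {
    sed -n 's/^sd \([0-9]*\):[0-9]*:[0-9]*:[0-9]*: \[\(sd[a-z]*\)] Attached SCSI disk.*$/\2 host\1/p' | sort
}

# Example usage (console.log is an assumed file name):
#   attached_disks < console.log
```

Run against the output above, this should list sda through sdd on hosts 1 and 3 (the 16 GiB LUNs, presumably the SAN) and sde on host 8 (the ServeRAID volume). In other words, once the SAN is visible the local disk is no longer the first disk the kernel names.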

davidlee 09-21-2012 11:11 AM

Sorry I'm late to this (I've only just registered).

Disclaimer: I'm no expert on IBM hardware, BIOS etc. So please apply caution in acting on any of this.

We install RHEL 5.8 on various IBM hardware, and have seen something very similar. (Well-installed system; boots happily when no SAN disk is attached; fails to boot when SAN disk is present.)

A colleague seems to have solved this and it seems to work for us.

As the machine boots, get into the BIOS (F1 soon after machine start-up). Then:
System settings
-> Devices and I/O Ports
-> Enable/Disable Adapter Option ROM Support

That should present you with a couple of lists of the HBA slots. The list headings are:
"Legacy Option ROM(s)"
and:
"UEFI Options ROM(s)"

You'll probably find those are "enabled" for all slots. On our systems we set those to "disabled" for those slots with SAN HBAs.

We think that on boot-up the BIOS scans through all its disks, including SAN disks, grabs hold of the first one it finds, and tries to boot from it. Or something like that. Setting those HBAs to disabled seems to prevent the BIOS from searching for (and unfortunately finding) those disks at boot time. The SAN disk is still attached and working, but we can now boot the machine from its internal system disk.

It might be worth investigating that sort of thing.

Hope that helps.

