Having issues booting from EMC SAN root after kernel patch
Here is the scenario.
Hardware info/Env info:
I have a Cisco UCS blade system running Red Hat 5.x, connected to an EMC VMAX.
The system can only boot from a SAN root.
The problem:
When we update the OS and a kernel patch gets loaded, say going from 5.8 to 5.9, and we reboot, we lose the root disk. The PowerPath drivers get kicked out of the kernel and so the system won’t boot.
There is a big procedure to recover the system and get the PowerPath devices back into the kernel. It requires several reboots in rescue mode, going to single-path drives, rescue mode again, and then waving a dead chicken and some other smoke-and-mirrors stuff to get the PowerPath software to load back in.
All of that equals downtime for my users, and that’s not acceptable depending on the system that is affected. It takes an act of God to get them patched. The last thing I want to do is tell them that a 10 minute patch/reboot is now an 8 hour job because EMC drivers don’t play nice with Red Hat updates.
In AIX we know this to be a known issue and we DO NOT boot from SAN this way. We do have native MPIO that will allow us to do this. We are also running a VIO server for the root drives to avoid booting from SAN.
Is there some way in Red Hat or Linux to do this with native drivers?
We cannot be updating the kernel and losing our root drive every time we do a yum update.
I want to know if there is a native way to boot from SAN in Linux, and what drivers, software, and configuration are needed.
We also have systems that boot from local drives with EMC-attached data disks, and because these drivers get kicked out of the kernel we lose our data drives. This is not that bad, as I can comment the data filesystems out of /etc/fstab before rebooting and load the PowerPath driver back in once root is up. Then I uncomment the filesystems in /etc/fstab and mount them manually. It gets ugly if you forget this step, because then root won’t boot. You need to go into rescue mode, mount the filesystem holding /etc, comment out the data filesystems, then reboot and work the above-mentioned magic.
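For what it's worth, the fstab comment/uncomment step above could be scripted so nobody forgets it. This is only a rough sketch under some assumptions: it assumes the data filesystems are listed in /etc/fstab by their PowerPath pseudo-device name (/dev/emcpower...), so it will not catch entries mounted by LABEL= or through LVM. The marker string and both function names are made up for illustration. It only transforms text; reading and writing /etc/fstab (and keeping a backup) is left to the caller.

```python
# Hypothetical tag so we can later find exactly the lines we disabled.
MARKER = "#PP-DISABLED# "


def disable_powerpath_mounts(fstab_text, prefix="/dev/emcpower"):
    """Comment out fstab entries whose device field starts with prefix."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Lines that are already comments have a first field starting
        # with '#', so they never match the device prefix.
        if fields and fields[0].startswith(prefix):
            line = MARKER + line
        out.append(line)
    return "\n".join(out) + "\n"


def enable_powerpath_mounts(fstab_text):
    """Undo disable_powerpath_mounts by stripping the marker."""
    out = []
    for line in fstab_text.splitlines():
        if line.startswith(MARKER):
            line = line[len(MARKER):]
        out.append(line)
    return "\n".join(out) + "\n"
```

The idea would be to run the disable step before the yum update and reboot, then run the enable step and mount by hand once the PowerPath driver is loaded again.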
Any info or sources of info would be greatly appreciated.
EMC has come back and basically said: working as designed, here is the recovery procedure, we have no way to prevent this. But if we want their help, they can do professional services. Hmmm, I smell a rat. I have nothing against EMC; they have a place in the market, but they need to get on board, write better drivers for this function, and support it.