System no longer boots after imaging
Slackware 13.0, kernel 3.7.5
Intel QuadCore GigaBit Motherboard
/dev/sda = 1 TB Hitachi drive; /dev/sda1 is bootable and is the / partition, with /home and /opt on other partitions.
sdb-sdd are each 1 TB drives in a RAID 5 configuration.
I am preparing to upgrade to Slackware 14.0
1) Upgraded to kernel 3.7.5 and the latest VirtualBox, as I was having version conflicts and vboxdrv was not building (1 month ago).
2) Installed the multipath libraries and kpartx, and installed the device-mapper kernel module (dm-mod, required by kpartx). This was done within the past 3 days.
I had older disk images created with dd, and I wanted to use kpartx to create devices for them so that I could mount the individual partitions within the images themselves.
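For anyone wanting to do the same, here is a minimal sketch of the kpartx step, not necessarily what I ran. It assumes root, that dm-mod loads, and that kpartx -av prints lines of the form "add map loop0p1 (253:0): ..."; the function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: map the partitions inside a dd image with kpartx and mount one
# read-only. Needs root, the dm-mod module and kpartx (multipath-tools).

# Pull the loop device name out of "kpartx -av" output.
# Assumed line format: add map loop0p1 (253:0): 0 2048 linear /dev/loop0 2048
loop_from_map_line() {
    awk '$1 == "add" && $2 == "map" { sub(/p[0-9]+$/, "", $3); print $3; exit }'
}

# kpartx names loop-backed partition mappings /dev/mapper/<loop>p<N>.
mapper_path() {
    printf '/dev/mapper/%sp%s\n' "$1" "$2"
}

# $1 = image file, $2 = partition number, $3 = mount point
mount_image_partition() {
    modprobe dm-mod || return 1
    loop=$(kpartx -av "$1" | loop_from_map_line)
    [ -n "$loop" ] || { echo "kpartx found no partitions in $1" >&2; return 1; }
    mount -o ro "$(mapper_path "$loop" "$2")" "$3"
}

# Undo: $1 = image file, $2 = mount point. Unmount, then remove the mappings.
unmount_image() {
    umount "$2" && kpartx -d "$1"
}
```

Usage would be something like mount_image_partition /mnt/ext/sda.img 1 /mnt/img, then unmount_image /mnt/ext/sda.img /mnt/img when done (both paths are placeholders). Mounting read-only keeps the old image untouched.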
Booted into SystemRescueCD to image /dev/sda onto an external drive before upgrading to Slackware 14.0 (yes, call me paranoid, but I needed a good backup anyway).
This completed on Saturday and all looked good.
Rebooted from the hard drive. Something in my lilo.conf must be wrong, because I keep getting a VGA message that makes me press Space or Enter to choose an alternate video mode. When I press Space, only one of my cores is reported as responding. That is a question for another day, but I wanted to mention it in case it is a clue. I had pressed Space by mistake, so I rebooted and chose a plain 80x40 VGA mode, and this time I got an error message saying something like "unable to mount /dev/sda1, perhaps the superblock is corrupt..." yadda yadda yadda. This time, yes, all the cores were present and responding.
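On the VGA prompt itself: the kernel usually stops with that press-Space/Enter message when LILO hands it vga=ask or a mode number it does not accept. A small sketch to see which value lilo.conf actually contains; the helper name is hypothetical, and it assumes a single global vga line as a stock lilo.conf has.

```shell
#!/bin/sh
# Sketch: report the first vga= setting in a lilo.conf.
# "ask" or an unsupported mode number would explain the boot-time prompt.

# $1 = path to lilo.conf (normally /etc/lilo.conf)
lilo_vga_setting() {
    # Skip commented lines; capture the value up to whitespace or '#'.
    sed -n 's/^[[:space:]]*vga[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" |
        head -n 1
}
```

If it reports ask or a suspicious number, changing the line to vga = normal and then re-running /sbin/lilo (LILO only picks up lilo.conf changes when it is re-run) should remove the prompt.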
I went into single-user mode and tried fdisk /dev/sda, but even fdisk -l showed no drives at all, as if they were not even there.
I looked through dmesg and saw some ACPI error messages, but the kernel was detecting all of the drives, and I saw it register sda and all three of its partitions.
I rebooted into SystemRescueCD and was able to run fdisk on all the drives. fsck.ext3 -n on every sda partition came back clean, and I could mount sda1 read-write and modify a simple file.
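The read-only check above can be repeated for every partition the kernel lists for a disk. A sketch, assuming all of that disk's partitions are ext3 (true for / , /home and /opt here); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: read-only fsck of every partition the kernel reports for one disk.
# fsck.ext3 -n opens the filesystem read-only and answers "no" to all prompts.

# $1 = disk name (e.g. sda), $2 = partitions file (default /proc/partitions)
partitions_of() {
    awk -v d="$1" '$4 ~ ("^" d "[0-9]+$") { print $4 }' "${2:-/proc/partitions}"
}

# Assumes every partition on the disk is ext3; swap would need to be skipped.
check_disk_readonly() {
    for p in $(partitions_of "$1"); do
        echo "== /dev/$p"
        fsck.ext3 -n "/dev/$p"
    done
}
```

From the rescue CD, check_disk_readonly sda runs the same check that came back clean.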
Everything looks fine, yet when I reboot from the hard drive I get the same error.
I was worried that installing dm-mod or the multipath libraries might have done something (not sure why it would), but they were the only changes I had not yet rebooted with. That is, I built and loaded the modules and libraries and used them through kpartx, but had not rebooted the system since.
I appreciate any help, as I am completely baffled as to what the problem could be.
Thank you all in advance.
I'd be looking at those multipath libs. It sounds like something got overwritten.
I have been looking at the uninstall targets, and they appear to just delete the shared libraries and binaries. If that is the case, what could have been overwritten?
Is it a real RAID 5, or the on-board "fake-raid" implementation that requires a (Windoze) driver?
If the latter, introducing device-mapper into the mix might explain some vagaries: it will recognise the RAID and attempt to construct its own devices. There is no reason to expect it has actually "overwritten" anything, merely added function that is getting in the way.
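One quick way to tell which kind of RAID is in play: Linux software RAID arrays show up in /proc/mdstat, while anything device-mapper has built (kpartx maps, dmraid fake-raid sets) is listed by dmsetup ls. A sketch; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: distinguish md software RAID from device-mapper constructs.

# Print "<array> <level>" for each active md array, e.g. "md0 raid5".
# $1 = mdstat-format file (default /proc/mdstat)
md_arrays() {
    awk '/^md[0-9]+ *: *active/ {
        for (i = 4; i <= NF; i++)
            if ($i ~ /^(raid[0-9]+|linear)$/) { print $1, $i; break }
    }' "${1:-/proc/mdstat}"
}

# Show both layers side by side (run as root).
show_device_layers() {
    echo "md arrays:"; md_arrays
    echo "device-mapper devices:"; dmsetup ls
}
```

An md0 raid5 line means a real software RAID; extra device-mapper entries sitting on sdb-sdd would support the "getting in the way" theory.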
That doesn't explain (to me) why /dev/sda disappears, or why you're down to one core, but it sounds like someone (the initrd being prime suspect) is passing dodgy options to the kernel.
(I haven't been a Slack user for many years, so I don't have one of its initrds to look at.)
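To see exactly what the kernel was given, /proc/cmdline holds the full boot command line. A sketch that pulls out one parameter; the function name is hypothetical. A maxcpus=1 or nosmp in there would account for the single core.

```shell
#!/bin/sh
# Sketch: print the value(s) of one kernel parameter from the boot command line.

# $1 = parameter name (root, vga, maxcpus, ...)
# $2 = command line to inspect (default: contents of /proc/cmdline)
kernel_param() {
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}" ;;  # strip "name=" prefix
        esac
    done
}
```

In single-user mode, kernel_param root and kernel_param maxcpus would show whether LILO or an initrd is passing something unexpected.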
Are you using an "initrd" image?
The fact that you can get to single-user mode implies that the kernel and the driver for the root device are at least partly working. In single-user mode, do you see device nodes when you run "ls -l /dev/sd*"? It sounds like "udev" might not be creating the device nodes in "/dev". If you find that "udev" is not creating them, you can try removing or renaming the files in "/etc/udev/rules.d". Also make sure that you have the files in "/lib/udev".
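A slightly more targeted version of that ls check: compare what the kernel knows about (/proc/partitions) with what actually exists under /dev. A sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list sd* devices the kernel has registered but that have no block
# node under /dev, which is what makes fdisk -l come back empty.

# $1 = partitions file, $2 = dev directory (defaults: /proc/partitions, /dev)
missing_sd_nodes() {
    pfile=${1:-/proc/partitions}
    devdir=${2:-/dev}
    # /proc/partitions columns: major minor #blocks name
    awk '$4 ~ /^sd[a-z]+[0-9]*$/ { print $4, $1, $2 }' "$pfile" |
    while read -r name major minor; do
        [ -b "$devdir/$name" ] || echo "$name $major:$minor"
    done
}
```

If this prints sda and its partitions while dmesg shows them registered, the kernel side is fine and udev is the likely culprit; the major:minor pairs are also what a hand-made node needs.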
If you find the device nodes missing, you can temporarily create them by hand with mknod.
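A sketch of creating them by hand, assuming the classic static numbering for SCSI/SATA disks: block major 8, 16 minors per disk, so sda=0, sda1=1, sdb=16, sdc3=35 (covers sda-sdp, partitions 1-15). The helper names are hypothetical, and the 660 root:disk permissions are a common convention rather than anything from this thread.

```shell
#!/bin/sh
# Sketch: compute sd minor numbers and create missing nodes (root only).

# $1 = node name such as sda or sdc3; prints its minor number
sd_minor() {
    case $1 in
        sd[a-p]|sd[a-p][1-9]|sd[a-p]1[0-5]) ;;
        *) echo "unsupported name: $1" >&2; return 1 ;;
    esac
    awk -v n="$1" 'BEGIN {
        i = index("abcdefghijklmnop", substr(n, 3, 1))  # disk letter -> 1..16
        m = (i - 1) * 16 + substr(n, 4)                  # empty partition -> 0
        print m
    }'
}

# Create /dev/<name> as block device 8:<minor> if it is not already there.
make_sd_node() {
    minor=$(sd_minor "$1") || return 1
    [ -e "/dev/$1" ] || mknod -m 660 "/dev/$1" b 8 "$minor"
    chown root:disk "/dev/$1"
}
```

For this machine, make_sd_node sda followed by make_sd_node sda1 (and the other partitions) would be enough to test fdisk and mount. The numbers can also be read straight from /proc/partitions instead of computed.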
As I was looking at manually creating the nodes, I decided to have a look at the .bash_history on my server and compare it with my laptop's. On the laptop I am using now, I followed the same steps (multipath, dm-mod, etc.) and have not rebooted it, so I was concerned; however, it runs Slackware 13.1, whereas my server was Slackware 13.0. While comparing, I noticed one additional step I did on the 13.0 installation that I did not mention in my initial post, mainly because I was using the laptop as the basis for what I did. Multipath required a udev newer than the version 141 that was on the server. I downloaded a newer one from my laptop and scp'd it down to the server. Because I was working from the laptop, I did not pay close attention to the version: I pulled down version 153, which is the base version for Slackware 13.1, so at that point I was outside my 13.0 realm, and I guarantee that is the problem.
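A mismatch like this can be spotted from Slackware's package database, which records each installed package as /var/log/packages/name-version-arch-build. A sketch; the helper names are hypothetical, and 141 as the expected 13.0 version comes from this thread. udevadm --version reports what is actually running.

```shell
#!/bin/sh
# Sketch: check the installed udev package against the version the release shipped.

# $1 = package entry such as udev-141-i486-1 (a path is fine); prints the version
pkg_version() {
    # name may contain '-', so take the third field from the end
    printf '%s\n' "${1##*/}" | awk -F- '{ print $(NF-2) }'
}

# $1 = expected version (141 on Slackware 13.0, per this thread)
check_udev() {
    for f in /var/log/packages/udev-*; do
        [ -e "$f" ] || { echo "no udev package recorded" >&2; return 1; }
        v=$(pkg_version "$f")
        if [ "$v" = "$1" ]; then
            echo "udev $v: matches"
        else
            echo "udev $v installed, expected $1" >&2
            return 1
        fi
    done
}
```

On the server, check_udev 141 would have flagged the 153 package.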
Since I have a backup, the solution is simply to perform the upgrade, which should fix the problem and reach my end goal at the same time :-)
Thanks for everyone's assistance.