LinuxQuestions.org

cygnus-x1 02-23-2013 11:19 AM

system no longer boots after imaging : (more...)
 
Slackware 13.0, kernel 3.7.5
Intel QuadCore GigaBit Motherboard
/dev/sda = 1 TB Hitachi drive. /dev/sda1 is bootable and is the / partition, with /home and /opt on other partitions.

sdb-sdd are each 1 TB drives in a raid 5 configuration.

I am preparing to upgrade to Slackware 14.0

Recent history:
---------------
1) Upgraded to kernel 3.7.5 and the latest VirtualBox, as I was having version conflicts and vboxdrv was not building (1 month ago)

2) Installed multipath libraries and kpartx, and installed the device-mapper kernel module dm-mod (required by kpartx). This was done within the past 3 days.

I had previous disk images created by dd, and I wanted to use kpartx on them to create devices so that I could mount the individual partitions within the images themselves.
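
For readers who have not used kpartx, the workflow described here can be sketched roughly as follows. The image path, the mount point, and the helper name `mapped_partitions` are hypothetical, and the helper assumes that `kpartx -av` prints one `add map <name> ...` line per partition, so check that against your own version before relying on it.

```shell
#!/bin/sh
# Sketch only: the image path and mount point below are made-up examples.

# Read `kpartx -av` output on stdin and print the /dev/mapper path of each
# partition mapping. Assumes lines of the form "add map loop0p1 (...) ...".
mapped_partitions() {
    awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}

# Typical use, as root (left commented out so nothing runs by accident):
#   kpartx -av /backup/sda.img | mapped_partitions
#   mount -o ro /dev/mapper/loop0p1 /mnt/img
#   umount /mnt/img
#   kpartx -d /backup/sda.img
```

Mounting read-only keeps the backup image untouched, and `kpartx -d` removes the mappings again when you are done.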

Friday/Saturday:
----------
Booted into SystemRescueCD in order to image /dev/sda onto an external drive prior to upgrading to Slackware 14.0 (yes call me paranoid) but I needed a good backup anyway.

This completed Saturday and all looked good.

Rebooted from the hard drive. My lilo.conf must have something wrong, as I keep getting a vga message which forces me to hit the space bar or enter to select alternate VGA modes. When I select <space>, only one of my cores is said to be responding. That is another question for another day, but I wanted to mention it in case it is a clue. I had hit <space> by mistake, so I rebooted and chose a simple 80x40 vga mode, and this time I was presented with an error message stating something like "unable to mount /dev/sda1 perhaps the superblock is corrupt" ... yadda yadda yadda. Yes, all the cores are present and responding.

I went into single user mode and tried fdisk /dev/sda, but even fdisk -l would not show me any drives at all, as if they were not even there.

I looked through dmesg and saw some ACPI error messages. I also saw that the kernel was going through and detecting all of the drives, and I saw it register sda and all three of its partitions.

I rebooted into the system-rescue-cd and I was able to fdisk all the drives. I was able to run fsck.ext3 -n on all sda partitions and all came back clean. I was able to mount the sda1 partition in rw mode and modify a simple file.

Everything looks fine. I try and reboot from the hard drive and I get the same error.

I was worried that installing dm-mod or the multipath libs might have done something (not sure why it would), but they were the only modifications I had made and not yet rebooted with. Meaning I built and loaded the modules/libs and used them via kpartx, but had not rebooted the system since.

I appreciate any help as I am completely baffled as to what the problem could be.

thank you all in advance

gnashley 02-23-2013 12:36 PM

I'd be looking at those multi-path libs. Sounds like something got overwritten.

cygnus-x1 02-23-2013 04:17 PM

Quote:

Originally Posted by gnashley (Post 4898251)
I'd be looking at those multi-path libs. Sounds like something got overwritten.

The only reason I installed them was to get kpartx. I had no idea they could be that intrusive. Is there a way to back it out? There is an uninstall makefile target in the source tree. I suppose I could manually do what it does, as I cannot run it.

thanks

cygnus-x1 02-23-2013 04:31 PM

I have been looking at the uninstall targets and it looks like it is just deleting all of the shared libraries and binaries. If that is the case then what could it have overwritten ?

syg00 02-23-2013 06:28 PM

Quote:

Originally Posted by cygnus-x1 (Post 4898213)
sdb-sdd are each 1 TB drives in a raid 5 configuration.

Hmmm - hardware RAID5 presumably.
A real RAID5 or the "fake-raid" on-board implementation that requires a (Windoze) driver ?.

If the latter, introducing device-mapper into the mix might explain some vagaries. It will recognise the RAID and attempt to build (construct) its own devices. No reason to expect it has actually "overwritten" anything - merely added function that is getting in the way.
Doesn't explain (to me) why /dev/sda disappears, or why you're down to one core, but that sounds like some-one (initrd being prime suspect) is passing dodgy options to the kernel.
(not been a Slack user for many years, so I don't have one of its initrd to look at)
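
One way to check the "dodgy options" theory is to look at what the kernel was actually booted with. The sketch below scans the kernel command line for the standard CPU-limiting parameters `nosmp`, `maxcpus=` and `nr_cpus=`. The function name `cpu_limiting_options` is made up for illustration, and this only covers the single-core symptom, not the missing disks.

```shell
#!/bin/sh
# Print any CPU-limiting options found in a kernel command line.
# $1 is the file to read; it defaults to /proc/cmdline.
# (Function name is hypothetical; the parameters are standard kernel ones.)
cpu_limiting_options() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        grep -E '^(nosmp|maxcpus=[0-9]+|nr_cpus=[0-9]+)$'
}

# Usage on the affected boot; no output means none of these were passed:
#   cpu_limiting_options
```

With lilo, any such option would normally come from an `append=` line in lilo.conf or from something typed at the boot prompt.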

Erik_FL 02-23-2013 06:45 PM

Are you using an "initrd" image?

The fact that you can get to single user mode implies that the kernel and driver for the root device are partly working. In single user mode, do you see device nodes if you do "ls -l /dev/sd*"? It sounds like "udev" might not be creating the device nodes in "/dev". If you find that "udev" is not creating the device nodes you can try removing or renaming the files in "/etc/udev/rules.d". Also make sure that you have the files in "/lib/udev".

If you find the device nodes missing, you can temporarily create them like this.

Code:

mount -o remount,rw /
mknod -m u=rw,g=rw,o= /dev/sda b 8 0
mknod -m u=rw,g=rw,o= /dev/sda1 b 8 1
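
If more than a couple of nodes are missing, the major and minor numbers do not have to be typed by hand, since the kernel publishes them in sysfs. The following is a minimal sketch, assuming sysfs is mounted at /sys. The function name `missing_node_cmds` is hypothetical, and it only prints the mknod commands so you can review them before running anything.

```shell
#!/bin/sh
# For every block device the kernel lists under sysfs, print a mknod command
# if the matching node is missing from /dev. Nothing is created; review the
# output and run the lines by hand.
# $1 = sysfs block class directory, $2 = device directory (both overridable).
missing_node_cmds() {
    sys=${1:-/sys/class/block}
    dev=${2:-/dev}
    for entry in "$sys"/*; do
        # Skip anything without a readable "dev" attribute.
        [ -r "$entry/dev" ] || continue
        name=${entry##*/}
        # Skip devices that already have a node.
        [ -e "$dev/$name" ] && continue
        # The "dev" attribute holds "major:minor", e.g. 8:1 for sda1.
        IFS=: read -r major minor < "$entry/dev"
        echo "mknod -m u=rw,g=rw,o= $dev/$name b $major $minor"
    done
}

# Usage in single user mode, after remounting / read-write:
#   missing_node_cmds
```

An empty result would mean every block device the kernel knows about already has a node, which points away from udev as the culprit.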

You can find out information about the "sda1" kernel device like this.

Code:

udevadm info -a -p /sys/class/block/sda1

The "lilo" boot-loader may have incorrect block lists for loading the kernel or "initrd". You can try reinstalling "lilo". Also check to make sure that the "/boot/vmlinuz" link points to where you expect, or use the complete kernel file name in "lilo.conf". You will have to use "chroot" from a rescue disk to do that.

Code:

mount /dev/sda1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
ls -l /boot/vmlinuz
ls -l /boot
cd /etc
nano lilo.conf
lilo
exit
umount /mnt/sys
umount /mnt/proc
umount /mnt/dev
umount /mnt
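
The "does the link point where you expect" step can also be checked mechanically. Here is a small sketch, assuming GNU `readlink -f` is available (it is part of coreutils). The function name `check_kernel_link` is made up for illustration.

```shell
#!/bin/sh
# Report where a kernel symlink points and whether the target exists.
# $1 = path to the link; it defaults to /boot/vmlinuz.
check_kernel_link() {
    link=${1:-/boot/vmlinuz}
    # -e follows symlinks, so a dangling link fails this test.
    if [ ! -e "$link" ]; then
        echo "BROKEN: $link does not resolve to an existing file"
        return 1
    fi
    echo "OK: $link -> $(readlink -f "$link")"
}

# Usage inside the chroot, before running lilo:
#   check_kernel_link
#   check_kernel_link /boot/initrd.gz   # only if you use an initrd; path is an example
```

Even when the link resolves, "lilo" still has to be rerun after any kernel or "initrd" file is replaced, because it records where the files sit on disk rather than their names.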


cygnus-x1 02-24-2013 06:19 AM

Quote:

Originally Posted by syg00 (Post 4898391)
Hmmm - hardware RAID5 presumably.
A real RAID5 or the "fake-raid" on-board implementation that requires a (Windoze) driver ?.

syg, real raid 5. I haven't used a windoze driver since the early days of wireless and ndiswrapper. I did verify, however, just to make sure, that md did not somehow try to take sda as a raid device, and it had not. A good friend of mine had that happen to him.

Erik,

As I was looking at manually creating the nodes, I decided to have a look at my .bash_history on my server and compare it with my laptop's. On the laptop I am using now I followed the same steps (multipath, dm-mod, etc.). I have not rebooted it and I was concerned; however, it is Slackware 13.1 whereas my server was Slackware 13.0. While comparing, I noticed one additional step I did on the 13.0 installation that I did not capture in my initial post, mainly because I was using the laptop as the basis for what I did. Multipath required a newer udev, newer than version 141 that was on the server. I downloaded it from my laptop and scp'd it down to the server. As I was on my laptop, I made the mistake of not paying close attention to the version: I pulled down version 153, which is the base version for Slackware 13.1, so at that point I was out of my 13.0 realm and I guarantee that is the problem.
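
A mismatch like this is easy to check for before rebooting. The sketch below uses only the two versions mentioned in this thread (udev 141 for Slackware 13.0, 153 for 13.1). The function names `expected_udev` and `check_udev` are hypothetical, and the table would need extending for any other release.

```shell
#!/bin/sh
# Expected udev version per Slackware release, using only the two versions
# mentioned in this thread (141 for 13.0, 153 for 13.1).
expected_udev() {
    case "$1" in
        13.0) echo 141 ;;
        13.1) echo 153 ;;
        *)    return 1 ;;
    esac
}

# Compare an installed udev version against the release's expected one.
# $1 = Slackware release, $2 = installed udev version.
check_udev() {
    want=$(expected_udev "$1") || { echo "unknown release $1"; return 2; }
    if [ "$2" = "$want" ]; then
        echo "udev $2 matches Slackware $1"
    else
        echo "MISMATCH: Slackware $1 expects udev $want, found $2"
        return 1
    fi
}

# Usage on a live system (udevadm --version prints the installed version):
#   check_udev 13.0 "$(udevadm --version)"
```

On the server described here, this would have flagged udev 153 on a 13.0 install.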

Since I have a backup the solution is to just perform the upgrade which should accomplish both fixing the problem and my end goal at the same time :-)

Thanks for everyone's assistance.


All times are GMT -5.