LinuxQuestions.org


jonr 11-08-2004 04:18 PM

Boot from hard drive stalls, rescue CD works fine
 
I have been experimenting with SimplyMepis and Libranet on a spare hard drive.

During the experimentation, something happened to my Mandrake 9.2 drive, and it won't boot from the hard drive anymore except when I choose the "hard drive" option from the Mandrake rescue CD that I, thank goodness, prepared some time ago.

When I try to boot from the hard drive (the bootloader is installed in the MBR of /dev/hda), I get the message from the BIOS saying "Looking for boot record on IDE1", and a second later it says "OK", which is normal. But then it just hangs, though it does sound like the hard drive may be churning away; it's hard to tell.

So I reboot with the rescue CD in the CD-ROM drive and choose "hard drive" as the boot option, and it immediately goes to the screen where you choose which kernel you want, just as it normally did when booting from the hard drive in the first place.

What's lacking, and how can I fix it? I tried "upgrade" from the distro CD set, which often fixes lots of minor problems, but it had no effect on this one. I also tried reinstalling the bootloader, as root, after I got booted up:

Code:

lilo -v -b /dev/hda
It made no difference the next time I tried to boot.

Out of ideas at this point! I'd rather not do a whole new install because it takes about five hours to get all my apps back and installed correctly every time I do that.

But I'd rather not have to rely on a CD-ROM every time I want to boot, either, though I intend to manufacture a couple more of them now, just in case! ;)

opjose 11-08-2004 06:47 PM

What happens when you merely type lilo while logged in?

Also post the output of

mount

cat /etc/fstab

fdisk -l /dev/hda

jonr 11-08-2004 07:27 PM

Quote:

Originally posted by opjose
What happens when you merely type lilo while logged in?

Also post the output of

mount

cat /etc/fstab

fdisk -l /dev/hda

Output of mount:

Code:

[root@bodhisattva jon]# mount
/dev/ide/host0/bus0/target0/lun0/part1 on / type ext3 (rw,noatime)
none on /proc type proc (rw)
none on /proc/bus/usb type usbfs (rw)
none on /dev type devfs (rw)
none on /dev/pts type devpts (rw,mode=0620)
/dev/ide/host0/bus0/target0/lun0/part6 on /home type ext3 (rw,noatime)
none on /mnt/cdrom type supermount (ro,dev=/dev/scd0,fs=udf:iso9660,--,iocharset=iso8859-1)
/dev/ide/host0/bus0/target1/lun0/part1 on /prime_backup type ext3 (rw,noatime)
/dev/ide/host0/bus0/target1/lun0/part5 on /store_1 type ext3 (rw,noatime)
/dev/ide/host0/bus0/target1/lun0/part6 on /store_2 type ext3 (rw,noatime)
/dev/ide/host0/bus0/target1/lun0/part7 on /store_3 type ext3 (rw,noatime)
/dev/ide/host0/bus1/target1/lun0/part1 on /mnt/libranet type ext3 (rw)

Output of cat /etc/fstab:

Code:

[root@bodhisattva jon]# cat /etc/fstab
/dev/hda1 / ext3 noatime 1 1
none /dev/pts devpts mode=0620 0 0
/dev/hda6 /home ext3 noatime 1 2
none /mnt/cdrom supermount dev=/dev/scd0,fs=udf:iso9660,ro,--,iocharset=iso8859-1 0 0
/dev/hdb1 /prime_backup ext3 noatime 1 2
none /proc proc defaults 0 0
#/dev/hdd1 /removable ext3 noatime 1 2
/dev/hdb5 /store_1 ext3 noatime 1 2
/dev/hdb6 /store_2 ext3 noatime 1 2
/dev/hdb7 /store_3 ext3 noatime 1 2
/dev/hda5 swap swap defaults 0 0

...and output of fdisk -l /dev/hda:
Code:

[root@bodhisattva jon]# fdisk -l /dev/hda

Disk /dev/hda: 10.2 GB, 10245537792 bytes
255 heads, 63 sectors/track, 1245 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         701     5630751   83  Linux
/dev/hda2             702        1245     4369680    5  Extended
/dev/hda5             702         732      248976   82  Linux swap
/dev/hda6             733        1245     4120641   83  Linux

And then I typed merely "lilo" as root and got:

Code:

[root@bodhisattva jon]# lilo
Added linux *
Added linux-nonfb
Added 2422-37
Added 2422-10
Added old_2422-10
Added old_linux-nonfb
Added failsafe

Now I will reboot, and see what happens without the rescue CD.

opjose 11-08-2004 07:34 PM

You may also want to post /etc/lilo.conf and the hdparm info for your drive.

jonr 11-08-2004 07:48 PM

Quote:

Originally posted by opjose
You also may want to post /etc/lilo.conf and the hdparm info for your drive as well.
Here's /etc/lilo.conf:
Code:

boot=/dev/hda
map=/boot/map
vga=normal
default="linux"
keytable=/boot/us.klt
prompt
nowarn
timeout=100
message=/boot/message
menu-scheme=wb:bw:wb:bw
disk=/dev/hda bios=0x82
image=/boot/vmlinuz
        label="linux"
        root=/dev/hda1
        initrd=/boot/initrd.img
        append="devfs=mount splash=silent hdc=ide-scsi acpi=ht splash=silent"
        vga=788
        read-only
image=/boot/vmlinuz
        label="linux-nonfb"
        root=/dev/hda1
        initrd=/boot/initrd.img
        append="devfs=mount splash=silent hdc=ide-scsi acpi=ht"
        read-only
image=/boot/vmlinuz-2.4.22-37mdk
        label="2422-37"
        root=/dev/hda1
        initrd=/boot/initrd-2.4.22-37mdk.img
        append="devfs=mount splash=silent hdc=ide-scsi acpi=ht"
        vga=788
        read-only
image=/boot/vmlinuz-2.4.22-10mdk
        label="2422-10"
        root=/dev/hda1
        initrd=/boot/initrd-2.4.22-10mdk.img
        append="devfs=mount splash=silent hdc=ide-scsi acpi=ht splash=silent"
        read-only
image=/boot/vmlinuz-2.4.22-10mdk
        label="old_2422-10"
        root=/dev/hda1
        initrd=/boot/initrd-2.4.22-10mdk.img
        append="devfs=mount splash=silent hdc=ide-scsi acpi=ht"
        vga=788
        read-only
image=/boot/vmlinuz-2.4.22-37mdk
        label="old_linux-nonfb"
        root=/dev/hda1
        initrd=/boot/initrd-2.4.22-37mdk.img
        append="devfs=mount hdc=ide-scsi acpi=ht"
        read-only
image=/boot/vmlinuz
        label="failsafe"
        root=/dev/hda1
        initrd=/boot/initrd.img
        append="devfs=nomount splash=silent hdc=ide-scsi acpi=ht failsafe"
        read-only

Here is what I get when I run hdparm -I /dev/hda:

Code:


/dev/hda:

ATA device, with non-removable media
        Model Number:      Maxtor 91021U2                         
        Serial Number:      G232Z2PC           
        Firmware Revision:  FA520S60
Standards:
        Used: ATA/ATAPI-4 T13 1153D revision 17
        Supported: 5 4 3 2 & some of 5
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:  16514064
        LBA    user addressable sectors:  20010816
        device size with M = 1024*1024:        9770 MBytes
        device size with M = 1000*1000:      10245 MBytes (10 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        bytes avail on r/w long: 57        Queue depth: 1
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16        Current = 16
        Advanced power management level: unknown setting (0x0000)
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 *udma4
            Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
            Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled        Supported:
          *        NOP cmd
          *        READ BUFFER cmd
          *        WRITE BUFFER cmd
          *        Host Protected Area feature set
          *        Look-ahead
          *        Write cache
          *        Power Management feature set
          *        SMART feature set
                Advanced Power Management feature set
          *        DOWNLOAD MICROCODE cmd
HW reset results:
        CBLID- above Vih
        Device num = 0 determined by the jumper
Checksum: correct

And when I rebooted after my last post, it stalled just as before, and the rescue CD came to the rescue again once I selected "hard drive."

opjose 11-08-2004 08:31 PM

I see nothing unusual in your configuration files or information returned.

On the other hand it appears that the BIOS is not locating Lilo correctly on the drive since you never see the Lilo screen.

To me, this normally indicates that the BIOS believes the MBR is located in one place on the drive when it is actually located elsewhere.

This can happen if the LBA mode setting is toggled (accidentally, or by another OS) and the MBR is then rewritten.

So while booting Linux via the CD may be using the geometry that was in effect when the drive was set up, the BIOS now sees a slightly different configuration and is looking for (and finding!) a different MBR record, which in turn is erroneous.
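A worked example of why a geometry mismatch matters (a sketch added for illustration, not something posted in the thread): a cylinder/head/sector address is converted to a linear sector number with LBA = (C × heads + H) × sectors_per_track + (S − 1). The function name `chs_to_lba` is hypothetical. Plugging in the two geometries reported above (255 heads from fdisk, 16 heads from hdparm's logical CHS), the same CHS triple lands on very different sectors. The first call assumes hda2 begins exactly on the cylinder boundary that fdisk lists as 702 (0-based index 701):

```shell
#!/bin/sh
# chs_to_lba C H S HEADS SPT: convert a CHS address to a linear sector number.
# Cylinders and heads are 0-based; sectors are 1-based, hence the "- 1".
chs_to_lba() {
    c=$1 h=$2 s=$3 heads=$4 spt=$5
    echo $(( (c * heads + h) * spt + (s - 1) ))
}

# Geometry fdisk reported: 255 heads, 63 sectors/track
chs_to_lba 701 0 1 255 63   # prints 11261565
# Logical CHS geometry hdparm reported: 16 heads, 63 sectors/track
chs_to_lba 701 0 1 16 63    # prints 706608
```

A loader that records sector positions under one geometry and is later read back under the other would be pointed at the wrong place, which is the kind of mismatch described above.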

Try toggling the LBA setting for the drive in the BIOS, and do a COLD boot after each change.

If this doesn't help, set a different LBA mode, boot into Linux via the CD, and run Lilo again.
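After rerunning LILO, one way to see what actually landed in sector 0 is to read the first 512 bytes back. The sketch below (the function name `check_bootsector` is hypothetical) prints the two bytes at offset 510, which are 55 aa on any valid boot sector, and searches the sector for the ASCII string LILO, which LILO's first-stage loader normally carries. Treat a missing string as a hint rather than proof. Reading /dev/hda directly requires root:

```shell
#!/bin/sh
# Inspect the first sector of a disk (or of a disk image file).
# check_bootsector is a hypothetical helper name, not an existing tool.
check_bootsector() {
    dev=$1
    # Bytes 510-511 of a valid boot sector must be 0x55 0xAA.
    sig=$(od -A n -t x1 -j 510 -N 2 "$dev" | tr -d ' \n')
    echo "boot signature: $sig"
    # Search only the first 512 bytes; -a makes grep treat binary data as text.
    if head -c 512 "$dev" | grep -aq LILO; then
        echo "LILO string: found"
    else
        echo "LILO string: not found"
    fi
}

# Example (as root): check_bootsector /dev/hda
```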

I once had a situation in which a utility modified the drive's geometry block while the BIOS was configured one way; when the drive was later reconfigured (after the CMOS battery died), it would not work properly because of the old geometry data.

Nothing I could do would eliminate the problem except a complete reformat with a drive vendor's utility that zeroed out the ENTIRE drive. Since this also took care of the erroneous MBR and sectors, things worked again afterward.

But this was very extreme!

Remember, though, that since you can boot into Linux, it's not too difficult to back up your current configuration completely, so you do not have to do a lot of work to get everything back to a working state.

For example:

Boot up Linux via the CD.

tar off the /etc directory, and keep the archive somewhere off /dev/hda, since that drive is going to be reformatted:

tar -czvf savedetc.tar.gz /etc

Save the list of installed RPM packages, and keep that file off /dev/hda as well:


rpm -qa > rpmfiles.lst

Then boot from a Live CD and tar off either your entire partition(s) or just your /home directories to other media.

Once that's done, reformat the hda drive and reinstall, keeping the setup as close as possible to what you have now.

Then, once you have a rudimentary Linux system running, first reinstall all of your RPMs to exactly the same state using

urpmi < rpmfiles.lst

Next, UNTAR the saved archive back over /etc (copy savedetc.tar.gz back to / first if you stored it elsewhere):

cd /

tar -xzvf savedetc.tar.gz

Then run lilo again and reboot.

Finally restore your home directories from their backups.

jonr 11-08-2004 09:39 PM

Thanks, opjose! I am saving this page as "if all else fails.html" and will print it out and work on these tactics tomorrow, hopefully without having to resort to the most extreme measure.

Will of course report back with results.

jonr 11-09-2004 12:25 AM

I went ahead and tried that tonight. No luck. I did a complete reinstall, erasing the entire disk and placing the bootloader in the MBR, but it still stalls as before. I don't currently have Maxtor's handy-dandy low-level disk utility; I may be able to download it and put it on a CD-ROM (I don't currently have a floppy drive hooked up, either). However, last time I checked, it seemed Maxtor only offered one that required a Windows machine. I may be wrong.

Anyway, the urpmi I have on this system won't work with input redirection, so I guess I can put "urpmi" at the beginning of every one of the 500-some lines of that list and see if it runs that way, or just do things piecemeal as I've done many times before, which takes a few hours and lots of frustration.

I've got email and the browser going, and that's it. I still have to boot from the CD-ROM, which is not that big a deal. Maybe I'd better just leave it that way, get halfway back to normal, and hope for better days.

opjose 11-09-2004 01:01 AM

Did you first ZERO out the disk with a manufacturer's utility?

When I had this problem the system(s) would not boot properly until this was done.

If you've saved the file, there is no reason to type urpmi at the beginning of each line by hand.

Just let sed or some other editor do the work for you; then you can add the shell commands at the beginning and make the file executable.

Finally just run it with

sh ./filelist.lst
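A minimal sketch of that sed approach, assuming the list holds one package per line as written by `rpm -qa`. The helper name `make_reinstall_script` and the output name reinstall.sh are hypothetical. It is not confirmed here whether urpmi accepts the full name-version-release strings that `rpm -qa` prints; if it doesn't, `rpm -qa --qf '%{NAME}\n' > rpmfiles.lst` records bare package names instead.

```shell
#!/bin/sh
# Turn a package list (one name per line) into an executable script
# containing one "urpmi <package>" command per line.
# make_reinstall_script is a hypothetical helper, not part of urpmi.
make_reinstall_script() {
    list=$1
    script=$2
    {
        echo '#!/bin/sh'
        # Drop blank lines, then prefix every remaining line with "urpmi ".
        sed -e '/^[[:space:]]*$/d' -e 's/^/urpmi /' "$list"
    } > "$script"
    chmod +x "$script"
}

# Example: make_reinstall_script rpmfiles.lst reinstall.sh && sh ./reinstall.sh
```

One urpmi call per package is slower than a single call with every name, but a package that fails to install doesn't stop the rest of the list.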

jonr 11-09-2004 01:21 AM

Unfortunately I wasn't able to zero out the disk as I lack a suitable utility for that at present. As I mentioned in the prior post, I will probably try to get one from Maxtor's site, but it seems to me they only offered one that required Windows last time I checked!

I'm almost back to normal operations already, though, mainly thanks to the handy tip you gave about restoring the /etc directory. That made a world of difference in ease of getting programs running again such as lm_sensors, power backup monitoring, and others. Thanks!

I already thought of making a script from that list of RPMs and will probably do so tomorrow. There may be no need, though; I'm almost back to where I have everything available and working again. I have a pretty simple system: email, browsing, word processing, spreadsheets, some graphics, and some system monitoring are all I need. Lucky that way!

I intend to burn (at low speed) three or four more "rescue" disks and keep them carefully so I will presumably always be able to boot SOMEHOW. I'm sure glad I made that one several weeks ago, on the spur of the moment. The first disk of the Mandrake distribution is bootable, of course, but I don't know how you would just boot into an existing Mandrake-Linux system with it. So for me the "rescue" disk was worthy of its name.

opjose 11-09-2004 02:59 AM

I'll bet Google will turn up something to zero out a drive that doesn't need to be Maxtor-specific, as long as it hits the entire disk... (a Maxtor utility would be better, though...).
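For reference, the usual generic way to do this on Linux is dd reading from /dev/zero. The sketch below (the helper name `zero_sectors` is hypothetical) overwrites a given number of 512-byte sectors from the start of the target. It is DESTRUCTIVE and should only be run from a Live CD with nothing on the target drive mounted. For this drive, hdparm reported 20010816 LBA user-addressable sectors, and 20010816 × 512 = 10245537792 bytes, which matches the size fdisk printed. Whether a plain zero-fill clears the problem as thoroughly as the vendor utility did is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# zero_sectors TARGET COUNT: overwrite the first COUNT 512-byte sectors
# of TARGET with zeros. DESTRUCTIVE: everything in that range is lost.
# zero_sectors is a hypothetical helper name.
zero_sectors() {
    target=$1
    count=$2
    # conv=notrunc keeps dd from truncating TARGET when it is a regular
    # file (e.g. a test image); block devices are never truncated anyway.
    dd if=/dev/zero of="$target" bs=512 count="$count" conv=notrunc
}

# Whole drive, using the sector count hdparm -I reported (run as root,
# from a Live CD, with nothing on /dev/hda mounted):
#   zero_sectors /dev/hda 20010816
```

bs=512 is slow on a 10 GB drive; a larger block size works too, as long as count is scaled down to cover the same number of bytes.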

Yeah, those rescue disks are a handy thing, as are LiveCDs in a pinch.

jonr 11-09-2004 09:13 AM

Quote:

Originally posted by opjose
I'll bet google will turn up something to zero out a drive which does not need to be maxstor specific, as long as it hits the entire disk... (a maxtor utility would be better though...).
Thanks! You've devoted a lot of time to helping me with this, and I really appreciate it. I didn't even know the /etc directory could be successfully restored and work, probably because my past attempts to restore system info after major disasters have turned out to be disasters themselves, and I figured the "volatile" information in /etc was probably the cause!

From recent experience, it looks to me like the only directory that resists restoring is /lib. I suppose that's because the mechanism I use for restoring it (I've tried both cp and rdiff-backup's restore function) calls routines from there while it works.

opjose 11-09-2004 06:54 PM

Correct!

That's why, if you are going to try to preserve a full Linux file system, it should really be offline.

For example, boot from a Live Linux CD, or use a Winblows program (bleh) to back up your Linux partition.

However, I frequently just hook up an external drive, boot a live Linux system, and tar everything off to the external drive.
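A minimal sketch of that kind of offline backup, assuming the Linux partition and the external drive are both mounted under the live system. The mount points /mnt/hda1 and /mnt/external and the helper names are hypothetical. GNU tar's --numeric-owner stores and restores raw uid/gid numbers, which matters here because the live CD's /etc/passwd doesn't match the installed system's:

```shell
#!/bin/sh
# backup_tree SRC ARCHIVE: archive everything under SRC, with paths
# relative to SRC, preserving permissions (-p) and numeric ownership.
# Keep ARCHIVE outside SRC so tar doesn't try to archive its own output.
backup_tree() {
    tar -C "$1" --numeric-owner -czpf "$2" .
}

# restore_tree ARCHIVE DEST: unpack ARCHIVE into DEST (e.g. a freshly
# installed root partition mounted under the live system).
restore_tree() {
    tar -C "$2" --numeric-owner -xzpf "$1"
}

# Example, from the live CD as root:
#   mount -o ro /dev/hda1 /mnt/hda1
#   backup_tree /mnt/hda1 /mnt/external/hda1.tar.gz
```

Mounting the source read-only is a precaution so the backup can't change the partition it is copying.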

Then I make any changes, hardware swaps, etc., and tar everything back after doing a rudimentary Linux install.


I actually prefer the urpmi method, though, after saving off /etc and the home directories, since it "cleans up" the system and makes it run quite well in the process.

