LinuxQuestions.org

Forum: Linux - General (http://www.linuxquestions.org/questions/linux-general-1/)
Thread: Multiple intermittent boot errors on a Fedora 9 / Windows XP dual boot system (http://www.linuxquestions.org/questions/linux-general-1/multiple-intermittent-boot-errors-on-a-fedora-9-windows-xp-dual-boot-system-669211/)

RichyAD 09-11-2008 09:52 AM

Multiple intermittent boot errors on a Fedora 9 / Windows XP dual boot system
 
This is my first time posting on a forum, and I'm not sure I picked the right one here. So be gentle with this forum virgin. Here goes ...

Info:
Motherboard: Asus A8N-SLI Premium
Processor: AMD Athlon 64 X2 Dual Core
Memory: 2 GB
Distro: Fedora 9
Kernel: 2.6.25.14-108
Single 320 GB Serial ATA3 disk:
Device     Boot  Start    End       Blocks  Id  System
/dev/sda1  *         1   6527     52428096   7  HPFS/NTFS (Windows XP 64bit)
/dev/sda3         6528   6788     2096482+   b  W95 FAT32 (shared disk space Linux/Windows; mounted on /windows)
/dev/sda2         6789   6814       208845  83  Linux (mounted on /boot)
/dev/sda4         6815  38913   257835217+   5  Extended
/dev/sda6         6815  38651   255730639+  83  Linux (mounted on /)
/dev/sda5        38653  38913     2096482+  82  Linux swap / Solaris

History:
I've "inherited" a 2-year-old desktop computer from my supervisor. It ran fine for 2 years as a single-boot Fedora system (Core 6, I believe). However, right out of the box it sporadically had problems starting up after an extended downtime (night, weekend, etc.). Typically it would give a "DISK BOOT FAILURE" error on the first 1 or 2 attempts, but would boot properly after that. We suspected a disk problem, so when the machine was handed down to me, we decided to make the minor investment of a new disk. This was when my problems started ...

Problems:
With the new disk, I decided to make the system dual-boot: Windows XP (64bit) and Fedora 9. So I partitioned the disk (see above) and installed both OSes. Since I rarely use Windows, I decided to install GRUB into the MBR and have it handle the boot process.
At first all seemed fine and the system worked as it should: I could boot into both Windows and Linux without any problem. But then the annoying boot errors started again. Apparently, the old disk had not been the problem ... And this time things are worse. Apart from a number of system freezes at various stages during Linux start-up, I've now also had quite a few different boot error messages:

DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER
System boot failure insert system disk
GRUB Geom. Error
GRUB Loading Stage1 --> ERROR 18
GRUB Loading Stage1.5 Read Error
Error 16: Inconsistent file system

Sometimes it takes several attempts to boot, but in the end it works. One odd (and slightly annoying) thing, though: if I select Windows (when I actually manage to get into GRUB), it loads without a problem every single time, while Linux regularly leaves me with a frozen/locked system.

Solution attempts:
I've had a look at the GRUB manual and various forums, but none of the error messages and their suggested solutions seem to fit this case, since the system can work fine (I'm working on it now!), just not every time.
But to be safe, I did try some suggestions from various forums:
- reinstall GRUB into the MBR;
- upgrade the BIOS;
- change the SATA cable and switch SATA ports on the motherboard;
Unfortunately, all without success. I've also done an extended disk integrity test (using smartctl) and this checks out fine too.
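For reference, smartctl reports part of its findings through its exit status, which is a bit mask: several bits can be set at once, and a drive whose overall health check says PASSED can still have errors recorded in its logs. A minimal Python sketch that decodes that status, assuming smartctl is installed and the script runs as root; the function names are hypothetical, and /dev/sda is simply the disk from the partition table in this post:

```python
import subprocess

# Bit meanings of smartctl's exit status, following the RETURN VALUES
# section of the smartctl(8) man page (index in the list = bit number).
SMARTCTL_EXIT_BITS = [
    "command line did not parse",
    "device open failed",
    "a SMART or ATA command to the disk failed",
    "SMART status check returned DISK FAILING",
    "prefail attributes are at or below threshold",
    "some attributes were at or below threshold in the past",
    "device error log contains records of errors",
    "device self-test log contains records of errors",
]


def decode_smartctl_status(code: int) -> list[str]:
    """Return one message per bit set in a smartctl exit status (0 = nothing flagged)."""
    return [msg for bit, msg in enumerate(SMARTCTL_EXIT_BITS) if code & (1 << bit)]


def check_disk(device: str) -> list[str]:
    """Run 'smartctl -a' on a device (needs root) and decode its exit status."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

For example, check_disk("/dev/sda") returning an empty list means smartctl flagged nothing; it is probably best treated as a quick screen rather than proof that every sector reads back correctly.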

I now suspect that it's a hardware problem (motherboard with a bad connection?). But before I cart my box off for a repair that may take weeks and cost a bundle, I'd like to know if I'm missing something obvious here, such as a Linux/kernel/GRUB problem. Especially since when I select Windows it boots every time without locking up.

So if anybody has a good idea, please let me know before I dive into the repair shop hell.

Cheers,

Richard.

jiml8 09-12-2008 01:05 PM

Sounds like you have some corruption on your Linux partition. Since XP boots reliably every time, and since for a while Linux did too, I would be looking at the hard drive and wondering if there was an issue (a bad block, a head alignment error...something like that) which affected the GRUB stage2 loader. I also would be wondering about a bad cable to the HD, though I can't explain to you why this wouldn't affect XP.

You need to run fsck on the Linux partition. To do this, boot a Linux live CD and run it from there. Even if fsck reports the filesystem as clean, you should force a check using the -f option.
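A minimal Python sketch of that step, assuming it runs as root from the live CD against an unmounted partition (for example /dev/sda6, the root partition in the first post). The helper names are hypothetical; the exit codes are the ones documented in fsck(8), where the returned status is the bitwise OR of the individual values. fsck hands the -f option on to the filesystem-specific checker (e2fsck for ext2/ext3):

```python
import subprocess

# fsck(8) exit codes; the returned status is the bitwise OR of these values.
FSCK_EXIT_CODES = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code: int) -> list[str]:
    """Translate an fsck exit status into readable messages."""
    if code == 0:
        return ["no errors"]
    return [msg for value, msg in FSCK_EXIT_CODES.items() if code & value]


def force_fsck(device: str) -> list[str]:
    """Force a full check with -f, even if the filesystem is marked clean.

    The partition must NOT be mounted. fsck may ask interactive questions,
    so stdin/stdout are left attached to the terminal.
    """
    result = subprocess.run(["fsck", "-f", device])
    return decode_fsck_status(result.returncode)
```

Usage would be force_fsck("/dev/sda6"); a result containing "filesystem errors left uncorrected" is the case that points at deeper trouble than a single bad shutdown.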

Also take a look at the badblocks command in linux.
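badblocks prints the numbers of the bad blocks it finds, one per line, and with -o it writes that list to a file. A minimal Python sketch that runs a read-only scan and converts the block numbers to 512-byte sector numbers, assuming badblocks' default block size of 1024 bytes; the function names and the output file name are hypothetical. The block numbers are relative to the device that was scanned, so for a partition its first sector (for example from /sys/class/block/sda6/start) has to be added to get a position on the whole disk:

```python
import subprocess

SECTOR_SIZE = 512           # bytes per sector used for the conversion
DEFAULT_BLOCK_SIZE = 1024   # badblocks' default block size (-b option)


def parse_badblocks(text: str) -> list[int]:
    """Collect the block numbers from badblocks output, one number per line."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():          # skip blank lines and any status text
            blocks.append(int(line))
    return blocks


def blocks_to_sectors(blocks: list[int], block_size: int = DEFAULT_BLOCK_SIZE,
                      partition_start: int = 0) -> list[int]:
    """Map badblocks block numbers to the first 512-byte sector of each block.

    partition_start is the partition's first sector on the disk; leave it
    at 0 to get sector numbers relative to the partition.
    """
    sectors_per_block = block_size // SECTOR_SIZE
    return [partition_start + b * sectors_per_block for b in blocks]


def scan_readonly(device: str, out_file: str = "badblocks.txt") -> list[int]:
    """Read-only scan (-s progress, -v verbose, -o output file); needs root."""
    subprocess.run(["badblocks", "-sv", "-o", out_file, device])
    with open(out_file) as f:
        return parse_badblocks(f.read())
```

The read-only test is badblocks' default; its -w mode writes test patterns over the whole device and destroys existing data, so it does not belong on a partition you still want to recover.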

If you have access to a copy, I very highly recommend SpinRite for problems like this, but it'll cost you money if you don't have it.

You also might try removing and reseating all cards in the computer (including memory); blowing the dust bunnies out would be a wonderful idea too.

Running a memory check (e.g. memtest86+) on the RAM is not a bad idea; Windows and Linux boot very, very differently, and if there is an issue in your RAM, Linux might be bitten where Windows was not (and vice versa).

After all those possibilities were eliminated, then and only then would I be looking at the power supply and the motherboard.

The problem with the original HD looks like a case where the BIOS was not waiting long enough for the drive to spin up, and was trying to access the drive before it reported itself as ready. BIOS will typically report a failure in this case.
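The timing argument can be illustrated with a toy model. This is a sketch under hypothetical assumptions, not measured values: the drive starts spinning at power-on and keeps spinning across a reset, and the BIOS waits a fixed time per boot attempt before giving up. A drive that needs longer to spin up than the BIOS allows then fails the first attempt or two and boots fine afterwards:

```python
def simulate_boots(spinup_s: float, bios_wait_s: float,
                   reset_delay_s: float, max_attempts: int = 5) -> list[bool]:
    """Toy model: True for each boot attempt where the drive was ready in time.

    Hypothetical assumptions: the drive spins continuously from power-on,
    on each attempt the BIOS waits bios_wait_s seconds for it, and
    reset_delay_s seconds pass between a failure and the next attempt.
    """
    outcomes = []
    elapsed = 0.0  # seconds since power-on when the current attempt starts
    for _ in range(max_attempts):
        ready = elapsed + bios_wait_s >= spinup_s
        outcomes.append(ready)
        if ready:
            break
        elapsed += bios_wait_s + reset_delay_s
    return outcomes
```

With these made-up numbers, simulate_boots(12, 5, 3) gives [False, True], and a slower spin-up of 20 seconds gives [False, False, True]: one or two "DISK BOOT FAILURE" messages, then a normal boot.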

RichyAD 09-15-2008 10:43 AM

Thanks for the post, jiml8. As of last Friday "intermittent" has changed to permanent. I can't get into GRUB at all; I just get various error messages (Error 17 can be added to the list).

Quote:

Originally Posted by jiml8 (Post 3278467)
Sounds like you have some corruption on your Linux partition. Since XP boots reliably every time, and since for a while Linux did too, I would be looking at the hard drive and wondering if there was an issue (a bad block, a head alignment error...something like that) which affected the GRUB stage2 loader. I also would be wondering about a bad cable to the HD, though I can't explain to you why this wouldn't affect XP.

You need to run fsck on the Linux partition. To do this, boot a Linux live CD and run it from there. Even if fsck reports the filesystem as clean, you should force a check using the -f option.

Also take a look at the badblocks command in linux.

Ah ... I thought smartctl took care of these checks ... Feeling a bit like a newbie here, not reading the man page properly. Thanks for the correction.
So I started the Fedora 9 LiveCD. The first try failed: the system froze at "Starting udev". After a reset I managed to get into Fedora and run fsck: no problems on the boot partition (/dev/sda2). However, I had some serious problems with the main Linux partition (/dev/sda6). Every time I tried to check it, I ended up with a full system freeze and even an automatic system reboot! This is something I'd expect from Windows, not from Linux. Something had to be seriously wrong ...
And, yes, when I ran the badblocks test on /dev/sda6 it returned quite a list of bad blocks before, once again, leaving me with a complete system freeze. It's probably safe to assume that my nice new HD (Hitachi Deskstar 320GB, HDT725032VLA360) is a lemon ... Time to find the receipt ...


Quote:

Originally Posted by jiml8 (Post 3278467)
The problem with the original HD looks like a case where the BIOS was not waiting long enough for the drive to spin up, and was trying to access the drive before it reported itself as ready. BIOS will typically report a failure in this case.

That seems to make sense. Might this also explain why it usually happens when it tries to boot after being down for a while? HD takes a fraction longer to spin up when it's cold?

Anyway, I've gone back to my original Maxtor HD and the major problems seem to have disappeared for the moment, except for the initial boot failure now explained by jiml8. Can anybody tell me if it's possible to tweak the system so that the BIOS waits a bit longer for the HD to spin up? Or is this simply part of the internal workings of the BIOS?

Cheers,

Richard.

RichyAD 09-15-2008 03:54 PM

Just a small update.

I thought it'd be a good idea to hook up both drives at the same time, boot from the old one, and check the new one while I work ... Unfortunately, only the new Hitachi HD got detected by the BIOS.

So I had a look around the forums once again and found some interesting facts about the old HD, which happens to be a Maxtor DiamondMax ... It seems that there are A LOT of problems with this HD when combining it with other HDs because of its "Staggered Spin-up Detection". Have a look at: http://icrontic.com/forum/showthread.php?t=29207 . This also explains my initial boot problems, as already mentioned by jiml8 (good catch!). And it answers my own question about whether or not it's possible to tweak the settings ... :-(

The joke now is that I bought a new HD that is broken and cannot be used. And even if it is replaced by one that is not broken, I still cannot use it together with my old Maxtor; it's either one or the other. Screwed twice ...

Next time: more research before replacing parts ...

Larry Webb 09-15-2008 04:32 PM

Hey, I feel for you; it has happened to me a couple of times. Just part of the education process, do not be too hard on yourself.


All times are GMT -5.