LinuxQuestions.org
Old 09-17-2012, 05:02 PM   #1
mr-roboto
Member
 
Registered: Aug 2006
Location: NYC in the US of A
Distribution: Slax, FreeBSD, PCLinuxOS, Ubuntu, TurnkeyLinux
Posts: 50

Rep: Reputation: 16
Bizarre Server Booting Issues


My question pertains to an Ubuntu-based fileserver appliance called TurnkeyLinux. I'm posting here because this is a truly bizarre, generic Linux issue and I'm hoping to reach the widest number of eyeballs for a solution.

Server was working fabulously for months (two years in fact), then by accident, I discovered the boot hard drive had completely failed. Took the PC home over the weekend, replaced the boot hard drive, reinstalled the server software, and all should've been well, but wasn't.

To visualize the next part, one must know about the exact hardware config. Primary master is a simple CD-ROM, primary slave is the 15GB boot drive. The secondary master+slave are two (2) 500GB drives. Since these are all IDE drives, I believe everything is jumpered as cable-select.

After the TKL server appliance was installed, the PC wouldn't reboot into Ubuntu! I began by retracing my steps until (hours later) I discovered the only procedure that would work:
  1. Start up the PC. It only displays a cursor. No disc activity.
  2. Enter BIOS Setup and switch the local IDE config to primary only (to disable the secondary controller). Reboot.
  3. PC immediately loads GRUB and then the Ubuntu/TKL fileserver appliance, but somehow this is an unconfigured server. Reboot.
  4. Re-enter BIOS Setup and change the local IDE config back to both controllers. Reboot.
  5. PC immediately loads GRUB and the Ubuntu/TKL fileserver appliance, and the customer's fileserver is operational again!

For reasons that I can't explain, the PC will not cold-boot into TKL anymore. I stumbled onto the BIOS Setup workaround when I physically disconnected the storage array (i.e. the secondary IDE drives) and the PC booted normally. I probably would've solved this on my own, but I ran out of time that weekend and had to return the fileserver for the start of business on Monday. However, I rebooted the server a dozen times in a row to make sure I had nailed the issue.

To add a new wrinkle, some time later (today, in fact) there was some sort of power failure at the office. I was able to bring up the server per the procedure enumerated above, but after I was able to restart the fileserver, my PS/2 keyboard became completely unresponsive! I can reboot the PC remotely, but my customer is starting to get impatient about this ongoing series of foul-ups.

I suspect there's some kind of GRUB error, but GRUB is pretty much opaque to me. Also, I can't explain why the system drive (i.e. /) is designated /dev/hdc1 and not /dev/hda1. There is an hda and an hdb when I ls /dev. For every other Linux install I've done, I seem to recall the drives are always designated in primary master+slave, secondary master+slave order.
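
For reference, the legacy Linux IDE driver traditionally named drives by controller position, which is the ordering recalled above. Here is a minimal Python sketch of that mapping. The helper name `ide_position` is made up for illustration, and the sketch assumes the old `hdX` naming scheme rather than the `sdX` names used by newer libata-based kernels.

```python
import re

# Traditional naming under the legacy Linux IDE driver.
# Assumption: the system uses hdX names, not libata's sdX names.
IDE_POSITIONS = {
    "a": "primary master",
    "b": "primary slave",
    "c": "secondary master",
    "d": "secondary slave",
}


def ide_position(device):
    """Return the controller position implied by a name like /dev/hdc1."""
    match = re.fullmatch(r"(?:/dev/)?hd([a-d])\d*", device)
    if match is None:
        raise ValueError(f"not a legacy IDE device name: {device!r}")
    return IDE_POSITIONS[match.group(1)]


if __name__ == "__main__":
    for name in ("/dev/hda1", "/dev/hdb1", "/dev/hdc1"):
        print(name, "->", ide_position(name))
```

By this convention a root filesystem on /dev/hdc1 would suggest the secondary master, whereas the boot drive is described above as the primary slave (which would normally be hdb), so the name does look out of place.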

I just don't understand. Any help is welcome. TIA.....
 
Old 09-17-2012, 09:02 PM   #2
Kenarkies
Member
 
Registered: Nov 2007
Location: South Australia
Distribution: Ubuntu 11.10
Posts: 78

Rep: Reputation: 23
I can't say I can work out why, but the obvious statement is that it's trying to boot from a secondary drive. I never used cable select but always set the addresses directly, so there may possibly be something to do with the disk ordering. Anyway one workaround might be to install GRUB on the drive it's trying to boot from. Once logged in it's fairly straightforward - there are howto's around that explain the procedure.

Later thoughts - maybe it's not so obvious. This can happen if two drives have the same address, but that doesn't explain why it comes good later. It sounds rather more like a heat issue with the new drive, though unlikely. When booting from cold does the BIOS show all the drives correctly? If it can't see the new drive maybe it's stuck on the optical drive, although the BIOS settings should allow alternative drives to be tried. Depends on how old the motherboard is. It would probably be older than 2 years if it has all four IDE channels.

Ken

Last edited by Kenarkies; 09-17-2012 at 10:15 PM. Reason: More thoughts
 
Old 09-17-2012, 11:56 PM   #3
mr-roboto
Member
 
Registered: Aug 2006
Location: NYC in the US of A
Distribution: Slax, FreeBSD, PCLinuxOS, Ubuntu, TurnkeyLinux
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by Kenarkies View Post
I can't say I can work out why, but the obvious statement is that it's trying to boot from a secondary drive. I never used cable select but always set the addresses directly, so there may possibly be something to do with the disk ordering. Anyway one workaround might be to install GRUB on the drive it's trying to boot from. Once logged in it's fairly straightforward - there are howto's around that explain the procedure.
@Ken: Thanx for the feedback. I normally set drives explicitly (ie. master/slave) as well, but that can't be it. All mfrs use CS mode by default, for ease of installation. However, I will check that they're all using the correct mode when I can.

Quote:
Later thoughts - maybe it's not so obvious. This can happen if two drives have the same address, but that doesn't explain why it comes good later. It sounds rather more like a heat issue with the new drive, though unlikely. When booting from cold does the BIOS show all the drives correctly? If it can't see the new drive maybe it's stuck on the optical drive, although the BIOS settings should allow alternative drives to be tried. Depends on how old the motherboard is. It would probably be older than 2 years if it has all four IDE channels.

Ken
The new boot drive ain't so new; it was just something that worked and was hanging around (i.e. free), but I will check that it isn't overheating anyway. The PC hasn't actually been off for more than a couple of minutes in months. Actually, it's only been off for an extended period when in transit to/from my house.

Thanx again....
 
Old 09-18-2012, 01:33 AM   #4
ghstridr
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Rep: Reputation: 0
I have to agree with Kenarkies about the issue of the drive/partition discovery. The best thing is to boot to a recovery CD and look in dmesg to see if the order of discovery is correct.
Next, look at your fstab to see how partitions are being identified and mounted. I find that using UUIDs is the surest way to tie a particular partition to a specific mount point. Labels are easier to read, but I have 1000+ physical servers (no, really) and had a recycled drive (i.e. free) cause me problems because it contained a file system with labels that interfered with my new grub installation. So I recommend using UUIDs in grub and /etc/fstab.
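
As a rough illustration of the fstab check described above, here is a minimal Python sketch that flags entries mounted by kernel-assigned device name (such as /dev/hdc1) instead of by UUID. The function name `device_name_mounts` and the sample fstab contents, including the UUID, are invented for the example; on a real system you would read /etc/fstab and get the actual UUIDs from `blkid`.

```python
import re

# Matches kernel-assigned names such as /dev/hdc1 or /dev/sda2, which can
# change when the drive discovery order changes.
DEVICE_NAME = re.compile(r"/dev/(?:hd|sd)[a-z]+\d*")


def device_name_mounts(fstab_text):
    """Return (device, mount_point) pairs that rely on raw device names."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a complete fstab entry
        device, mount_point = fields[0], fields[1]
        if DEVICE_NAME.fullmatch(device):
            flagged.append((device, mount_point))
    return flagged


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
proc            /proc   proc    defaults            0 0
/dev/hdc1       /       ext3    errors=remount-ro   0 1
UUID=0a1b2c3d-1111-2222-3333-444455556666  /srv  ext3  defaults  0 2
"""

if __name__ == "__main__":
    for device, mount_point in device_name_mounts(SAMPLE_FSTAB):
        print(f"{mount_point} is mounted by device name {device}; consider UUID=")
```

Any entry it reports is a candidate for switching to the `UUID=` form, so that the mount no longer depends on the order in which the drives are discovered.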
 
Old 09-18-2012, 09:14 AM   #5
scrooge74
LQ Newbie
 
Registered: Jul 2009
Location: out there
Distribution: debian/ubuntu
Posts: 5

Rep: Reputation: 0
Just a weird idea, could it be the clock battery of the BIOS died on you? And that ends up messing your configuration after reboots or taking the power off the equipment?
 
Old 09-18-2012, 09:42 AM   #6
mr-roboto
Member
 
Registered: Aug 2006
Location: NYC in the US of A
Distribution: Slax, FreeBSD, PCLinuxOS, Ubuntu, TurnkeyLinux
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by ghstridr View Post
I have to agree with Kenarkies about the issue of the drive/partition discovery. The best thing is to boot to a recovery CD and look in dmesg to see if the order of discovery is correct.
Next, look at your fstab to see how partitions are being identified and mounted. I find that using UUIDs is the surest way to tie a particular partition to a specific mount point. Labels are easier to read, but I have 1000+ physical servers (no, really) and had a recycled drive (i.e. free) cause me problems because it contained a file system with labels that interfered with my new grub installation. So I recommend using UUIDs in grub and /etc/fstab.
UUIDs. That's an interesting idea. Thanx, I'm on that as soon as I fix their phone system....
 
Old 09-18-2012, 09:46 AM   #7
mr-roboto
Member
 
Registered: Aug 2006
Location: NYC in the US of A
Distribution: Slax, FreeBSD, PCLinuxOS, Ubuntu, TurnkeyLinux
Posts: 50

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by scrooge74 View Post
Just a weird idea, could it be the clock battery of the BIOS died on you? And that ends up messing your configuration after reboots or taking the power off the equipment?
Nah. Keeps time just fine. Drive params haven't changed. Thanx....
 
Old 11-19-2012, 01:26 PM   #8
mr-roboto
Member
 
Registered: Aug 2006
Location: NYC in the US of A
Distribution: Slax, FreeBSD, PCLinuxOS, Ubuntu, TurnkeyLinux
Posts: 50

Original Poster
Rep: Reputation: 16
Finally discovered the source of the problem: motherboard/firmware issues. It had nothing to do with the Linux software at all. Everything checked out physically, but it still wouldn't work as expected. Since the original post, I've personally encountered other Compaq Presarios that have similar boot problems, all related to BIOS/motherboard initialization. In one case, the PC powers up but won't load an operating system. Clear the CMOS (by popping the battery) and it boots right into Windows, but won't reboot. It hangs at the cursor until you clear the CMOS again.

Mystery solved....
 
  


All times are GMT -5. The time now is 06:42 PM.
