LinuxQuestions.org
Old 02-28-2010, 02:56 PM   #1
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Rep: Reputation: 1
Ubuntu Server - Random Hard Drive Corruption


So I built a new system a few months ago to act as a development/"mess around with" server, with an Asus mobo, a Q6600 processor and 8 GB of RAM. Along with file, web and app hosting, I also do some virtualization on it... or at least I had hoped to.

Ever since the first install, I've been randomly getting crashes and lockups. Sometimes it would just dump an error to the screen but stay alive, and sometimes it would dump an error and then lock up fully. The error mentions something about "kernel not tainted" etc. I will post the detailed error once it comes up again, as I have just formatted it again.

Other problems include downloaded files becoming corrupt. Files downloaded through any means (wget, torrents, ssh, ftp etc.) seem to randomly get corrupted (i.e. the hashes are wrong).

I currently have one WD 150GB Raptor as my primary OS drive, and 3 WD 1TB Greens as my storage in an mdadm RAID 5 array. At first, I had thought it was the RAID array or its drives causing issues. After painfully transferring the data off of it, I took the drives out and tried to run Ubuntu with just the OS drive for a while. This still had the same issues. I then put in only one of the 1TB Greens and had the same issue...

I downloaded WD's hardware diagnostic tool and ran full scans on all the drives. They all check out fine.

I left memtest running overnight and it had no errors either.


Most recently, ubuntu would not even install. It would get stuck at the stage of partitioning, and the keyboard lights would flash. After much googling, I tried popping in "noapic nolapic" to the end of the grub string, and it managed to install.

Now, I'm in a fresh system and just wgetted VMware Server. However, it won't untar, and I just realized the MD5 hash doesn't match!
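For anyone following along, the hash check mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming Python 3 is installed; the function names `md5_of_file` and `matches_published_md5` are made up for this example, and the published hash would be whatever the download page lists.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large downloads do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare the file's MD5 with a published hex string (case-insensitive)."""
    return md5_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_md5` with the path of the downloaded tarball and the hash from the download page returns `False` when the file was damaged somewhere along the way. It cannot tell whether the damage came from the network, the disk, or memory.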

So definitely not the memory or the HDs... I'm assuming it has to do with the APIC? From what I found on Google, it seems as though this is only needed for the install.

Do I really need this to be on the boot string too? From what I understand, APIC allows processes to be divided out to the least loaded CPU. Having a quad core, I'd rather leave this on since it seems somewhat beneficial... I have yet to try putting this into the GRUB config since I'm offsite and need

As a side note, this latest install is using just the WD Raptor as an OS drive.

Any ideas?

And I'll post up the dumped errors if I get them again. There were none dumped out when the vmware download corrupted. The message format is very similar to the one here:
http://www.linuxquestions.org/questi...uption-180137/
However, sometimes it mentions ext3 (or one of the other filesystem types I had tried, thinking it was a problem with ext3). Again, the error message is not EXACTLY the same, but the format is very similar...

Thanks for the help!
 
Old 03-01-2010, 02:14 PM   #2
strick1226
Member
 
Registered: Feb 2005
Distribution: CentOS, Fedora, OS X, SLES, Ubuntu
Posts: 273

Rep: Reputation: 51
Hi, Mistro,

I'm presuming this is a fairly recent motherboard since you're using a Q6600, and that you have all the hard drives attached to the onboard Intel ICH9 (or similar) controller.

I wonder if it might have anything to do with whether or not the drives are in AHCI mode...

Are the drives configured as "AHCI" or "SATA" in your BIOS settings? Have you tried setting them to another setting in the BIOS and re-testing (usually will require a re-install of the OS, unfortunately)?

Have you been able to install *any* OS on this box without I/O issues? I'm curious if a temporary installation of, say, the Windows 7 beta would fail as well.

I hate to ask, but since the Q6600 is one of the most popular chips for this... are you overclocking the CPU in any way? Or are all CPU and memory bus speeds at the defaults?

Which model Asus motherboard are you using, and which BIOS is it running? I had some funky issues running Ubuntu 8.10 on a Rampage Formula until I upgraded the BIOS to a later version. From your report of disabling APIC it almost sounds like it could be an issue with an initial BIOS release--worth checking, at least!

Definitely recommend sticking with the lone 150 Raptor as the only drive until you get things sorted out--good idea.
 
Old 03-04-2010, 09:05 PM   #3
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
Hi there, thanks for the reply. This is indeed a fairly new motherboard. I originally had the Q6600 as my desktop processor with an XFX 680i mobo. It's now on an Asus P5QL-VM DO mobo. I had installed Windows Server 2008 R2 on just the Raptor to mess around with when I initially built it, and it had been working perfectly fine. I have been tempted recently to go back to Windows, but I shall resist that temptation for a while longer.

The initial install of ubuntu also seemed to go smoothly, but had the corruption problems once installed. I initially thought I had done some bad settings or installed conflicting things, so I had tried re-installing a few times.

In the bios, there is a setting in the hard drive section that reads "Sata Configuration". It had (by default) been set to "Enhanced". The other options for that were "Compatible" and "Disabled". I have set it to compatible and am currently trying to install ubuntu again to see what happens.

Below the SATA Configuration, there is another option that states "Operate as" with the option of "IDE" and "AHCI". This had been set to IDE. I've left it as that.

In the boot menu there is an option called "ACPI APIC Support". This had originally been set to "disabled". I noticed the default is actually "enabled", so I have left it as that for this install. The name seems promising, however the description for it in the manual says:
"When set to enabled, the acpi apic table pointer is included in the rsdt pointer list."

Any idea what that means? (Good? Bad? Could this be the cause?)

The Q6600 is at its stock 2.4 GHz clock. All other settings are at default or "auto".

I guess my next steps right now are to:
- Reinstall Ubuntu (Currently starting that up while I type this)
- Try setting the HD "Operate as" option to "AHCI" (?)
- Try setting "SATA Configuration" to "disabled" (This removes the option of IDE/AHCI)


I'll report on the findings, but in the meantime, does anyone have any ideas from what I've mentioned above?

Thanks again for the help!
 
Old 03-04-2010, 10:38 PM   #4
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
Ok, so basically I went through the 4 main settings (all combinations) related to APIC and AHCI. Oddly enough, once you choose AHCI as the SATA mode, it disables the choice of "Enhanced" and "Compatible". Similarly, when choosing Enhanced or Compatible, the IDE/AHCI options show up, but when Disabled is chosen, neither does.

In any case, I've updated the BIOS firmware (the update notes said "Improve system stability"). After trying all combinations, the following is the only one that seems to work:

Sata Configuration: N/A
Operate as: AHCI
ACPI APIC Support: Disabled
ACPI 2.0 Support: Disabled

I've tested all combinations with a few different partition sizes (e.g. 148G / and 2G swap, 149G / and 1G swap) and it seems the above configuration is the only one that consistently doesn't lock up at the partition/format stage of the installer. The rest worked once or twice, but eventually locked up.

I'm just afraid that even though it installs properly, there may be some underlying issue that could still be sneaking around. I will try downloading a few large files and checking the checksums to be sure...
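One rough way to hunt for that kind of lurking corruption is a write-and-read-back test. The sketch below is only an illustration, assuming Python 3.6 or newer; the function name `write_and_verify` and its parameters are invented for this example. It writes random data to disk, reads it back, and counts how often the hash changes.

```python
import hashlib
import os


def write_and_verify(directory, rounds=5, size_bytes=64 * 1024 * 1024):
    """Write random files into `directory`, read them back, count hash mismatches."""
    mismatches = 0
    for index in range(rounds):
        payload = os.urandom(size_bytes)
        # Hash the data while it is still only in memory.
        expected = hashlib.sha256(payload).hexdigest()
        path = os.path.join(directory, f"verify_{index}.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the kernel to push the data to disk
        with open(path, "rb") as handle:
            actual = hashlib.sha256(handle.read()).hexdigest()
        os.remove(path)
        if actual != expected:
            mismatches += 1
    return mismatches
```

A result of 0 does not prove the hardware is healthy: the read-back may be served from the page cache rather than the disk, and an intermittent fault may simply not trigger during a short run. Any non-zero count on an otherwise idle machine is a strong hint that something below the filesystem is flipping bits.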

But in the meantime, does any of this seem familiar to anyone? Any suggestions on what the problem could be? I know the above settings work, but why exactly? (Sorry, I just need to know or it's going to bug me forever lol)

EDIT: NVM. It finished installing and was slow as molasses... I checked /proc/cpuinfo and it seems only one core is detected! Anyone have any other ideas? I'm now trying IDE Enhanced mode with ACPI APIC turned on and ACPI 2.0 turned off... but I'm basically back to mixing and matching settings again. I'd really like to get to the root cause of this...
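The /proc/cpuinfo check can be scripted so it is quick to repeat after each BIOS change. This is a minimal sketch, assuming Python 3 and the usual Linux layout where each logical CPU gets its own "processor : N" line; the function names are made up for illustration.

```python
def count_logical_cpus(cpuinfo_text):
    """Count the 'processor : N' entries in the text of /proc/cpuinfo."""
    count = 0
    for line in cpuinfo_text.splitlines():
        # Each logical CPU block starts with a line whose key is exactly 'processor'.
        key = line.split(":")[0].strip()
        if key == "processor":
            count += 1
    return count


def count_logical_cpus_from_file(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and count its logical CPUs."""
    with open(path, "r") as handle:
        return count_logical_cpus(handle.read())
```

On a Q6600 with all cores detected, `count_logical_cpus_from_file()` would be expected to return 4. A result of 1 matches the symptom described here, where disabling the ACPI APIC option leaves the kernel with a single core.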

Last edited by Mistro; 03-05-2010 at 12:36 AM.
 
Old 03-06-2010, 12:50 PM   #5
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
Ok, so I again tried the following settings:

Sata Configuration: Compatible
Operate as: IDE
ACPI APIC Support: Disabled
ACPI 2.0 Support: Disabled

Still no luck. It managed to get through formatting, but again locked up with the blinking keyboard lights right after the tasksel screen runs (where you pick any packages you want for your server). It was "Retrieving man-db" when it crashed this time.

Any ideas?
 
Old 03-06-2010, 11:57 PM   #6
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
More information, if it's any use: Windows Server 2008 fails to install as well. It bluescreens with an error related to ntfs.sys during the install stage.
 
Old 03-07-2010, 12:28 AM   #7
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,234

Rep: Reputation: 189
It sounds like a mobo problem.
Did you check all connections?
 
Old 03-07-2010, 11:41 AM   #8
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
I fully disassembled it (cables only) and put it back together just now with just the Raptor and no extra PCI cards (I had two video capture cards in there). Same problem. I then tried individually with each of the Greens, and got the same thing.

Is there any specific connection I should be checking?
 
Old 03-07-2010, 07:55 PM   #9
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
Success! Kind of.

I managed to narrow it down to the RAM (I think). I tried popping in the RAM modules one by one until the install failed (I have 4 modules). I then tried the failing module in all four slots and it failed every time. Running the same test with the other three modules, installation went fine.

What is bugging me is that that one stick of RAM passes memtest with flying colours...

In any case, I'm going to mark this thread as resolved since it's no longer an Ubuntu problem.
 
Old 03-08-2010, 05:14 AM   #10
Laurens73
Member
 
Registered: Aug 2009
Location: Zeewolde, Flevoland NL
Distribution: Debian squeeze (Gnome) on netbooks; Debian Lenny on servers and Debian wheezy (XFCE) on new laptops
Posts: 144

Rep: Reputation: 23
I had the same problem a year ago with almost the same type of mobo (P5B, also socket 775, with a quad core instead of its little brother the Q6600). Everything was working fine, and from one second to another the entire system crashed, resulting in data loss on the hard disk. I solved the problem by doing a BIOS upgrade (with the old BIOS installed, the mobo has the urge to switch off the CPU and case fans now and then). I also fitted a bigger CPU fan because of the high temperatures this mobo causes. The problem hasn't returned since. Perhaps it will work for you as well if your problem returns. The update can be downloaded from the ASUS site: http://support.asus.com/download/download.aspx
 
Old 03-08-2010, 07:00 AM   #11
Mistro
LQ Newbie
 
Registered: Dec 2009
Posts: 15

Original Poster
Rep: Reputation: 1
Hi Laurens, Thanks for the tip. I did in fact upgrade the bios to the latest version, however that didn't seem to do anything with this issue.

I'm about 90% sure it's the single memory module at this point, but it's strange because there are no errors in memtest... Perhaps this particular stick's heatsink is improperly applied or something, causing it to heat up abnormally and produce those errors? It's still odd that all the OSes crash during the HD stages though. And again, memtest coming up blank still seems slightly odd.

From what I found googling, kernel panics and CRC errors are usually a result of bad memory, a bad mobo, or OCing. It's definitely not the last option, so probably one of the first two. As a last resort, I tried forcing the recommended memory timing values, with no luck. Bumping up the NB voltage slightly didn't make it any more stable either.

I've taken the offending stick out of the server along with its pair. I'm going to see if OCZ will RMA this for me, hopefully. In the meantime, only 1 or 2 VMs at a time I guess.

On a side note, Debian doesn't seem to want to detect this mobo's ethernet controller (e1000) either, but that's another story lol
 
Old 03-08-2010, 07:41 AM   #12
Laurens73
Member
 
Registered: Aug 2009
Location: Zeewolde, Flevoland NL
Distribution: Debian squeeze (Gnome) on netbooks; Debian Lenny on servers and Debian wheezy (XFCE) on new laptops
Posts: 144

Rep: Reputation: 23
The new Debian indeed doesn't support the ethernet controller by default; you need to install the firmware modules. The best way to do this is to download the firmware-linux-free .deb package and install it to get the network working (or use a USB network adapter at first, edit the /etc/apt/sources.list file, and download and install the files via apt over that second interface).

If you find out that it's not your memory, there's a good chance your mobo is still under warranty, provided your ASUS dealer detects the hardware problem or also suspects a defect in your mobo. They can send the mobo to ASUS themselves. Most ASUS mobos have a warranty period of 2 years.

Last edited by Laurens73; 03-08-2010 at 07:45 AM. Reason: language and typing errors
 
Old 02-23-2011, 06:32 PM   #13
Insuite
LQ Newbie
 
Registered: Feb 2011
Posts: 2

Rep: Reputation: 0
Mistro...

Here is a link to someone having a very similar problem on another forum. I am having the same thing as well.


HIya Jimmy...

I am having a very similar problem. I have 10.10 on one partition and Win 7 on another. I too have an ASUS mobo, but the socket 775 version. I have very recently reinstalled Ubuntu and the same problem started again. The system freezes, I do a hard reset, and then on reboot I get either...
grub error
disk not found
or
the HD disk name corrupted to complete rubbish in the BIOS.
A hard reset again garners the same results; however, if I turn the machine off, the corrupted HD name is gone, the correct one is back, and the system boots.

Here is an interesting link where the guy is having the same or a very similar problem, and it turned out to be a RAM module... [link leading here]

I didn't mention that I switched from a Biostar mobo as I had read that their SATA chips had a tendency to corrupt HD's. I was having the same problem there as well.

I have noticed that if I unplug my SATA DVD drive from the mobo it sometimes fixes the problem temporarily. It is all very strange!!!!

I like your posts but unfortunately, I do not speak Linux nearly as fluently as you do.

Cheers,
Insuite


I could sure use some insight into this as well; it is really starting to get to me after about a year or so... Yes, I seem to have a bit of patience.

Cheers,
Insuite
 
Old 02-23-2011, 06:33 PM   #14
Insuite
LQ Newbie
 
Registered: Feb 2011
Posts: 2

Rep: Reputation: 0
sorry first post so I could not add the link... Here it is...

http://ubuntuforums.org/showthread.php?t=1686405

cheers,
Insuite
 
  

