LinuxQuestions.org
Old 10-19-2006, 12:58 PM   #1
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Rep: Reputation: 16
Hard drive errors on file server


Lately I've been seeing the following in my messages log:

[4303937.493000] hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
[4303937.493000] hda: drive_cmd: error=0x04 { DriveStatusError }
[4303937.493000] ide: failed opcode was: 0xb0
[4303937.650000] hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
[4303937.650000] hdc: drive_cmd: error=0x04 { DriveStatusError }
[4303937.650000] ide: failed opcode was: 0xb0
[4294685.348000] EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
[4294685.348000] EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

I noticed this the other night when my Samba share suddenly became read-only. If I reboot the server, things work again for about 24 hours, and then it happens again. The drives are kept cool (30-33C).

Here is the hardware:

AMD Athlon XP 2100+
MSI KM4M-L KM400 motherboard (pulled from an eMachines)
512MB Mushkin DDR400
20GB WD PATA HDD (hda OS drive)
500GB Seagate PATA HDD (hdc)
2x500GB Seagate PATA HDDs (RAID 0 /dev/md0, hde, hdg on Promise ULTRA133 TX2 PCI controller card)

hda and hdc are attached to the mobo, each on its own channel, and hde and hdg are each on their own channel on the RAID card.

Initially, I thought the drives were going bad but how could all of them be going bad at the same time? I've got a new, good quality PSU (Fortron AX450-PN) so I'd hope that's not frying the drives.

Any suggestions? I'm leaning towards a bad motherboard.
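
For reading the hex values in those log lines, here is a minimal Python sketch that turns a status or error byte back into the names the kernel printed. The bit positions are an assumption inferred only from the values and names quoted in this thread (0x51, 0x04, 0x84); bits that do not appear here are reported as raw hex rather than guessed.

```python
# Bit names as they appear in the log lines quoted in this thread.
# Positions are inferred from those lines; unlisted bits stay as raw hex.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x10, "SeekComplete"),
    (0x01, "Error"),
]
ERROR_BITS = [
    (0x04, "DriveStatusError"),
    (0x80, "BadCRC"),
]


def decode(value, table):
    """Return names of the known bits set in value, plus leftover bits as hex."""
    names = [name for mask, name in table if value & mask]
    known = 0
    for mask, _ in table:
        known |= mask
    leftover = value & ~known
    if leftover:
        names.append(f"unknown(0x{leftover:02x})")
    return names


print(decode(0x51, STATUS_BITS))
print(decode(0x04, ERROR_BITS))
print(decode(0x84, ERROR_BITS))
```

The same helper works for any status= or error= value copied out of the messages log.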
 
Old 10-19-2006, 01:04 PM   #2
ciotog
Member
 
Registered: Mar 2004
Location: Canada
Distribution: Slackware current
Posts: 728
Blog Entries: 2

Rep: Reputation: 43
Have you recently upgraded the kernel? I've been trying to bring a computer out of retirement. It boots with the 2.4 series kernel from the Slackware install disk, but not with the most recent 2.6 series kernel, which gives error messages similar to yours.
 
Old 10-19-2006, 01:08 PM   #3
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Original Poster
Rep: Reputation: 16
I forgot to mention that. The system is running Ubuntu 6.06. Over a month ago I upgraded to the 2.6.17.13 kernel so I could use my Gb NIC. These hard drive problems started about 5 days ago.
 
Old 10-20-2006, 08:40 AM   #4
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,249
Blog Entries: 5

Rep: Reputation: 191
Why not apt-get install smartmontools and check the status of your hard drives? The drive probably has errors on it, an indication that it may be starting to fail.

man smartctl

http://smartmontools.sourceforge.net/

Code:
itg-debian:/etc/init.d# smartctl -a /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD400BB-00DEA0
Serial Number:    WD-WMAD11145217
Firmware Version: 05.03E05
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Oct 20 09:39:21 2006 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1506) seconds.
Offline data collection
capabilities:                    (0x3b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  28) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   104   093   021    Pre-fail  Always       -       2175
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       111
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       30450
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       81
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging
Check your error counts.
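
To make "check your error counts" concrete, here is a rough Python sketch that scans the attribute table from smartctl output and reports watched counters with a nonzero raw value. Which attributes to watch is my own assumption, not something the tool prescribes, and the sample rows are copied from the output above.

```python
# Attributes treated as "error counts" here; this selection is an assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def nonzero_counts(smartctl_text, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Rows taken from the smartctl output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
"""

print(nonzero_counts(SAMPLE))
```

On the drive shown above, the only watched counter that is not zero is UDMA_CRC_Error_Count, with a raw value of 2.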
 
Old 10-20-2006, 02:23 PM   #5
ciotog
Member
 
Registered: Mar 2004
Location: Canada
Distribution: Slackware current
Posts: 728
Blog Entries: 2

Rep: Reputation: 43
I don't think it's the HDD myself, as I've had roughly the same problem. I actually have 2 HDDs installed, and both give the same error. I booted from a knoppix live dvd and it didn't report any issues.

I think it may be a bug in recent kernels that affects older mobos. I'm going to try going back to a prior 2.6 kernel that I know worked fine with this hardware, and see what happens.
 
Old 10-22-2006, 05:03 PM   #6
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Original Poster
Rep: Reputation: 16
Please let me know what you find out with an older kernel. I've already ordered a different motherboard to replace mine thinking that it would be the problem.

Since the new board uses a VIA KT880 chipset and my current board uses a VIA KM400, I'm hoping I can just swap boards without reloading Ubuntu.

As for smartctl, I had to enable it on both drives. No errors are logged as of now.

I ran a short test on each and they both passed.

If the drives are good (which I think they are) either the kernel is buggy or the mobo is bad.
 
Old 10-23-2006, 03:37 AM   #7
hansalfredche
Member
 
Registered: Jun 2005
Posts: 445

Rep: Reputation: 31
Hmmm ... I always get similar errors at boot time on Mandriva 2006/2007. Once the kernel has booted, everything seems OK though. I will run smartctl when I'm at home.
 
Old 10-23-2006, 11:51 AM   #8
ciotog
Member
 
Registered: Mar 2004
Location: Canada
Distribution: Slackware current
Posts: 728
Blog Entries: 2

Rep: Reputation: 43
Ok, I finally got around to working on the old machine again. It turns out that I am getting errors booting with a 2.4.29 kernel (the stock kernel with Slack 10.1). The errors I'm getting are different, however:
Code:
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
As for your issue, Child of Wonder, there's a kernel config option that specifically addresses the error you are getting. Perhaps that option was enabled on the previous kernel but not on the newer one.
It's under Device Drivers -> ATA/ATAPI/MFM/RLL support and it's called "Use multi-mode by default". The help text for this option is as follows:
Code:
CONFIG_IDEDISK_MULTI_MODE:                                                               

If you get this error, try to say Y here:

hda: set_multmode: status=0x51 { DriveReady SeekComplete Error }
hda: set_multmode: error=0x04 { DriveStatusError }

If in doubt, say N.
If you run "hdparm /dev/hda" and "hdparm /dev/hdc" and the multcount option is set to 16 then this means that multi-mode is being set later, but it might be helpful to have it initially set to 16 using the kernel option above.
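
One way to test the "enabled on the previous kernel but not the newer one" idea is to compare the two kernel config files. The Python sketch below is only an illustration: the two config snippets are made-up samples, and where the real files live (often /boot/config-<version>) depends on your distribution.

```python
import re


def config_state(config_text, option):
    """Return the value of a kernel config option, or 'not set' if it is absent
    or only appears as a '# OPTION is not set' comment."""
    match = re.search(rf"^{re.escape(option)}=(.*)$", config_text, re.MULTILINE)
    return match.group(1) if match else "not set"


# Hypothetical config excerpts, for illustration only.
OLD_CONFIG = """\
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
"""
NEW_CONFIG = """\
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
"""

for label, text in (("old", OLD_CONFIG), ("new", NEW_CONFIG)):
    print(label, config_state(text, "CONFIG_IDEDISK_MULTI_MODE"))
```

To use it on a real system, read each config file into a string and pass it in place of the samples.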
 
Old 10-23-2006, 01:35 PM   #9
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Original Poster
Rep: Reputation: 16
I'll try that if it comes back.

Odd thing is, the errors stopped yesterday in /var/log/messages.

The only thing I did yesterday was enable SMART on the two drives and perform a short self-test on each.

Multimode support was not set, and hdparm does not show "16" for either drive. I'm going to recompile the kernel with this enabled and see what happens.
 
Old 10-26-2006, 12:49 PM   #10
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Original Poster
Rep: Reputation: 16
Set up 2.6.18.1 with multimode enabled and that brought the errors back.

Now that I've run "hdparm -s on" on /dev/hda and /dev/hdc, the errors are gone from /var/log/messages.
 
Old 10-26-2006, 12:55 PM   #11
ciotog
Member
 
Registered: Mar 2004
Location: Canada
Distribution: Slackware current
Posts: 728
Blog Entries: 2

Rep: Reputation: 43
Hmm... I can't find any info about the -s option for hdparm -- what does it do?
 
Old 10-26-2006, 01:00 PM   #12
Child of Wonder
Member
 
Registered: Jul 2004
Location: Sioux Falls, SD
Distribution: Debian, Ubuntu, Fedora, Red Hat
Posts: 69

Original Poster
Rep: Reputation: 16
I'm sorry... I meant to post:

smartctl -s on /dev/hda
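
This would fit the log: 0xb0 is the SMART command in the ATA command set, so one possible reading is that something was sending SMART commands to drives that had SMART disabled. That is a guess, not a confirmed diagnosis. The Python sketch below pairs each "failed opcode" line with the drive named just before it; the regular expressions are assumptions based only on the log lines quoted in this thread.

```python
import re
from collections import Counter

# 0xb0 is the SMART command in the ATA command set; other opcodes stay as hex.
OPCODE_NAMES = {0xB0: "SMART"}

# Patterns modelled on the log lines quoted in the first post.
DEVICE_RE = re.compile(r"\b(hd[a-z]): \w+: (?:status|error)=0x[0-9a-fA-F]+")
OPCODE_RE = re.compile(r"ide: failed opcode was: (0x[0-9a-fA-F]+)")


def failed_opcodes(log_text):
    """Count (device, opcode) pairs, attributing each opcode line to the
    most recent drive error line that preceded it."""
    counts = Counter()
    last_device = None
    for line in log_text.splitlines():
        dev = DEVICE_RE.search(line)
        if dev:
            last_device = dev.group(1)
            continue
        op = OPCODE_RE.search(line)
        if op and last_device is not None:
            code = int(op.group(1), 16)
            counts[(last_device, OPCODE_NAMES.get(code, hex(code)))] += 1
            last_device = None  # do not reuse the drive for a later opcode line
    return counts


# Lines copied from the first post.
SAMPLE = """\
[4303937.493000] hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
[4303937.493000] hda: drive_cmd: error=0x04 { DriveStatusError }
[4303937.493000] ide: failed opcode was: 0xb0
[4303937.650000] hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
[4303937.650000] hdc: drive_cmd: error=0x04 { DriveStatusError }
[4303937.650000] ide: failed opcode was: 0xb0
"""

for (device, opcode), n in sorted(failed_opcodes(SAMPLE).items()):
    print(device, opcode, n)
```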
 
Old 01-28-2007, 05:38 PM   #13
ciotog
Member
 
Registered: Mar 2004
Location: Canada
Distribution: Slackware current
Posts: 728
Blog Entries: 2

Rep: Reputation: 43
Just to follow up: I recently disconnected the old 2GB drive, which was hdb, and the errors disappeared. It's interesting that the errors mentioned hda when hdb was causing the problem. I think a command was being sent on the primary channel to the master drive, and the slave drive was returning an error for it.
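
For anyone checking which drives share a cable, here is a small Python sketch of the usual legacy IDE naming: hda and hdb are master and slave on the first channel, hdc and hdd on the second, and so on. The function names are hypothetical, made up for this example.

```python
def ide_position(device):
    """Map a legacy IDE name such as 'hdb' to (channel index, 'master' or 'slave')."""
    if len(device) != 3 or not device.startswith("hd") or not "a" <= device[2] <= "z":
        raise ValueError(f"not a legacy IDE device name: {device!r}")
    index = ord(device[2]) - ord("a")
    role = "master" if index % 2 == 0 else "slave"
    return index // 2, role


def share_channel(first, second):
    """Return True if two drives sit on the same IDE channel (same cable)."""
    return ide_position(first)[0] == ide_position(second)[0]


for name in ("hda", "hdb", "hdc", "hde", "hdg"):
    print(name, ide_position(name))

print(share_channel("hda", "hdb"))
print(share_channel("hda", "hdc"))
```

By this mapping hda and hdb sit on the same channel, while hda, hdc, hde and hdg are each on a channel of their own, which matches the setup in the first post.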
 
  


All times are GMT -5.