Random crashes with VIA VT8237 SATA Controller on ASUS K8V-MX mobo
I have an ASUS K8V-MX mobo with an AMD Sempron CPU and two Western Digital Caviar SE 320GB SATA drives. The drives are connected to the onboard SATA controller. They are mounted individually and are not being run in a RAID configuration. The OS is Fedora Core 5.
The machine periodically crashes due to what appear to be hard drive failures, which sometimes result in a corrupted file system. Generally, the entire file system either becomes read-only or is entirely inaccessible until the machine is rebooted. If the file system wasn’t damaged, the machine then boots up normally.
The failures are sometimes (but not always) accompanied by errors, such as:
- "kernel: journal commit I/O error", which appears as a console message
- Seek errors and bad CRC errors, which appear in /var/log/messages
In addition, at boot up, I get the message “Incorrect metadata area header checksum” when it mounts the main drive.
The errors seem to happen randomly. Sometimes the machine stays up for a week or two; other times it fails within a few hours.
My first suspicion was that the SATA drive(s) were bad, but SMART data (retrieved by running /usr/sbin/smartctl and also from the WD diagnostic boot CD) indicates that the drives are fine. I even wrote a small program that continuously executes smartctl and logs the results to another machine over the network (just in case the drive failure was preventing useful data from being logged to a file on the machine itself). However, the smartctl output continues to show that the drives are fine, even after the main drive has “crashed”.
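A logger of that kind could be sketched roughly as below. This is an illustrative Python 3 sketch, not the actual program: the UDP transport, the function names, the 60-second interval, and the `-a` report option are all assumptions chosen for the example. Only the smartctl path comes from the description above.

```python
import socket
import subprocess
import time

SMARTCTL = "/usr/sbin/smartctl"  # path as given in the post


def read_smart(device, runner=subprocess.run):
    """Return smartctl's text output for one device, or an error note."""
    try:
        result = runner([SMARTCTL, "-a", device],
                        capture_output=True, text=True, timeout=60)
        return result.stdout or result.stderr
    except (OSError, subprocess.SubprocessError) as exc:
        # Report the failure instead of dying; a failing disk is the
        # situation this logger is meant to capture.
        return f"smartctl failed: {exc}"


def format_record(device, output, timestamp):
    """Prefix the report with a UTC timestamp and the device name."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return f"[{stamp}] {device}\n{output.rstrip()}\n"


def send_record(sock, addr, record, max_bytes=60000):
    """Send the record as one or more UDP datagrams (plain byte chunks)."""
    data = record.encode("utf-8", errors="replace")
    for start in range(0, len(data), max_bytes):
        sock.sendto(data[start:start + max_bytes], addr)


def log_forever(devices, addr, interval=60, iterations=None,
                reader=read_smart, sock=None):
    """Poll every device each `interval` seconds and ship the reports.

    `iterations=None` means run until interrupted.
    """
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    done = 0
    try:
        while iterations is None or done < iterations:
            for device in devices:
                record = format_record(device, reader(device), time.time())
                send_record(sock, addr, record)
            done += 1
            time.sleep(interval)
    finally:
        if own_socket:
            sock.close()
```

It would be started with something like `log_forever(["/dev/sda", "/dev/sdb"], ("192.0.2.10", 5514))`, where the device names, address, and port are placeholders, and any UDP listener on the second machine can write the datagrams to a file. UDP is used here so the sender never blocks on the remote side; the trade-off is that a report can be lost without notice.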
I’m now wondering whether the problem is with the SATA controller itself, and whether I need to buy a new mobo or a PCI RAID controller to use instead of the onboard one. Or maybe there really is a problem with the drives and SMART isn’t picking it up for some reason? Any advice would be immensely appreciated.