LinuxQuestions.org

Intermittent freezes - suspect SATA as the problem (https://www.linuxquestions.org/questions/linux-hardware-18/intermittent-freezes-suspect-sata-as-the-problem-605496/)

damiendusha 12-08-2007 10:28 PM

Intermittent freezes - suspect SATA as the problem
 
Hello all,

I have recently upgraded my RAID 5 array from a 3-drive configuration to a 4-drive configuration, using 4x500GB Samsung SATA hard drives.

/home is mounted on the RAID, with the remainder of the system on a 200GB Western Digital PATA hard drive.

Since then, I have been getting intermittent freezes of the entire system. By freezes, I mean that the application (typically Amarok, because it's a media centre PC/file server) locks up, no applets in GNOME respond, neither the mouse nor the keyboard responds, and I cannot ssh into the machine. From this, I suspect it's either the hardware or the kernel.

Before this, the machine had been up continuously since I compiled and installed a new kernel, and I never had any problems with earlier kernels either. Now, with the new HDD and the same kernel, I get a few hours of uptime at best whenever /home (e.g. the music collection) is being accessed.

The problem is almost identical to that seen in these posts:
http://www.linuxquestions.org/questi...ht=sata+freeze
http://www.linuxquestions.org/questi...ht=sata+freeze

And similar-ish issues here:
http://www.linuxquestions.org/questi...ht=sata+freeze (I don't use LUKS, though)
http://www.linuxquestions.org/questi...ht=sata+freeze
http://www.linuxquestions.org/questi...ht=sata+freeze
http://www.linuxquestions.org/questi...ht=sata+freeze
http://www.linuxquestions.org/questi...ht=sata+freeze

Oh, I should mention my hardware and OS :)
OS: Fedora 7
[damien@localhost ~]$ uname -r
2.6.23.1
[damien@localhost ~]$ uname -m
x86_64

Motherboard = Gigabyte GA-M57SLI-S4 (nForce 570 SLI Chipset)
CPU = AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
RAM = 1GB DDR2-800 (can't remember if it's GeIL or Corsair)

Now, the reason I suspect the SATA driver and/or hardware is that on this exact same motherboard (I actually have two of them, and both exhibited this behaviour) I was running Windows XP, and with the "nvidia enhanced" drivers I was getting BSODs. With the standard MS drivers, things were much more stable. I remember doing all the memtest stuff back then with no problems on either motherboard.

Until now, I have not had a problem under Linux, but I am now using a new SATA port (my only hardware configuration change). That, my previous problems, and the fact that some people seem to have problems with the same chipset lead me to suspect this may be the issue. Unfortunately, I can't remember whether the issues I had then were due to the **exact same** SATA port, but I seem to remember moving drives around the SATA ports on the motherboard; it was 12 months ago, after all!

I am running a Seasonic S12+ 400W power supply, so I suspect it isn't a power supply overload issue.

Can anyone suggest the way forward? For example:
- Any logs/configuration files?
- Tests that I should do to try and reproduce it, and if so, what information should I capture?
- Should I recompile an older kernel? (Someone found that 2.6.17 was stable with sata_nv, but had problems with 2.6.20 or 2.6.21.)

I'm less keen on doing hardware tests because it's a pain to open the case :( , but if it comes to that, I'll do it.

Thank you very much
Damien.

duryodhan 12-08-2007 10:41 PM

What does dmesg say around the time it locks up?

If the SATA array is mounted at /home, could you try running as root and unmounting /home to check whether that really is the problem? In fact, keep it mounted and check dmesg.

(My point being: running as root doesn't require /home.)

First make sure that it is definitely a SATA problem, and then we can try to fix that.

You could try running a different kernel, but I guess only luck and the experience of others will help.
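As a rough way to follow the advice to check dmesg, the kernel log can be filtered for libata error reports after a freeze or a stress run. This is a minimal Python sketch, assuming that relevant messages name an ata port (e.g. `ata3:` or `ata3.00:`) or the `sata_nv` driver and contain phrases such as `exception Emask` or `hard resetting link`; the pattern list is an illustrative assumption, not an exhaustive catalogue of kernel messages.

```python
import re
import subprocess

# Illustrative patterns (an assumption): phrases often seen in libata
# error reports. Adjust them to the messages your kernel actually prints.
SATA_ERROR_PATTERNS = [
    r"exception Emask",
    r"hard resetting link",
    r"SError",
    r"timeout",
    r"failed command",
]

# Only consider lines naming an ata port ("ata3:", "ata3.00:") or sata_nv.
ATA_LINE = re.compile(r"\bata\d+(\.\d+)?:|sata_nv")
ERROR = re.compile("|".join(SATA_ERROR_PATTERNS), re.IGNORECASE)


def sata_errors(lines):
    """Return the lines that look like SATA/libata error reports."""
    return [line for line in lines if ATA_LINE.search(line) and ERROR.search(line)]


def read_dmesg():
    """Return the current kernel ring buffer as a list of lines."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling `sata_errors(read_dmesg())` prints nothing useful on a healthy system; on a system where dmesg is restricted, it has to be run as root, otherwise `check=True` raises an error.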

damiendusha 12-08-2007 11:55 PM

Well, I'm in via ssh, running dmesg in a continuous loop to see if I can catch it in the act.

I'm hammering the HDDs at the same time, so let's see if anything shows up...
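A loop like this can be sketched in Python so that only new kernel messages are printed, which keeps the ssh session readable while the disks are being hammered. This is a minimal sketch under stated assumptions: it re-runs `dmesg` every couple of seconds (the interval is arbitrary) and uses the last line of the previous snapshot as a marker for what is new, which works as long as lines are unique (e.g. timestamped) and the ring buffer does not wrap between polls. Newer util-linux versions of dmesg also offer `--follow` (`-w`), which does this natively.

```python
import subprocess
import time


def new_lines(prev, curr):
    """Return the lines in snapshot `curr` that were not in snapshot `prev`.

    Uses the last line of `prev` as a marker; if that marker is no longer
    in `curr` (e.g. the ring buffer wrapped), every line is treated as new.
    """
    if not prev:
        return list(curr)
    marker = prev[-1]
    # Search backwards so the most recent occurrence of the marker wins.
    for i in range(len(curr) - 1, -1, -1):
        if curr[i] == marker:
            return curr[i + 1:]
    return list(curr)


def read_dmesg():
    """Return the current kernel ring buffer as a list of lines."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def watch(interval=2.0):
    """Print new kernel messages every `interval` seconds until interrupted."""
    prev = read_dmesg()
    while True:
        time.sleep(interval)
        curr = read_dmesg()
        for line in new_lines(prev, curr):
            # flush=True so each line reaches the ssh terminal immediately
            # instead of sitting in a local buffer when the machine hangs.
            print(line, flush=True)
        prev = curr
```

Calling `watch()` from the ssh session prints new messages until Ctrl-C; the last lines on the remote terminal are then the best record of what the kernel reported just before a freeze.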

Haplo770 03-17-2008 08:54 AM

Did you ever find a solution to this?

I recently purchased a new system, and I am having the same problems:
- System will not stay up
- Tweaked BIOS (updated to the latest released version)
- Stopped unneeded services
It appears to hang when lots of data is transferred, either locally or through the 1000Mb network.

Tried to move 20GB off of the IDE drive: it hangs.
But a 4GB ISO moved without trouble.
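Since the hang seems to depend on the size of the transfer (20GB fails, 4GB succeeds), one way to try to reproduce it deliberately is to write a large file in chunks, read it back and compare checksums. The Python sketch below is hypothetical: the function name, target path, file size and chunk size are assumptions to adjust, and it only exercises sustained sequential I/O on one disk, not the network path.

```python
import hashlib
import os


def write_and_verify(path, total_bytes, chunk_size=1 << 20):
    """Write `total_bytes` of pseudo-random data to `path`, read it back,
    and return True if the SHA-256 digests of both passes match."""
    written = hashlib.sha256()
    # Reuse one random chunk to keep CPU cost low; the disk still sees
    # `total_bytes` of sustained writes.
    chunk = os.urandom(chunk_size)
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            piece = chunk[:min(chunk_size, remaining)]
            f.write(piece)
            written.update(piece)
            remaining -= len(piece)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the disk

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()
```

For example, `write_and_verify("/home/stress-test.bin", 20 * 2**30)` (hypothetical path) would mirror the 20GB copy that failed. With a file much larger than RAM, most of the read-back has to come from the disk rather than the page cache.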

CPU temps are 45C; surprisingly, very cool air blows out of the case.
Reseated memory, graphics card and CPU.

ASUS M2N-SLI Deluxe
AMD X2 5000 (65 W)
ASUS 7300GT video card
Thermaltake 500W PSU
2 x 1GB Corsair RAM

7 hard disk drives
SATA:
- 2 used for nvRAID 1 (OS)
- 4 used as independent disks

IDE:
- 1 used as an independent disk

2 network cards:
- 1 public 100Mb link
- 1 private 1000Mb link

OS: CentOS 5.1

Linux belgarion 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I'm debating reinstalling as 32-bit to see if the architecture is the cause. Thoughts?


UPDATE:
I saw memory errors during memtest86, and after pulling the memory I realized that I had mixed up modules (still Corsair XMS2, but with different timings). I had moved it from an XP system but never checked. My stupidity never ceases to amaze me. I had reseated the memory about 3 times.

Reran memtest for 90 minutes: 2 passes, no errors.
Large data moves showed no issues.

damiendusha 03-27-2008 06:21 PM

I never properly solved it - I just bit the bullet and got a new motherboard :(

Works a treat now, and I can have my weekends back :)


All times are GMT -5.