LinuxQuestions.org


bdb4269 02-28-2008 02:34 PM

Server randomly hangs indefinitely
 
My servers are running RedHat ES 3.

A while back (maybe a month or two ago), when I came in to work in the morning, our main server was not responding at all. Nothing over the network, and nothing on the console. The power light was still on, though.

Upon trying to restart it, it became apparent that the RAID controller card had gone bad.

I replaced the RAID controller card, and everything has pretty much seemed to be working normally.

Except that 2 or 3 times since I replaced the card, it has done the same thing: completely unresponsive, not even on the console. Restarting it seems to solve the problem, though.

What seems even stranger to me is that this has always happened in the middle of the night, when nothing much is going on on the server. During the day and evening, this server runs up to 140 'dumb' terminals, and many people also access it through PuTTY and SAMBA. But it has never gone down while all of that is going on.


It's not a major problem (yet), since it has happened only a few times, and happened at night -- but I want to try to head this off before it does become a problem. I have tried looking at logs, etc, and haven't found anything revealing (at least not to me).

Does anyone have any suggestions about what it might be, or where I should start looking? Do you think the new RAID card might also be going bad?

Any and all input is greatly appreciated!



(Let me know what extra information I should post.)

rayfordj 02-28-2008 05:00 PM

It is possibly a hardware problem, but I find it unlikely that two RAID cards would fail in such a short timeframe.

It's possible that something running at night (via cron?) is generating an excessive amount of I/O [writes], so that the system cannot service the demand fast enough and ends up starving itself.
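As a rough way to check the cron angle, here is a minimal Python sketch that lists crontab entries whose hour field can fire during the night. The function names, the 00:00-05:59 "night" window, and the sample entries are all made up for illustration; it only understands plain numeric hour fields (`*`, lists, ranges, steps), so treat it as a starting point rather than a full crontab parser.

```python
def hour_matches(hour_field, night_hours):
    """Return True if a crontab hour field can fire in any of night_hours."""
    for part in hour_field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            lo, hi = 0, 23
        elif "-" in base:
            lo, hi = (int(x) for x in base.split("-"))
        else:
            lo = hi = int(base)
        if any(h in night_hours for h in range(lo, hi + 1, step)):
            return True
    return False


def night_jobs(crontab_text, night_hours=range(0, 6)):
    """Return the crontab lines whose hour field falls inside night_hours."""
    jobs = []
    for line in crontab_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, "@reboot"-style shortcuts, and VAR=value lines.
        if not line or line.startswith(("#", "@")):
            continue
        fields = line.split()
        if len(fields) < 6 or "=" in fields[0]:
            continue
        if hour_matches(fields[1], night_hours):
            jobs.append(line)
    return jobs


# Hypothetical entries, only to show the idea; paste in the contents of
# your own /etc/crontab or /etc/cron.d/* files instead.
SAMPLE_CRONTAB = """\
SHELL=/bin/bash
# minute hour day-of-month month day-of-week user command
30 2 * * * root /usr/local/bin/nightly-backup.sh
0 14 * * 1-5 root /usr/local/bin/report.sh
*/10 * * * * root /usr/local/bin/collect.sh
15 22-23 * * * root /usr/local/bin/rotate.sh
"""

for job in night_jobs(SAMPLE_CRONTAB):
    print(job)
```

With the sample above, the backup job (hour 2) and the every-10-minutes job (hour `*`) are listed, while the afternoon and late-evening jobs are not. Anything that shows up in your real list is a candidate to line up against the times of the hangs.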

You may want to consider installing/using sysstat to collect system statistics and see if you can identify any spikes in activity during the timeframe(s) this problem happens.

sysstat installs a cron job at /etc/cron.d/sysstat that collects data every 10 minutes by default (may vary by version) and creates a text report nightly (around 4am), which can be found in /var/log/sa/ (the sar* files).
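Once you have the numbers, spotting a spike can be as simple as comparing each sample against a typical value. Below is a minimal Python sketch of that idea, assuming you have already pulled (time, value) pairs for one metric out of the sar reports by hand or with your own parser (not shown here). The function name, the factor of 5, and the sample values are made up for illustration and are not real measurements.

```python
from statistics import median


def find_spikes(samples, factor=5.0):
    """Return the (time, value) samples whose value exceeds factor * median.

    The median is used as the baseline so that a few large spikes
    do not drag the baseline up with them.
    """
    values = [value for _, value in samples]
    if not values:
        return []
    threshold = median(values) * factor
    return [(time, value) for time, value in samples if value > threshold]


# Made-up example values (say, a percentage for one metric), NOT real data.
SAMPLES = [
    ("00:00", 1.2),
    ("00:10", 1.0),
    ("00:20", 1.4),
    ("02:10", 38.5),
    ("02:20", 41.0),
    ("02:30", 1.1),
]

for time, value in find_spikes(SAMPLES):
    print(time, value)
```

If the flagged times cluster shortly before the hangs, that points toward a scheduled job rather than the hardware; if nothing stands out, the hardware theory gets more plausible.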

bdb4269 03-07-2008 01:24 PM

I hate being that person who posts a question and disappears...

I greatly appreciate the input and ideas, and I will be back to try things out and let the community know what I find -- but things just got really busy at work, and maintaining the servers is considered my "side" work (except when something is noticeably interfering with production). This problem has been rare and has never happened during a shift, so even though I would personally like to get to the bottom of it, I'll have to wait until I have some "free time" at work to continue looking into it.


All times are GMT -5.