Server load gets really high...
So I've done some reading about how to interpret the stats that the top command gives you, and I am fairly confident that my problem is an I/O problem, because the wa value is generally in the 90%+ range when my server load goes through the roof.
So then I used vmstat and ifconfig to see if it was a disk problem and/or a network problem, but I'm not sure what is considered "high values" when I am looking at this data.
vmstat Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
ifconfig Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04 |
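On the question of what counts as "high" in vmstat: the two columns usually worth watching for an I/O problem are `b` (processes blocked in uninterruptible sleep) and `wa` (percentage of CPU time spent waiting on I/O). Below is a minimal sketch that averages both over a run of samples. It assumes the usual procps vmstat layout, where a header row names the columns; the function name `vmstat_io_summary` is made up for illustration. The first data row is skipped because vmstat reports averages since boot there.

```shell
#!/bin/sh
# vmstat_io_summary: read vmstat output on stdin and print the average of the
# "wa" and "b" columns. Columns are located by name from the header row, so
# the sketch does not depend on a fixed column count.
vmstat_io_summary() {
    awk '
        # Header row with the column names: remember where each column is.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Data rows start with a number. The first one is the since-boot
        # average, so it is left out of the sums.
        ("wa" in col) && $1 ~ /^[0-9]+$/ {
            if (++seen == 1) next
            wa += $(col["wa"]); blocked += $(col["b"]); n++
        }
        END {
            if (n) printf "samples=%d avg_wa=%.1f avg_blocked=%.1f\n", n, wa / n, blocked / n
            else   print "no samples"
        }'
}

# Typical use (12 samples, 5 seconds apart):
#   vmstat 5 12 | vmstat_io_summary
```

As a rough rule of thumb only: if `avg_wa` stays in the double digits and `avg_blocked` is consistently above zero, processes are likely queuing on disk rather than on CPU.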
Well this didn't take very long.
top Code:
top - 15:16:55 up 27 days, 13:08, 2 users, load average: 24.93, 16.97, 9.20
vmstat Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
ifconfig Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
I also just realized that yum is doing updates. Could that cause a large server load? I'd like to turn that off if so. |
At the time of the second measurement the load average was 24.93, yet no application was apparently maxing out RAM or CPU. With 1GB of swap in use and a 97.7% wait state, you have to search for the bottleneck in a different way. Rebooting the machine returns the system to a "known good" state; after that, running 'atop' and storing its data continuously over a longer period could help you trace back peaks and narrow them down to processes more easily. (Also see 'dstat', 'collectl', 'atsar', SAR.) It would also be interesting to know more about the HW and SW (mainly services) specs, any anomalies in system or daemon logs, and whether this behaviour started at some point (SW installation? updates? configuration changes?).
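If none of those tools are installed yet, the idea of storing data continuously can be approximated with a few lines of shell. This is only a stand-in sketch, not a replacement for atop or sar, which record far more. It assumes a Linux /proc/loadavg; the function name `log_load_sample` and the log path in the example loop are hypothetical.

```shell
#!/bin/sh
# log_load_sample LOGFILE [SOURCE]
# Append one timestamped line with the 1, 5 and 15 minute load averages.
# SOURCE defaults to /proc/loadavg; it is a parameter only so the function
# can be tried against a sample file.
log_load_sample() {
    logfile=$1
    source_file=${2:-/proc/loadavg}
    # The first three fields of /proc/loadavg are the load averages.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(cut -d' ' -f1-3 "$source_file")" >> "$logfile"
}

# Example loop, one sample per minute (hypothetical log path):
#   while true; do log_load_sample /var/log/loadavg.log; sleep 60; done
```

With a log like that, peaks can at least be matched against cron jobs and daemon logs by timestamp.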
|
See those status "D" tasks? They are all counted in loadavg.
And they are probably all waiting on disk I/O. Looks like you have an under- or badly configured disk farm. Either get some more devices or manage the things that are going to exacerbate the situation. Don't run a yum update at the same time as updatedb, say ... |
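To see which tasks are sitting in state "D" at a given moment, ps can list them directly. A minimal sketch, assuming the procps ps that ships with Linux; the function name `list_d_state` is made up, and the filter expects the state to be the first column of its input.

```shell
#!/bin/sh
# list_d_state: read "ps" output on stdin (state in the first column, one
# header line) and print only the tasks in uninterruptible sleep, followed
# by a count.
list_d_state() {
    awk 'NR == 1 { next }                # skip the ps header line
         $1 ~ /^D/ { print; n++ }        # state D = uninterruptible sleep
         END { print "Total status D: " (n + 0) }'
}

# Typical use; wchan shows the kernel function the task is sleeping in:
#   ps -eo state,pid,wchan:20,comm | list_d_state
```

If the same few commands keep showing up in that list while the load is high, they are the ones to look at first.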
Well I attempted to reboot the server, but it's having a difficult time coming back on. When it did finally come back on, it took forever for me to log in. Once I did log in, the server load was already at 0.54, 2.21, 1.35, so something is definitely wrong here. Then the server suddenly went down again for a reboot (I'm thinking it did this because after a few minutes of the server not coming back on, I went to my data center's control panel and initiated a reboot from it, so I think it was just delaying the message), so now I am waiting on it to come back online again.
|
Server came back online and the server load is still high.
Code:
top - 00:46:18 up 9 min, 2 users, load average: 2.49, 3.34, 1.59
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
I do have more than one disk on the server, a 500GB primary and 250GB secondary. |
My hardware is:
Intel Core2Duo E6750 DC
1GB DDR2 667
250GB SATA HDD
500GB SATA HDD
My software is:
CENTOS 5.3
cPanel 11.24.5-R38506 - WHM 11.24.2 - X 3.9
Along with those, I also have two Unreal Tournament 10-person servers hosted on the server (hardly ever have any players) and a TeamSpeak 3 server (hasn't seen activity at all this month). |
Try this from a terminal and post the (full) output
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}' |
Code:
root@server2 [~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}' |
Run it sometime when the load numbers, particularly the short-term one, are on the high side.
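Catching the load at the right moment by hand is hit and miss, so the check can be scripted. A minimal sketch, assuming a Linux /proc/loadavg; the function name `load_exceeds`, the threshold of 5 and the output path in the example are all hypothetical.

```shell
#!/bin/sh
# load_exceeds THRESHOLD [SOURCE]
# Succeed (exit status 0) when the 1-minute load average is above THRESHOLD.
# SOURCE defaults to /proc/loadavg.
load_exceeds() {
    threshold=$1
    source_file=${2:-/proc/loadavg}
    # Field 1 is the 1-minute average. An empty or unreadable file counts
    # as "not exceeded".
    awk -v t="$threshold" '
        NR == 1 { hit = ($1 + 0 > t + 0) }
        END     { exit !hit }' "$source_file"
}

# Example: take a top snapshot whenever the short-term load is above 5
# (hypothetical output path), checking every 30 seconds:
#   while true; do
#       if load_exceeds 5; then top -b -n 1 > "/root/top-$(date +%s).txt"; fi
#       sleep 30
#   done
```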
|
Yep - merely (circumstantial) evidence; but might help.
|
Something I have just noticed: when I log into the server through SSH/PuTTY, it takes FOREVER. The "Login as:" text pops up instantly, I enter my username, and the password prompt appears immediately, but when I enter my password it takes a really, really long time before it goes through. At least a minute to a minute and a half.
Usually it logs in faster than I can type. |
Quote:
- Are the two UT servers and the TS3 server the only publicly accessible services running? If not, what other services mainly run?
- Is cPanel (and maybe related paths on the server like /phpmyadmin?) only accessible from your management IP or IP range?
- Do the system or daemon logs show any "odd" lines involving 'links', 'wget' or any network tools?
- Are there by any chance oddly named files in your /tmp, /var/tmp or Apache docroot?
- Did this load problem start right from using the server or at some point? If the latter, can you trace back what happened at that point in terms of HW changes, SW installation or updates, reconfiguration? |
Quote:
No, the other service is an FTP server, the one that runs for cPanel. It also has a "public login" that is posted on one of my sites for people to upload specific files to. I monitor it daily: logs are emailed to me showing the people who log in to it and what they do. It doesn't really get that much traffic.
Those things are only accessible through cPanel. You have to log in to get to them.
What logs can I look at for those messages? I use wget often to copy things to my server that are otherwise too large for me to download and then FTP.
Files in my /tmp: a bunch of files that look similar to sess_381b2d464edc56d83b9026b9fa50d0dc, then .ICE-unix/ lost+found/ mysql.sock@ spamd-9952-init/. It looks like the same files are in /var/tmp. Not sure where the Apache docroot is?
No, the problem seems to happen every once in a while, though it has seemed to become a bit more frequent. When I first got the server, I never noticed it. Then sometimes I'd notice the server load get really high, but then it would go away. I always assumed it was the Unreal Tournament servers (I had 5 running at one point plus a BF2 Demo server), but when I shut them down, the load didn't go away. I am really, really thinking it might have something to do with Apache though. Not sure if it's a coincidence or not, but it seems that when the load is high and I shut down the httpd service, the load goes back down. This doesn't explain why the server load is really high upon boot though. |
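On the question of which logs to look at: one way to start is to grep the web server and system logs for the tool names mentioned above. A minimal sketch; the function name `scan_for_fetch_tools` is made up, `lynx` and `curl` are added here as further examples of "network tools", and the log paths in the usage comment are assumptions (a typical cPanel Apache log directory and the standard syslog file) that may differ on a given box. Since wget is also used legitimately on this server, every match still has to be read by hand.

```shell
#!/bin/sh
# scan_for_fetch_tools FILE...
# Print every line that mentions a download tool as a whole word, prefixed
# with the file name (-H) and line number (-n).
scan_for_fetch_tools() {
    grep -HnwE 'wget|links|lynx|curl' -- "$@"
}

# Typical use (paths are assumptions, adjust to the actual log locations):
#   scan_for_fetch_tools /usr/local/apache/logs/error_log /var/log/messages
```

Lines where one of these tools shows up inside a web request or in the Apache error log are the ones worth a closer look, because that is not where an admin's own downloads would normally appear.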