03-11-2010, 02:18 PM
|
#1
|
Member
Registered: Sep 2007
Posts: 252
|
Server load gets really high...
So I've done some reading about how to interpret the stats that the top command gives you, and I am fairly confident that my problem is an I/O problem: the wa value is generally in the 90%+ range when my server load goes through the roof.
So then I used vmstat and ifconfig to see if it was a disk problem and/or a network problem, but I'm not sure what counts as "high values" when I am looking at this data.
vmstat
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd   free  buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  1 1034092  20608  4536  94468    5    3   214    53    8     7  5  1 92  3  0
I am pretty sure the bi and bo values are the ones I need to be interested in. Granted, this output wasn't captured during the high server load, so I am going to use it as a baseline for now. But what would be considered high? If it was twice as high as this, is that a problem?
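A note on reading these numbers: when vmstat is run without an interval, the single row it prints holds averages since boot, so bi and bo (blocks per second read from and written to block devices) say little about the moment the load spikes. What counts as "high" depends on the disks, so a comparison between quiet and busy periods is more useful than an absolute figure. The following is a minimal sketch, assuming the procps version of vmstat; the helper name skip_boot_average is made up for illustration. It drops the since-boot row and keeps only the interval samples.

```shell
#!/bin/sh
# skip_boot_average (hypothetical helper): reads `vmstat -n INTERVAL COUNT`
# output on stdin and drops the first data row, which holds averages since
# boot rather than current activity.
skip_boot_average() {
    awk 'NR <= 2 { print; next }   # keep the two header lines
         NR == 3 { next }          # drop the since-boot averages
         { print }                 # keep every interval sample'
}

# Usage (not run here): five-second samples, header printed once (-n)
#   vmstat -n 5 4 | skip_boot_average
```

Running this once while the server is quiet and once while the load is high gives two sets of bi, bo and wa values that can be compared directly.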
ifconfig
Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.170.212 Bcast:64.34.170.255 Mask:255.255.255.192
inet6 addr: fe80::230:48ff:feb8:e504/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:516255425 errors:2 dropped:18 overruns:0 frame:2
TX packets:802790881 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:2195224972 (2.0 GiB) TX bytes:2843031510 (2.6 GiB)
Memory:d0200000-d0220000
eth0:1 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.214.184 Bcast:64.34.214.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Memory:d0200000-d0220000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:3467295 errors:0 dropped:0 overruns:0 frame:0
TX packets:3467295 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1000808432 (954.4 MiB) TX bytes:1000808432 (954.4 MiB)
Now this is a little more complicated, but I think I am looking for the RX packets and TX packets, which are currently 516255425 and 802790881 respectively. Just looking at those numbers, one would assume that they are extremely high. However, my server load at the time of this output was only around 0.70 with a wa of 20%.
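A note on the ifconfig counters: RX packets and TX packets are running totals since the interface came up, so they look large on any machine that has been up for weeks. The rate between two readings is what indicates network load. The sketch below uses a made-up helper name, counter_rate, and compares the RX packet count above with the reading posted later in this thread; the one-hour gap between the two readings is an assumption based on the post times.

```shell
#!/bin/sh
# counter_rate (hypothetical helper): average per-second rate between two
# readings of a cumulative counter taken SECONDS apart (integer division).
counter_rate() {
    old=$1 new=$2 seconds=$3
    echo $(( (new - old) / seconds ))
}

# RX packets from the output above and from the later reading in this thread;
# the 3600-second gap is an assumption based on the post times.
counter_rate 516255425 517130158 3600   # prints 242
```

On a live Linux system the same counters can be read from /sys/class/net/eth0/statistics/rx_packets and tx_packets, which avoids parsing ifconfig output.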
|
|
|
03-11-2010, 03:21 PM
|
#2
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Well this didn't take very long.
top
Code:
top - 15:16:55 up 27 days, 13:08, 2 users, load average: 24.93, 16.97, 9.20
Tasks: 195 total, 1 running, 189 sleeping, 0 stopped, 5 zombie
Cpu(s): 1.2%us, 0.5%sy, 0.0%ni, 0.0%id, 97.7%wa, 0.2%hi, 0.5%si, 0.0%st
Mem: 1033652k total, 1021336k used, 12316k free, 5528k buffers
Swap: 2096472k total, 1160388k used, 936084k free, 90732k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28171 nobody 15 0 25388 8136 2076 S 1.0 0.8 0:00.17 httpd
2528 mysql 15 0 164m 16m 2668 S 0.7 1.7 200:34.34 mysqld
26191 nobody 16 0 31876 7544 1528 D 0.3 0.7 0:07.69 spamd
28166 root 34 19 23364 10m 4616 D 0.3 1.1 0:00.17 yum-updatesd-he
28260 nobody 16 0 25204 7960 2168 D 0.3 0.8 0:00.07 httpd
28265 nobody 15 0 24316 6940 2168 S 0.3 0.7 0:00.08 httpd
1 root 15 0 2064 348 316 S 0.0 0.0 0:01.79 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.73 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.62 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:04.93 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:04.31 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0.0 0.0 0:00.01 events/0
9 root 10 -5 0 0 0 S 0.0 0.0 0:00.04 events/1
10 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
11 root 17 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
15 root 10 -5 0 0 0 S 0.0 0.0 0:08.16 kblockd/0
16 root 10 -5 0 0 0 S 0.0 0.0 0:01.03 kblockd/1
17 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
137 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
138 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/1
141 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
143 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
213 root 10 -5 0 0 0 D 0.0 0.0 5:29.13 kswapd0
214 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
215 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 aio/1
374 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
403 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
404 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 ata/1
405 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
409 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
410 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
411 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_2
412 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_3
419 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 kstriped
432 root 10 -5 0 0 0 D 0.0 0.0 3:00.81 kjournald
458 root 10 -5 0 0 0 S 0.0 0.0 0:00.09 kauditd
490 root 14 -4 2252 252 248 S 0.0 0.0 0:00.05 udevd
1236 root 19 0 7428 740 628 S 0.0 0.1 0:00.62 authProg
vmstat
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd   free  buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1 20 1159932  14208  5284  96364    5    3   214    54    0     8  5  1 91  3  0
ifconfig
Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.170.212 Bcast:64.34.170.255 Mask:255.255.255.192
inet6 addr: fe80::230:48ff:feb8:e504/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:517130158 errors:2 dropped:18 overruns:0 frame:2
TX packets:804363987 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:2256818114 (2.1 GiB) TX bytes:859854741 (820.0 MiB)
Memory:d0200000-d0220000
eth0:1 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.214.184 Bcast:64.34.214.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Memory:d0200000-d0220000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:3471387 errors:0 dropped:0 overruns:0 frame:0
TX packets:3471387 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1001311451 (954.9 MiB) TX bytes:1001311451 (954.9 MiB)
I don't see a problem, can someone point it out to me?
I also just realized that yum is doing updates. Could that cause a large server load? I'd like to turn that off if so.
Last edited by Skillz; 03-11-2010 at 03:24 PM.
|
|
|
03-11-2010, 04:33 PM
|
#3
|
Moderator
Registered: May 2001
Posts: 29,415
|
At the time of the second measurement the load average was 24.93, yet no application was apparently maxing out RAM or CPU. With 1GB of swap in use and a 97.7% wait state, you have to search for the bottleneck in a different way. Rebooting the machine returns the system to a "known good" state; running 'atop' after that, storing data continuously over a longer period, could help to trace back peaks and narrow them down to processes more easily. (Also see 'dstat', 'collectl', 'atsar', SAR.) It would also be interesting to know more HW and SW (mainly services) specs, any anomalies in system or daemon logs, and whether this behaviour started at some point (SW installation? updates? configuration changes?).
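Until one of those tools is installed, the idea of recording data continuously can be approximated with the shell alone. This is only a sketch: the helper name load_above, the threshold of 5 and the log path are arbitrary assumptions, not something the tools above provide.

```shell
#!/bin/sh
# load_above (hypothetical helper): succeeds when the 1-minute load average,
# the first field of a /proc/loadavg line, is greater than LIMIT.
load_above() {
    limit=$1 line=$2
    awk -v limit="$limit" -v line="$line" \
        'BEGIN { split(line, f, " "); exit !((f[1] + 0) > (limit + 0)) }'
}

# Usage (not run here): once a minute, save a process snapshot while the
# load is above 5. The threshold and log path are arbitrary choices.
#   while sleep 60; do
#       if load_above 5 "$(cat /proc/loadavg)"; then
#           { date; ps -eo state,pid,user,pcpu,pmem,comm; } >> /root/load-snapshots.log
#       fi
#   done
```

The snapshots can then be read back after a peak to see which processes were present and in which state.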
|
|
|
03-11-2010, 05:16 PM
|
#4
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,365
|
See those status "D" tasks? They are all counted in loadavg.
And they are probably all waiting on disk I/O. Looks like you have an under- or badly-configured disk farm. Either get some more devices or manage the things that are going to exacerbate the situation. Don't run a yum update against updatedb, say ...
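On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which is how a load of 24.93 can coexist with about 1% CPU use. The b column of the vmstat output in post #2 (20) appears to reflect the same blocked tasks. A minimal sketch for listing them with ps follows; the helper name only_d_state is made up, and the wchan column (the kernel function a task is sleeping in) may show only a dash on some systems.

```shell
#!/bin/sh
# only_d_state (hypothetical helper): reads `ps -eo state,pid,wchan,comm`
# output on stdin, keeps the header line and every process in state D.
only_d_state() {
    awk 'NR == 1 || $1 == "D"'
}

# Usage (not run here):
#   ps -eo state,pid,wchan,comm | only_d_state
```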
|
|
|
03-12-2010, 12:38 AM
|
#5
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Well, I attempted to reboot the server, but it's having a difficult time coming back up. When it did finally come back, it took forever for me to log in. Once I did log in, the server load was already at 0.54, 2.21, 1.35, so something is definitely wrong here. Then the server suddenly went down again for a reboot (I think this happened because, after a few minutes of the server not coming back, I went to my data center's control panel and initiated a reboot from it, and the request was simply delayed), so now I am waiting on it to come back online again.
|
|
|
03-12-2010, 12:49 AM
|
#6
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Server came back online and the server load is still high.
Code:
top - 00:46:18 up 9 min, 2 users, load average: 2.49, 3.34, 1.59
Tasks: 145 total, 1 running, 139 sleeping, 0 stopped, 5 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 1033652k total, 842604k used, 191048k free, 27480k buffers
Swap: 2096472k total, 0k used, 2096472k free, 477972k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4440 root 15 0 2196 1060 808 R 0.3 0.1 0:00.01 top
1 root 15 0 2064 640 548 S 0.0 0.1 0:00.41 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
9 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/1
10 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
11 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
15 root 10 -5 0 0 0 S 0.0 0.0 0:00.02 kblockd/0
16 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/1
17 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
137 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
138 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/1
141 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
143 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
211 root 18 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
212 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
213 root 10 -5 0 0 0 S 0.0 0.0 0:01.02 kswapd0
214 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
215 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 aio/1
373 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
403 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
404 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 ata/1
405 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
409 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
410 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
411 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_2
412 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_3
419 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 kstriped
432 root 10 -5 0 0 0 S 0.0 0.0 0:00.16 kjournald
458 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kauditd
490 root 18 -4 2252 672 404 S 0.0 0.1 0:00.05 udevd
1483 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kmpathd/0
1484 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kmpathd/1
1485 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kmpath_handlerd
1583 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kjournald
1795 root 0 -20 0 0 0 S 0.0 0.0 0:00.01 loop0
1796 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kjournald
2073 root 15 -4 12516 768 576 S 0.0 0.1 0:00.00 auditd
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd   free  buff  cache   si   so    bi    bo   in    cs us sy id wa st
 0  0       0 191692 27548 477972    0    0  4054    63  692   376  8  2 50 40  0
Code:
eth0 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.170.212 Bcast:64.34.170.255 Mask:255.255.255.192
inet6 addr: fe80::230:48ff:feb8:e504/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7945 errors:0 dropped:3867 overruns:0 frame:0
TX packets:7360 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:4485177 (4.2 MiB) TX bytes:4272684 (4.0 MiB)
Memory:d0200000-d0220000
eth0:1 Link encap:Ethernet HWaddr 00:30:48:B8:E5:04
inet addr:64.34.214.184 Bcast:64.34.214.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Memory:d0200000-d0220000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:216 errors:0 dropped:0 overruns:0 frame:0
TX packets:216 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:35596 (34.7 KiB) TX bytes:35596 (34.7 KiB)
Maybe I don't understand, but the commands atop, dstat, collectl, atsar did not work.
I do have more than one disk on the server, a 500GB primary and 250GB secondary.
|
|
|
03-12-2010, 01:10 AM
|
#7
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
My hardware is:
Intel Core2Duo E6750 DC
1GB DDR2 667
250GB SATA HDD
500GB SATA HDD
My software is:
CENTOS 5.3
cPanel 11.24.5-R38506 - WHM 11.24.2 - X 3.9
Along with those, I also have two 10-person Unreal Tournament servers hosted on the server (they hardly ever have any players) and a TeamSpeak 3 server (it hasn't seen any activity at all this month).
|
|
|
03-12-2010, 01:14 AM
|
#8
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,365
|
Try this from a terminal and post the (full) output
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
|
|
|
03-12-2010, 01:17 AM
|
#9
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Code:
root@server2 [~]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 01:16:03 up 39 min, 2 users, load average: 0.11, 0.15, 0.35
Tasks: 161 total, 1 running, 155 sleeping, 0 stopped, 5 zombie
Cpu(s): 3.0%us, 0.6%sy, 0.0%ni, 84.7%id, 11.5%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 1033652k total, 990812k used, 42840k free, 19832k buffers
Swap: 2096472k total, 0k used, 2096472k free, 571940k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Total status D:
Will make a note that the server load right now is only 0.19, 0.17, 0.34
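The blank after "Total status D:" in the output above means no task was in state D at that moment: awk prints a variable that was never assigned as an empty string. A small variant, sketched under the same assumption as the original one-liner (seven header lines, state in the eighth column of top's batch output) and with a made-up helper name, prints a numeric 0 instead.

```shell
#!/bin/sh
# count_d_state (hypothetical helper): reads `top -b -n 1` output on stdin,
# prints the seven header lines, each task in state D, and a numeric total.
count_d_state() {
    awk 'NR <= 7 { print; next }
         $8 == "D" { print; count++ }
         END { print "Total status D: " (count + 0) }'
}

# Usage (not run here):
#   top -b -n 1 | count_d_state
```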
|
|
|
03-12-2010, 01:20 AM
|
#10
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,365
|
Run it sometime when the numbers - particularly the short one - are up-ish.
|
|
|
03-12-2010, 01:22 AM
|
#11
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Quote:
Originally Posted by syg00
Run it sometime when the numbers - particularly the short one - are up-ish.
|
I will do this. I'm thinking this script is just checking how many processes are in the "D" state, since you mentioned that processes in the "D" state are all counted in the system load. Right?
|
|
|
03-12-2010, 01:27 AM
|
#12
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,365
|
Yep - merely (circumstantial) evidence; but might help.
|
|
|
03-12-2010, 02:14 AM
|
#13
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Something I have just noticed: when I log into the server through SSH/PuTTY, it takes FOREVER. The "Login as:" text pops up instantly, I enter my username, and the password prompt appears immediately, but when I enter my password it takes a really, really long time before it goes through. At least a minute to a minute and a half.
Usually it logs in faster than I can type.
|
|
|
03-12-2010, 11:06 AM
|
#14
|
Moderator
Registered: May 2001
Posts: 29,415
|
Quote:
Originally Posted by Skillz
the commands atop, dstat, collectl, atsar did not work.
|
That's because you have to install them before you can use them. They should be in the default CentOS repo, or else in RPMForge or EPEL.
- Are the two UT servers and the TS3 server the only publicly accessible services running? If not, what other services mainly run?
- Is cPanel (and maybe related paths on the server like /phpmyadmin?) only accessible from your management IP or IP range?
- Do the system or daemon logs show any "odd" lines involving 'links', 'wget' or any network tools?
- Are there by any chance oddly named files in your /tmp, /var/tmp or Apache docroot?
- Did this load problem start right from when you began using the server, or at some later point? If the latter, can you trace back what happened at that point in terms of HW changes, SW installation or updates, or reconfiguration?
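For the question about oddly named files, one way to cut down the noise is to leave out PHP session files, which are named sess_* and are normal on a web server. This is a rough sketch with a made-up helper name; it only lists candidates and makes no judgement about whether a file is malicious.

```shell
#!/bin/sh
# list_non_session_files (hypothetical helper): prints regular files under
# the given directories, skipping PHP session files named sess_*.
list_non_session_files() {
    find "$@" -type f ! -name 'sess_*' 2>/dev/null
}

# Usage (not run here); the directories come from the question above:
#   list_non_session_files /tmp /var/tmp
```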
|
|
|
03-13-2010, 01:08 AM
|
#15
|
Member
Registered: Sep 2007
Posts: 252
Original Poster
|
Quote:
Originally Posted by unSpawn
That's because you have to install them before you can use them. They should be in the default CentOS repo, or else in RPMForge or EPEL.
- Are the two UT servers and the TS3 server the only publicly accessible services running? If not, what other services mainly run?
- Is cPanel (and maybe related paths on the server like /phpmyadmin?) only accessible from your management IP or IP range?
- Do the system or daemon logs show any "odd" lines involving 'links', 'wget' or any network tools?
- Are there by any chance oddly named files in your /tmp, /var/tmp or Apache docroot?
- Did this load problem start right from when you began using the server, or at some later point? If the latter, can you trace back what happened at that point in terms of HW changes, SW installation or updates, or reconfiguration?
|
Yea, I realized that after I posted. I went Googling. Still not 100% sure how to install them. I tried yum install atop but it didn't work.
No, the other service is an FTP server, the one that runs for cPanel. It also has a "public login" that is posted on one of my sites for people to upload specific files to. I monitor it daily: logs are emailed to me showing who logged in and what they did. It doesn't really get that much traffic.
Those things are only accessible through cpanel. You have to login to get to them.
What logs can I look at for those messages? I ask because I use wget often to copy things to my server that are otherwise too large for me to download and then FTP.
Files in my /tmp:
A bunch of files that look similar to this: sess_381b2d464edc56d83b9026b9fa50d0dc, then
.ICE-unix/
lost+found/
mysql.sock@
spamd-9952-init/
Looks like the same files in /var/tmp
Not sure where the apache doc root is?
No, the problem seems to happen every once in a while though it has seemed to become a bit more frequent. When I first got the server, I never noticed it. Then sometimes I'd notice the server load get really high, but then it would go away. I always assumed it was the Unreal Tournament servers (I had 5 running at one point plus a BF2 Demo server) but when I shut them down, the load didn't go away.
I am really, really thinking it might have something to do with Apache though. Not sure if it's a coincidence or not, but it seems that when the load is high and I shut down the httpd service the load goes back down. This doesn't explain why the server load is really high upon boot though.
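One way to test the Apache suspicion is to measure how much memory the httpd processes hold, since the first top output shows 1GB of RAM with more than 1GB of swap in use and kswapd0 in state D. The sketch below uses a made-up helper name and sums the RES values of the three httpd processes visible in that output. RSS counts shared pages once per process, so the total overstates real usage, and that top listing was truncated, so more httpd processes may have existed.

```shell
#!/bin/sh
# sum_rss_mb (hypothetical helper): reads resident set sizes in KiB, one per
# line, and prints their total in MiB with one decimal place.
sum_rss_mb() {
    awk '{ total += $1 } END { printf "%.1f\n", total / 1024 }'
}

# The RES values of the three httpd processes in the first top output:
printf '8136\n7960\n6940\n' | sum_rss_mb   # prints 22.5

# Usage on a live system (not run here):
#   ps -C httpd -o rss= | sum_rss_mb
```

Comparing this figure during a quiet period and during a load peak would show whether Apache grows enough to push the machine into swap.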
|
|
|
All times are GMT -5.