12-10-2010, 11:06 AM
|
#1
|
Member
Registered: Oct 2005
Posts: 542
Rep:
|
top output
I'm not sure how to read this. Given the following "top" output...
1) Is it correct to say CPU0 is occupied 8.3% of the time, and CPU1 is occupied 72.4% of the time?
2) How should I make sense of the 99.9% CPU for PID 25100? 72.4% + 8.3% is only 80.7%, which is less than 99.9%, and that is before adding the CPU percentages of the other processes.
3) I would expect the processes to be evenly distributed between the two CPUs. Is there any reason why one CPU is so heavily loaded and the other is not?
Anyway, after I restarted Apache, the runaway 99.9% process was gone.
Code:
top - 09:09:29 up 121 days, 12:04, 1 user, load average: 0.19, 0.77, 0.93
Tasks: 122 total, 2 running, 120 sleeping, 0 stopped, 0 zombie
Cpu0 : 8.3%us, 0.7%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.7%hi, 1.3%si, 0.0%st
Cpu1 : 72.4%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2059516k total, 1782924k used, 276592k free, 152132k buffers
Swap: 4095992k total, 80k used, 4095912k free, 944040k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28405 daemon 15 0 108m 11m 2848 S 99.9 0.6 0:00.45 httpd
28359 daemon 15 0 112m 14m 2960 S 7.0 0.7 0:00.89 httpd
28356 daemon 16 0 114m 16m 2872 S 5.3 0.8 0:01.12 httpd
28347 daemon 15 0 109m 12m 2952 S 4.3 0.6 0:00.94 httpd
28407 daemon 15 0 105m 8652 2776 S 1.0 0.4 0:00.24 httpd
Last edited by grob115; 12-10-2010 at 11:10 AM.
|
|
|
12-10-2010, 03:21 PM
|
#2
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,384
|
There are lies, damn lies ... and statistics. Good luck with that last one.
All the CPU usage numbers (not just top) are based on sampling. That's problem number one. Top uses numbers from "files" in /proc, which are read sequentially - problem number two; they are not all done at the same time.
And it uses different numbers for the summary at the top.
So, given all that:
- you are better off using %us+%sy+%ni to get a representative number
- the per-process numbers are not normalised; each is a percentage of one CPU/core, so it's possible to see several hundred here
- processes (threads) tend to get re-dispatched on the same CPU/core (the Linux scheduler does this deliberately), so an errant process can drive one CPU/core to the limit quite easily.
Your numbers for CPU1 are way out of whack (all the fields in the summary line should add to around 100) - makes any analysis effectively meaningless.
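A quick way to apply the "fields should add to around 100" check is to total one summary line. The sketch below is only illustrative: the helper name sum_cpu_fields is made up here, and it assumes the "Cpu0 : 8.3%us, ..." layout shown in this thread (other versions of top may format the line differently).

```python
import re

def sum_cpu_fields(line):
    """Total every 'NN.N%xx' field on one top summary line (hypothetical helper)."""
    # Assumes the "8.3%us, 0.7%sy, ..." layout from the output posted above.
    values = re.findall(r"(\d+(?:\.\d+)?)%[a-z]{2}", line)
    return round(sum(float(v) for v in values), 1)

cpu0 = "Cpu0 : 8.3%us, 0.7%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.7%hi, 1.3%si, 0.0%st"
cpu1 = "Cpu1 : 72.4%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"

for line in (cpu0, cpu1):
    total = sum_cpu_fields(line)
    # A total far from 100 suggests the line is not a genuine single sample.
    verdict = "ok" if abs(total - 100) <= 1 else "suspect"
    print(line.split(":")[0].strip(), total, verdict)
```

On the two lines from the first post this gives 85.8 for Cpu0 and 171.4 for Cpu1, so both get flagged as suspect.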
<Edit:> Even CPU0 is outside what I'd call reasonable numbers. Something else going on here </Edit:>
Last edited by syg00; 12-10-2010 at 03:54 PM.
|
|
|
12-11-2010, 12:12 AM
|
#3
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
There's no entitlement to be called a Guru unless you are indeed one. You are most probably correct in that the following two lines are out of whack.
Code:
Cpu0 : 8.3%us, 0.7%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.7%hi, 1.3%si, 0.0%st
Cpu1 : 72.4%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
I had typed the message and restarted the Apache service before I copied the output, so I copied the "top" output afterwards and changed the us percentages just to make the point.
The explanation is insightful. Questions:
1) Why would the scheduler deliberately assign processes to one CPU? Wouldn't doing so cause more interrupts for higher priority processes, and more queued time for lower priority processes?
2) Can the CPU % on a process line, below the summary section of the "top" output, go above 100%? I did see it go up to 100.2% but thought it was a rounding error of some sort. Can you explain how this happens?
Thanks!
|
|
|
12-11-2010, 01:13 AM
|
#4
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,384
|
So the "Something else going on here" I suspected was actually you fudging the numbers and not telling us?
You'll not find many willing to help if you continue that - analysis of the paltry metrics within Linux is difficult enough without being lied to.
1) You may have misinterpreted my (somewhat misleading) comment - the scheduler attempts to re-dispatch each process on the processor it was last on, primarily for cache/TLB performance. I wasn't implying every process would re-dispatch on the one (same) processor.
2) You are correct about the rounding error case. Depending on (top) options, multi-threaded apps may display as a single line - all the CPU consumption is attributed to the [grand]father task, and not normalised.
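To make "not normalised" concrete, here is a minimal sketch of the arithmetic, assuming %CPU is derived from the CPU ticks a task accumulated between two refreshes divided by the elapsed wall time. The function name, the tick rate of 100 per second, and the sample numbers are assumptions for illustration, not taken from top's source.

```python
def percent_cpu(ticks_before, ticks_after, elapsed_seconds, hz=100):
    """CPU time used between two readings, as a percentage of ONE CPU.

    hz is the assumed number of clock ticks per second (often 100).
    """
    cpu_seconds = (ticks_after - ticks_before) / hz
    return round(100.0 * cpu_seconds / elapsed_seconds, 1)

# One thread spinning for a whole 3-second interval.
print(percent_cpu(1000, 1300, 3.0))

# Same ticks, but the interval was measured very slightly short,
# so the figure creeps just over 100.
print(percent_cpu(1000, 1300, 2.994))

# Two busy threads reported on a single line: their ticks are summed.
print(percent_cpu(1000, 1600, 3.0))
```

The three calls print 100.0, 100.2 and 200.0. Nothing is divided by the number of CPUs, which is why a single line can legitimately exceed 100.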
|
|
|
12-11-2010, 09:18 AM
|
#5
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
Apologies for the numbers. I wasn't trying to lie, but the situation was gone and I tried my best to make the top output look the way it had. Sorry, I didn't know it would cause inconsistencies.
Anyway, take the case where an individual process's CPU % is listed at 100%. What exactly does this mean? If the CPUs are not both maxed out at 100%us, how can a process be at 100%?
The same thing just happened again, and this time it is the actual captured top output. Is there a way to make the scheduler not keep assigning the same PID to the same CPU? I'm not sure why, but one of Apache's PIDs is consistently up at 100%.
Code:
top - 08:11:06 up 122 days, 11:05, 1 user, load average: 1.05, 1.04, 0.97
Tasks: 135 total, 3 running, 132 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.7%us, 0.7%sy, 0.0%ni, 90.7%id, 1.7%wa, 0.7%hi, 0.7%si, 0.0%st
Cpu1 : 73.4%us, 26.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2059516k total, 1906896k used, 152620k free, 154016k buffers
Swap: 4095992k total, 80k used, 4095912k free, 1005564k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4188 daemon 25 0 115m 18m 2948 R 100.2 0.9 43:25.32 httpd
6602 daemon 15 0 109m 12m 2856 S 2.7 0.6 0:00.17 httpd
6678 daemon 15 0 105m 8416 2600 S 1.3 0.4 0:00.05 httpd
Here're all the processes for httpd. It looks like PID 4188 is the only one that is working, taking up 96% of CPU with a State = Running and the rest 0% with a State = Sleeping.
Code:
[root@production ~]# ps -ef | grep http
UID PID PPID C STIME TTY TIME CMD
daemon 4188 28338 96 07:26 ? 00:45:36 /usr/local/apache2/bin/httpd
daemon 6715 28338 0 08:10 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6759 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6760 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6770 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6773 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6794 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6795 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6796 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6814 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6816 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6817 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6819 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6820 28338 0 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6821 28338 1 08:12 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6824 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6825 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6827 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6829 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6830 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6831 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6833 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6834 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6835 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6837 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6838 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6839 28338 1 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 6840 28338 0 08:13 ? 00:00:00 /usr/local/apache2/bin/httpd
root 28338 1 0 Dec10 ? 00:00:03 /usr/local/apache2/bin/httpd
Code:
[root@production ~]# ps aux | grep http
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
daemon 4188 97.1 0.8 118548 18484 ? R 07:26 49:45 /usr/local/apache2/bin/httpd
daemon 6922 0.3 0.6 112804 13036 ? S 08:14 0:00 /usr/local/apache2/bin/httpd
daemon 7034 0.1 0.5 111840 11704 ? S 08:15 0:00 /usr/local/apache2/bin/httpd
daemon 7055 0.2 0.4 109892 10124 ? S 08:15 0:00 /usr/local/apache2/bin/httpd
daemon 7064 0.0 0.4 108520 8668 ? S 08:15 0:00 /usr/local/apache2/bin/httpd
daemon 7069 0.0 0.2 106044 4168 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7077 0.3 0.6 113168 13380 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7078 0.1 0.4 109652 9596 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7106 0.2 0.7 115492 15384 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7107 0.1 0.4 108536 8728 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7109 0.3 0.4 108540 8748 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7110 0.1 0.4 109636 9592 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7112 0.5 0.6 112652 12872 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7136 0.0 0.4 108832 8492 ? S 08:16 0:00 /usr/local/apache2/bin/httpd
daemon 7146 0.0 0.2 106044 4180 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7148 0.0 0.2 106044 4152 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7165 1.1 0.4 108520 8692 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7167 0.7 0.3 108104 8228 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7168 0.0 0.2 106044 4156 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7169 0.0 0.2 106044 4156 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
daemon 7170 0.0 0.2 106044 4156 ? S 08:17 0:00 /usr/local/apache2/bin/httpd
root 7173 0.0 0.0 61160 716 pts/2 R+ 08:17 0:00 grep http
root 28338 0.0 0.2 106044 5252 ? Ss Dec10 0:03 /usr/local/apache2/bin/httpd
A few minutes later... Note PID 4188 is still there but the rest of the PIDs have incremented. Why would the PIDs change if their START times haven't?
Code:
[root@production ~]# ps aux | grep http
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
daemon 4188 97.6 0.8 118548 18484 ? R 07:26 60:45 /usr/local/apache2/bin/httpd
daemon 7570 0.4 0.7 114724 15008 ? S 08:24 0:01 /usr/local/apache2/bin/httpd
daemon 7577 0.1 0.7 116496 16344 ? S 08:24 0:00 /usr/local/apache2/bin/httpd
daemon 7648 0.4 0.6 112752 13052 ? S 08:25 0:00 /usr/local/apache2/bin/httpd
daemon 7673 0.2 0.8 117260 17040 ? S 08:26 0:00 /usr/local/apache2/bin/httpd
daemon 7747 0.1 0.6 113392 13580 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7770 0.1 0.4 109808 9768 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7772 0.4 0.9 120484 20424 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7776 0.5 0.6 113932 14276 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7780 0.3 0.6 113420 13680 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7784 0.4 0.6 112968 13164 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7785 0.4 0.6 112244 12508 ? S 08:27 0:00 /usr/local/apache2/bin/httpd
daemon 7845 0.5 0.4 109652 9580 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7869 1.5 0.7 114464 14848 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7871 0.4 0.4 108612 8772 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7872 0.0 0.2 106044 4172 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7873 1.1 0.4 108552 8740 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7875 0.5 0.4 109656 9528 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7876 0.1 0.4 108484 8372 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7880 0.0 0.2 106044 4172 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7881 1.6 0.4 109636 9592 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7882 0.8 0.6 112816 13036 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7883 0.0 0.2 106044 4176 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7884 0.1 0.4 108500 8388 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7885 0.6 0.5 110316 10484 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
daemon 7886 0.0 0.4 108832 8468 ? S 08:28 0:00 /usr/local/apache2/bin/httpd
root 7905 0.0 0.0 61160 716 pts/2 R+ 08:28 0:00 grep http
root 28338 0.0 0.2 106044 5252 ? Ss Dec10 0:03 /usr/local/apache2/bin/httpd
Last edited by grob115; 12-11-2010 at 10:30 AM.
|
|
|
12-12-2010, 03:01 AM
|
#6
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,384
|
Quote:
Originally Posted by grob115
Anyway, for the case when the individual process CPU % being listed at 100%. What exactly does this mean then? If both CPUs are not maxed out with 100%us, then how come a process can be 100%?
|
See problems #1 and #2 above. The task is not (necessarily) using 100% CPU - it is merely using the/a CPU 100% of the times it was sampled.
Entirely different thing.
Quote:
This time it is the actual output. Is there a way to make the scheduler to not keep assigning the same PID against the same CPU?
|
Why would you want to do that? The scheduler only dispatches runnable work. If Apache determines that only one of its threads is dispatchable (bug or design), then that's what gets added to the run queue. Forcing it onto another CPU changes nothing - it certainly won't cause other processes to magically become runnable.
Quote:
Here're all the processes for httpd. It looks like PID 4188 is the only one that is working, taking up 96% of CPU with a State = Running and the rest 0% with a State = Sleeping.
|
see above.
Quote:
Why would the PIDs change if their START times haven't?
|
The start times have changed.
|
|
|
12-12-2010, 10:33 AM
|
#7
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
Quote:
The task is not (necessarily) using 100% CPU - it is merely using the/a CPU 100% of the times it was sampled.
|
Would it be correct if I state the following:
1) The %us figure for CPU0 and CPU1 indicates the percentage of time each CPU was busy based on previous sampling. For example, if CPU1 was busy in 76 of the last 100 samples, then "top" would display 76%us for CPU1.
2) The process' %CPU indicates the percentage of time each process was busy based on previous sampling. For example, if PID 4188 was in State = Running in 96 of the last 100 samples, then "top" would show PID 4188 with 96% for %CPU.
Quote:
Why would you want to do that? The scheduler only dispatches runnable work. If Apache determines that only one of its threads is dispatchable (bug or design), then that's what gets added to the run queue. Forcing it onto another CPU changes nothing - it certainly won't cause other processes to magically become runnable.
|
I was thinking that if CPU1 is busy so often, any process dispatched to CPU1 would need to wait rather than being executed immediately. It's like having two cashiers at the supermarket: if Cashier 1 is busy 100% of the time doing other stuff (maybe counting money), then a customer would get faster service by going to Cashier 0, who is not busy at all.
How can you check whether a thread is dispatchable? Is there a command or /proc file to check for this?
Quote:
The start times have changed.
|
Sorry my bad. I can see that now.
Based on Apache's explanation here, PID 28338 is the parent process and it should be the one forking the child processes and recycling them after an x number of pages have been served. This explains why the child process' PIDs and Start Time are changing. However, this doesn't explain what is happening with PID 4188. Any ideas?
Last edited by grob115; 12-12-2010 at 10:39 AM.
|
|
|
12-12-2010, 03:41 PM
|
#8
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,384
|
You would need to look at the code to be sure, but that CPU% analysis seems reasonable.
<Edit:> Hmmm - I have a vague recollection I did look at this a while back. Maybe top uses the timer tick count to determine CPU% busy. Will be (much) more accurate than just a sample count, but still subject to some rounding errors. Maybe I'll look it up again someday.</Edit:>
Note I said "tends to be dispatched". There are heuristics that balance out starvation. Movement across CPUs can (and does) happen if fair share is disrupted. For state, look at /proc/<pid>/stat - note this is likely to be extremely volatile. Only valid at the instant you looked.
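For reference, a small sketch of pulling the state out of /proc/<pid>/stat. It assumes the field order documented in proc(5) (state is field 3, utime and stime are fields 14 and 15, last-used processor is field 39); the helper names and the sample line are invented for illustration.

```python
def parse_stat(text):
    """Extract a few fields from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split around the LAST ')' rather than on whitespace alone.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    rest = tail.split()          # rest[0] is field 3 (state)
    return {
        "pid": int(pid),
        "comm": comm,
        "state": rest[0],        # R = running/runnable, S = sleeping, ...
        "ppid": int(rest[1]),    # field 4
        "utime": int(rest[11]),  # field 14, user-mode ticks
        "stime": int(rest[12]),  # field 15, kernel-mode ticks
        "processor": int(rest[36]) if len(rest) > 36 else None,  # field 39
    }

def read_stat(pid):
    """Read and parse the live file; the result is only a snapshot."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())

# Made-up sample line (the values are illustrative, not read from the server).
sample = ("4188 (httpd) R 28338 " + " ".join(["0"] * 9) + " 7000 2500 "
          + " ".join(["0"] * 23) + " 1 0 0")
print(parse_stat(sample))
```

Calling read_stat(pid) a few times in a row shows how quickly the state and processor fields can change, which is the volatility mentioned above.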
As for what may be happening with that process, who knows. May be working as designed; or may be a bug - apache, user, web page ...
Last edited by syg00; 12-12-2010 at 06:06 PM.
|
|
|
12-13-2010, 08:18 AM
|
#9
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
Definitely a bug. Take a look at the following just captured now.
See the extreme time of 2806:08 for PID 4188 while the rest are hardly over 1 min.
Code:
top - 06:16:14 up 124 days, 9:10, 1 user, load average: 1.03, 1.20, 1.18
Tasks: 128 total, 3 running, 125 sleeping, 0 stopped, 0 zombie
Cpu0 : 12.3%us, 1.3%sy, 0.0%ni, 85.3%id, 0.0%wa, 0.3%hi, 0.7%si, 0.0%st
Cpu1 : 73.8%us, 26.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2059516k total, 1923432k used, 136084k free, 157940k buffers
Swap: 4095992k total, 80k used, 4095912k free, 986552k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4188 daemon 25 0 115m 18m 2948 R 99.9 0.9 2806:08 httpd
645 daemon 15 0 108m 10m 2852 S 4.7 0.5 0:00.15 httpd
550 daemon 15 0 109m 12m 2868 S 2.7 0.6 0:00.56 httpd
632 daemon 15 0 108m 11m 2836 S 2.7 0.6 0:00.15 httpd
633 daemon 15 0 109m 11m 2832 S 2.0 0.6 0:00.19 httpd
647 daemon 16 0 106m 8860 2740 S 1.7 0.4 0:00.05 httpd
476 daemon 16 0 108m 11m 2932 S 0.7 0.6 0:01.63 httpd
|
|
|
12-14-2010, 12:28 AM
|
#10
|
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.x
Posts: 18,434
|
You could also get that effect if someone was automatically hammering your site with bare connects (basic requests) without asking for anything to be done; the main thread (the Apache control dispatcher) will get hammered, but there's (almost) nothing for the worker threads to do.
|
|
|
12-14-2010, 10:17 AM
|
#11
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
Thanks, though that's not the case. The dispatcher is executed under user "root" with PID 28338. PID 4188 is one of the worker processes.
|
|
|
12-14-2010, 07:17 PM
|
#12
|
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.x
Posts: 18,434
|
In that case, can you tell from e.g. the Apache access_log or error_log what that process is trying to do? Is there anything on your website that would require an Apache child to remain open that long, e.g. a status update page of some sort?
Try
ps -ef|grep 4188
to see if it's calling something else.
Do you need to adjust one of the timeout settings? See http://stackoverflow.com/questions/7...-configuration.
What happens if you just kill 4188; does the problem go away or does another process start to exhibit the same symptoms?
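Note that grepping for the number matches it anywhere on the line. To see specifically whether the worker has spawned anything, one option is to filter the ps -ef output on the PPID column. The sketch below assumes the usual "UID PID PPID C STIME TTY TIME CMD" column order shown earlier in this thread; children_of and live_children are hypothetical helpers, and the /bin/sh line in the sample is invented purely to show a hit.

```python
import subprocess

def children_of(ps_text, parent_pid):
    """Return the ps -ef lines whose PPID column equals parent_pid."""
    hits = []
    for line in ps_text.splitlines():
        cols = line.split(None, 7)   # UID PID PPID C STIME TTY TIME CMD
        if len(cols) >= 3 and cols[2] == str(parent_pid):
            hits.append(line)
    return hits

def live_children(parent_pid):
    """Run ps -ef on this machine and filter its output."""
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    return children_of(out, parent_pid)

# Sample text: the PID 9999 line is invented; the others follow the thread's output.
sample = """UID PID PPID C STIME TTY TIME CMD
daemon 4188 28338 96 07:26 ? 00:45:36 /usr/local/apache2/bin/httpd
daemon 6715 28338 0 08:10 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 9999 4188 0 08:11 ? 00:00:00 /bin/sh -c example
root 28338 1 0 Dec10 ? 00:00:03 /usr/local/apache2/bin/httpd"""

for line in children_of(sample, 4188):
    print(line)
```

An empty result from live_children(4188) would mean the worker has no child processes, i.e. whatever it is doing, it is doing inside httpd itself.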
|
|
|
12-15-2010, 09:28 AM
|
#13
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
The two logs don't have enough detail. Technically I suppose I can turn on more logging, but I try to avoid that unless it's necessary. I forgot to do the "ps -ef | grep 4188" part. I'll try to do that next time. Definitely nothing should be running that long.
|
|
|
12-19-2010, 09:22 AM
|
#14
|
Member
Registered: Oct 2005
Posts: 542
Original Poster
Rep:
|
Hi, the situation just happened again today. Here's the upper portion of the top output.
The same characteristics: CPU1 is busy and CPU0 is not. One of the httpd processes (PID 5175 this time) has been running for an excessively long time relative to the others, and is busy 100.2% of the time.
Code:
top - 07:17:29 up 130 days, 10:12, 1 user, load average: 1.16, 1.06, 1.06
Tasks: 125 total, 3 running, 122 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.3%us, 0.3%sy, 0.0%ni, 93.7%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Cpu1 : 72.3%us, 27.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2059516k total, 1917784k used, 141732k free, 157504k buffers
Swap: 4095992k total, 80k used, 4095912k free, 984164k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5175 daemon 25 0 115m 18m 3012 R 100.2 0.9 386:01.47 httpd
24602 daemon 15 0 112m 15m 2880 S 3.3 0.8 0:00.10 httpd
24425 daemon 15 0 113m 16m 2972 S 2.0 0.8 0:00.88 httpd
24597 daemon 15 0 108m 10m 2820 S 0.7 0.5 0:00.08 httpd
As suggested, I've tried to grab the "ps -ef" output to see what PID 5175 may be running but found nothing.
Code:
[root@production ~]# ps -ef | grep 5175
daemon 5175 28338 99 00:49 ? 06:27:38 /usr/local/apache2/bin/httpd
root 24712 24604 0 07:19 pts/2 00:00:00 grep 5175
[root@production ~]#
This again proves that PID 5175 isn't the root process, which is PID 28338.
Code:
[root@production ~]# ps -ef | grep httpd
daemon 5175 28338 99 00:49 ? 06:28:50 /usr/local/apache2/bin/httpd
daemon 24566 28338 0 07:16 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24600 28338 0 07:17 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24603 28338 0 07:17 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24702 28338 0 07:18 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24703 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24707 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24709 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24735 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24740 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
daemon 24764 28338 0 07:19 ? 00:00:00 /usr/local/apache2/bin/httpd
root 24782 24604 0 07:20 pts/2 00:00:00 grep httpd
root 28338 1 0 Dec10 ? 00:00:32 /usr/local/apache2/bin/httpd
[root@production ~]#
Any way I can tell what's happening?
|
|
|
12-19-2010, 09:44 AM
|
#15
|
Senior Member
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873
|
You could always use tcpdump or wireshark to watch the network traffic, or you could increase the log level on Apache. Of the two, watching network traffic would have the lesser impact on performance.
Reading all of the previous posts and their observations, I wondered if your server was being hit with HTTP requests for files that don't exist. Years ago I read about this technique as an attempt to create a DDoS attack. It would require this type of request from many machines to actually overwhelm a server.
Certainly increasing the Apache log level would show this. It would also show if the work was coming from the same remote machine. You could also determine if this was the case by watching network traffic and reading the data portion of the packets.
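If the access log is in the Common Log Format (an assumption, since the LogFormat in use isn't shown in this thread), the remote host is the first field on each line, so a rough per-client tally is straightforward. The function name, the path in the closing comment, and the sample addresses are placeholders.

```python
from collections import Counter

def requests_per_host(lines):
    """Count log lines by their first whitespace-separated field (the remote host)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:                    # skip blank lines
            counts[fields[0]] += 1
    return counts

# Placeholder entries using documentation-range addresses (not real traffic).
sample = [
    '192.0.2.10 - - [19/Dec/2010:07:17:01 -0500] "GET /missing.html HTTP/1.1" 404 209',
    '192.0.2.10 - - [19/Dec/2010:07:17:02 -0500] "GET /nothere.html HTTP/1.1" 404 209',
    '198.51.100.7 - - [19/Dec/2010:07:17:03 -0500] "GET /index.html HTTP/1.1" 200 5120',
]
for host, hits in requests_per_host(sample).most_common():
    print(host, hits)

# Against a real file (hypothetical path):
# with open("/usr/local/apache2/logs/access_log") as f:
#     print(requests_per_host(f).most_common(10))
```

One host dominating the tally would point to a single remote machine; a long, flat list of hosts would look more like the distributed case described above.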
Last edited by stress_junkie; 12-19-2010 at 09:46 AM.
|
|
|
All times are GMT -5.