LinuxQuestions.org
Old 12-10-2010, 11:06 AM   #1
grob115
Member
 
Registered: Oct 2005
Posts: 528

Rep: Reputation: 32
top output


Not sure how to read this. Given the following "top" output...
1) Is it correct to say CPU0 is busy 8.3% of the time, and CPU1 is busy 72.4% of the time?
2) What should I make of the 99.9% CPU of PID 25100? 72.4% + 8.3% is only 80.7%, which is less than 99.9%, and I haven't even added the other processes' CPU percentages.
3) I'd expect the processes to be evenly distributed between the two CPUs. Is there any reason why one CPU is so heavily loaded and the other one is not?

Anyway, after I restarted Apache, the runaway 99.9% process was gone.


Code:
top - 09:09:29 up 121 days, 12:04,  1 user,  load average: 0.19, 0.77, 0.93
Tasks: 122 total,   2 running, 120 sleeping,   0 stopped,   0 zombie
Cpu0  :  8.3%us,  0.7%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.7%hi,  1.3%si,  0.0%st
Cpu1  : 72.4%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2059516k total,  1782924k used,   276592k free,   152132k buffers
Swap:  4095992k total,       80k used,  4095912k free,   944040k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28405 daemon    15   0  108m  11m 2848 S 99.9  0.6   0:00.45 httpd
28359 daemon    15   0  112m  14m 2960 S  7.0  0.7   0:00.89 httpd
28356 daemon    16   0  114m  16m 2872 S  5.3  0.8   0:01.12 httpd
28347 daemon    15   0  109m  12m 2952 S  4.3  0.6   0:00.94 httpd
28407 daemon    15   0  105m 8652 2776 S  1.0  0.4   0:00.24 httpd

Last edited by grob115; 12-10-2010 at 11:10 AM.
 
Old 12-10-2010, 03:21 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,273

Rep: Reputation: 1028
There are lies, damn lies ... and statistics. Good luck with that last one.

All the CPU usage numbers (not just top's) are based on sampling. That's problem number one. Top takes its numbers from "files" in /proc, which are read sequentially, so they are not all captured at the same instant - problem number two.
And it uses different numbers for the summary at the top.

So, given all that:
- you are better off using %us+%sy+%ni to get a representative number
- the per-process numbers are not normalised: each is a percentage of one CPU/core, so it's possible to see several hundred here
- processes (threads) tend to get re-dispatched on the same CPU/core (the Linux scheduler does this deliberately), so an errant process can drive one CPU/core to the limit quite easily.

Your numbers for CPU1 are way out of whack (all the fields in the summary line should add up to around 100), which makes any analysis effectively meaningless.
<Edit:> Even CPU0 is outside what I'd call reasonable numbers. Something else is going on here. </Edit:>
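
To illustrate why the summary fields should add up to around 100, here is a minimal Python sketch. It assumes the per-CPU counters in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, in that order) are what feed the summary lines, and that the percentages come from the difference between two readings. The function names are made up for this example and are not taken from top's source.

```python
import re
import time

# Column order of a "cpuN" line in /proc/stat, using top's abbreviations.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return (cpu_name, {field: ticks}) for one 'cpuN ...' line."""
    parts = line.split()
    # Only the first eight counters map onto the summary fields;
    # guest time is already counted inside user time.
    ticks = [int(x) for x in parts[1:9]]
    ticks += [0] * (len(FIELDS) - len(ticks))  # older kernels have fewer columns
    return parts[0], dict(zip(FIELDS, ticks))


def percentages(before, after):
    """Share of each field in the ticks that elapsed between two readings."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def summary_total(top_line):
    """Add up every 'NN.N%' value in a pasted top summary line."""
    return sum(float(x) for x in re.findall(r"([\d.]+)%", top_line))


def read_cpus(path="/proc/stat"):
    """Read the per-CPU lines (cpu0, cpu1, ...), skipping the aggregate 'cpu' line."""
    with open(path) as f:
        return dict(parse_cpu_line(l) for l in f
                    if l.startswith("cpu") and l[3:4].isdigit())


if __name__ == "__main__":
    first = read_cpus()
    time.sleep(1.0)
    second = read_cpus()
    for name in sorted(first):
        pct = percentages(first[name], second[name])
        print(name, " ".join(f"{pct[k]:.1f}%{k}" for k in FIELDS))
```

Because each percentage is a share of the same total, the fields of one CPU sum to 100 by construction. Feeding the posted Cpu1 line to `summary_total` gives about 171, which is the kind of inconsistency pointed out above.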

Last edited by syg00; 12-10-2010 at 03:54 PM.
 
Old 12-11-2010, 12:12 AM   #3
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
There's no entitlement to be called a Guru unless you are indeed one. You are most probably correct in that the following two lines are out of whack.
Code:
Cpu0 : 8.3%us, 0.7%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.7%hi, 1.3%si, 0.0%st
Cpu1 : 72.4%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
I had actually typed the message and restarted the Apache service before I copied the output, so I copied the "top" output afterwards and changed the %us values just to make the point.

The explanation is insightful. Questions:
1) Why would the scheduler deliberately assign processes to one CPU? Wouldn't doing so cause more interrupts for higher-priority processes, and more queued time for lower-priority processes?
2) Can the %CPU on a process line, below the summary section of the "top" output, go above 100%? I did see it go up to 100.2% but thought it was a rounding error of some sort. Can you explain how this happens?

Thanks!
 
Old 12-11-2010, 01:13 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,273

Rep: Reputation: 1028
So the "Something else going on here" I suspected was actually you fudging the numbers and not telling us?

You'll not find many willing to help if you continue that - analysis of the paltry metrics within Linux is difficult enough without being lied to.

1) You may have misinterpreted my (somewhat misleading) comment - the scheduler attempts to re-dispatch each process on the processor it was last on, primarily for cache/TLB performance. I wasn't implying every process would be re-dispatched on the one (same) processor.
2) You are correct about the rounding error case. Depending on (top) options, multi-threaded apps may display as a single line - all the CPU consumption is attributed to the [grand]father task, and not normalised.
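
For anyone who wants to watch the "last processor" behaviour described in 1), a small Python sketch follows. It assumes a Linux /proc layout in which field 39 of /proc/<pid>/stat is the CPU the task last ran on, and it uses `os.sched_getaffinity` to list the CPUs the task is allowed to run on. The helper name `last_cpu` is invented for this example.

```python
import os


def last_cpu(stat_text):
    """CPU number the task last ran on (field 39 of /proc/<pid>/stat)."""
    # The command name (field 2) sits in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before counting fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


if __name__ == "__main__":
    pid = os.getpid()  # replace with the PID of interest, e.g. an httpd child
    with open(f"/proc/{pid}/stat") as f:
        print("last ran on CPU", last_cpu(f.read()))
    print("allowed CPUs:", sorted(os.sched_getaffinity(pid)))
```

Polling this for a busy process should show the CPU number mostly staying put, with the occasional move when the scheduler rebalances.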
 
Old 12-11-2010, 09:18 AM   #5
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
Apologies for the numbers. I wasn't trying to lie, but the situation was gone and I tried my best to make the top output look the way it had. Sorry, I didn't know it would cause inconsistencies.

Anyway, take the case where an individual process's %CPU is listed at 100%. What exactly does this mean? If both CPUs are not maxed out at 100%us, how can a process be at 100%?

The same thing just happened again, and this time it is the actual top output as captured. Is there a way to make the scheduler not keep assigning the same PID to the same CPU? Not sure why, but one of Apache's PIDs is consistently up at 100%.

Code:
top - 08:11:06 up 122 days, 11:05,  1 user,  load average: 1.05, 1.04, 0.97
Tasks: 135 total,   3 running, 132 sleeping,   0 stopped,   0 zombie
Cpu0  :  5.7%us,  0.7%sy,  0.0%ni, 90.7%id,  1.7%wa,  0.7%hi,  0.7%si,  0.0%st
Cpu1  : 73.4%us, 26.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2059516k total,  1906896k used,   152620k free,   154016k buffers
Swap:  4095992k total,       80k used,  4095912k free,  1005564k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4188 daemon    25   0  115m  18m 2948 R 100.2  0.9  43:25.32 httpd
 6602 daemon    15   0  109m  12m 2856 S  2.7  0.6   0:00.17 httpd
 6678 daemon    15   0  105m 8416 2600 S  1.3  0.4   0:00.05 httpd



Here're all the processes for httpd. It looks like PID 4188 is the only one that is working, taking up 96% of CPU with a State = Running and the rest 0% with a State = Sleeping.
Code:
[root@production ~]# ps -ef | grep http
UID        PID  PPID  C STIME TTY          TIME CMD
daemon    4188 28338 96 07:26 ?        00:45:36 /usr/local/apache2/bin/httpd
daemon    6715 28338  0 08:10 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6759 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6760 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6770 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6773 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6794 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6795 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6796 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6814 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6816 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6817 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6819 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6820 28338  0 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6821 28338  1 08:12 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6824 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6825 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6827 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6829 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6830 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6831 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6833 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6834 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6835 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6837 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6838 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6839 28338  1 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon    6840 28338  0 08:13 ?        00:00:00 /usr/local/apache2/bin/httpd
root     28338     1  0 Dec10 ?        00:00:03 /usr/local/apache2/bin/httpd


Code:
[root@production ~]# ps aux | grep http
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
daemon    4188 97.1  0.8 118548 18484 ?        R    07:26  49:45 /usr/local/apache2/bin/httpd
daemon    6922  0.3  0.6 112804 13036 ?        S    08:14   0:00 /usr/local/apache2/bin/httpd
daemon    7034  0.1  0.5 111840 11704 ?        S    08:15   0:00 /usr/local/apache2/bin/httpd
daemon    7055  0.2  0.4 109892 10124 ?        S    08:15   0:00 /usr/local/apache2/bin/httpd
daemon    7064  0.0  0.4 108520  8668 ?        S    08:15   0:00 /usr/local/apache2/bin/httpd
daemon    7069  0.0  0.2 106044  4168 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7077  0.3  0.6 113168 13380 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7078  0.1  0.4 109652  9596 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7106  0.2  0.7 115492 15384 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7107  0.1  0.4 108536  8728 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7109  0.3  0.4 108540  8748 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7110  0.1  0.4 109636  9592 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7112  0.5  0.6 112652 12872 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7136  0.0  0.4 108832  8492 ?        S    08:16   0:00 /usr/local/apache2/bin/httpd
daemon    7146  0.0  0.2 106044  4180 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7148  0.0  0.2 106044  4152 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7165  1.1  0.4 108520  8692 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7167  0.7  0.3 108104  8228 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7168  0.0  0.2 106044  4156 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7169  0.0  0.2 106044  4156 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
daemon    7170  0.0  0.2 106044  4156 ?        S    08:17   0:00 /usr/local/apache2/bin/httpd
root      7173  0.0  0.0  61160   716 pts/2    R+   08:17   0:00 grep http
root     28338  0.0  0.2 106044  5252 ?        Ss   Dec10   0:03 /usr/local/apache2/bin/httpd
A few minutes later... Note PID 4188 is still there but the rest of the PIDs have incremented. Why would the PIDs change if their START times haven't?
Code:
[root@production ~]# ps aux | grep http
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
daemon    4188 97.6  0.8 118548 18484 ?        R    07:26  60:45 /usr/local/apache2/bin/httpd
daemon    7570  0.4  0.7 114724 15008 ?        S    08:24   0:01 /usr/local/apache2/bin/httpd
daemon    7577  0.1  0.7 116496 16344 ?        S    08:24   0:00 /usr/local/apache2/bin/httpd
daemon    7648  0.4  0.6 112752 13052 ?        S    08:25   0:00 /usr/local/apache2/bin/httpd
daemon    7673  0.2  0.8 117260 17040 ?        S    08:26   0:00 /usr/local/apache2/bin/httpd
daemon    7747  0.1  0.6 113392 13580 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7770  0.1  0.4 109808  9768 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7772  0.4  0.9 120484 20424 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7776  0.5  0.6 113932 14276 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7780  0.3  0.6 113420 13680 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7784  0.4  0.6 112968 13164 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7785  0.4  0.6 112244 12508 ?        S    08:27   0:00 /usr/local/apache2/bin/httpd
daemon    7845  0.5  0.4 109652  9580 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7869  1.5  0.7 114464 14848 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7871  0.4  0.4 108612  8772 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7872  0.0  0.2 106044  4172 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7873  1.1  0.4 108552  8740 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7875  0.5  0.4 109656  9528 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7876  0.1  0.4 108484  8372 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7880  0.0  0.2 106044  4172 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7881  1.6  0.4 109636  9592 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7882  0.8  0.6 112816 13036 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7883  0.0  0.2 106044  4176 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7884  0.1  0.4 108500  8388 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7885  0.6  0.5 110316 10484 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
daemon    7886  0.0  0.4 108832  8468 ?        S    08:28   0:00 /usr/local/apache2/bin/httpd
root      7905  0.0  0.0  61160   716 pts/2    R+   08:28   0:00 grep http
root     28338  0.0  0.2 106044  5252 ?        Ss   Dec10   0:03 /usr/local/apache2/bin/httpd

Last edited by grob115; 12-11-2010 at 10:30 AM.
 
Old 12-12-2010, 03:01 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,273

Rep: Reputation: 1028
Quote:
Originally Posted by grob115
Anyway, for the case when the individual process CPU % being listed at 100%. What exactly does this mean then? If both CPUs are not maxed out with 100%us, then how come a process can be 100%?
See problems #1 and #2 above. The task is not (necessarily) using 100% CPU - it is merely using the/a CPU 100% of the times it was sampled.
Entirely different thing.
Quote:
This time it is the actual output. Is there a way to make the scheduler to not keep assigning the same PID against the same CPU?
Why would you want to do that? The scheduler only dispatches runnable work. If apache determines that only one of its threads is dispatchable (bug or design), then that's what gets added to the run queue. Forcing it onto another CPU changes nothing - it certainly won't cause other processes to magically become runnable.
Quote:
Here're all the processes for httpd. It looks like PID 4188 is the only one that is working, taking up 96% of CPU with a State = Running and the rest 0% with a State = Sleeping.
See above.
Quote:
Why would the PIDs change if there START times haven't?
The start times have changed.
 
Old 12-12-2010, 10:33 AM   #7
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
Quote:
The task is not (necessarily) using 100% CPU - it is merely using the/a CPU 100% of the times it was sampled.
Would it be correct to state the following?
1) The %us figure for CPU0 and CPU1 indicates the percentage of time each CPU was busy, based on previous sampling. For example, if CPU1 was busy in 76 of the last 100 samples, then "top" would display 76%us for CPU1.
2) A process's %CPU indicates the percentage of time that process was busy, based on previous sampling. For example, if PID 4188 was in State = Running in 96 of the last 100 samples, then "top" would show 96% in the %CPU column for PID 4188.

Quote:
Why would you want to do that ?. The scheduler only dispatches runnable work. If apache determines that only one of its threads is dispatchable (bug or design), then that's what gets added to the run queue. Forcing it onto another CPU changes nothing - certainly won't cause other processes to magically become runnable.
I was thinking that if CPU1 is busy so often, any process dispatched to CPU1 would need to wait rather than being executed immediately. It's like a supermarket with 2 cashiers: if Cashier 1 is busy 100% of the time doing other stuff (maybe counting money), then a customer would get faster service by going to Cashier 0, who is not busy at all.

How can you check whether a thread is dispatchable? Is there a command or /proc file to check for this?

Quote:
The start times have changed.
Sorry my bad. I can see that now.

Based on Apache's explanation here, PID 28338 is the parent process, and it should be the one forking the child processes and recycling them after a certain number of pages have been served. This explains why the child processes' PIDs and START times are changing. However, it doesn't explain what is happening with PID 4188. Any ideas?

Last edited by grob115; 12-12-2010 at 10:39 AM.
 
Old 12-12-2010, 03:41 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,273

Rep: Reputation: 1028
You would need to look at the code to be sure, but that CPU% analysis seems reasonable.
<Edit:> Hmmm - I have a vague recollection I did look at this a while back. Maybe top uses the timer tick count to determine CPU% busy. Will be (much) more accurate than just a sample count, but still subject to some rounding errors. Maybe I'll look it up again someday.</Edit:>
Note I said "tends to be dispatched". There are heuristics that balance out starvation. Movement across CPUs can (and does) happen if fair share is disrupted. For state, look at /proc/<pid>/stat - note this is likely to be extremely volatile, and only valid at the instant you looked.
As for what may be happening with that process, who knows. It may be working as designed, or it may be a bug - apache, user, web page ...
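
A rough Python sketch of both ideas in this post: reading the state from /proc/<pid>/stat, and deriving a %CPU figure from the tick counters rather than from a count of samples. It assumes the usual Linux field layout (state is field 3, utime and stime are fields 14 and 15, counted in clock ticks) and is only an illustration of the arithmetic, not top's actual code. The function names are invented for this example.

```python
import os
import time


def parse_stat(stat_text):
    """Pick pid, command, state and CPU tick counters out of /proc/<pid>/stat."""
    # Split on the last ')' because the command name may contain spaces.
    head, tail = stat_text.rsplit(")", 1)
    rest = tail.split()  # rest[0] is field 3 (state)
    pid_text, comm = head.split("(", 1)
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": rest[0],        # R = running/runnable, S = sleeping, ...
        "utime": int(rest[11]),  # field 14: user-mode ticks
        "stime": int(rest[12]),  # field 15: kernel-mode ticks
    }


def cpu_percent(before, after, seconds, hz):
    """Ticks consumed between two readings, as a percentage of ONE CPU."""
    used = (after["utime"] + after["stime"]) - (before["utime"] + before["stime"])
    return 100.0 * used / (hz * seconds)


def read_stat(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


if __name__ == "__main__":
    pid = os.getpid()  # replace with the PID of interest
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    a = read_stat(pid)
    time.sleep(1.0)
    b = read_stat(pid)
    print(b["comm"], b["state"], f"{cpu_percent(a, b, 1.0, hz):.1f}%")
```

Under these assumptions a reading slightly above 100 is easy to produce: if 301 ticks are counted against a nominal 3-second interval at 100 ticks per second, the result is 100.3%, which would fit the rounding-error explanation given earlier for the 100.2% figure.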

Last edited by syg00; 12-12-2010 at 06:06 PM.
 
Old 12-13-2010, 08:18 AM   #9
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
Definitely a bug. Take a look at the following just captured now.
See the extreme TIME+ of 2806:08 (minutes:seconds) for PID 4188, while the rest are barely over 1 second.

Code:
top - 06:16:14 up 124 days,  9:10,  1 user,  load average: 1.03, 1.20, 1.18
Tasks: 128 total,   3 running, 125 sleeping,   0 stopped,   0 zombie
Cpu0  : 12.3%us,  1.3%sy,  0.0%ni, 85.3%id,  0.0%wa,  0.3%hi,  0.7%si,  0.0%st
Cpu1  : 73.8%us, 26.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2059516k total,  1923432k used,   136084k free,   157940k buffers
Swap:  4095992k total,       80k used,  4095912k free,   986552k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4188 daemon    25   0  115m  18m 2948 R 99.9  0.9   2806:08 httpd
  645 daemon    15   0  108m  10m 2852 S  4.7  0.5   0:00.15 httpd
  550 daemon    15   0  109m  12m 2868 S  2.7  0.6   0:00.56 httpd
  632 daemon    15   0  108m  11m 2836 S  2.7  0.6   0:00.15 httpd
  633 daemon    15   0  109m  11m 2832 S  2.0  0.6   0:00.19 httpd
  647 daemon    16   0  106m 8860 2740 S  1.7  0.4   0:00.05 httpd
  476 daemon    16   0  108m  11m 2932 S  0.7  0.6   0:01.63 httpd
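
As a sanity check on that TIME+ value, the accumulated CPU time can be compared with how long PID 4188 has existed. The Python sketch below works only from figures already posted in this thread: the two uptime values shown by top, and the START of 07:26 reported by ps about 45 minutes before the 08:11 capture. START has only minute resolution, so the result is approximate, and the helper names are invented for this example.

```python
from datetime import timedelta


def parse_time_plus(text):
    """Parse top's TIME+ column ('MMM:SS' or 'MMM:SS.hh') into a timedelta."""
    minutes, seconds = text.split(":")
    return timedelta(minutes=int(minutes), seconds=float(seconds))


def busy_percent(cpu_time, elapsed):
    """CPU time as a percentage of wall-clock time, relative to one CPU."""
    return 100.0 * (cpu_time / elapsed)


# Uptime shown by top in the two captures (days, hours:minutes).
uptime_earlier = timedelta(days=122, hours=11, minutes=5)  # capture at 08:11
uptime_later = timedelta(days=124, hours=9, minutes=10)    # capture at 06:16
# ps reported START 07:26, roughly 45 minutes before the 08:11 capture.
age_at_earlier = timedelta(minutes=45)

elapsed = (uptime_later - uptime_earlier) + age_at_earlier
cpu_time = parse_time_plus("2806:08")

print(f"elapsed {elapsed}, cpu {cpu_time}, busy {busy_percent(cpu_time, elapsed):.1f}%")
```

The process has existed for about 46 hours 50 minutes and has accumulated 2806 minutes of CPU time, so it has been on a CPU for roughly 99.9% of its lifetime. That is consistent with a worker spinning continuously rather than serving occasional long requests.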
 
Old 12-14-2010, 12:28 AM   #10
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
You could also get that effect if someone was auto-hammering your site with just connects (basic requests), but not asking for anything to be done; the main thread (the Apache control dispatcher) will get hammered, but there's (almost) nothing for the worker threads to do.
 
Old 12-14-2010, 10:17 AM   #11
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
Thanks, though that's not the case. The dispatcher is executed under user "root" with PID 28338. PID 4188 is one of the worker processes.
 
Old 12-14-2010, 07:17 PM   #12
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
In that case, can you tell from e.g. the Apache access_log or error_log what that process is trying to do? Is there anything on your website that would require an Apache child to remain open that long, e.g. a status update page of some sort?
Try

ps -ef|grep 4188

to see if it's calling something else.
Do you need to adjust one of the timeout settings? http://stackoverflow.com/questions/7...-configuration
What happens if you just kill 4188: does the problem go away, or does another process start to exhibit the same symptoms?
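
If grepping the ps output is awkward (the grep itself also matches), the "is it calling something else" check can be sketched in Python by scanning /proc for processes whose parent is the PID in question. This assumes the Linux layout where field 4 of /proc/<pid>/stat is the parent PID. The function names are invented for this example.

```python
import os


def ppid_of(stat_text):
    """Parent PID (field 4) from the text of /proc/<pid>/stat."""
    # Split on the last ')' because the command name may contain spaces.
    return int(stat_text.rsplit(")", 1)[1].split()[1])


def children_of(pid, proc="/proc"):
    """PIDs whose parent is `pid`, found by scanning every /proc/<n>/stat."""
    kids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                if ppid_of(f.read()) == pid:
                    kids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(kids)


if __name__ == "__main__":
    parent = os.getppid()  # replace with the PID of interest, e.g. 4188
    print(f"children of {parent}:", children_of(parent))
```

An empty list for the busy httpd child would suggest it is not waiting on a spawned helper (a CGI script, for instance) and that the CPU time is being burned inside the Apache process itself.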
 
Old 12-15-2010, 09:28 AM   #13
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
The two logs don't have enough detail. Technically I suppose I can turn on more logging, but I try to avoid that unless it's necessary. I forgot to run the "ps -ef | grep 4188" part; I'll try to do that next time. Definitely nothing should be running that long.
 
Old 12-19-2010, 09:22 AM   #14
grob115
Member
 
Registered: Oct 2005
Posts: 528

Original Poster
Rep: Reputation: 32
Hi, the situation just happened again today. Here's the upper portion of the top output.
The same characteristics: CPU1 is busy and CPU0 is not. One of the httpd processes (PID 5175 this time) has been running far longer than the others, and shows 100.2% CPU.

Code:
top - 07:17:29 up 130 days, 10:12,  1 user,  load average: 1.16, 1.06, 1.06
Tasks: 125 total,   3 running, 122 sleeping,   0 stopped,   0 zombie
Cpu0  :  5.3%us,  0.3%sy,  0.0%ni, 93.7%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Cpu1  : 72.3%us, 27.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2059516k total,  1917784k used,   141732k free,   157504k buffers
Swap:  4095992k total,       80k used,  4095912k free,   984164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5175 daemon    25   0  115m  18m 3012 R 100.2  0.9 386:01.47 httpd
24602 daemon    15   0  112m  15m 2880 S  3.3  0.8   0:00.10 httpd
24425 daemon    15   0  113m  16m 2972 S  2.0  0.8   0:00.88 httpd
24597 daemon    15   0  108m  10m 2820 S  0.7  0.5   0:00.08 httpd
As suggested, I've tried to grab the "ps -ef" output to see what PID 5175 may be running but found nothing.
Code:
[root@production ~]# ps -ef | grep 5175
daemon    5175 28338 99 00:49 ?        06:27:38 /usr/local/apache2/bin/httpd
root     24712 24604  0 07:19 pts/2    00:00:00 grep 5175
[root@production ~]#
This again shows that PID 5175 isn't the parent process run by root, which is PID 28338.
Code:
[root@production ~]# ps -ef | grep httpd
daemon    5175 28338 99 00:49 ?        06:28:50 /usr/local/apache2/bin/httpd
daemon   24566 28338  0 07:16 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24600 28338  0 07:17 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24603 28338  0 07:17 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24702 28338  0 07:18 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24703 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24707 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24709 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24735 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24740 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
daemon   24764 28338  0 07:19 ?        00:00:00 /usr/local/apache2/bin/httpd
root     24782 24604  0 07:20 pts/2    00:00:00 grep httpd
root     28338     1  0 Dec10 ?        00:00:32 /usr/local/apache2/bin/httpd
[root@production ~]#
Any way I can tell what's happening?
 
Old 12-19-2010, 09:44 AM   #15
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
You could always use tcpdump or wireshark to watch the network traffic, or you could increase the log level on Apache. Of the two, watching network traffic would have the least impact on performance.

Reading all of the previous posts with their observations, I wondered if your server was being hit with http requests for files that don't exist. Years ago I read about this technique as an attempt to create a DDoS attack. It would require this type of request from many machines to actually overwhelm a server.

Certainly increasing the Apache log level would show this. It would also show whether the work was coming from the same remote machine. You could also determine this by watching network traffic and reading the data portion of the packets.
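
Before raising the log level, the existing access log may already be enough to test the "requests for files that don't exist" idea. The Python sketch below counts 404 responses per client, assuming the log is in Common Log Format (host, identity, user, [timestamp], "request", status, bytes). The log path in the usage section is a guess based on the /usr/local/apache2 prefix seen in the ps output, and the function name is invented for this example.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "request" status bytes
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def not_found_by_client(lines):
    """Count responses with status 404 per client address."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group(2) == "404":
            hits[m.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever access_log actually lives.
    path = "/usr/local/apache2/logs/access_log"
    with open(path, errors="replace") as f:
        for client, count in not_found_by_client(f).most_common(10):
            print(f"{count:8d}  {client}")
```

A handful of clients with very large counts would point at one or a few remote machines; a long flat list would look more like the distributed case described above.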

Last edited by stress_junkie; 12-19-2010 at 09:46 AM.
 
  


All times are GMT -5. The time now is 11:04 PM.
