Load peaks... but cannot identify the reason
I am running a soft real-time system. That is, response times do not have to be guaranteed to be constant or very short, but the system has to react to real-time events and act accordingly.
The system comprises about 15 processes for process control, all PHP, TCL and Bash scripts. Each process has a number of tasks to be executed. Tasks inside those processes run roughly every 200 ms, 1 second, 2 seconds or 15 seconds.
All processes communicate with each other through a MySQL database. For example, one process rewrites a table every second, a second process reads from this table every 200 ms, etc.
Apache runs to serve a web-based user interface. Web pages also access the MySQL database to enter settings or display status. Most status web pages are refreshed every 2 seconds through AJAX scripts.
The entire system is on an intranet and cannot be accessed from the outside, so it is impossible that a few hundred users suddenly access this server.
The top command shows a CPU percentage for the MySQL process between 10% and 20%. All process control processes run well below 3%. Processes are started one after another, and have a sleep statement in their process loops. So I am not starting a task at a defined time, but simply after the process has finished sleeping. I assume that after some time, because of the different processing times, any correlation in the timing of the processes has disappeared.
Total CPU % for user processes runs between 15% and 20% on one core and between 10% and 15% on the other core.
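To illustrate the work-then-sleep pattern described above, here is a minimal sketch (not the actual scripts, which are PHP, TCL and Bash). The function name `start_times_ms` and the durations are hypothetical; the point is only that each cycle lasts "work time + sleep time", so the start of each iteration drifts relative to a fixed clock.

```python
def start_times_ms(work_durations_ms, sleep_ms):
    """Return the start time (in ms) of each iteration of a loop that
    does some work and then sleeps for a fixed interval.

    Because the sleep starts only after the work has finished, the period
    is work + sleep, not sleep alone, so the phase drifts over time.
    """
    t = 0
    starts = []
    for work in work_durations_ms:
        starts.append(t)
        t += work + sleep_ms  # next iteration begins after work and sleep
    return starts


# Hypothetical example: a "1 second" task whose work takes 50-100 ms.
print(start_times_ms([50, 100, 50], 1000))
```

With these assumed numbers the third iteration already starts 150 ms later than a strict 1-second schedule would, which is why the processes should not stay in step with each other.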
The 1-min load average is normally about 0.8-1.2. My problem is that every few minutes (somewhere between 5 and 10 minutes) the 1-min load average increases to 4 for some time (about 30-60 seconds), and then decays back to the lower values. In all that time, the CPU % does not increase.
I know the avg load and CPU % are two entirely different values, but I expect to see at least some correlation between a load of 4 and CPU %.
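For reference, a rough sketch of how a 1-minute load average behaves, assuming the Linux model in which the count of active tasks is sampled about every 5 seconds and folded into an exponentially decaying average. The function names and the task counts below are illustrative assumptions, not measurements from this system.

```python
import math


def update_load(load, active, interval_s=5.0, window_s=60.0):
    """One update step of an exponentially decaying load average.

    `active` is the number of tasks counted at this sample; on Linux this
    includes runnable tasks and tasks in uninterruptible sleep.
    """
    decay = math.exp(-interval_s / window_s)
    return load * decay + active * (1.0 - decay)


def simulate(load, active_per_sample):
    """Apply update_load once per sample and return the final load."""
    for active in active_per_sample:
        load = update_load(load, active)
    return load


# Hypothetical burst: starting from a load of 1.0, how far does the
# 1-min average climb if 10 tasks are counted at each of 6 samples (30 s)?
print(round(simulate(1.0, [10] * 6), 2))
```

Under these assumptions, reaching a 1-min average of 4 within 30-60 seconds requires considerably more than 4 tasks to be counted during the burst, because the average lags behind the instantaneous count.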
There is no problem with the system, but I find it scary that I cannot explain this load average peak. After all, a value of 4 means that at certain times processes are waiting to be scheduled.
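One way to look for the cause would be to log, once per second, the 1-min load average together with the number of tasks in each scheduler state. This is a minimal sketch assuming a Linux `/proc` filesystem; the helper names (`task_state`, `count_states`, `log_forever`) are hypothetical. It relies on the fact that Linux counts tasks in uninterruptible sleep (state `D`, typically waiting for disk I/O) in the load average as well as runnable tasks (state `R`), which could raise the load without raising CPU %.

```python
import glob
import time
from collections import Counter


def task_state(stat_line):
    """Extract the state letter (R, S, D, ...) from a /proc/<pid>/stat line.

    The command name is enclosed in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' before reading fields.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Count how many tasks are in each state."""
    return Counter(task_state(line) for line in stat_lines)


def read_stat_lines():
    """Read the stat line of every thread currently listed in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            pass  # the task exited between listing and reading
    return lines


def log_forever(interval_s=1.0):
    """Print a timestamp, the 1-min load average, and the R and D counts."""
    while True:
        with open("/proc/loadavg") as f:
            load1 = f.read().split()[0]
        states = count_states(read_stat_lines())
        print(time.strftime("%H:%M:%S"), load1,
              "R=%d D=%d" % (states["R"], states["D"]))
        time.sleep(interval_s)
```

Calling `log_forever()` and leaving it running across one of the peaks would show whether the extra load comes from tasks in state `R` (waiting for a CPU) or in state `D` (blocked, for example on disk access by MySQL).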
Any suggestions as to how to find the cause?
jlinkels