Hello!
This is my first time around these forums, so I hope I'm in the right place. I usually manage to solve my Linux-related problems just by googling, but this time I'd like to clarify some doubts I'm having.
We are developing a device that runs a version of embedded Linux with a simple lighttpd web server for control and configuration.
The device is used for various CPU-intensive tasks. I'm trying to determine how many of these tasks we can run simultaneously and use that to improve performance where possible.
To start off somewhere, I decided to monitor the CPU usage and system load average over time. I'm fairly new to all of this, and the load average in particular is something I'm still not very familiar with. I've been reading everything I can about it and I think I have a rough understanding of what it measures and how to interpret it, but I'd like to clarify a couple of doubts that I still have:
1) The load average shows how many processes are using or waiting for system resources. Does this mean that, as long as the load doesn't increase over time, all of them are being processed successfully (leaving aside whatever delay that entails)?
2) Conversely, would a slowly increasing load average mean that processes are piling up because there aren't enough system resources?
3) A high load average means an increase in response time. So, if latency is not an issue, then a high load average (say 20 or so on a single-core processor) wouldn't be a problem either, as long as it doesn't keep increasing over time?
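For context, this is roughly how I've been sampling those numbers; just a one-shot sketch (the /proc/loadavg field layout is from proc(5)) that I run periodically to watch the trend:

```shell
#!/bin/sh
# One-shot sample of the load averages next to the CPU count; I run
# this from a loop or cron job to log the trend over time.
# Assumes /proc/cpuinfo lists one "processor" line per core, which may
# vary on some embedded kernels.
CPUS=$(grep -c '^processor' /proc/cpuinfo)
# /proc/loadavg fields: 1min 5min 15min running/total last_pid
read LOAD1 LOAD5 LOAD15 REST < /proc/loadavg
echo "cpus=$CPUS load1=$LOAD1 load5=$LOAD5 load15=$LOAD15"
```

My reading of question 3 in these terms: on a single core, a sustained 1-minute load of 20 would mean roughly 19 tasks queued on average while one runs.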
This all started with the device I mentioned above. I increased the number of tasks until I reached a CPU usage (single core) of about 80%; however, I noticed the web server started slowing down and pages took longer to load. Looking at top I see something like this (I've trimmed the bottom; everything there showed 0 CPU and 0 MEM):
Code:
top - 19:17:26 up 18 min, 1 user, load average: 4.59, 4.86, 2.84
Tasks: 69 total, 1 running, 68 sleeping, 0 stopped, 0 zombie
Cpu(s): 78.1%us, 20.3%sy, 0.0%ni, 0.6%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 221936k total, 44588k used, 177348k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 14892k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1308 root 20 0 494m 10m 2788 S 78.1 4.8 12:01.21 main_app
22 root 20 0 0 0 0 S 1.0 0.0 0:02.48 kworker/0:1
1359 root 20 0 2644 1204 632 S 0.3 0.5 0:03.64 lighttpd
4289 root 20 0 2428 1128 876 R 0.3 0.5 0:05.34 top
26722 root 20 0 2860 760 592 S 0.3 0.3 0:00.01 sh
26735 root 20 0 2724 372 312 S 0.3 0.2 0:00.01 sleep
All of the processing threads are spawned inside the main_app process (I can see them by pressing Shift+H in top). Most of them are waiting in sleep or mutex routines.
I'm not sure how to narrow down what is causing the increase in load average. I can clearly see that there's basically no idle time left; does this mean the CPU is the bottleneck at this point?
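In case it's useful, this is the kind of per-thread check I had in mind for narrowing it down (a rough sketch; it defaults to the shell's own PID just so it runs anywhere, but on the device I'd pass main_app's PID, 1308 in the top output above):

```shell
#!/bin/sh
# Sketch: list cumulative per-thread CPU time (utime/stime, in clock
# ticks) for one process, to spot which thread is burning the CPU.
# Sampling this twice and diffing would give current usage.
PID=${1:-$$}
for t in /proc/"$PID"/task/*/stat; do
    [ -r "$t" ] || continue
    # stat fields: 1=tid, 2=(comm), 14=utime, 15=stime
    # (assumes the thread name contains no spaces)
    awk '{printf "tid=%s comm=%s utime=%s stime=%s\n", $1, $2, $14, $15}' "$t"
done
```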
I hope someone can help me out with this, at least with the questions about the load average; the rest is probably something I'll have to figure out on my own, but I wanted to provide some background on where these questions are coming from.
Thank you!
-- Sycc