LinuxQuestions.org


arivo 08-06-2021 09:59 AM

Network problems under high load which causes high CPU usage in processes using network
 
Hello,

We found a very strange problem. We are running neural networks on camera images on our Intel NUCs, which also communicate with other PCs and with our servers.

When processing too many images, the NUC becomes too hot and starts to reduce the CPU clock, and that is where the strange behavior starts. In this state we get network disconnects and other network problems, even though the high workload is mainly on the GPU and to a lesser extent on the CPU. And while other processes continue normally (with only slightly higher CPU usage), processes using the network start to use much more CPU (a service which only forwards messages to and from another PC suddenly uses 30% CPU instead of the 3% it uses under normal circumstances).

Does anyone know why the high workload has such an impact on the network communication? And is there something we can do to reduce the problem?

Thanks,
Thomas

berndbausch 08-07-2021 09:00 AM

I am sure someone who knows the application well can provide a solid explanation. All others can only make idle speculations. Here is my speculation: Could it be that more network traffic is generated when the workload gets higher? Could the relation between workload and traffic volume be non-linear?

An obvious solution is to limit the image processing. A better solution is to change the application's configuration, but again, one needs to know the application to do that.

arivo 08-09-2021 03:21 AM

Ok, maybe you need more details about the problem and our setup.
We are running multiple services, each in a Docker container. There are services reading images from the camera's video stream and running neural networks (let's call them service A), services doing some processing without using the network (service B), a service communicating with other PCs over Ethernet (service C), and services communicating with hardware over Ethernet, where not much data is transmitted (service D). Only services A do a lot of processing; the other services need less than 3% CPU under normal circumstances.

When the PC becomes too hot and thermal throttling starts, the following impact on the services can be observed:
services A: The throughput of images goes down and fewer images are read from the video stream. This makes sense, as there is not enough processing power available anymore.
services B: No impact, as they do not need much processing power.
service C: This service usually has a CPU usage of about 3%. However, after thermal throttling starts, the CPU usage increases to over 30%. This service reads messages from a zmq message queue working over TCP, with a few hundred messages per second (each message < 1 kB).
services D: These services still run mostly normally. However, we observe a lot more reconnects during thermal throttling, even on links that under normal circumstances have a perfect connection with no reconnects at all.

So the higher load during thermal throttling has side effects on the stability of connections over Ethernet, and the CPU usage of services using the network increases dramatically, even though the network traffic does not increase or is even reduced.
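
To put numbers on this, something like the following rough sketch could log the CPU usage of one service next to the CPU clock and the throttle counter. It is only an illustration and rests on assumptions: the sysfs paths are the ones commonly found on Intel systems and may be missing on a given kernel or inside a container, only cpu0 is read, and the function names are made up for this example.

```python
import os
import sys
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second on this system

# Assumed sysfs locations; they may not exist on every kernel or in a container.
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"
THROTTLE_PATH = "/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count"


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces, so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck=CLK_TCK):
    """Convert a tick delta over an interval into percent of one CPU core."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def read_text(path):
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def sample(pid, interval_s=1.0):
    """Print CPU% of `pid` over one interval plus cpu0 frequency and throttle count."""
    stat_path = f"/proc/{pid}/stat"
    with open(stat_path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(stat_path) as f:
        after = cpu_ticks(f.read())
    print(
        f"pid={pid} cpu={cpu_percent(before, after, interval_s):.1f}% "
        f"freq_khz={read_text(FREQ_PATH)} throttle_count={read_text(THROTTLE_PATH)}"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python monitor.py <pid>   (runs until interrupted)
    while True:
        sample(int(sys.argv[1]))
```

Run against the PID of service C, this would show whether the jump from 3% to 30% lines up with the clock dropping and the throttle counter increasing.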

I will try to create a minimal setup which reproduces the problem, but this might take a while.
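
As a starting point, a first rough sketch of such a minimal setup might look like the following. It is not our actual service: it uses plain TCP over loopback with a simple 4-byte length prefix instead of zmq, and the message size, rate, and function names are assumptions chosen to resemble service C (a few hundred messages per second, each below 1 kB).

```python
import socket
import struct
import threading
import time

MSG_SIZE = 512                # assumed payload size, below the 1 kB mentioned above
HEADER = struct.Struct("!I")  # 4-byte length prefix (an assumption, not zmq framing)


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed the connection."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf.extend(chunk)
    return bytes(buf)


def send_messages(sock, count, rate_hz):
    """Send `count` fixed-size messages at roughly `rate_hz` per second, then close."""
    payload = b"x" * MSG_SIZE
    for _ in range(count):
        sock.sendall(HEADER.pack(len(payload)) + payload)
        if rate_hz:
            time.sleep(1.0 / rate_hz)
    sock.close()


def receive_all(sock):
    """Receive until the peer closes; return (message count, CPU seconds used)."""
    # process_time() covers the whole process, so the sender thread is included.
    cpu_start = time.process_time()
    count = 0
    while True:
        header = recv_exact(sock, HEADER.size)
        if header is None:
            break
        (length,) = HEADER.unpack(header)
        if recv_exact(sock, length) is None:
            break
        count += 1
    return count, time.process_time() - cpu_start


def run(count=1000, rate_hz=300):
    """Send `count` messages over a loopback TCP connection and measure CPU time."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()
    listener.close()
    sender = threading.Thread(target=send_messages, args=(client, count, rate_hz))
    sender.start()
    received, cpu_s = receive_all(server_side)
    sender.join()
    server_side.close()
    return received, cpu_s
```

Calling `run()` once under normal conditions and once while the NUC is throttling, and comparing the CPU seconds per received message, would show whether the effect also appears without zmq and Docker in the picture.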


All times are GMT -5.