LinuxQuestions.org - Under moderate load in a multicore system of 13 processors one processor gets 100% us



Hi
I have a multicore system with 14 processors. Under heavy load, which I simulate by sending packets to the system, "top" shows that one processor, seemingly chosen at random, is 0% idle and spends up to 100% of its time in softirq, while all the other processors are 97% to 98% idle, meaning only 1% or 2% is used. As far as I know, softirqs are reentrant and the 2.6 scheduler does good load balancing. So I want to know the possible causes, to have a place to start debugging. My driver is NAPI compliant. I am looking for suggestions as to whether the driver or the kernel code is at fault.

As this is an e1000 NAPI-compliant driver, it balances the IRQs and does the rest of the packet-receive work by calling the poll function. Since the poll function dequeues the buffers, the number of interrupts drops considerably, so interrupt balancing does not play a major role. Because a softirq is raised and executed on the same CPU, whichever CPU scheduled the softirq reaches 100% occupancy. Now the question is: why is the scheduler not able to balance this load? And if this interpretation is correct, how can the issue be rectified?
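To confirm which CPU is absorbing the softirq time without relying on top, the per-CPU lines of /proc/stat can be read directly. The following is only a minimal sketch: the sample text is made up for illustration, and the function assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The counters are cumulative since boot, so for a live view you would take two snapshots and compare them.

```python
def softirq_share(stat_text):
    """Return {cpu_name: fraction of time spent in softirq} from /proc/stat text."""
    shares = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines such as "cpu3 ..."; skip the aggregate "cpu" line.
        if len(fields) < 8 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # First eight counters: user nice system idle iowait irq softirq steal.
        ticks = [int(x) for x in fields[1:9]]
        total = sum(ticks)
        shares[fields[0]] = ticks[6] / total if total else 0.0
    return shares


# Made-up sample, not taken from the system described in this thread.
SAMPLE = """cpu  3 0 3 1812 0 53 129 0
cpu0 0 0 3 997 0 0 0 0
cpu3 3 0 0 815 0 53 129 0
intr 808204
"""

if __name__ == "__main__":
    for cpu, share in sorted(softirq_share(SAMPLE).items()):
        print(cpu, round(share * 100, 1), "% softirq")
```

On a real machine the text would come from open("/proc/stat").read() instead of SAMPLE.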

In fact, further investigation reveals that the IRQs are balanced on the machine -------

This behaviour is in compliance with NAPI: when the feature is enabled, only the CPU that first receives the interrupt processes the rx ring buffer, so the load lands on that one CPU, and under high-volume traffic it reaches up to 100%. Up to this point it is okay. Can anyone suggest design changes in the driver so that, with one NIC and many CPUs, the processing can be spread out?

I would be extremely astonished to find that "14 processors or cores" would be saturated by any load-pattern that strictly originated from a physical device (or devices). CPUs can always run much faster than "the real world."

Naturally, you'd like for any one of several CPUs to be able to "take the interrupt," but really that does not matter so much. What really matters here is the handling of the workload that is represented by all those incoming packets. Presumably, each packet is "a request to 'do something,'" and it is the act of "doing something," not the act of handling the I/O traffic, that will make productive and balanced use of a large farm of available CPUs.

Each of the incoming packets should be moved as quickly as possible to a user-land queue which can be serviced by several processes or threads. Each of these threads would dequeue a work-request from the queue and then carry out that request. Each of them would then enqueue a response for later delivery.
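The queue-and-workers arrangement described above can be sketched roughly as follows. This is only an illustration of the structure: handle() is a hypothetical stand-in for the real per-request work, and the names serve, work and done are made up. Note that CPython threads share one interpreter lock, so this shows the shape of the design rather than a real multi-CPU speed-up; a production server would use processes or native threads.

```python
import queue
import threading


def handle(request):
    # Hypothetical stand-in for the real work carried out per request.
    return request.upper()


def serve(requests, num_workers=4):
    """Feed requests through a shared queue serviced by several worker threads."""
    work = queue.Queue()   # incoming work requests
    done = queue.Queue()   # responses waiting for later delivery

    def worker():
        while True:
            item = work.get()
            if item is None:          # sentinel: no more work for this worker
                break
            done.put((item, handle(item)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for r in requests:
        work.put(r)
    for _ in threads:
        work.put(None)                # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so the response queue can be drained safely.
    results = {}
    while not done.empty():
        req, resp = done.get()
        results[req] = resp
    return results


if __name__ == "__main__":
    print(serve(["get a", "get b", "get c"]))
```

The I/O side only has to put requests on the queue as fast as possible; which worker picks up which request is left to the queue.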

Quite naturally, some of the CPUs (i.e. the ones to which the devices are physically attached) will tend to become "I/O-handling specialists." The others will each be running the worker-threads that are handling the actual workload that is, presumably, this busy server's raison d'être.

Success really has precious-little to do with the kernel. This is a matter of good application design.

Agreed. There might be a way to solve this, such as creating a thread with affinity to a specific CPU. But my priority is to distribute the load among several processors by changing something in the driver, or by exploring every possibility while staying on the kernel side.
Now I have disabled NAPI. All NIC interrupts now go to only one CPU at a time, until I force them to move to another CPU by writing to /proc/irq.....
I googled and found only very vague answers. There were many suggestions, such as:
(1) Run irqbalance, either as a daemon or by enabling CONFIG_IRQBALANCE
In my system I could not find this configuration option in the config file (kernel 2.6.21), and I did not find the process in the output of ps -ef either. Therefore I am sure it is not a part of my kernel. Moreover, if it was left out, there must be some reason for that, so I am not going to use it for now.

(2) Write to /proc/irq/<no.>/smp_affinity
I found this value already set to ffff. Even so, I changed it to a different value, but in vain: all the interrupts still land on a single processor. One interesting thing is that when I move the NIC interrupt from one CPU to another by setting smp_affinity to a single bit, it moves to the corresponding CPU, but when I write a bit pattern with multiple 1s, the system crashes. I do not know how it comes to be ffff by default, because when I write that same value it crashes.

My question is: will irqbalance address this issue?
smp_affinity is ffff by default. Then why do the interrupts land on a single CPU?
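For reference, smp_affinity is a hexadecimal bitmask in which bit n stands for CPU n, so ffff permits CPUs 0 to 15 and a single-bit value pins the IRQ to one CPU. As far as I know, the mask only says where the interrupt is allowed to go; whether it is actually spread over all permitted CPUs depends on the interrupt controller and the architecture code. A small sketch for converting between CPU lists and masks (the helper names are made up for illustration):

```python
def cpus_to_mask(cpus):
    """Build the hex string to write to smp_affinity from a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu          # bit n selects CPU n
    return format(mask, "x")


def mask_to_cpus(hex_mask):
    """List the CPU numbers permitted by an smp_affinity hex mask."""
    # Large systems print the mask in comma-separated 32-bit groups.
    value = int(hex_mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    print(mask_to_cpus("ffff"))   # CPUs allowed by the default mask
    print(cpus_to_mask([3]))      # mask that pins the IRQ to CPU 3
```

So moving the interrupt to CPU 3 alone means writing 8, and allowing CPUs 0 and 3 together means writing 9.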

Hi, I hope this time some of the questions will be answered.

ENVIRONMENT:
KERNEL 2.6.21
DRIVER E1000
NAPI DISABLED
NO. OF CPU=14
Q.1. Does the driver code set interrupt affinity to all the cores only when NAPI is enabled? (This is not present in the open source code.)

Q.2. When NAPI is disabled, all the interrupts land on only one CPU. Why? (/proc/interrupts)

Q.3. If I try to set its affinity in /proc (the default is ffff, but cat /proc/interrupts reports that only one CPU is receiving all the interrupts from eth0), the kernel panics.
---------------------------------------------------------------
root@:/proc/irq/45> cat smp_affinity
ffff
---------------------------------------------------------------
CPU00 CPU01 CPU02 CPU03 CPU04 CPU05 CPU06 CPU07

45: 0 0 0 808204 0 0 0 0 CIU eth0, eth1

----------------------------------------------------------------
Cpu0 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 0.0%sy, 0.0%ni, 81.5%id, 0.0%wa, 5.3%hi, 12.9%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
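The /proc/interrupts row above can also be checked mechanically, which helps when comparing snapshots before and after an affinity change. A minimal sketch, using the header and row exactly as posted; the function name parse_irq_row is made up for illustration, and the parser assumes one count column per CPU named in the header:

```python
def parse_irq_row(header, row):
    """Split one /proc/interrupts row into IRQ number, per-CPU counts and description."""
    cpus = header.split()
    fields = row.split()
    irq = fields[0].rstrip(":")
    counts = [int(x) for x in fields[1:1 + len(cpus)]]
    description = " ".join(fields[1 + len(cpus):])
    return irq, dict(zip(cpus, counts)), description


HEADER = "CPU00 CPU01 CPU02 CPU03 CPU04 CPU05 CPU06 CPU07"
ROW = "45: 0 0 0 808204 0 0 0 0 CIU eth0, eth1"

irq, per_cpu, description = parse_irq_row(HEADER, ROW)
busiest = max(per_cpu, key=per_cpu.get)

if __name__ == "__main__":
    print("IRQ", irq, "(" + description + ")", "is handled mostly by", busiest)
```

For this row every one of the 808204 interrupts on IRQ 45 was counted on CPU03, which matches the hi and si figures for Cpu3 in the top output.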

Throughput degradation

I disabled NAPI and found that all the interrupts were going to one CPU. I then put a spinlock in my driver code and compiled it. During execution I set the IRQ affinity to multiple cores by writing to the proc file. It worked fine, but the outcome was that CPU utilization went up. That is understandable. However, the throughput of the NIC degraded, even though I now have many CPUs processing interrupts.
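One possible explanation, offered only as an assumption since the scope of the spinlock is not shown: if the lock covers most of the interrupt handler, the handlers on different CPUs still run one at a time, and the extra CPUs mostly spin while waiting. Amdahl's law gives an upper bound on the gain for a given serialized fraction. The sketch below only evaluates that bound; it ignores lock contention and cache-line transfers between CPUs, which can push real throughput below the single-CPU figure.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Upper bound on speed-up when serial_fraction of the work runs under one lock."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


if __name__ == "__main__":
    # serial_fraction values are illustrative guesses, not measurements.
    for serial_fraction in (1.0, 0.9, 0.5, 0.1):
        bound = amdahl_speedup(serial_fraction, 14)
        print("locked fraction", serial_fraction, "-> at most", round(bound, 2), "x")
```

With the whole handler under the lock the bound is exactly 1x no matter how many CPUs take the interrupt, so higher CPU utilization without higher throughput is what this model would predict.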