I'm running a test against 2.6.9-42.7.ELsmp. We are considering upgrading from 2.4.21-32.0.1.ELsmp to this version, hence the test. However, when I run the test, the Linux box becomes very slow to respond: a command can take 1-5 minutes to return. Even the date command takes 1.5 minutes:
Thu Sep 6 17:21:31 EDT 2007
Thu Sep 6 17:23:01 EDT 2007
I'm ssh'ing through eth0 so I looked there but found no errors:
There is a tool we use to emulate BGP peers. I noticed that the response was really slow only while the tool was running. When I stopped the tool, the response time returned to normal. What's strange to me is that there doesn't seem to be much more free CPU than before, yet the response has greatly improved.
 PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
9656 root 15  0 190m 20m 460 S    4  2.1 2:08.26 routem.latest
9665 root 15  0 190m 20m 460 S    4  2.1 2:07.64 routem.latest
9641 root 15  0 190m 20m 460 R    4  2.1 2:07.39 routem.latest
"routem" is the tool we use to emulate the BGP peers. I'm not sure why iostat would report idle CPU while top shows none. However, since the 1, 5 and 15 minute averages are all over 100%, I can't really attribute the difference between iostat and top to the CPU required to run top.
We run Red Hat Enterprise Linux here, and I think the latest deployed in our kickstart process is v.4 update 4. :-(
Those load averages aren't in %; they are the average number of processes trying to run. A load average of 1 represents a fully occupied single-core machine. If your machine is dual core, then a load average of 2 represents a perfectly loaded host. So if you're running a dual-processor host, then your load average is actually at 5000%.
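To make the arithmetic concrete, here is a minimal Python sketch that converts a load average into a percentage of the available processors. The function names `load_percent` and `current_load_percent` are made up for illustration, and the input of 100 is a hypothetical value, chosen only because it reproduces the 5000% figure on a two-processor host.

```python
import os


def load_percent(load_average, cpu_count):
    """Express a load average as a percentage of the available processors.

    A load of 1.0 per processor counts as 100%.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return 100.0 * load_average / cpu_count


def current_load_percent():
    """Return the 1, 5 and 15 minute load averages of this host as percentages."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return tuple(load_percent(avg, cpus) for avg in os.getloadavg())


# Hypothetical input: a load average of 100 on a two-processor host.
print(load_percent(100, 2))
```

On a real host, `current_load_percent()` gives the same calculation for the live 1, 5 and 15 minute averages.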
Looks like the kernel is really chewing on something ( 88.7% sy ).
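If you want to cross-check the user/system/idle split that top and iostat report, both ultimately derive it from the counters on the `cpu` line of /proc/stat. The following is a rough Python sketch of that calculation, not what either tool literally does; the function names are invented for illustration, and the field order assumes the 2.6-style layout (user, nice, system, idle, iowait, irq, softirq).

```python
# Field order of the aggregate "cpu" line in /proc/stat (2.6-style layout).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Turn a 'cpu  u n s i ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read the current aggregate counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

Take one sample with `read_cpu_sample()`, wait a few seconds, take another, and pass both to `cpu_percentages()`. A large `system` share over the interval would agree with the high sy figure in top.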
To be honest, the upgrade from a 2.4 to a 2.6 kernel isn't trivial. Is the rest of the software on the host updated as well? There are numerous changes to binutils and other packages required for the 2.6 kernel, if memory serves.
It turns out that the problem was with the BGP emulator. It loops through all the peers it is emulating, then sleeps 10000 usec. This appears to have been fine on the 2.4 kernel, but on the 2.6 kernel it causes problems. Increasing the sleep to 50000 usec helped quite a bit but still doesn't completely resolve the problem.
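One way to compare the two kernels is to time how long a requested sleep really lasts on each. The sketch below is only an illustration in Python, not routem's code. The name `measure_sleep` and the sample count are assumptions, and it says nothing about why the two kernels might differ.

```python
import time


def measure_sleep(requested_usec, samples=20):
    """Return the average real duration, in seconds, of a requested sleep."""
    requested_sec = requested_usec / 1_000_000
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_sec)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


# Compare the two intervals mentioned above.
for usec in (10000, 50000):
    average = measure_sleep(usec, samples=5)
    print(f"requested {usec} usec, measured about {average * 1_000_000:.0f} usec")
```

Running this on both the 2.4 and the 2.6 host would show whether the same 10000 usec request lets the loop come around more often on one of them.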