LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Old 08-09-2012, 04:47 AM   #1
iluvatar
Member
 
Registered: Jul 2003
Location: netherlands
Distribution: debian
Posts: 403

Rep: Reputation: 30
localtimer and rescheduling interrupts going through the roof


Hi everybody,

First some background information. We have a blade setup with 10 Dell PowerEdge 1955 blades, all configured exactly the same. They all run an untouched, fresh Debian Squeeze install. Each blade has two quad-core CPUs (Xeon E5345), and all servers run the same kernel: Linux 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux.

We've written clustered index-building software which runs on these machines. This software is already live in a production environment on another set of blades (also all the same model), where it works correctly.

On the new blades, we noticed a very big performance problem, caused by a single blade. Upon further analysis, I saw that the blade causing the problems had a huge number of interrupts. I wrote a script to analyze the interrupt counts, recording the number of interrupts in the last 5 seconds. Here is an extract from the output I got while our software was running:
Quote:
normal blade, localtimer interrupts:
6987
7219
7168
7031
6884
7166
6846
6699
7416
7018
Quote:
problem blade, localtimer interrupts:
20292
15861
17124
15181
18748
25346
15790
15386
14714
16959
Quote:
normal blade, rescheduling interrupts:
27
30
37
50
5
23
26
35
18
21
Quote:
problem blade, rescheduling interrupts:
4281
5139
8334
4908
5115
4492
4920
5972
5268
5596
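For anyone who wants to reproduce this kind of measurement: the original script is not shown here, so the following is only a minimal sketch of how such counts could be collected, assuming the usual /proc/interrupts layout (a header row of CPU names, then one row per source with one counter per CPU, where LOC is the local timer and RES the rescheduling interrupts). The function names parse_interrupts, delta, sample and monitor are made up for this illustration.

```python
import time


def parse_interrupts(text):
    """Sum the per-CPU counters of each row of /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: counters" row
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the textual description
            total += int(field)
        totals[label.strip()] = total
    return totals


def delta(before, after, labels=("LOC", "RES")):
    """Interrupts seen between two snapshots, for the chosen rows."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in labels}


def sample(path="/proc/interrupts"):
    """Read and parse the current counters."""
    with open(path) as fh:
        return parse_interrupts(fh.read())


def monitor(interval=5, rounds=10):
    """Print the LOC and RES deltas once per interval (assumed: 5 s)."""
    prev = sample()
    for _ in range(rounds):
        time.sleep(interval)
        cur = sample()
        print(delta(prev, cur))
        prev = cur
```

Calling monitor() on a normal blade and on the problem blade under the same load would give two series that can be compared directly. The counters are summed over all CPUs here; keeping them per CPU instead would show whether the extra interrupts are concentrated on particular cores.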
Here I'm stuck, however: I'm not a kernel hacker and don't know how to debug, analyse or test this further. How can I see what's really going on here, and what are my options for testing? Are there certain kernel parameters to tweak? Could it be faulty hardware, and if so, how do I determine what is broken?

Any help would be very welcome, I can post more details if you need to know anything else.

[EDIT]
I followed the directions in this document I found: https://help.ubuntu.com/community/Re...lingInterrupts, and tried all the kernel parameters listed there (acpi=noirq, acpi=off, noapic and nolapic), but this didn't change anything. Unfortunately, I don't have access to the BIOS right now... Are there other options to try?

Last edited by iluvatar; 08-09-2012 at 07:36 AM. Reason: added more things I tried
 
Old 08-09-2012, 01:38 PM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
Take the blade offline and replace it with another one. If the problem goes away, consider the problem solved. (Because, in fact, it is.)

My best guess is that something is preventing timer interrupts from being serviced in a timely manner, or maybe the real-time clock is screwed. Or maybe the system is otherwise being flooded with interrupts.

Doesn't matter, because your goal is to get production done. There's gonna be zero return-on-investment for you futzing around with it. If replacing this blade without figuring out why solves the problem, don't bother to figure out why. Send it back to the manufacturer and ask for another one. If you're a good customer with a good service rep, you'll get it.

Last edited by sundialsvcs; 08-09-2012 at 01:40 PM.
 
Old 09-28-2012, 05:54 AM   #3
rew
Member
 
Registered: May 2010
Posts: 36

Rep: Reputation: 3
I'm having a similar problem. My workstation encounters lots of interrupts:
about 120 thousand to 150 thousand per second, i.e. a lot more than the thread starter here....

It's just that I thought my system would be more or less idle (with a few hundred interrupts per second, max) when I wouldn't touch it.
 
Old 09-28-2012, 07:33 AM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
Same recommendation. Call the vendor and tell them to bring you another one. They can go home and figure out why it's busted on their own time. If you've got ten supposedly "identical" computers, all running the same software, and "one" of them is the odd-man out, "it's hardware. Gotta be." And therefore, it's someone else's job to figure out why. They can bring you a rental car for the interim.
 
  

