LinuxQuestions.org

Forum: Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
Thread: Kipmi0 eating up to 99.8% cpu on centos 6.4 (https://www.linuxquestions.org/questions/linux-server-73/kipmi0-eating-up-to-99-8-cpu-on-centos-6-4-a-4175460915/)

newbie14 05-06-2013 02:07 PM

Kipmi0 eating up to 99.8% CPU on CentOS 6.4
 
We are running CentOS 6.4, and kipmi0 is showing 99.8% CPU and 0.0% memory, with a load average of 1.00. What should we do to rectify this? Thank you.

siremaxus 05-06-2013 02:40 PM

Hi,

Maybe you can find your answer here:
http://www.serveradminblog.com/2011/02/kipmi0-problem/

Hope it helps.

Sire Maxus

newbie14 05-06-2013 02:43 PM

Dear Sire,
I can't find /etc/sysconfig/lm_sensors on my CentOS 6.4 system.

newbie14 05-10-2013 12:24 PM

Hi,
Any help or pointers on how to resolve this?

siremaxus 05-10-2013 12:29 PM

Hi,

I've been thinking about this: if you don't have the lm_sensors package installed, then the problem must lie elsewhere.

Can you post more output from the "top" command? Also, install the sysstat package (yum install sysstat -y) and run some tests on your box.
Try iostat and "vmstat 5 5" and post the results here.

Good Luck

Sire Maxus
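To gather the samples requested above in one go, the tools can be run one after another with their output collected in a single file. A minimal sketch, assuming a POSIX shell; collect_stats and run_section are hypothetical helper names, and the sysstat package is assumed to provide iostat and sar:

```sh
#!/bin/sh
# Capture samples from top, vmstat, iostat and sar into one text file.
# Hypothetical helpers; a tool that is missing is noted in the file, not fatal.
# Usage: collect_stats OUTFILE [INTERVAL_SECONDS] [COUNT]

# Append one labelled section to $out: $1 = tool to check for, rest = command.
run_section() {
  tool=$1
  shift
  echo "=== $*" >> "$out"
  if command -v "$tool" >/dev/null 2>&1; then
    "$@" >> "$out" 2>&1 || echo "(exit status $?)" >> "$out"
  else
    echo "($tool is not installed)" >> "$out"
  fi
}

collect_stats() {
  out=$1
  interval=${2:-5}
  count=${3:-5}
  : > "$out"                                            # create or truncate the file
  run_section top    top -b -n "$count" -d "$interval"  # -b: batch mode, no terminal needed
  run_section vmstat vmstat "$interval" "$count"
  run_section iostat iostat -x "$interval" "$count"     # iostat comes from sysstat
  run_section sar    sar -u "$interval" "$count"        # sar comes from sysstat
}
```

For example, collect_stats /root/stats.txt 5 5 produces a file that can be attached to a post. The tools run sequentially, so with the defaults expect it to take a minute or two.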

newbie14 05-11-2013 01:42 AM

Hi,
What kind of tests should I run with sysstat? I'm not clear on that. How many iostat and top samples do you want? Thank you.

newbie14 05-12-2013 02:02 AM

Hi Sire,
Below are some samples of the data I captured. Please let me know if they are not sufficient.

Quote:

top - 11:53:32 up 24 days, 23:03, 1 user, load average: 1.24, 1.10, 1.04
Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7990988k total, 1787480k used, 6203508k free, 165400k buffers
Swap: 8126456k total, 0k used, 8126456k free, 1333796k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
130 root 39 19 0 0 0 R 99.8 0.0 26939:03 kipmi0
1059 root 20 0 0 0 0 S 0.3 0.0 1:06.99 jbd2/dm-2-8
7268 mysql 20 0 2685m 90m 7020 S 0.3 1.2 58:05.13 mysqld
1 root 20 0 19228 1500 1220 S 0.0 0.0 0:00.78 init


top - 11:53:47 up 24 days, 23:04, 1 user, load average: 1.18, 1.09, 1.04
Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7990988k total, 1787604k used, 6203384k free, 165400k buffers
Swap: 8126456k total, 0k used, 8126456k free, 1333796k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
130 root 39 19 0 0 0 R 99.8 0.0 26939:18 kipmi0
1 root 20 0 19228 1500 1220 S 0.0 0.0 0:00.78 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.06 kthreadd


top - 11:59:42 up 24 days, 23:10, 1 user, load average: 1.07, 1.08, 1.03
Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7990988k total, 1787372k used, 6203616k free, 165400k buffers
Swap: 8126456k total, 0k used, 8126456k free, 1333812k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
130 root 39 19 0 0 0 R 99.8 0.0 26945:12 kipmi0
4926 root 20 0 15128 1404 1008 R 0.3 0.0 0:01.05 top
1 root 20 0 19228 1500 1220 S 0.0 0.0 0:00.78 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.06 kthreadd


vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 6203740 165400 1333816 0 0 0 5 1 1 0 9 90 0 0
1 0 0 6203600 165400 1333836 0 0 0 7 1039 34 0 13 87 0 0
1 0 0 6203464 165408 1333836 0 0 0 2 1042 32 0 13 87 0 0
1 0 0 6203464 165408 1333836 0 0 0 5 1126 122 1 13 87 0 0
1 0 0 6203464 165408 1333836 0 0 0 2 1033 30 0 13 87 0 0



vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 6201780 165616 1335004 0 0 0 5 2 1 0 9 90 0 0
1 0 0 6201640 165616 1335004 0 0 0 119 1150 134 1 13 87 0 0
1 0 0 6201640 165616 1335004 0 0 0 4 1033 32 0 12 87 0 0
1 0 0 6201648 165616 1335004 0 0 0 26 1074 229 0 13 87 0 0
1 0 0 6201584 165616 1335004 0 0 0 12 1036 38 0 13 87 0 0

vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 6201768 165616 1335052 0 0 0 5 2 1 0 9 90 0 0
1 0 0 6201884 165616 1335052 0 0 0 0 1034 27 0 13 87 0 0
1 0 0 6201884 165616 1335052 0 0 0 0 1032 30 0 13 88 0 0
1 0 0 6201884 165616 1335052 0 0 0 0 1035 28 0 13 87 0 0
1 0 0 6201884 165616 1335052 0 0 0 7 1034 32 0 13 87 0 0



sar -u 1 3
Linux 2.6.32-358.2.1.el6.x86_64 (localhost.localdomain) 05/12/2013 _x86_64_ (8 CPU)

12:07:22 PM CPU %user %nice %system %iowait %steal %idle
12:07:23 PM all 0.00 0.00 12.62 0.00 0.00 87.38
12:07:24 PM all 0.00 0.00 12.50 0.00 0.00 87.50
12:07:25 PM all 0.00 0.00 12.62 0.00 0.00 87.38
Average: all





sar -P ALL 1 1
Linux 2.6.32-358.2.1.el6.x86_64 (localhost.localdomain) 05/12/2013 _x86_64_ (8 CPU)

12:08:17 PM CPU %user %nice %system %iowait %steal %idle
12:08:18 PM all 0.00 0.00 12.50 0.00 0.00 87.50
12:08:18 PM 0 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 1 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 2 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 3 0.00 0.00 100.00 0.00 0.00 0.00
12:08:18 PM 4 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 5 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 6 0.00 0.00 0.00 0.00 0.00 100.00
12:08:18 PM 7 0.00 0.00 0.00 0.00 0.00 100.00

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.00 0.00 12.50 0.00 0.00 87.50
Average: 0 0.00 0.00 0.00 0.00 0.00 100.00
Average: 1 0.00 0.00 0.00 0.00 0.00 100.00
Average: 2 0.00 0.00 0.00 0.00 0.00 100.00
Average: 3 0.00 0.00 100.00 0.00 0.00 0.00
Average: 4 0.00 0.00 0.00 0.00 0.00 100.00
Average: 5 0.00 0.00 0.00 0.00 0.00 100.00
Average: 6 0.00 0.00 0.00 0.00 0.00 100.00
Average: 7 0.00 0.00 0.00 0.00 0.00 100.00




sar -P ALL 1 1
Linux 2.6.32-358.2.1.el6.x86_64 (localhost.localdomain) 05/12/2013 _x86_64_ (8 CPU)

12:08:50 PM CPU %user %nice %system %iowait %steal %idle
12:08:51 PM all 0.00 0.00 12.50 0.12 0.00 87.38
12:08:51 PM 0 0.00 0.00 0.00 1.00 0.00 99.00
12:08:51 PM 1 0.00 0.00 0.00 0.00 0.00 100.00
12:08:51 PM 2 0.00 0.00 0.00 0.00 0.00 100.00
12:08:51 PM 3 0.00 0.00 100.00 0.00 0.00 0.00
12:08:51 PM 4 0.00 0.00 0.00 0.00 0.00 100.00
12:08:51 PM 5 0.00 0.00 0.00 0.00 0.00 100.00
12:08:51 PM 6 0.00 0.00 0.00 0.00 0.00 100.00
12:08:51 PM 7 0.00 0.00 0.00 0.00 0.00 100.00


sar -q 1 3
Linux 2.6.32-358.2.1.el6.x86_64 (localhost.localdomain) 05/12/2013 _x86_64_ (8 CPU)

12:10:17 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
12:10:18 PM 1 246 1.01 1.02 1.00
12:10:19 PM 1 246 1.01 1.02 1.00
12:10:20 PM 1 247 1.01 1.02 1.00
Average: 1 246 1.01 1.02 1.00


Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.00 0.00 12.50 0.12 0.00 87.38
Average: 0 0.00 0.00 0.00 1.00 0.00 99.00
Average: 1 0.00 0.00 0.00 0.00 0.00 100.00
Average: 2 0.00 0.00 0.00 0.00 0.00 100.00
Average: 3 0.00 0.00 100.00 0.00 0.00 0.00
Average: 4 0.00 0.00 0.00 0.00 0.00 100.00
Average: 5 0.00 0.00 0.00 0.00 0.00 100.00
Average: 6 0.00 0.00 0.00 0.00 0.00 100.00
Average: 7 0.00 0.00 0.00 0.00 0.00 100.00



sar -P ALL 1 1
Linux 2.6.32-358.2.1.el6.x86_64 (localhost.localdomain) 05/12/2013 _x86_64_ (8 CPU)

12:26:56 PM CPU %user %nice %system %iowait %steal %idle
12:26:57 PM all 0.00 0.00 12.61 0.00 0.00 87.39
12:26:57 PM 0 0.00 0.00 0.00 0.00 0.00 100.00
12:26:57 PM 1 0.00 0.00 100.00 0.00 0.00 0.00
12:26:57 PM 2 0.00 0.00 0.00 0.00 0.00 100.00
12:26:57 PM 3 0.00 0.00 0.00 0.00 0.00 100.00
12:26:57 PM 4 0.00 0.00 0.00 0.00 0.00 100.00
12:26:57 PM 5 0.00 0.00 0.00 0.00 0.00 100.00
12:26:57 PM 6 0.00 0.00 0.99 0.00 0.00 99.01
12:26:57 PM 7 0.00 0.00 0.00 0.00 0.00 100.00

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.00 0.00 12.61 0.00 0.00 87.39
Average: 0 0.00 0.00 0.00 0.00 0.00 100.00
Average: 1 0.00 0.00 100.00 0.00 0.00 0.00
Average: 2 0.00 0.00 0.00 0.00 0.00 100.00
Average: 3 0.00 0.00 0.00 0.00 0.00 100.00
Average: 4 0.00 0.00 0.00 0.00 0.00 100.00
Average: 5 0.00 0.00 0.00 0.00 0.00 100.00
Average: 6 0.00 0.00 0.99 0.00 0.00 99.01
Average: 7 0.00 0.00 0.00 0.00 0.00 100.00

siremaxus 05-12-2013 03:38 AM

Hello,

Your system looks OK apart from kipmi0 using 99.8% CPU, and that is 99.8% of a single core, not of all CPUs.
Overall, your system is using 12.5% to 12.6% CPU on average, and iowait sometimes reaches 0.12%, which is not a bad number either.

If you could post the output of this command:
#ps faux > /root/process.txt

and then upload that file, we could check kipmi0 and any other process that could be causing kipmi0 to use that much CPU.

Good Luck

Sire Maxus
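The 12.5% figure follows from simple arithmetic: top's per-process %CPU counts one core as 100%, while top's Cpu(s) line and sar's "all" row average over every core, so one fully busy core out of 8 shows up as 100 / 8 = 12.5%. A small sketch of that conversion; machine_share is a hypothetical helper name, and nproc is assumed to report the same CPUs that sar lists:

```sh
#!/bin/sh
# Convert a per-process %CPU from top (100% = one core) into a share of the
# whole machine, i.e. the figure shown on top's Cpu(s) line or sar's "all" row.
# Usage: machine_share PERCENT [CORES]   (CORES defaults to the output of nproc)
machine_share() {
  cores=${2:-$(nproc)}
  awk -v p="$1" -v n="$cores" 'BEGIN {
    if (n + 0 <= 0) { print "core count must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", p / n
  }'
}
```

For example, machine_share 100 8 prints 12.50, matching the 12.5%sy in the top output, while sar -P ALL shows the busy core itself (CPU 3, and CPU 1 in the last sample) at 100% %system.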

newbie14 05-12-2013 04:27 AM

1 Attachment(s)
Hi Sire,
What level of I/O wait would be considered bad or dangerous? I have uploaded the requested file. Thank you, I appreciate your help.

newbie14 05-13-2013 09:03 AM

Hi Sire,
Any updates based on the process list? Thank you.

siremaxus 05-13-2013 09:39 AM

Hi,

As far as I know, up to 20-25% iowait is considered acceptable; more than that signals an issue with the storage devices.

Code:

root      129  0.0  0.0      0    0 ?        S    Apr17  0:00  \_ [pciehpd]
root      130 75.1  0.0      0    0 ?        RN  Apr17 27255:54  \_ [kipmi0]

The process "pciehpd" is related to PCIe hot-plug, and that is what is causing kipmi0 to use that much CPU.
Maybe some piece of hardware was attached recently, or the service didn't update cleanly when you updated your system.

One thing I haven't asked yet: have you rebooted your system after the update?

Sire Maxus
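To watch iowait over a longer period, the wa column of vmstat (the same quantity sar -u reports as %iowait) can be checked against a threshold such as the 20-25% mentioned above. A sketch under that assumption; high_iowait is a hypothetical helper name, and it locates the wa column by its header so it does not depend on one vmstat version's column order:

```sh
#!/bin/sh
# Print vmstat sample rows whose iowait ("wa" column) exceeds a threshold.
# The column is found by name on the "r b swpd ..." header line.
# Usage: vmstat 5 720 | high_iowait 20
high_iowait() {
  awk -v limit="$1" '
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
    col && ($col + 0) > (limit + 0) { print }
  '
}
```

For example, vmstat 5 720 | high_iowait 20 samples every 5 seconds for an hour and prints only the rows above 20%. Keep in mind that vmstat's first row is an average since boot rather than a live sample.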

newbie14 05-13-2013 09:49 AM

Hi Sire,
I am sorry, I'm kind of new to this area. Which of the previous commands is best for monitoring IOWAIT? IOWAIT signifies that there is a delay in the hard disk, right? I am sure there is no additional hardware attached to this machine, so possibly the service didn't update cleanly. How do I ensure a clean update? I always just run yum update, that's it. Normally I don't reboot, but I did reboot once last month; after the reboot everything was OK, then it slowly climbed back to these values. I am curious how you linked pciehpd to kipmi0. What exactly is the role of kipmi0?

siremaxus 05-13-2013 10:24 AM

Hello,

The "f" option to ps shows all processes as a tree, with each parent and its child processes.
There you have the process "pciehpd", which is a parent process of the "kipmi0" process; you can see the relationships between processes from the lines drawn to the left of each process name.

You can find more info on kipmi in these links:
http://www-01.ibm.com/support/docvie...2575fa0050f604
http://lists.us.dell.com/pipermail/l...ay/031305.html
https://supportcenter.checkpoint.com...tionid=sk43262
http://www.linux-archive.org/red-hat...el-thread.html

I hope this helps you,

Sire Maxus
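To check which process really spawned kipmi0, the PPid field in /proc/<pid>/status gives the parent directly. In ps f output a child is drawn one level further right than its parent; in the excerpt above [pciehpd] and [kipmi0] sit at the same depth, which suggests they are siblings (probably both children of kthreadd, PID 2, which starts kernel threads on kernels of this era) rather than parent and child. A sketch under those assumptions; proc_field, parent_info and pids_by_name are hypothetical helper names:

```sh
#!/bin/sh
# Hypothetical helpers that read process ancestry straight from /proc,
# so they work even where ps is not installed.

# Print one field (e.g. Name, PPid) from /proc/<pid>/status.
proc_field() {
  awk -v f="$2:" '$1 == f { print $2; exit }' "/proc/$1/status"
}

# Print "PID NAME PPID PARENT_NAME" for one process.
parent_info() {
  name=$(proc_field "$1" Name)
  ppid=$(proc_field "$1" PPid)
  # PID 0 (the parent of PID 1) has no /proc entry, so fall back to "-".
  pname=$(proc_field "$ppid" Name 2>/dev/null) || pname=
  echo "$1 $name $ppid ${pname:--}"
}

# List PIDs whose short name matches exactly, e.g. pids_by_name kipmi0.
pids_by_name() {
  for s in /proc/[0-9]*/status; do
    # A process may exit between the glob and the read; ignore that.
    awk -v n="$1" '$1 == "Name:" && $2 == n { print FILENAME; exit }' "$s" 2>/dev/null || true
  done | sed 's|^/proc/\([0-9]*\)/status$|\1|'
}
```

On the box in this thread, for p in $(pids_by_name kipmi0); do parent_info "$p"; done would be expected to print a line starting with 130 kipmi0, followed by the parent's PID and name.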

newbie14 05-13-2013 10:28 AM

Hi sire,
I had actually already visited all of those links via Google, and none of the suggestions work; for example, service ipmi stop does not work either. I am quite lost on how exactly to solve this. Do you think another reboot would help? Is there any way to ensure a clean yum update?

siremaxus 05-13-2013 10:40 AM

Hi,
I thought a reboot might help, but if you have already rebooted and the process still ends up hogging the CPU, then that is not a solution.
From all the links I've read, kipmi0 is related to IPMI (Intelligent Platform Management Interface), a set of interfaces usually used to monitor hardware, or used by some applications to monitor processes.
Of the links in my previous post, the IBM one says that high CPU usage reported for kipmi does not matter: it only runs during idle time, and this is standard behavior for the process.

If it bothers you that the process uses so much CPU, then you can disable it as described in the following links:
http://www.novell.com/support/kb/doc.php?id=7003352 (Novell SUSE)
http://unix.stackexchange.com/questi...-on-centos-6-4 (CentOS)

Good Luck

Sire Maxus
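Instead of disabling the driver, another option for this symptom is to cap how long kipmi0 busy-waits for the BMC, via the ipmi_si parameter kipmid_max_busy_us (0 means no cap). A sketch that reads and, if asked, sets it through sysfs; it assumes the running kernel exposes this parameter (the function reports it if the file is missing) and that you are root when setting a value. kipmid_busy and the KIPMID_PARAM override are hypothetical names:

```sh
#!/bin/sh
# Show, and optionally set, the ipmi_si "kipmid_max_busy_us" parameter,
# which limits how long kipmi0 busy-waits before sleeping (0 = no limit).
# Usage: kipmid_busy [NEW_VALUE]      (setting a value requires root)
# KIPMID_PARAM may point at another file, e.g. for testing.
kipmid_busy() {
  param=${KIPMID_PARAM:-/sys/module/ipmi_si/parameters/kipmid_max_busy_us}
  if [ ! -e "$param" ]; then
    echo "no $param: ipmi_si not loaded or parameter not available" >&2
    return 1
  fi
  if [ -n "$1" ]; then
    case $1 in
      *[!0-9]*) echo "value must be a whole number of microseconds" >&2; return 2 ;;
    esac
    echo "$1" > "$param" || return 1
  fi
  cat "$param"
}
```

For example, run kipmid_busy 100 as root and watch kipmi0 in top. To keep such a setting across reboots, it would typically go in a file under /etc/modprobe.d/ (options ipmi_si kipmid_max_busy_us=100) when ipmi_si is a module, or on the kernel command line as ipmi_si.kipmid_max_busy_us=100 when the driver is built into the kernel.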

newbie14 05-13-2013 10:49 AM

Hi Sire,
It is quite challenging: I followed the CentOS solution, but I can't find the file /etc/sysconfig/lm_sensors there either. I am practically lost and have no idea what to do next, as disabling it does not work either.

siremaxus 05-13-2013 11:05 AM

Hi,

How about the "ipmi" service in the "workaround" section of "Potential cause #1"? Does it work?

In case the ipmi service is not there, what does the "dmesg" command output?
#dmesg > /root/dmesg.txt

Good Luck

Sire Maxus
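When going through a saved dmesg log like the one requested above, the most relevant lines are the IPMI driver messages and the DMI/SMBIOS lines the kernel prints about the firmware's hardware tables. A minimal sketch; ipmi_lines is a hypothetical name, and the pattern is an assumption about which lines matter:

```sh
#!/bin/sh
# Print IPMI driver messages and DMI/SMBIOS firmware lines from a dmesg log.
# Usage: ipmi_lines /root/dmesg.txt    or    dmesg | ipmi_lines
ipmi_lines() {
  # "DMI" and "SMBIOS" are matched case-sensitively so that words such as
  # "admin" do not match; "ipmi" is matched in any letter case.
  grep -E 'DMI|SMBIOS|[Ii][Pp][Mm][Ii]' "${1:--}"
}
```

grep exits with status 1 when nothing matches, which here would mean the log has no IPMI or DMI lines at all.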

newbie14 05-13-2013 11:22 AM

1 Attachment(s)
Hi Sire,
If you mean service ipmi stop, no, that does not work either. It gives me "ipmi:unrecognized service". I have attached the dmesg.txt file too.

siremaxus 05-13-2013 12:11 PM

Hi,

Looking at your dmesg file, I found that the server is a Cisco UCS C200 M2 (right?)

Then, searching the Cisco support forums, I found this:
Problem with sensor that is corrected with a Bios upgrade
https://supportforums.cisco.com/message/3530456#3530456

I'll let you know if I find anything else.

Good Luck,

Sire Maxus

newbie14 05-13-2013 12:49 PM

Hi Sire,
Yes, you are right, we are using a Cisco UCS C200 M2. If we upgrade the BIOS or firmware, I am not sure whether that will affect the servers' warranty. What is your opinion on this?

siremaxus 05-13-2013 01:31 PM

Hello,

If your servers are still under warranty, you can open a support case with Cisco and ask them whether the BIOS update would solve the issue you are having.
Take advantage of your support contract; it may lead to a hardware replacement if they detect faulty hardware.
Usually support will ask you to update the BIOS, CIMC, firmware, etc. and test whether that solves the issue; if not, then they begin to dig deeper.
For every update or upgrade, you can ask Cisco support for the procedure that follows best practices.

Good Luck

Sire Maxus
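Before opening a support case it helps to quote the exact model and current BIOS version. On Linux these are normally readable from the DMI files under /sys/class/dmi/id (dmidecode -s system-product-name and dmidecode -s bios-version give the same information as root). A sketch assuming that sysfs directory is present; dmi_summary is a hypothetical name:

```sh
#!/bin/sh
# Print the vendor, model and BIOS details from the kernel's DMI sysfs files.
# Usage: dmi_summary [DIR]    (DIR defaults to /sys/class/dmi/id)
dmi_summary() {
  dir=${1:-/sys/class/dmi/id}
  for f in sys_vendor product_name bios_vendor bios_version bios_date; do
    if [ -r "$dir/$f" ]; then
      printf '%-13s %s\n' "$f:" "$(cat "$dir/$f")"
    else
      printf '%-13s %s\n' "$f:" "(unavailable)"   # missing or unreadable
    fi
  done
}
```

Running it before and after a BIOS update is a quick way to confirm the new version is actually in place.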

newbie14 05-13-2013 01:34 PM

Hi Sire,
I will definitely focus on that now, but in case you find any clue on the Linux side, do let me know so I can test it too. Thank you.

Petruha69 10-02-2013 06:34 AM

Get rid of kipmi0 (CentOS 6.4)
 
Hi! For some reason I also wanted to get rid of kipmi0 on my CentOS 6.4 x64 installation. It was built into the kernel, so I edited the kernel .config and replaced all *_IPMI*=y entries with =m, rebuilt the kernel and rebooted. No more kipmi0.
Good luck!
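In a kernel .config, =y builds an option into the kernel image, =m builds it as a loadable module, and a "# CONFIG_... is not set" comment leaves it out. The edit described above can be sketched with sed, assuming GNU sed's -i option and that the file is the .config at the top of the kernel source tree; ipmi_to_modules is a hypothetical name. Options that are plain booleans rather than tristate cannot be modules, so running make oldconfig afterwards should let Kconfig flag or correct them:

```sh
#!/bin/sh
# Switch IPMI options in a kernel .config from built-in (=y) to module (=m).
# A copy of the original is kept as CONFIG.orig.
# Usage: ipmi_to_modules [CONFIG]    (CONFIG defaults to ./.config)
ipmi_to_modules() {
  cfg=${1:-.config}
  cp "$cfg" "$cfg.orig" || return 1
  # Matches names like CONFIG_IPMI_SI and CONFIG_ACPI_IPMI; other lines untouched.
  sed -i 's/^\(CONFIG_[A-Z0-9_]*IPMI[A-Z0-9_]*\)=y$/\1=m/' "$cfg" || return 1
  # Show the resulting IPMI settings for review.
  grep '^CONFIG_[A-Z0-9_]*IPMI' "$cfg" || true
}
```

For example, from the kernel source directory: ipmi_to_modules .config && make oldconfig, then build and install the kernel as in the write-up linked later in this thread.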

newbie14 10-02-2013 07:53 AM

Dear Petruha,
How do you rebuild the kernel? What does =m stand for?

Petruha69 10-08-2013 03:47 AM

Dear newbie14,
I summed up what I did on http://myelectrons.com/build-linux-kernel-centos/
I'd be glad if you let me know whether it was helpful or otherwise.
Cheers,
- Serge.


All times are GMT -5.