LinuxQuestions.org
Linux - Enterprise This forum is for all items relating to using Linux in the Enterprise.

06-06-2008, 07:34 AM   #1
rajwinder
LQ Newbie
 
Registered: Jun 2008
Posts: 21

Rep: Reputation: 15
Nice History


Quick question: we have a box which we monitor via Nagios. Sometimes we get a "High CPU" alert, but by the time we get on the box, the process that caused the high CPU has gone back to normal (let's say after 10 minutes or so). Now how can we see what spiked the CPU 2-3 hours back?
 
06-06-2008, 08:14 AM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Nagios check_procs doesn't give you much info, so AFAIK you can't, unless you were already sending or saving detailed process information.
 
06-06-2008, 08:26 AM   #3
rajwinder
LQ Newbie
 
Registered: Jun 2008
Posts: 21

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by unSpawn
Nagios check_procs doesn't give you too much info, so AFAIK you can't, unless you already sent or saved detailed process information.
Yep, Nagios wouldn't provide that... so the only solution is to write a custom script that runs every minute or so, takes a snapshot of top and stores it?
 
06-06-2008, 10:11 AM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
It depends on what you're monitoring performance for. If you need detailed info anyway, maybe you should look into some kind of database-backed solution (search Freshmeat, Sourceforge, Savannah, Berlios). If, OTOH, you only need it to assess what's going wrong *now*, you could run something like 'atop', which writes detailed stats to a file you can step through and replay later on, or have, for instance, check_load trigger something that polls the box over SNMP or HTTP and returns the output of something like '/bin/ps -eo %C -eo pid,command | grep -v '^ 0.0''. OTOH, if you have no idea what the bottleneck or bottlenecks are, you may want to look into more generic stats first, like Atsar or SAR or Dstat or Collectl (which both combine output from the sysstat package tools).
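For the "trigger something when the load is high" route, the capture side could be sketched as the shell script below. This is an illustration under assumptions, not part of Nagios: it reads the 1-minute load from /proc/loadavg itself instead of being called by check_load, the THRESHOLD and LOGDIR variables, the --run flag and the snapshot directory are hypothetical, and `--sort` needs the procps version of `ps`.

```shell
#!/bin/sh
# Sketch: if the 1-minute load average is above a threshold, save the
# processes that are currently using CPU. THRESHOLD, LOGDIR and the --run
# flag are hypothetical choices for this example, not part of any tool.

THRESHOLD="${THRESHOLD:-4.0}"
LOGDIR="${LOGDIR:-/var/tmp/cpu-snapshots}"

# The first field of /proc/loadavg is the 1-minute load average.
load1() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Succeeds if $1 > $2. awk does the comparison because shell
# arithmetic only handles integers.
above() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Keep the ps header line plus every process whose %CPU is not 0.0
# (same idea as the grep -v '^ 0.0' filter in the ps one-liner).
busy_only() {
    awk 'NR == 1 || $1 != "0.0"'
}

# Write one timestamped file of busy processes, but only when loaded.
capture_if_busy() {
    if above "$(load1)" "$THRESHOLD"; then
        mkdir -p "$LOGDIR"
        ps -eo pcpu,pid,user,args --sort=-pcpu | busy_only \
            > "$LOGDIR/$(date +%Y%m%d-%H%M%S).txt"
    fi
}

if [ "${1:-}" = "--run" ]; then
    capture_if_busy
fi
```

Run from cron every minute (for example `* * * * * /usr/local/bin/cpu-snapshot.sh --run`, path hypothetical), it only leaves a file for the minutes when the box was busy, which is what you would want to look at 2-3 hours after the alert.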
 
06-06-2008, 11:42 AM   #5
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
This is one reason most places I've worked ditched Nagios for OpenNMS, which can graph resources, so you can view a complete history of CPU usage, disk space, network traffic, etc.
 
06-06-2008, 11:53 AM   #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by trickykid
OpenNMS which has the capability to graph resources. So you can view a complete history of cpu usage, disk space, network traffic, etc.
Sure, shiny graphs are easy for an overview, but can you *still* zoom in on the gory per-process details with it?
 
06-06-2008, 12:32 PM   #7
rajwinder
LQ Newbie
 
Registered: Jun 2008
Posts: 21

Original Poster
Rep: Reputation: 15
Well, Hobbit is better than Nagios in this respect, since it can keep a snapshot of top.

But I got your point, guys... I'm not looking for a specific process here, so a general top in non-interactive mode will do for me.

Thanks
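A general non-interactive top snapshot like the one described above could be sketched as follows. It assumes procps `top` (where `-b` is batch mode and `-n 1` limits it to a single iteration) and a cron daemon; the log directory, the number of lines kept, the retention period and the --run flag are hypothetical choices.

```shell
#!/bin/sh
# Sketch: append one non-interactive top snapshot to a per-day log file.
# Meant to be run from cron every minute. SNAPDIR, KEEP_LINES, KEEP_DAYS
# and the --run flag are hypothetical settings for this example.

SNAPDIR="${SNAPDIR:-/var/log/top-snapshots}"
KEEP_LINES="${KEEP_LINES:-25}"   # top's summary header plus the first processes listed
KEEP_DAYS="${KEEP_DAYS:-7}"

take_snapshot() {
    mkdir -p "$SNAPDIR"
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        # -b: batch (non-interactive) mode, -n 1: one iteration only
        top -b -n 1 | head -n "$KEEP_LINES"
    } >> "$SNAPDIR/top-$(date +%Y-%m-%d).log"
}

# Remove daily logs that have not been modified for more than KEEP_DAYS days.
prune_old() {
    find "$SNAPDIR" -name 'top-*.log' -mtime +"$KEEP_DAYS" -exec rm -f {} +
}

if [ "${1:-}" = "--run" ]; then
    take_snapshot
    prune_old
fi
```

With a crontab entry such as `* * * * * /usr/local/bin/top-snapshot.sh --run` (hypothetical path), finding what spiked the CPU at, say, 14:30 means opening that day's log and reading the snapshots stamped around 14:30. Since procps top sorts by %CPU by default, the lines kept per snapshot include the busiest processes while keeping the log small.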
 
06-07-2008, 08:45 AM   #8
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Quote:
Originally Posted by unSpawn
Sure, shiny graphs are easy for overview, but can you *still* zoom in on the gory per-process details with it?..
Yup, it supports zooming in on the graph to see smaller time increments within the given window you're viewing..

Oh wait, you wanted to know other details of each process...
 
06-07-2008, 11:47 AM   #9
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by trickykid
Oh wait
I'd rather not wait until you manage to add another of your "invaluable expert" replies.
 
06-08-2008, 12:35 AM   #10
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Quote:
Originally Posted by unSpawn
I'd rather not wait until you manage to add another of your "invaluable expert" replies.
What's that supposed to mean? Are you joking or were you actually serious?

I thought that providing information about OpenNMS, which does graphing, would or could give insight into the user's problem; at the least they could see whether the CPU load actually did spike. In my experience Nagios sometimes produced false alerts. With OpenNMS monitoring not only CPU but networking, processes and just about anything else, it would be easier to narrow down the culprit if there was indeed a CPU spike.
 
06-08-2008, 06:25 AM   #11
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by trickykid
I thought by providing information about OpenNMS in which it graphs would or could give insight to the users problem, they could at least see if the CPU load actually did spike.
That would only apply if I reacted to something in your reply to the OP, which I did not.


Quote:
Originally Posted by trickykid
What's that suppose to mean?
I asked you a question to which you replied
Quote:
Originally Posted by trickykid
Oh wait, you wanted to know other details of each process...
.
So what kind of response is that? What kind of value does a reply like that have?
 
06-08-2008, 08:33 AM   #12
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Quote:
Originally Posted by unSpawn
That would only apply if I reacted to something in your reply to the OP, which I did not.



I asked you a question to which you replied .
So what kind of response is that? What kind of value does a reply like that have?
So only half of my reply gets a response from you? I'm actually offended by your first response, which is what I questioned. You make it sound as if *all* my replies on this forum are of the "invaluable expert" kind. If you honestly feel that way, I'll just stop contributing.

That portion of my reply was half sarcastic, and also me realizing you meant that *zooming* in on gory process details referred to individual processes, not just a snapshot of the load. That's all. But with some custom graphing and monitors, I'm sure it's possible with OpenNMS. Does that satisfy you as a valuable response? I'll be sure to stop any light-hearted discussion in threads you participate in, okay?
 
06-09-2008, 12:13 PM   #13
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
The answer will do, thanks.
 
06-09-2008, 03:27 PM   #14
markseger
Member
 
Registered: Jul 2003
Posts: 244

Rep: Reputation: 26
Hmm... I just saw the note from unSpawn which said "OTOH if you have no idea what is or are the bottlenecks you may want to look into more generic stats first like Atsar or SAR or Dstat or Collectl (which both combine output from the sysstat package tools)."

As the author of collectl I just want to say that collectl has nothing to do with sysstat; it's a completely separate, standalone tool. Also, since this note was posted I've been adding a lot of extra goodies, such as monitoring process I/O stats if you have the right kernel. Someone mentioned detailed process monitoring: by default collectl only looks at processes once every 60 seconds to keep the load down, but if you tell it to look at specific processes you can monitor them every second or so without generating any appreciable load. That means you can watch memory, CPU, I/O and page faults over time.
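To illustrate the kind of per-process sampling described here, the sketch below reads utime and stime from /proc/<pid>/stat once per interval and turns the difference into a CPU percentage. It is not collectl's implementation or interface; the function names are hypothetical, and it assumes a Linux /proc and `getconf CLK_TCK`.

```shell
#!/bin/sh
# Sketch of per-process CPU sampling from /proc/<pid>/stat. This is not
# collectl's code; it only shows the kind of data such tools read.

# Print utime + stime (in clock ticks) for PID $1. The command name
# (field 2) may contain spaces, so awk first strips everything up to the
# last ") "; utime and stime (fields 14 and 15) are then $12 and $13.
cpu_ticks() {
    [ -r "/proc/$1/stat" ] || return 1
    awk '{ sub(/.*\) /, ""); print $12 + $13 }' "/proc/$1/stat"
}

# Print "HH:MM:SS cpu%" for PID $1 every $2 seconds, $3 times.
# Values above 100 are possible for multi-threaded processes.
watch_pid() {
    local pid interval count hz prev now i
    pid=$1 interval=${2:-1} count=${3:-10}
    hz=$(getconf CLK_TCK)
    prev=$(cpu_ticks "$pid") || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        now=$(cpu_ticks "$pid") || return 1   # the process has exited
        printf '%s %s\n' "$(date +%H:%M:%S)" \
            "$(awk -v d="$((now - prev))" -v hz="$hz" -v t="$interval" \
                'BEGIN { printf "%.1f", 100 * d / hz / t }')"
        prev=$now
        i=$((i + 1))
    done
}
```

For example, `watch_pid 1234 1 60` (PID hypothetical) prints one line per second for a minute. Each sample reads a single small /proc file, which fits the point that watching a few specific processes every second adds little load.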

There was also mention of watching memory. Slab monitoring is system-wide, but if you do have a few slabs that are growing uncontrolled, you can sometimes figure out who's using them just by their names, or you can google them and learn more.
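For the slab side, a rough way to see which caches are biggest is to multiply each cache's object count by its object size. The sketch below assumes the "slabinfo - version: 2.x" layout of /proc/slabinfo (name, active_objs, num_objs, objsize, ...), which is normally readable only by root; the function name is hypothetical, and the figure is an approximation that ignores per-slab overhead.

```shell
#!/bin/sh
# Sketch: rank slab caches by approximate memory footprint
# (num_objs * objsize) from /proc/slabinfo-formatted text on stdin.
# Assumes the "slabinfo - version: 2.x" layout, whose first two lines
# are headers. Prints "<KiB> KiB <cache name>", largest first.
top_slabs() {
    awk 'NR > 2 { printf "%d KiB %s\n", $3 * $4 / 1024, $1 }' \
        | sort -rn | head -n "${1:-10}"
}
```

After sourcing it, running `sudo cat /proc/slabinfo | top_slabs 15` a few minutes apart and comparing the lists shows which caches keep growing; procps also ships `slabtop`, which gives a similar live, top-like view.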

Anyhow, be sure to check out http://collectl.sourceforge.net/. Within the next couple of days I expect to release version 2.6.4, which will be able to show the top I/O users in much the same way the top command shows the top CPU consumers. Stay tuned...

-mark

Last edited by markseger; 06-09-2008 at 03:29 PM.
 
06-09-2008, 04:18 PM   #15
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by markseger
Hmm... I just saw the note from unSpawn which said "OTOH if you have no idea what is or are the bottlenecks you may want to look into more generic stats first like Atsar or SAR or Dstat or Collectl (which both combine output from the sysstat package tools)."

As the author of collectl I just want to say collectl has nothing to do with sysstat - it's a completely separate, standalone tool.
You just misread my remark. If I rephrase it as "... (Atsar or SAR) or (Dstat or Collectl), the last two aggregate output somewhat similar to running all the tools from the sysstat package at once.", it should be clearer, I think.
 
  


All times are GMT -5.
