Linux - Enterprise: This forum is for all items relating to using Linux in the Enterprise.
Quick question: we have a box which we monitor via Nagios. Sometimes we get an alert saying "High CPU", but by the time we log on to the box the process that caused the high CPU has gone back to normal, say after 10 minutes or so. How can we see what spiked the CPU 2-3 hours ago?
It depends on what you're monitoring performance for. If you just need detailed info anyway, maybe you should look into some kind of database-backed solution (search Freshmeat, Sourceforge, Savannah, Berlios). If you OTOH only need it for assessing what's going wrong *now*, you could run something like 'atop', which writes detailed stats to a file you can step through and replay later on, or have for instance check_load trigger something that polls the box over SNMP or HTTP and returns the output of something like '/bin/ps -eo %C -eo pid,command | grep -v "^ 0.0"'. OTOH if you have no idea what is or are the bottlenecks you may want to look into more generic stats first like Atsar or SAR or Dstat or Collectl (which both combine output from the sysstat package tools).
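To make the ps idea above a bit more concrete, here is a minimal sketch (not anything Nagios or atop ships) of a script you could run from cron or from an event handler to append the busiest processes to a log, so there is something to look at hours after the alert. The script layout, the function names and the log path passed on the command line are all my own assumptions. Also keep in mind that, as far as I know, the pcpu column of ps is CPU time averaged over the lifetime of the process, so short bursts in long-running processes are flattened.

```python
import subprocess
import sys
import time


def parse_ps(output, threshold=0.0):
    """Turn `ps -eo pcpu,pid,args` output into (cpu, pid, command) tuples.

    Only rows with cpu > threshold are kept, busiest first.
    """
    rows = []
    for line in output.splitlines()[1:]:  # first line is the ps header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            cpu = float(parts[0])
            pid = int(parts[1])
        except ValueError:
            continue  # skip anything that does not look like a data row
        if cpu > threshold:
            rows.append((cpu, pid, parts[2]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def take_snapshot(threshold=0.0):
    """Run ps once and return the parsed rows."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu,pid,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, threshold)


def format_rows(rows, stamp, limit=10):
    """Format the top `limit` rows under a timestamp line."""
    lines = ["--- %s ---" % stamp]
    for cpu, pid, command in rows[:limit]:
        lines.append("%5.1f %7d %s" % (cpu, pid, command))
    return "\n".join(lines) + "\n"


# Usage (hypothetical file name): python3 cpu_snapshot.py /var/tmp/cpu_snapshots.log
if __name__ == "__main__" and len(sys.argv) > 1:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(sys.argv[1], "a") as log:
        log.write(format_rows(take_snapshot(), stamp))
```

Run once a minute from cron, this leaves a plain-text history you can grep for the time of the alert. For anything more serious, atop or one of the collectors mentioned above is probably the better choice.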
This is one reason most places I've worked ditched Nagios for OpenNMS, which has the capability to graph resources. So you can view a complete history of CPU usage, disk space, network traffic, etc.
I'd rather not wait until you manage to add another of your "invaluable expert" replies.
What's that supposed to mean? Are you joking or were you actually serious?
I thought that pointing out OpenNMS and its graphing could give some insight into the user's problem: they could at least see whether the CPU load actually did spike. In my experience Nagios sometimes produced false alerts. With OpenNMS monitoring not only CPU but also networking, processes and just about anything else, it would be easier to narrow down the culprit if there was indeed a CPU load spike.
Quote:
Originally Posted by trickykid
I thought by providing information about OpenNMS in which it graphs would or could give insight to the user's problem, they could at least see if the CPU load actually did spike.
That would only apply if I reacted to something in your reply to the OP, which I did not.
Quote:
Originally Posted by trickykid
What's that supposed to mean?
I asked you a question to which you replied
Quote:
Originally Posted by trickykid
Oh wait, you wanted to know other details of each process...
.
So what kind of response is that? What kind of value does a reply like that have?
So only half of my reply gets a reply from you? I'm actually offended by your first response, which is the part I questioned. You make it sound as if *all* my replies on this forum are of the "invaluable expert" kind. If that's the case, I'll just stop contributing if you honestly feel that way.
That portion of my reply was half sarcastic, and it was also me realizing that by *zoom* in on gory process details you meant details for individual processes, not just a snapshot of the load. That's all. But with some custom graphing and monitors, I'm sure it's possible with OpenNMS. Does that satisfy you as a valuable response? I'll just be sure to stop any lighthearted discussion in any threads you participate in, okay.
Hmm... I just saw the note from unSpawn which said "OTOH if you have no idea what is or are the bottlenecks you may want to look into more generic stats first like Atsar or SAR or Dstat or Collectl (which both combine output from the sysstat package tools)."
As the author of collectl I just want to say collectl has nothing to do with sysstat - it's a completely separate, standalone tool. Also, since the posting of this note I've been adding a lot of extra goodies such as monitoring process I/O stats if you have the right kernel. Someone had mentioned detailed process monitoring and while collectl by default only looks at processes once every 60 seconds to keep the load down, if you tell it to look at specific processes you can monitor them every second or so and not generate any appreciable load. That means you can watch memory, cpu, i/o, page faults over time.
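For anyone who wants to see what sampling one specific process every second amounts to, here is a rough sketch that reads /proc/<pid>/stat directly. To be clear, this is not how collectl is implemented and not its interface, just an illustration of the general idea under my own assumptions; the function names and the script name in the usage line are made up. It relies on the documented layout of /proc/<pid>/stat, where utime and stime are fields 14 and 15, counted in clock ticks.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line.

    The command name (field 2) sits in parentheses and may contain spaces,
    so split after the last ')' and index from there: the state is field 3,
    which puts utime (field 14) at index 11 and stime (field 15) at index 12.
    """
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(prev_ticks, cur_ticks, interval, hz):
    """CPU usage over one interval, as a percentage of a single CPU."""
    return 100.0 * (cur_ticks - prev_ticks) / (hz * interval)


def read_ticks(pid):
    """Read the current tick count for one process."""
    with open("/proc/%d/stat" % pid) as stat_file:
        return parse_cpu_ticks(stat_file.read())


def watch(pid, interval=1.0, samples=10):
    """Print one CPU reading per interval for a single process."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    prev = read_ticks(pid)
    for _ in range(samples):
        time.sleep(interval)
        cur = read_ticks(pid)
        print("%s pid=%d cpu=%.1f%%" % (
            time.strftime("%H:%M:%S"), pid,
            cpu_percent(prev, cur, interval, hz)))
        prev = cur


# Usage (hypothetical file name): python3 watch_pid.py 1234
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(int(sys.argv[1]))
```

Reading one small file per second for a single pid costs next to nothing, which fits the point above about watching specific processes without generating appreciable load. Walking all of /proc every second is a different story.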
There was also mention of watching memory, and while slab monitoring is system-wide, if you do have a few slabs that are growing uncontrollably you can sometimes figure out who's using them just by their name, or you can google them and learn more too.
Anyhow be sure to check out http://collectl.sourceforge.net/ and within the next couple of days of this posting I expect to release version 2.6.4 which will have the capability of showing top I/O users in much the same way the top command can show top cpu consumers. Stay tuned...
You just misread my remark. If I re-phrase it like "... (Atsar or SAR) or (Dstat or Collectl), the last two aggregate output somewhat similar to running all tools from the sysstat package at once." it should be more clear I think.