Linux - Software
This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Can anyone give me some tips on a program for performance monitoring of Linux machines? I need the usual: CPU, memory, disk I/O, disk buffers, swap, network.
But I also need history reports, or graphs.
I'll start running some tests with Ganglia. Hope it helps me out.
MRTG can monitor just about anything via SNMP. I also like Ganglia for this task. There are a couple of other solutions (some free, some not) if you Google for "Linux performance monitoring" or the like.
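For reference, a minimal mrtg.cfg SNMP target looks something like this — the hostname, community string, and interface index are placeholders, not anything from the poster's setup:

```
# Sketch of a minimal mrtg.cfg for graphing one SNMP interface.
# "router.example.com", community "public" and interface index 2
# are placeholders -- substitute your own device.
WorkDir: /var/www/mrtg
Target[router_if2]: 2:public@router.example.com
MaxBytes[router_if2]: 12500000
Title[router_if2]: Traffic on interface 2 of router.example.com
PageTop[router_if2]: <h1>Traffic on interface 2</h1>
```

MaxBytes, Title and PageTop are required per target; cfgmaker can generate all of this for you by walking the device's SNMP tables.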
"Routers are only the beginning. MRTG is being used to graph all sorts of network devices as well as everything else from weather data to vending machines."
I have MRTG running on my server, where it logs things like memory usage, disk space and uptime (in addition to the standard logs, i.e. traffic and such). The Perl scripts that do the logging are fairly easy to find on the Internet, or to write yourself.
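In case it helps, here is a sketch of that kind of script (the metric choice and filenames are illustrative). MRTG's "external command" targets expect exactly four lines on stdout: two data values, the target's uptime, and the target's name.

```shell
#!/bin/sh
# Sketch of an MRTG external-command target: report used and total
# memory (in kB) from /proc/meminfo as the two graphed values.

mem_total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
mem_used=$((mem_total - mem_free))

echo "$mem_used"    # line 1: first value (graphed as "in")
echo "$mem_total"   # line 2: second value (graphed as "out")
uptime              # line 3: uptime string (shown on the page)
hostname            # line 4: target name
```

You would wire it into mrtg.cfg with something like Target[mem]: `/usr/local/bin/mem-mrtg.sh` — backticks tell MRTG to run an external command instead of polling SNMP — plus the usual MaxBytes/Title/PageTop options.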
Hum...
I decided on MRTG, since I need to generate those reports ASAP.
The problem is: MRTG only checks the data every 5 minutes, and if I try to run it every minute, there seem to be no changes at all in the graphs (in this case, I'm talking about CPU checks).
Is there anything I can do to make it work every minute?
Note that unless you are using rrdtool you cannot set Interval to less than 5 minutes.
If you are using rrdtool you can set the interval down to 1 minute.
Note though, setting the Interval for an rrdtool/mrtg setup will influence the initial creation of the database. If you change the interval later, all existing databases will remain at the resolution they were initially created with.
How often do you call mrtg? The default is 5 minutes. If you call it less often, you should specify it here. This does two things:
* The generated HTML page contains the right information about the calling interval ...
* A META header in the generated HTML page will instruct caches about the time-to-live of this page ...
In this example, we tell mrtg that we will be calling it every 10 minutes. If you are calling mrtg every 5 minutes, you can leave this line commented out.
Example:
Interval: 10
Note that unless you are using rrdtool you cannot set Interval to less than 5 minutes. If you are using rrdtool you can set the interval, in the format
Interval: MM[:SS]
down to 1 second. Note though, setting the Interval for an rrdtool/mrtg setup will influence the initial creation of the database. If you change the interval later, all existing databases will remain at the resolution they were initially created with. Also note that you must make sure that your mrtg-rrd web front-end can deal with this kind of Interval setting.
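Putting the pieces together for the one-minute case, a sketch assuming the rrdtool log format and cron-driven MRTG (paths and the crontab location are placeholders):

```
# mrtg.cfg fragment -- switch logging to rrdtool so that sub-5-minute
# intervals are honoured, per the Interval notes quoted above:
LogFormat: rrdtool
Interval: 1

# /etc/crontab entry -- run mrtg once a minute to match:
* * * * *  root  /usr/bin/mrtg /etc/mrtg/mrtg.cfg
```

Remember the caveat above: Interval affects the resolution of the rrd databases only at creation time, so set it before the first run (or delete and recreate the .rrd files). Newer MRTG versions can also run as a daemon (RunAsDaemon: Yes) instead of being driven by cron, if yours supports it.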
I realize this post is a little late, but I have issues with centralized monitoring for several reasons:
- it doesn't scale, at least not to 100s or 1000s of systems
- if there's a network problem you lose the data you most need
- monitoring intervals over a minute aren't worth the bother for diagnostic purposes
I open-sourced collectl several years ago, after using it internally and on HPC customers' systems for several years before that, so it has been road-tested and is pretty stable. It has always been able to export its data over a socket to a remote application, but recently I added support for sending data directly to a Ganglia gmond, for Pacific Northwest National Labs to use in monitoring their 2300-node cluster! This gives them the ability to collect data at a rate of one sample every 10 seconds on each machine, but send only a subset, at a lower rate, to Ganglia. However, they don't write it to rrd, because they'd overwhelm it; they use Ganglia only as the transport to get the data to their own presentation engine.
The reason I mention this is that as more people try to use collectl with ganglia it will help solidify the interface between the two.
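As a concrete starting point, an invocation along these lines samples the usual subsystems locally (the -s, -i and -c flags are from the collectl man page; check them against your installed version):

```shell
# Sample CPU (c), disk (d), memory (m) and network (n) once a second,
# for two samples, printing to the terminal; skip quietly if collectl
# isn't installed on this machine.
if command -v collectl >/dev/null 2>&1; then
    collectl -scdmn -i 1 -c 2
fi
```

For continuous use you'd typically run it with a longer interval and log to files instead of the terminal.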
btw - HP recently started a blog on High Performance Clusters, to which I've added some further discussion on HPC monitoring, in case anyone is interested. See: http://www.communities.hp.com/online...e/default.aspx