first of all I see you're using rrdtool for plotting, and I'm also guessing you're taking infrequent samples, so you lose on 2 counts! first, rrd 'normalizes' your data, so if you have more samples than will fit on the graph they get combined by whatever consolidation you've set things up to do. second, are those 15-minute buckets? think about it: an 8MB spike averaged over 15 minutes could easily have been a 100MB spike lasting many seconds or even minutes. you can't tell!
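to put some made-up numbers on it (nothing here is from your actual graph, it's just to show how averaging over a 900-second bucket flattens a burst):

    # toy numbers: a 100 MB/s burst lasting 60 s, idle for the rest
    # of a 900 s (15 minute) bucket
    burst_rate_mb_s = 100
    burst_len_s = 60
    bucket_s = 900

    total_mb = burst_rate_mb_s * burst_len_s   # 6000 MB moved during the burst
    avg_rate = total_mb / bucket_s             # what the 15-minute average shows
    print(f"graph shows ~{avg_rate:.1f} MB/s, actual peak was {burst_rate_mb_s} MB/s")
    # graph shows ~6.7 MB/s, actual peak was 100 MB/s

the burst is 15x bigger than anything the averaged graph will ever show you.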
if this is something you really care about resolving you need finer-grained metrics, full stop! if your monitoring tool can't handle that, use either sar or collectl - I prefer collectl.
if you still want rrd graphs for a coarse view, run both! tools like sar and collectl take almost no resources.
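just to show how cheap 1-second sampling is, here's a throwaway python sketch that prints cpu busy % once a second (linux /proc/stat field layout assumed - purely an illustration, not a stand-in for sar or collectl):

    #!/usr/bin/env python3
    # 1-second cpu sampler, linux only. /proc/stat first line assumed to be:
    # cpu  user nice system idle iowait irq softirq steal ...
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]   # idle + iowait
        return idle, sum(fields)

    prev_idle, prev_total = cpu_times()
    while True:
        time.sleep(1)
        idle, total = cpu_times()
        d_total, d_idle = total - prev_total, idle - prev_idle
        busy = 100.0 * (d_total - d_idle) / d_total if d_total else 0.0
        print(f"{time.strftime('%H:%M:%S')} cpu busy {busy:5.1f}%")
        prev_idle, prev_total = idle, total

run that next to your existing monitoring and you'll quickly see bursts that never show up in the 15-minute graphs.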
I also think the 'top' tools are only somewhat useful. they give you a quick look at what's going on right now, but no sense of the longer-term timeline.
so many people sample at 5-10 minute intervals and happily think their system is doing just fine, when in fact if there are spikes they'll never see them OR they'll look very small. for example, you might have short bursts of 100% cpu, disk or network load and never even know it. that's not to say short bursts are necessarily a bad thing, but if you have system errors corresponding to them they could be. w/o data you're just guessing...
-mark