Thanks for the responses...
- while there are a few entries in /proc for the nvidia card, none seem to reflect memory usage.
- gDEBugger - this seems promising, but it doesn't run on my machine because of a library issue. I'll ask my admin about it today (::grumble:: about not having admin privileges)
- nvPerfkit requires you to run its own driver - that driver is outdated, and I couldn't install it anyway (see grumble above)
- ATI PerfStudio requires Windows and an ATI card.
- display lists vs VBOs? I could get long-winded, but the short answer is "because I chose to". I'm not trying to sound nasty; it's just that both methods do similar things in very different ways, and each has advantages over the other in certain cases. Several of the advantages of VBOs don't apply in my case (shared vertices, for example), so I went the display list route. I haven't done an A/B comparison, but the display lists are running fast enough that I didn't see a need to optimize getting data to the screen any further... I just need to get it off the disk faster and less often.