LinuxQuestions.org


terrytibbs 09-29-2009 03:21 PM

How do I measure our cache server performance?
 
Hi guys,

I need a method of measuring our cache servers' performance.

We really need either some extra hardware or to look at outsourcing the caching to a provider such as Akamai.

The problem is that we need some statistics to prove how hot the servers are getting at peak times, so that the company will pay for it all.

We are running Solaris 10 on our systems with Sun One doing the caching. Would DTrace be able to offer us the statistics?

On a side note, how would you increase the performance of a web cache server?

I am sorry if I have not provided enough info. Let me know if you need more details.

Thanks

salasi 09-29-2009 08:37 PM

I apologise that this isn't an answer, but I hope that I can help with some pointers.

clarification
I assume that you are running something like, e.g., Squid in 'http accelerator' mode (that is, caching some of the web pages from your web server on their way to being delivered to the rest of the network). Is that assumption correct? ('Cos if you are doing something else, the rest of this message really won't make any sense.)

In a general sense, there are various things that you can consider measuring. The first is hit rate, i.e., what percentage of the requests that are served come from cache and what percentage actually result in a request to the underlying data source.

I haven't done this in http accelerator mode, but in the more normal Squid usage mode (caching the external web for internal use) it is almost laughably trivial to do (badly), and there are also utilities that will give you reports on Squid's performance. The trivial way is to look through the log files, identify which requests are hits and which are misses, and count the number of each with some combination of grep, sed and awk. Render the result as a percentage (or use calamaris, or another Squid monitoring util, if that kind of thing is more to your taste), remembering that bash doesn't do floating point calcs, but you can easily find a way around that (bc...yuk) if you need to.
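As a rough illustration of that log-counting approach, here is a minimal Python sketch. It assumes Squid's native access.log format, where the fourth whitespace-separated field is a result code such as TCP_HIT/200, and it treats any code containing "HIT" as a hit, which is a simplification. The function name hit_rate and the sample lines are made up for the example; a Sun One log will most likely look different.

```python
from collections import Counter

# Made-up sample lines in Squid's native access.log layout:
# timestamp elapsed_ms client result/status bytes method URL ident peer type
SAMPLE = [
    "1254236460.123     45 10.0.0.1 TCP_HIT/200 5120 GET http://example.com/a - NONE/- text/html",
    "1254236461.456    310 10.0.0.2 TCP_MISS/200 20480 GET http://example.com/b - DIRECT/192.0.2.1 text/html",
    "1254236462.789     12 10.0.0.1 TCP_MEM_HIT/200 1024 GET http://example.com/a - NONE/- text/html",
    "1254236463.000    250 10.0.0.3 TCP_MISS/404 300 GET http://example.com/c - DIRECT/192.0.2.1 text/html",
]


def hit_rate(lines):
    """Return (hits, misses, hit percentage) for an iterable of log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # The fourth field looks like "TCP_HIT/200"; keep the part before the slash.
        result = fields[3].split("/")[0]
        counts["hit" if "HIT" in result else "miss"] += 1
    total = counts["hit"] + counts["miss"]
    pct = 100.0 * counts["hit"] / total if total else 0.0
    return counts["hit"], counts["miss"], pct


hits, misses, pct = hit_rate(SAMPLE)
print(f"hits={hits} misses={misses} hit rate={pct:.1f}%")
```

To run it against a real log, pass an open file object instead of SAMPLE, since the function only needs something it can iterate over line by line.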

The trouble with this is that it's only a hit rate. The trouble with it being a hit rate is:
  • a low hit rate is bad, but that doesn't tell you how to improve things; maybe something about the structure of the queries or the URLs is making caching unnecessarily hard, maybe you need more memory devoted to caching, maybe you have set a 'no cache' directive for some or all of the information. It doesn't tell you which of those it is, but if you play with one and it gets better (or worse...worse can be good too) you know you have found a sensitive point.
  • hit rate isn't everything; you'd really want to know about the time to get your data (including extra latency) and the loading on the underlying data source, assuming that creating the data is a compute-intensive (including i/o-intensive) process, as it may well be if there is a database involved (if there are just static pages and it's just a matter of pulling pre-prepared static pages off disk, then the various numbers, and the considerations involved, are going to be markedly different)
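To put a number on the "time to get your data" point, the same kind of log can be split into hits and misses and the service times compared. This is only a sketch under the same assumptions as before: Squid's native access.log format, where the second field is the elapsed time in milliseconds and the fourth is the result code. The name mean_elapsed_ms and the sample lines are invented for the example.

```python
from statistics import mean

# Made-up sample lines in Squid's native access.log layout:
# timestamp elapsed_ms client result/status bytes method URL ident peer type
SAMPLE = [
    "1254236460.123     45 10.0.0.1 TCP_HIT/200 5120 GET http://example.com/a - NONE/- text/html",
    "1254236461.456    310 10.0.0.2 TCP_MISS/200 20480 GET http://example.com/b - DIRECT/192.0.2.1 text/html",
    "1254236462.789     12 10.0.0.1 TCP_MEM_HIT/200 1024 GET http://example.com/a - NONE/- text/html",
    "1254236463.000    250 10.0.0.3 TCP_MISS/404 300 GET http://example.com/c - DIRECT/192.0.2.1 text/html",
]


def mean_elapsed_ms(lines):
    """Return (mean ms for hits, mean ms for misses); None where there is no data."""
    hits, misses = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        try:
            elapsed = int(fields[1])  # second field: elapsed time in milliseconds
        except ValueError:
            continue  # skip lines whose elapsed field is not a number
        is_hit = "HIT" in fields[3].split("/")[0]
        (hits if is_hit else misses).append(elapsed)
    return (mean(hits) if hits else None, mean(misses) if misses else None)


hit_ms, miss_ms = mean_elapsed_ms(SAMPLE)
print(f"mean elapsed: hits={hit_ms} ms, misses={miss_ms} ms")
```

A large gap between the two figures suggests the backend is expensive and each extra hit is worth a lot; a small gap suggests the cache itself, not the origin, is where the time goes.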

I don't want to sound negative, but there is little more that I can tell you unless you say more about your application and the considerations (is the problem, e.g., an excess loading on an underlying database engine, or have I completely invented that?) and, even then, I probably can't tell you much that is useful.

terrytibbs 09-30-2009 11:28 AM

Quote:

Originally Posted by salasi (Post 3701148)
I apologise that this isn't an answer, but I hope that I can help with some pointers.

clarification
I assume that you are running something like, e.g., Squid in 'http accelerator' mode (that is, caching some of the web pages from your web server on their way to being delivered to the rest of the network). Is that assumption correct? ('Cos if you are doing something else, the rest of this message really won't make any sense.)

Thanks for the reply.

We are using Sun One to manage the caching. I could not find any information on it.

I was hoping to find a way to use DTrace to capture the I/O calls for the Flusher PID and for the WebServer PID so we can clearly identify the performance bottleneck.

Is this possible?

I really appreciate your input. Thanks again.

jlliagre 09-30-2009 11:45 AM

Quote:

Originally Posted by terrytibbs (Post 3701926)
Thanks for the reply.

We are using Sun One to manage the caching. I could not find any information on it.

Sun One used to be a line of products, now Sun Java System.
Have a look at your proxy error logs to find out what precise version of proxy server you are using.
Quote:

I was hoping to find a way to use DTrace to capture the I/O calls for the Flusher PID and for the WebServer PID so we can clearly identify the performance bottleneck.

Is this possible?
Sure, but what is the bottleneck you are complaining of ?
There are plenty of metrics you should first monitor before investigating with dtrace or similar tools.
Are you using any monitoring software on that box ?

terrytibbs 09-30-2009 03:56 PM

Quote:

Originally Posted by jlliagre (Post 3701945)
Sun One used to be a line of products, now Sun Java System.
Have a look at your proxy error logs to find out what precise version of proxy server you are using.

We are using a Sun One plug-in that was apparently configured many moons ago. Nobody here knows anything about it; we call up the guy who configured it, who charges us lots of money to fix or change it.

I am hoping to move to something like Squid, though frankly this is my first week as a Sys Admin and I have enough to get my head around!


Quote:

Originally Posted by jlliagre (Post 3701945)
Sure, but what is the bottleneck you are complaining of ?
There are plenty of metrics you should first monitor before investigating with dtrace or similar tools.
Are you using any monitoring software on that box ?

We are using Big Brother (though about to move to Nagios in the long term).

During our busy periods the CPUs in the caching servers are taking a hammering. Big Brother reports a CPU load average of between 700 and 800.

I really need to gather further information to justify spending cash.

How would I find a specific site that is causing more requests than others?

Again, would DTrace help me here, or is there a better option?

Once again, thanks for your help.

jlliagre 09-30-2009 04:21 PM

So you have no real clue about what product/version you are running.

What kind of servers are running the proxies ?

What exactly are you expecting dtrace to display ?

Isn't a 700+ CPU load enough to justify that something must be done to improve the service ?

terrytibbs 09-30-2009 05:03 PM

Quote:

Originally Posted by jlliagre (Post 3702283)
So you have no real clue about what product/version you are running.

Correct.

Though I am working on it. I will post when I have more info. I may have jumped the gun a little. Sorry about that.

Quote:

Originally Posted by jlliagre (Post 3702283)
What kind of servers are running the proxies ?

Sun T2000s

Quote:

Originally Posted by jlliagre (Post 3702283)
What exactly are you expecting dtrace to display ?

We need to know when the load average increases, for how long and which specific pages are being called upon the most.
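If the proxy writes an access log, the "when" and "which pages" parts can probably be pulled from it before reaching for DTrace. The sketch below assumes the log is in Common Log Format, which is a guess on my part since we have not yet confirmed what our Sun One plug-in writes. The name busiest, the regex and the sample lines are all made up for illustration.

```python
import re
from collections import Counter

# Assumed Common Log Format: host ident user [day:hour:min:sec zone] "METHOD url proto" status bytes
CLF = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*"'
)

# Made-up sample lines for the example.
SAMPLE = [
    '10.0.0.1 - - [29/Sep/2009:15:21:07 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [29/Sep/2009:15:40:12 -0500] "GET /news.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [29/Sep/2009:16:02:55 -0500] "GET /index.html HTTP/1.1" 200 5120',
]


def busiest(lines, top=3):
    """Return (busiest (day, hour) buckets, most requested URLs), each with counts."""
    per_hour, per_url = Counter(), Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip lines that do not look like Common Log Format
        per_hour[(m["day"], m["hour"])] += 1
        per_url[m["url"]] += 1
    return per_hour.most_common(top), per_url.most_common(top)


hours, urls = busiest(SAMPLE)
print("busiest hours:", hours)
print("top URLs:", urls)
```

Lining the busiest hours up against the Big Brother load graphs would show how long the peaks last, which is the figure we need for the budget.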

We will have to pay a provider such as Akamai to take the load off our caching servers. We obviously want to keep the cost down, so knowing how long we need to offload our caching to them, and at what times, will help us work out our budget.

Quote:

Originally Posted by jlliagre (Post 3702283)
Isn't a 700+ CPU load enough to justify that something must be done to improve the service ?

Yes, to me and you it is!

jlliagre 09-30-2009 05:40 PM

T2000s are very well suited to being used as proxies. There must be something wrong in your sizing. How many requests per second are you serving ?

terrytibbs 10-04-2009 04:12 PM

Really sorry, I'll revive this thread when I have more info.

Thanks for the input thus far, I'll be back!!


All times are GMT -5.