LinuxQuestions.org


holroyd 08-06-2012 06:58 AM

performance analysis for red hat linux
 
Hi,
we've set up a couple of RH VMs as test servers.

In total we have three, all cloned.

However, two of them have massive performance problems and we are scratching our heads as to why.

The third one runs fine, and the only difference is that the 'poor performers' share an NFS share.

All resources are identical (CPU, memory etc). At this point the only significant difference is the output from free:

'good performer':
Code:

            total      used      free    shared    buffers    cached
Mem:          5962      5902        60          0        49        692
-/+ buffers/cache:      5161        801
Swap:        5051      1779      3272




'poor performer':
Code:

            total      used      free    shared    buffers    cached
Mem:          5962      5926        36          0          0          9
-/+ buffers/cache:      5917        45
Swap:        5051      2586      2465

There does seem to be a difference in 'cached' and 'buffers'. Could this explain the difference in performance? If so, how can we track down the offending process?

Any ideas are welcome.

thanks,
Michael
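
The '-/+ buffers/cache' row in the free output above is derived from the 'Mem:' row: it is 'used' minus buffers and page cache. A minimal Python sketch of that arithmetic, using the two samples from this post (the function names are made up for illustration, and the parser assumes the older six-column free layout shown here):

```python
# Sketch only: parses the classic six-column `free -m` layout quoted in this thread.
COLUMNS = ["total", "used", "free", "shared", "buffers", "cached"]

GOOD = """\
             total       used       free     shared    buffers     cached
Mem:          5962       5902         60          0         49        692
"""

BAD = """\
             total       used       free     shared    buffers     cached
Mem:          5962       5926         36          0          0          9
"""


def parse_free_mem(text):
    """Return the 'Mem:' row of `free` output as a dict of ints (MB here)."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(COLUMNS, (int(v) for v in fields[1:7])))
    raise ValueError("no 'Mem:' row found")


def app_used(mem):
    """Memory held by processes: 'used' minus reclaimable buffers and cache."""
    return mem["used"] - mem["buffers"] - mem["cached"]


if __name__ == "__main__":
    for label, sample in (("good", GOOD), ("bad", BAD)):
        mem = parse_free_mem(sample)
        print(label, "used by applications:", app_used(mem), "MB,",
              "buffers+cache:", mem["buffers"] + mem["cached"], "MB")
```

Read this way, applications hold most of the RAM on both machines, but the 'poor performer' has almost nothing left over for the page cache, which would be consistent with more file reads going to disk.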

sameerss 08-07-2012 01:16 AM

Hi,

Can you share the output of the top command as well?

Which application is running on these VMs?


--
Sameer S.

holroyd 08-07-2012 02:54 AM

Hi,
this is the top of top:

Code:

good

top - 09:51:37 up 61 days, 17:42,  4 users,  load average: 0.00, 0.04, 0.06
Tasks: 137 total,  2 running, 135 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.0%us,  0.2%sy,  4.1%ni, 94.5%id,  0.8%wa,  0.2%hi,  0.2%si,  0.0%st
Mem:  6106104k total,  6064628k used,    41476k free,  272320k buffers
Swap:  5172920k total,  2020036k used,  3152884k free,  634972k cached


bad
top - 09:52:32 up 5 days, 17:17,  1 user,  load average: 0.02, 0.05, 0.06
Tasks: 118 total,  2 running, 116 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.3%us,  0.0%sy,  1.3%ni, 97.3%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  6106104k total,  6066260k used,    39844k free,    2188k buffers
Swap:  5172920k total,  2655548k used,  2517372k free,    20232k cached

The VMs are running several Java processes (WebLogic application servers).

I've managed to reduce the swap usage by shutting down some of the processes and got the numbers to the order of the 'good' system. However, the 'buffers' and 'cached' entries remain unchanged, i.e. 0 and <10, and unfortunately the performance didn't really improve.

From what I was able to find, buffers and cache are used for file writing. Could that account for the poor performance?

thanks,
Michael

deadeyes 08-07-2012 04:39 AM

Could you post output from vmstat?
iostat output can help as well.

holroyd 08-07-2012 04:53 AM

Hi,
iostat:

Code:

good:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.92    4.11    0.18    1.20    0.00   91.59

Device:           tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
sda              7.35       107.95       196.57   576590746  1049915473
sda1             6.78        92.86       180.80   495976026   965685401
sda2             0.56        15.09        15.77    80613272    84230072
fd0              0.00         0.00         0.00           8           0


bad:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.72    4.58    0.32    5.12    0.00   89.27

Device:           tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
sda             39.32       698.92       199.56   350423034   100052928
sda1             4.41        55.29        45.12    27721994    22624032
sda2            34.91       643.63       154.43   322700616    77428896


VM Stat:

Code:

good:

procs ------------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b    swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 2020308  46600  50564 710744    4    4    27    49    1    4  7  0 92  1  0


bad:

procs ------------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b    swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 1431496 744900  17000 374328  161   39   175    50   29   18  5  0 89  5  0

thanks,

Michael
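
For comparing the two vmstat rows above side by side, a small Python sketch can turn each row into named fields and add up the swap traffic (si + so). The column list is copied from the header in this post; the helper names are illustrative only. Note that a single vmstat row like these reports averages since boot, so an interval run such as `vmstat 5` gives a better picture of current behaviour.

```python
# Sketch only: column names taken from the vmstat header quoted in this thread.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

GOOD_ROW = " 0  0 2020308  46600  50564 710744    4    4    27    49    1    4  7  0 92  1  0"
BAD_ROW = " 0  0 1431496 744900  17000 374328  161  39  175    50  29  18  5  0 89  5  0"


def parse_vmstat_row(row):
    """Map one numeric vmstat row onto its column names."""
    values = [int(v) for v in row.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("expected %d columns, got %d"
                         % (len(VMSTAT_COLUMNS), len(values)))
    return dict(zip(VMSTAT_COLUMNS, values))


def swap_traffic(sample):
    """Swap-in plus swap-out rate for one sample."""
    return sample["si"] + sample["so"]


if __name__ == "__main__":
    for label, row in (("good", GOOD_ROW), ("bad", BAD_ROW)):
        s = parse_vmstat_row(row)
        print(label, "swap traffic:", swap_traffic(s),
              "block reads:", s["bi"], "iowait %:", s["wa"])
```

On these numbers the 'bad' machine swaps in far more than the 'good' one (si 161 versus 4) and spends more time in iowait (5 versus 1).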

sameerss 08-07-2012 05:51 AM

Hi,

Are there any processes in D state (uninterruptible sleep)?

Those can slow down performance.
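
One way to check for D-state (uninterruptible sleep) processes without extra tools is to read the state field from /proc/<pid>/stat. The Python sketch below assumes a Linux /proc filesystem; the function names are hypothetical.

```python
import os


def state_from_stat(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def comm_from_stat(stat_line):
    """Extract the command name between the first '(' and the last ')'."""
    return stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]


def d_state_processes(proc_root="/proc"):
    """Return (pid, command) pairs for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as fh:
                line = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if state_from_stat(line) == "D":
            found.append((int(entry), comm_from_stat(line)))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A single scan is only a snapshot; processes that show up in D state repeatedly over several runs are the interesting ones, for example those blocked on disk or NFS I/O.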

unSpawn 08-07-2012 06:22 AM

Quote:

Originally Posted by holroyd (Post 4747865)
I've managed to reduce the swap usage by shutting down some of the processes

Such lines are not meaningful. Please always be as verbose as possible: which processes exactly, and what reduction did they cause? Which processes consume the most memory, and what are their SAR (System Activity Report) statistics with respect to disk I/O? If unsure, run any SAR-style tool such as Atop, Dstat, collectl, etc.


Quote:

Originally Posted by holroyd (Post 4747865)
The VMs are running several java processes (Weblogic application servers)

...and there's the killer. Oracle/BEA WebLogic requires Java (which JVM are you running?). Java handles memory differently and in some cases doesn't free memory like regular applications would.

I suggest you invest time in reading the basic diagnostics documentation for your JVM and researching analysis methods and tools for Java like jProfiler, and at minimum use jtop to get a grip on memory usage. Since you're using WebLogic you should read its documentation. It suggests minimum hardware requirements (follow those) and diagnostics (see for instance http://serveraddress/console/dashboard and 'jrcmd' if you use JRockit).

If your machine doesn't have enough RAM to serve (I don't know what you run, but WAS may require gigabytes of RAM on its own) then IMHO you should not try to starve the system of its own resources but put the Java processes themselves on a diet.

You probably need some more to read, so here are some random links:
http://magazine.redhat.com/2006/09/1...andrew-oliver/
http://docs.oracle.com/cd/E13222_01/...an/capgen.html
http://www.javaperformancetuning.com/resources.shtml
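
To answer the "which processes consume the most memory" question per process, one option is to rank processes by the VmSwap and VmRSS fields in /proc/<pid>/status. This Python sketch assumes the running kernel exposes VmSwap there (older kernels may not, in which case it is treated as 0); the helper names are made up for illustration.

```python
import os


def vm_fields_kb(status_text):
    """Pull Name, VmRSS and VmSwap out of /proc/<pid>/status text (sizes in kB)."""
    info = {"Name": "?", "VmRSS": 0, "VmSwap": 0}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["Name"] = value.strip()
        elif key in ("VmRSS", "VmSwap"):
            # Value looks like "  123456 kB"; keep the number only.
            info[key] = int(value.split()[0])
    return info


def top_swappers(proc_root="/proc", limit=10):
    """Return up to `limit` (swap_kb, rss_kb, pid, name) tuples, largest swap first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status"),
                      errors="replace") as fh:
                info = vm_fields_kb(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        rows.append((info["VmSwap"], info["VmRSS"], int(entry), info["Name"]))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for swap_kb, rss_kb, pid, name in top_swappers():
        print("%8d kB swap %8d kB rss  pid %-6d %s" % (swap_kb, rss_kb, pid, name))
```

Kernel threads have no Vm* lines and simply show up with zeros. For what happens inside the Java heap itself, the JVM's own diagnostics remain the better source.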

holroyd 08-07-2012 03:02 PM

unSpawn,
fair enough. We know, or at least strongly suspect, that it's the Java processes running the WebLogic instances. We are using JRockit. The processes that we shut down were WebLogic processes. We've already had a look at the Java processes with JRockit Mission Control.

I guess the point I'm trying to make is that even when we reduced the Java processes on the 'bad' machines such that 'free' and 'top' showed swap usage of less than the 'good' machine, the performance was still unacceptable.
Memory usage on a per-JVM basis looked the same on each machine. The JVM configuration is identical on both machines and the web applications they run are almost identical.

The difference is that the two are set up as a cluster with a shared NFS file mount and internal communication between the nodes, although shutting down one machine didn't make a difference either.

Looking at top and free on the 'good' machine, you can have 1.8 GB of swap in use and some of the Java processes using up to 30% CPU, and the overall performance is still OK. We get similar numbers on the 'bad' machines and the whole thing grinds to a halt. So I don't think it's the memory usage of the Java processes (though I will eat my words on this thread if it turns out it is). Some of us in the team think it's a network config issue: the cluster, NFS, etc.

thanks,
Michael

markseger 08-07-2012 04:33 PM

Wow, looking at this post is so 1990.
When I want to compare a few hundred (!) machines, I run colmux/collectl on all of them at the same time and sort on the different columns using the arrow keys. This lets me instantaneously compare what all the machines are doing with respect to CPU, disk, network, memory, etc. With only 3 machines it would be trivial. There's no need to post output from top, which is in one format, iostat, which is in another, or free, which is in yet another. This is just too painful to look at.
-mark


All times are GMT -5.