LinuxQuestions.org
05-08-2011, 02:09 AM   #1
mario.almeida
Oracle server load avg


Dear All,

I am not an Oracle DBA, but I need to find the cause of a high load avg. Even after adding more CPU power, the load avg is the same.

Old servers
IBM x3350
2x4c
Intel(R) Xeon(R)E5405 @ 2.00GHz
8GB RAM

New servers
IBM x3850 X5
4x6c
Intel(R) Xeon(R) E7540 @ 2.00GHz
32GB RAM

OS: CentOS 4.8 64bit
Oracle 9i Standard Edition.
HBA: Qlogic 4Gbps
SAN: 100GB Fibre Channel

The load avg on the old and new servers is the same (between 6 and 10 during peak hours, and 3 to 5 during off-peak hours); no difference. I expected the load to come down as we added more CPUs.
Our DBA says the queries are optimised.

Can some Oracle DBA give me some tips to find the cause?
 
05-08-2011, 02:46 AM   #2
paulsm4
Dude - you need to profile your system.

For starters, benchmark system performance: "top", "iostat", "sar", etc.

Then you need to benchmark your Oracle performance. There are a million web links, books, tools and courses to guide you.

Who knows - there MIGHT not even be a problem. You don't need to be a DBA ... but you do need to do a bit of basic homework.

This link (one of MANY Linux performance guides) might help get you started in the right direction:

http://www.redbooks.ibm.com/abstracts/redp4285.html

PS:
Most information you might find about RedHat or Fedora should be equally applicable to CentOS.

PPS:
Just because your DBA *says* "queries are optimized" ... doesn't necessarily mean that they *are* optimized. Or that your tables are correctly indexed, or that your Oracle tablespaces and buffers are tuned.

Last edited by paulsm4; 05-08-2011 at 02:50 AM.
 
05-08-2011, 03:36 AM   #3
syg00
Quote:
Originally Posted by mario.almeida
I expected the load to come down as we added more CPUs.
That presumes that your loadavg is entirely comprised of (blocked) runnable tasks. Possible, but not obligatory. Linux, unlike historical Unix, also includes tasks in uninterruptible sleep. That state is usually described as waiting on disk I/O, but processes (threads) can be placed in it for other reasons. Try this to see if you have any:
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: " count+0}' > topsave.txt
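A single top snapshot can miss D states that only last a moment, so it may help to sample repeatedly. A minimal sketch, assuming a procps ps that accepts "-eLo state=" (one state letter per thread); count_states is a made-up helper name, not part of any package. It tallies the two states that Linux counts toward loadavg:

```shell
# count_states: read one state letter per line on stdin (as printed by
# `ps -eLo state=`) and report how many are R (runnable) and D
# (uninterruptible sleep). Hypothetical helper for this sketch.
count_states() {
    awk '{
        s = substr($1, 1, 1)
        if (s == "R") r++
        else if (s == "D") d++
    } END { printf "R=%d D=%d\n", r, d }'
}

# Example: one timestamped sample per second for 30 seconds.
# Note that ps itself shows up as one of the R entries.
# for i in $(seq 1 30); do
#     printf "%s " "$(date +%T)"; ps -eLo state= | count_states; sleep 1
# done
```

If D never rises above zero across many samples, the load is more likely made up of runnable tasks than of tasks stuck in uninterruptible sleep.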
 
05-08-2011, 05:46 AM   #4
mario.almeida
Original Poster
@syg00: Thanks for that. Before posting here I had checked for processes in D state but did not find any.

At present I am reading http://www.redbooks.ibm.com/abstracts/redp4285.html once again, just to see if I have missed anything.
 
05-08-2011, 06:22 AM   #5
syg00
Might be that your work arrives in "bursts" - that script only looks at a point in time, and presumes anything in "D" stays that way for a while (long enough to be counted).
The numbers for loadavg are maintained at schedule() I think, so they are impervious to the actual arrival rate.

Maybe look at something like collectl to get better granularity of your data.
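For what it's worth, the 1-minute loadavg behaves like an exponentially damped average of the active task count (runnable plus uninterruptible), sampled roughly every 5 seconds, so a short burst is smoothed rather than counted in full. A rough sketch of that arithmetic, assuming a 5-second sample and a 60-second window; loadavg_1min is a hypothetical helper, not a kernel interface:

```shell
# loadavg_1min: read one active-task count per line (one line per
# 5-second sample) and print the resulting 1-minute style average.
# Hypothetical helper; decay = exp(-5/60) assumes a 5 s sample interval
# and a 60 s averaging window.
loadavg_1min() {
    awk 'BEGIN { decay = exp(-5 / 60); load = 0 }
         { load = load * decay + $1 * (1 - decay) }
         END { printf "%.2f\n", load }'
}

# Steady state: 8 active tasks at every sample settles at 8.00.
# awk 'BEGIN { for (i = 0; i < 200; i++) print 8 }' | loadavg_1min
```

Starting from an idle system, a single sample of 12 active tasks lifts the 1-minute figure to only about 0.96, which is why bursty work and a steady queue can look quite different in loadavg.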
 
05-08-2011, 07:58 AM   #6
mario.almeida
Original Poster
@syg00: For your reference, Current load avg is 4 to 5 (off-peak day) and below is the output of collectl.

Code:
#               <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Date Time      cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
05/08 11:47:17   16   0  7879  13615      8     16    278      4   2491   6850   2182    6301 
05/08 11:47:18    9   0  8323  15745      8     16      0      1      0      1      0       0 
05/08 11:47:19   11   0  6614   9545      8     16      1      1   2292   7789   2632    7368 
05/08 11:47:20    6   0  4853   5409      8     16      1      1   1785   4730   1348    4294 
05/08 11:47:21    8   0  6035   8051      8     16      0      1   2423   6625   1861    6207 
05/08 11:47:22    7   0  6551   9045      8     16      1      1      0      2      0       0 
05/08 11:47:23    7   0  4803   5928      8     16    153      4   1254   4173   1276    3982 
05/08 11:47:24   10   0  6948  10398      8     16     13      3   1022   2848    796    2628 
05/08 11:47:25    7   0  6334   8052      8     16      1      1   1697   5112   1534    4775 
05/08 11:47:26    2   0  6331   7540      8     16      0      1   1097   7650   3293    7545 
05/08 11:47:27    2   0  4513   4013      8     16      1      1    249   2030    957    1991 
05/08 11:47:28   15   1  7660  16794      8     16    141      4   2433   8223   2791    7957 
05/08 11:47:29   12   0  7308  11121      8     16      0      1   1699   5425   1791    5097 
05/08 11:47:30   13   0  6916  10363      8     16      5      1   2079   5071   1377    4613 
05/08 11:47:31   12   0  6725   9775      8     16      0      2   2068   4741   1295    4308 
05/08 11:47:32    8   0  6568  10131      8     16    193      5   1687   4549   1360    4207 
05/08 11:47:33    7   0  6181   7513      8     16    181      7    940   3565   1262    3401 
05/08 11:47:34   10   1  6785   9068      8     16      0      1   1254   4276   1523    4013 
05/08 11:47:35   10   1  6395   8686      8     16    394      1   1812   4340   1180    3931 
05/08 11:47:36    7   0  5701   7049      8     16      1     72   1263   3424    975    3221 
05/08 11:47:37    7   0  4336   4470      8     16     13      3    802   2122    532    1947 
05/08 11:47:38   11   0  6155   7661      8     16     93      4   1198   3671   1140    3435 
05/08 11:47:39    9   0  5106   5881      8     16      0      1   1082   2818    747    2591 
05/08 11:47:40   13   0  6776  10370      8     16     60      1   1692   4794   1434    4520 
05/08 11:47:41   15   1  7817  13341      8     16      1     14   2086   6413   2000    6047 
05/08 11:47:42   15   0  6911  10196      8     16      1      1   1888   5033   1537    4643 
05/08 11:47:43   16   1  7269  12226      8     16   6027     52   2498   5999   1682    5496 
05/08 11:47:44   13   0  6166   8844      8     16  14122    115   1727   4146   1086    3707 
05/08 11:47:45    9   0  6306   8955      7     12  15904    129   1696   4213   1134    3863 
05/08 11:47:46   10   0  6801   9770      9     20  15857    130   1414   4747   1609    4456 
05/08 11:47:47    9   0  8114  14064      8     16  15640    125   1735   6985   2447    6736 
05/08 11:47:48   15   0  7999  13470      8     16  13631    117   2015   6643   2296    6281 
05/08 11:47:49   10   0  6713   9604      8     16  14254    115   1676   4661   1525    4346 
05/08 11:47:50   13   0  6825   9633      8     16  15548    129   1713   4638   1496    4311 
05/08 11:47:51    9   1  5943   7872      8     16  11413     92   1433   3683   1052    3370 
05/08 11:47:52    8   0  4950   6295      8     16  14178    118   1180   2711    748    2454 
05/08 11:47:53    7   0  4323   4700      8     16  13901    118    992   2150    568    1885 
05/08 11:47:54   10   0  6027   8009      8     16  14618    118   1455   3883   1049    3579 
05/08 11:47:55   12   0  6096   8155      8     16  12674    104   1663   3953   1102    3646 
05/08 11:47:56   11   0  6824  10533      8     16  12833    108   1992   5042   1434    4683 
05/08 11:47:57    6   0  5871   7506      8     16  14370    117   1161   3571    995    3370 
05/08 11:47:58    8   0  5735   7110      8     16  14397    117   1154   3426   1101    3200 
05/08 11:47:59    7   0  5486   7093      8     16  14370    119   1213   3359    935    3167 
05/08 11:48:00   11   0  6166   8847      8     16  14659    118   1524   4085   1138    3779
 
05-08-2011, 03:23 PM   #7
paulsm4
Hi again, mario.almeida -

To say "I need to find the cause of high load avg" is hopelessly, hopelessly vague.

Clearly you've done more work and more analysis than you've told us ... but if you want to make any progress, you need to:

* Benchmark your system performance
* Benchmark your Oracle performance
* Identify the bottlenecks (which may - or may NOT - be "high load average". "It depends"!!!!!)
* Identify potential culprits to investigate further

Q: Are users complaining of any specific problems/symptoms?

Q: Do you have a particular performance goal (or better, goals)?

Q: Would it be possible for you to post "top" output illustrating the problem?
<= Two separate questions here: a) can you post the output, and b) does "top" even illustrate the problem???

PS:
Just because your DBA *says* "queries are optimized" ... doesn't necessarily mean that they *are* optimized. Or that your tables are correctly indexed, or that your Oracle tablespaces and buffers are tuned. Oracle tuning is DEFINITELY something to be scrutinized here!

PS:
You *are* running the 64-bit versions of CentOS and Oracle, aren't you?

Last edited by paulsm4; 05-08-2011 at 03:26 PM.
 
05-08-2011, 04:21 PM   #8
mario.almeida
Original Poster
Sometimes users complain that the system is slow; during those times the load is 10 to 15.

Yes, the OS and Oracle are both 64-bit.
I'll post the output of top tomorrow morning.

For the PIDs using the most CPU%, I was able to find the related queries. These queries are taking lots of CPU time; I will check with our developers about them.

I have noticed one thing in the output of top: when the load is high, not all CPUs are utilised. When top is running and you press 1, it displays the CPU% of each CPU. In that output only some CPUs go above 50%, while others stay below 4% to 5%. Maybe the queries that take more time are running on the CPUs at 50% and above. If load avg counts the running processes plus those still in the queue, then why is it so high when other CPUs are still below 4% to 5%? This is what worries me. Maybe the SQL queries are all given to the same PID?
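The per-CPU view that top shows after pressing 1 can also be captured for later comparison. A minimal sketch, assuming the usual /proc/stat layout (cpuN user nice system idle iowait irq softirq ...); per_cpu_busy is a made-up helper name, and it treats idle plus iowait as "not busy":

```shell
# per_cpu_busy BEFORE AFTER: compare two saved copies of /proc/stat and
# print the busy percentage of each CPU over the interval between them.
# Hypothetical helper; idle time is taken as the idle + iowait columns.
per_cpu_busy() {
    awk '
        # Sum the time columns (user .. steal) of the current cpuN line.
        function total(   i, t) {
            t = 0
            for (i = 2; i <= NF && i <= 9; i++) t += $i
            return t
        }
        $1 !~ /^cpu[0-9]+$/ { next }     # skip the aggregate and non-cpu lines
        NR == FNR { tot[$1] = total(); idl[$1] = $5 + $6; next }
        {
            dt = total() - tot[$1]
            di = ($5 + $6) - idl[$1]
            if (dt > 0) printf "%s %.0f%%\n", $1, 100 * (dt - di) / dt
        }' "$1" "$2"
}

# Example: two snapshots five seconds apart.
# cat /proc/stat > /tmp/stat.1; sleep 5; cat /proc/stat > /tmp/stat.2
# per_cpu_busy /tmp/stat.1 /tmp/stat.2
```

A handful of busy CPUs next to many idle ones is what a small number of single-threaded sessions would look like, since each one can occupy at most one CPU at a time.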
 
05-08-2011, 05:39 PM   #9
paulsm4
Important point:

1. "High %CPU utilization" (for example, above 50%) can be GOOD.
There's nothing WRONG with using the CPU you paid for, and there's plenty of breathing room.

2. "Low %CPU utilization" (for example, below 10%), if accompanied by "high load average" (for example, anything above "2") can be BAD. This means that your CPUs are SITTING ON THEIR HANDS, GRIDLOCKED, when there's work to be done.

Two cases that might account for low %CPU and high load average:
a) you're I/O bound (disk or network)
b) you're memory bound

3. Two good tools to check load average are "top" and "uptime".
Two good tools to check if you're swapping are "top" and "vmstat" (watch the si/so columns). On Linux, "swapon -s" is the counterpart of the Solaris "swap -s".
Two good tools to check if you're I/O bound are "top" and "iostat"

And don't forget that Oracle tuning
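
Whether a given load average counts as "high" also depends on how many CPUs share it. A rough sketch that divides the load by the CPU count; load_per_cpu is a hypothetical helper, and the commented line assumes the Linux /proc/loadavg and /proc/cpuinfo files:

```shell
# load_per_cpu LOAD NCPU: divide a load average by the number of CPUs.
# Hypothetical helper for this sketch; prints nothing if NCPU is not > 0.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" \
        'BEGIN { if (ncpu > 0) printf "%.2f\n", load / ncpu }'
}

# Live values on Linux: first field of /proc/loadavg and the number of
# "processor" entries in /proc/cpuinfo.
# load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(grep -c '^processor' /proc/cpuinfo)"
```

With the figures from the first post, a load of 10 works out to 1.25 per core on the old 8-core server and about 0.42 per core on the new 24-core one, so the same number means something quite different on the two machines.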
 
05-09-2011, 01:27 AM   #10
mario.almeida
Original Poster
Below output is during the time when there is high disk activity on dm-1

Code:
top - 05:07:45 up 15 days,  7:34,  3 users,  load average: 5.33, 6.04, 5.54
Tasks: 542 total,   7 running, 535 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.9% us,  0.6% sy,  0.0% ni, 85.8% id,  0.5% wa,  0.0% hi,  0.2% si
Mem:  32927116k total, 32795464k used,   131652k free,   346588k buffers
Swap: 12287992k total,    69884k used, 12218108k free, 25963360k cached
iostat -xkd 1
Code:
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-1         0.00   0.00  0.00 3985.71    0.00 31885.71     0.00 15942.86     8.00    27.58    8.24   0.12  48.88
dm-2         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-3         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowers1   0.00   2.04  0.00  8.16    0.00   81.63     0.00    40.82    10.00     0.00    0.12   0.12   0.10
emcpowerf1   0.00   2.04  0.00  8.16    0.00   81.63     0.00    40.82    10.00     0.01    1.12   1.12   0.92
emcpowerr1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowerq1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowerp1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowero1   0.00   1.02  0.00  1.02    0.00   16.33     0.00     8.16    16.00     0.00    0.00   0.00   0.00
vmstat 1
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 5  1  69884 197164 346588 25962328    0    0     8 14929 6500  8930  7  1 92  1
 2  1  69884 200412 346588 25962328    0    0    64 13468 5083  5515  6  1 93  1
 6  1  69884 197996 346588 25962328    0    0   120 15353 5959  7741  9  1 89  1
 5  1  69884 198060 346588 25962328    0    0    72 16820 7678 12920 13  1 84  1
 6  0  69884 138220 346588 25962328    0    0    16 12125 6799  9487 12  1 87  1
 4  1  69884 130308 346588 25962328    0    0     8 15544 5632  8233  9  1 89  1
 4  0  69884 133004 346588 25962328    0    0    16 15733 7099 10659 11  1 88  1
 4  0  69884 131212 346588 25962328    0    0    32 12708 7248 10219  7  0 91  1
 5  1  69884 129612 346588 25962328    0    0    16 15289 7996 13611 12  1 87  1
 4  1  69884 129468 346588 25962328    0    0     8 15436 7623 12921  8  1 90  1
 5  0  69884 128660 346588 25962328    0    0     8 13413 7436 11788 10  1 88  1
 6  0  69884 128980 346588 25962844    0    0   112 12540 8299 15421 12  1 86  1
 3  1  69884 130956 346588 25962844    0    0     8 15421 7518 11872  9  1 89  1
 8  0  69884 134076 346588 25963360    0    0    32 16704 8523 15826 13  1 85  1
 6  1  69884 130892 346588 25963360    0    0    16   393 7258 11060 12  1 87  0
 7  0  69884 131972 346588 25963360    0    0     8    36 7246 11389 14  1 85  0
Below output when less disk activity.
Code:
top - 05:09:26 up 15 days,  7:35,  3 users,  load average: 5.39, 5.79, 5.50
Tasks: 536 total,   9 running, 527 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.2% us,  0.5% sy,  0.0% ni, 89.2% id,  0.0% wa,  0.0% hi,  0.1% si
Mem:  32927116k total, 32690864k used,   236252k free,   346608k buffers
Swap: 12287992k total,    69884k used, 12218108k free, 25964888k cached
iostat -xkd 1
Code:
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
dm-1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-3         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowers1   0.00   0.00  0.00  1.98    0.00   15.84     0.00     7.92     8.00     0.00    0.50   0.50   0.10
emcpowerf1   0.00   0.00  0.00  1.98    0.00   15.84     0.00     7.92     8.00     0.00    0.00   0.00   0.00
emcpowerr1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowerq1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowerp1   0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
emcpowero1   0.00   0.99  0.00  1.98    0.00   23.76     0.00    11.88    12.00     0.00    0.50   0.50   0.10
vmstat 1
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0  69884 267404 346596 25964384    0    0     4    21    0     0  6  0 94  0
 7  0  69884 264380 346596 25964384    0    0     8    36 7410 10808  7  1 92  0
 5  0  69884 263924 346596 25964384    0    0   112   129 8076 13983 12  1 87  0
 4  0  69884 265092 346596 25964384    0    0    16    40 6757  9686 12  1 87  0
15  0  69884 254772 346596 25964384    0    0    24    93 7977 16084 20  1 79  0
11  0  69884 251812 346596 25964384    0    0    16   744 8781 19092 19  1 80  0
 4  0  69884 253220 346596 25964384    0    0    32  1105 7897 13941 16  1 83  0
 2  0  69884 249636 346596 25964384    0    0    40   760 6726  9254 10  1 89  0
 5  0  69884 246180 346596 25964384    0    0     8   445 7417 11569 11  1 88  0
 3  0  69884 246444 346600 25964380    0    0    32   140 6893 10536 10  1 89  0
 5  0  69884 241012 346600 25964380    0    0    24    57 7651 13273 10  1 89  0
 7  0  69884 242164 346600 25964380    0    0    16   132 6519  8749  9  1 90  0
 4  0  69884 244268 346600 25964380    0    0     8   889 5738  7052 10  1 90  0
 6  0  69884 245324 346600 25964380    0    0   184   700 6175  8875  9  1 90  0
 5  0  69884 242324 346608 25964888    0    0   552   185 6702  9540 11  1 88  1
 5  0  69884 239500 346608 25964888    0    0    48    36 7416 11478 12  1 87  0
 8  0  69884 242956 346608 25964888    0    0   120   173 8230 14909 14  1 85  0
 9  0  69884 241812 346608 25964888    0    0     8   124 7645 12596 13  1 86  0
 9  0  69884 242428 346608 25964888    0    0     8   169 5913  8074 11  1 88  0
 6  0  69884 242116 346608 25964888    0    0     8     8 6048  8377 13  1 86  0
 7  0  69884 241924 346608 25964888    0    0     8    33 6482  9735 13  1 87  0
 7  0  69884 241524 346608 25964888    0    0     8    28 6461  9498 10  1 89  0
It's definitely not a disk I/O problem: the 'wa' figure in top is always 0.0% to 1.0%, even during peak hours, and even when dm-1 does lots of reads and writes the load avg remains the same. If I/O were the problem, heavy activity on dm-1 should push the load up. It is unlikely to be a network issue either, but let me check on that too.
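
One way to cross-check this against the vmstat samples above: Linux loadavg counts runnable (r) plus uninterruptible (b) tasks, so averaging those two columns should land near the 1-minute figure. A minimal sketch; vmstat_avg_rb is a made-up helper name and it assumes the column order shown in the output above (r first, b second):

```shell
# vmstat_avg_rb: read `vmstat 1` output on stdin and print the average of
# the r (runnable) and b (blocked) columns. Header lines are skipped
# because their first two fields are not numeric. Hypothetical helper.
vmstat_avg_rb() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { r += $1; b += $2; n++ }
         END { if (n) printf "avg_r=%.2f avg_b=%.2f samples=%d\n", r / n, b / n, n }'
}

# Example: average over one minute of one-second samples.
# vmstat 1 60 | vmstat_avg_rb
```

In the first vmstat block above, r averages roughly 5 and b well under 1, which is close to the load average of 5.33 reported by top at the same time. That would suggest the load is mostly tasks wanting CPU rather than tasks waiting on disk.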
 
05-15-2011, 11:04 AM   #11
markseger
This is a great example of why I think running standalone utilities is NOT the way to go, and one of the reasons I wrote collectl. The output you posted has no timestamps, so it is really difficult to correlate anything. Are you aware you can get ALL this data from collectl? For example, if you really like vmstat output and have a saved collectl log, you can just do:

collectl -p filename --vmstat

If you want to see iostat-like output you can

collectl -p filename -sD

Top format? no problem

collectl -p filename --top

and in the case of top you can sort on any field you like.

But one other thing jumped out at me and that is your cpu load. Remember, this is an average. If I'm reading things right and you have a 2 socket/4 core system AND have hyperthreading enabled, that means you have 16 CPUs. When your load shows up as 10-20%, how is it being distributed? In some cases I've seen a process or device pinned to a specific CPU. That CPU could be at 100% and you'd never know it. Try

collectl -p filename -sC

and in all of these don't forget -oT to show timestamps. Also, if you want to see the load averages,

collectl -p filename -sc --verbose

And speaking of load averages, this can get a little tricky. I remember running an NFS server with a load average over 80! You'd think this is a bad thing but that server was running at close to 500MB/sec over 10Gb. The reason for the high load? NFS had something close to 500 threads and since a lot of them were busy doing I/O the number of busy processes was high.

Simple rule - make sure you understand why a value is what it is because sometimes there's nothing wrong.

-mark
 
05-16-2011, 03:24 PM   #12
ramram29
Your problem is high I/O. The bottleneck in Oracle is usually high I/O. From what I see in your report, you will need to narrow it down to a problem with your Oracle normalization. Two of the most common causes of I/O problems are full table scans due to the lack of a proper index, and the lack of bind variables. You may want to start looking at your developers, what they are doing, and how they are retrieving the data. Are they using procedures, indexes, views? Are they doing full table scans without bind variables? OEM will tell you a lot about Top SQL: the SQL queries with the highest load. You may also want to start using Spotlight or Toad; either is a very good investment for DBAs.
 