Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
05-08-2011, 02:09 AM | #1
Member | Registered: May 2008 | Location: India | Distribution: Ubuntu 10.04, CentOS, Manjaro | Posts: 179
Oracle server load avg
Dear All,
I am not an Oracle DBA, but I need to find the cause of a high load average. Even after adding more CPU power, the load average is the same.
Old servers
IBM x3350
2x4c
Intel(R) Xeon(R)E5405 @ 2.00GHz
8GB RAM
New servers
IBM x3850 X5
4x6c
Intel(R) Xeon(R) E7540 @ 2.00GHz
32GB RAM
OS: CentOS 4.8 64bit
Oracle 9i Standard Edition.
HBA: Qlogic 4Gbps
SAN: 100GB Fibre Channel
The load average on the old and new servers is the same (between 6 and 10 during peak hours, 3 to 5 off-peak); there is no difference. I expected the load to come down when we added more CPUs.
Our DBA says the queries are optimized.
Can an Oracle DBA give some tips on finding the cause?
05-08-2011, 02:46 AM | #2
LQ Guru | Registered: Mar 2004 | Distribution: SusE 8.2 | Posts: 5,863
Dude - you need to profile your system.
For starters, benchmark system performance: "top", "iostat", "sar", etc.
Then you need to benchmark your Oracle performance. There are a million web links, books, tools and courses to guide you.
Who knows - there MIGHT not even be a problem. You don't need to be a DBA ... but you do need to do a bit of basic homework.
This link (one of MANY Linux performance guides) might help get you started in the right direction:
http://www.redbooks.ibm.com/abstracts/redp4285.html
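The profiling pass suggested above can start even simpler. A minimal first-pass sketch (assuming a Linux /proc filesystem; it is a starting point, not a substitute for sar/iostat) that compares the 1-minute load average against the CPU count, which is the ratio that actually matters on a big box:

```shell
#!/bin/sh
# First-pass check before any Oracle-level digging: compare the
# 1-minute load average against the CPU count.  On a 24-core box a
# load of 6-10 may not indicate a CPU shortage at all.
read load1 load5 load15 rest < /proc/loadavg
ncpu=$(grep -c '^processor' /proc/cpuinfo)
per_cpu=$(awk -v l="$load1" -v n="$ncpu" 'BEGIN { printf "%.2f", l / n }')
echo "1-min load: $load1 over $ncpu CPUs (${per_cpu} per CPU)"
```

If the per-CPU figure is well under 1, the run queue itself is not the bottleneck, and the next place to look is I/O and Oracle.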
PS:
Most information you might find about RedHat or Fedora should be equally applicable to CentOS.
PPS:
Just because your DBA *says* "queries are optimized" doesn't necessarily mean that they *are* optimized, or that your tables are correctly indexed, or that your Oracle tablespaces and buffers are tuned.
Last edited by paulsm4; 05-08-2011 at 02:50 AM.
05-08-2011, 03:36 AM | #3
LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,314
Quote:
Originally Posted by mario.almeida
I expected the load to come down as we added more CPUs.
That presumes your loadavg is entirely comprised of runnable tasks. Possible, but not obligatory. Linux, unlike historical Unix, also includes tasks in uninterruptible sleep. That state is usually described as waiting on disk I/O, but processes (threads) can be placed in it for other reasons. Try this to see if you have any:
Code:
top -b -n 1 | awk 'BEGIN {count=0} {if (NR <= 7) print; else if ($8 == "D") {print; count++}} END {print "Total status D: " count}' > topsave.txt
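Since D state is often transient, a single top snapshot can easily miss it. A small sketch (assuming procps-style `ps` is available) that samples process states repeatedly instead:

```shell
#!/bin/sh
# Sample process states repeatedly, since D (uninterruptible sleep)
# is often transient and a single top snapshot can easily miss it.
# R and D are the two states that feed the load average on Linux.
samples=3
total_d=0
i=0
while [ "$i" -lt "$samples" ]; do
    d=$(ps -eo stat= | grep -c '^D')   # count tasks currently in D
    total_d=$((total_d + d))
    i=$((i + 1))
    sleep 1
done
echo "D-state sightings over $samples samples: $total_d"
```

A persistently nonzero count here points at I/O wait contributing to the load average.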
05-08-2011, 05:46 AM | #4
Member (Original Poster) | Registered: May 2008 | Location: India | Distribution: Ubuntu 10.04, CentOS, Manjaro | Posts: 179
@syg00: Thanks for that. Before posting here I had checked for processes in the D state but did not find any.
At present I am reading http://www.redbooks.ibm.com/abstracts/redp4285.html once again, just to see if I have missed anything.
05-08-2011, 06:22 AM | #5
LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,314
It might be that your work arrives in "bursts" - that script only looks at a point in time, and presumes anything in "D" stays that way for a while (long enough to be counted).
The loadavg numbers are maintained at schedule() I think, so they are insensitive to the actual arrival rate.
Maybe look at something like collectl to get better granularity in your data.
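The burst-vs-average distinction above can also be seen with a quick sketch (assuming a Linux /proc filesystem) that pairs the smoothed 1-minute load average with the instantaneous run-queue counters from /proc/stat:

```shell
#!/bin/sh
# Pair the smoothed 1-minute load average with the *instantaneous*
# counts of running and blocked tasks from /proc/stat, to see
# whether work arrives in bursts that the average hides.
for i in 1 2 3; do
    running=$(awk '/^procs_running/ {print $2}' /proc/stat)
    blocked=$(awk '/^procs_blocked/ {print $2}' /proc/stat)
    read load1 _ < /proc/loadavg
    echo "sample $i: running=$running blocked=$blocked load1=$load1"
    sleep 1
done
```

Large swings in `running` between samples while `load1` stays flat would confirm bursty arrivals.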
05-08-2011, 07:58 AM | #6
Member (Original Poster) | Registered: May 2008 | Location: India | Distribution: Ubuntu 10.04, CentOS, Manjaro | Posts: 179
@syg00: For your reference, the current load average is 4 to 5 (an off-peak day), and below is the output of collectl.
Code:
# <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Date Time cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
05/08 11:47:17 16 0 7879 13615 8 16 278 4 2491 6850 2182 6301
05/08 11:47:18 9 0 8323 15745 8 16 0 1 0 1 0 0
05/08 11:47:19 11 0 6614 9545 8 16 1 1 2292 7789 2632 7368
05/08 11:47:20 6 0 4853 5409 8 16 1 1 1785 4730 1348 4294
05/08 11:47:21 8 0 6035 8051 8 16 0 1 2423 6625 1861 6207
05/08 11:47:22 7 0 6551 9045 8 16 1 1 0 2 0 0
05/08 11:47:23 7 0 4803 5928 8 16 153 4 1254 4173 1276 3982
05/08 11:47:24 10 0 6948 10398 8 16 13 3 1022 2848 796 2628
05/08 11:47:25 7 0 6334 8052 8 16 1 1 1697 5112 1534 4775
05/08 11:47:26 2 0 6331 7540 8 16 0 1 1097 7650 3293 7545
05/08 11:47:27 2 0 4513 4013 8 16 1 1 249 2030 957 1991
05/08 11:47:28 15 1 7660 16794 8 16 141 4 2433 8223 2791 7957
05/08 11:47:29 12 0 7308 11121 8 16 0 1 1699 5425 1791 5097
05/08 11:47:30 13 0 6916 10363 8 16 5 1 2079 5071 1377 4613
05/08 11:47:31 12 0 6725 9775 8 16 0 2 2068 4741 1295 4308
05/08 11:47:32 8 0 6568 10131 8 16 193 5 1687 4549 1360 4207
05/08 11:47:33 7 0 6181 7513 8 16 181 7 940 3565 1262 3401
05/08 11:47:34 10 1 6785 9068 8 16 0 1 1254 4276 1523 4013
05/08 11:47:35 10 1 6395 8686 8 16 394 1 1812 4340 1180 3931
05/08 11:47:36 7 0 5701 7049 8 16 1 72 1263 3424 975 3221
05/08 11:47:37 7 0 4336 4470 8 16 13 3 802 2122 532 1947
05/08 11:47:38 11 0 6155 7661 8 16 93 4 1198 3671 1140 3435
05/08 11:47:39 9 0 5106 5881 8 16 0 1 1082 2818 747 2591
05/08 11:47:40 13 0 6776 10370 8 16 60 1 1692 4794 1434 4520
05/08 11:47:41 15 1 7817 13341 8 16 1 14 2086 6413 2000 6047
05/08 11:47:42 15 0 6911 10196 8 16 1 1 1888 5033 1537 4643
05/08 11:47:43 16 1 7269 12226 8 16 6027 52 2498 5999 1682 5496
05/08 11:47:44 13 0 6166 8844 8 16 14122 115 1727 4146 1086 3707
05/08 11:47:45 9 0 6306 8955 7 12 15904 129 1696 4213 1134 3863
05/08 11:47:46 10 0 6801 9770 9 20 15857 130 1414 4747 1609 4456
05/08 11:47:47 9 0 8114 14064 8 16 15640 125 1735 6985 2447 6736
05/08 11:47:48 15 0 7999 13470 8 16 13631 117 2015 6643 2296 6281
05/08 11:47:49 10 0 6713 9604 8 16 14254 115 1676 4661 1525 4346
05/08 11:47:50 13 0 6825 9633 8 16 15548 129 1713 4638 1496 4311
05/08 11:47:51 9 1 5943 7872 8 16 11413 92 1433 3683 1052 3370
05/08 11:47:52 8 0 4950 6295 8 16 14178 118 1180 2711 748 2454
05/08 11:47:53 7 0 4323 4700 8 16 13901 118 992 2150 568 1885
05/08 11:47:54 10 0 6027 8009 8 16 14618 118 1455 3883 1049 3579
05/08 11:47:55 12 0 6096 8155 8 16 12674 104 1663 3953 1102 3646
05/08 11:47:56 11 0 6824 10533 8 16 12833 108 1992 5042 1434 4683
05/08 11:47:57 6 0 5871 7506 8 16 14370 117 1161 3571 995 3370
05/08 11:47:58 8 0 5735 7110 8 16 14397 117 1154 3426 1101 3200
05/08 11:47:59 7 0 5486 7093 8 16 14370 119 1213 3359 935 3167
05/08 11:48:00 11 0 6166 8847 8 16 14659 118 1524 4085 1138 3779
05-08-2011, 03:23 PM | #7
LQ Guru | Registered: Mar 2004 | Distribution: SusE 8.2 | Posts: 5,863
Hi again, mario.almeida -
To say "I need to find the cause of high load avg" is hopelessly, hopelessly vague.
Clearly you've done more work and more analysis than you've told us ... but if you want to make any progress, you need to:
* Benchmark your system performance
* Benchmark your Oracle performance
* Identify the bottlenecks (which may - or may not - be "high load average"; it depends!)
* Identify potential culprits to investigate further
Q: Are users complaining of any specific problems/symptoms?
Q: Do you have a particular performance goal (or better, goals)?
Q: Would it be possible for you to post "top" output illustrating the problem?
<= Two separate questions here: a) can you post the output, and b) does "top" even illustrate the problem?
PS:
Just because your DBA *says* "queries are optimized" doesn't necessarily mean that they *are* optimized, or that your tables are correctly indexed, or that your Oracle tablespaces and buffers are tuned. Oracle tuning is DEFINITELY something to be scrutinized here!
PPS:
You *are* running the 64-bit versions of CentOS and Oracle, aren't you?
Last edited by paulsm4; 05-08-2011 at 03:26 PM.
05-08-2011, 04:21 PM | #8
Member (Original Poster) | Registered: May 2008 | Location: India | Distribution: Ubuntu 10.04, CentOS, Manjaro | Posts: 179
Sometimes users complain that the system is slow; during those times the load is 10 to 15.
Yes, the OS and Oracle are both 64-bit.
I'll post the output of top tomorrow morning.
From the PIDs taking the most CPU% I was able to find the related queries. These queries take a lot of CPU time; I will check with our developers about them.
I have noticed one thing in the output of top: when the load is high, not all CPUs are utilised. When top is running and you press 1, it displays the CPU% of every CPU. I checked that output and found that only some CPUs go above 50% while others stay below 4% to 5%. Maybe the queries which take more time run on the CPUs at 50% and above. If the load average reflects the running processes plus the processes still in the queue, why is it so high when the other CPUs are still below 4% to 5%? This is something that worries me. Maybe the SQL queries are all being handled by the same process?
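The per-CPU imbalance described above can be captured without an interactive top session. A minimal sketch (assuming a Linux /proc/stat with the usual counter layout: user, nice, system, idle, iowait, ...) that takes two snapshots a second apart and computes each CPU's busy percentage:

```shell
#!/bin/sh
# Two snapshots of the per-CPU counters in /proc/stat, one second
# apart, then compute each CPU's busy% over the interval.  A few
# CPUs near 100% while the rest sit idle confirms the imbalance.
s1=/tmp/cpustat1.$$; s2=/tmp/cpustat2.$$
grep '^cpu[0-9]' /proc/stat > "$s1"
sleep 1
grep '^cpu[0-9]' /proc/stat > "$s2"
out=$(paste "$s1" "$s2" | awk '{
    n = NF / 2                      # fields per snapshot line
    busy1 = busy2 = tot1 = tot2 = 0
    # field 5 is idle, field 6 is iowait; everything else is busy
    for (i = 2; i <= n; i++)      { tot1 += $i; if (i != 5 && i != 6) busy1 += $i }
    for (i = n + 2; i <= NF; i++) { tot2 += $i; if (i != n + 5 && i != n + 6) busy2 += $i }
    dt = tot2 - tot1
    printf "%s %5.1f%% busy\n", $1, (dt > 0 ? 100 * (busy2 - busy1) / dt : 0)
}')
rm -f "$s1" "$s2"
echo "$out"
```

If a couple of CPUs are pegged while the rest idle, single-threaded Oracle server processes (one session per shadow process in dedicated-server mode) would be consistent with what you are seeing.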
05-08-2011, 05:39 PM | #9
LQ Guru | Registered: Mar 2004 | Distribution: SusE 8.2 | Posts: 5,863
Important point:
1. "High %CPU utilization" (for example, above 50%) can be GOOD.
There's nothing WRONG with using the CPU you paid for, and there's plenty of breathing room.
2. "Low %CPU utilization" (for example, under 10%), if accompanied by a "high load average" (for example, anything above "2") can be BAD. This means that your CPUs are SITTING ON THEIR HANDS, GRIDLOCKED, when there's work to be done.
Two cases that might account for low %CPU and high load average:
a) you're I/O bound (disk or network)
b) you're memory bound
3. Two good tools to check load average are "top" and "uptime".
Two good tools to check if you're swapping are "top" and "swapon -s" (or "free").
Two good tools to check if you're I/O bound are "top" and "iostat".
And don't forget that Oracle tuning!
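For the memory-bound case mentioned above, a quick sketch (assuming a Linux /proc/meminfo) that reads swap usage directly; note that swap in use is not by itself a problem - sustained si/so activity in vmstat is the real red flag:

```shell
#!/bin/sh
# Memory-pressure check from /proc/meminfo.  Swap *in use* is not
# by itself a problem; sustained si/so columns in vmstat are the
# real sign of being memory bound.
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used=$((swap_total - swap_free))
echo "swap used: ${swap_used} kB of ${swap_total} kB"
```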
05-09-2011, 01:27 AM | #10
Member (Original Poster) | Registered: May 2008 | Location: India | Distribution: Ubuntu 10.04, CentOS, Manjaro | Posts: 179
Below is the output during a period of high disk activity on dm-1:
Code:
top - 05:07:45 up 15 days, 7:34, 3 users, load average: 5.33, 6.04, 5.54
Tasks: 542 total, 7 running, 535 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.9% us, 0.6% sy, 0.0% ni, 85.8% id, 0.5% wa, 0.0% hi, 0.2% si
Mem: 32927116k total, 32795464k used, 131652k free, 346588k buffers
Swap: 12287992k total, 69884k used, 12218108k free, 25963360k cached
iostat -xkd 1
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
dm-1 0.00 0.00 0.00 3985.71 0.00 31885.71 0.00 15942.86 8.00 27.58 8.24 0.12 48.88
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowers1 0.00 2.04 0.00 8.16 0.00 81.63 0.00 40.82 10.00 0.00 0.12 0.12 0.10
emcpowerf1 0.00 2.04 0.00 8.16 0.00 81.63 0.00 40.82 10.00 0.01 1.12 1.12 0.92
emcpowerr1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerp1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowero1 0.00 1.02 0.00 1.02 0.00 16.33 0.00 8.16 16.00 0.00 0.00 0.00 0.00
vmstat 1
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
5 1 69884 197164 346588 25962328 0 0 8 14929 6500 8930 7 1 92 1
2 1 69884 200412 346588 25962328 0 0 64 13468 5083 5515 6 1 93 1
6 1 69884 197996 346588 25962328 0 0 120 15353 5959 7741 9 1 89 1
5 1 69884 198060 346588 25962328 0 0 72 16820 7678 12920 13 1 84 1
6 0 69884 138220 346588 25962328 0 0 16 12125 6799 9487 12 1 87 1
4 1 69884 130308 346588 25962328 0 0 8 15544 5632 8233 9 1 89 1
4 0 69884 133004 346588 25962328 0 0 16 15733 7099 10659 11 1 88 1
4 0 69884 131212 346588 25962328 0 0 32 12708 7248 10219 7 0 91 1
5 1 69884 129612 346588 25962328 0 0 16 15289 7996 13611 12 1 87 1
4 1 69884 129468 346588 25962328 0 0 8 15436 7623 12921 8 1 90 1
5 0 69884 128660 346588 25962328 0 0 8 13413 7436 11788 10 1 88 1
6 0 69884 128980 346588 25962844 0 0 112 12540 8299 15421 12 1 86 1
3 1 69884 130956 346588 25962844 0 0 8 15421 7518 11872 9 1 89 1
8 0 69884 134076 346588 25963360 0 0 32 16704 8523 15826 13 1 85 1
6 1 69884 130892 346588 25963360 0 0 16 393 7258 11060 12 1 87 0
7 0 69884 131972 346588 25963360 0 0 8 36 7246 11389 14 1 85 0
Below is the output during a period of lower disk activity:
Code:
top - 05:09:26 up 15 days, 7:35, 3 users, load average: 5.39, 5.79, 5.50
Tasks: 536 total, 9 running, 527 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.2% us, 0.5% sy, 0.0% ni, 89.2% id, 0.0% wa, 0.0% hi, 0.1% si
Mem: 32927116k total, 32690864k used, 236252k free, 346608k buffers
Swap: 12287992k total, 69884k used, 12218108k free, 25964888k cached
iostat -xkd 1
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowers1 0.00 0.00 0.00 1.98 0.00 15.84 0.00 7.92 8.00 0.00 0.50 0.50 0.10
emcpowerf1 0.00 0.00 0.00 1.98 0.00 15.84 0.00 7.92 8.00 0.00 0.00 0.00 0.00
emcpowerr1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowerp1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
emcpowero1 0.00 0.99 0.00 1.98 0.00 23.76 0.00 11.88 12.00 0.00 0.50 0.50 0.10
vmstat 1
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 69884 267404 346596 25964384 0 0 4 21 0 0 6 0 94 0
7 0 69884 264380 346596 25964384 0 0 8 36 7410 10808 7 1 92 0
5 0 69884 263924 346596 25964384 0 0 112 129 8076 13983 12 1 87 0
4 0 69884 265092 346596 25964384 0 0 16 40 6757 9686 12 1 87 0
15 0 69884 254772 346596 25964384 0 0 24 93 7977 16084 20 1 79 0
11 0 69884 251812 346596 25964384 0 0 16 744 8781 19092 19 1 80 0
4 0 69884 253220 346596 25964384 0 0 32 1105 7897 13941 16 1 83 0
2 0 69884 249636 346596 25964384 0 0 40 760 6726 9254 10 1 89 0
5 0 69884 246180 346596 25964384 0 0 8 445 7417 11569 11 1 88 0
3 0 69884 246444 346600 25964380 0 0 32 140 6893 10536 10 1 89 0
5 0 69884 241012 346600 25964380 0 0 24 57 7651 13273 10 1 89 0
7 0 69884 242164 346600 25964380 0 0 16 132 6519 8749 9 1 90 0
4 0 69884 244268 346600 25964380 0 0 8 889 5738 7052 10 1 90 0
6 0 69884 245324 346600 25964380 0 0 184 700 6175 8875 9 1 90 0
5 0 69884 242324 346608 25964888 0 0 552 185 6702 9540 11 1 88 1
5 0 69884 239500 346608 25964888 0 0 48 36 7416 11478 12 1 87 0
8 0 69884 242956 346608 25964888 0 0 120 173 8230 14909 14 1 85 0
9 0 69884 241812 346608 25964888 0 0 8 124 7645 12596 13 1 86 0
9 0 69884 242428 346608 25964888 0 0 8 169 5913 8074 11 1 88 0
6 0 69884 242116 346608 25964888 0 0 8 8 6048 8377 13 1 86 0
7 0 69884 241924 346608 25964888 0 0 8 33 6482 9735 13 1 87 0
7 0 69884 241524 346608 25964888 0 0 8 28 6461 9498 10 1 89 0
It's definitely not a disk I/O problem: top's 'wa' is always 0.0% to 1.0%, even during peak hours, and even when dm-1 does lots of reads and writes the load average stays the same. If I/O were the problem, dm-1 activity should add load. It shouldn't be a network issue either, but let me check that too.
05-15-2011, 11:04 AM | #11
Member | Registered: Jul 2003 | Posts: 244
This is a great example of why I think running standalone utilities is NOT the way to go, and one of the reasons I wrote collectl. The output you posted has no timestamps, so it's really difficult to correlate anything. Are you aware you can get ALL of this data from collectl? For example, if you really like vmstat output and have a saved collectl log, you can just do:
collectl -p filename --vmstat
If you want to see iostat-like output you can
collectl -p filename -sD
Top format? No problem:
collectl -p filename --top
and in the case of top you can sort on any field you like.
But one other thing jumped out at me, and that is your CPU load. Remember, this is an average. If I'm reading things right and you have a 2-socket/4-core system AND have hyperthreading enabled, that means you have 16 CPUs. When your load shows up as 10-20%, how is it being distributed? In some cases I've seen a process or device pinned to a specific CPU. That CPU could be at 100% and you'd never know it. Try
collectl -p filename -sC
and in all of these don't forget -oT to show timestamps. Also, if you want to see the load averages,
collectl -p filename -sc --verbose
And speaking of load averages, this can get a little tricky. I remember running an NFS server with a load average over 80! You'd think that was a bad thing, but that server was running at close to 500MB/sec over 10Gb. The reason for the high load? NFS had something close to 500 threads, and since a lot of them were busy doing I/O, the number of busy processes was high.
Simple rule - make sure you understand why a value is what it is, because sometimes there's nothing wrong.
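The NFS anecdote above comes down to counting the tasks that feed the load average. A sketch (assuming procps-style `ps`) that counts them directly:

```shell
#!/bin/sh
# Loadavg on Linux counts runnable (R) *and* uninterruptible-sleep
# (D) tasks, which is how an I/O-heavy server with hundreds of
# kernel threads can show a "high" load while being perfectly healthy.
r=$(ps -eo stat= | grep -c '^R')
d=$(ps -eo stat= | grep -c '^D')
echo "runnable=$r uninterruptible=$d (both feed the load average)"
```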
-mark
05-16-2011, 03:24 PM | #12
Member | Registered: Jul 2003 | Location: Miami, Florida, USA | Distribution: Debian | Posts: 848
Your problem is high I/O. The bottleneck in Oracle is usually high I/O. From what I see in your report, you will need to narrow it down to a problem with your Oracle normalization. Two of the most common I/O problems are full table scans due to the lack of a proper index, and the lack of bind variables. You may want to start looking at your developers and what they are doing - how they are retrieving the data. Are they using procedures, indexes, views? Are they doing full table scans without bind variables? OEM will tell you a lot about Top SQL: the SQL queries with the highest load. You may also want to start using Spotlight or Toad - it's a very good investment for DBAs.