LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices


Reply
  Search this Thread
Old 04-11-2013, 06:50 AM   #1
blockdump
LQ Newbie
 
Registered: Jan 2012
Location: vienna
Distribution: Redhat, Oracle Enterprise Linux, Fedora
Posts: 10

Rep: Reputation: Disabled
sar disk activity question - await svctm


Hi,

after 2 days of searching I could not find an answer for how to interpret the following figures.
This output is from a server running Red Hat Enterprise Linux Server release 5.4 64 bit.
It hosts an oracle database which is running big jobs at night using lots of CPU and IO Resources.

This is part of a "sar -d" output I'd like to understand and get a conclusion about IO saturation:

Code:

01:30:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:40:01 AM       sda    241.20   4593.82   3245.67     32.50     11.72     48.54      1.51     36.34
01:40:01 AM      sda1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:40:01 AM      sda2    241.20   4593.82   3245.67     32.50     11.72     48.54      1.51     36.34
01:40:01 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:40:01 AM       sdc     10.77     15.85   6429.96    598.76      0.08      7.40      3.72      4.00
01:40:01 AM       sdd    201.29   8474.98  11966.81    101.56      0.99      4.91      1.55     31.14
01:40:01 AM     nodev     22.43    581.15     50.82     28.17      0.50     22.32      4.97     11.16
01:40:01 AM     nodev      0.66      1.43      4.04      8.22      0.02     23.56     16.04      1.07
01:40:01 AM     nodev    900.26   4011.25   3190.82      8.00     63.27     70.28      0.40     35.90
01:40:01 AM     nodev     56.39   6514.50   3836.41    183.57      0.51      9.10      1.74      9.79
01:40:01 AM     nodev     37.99      4.13   3774.09     99.44      0.32      8.50      1.77      6.74
01:40:01 AM     nodev     16.98    622.76     10.04     37.27      0.06      3.61      1.10      1.86
01:40:01 AM     nodev    130.10   1333.57   4346.28     43.66      0.60      4.62      2.03     26.47
01:40:01 AM     nodev    798.49     15.85   6429.96      8.07      9.20     11.53      0.05      4.01
01:50:07 AM       sda    197.52   3663.15   2357.34     30.48      5.74     28.63      2.45     48.31
01:50:07 AM      sda1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:50:07 AM      sda2    197.52   3663.15   2357.34     30.48      5.74     28.63      2.45     48.31
01:50:07 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:50:07 AM       sdc      2.81     15.21     64.53     28.38      0.00      0.88      0.87      0.25
01:50:07 AM       sdd    420.88 180580.13   9225.34    450.98      6.96     16.53      1.65     69.28
01:50:07 AM     nodev     20.31    573.43     53.66     30.87      0.33     16.25      3.88      7.88
01:50:07 AM     nodev      0.78      1.95      4.65      8.46      0.01     17.16     11.35      0.89
01:50:07 AM     nodev    673.36   3087.87   2299.03      8.00     28.79     42.76      0.71     47.64
01:50:07 AM     nodev     32.02     31.65   2874.41     90.76      0.44     13.79      3.25     10.42
01:50:07 AM     nodev     29.37      3.26   2814.24     95.94      0.36     12.27      3.16      9.29
01:50:07 AM     nodev     25.84    975.35     10.63     38.16      0.08      3.14      0.97      2.52
01:50:07 AM     nodev    582.31 179574.95   3526.59    314.44      8.72     14.98      1.16     67.64
01:50:07 AM     nodev      3.00     15.21     64.53     26.60      0.00      0.95      0.86      0.26
02:00:09 AM       sda    406.32   6575.33   6317.95     31.73     48.05    113.16      2.46     99.91
02:00:09 AM      sda1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
02:00:09 AM      sda2    406.32   6575.33   6317.95     31.73     48.05    113.16      2.46     99.91
02:00:09 AM       sdb      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
02:00:09 AM       sdc     10.63     16.51   6213.25    585.95      0.08      7.49      3.77      4.01
02:00:09 AM       sdd    519.43 259859.66  11266.40    521.97      8.09     15.58      1.63     84.51
02:00:09 AM     nodev     23.33    506.72     37.37     23.33      1.97     84.49     15.31     35.71
02:00:09 AM     nodev      0.74      2.04      3.95      8.05      0.09    116.90     74.28      5.52
02:00:09 AM     nodev   1542.92   6066.76   6276.63      8.00    210.04    136.13      0.65     99.89
02:00:09 AM     nodev     83.21   6329.26   4391.70    128.84      0.78      9.41      1.10      9.12
02:00:09 AM     nodev     61.38      5.33   4331.31     70.65      0.65     10.60      0.95      5.82
02:00:09 AM     nodev     36.56   1325.83      9.01     36.51      0.13      3.54      0.95      3.49
02:00:09 AM     nodev    710.32 252212.79   2533.86    358.64      9.46     13.31      1.18     84.03
02:00:09 AM     nodev    771.72     16.51   6213.25      8.07      8.99     11.65      0.05      4.02

sorry for the formatting !

as one can see there are high await times ( service time + queue time ) in comparison to low svctm times ( just the device service times )

I'd like to understand what makes the IOs waiting in the queue so long. ?
The svctm time is on average under 1 ms that should give the system the possibility of doing 1000 IO per second. So what could be a reason that so many IOs are queued for such a long time ?

any ideas welcome !

best regards
guenter

Last edited by blockdump; 04-14-2013 at 12:57 PM.
 
Old 04-11-2013, 09:47 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 19,189

Rep: Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406
Quote:
Originally Posted by blockdump View Post
Hi,
after 2 days of searching I could not find an answer for how to interpret the following figures.
Since you're using Red Hat Enterprise with Oracle, you should just call Oracle and Red Hat support. That's part of what you're paying for, is their tech support and diagnostic services. They can instruct you how to run traces/dumps, and will analyze them, to tell you what's going on.
Quote:
This output is from a server running Red Hat Enterprise Linux Server release 5.4 64 bit. It hosts an oracle database which is running big jobs at night using lots of CPU and IO Resources. This is part of a "sar -d" output I'd like to understand and get a conclusion about IO saturation:

as one can see there are high await times ( service time + queue time ) in comparison to low svctm times ( just the device service times )

I'd like to understand what makes the IOs waiting in the queue so long? The svctm time is on average under 1 ms that should give the system the possibility of doing 1000 IO per second. So what could be a reason that so many IOs are queued for such a long time ?
Without knowing anything about the system (what kind of disk(s)? Attached how? RAID level? Hardware or software RAID? Server memory? Number of users? database size? Number of queries? Tables? Anything??), it's impossible to say. Could be that one table of the database is getting hammered...could be that there is a poorly formatted query somewhere. Without details, it's impossible to say.

Again, the best/easiest way to figure out performance problems on an Oracle/Red Hat system, is to first contact Oracle support...they will run traces/dumps, and give you information as to where the bottlneck may be. From there, you can either fix it (if its an Oracle/query issue), or take that info to Red Hat for help with how to spread things out and/or optimize your system.
 
Old 04-12-2013, 01:24 AM   #3
blockdump
LQ Newbie
 
Registered: Jan 2012
Location: vienna
Distribution: Redhat, Oracle Enterprise Linux, Fedora
Posts: 10

Original Poster
Rep: Reputation: Disabled
Hi,

the database was just a background information about why there are so many IOs.

I 'd like to understand: why can an IO request remain so long in the queue, when the disk subsystem seems pretty fast. ( see values of wait and svctm ) - This is not related to database or Oracle specific products.

I suppose it' related to interrupts but I'm still investigating.

best regards
 
Old 04-12-2013, 02:59 PM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 19,189

Rep: Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406Reputation: 4406
Quote:
Originally Posted by blockdump View Post
Hi,
the database was just a background information about why there are so many IOs.

I 'd like to understand: why can an IO request remain so long in the queue, when the disk subsystem seems pretty fast. ( see values of wait and svctm ) - This is not related to database or Oracle specific products.

I suppose it' related to interrupts but I'm still investigating.
Again, without knowing what the system is doing, who can tell? You could have programs that aren't behaving correctly, or a slew of other reasons. Again, Oracle and Red Hat can tell you exactly why.
 
Old 04-12-2013, 10:48 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,888

Rep: Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192Reputation: 2192
Quote:
Originally Posted by blockdump View Post
sorry for the formatting !
Ugh - edit the post and change the [quote] tags to [code]
 
Old 04-15-2013, 03:47 AM   #6
blockdump
LQ Newbie
 
Registered: Jan 2012
Location: vienna
Distribution: Redhat, Oracle Enterprise Linux, Fedora
Posts: 10

Original Poster
Rep: Reputation: Disabled
formatting looks now better ( thanks for the hint )

Last edited by blockdump; 04-15-2013 at 04:03 AM.
 
Old 04-15-2013, 04:02 AM   #7
blockdump
LQ Newbie
 
Registered: Jan 2012
Location: vienna
Distribution: Redhat, Oracle Enterprise Linux, Fedora
Posts: 10

Original Poster
Rep: Reputation: Disabled
Hi,


Maybe my question was misleading so I'd like to clarify.

Actually my question is not about " help me make my system go faster " ( like calling support ) but I'd like to understand how Linux works. I was curious about the time discrepancy between the await and svctm.

I already search on the internet and found some good information on http://tldp.org/LDP/tlk/tlk-toc.html and http://www.alexonlinux.com/smp-affin...dling-in-linux http://http://honglus.blogspot.co.at...-affinity.html.

Most information and articles I found say that interrupts ( a particular interrupt ) should not be spread among all CPUs because the interrupt routine would have to be loaded after each context switch.

Most articles point into direction that balancing IRQ is NOT needed, because the /proc/irq/default_smp_affinity or irqbalance daemon can distribute IRQ signals among CPUs automatically.

here is a example of the distribution on this machine:

Code:
 cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
  0:  319189965          0          0          0          0          0    IO-APIC-edge  timer
  1:         11          0          0          0          0          0    IO-APIC-edge  i8042
  6:          5          0          0          0          0          0    IO-APIC-edge  floppy
  7:          0          0          0          0          0          0    IO-APIC-edge  parport0
  8:          0          0          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0          0          0   IO-APIC-level  acpi
 12:        116          0          0          0          0          0    IO-APIC-edge  i8042
 14:   18959567    3267884     459895    1558837    3684775     596818    IO-APIC-edge  ide0
 51:  802586254  104602518   44610193     660284  184831837   64369345   IO-APIC-level  ioc0, ioc3
 59:        651         48         46         16         16          0   IO-APIC-level  ioc1
 67:    6111832     717773    2128979    7356366    2487667     607517   IO-APIC-level  ioc2, vmci
 75:  164458384  411355440          0          0          0 1057019592         PCI-MSI  eth0
NMI:          0          0          0          0          0          0
LOC:  319189945  319190559  319190523  319190627  319190561  319190441
ERR:          0
MIS:          0
If you take a look at IRQ 51 : It's distributed over all CPUs but very skew. The process irqbalance should be distributing the IRQs over all CPUs, so why is that skew ? Is probably ksoftirqd doing a bad job queuing the interrupts ?

So how does IRQ process affinity and the job of irqbalance fit together ?

I suspect the high await times are caused by heavy irq rates not distributed over all CPUs, but that is just a guess at the moment

If someone knows a good article or book which bring the topics IO-Waits and interrupts together please let me know .



best regards

Last edited by blockdump; 04-15-2013 at 04:03 AM.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
sar output understanding of disk activity thllgo Linux - Server 4 10-10-2011 04:12 PM
Monitoring Disk Activity using sar ahmedb72 Linux - Newbie 1 10-27-2009 01:50 PM
Is sar & iostat output of svctm accurate Baffled Linux - Server 1 07-09-2009 02:15 PM
`sar -B` - does paging stats include file cache or literally physical disk? zQUEz Linux - Server 3 06-04-2009 06:04 PM
SAR Ouput Question te_conway Linux - General 10 01-26-2003 09:29 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Server

All times are GMT -5. The time now is 05:03 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration