High read/write system call time

ambadiaravind · 05-28-2015, 12:36 AM

I am facing a weird problem with one of process taking more processing time in one system comparing to other systems. Most of the time is spending on read and write system call on problem host. I am not finding any difference in hardware and software between both systems. How can i debug further?

Problem host:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.51 0.268764 890 302 write
32.06 0.174047 4 38689 88 read
8.66 0.046992 385 122 3 nanosleep
3.31 0.017996 5999 3 poll
2.49 0.013491 7 1989 480 futex
1.17 0.006350 8 783 mprotect
0.92 0.004997 238 21 munmap
Same process debug on another host

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
41.09 0.285169 337 845 183 futex
13.08 0.090785 241 377 write
12.52 0.086900 2 39573 18 read
12.24 0.084969 3694 23 poll
11.09 0.076985 1283 60 3 nanosleep
6.29 0.043648 43648 1 unlink
0.91 0.006317 7 948 mprotect
0.61 0.004234 0 38324 18 lseek
0.38 0.002641 43 61 munmap

rtmistler · 05-28-2015, 09:37 AM

How did you derive these statistics? What's your thinking as to why you feel the reads and writes are the problem?

ambadiaravind · 05-28-2015, 10:12 AM

On this particular host process taking more than 60secs to complete an operation where in which all host completes the operation in 40secs. I used strace to get system call timings and found that read and write system call is using more time in problem host.

suicidaleggroll · 05-28-2015, 10:14 AM

Filesystem approaching capacity? Slower hard drive? Failing hard drive? Degraded RAID?

Without any information about the hardware this system is using it's impossible to diagnose.

ambadiaravind · 05-28-2015, 10:37 AM

Harddisk is configured as RAID 1 array

Smart Array E200i in Slot 0 (Embedded)

array A (SAS, Unused Space: 0 MB)

logicaldrive 1 (68.3 GB, RAID 1, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 72 GB
Rotational Speed: 15000
Firmware Revision: HPD3
Serial Number:
Model: HP
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown

array A

Logical Drive: 1
Size: 68.3 GB
Fault Tolerance: 1
Heads: 255
Sectors Per Track: 32
Cylinders: 17562
Strip Size: 128 KB
Full Stripe Size: 128 KB
Status: OK
Caching: Enabled
Unique Identifier:
Disk Name: /dev/cciss/c0d0
Mount Points:
OS Status: LOCKED
Logical Drive Label:
Mirror Group 0:
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
Mirror Group 1:
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
Drive Type: Data

File system details is given below

Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 2620603
Reserved block count: 131030
Free blocks: 1847892
Free inodes: 2600703
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 639
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32768
Inode blocks per group: 1024
Filesystem created: Wed Aug 4 09:35:24 2010
Last mount time: Sat May 9 22:26:04 2015
Last write time: Sat May 9 22:26:04 2015
Mount count: 10
Maximum mount count: -1
Last checked: Wed Aug 4 09:35:24 2010
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 2262638
Default directory hash: tea
Directory Hash Seed:
Journal backup: inode blocks
Journal size: 128M

Below is the iostat details

avg-cpu: %user %nice %system %iowait %steal %idle
36.91 0.01 18.13 0.40 0.00 44.56

Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
cciss/c0d0 127.30 0.01 6.12 21277 9769904
cciss/c0d0p1 0.00 0.00 0.00 202 0
cciss/c0d0p2 0.00 0.00 0.00 0 2
cciss/c0d0p3 16.46 0.00 1.00 3497 1595041
cciss/c0d0p4 0.00 0.00 0.00 0 0
cciss/c0d0p5 4.98 0.01 0.13 10484 206860
cciss/c0d0p6 103.36 0.00 4.95 1349 7891840
cciss/c0d0p7 0.92 0.00 0.02 828 37684
cciss/c0d0p8 1.58 0.00 0.02 4901 38474

File systems are only less than 50% utilized

suicidaleggroll · 05-28-2015, 11:04 AM

Any information on the physical status of the drives? Do they support SMART?

If I/O speed is that critical, then why are you only using a pair of SAS drives in RAID 1?

ambadiaravind · 05-28-2015, 11:15 AM

Device support smart

Vendor: HP
Product: EH0072
Revision: HPD3
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca00b036bf8
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 09:56:26 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled

In my environment all the servers using same configuration but process misbehaves on this particular server. I am trying to figure out what is the cause for the same

suicidaleggroll · 05-28-2015, 12:12 PM

What's the smart output for the two drives (smartctl -a <device>)?

ambadiaravind · 05-28-2015, 11:27 PM

Below is the smart status for both drives

Vendor: HP
Product:
Revision: HPDD
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000c5001ceddd43
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 22:24:18 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 29 C
Drive Trip Temperature: 65 C
Elements in grown defect list: 1
Vendor (Seagate) cache information
Blocks sent to initiator = 1244750564
Blocks received from initiator = 2597670309
Blocks read from cache and sent to initiator = 47275435
Number of read and write commands whose size <= segment size = 1254424438
Number of read and write commands whose size > segment size = 0

Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 0 0 607802108 0 637.312 0
write: 0 0 0 0 0 98381.411 0
verify: 0 0 0 14 0 0.000 5

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 610 seconds [10.2 minutes]

Vendor: HP
Product:
Revision: HPD3
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca00b036bf8
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 22:26:34 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 29 C
Drive Trip Temperature: 70 C
Manufactured in week 41 of year 2009
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 22
Elements in grown defect list: 0

Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 11641 0 11641 0 49893.147 0
write: 0 3211791 0 3211791 0 115869.440 0

Non-medium error count: 37
No self-tests have been logged
Long (extended) Self Test duration: 713 seconds [11.9 minutes]