High read/write system call time
I am facing a strange problem: one process takes more processing time on one system than on the other systems. On the problem host, most of the time is spent in the read and write system calls. I cannot find any difference in hardware or software between the systems. How can I debug this further?
Problem host:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.51    0.268764         890       302           write
 32.06    0.174047           4     38689        88 read
  8.66    0.046992         385       122         3 nanosleep
  3.31    0.017996        5999         3           poll
  2.49    0.013491           7      1989       480 futex
  1.17    0.006350           8       783           mprotect
  0.92    0.004997         238        21           munmap

Same process debug on another host:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.09    0.285169         337       845       183 futex
 13.08    0.090785         241       377           write
 12.52    0.086900           2     39573        18 read
 12.24    0.084969        3694        23           poll
 11.09    0.076985        1283        60         3 nanosleep
  6.29    0.043648       43648         1           unlink
  0.91    0.006317           7       948           mprotect
  0.61    0.004234           0     38324        18 lseek
  0.38    0.002641          43        61           munmap
|
How did you derive these statistics? What's your thinking as to why you feel the reads and writes are the problem?
|
On this particular host the process takes more than 60 seconds to complete an operation, whereas all the other hosts complete the same operation in 40 seconds. I used strace to get system call timings and found that the read and write system calls use more time on the problem host.
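To make the comparison between the two strace summaries easier to read, here is a minimal Python sketch that parses both tables and prints the per-call time side by side. The function name `parse_strace_summary` and the variable names are only illustrative, and the parser assumes the plain `strace -c` column layout shown above; it has not been checked against other strace output formats.

```python
# Sketch: compare per-call syscall time between two `strace -c` summaries.
# The rows below are copied from the two summaries posted in this thread.

PROBLEM_HOST = """\
49.51 0.268764 890 302 write
32.06 0.174047 4 38689 88 read
8.66 0.046992 385 122 3 nanosleep
3.31 0.017996 5999 3 poll
2.49 0.013491 7 1989 480 futex
1.17 0.006350 8 783 mprotect
0.92 0.004997 238 21 munmap
"""

OTHER_HOST = """\
41.09 0.285169 337 845 183 futex
13.08 0.090785 241 377 write
12.52 0.086900 2 39573 18 read
12.24 0.084969 3694 23 poll
11.09 0.076985 1283 60 3 nanosleep
6.29 0.043648 43648 1 unlink
0.91 0.006317 7 948 mprotect
0.61 0.004234 0 38324 18 lseek
0.38 0.002641 43 61 munmap
"""


def parse_strace_summary(text):
    """Return {syscall: (seconds, usecs_per_call, calls)} from summary rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (empty errors column) or 6 (errors present).
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            float(parts[0])  # "% time" column; rejects header/separator lines
            seconds = float(parts[1])
            usecs_per_call = int(parts[2])
            calls = int(parts[3])
        except ValueError:
            continue
        rows[parts[-1]] = (seconds, usecs_per_call, calls)
    return rows


problem = parse_strace_summary(PROBLEM_HOST)
other = parse_strace_summary(OTHER_HOST)

print(f"{'syscall':<10} {'problem us/call':>16} {'other us/call':>14}")
for name in sorted(set(problem) & set(other)):
    print(f"{name:<10} {problem[name][1]:>16} {other[name][1]:>14}")

# Syscalls that appear in only one of the two summaries.
print("only on problem host:", sorted(set(problem) - set(other)))
print("only on other host:  ", sorted(set(other) - set(problem)))
```

One caveat when reading these numbers: by default `strace -c` reports system CPU time rather than wall-clock time. If the installed strace supports the `-w` option, rerunning with it (or using `strace -T` for per-call durations) may give a clearer picture of where the extra 20 seconds go.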
|
Filesystem approaching capacity? Slower hard drive? Failing hard drive? Degraded RAID?
Without any information about the hardware this system is using it's impossible to diagnose. |
The hard disk is configured as a RAID 1 array:

Smart Array E200i in Slot 0 (Embedded)
   array A (SAS, Unused Space: 0 MB)
      logicaldrive 1 (68.3 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)

physicaldrive 1I:1:1
   Port: 1I
   Box: 1
   Bay: 1
   Status: OK
   Drive Type: Data Drive
   Interface Type: SAS
   Size: 72 GB
   Rotational Speed: 15000
   Firmware Revision: HPD3
   Serial Number:
   Model: HP
   PHY Count: 2
   PHY Transfer Rate: 3.0Gbps, Unknown

array A
   Logical Drive: 1
   Size: 68.3 GB
   Fault Tolerance: 1
   Heads: 255
   Sectors Per Track: 32
   Cylinders: 17562
   Strip Size: 128 KB
   Full Stripe Size: 128 KB
   Status: OK
   Caching: Enabled
   Unique Identifier:
   Disk Name: /dev/cciss/c0d0
   Mount Points:
   OS Status: LOCKED
   Logical Drive Label:
   Mirror Group 0:
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
   Mirror Group 1:
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
   Drive Type: Data

The file system details are given below:

Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2621440
Block count:              2620603
Reserved block count:     131030
Free blocks:              1847892
Free inodes:              2600703
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      639
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         32768
Inode blocks per group:   1024
Filesystem created:       Wed Aug 4 09:35:24 2010
Last mount time:          Sat May 9 22:26:04 2015
Last write time:          Sat May 9 22:26:04 2015
Mount count:              10
Maximum mount count:      -1
Last checked:             Wed Aug 4 09:35:24 2010
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       2262638
Default directory hash:   tea
Directory Hash Seed:
Journal backup:           inode blocks
Journal size:             128M

The iostat details are below:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.91    0.01   18.13    0.40    0.00   44.56

Device:            tps  MB_read/s  MB_wrtn/s    MB_read    MB_wrtn
cciss/c0d0      127.30       0.01       6.12      21277    9769904
cciss/c0d0p1      0.00       0.00       0.00        202          0
cciss/c0d0p2      0.00       0.00       0.00          0          2
cciss/c0d0p3     16.46       0.00       1.00       3497    1595041
cciss/c0d0p4      0.00       0.00       0.00          0          0
cciss/c0d0p5      4.98       0.01       0.13      10484     206860
cciss/c0d0p6    103.36       0.00       4.95       1349    7891840
cciss/c0d0p7      0.92       0.00       0.02        828      37684
cciss/c0d0p8      1.58       0.00       0.02       4901      38474

The file systems are all less than 50% utilized. |
Any information on the physical status of the drives? Do they support SMART?
If I/O speed is that critical, then why are you only using a pair of SAS drives in RAID 1? |
The device supports SMART:

Vendor:               HP
Product:              EH0072
Revision:             HPD3
User Capacity:        73,407,865,856 bytes [73.4 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca00b036bf8
Serial number:
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu May 28 09:56:26 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled

In my environment all the servers use the same configuration, but the process misbehaves only on this particular server. I am trying to figure out the cause. |
What's the smart output for the two drives (smartctl -a <device>)?
|
Below is the smart status for both drives
Vendor:               HP
Product:
Revision:             HPDD
User Capacity:        73,407,865,856 bytes [73.4 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5001ceddd43
Serial number:
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu May 28 22:24:18 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     29 C
Drive Trip Temperature:        65 C
Elements in grown defect list: 1
Vendor (Seagate) cache information
  Blocks sent to initiator = 1244750564
  Blocks received from initiator = 2597670309
  Blocks read from cache and sent to initiator = 47275435
  Number of read and write commands whose size <= segment size = 1254424438
  Number of read and write commands whose size > segment size = 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0  607802108          0        637.312           0
write:         0        0         0          0          0      98381.411           0
verify:        0        0         0         14          0          0.000           5

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 610 seconds [10.2 minutes]

Vendor:               HP
Product:
Revision:             HPD3
User Capacity:        73,407,865,856 bytes [73.4 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000cca00b036bf8
Serial number:
Device type:          disk
Transport protocol:   SAS
Local Time is:        Thu May 28 22:26:34 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     29 C
Drive Trip Temperature:        70 C
Manufactured in week 41 of year 2009
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  22
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0    11641         0      11641          0      49893.147           0
write:         0  3211791         0    3211791          0     115869.440           0

Non-medium error count: 37
No self-tests have been logged
Long (extended) Self Test duration: 713 seconds [11.9 minutes]
|