LinuxQuestions.org
Help answer threads with 0 replies.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - General
User Name
Password
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Notices


Reply
  Search this Thread
Old 05-28-2015, 12:36 AM   #1
ambadiaravind
LQ Newbie
 
Registered: May 2015
Posts: 7

Rep: Reputation: Disabled
High read/write system call time


I am facing a weird problem with one of process taking more processing time in one system comparing to other systems. Most of the time is spending on read and write system call on problem host. I am not finding any difference in hardware and software between both systems. How can i debug further?

Problem host:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.51 0.268764 890 302 write
32.06 0.174047 4 38689 88 read
8.66 0.046992 385 122 3 nanosleep
3.31 0.017996 5999 3 poll
2.49 0.013491 7 1989 480 futex
1.17 0.006350 8 783 mprotect
0.92 0.004997 238 21 munmap
Same process debug on another host

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
41.09 0.285169 337 845 183 futex
13.08 0.090785 241 377 write
12.52 0.086900 2 39573 18 read
12.24 0.084969 3694 23 poll
11.09 0.076985 1283 60 3 nanosleep
6.29 0.043648 43648 1 unlink
0.91 0.006317 7 948 mprotect
0.61 0.004234 0 38324 18 lseek
0.38 0.002641 43 61 munmap
 
Old 05-28-2015, 09:37 AM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930
How did you derive these statistics? What's your thinking as to why you feel the reads and writes are the problem?
 
Old 05-28-2015, 10:12 AM   #3
ambadiaravind
LQ Newbie
 
Registered: May 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
On this particular host process taking more than 60secs to complete an operation where in which all host completes the operation in 40secs. I used strace to get system call timings and found that read and write system call is using more time in problem host.
 
Old 05-28-2015, 10:14 AM   #4
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142
Filesystem approaching capacity? Slower hard drive? Failing hard drive? Degraded RAID?

Without any information about the hardware this system is using it's impossible to diagnose.
 
Old 05-28-2015, 10:37 AM   #5
ambadiaravind
LQ Newbie
 
Registered: May 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Harddisk is configured as RAID 1 array

Smart Array E200i in Slot 0 (Embedded)

array A (SAS, Unused Space: 0 MB)


logicaldrive 1 (68.3 GB, RAID 1, OK)

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK


physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 72 GB
Rotational Speed: 15000
Firmware Revision: HPD3
Serial Number:
Model: HP
PHY Count: 2
PHY Transfer Rate: 3.0Gbps, Unknown


array A

Logical Drive: 1
Size: 68.3 GB
Fault Tolerance: 1
Heads: 255
Sectors Per Track: 32
Cylinders: 17562
Strip Size: 128 KB
Full Stripe Size: 128 KB
Status: OK
Caching: Enabled
Unique Identifier:
Disk Name: /dev/cciss/c0d0
Mount Points:
OS Status: LOCKED
Logical Drive Label:
Mirror Group 0:
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
Mirror Group 1:
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
Drive Type: Data

File system details is given below


Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 2620603
Reserved block count: 131030
Free blocks: 1847892
Free inodes: 2600703
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 639
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32768
Inode blocks per group: 1024
Filesystem created: Wed Aug 4 09:35:24 2010
Last mount time: Sat May 9 22:26:04 2015
Last write time: Sat May 9 22:26:04 2015
Mount count: 10
Maximum mount count: -1
Last checked: Wed Aug 4 09:35:24 2010
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 2262638
Default directory hash: tea
Directory Hash Seed:
Journal backup: inode blocks
Journal size: 128M

Below is the iostat details


avg-cpu: %user %nice %system %iowait %steal %idle
36.91 0.01 18.13 0.40 0.00 44.56

Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
cciss/c0d0 127.30 0.01 6.12 21277 9769904
cciss/c0d0p1 0.00 0.00 0.00 202 0
cciss/c0d0p2 0.00 0.00 0.00 0 2
cciss/c0d0p3 16.46 0.00 1.00 3497 1595041
cciss/c0d0p4 0.00 0.00 0.00 0 0
cciss/c0d0p5 4.98 0.01 0.13 10484 206860
cciss/c0d0p6 103.36 0.00 4.95 1349 7891840
cciss/c0d0p7 0.92 0.00 0.02 828 37684
cciss/c0d0p8 1.58 0.00 0.02 4901 38474

File systems are only less than 50% utilized
 
Old 05-28-2015, 11:04 AM   #6
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142
Any information on the physical status of the drives? Do they support SMART?

If I/O speed is that critical, then why are you only using a pair of SAS drives in RAID 1?
 
Old 05-28-2015, 11:15 AM   #7
ambadiaravind
LQ Newbie
 
Registered: May 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Device support smart

Vendor: HP
Product: EH0072
Revision: HPD3
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca00b036bf8
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 09:56:26 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled

In my environment all the servers using same configuration but process misbehaves on this particular server. I am trying to figure out what is the cause for the same
 
Old 05-28-2015, 12:12 PM   #8
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142Reputation: 2142
What's the smart output for the two drives (smartctl -a <device>)?
 
Old 05-28-2015, 11:27 PM   #9
ambadiaravind
LQ Newbie
 
Registered: May 2015
Posts: 7

Original Poster
Rep: Reputation: Disabled
Below is the smart status for both drives


Vendor: HP
Product:
Revision: HPDD
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000c5001ceddd43
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 22:24:18 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 29 C
Drive Trip Temperature: 65 C
Elements in grown defect list: 1
Vendor (Seagate) cache information
Blocks sent to initiator = 1244750564
Blocks received from initiator = 2597670309
Blocks read from cache and sent to initiator = 47275435
Number of read and write commands whose size <= segment size = 1254424438
Number of read and write commands whose size > segment size = 0

Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 0 0 607802108 0 637.312 0
write: 0 0 0 0 0 98381.411 0
verify: 0 0 0 14 0 0.000 5

Non-medium error count: 0
No self-tests have been logged
Long (extended) Self Test duration: 610 seconds [10.2 minutes]


Vendor: HP
Product:
Revision: HPD3
User Capacity: 73,407,865,856 bytes [73.4 GB]
Logical block size: 512 bytes
Logical Unit id: 0x5000cca00b036bf8
Serial number:
Device type: disk
Transport protocol: SAS
Local Time is: Thu May 28 22:26:34 2015 MDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 29 C
Drive Trip Temperature: 70 C
Manufactured in week 41 of year 2009
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 22
Elements in grown defect list: 0

Error counter log:
Errors Corrected by Total Correction Gigabytes Total
ECC rereads/ errors algorithm processed uncorrected
fast | delayed rewrites corrected invocations [10^9 bytes] errors
read: 0 11641 0 11641 0 49893.147 0
write: 0 3211791 0 3211791 0 115869.440 0

Non-medium error count: 37
No self-tests have been logged
Long (extended) Self Test duration: 713 seconds [11.9 minutes]
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
is there a way to read one char at a time in a file opened w/ open system call doughyi8u Programming 30 04-30-2015 09:53 AM
Sudden spike in write system call time Bharadwaj Programming 6 07-17-2012 10:06 AM
How do I write System call that returns a current time abhi1242 Linux - Newbie 3 09-26-2011 12:38 AM
how to write a system call which returns current system time using call by reference mukul2kul4 Debian 2 09-25-2011 11:17 PM
difficulties with large files and system call read/write jwstric2 Programming 2 08-22-2004 11:42 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - General

All times are GMT -5. The time now is 05:51 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration