LinuxQuestions.org
Old 12-31-2015, 08:42 AM   #1
usao
Member
 
Registered: Dec 2011
Location: Chandler, AZ
Posts: 286

Rep: Reputation: Disabled
Disk utilization 100%


Having an odd issue with SAN storage.
We use an EMC CLARiiON CX4 array connected to an HP server over Fibre Channel.
The HP box runs ESXi 6.0, and the affected VM on it uses raw-mapped LUNs.

The host runs for several days, then essentially stops, with disk utilization going to 100%:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.01    3.12    0.00   96.87

Device:  rrqm/s wrqm/s  r/s  w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm  %util
sdb        0.00   0.00 0.00 0.00  0.00  0.00     0.00   129.00  0.00  0.00 100.00
dm-2       0.00   0.00 0.00 0.00  0.00  0.00     0.00   131.03  0.00  0.00 100.02
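
The pattern in those two device lines is a non-empty queue (avgqu-sz around 130) with zero reads and writes completing and %util pinned at 100. A minimal Python sketch for flagging that pattern in `iostat -x` output could look like the following. The function name `is_stalled` and the threshold are made up for illustration, and the column order is assumed to match the header shown above; other sysstat versions print different columns.

```python
def is_stalled(line, util_threshold=99.0):
    """Return True if an `iostat -x` device line shows queued I/O that is not completing."""
    fields = line.split()
    # Assumed layout (taken from the header in the post):
    # Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
    if len(fields) != 12:
        return False
    try:
        values = [float(field) for field in fields[1:]]
    except ValueError:
        return False  # header line or anything else that is not numeric
    reads_per_s, writes_per_s = values[2], values[3]
    queue_depth = values[7]
    util = values[10]
    return (reads_per_s == 0 and writes_per_s == 0
            and queue_depth > 0 and util >= util_threshold)
```

Feeding it each device line from a periodic iostat run would show the moment the device stops completing requests, as opposed to simply being busy.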


It doesn't recover (I have waited up to 2 days on multiple occasions), and the only way to clear it is to reboot the VM (CentOS 6.6). The OS remains responsive, but disk-related commands such as "sync" and "pvs" all hang and cannot be killed, even with kill -9 as root.
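
Commands that ignore kill -9 are typically sitting in uninterruptible sleep (state D), where signals are not acted on until the blocked I/O returns. A rough Python sketch for listing such tasks by reading /proc/<pid>/stat is below. The function names are invented for this example, and it only reads process state; it does not unblock anything.

```python
import os

def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_tasks()` on the affected VM while it is hung should list the stuck "sync" and "pvs" processes, assuming they are indeed in state D.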

The only thing the logs show is multiple blocks similar to this:

Dec 31 00:50:05 lx1 kernel: INFO: task events/27:542 blocked for more than 120 seconds.
Dec 31 00:50:05 lx1 kernel: Not tainted 2.6.32-573.8.1.el6.x86_64 #1
Dec 31 00:50:05 lx1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 31 00:50:05 lx1 kernel: events/27 D 000000000000001b 0 542 2 0x00000000
Dec 31 00:50:05 lx1 kernel: ffff880fe8caba50 0000000000000046 ffff880fe8caba18 ffff880fe8caba14
Dec 31 00:50:05 lx1 kernel: ffff880fe4d60148 ffff880fffe84600 0000d46b9a707ae8 ffff881037a159c0
Dec 31 00:50:05 lx1 kernel: 0000000000000400 000000010de949d3 ffff880fe8c9fad8 ffff880fe8cabfd8
Dec 31 00:50:05 lx1 kernel: Call Trace:
Dec 31 00:50:05 lx1 kernel: [<ffffffff81538d43>] io_schedule+0x73/0xc0
Dec 31 00:50:05 lx1 kernel: [<ffffffff81276038>] get_request_wait+0x108/0x1d0
Dec 31 00:50:05 lx1 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Dec 31 00:50:05 lx1 kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
Dec 31 00:50:05 lx1 kernel: [<ffffffff81276756>] blk_get_request+0x46/0xa0
Dec 31 00:50:05 lx1 kernel: [<ffffffff813943a8>] scsi_execute+0x48/0x180
Dec 31 00:50:05 lx1 kernel: [<ffffffffa004db4a>] spi_execute+0xaa/0x130 [scsi_transport_spi]
Dec 31 00:50:05 lx1 kernel: [<ffffffff81537f5a>] ? printk+0x41/0x47
Dec 31 00:50:05 lx1 kernel: [<ffffffffa004df1f>] spi_dv_device_compare_inquiry+0x7f/0x120 [scsi_transport_spi]
Dec 31 00:50:05 lx1 kernel: [<ffffffffa004e12e>] spi_dv_device+0x16e/0x7b0 [scsi_transport_spi]
Dec 31 00:50:05 lx1 kernel: [<ffffffff8129232a>] ? kobject_get+0x1a/0x30
Dec 31 00:50:05 lx1 kernel: [<ffffffffa0086fc4>] mptspi_dv_device+0xb4/0x1b0 [mptspi]
Dec 31 00:50:05 lx1 kernel: [<ffffffffa00871ab>] mptspi_dv_renegotiate_work+0xeb/0x120 [mptspi]
Dec 31 00:50:05 lx1 kernel: [<ffffffffa00870c0>] ? mptspi_dv_renegotiate_work+0x0/0x120 [mptspi]
Dec 31 00:50:05 lx1 kernel: [<ffffffff8109a780>] worker_thread+0x170/0x2a0
Dec 31 00:50:05 lx1 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Dec 31 00:50:05 lx1 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
Dec 31 00:50:05 lx1 kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
Dec 31 00:50:05 lx1 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Dec 31 00:50:05 lx1 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
Dec 31 00:50:05 lx1 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
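
When there are many of these blocks, it can help to boil each one down to the blocked task and the kernel modules named in its call trace. The Python sketch below does that for log lines in the format shown above. The regular expressions and the name `summarize_hung_task` are assumptions written for this example, not part of any standard tool.

```python
import re

# "INFO: task events/27:542 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<ffffffffa004db4a>] spi_execute+0xaa/0x130 [scsi_transport_spi]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def summarize_hung_task(lines):
    """Return ((comm, pid, seconds), [modules in order of first appearance])."""
    task = None
    modules = []
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            task = (hung.group("comm"), int(hung.group("pid")), int(hung.group("secs")))
            continue
        frame = FRAME_RE.search(line)
        if frame and frame.group("module") and frame.group("module") not in modules:
            modules.append(frame.group("module"))
    return task, modules
```

For the block above this would report the task events/27 (pid 542) and the modules scsi_transport_spi and mptspi, which are the ones visible in the trace.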

Last edited by usao; 12-31-2015 at 08:51 AM.
 
Old 01-01-2016, 09:49 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,297

Rep: Reputation: 2322
Back in the day I had something like this. It turned out to be hardware: 2 disks on the same IDE ribbon were dragging the logic levels crazy. To do anything, you needed a ramdisk. It would run for 2 hours, then suddenly spit out to stdout:
/dev/hda not mounted
/dev/hdb not mounted

Yours looks like poor housekeeping. Search back in the logs from that block you showed us and find the first error in the chain. Also check out that scsi_transport_spi module. The error is almost certainly in the stuff you describe in the opening paragraph. I would also chart disk utilization every few hours. Does it build, or is it sudden?
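
One way to chart utilization over time without depending on iostat is to take two snapshots of /proc/diskstats and compare the "time spent doing I/O" counter, which is the 13th whitespace-separated field on each line and is reported in milliseconds. The Python sketch below works on the text of two snapshots. The function names are hypothetical, and how the snapshots are collected (cron, a loop with sleep) is left open.

```python
def io_ticks(diskstats_text, device):
    """Return the milliseconds spent doing I/O for `device` from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major minor name, then the per-device counters
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)

def utilization_percent(before, after, device, interval_seconds):
    """Percentage of the interval during which the device had I/O in flight."""
    busy_ms = io_ticks(after, device) - io_ticks(before, device)
    return 100.0 * busy_ms / (interval_seconds * 1000.0)
```

Logging the result for sdb and dm-2 at a fixed interval would answer the question of whether utilization climbs gradually or jumps to 100% at once.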
 
  



All times are GMT -5.
