Hi all,
On 17/06 at 10:27 AM one of our Oracle servers was restarted for an unknown reason. We really have to know why this happened and how we can prevent this in the future.
This is a physical server and uses an EVA HSV300 as SAN based storage. This is our research so far.
In /var/log/messages, we can see that the Jounaling Block Device was multiple times blocked for more then 120 seconds. This must lead to an enormous load of queued IO data.
Code:
Jun 17 10:00:08 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:01:49 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:03:29 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:05:09 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:06:49 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:08:29 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:10:09 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:11:50 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:13:30 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:14:48 vptcvl-srvora2 auditd[19956]: Audit daemon rotating log files
Jun 17 10:15:10 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:16:50 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:18:30 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:20:11 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:21:51 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:23:31 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:25:11 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:26:51 vptcvl-srvora2 hpasmlited[27594]: hpIoctl(): IOCTL Timeout -- Waiting on IPMI
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-26-8:11123 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-26-8 D 0000000000000002 0 11123 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff88102473da80 0000000000000046 ffff8801007b45c0 0000000000000441
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff88102473d9f0 ffffffff810149b9 ffff88102473da30 0000000000000282
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881027383058 ffff88102473dfd8 000000000000fbc8 ffff881027383058
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f930>] ? sync_page+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f96d>] sync_page+0x3d/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111fba3>] wait_on_page_bit+0x73/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81135bf5>] ? pagevec_lookup_tag+0x25/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81134b15>] write_cache_pages+0x395/0x4c0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa000443c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811336a0>] ? __writepage+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01bb1e1>] ? __journal_remove_journal_head+0xc1/0x130 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01bb274>] ? jbd2_journal_remove_journal_head+0x24/0x40 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81134c64>] generic_writepages+0x24/0x30
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b34d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b39ed>] jbd2_journal_commit_transaction+0x37d/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8108412c>] ? lock_timer_base+0x3c/0x70
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81084d2b>] ? try_to_del_timer_sync+0x7b/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-27-8:11125 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-27-8 D 0000000000000001 0 11125 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810260dbc20 0000000000000046 ffff8810260dbbe0 ffffffffa000443c
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810260dbb90 ffffffff810149b9 ffff8810260dbbd0 ffffffff810a70a1
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810201aa638 ffff8810260dbfd8 000000000000fbc8 ffff8810201aa638
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa000443c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811bf120>] ? sync_buffer+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811bf160>] sync_buffer+0x40/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811bf120>] ? sync_buffer+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528c18>] out_of_line_wait_on_bit+0x78/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b127>] ? bit_waitqueue+0x17/0xd0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811bf116>] __wait_on_buffer+0x26/0x30
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b4116>] jbd2_journal_commit_transaction+0xaa6/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81084d2b>] ? try_to_del_timer_sync+0x7b/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-28-8:11127 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-28-8 D 0000000000000001 0 11127 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024623b30 0000000000000046 0000000000000000 ffff881024f31e40
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024623aa0 ffffffff810149b9 ffff881024623ae0 0000000000000286
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810201bfaf8 ffff881024623fd8 000000000000fbc8 ffff8810201bfaf8
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f930>] ? sync_page+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f96d>] sync_page+0x3d/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111fba3>] wait_on_page_bit+0x73/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81135bf5>] ? pagevec_lookup_tag+0x25/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111ffcb>] wait_on_page_writeback_range+0xfb/0x190
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81267020>] ? submit_bio+0x70/0x120
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8112008f>] filemap_fdatawait+0x2f/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b3e59>] jbd2_journal_commit_transaction+0x7e9/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81084d2b>] ? try_to_del_timer_sync+0x7b/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-30-8:11129 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-30-8 D 0000000000000001 0 11129 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024aa1b30 0000000000000046 ffff8809b7ef93c0 ffff881024f31e40
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024aa1aa0 ffffffff810149b9 ffff881024aa1ae0 0000000000000286
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881025c10638 ffff881024aa1fd8 000000000000fbc8 ffff881025c10638
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f930>] ? sync_page+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f96d>] sync_page+0x3d/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111fba3>] wait_on_page_bit+0x73/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81135bf5>] ? pagevec_lookup_tag+0x25/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111ffcb>] wait_on_page_writeback_range+0xfb/0x190
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81267020>] ? submit_bio+0x70/0x120
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8112008f>] filemap_fdatawait+0x2f/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b3e59>] jbd2_journal_commit_transaction+0x7e9/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-29-8:11131 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-29-8 D 0000000000000003 0 11131 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024bafb30 0000000000000046 ffff8805e9b53300 ffff881024f31e40
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024bafaa0 ffffffff810149b9 ffff881024bafae0 0000000000000286
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810207bd098 ffff881024baffd8 000000000000fbc8 ffff8810207bd098
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f930>] ? sync_page+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f96d>] sync_page+0x3d/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111fba3>] wait_on_page_bit+0x73/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81135bf5>] ? pagevec_lookup_tag+0x25/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111ffcb>] wait_on_page_writeback_range+0xfb/0x190
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81267020>] ? submit_bio+0x70/0x120
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8112008f>] filemap_fdatawait+0x2f/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b3e59>] jbd2_journal_commit_transaction+0x7e9/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81084d2b>] ? try_to_del_timer_sync+0x7b/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task jbd2/dm-25-8:11133 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: jbd2/dm-25-8 D 0000000000000002 0 11133 2 0x00000000
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024bd1b30 0000000000000046 0000000000000000 ffff881027294e00
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff881024bd1aa0 ffffffff810149b9 ffff881024bd1ae0 0000000000000286
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8810201ab098 ffff881024bd1fd8 000000000000fbc8 ffff8810201ab098
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810149b9>] ? read_tsc+0x9/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f930>] ? sync_page+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815280a3>] io_schedule+0x73/0xc0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111f96d>] sync_page+0x3d/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528b6f>] __wait_on_bit+0x5f/0x90
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111fba3>] wait_on_page_bit+0x73/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b320>] ? wake_bit_function+0x0/0x50
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81135bf5>] ? pagevec_lookup_tag+0x25/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8111ffcb>] wait_on_page_writeback_range+0xfb/0x190
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81267020>] ? submit_bio+0x70/0x120
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8112008f>] filemap_fdatawait+0x2f/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b3e59>] jbd2_journal_commit_transaction+0x7e9/0x1500 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9a48>] kjournald2+0xb8/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01b9990>] ? kjournald2+0x0/0x220 [jbd2]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109aef6>] kthread+0x96/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task multipathd:9904 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: multipathd D 0000000000000003 0 9904 1 0x00000080
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880cc0771968 0000000000000086 ffff880cc07718d8 ffffffff8127bf7f
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880e683a5bb0 ffff88102091cfa8 ffff880cc07718e8 ffffffff8108064b
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff88000fb83af8 ffff880cc0771fd8 000000000000fbc8 ffff88000fb83af8
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8127bf7f>] ? cfq_init_prio_data+0x6f/0x140
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8108064b>] ? round_jiffies_up+0x1b/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815287b5>] schedule_timeout+0x215/0x2e0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810841dd>] ? del_timer+0x7d/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528433>] wait_for_common+0x123/0x180
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8152854d>] wait_for_completion+0x1d/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126d9dc>] blk_execute_rq+0x8c/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81225f79>] ? security_capable+0x29/0x30
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8107f6aa>] ? capable+0x2a/0x60
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81271c9c>] ? blk_verify_command+0x2c/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81271f05>] sg_io+0x215/0x3d0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81272890>] scsi_cmd_ioctl+0x400/0x470
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81069384>] ? enqueue_task_fair+0x64/0x100
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81059216>] ? enqueue_task+0x66/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81272951>] scsi_cmd_blk_ioctl+0x51/0x70
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01a15bf>] sd_ioctl+0xaf/0x110 [sd_mod]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126f627>] __blkdev_driver_ioctl+0x67/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126faad>] blkdev_ioctl+0x1ed/0x6e0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811c46cc>] block_ioctl+0x3c/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810b1c0b>] ? sys_futex+0x7b/0x170
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task multipathd:9905 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: multipathd D 0000000000000003 0 9905 1 0x00000080
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880ab3949968 0000000000000086 ffff880ab39498d8 ffffffff8127bf7f
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880b6bc44470 ffff881020a066b8 ffff880ab39498e8 ffffffff8108064b
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8805b365d098 ffff880ab3949fd8 000000000000fbc8 ffff8805b365d098
Jun 17 10:27:30 vptcvl-srvora2 kernel: Call Trace:
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8127bf7f>] ? cfq_init_prio_data+0x6f/0x140
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8108064b>] ? round_jiffies_up+0x1b/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff815287b5>] schedule_timeout+0x215/0x2e0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810841dd>] ? del_timer+0x7d/0xe0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81528433>] wait_for_common+0x123/0x180
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8152854d>] wait_for_completion+0x1d/0x20
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126d9dc>] blk_execute_rq+0x8c/0xf0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81225f79>] ? security_capable+0x29/0x30
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8107f6aa>] ? capable+0x2a/0x60
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81271c9c>] ? blk_verify_command+0x2c/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81271f05>] sg_io+0x215/0x3d0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81272890>] scsi_cmd_ioctl+0x400/0x470
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81069384>] ? enqueue_task_fair+0x64/0x100
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81059216>] ? enqueue_task+0x66/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff81272951>] scsi_cmd_blk_ioctl+0x51/0x70
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffffa01a15bf>] sd_ioctl+0xaf/0x110 [sd_mod]
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126f627>] __blkdev_driver_ioctl+0x67/0x80
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8126faad>] blkdev_ioctl+0x1ed/0x6e0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff811c46cc>] block_ioctl+0x3c/0x40
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810b1c0b>] ? sys_futex+0x7b/0x170
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
Jun 17 10:27:30 vptcvl-srvora2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jun 17 10:27:30 vptcvl-srvora2 kernel: INFO: task multipathd:9906 blocked for more than 120 seconds.
Jun 17 10:27:30 vptcvl-srvora2 kernel: Tainted: P --------------- 2.6.32-431.el6.x86_64 #1
Jun 17 10:27:30 vptcvl-srvora2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 17 10:27:30 vptcvl-srvora2 kernel: multipathd D 0000000000000003 0 9906 1 0x00000080
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880cc06a7968 0000000000000086 ffff880cc06a78d8 ffffffff8127bf7f
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff880b6bc44380 ffff88102093c4b8 ffff880cc06a78e8 ffffffff8108064b
Jun 17 10:27:30 vptcvl-srvora2 kernel: ffff8805b365daf8 ffff880cc06a7fd8 000000000000fbc8 ffff8805b365daf8
In /var/log/multipathd.log we can often see the message "tur checker not finished".
With the following command, we could see that the multipathing device /dev/dm-6 (dev253-6) had a big await value just before the restart.
sar -d -f /var/log/sa/sa17 -s 10:20:00 -e 10:40:00
Code:
10:20:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
...
10:30:01 AM dev253-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM dev253-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM dev253-5 5.67 0.00 30.23 5.33 0.70 7.44 124.12 70.39
10:30:01 AM dev253-6 0.04 0.46 0.19 17.45 8.25 16639.36 15735.64 59.04
10:30:01 AM dev253-7 3.07 49.89 13.05 20.50 4.20 10.09 224.15 68.81
10:30:01 AM dev253-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM dev253-9 1.06 0.71 0.35 1.00 1.33 10.54 633.15 67.28
10:30:01 AM dev253-10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
With the -p (pretty) parameter, we are able to see the multipath alias "CBSTSTDATA2".
sar -d -p -f /var/log/sa/sa17 -s 10:20:00 -e 10:40:00
Code:
10:20:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
...
10:30:01 AM mpathao 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM CBSPRODFRA2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM ORASRV02-ATOSHOME 5.67 0.00 30.23 5.33 0.70 7.44 124.12 70.39
10:30:01 AM CBSTSTDATA2 0.04 0.46 0.19 17.45 8.25 16639.36 15735.64 59.04
10:30:01 AM BACKUP 3.07 49.89 13.05 20.50 4.20 10.09 224.15 68.81
10:30:01 AM AOCR 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:30:01 AM BOCR 1.06 0.71 0.35 1.00 1.33 10.54 633.15 67.28
10:30:01 AM CBSTSTDATA 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
We see that dev253-6 has problems just before the restart. Could this lead to an automatic restart?
A second question, can I be sure that CBSTSTDATA2 was linked to dev253-6 before the restart?
How is the "-p" parameter translating dev253-6 to CBSTSTDATA2? Is it using a mapping file or are both names saved in the binary file /var/log/sa/sa17?
We also know that the mapping between those names are changed when the server reboots. So was CBSDATA2 a few months ago linked to /dev/dm-36 (dev253-36).
We also have file based backup of this machine before the restart, maybe there exists a mapping file that we can use to be sure?
Many thanks,
Dieter