LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Task blocked for more than 120 seconds errors and crashes (https://www.linuxquestions.org/questions/linux-server-73/task-blocked-for-more-than-120-seconds-errors-and-crashes-890981/)

Red Squirrel 07-10-2011 06:25 PM

Task blocked for more than 120 seconds errors and crashes
 
I see tons of these in my dmesg, and it's causing my virtualbox vms to crash with disk IO errors. It's only affecting the Linux guests. What would cause this?


Code:

INFO: task tar:1865 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tar          D 0000000000000002    0  1865  1775
 ffff880210489b78 0000000000000086 0000000000000096 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880084fec4a0 ffff88021b2996e0
 ffff880084fec7e8 00000001a01cca3c ffff88021aac8f30 ffff880084fec7e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa0035cc7>] wait_on_buffer+0x41/0x45 [ext3]
 [<ffffffffa0038160>] ext3_bread+0x4c/0x76 [ext3]
 [<ffffffffa003cb48>] htree_dirblock_to_tree+0x33/0x13d [ext3]
 [<ffffffff810c5fa9>] ? mntput+0x18/0x1a
 [<ffffffffa003ccc4>] ext3_htree_fill_tree+0x72/0x1ea [ext3]
 [<ffffffff810bfad5>] ? file_move+0x48/0x4e
 [<ffffffffa0033e6b>] ext3_readdir+0x188/0x5c8 [ext3]
 [<ffffffff810caa56>] ? filldir+0x0/0xc5
 [<ffffffff810c1f60>] ? cp_new_stat+0xe2/0xef
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff810caa56>] ? filldir+0x0/0xc5
 [<ffffffff810cac3d>] vfs_readdir+0x79/0xaf
 [<ffffffff810cadb2>] sys_getdents+0x7d/0xc4
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task kjournald:1219 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald    D ffff8801f3546700    0  1219      2
 ffff880218dddca0 0000000000000046 0000000000000086 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880218c5c4a0 ffff8801f3515b80
 ffff880218c5c7e8 00000003a01cca3c ffff88021aac8f30 ffff880218c5c7e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff8113cb81>] ? submit_bio+0xe0/0xe9
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa0024deb>] wait_on_buffer+0x41/0x45 [jbd]
 [<ffffffffa002540e>] journal_commit_transaction+0x55d/0xf2f [jbd]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81049a0d>] ? try_to_del_timer_sync+0x58/0x63
 [<ffffffffa0028ad8>] kjournald+0xe3/0x23a [jbd]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffffa00289f5>] ? kjournald+0x0/0x23a [jbd]
 [<ffffffff810534bf>] kthread+0x49/0x76
 [<ffffffff81011719>] child_rip+0xa/0x11
 [<ffffffff81010a37>] ? restore_args+0x0/0x30
 [<ffffffff81053476>] ? kthread+0x0/0x76
 [<ffffffff8101170f>] ? child_rip+0x0/0x11

INFO: task VirtualBox:28939 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff8801f3546700    0 28939  23819
 ffff88000c7118f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8801eecc2dc0 ffff880001172dc0
 ffff8801eecc3108 00000000a01cca3c ffff88021aac8f30 ffff8801eecc3108
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff810d453d>] ? mnt_want_write+0x77/0x8d
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff812c20fa>] ? _spin_lock+0x9/0xc
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task VirtualBox:28957 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff8801fb89c000    0 28957  23819
 ffff88010a04b8f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8801fcd044a0 ffff880001172dc0
 ffff8801fcd047e8 00000000a01cca3c ffff88021aac8f30 ffff8801fcd047e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff810d453d>] ? mnt_want_write+0x77/0x8d
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task VirtualBox:1290 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff880195f2da00    0  1290  23819
 ffff88011af6d8f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880001012dc0 ffff88021a915b80
 ffff880001013108 00000003a01cca3c ffff88021aac8f30 ffff880001013108
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task fetchmail:1892 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fetchmail    D ffff880100775380    0  1892  1889
 ffff8800b56d99d8 0000000000000082 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8802118896e0 ffff8801e7885b80
 ffff880211889a28 00000003a01cca3c ffff88021aac8f30 ffff880211889a28
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e1c3e>] ? ll_rw_block+0x9c/0xe7
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa003b45f>] ext3_find_entry+0x406/0x5ee [ext3]
 [<ffffffff81099d90>] ? zone_statistics+0x62/0x67
 [<ffffffff81056abb>] ? up_read+0x9/0xb
 [<ffffffff812c4d7c>] ? do_page_fault+0x6eb/0xaed
 [<ffffffff81037140>] ? check_preempt_wakeup+0x179/0x186
 [<ffffffffa003be56>] ext3_lookup+0x2f/0xc5 [ext3]
 [<ffffffff810d06fd>] ? d_alloc+0x18e/0x19b
 [<ffffffff810c63de>] do_lookup+0xd3/0x15d
 [<ffffffff810c787d>] __link_path_walk+0x37a/0x754
 [<ffffffff810c8125>] path_walk+0x61/0xc4
 [<ffffffff810c838c>] do_path_lookup+0x165/0x1be
 [<ffffffff810c999e>] user_path_at+0x52/0x8c
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8103c847>] ? finish_task_switch+0x31/0xc9
 [<ffffffff810c2298>] vfs_stat_fd+0x21/0x4e
 [<ffffffff810c232e>] sys_newstat+0x22/0x3c
 [<ffffffff810c60bf>] ? path_put+0x1d/0x21
 [<ffffffff8107cf8d>] ? audit_syscall_entry+0x101/0x135
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b


This started randomly, I have not changed anything.
Here's the output of uname -a if it helps:

Linux borg.loc 2.6.27.25-78.2.56.fc9.x86_64 #1 SMP Thu Jun 18 12:24:37 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

I hope I can find a solution to this, it's my main home server, I can't afford for this to go down.

Also not finding much on google other than it's not a distro specific issue as people with newer versions of Fedora, Ubuntu etc have this issue as well.

More info that may be useful:

OS drive is a single drive (can't recall brand)

Data drive is a MD raid 5 with 5 1TB Hitachi drives. When I recently added 2 new drives, I went through many RMAs until I finally got two drives that were not failing out of the box. Could these errors be disk related? I do have one drive that has lot CRC smart errors but it was like that from the start. I had given up with all the RMAs and said screw it. Should I replace this drive?

macemoneta 07-11-2011 12:46 PM

If you are getting accumulating CRC errors, you should replace the drive. Also, make sure you have TLER (time limited error recovery) enabled on the drives:
Code:

smartctl -l scterc /dev/sda
smartctl -l scterc /dev/sdb
...

If it's disabled, set it to 7 seconds:
Code:

smartctl -l scterc,70,70 /dev/sda
smartctl -l scterc,70,70 /dev/sdb
...

Some more information on TLER here.

Red Squirrel 07-11-2011 04:37 PM

I just get invalid argument when I try that command.

Code:

[root@borg ~]# smartctl -l scterc /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=======> INVALID ARGUMENT TO -l: scterc
=======> VALID ARGUMENTS ARE: error, selftest, selective, directory, background, scttemp[sts|hist] <=======

Use smartctl -h to get a usage summary

[root@borg ~]#

I will order a new drive and replace the one with the errors. Maybe I'll be lucky and it will make the issues go away.


Another thing I get a lot is these errors:

CIFS VFS: Unexpected lookup error -112

Nothing at all on Google for that.

macemoneta 07-11-2011 06:15 PM

smartctl 5.41 supports the option. The CIFS message is related to a timeout accessing data on an SMB/CIFS network share.

Red Squirrel 07-11-2011 09:46 PM

Also, should I do a fsck on the file system? Could it maybe be a file system issue? It seems to be happening based on how much activity there is. With no VMs running I went a whole day, with VMs running, it started doing it within maybe 5 hours.

macemoneta 07-12-2011 04:01 AM

It happens when trying to write to a physically bad sector. A spare has to be allocated and remapped by the drive.

Red Squirrel 07-12-2011 11:23 AM

hmm so that would probably cause the raid to block for a bit which would then cause that right?


Now here's a weird one:

Code:

EXT3-fs error (device sdh1): ext3_find_entry: reading directory #2 offset 0
Buffer I/O error on device sdh1, logical block 0
lost page write due to I/O error on sdh1
EXT3-fs error (device sdh1): ext3_find_entry: reading directory #2 offset 0
------------[ cut here ]------------
WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x2f/0x87() (Tainted: G        W)
Hardware name:       
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nls_utf8 cifs fuse bridge stp bnep rfcomm l2cap bluetooth vboxnetadp vboxnetflt vboxdrv ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi coretemp hwmon sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_log dm_multipath scsi_dh dm_mod raid456 async_xor async_memcpy async_tx xor pata_marvell pata_acpi i915 pcspkr iTCO_wdt serio_raw iTCO_vendor_support sr_mod cdrom sg i2c_i801 ata_generic sata_sil24 drm i2c_algo_bit i2c_core e1000e ahci libata sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: kvm]
Pid: 15835, comm: updatedb Tainted: G        W 2.6.27.25-78.2.56.fc9.x86_64 #1

Call Trace:
 [<ffffffff8103ff10>] warn_on_slowpath+0x80/0xae
 [<ffffffff812bfe29>] ? printk+0x67/0x6e
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff810e0e9f>] mark_buffer_dirty+0x2f/0x87
 [<ffffffffa003d609>] ext3_commit_super+0x50/0x66 [ext3]
 [<ffffffffa003ee46>] ext3_handle_error+0x86/0xad [ext3]
 [<ffffffffa003ef70>] ext3_error+0x9b/0xa9 [ext3]
 [<ffffffff812c0e04>] ? __wait_on_bit+0x6a/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] ? out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e1c3e>] ? ll_rw_block+0x9c/0xe7
 [<ffffffffa003b493>] ext3_find_entry+0x43a/0x5ee [ext3]
 [<ffffffff810b1259>] ? alloc_pages_current+0xb9/0xc2
 [<ffffffff810b8d88>] ? __slab_alloc+0x73/0x3d6
 [<ffffffff810d0593>] ? d_alloc+0x24/0x19b
 [<ffffffff812c20fa>] ? _spin_lock+0x9/0xc
 [<ffffffff810ce989>] ? d_rehash+0x43/0x4a
 [<ffffffffa003be56>] ext3_lookup+0x2f/0xc5 [ext3]
 [<ffffffff810d06fd>] ? d_alloc+0x18e/0x19b
 [<ffffffff810c63de>] do_lookup+0xd3/0x15d
 [<ffffffff810c7b05>] __link_path_walk+0x602/0x754
 [<ffffffff810c8125>] path_walk+0x61/0xc4
 [<ffffffff810c838c>] do_path_lookup+0x165/0x1be
 [<ffffffff810c999e>] user_path_at+0x52/0x8c
 [<ffffffff810c817b>] ? path_walk+0xb7/0xc4
 [<ffffffff810c1f60>] ? cp_new_stat+0xe2/0xef
 [<ffffffff810c21bc>] vfs_lstat_fd+0x1e/0x4b
 [<ffffffff810c220b>] sys_newlstat+0x22/0x3c
 [<ffffffff810c60bf>] ? path_put+0x1d/0x21
 [<ffffffff8107cf8d>] ? audit_syscall_entry+0x101/0x135
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

---[ end trace 4eaa2a86a8e2da22 ]---
Buffer I/O error on device sdh1, logical block 0
lost page write due to I/O error on sdh1

The weird part is, I don't even have a /dev/sdh.


As for the bad sectors, is there a way I can do a check? Since the drives are in raid, I can't do fsck on them individually. (just says invalid file system) I want to confirm which drive is failing so I know which one to replace. I wish when drives fail they would just drop out, end of story, this crap is what makes it so irritating.

macemoneta 07-12-2011 02:39 PM

Did you unplug a usb or similar device?

Red Squirrel 07-12-2011 04:51 PM

I do have an esata drive I use for backup, but it would have been unplugged the night before as the backup completed. Would it be normal for the error to be delayed? I've also never seen big errors like that over a drive being unplugged.

Red Squirrel 07-12-2011 08:15 PM

I ran a full fsck on the md device yesterday before bed as well, forgot to mention that. It found some errors which it corrected (I forget what, something about blocks incorrectly reported or something). Even with that error I posted, the VMs did not crap out. So MAYBE the fsck solved this issue, but still early to tell. I started up ALL my VMs (8 of them) so I'll see how it goes tonight when the backup jobs run. So far nothing funny in dmesg.

I will also be replacing that drive with the CRC errors regardless. Just ordered a 1TB WD black so it should be here by end of next week hopefully. I'm not buying Hitachi anymore, I've had too much bad luck with the 2010-2011 drives.

__PJ 07-18-2011 06:37 AM

Just a FYI - this may not be a disk issue. I'm trying to resolve the same issue on Debian and there are a number of reports showing this as an issue with the CFS CPU scheduler. And the majority of them seem to involve virtualization of some sort.

If I find a solution I'll post back.

browny_amiga 04-13-2012 08:56 PM

I see these error messages pop up everywhere, on my server at home, on customers systems, across distros. And everytime the system becomes extremely slow and soon goes beyond being usable and you cannot even do a shut down anymore. Load AVG climbs up to obscene heights, stuff like 20 is not unknown.

What is causing this? The tasks that get blocked are totally random, it could be anything.
Where is this error message from the kernel documented? How to avoid it? I also thought first that it was disk related, but this does not make sense, as the systems in question have all different disk and arrays, some raid, some single disks etc.. And sometimes even the kernel tasks themselves get blocked.

linuxnomo 06-23-2012 05:02 PM

Hi,

anything new on the topic? I am also experiencing these issues with random tasks (see the quote). At first, I also assumed a disk error and even installed a new disk. Since my system contains only RAIDs, I then struggled to find out, which RAID element to replace since all disks reported healthy status after long smartctl checks. I did an "apt-get update" as I hoped this problem could have been solved by a patch. But it no more 36h later my system experienced the same error.

Happy for any updates.

Best regards,

Linuxnomo

------

For information: here following you see an extract of dmesg and then ps -eaf. As I had only one remote ssh running on the system when the system crashed, these two (dmesg, ps -eaf) where the only commands that I could run. the ^C I issued after the ps -eaf stopped working. I was not able to revitalize any connection to the system thereafter.

Quote:

[ 0.202014] pnp 00:05: Plug and Play ACPI device, IDs PNP0b00 (active)

[ 80.766148] lp0: using parport0 (interrupt-driven).
[ 80.809979] ppdev: user-space parallel port driver
[ 91.009689] colord[2840]: segfault at 8 ip 000000000040bc6d sp 00007fff23e57d
e0 error 4 in colord[400000+20000]
[ 91.797741] powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 380
0+ (2 cpu cores) (version 2.20.00)
[ 91.797759] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects foun
d.
[ 91.797761] [Firmware Bug]: powernow-k8: Try again with latest BIOS.
[37593.201644] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery d
irectory
[37593.223946] NFSD: starting 90-second grace period
[41760.576037] INFO: task kjournald:1435 blocked for more than 120 seconds.
[41760.576043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41760.576046] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[41760.576052] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[41760.576057] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[41760.576062] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[41760.576067] Call Trace:
[41760.576076] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41760.576082] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41760.576085] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[41760.576089] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41760.576093] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[41760.576097] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41760.576102] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41760.576124] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[41760.576128] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41760.576133] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[41760.576140] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[41760.576144] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41760.576151] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[41760.576155] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[41760.576160] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[41760.576164] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[41760.576168] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[41760.576212] INFO: task java:22429 blocked for more than 120 seconds.
[41760.576214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41760.576217] java D ffff88003fc13540 0 22429 1 0x00000000
[41760.576221] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[41760.576226] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[41760.576230] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[41760.576235] Call Trace:
[41760.576240] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[41760.576244] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41760.576247] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[41760.576251] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41760.576258] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[41760.576262] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[41760.576266] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576270] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41760.576274] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[41760.576280] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[41760.576292] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[41760.576296] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[41760.576301] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[41760.576305] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576309] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[41760.576314] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[41760.576317] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576321] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[41760.576326] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[41760.576332] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[41760.576335] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[41760.576339] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[41760.576343] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[41880.576031] INFO: task kjournald:1435 blocked for more than 120 seconds.
[41880.576036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41880.576040] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[41880.576046] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[41880.576051] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[41880.576055] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[41880.576060] Call Trace:
[41880.576069] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41880.576075] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41880.576079] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[41880.576083] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41880.576087] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[41880.576090] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41880.576096] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41880.576118] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[41880.576123] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41880.576127] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[41880.576135] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[41880.576139] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41880.576146] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[41880.576150] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[41880.576155] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[41880.576159] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[41880.576163] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[41880.576207] INFO: task java:22429 blocked for more than 120 seconds.
[41880.576209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41880.576212] java D ffff88003fc13540 0 22429 1 0x00000000
[41880.576217] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[41880.576221] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[41880.576226] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[41880.576230] Call Trace:
[41880.576236] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[41880.576239] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41880.576243] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[41880.576247] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41880.576254] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[41880.576258] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[41880.576262] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576266] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41880.576270] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[41880.576276] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[41880.576289] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[41880.576293] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[41880.576299] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[41880.576302] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576306] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[41880.576311] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[41880.576314] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576319] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[41880.576324] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[41880.576329] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[41880.576333] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[41880.576336] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[41880.576340] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42000.576029] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42000.576035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42000.576038] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42000.576044] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42000.576049] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42000.576054] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42000.576058] Call Trace:
[42000.576068] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42000.576074] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42000.576077] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42000.576081] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42000.576085] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42000.576088] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42000.576094] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42000.576116] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42000.576121] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42000.576126] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42000.576133] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42000.576137] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42000.576144] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42000.576148] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42000.576153] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42000.576157] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42000.576161] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42000.576206] INFO: task java:22429 blocked for more than 120 seconds.
[42000.576209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42000.576211] java D ffff88003fc13540 0 22429 1 0x00000000
[42000.576216] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42000.576221] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42000.576225] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42000.576229] Call Trace:
[42000.576235] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42000.576238] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42000.576242] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42000.576246] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42000.576253] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42000.576258] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42000.576261] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576265] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42000.576269] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42000.576276] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42000.576289] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42000.576294] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42000.576299] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42000.576303] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576307] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42000.576311] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42000.576315] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576319] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42000.576324] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42000.576330] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42000.576334] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42000.576337] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42000.576341] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42120.576043] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42120.576049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42120.576052] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42120.576058] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42120.576064] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42120.576068] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42120.576073] Call Trace:
[42120.576082] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42120.576088] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42120.576091] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42120.576095] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42120.576099] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42120.576102] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42120.576108] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42120.576130] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42120.576134] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42120.576139] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42120.576147] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42120.576151] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42120.576158] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42120.576162] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42120.576167] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42120.576171] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42120.576175] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42120.576218] INFO: task java:22429 blocked for more than 120 seconds.
[42120.576220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42120.576223] java D ffff88003fc13540 0 22429 1 0x00000000
[42120.576227] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42120.576232] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42120.576236] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42120.576241] Call Trace:
[42120.576246] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42120.576250] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42120.576253] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42120.576257] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42120.576265] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42120.576269] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42120.576273] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576277] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42120.576281] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42120.576287] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42120.576300] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42120.576304] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42120.576309] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42120.576313] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576317] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42120.576321] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42120.576325] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576329] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42120.576334] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42120.576339] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42120.576342] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42120.576346] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42120.576350] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42240.576029] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42240.576035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42240.576038] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42240.576044] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42240.576050] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42240.576054] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42240.576059] Call Trace:
[42240.576068] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42240.576074] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42240.576077] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42240.576081] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42240.576085] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42240.576089] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42240.576094] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42240.576115] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42240.576120] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42240.576124] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42240.576131] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42240.576135] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42240.576142] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42240.576146] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42240.576151] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42240.576156] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42240.576159] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42240.576203] INFO: task java:22429 blocked for more than 120 seconds.
[42240.576205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42240.576208] java D ffff88003fc13540 0 22429 1 0x00000000
[42240.576212] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42240.576217] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42240.576221] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42240.576226] Call Trace:
[42240.576231] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42240.576235] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42240.576239] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42240.576242] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42240.576250] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42240.576254] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42240.576257] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576261] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42240.576265] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42240.576272] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42240.576284] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42240.576289] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42240.576294] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42240.576298] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576302] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42240.576307] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42240.576310] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576314] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42240.576319] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42240.576324] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42240.576328] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42240.576331] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42240.576335] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
debian-amd64:~# ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jun22 ? 00:00:01 init [2]
root 2 0 0 Jun22 ? 00:00:00 [kthreadd]
root 3 2 0 Jun22 ? 00:00:00 [ksoftirqd/0]
root 4 2 0 Jun22 ? 00:00:03 [kworker/0:0]
root 5 2 0 Jun22 ? 00:00:00 [kworker/u:0]
root 6 2 0 Jun22 ? 00:00:00 [migration/0]
root 7 2 0 Jun22 ? 00:00:00 [watchdog/0]
root 8 2 0 Jun22 ? 00:00:00 [migration/1]
root 10 2 0 Jun22 ? 00:00:01 [ksoftirqd/1]
root 12 2 0 Jun22 ? 00:00:00 [watchdog/1]
root 13 2 0 Jun22 ? 00:00:00 [cpuset]
root 14 2 0 Jun22 ? 00:00:00 [khelper]
root 15 2 0 Jun22 ? 00:00:00 [kdevtmpfs]
root 16 2 0 Jun22 ? 00:00:00 [netns]
root 17 2 0 Jun22 ? 00:00:00 [sync_supers]
root 18 2 0 Jun22 ? 00:00:00 [bdi-default]
root 19 2 0 Jun22 ? 00:00:00 [kintegrityd]
root 20 2 0 Jun22 ? 00:00:00 [kblockd]
root 23 2 0 Jun22 ? 00:00:00 [khungtaskd]
root 24 2 0 Jun22 ? 00:00:01 [kswapd0]
root 25 2 0 Jun22 ? 00:00:00 [ksmd]
root 26 2 0 Jun22 ? 00:00:00 [khugepaged]
root 27 2 0 Jun22 ? 00:00:00 [fsnotify_mark]
root 28 2 0 Jun22 ? 00:00:00 [crypto]
root 110 2 0 Jun22 ? 00:00:00 [khubd]
root 159 2 0 Jun22 ? 00:00:00 [ata_sff]
root 165 2 0 Jun22 ? 00:00:03 [kworker/0:2]
root 166 2 0 Jun22 ? 00:00:00 [scsi_eh_0]
root 169 2 0 Jun22 ? 00:00:00 [scsi_eh_1]
root 170 2 0 Jun22 ? 00:00:00 [scsi_eh_2]
root 173 2 0 Jun22 ? 00:00:00 [kworker/u:2]
root 179 2 0 Jun22 ? 00:00:00 [scsi_eh_3]
root 180 2 0 Jun22 ? 00:00:00 [scsi_eh_4]
root 181 2 0 Jun22 ? 00:00:00 [scsi_eh_5]
root 182 2 0 Jun22 ? 00:00:00 [scsi_eh_6]
root 264 2 0 Jun22 ? 00:00:00 [md]
root 274 2 0 Jun22 ? 00:00:08 [md2_raid5]
root 284 2 0 Jun22 ? 00:00:00 [md3_raid1]
root 296 2 0 Jun22 ? 00:00:12 [md1_raid5]
root 303 2 0 Jun22 ? 00:00:00 [md4_raid1]
root 321 2 0 Jun22 ? 00:00:01 [kjournald]
root 416 1 0 Jun22 ? 00:00:00 udevd --daemon
root 560 2 0 Jun22 ? 00:00:00 [edac-poller]
root 571 2 0 Jun22 ? 00:00:00 [kpsmoused]
root 582 2 0 Jun22 ? 00:00:06 [w1_bus_master1]
root 995 2 0 Jun22 ? 00:00:00 [flush-9:2]
root 1127 416 0 Jun22 ? 00:00:00 udevd --daemon
root 1434 2 0 Jun22 ? 00:00:00 [kjournald]
root 1435 2 0 Jun22 ? 00:00:01 [kjournald]
root 1438 2 0 Jun22 ? 00:00:00 [xfs_mru_cache]
root 1439 2 0 Jun22 ? 00:00:00 [xfslogd]
root 1440 2 0 Jun22 ? 00:00:00 [xfsdatad]
root 1441 2 0 Jun22 ? 00:00:00 [xfsconvertd]
root 1442 2 0 Jun22 ? 00:00:00 [xfsbufd/md4]
root 1443 2 0 Jun22 ? 00:00:09 [xfsaild/md4]
root 1721 1 0 Jun22 ? 00:00:00 /sbin/rpcbind -w
root 1753 2 0 Jun22 ? 00:00:00 [rpciod]
root 1756 2 0 Jun22 ? 00:00:00 [nfsiod]
root 1999 2 0 Jun22 ? 00:00:00 [lockd]
syslog 2096 1 0 Jun22 ? 00:00:00 /sbin/syslogd -u syslog
103 2128 1 0 Jun22 ? 00:00:00 /usr/bin/dbus-daemon --system
root 2153 1 0 Jun22 ? 00:00:00 /usr/sbin/NetworkManager
root 2178 1 0 Jun22 ? 00:00:00 /usr/sbin/acpid
root 2205 1 0 Jun22 ? 00:00:00 /usr/lib/policykit-1/polkitd --n
nobody 2225 1 0 Jun22 ? 00:00:00 /usr/sbin/openvpn --writepid /va
root 2254 1 0 Jun22 ? 00:00:00 /usr/sbin/modem-manager
bind 2258 1 0 Jun22 ? 00:00:01 /usr/sbin/named -u bind
root 2298 1 0 Jun22 ? 00:00:00 /usr/sbin/gdm3
root 2345 2298 0 Jun22 ? 00:00:00 /usr/lib/gdm3/gdm-simple-slave -
root 2359 2345 0 Jun22 tty7 00:00:04 /usr/bin/Xorg :0 -br -verbose -n
root 2361 1 0 Jun22 ? 00:00:02 /usr/sbin/apache2 -k start
daemon 2388 1 0 Jun22 ? 00:00:00 /usr/sbin/atd
www-data 2420 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2421 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2422 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2423 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2424 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
avahi 2426 1 0 Jun22 ? 00:00:00 avahi-daemon: running [debian-am
avahi 2427 2426 0 Jun22 ? 00:00:00 avahi-daemon: chroot helper
root 2482 1 0 Jun22 ? 00:00:00 /usr/sbin/bluetoothd
root 2489 416 0 Jun22 ? 00:00:00 udevd --daemon
root 2494 2 0 Jun22 ? 00:00:00 [krfcommd]
root 2647 1 0 Jun22 ? 00:00:00 /usr/sbin/console-kit-daemon --n
clamav 2714 1 0 Jun22 ? 00:00:21 /usr/bin/freshclam -d --quiet
121 2799 1 0 Jun22 ? 00:00:00 /usr/bin/dbus-daemon --fork --pr
121 2800 2345 0 Jun22 ? 00:00:00 /usr/bin/gnome-session --session
root 2806 1 0 Jun22 ? 00:00:00 /usr/sbin/cron
root 2838 1 0 Jun22 ? 00:00:00 /usr/sbin/cupsd -C /etc/cups/cup
121 2844 1 0 Jun22 ? 00:00:00 /usr/lib/x86_64-linux-gnu/gconf/
121 2847 2800 0 Jun22 ? 00:00:03 /usr/lib/gnome-settings-daemon/g
121 2849 2800 0 Jun22 ? 00:00:00 /usr/lib/at-spi/at-spi-registryd
colord 2851 1 0 Jun22 ? 00:00:00 /usr/lib/x86_64-linux-gnu/colord
root 2881 1 0 Jun22 ? 00:00:00 /usr/lib/upower/upowerd
121 2883 1 0 Jun22 ? 00:00:00 /usr/lib/bonobo-activation/bonob
root 2887 1 0 Jun22 ? 00:00:06 ddclient - sleeping for 100 seco
121 3093 1 0 Jun22 ? 00:00:00 /usr/lib/gvfs/gvfsd
dirmngr 3094 1 0 Jun22 ? 00:00:08 /usr/bin/dirmngr --daemon --sh
121 3142 1 0 Jun22 ? 00:00:00 /usr/bin/pulseaudio --start --lo
rtkit 3144 1 0 Jun22 ? 00:00:00 /usr/lib/rtkit/rtkit-daemon
root 3194 1 0 Jun22 ? 00:00:04 /usr/sbin/kerneloops
121 3201 2800 0 Jun22 ? 00:00:00 metacity
root 3210 1 0 Jun22 ? 00:00:00 /bin/dd bs=1 if=/proc/kmsg of=/v
klog 3211 1 0 Jun22 ? 00:00:00 /sbin/klogd -P /var/run/klogd/km
root 3328 1 0 Jun22 ? 00:00:00 /sbin/mdadm --monitor --pid-file
121 3354 2800 0 Jun22 ? 00:00:02 /usr/lib/gdm3/gdm-simple-greeter
121 3355 2800 0 Jun22 ? 00:00:00 /usr/lib/policykit-1-gnome/polki
root 3357 1 0 Jun22 ? 00:00:00 /usr/sbin/cnid_metad -l log_note
root 3361 2345 0 Jun22 ? 00:00:00 /usr/lib/gdm3/gdm-session-worker
root 3363 1 0 Jun22 ? 00:00:00 /usr/lib/accountsservice/account
root 3365 1 0 Jun22 ? 00:00:00 /usr/sbin/afpd -U uams_dhx2.so,u
ntp 3417 1 0 Jun22 ? 00:00:04 /usr/sbin/ntpd -p /var/run/ntpd.
root 3446 1 0 Jun22 ? 00:00:00 /usr/sbin/inetd
root 3597 1 0 Jun22 ? 00:00:00 /usr/sbin/nmbd -D
^C

k0rdi 06-26-2012 07:40 AM

Hello,

I have similar problem with my xen guest

kernel version: 2.6.32-5-686-bigmem
linux debian 6.0

many entries

Quote:

[13196101.641021] INFO: task postgres:29772 blocked for more than 120 seconds.
[13196101.641027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13196101.641032] postgres D ca2c0f80 0 29772 30707 0x00000000
[13196101.641037] ecb43b80 00000286 c10b998c ca2c0f80 c10b93d0 c1441e20 c1441e20 d21a3f00
[13196101.641048] ecb43d3c c50bae20 00000000 2efb7d95 002ee1a9 80000000 c1004433 00000000
[13196101.641058] 53388067 ecb43d3c ecaeab00 d11a4000 53388067 80000001 c10a2875 00000000
[13196101.641069] Call Trace:
[13196101.641073] [<c10b998c>] ? nameidata_to_filp+0x29/0x3c
[13196101.641076] [<c10b93d0>] ? generic_file_open+0x0/0x39
[13196101.641080] [<c1004433>] ? pfn_to_mfn+0xe/0x14
[13196101.641083] [<c10a2875>] ? handle_mm_fault+0x40c/0xa22
[13196101.641087] [<c127f55f>] ? __mutex_lock_common+0xe8/0x13b
[13196101.641091] [<c127f5c1>] ? __mutex_lock_slowpath+0xf/0x11
[13196101.641095] [<c127f652>] ? mutex_lock+0x17/0x24
[13196101.641099] [<c127f652>] ? mutex_lock+0x17/0x24
[13196101.641102] [<c10b9640>] ? do_sys_open+0xc9/0xdf
[13196101.641106] [<c10bba21>] ? generic_file_llseek+0x17/0x42
[13196101.641110] [<c10bba0a>] ? generic_file_llseek+0x0/0x42
[13196101.641113] [<c10ba8e5>] ? vfs_llseek+0x30/0x34
[13196101.641117] [<c10bb91f>] ? sys_llseek+0x3a/0x7a
[13196101.641120] [<c10b969a>] ? sys_open+0x1e/0x23
[13196101.641123] [<c100813b>] ? sysenter_do_call+0x12/0x28

grabblecat 07-10-2012 02:51 AM

I have the exact same issue. The load average hit 3000+ . I was wondering if anyone had nfs shares mounted? I've narrowed it down to either being a problem with the drives or a nfs issue although it could be something else.
This link seems promising as it explains everything rather nicely, however I still believe the problem is related to nfs

blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/

rajthampi 07-10-2012 12:49 PM

Quote:

Originally Posted by grabblecat (Post 4723686)
I have the exact same issue. The load average hit 3000+ . I was wondering if anyone had nfs shares mounted? I've narrowed it down to either being a problem with the drives or a nfs issue although it could be something else.
This link seems promising as it explains everything rather nicely, however I still believe the problem is related to nfs

blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/

I am also experiencing the same issue, and with NFS actually. Trying to import and unpack Oracle VM template

the log shows stuff like below

Jul 10 20:00:39 oravms kernel: INFO: task nfsd:2415 blocked for more than 120 se
conds.
Jul 10 20:00:39 oravms kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs
" disables this message.
Jul 10 20:00:39 oravms kernel: nfsd D 0000000000000000 0 2415
2 0x00000000


If anybody knows a solution, please!

regards,

raj

anonymous_itop 11-05-2013 02:39 AM

This site for help you.

http://blog.ronnyegner-consulting.de...comment-223981

cwcarlson 04-01-2014 01:12 PM

Interesting that this has just cropped up for us.

We are running a VMWare server with about 6 VMs, each running Ubuntu 12.04. We've had no problem until last week when I did a restart on two of the systems. For whatever reason, logging in, the systems told us a restart was required. Ever since then, we have these Java errors followed by a call trace:

INFO: task java:1120 blocked for more than 120 seconds.
Apr 1 10:21:56 osbuild4 kernel: [ 2156.765270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 1 10:21:56 osbuild4 kernel: [ 2156.765393] java D c1010847 0 1120 1110 0x00000000
Apr 1 10:21:56 osbuild4 kernel: [ 2156.765394] ee1b5e24 00000086 ee1b5dd0 c1010847 0000000b f7508ca0 c1930e00 c1930e00
Apr 1 10:21:56 osbuild4 kernel: [ 2156.765398] f2c40326 000001d1 f7987e00 f6804bc0 f6b53f20 00000000 ee1b5e5c c15a5d69
Apr 1 10:21:56 osbuild4 kernel: [ 2156.765401] ffffffec 00000000 ee1b5dfc c15a7f3d 67776bb0 00000000 f6804bc0 ed2c2400


All times are GMT -5. The time now is 05:28 PM.