LinuxQuestions.org
Help answer threads with 0 replies.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices

Reply
 
Search this Thread
Old 07-10-2011, 06:25 PM   #1
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Rep: Reputation: 36
Task blocked for more than 120 seconds errors and crashes


I see tons of these in my dmesg, and it's causing my virtualbox vms to crash with disk IO errors. It's only affecting the Linux guests. What would cause this?


Code:
INFO: task tar:1865 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tar           D 0000000000000002     0  1865   1775
 ffff880210489b78 0000000000000086 0000000000000096 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880084fec4a0 ffff88021b2996e0
 ffff880084fec7e8 00000001a01cca3c ffff88021aac8f30 ffff880084fec7e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa0035cc7>] wait_on_buffer+0x41/0x45 [ext3]
 [<ffffffffa0038160>] ext3_bread+0x4c/0x76 [ext3]
 [<ffffffffa003cb48>] htree_dirblock_to_tree+0x33/0x13d [ext3]
 [<ffffffff810c5fa9>] ? mntput+0x18/0x1a
 [<ffffffffa003ccc4>] ext3_htree_fill_tree+0x72/0x1ea [ext3]
 [<ffffffff810bfad5>] ? file_move+0x48/0x4e
 [<ffffffffa0033e6b>] ext3_readdir+0x188/0x5c8 [ext3]
 [<ffffffff810caa56>] ? filldir+0x0/0xc5
 [<ffffffff810c1f60>] ? cp_new_stat+0xe2/0xef
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff810caa56>] ? filldir+0x0/0xc5
 [<ffffffff810cac3d>] vfs_readdir+0x79/0xaf
 [<ffffffff810cadb2>] sys_getdents+0x7d/0xc4
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task kjournald:1219 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff8801f3546700     0  1219      2
 ffff880218dddca0 0000000000000046 0000000000000086 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880218c5c4a0 ffff8801f3515b80
 ffff880218c5c7e8 00000003a01cca3c ffff88021aac8f30 ffff880218c5c7e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff8113cb81>] ? submit_bio+0xe0/0xe9
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa0024deb>] wait_on_buffer+0x41/0x45 [jbd]
 [<ffffffffa002540e>] journal_commit_transaction+0x55d/0xf2f [jbd]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81049a0d>] ? try_to_del_timer_sync+0x58/0x63
 [<ffffffffa0028ad8>] kjournald+0xe3/0x23a [jbd]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffffa00289f5>] ? kjournald+0x0/0x23a [jbd]
 [<ffffffff810534bf>] kthread+0x49/0x76
 [<ffffffff81011719>] child_rip+0xa/0x11
 [<ffffffff81010a37>] ? restore_args+0x0/0x30
 [<ffffffff81053476>] ? kthread+0x0/0x76
 [<ffffffff8101170f>] ? child_rip+0x0/0x11

INFO: task VirtualBox:28939 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff8801f3546700     0 28939  23819
 ffff88000c7118f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8801eecc2dc0 ffff880001172dc0
 ffff8801eecc3108 00000000a01cca3c ffff88021aac8f30 ffff8801eecc3108
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff810d453d>] ? mnt_want_write+0x77/0x8d
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff812c20fa>] ? _spin_lock+0x9/0xc
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task VirtualBox:28957 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff8801fb89c000     0 28957  23819
 ffff88010a04b8f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8801fcd044a0 ffff880001172dc0
 ffff8801fcd047e8 00000000a01cca3c ffff88021aac8f30 ffff8801fcd047e8
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff810d453d>] ? mnt_want_write+0x77/0x8d
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task VirtualBox:1290 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
VirtualBox    D ffff880195f2da00     0  1290  23819
 ffff88011af6d8f8 0000000000000086 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff880001012dc0 ffff88021a915b80
 ffff880001013108 00000003a01cca3c ffff88021aac8f30 ffff880001013108
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffff810e154f>] wait_on_buffer+0x41/0x45
 [<ffffffff810e22fe>] __block_prepare_write+0x2ba/0x2fd
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108d627>] ? add_to_page_cache_locked+0x9a/0xae
 [<ffffffff810e24b6>] block_write_begin+0x86/0xd8
 [<ffffffffa00374c3>] ext3_write_begin+0xdf/0x1a7 [ext3]
 [<ffffffffa0037e85>] ? ext3_get_block+0x0/0xfc [ext3]
 [<ffffffff8108e0fa>] generic_file_buffered_write+0x14b/0x643
 [<ffffffff810d623b>] ? mnt_drop_write+0x82/0x143
 [<ffffffff8108e9e7>] __generic_file_aio_write_nolock+0x25e/0x292
 [<ffffffff8108f233>] generic_file_aio_write+0x67/0xc3
 [<ffffffffa003443f>] ext3_file_write+0x1e/0x9f [ext3]
 [<ffffffff810beb88>] do_sync_write+0xe7/0x12d
 [<ffffffffa031e8a2>] ? RTMemFree+0x1e/0x20 [vboxdrv]
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81120e84>] ? security_file_permission+0x11/0x13
 [<ffffffff810bf444>] vfs_write+0xab/0x105
 [<ffffffff810bf562>] sys_write+0x47/0x6f
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

INFO: task fetchmail:1892 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fetchmail     D ffff880100775380     0  1892   1889
 ffff8800b56d99d8 0000000000000082 0000000000000092 ffff88021aac8000
 ffffffff8162a500 ffffffff8162a500 ffff8802118896e0 ffff8801e7885b80
 ffff880211889a28 00000003a01cca3c ffff88021aac8f30 ffff880211889a28
Call Trace:
 [<ffffffff8101686f>] ? read_tsc+0xe/0x24
 [<ffffffff810590b6>] ? getnstimeofday+0x54/0xb0
 [<ffffffff812c08a3>] io_schedule+0x63/0xa5
 [<ffffffff810e15e1>] sync_buffer+0x3b/0x3f
 [<ffffffff812c0de1>] __wait_on_bit+0x47/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e1c3e>] ? ll_rw_block+0x9c/0xe7
 [<ffffffff810e150a>] __wait_on_buffer+0x36/0x3a
 [<ffffffffa003b45f>] ext3_find_entry+0x406/0x5ee [ext3]
 [<ffffffff81099d90>] ? zone_statistics+0x62/0x67
 [<ffffffff81056abb>] ? up_read+0x9/0xb
 [<ffffffff812c4d7c>] ? do_page_fault+0x6eb/0xaed
 [<ffffffff81037140>] ? check_preempt_wakeup+0x179/0x186
 [<ffffffffa003be56>] ext3_lookup+0x2f/0xc5 [ext3]
 [<ffffffff810d06fd>] ? d_alloc+0x18e/0x19b
 [<ffffffff810c63de>] do_lookup+0xd3/0x15d
 [<ffffffff810c787d>] __link_path_walk+0x37a/0x754
 [<ffffffff810c8125>] path_walk+0x61/0xc4
 [<ffffffff810c838c>] do_path_lookup+0x165/0x1be
 [<ffffffff810c999e>] user_path_at+0x52/0x8c
 [<ffffffff81053829>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8103c847>] ? finish_task_switch+0x31/0xc9
 [<ffffffff810c2298>] vfs_stat_fd+0x21/0x4e
 [<ffffffff810c232e>] sys_newstat+0x22/0x3c
 [<ffffffff810c60bf>] ? path_put+0x1d/0x21
 [<ffffffff8107cf8d>] ? audit_syscall_entry+0x101/0x135
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

This started randomly, I have not changed anything.
Here's the output of uname -a if it helps:

Linux borg.loc 2.6.27.25-78.2.56.fc9.x86_64 #1 SMP Thu Jun 18 12:24:37 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

I hope I can find a solution to this, it's my main home server, I can't afford for this to go down.

Also not finding much on google other than it's not a distro specific issue as people with newer versions of Fedora, Ubuntu etc have this issue as well.

More info that may be useful:

OS drive is a single drive (can't recall brand)

Data drive is a MD raid 5 with 5 1TB Hitachi drives. When I recently added 2 new drives, I went through many RMAs until I finally got two drives that were not failing out of the box. Could these errors be disk related? I do have one drive that has lot CRC smart errors but it was like that from the start. I had given up with all the RMAs and said screw it. Should I replace this drive?

Last edited by Red Squirrel; 07-10-2011 at 06:42 PM.
 
Old 07-11-2011, 12:46 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 326Reputation: 326Reputation: 326Reputation: 326
If you are getting accumulating CRC errors, you should replace the drive. Also, make sure you have TLER (time limited error recovery) enabled on the drives:
Code:
smartctl -l scterc /dev/sda
smartctl -l scterc /dev/sdb
...
If it's disabled, set it to 7 seconds:
Code:
smartctl -l scterc,70,70 /dev/sda
smartctl -l scterc,70,70 /dev/sdb
...
Some more information on TLER here.

Last edited by macemoneta; 07-11-2011 at 12:47 PM.
 
Old 07-11-2011, 04:37 PM   #3
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Original Poster
Rep: Reputation: 36
I just get invalid argument when I try that command.

Code:
[root@borg ~]# smartctl -l scterc /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=======> INVALID ARGUMENT TO -l: scterc
=======> VALID ARGUMENTS ARE: error, selftest, selective, directory, background, scttemp[sts|hist] <=======

Use smartctl -h to get a usage summary

[root@borg ~]#
I will order a new drive and replace the one with the errors. Maybe I'll be lucky and it will make the issues go away.


Another thing I get a lot is these errors:

CIFS VFS: Unexpected lookup error -112

Nothing at all on Google for that.
 
Old 07-11-2011, 06:15 PM   #4
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 326Reputation: 326Reputation: 326Reputation: 326
smartctl 5.41 supports the option. The CIFS message is related to a timeout accessing data on an SMB/CIFS network share.
 
Old 07-11-2011, 09:46 PM   #5
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Original Poster
Rep: Reputation: 36
Also, should I do a fsck on the file system? Could it maybe be a file system issue? It seems to be happening based on how much activity there is. With no VMs running I went a whole day, with VMs running, it started doing it within maybe 5 hours.
 
Old 07-12-2011, 04:01 AM   #6
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 326Reputation: 326Reputation: 326Reputation: 326
It happens when trying to write to a physically bad sector. A spare has to be allocated and remapped by the drive.
 
Old 07-12-2011, 11:23 AM   #7
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Original Poster
Rep: Reputation: 36
hmm so that would probably cause the raid to block for a bit which would then cause that right?


Now here's a weird one:

Code:
EXT3-fs error (device sdh1): ext3_find_entry: reading directory #2 offset 0
Buffer I/O error on device sdh1, logical block 0
lost page write due to I/O error on sdh1
EXT3-fs error (device sdh1): ext3_find_entry: reading directory #2 offset 0
------------[ cut here ]------------
WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x2f/0x87() (Tainted: G        W)
Hardware name:         
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables nls_utf8 cifs fuse bridge stp bnep rfcomm l2cap bluetooth vboxnetadp vboxnetflt vboxdrv ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi coretemp hwmon sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_log dm_multipath scsi_dh dm_mod raid456 async_xor async_memcpy async_tx xor pata_marvell pata_acpi i915 pcspkr iTCO_wdt serio_raw iTCO_vendor_support sr_mod cdrom sg i2c_i801 ata_generic sata_sil24 drm i2c_algo_bit i2c_core e1000e ahci libata sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: kvm]
Pid: 15835, comm: updatedb Tainted: G        W 2.6.27.25-78.2.56.fc9.x86_64 #1

Call Trace:
 [<ffffffff8103ff10>] warn_on_slowpath+0x80/0xae
 [<ffffffff812bfe29>] ? printk+0x67/0x6e
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff81031103>] ? need_resched+0x1e/0x28
 [<ffffffff810e0e9f>] mark_buffer_dirty+0x2f/0x87
 [<ffffffffa003d609>] ext3_commit_super+0x50/0x66 [ext3]
 [<ffffffffa003ee46>] ext3_handle_error+0x86/0xad [ext3]
 [<ffffffffa003ef70>] ext3_error+0x9b/0xa9 [ext3]
 [<ffffffff812c0e04>] ? __wait_on_bit+0x6a/0x79
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff810e15a6>] ? sync_buffer+0x0/0x3f
 [<ffffffff812c0e7d>] ? out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff81053861>] ? wake_bit_function+0x0/0x2a
 [<ffffffff810e1c3e>] ? ll_rw_block+0x9c/0xe7
 [<ffffffffa003b493>] ext3_find_entry+0x43a/0x5ee [ext3]
 [<ffffffff810b1259>] ? alloc_pages_current+0xb9/0xc2
 [<ffffffff810b8d88>] ? __slab_alloc+0x73/0x3d6
 [<ffffffff810d0593>] ? d_alloc+0x24/0x19b
 [<ffffffff812c20fa>] ? _spin_lock+0x9/0xc
 [<ffffffff810ce989>] ? d_rehash+0x43/0x4a
 [<ffffffffa003be56>] ext3_lookup+0x2f/0xc5 [ext3]
 [<ffffffff810d06fd>] ? d_alloc+0x18e/0x19b
 [<ffffffff810c63de>] do_lookup+0xd3/0x15d
 [<ffffffff810c7b05>] __link_path_walk+0x602/0x754
 [<ffffffff810c8125>] path_walk+0x61/0xc4
 [<ffffffff810c838c>] do_path_lookup+0x165/0x1be
 [<ffffffff810c999e>] user_path_at+0x52/0x8c
 [<ffffffff810c817b>] ? path_walk+0xb7/0xc4
 [<ffffffff810c1f60>] ? cp_new_stat+0xe2/0xef
 [<ffffffff810c21bc>] vfs_lstat_fd+0x1e/0x4b
 [<ffffffff810c220b>] sys_newlstat+0x22/0x3c
 [<ffffffff810c60bf>] ? path_put+0x1d/0x21
 [<ffffffff8107cf8d>] ? audit_syscall_entry+0x101/0x135
 [<ffffffff8101027a>] system_call_fastpath+0x16/0x1b

---[ end trace 4eaa2a86a8e2da22 ]---
Buffer I/O error on device sdh1, logical block 0
lost page write due to I/O error on sdh1
The weird part is, I don't even have a /dev/sdh.


As for the bad sectors, is there a way I can do a check? Since the drives are in raid, I can't do fsck on them individually. (just says invalid file system) I want to confirm which drive is failing so I know which one to replace. I wish when drives fail they would just drop out, end of story, this crap is what makes it so irritating.
 
Old 07-12-2011, 02:39 PM   #8
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 326Reputation: 326Reputation: 326Reputation: 326
Did you unplug a usb or similar device?
 
Old 07-12-2011, 04:51 PM   #9
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Original Poster
Rep: Reputation: 36
I do have an esata drive I use for backup, but it would have been unplugged the night before as the backup completed. Would it be normal for the error to be delayed? I've also never seen big errors like that over a drive being unplugged.
 
Old 07-12-2011, 08:15 PM   #10
Red Squirrel
Member
 
Registered: Dec 2003
Distribution: FC9 on main server
Posts: 753

Original Poster
Rep: Reputation: 36
I ran a full fsck on the md device yesterday before bed as well, forgot to mention that. It found some errors which it corrected (I forget what, something about blocks incorrectly reported or something). Even with that error I posted, the VMs did not crap out. So MAYBE the fsck solved this issue, but still early to tell. I started up ALL my VMs (8 of them) so I'll see how it goes tonight when the backup jobs run. So far nothing funny in dmesg.

I will also be replacing that drive with the CRC errors regardless. Just ordered a 1TB WD black so it should be here by end of next week hopefully. I'm not buying Hitachi anymore, I've had too much bad luck with the 2010-2011 drives.
 
Old 07-18-2011, 06:37 AM   #11
__PJ
LQ Newbie
 
Registered: Dec 2009
Posts: 10

Rep: Reputation: 0
Just a FYI - this may not be a disk issue. I'm trying to resolve the same issue on Debian and there are a number of reports showing this as an issue with the CFS CPU scheduler. And the majority of them seem to involve virtualization of some sort.

If I find a solution I'll post back.
 
Old 04-13-2012, 08:56 PM   #12
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu 10.04, Debian Squeeze, Windoze 7
Posts: 512

Rep: Reputation: 34
Question

I see these error messages pop up everywhere, on my server at home, on customers systems, across distros. And everytime the system becomes extremely slow and soon goes beyond being usable and you cannot even do a shut down anymore. Load AVG climbs up to obscene heights, stuff like 20 is not unknown.

What is causing this? The tasks that get blocked are totally random, it could be anything.
Where is this error message from the kernel documented? How to avoid it? I also thought first that it was disk related, but this does not make sense, as the systems in question have all different disk and arrays, some raid, some single disks etc.. And sometimes even the kernel tasks themselves get blocked.
 
Old 06-23-2012, 05:02 PM   #13
linuxnomo
LQ Newbie
 
Registered: Mar 2009
Posts: 2

Rep: Reputation: 0
Hi,

anything new on the topic? I am also experiencing these issues with random tasks (see the quote). At first, I also assumed a disk error and even installed a new disk. Since my system contains only RAIDs, I then struggled to find out, which RAID element to replace since all disks reported healthy status after long smartctl checks. I did an "apt-get update" as I hoped this problem could have been solved by a patch. But it no more 36h later my system experienced the same error.

Happy for any updates.

Best regards,

Linuxnomo

------

For information: here following you see an extract of dmesg and then ps -eaf. As I had only one remote ssh running on the system when the system crashed, these two (dmesg, ps -eaf) where the only commands that I could run. the ^C I issued after the ps -eaf stopped working. I was not able to revitalize any connection to the system thereafter.

Quote:
[ 0.202014] pnp 00:05: Plug and Play ACPI device, IDs PNP0b00 (active)

[ 80.766148] lp0: using parport0 (interrupt-driven).
[ 80.809979] ppdev: user-space parallel port driver
[ 91.009689] colord[2840]: segfault at 8 ip 000000000040bc6d sp 00007fff23e57d
e0 error 4 in colord[400000+20000]
[ 91.797741] powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 380
0+ (2 cpu cores) (version 2.20.00)
[ 91.797759] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects foun
d.
[ 91.797761] [Firmware Bug]: powernow-k8: Try again with latest BIOS.
[37593.201644] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery d
irectory
[37593.223946] NFSD: starting 90-second grace period
[41760.576037] INFO: task kjournald:1435 blocked for more than 120 seconds.
[41760.576043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41760.576046] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[41760.576052] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[41760.576057] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[41760.576062] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[41760.576067] Call Trace:
[41760.576076] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41760.576082] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41760.576085] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[41760.576089] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41760.576093] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[41760.576097] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41760.576102] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41760.576124] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[41760.576128] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41760.576133] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[41760.576140] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[41760.576144] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41760.576151] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[41760.576155] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[41760.576160] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[41760.576164] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[41760.576168] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[41760.576212] INFO: task java:22429 blocked for more than 120 seconds.
[41760.576214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41760.576217] java D ffff88003fc13540 0 22429 1 0x00000000
[41760.576221] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[41760.576226] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[41760.576230] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[41760.576235] Call Trace:
[41760.576240] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[41760.576244] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41760.576247] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[41760.576251] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41760.576258] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[41760.576262] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[41760.576266] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576270] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41760.576274] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[41760.576280] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[41760.576292] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[41760.576296] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[41760.576301] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[41760.576305] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576309] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[41760.576314] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[41760.576317] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41760.576321] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[41760.576326] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[41760.576332] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[41760.576335] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[41760.576339] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[41760.576343] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[41880.576031] INFO: task kjournald:1435 blocked for more than 120 seconds.
[41880.576036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41880.576040] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[41880.576046] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[41880.576051] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[41880.576055] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[41880.576060] Call Trace:
[41880.576069] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41880.576075] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41880.576079] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[41880.576083] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41880.576087] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[41880.576090] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[41880.576096] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41880.576118] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[41880.576123] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41880.576127] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[41880.576135] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[41880.576139] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[41880.576146] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[41880.576150] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[41880.576155] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[41880.576159] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[41880.576163] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[41880.576207] INFO: task java:22429 blocked for more than 120 seconds.
[41880.576209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[41880.576212] java D ffff88003fc13540 0 22429 1 0x00000000
[41880.576217] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[41880.576221] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[41880.576226] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[41880.576230] Call Trace:
[41880.576236] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[41880.576239] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[41880.576243] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[41880.576247] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[41880.576254] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[41880.576258] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[41880.576262] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576266] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[41880.576270] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[41880.576276] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[41880.576289] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[41880.576293] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[41880.576299] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[41880.576302] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576306] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[41880.576311] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[41880.576314] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[41880.576319] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[41880.576324] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[41880.576329] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[41880.576333] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[41880.576336] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[41880.576340] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42000.576029] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42000.576035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42000.576038] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42000.576044] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42000.576049] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42000.576054] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42000.576058] Call Trace:
[42000.576068] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42000.576074] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42000.576077] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42000.576081] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42000.576085] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42000.576088] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42000.576094] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42000.576116] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42000.576121] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42000.576126] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42000.576133] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42000.576137] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42000.576144] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42000.576148] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42000.576153] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42000.576157] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42000.576161] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42000.576206] INFO: task java:22429 blocked for more than 120 seconds.
[42000.576209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42000.576211] java D ffff88003fc13540 0 22429 1 0x00000000
[42000.576216] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42000.576221] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42000.576225] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42000.576229] Call Trace:
[42000.576235] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42000.576238] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42000.576242] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42000.576246] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42000.576253] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42000.576258] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42000.576261] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576265] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42000.576269] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42000.576276] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42000.576289] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42000.576294] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42000.576299] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42000.576303] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576307] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42000.576311] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42000.576315] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42000.576319] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42000.576324] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42000.576330] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42000.576334] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42000.576337] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42000.576341] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42120.576043] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42120.576049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42120.576052] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42120.576058] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42120.576064] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42120.576068] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42120.576073] Call Trace:
[42120.576082] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42120.576088] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42120.576091] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42120.576095] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42120.576099] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42120.576102] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42120.576108] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42120.576130] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42120.576134] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42120.576139] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42120.576147] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42120.576151] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42120.576158] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42120.576162] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42120.576167] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42120.576171] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42120.576175] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42120.576218] INFO: task java:22429 blocked for more than 120 seconds.
[42120.576220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42120.576223] java D ffff88003fc13540 0 22429 1 0x00000000
[42120.576227] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42120.576232] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42120.576236] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42120.576241] Call Trace:
[42120.576246] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42120.576250] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42120.576253] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42120.576257] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42120.576265] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42120.576269] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42120.576273] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576277] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42120.576281] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42120.576287] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42120.576300] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42120.576304] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42120.576309] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42120.576313] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576317] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42120.576321] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42120.576325] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42120.576329] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42120.576334] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42120.576339] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42120.576342] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42120.576346] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42120.576350] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
[42240.576029] INFO: task kjournald:1435 blocked for more than 120 seconds.
[42240.576035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42240.576038] kjournald D ffff88003fc13540 0 1435 2 0x00000000
[42240.576044] ffff88003d1a4830 0000000000000046 ffff880000000000 ffffffff8160d
020
[42240.576050] 0000000000013540 ffff88003704bfd8 ffff88003704bfd8 ffff88003d1a4
830
[42240.576054] 0000000000000206 0000000181065a37 ffff880036f81bb0 ffff88003fc13
d90
[42240.576059] Call Trace:
[42240.576068] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42240.576074] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42240.576077] [<ffffffff8111bf5f>] ? sleep_on_buffer+0x6/0xa
[42240.576081] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42240.576085] [<ffffffff81349109>] ? out_of_line_wait_on_bit+0x6f/0x78
[42240.576089] [<ffffffff8111bf59>] ? wait_on_buffer+0x28/0x28
[42240.576094] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42240.576115] [<ffffffffa012409c>] ? journal_commit_transaction+0x4d3/0xda7 [j
bd]
[42240.576120] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42240.576124] [<ffffffff81051fbe>] ? lock_timer_base.isra.29+0x23/0x47
[42240.576131] [<ffffffffa012758b>] ? kjournald+0xe0/0x21e [jbd]
[42240.576135] [<ffffffff8105f553>] ? add_wait_queue+0x3c/0x3c
[42240.576142] [<ffffffffa01274ab>] ? commit_timeout+0x5/0x5 [jbd]
[42240.576146] [<ffffffff8105ef0d>] ? kthread+0x76/0x7e
[42240.576151] [<ffffffff81350df4>] ? kernel_thread_helper+0x4/0x10
[42240.576156] [<ffffffff8105ee97>] ? kthread_worker_fn+0x139/0x139
[42240.576159] [<ffffffff81350df0>] ? gs_change+0x13/0x13
[42240.576203] INFO: task java:22429 blocked for more than 120 seconds.
[42240.576205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[42240.576208] java D ffff88003fc13540 0 22429 1 0x00000000
[42240.576212] ffff88002152e300 0000000000000086 0000000000000000 ffffffff8160d
020
[42240.576217] 0000000000013540 ffff88000407ffd8 ffff88000407ffd8 ffff88002152e
300
[42240.576221] ffff88000407fa48 000000010407fa48 ffff88001bba76e0 ffff88003fc13
d90
[42240.576226] Call Trace:
[42240.576231] [<ffffffff810b3c9d>] ? lock_page+0x20/0x20
[42240.576235] [<ffffffff81348c24>] ? io_schedule+0x59/0x71
[42240.576239] [<ffffffff810b3ca3>] ? sleep_on_page+0x6/0xa
[42240.576242] [<ffffffff81349067>] ? __wait_on_bit+0x3e/0x71
[42240.576250] [<ffffffffa0123645>] ? do_get_write_access+0x32b/0x367 [jbd]
[42240.576254] [<ffffffff810b3ddf>] ? wait_on_page_bit+0x6e/0x73
[42240.576257] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576261] [<ffffffff8105f57d>] ? autoremove_wake_function+0x2a/0x2a
[42240.576265] [<ffffffff810b4642>] ? grab_cache_page_write_begin+0x9c/0xae
[42240.576272] [<ffffffffa0123185>] ? journal_dirty_metadata+0xb6/0xc1 [jbd]
[42240.576284] [<ffffffffa013ae42>] ? ext3_write_begin+0xac/0x1e7 [ext3]
[42240.576289] [<ffffffff810b412d>] ? generic_file_buffered_write+0x10f/0x259
[42240.576294] [<ffffffff810b4fde>] ? __generic_file_aio_write+0x248/0x278
[42240.576298] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576302] [<ffffffff810b506b>] ? generic_file_aio_write+0x5d/0xb5
[42240.576307] [<ffffffff810f8cf7>] ? do_sync_write+0xab/0xe3
[42240.576310] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[42240.576314] [<ffffffff81279b85>] ? sock_ioctl+0x205/0x212
[42240.576319] [<ffffffff811067bd>] ? do_vfs_ioctl+0x459/0x49a
[42240.576324] [<ffffffff811628a1>] ? security_file_permission+0x16/0x2d
[42240.576328] [<ffffffff810f93df>] ? vfs_write+0xa2/0xe9
[42240.576331] [<ffffffff810f95bc>] ? sys_write+0x45/0x6b
[42240.576335] [<ffffffff8134ec92>] ? system_call_fastpath+0x16/0x1b
debian-amd64:~# ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jun22 ? 00:00:01 init [2]
root 2 0 0 Jun22 ? 00:00:00 [kthreadd]
root 3 2 0 Jun22 ? 00:00:00 [ksoftirqd/0]
root 4 2 0 Jun22 ? 00:00:03 [kworker/0:0]
root 5 2 0 Jun22 ? 00:00:00 [kworker/u:0]
root 6 2 0 Jun22 ? 00:00:00 [migration/0]
root 7 2 0 Jun22 ? 00:00:00 [watchdog/0]
root 8 2 0 Jun22 ? 00:00:00 [migration/1]
root 10 2 0 Jun22 ? 00:00:01 [ksoftirqd/1]
root 12 2 0 Jun22 ? 00:00:00 [watchdog/1]
root 13 2 0 Jun22 ? 00:00:00 [cpuset]
root 14 2 0 Jun22 ? 00:00:00 [khelper]
root 15 2 0 Jun22 ? 00:00:00 [kdevtmpfs]
root 16 2 0 Jun22 ? 00:00:00 [netns]
root 17 2 0 Jun22 ? 00:00:00 [sync_supers]
root 18 2 0 Jun22 ? 00:00:00 [bdi-default]
root 19 2 0 Jun22 ? 00:00:00 [kintegrityd]
root 20 2 0 Jun22 ? 00:00:00 [kblockd]
root 23 2 0 Jun22 ? 00:00:00 [khungtaskd]
root 24 2 0 Jun22 ? 00:00:01 [kswapd0]
root 25 2 0 Jun22 ? 00:00:00 [ksmd]
root 26 2 0 Jun22 ? 00:00:00 [khugepaged]
root 27 2 0 Jun22 ? 00:00:00 [fsnotify_mark]
root 28 2 0 Jun22 ? 00:00:00 [crypto]
root 110 2 0 Jun22 ? 00:00:00 [khubd]
root 159 2 0 Jun22 ? 00:00:00 [ata_sff]
root 165 2 0 Jun22 ? 00:00:03 [kworker/0:2]
root 166 2 0 Jun22 ? 00:00:00 [scsi_eh_0]
root 169 2 0 Jun22 ? 00:00:00 [scsi_eh_1]
root 170 2 0 Jun22 ? 00:00:00 [scsi_eh_2]
root 173 2 0 Jun22 ? 00:00:00 [kworker/u:2]
root 179 2 0 Jun22 ? 00:00:00 [scsi_eh_3]
root 180 2 0 Jun22 ? 00:00:00 [scsi_eh_4]
root 181 2 0 Jun22 ? 00:00:00 [scsi_eh_5]
root 182 2 0 Jun22 ? 00:00:00 [scsi_eh_6]
root 264 2 0 Jun22 ? 00:00:00 [md]
root 274 2 0 Jun22 ? 00:00:08 [md2_raid5]
root 284 2 0 Jun22 ? 00:00:00 [md3_raid1]
root 296 2 0 Jun22 ? 00:00:12 [md1_raid5]
root 303 2 0 Jun22 ? 00:00:00 [md4_raid1]
root 321 2 0 Jun22 ? 00:00:01 [kjournald]
root 416 1 0 Jun22 ? 00:00:00 udevd --daemon
root 560 2 0 Jun22 ? 00:00:00 [edac-poller]
root 571 2 0 Jun22 ? 00:00:00 [kpsmoused]
root 582 2 0 Jun22 ? 00:00:06 [w1_bus_master1]
root 995 2 0 Jun22 ? 00:00:00 [flush-9:2]
root 1127 416 0 Jun22 ? 00:00:00 udevd --daemon
root 1434 2 0 Jun22 ? 00:00:00 [kjournald]
root 1435 2 0 Jun22 ? 00:00:01 [kjournald]
root 1438 2 0 Jun22 ? 00:00:00 [xfs_mru_cache]
root 1439 2 0 Jun22 ? 00:00:00 [xfslogd]
root 1440 2 0 Jun22 ? 00:00:00 [xfsdatad]
root 1441 2 0 Jun22 ? 00:00:00 [xfsconvertd]
root 1442 2 0 Jun22 ? 00:00:00 [xfsbufd/md4]
root 1443 2 0 Jun22 ? 00:00:09 [xfsaild/md4]
root 1721 1 0 Jun22 ? 00:00:00 /sbin/rpcbind -w
root 1753 2 0 Jun22 ? 00:00:00 [rpciod]
root 1756 2 0 Jun22 ? 00:00:00 [nfsiod]
root 1999 2 0 Jun22 ? 00:00:00 [lockd]
syslog 2096 1 0 Jun22 ? 00:00:00 /sbin/syslogd -u syslog
103 2128 1 0 Jun22 ? 00:00:00 /usr/bin/dbus-daemon --system
root 2153 1 0 Jun22 ? 00:00:00 /usr/sbin/NetworkManager
root 2178 1 0 Jun22 ? 00:00:00 /usr/sbin/acpid
root 2205 1 0 Jun22 ? 00:00:00 /usr/lib/policykit-1/polkitd --n
nobody 2225 1 0 Jun22 ? 00:00:00 /usr/sbin/openvpn --writepid /va
root 2254 1 0 Jun22 ? 00:00:00 /usr/sbin/modem-manager
bind 2258 1 0 Jun22 ? 00:00:01 /usr/sbin/named -u bind
root 2298 1 0 Jun22 ? 00:00:00 /usr/sbin/gdm3
root 2345 2298 0 Jun22 ? 00:00:00 /usr/lib/gdm3/gdm-simple-slave -
root 2359 2345 0 Jun22 tty7 00:00:04 /usr/bin/Xorg :0 -br -verbose -n
root 2361 1 0 Jun22 ? 00:00:02 /usr/sbin/apache2 -k start
daemon 2388 1 0 Jun22 ? 00:00:00 /usr/sbin/atd
www-data 2420 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2421 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2422 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2423 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 2424 2361 0 Jun22 ? 00:00:00 /usr/sbin/apache2 -k start
avahi 2426 1 0 Jun22 ? 00:00:00 avahi-daemon: running [debian-am
avahi 2427 2426 0 Jun22 ? 00:00:00 avahi-daemon: chroot helper
root 2482 1 0 Jun22 ? 00:00:00 /usr/sbin/bluetoothd
root 2489 416 0 Jun22 ? 00:00:00 udevd --daemon
root 2494 2 0 Jun22 ? 00:00:00 [krfcommd]
root 2647 1 0 Jun22 ? 00:00:00 /usr/sbin/console-kit-daemon --n
clamav 2714 1 0 Jun22 ? 00:00:21 /usr/bin/freshclam -d --quiet
121 2799 1 0 Jun22 ? 00:00:00 /usr/bin/dbus-daemon --fork --pr
121 2800 2345 0 Jun22 ? 00:00:00 /usr/bin/gnome-session --session
root 2806 1 0 Jun22 ? 00:00:00 /usr/sbin/cron
root 2838 1 0 Jun22 ? 00:00:00 /usr/sbin/cupsd -C /etc/cups/cup
121 2844 1 0 Jun22 ? 00:00:00 /usr/lib/x86_64-linux-gnu/gconf/
121 2847 2800 0 Jun22 ? 00:00:03 /usr/lib/gnome-settings-daemon/g
121 2849 2800 0 Jun22 ? 00:00:00 /usr/lib/at-spi/at-spi-registryd
colord 2851 1 0 Jun22 ? 00:00:00 /usr/lib/x86_64-linux-gnu/colord
root 2881 1 0 Jun22 ? 00:00:00 /usr/lib/upower/upowerd
121 2883 1 0 Jun22 ? 00:00:00 /usr/lib/bonobo-activation/bonob
root 2887 1 0 Jun22 ? 00:00:06 ddclient - sleeping for 100 seco
121 3093 1 0 Jun22 ? 00:00:00 /usr/lib/gvfs/gvfsd
dirmngr 3094 1 0 Jun22 ? 00:00:08 /usr/bin/dirmngr --daemon --sh
121 3142 1 0 Jun22 ? 00:00:00 /usr/bin/pulseaudio --start --lo
rtkit 3144 1 0 Jun22 ? 00:00:00 /usr/lib/rtkit/rtkit-daemon
root 3194 1 0 Jun22 ? 00:00:04 /usr/sbin/kerneloops
121 3201 2800 0 Jun22 ? 00:00:00 metacity
root 3210 1 0 Jun22 ? 00:00:00 /bin/dd bs=1 if=/proc/kmsg of=/v
klog 3211 1 0 Jun22 ? 00:00:00 /sbin/klogd -P /var/run/klogd/km
root 3328 1 0 Jun22 ? 00:00:00 /sbin/mdadm --monitor --pid-file
121 3354 2800 0 Jun22 ? 00:00:02 /usr/lib/gdm3/gdm-simple-greeter
121 3355 2800 0 Jun22 ? 00:00:00 /usr/lib/policykit-1-gnome/polki
root 3357 1 0 Jun22 ? 00:00:00 /usr/sbin/cnid_metad -l log_note
root 3361 2345 0 Jun22 ? 00:00:00 /usr/lib/gdm3/gdm-session-worker
root 3363 1 0 Jun22 ? 00:00:00 /usr/lib/accountsservice/account
root 3365 1 0 Jun22 ? 00:00:00 /usr/sbin/afpd -U uams_dhx2.so,u
ntp 3417 1 0 Jun22 ? 00:00:04 /usr/sbin/ntpd -p /var/run/ntpd.
root 3446 1 0 Jun22 ? 00:00:00 /usr/sbin/inetd
root 3597 1 0 Jun22 ? 00:00:00 /usr/sbin/nmbd -D
^C
 
Old 06-26-2012, 07:40 AM   #14
k0rdi
LQ Newbie
 
Registered: Jun 2011
Posts: 1

Rep: Reputation: Disabled
Hello,

I have similar problem with my xen guest

kernel version: 2.6.32-5-686-bigmem
linux debian 6.0

many entries

Quote:
[13196101.641021] INFO: task postgres:29772 blocked for more than 120 seconds.
[13196101.641027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13196101.641032] postgres D ca2c0f80 0 29772 30707 0x00000000
[13196101.641037] ecb43b80 00000286 c10b998c ca2c0f80 c10b93d0 c1441e20 c1441e20 d21a3f00
[13196101.641048] ecb43d3c c50bae20 00000000 2efb7d95 002ee1a9 80000000 c1004433 00000000
[13196101.641058] 53388067 ecb43d3c ecaeab00 d11a4000 53388067 80000001 c10a2875 00000000
[13196101.641069] Call Trace:
[13196101.641073] [<c10b998c>] ? nameidata_to_filp+0x29/0x3c
[13196101.641076] [<c10b93d0>] ? generic_file_open+0x0/0x39
[13196101.641080] [<c1004433>] ? pfn_to_mfn+0xe/0x14
[13196101.641083] [<c10a2875>] ? handle_mm_fault+0x40c/0xa22
[13196101.641087] [<c127f55f>] ? __mutex_lock_common+0xe8/0x13b
[13196101.641091] [<c127f5c1>] ? __mutex_lock_slowpath+0xf/0x11
[13196101.641095] [<c127f652>] ? mutex_lock+0x17/0x24
[13196101.641099] [<c127f652>] ? mutex_lock+0x17/0x24
[13196101.641102] [<c10b9640>] ? do_sys_open+0xc9/0xdf
[13196101.641106] [<c10bba21>] ? generic_file_llseek+0x17/0x42
[13196101.641110] [<c10bba0a>] ? generic_file_llseek+0x0/0x42
[13196101.641113] [<c10ba8e5>] ? vfs_llseek+0x30/0x34
[13196101.641117] [<c10bb91f>] ? sys_llseek+0x3a/0x7a
[13196101.641120] [<c10b969a>] ? sys_open+0x1e/0x23
[13196101.641123] [<c100813b>] ? sysenter_do_call+0x12/0x28
 
Old 07-10-2012, 02:51 AM   #15
grabblecat
LQ Newbie
 
Registered: Jul 2012
Posts: 1

Rep: Reputation: Disabled
I have the exact same issue. The load average hit 3000+ . I was wondering if anyone had nfs shares mounted? I've narrowed it down to either being a problem with the drives or a nfs issue although it could be something else.
This link seems promising as it explains everything rather nicely, however I still believe the problem is related to nfs

blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Fedora crashes a few seconds after startup f1r3fl3x Linux - Newbie 14 04-05-2010 09:05 AM
INFO: <application>blocked for more than 120 seconds alok.rhct Linux - Hardware 1 04-08-2009 01:35 AM
Thunderbird crashes after a few seconds julz_51 Linux - Software 5 11-18-2005 10:37 PM
Cron Job Syntax for a task in seconds AJones Linux - Software 2 07-22-2005 12:56 PM
rh 7.2 Gnome Crashes No task bar JMO8 Linux - Newbie 0 11-24-2003 08:57 PM


All times are GMT -5. The time now is 04:04 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration