LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Server
10-17-2011, 10:24 AM   #1
sh4ka
LQ Newbie
 
Registered: Apr 2011
Posts: 13

Rep: Reputation: 0
Server crash - kernel: INFO: task blocked


Hello,

The server had been working fine for more than 60 days of uptime, with low traffic and a low load average (just above 1), when it suddenly went down. It runs CentOS 5.x 64-bit with kernel 2.6.18-274.3.1.el5.

The same thing happened to another server running the same kernel, also with low resource usage and traffic. Could this be a kernel bug? Any ideas?

This is what I found in the logs:

Code:
Oct 17 09:20:46 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.
Oct 17 09:20:46 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:20:46 srv1 kernel: sendmail      D ffffffff801546d1     0 16864  16845               16849 (NOTLB)
Oct 17 09:20:46 srv1 kernel:  ffff8102c01ffc58 0000000000000086 ffff81023b071800 0000000000000000
Oct 17 09:20:46 srv1 kernel:  ffff8102b1e89cd0 0000000000000001 ffff81032a9947a0 ffff8100470dd100
Oct 17 09:20:46 srv1 kernel:  000a149f4ea78a80 0000000000018579 ffff81032a994988 0000000100021c10
Oct 17 09:20:46 srv1 kernel: Call Trace:
Oct 17 09:26:11 srv1 kernel:  [<ffffffff800ceeb4>] zone_statistics+0x3e/0x6d
Oct 17 09:26:11 srv1 kernel:  [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000985a>] __d_lookup+0xb0/0xff
Oct 17 09:26:11 srv1 kernel:  [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000d0ac>] do_lookup+0x90/0x1e6
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000a2e3>] __link_path_walk+0xa3a/0xfd1
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000eb88>] link_path_walk+0x45/0xb8
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000ce9c>] do_path_lookup+0x294/0x310
Oct 17 09:26:11 srv1 kernel:  [<ffffffff80023959>] __path_lookup_intent_open+0x56/0x97
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8001b1d6>] open_namei+0x73/0x718
Oct 17 09:26:11 srv1 kernel:  [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8002771b>] do_filp_open+0x1c/0x38
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8001a089>] do_sys_open+0x44/0xbe
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:26:11 srv1 kernel: 
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=537 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=9
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=538 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=10
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=545 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=11
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=552 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=12
Oct 17 09:26:11 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.
Oct 17 09:26:11 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:26:11 srv1 kernel: sendmail      D ffffffff801546d1     0 16864  16845               16849 (NOTLB)
Oct 17 09:26:11 srv1 kernel:  ffff8102c01ffc58 0000000000000086 ffff81023b071800 0000000000000000
Oct 17 09:26:11 srv1 kernel:  ffff8102b1e89cd0 0000000000000001 ffff81032a9947a0 ffff8100470dd100
Oct 17 09:26:11 srv1 kernel:  000a149f4ea78a80 0000000000018579 ffff81032a994988 0000000100021c10
Oct 17 09:26:11 srv1 kernel: Call Trace:
Oct 17 09:26:11 srv1 kernel:  [<ffffffff800ceeb4>] zone_statistics+0x3e/0x6d
Oct 17 09:26:11 srv1 kernel:  [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:26:11 srv1 kernel:  [<ffffffff8000985a>] __d_lookup+0xb0/0xff
Oct 17 09:27:12 srv1 kernel:  [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8000d0ac>] do_lookup+0x90/0x1e6
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8000a2e3>] __link_path_walk+0xa3a/0xfd1
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8000eb88>] link_path_walk+0x45/0xb8
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8000ce9c>] do_path_lookup+0x294/0x310
Oct 17 09:27:12 srv1 kernel:  [<ffffffff80023959>] __path_lookup_intent_open+0x56/0x97
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8001b1d6>] open_namei+0x73/0x718
Oct 17 09:27:12 srv1 kernel:  [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8002771b>] do_filp_open+0x1c/0x38
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8001a089>] do_sys_open+0x44/0xbe
Oct 17 09:27:12 srv1 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:27:12 srv1 kernel: 
Oct 17 09:27:12 srv1 kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=46.133.148.137 DST=96.127.155.235 LEN=52 TOS=0x00 PREC=0x00 TTL=117 ID=10773 DF PROTO=TCP SPT=62646 DPT=5900 WINDOW=8192 RES=0x00 SYN URGP=0
Oct 17 09:27:12 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.149.206 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=1222 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=13
Oct 17 09:27:12 srv1 kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=46.133.148.137 DST=96.127.155.235 LEN=52 TOS=0x00 PREC=0x00 TTL=117 ID=10909 DF PROTO=TCP SPT=62646 DPT=5900 WINDOW=8192 RES=0x00 SYN URGP=0
Oct 17 09:27:12 srv1 kernel: INFO: task httpd:14150 blocked for more than 120 seconds.
Oct 17 09:27:12 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:27:13 srv1 kernel: httpd         D ffffffff801546d1     0 14150  10748         14164 14134 (NOTLB)
Oct 17 09:27:13 srv1 kernel:  ffff8101e28a7d28 0000000000000082 ffff81015974a228 ffff81023b8e3a78
Oct 17 09:27:13 srv1 kernel:  0000000000000000 0000000000000001 ffff81015974a040 ffff81023fb387a0
Oct 17 09:27:13 srv1 kernel:  000a14e49ff198b9 0000000000006b16 ffff81015974a228 0000000100000026
Oct 17 09:27:13 srv1 kernel: Call Trace:
Oct 17 09:27:13 srv1 kernel:  [<ffffffff800c0558>] delayacct_end+0x5d/0x86
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:27:13 srv1 kernel:  [<ffffffff800219a8>] generic_file_aio_write+0x50/0xc3
Oct 17 09:27:13 srv1 kernel:  [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80018415>] do_sync_write+0xc7/0x104
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:27:13 srv1 kernel:  [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
Oct 17 09:27:13 srv1 kernel:  [<ffffffff800339e8>] do_setitimer+0x5aa/0x67b
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80062ff2>] thread_return+0x62/0xfe
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80039e3b>] fcntl_setlk+0x243/0x273
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80028f74>] do_sigaction+0x76/0x199
Oct 17 09:27:13 srv1 kernel:  [<ffffffff80016b92>] vfs_write+0xce/0x174
Oct 17 09:27:13 srv1 kernel:  [<ffffffff8001745b>] sys_write+0x45/0x6e
Oct 17 09:27:13 srv1 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:27:13 srv1 kernel: 
Oct 17 09:27:13 srv1 kernel: INFO: task httpd:15859 blocked for more than 120 seconds.
Oct 17 09:27:13 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:27:13 srv1 kernel: httpd         D ffffffff801546d1     0 15859  10748         15860 15858 (NOTLB)
Oct 17 09:27:13 srv1 kernel:  ffff81009c6d1d28 0000000000000086 ffff810326100cf0 ffff81009c6d1e20
Oct 17 09:27:13 srv1 kernel:  0000000226100cf0 0000000000000001 ffff81009c62a100 ffff81014d57f040
Oct 17 09:27:13 srv1 kernel:  000a14f1570a9d2c 000000000001882b ffff81009c62a2e8 000000058002b59d
Oct 17 09:27:13 srv1 kernel: Call Trace:
Oct 17 09:27:13 srv1 kernel:  [<ffffffff800112b0>] do_wp_page+0x3f4/0x911
Oct 17 09:30:36 srv1 kernel:  [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:30:36 srv1 kernel:  [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:30:36 srv1 kernel:  [<ffffffff800219a8>] generic_file_aio_write+0x50/0xc3
Oct 17 09:30:36 srv1 kernel:  [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91
Oct 17 09:30:36 srv1 kernel:  [<ffffffff80018415>] do_sync_write+0xc7/0x104
Oct 17 09:30:36 srv1 kernel:  [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:30:36 srv1 kernel:  [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
Oct 17 09:30:36 srv1 kernel:  [<ffffffff800339e8>] do_setitimer+0x5aa/0x67b
Oct 17 09:30:36 srv1 kernel:  [<ffffffff80028f74>] do_sigaction+0x76/0x199
Oct 17 09:30:59 srv1 kernel:  [<ffffffff80016b92>] vfs_write+0xce/0x174
Oct 17 09:30:59 srv1 kernel:  [<ffffffff8001745b>] sys_write+0x45/0x6e
Oct 17 09:30:59 srv1 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:30:59 srv1 kernel: 
Oct 17 09:30:59 srv1 kernel: INFO: task httpd:16033 blocked for more than 120 seconds.
Oct 17 09:30:59 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:30:59 srv1 kernel: httpd         D ffffffff801546d1     0 16033  10748         16034 16032 (NOTLB)
Oct 17 09:30:59 srv1 kernel:  ffff8100367cdd28 0000000000000086 ffff81003a4662a8 00000010000201d2
Oct 17 09:58:00 srv1 syslogd 1.4.1: restart.
Oct 17 09:58:00 srv1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 17 09:58:00 srv1 kernel: Linux version 2.6.18-274.3.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Sep 6 20:13:52 EDT 2011
Oct 17 09:58:00 srv1 kernel: Command line: ro root=/dev/sda3
Oct 17 09:58:00 srv1 kernel: BIOS-provided physical RAM map:
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 0000000000100000 - 00000000bf790000 (usable)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
Oct 17 09:58:00 srv1 kernel:  BIOS-e820: 0000000100000000 - 0000000340000000 (usable)
Thanks.
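A pattern worth noting in this log: every blocked task (sendmail and several httpd processes) is in the D (uninterruptible) state, and every call trace that was fully logged passes through `__mutex_lock_slowpath`, which suggests they were all waiting on a kernel mutex rather than on CPU. The "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" line only silences the warning; it does not release the stuck tasks. A minimal Python sketch for pulling that pattern out of a syslog file is below. The function name `summarize_hung_tasks` is hypothetical, and the regular expressions assume the exact message format shown in this CentOS 5 log, so other kernels may need adjusted patterns.

```python
import re
import sys

# Matches e.g. "INFO: task httpd:14150 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (?P<name>\S+?):(?P<pid>\d+) blocked for more than \d+ seconds")
# Matches call-trace frames e.g. "[<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<func>[\w.:]+)\+0x")


def summarize_hung_tasks(lines):
    """Return (tasks, common_frames).

    tasks is a list of (name, pid) for every hung-task report, in log order;
    common_frames is the set of functions present in every call trace that
    was actually logged.
    """
    tasks = []
    traces = []
    current = None
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            tasks.append((hung.group("name"), int(hung.group("pid"))))
            current = set()
            traces.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current.add(frame.group("func"))
    # Skip traces that were cut off by the crash (like the last httpd report here).
    complete = [t for t in traces if t]
    common = set.intersection(*complete) if complete else set()
    return tasks, common


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as fh:
            tasks, common = summarize_hung_tasks(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        for name, pid in tasks:
            print(f"hung: {name} (pid {pid})")
        print("frames shared by all traces:", ", ".join(sorted(common)))
```

Intersecting the traces is a deliberately crude heuristic: it only highlights the lock-acquisition path the tasks share, not which process was holding the lock.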
 
10-18-2011, 05:02 AM   #2
shridhar005
Member
 
Registered: Jul 2008
Posts: 90

Rep: Reputation: 17
Hi there!!
Did you update the server?

1. Can the machine boot into runlevel 1 (single-user mode)?
If yes, check /var/log/secure, /var/log/httpd/error and the sendmail logs for clues.
If not, boot from the rescue CD and mount the filesystems read-only.

2. Run fsck on the partitions. They must not be mounted read-write while fsck runs: leave them unmounted, or mount them read-only.
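Step 2 depends on the partition not being mounted read-write. A minimal Python sketch that checks this from `/proc/mounts`-style text is below. The helper names `mount_state` and `safe_to_fsck` are hypothetical; the sketch only inspects the mount table and never runs fsck itself. `/dev/sda3` is used as the default only because the boot log in the first post shows `root=/dev/sda3`; on some systems the root device may appear in the mount table under a different name.

```python
import sys


def mount_state(mounts_text, device):
    """Return 'rw', 'ro' or 'unmounted' for device, given /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass.
    """
    state = "unmounted"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        options = fields[3].split(",")
        if "rw" in options:
            return "rw"  # any writable mount makes fsck unsafe
        if "ro" in options:
            state = "ro"
    return state


def safe_to_fsck(mounts_text, device):
    """True if the device is unmounted or only mounted read-only."""
    return mount_state(mounts_text, device) != "rw"


def main(device):
    try:
        with open("/proc/mounts") as fh:
            text = fh.read()
    except OSError:
        print("cannot read /proc/mounts; this sketch assumes Linux")
        return
    state = mount_state(text, device)
    print(f"{device}: {state}; safe to fsck: {state != 'rw'}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda3")
```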

By the way, a load average above 1 on a single-processor machine is not a good sign; it means the processor is overused.
If the server has multiple processors, it is OK.
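The rule of thumb above compares the load average with the number of processors. A small Python sketch of that check, using the standard library's `os.getloadavg()` and `os.cpu_count()`, is below; `load_per_cpu` and `describe` are hypothetical names, and the 1.0 threshold is just the rule of thumb from this reply. One caveat: Linux also counts tasks in uninterruptible sleep (state D, like the blocked sendmail and httpd processes in the first post) in the load average, so a high value does not always mean the CPU itself is busy.

```python
import os


def load_per_cpu(load, cpus):
    """Normalize a load average by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def describe(load, cpus):
    """Apply the rule of thumb: more than 1.0 per CPU is a warning sign."""
    return "overloaded" if load_per_cpu(load, cpus) > 1.0 else "ok"


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):  # not available on Windows
        one, five, fifteen = os.getloadavg()
        cpus = os.cpu_count() or 1
        for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
            print(f"{label}: {value:.2f} over {cpus} CPU(s) -> {describe(value, cpus)}")
```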
 
10-19-2011, 09:26 AM   #3
sh4ka
LQ Newbie
 
Registered: Apr 2011
Posts: 13

Original Poster
Rep: Reputation: 0
Hi,

No, I haven't updated the server yet.

I already looked into those logs, and the only thing I found is what I posted here.

Do you think this could be a disk failure?

Thanks a lot.
 
10-20-2011, 12:44 AM   #4
shridhar005
Member
 
Registered: Jul 2008
Posts: 90

Rep: Reputation: 17
Try updating the kernel to the latest version.
If possible, update the whole system.

This should solve the problem.

This is not related to the hard drive at all; I believe it is related to the kernel version.
I faced this problem myself.
Best of luck!
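Before updating, it can help to confirm that the running kernel is actually older than the one the update offers. A rough Python sketch is below; `release_key` and `is_older` are hypothetical helpers, and the comparison simply treats every run of digits as a number. That is a simplification of RPM's real version comparison and is only meant for two releases from the same distribution (both `.el5` here). On CentOS the upgrade itself is done with `yum update kernel`, followed by a reboot into the new kernel.

```python
import platform
import re
import sys


def release_key(release):
    """Turn a kernel release like '2.6.18-274.3.1.el5' into a tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def is_older(running, candidate):
    """True if the running release sorts before the candidate release."""
    return release_key(running) < release_key(candidate)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    if len(sys.argv) > 1:
        candidate = sys.argv[1]
        print(f"running {running}, candidate {candidate}, "
              f"update needed: {is_older(running, candidate)}")
    else:
        print(f"running kernel: {running}")
```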
 
1 member found this post helpful.
10-21-2011, 05:16 AM   #5
sh4ka
LQ Newbie
 
Registered: Apr 2011
Posts: 13

Original Poster
Rep: Reputation: 0
Thx dude, will try that! =)
 
  



All times are GMT -5.
