Hello,
The server was working fine: more than 60 days of uptime, low traffic, and a low load average (around 1). Then it suddenly went down. It runs CentOS 5.x 64-bit with kernel 2.6.18-274.3.1.el5.
The same thing happened to another server running the same kernel, also with low resource usage and traffic. Could this be a kernel bug? Any ideas?
This is what I found in the logs:
Code:
Oct 17 09:20:46 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.
Oct 17 09:20:46 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:20:46 srv1 kernel: sendmail D ffffffff801546d1 0 16864 16845 16849 (NOTLB)
Oct 17 09:20:46 srv1 kernel: ffff8102c01ffc58 0000000000000086 ffff81023b071800 0000000000000000
Oct 17 09:20:46 srv1 kernel: ffff8102b1e89cd0 0000000000000001 ffff81032a9947a0 ffff8100470dd100
Oct 17 09:20:46 srv1 kernel: 000a149f4ea78a80 0000000000018579 ffff81032a994988 0000000100021c10
Oct 17 09:20:46 srv1 kernel: Call Trace:
Oct 17 09:26:11 srv1 kernel: [<ffffffff800ceeb4>] zone_statistics+0x3e/0x6d
Oct 17 09:26:11 srv1 kernel: [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000985a>] __d_lookup+0xb0/0xff
Oct 17 09:26:11 srv1 kernel: [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000d0ac>] do_lookup+0x90/0x1e6
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000a2e3>] __link_path_walk+0xa3a/0xfd1
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000eb88>] link_path_walk+0x45/0xb8
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000ce9c>] do_path_lookup+0x294/0x310
Oct 17 09:26:11 srv1 kernel: [<ffffffff80023959>] __path_lookup_intent_open+0x56/0x97
Oct 17 09:26:11 srv1 kernel: [<ffffffff8001b1d6>] open_namei+0x73/0x718
Oct 17 09:26:11 srv1 kernel: [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:26:11 srv1 kernel: [<ffffffff8002771b>] do_filp_open+0x1c/0x38
Oct 17 09:26:11 srv1 kernel: [<ffffffff8001a089>] do_sys_open+0x44/0xbe
Oct 17 09:26:11 srv1 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:26:11 srv1 kernel:
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=537 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=9
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=538 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=10
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=545 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=11
Oct 17 09:26:11 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.154.32 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=552 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=12
Oct 17 09:26:11 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.
Oct 17 09:26:11 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:26:11 srv1 kernel: sendmail D ffffffff801546d1 0 16864 16845 16849 (NOTLB)
Oct 17 09:26:11 srv1 kernel: ffff8102c01ffc58 0000000000000086 ffff81023b071800 0000000000000000
Oct 17 09:26:11 srv1 kernel: ffff8102b1e89cd0 0000000000000001 ffff81032a9947a0 ffff8100470dd100
Oct 17 09:26:11 srv1 kernel: 000a149f4ea78a80 0000000000018579 ffff81032a994988 0000000100021c10
Oct 17 09:26:11 srv1 kernel: Call Trace:
Oct 17 09:26:11 srv1 kernel: [<ffffffff800ceeb4>] zone_statistics+0x3e/0x6d
Oct 17 09:26:11 srv1 kernel: [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:26:11 srv1 kernel: [<ffffffff8000985a>] __d_lookup+0xb0/0xff
Oct 17 09:27:12 srv1 kernel: [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:27:12 srv1 kernel: [<ffffffff8000d0ac>] do_lookup+0x90/0x1e6
Oct 17 09:27:12 srv1 kernel: [<ffffffff8000a2e3>] __link_path_walk+0xa3a/0xfd1
Oct 17 09:27:12 srv1 kernel: [<ffffffff8000eb88>] link_path_walk+0x45/0xb8
Oct 17 09:27:12 srv1 kernel: [<ffffffff8000ce9c>] do_path_lookup+0x294/0x310
Oct 17 09:27:12 srv1 kernel: [<ffffffff80023959>] __path_lookup_intent_open+0x56/0x97
Oct 17 09:27:12 srv1 kernel: [<ffffffff8001b1d6>] open_namei+0x73/0x718
Oct 17 09:27:12 srv1 kernel: [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:27:12 srv1 kernel: [<ffffffff8002771b>] do_filp_open+0x1c/0x38
Oct 17 09:27:12 srv1 kernel: [<ffffffff8001a089>] do_sys_open+0x44/0xbe
Oct 17 09:27:12 srv1 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:27:12 srv1 kernel:
Oct 17 09:27:12 srv1 kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=46.133.148.137 DST=96.127.155.235 LEN=52 TOS=0x00 PREC=0x00 TTL=117 ID=10773 DF PROTO=TCP SPT=62646 DPT=5900 WINDOW=8192 RES=0x00 SYN URGP=0
Oct 17 09:27:12 srv1 kernel: Firewall: *ICMP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=87.223.149.206 DST=MY.SERVER.IP LEN=60 TOS=0x00 PREC=0x00 TTL=110 ID=1222 PROTO=ICMP TYPE=8 CODE=0 ID=256 SEQ=13
Oct 17 09:27:12 srv1 kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:25:90:08:f8:24:00:0f:23:4f:f5:00:08:00 SRC=46.133.148.137 DST=96.127.155.235 LEN=52 TOS=0x00 PREC=0x00 TTL=117 ID=10909 DF PROTO=TCP SPT=62646 DPT=5900 WINDOW=8192 RES=0x00 SYN URGP=0
Oct 17 09:27:12 srv1 kernel: INFO: task httpd:14150 blocked for more than 120 seconds.
Oct 17 09:27:12 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:27:13 srv1 kernel: httpd D ffffffff801546d1 0 14150 10748 14164 14134 (NOTLB)
Oct 17 09:27:13 srv1 kernel: ffff8101e28a7d28 0000000000000082 ffff81015974a228 ffff81023b8e3a78
Oct 17 09:27:13 srv1 kernel: 0000000000000000 0000000000000001 ffff81015974a040 ffff81023fb387a0
Oct 17 09:27:13 srv1 kernel: 000a14e49ff198b9 0000000000006b16 ffff81015974a228 0000000100000026
Oct 17 09:27:13 srv1 kernel: Call Trace:
Oct 17 09:27:13 srv1 kernel: [<ffffffff800c0558>] delayacct_end+0x5d/0x86
Oct 17 09:27:13 srv1 kernel: [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:27:13 srv1 kernel: [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:27:13 srv1 kernel: [<ffffffff800219a8>] generic_file_aio_write+0x50/0xc3
Oct 17 09:27:13 srv1 kernel: [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91
Oct 17 09:27:13 srv1 kernel: [<ffffffff80018415>] do_sync_write+0xc7/0x104
Oct 17 09:27:13 srv1 kernel: [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:27:13 srv1 kernel: [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
Oct 17 09:27:13 srv1 kernel: [<ffffffff800339e8>] do_setitimer+0x5aa/0x67b
Oct 17 09:27:13 srv1 kernel: [<ffffffff80062ff2>] thread_return+0x62/0xfe
Oct 17 09:27:13 srv1 kernel: [<ffffffff80039e3b>] fcntl_setlk+0x243/0x273
Oct 17 09:27:13 srv1 kernel: [<ffffffff80028f74>] do_sigaction+0x76/0x199
Oct 17 09:27:13 srv1 kernel: [<ffffffff80016b92>] vfs_write+0xce/0x174
Oct 17 09:27:13 srv1 kernel: [<ffffffff8001745b>] sys_write+0x45/0x6e
Oct 17 09:27:13 srv1 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:27:13 srv1 kernel:
Oct 17 09:27:13 srv1 kernel: INFO: task httpd:15859 blocked for more than 120 seconds.
Oct 17 09:27:13 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:27:13 srv1 kernel: httpd D ffffffff801546d1 0 15859 10748 15860 15858 (NOTLB)
Oct 17 09:27:13 srv1 kernel: ffff81009c6d1d28 0000000000000086 ffff810326100cf0 ffff81009c6d1e20
Oct 17 09:27:13 srv1 kernel: 0000000226100cf0 0000000000000001 ffff81009c62a100 ffff81014d57f040
Oct 17 09:27:13 srv1 kernel: 000a14f1570a9d2c 000000000001882b ffff81009c62a2e8 000000058002b59d
Oct 17 09:27:13 srv1 kernel: Call Trace:
Oct 17 09:27:13 srv1 kernel: [<ffffffff800112b0>] do_wp_page+0x3f4/0x911
Oct 17 09:30:36 srv1 kernel: [<ffffffff80063c53>] __mutex_lock_slowpath+0x60/0x9b
Oct 17 09:30:36 srv1 kernel: [<ffffffff80063c9d>] .text.lock.mutex+0xf/0x14
Oct 17 09:30:36 srv1 kernel: [<ffffffff800219a8>] generic_file_aio_write+0x50/0xc3
Oct 17 09:30:36 srv1 kernel: [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91
Oct 17 09:30:36 srv1 kernel: [<ffffffff80018415>] do_sync_write+0xc7/0x104
Oct 17 09:30:36 srv1 kernel: [<ffffffff80067225>] do_page_fault+0x4cc/0x842
Oct 17 09:30:36 srv1 kernel: [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
Oct 17 09:30:36 srv1 kernel: [<ffffffff800339e8>] do_setitimer+0x5aa/0x67b
Oct 17 09:30:36 srv1 kernel: [<ffffffff80028f74>] do_sigaction+0x76/0x199
Oct 17 09:30:59 srv1 kernel: [<ffffffff80016b92>] vfs_write+0xce/0x174
Oct 17 09:30:59 srv1 kernel: [<ffffffff8001745b>] sys_write+0x45/0x6e
Oct 17 09:30:59 srv1 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 17 09:30:59 srv1 kernel:
Oct 17 09:30:59 srv1 kernel: INFO: task httpd:16033 blocked for more than 120 seconds.
Oct 17 09:30:59 srv1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 09:30:59 srv1 kernel: httpd D ffffffff801546d1 0 16033 10748 16034 16032 (NOTLB)
Oct 17 09:30:59 srv1 kernel: ffff8100367cdd28 0000000000000086 ffff81003a4662a8 00000010000201d2
Oct 17 09:58:00 srv1 syslogd 1.4.1: restart.
Oct 17 09:58:00 srv1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 17 09:58:00 srv1 kernel: Linux version 2.6.18-274.3.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Tue Sep 6 20:13:52 EDT 2011
Oct 17 09:58:00 srv1 kernel: Command line: ro root=/dev/sda3
Oct 17 09:58:00 srv1 kernel: BIOS-provided physical RAM map:
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 0000000000100000 - 00000000bf790000 (usable)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
Oct 17 09:58:00 srv1 kernel: BIOS-e820: 0000000100000000 - 0000000340000000 (usable)
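In case it helps anyone reading the log above, here is a minimal Python sketch that pulls out the "blocked for more than N seconds" reports and counts them per task. It only assumes the message format visible in this log; the function name `summarize_hung_tasks` and the regular expression are made up for illustration and are not part of any standard tool.

```python
import re
from collections import Counter

# Matches the hung-task report line as it appears in the log above, e.g.
# "Oct 17 09:20:46 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds."
# The pattern is an assumption derived from this log only.
HUNG_TASK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+INFO: task "
    r"(?P<name>[^:\s]+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Return (first_seen, counts) keyed by (task name, pid).

    first_seen maps each task to the timestamp of its first report;
    counts holds how many times each task was reported.
    """
    first_seen = {}
    counts = Counter()
    for line in lines:
        match = HUNG_TASK_RE.match(line)
        if match is None:
            continue  # stack traces, firewall lines, boot messages, ...
        key = (match.group("name"), int(match.group("pid")))
        first_seen.setdefault(key, match.group("ts"))
        counts[key] += 1
    return first_seen, counts


if __name__ == "__main__":
    # A few lines copied from the log in this post.
    sample = [
        "Oct 17 09:20:46 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.",
        "Oct 17 09:26:11 srv1 kernel: INFO: task sendmail:16864 blocked for more than 120 seconds.",
        "Oct 17 09:27:12 srv1 kernel: INFO: task httpd:14150 blocked for more than 120 seconds.",
        "Oct 17 09:27:13 srv1 kernel: Call Trace:",
    ]
    first_seen, counts = summarize_hung_tasks(sample)
    for (name, pid), count in counts.items():
        print(f"{name}:{pid} first reported at {first_seen[(name, pid)]}, {count} report(s)")
```

To run it against a whole log file, pass an open file object instead of the sample list, since the function accepts any iterable of lines.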
Thanks.