LinuxQuestions.org - Kernel error

- - Kernel error (https://www.linuxquestions.org/questions/linux-virtualization-and-cloud-90/kernel-error-4175453538/)

Hello All,

I am running a virtual machine on Xen. For some reason the VM crashed, and I cannot find enough information in the logs.

This was the kernel error message:

INFO: task mv:16138 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mv D ffff88002804cf00 0 16138 25947 0x00000080
ffff880357af5d18 0000000000000286 0000000000000000 ffffffff81126737
ffff8800a476a300 ffffffff81aae4c0 ffff8800a476a6d0 0000000103347a7e
0000000057af5d28 0000000000000000 0000000000000000 ffff8803d3da26a4
Call Trace:
[<ffffffff81126737>] ? __link_path_walk+0x10d/0x5fb
[<ffffffff81456ad0>] __mutex_lock_common+0x12f/0x1a1
[<ffffffff81123ed7>] ? path_put+0x22/0x27
[<ffffffff81456b91>] __mutex_lock_slowpath+0x19/0x1b
[<ffffffff81456bfa>] mutex_lock+0x23/0x3a
[<ffffffff8112456d>] lock_rename+0x49/0xad
[<ffffffff81126f07>] sys_renameat+0xbe/0x201
[<ffffffff810f216c>] ? handle_mm_fault+0x14b/0x80f
[<ffffffff81459ec0>] ? do_page_fault+0x28a/0x299
[<ffffffff81127065>] sys_rename+0x1b/0x1d
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
INFO: task mv:16139 blocked for more than 120 seconds.
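
A report like the one above is easier to read once it is reduced to the blocked task and the frames the kernel is sure about. The following is only a minimal sketch: `REPORT`, `HEADER`, `FRAME` and `parse_hung_tasks` are names made up for this illustration, and the sample text is the trace above with the register/stack dump lines left out. Frames prefixed with `?` are ones the kernel marks as unreliable, so the sketch skips them.

```python
import re

# Trace copied from the post above (register/stack dump lines omitted).
REPORT = """\
INFO: task mv:16138 blocked for more than 120 seconds.
Call Trace:
[<ffffffff81126737>] ? __link_path_walk+0x10d/0x5fb
[<ffffffff81456ad0>] __mutex_lock_common+0x12f/0x1a1
[<ffffffff81123ed7>] ? path_put+0x22/0x27
[<ffffffff81456b91>] __mutex_lock_slowpath+0x19/0x1b
[<ffffffff81456bfa>] mutex_lock+0x23/0x3a
[<ffffffff8112456d>] lock_rename+0x49/0xad
[<ffffffff81126f07>] sys_renameat+0xbe/0x201
[<ffffffff810f216c>] ? handle_mm_fault+0x14b/0x80f
[<ffffffff81459ec0>] ? do_page_fault+0x28a/0x299
[<ffffffff81127065>] sys_rename+0x1b/0x1d
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
"""

# Hypothetical helper patterns, written for this example only.
HEADER = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_hung_tasks(text):
    """Return one dict per hung-task report found in text."""
    reports = []
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            reports.append({
                "task": header.group(1),
                "pid": int(header.group(2)),
                "seconds": int(header.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME.search(line)
        # Keep only frames without the "?" marker (innermost frame first).
        if frame and reports and frame.group(1) is None:
            reports[-1]["frames"].append(frame.group(2))
    return reports


for report in parse_hung_tasks(REPORT):
    print(report["task"], report["pid"], "blocked >", report["seconds"], "s")
    print("  " + " <- ".join(report["frames"]))
```

Read this way, the trace says that `mv` entered `sys_rename`, reached `lock_rename`, and has been sleeping in `mutex_lock` ever since. The trace alone does not show which task holds that mutex or why it is not being released.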

Could anyone please help me debug it further?

Thank you
Vivek

Quote:

Originally Posted by mail2vivek1 (Post 4908738)

Hello All,
I am running a virtual machine on Xen. For some reason the VM crashed, and I cannot find enough information in the logs.

This was the kernel error message:
Could anyone please help me debug it further?

No, not with what you've posted, no one can. All you've told us is that it "got crashed", and that it's a virtual machine on Xen. Nothing about the hardware, version/distro of Linux, what runs on the machine, what was going on before it crashed, what logs you checked, or ANYTHING useful. Just looking up the error on Google brings up lots of possibilities, but since you've provided no details, we can't say which may apply to you.

Hi,

This VM is running on an IBM blade server.
OS distro: Linux Server release 5.8
Kernel: 2.6.32-300.39.2.el5uek

I checked /var/log/messages and the sar logs on the VM,
and /var/log/xen on the hypervisor.

=====================

root@abc:/root# sar -b -f /var/log/sa/sa05 -s 17:01:00
Linux 2.6.32-300.39.2.el5uek (abc.abc.com) 03/05/2013

05:10:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
05:20:01 PM     22.43      8.03     14.40    176.99     52.11
05:30:02 PM     41.99     19.58     22.41    409.49     94.87
05:40:01 PM     13.78      2.92     10.86    113.91     56.26
05:50:01 PM      9.08      1.86      7.22     76.51     30.28
06:00:01 PM     10.56      1.25      9.31     50.81     47.62
06:10:02 PM     27.00      3.08     23.92    111.46     90.71
06:20:02 PM     36.40     14.88     21.52    403.55     80.54
06:30:01 PM     40.45     28.29     12.15    664.30     56.81
06:40:02 PM     17.88      3.79     14.09    169.26     63.74
06:50:01 PM      8.06      3.17      4.89    111.35     26.71
Average:        22.94      8.74     14.19    229.72     60.30

07:20:17 PM LINUX RESTART

07:31:10 PM LINUX RESTART

07:40:01 PM       tps      rtps      wtps   bread/s   bwrtn/s
07:50:01 PM      3.36      0.50      2.86     24.18     12.94
08:00:01 PM      5.20      2.16      3.04     80.10     14.61
08:10:01 PM      6.38      2.06      4.32    109.85     20.26
08:20:01 PM     49.15      9.61     39.53    219.40    124.70
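
The sar report above can be used to bound when the guest stopped recording. The sketch below is an illustration only: `SAR_OUTPUT` is a shortened copy of the report, and `find_restarts` is a name invented for this example. It pairs each `LINUX RESTART` line with the last data row before it.

```python
from datetime import datetime

# Shortened copy of the "sar -b" report above.
SAR_OUTPUT = """\
06:40:02 PM 17.88 3.79 14.09 169.26 63.74
06:50:01 PM 8.06 3.17 4.89 111.35 26.71
Average: 22.94 8.74 14.19 229.72 60.30

07:20:17 PM LINUX RESTART

07:31:10 PM LINUX RESTART

07:40:01 PM tps rtps wtps bread/s bwrtn/s
07:50:01 PM 3.36 0.50 2.86 24.18 12.94
"""


def find_restarts(text):
    """Return (last_sample, restart, gap) for each LINUX RESTART line.

    last_sample and gap are None when no data row precedes the restart.
    Times carry no date, so a report that crosses midnight would need
    extra handling.
    """
    results = []
    last_sample = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] not in ("AM", "PM"):
            continue  # blank lines, "Average:" rows, the banner line
        stamp = datetime.strptime(fields[0] + " " + fields[1], "%I:%M:%S %p")
        if fields[2:] == ["LINUX", "RESTART"]:
            gap = stamp - last_sample if last_sample is not None else None
            results.append((last_sample, stamp, gap))
            last_sample = None
        elif fields[2].replace(".", "", 1).isdigit():
            last_sample = stamp  # a data row; column-header rows fail this check
    return results


for last_sample, restart, gap in find_restarts(SAR_OUTPUT):
    before = last_sample.strftime("%H:%M:%S") if last_sample is not None else "none"
    print("restart", restart.strftime("%H:%M:%S"), "last sample", before, "gap", gap)
```

With samples taken every ten minutes, the rows expected around 07:00 PM and 07:10 PM are missing, which suggests the guest stopped recording some time after 06:50:01 PM. The second restart at 07:31:10 PM has no data rows before it at all.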

==================

root@abc:/root# sar -B -f /var/log/sa/sa05 -s 17:01:00
Linux 2.6.32-300.39.2.el5uek (ab.cabc.com) 03/05/2013

05:10:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
05:20:01 PM     44.25     13.02   3931.86      0.05
05:30:02 PM    102.40     23.72  33503.86      0.64
05:40:01 PM     28.47     14.06  11102.83      0.72
05:50:01 PM     19.13      7.57   6025.53      0.71
06:00:01 PM     12.68     11.90  17439.82      0.37
06:10:02 PM     27.89     22.68  21842.83      0.81
06:20:02 PM    100.98     20.14   9606.81      1.30
06:30:01 PM    166.01     14.21   4571.78      1.35
06:40:02 PM     42.38     15.93   5695.02      1.26
06:50:01 PM     27.91      6.67   1255.84      0.52
Average:        57.45     15.07  11504.82      0.77

07:20:17 PM LINUX RESTART

07:31:10 PM LINUX RESTART

07:40:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
07:50:01 PM      6.03      3.23    413.07      0.05
08:00:01 PM     20.02      3.66    491.01      0.13
08:10:01 PM     27.46      5.06   1157.37      0.24
08:20:01 PM     54.93     31.18  44016.30      0.33
08:30:01 PM     17.55     12.78  13897.98      0.80
08:40:02 PM    149.59     20.72   6596.87      0.73
08:50:01 PM     19.33     13.25   5152.19      0.79
09:00:02 PM     26.88     12.73   7836.33      0.92

=================

/var/log/messages

Mar 5 18:13:54 abc tdodbc[10129]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:15:48 abc tdodbc[18624]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:16:07 abc tdodbc[18624]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:22:13 abc tdodbc[29106]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:27:29 abc tdodbc[2933]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:27:45 abc ntpd[2053]: synchronized to 17.254.0.27, stratum 2
Mar 5 18:28:22 abc ntpd[2053]: synchronized to 17.254.0.28, stratum 2
Mar 5 18:30:42 abc tdodbc[10719]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:31:37 abc ntpd[2053]: synchronized to 17.254.0.27, stratum 2
Mar 5 18:31:43 abc ntpd[2053]: synchronized to 17.254.0.28, stratum 2
Mar 5 18:31:45 abc tdodbc[12859]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:32:50 abc tdodbc[14780]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 18:33:06 abc auditd[1738]: Audit daemon rotating log files
Mar 5 18:36:40 abc tdodbc[19395]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 19:20:20 abc syslogd 1.4.1: restart.
Mar 5 19:20:20 abc kernel: klogd 1.4.1, log source = /proc/kmsg started.
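
The same question can be asked of the syslog excerpt above: how long was the log silent before `syslogd` came back? The sketch below is an illustration under stated assumptions. `MESSAGES` holds only the last four lines of the excerpt, `largest_gap` is a name invented here, and the year 2013 is taken from the date in the sar banner because these syslog timestamps carry no year.

```python
from datetime import datetime

# Last four lines of the /var/log/messages excerpt above.
MESSAGES = """\
Mar 5 18:33:06 abc auditd[1738]: Audit daemon rotating log files
Mar 5 18:36:40 abc tdodbc[19395]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed
Mar 5 19:20:20 abc syslogd 1.4.1: restart.
Mar 5 19:20:20 abc kernel: klogd 1.4.1, log source = /proc/kmsg started.
"""


def largest_gap(text, year):
    """Return (before, after, gap) for the longest silence between lines.

    Returns None when fewer than two timestamps can be parsed.
    """
    stamps = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a syslog-style line
        stamps.append(stamp)
    candidates = ((a, b, b - a) for a, b in zip(stamps, stamps[1:]))
    return max(candidates, key=lambda item: item[2], default=None)


before, after, gap = largest_gap(MESSAGES, 2013)
print("silent from", before.strftime("%H:%M:%S"), "to", after.strftime("%H:%M:%S"), "=", gap)
```

In this excerpt the last line before the restart is at 18:36:40 and `syslogd` restarts at 19:20:20. The earlier entries are sporadic, so the silence by itself does not pin down when the hang began. Note also that the hung-task message from the first post does not appear in this excerpt.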

Please guide me on how to proceed from here.

Quote:

Originally Posted by mail2vivek1 (Post 4908828)

Hi,
This VM is running on an IBM blade server.
OS distro: Linux Server release 5.8
Kernel: 2.6.32-300.39.2.el5uek

I checked /var/log/messages and the sar logs on the VM, and /var/log/xen on the hypervisor.
07:20:17 PM LINUX RESTART
07:31:10 PM LINUX RESTART

/var/log/messages
Mar 5 18:13:54 abc tdodbc[10129]: codbctrace.cpp EnableTraceingForDSN 7 : Could not use /bcd/fgh/gsd/Informatica/DSN.Trace.log for tracing, size exceeds maximum allowed size of 1000000 and overwrite not allowed

Please guide me on how to proceed from here.

And you still haven't said what runs on this machine, what it was doing before the crash, etc. And did you look at what you posted?? It's telling you that the server was RESTARTED...that means it was rebooted, whether by a person or by a reset. And based on the trace log message, you have a database of some sort running...have you checked the parameters of that database? Have you tried to contact Red Hat support? They can often assist you with crashes like this by analyzing dump/trace files that they have you generate. Since you're paying for RHEL, you're paying for support...that would be my first call.