Logging a Kernel Panic Event - Problem writing the log in panic situation

lucasct · 08-09-2011, 11:47 AM

Hi, I have some units installed remotely and I want to be informed if they auto-reboot because of a kernel panic (it has happened before).

I have written a kernel module which writes KERNEL PANIC to a file on the filesystem. It worked with a precompiled kernel and filesystem during a forced kernel panic, but it did not work today when I tried to apply it.

Here is my driver code. The panic event is received correctly (debugged with printk), and the file-write function works (tested by calling it from the load function).

Code:

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/module.h>
#include <linux/syscalls.h>

#include <linux/fs.h>
#include <linux/fcntl.h>
#include <asm/uaccess.h>

#include <linux/file.h>

char buf[100];
int i;
int count;
struct file *phMscd_Filp = NULL;
mm_segment_t old_fs;

// ****************************************************************************
// Here is where i receive the panic notification (from panic.c)
// ****************************************************************************
static int panic_happened(struct notifier_block *n, unsigned long val, void *message)
{
for(i=0;i<100;i++) buf[i] = 0;
      buf[0] ='K';
      buf[1] ='E';
      buf[2] ='R';
      buf[3] ='N';
      buf[4] ='E';
      buf[5] ='L';
      buf[6] =' ';
      buf[7] ='P';
      buf[8] ='A';
      buf[9] ='N';
      buf[10]='I';
      buf[11]='C';
      buf[12]=0x0A;

  // request file
  phMscd_Filp = filp_open("/etc/gs/logs/panic.log", O_RDWR | O_LARGEFILE | O_CREAT , 0);
  // filp_open reports failure with an ERR_PTR value, never NULL
  if (IS_ERR(phMscd_Filp)) {
    printk(KERN_EMERG "filp_open error!!.\n");
    return NOTIFY_DONE;
  }

  // save actual space
  old_fs=get_fs();  

  // jump to other space
  set_fs(get_ds()); 

  // write the file
  vfs_write(phMscd_Filp, buf, 13, &phMscd_Filp->f_pos);

  // close the file
  filp_close(phMscd_Filp,NULL);  

  // back to old space
  set_fs(old_fs);   
	return 0;
}


// ****************************************************************************
// block
// ****************************************************************************
static struct notifier_block panic_notifier = { .notifier_call = panic_happened, .priority = 1 };

// ****************************************************************************
// load
// ****************************************************************************
static int __init register_my_panic(void)
{
	atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
  return 0;
}

// ****************************************************************************
// unload
// ****************************************************************************
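static void __exit unregister_my_panic(void)
{
	atomic_notifier_chain_unregister(&panic_notifier_list, &panic_notifier);
}

// hook the load/unload functions into the module loader
module_init(register_my_panic);
module_exit(unregister_my_panic);
MODULE_LICENSE("GPL");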

Here is the panic.c code from my current kernel. The line "atomic_notifier_call_chain(&panic_notifier_list, 0, buf);" is what calls my module.

Code:

NORET_TYPE void panic(const char * fmt, ...)
{
	long i;
	static char buf[1024];
	va_list args;
#if defined(CONFIG_S390)
	unsigned long caller = (unsigned long) __builtin_return_address(0);
#endif

	/*
	 * It's possible to come here directly from a panic-assertion and not
	 * have preempt disabled. Some functions called from here want
	 * preempt to be disabled. No point enabling it later though...
	 */
	preempt_disable();

	bust_spinlocks(1);
	va_start(args, fmt);
	vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
	bust_spinlocks(0);

	/*
	 * If we have crashed and we have a crash kernel loaded let it handle
	 * everything else.
	 * Do we want to call this before we try to display a message?
	 */
	crash_kexec(NULL);

#ifdef CONFIG_SMP
	/*
	 * Note smp_send_stop is the usual smp shutdown function, which
	 * unfortunately means it may not be hardened to work in a panic
	 * situation.
	 */
	smp_send_stop();
#endif

	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

	if (!panic_blink)
		panic_blink = no_blink;

	if (panic_timeout > 0) {
		/*
	 	 * Delay timeout seconds before rebooting the machine. 
		 * We can't use the "normal" timers since we just panicked..
	 	 */
		printk(KERN_EMERG "Rebooting in %d seconds..",panic_timeout);
		for (i = 0; i < panic_timeout*1000; ) {
			touch_nmi_watchdog();
			i += panic_blink(i);
			mdelay(1);
			i++;
		}
		/*	This will not be a clean reboot, with everything
		 *	shutting down.  But if there is a chance of
		 *	rebooting the system it will be rebooted.
		 */
		emergency_restart();
	}
...

Is there something here that could be stopping my file-write function?

mulyadi.santosa · 09-08-2011, 04:14 AM

Hi Lucas

I am not exactly sure, but I think it is because writes are mostly done asynchronously, in both the block device layer and the filesystem layer. By the time the write would actually be scheduled, the kernel is already "dead". So my feeling is that your module did "write", but the data never reached the actual disk sector.
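To illustrate the point about asynchronous writes, here is a user-space analogy (not kernel code, and not the module from this thread). In user space, write() normally only copies data into the page cache, and fsync() is what asks the kernel to push it to the storage device. The helper name log_line_durably is hypothetical. Inside a panic notifier an equivalent flush may not be safe or even possible, since the rest of the kernel may no longer be working.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one line to a file and ask for it to be
 * flushed to the device. Returns 0 on success, -1 on any failure. */
int log_line_durably(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(line);
    ssize_t written = write(fd, line, len);   /* data lands in the page cache */
    if (written < 0 || (size_t)written != len) {
        close(fd);
        return -1;
    }

    if (fsync(fd) != 0) {                     /* request a flush to the device */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() step the function would still return success, but a crash shortly afterwards could lose the line, which matches the symptom described above.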

lucasct · 09-08-2011, 06:56 AM

I have managed to write the file. The solution was to retry 3 times with a 500 ms delay between each try.
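The retry code itself was not posted, so the following is only one possible reading of "retry 3 times with a delay", written as plain C so it can be tried outside the kernel. All names here (retry_with_delay, try_fn, delay_fn, fail_n_times) are hypothetical. In a panic notifier the delay would presumably have to be a busy-wait such as mdelay(), which panic() itself uses in the code quoted above.

```c
#include <stddef.h>

/* Hypothetical callback types: try_fn returns 0 on success,
 * delay_fn waits for the given number of milliseconds. */
typedef int (*try_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/* Call attempt() up to max_tries times, waiting delay_ms between failed
 * tries. Returns the 1-based number of the try that succeeded, or 0 if
 * every try failed. */
int retry_with_delay(try_fn attempt, void *ctx,
                     int max_tries, unsigned int delay_ms, delay_fn delay)
{
    for (int i = 1; i <= max_tries; i++) {
        if (attempt(ctx) == 0)
            return i;
        if (i < max_tries && delay != NULL)
            delay(delay_ms);
    }
    return 0;
}

/* Demo stub standing in for the file write: fails until the counter
 * pointed to by ctx reaches zero, then succeeds. */
int fail_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

For example, with a counter of 2 and max_tries of 3, the stub fails twice and retry_with_delay returns 3. Passing NULL as the delay function skips the wait, which is handy for testing.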

Now my problem is that the kernel panic dump does not appear anywhere. In older tests the dump and the backtrace were in /var/log/messages.

mulyadi.santosa · 09-08-2011, 11:34 AM

Quote:

Originally Posted by lucasct

I have managed to write the file. The solution was to retry 3 times with a 500 ms delay between each try.

Hm, alright... so in other words, you are "forcing your luck". But does it always work?

Quote:

Originally Posted by lucasct

Now my problem is that the kernel panic dump does not appear anywhere. In older tests the dump and the backtrace were in /var/log/messages.

Hmm, not sure. It could be that, because of your write, the later stack dump was delayed and then cut off because the CPU had already stopped. Just my guess. How about dumping the stack manually by calling dump_stack()?

lucasct · 09-08-2011, 12:08 PM

Yes, it always works. I have forced 3 different kernel panics and it always writes my word to the file.

Do syslogd or klogd save the kernel panic messages in /var/log/messages or dmesg? Is the kernel panic stopping them?
Could a network syslogd, with a log server installed, be a solution?

mulyadi.santosa · 09-08-2011, 01:44 PM

hi

AFAIK, the kernel just sends the message to the kernel ring buffer. Later, syslogd (or an equivalent like syslog-ng) picks the messages up and saves them to the configured file, most likely /var/log/messages.

So if something doesn't show up in the file, it could be that the log level doesn't match (so it probably shows only on the console), or the situation is so bad that the syslog daemon isn't able to catch up.
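As a small aid for checking the log-level guess: kernel messages read from the ring buffer (for example via /proc/kmsg) carry a priority prefix such as "<0>", which is what KERN_EMERG expands to on kernels of this era. The sketch below extracts that level from a line. The function name kmsg_level is hypothetical, and it assumes the single-digit "<N>" prefix form only.

```c
#include <stddef.h>

/* Return the priority from a leading "<N>" prefix
 * (0 = KERN_EMERG ... 7 = KERN_DEBUG), or -1 if the line
 * does not start with such a prefix. */
int kmsg_level(const char *line)
{
    if (line == NULL || line[0] != '<')
        return -1;
    if (line[1] < '0' || line[1] > '7' || line[2] != '>')
        return -1;
    return line[1] - '0';
}
```

A panic line such as "<0>Kernel panic - not syncing: ..." yields level 0, the most urgent one, so if it is missing from the file the cause is more likely the daemon not getting a chance to run than the level being filtered out.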

Just my 2 cents analysis