Hello, I've had this problem before, and it took down a server tonight. We run Dell PowerEdge servers with PERC controllers and SAS drives. When one of the drives fails, the system comes up corrupted after the next reboot, even after the drive has been rebuilt. It fails with the dreaded
Code:
Kernel panic - not syncing: Attempted to kill init!
I usually have to boot the server from a rescue CD (Scientific Linux 4.7) and copy the corrupted files over from a working system.
Today a server did this again, and when I mounted the bad partition I saw that the /lib directory had been turned into a pipe!
Code:
[root@slinux dm-0]# ls -l
total 196
drwxr-xr-x 2 root root 4096 Dec 6 16:44 bin
drwxr-xr-x 2 root root 4096 Feb 24 2010 boot
drwxr-xr-x 4 root root 4096 Feb 24 2010 dev
drwxr-xr-x 102 root root 12288 Jan 7 16:42 etc
-rw-r--r-- 1 root root 296 Feb 24 2010 event.log
-rw-r--r-- 1 root root 0 Jan 7 23:15 forcefsck
-rw-r--r-- 1 root root 0 Jan 7 16:41 halt
drwxr-xr-x 2 root root 4096 Feb 24 2010 home
p-----S--- 13 170 root 3624927232 Feb 24 2010 lib
drwx------ 2 root root 16384 Feb 24 2010 lost+found
drwxr-xr-x 2 root root 4096 Mar 9 2009 media
drwxr-xr-x 2 root root 4096 Jan 21 2009 misc
drwxr-xr-x 3 root root 4096 Feb 24 2010 mnt
dr-xr-xr-x 2 root root 4096 Feb 24 2010 net
drwxr-xr-x 2 root root 4096 Feb 24 2010 opt
drwxr-xr-x 2 root root 4096 Feb 24 2010 proc
drwxr-x--- 8 root root 4096 Jan 7 16:10 root
drwxr-xr-x 2 root root 12288 Dec 6 16:44 sbin
drwxr-xr-x 2 root root 4096 Feb 24 2010 selinux
drwxr-xr-x 2 root root 4096 Mar 9 2009 srv
drwxr-xr-x 2 root root 4096 Feb 24 2010 sys
drwxr-xr-x 2 root root 4096 Feb 24 2010 tmp
drwxr-xr-x 2 root root 4096 Feb 24 2010 usr
drwxr-xr-x 2 root root 4096 Feb 24 2010 var
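For anyone checking a mounted root from a rescue CD, here is a rough sketch of how the damage above could be spotted without reading the listing by eye. The function name check_root_dirs and the list of directory names are my own assumptions, not anything standard; it only uses the shell's built-in file tests.

```shell
#!/bin/sh
# Hypothetical helper: report top-level entries under a mounted root
# that should be directories but are not (for example, a named pipe).
# The directory list below is an assumption; adjust it to your layout.
check_root_dirs() {
    root="$1"
    bad=0
    for name in bin boot dev etc home lib sbin usr var; do
        path="$root/$name"
        if [ -d "$path" ]; then
            continue                      # looks like a directory, nothing to report
        elif [ -p "$path" ]; then
            echo "$name: named pipe, expected directory"
        elif [ -e "$path" ]; then
            echo "$name: not a directory"
        else
            echo "$name: missing"
        fi
        bad=1
    done
    return "$bad"                         # 0 = all fine, 1 = at least one problem
}

# Example call from inside the mounted partition:
#   check_root_dirs .
```

This only tells you which entries have the wrong type. It does not repair anything, so fsck or a copy from a good server is still needed afterwards.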
Right now, I'm copying /lib from a working server. Rebooting now...
What causes this? Is there an easier way to recover from this kind of error?
Thanks!
Edit: I successfully recovered the server. After copying /lib from the other server, I just had to drop to a shell and run fsck three times. If anyone knows another way to recover a directory that has turned into a pipe, please let me know. It looks like all the files were moved into the "pipe" and maybe could have been repaired.
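Since I ended up running fsck by hand several times, here is a minimal sketch of repeating the check until a pass comes back clean. It assumes an ext2/ext3 file system checked with e2fsck, whose exit status is 0 when no errors were found and 8 or higher for operational problems. The function name fsck_until_clean and the FSCK_CMD override are made up for illustration, and the device in the usage comment is only an example. Note that -y answers yes to every repair prompt, which is not always what you want.

```shell
#!/bin/sh
# Hypothetical helper: re-run a file system check until it reports a
# clean pass, or give up after a maximum number of passes.
# FSCK_CMD is an assumed override so a different checker can be swapped in.
fsck_until_clean() {
    dev="$1"
    max="${2:-5}"                               # default: at most 5 passes
    checker="${FSCK_CMD:-e2fsck -f -y}"         # -f force check, -y answer yes
    i=1
    while [ "$i" -le "$max" ]; do
        rc=0
        $checker "$dev" || rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "clean after $i pass(es)"
            return 0
        fi
        if [ "$rc" -ge 8 ]; then                # operational error: retrying will not help
            echo "checker failed with status $rc" >&2
            return "$rc"
        fi
        i=$((i + 1))
    done
    echo "still reporting errors after $max passes" >&2
    return 1
}

# Example call (the device name is an assumption, and it must be unmounted):
#   fsck_until_clean /dev/VolGroup00/LogVol00
```

If it still is not clean after a few passes, I would stop and look at the hardware rather than keep looping.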
Should I expect better stability if I put my system files on ext3 file systems directly on disk partitions, rather than on LVM?