A sob story about data loss--any help appreciated

Note: If you want to skip the sobbing part and get to my question, go down to the section labeled HELP PLEASE

The chain of absolutely miserable luck needed to put me in the current position is still too mind numbing to totally grasp. For over a year I've been running Mandrake 9.0 on a dedicated box which was serving e-mail, website, DNS, NTP, Samba, you name it. My whole LAN depends on this box.

Today I got my Sun -> VGA adaptor so I could plug a normal monitor into my Sun box and use CDE for once. I have an old monitor which I move around from my closet shelf, to my firewall, to my mail gateway as needed for console access. I tried that with the Sun box, but I couldn't get a visible image, just flickering scan lines. I figured that monitor was too old, so I unplugged my brand new monitor from the KVM switch that my Linux server and my wife's workstation share. I tried plugging the new monitor into the Sun box, but that also did not work. I figured it was no big loss, since I did all the Sun admin stuff from the command line anyway, so I plugged the monitor back into the KVM and was about to go about business as usual, but that's when my real problems started.

When I plugged the monitor back into the KVM, I noticed I could no longer switch to the Linux box. The Windows workstation was displaying just fine, but when I tried to switch the monitor to Linux it just kept the Windows desktop and rendered the mouse and keyboard unusable. I went behind the Linux box to unplug the KVM to test the monitor from my workstation, and that showed the Linux desktop immediately, so I knew it wasn't the Linux box. As I was plugging the KVM back into the Linux box, I accidentally bumped the power cord for the PS. Now this cord has always been a bit sensitive, and the slight bump was enough to dislodge it from the PS, causing a power failure on the Linux box. I panicked and tried to plug it back in immediately, but in the process of trying to securely reinsert the power cord I caused the power to go on and off several more times (for some reason this box always boots as soon as you plug the power in--it's very odd).

Finally I got the cord plugged back in and jarred the KVM cables enough that it made a connection and it was displaying the Linux machine on the new monitor again. It booted up all right, but I noticed right away the system was extremely unresponsive. I tried running fsck, but it started prompting me about options that could cause severe data loss, so I ^C'd out of it. Then the machine froze. It wasn't responding to input of any type, not even CTRL ALT DEL, so I power cycled it. Again the box booted, but it immediately started spitting out messages about not being able to write to a read-only file system. The messages were being echoed a mile a minute on every terminal, so I couldn't do anything through the chatter. I tried to do sudo shutdown -h now, but it said /var/run/sudo already existed and it couldn't overwrite.

HELP PLEASE: Since I couldn't boot the box normally at all, I booted from disk 1 of my Mandrake 9.0 install CDs and went into rescue mode. I was thinking that since all my partitions were ext3, the journaled file system could somehow repair itself. Maybe I was very mistaken in that belief, but I was extremely disappointed when I tried to run fsck.ext3 -p and it said I needed to run it manually. So I ran fsck manually and said yes to relocating all the blocks and no to the one that said it could cause data loss (relocating inodes? I really can't remember). After that was done, it said there were still errors and it started all over again. I got the exact same questions for the exact same groups as the time before, so this time I just said yes to everything. fsck needed to run manually about 3 more times before I could finally try -p, and it got a little way, but then quit and said I needed to run it manually again. At this point it had been hours and hours of manual fsck, so I just did fsck.ext3 -y and it finished, telling me it had changed the data on the disk.

I tried to reboot normally, but it couldn't find init, so I went back to the install CD again and into rescue mode. I was able to mount hda1 (boot) and hda5 (/) on /mnt so I could examine what was left of my file system. I think I have about 15% of my files left. My entire website, all my e-mail, all the important stuff from /etc, /var, /usr... it's all gone... well, not gone exactly. lost+found is crammed full of thousands of unnamed files and directories, but I have no idea how to begin retrieving any of my data. I couldn't seem to find "find" anywhere on the install CD, so I can't go that route. I'm thinking at this point I could create a small partition on the 20GB of the drive I left unpartitioned when I set it up (wow, that is looking smart in hindsight) and try to run some find scripts over several days to try to recover as many config files as I can. Given that it took me a solid year to get the box really set up how I wanted it, it's heart-shattering to think of the weeks, if not months, it will take to rebuild the box, not to mention all the critical data that's just floating around out there without a home.

My question is, what's the most efficient way to find specific data in lost+found? All I need is my complicated config files, for Samba, named, etc. passwd and shadow would certainly be nice, as well as my saved e-mail, some very important text files, whatever I can salvage of my website and small MP3 collection... really, I'm grasping at straws here, anything will help.

Also, for future reference, how SHOULD I proceed when I know the filesystem has not been unmounted cleanly? I thought (gross ignorance I guess) that if I ran fsck, the file system journal could put itself back in order. That appears to have been wildly off the mark. Will running ext3 give me any better shot at saving data, or am I pretty much screwed if the power goes on and off rapidly like that again? Also, I've had many an unclean shutdown with Windows before, but I never seemed to lose anything important. Why is Linux so touchy about unclean unmounting when Windows doesn't seem to have that much of a problem? Note I'm saying "seem to" since that's my impression, but I really want to know whether that's the case or not.

Thanks for ANY help that can be provided.

Oh yeah, I forgot to mention the really ironic part. I was trying to hook up the Sun box to a monitor and keyboard so I could have direct access while trying to hook up my RAID array to the Sun box, which I was planning to use for backing up all my systems this weekend. I was literally hours away from having all my data backed up when I snatched defeat from the jaws of victory. Let that be a lesson to everyone that you should ALWAYS back up your data.

Hmm, you're 0wn3d.

There are ways of getting useful stuff out of lost+found, but this is very strange behaviour indeed - I have had several nasty power failures and no problem - "press y within 5 seconds to force an integrity check" on reboot, but usually no need.

Rapid on-offs, as far as Linux is concerned, are the same as one on-off, because the computer wouldn't even have finished its POST before power was lost again.

btw: if everything freezes, but the kernel is still alive, alt+printscreen+s to sync all drives, alt+printscreen+u to remount all drives read-only, alt+printscreen+b to reboot.

or was it ctrl+printscreen? Well it's one of those anyway - it's just been so long since I've had to do that...

The thing to do *before* mucking with the physical disks themselves would be to hook up the drives to another box (or have a bootable CD with network capabilities and "netcat") and "dd" the images out ("dd if=/dev/partition_to_dd of=/some/file"), bzip2 'em and save 'em for round 2.
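
If the rescue environment has Python but you'd rather not juggle dd and bzip2 by hand, the same "image it, compress it, put it aside" step can be sketched roughly like this. It's only a sketch: the function name image_partition and the paths in the usage note are made up for illustration, it does nothing clever about unreadable sectors (a read error simply aborts the copy), and it needs read access to the device.

```python
import bz2


def image_partition(source, dest, chunk_size=1024 * 1024):
    """Copy `source` byte for byte into a bzip2-compressed file at `dest`.

    `source` can be a block device such as a partition, or a plain file.
    Only reads from `source`; returns the number of uncompressed bytes copied.
    """
    copied = 0
    with open(source, "rb") as src, bz2.open(dest, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device or file
                break
            out.write(chunk)
            copied += len(chunk)
    return copied
```

Usage would be something like image_partition("/dev/partition_to_dd", "/some/file.bz2"), with the destination on a different disk than the one you're trying to save.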

My question is, what's the most efficient way to find specific data in lost+found?

Usually I start by running "file" on each file and sorting them according to magic into subdirs. Then, depending on what I need to retrieve, sort by time or size, or run "strings" on each file in the subdirs and sort them according to results. You could also run Foremost, which is able to retrieve files based on certain header magic.
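
To give an idea of the sorting step, here's a rough Python stand-in for the "file" plus "strings" pass. Everything in it is an assumption for illustration: the SIGNATURES table is a handful of guessed markers for the things asked about (Samba and named configs, passwd, mbox mail, HTML, MP3s with an ID3 tag), not real "file" magic, and the names classify and sort_recovered are made up. It copies rather than moves, so lost+found itself stays untouched.

```python
import os
import shutil

# Hypothetical signature table: (label, test on the first bytes of a file).
# Checked in order; the first match wins. These are guesses, not "file" magic.
SIGNATURES = [
    ("mp3", lambda head: head.startswith(b"ID3")),
    ("mbox", lambda head: head.startswith(b"From ")),
    ("smb.conf", lambda head: b"[global]" in head.lower()),
    ("named.conf", lambda head: b'zone "' in head and b"options" in head),
    ("passwd", lambda head: b"root:x:0:0:" in head),
    ("html", lambda head: b"<html" in head.lower()),
]


def classify(head):
    """Return a label for the leading bytes of a file, or "unknown"."""
    for label, matches in SIGNATURES:
        if matches(head):
            return label
    return "unknown"


def sort_recovered(src_dir, dest_dir, head_size=4096):
    """Copy every regular file under src_dir into dest_dir/<label>/.

    dest_dir must live outside src_dir. Copies get a per-label counter
    prefix so equal names from different recovered directories don't clash.
    Returns a dict mapping each label to the number of files copied.
    """
    counts = {}
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, device nodes, sockets and the like
            with open(path, "rb") as fh:
                head = fh.read(head_size)
            label = classify(head)
            index = counts.get(label, 0)
            counts[label] = index + 1
            target_dir = os.path.join(dest_dir, label)
            os.makedirs(target_dir, exist_ok=True)
            # copy2 keeps the modification time, so sorting by time still works
            shutil.copy2(path, os.path.join(target_dir, "%06d_%s" % (index, name)))
    return counts
```

You'd call it as sort_recovered("/mnt/lost+found", "/some/other/disk/sorted") and then dig through the "unknown" pile by size or time as described above.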

Round 2 means you're still missing data from the box that you desperately *need*. Now you'll have to delve into the murky depths of Linux Forensics' Land and run TCT (The Coroner's Toolkit) and TASK (The @stake Sleuth Kit) to uncover all the bits and pieces on the dd disk image.
That'll be recovery the hard way: not a trivial task, it costs a *lot* of time, and even then (in my experience) there's not even a 50 percent success guarantee. Try to see it as a last resort option to try and extract "useful" stuff, but at least you know there *is* a last resort option...

Good luck.


