LinuxQuestions.org


bl0tt0 09-10-2008 12:57 PM

Filesystem recovery after hard drive failure
 
My hard drive suffered a mechanical failure. After I took it to a data recovery service, they determined that there was also damage to the first 100 or so sectors of an LVM-formatted partition (actually, it was LVM on top of LUKS, and explaining that to the technician was a trial in and of itself). After they pulled what they could off the tracks of the hard drive, it at least seemed like there was very little data loss: I was still able to open the LUKS-formatted partition and activate the logical volumes on the new drive. However, it appeared that the damage took out the superblock of my ext3-formatted partition, and on top of that the journal had become corrupted. As I mentioned, the technician wasn't familiar with LUKS, so when I gave him the passphrase to unlock the LUKS-encrypted partition, he tried to decrypt the volume as if it were an EFS partition. Of course, that didn't work, and I wound up having to go in to their office to show them how to run cryptsetup to unlock the volume, and then did the recovery myself.

Without thinking to first try mounting the partition read-only as ext2, I immediately ran fsck on it, which resulted in the entire filesystem being moved to the lost+found directory. After browsing through lost+found, it looks like the filesystem is still more or less intact, but the top-level directories in / have been renamed to numbers (to clarify, as is normal with lost+found, everything is named #xxxxxxxxxx, but I can still work out which entry was originally my /home or /etc directory by running ls on the contents of each #xxxxxxxxxx).
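
For anyone in the same spot, the identification step described above (running ls on each #xxxxxxxxxx entry and recognizing what it used to be) can be roughly automated. This is only a minimal Python sketch under stated assumptions: the MARKERS table, the two-match threshold, and the function names guess_origin and survey are all made up for illustration. It only reads directory listings and never moves anything, so it is meant to be run against a copy of lost+found.

```python
import os

# Hypothetical marker names: a recovered directory containing several of
# these was probably the directory on the left. Adjust for your own system.
# /home is left out on purpose, since it only contains user names.
MARKERS = {
    "/etc": {"fstab", "passwd", "rc.d"},
    "/usr": {"bin", "lib", "share", "include"},
    "/var": {"log", "spool", "lib"},
    "/boot": {"vmlinuz", "System.map"},
}


def guess_origin(path, markers=MARKERS):
    """Return the best-matching original directory for path, or None."""
    try:
        entries = set(os.listdir(path))
    except (NotADirectoryError, PermissionError):
        # Plain files and unreadable entries cannot be classified this way.
        return None
    best, best_hits = None, 0
    for origin, names in markers.items():
        hits = len(entries & names)
        if hits > best_hits:
            best, best_hits = origin, hits
    # Require at least two matching names so one stray file is not enough.
    return best if best_hits >= 2 else None


def survey(lost_found):
    """Map each '#inode' entry in lost_found to a guessed origin (or None)."""
    return {
        name: guess_origin(os.path.join(lost_found, name))
        for name in sorted(os.listdir(lost_found))
        if name.startswith("#")
    }
```

Calling survey("/mnt/recovered/lost+found") (a hypothetical mount point) returns a dict mapping each # entry to a guessed path, or None where nothing matched and the entry needs a manual look.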

What I am wondering is how I should go about recovering the filesystem to the state it was in prior to the hard drive failure. I have a backup of my system from May, but it was made before I upgraded from 12.0 to 12.1. Part of me wants to try moving the contents of lost+found back to what seems to be their place in the filesystem hierarchy, booting up, and then addressing missing files as I go. The risk with that seems to be winding up with an unbootable system and having to spend even more time digging for packages that contain the missing portions of the OS.

The other idea involves making a backup of the recovered drive (which I intend to do regardless of my final decision), doing a fresh reinstall of Slackware, then copying files from the backup to the correct places in the new installation. This seems like a safer option, but I'm not sure of what all I'll be losing in the process. I've made numerous configuration changes to programs, and I've installed several other programs built from source, which I would have to remember to rebuild.

What do people recommend as a good recovery option? Are there any other ideas I'm not considering? Any advice is welcome.

jailbait 09-10-2008 01:26 PM

Quote:

Originally Posted by bl0tt0 (Post 3276085)


The other idea involves making a backup of the recovered drive (which I intend to do regardless of my final decision), doing a fresh reinstall of Slackware, then copying files from the backup to the correct places in the new installation. This seems like a safer option, but I'm not sure of what all I'll be losing in the process. I've made numerous configuration changes to programs, and I've installed several other programs built from source, which I would have to remember to rebuild.

This is by far the better way to recover the data. If you try to fix the bad data in place, you run the risk of the data degrading further. If you try to fix the problems on a copy, then any mistakes can be fixed by copying again. On the new copy you can experiment with data from the old disk, from lost+found, the May backup, the Slackware installer, etc., while still being able to recover from mistakes in your recovery attempts.

----------------------
Steve Stites

rkrishna 09-17-2008 04:27 AM

Depending on the filesystem, there are various recovery methods available.

I have tried them on both ext3 and reiserfs filesystems, and they worked well on both :)

Google it, or look through the good how-tos available on LQ.

bl0tt0 09-22-2008 02:11 PM

Ok, so after throwing down a little money on a 1TB drive and usb enclosure, I've done a backup of the lost+found directory and done a fresh reinstall of Slackware, after which I copied the contents of the recovered lost+found directory to their respective places in the filesystem. Seems like I was right! Everything seems to be running fine now.

My next step involves setting up a long-term backup strategy so I don't have to worry about this kind of problem again. I've already successfully configured rsnapshot to do hourly, daily, and weekly backups to the new 1TB drive, but this requires that my laptop be connected to the USB drive at the scheduled times.

I would like to throw together a cheap desktop computer (I know of an organization near me that recycles old computer parts and sells them for only $50 a pop), set up an ssh server on it, and then do backups over the network. The problem is that rsnapshot only allows running backups from the server side, meaning I wouldn't be able to run it from my laptop. I know the reasons for this, and it makes sense when doing backups on the scale of a large enterprise, but I only have one computer that needs backing up. Would there be a better program for making this work?

Woodsman 09-22-2008 06:30 PM

Quote:

My next step involves setting up a long-term backup strategy so I don't have to worry about this kind of problem again.

I have such a backup strategy explained at my web site:

A Backup Strategy

I use a drive bay to connect my backup drive. The drive is SATA and not USB and therefore is not automatically recognized when I hotplug the drive, but my backup script takes care of that minor nuisance. You will not have any such issue with an external USB drive.

I run hourly backups and daily/weekly/monthly rotations of various critical files and directories. Those backups are automatic through cron. If you have only one internal drive, then create a separate partition for the hourly backups and rotations.

I have a separate rsnapshot configuration file for my manual weekly backups. Those backups are full system backups. The rsnapshot -c option allows using different config files. If the -c option is not used then the default config file is /etc/rsnapshot.conf.

Quote:

The problem is that rsnapshot only allows running backups from the server side, meaning I wouldn't be able to run it from my laptop.

For a while I struggled with this idea too. The solution is to reverse what most people usually want: the box that holds the backups runs rsnapshot and pulls the data from the other machines. I have two old boxes that I back up in addition to my production box. The rsnapshot backup runs from my main box, but if I power up my two old boxes, then rsnapshot will find them and perform the backups. I think the ssh service must be running for rsync to connect to the box. Creating and using public keys is necessary to automate the process.

If the additional boxes are not available for whatever reason, rsnapshot merely creates an entry in the rsnapshot logs. When the additional boxes are not available, rsnapshot does not rotate the backups for such a machine but instead hard links the last known good backup.

Each machine will have its own backup directory under each backup increment:

../hourly.0
../hourly.0/box0
../hourly.0/box1
../hourly.0/box2
etc.

You could network your laptop to your main box and then run your manual backup script from your main box. As long as ssh and your public keys are configured, rsnapshot (rsync) will find the laptop and back up the machine.
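
The per-machine layout above comes from the backup lines in the rsnapshot config: each line names a source and a destination subdirectory under the snapshot root. A rough Python sketch that generates such lines follows. It is not the actual config from the how-to; the host names (oldbox1, laptop), the root login, and the helper name backup_lines are made up for illustration. Remote sources use rsync's user@host:/path form, which also needs cmd_ssh enabled in the config.

```python
def backup_lines(machines, paths, user="root"):
    """Build rsnapshot.conf 'backup' lines, one per machine and path.

    machines maps a backup subdirectory name to an ssh host name,
    or to None for the local machine that runs rsnapshot.
    """
    lines = []
    for subdir, host in machines.items():
        for path in paths:
            # Local sources are plain paths; remote ones go through ssh.
            source = path if host is None else f"{user}@{host}:{path}"
            # rsnapshot requires tabs, not spaces, between config fields.
            lines.append("\t".join(["backup", source, subdir + "/"]))
    return lines


if __name__ == "__main__":
    # Hypothetical machines: box0 is local, the other two are reached by ssh.
    machines = {"box0": None, "box1": "oldbox1", "box2": "laptop"}
    print("\n".join(backup_lines(machines, ["/etc/", "/home/"])))
```

The generated lines can be pasted into a config file and checked with rsnapshot -c /path/to/file configtest before the first real run.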

If you read the how-to, study the end of the rsnapshot-weekly.conf file to see how to back up additional machines.

I hope this helps.


All times are GMT -5.