LinuxQuestions.org - How to back up files off a failing hard drive?

Hello everyone. I have a 120GB drive (formatted with reiserfs) that is in the process of dying. I get strange errors (usually involving DriveNotReady and hda_dma) on the screen, and each new badblocks scan gives me more failed sectors, so I know the drive is on its way out.

I am trying to replace it with a new 250GB drive. I tried to use GNU parted to clone the drive, but the I/O errors prevent it from working. So, I'm trying to just copy as much of the failing hard drive's data as I can to the new one.

I'm currently booting the system from System Rescue CD. I have the old hard drive (/dev/hdb2) mounted on /mnt/temp2, and the new one (/dev/hda2) mounted on /mnt/temp1.

I started with trying to use tar:

Code:

cd /mnt/temp2; tar cf - * | (cd /mnt/temp1; tar xvpf -)

but that got through only about 30GB of the 110GB of data on the drive before it encountered some bad sectors and stopped. Further research (mostly here on linuxquestions.org forums) suggests that rsync might be able to do the same thing and skip all the already-transferred files (which would be nice, but is not required):

Code:

rsync -avH --progress /mnt/temp2/ /mnt/temp1/

but I'm afraid that will also fail once it starts encountering I/O errors. I could try using find /mnt/temp2 -exec to start a separate instance of rsync for each file/directory/link it encounters, but that seems massively inefficient.

Can anyone suggest a method of copying the files from the failing hard drive to the new hard drive that
- will preserve all time/permission/owner/etc. metadata
- will not stall on I/O errors but will try to copy as many files as it can, and
- will give me a list of all files that had problems so I can concentrate on trying to recover those specific files?
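For what it's worth, the three requirements above can be sketched in Python. This is a minimal, hypothetical example (salvage_copy is not an existing tool), assuming Python 3 run as root so that ownership can be restored. It copies regular files and symlinks, restores owner, permission bits and timestamps, keeps going after read errors, and returns the paths that failed. It does not preserve hard links (as rsync -H does) or device nodes, and it still reads every byte, so it stresses the drive just as much as tar or rsync would.

```python
import os
import shutil
import stat


def salvage_copy(src_root, dst_root):
    """Copy src_root into dst_root, skipping anything that fails.

    Hypothetical helper. Returns a list of (path, error message) pairs.
    """
    failures = []

    def record(exc):
        # os.walk calls this when a directory cannot be listed.
        failures.append((exc.filename, str(exc)))

    # Walk bottom-up so each directory's metadata is restored only after
    # its contents were written (writing into a directory changes its mtime).
    for dirpath, dirnames, filenames in os.walk(src_root, topdown=False, onerror=record):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                st = os.lstat(src)
                if stat.S_ISLNK(st.st_mode):
                    os.symlink(os.readlink(src), dst)
                    os.lchown(dst, st.st_uid, st.st_gid)
                elif stat.S_ISREG(st.st_mode):
                    shutil.copyfile(src, dst)             # file contents
                    os.chown(dst, st.st_uid, st.st_gid)   # owner first: chown may clear setuid bits
                    shutil.copystat(src, dst)             # then permission bits and atime/mtime
                # Real subdirectories are handled on their own visit;
                # device nodes, FIFOs and sockets are skipped in this sketch.
            except OSError as exc:
                failures.append((src, str(exc)))
        try:
            st = os.stat(dirpath)
            os.chown(target_dir, st.st_uid, st.st_gid)
            shutil.copystat(dirpath, target_dir)
        except OSError as exc:
            failures.append((dirpath, str(exc)))
    return failures
```

Usage would be along the lines of: for path, err in salvage_copy("/mnt/temp2", "/mnt/temp1"): print(path, err)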

I don't mind erasing those 30GB of files and starting over, but I would like to stress the drive as little as possible until I've gotten the data off of it.

Thanks!

Does System Rescue CD come with ddrescue? If so, you can use it to copy the drive to a disk image. Unlike the regular dd command, ddrescue will do its best to rescue bad blocks. If you can get a clean disk image copied, you can recover your actual data from there.

ddrescue appears to be working well

Neither System Rescue CD nor Knoppix includes ddrescue (although both include dd_rescue, which is a different program). (There is a post on the System Rescue CD forums suggesting the inclusion of ddrescue, so it might be there eventually.) Knoppix has enough build tools that I was able to download the source from the ddrescue homepage and build it easily.

I am following the instructions I found on the TestDisk wiki, which are as follows:

Quote:

# first, grab most of the error-free areas in a hurry:
ddrescue -B -n /dev/old_disk /dev/new_disk rescued.log
# then try to recover as much of the dicey areas as possible:
ddrescue -B -r 1 /dev/old_disk /dev/new_disk rescued.log

The first pass went through fine; the second is taking much longer. (I should note that I am reading the data from /dev/hdb2 directly, but writing to an image file on the [mounted] /dev/hda2.) I'll post again with the results when it's done.

ddrescue appears to have done the trick

ddrescue appears to have pulled as much off the old, failing HD as possible -- I now have a 111GB image file on the new HD. It apparently encountered over 1600 errors during the more in-depth pass, and the logfile is over 5kB, so it definitely did not go flawlessly.

Now, the difficult part -- trying to turn the logfile into a coherent list of files that are corrupted beyond recovery. If there are no irreplaceable files in that list, I can just restore the image, reinstall some applications, and be on my way.

A search of the bug-ddrescue list shows I might need a Perl script called 'ddrsummarize.pl', but I'll have to request that from the list.
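ddrsummarize.pl itself isn't reproduced in this thread. As a rough idea of what such a summary involves, here is a Python sketch that totals the bad areas in a ddrescue logfile. It assumes the layout described in the GNU ddrescue manual: "#" comment lines, one line holding the current position and status, then one "position size status" line per block with hex numbers, where "-" marks a bad area. The names read_blocks and summarize_bad are hypothetical.

```python
import sys

SECTOR_SIZE = 0x200  # 512-byte sectors


def read_blocks(lines):
    """Yield (pos, size, status) for each data line of a ddrescue logfile."""
    seen_status_line = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if not seen_status_line:
            # First non-comment line: current position and status, not a block.
            seen_status_line = True
            continue
        pos, size, status = fields[:3]
        yield int(pos, 0), int(size, 0), status  # base 0 accepts "0x..." hex


def summarize_bad(lines):
    """Return (number of bad areas, total bad bytes)."""
    areas = total = 0
    for _pos, size, status in read_blocks(lines):
        if status == "-":
            areas += 1
            total += size
    return areas, total


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as logfile:
        areas, nbytes = summarize_bad(logfile)
    print(f"{areas} bad areas, {nbytes} bytes ({nbytes // SECTOR_SIZE} sectors)")
```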

Kaynos,

You can get ddrsummarize.pl, ddr2nfi.pl, nficruncher.pl,
ddrlogor.pl, and a lot of other ddrescue-related Perl scripts
by downloading them from my server, here:

www dot burtonsys dot com slash download slash ddr2sr.zip

(Sorry, I've been a "member" here for nearly a year, but this
annoying web site still won't let me post URLs, so change
" dot " to "." and change " slash " to "/" to turn the above
into a usable URL.)

First, I recommend that you look closely at the ddrescue
logfile. One sector is 0x200 bytes, so 0x1000 bytes is 8
sectors. If the sizes of the "-" (bad) logfile entries are
all multiples of 0x1000, then what you are seeing is the
result of whole-cluster-at-a-time reads failing. You can
probably use a "raw" device to get ddrescue to read
individual sectors, which will reduce the number of bad
sectors considerably. However, use of raw devices depends
very much on which OS version you are running, which makes
it difficult to tell you exactly which commands are needed.
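The multiple-of-0x1000 check can be automated with a short Python sketch. It assumes the same logfile layout as in the GNU ddrescue manual (one current-position line, then "position size status" lines in hex), and the function names are hypothetical; it only tests the sizes of the "-" entries, as described above.

```python
SECTOR = 0x200    # one sector
CLUSTER = 0x1000  # 8 sectors


def bad_area_sizes(lines):
    """Return the sizes of the '-' (bad) entries in a ddrescue logfile."""
    sizes = []
    seen_status_line = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if not seen_status_line:
            seen_status_line = True  # skip the current-position/status line
            continue
        if fields[2] == "-":
            sizes.append(int(fields[1], 0))
    return sizes


def looks_like_cluster_failures(lines):
    """True if there are bad areas and every one is a multiple of 0x1000 bytes."""
    sizes = bad_area_sizes(lines)
    return bool(sizes) and all(size % CLUSTER == 0 for size in sizes)
```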

If your hard disk drive is not in NTFS format, then I can't
offer much help identifying the damaged files. But if your
hard disk drive is in NTFS format, then here's what I would
do if I were you:

1) Copy the entire rescued disk image to a scratch drive.

2) Save a copy of the partition table, like this:
fdisk -lu drive.ima >fdisk-lu_output.txt
or:
fdisk -lu /dev/hdd >fdisk-lu_output.txt
(or whatever)

2.1) Use ddr2nfi.pl (formerly called srddrnfi.pl) to generate a .bat
script of 'nfi' commands from the ddrescue logfile. Call the .bat
script "nficmds.bat":
perl -w ddr2nfi.pl nficmds.bat - fdisk-lu_output.txt drive.log

or if you used SpinRite (probably via ddr2sr.pl):
perl -w ddr2nfi.pl nficmds.bat SPIN_LOG.3 fdisk-lu_output.txt log_before_SR log_after_SR
where:
SPIN_LOG.3 is the SpinRite log file (extension varies),
log_before_SR is the ddrescue logfile saved before running SpinRite,
log_after_SR is the ddrescue logfile created after running SpinRite

3) Copy the nficmds.bat file to a thumb drive or diskette.

3.1) If you don't already have Microsoft's nfi ("NTFS File
Sector Information Utility") then get it, too:

www dot google dot com slash search?q=%22NTFS+file+sector+information%22+site:microsoft.com

4) Attach the scratch drive to a Windows computer as a 2nd
drive ("E:" for this example), and start the computer.

Do NOT let Windows check the drive during startup, because
if you do then you won't get to capture the list of file names
that it mentions when checking the drive.

5) If the drive letter doesn't match the drive letter in
nficmds.bat, then edit nficmds.bat and fix the drive letters.

6) (This step is optional; if you get tired of acknowledging
the pop-up boxes then you can skip steps 6 and 7.)

Run nficmds.bat on the Windows computer, redirecting the
output into a text file:

nficmds.bat >nfioutput.txt

7) Process nfioutput.txt using nficruncher.pl, to produce a
"damaged files report," and various other reports:

perl -w nficruncher.pl -f -d -r -i -u nfioutput.txt

8) In a Windows XP or Win2000 command-prompt window do:

chkdsk /f E: >c:\errors1.log.txt

(c:\errors1.log.txt is effectively another damaged files report.)

9a & 9b) Save the output files produced in steps 6 and 7. Then
repeat steps 6 and 7. (Because of the chkdsk you did in step 8,
nfi is less likely to produce pop-up messages that you must
acknowledge.)

10) Your (not necessarily complete) list of damaged files is
the combined list of files found in steps 7, 8, and 9b.

11) It is also possible to use the output of nficruncher.pl
to "get smarter" with ddrescue. For example, it produces a
free-space sector list, in ddrescue logfile format, called
unimportant.log, which you can merge into the regular ddrescue
logfile, to make a subsequent ddrescue run pretend that those
unused sectors were already rescued, so that you don't waste
time trying to recover them.

BTW, that's why you copied the recovered disk or image to a
scratch drive in step 1 -- because Windows changes the drive
when it examines it (and drastically changes it when doing
chkdsk!), which would prevent you from later restarting the
rescue process using ddrescue. But since you only let Windows
touch a scratch copy, Windows can't mess up the original.

To merge the "unimportant.log" file (free-space sector list)
produced by nficruncher.pl, you can use the ddrlogor.pl script
("DDRescue LOGfile logical .OR.").

Note: one trick that I've done is to edit nfioutput.txt before
processing it with nficruncher.pl, to make it look like some
files that I don't care about, e.g., hiberfil.sys and
pagefile.sys, are part of the NTFS partition's free-space.
Then, after ddrlogor.pl merges unimportant.log, ddrescue's
logfile will indicate that those other unimportant files are
already recovered, so ddrescue won't waste time trying to
recover them.

12 & on) Then you can resume the ddrescue recovery process,
perhaps just targeting the most important disk areas, before
going back to step 1.

For lots more instructions, see the comments in the various
Perl scripts.

-Dave
dave340 at burtonsys dot com but please no spam

P.S. -- If you know of (or write) the equivalent of 'nfi' for
FAT32 (or other file systems), or to run under Linux instead
of under Windows, then please tell me!!

You should be able to mount the disk image with the loopback option (mount -o loop) and manually check or rescue important files, although with 100GB+ of data that might be a bit impractical.

ncdave, you need to have 5 posts before you're allowed to link to other pages. It's an anti-spam defense.

The ddrsummarize.pl script told me that the 1688 errors only totaled about 800kB of lost data, but didn't give me any way to convert the sector numbers to filenames.

I posted[1] on the ddrescue mailing list to ask how to do the conversion, and was referred to a HOWTO[2] with instructions. Unfortunately, the HOWTO is for ext2/ext3 filesystems and the debugfs command used is specific to ext2/ext3. I tried asking[3] the reiserfs mailing list, but found that there is no equivalent command for reiserfs.
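The first part of that conversion is plain arithmetic and is not ext2/ext3-specific: a bad byte position becomes a filesystem block number via block = (position - partition start in bytes) / block size, which is the same calculation the HOWTO does with LBAs as (L - S) * 512 / B. A Python sketch, assuming a 4096-byte reiserfs block size (check your own filesystem) and hypothetical names; because the image here was taken from /dev/hdb2 itself, the partition offset is zero. Turning block numbers into file names still needs a filesystem-specific tool, which is exactly what was missing for reiserfs.

```python
SECTOR_SIZE = 512


def bad_fs_blocks(bad_areas, block_size=4096, partition_start_sector=0):
    """Map bad (byte_pos, byte_size) areas to filesystem block numbers.

    partition_start_sector is the start sector from `fdisk -lu`; leave it
    at 0 when the ddrescue positions are relative to the partition itself.
    """
    offset = partition_start_sector * SECTOR_SIZE
    blocks = set()
    for pos, size in bad_areas:
        first = (pos - offset) // block_size
        last = (pos + size - 1 - offset) // block_size  # block holding the last bad byte
        blocks.update(range(first, last + 1))
    return sorted(blocks)
```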

I ended up following the instructions[4] in a previous post to the reiserfs list, and read every file on the old disk, discarding the output:

Code:

find /mnt/old_disk -type f -exec cat {} > /dev/null \;

Only 3 files produced read errors (two of which were in my Firefox cache, so really no problems there).
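A rough Python equivalent of that find/cat pass, which also collects the failing paths into a list instead of leaving them on the terminal, might look like this. It is a sketch with a hypothetical function name: it reads every regular file (not symlinks) under a directory, discards the data, and records any read or open error.

```python
import os


def unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root; return [(path, error), ...]."""
    failures = []

    def record(exc):
        failures.append((exc.filename, str(exc)))  # directory could not be listed

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # like `find -type f`: regular files only
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):  # discard data, like cat > /dev/null
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```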

I copied the files out of the disk image using the original tar command, and deleted the 3 files. One quick re-LILO later, the system boots and is running pretty much fine. I haven't done a thorough system check yet, but so far it seems OK.

Hopefully this thread will be of use to anyone else who has a disk failure on a reiserfs system.

[1] http://lists.gnu.org/archive/html/bu.../msg00004.html
[2] http://smartmontools.sourceforge.net/BadBlockHowTo.txt
[3] http://marc.theaimsgroup.com/?l=reis...8754104268&w=2
[4] http://marc.theaimsgroup.com/?l=reis...5109321290&w=2