Help! Need to recover corrupt bzip2 files..
I recently backed up a large portion of my /home directory to migrate to another machine.
# tar -cvjf /someotherdrive/backup.tar.bz2 <homeuser>
After trying to decompress the bzip2 file, it reported errors similar to the following:
bzip2: Data integrity error when decompressing.
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
I've used bzip2recover to generate chunks of bzip2 data:
# bzip2recover /someotherdrive/backup.tar.bz2
which generates chunks:
I can find the corrupt chunks by using:
# bzip2 -t <given chunk>.tar.bz2
Is it possible to drop the corrupt chunk and recombine the good data in some way? I'm not too concerned about the block that the decompression failed on, but I don't want to lose the data that follows it.
Re: Help! Need to recover corrupt bzip2 files..
If that was the actual command you typed you'd
have a bzip2'ed tarfile called <homeuser> with
a content of /someotherdrive/backup.tar.bz2
Whoops! Thanks for that...
I actually did use it the other way around! :)
*I'll edit the first post to reflect the correct usage*
*Resolved* - Need to recover corrupt bzip2 files..
I've solved the problem in a roundabout way, so here's how I did it:
1. Use bzip2recover to recover the individual bzip2 blocks (900k by default)
# bzip2recover corrupt.tar.bz2
which generates chunks:
2. Send bzip2 test results to a file for searching (could also grep right here if you want):
# for i in *.bz2; do bzip2 -tvf "$i" >> corruptblocks.out 2>&1; done
3. Search the test results for errors
# grep -i CRC corruptblocks.out
4. Delete the corrupt block
# rm <corruptblock>.tar.bz2
5. Decompress all of the remaining bzip2 blocks elsewhere
# for i in *.bz2; do bzip2 -dcvf "$i" > "/elsewhere/$i.tar"; done
6. Here you need to look for the first occurrence of a tar header prior to the corrupt block. A tar block basically consists of a tar header that starts with the filename (e.g. the actual file archived - /pictures/easter/pic02.jpg), followed by the rest of the tar header metadata (doesn't matter here), followed by the actual data. A tar archive is just made up of sequential tar blocks. The aim is to remove the entire tar block in which the corrupt bzip2 block lived. Untarring will then continue as if the corrupt tar block never existed.
I did it using a hex editor as I wasn't too sure in which actual file (from the filesystem) the error had occurred. So if the corruption occurred in block 1000, I would check through block 999, then 998, etc. If you know that you were backing up the "/pictures" directory and it failed around the "/pictures/easter" directory, then grepping for "/pictures/easter" in the few blocks prior should find a match. You need to do the same to find the closest tar header AFTER the corrupt block. Remember that you only want to remove the one tar block in which the corruption occurred.
block997.tar - Closest preceding header (/pictures/easter/pic01.jpg)
block1000.tar - CORRUPT BLOCK
block1002.tar - Closest trailing header (/pictures/easter/pic02.jpg)
a. Here you would open block997.tar and remove ALL data from the start of the header (/pictures/easter/pic01.jpg......) to the end of the file (the first "/" char onwards).
b. Make sure you delete block998.tar, block999.tar, block1000.tar (should have already been deleted earlier) and block1001.tar.
c. Open block1002.tar and remove ALL data from the start of the file up to the byte just before the next header (/pictures/easter/pic02.jpg), so that the file now starts with /pictures/easter/pic02.jpg instead of raw data.
(After this, the tar block with the corrupt data should have been removed)
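If you'd rather not hunt through a hex editor for step 6, here is a rough sketch of how the header search and the trimming in a. and c. could be scripted. It relies on the "ustar" magic string that sits 257 bytes into each 512-byte tar header (both GNU tar and POSIX ustar archives have it). The function names list_tar_headers, keep_before and keep_from are made up for this example, and the -a/-b/-o options assume GNU grep. Treat every hit as a candidate only: "ustar" can also appear inside file data, and a header that straddles two chunks is skipped here.

```shell
#!/bin/sh
# Hypothetical helpers for step 6 (names are illustrative, not from any tool).

# list_tar_headers FILE
# Prints "offset<TAB>name" for every candidate tar header in FILE.
# The "ustar" magic lives at byte 257 of a header, so the header starts
# 257 bytes before each match. Assumes GNU grep (-a -b -o).
list_tar_headers() {
    lth_file=$1
    LC_ALL=C grep -abo 'ustar' "$lth_file" | cut -d: -f1 |
    while read -r lth_magic; do
        lth_start=$((lth_magic - 257))
        # A negative start means the header began in the previous chunk.
        [ "$lth_start" -ge 0 ] || continue
        # The first 100 bytes of a header are the NUL-padded file name.
        lth_name=$(tail -c +"$((lth_start + 1))" "$lth_file" |
                   head -c 100 | tr -d '\000')
        printf '%s\t%s\n' "$lth_start" "$lth_name"
    done
}

# keep_before FILE OFFSET: everything before OFFSET (sub-step a).
keep_before() {
    head -c "$2" "$1"
}

# keep_from FILE OFFSET: everything from OFFSET onwards (sub-step c).
keep_from() {
    tail -c +"$(($2 + 1))" "$1"
}
```

With the example above, you would run list_tar_headers on block997.tar, take the offset printed next to pic01.jpg and redirect keep_before block997.tar <offset> to a new file, then do the same with keep_from on block1002.tar using the offset printed next to pic02.jpg. Write to new files and check them before replacing the originals.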
7. Glue everything back together using:
# cat /elsewhere/*.tar > recovered.tar
8. Untar as usual to recover existing data
# tar -xvf recovered.tar
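For what it's worth, steps 2 to 5 can probably be rolled into one loop, since bzip2 -t exits with a non-zero status when a file fails the integrity test. The sketch below is only that, a sketch: the function name decompress_good_blocks is made up, and it assumes the source directory holds nothing but the chunks written by bzip2recover (not the original damaged archive). It leaves the corrupt chunks where they are instead of deleting them and just reports their names.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative only).

# decompress_good_blocks SRC_DIR OUT_DIR
# Tests every *.bz2 chunk in SRC_DIR. Chunks that pass are decompressed
# to OUT_DIR/<chunk name>.tar; chunks that fail are listed on stdout.
decompress_good_blocks() {
    dgb_src=$1
    dgb_out=$2
    mkdir -p "$dgb_out" || return 1
    for dgb_chunk in "$dgb_src"/*.bz2; do
        # Skip the unexpanded pattern when the directory has no chunks.
        [ -e "$dgb_chunk" ] || continue
        if bzip2 -t "$dgb_chunk" 2>/dev/null; then
            bzip2 -dc "$dgb_chunk" > "$dgb_out/$(basename "$dgb_chunk").tar"
        else
            printf 'corrupt: %s\n' "$dgb_chunk"
        fi
    done
}
```

Something like decompress_good_blocks . /elsewhere would then leave you with the same set of .tar pieces as step 5, plus a list of the chunks that need the manual treatment from step 6.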
Hope this helps.