LinuxQuestions.org

Linux - Software forum: http://www.linuxquestions.org/questions/linux-software-2/creating-tar-files-with-high-data-integrity-481911/

edman007 09-09-2006 12:53 PM

creating tar files with high data integrity
 
is there a way i can create a tar file with high data integrity? i want to back up some important stuff and burn it to a CD, but i don't feel 100% sure the data will be safe. i would like to sacrifice some space and add extra parity to the tar file. is there a way to do that?

Tinkster 09-09-2006 03:40 PM

I'd be more worried about the media than the tar-file... maybe burn a few
copies and store them at different sites?



Cheers,
Tink

stress_junkie 09-09-2006 03:46 PM

The tar command has a verify option.
Code:

tar -cW ...
Code:

tar -c --verify ...
It's amazing what you can discover using the man utility.

Edit for spelling correction only.

Tinkster 09-09-2006 03:54 PM

While this is true it won't help him if the CD with the archive gets scratched
while in storage ...



Cheers,
Tink

stress_junkie 09-09-2006 07:23 PM

Quote:

Originally Posted by Tinkster
While this is true it won't help him if the CD with the archive gets scratched
while in storage ...
Cheers,
Tink

All righty then. The original question appeared to me to be asking about verifying a backup that you are creating, but let's go with verifying a backup from storage.

I would say that you would have to prepare a checksum for your backup at the time that you make it. Then you can run the same checksum operation when you take the backup medium out of storage. It would go something like this.

Prepare your backup for storage:
1) Make your backup.
2) Ensure that you are happy with it.
3) Obtain an MD5 checksum of the data on the storage medium.
4) Write that checksum by hand on the storage medium.

Verify medium from storage:
1) Obtain an MD5 checksum of the data on the medium.
2) See if that MD5 checksum matches the one that was made at the time that the backup was done.

Here is how I would implement it.

Create the backup. Let's use the following conditions. We are going to create a tar archive file on a disk partition mounted on /mnt/backup. The tar file will be named test.tar.
Code:

root> tar -c --verify -vf /mnt/backup/test.tar .
Okay so I just executed that and it worked. The tar file was created and it was verified. Now let's create an MD5 checksum for this known good file.
Code:

root> md5sum /mnt/backup/test.tar
9888fdc35f1233a021d6f2c2e4ce5a6f  /mnt/backup/test.tar

Now write that MD5 checksum onto the backup medium.

When you retrieve the backup medium from storage you just run an MD5 checksum on the backup image and compare it to the one that is written on the medium.
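A minimal sketch of that comparison, assuming GNU coreutils `md5sum`: instead of comparing 32 hex digits by eye, feed the handwritten checksum back to `md5sum -c`. The helper name `check_against_written_sum` is made up for this example.

```shell
#!/bin/sh
# check_against_written_sum (hypothetical helper)
#   $1 = the MD5 checksum copied by hand from the label on the medium
#   $2 = path to the backup image as it reads now (e.g. on the mounted disc)
check_against_written_sum() {
    expected=$1
    image=$2
    # md5sum -c reads "<hash>  <file>" lines (two spaces) from stdin ("-"),
    # prints "<file>: OK" or "<file>: FAILED", and exits non-zero on a mismatch.
    printf '%s  %s\n' "$expected" "$image" | md5sum -c -
}
```

For example, `check_against_written_sum 9888fdc35f1233a021d6f2c2e4ce5a6f /mnt/cdrom/test.tar`, where /mnt/cdrom is an assumed mount point for the disc. Because the exit status reflects the result, the same check can be dropped into a script.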

You can play this game with any medium, whether it be tape, CD-RW, DVD-RW, EEPROM or whatever. You could make the tar file on a hard disk and then use something like K3B to burn that file onto a CD or DVD, which could then be mounted like a disk. You could then run md5sum on the copy on the CD or DVD to see if it matches the MD5 sum of the tar file on the hard disk. That would validate the burn onto CD or DVD. Actually K3B will do this for you if you ask it nicely.
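The post-burn check can be scripted as well. This is only a sketch; `same_md5` is a hypothetical helper that compares the hash of the tar file on the hard disk with the hash of the copy read back from the mounted disc.

```shell
#!/bin/sh
# same_md5 (hypothetical helper): succeed only if files $1 and $2 have the same MD5.
same_md5() {
    # Refuse to compare if either file is missing or unreadable.
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read $1 or $2" >&2
        return 2
    fi
    # Reading from stdin makes md5sum print "<hash>  -"; keep only the hash field.
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "MISMATCH: $1 is $a, $2 is $b" >&2
        return 1
    fi
}
```

For example `same_md5 /mnt/backup/test.tar /mnt/cdrom/test.tar`, again assuming the disc is mounted at /mnt/cdrom. For a one-off comparison `cmp -s` would do a byte-for-byte check just as well; the MD5 route has the advantage that the hash can be written down and checked later without the original file.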

Funny thing. I'm sure that I used to use tar to stream directly onto a DVD-RW, just like a tape drive. I just tried it and it wouldn't work. The only difference that I can think of is that I no longer use ide-scsi emulation in the kernel. That's why my example above uses a tar file on a hard disk partition that can be mounted to a normal mount point. The procedure will certainly work if you send your tar stream directly to a tape drive.

Anyway, that's my idea for verifying backup media out of storage, which I didn't think that the original question even asked about.

win32sux 09-09-2006 07:35 PM

getting the MD5SUM of the tarball is nice, but having a correct MD5SUM of a tarball that was already corrupt when it was made is a very real possibility... for this reason, i would recommend taking it a step further by performing an MD5SUM of every single file which is gonna go into the tarball (in addition to getting an MD5 of the tarball after)... this way when you untar the tarball, you can check the actual files you care about for integrity...

let's say we are in the directory which contains everything we wanna back-up:
Code:

find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5
now our backup directory will contain a checksum file which will get tarballed along with everything else... to verify the integrity of the files after untarring, just cd to the base dir and do a:
Code:

md5sum -c CHECKSUMS.md5
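A rough end-to-end sketch of the per-file idea, assuming GNU find and coreutils; `make_manifest` and `check_manifest` are hypothetical names. One detail: when the manifest is written into the directory being scanned, the shell creates it before find starts, so find can list the manifest itself and checksum it while it is still being written. The sketch excludes it by name.

```shell
#!/bin/sh
# make_manifest (hypothetical): record a checksum for every file under directory $1.
make_manifest() {
    (
        cd "$1" || exit 1
        # Skip the manifest itself; "-exec ... {} +" passes many files per md5sum run.
        find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5
    )
}

# check_manifest (hypothetical): after untarring into $1, verify every listed file.
check_manifest() {
    (
        cd "$1" || exit 1
        md5sum -c CHECKSUMS.md5
    )
}
```

Run `make_manifest /path/to/data` before creating the tarball and `check_manifest /path/to/restored/data` after extracting it; both paths are placeholders.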

stress_junkie 09-09-2006 07:35 PM

Quote:

Originally Posted by win32sux
getting the MD5SUM of the tarball is nice, but having a correct MD5SUM of a corrupt tarball is a very real possibility

We know that the tar file is good at the time that it is created because we used the --verify option in tar.

I have to admit that I like your idea of getting a checksum on every file to be archived. My procedure only lets us know if the tar file is bad from damage to the medium during storage. Your procedure lets us know which files are still good even if the tar file suffered some degradation.
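To see which files survived, the output of `md5sum -c` can be filtered down to just the failures. A sketch, assuming GNU coreutils' `<file>: OK` / `<file>: FAILED` output format (forced to the C locale so the status words are not translated); `list_damaged` is a hypothetical name.

```shell
#!/bin/sh
# list_damaged (hypothetical): print the files under $1 that no longer match CHECKSUMS.md5.
list_damaged() {
    (
        cd "$1" || exit 1
        # md5sum -c prints "<file>: OK", "<file>: FAILED" or "<file>: FAILED open or read";
        # keep only the failures and strip the status suffix. Warnings go to /dev/null.
        LC_ALL=C md5sum -c CHECKSUMS.md5 2>/dev/null | sed -n 's/: FAILED.*$//p'
    )
}
```

Anything not listed checked out fine. The function's exit status is that of sed, so it is meant for reading rather than for an if-test; in current GNU coreutils, `md5sum -c --quiet` is an alternative that suppresses the OK lines.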

win32sux 09-09-2006 07:50 PM

Quote:

Originally Posted by stress_junkie
We know that the tar file is good because we used the --verify option in tar.

hehe, sorry, i don't know how i missed that...

Quote:

I have to admit that I like your idea of getting a checksum on every file to be archived. My procedure only lets us know if the tar file is bad. Your procedure lets us know which files are still good even if the tar file suffered some degradation.
yeah, i think mixing both procedures would be great... :)

maybe something like:
Code:

cd /mnt/backup

rm -f CHECKSUMS.md5

find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5

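# GNU tar refuses --verify on compressed archives, so compare afterwards with -d
tar -czvf /tmp/backup.tar.gz .

tar -dzf /tmp/backup.tar.gz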

cd /tmp

md5sum backup.tar.gz > backup.tar.gz.md5

mkisofs -R -J -pad -v -o backup.iso \
backup.tar.gz backup.tar.gz.md5

cdrecord dev=/dev/cdrom -pad -v -eject backup.iso
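For the reading-back side of that script, here is a sketch that assumes the disc has been mounted somewhere (for example /mnt/cdrom) and holds backup.tar.gz and backup.tar.gz.md5 as written above. `restore_and_check` is a hypothetical name. It deliberately carries on after a tarball checksum mismatch, since the per-file manifest is what tells you which files can still be trusted.

```shell
#!/bin/sh
# restore_and_check (hypothetical)
#   $1 = directory where the disc is mounted, $2 = empty directory to restore into
restore_and_check() {
    cd_dir=$1
    dest=$2
    # 1) Is the tarball on the disc still identical to what was burned?
    if ! (cd "$cd_dir" && md5sum -c backup.tar.gz.md5); then
        echo "warning: tarball checksum mismatch, trying to salvage files anyway" >&2
    fi
    # 2) Unpack whatever can be read.
    tar -xzf "$cd_dir/backup.tar.gz" -C "$dest"
    # 3) Check each restored file against the per-file manifest; this sets the exit status.
    (cd "$dest" && md5sum -c CHECKSUMS.md5)
}
```

For example `mkdir /tmp/restore && restore_and_check /mnt/cdrom /tmp/restore`.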


unSpawn 09-09-2006 07:58 PM

Quote:

Originally Posted by Tinkster
While this is true it won't help him if the CD with the archive gets scratched while in storage ...

Maybe check out dvdisaster: "dvdisaster provides a margin of safety against data loss on CD and DVD media caused by aging or scratches. (..) dvdisaster is available for recent versions of the FreeBSD, Linux and Windows operating systems."

haertig 09-09-2006 08:06 PM

Verifying good backups and good CD burns is necessary, but so is a test restore IMHO. I think tar/Linux is quite reliable at making good backups, but I can't say the same for the Windows world. I know, you're not talking about the Windows world. However, I have personally had, and know others who have had, a "good, verified" backup that couldn't be restored at a later date (in the Windows world). I carried that lesson over when I switched to Linux: important stuff always gets a test restore, and that restore is done on a different computer, using different hardware. It doesn't matter if the CD burner that burned the backup can read its own CD later if that particular CD drive goes bad on you. You need to make sure that other CD drives can read your burned disc reliably and error-free as well.
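A sketch of such a test restore, assuming GNU tar and diffutils, an archive built with paths relative to the backed-up directory (e.g. `tar -cf archive.tar -C /path/to/data .`), and that the original tree is reachable from the machine doing the test (copied over or shared on the network). `test_restore` is a hypothetical name. Where the originals are not available, the per-file md5 manifest suggested earlier in the thread does the same job.

```shell
#!/bin/sh
# test_restore (hypothetical): unpack archive $1 into a fresh scratch directory
# and compare it, file by file, with the original tree $2.
test_restore() {
    archive=$1
    original=$2
    scratch=$(mktemp -d) || return 1
    # diff -r reports any file that is missing, extra, or different.
    if tar -xf "$archive" -C "$scratch" && diff -r "$original" "$scratch"; then
        echo "test restore OK"
        rc=0
    else
        echo "test restore FAILED" >&2
        rc=1
    fi
    rm -rf "$scratch"
    return $rc
}
```

For example `test_restore /mnt/cdrom/backup.tar /path/to/data` (placeholder paths). Recent GNU tar versions detect gzip compression automatically on extraction, so a .tar.gz should work the same way.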

And as Tink said, don't trust the media. CDs can go bad. Burn two copies. If you really want to be paranoid, use two different brands of (high quality) media and two different CD burners if you have them. Then next year, burn two more (but still hang on to the old ones), and so on. Put one copy in your bank safe-deposit box, and mail the other one to a relative or friend in a different geographic location.

win32sux 09-09-2006 08:12 PM

don't forget to encrypt the files on the CD that you mail your relative or friend in a different geographic location (you'll need a separate set of test restores)... :D

stress_junkie 09-09-2006 08:23 PM

This is how I do encrypted backups. Actually I put tar backup files onto encrypted partitions on external USB drives, but I'm going to describe how to create an encrypted container file to hold a backup. Then the container file can be copied to some storage medium.

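1) Create a container file the size of the backup medium. You only have to do this once. For a 4.7 GB single-layer DVD, 4480 blocks of 1 MiB (4,697,620,480 bytes) fit with a little room to spare; dd does not accept a fractional size such as 4.7G.
Code:

dd if=/dev/urandom of=/var/backup/container.file bs=1M count=4480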
2) Mount the container file to a loop device using encryption.
Code:

losetup -e blowfish /dev/loop0 /var/backup/container.file
password: <enter your encryption key-password>

3) Create your file system through the loop device into the container file.
Code:

mkfs -t xfs /dev/loop0
4) Mount the loop device to some mount point.
Code:

mount -o sync /dev/loop0 /mnt/backup
5) Create your tar file in /mnt/backup. <See posts above.>
6) Unmount your container file.
Code:

sync
umount /mnt/backup
losetup -d /dev/loop0

7) Now use whatever method pleases you to copy the /var/backup/container.file to some medium.

I've heard that dm-crypt is better than crypto-loop. I just haven't started to use dm-crypt yet. I'm not sure that dm-crypt would apply to a container file anyway but I've heard that I should be using it for my USB disks.

Edited to change reiserfs to xfs in step 3. I've had a lot of trouble with reiserfs lately.

edman007 09-09-2006 10:02 PM

i did read the man page. the verify option, like checksums, does just that: it verifies that the data is correct. but that is not what i want. i expect data on a CD to get corrupted over time; it sucks and i have had it happen many, many times, but verify will just tell me what i probably already know, that it's bad.

what i am asking for is something that can correct the error, just like RAID. knowing that a drive is broken is nice, but it's not much help if you still can't get the data. RAID 5 adds parity and thus goes beyond "this drive is broke" and fixes it to get my files, instead of saying "it's broke so you lose" like RAID 0 does.

dvdisaster looks like what i want, but i was kinda hoping for something that is installed on just about every distro so i don't have to go searching for the program to read it later.
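To illustrate the RAID 5 comparison: with XOR parity, one extra block lets you rebuild any single lost block. The toy sketch below uses shell arithmetic on three small numbers standing in for data blocks (the values are made up). It is only an illustration of the principle; as far as I know dvdisaster uses Reed-Solomon codes, which can repair more than one lost block, and nobody would do this on real files in shell.

```shell
#!/bin/sh
# parity_of: XOR all arguments together, like a RAID 5 parity block.
parity_of() {
    p=0
    for block in "$@"; do
        p=$(( p ^ block ))   # XOR accumulates parity bit by bit
    done
    echo "$p"
}

# rebuild_missing: XOR the parity ($1) with every surviving block (the rest);
# what is left over is exactly the missing block.
rebuild_missing() {
    parity=$1
    shift
    parity_of "$parity" "$@"
}
```

With blocks 23, 200 and 91, `p=$(parity_of 23 200 91)` gives 132, and if the middle block is lost, `rebuild_missing "$p" 23 91` prints 200.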

mnauta 10-10-2006 02:00 PM

Quote:

............

Create the backup. Let's use the following conditions. We are going to create a tar archive file on a disk partition mounted on /mnt/backup. The tar file will be named test.tar.
Code:

root> tar -c --verify -vf /mnt/backup/test.tar .
Okay so I just executed that and it worked. The tar file was created and it was verified. Now let's create an MD5 checksum for this known good file...........
Thanks for this excellent example. When I use --verify, tar can't find the files because the leading / is removed. How can I solve this?

Thanks
manuel


All times are GMT -5.