LinuxQuestions.org
Old 09-09-2006, 12:53 PM   #1
edman007
Member
 
Registered: Sep 2003
Distribution: slackware-current
Posts: 173

Rep: Reputation: 30
creating tar files with high data integrity


is there a way i can create a tar file with high data integrity? i want to back up some important stuff and burn it to a CD, but i don't really feel 100% sure the data will be safe. i would like to sacrifice some space and add extra parity to the tar file. is there a way to do that?
 
Old 09-09-2006, 03:40 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
I'd be more worried about the media than the tar-file... maybe burn a few
copies and store them at different sites?



Cheers,
Tink
 
Old 09-09-2006, 03:46 PM   #3
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
The tar command has a verify option.
Code:
tar -cW ...
Code:
tar -c --verify ...
It's amazing what you can discover using the man utility.

Edit for spelling correction only.

Last edited by stress_junkie; 09-09-2006 at 06:36 PM.
 
Old 09-09-2006, 03:54 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
While this is true it won't help him if the CD with the archive gets scratched
while in storage ...



Cheers,
Tink
 
Old 09-09-2006, 07:23 PM   #5
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
Quote:
Originally Posted by Tinkster
While this is true it won't help him if the CD with the archive gets scratched
while in storage ...
Cheers,
Tink
All righty then. The original question appeared to me to be asking about verifying a backup that you are creating, but let's go with verifying a backup from storage.

I would say that you would have to prepare a checksum for your backup at the time that you make it. Then you can run the same checksum operation when you take the backup medium out of storage. It would go something like this.

Prepare your backup for storage:
1) Make your backup.
2) Ensure that you are happy with it.
3) Obtain an MD5 checksum of the backup on the storage medium.
4) Write that checksum by hand on the storage medium.

Verify medium from storage:
1) Obtain an MD5 checksum on the data on the medium.
2) See if that MD5 checksum matches the one that was made at the time that the backup was done.

Here is how I would implement it.

Create the backup. Let's use the following conditions. We are going to create a tar archive file on a disk partition mounted on /mnt/backup. The tar file will be named test.tar.
Code:
root> tar -c --verify -vf /mnt/backup/test.tar .
Okay so I just executed that and it worked. The tar file was created and it was verified. Now let's create an MD5 checksum for this known good file.
Code:
root> md5sum /mnt/backup/test.tar
9888fdc35f1233a021d6f2c2e4ce5a6f  /mnt/backup/test.tar
Now write that MD5 checksum onto the backup medium.

When you retrieve the backup medium from storage you just run an MD5 checksum on the backup image and compare it to the one that is written on the medium.
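
A minimal sketch of the store-then-recheck routine described above, assuming GNU tar and md5sum from coreutils. The scratch directory and file names are made up for illustration. Saving the checksum to a .md5 file as well as writing it on the medium lets md5sum -c do the comparison for you.

```shell
#!/bin/sh
# Hypothetical scratch area standing in for /mnt/backup.
work=$(mktemp -d)
mkdir "$work/data"
echo "important stuff" > "$work/data/notes.txt"

# Create and verify the archive, as in the example above.
(cd "$work/data" && tar -c --verify -f "$work/test.tar" .)

# Record the checksum at backup time.
(cd "$work" && md5sum test.tar > test.tar.md5)

# Later, when the medium comes out of storage, re-check it.
if (cd "$work" && md5sum -c test.tar.md5 >/dev/null 2>&1); then
    status="archive matches recorded checksum"
else
    status="archive is damaged"
fi
echo "$status"
```

This only tells you whether the archive changed; it cannot repair anything.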

You can play this game with any medium whether it be tape, CD-RW, DVD-RW or EEPROM, or whatever. You could make the tar file on a hard disk and then use something like K3B to burn that file onto a CD or DVD which could then be mounted like a disk. You could then do an MD5 sum on the tar file on the CD or DVD to see if it matched the tar file MD5 sum on the hard disk. That would validate the burn onto CD or DVD. Actually K3B will do this for you if you ask it nicely.

Funny thing. I'm sure that I used to use tar to stream directly onto a DVD-RW, just like a tape drive. I just tried it and it wouldn't work. The only difference that I can think of is that I don't use ide-scsi emulation (the hdX=ide-scsi boot option) in the kernel any longer. That's why my example above uses a tar file on a hard disk partition that can be mounted to a normal mount point. The procedure will certainly work if you send your tar stream directly to a tape drive.

Anyway, that's my idea for verifying backup media out of storage, which I didn't think that the original question even asked about.

Last edited by stress_junkie; 09-09-2006 at 07:35 PM.
 
Old 09-09-2006, 07:35 PM   #6
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
getting the MD5SUM of the tarball is nice, but having a correct MD5SUM of a corrupt tarball is a very real possibility... for this reason, i would recommend taking it a step further by performing an MD5SUM of every single file which is gonna go into the tarball (in addition to getting an MD5 of the tarball after)... this way when you untar the tarball, you can check the actual files you care about for integrity...

let's say we are in the directory which contains everything we wanna back-up:
Code:
# skip the checksum list itself, which find would otherwise hash while it is still being written
find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5
now our backup directory will contain a checksum file which will get tarballed along with everything else... to verify the integrity of the files after untarring, just cd to the base dir and do a:
Code:
md5sum -c CHECKSUMS.md5
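
A small sketch of what the per-file list buys you, assuming GNU find, tar and md5sum. The directory and file names are invented for the demo, and the "bit rot" is simulated by overwriting one restored file.

```shell
#!/bin/sh
# Hypothetical scratch directory with two sample files.
work=$(mktemp -d)
mkdir "$work/src" "$work/restore"
echo "alpha" > "$work/src/a.txt"
echo "beta"  > "$work/src/b.txt"

# Checksum every file (skipping the list itself), then archive.
cd "$work/src" || exit 1
find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5
tar -cf "$work/backup.tar" .

# Restore elsewhere and simulate damage to one file.
cd "$work/restore" || exit 1
tar -xf "$work/backup.tar"
echo "garbage" > b.txt

# --quiet suppresses the OK lines, so only the damaged files are listed.
failed=$(md5sum --quiet -c CHECKSUMS.md5 2>/dev/null)
echo "$failed"
```

Only b.txt should be reported as failed, which tells you a.txt is still safe to use.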

Last edited by win32sux; 09-09-2006 at 07:41 PM.
 
Old 09-09-2006, 07:35 PM   #7
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
Quote:
Originally Posted by win32sux
getting the MD5SUM of the tarball is nice, but having a correct MD5SUM of a corrupt tarball is a very real possibility
We know that the tar file is good at the time that it is created because we used the --verify option in tar.

I have to admit that I like your idea of getting a checksum on every file to be archived. My procedure only lets us know if the tar file is bad from damage to the medium during storage. Your procedure lets us know which files are still good even if the tar file suffered some degradation.

Last edited by stress_junkie; 09-09-2006 at 07:58 PM.
 
Old 09-09-2006, 07:50 PM   #8
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
Quote:
Originally Posted by stress_junkie
We know that the tar file is good because we used the --verify option in tar.
hehe, sorry, i don't know how i missed that...

Quote:
I have to admit that I like your idea of getting a checksum on every file to be archived. My procedure only lets us know if the tar file is bad. Your procedure lets us know which files are still good even if the tar file suffered some degradation.
yeah, i think mixing both procedures would be great...

maybe something like:
Code:
cd /mnt/backup

rm -f CHECKSUMS.md5

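# skip the checksum list itself, which find would otherwise hash while it is still being written
find . -type f ! -name CHECKSUMS.md5 -exec md5sum {} + > CHECKSUMS.md5

# -f needs the archive name right after it, and GNU tar cannot --verify a
# compressed archive, so verify a plain tar first and compress it afterwards
tar -cvf /tmp/backup.tar --verify .
gzip -f /tmp/backup.tar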

cd /tmp

md5sum backup.tar.gz > backup.tar.gz.md5

mkisofs -R -J -pad -v -o backup.iso \
backup.tar.gz backup.tar.gz.md5

cdrecord dev=/dev/cdrom -pad -v -eject backup.iso

Last edited by win32sux; 09-09-2006 at 08:01 PM.
 
Old 09-09-2006, 07:58 PM   #9
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by Tinkster
While this is true it won't help him if the CD with the archive gets scratched while in storage ...
Maybe check out dvdisaster: "dvdisaster provides a margin of safety against data loss on CD and DVD media caused by aging or scratches. (..) dvdisaster is available for recent versions of the FreeBSD, Linux and Windows operating systems."
 
Old 09-09-2006, 08:06 PM   #10
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD, Raspbian, Arch
Posts: 2,331

Rep: Reputation: 357
Verifying good backups and good CD burns is necessary, but so is a test restore IMHO. I think tar/Linux is quite secure in making good backups, but I can't say the same for the Windows world. I know - you're not talking about the Windows world. However, I have personally experienced, and known others who have too, a "good, verified" backup that couldn't be restored at a later date (in the Windows world). So that lesson learned has been carried over even after I switched to Linux. Important stuff always gets a test restore, and that restore is done on a different computer. Using different hardware. It doesn't matter if the CD burner that burned the backup can read its own CD later, if that particular CD drive goes bad on you. You need to make sure that other CD drives can read your burned disk reliably and error-free as well.

And as Tink said, don't trust the media. CDs can go bad. Burn two copies. If you really want to be paranoid, use two different brands of (high quality) media and two different CD burners if you have them. Then next year, burn two more (but still hang on to the old ones). And on and on. Put one copy in your bank safe-deposit box, and mail the other one to a relative or friend in a different geographic location.
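
A rough sketch of a test restore, assuming GNU tar and diff. The paths are made up, and in real life the restore side would be a different machine reading the burned disc rather than a second directory on the same box.

```shell
#!/bin/sh
# Hypothetical source tree and an empty directory to restore into.
work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/restore"
echo "tax records" > "$work/src/docs/taxes.txt"
echo "photo index" > "$work/src/index.txt"

(cd "$work/src" && tar -cf "$work/backup.tar" .)

# Test restore: unpack into the empty directory.
tar -xf "$work/backup.tar" -C "$work/restore"

# diff -r exits 0 only if both trees hold the same files with the same contents.
if diff -r "$work/src" "$work/restore" >/dev/null; then
    restore_ok=yes
else
    restore_ok=no
fi
echo "restore ok: $restore_ok"
```

If the original files are no longer around to diff against, a checksum list stored inside the archive can serve as the reference instead.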
 
Old 09-09-2006, 08:12 PM   #11
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
don't forget to encrypt the files on the CD that you mail your relative or friend in a different geographic location (you'll need a separate set of test restores)...

Last edited by win32sux; 09-09-2006 at 08:45 PM.
 
Old 09-09-2006, 08:23 PM   #12
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
This is how I do encrypted backups. Actually I put tar backup files onto encrypted partitions on external USB drives, but I'm going to describe how to create an encrypted container file to hold a backup. Then the container file can be copied to some storage medium.

1) Create a container file the size of the backup medium. You only have to do this once.
Code:
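# dd does not accept a fractional size such as 4.7G; 4400 blocks of 1 MiB
# is roughly 4.6 GB, just under a single-layer DVD
dd if=/dev/urandom of=/var/backup/container.file bs=1M count=4400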
2) Mount the container file to a loop device using encryption.
Code:
losetup -e blowfish /dev/loop0 /var/backup/container.file
password: <enter your encryption key-password>
3) Create your file system through the loop device into the container file.
Code:
mkfs -t xfs /dev/loop0
4) Mount the loop device to some mount point.
Code:
mount -o sync /dev/loop0 /mnt/backup
5) Create your tar file in /mnt/backup. <See posts above.>
6) Unmount your container file.
Code:
sync
umount /mnt/backup
losetup -d /dev/loop0
7) Now use whatever method pleases you to copy the /var/backup/container.file to some medium.

I've heard that dm-crypt is better than crypto-loop. I just haven't started to use dm-crypt yet. I'm not sure that dm-crypt would apply to a container file anyway but I've heard that I should be using it for my USB disks.

Edited to change reiserfs to xfs in step 3. I've had a lot of trouble with reiserfs lately.

Last edited by stress_junkie; 10-03-2006 at 09:46 AM.
 
Old 09-09-2006, 10:02 PM   #13
edman007
Member
 
Registered: Sep 2003
Distribution: slackware-current
Posts: 173

Original Poster
Rep: Reputation: 30
i did read the man page. the verify option, like checksums, does just that: it verifies that the data is correct. but that is not what i want. i expect data on a CD to get corrupted over time; it sucks and i have had it happen many, many times. unfortunately verify will just tell me what i probably already know, that it's bad

what i am asking for is something that can correct the error, just like RAID. knowing that a drive is broken is nice, but it's not much help if you still can't get the data. RAID 5 adds parity and thus goes beyond "this drive is broke" and rebuilds the data so i get my files, instead of saying "it's broke so you lose" like RAID 0 does

dvdisaster looks like what i want, but i was kinda hoping for something that is installed on just about every distro so i don't have to go searching for the program to read it later
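
For parity on the tar file itself, one option nobody in the thread has mentioned is par2 (the par2cmdline package). This is only a sketch under that assumption: par2 is not installed by default on most distros, so it does not satisfy the "installed everywhere" wish, and the demo skips itself if the program is missing. The file name and the 10% redundancy figure are arbitrary choices.

```shell
#!/bin/sh
# Sketch only: needs par2cmdline, which most distros do not install by default.
work=$(mktemp -d)
cd "$work" || exit 1
result=skipped

if command -v par2 >/dev/null 2>&1; then
    # Stand-in for the real tar file: 1 MiB of random data.
    dd if=/dev/urandom of=backup.tar bs=1024 count=1024 2>/dev/null
    before=$(md5sum < backup.tar)

    # Add about 10% recovery data (backup.tar.par2 plus volume files).
    par2 create -r10 backup.tar >/dev/null 2>&1

    # Simulate a scratch: overwrite 16 bytes in the middle of the archive.
    printf 'XXXXXXXXXXXXXXXX' | dd of=backup.tar bs=1 seek=500000 conv=notrunc 2>/dev/null

    # Rebuild the damaged blocks from the recovery data.
    par2 repair backup.tar.par2 >/dev/null 2>&1
    after=$(md5sum < backup.tar)

    if [ "$before" = "$after" ]; then result=repaired; else result=failed; fi
fi
echo "par2 demo: $result"
```

The .par2 files would be burned next to the tar file. Putting a copy of the par2 program on the same disc saves hunting for it later.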
 
Old 10-10-2006, 02:00 PM   #14
mnauta
Member
 
Registered: Apr 2003
Posts: 152

Rep: Reputation: Disabled
Quote:
............

Create the backup. Let's use the following conditions. We are going to create a tar archive file on a disk partition mounted on /mnt/backup. The tar file will be named test.tar.
Code:
root> tar -c --verify -vf /mnt/backup/test.tar .
Okay so I just executed that and it worked. The tar file was created and it was verified. Now let's create an MD5 checksum for this known good file...........
.
Thanks for this excellent example. When I use the --verify it can't find the files because leading / is removed. How can I solve this?

Thanks
manuel
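
One possible way around the leading / problem, assuming GNU tar: tar stores member names without the leading slash, and as far as I can tell --verify then looks those relative names up from the current directory. Changing to the top of the tree and naming the files relative to it keeps the two in step. The temporary directories below are stand-ins for / and /mnt/backup.

```shell
#!/bin/sh
# Hypothetical tree standing in for / ; "etc" plays the directory to back up.
root=$(mktemp -d)
out=$(mktemp -d)
mkdir "$root/etc"
echo "hostname=box" > "$root/etc/config"

# Change to the top of the tree and name the files without a leading slash,
# so the names stored in the archive match what --verify looks up.
if (cd "$root" && tar -c --verify -f "$out/test.tar" etc); then
    verify_status=ok
else
    verify_status=failed
fi
echo "verify: $verify_status"
```

On a real system the equivalent would be something like cd / followed by tar -c --verify -vf /mnt/backup/test.tar etc, with etc replaced by whatever you are backing up.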
 
  

