LinuxQuestions.org
Old 10-27-2013, 01:56 PM   #16
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 1,856

Rep: Reputation: 873

You should not use gzip compression around a tar archive for important backups. Compressing tar archives that are unimportant (or that you have multiple copies of) is not a problem, but doing so with a critical tar backup is a bad idea.

If a gzipped archive has a single bit corrupted near its start, the tar file stored within is effectively lost. This is because common compression algorithms depend on coherency over long sections of a file to achieve their results. If the file cannot be decompressed, none of the archive's contents can be extracted. Read this to see exactly what is involved in attempting to fix a corrupted gzip file.

If you must save space, then compressing files individually within the archive helps, because it means that only some of the files will be ruined if there is any corruption. Unfortunately, tar does not allow you to easily do internal compression, but there are other *nix archive formats that do, including afio, dar and xar.
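If you want to stay with tar, one crude workaround is to compress each file first and then archive the already-compressed tree. A minimal sketch, assuming you work on a disposable staging copy (the paths here are only examples; gzip -r replaces every file with a .gz version in place):
Code:
cp -a /home/sneakyimp /tmp/staging      # work on a copy, never the original
gzip -r /tmp/staging                    # compress each file individually
tar -cf /media/DATA/bak-pergzip.tar /tmp/staging
With that layout a corrupted byte ruins at most one member file instead of the whole archive; afio and dar do this sort of per-file compression natively.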

You might also find this interesting for more thoughts on this subject.
 
Old 10-29-2013, 01:39 PM   #17
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Original Poster
Rep: Reputation: 50
Quote:
Originally Posted by syg00
Have you thought about trying NTFS? The support is much more mature on Linux.
I don't recall how the partition got formatted as exFat -- it's complicated, because I have a Windows partition and an Ubuntu partition on this machine's primary drive. It doubles as an audio workstation on weekends.


Quote:
Originally Posted by jefro
I agree that I'd have been cautious on using exfat. Not tested enough.

My thinking is that memory (ram) may be bad also.

Always, always, always test backups before you need them.
I agree it sounds like there may be a problem with exFat.
The machine runs for weeks at a time, so I think I disagree about bad memory.
YES backups are important. NO shortcuts next time.

Quote:
Originally Posted by suicidaleggroll
If your drive had plenty of space, then why did you make a tarball anyway? I only make tarballs when I need to transfer the files elsewhere (e.g. email/ftp), or when I absolutely need the compression. You clearly didn't need the compression, and you weren't transferring the file anywhere, so there was no need. I would be quite wary of a single 62 GB file regardless of the filesystem, but ESPECIALLY on FAT.
It's been my experience that transferring 200,000 tiny individual files takes A LOT longer than just compressing or extracting one monolithic archive. Plus I think I may actually have needed the space: I have room for 62GB, but maybe not for 200GB.

Quote:
Originally Posted by jpollard
Your described problem puts double load on a single filesystem and device when it works - which slows things down. That would make me lean toward a possible memory problem or disk, but nothing definite. Ext4 has had reliable (not perfect) service for several years without issues. I don't use the Fat/exfat/NTFS filesystems as they have too many issues with fragmentation, and don't support security properly.
Thanks for your thoughtful and very informative post. I reckon I should run some memtest or something. Not sure which I like less: bad memory or the thought that my drivers might be buggy.

Quote:
Originally Posted by rknichols
Does that run to completion with just a complaint that "nadanada" was not found in the archive? Can you successfully extract just a portion of your home directory, perhaps one or two subdirectories, to the ext4 filesystem?
I did try some -t listing of contents and also extracting subdirectories. I managed to extract one subdir to the ext4 by naming it specifically. Just to be clear, the original extraction worked for the most part and would extract a large number of directories but would hang at some point during the process, causing most of its contents to not be extracted. I am not certain but I believe the file that caused problems was a large TGZ archive inside this archive. I sort of gave up on trying to list the contents because it would take so long -- the file is quite large.
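For reference, this kind of listing and selective extraction can be done roughly like this (GNU tar strips the leading slash from member names, and "somedir" is just a placeholder for whichever subdirectory gets named):
Code:
# List the first entries without extracting anything:
tar -tzf /media/DATA/bak.tgz | head

# Extract a single subdirectory into a scratch area on the ext4 partition:
mkdir -p /home/sneakyimp/restore
cd /home/sneakyimp/restore
tar -xzf /media/DATA/bak.tgz home/sneakyimp/somedir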

Quote:
Originally Posted by ruario
You should not use gzip compression around a tar archive for important backups. Compressing tar archives that are unimportant (or that you have multiple copies of) is not a problem, but doing so with a critical tar backup is a bad idea.
In retrospect, I agree with you. Thanks for the additional links!
 
Old 10-29-2013, 03:10 PM   #18
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 271

Rep: Reputation: 54
First get what you can.
1. Extract those directories that you can.
2. Create a test area, using a copy of the tar zip and a copy of your home directory.
When you get something extracted and are sure it is good, then move it somewhere safer.
3. Try alternative decompression programs (WinZip, 7-Zip) to try to get the tar out without the gzip layer.
4. Try making copies of the tar zip and then diff back to the original. I expect this will not detect any differences.
5. It is possible to have marginal memory that only fails when exposed to certain use patterns.
A sure test is to move the hard drive to another machine and try the extraction.
Another way is to lie about how much memory your machine has so that it uses less. This changes the use pattern and just might affect the results.
6. A bad write to disk should show up as a checksum error on the sector. A corruption that happened before the write would not.
7. I have had problems with Linux automatically doing an LF conversion on some binary files when moving them to a FAT file system. I still have the corrupted files and have verified with bpe that bytes matching the LF code were changed to CRLF pairs. It does not do this to zip files, but it must guess on other file types. These files had some text headers and the rest of the format was binary, so it corrupted them. With binary data this is probably not completely reversible. Your problem could happen at any stray LF byte that was in the original gzip stream.
7b. To test this, grab the head of the tar wad and open it with a binary editor such as bpe (which may not be able to handle such a big file). Look for any text strings and see if they have LF codes or CRLF codes. Consider several approaches to ensure that Linux is not re-converting, such as copying some raw sectors of the file instead of using normal tools (which could invoke an automatic re-conversion back to LF). See the sketch after this list.
7c. If the above case is found, then you need to dd the entire file to ext4, get a converted version of the file, diff them, and then hand-examine each place where the LF conversion was done.
There is likely at least one CRLF-to-LF conversion that should not have been made.
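A rough sketch of what steps 3, 4 and 7b could look like on the command line (the paths reuse the filenames from earlier in the thread and are only examples):
Code:
# Test the gzip stream without extracting anything (step 3):
gzip -t /media/DATA/bak.tgz && echo "gzip stream OK"

# Compare a copy against the original (step 4); cmp reports the first differing byte:
cmp /media/DATA/bak.tgz /home/sneakyimp/bak/bak.tgz && echo "copies are identical"

# Dump the first 512 bytes in hex (step 7b); a gzip file should begin with the
# magic bytes 1f 8b, and stray 0d 0a pairs could hint at CR/LF rewriting:
dd if=/media/DATA/bak.tgz bs=512 count=1 2>/dev/null | od -A x -t x1z | head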

Last edited by selfprogrammed; 10-29-2013 at 03:22 PM.
 
Old 10-29-2013, 03:18 PM   #19
sneakyimp
Member
 
Registered: Dec 2004
Posts: 795

Original Poster
Rep: Reputation: 50
Just to be clear, I believe that I have successfully extracted the entire archive without any errors. To reiterate:
1) I compressed my home directory (/home/sneakyimp) on an ext4 partition to an exFat partition (/media/DATA) like so:
Code:
tar -cvzf /media/DATA/bak.tgz /home/sneakyimp
2) I wiped the ext4 partition and installed a new version of Ubuntu (v. 12.04.3) on it.

3) I tried to extract the archive I created before from its location on the exFat partition onto the recently reformatted ext4 partition but it kept choking.

4) I copied the entire .tgz archive from the exFat partition onto the ext4 partition and tried again to extract it:
Code:
cp /media/DATA/bak.tgz /home/sneakyimp/bak
cd /home/sneakyimp/bak
tar -xvzf bak.tgz
It would extract many files but would eventually hang with this error:
Code:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
5) I tried extracting the archive from the exFat drive to the exFat drive and everything worked:
Code:
cd /media/DATA
tar -xvzf bak.tgz
Weird, right?
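One quick sanity check, assuming both copies of the archive are still around, is to see whether they read back identically; if the checksums differ, the problem lies in the copy or in reading from the exFat partition rather than in tar itself:
Code:
md5sum /media/DATA/bak.tgz /home/sneakyimp/bak/bak.tgz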
 
Old 10-29-2013, 03:37 PM   #20
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 271

Rep: Reputation: 54
It wasn't clear if that had partially worked, or if you had got everything.
I assume you have not found the bug that causes direct tar extraction to fail.

Note 7.
An extraction of exFAT to exFAT may avoid automatic conversion as described.
I do not know if tar could be affected this way, but it seems important to verify, and other possibilities do not match the facts as well.


If you are done, then please mark this thread SOLVED.

I always make my tar archives a max of 500 MB to avoid large-scale disasters.
This can be automated with a script that invokes tar for each item on a list of directories (see the sketch below).
It also makes it much easier to deal with files of that size. You can always use directories to organize all the tar files from a particular backup run.
A serious backup puts the tar files on CD-ROM, in which case they are even more size limited.
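A minimal sketch of the per-directory idea, assuming a plain text file dirs.txt with one directory per line (the output path is made up):
Code:
mkdir -p /media/DATA/backup
# One tarball per directory, so a single corrupt archive only loses that directory:
while read -r dir; do
    tar -czf "/media/DATA/backup/$(basename "$dir").tgz" "$dir"
done < dirs.txt
For a hard size cap you could instead pipe one big tar stream through split -b 500M, at the cost of needing every piece intact to restore.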

When upgrading, it may be safer to just rename your home directory to something the upgrade cannot find. Then you can restore by moving directories back. Unless you are recreating the filesystem, such odd directories should be safe. I hate to have only one copy of data to restore from.

I actually have dual Linux partitions, where an upgrade only affects one partition at a time and the other holds another copy of all saved directories. After the upgrade the second partition holds general overflow data and a backup boot option.

Last edited by selfprogrammed; 10-29-2013 at 03:44 PM.
 
  

