LinuxQuestions.org


raevin 06-28-2012 07:57 PM

Question about creating split archives
 
The basic idea of what I want to do is this:

(Hopefully) use tar to create an archive with a maximum size of x. If the actual size of the data (y) is bigger than x, then split the archive into parts.

The tricky part of this, though, is being able to extract files from an archive without needing every part present. So basically, if I make a backup of a folder that has 3 files, "bob.avi" (2 MB), "alice.mpeg" (5 MB) and "tony.avi" (50 MB), and the archive was split into parts, I could still at least run tar -tf on archive1.tar and see if a file is there.

I know you can split archives in tar by using -M & -L (e.g.: tar -M -L 102400 -cf archive.tar videos/*). But if I run tar -tf on any of the generated archives, it says unexpected EOF. I know why it does that, but I'm just not sure if there's a way around it.

Secondly, I know I could write a script and stat each file it's trying to archive, but stat calls can add up on huge backups, and it isn't a very clean approach.
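
One way to get that property is to skip multi-volume mode entirely and build several independent archives, each capped at a given size, so any single part can be listed or extracted on its own. A minimal sketch, assuming GNU tar and GNU stat (the 10 GB limit, the videos/ path, and the archiveN.tar names are all placeholders):

#!/bin/bash
# Sketch: pack a directory into several independent tar archives, each kept
# under a size cap, so any single part can be listed with tar -tf on its own.
# (A file bigger than the cap still gets its own oversized part.)
LIMIT=$((10 * 1024 * 1024 * 1024))   # 10 GB per part (placeholder)
part=1
size=0
files=()

flush() {
    [ ${#files[@]} -eq 0 ] && return
    tar -cf "archive${part}.tar" "${files[@]}"
    part=$((part + 1))
    size=0
    files=()
}

for f in videos/*; do
    s=$(stat -c %s "$f")                     # size of this file in bytes
    [ $((size + s)) -gt "$LIMIT" ] && flush  # close the current part first
    files+=("$f")
    size=$((size + s))
done
flush                                        # write out the final part

This still stats each file once, but tar has to read every byte of each file anyway, so the stat overhead is negligible next to the archiving itself.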

business_kid 06-29-2012 03:32 AM

Try

tar command | split -b size | mkisofs stuff | cdrecord stuff
or whatever your choice is.

Home movies? BTW, I've seen people put away terabytes of home movies, but I've never seen them open the archives again and spend days/weeks looking at them. My son-in-law sent a gigabyte of useless footage of his new daughter :-/. Am I going to watch it a second time?

A better approach might be to sort things by date, shrink the file size (MPEG instead of AVI), and categorise them: "Victor's wedding 2001, where the cake blew up".
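
For what it's worth, that pipeline could look something like this with GNU tar and coreutils split (the 1G chunk size and the backup.tar.part prefix are just examples):

tar -cf - videos/ | split -b 1G - backup.tar.part

The catch, as noted in the reply below, is that reading anything back requires every piece, concatenated in order:

cat backup.tar.part* | tar -tf -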

raevin 06-29-2012 08:51 AM

Quote:

Originally Posted by business_kid (Post 4714809)
Try

tar command | split -b size | mkisofs stuff | cdrecord stuff
or whatever your choice is.

Home movies? BTW, I've seen people put away terabytes of home movies, but I've never seen them open the archives again and spend days/weeks looking at them. My son-in-law sent a gigabyte of useless footage of his new daughter :-/. Am I going to watch it a second time?

A better approach might be to sort things by date, shrink the file size (MPEG instead of AVI), and categorise them: "Victor's wedding 2001, where the cake blew up".

Actually this is for system backups, not movies, I was just using that as an example.

I don't want to use split, as it's no different from using tar with the -M & -L switches: you need the entire archive present to get a file out of one of the parts. That's exactly what I'm trying to avoid.

business_kid 06-29-2012 02:40 PM

To quote the Kerryman when asked for directions by a tourist:

"If I was you, I wouldn't start from here at all!"

What size is the archive you're creating? Why must it be one archive, and not many?
Once you put numbers out, solutions will suggest themselves.

raevin 06-29-2012 02:49 PM

Quote:

Originally Posted by business_kid (Post 4715246)
To quote the Kerryman when asked for directions by a tourist:

"If I was you, I wouldn't start from here at all!"

What size is the archive you're creating? Why must it be one archive, and not many?
Once you put numbers out, solutions will suggest themselves.

Size would ideally be 10 GB max, since disk space on the provided hardware isn't much (~50 GB VPSes). Archives because that's how the backup script is already implemented. I could do it another way, but it would be a very big hassle (worse than this, I'm sure). It's hard to explain unless you've used ObjectStorage (from SoftLayer) before, as that's what I'm using for storage.

business_kid 06-29-2012 05:23 PM

10GB is one hell of an archive.
The only guy I knew handling that sort of data used no archives, just complete hard disks. It was cheapest. He had boxes of 500 MB IDE and 1 TB SATA drives (probably 3 or 4 TB now), and they were his archives. Each one was backed up on another disk, needless to say. His output was disks full of TIFF files (for integrity) and JPEGs (for everyday use).

He had the knack of hot-swapping drives (in Windows), a trick I was never tempted to imitate, although I did do it once to prove I had the technique right. You can probably pull it off in Linux too, if you mount with -o noatime or unmount first.
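
For reference, the noatime option goes through mount's -o switch (the device and mount point below are only examples):

mount -o noatime /dev/sdb1 /mnt/archive
mount -o remount,noatime /mnt/archive    # for an already-mounted filesystem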

raevin 06-29-2012 05:26 PM

Quote:

Originally Posted by business_kid (Post 4715360)
10GB is one hell of an archive.
The only guy I knew handling that sort of data used no archives, just complete hard disks. It was cheapest. He had boxes of 500 MB IDE and 1 TB SATA drives (probably 3 or 4 TB now), and they were his archives. Each one was backed up on another disk, needless to say. His output was disks full of TIFF files (for integrity) and JPEGs (for everyday use).

He had the knack of hot-swapping drives (in Windows), a trick I was never tempted to imitate, although I did do it once to prove I had the technique right. You can probably pull it off in Linux too, if you mount with -o noatime or unmount first.

Well, I was using 10 GB as an example. I really just want to know if there's a solution to this, and if so, what it is.

business_kid 06-30-2012 03:08 AM

There is, but it's not an archive. Any archive format I am aware of has a part without which the whole archive is knackered. There are zipfix and its equivalents, but that's messing about.

The solution is called copying. Archiving is also CPU-intensive; look at the output of

time bzip2 /path/to/10G
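
For a rough comparison, one could time a faster compressor against bzip2 on the same file (the path is a placeholder; writing to /dev/null avoids creating output files):

time gzip -c /path/to/10G > /dev/null
time bzip2 -c /path/to/10G > /dev/null

And the plain-copy approach could be as simple as rsync (the destination path is an assumption; cp -a would do as well):

rsync -a videos/ /mnt/backup/videos/

leaving every file individually readable, with no archive to reassemble.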

raevin 06-30-2012 08:40 AM

Quote:

Originally Posted by business_kid (Post 4715555)
There is, but it's not an archive. Any archive format I am aware of has a part without which the whole archive is knackered. There are zipfix and its equivalents, but that's messing about.

The solution is called copying. Archiving is also CPU-intensive; look at the output of

time bzip2 /path/to/10G

While it might be intensive, if it's done during downtime it wouldn't be as much of a problem (if it's a problem at all).

