LinuxQuestions.org


abegetchell 09-21-2006 10:27 AM

Creating tarball of a large, active, directory?
 
Greetings!

I have a directory that has over fifteen thousand small files in it. Tens or hundreds of files are created in this directory every second. I am attempting to create a tarball of this directory and am encountering an issue when doing so. Using the following command, I am able to successfully create a tarball of the directory when static:

find /really/big/directory -name 'files.*' -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

When I run this command while files are actively being created in the directory, the archive only seems to contain the last several hundred files or so (this is obvious from the timestamps of the files in the extracted archive). When I watch the tarball's size as it is being created (using "watch --interval=1 ls -al" in the archive directory), I see the archive file repeatedly grow and shrink, sometimes even dropping to zero bytes.

I'm sure this has something to do with the way xargs interprets find's output while files are constantly being created, but I can't put my finger on the exact issue or how to fix it. If anyone has a suggestion or a resolution, I would love to hear it!

Thanks in advance for your assistance.

trickykid 09-21-2006 10:37 AM

Have your script or command look for files older than a certain age, so it isn't grabbing any files that are still being written to. Or possibly write a script that uses lsof on the directory to exclude such files....
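A minimal sketch of both suggestions combined, assuming bash, GNU find/grep/touch and, optionally, lsof. The function name `list_quiet_files` is hypothetical and the `files.*` pattern is taken from the original post. It prints matching files older than a given number of minutes and drops any that lsof reports as open. File names containing newlines would break the line-based matching:

```shell
#!/bin/bash
# list_quiet_files DIR MINUTES   (hypothetical helper name)
# Print 'files.*' entries directly in DIR that were last modified more than
# MINUTES minutes ago and that no process currently has open (per lsof).
list_quiet_files() {
    local dir=$1 minutes=$2 open_list
    open_list=$(mktemp) || return 1
    # lsof +d scans DIR non-recursively; -Fn prints each open file's name on a
    # line starting with 'n'. If lsof is not installed the list stays empty.
    if command -v lsof >/dev/null 2>&1; then
        lsof -Fn +d "$dir" 2>/dev/null | sed -n 's/^n//p' | sort -u > "$open_list"
    fi
    # -mmin +N: modified more than N minutes ago. grep -vxF drops any path
    # that appears verbatim in the open list (an empty list drops nothing).
    find "$dir" -maxdepth 1 -type f -name 'files.*' -mmin +"$minutes" \
        | grep -vxF -f "$open_list"
    rm -f "$open_list"
}
```

For example, `list_quiet_files /really/big/directory 1 > list.txt` produces a list that a single `tar -czf archive.tgz -T list.txt` run can read.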

man find
man lsof

jlliagre 09-21-2006 12:16 PM

Quote:

Originally Posted by abegetchell
I'm sure this has something to do with the way xargs interprets find's output while files are constantly being created, but I can't put my finger on the exact issue or how to fix it. If anyone has a suggestion or a resolution, I would love to hear it!

The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available, though; Solaris UFS and ZFS both support it.

abegetchell 09-21-2006 12:28 PM

Quote:

Originally Posted by trickykid
Have your script or command look for files older than a certain age, so it isn't grabbing any files that are still being written to. Or possibly write a script that uses lsof on the directory to exclude such files....

man find
man lsof

I think I may have it:

find /really/big/directory -name 'files.*' -mmin +1 -mmin -90 -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

This command should find all files that were last modified between one and ninety minutes ago. While no file in this specific directory should be more than sixty minutes old (they're archived hourly), I used ninety minutes as a margin of safety.
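To see exactly which files that window selects, here is a minimal sketch on throwaway files (the names are hypothetical), assuming GNU find and GNU touch for the `-d 'N minutes ago'` syntax. Note that `-mmin` tests the last-modification time, not a creation time:

```shell
#!/bin/bash
# Three scratch files with different (backdated) modification times.
win=$(mktemp -d)
touch "$win/files.fresh"                          # modified just now
touch -d '30 minutes ago' "$win/files.recent"     # inside the window
touch -d '2 hours ago' "$win/files.stale"         # older than 90 minutes

# -mmin +1: modified more than 1 minute ago; -mmin -90: less than 90 minutes ago.
find "$win" -name 'files.*' -mmin +1 -mmin -90
```

Only `files.recent` is printed. Narrowing the list this way changes which names reach xargs, but not how xargs passes them on to tar.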

I'll post after the next hourly job if this works. Thanks for the idea!

abegetchell 09-21-2006 12:31 PM

Quote:

Originally Posted by jlliagre
The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available, though; Solaris UFS and ZFS both support it.

Unfortunately I do not have that capability on this system.

abegetchell 09-21-2006 01:29 PM

Quote:

Originally Posted by abegetchell
I think I may have it:

find /really/big/directory -name 'files.*' -mmin +1 -mmin -90 -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

This command should find all files that were last modified between one and ninety minutes ago. While no file in this specific directory should be more than sixty minutes old (they're archived hourly), I used ninety minutes as a margin of safety.

I'll post after the next hourly job if this works. Thanks for the idea!

The above did not work. The results were the same as in the initial post - the last few minutes of files were added to the tarball.

theNbomr 09-21-2006 02:16 PM

Not quite sure why you need xargs here. Can't you put find in backticks as the final argument to tar? Doesn't your way create a new tarball iteratively, which would explain why its size goes up and down as things proceed? Just speculating here, because I've never used xargs before, and I only think I know what the man page says.

--- rod.

puffinman 09-21-2006 02:26 PM

find is still finding things and continuously piping them to tar as it finds them, which seems a little unnecessary. Perhaps pipe the find output to a file and wait until it's all done, then give the list to tar?

abegetchell 09-21-2006 02:48 PM

Quote:

Originally Posted by theNbomr
Not quite sure why you need xargs here. Can't you put find in backticks as the final argument to tar? Doesn't your way create a new tarball iteratively, which would explain why its size goes up and down as things proceed? Just speculating here, because I've never used xargs before, and I only think I know what the man page says.

--- rod.

Well, xargs is required to get around the "argument list too long" issue. A great description of that problem, and an example of why and how I'm using xargs, can be found here:

http://www.gnu.org/software/coreutil...-list-too-long

I can't put find in backticks as the final argument because of the above issue.

Running the command:

tar -cvf out.tar `find . -name 'file.*'`

Produces the output:

-bash: /bin/tar: Argument list too long

This is why xargs is required.
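The limit can be inspected directly, and a small experiment shows how xargs works around it: by splitting the list across several runs of the command. A sketch assuming GNU xargs; the 500,000-number input is just an arbitrary list too long for one command line:

```shell
#!/bin/bash
# Kernel limit on the bytes of arguments + environment passed to exec().
getconf ARG_MAX

# Feed about 3.4 MB of arguments to echo through xargs. echo prints one line
# per invocation, so the line count is the number of times xargs ran it.
runs=$(seq 1 500000 | xargs echo | wc -l)
echo "xargs ran echo $runs times"
```

Each of those runs is a separate command, which is harmless for echo but means a separate archive for every run of `tar -c`.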

puffinman 09-21-2006 02:55 PM

As per my previous post:
Code:

find /really/big/directory -name 'files.*' > /tmp/l33tfilez.txt; tar -czf /really/big/directory/archive/archive.tgz --files-from /tmp/l33tfilez.txt; rm /tmp/l33tfilez.txt
Look ma, no xargs!
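A slightly hardened version of the same two-step idea, as a sketch assuming bash and GNU find/tar: a private temporary file from mktemp instead of a fixed /tmp name, NUL-separated names (`find -print0` read back by tar's `--null -T`), and cleanup even when tar fails. The helper name `archive_from_list` is hypothetical:

```shell
#!/bin/bash
# archive_from_list SRC OUT   (hypothetical helper name)
archive_from_list() {
    local src=$1 out=$2 list rc
    list=$(mktemp) || return 1
    # Step 1: record the file names once; -print0 keeps unusual names intact.
    find "$src" -name 'files.*' -print0 > "$list"
    # Step 2: a single tar run reads the frozen list; --null matches -print0.
    tar -czf "$out" --null -T "$list"
    rc=$?
    rm -f "$list"
    return "$rc"
}

# Usage with the thread's placeholder paths:
# archive_from_list /really/big/directory /really/big/directory/archive/archive.tgz
```

If files can also be removed while the job runs, GNU tar reports the missing ones and exits with a non-zero status, so a wrapper script should check the return value.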

theNbomr 09-21-2006 03:27 PM

Quote:

Originally Posted by puffinman
find is still finding things and continuously piping them to tar as it finds them, which seems a little unnecessary. Perhaps pipe the find output to a file and wait until it's all done, then give the list to tar?

Yah, but...

Doesn't xargs invoke tar multiple times, with tar creating a new tarball on each iteration and replacing any pre-existing one? The term "continuous" seems to stretch the meaning here, to me. The solution you point out later looks like the definitive one.
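This explanation can be checked on a scratch directory. A minimal sketch, assuming GNU find, xargs and tar; the file names are made up, and `-n 3` only forces xargs to split a short list into batches the way it splits a long one once the command-line limit is reached:

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
cd "$demo"
for i in 1 2 3 4 5 6 7; do echo "data $i" > "files.$i"; done

# Seven names in batches of three: xargs runs tar -c three times,
# and every run truncates and rewrites archive.tgz.
find . -name 'files.*' -print0 | xargs -0 -n 3 tar -czf archive.tgz

# Only the last batch (a single file here) survives.
tar -tzf archive.tgz
```

The archive lists one entry out of seven: only the last batch survives, the same pattern as in the original post.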

Perhaps if the original xargs method used tar with the '-A' (append) option, rather than '-c' (create), the xargs solution would work.

--- rod.

haertig 09-21-2006 03:35 PM

Quote:

Originally Posted by jlliagre
The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available, though; Solaris UFS and ZFS both support it.

If the OP is using LVM, then it supports snapshots (LVM2). Otherwise, unionfs can be installed and used on top of whatever underlying filesystem is there to create your snapshots.
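A rough sketch of the LVM2 route, for data that does sit on a logical volume. The volume group `vg0`, logical volume `data`, snapshot name, mount point and 1G copy-on-write size are all hypothetical; it must run as root, and the volume group needs that much free space:

```shell
#!/bin/bash
# snapshot_and_tar OUT SUBDIR   (hypothetical helper; run as root)
# Assumes the data lives on the hypothetical LVM2 volume /dev/vg0/data.
snapshot_and_tar() {
    local out=$1 subdir=$2
    local lv=/dev/vg0/data snap=data_snap mnt=/mnt/data_snap rc=1

    # Point-in-time copy of the volume; 1G is the copy-on-write space for
    # blocks that change on the original while the snapshot exists.
    lvcreate --snapshot --size 1G --name "$snap" "$lv" || return 1
    mkdir -p "$mnt"
    # Read-only mount (an XFS snapshot would also need -o nouuid).
    if mount -o ro "/dev/vg0/$snap" "$mnt"; then
        # Nothing writes to the snapshot, so the file list cannot change under tar.
        tar -czf "$out" -C "$mnt" "$subdir"
        rc=$?
        umount "$mnt"
    fi
    # Remove the snapshot; left in place it keeps consuming copy-on-write space.
    lvremove -f "/dev/vg0/$snap"
    return "$rc"
}

# Example call (OUT must live outside the snapshot):
# snapshot_and_tar /root/archive.tgz really/big/directory
```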

abegetchell 09-21-2006 03:36 PM

Quote:

Originally Posted by theNbomr
Yah, but...

Doesn't xargs invoke tar multiple times, with tar creating a new tarball on each iteration and replacing any pre-existing one? The term "continuous" seems to stretch the meaning here, to me. The solution you point out later looks like the definitive one.

Perhaps if the original xargs method used tar with the '-A' (append) option, rather than '-c' (create), the xargs solution would work.

--- rod.

I tried the -A method, but given that this is a new tarball, that method wouldn't work. I suppose I could "pre-create" a tarball and then add files to it, but I am first going to try the method that puffinman suggests above. Getting ready to implement it now.
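A sketch of the "pre-create, then append" route, on scratch files with made-up names and assuming GNU tar. Two details matter: tar's append option is `-r` (`--append`), while `-A` (`--concatenate`) appends whole archives rather than files; and tar cannot append to a compressed archive, so the sketch appends to a plain .tar and gzips it at the end. `-n 3` only forces several tar runs on a short list:

```shell
#!/bin/bash
set -e
app=$(mktemp -d)
cd "$app"
for i in 1 2 3 4 5 6 7; do echo "data $i" > "files.$i"; done

# "Pre-create" an empty archive: -T /dev/null supplies an empty file list.
tar -cf archive.tar -T /dev/null

# Each xargs batch appends (-r) instead of recreating (-c) the archive.
find . -name 'files.*' -print0 | xargs -0 -n 3 tar -rf archive.tar

# Compress only after the last append; produces archive.tar.gz.
gzip archive.tar
tar -tzf archive.tar.gz
```

All seven names end up in archive.tar.gz even though tar ran three times.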

abegetchell 09-21-2006 03:39 PM

Quote:

Originally Posted by haertig
If the OP is using LVM, then it supports snapshots (LVM2). Otherwise, unionfs can be installed and used on top of whatever underlying filesystem is there to create your snapshots.

LVM? LVM?! We ain't got no stinkin' LVM!

I can't mess around with this system too much in regard to major system changes, as it is a very, very busy production system. I haven't researched unionfs at all, but I imagine implementing it is not a trivial task.

abegetchell 09-21-2006 04:06 PM

Quote:

Originally Posted by puffinman
As per my previous post:
Code:

find /really/big/directory -name 'files.*' > /tmp/l33tfilez.txt; tar -czf /really/big/directory/archive/archive.tgz --files-from /tmp/l33tfilez.txt; rm /tmp/l33tfilez.txt
Look ma, no xargs!

Look ma, no xargs indeed! Worked like a charm. 17,501 files tarred and feathered.

Thanks for your help!


All times are GMT -5.