Creating tarball of a large, active, directory?
Greetings!
I have a directory containing over fifteen thousand small files, and tens or hundreds of new files are created in it every second. I am trying to create a tarball of this directory and am running into a problem.

When the directory is static, the following command creates the tarball successfully:
Code:
find /really/big/directory -name 'files.*' -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

When I run the same command while files are actively being created, I only seem to capture the last several hundred files or so (obvious from the creation dates and times of the files in the extracted archive). Watching the tarball's size as it is being created (using "watch --interval=1 ls -al" in the archive directory), I see the archive file repeatedly grow and shrink, sometimes even zeroing out.

I'm sure this has something to do with the way xargs is handling find's output while files are constantly being created, but I can't put my finger on the exact issue or how to fix it. If anyone has a suggestion or a resolution, I would love to hear it! Thanks in advance for your assistance. |
Create your script or command to look for files older than a certain age, so it's not grabbing any files that are still being written to. Alternatively, write a script that uses lsof on the directory and excludes any files it reports as open.
man find man lsof |
Quote:
I don't know exactly which Linux file systems have that feature reliably available though, but Solaris ufs and zfs are doing that. |
Quote:
Code:
find /really/big/directory -name 'files.*' -mmin +1 -mmin -90 -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

This command should find all files that were last modified between one and ninety minutes ago. While there should be no files older than sixty minutes in this specific directory (they're archived hourly), I made it ninety minutes for a margin of safety. I'll post after the next hourly job to report whether this works. Thanks for the idea! |
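For anyone wanting to sanity-check the -mmin window before trusting it in production, here is a small demo in a scratch directory (file names invented; assumes GNU touch's -d option):

```shell
# Demo of the -mmin +1 -mmin -90 window using files with forged mtimes.
mkdir -p /tmp/mmin-demo && cd /tmp/mmin-demo
touch -d '30 seconds ago' files.fresh    # under 1 min old: excluded by -mmin +1
touch -d '10 minutes ago' files.recent   # inside the 1-90 minute window
touch -d '2 hours ago'    files.old      # over 90 min old: excluded by -mmin -90
find . -name 'files.*' -mmin +1 -mmin -90
# Expect only ./files.recent to be printed.
```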
Not quite sure why you require xargs here. Can't you put find in backticks as the final argument to tar? Doesn't your way create a new tarball iteratively, thus explaining why its size varies up and down as things proceed? Just speculating here, because I've never used xargs before, and I only think I know what it says in the man page.
--- rod. |
find is still finding things and continuously piping them to tar as it finds them, which seems a little unnecessary. Perhaps pipe the find output to a file and wait until it's all done, then give the list to tar?
|
Quote:
http://www.gnu.org/software/coreutil...-list-too-long I can't put find in backticks as the final argument because of the above issue. Running the command:
Code:
tar -cvf out.tar `find . -name 'file.*'`
produces the output:
Code:
-bash: /bin/tar: Argument list too long
This is why xargs is required. |
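For reference, the limit being hit here is the kernel's cap on the combined size of command-line arguments plus environment, which can be inspected with getconf (the exact value varies by system):

```shell
# Print the maximum combined size of argv + environment, in bytes.
# A backtick expansion longer than this fails with "Argument list too
# long"; xargs exists precisely to batch arguments under this limit.
getconf ARG_MAX
```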
As per my previous post:
Code:
find /really/big/directory -name 'files.*' > /tmp/l33tfilez.txt; tar -czf /really/big/directory/archive/archive.tgz --files-from /tmp/l33tfilez.txt; rm /tmp/l33tfilez.txt |
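A possible refinement of that command, assuming GNU find and tar: pairing find -print0 with tar's --null option keeps the list robust if any filename ever contains whitespace or a newline (the temp-file name here is invented):

```shell
# List-then-archive with NUL-separated names, safe for odd filenames.
# Paths and the 'files.*' pattern come from the original post.
find /really/big/directory -name 'files.*' -print0 > /tmp/filelist.bin
tar -czf /really/big/directory/archive/archive.tgz \
    --null --files-from /tmp/filelist.bin
rm /tmp/filelist.bin
```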
Quote:
Doesn't xargs invoke tar multiple times, and on each iteration, tar creates a new tarball, replacing any pre-existing one? The term continuous, here, seems to stretch the meaning, to me. The solution you point out later looks like the definitive one. Perhaps if the original xargs method used tar with '-r' (--append, which adds files to an existing archive) rather than '-c' (create), the xargs approach would work, although note that tar cannot append to a compressed (-z) archive. --- rod. |
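The overwriting behavior described above is easy to reproduce in a scratch directory. Forcing xargs to pass only two files per tar invocation (-n 2, used here just to trigger batching at small scale), each 'tar -c' replaces the previous archive, so only the final batch survives:

```shell
# Demo: each tar -c invoked by xargs overwrites the previous archive.
tmp=$(mktemp -d) && cd "$tmp"
touch files.1 files.2 files.3 files.4 files.5
# -n 2 forces a new tar invocation every two files: batches of 2, 2, 1.
find . -name 'files.*' -print0 | xargs -0 -n 2 tar -czf archive.tgz
tar -tzf archive.tgz    # only the last batch (here, a single file) remains
```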
Quote:
I can't mess around with this system too much in terms of major changes, as it is a very, very busy production system. I haven't researched unionfs at all, but I imagine implementing it is not a trivial task. |
Quote:
Thanks for your help! |