LinuxQuestions.org
Old 09-21-2006, 10:27 AM   #1
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Rep: Reputation: 15
Creating tarball of a large, active, directory?


Greetings!

I have a directory that has over fifteen thousand small files in it. Tens or hundreds of files are created in this directory every second. I am attempting to create a tarball of this directory and am encountering an issue when doing so. Using the following command, I am able to successfully create a tarball of the directory when static:

find /really/big/directory -name 'files.*' -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

When running this command against the directory while files are actively being created in it, I only seem to grab the last several hundred files or so that were created (obvious from the creation times of the files in the extracted archive). When watching the tarball's size as it's being created (using "watch --interval=1 ls -al" in the archive directory), I see the archive file repeatedly grow and shrink, sometimes even zeroing out.

I'm sure this has something to do with the way that xargs is interpreting find's output, given that files are being created constantly, but I can't put my finger on the exact issue here, or how to fix it. If anyone has a suggestion or a resolution I would love to hear it!

Thanks in advance for your assistance.
 
Old 09-21-2006, 10:37 AM   #2
trickykid
Guru
 
Registered: Jan 2001
Posts: 24,133

Rep: Reputation: 198
Have your script or command look only for files older than a certain age, so it doesn't grab any files that are still being written to. Alternatively, write a script that runs lsof on the directory and excludes the files it reports as open.

man find
man lsof
 
Old 09-21-2006, 12:16 PM   #3
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,499

Rep: Reputation: 355
Quote:
Originally Posted by abegetchell
I'm sure this has something to do with the way that xargs is interpreting find's output, given that files are being created constantly, but I can't put my finger on the exact issue here, or how to fix it. If anyone has a suggestion or a resolution I would love to hear it!
The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available though, but Solaris ufs and zfs are doing that.
 
Old 09-21-2006, 12:28 PM   #4
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by trickykid
Have your script or command look only for files older than a certain age, so it doesn't grab any files that are still being written to. Alternatively, write a script that runs lsof on the directory and excludes the files it reports as open.

man find
man lsof
I think I may have it:

find /really/big/directory -name 'files.*' -mmin +1 -mmin -90 -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

This command should find all files that were last modified between one and ninety minutes ago. While there should be no files older than sixty minutes in this specific directory (they're archived hourly), I made it ninety minutes for a margin of safety.

I'll post after the next hourly job if this works. Thanks for the idea!
 
Old 09-21-2006, 12:31 PM   #5
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by jlliagre
The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available though, but Solaris ufs and zfs are doing that.
Unfortunately I do not have that capability on this system.
 
Old 09-21-2006, 01:29 PM   #6
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by abegetchell
I think I may have it:

find /really/big/directory -name 'files.*' -mmin +1 -mmin -90 -print0 | xargs -0 tar -czf /really/big/directory/archive/archive.tgz

This command should find all files that were last modified between one and ninety minutes ago. While there should be no files older than sixty minutes in this specific directory (they're archived hourly), I made it ninety minutes for a margin of safety.

I'll post after the next hourly job if this works. Thanks for the idea!
The above did not work. The results were the same as in the initial post: only the last few minutes' worth of files ended up in the tarball.

Last edited by abegetchell; 09-21-2006 at 01:53 PM.
 
Old 09-21-2006, 02:16 PM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Not quite sure why you require xargs here. Can't you put find in backticks as the final argument to tar? Doesn't your way create a new tarball iteratively, thus explaining why its size varies up and down as things proceed? Just speculating here, because I've never used xargs before, and I only think I know what it says in the man page.

--- rod.
 
Old 09-21-2006, 02:26 PM   #8
puffinman
Member
 
Registered: Jan 2005
Location: Atlanta, GA
Distribution: Gentoo, Slackware
Posts: 217

Rep: Reputation: 30
find is still finding things and continuously piping them to tar as it finds them, which seems a little unnecessary. Perhaps pipe the find output to a file and wait until it's all done, then give the list to tar?
 
Old 09-21-2006, 02:48 PM   #9
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by theNbomr
Not quite sure why you require xargs here. Can't you put find in backticks as the final argument to tar? Doesn't your way create a new tarball iteratively, thus explaining why its size varies up and down as things proceed? Just speculating here, because I've never used xargs before, and I only think I know what it says in the man page.

--- rod.
Well, xargs is required to get around the "argument list too long" issue. A great description of that problem, and an example of why and how I'm using xargs, can be found here:

http://www.gnu.org/software/coreutil...-list-too-long

I can't put find in backticks as the final argument because of the above issue.

Running the command:

tar -cvf out.tar `find . -name 'file.*'`

Produces the output:

-bash: /bin/tar: Argument list too long

This is why xargs is required.
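For reference, the limit behind this error can be inspected directly. This is only a minimal sketch, assuming a POSIX system where getconf is available; the variable name is illustrative.

```shell
# ARG_MAX caps the combined size of the arguments and environment handed
# to a new process; exceeding it produces "Argument list too long".
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"
```

xargs stays under this limit by spreading its input across as many separate command invocations as it needs.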

Last edited by abegetchell; 09-21-2006 at 02:52 PM.
 
Old 09-21-2006, 02:55 PM   #10
puffinman
Member
 
Registered: Jan 2005
Location: Atlanta, GA
Distribution: Gentoo, Slackware
Posts: 217

Rep: Reputation: 30
As per my previous post:
Code:
find /really/big/directory -name 'files.*' > /tmp/l33tfilez.txt; tar -czf /really/big/directory/archive/archive.tgz --files-from /tmp/l33tfilez.txt; rm /tmp/l33tfilez.txt
Look ma, no xargs!
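A slightly more defensive variant of the same idea might look like the sketch below. It assumes a tar that accepts --null together with --files-from (GNU tar does, with --null given first). The temporary directories only stand in for /really/big/directory and the archive location, and the variable names are hypothetical.

```shell
# Stand-ins for the real locations (hypothetical, for illustration only).
src=$(mktemp -d)     # plays the role of /really/big/directory
out=$(mktemp -d)     # archive kept outside the tree being scanned
for i in 1 2 3; do : > "$src/files.$i"; done

list=$(mktemp)
# Record the file names first, NUL-delimited so unusual names survive intact.
find "$src" -name 'files.*' -type f -print0 > "$list"
# One single tar invocation reads every name from the list.
tar -czf "$out/archive.tgz" --null --files-from "$list"
rm -f "$list"

archived=$(tar -tzf "$out/archive.tgz" | wc -l)
echo "archived $archived files"
```

Because tar is started exactly once, the archive is written in a single pass no matter how many names are in the list. Files created after find finishes are simply left for the next run.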
 
Old 09-21-2006, 03:27 PM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Quote:
Originally Posted by puffinman
find is still finding things and continuously piping them to tar as it finds them, which seems a little unnecessary. Perhaps pipe the find output to a file and wait until it's all done, then give the list to tar?
Yah, but...

Doesn't xargs invoke tar multiple times, and on each iteration, tar creates a new tarball, replacing any pre-existing one? The term continuous, here, seems to stretch the meaning, to me. The solution you point out later looks like the definitive solution.

Perhaps if the original xargs method used tar with the '-A' (append) option, rather than '-c' (create), the xargs solution would work.

--- rod.
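The overwrite described above can be reproduced on a small scale. This is a sketch under stated assumptions: the temporary paths are made up, and -n 100 is there only to force xargs to split a small sample. With thousands of files xargs splits on its own once the command line gets too long.

```shell
# Build 250 sample files in a scratch directory (hypothetical paths).
work=$(mktemp -d)
mkdir "$work/data"
for i in $(seq 1 250); do : > "$work/data/files.$i"; done

# xargs runs "tar -cf" once per batch of 100 names, and -c recreates
# the archive from scratch every time it runs.
(cd "$work/data" && find . -name 'files.*' -print0 | xargs -0 -n 100 tar -cf "$work/archive.tar")

# Only the final batch (250 - 2*100 = 50 files) is left in the archive.
count=$(tar -tf "$work/archive.tar" | wc -l)
echo "archive holds $count of 250 files"
rm -rf "$work"
```

That matches the symptoms in the first post: the archive grows, gets truncated when the next batch starts, and ends up holding only the last group of files.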
 
Old 09-21-2006, 03:35 PM   #12
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,004

Rep: Reputation: 304
Quote:
Originally Posted by jlliagre
The right way to do that is to use filesystem snapshots.

I don't know exactly which Linux file systems have that feature reliably available though, but Solaris ufs and zfs are doing that.
If the OP is using LVM, then it supports snapshots (LVM2). Otherwise, unionfs can be installed and used on top of whatever underlying filesystem is there to create your snapshots.
 
Old 09-21-2006, 03:36 PM   #13
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by theNbomr
Yah, but...

Doesn't xargs invoke tar multiple times, and on each iteration, tar creates a new tarball, replacing any pre-existing one? The term continuous, here, seems to stretch the meaning, to me. The solution you point out later looks like the definitive solution.

Perhaps if the original xargs method used tar with the '-A' (append) option, rather than '-c' (create), the xargs solution would work.

--- rod.
I tried the -A method, but given that this is a new tarball, that method wouldn't work. I suppose I could "pre-create" a tarball and then add files to it, but I am first going to try the method that puffinman suggests above. Getting ready to implement it now.
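For what it's worth, in GNU tar -A (--concatenate) joins whole archives together, while -r (--append) is the option that adds files to an archive, and neither can update a compressed archive. A rough sketch of the append route, assuming GNU tar and made-up scratch paths, with -n 100 only to force several batches:

```shell
# Build 250 sample files in a scratch directory (hypothetical paths).
work=$(mktemp -d)
mkdir "$work/data"
for i in $(seq 1 250); do : > "$work/data/files.$i"; done

# -r appends each batch to the same uncompressed archive instead of
# recreating it, so earlier batches are kept.
(cd "$work/data" && find . -name 'files.*' -print0 | xargs -0 -n 100 tar -rf "$work/archive.tar")

# Compress once, after the last append.
gzip "$work/archive.tar"

appended=$(gzip -dc "$work/archive.tar.gz" | tar -tf - | wc -l)
echo "archive holds $appended files"
rm -rf "$work"
```

A stale archive.tar from an earlier run would have to be removed first, otherwise the new files get appended to the old contents.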
 
Old 09-21-2006, 03:39 PM   #14
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by haertig
If the OP is using LVM, then it supports snapshots (LVM2). Otherwise, unionfs can be installed and used on top of whatever underlying filesystem is there to create your snapshots.
LVM? LVM?! We ain't got no stinkin' LVM!

I can't mess around with this system too much in regards to major system changes, as it is a very, very busy production system. I haven't researched unionfs at all, but I imagine implementing it is not a trivial task.
 
Old 09-21-2006, 04:06 PM   #15
abegetchell
Member
 
Registered: Mar 2006
Distribution: RHEL, Fedora, Ubuntu
Posts: 32

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by puffinman
As per my previous post:
Code:
find /really/big/directory -name 'files.*' > /tmp/l33tfilez.txt; tar -czf /really/big/directory/archive/archive.tgz --files-from /tmp/l33tfilez.txt; rm /tmp/l33tfilez.txt
Look ma, no xargs!
Look ma, no xargs indeed! Worked like a charm. 17,501 files tarred and feathered.

Thanks for your help!
 
  


All times are GMT -5.
