LinuxQuestions.org
Old 12-15-2010, 10:46 AM   #1
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,681
Blog Entries: 31

Rep: Reputation: 176
Advanced file tree archiver needed


I need to archive some big file trees. The catch is that these trees contain a lot of sockets, which tar cannot handle, and they also contain files larger than 4GB (the largest is 115GB), which cpio cannot handle. The archive also needs to be streamed over the network as it is generated, so zip is not an option. Any ideas before I go off and design a new and better archiving format?
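To size up a tree like the one described, a small survey script can list the sockets and oversized files before choosing an archiver. This is only a sketch: the function name `survey_tree` and the 4 GiB default threshold are illustrative assumptions, not anything from the thread.

```python
import os
import stat


def survey_tree(root, big_file_bytes=4 * 2**30):
    """Return (socket_paths, big_file_paths) found under root.

    Symlinks are not followed: os.lstat() reports on the link itself.
    """
    sockets, big_files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk puts every non-directory entry, sockets included, in filenames.
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISSOCK(info.st_mode):
                sockets.append(path)
            elif stat.S_ISREG(info.st_mode) and info.st_size >= big_file_bytes:
                big_files.append(path)
    return sockets, big_files
```

Calling `survey_tree("/some/tree")` returns two lists of paths; the threshold can be lowered to match whatever size limit the candidate archive format has.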
 
Old 12-15-2010, 11:59 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
dar is competent. What's special about streaming over a network?

EDIT:

IDK if dar is able to handle the specific requirements you mentioned but it might be worth a look.

Regarding "streaming over a network": could that mean writing to a networked file system?

Last edited by catkin; 12-15-2010 at 12:04 PM.
 
Old 12-16-2010, 03:13 PM   #3
Skaperen
Original Poster
Quote:
Originally Posted by catkin View Post
dar is competent.
I'll have a look.

Quote:
Originally Posted by catkin View Post
What's special about streaming over a network?
It would prevent people from recommending zip.

Quote:
Originally Posted by catkin View Post
IDK if dar is able to handle the specific requirements you mentioned but it might be worth a look.
It appears that DAR only writes to files (with features like splitting the archive over many files, which could perhaps be on different disks). It has no way to say "output the archive to stdout".

I was not able to determine whether it supports file sizes beyond 4GB and/or sockets. Its inability to write to stdout is a showstopper, so, like a program that detects an error and exits immediately, I quit looking at it any further.

Quote:
Originally Posted by catkin View Post
Regarding "streaming over a network": could that mean writing to a networked file system?
What it means is that the archiver generates streamed output and cannot go back to what it wrote earlier and change it. TAR and CPIO can work that way. ZIP cannot, because zip maintains a file catalog at the end of the file and moves it as more files are added to the archive. It appears DAR is similar to ZIP in that it outputs only to disk files.
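The streaming property described above can be illustrated with Python's standard `tarfile` module, whose `"w|"` and `"r|"` modes write and read a tar archive strictly sequentially. This is a sketch rather than a recommendation: `WriteOnlyPipe` and `stream_tar` are made-up names, and the write-only object merely stands in for a network socket or pipe.

```python
import io
import tarfile


class WriteOnlyPipe:
    """Stand-in for a socket or pipe: bytes can only be appended, never revisited."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


def stream_tar(members, sink):
    """Write (name, payload) pairs to sink as a tar stream, without seeking."""
    # Mode "w|" tells tarfile that the target is a non-seekable stream.
    with tarfile.open(fileobj=sink, mode="w|") as archive:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))


sink = WriteOnlyPipe()
stream_tar([("a.txt", b"hello"), ("dir/b.txt", b"world")], sink)
stream = b"".join(sink.chunks)

# The receiver reads sequentially too ("r|"): each header arrives before its data.
with tarfile.open(fileobj=io.BytesIO(stream), mode="r|") as archive:
    names = [member.name for member in archive]
print(names)
```

Because each member's header precedes its data, the receiving side can create files as they arrive. A format that keeps its catalog at the end of the archive cannot offer that.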

EDIT:

DAR is stated in Wikipedia's Comparison_of_archive_formats to be based on TAR. Thus, it may not support sockets.

Last edited by Skaperen; 12-16-2010 at 03:19 PM.
 
Old 12-16-2010, 09:34 PM   #4
catkin
Thanks for further information.

dar can write to stdout. From the dar man page (my bolding):
Code:
-c, --create [<path>/]<basename>
    creates a backup with the name based on <basename>. All the slices will be created in the directory
    <path> if specified, else in the current directory. If the destination filesystem is too  small  to
    contain  all  the slices of the backup, the -p option (pausing before starting new slices) might be
    of interest. Else, in the case the filesystem is full, dar will suspend the operation,  asking  for
    the  user  to  make free space, then continue its operation. To make free space, the only thing you
    cannot do is to touch the slice being written. If the filename is "-" *and* no slice is  asked  for
    (no -s option) the archive is produced on the standard output allowing the user to send the
    resulting archive through a pipe.
 
Old 12-17-2010, 09:33 AM   #5
Skaperen
Original Poster
Quote:
Originally Posted by catkin View Post
Thanks for further information.

dar can write to stdout. From the dar man page (my bolding):
Code:
-c, --create [<path>/]<basename>
    creates a backup with the name based on <basename>. All the slices will be created in the directory
    <path> if specified, else in the current directory. If the destination filesystem is too  small  to
    contain  all  the slices of the backup, the -p option (pausing before starting new slices) might be
    of interest. Else, in the case the filesystem is full, dar will suspend the operation,  asking  for
    the  user  to  make free space, then continue its operation. To make free space, the only thing you
    cannot do is to touch the slice being written. If the filename is "-" *and* no slice is  asked  for
    (no -s option) the archive is produced on the standard output allowing the user to send the
    resulting archive through a pipe.
What it writes seems to be defective, or incomplete. For one thing, it wrote no file names I could see. I tried piping it into "dar -l -" and "dar -x -" (the latter being in /tmp) and it said the data was corrupt (and that message output itself had some garbled characters in it). Maybe the Ubuntu 10.10 amd64 packaging of it is broken?

Code:
( cd /some/test/data && dar -c - ) | ( cd /where/to/save && dar -x - )
 
Old 12-17-2010, 09:52 AM   #6
catkin
Quote:
Originally Posted by Skaperen View Post
What it writes seems to be defective, or incomplete. For one thing, it wrote no file names I could see. I tried piping it into "dar -l -" and "dar -x -" (the latter being in /tmp) and it said the data was corrupt (and that message output itself had some garbled characters in it). Maybe the Ubuntu 10.10 amd64 packaging of it is broken?

Code:
( cd /some/test/data && dar -c - ) | ( cd /where/to/save && dar -x - )
I have found dar to be very rigorously coded and of very high quality so it is unlikely that what "it writes seems to be defective, or incomplete". The man page does not say that the -x option path can be stdin so that is likely the problem. Denis Corbin, the developer, is very responsive in the dar mailing list so you could enquire there about whether dar can do what you want.
 
Old 12-17-2010, 10:25 AM   #7
Skaperen
Original Poster
Quote:
Originally Posted by catkin View Post
I have found dar to be very rigorously coded and of very high quality so it is unlikely that what "it writes seems to be defective, or incomplete". The man page does not say that the -x option path can be stdin so that is likely the problem. Denis Corbin, the developer, is very responsive in the dar mailing list so you could enquire there about whether dar can do what you want.
I no longer deal with mailing lists since mailing lists no longer deal with messages from non-subscribers. I do not subscribe to mailing lists that I am not a developer for. This is one of the reasons I'm using linuxquestions.org, ubuntuforums.org, fedoraforum.org, etc. I use linuxquestions.org as my place to get support for various software. I'm open to other suggestions which are web based. But I won't be doing mailing lists any further where it requires all that subscribing, etc. This is a lifestyle decision I have made for myself to keep my email manageable.

The commands "dar -l -" and "dar -x -" do seem to try to work. But they do give error messages:
Code:
lorentz/root /root 85# ( cd /home/tmp/test-data-to-send && dar -c - ) | ( cd /home/tmp/area-to-receive-data && dar -l - )
�FATAL error, aborting operation
Corrupted data read on pipe
lorentz/root /root 86# ( cd /home/tmp/test-data-to-send && dar -c - ) | ( cd /home/tmp/area-to-receive-data && dar -x - )
�FATAL error, aborting operation
Corrupted data read on pipe
lorentz/root /root 87# du -s /home/tmp/test-data-to-send
11520	/home/tmp/test-data-to-send
lorentz/root /root 88# ( cd /home/tmp/test-data-to-send && dar -c - ) | wc -c


 --------------------------------------------
 2986 inode(s) saved
 with 0 hard link(s) recorded
 0 inode(s) changed at the moment of the backup
 0 inode(s) not saved (no inode/file change)
 0 inode(s) failed to save (filesystem error)
 0 inode(s) ignored (excluded by filters)
 0 inode(s) recorded as deleted from reference backup
 --------------------------------------------
 Total number of inodes considered: 2986
 --------------------------------------------
 EA saved for 0 inode(s)
 --------------------------------------------
8084362
lorentz/root /root 89#
I looked further and found that it did output file names, but it output them all at the end of the stream. That would not work for a stream-based transfer, for the same reason zip (pkzip) would not work: the receiver needs to write the files to the target filesystem as they arrive on the stream. I'm already doing this with cpio at stream sizes of over 10TB, so there is no way to queue the data while waiting for the names. But now I'm also getting individual files over 4GB, which cpio cannot handle (some docs say the limit is 8GB, but it reports a file size failure for a 5GB file, and I have some files over 16GB anyway), and sockets, which tar cannot handle.
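The conflicting 4GB and 8GB figures are consistent with the fixed-width size fields in the two common ASCII cpio header formats: 11 octal digits in the portable "odc" format and 8 hexadecimal digits in the SVR4 "newc" format. The thread does not say which format was in use, so treat the mapping below as an assumption; the helper names are illustrative.

```python
# Width and radix of the file size field in each cpio header format.
SIZE_FIELDS = {
    "odc": (11, 8),    # portable ASCII: 11 octal digits
    "newc": (8, 16),   # SVR4 "new ASCII": 8 hexadecimal digits
}

GIB = 2 ** 30


def max_file_size(fmt):
    """Largest file size, in bytes, that the format's size field can express."""
    digits, base = SIZE_FIELDS[fmt]
    return base ** digits - 1


def fits(size, fmt):
    """True if a file of `size` bytes can be recorded in the given format."""
    return size <= max_file_size(fmt)


limits = {fmt: max_file_size(fmt) for fmt in SIZE_FIELDS}
for fmt, limit in limits.items():
    print(f"{fmt}: {limit} bytes (just under {(limit + 1) // GIB} GiB)")
```

Under that assumption, a 5GB file overflows the newc field but fits the odc one, which would explain documentation quoting 8GB while the tool rejects a 5GB file. Files over 16GB fit neither.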

I'm already starting the process of designing a new portable and open file collection archive/stream format.
 
  

