[SOLVED] seeking safe way to run long (clock time) 'tar' command
How can I run a long elapsed time tar operation so that it can "resume" after an interruption without doing a complete restart of the tar command?
NOTE -- Please focus on suggestions that rely on a tar-archive as the result. I know there are other options, but tar-archives are an integral part of our operation.
I know that there is tar --append, which will let me put more files at the end of an existing archive. Also, there is tar --concatenate, which will put one or more existing source tar-archives onto the end of a target tar-archive.
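For reference, here is roughly how I picture those two options being used (the paths are invented for the example). Note that both only work on plain, uncompressed archives:

Code:
# add more files to the end of an existing, uncompressed archive
tar --append -f laptop1-snap.tar /home/alice/new-stuff
# glue a second, already-finished archive onto the end of the first
tar --concatenate -f laptop1-snap.tar extra-chunk.tar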
What I'm really having trouble with is how to know what I've already processed and which files I have yet to process.
When I connect a "staff" or "family" laptop to my home-office network, my file server grabs a tar-archive snapshot of that laptop's files. These often have long run times for a variety of reasons -- exclusions, quantity and size of new or changed files, number of snapshots running at the same time, etc. A different group of reasons results in one or more tar operations getting interrupted before they are completed.
I hate to use a win-dose example, but we used to be able to do something like this:
set the ARCHIVE bit on a group of files
xcopy /M {source} {destination} where {source} has the bit set
during the xcopy /M, each successful action cleared the ARCHIVE bit.
if the xcopy /M was interrupted or failed, a simple command repeat would only process files that still have the bit set
{giggle} one frequent use of this was to copy files onto diskette or cartridge media that were often quite small. You would xcopy /M repeatedly. It would fail when your {destination} media was full, then you'd repeat until you had a pile of media holding all of your {source} files.{giggle -still}
I get the feeling that you may be able to pipe some transport that has an ability to resume to tar.
Might look at rsync, scp, wget, maybe even ftp or http.
Haven't done that myself.
In the end you don't really care how the files got there; you simply want to end up with a tar or compressed tar file, correct?
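Maybe something along these lines (paths invented, and assuming ssh access to the laptop): let a transport that can resume do the copying, then build the archive locally afterwards.

Code:
# resumable copy first; re-run the same command after any interruption
rsync -a --partial laptop1:/home/ /srv/stage/laptop1/
# then the tar step is purely local and comparatively quick
tar -czf /srv/snap/laptop1.tar.gz -C /srv/stage/laptop1 .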
You are correct, with one additional detail. I don't want (can't use) an implementation that relies on creating one large archive that gets split after the fact. I know that I can tar this folder, tar that folder, ..., tar the last folder, and end up with a bunch of chunks. I would be able to restart the chunk that failed and continue with the remaining chunks. Such an implementation requires that the folders have similar content profiles. That is not my situation.
I'm hoping to find some technique that will checkpoint the operation in progress and enable me to resume if the operation gets interrupted or fails.
It's the interrupted creation of a compressed archive that is causing me grief.
I can tell tar to make chunks by declaring a tape size with tar --multi-volume --tape-length NNN ....
I can use all sorts of commands to feed lists of files to the tar command.
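For example, something like this (names and sizes invented):

Code:
# volumes of 1048576 x 1024 bytes (roughly 1 GiB each); multi-volume archives
# cannot be compressed, and tar prompts for the next volume name as each one
# fills (or --info-script can point at a script that supplies it)
find /mnt/laptop1 -type f -print0 |
    tar --create --multi-volume --tape-length=1048576 \
        --null --files-from=- --file=/srv/snap/laptop1.tar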
Here are the troubles:
For any given list of files, how do you know what has already been processed before the interruption?
(The old ms-dos command, xcopy, used file attributes to "mark files I've worked".)
If tar gets interrupted, you are more than likely left with a corrupt archive and must start over.
There is not enough space to make a massive tar-ball and then split the large file.
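The nearest I have come on my own is to do the bookkeeping by hand: split the file list into numbered chunks, tar each chunk to its own small archive, and treat a finished archive as the "done" marker for that chunk. Everything below is hypothetical (and it assumes no filenames contain newlines):

Code:
mkdir -p /srv/snap/lists /srv/snap/parts
cd /mnt/laptop1 || exit 1
find . -type f > /srv/snap/all-files.txt
split -d -l 5000 /srv/snap/all-files.txt /srv/snap/lists/chunk.
# to resume after an interruption, re-run just this loop:
# chunks that already produced an archive are skipped
for list in /srv/snap/lists/chunk.*; do
    part="/srv/snap/parts/$(basename "$list").tar.gz"
    [ -e "$part" ] && continue
    # write to a temp name so a half-written archive never looks "done"
    tar -czf "$part.tmp" -T "$list" && mv "$part.tmp" "$part"
done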
ASIDE
A previous post mentioned one ancient situation when this sort of processing was important.
I'm sure there are other, more current, situations when this sort of data collection chunking applies.
If readers know of chunking applications, please add your tuppence.
This sounds like a job for rsync.
1. Checkpointing: basically, it compares the source and target lists and figures out which files have changed since the last run (this includes identifying new files, which are deemed to have changed 100%).
2. Deltas: by default it only transmits differences, so for previously known files, it only sends the differences, which reduces bandwidth and speeds up transmission time. Obviously new files are sent completely.
3. Security: you can tell rsync to use ssh as the transport protocol
4. If you really want tar files (pref tar.gz to save size) at the end of the process, you can do that before or after transmission.
By the sounds of the limits on the sending system, I'd do it afterwards on the target end.
It has a fair number of options; some people even use it for local copying.
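Putting 1-3 into one command (host and path names invented); per point 4, the tar.gz step stays the same as in the earlier sketch, just run on the server end after the copy finally completes:

Code:
# 1+2: compare against what is already staged on the server and send only the
#      differences; --partial keeps partly-transferred files so a re-run resumes them
# 3:   -e ssh makes ssh the transport
rsync -az --partial -e ssh laptop1:/home/ /srv/stage/laptop1/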