seeking HOWTO -- script mark files 'done' in a long list of files to process
There are all sorts of ways to generate a list of files that you want to process within a script. Does anyone have an "elegant" way to mark off each completed file?
My challenge occurs when the desired processing involves creation of a tar-archive or similar container. In those cases, the open-modify-close operations on the container result in a huge amount of overhead. In addition, there are often unwanted side effects with the resulting content of the container.
Using a loop:
Code:
# create list-of-files
find . -type f > todo.list
# get a filespec from the list, process it, mark it done
while IFS= read -r filespec; do
    process "$filespec"              # process: your per-file command
    echo "$filespec" >> done.list    # mark it done
done < todo.list
works well for operations such as filtering photo image files or video, altering standard parameters in documents, bulk changes to source code, and so on.
NOTE -- In ancient times, MS-DOS had a command 'xcopy' that could mark files when the copy completed.
If you are working your way down a list there's no need to 'mark it done'....
If you mean you may (for some odd reason) end up re-generating the list part way through or similar, I'd just create a 'done' dir and move each file into there immediately after you have finished with it. This is (part of) a classic technique for processing continuously incoming files.
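A minimal sketch of that 'done' directory technique (the directory names and the process() step are illustrative stand-ins, not anything from the thread):

```shell
#!/bin/sh
# Sketch: move each file into done/ the moment it has been processed.
# process() is a stand-in for whatever real per-file work you do.
process() { wc -c < "$1" > /dev/null; }

mkdir -p incoming done
printf 'hello\n' > incoming/a.txt    # sample input files
printf 'world\n' > incoming/b.txt

for f in incoming/*; do
    [ -e "$f" ] || continue    # glob matched nothing
    process "$f"
    mv "$f" done/              # mark it done by moving it aside
done
```

On restart, anything still in incoming/ is exactly the unprocessed remainder.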
I've had the same challenge and ended up renaming each file when it's been processed. I usually prefix the filename with "done-". The benefit of that is users can monitor the folder and see that files have or have not been processed. You can also have a file processed again by manually renaming and removing the "done-" prefix.
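The rename-with-prefix approach might look something like this (directory and file names are illustrative):

```shell
#!/bin/sh
# Sketch: prefix each file with "done-" after processing, and skip any
# file that already carries the prefix on a later run.
process() { cat "$1" > /dev/null; }   # stand-in for real processing

mkdir -p work
printf 'x\n' > work/report.txt        # sample input file

for f in work/*; do
    case "${f##*/}" in done-*) continue ;; esac   # already processed
    process "$f"
    mv "$f" "${f%/*}/done-${f##*/}"               # prefix marks it done
done
```

As noted above, a user watching the folder can see at a glance what has been handled, and stripping the prefix re-queues a file.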
Quote:
If you are working your way down a list there's no need to 'mark it done'....
If you mean you may (for some odd reason) end up re-generating the list part way through or similar, I'd just create a 'done' dir and move each file into there immediately after you have finished with it. This is (part of) a classic technique for processing continuously incoming files.
I like this idea for one class of files that I'll be processing -- media cards (SD, CF, thumb, etc) -- but it would be trouble for a live file system.
That said, it might work to use a 'done' folder and fill it with symlinks as the to-do list. Then I could remove the links as I process things, leaving behind what remains to do.
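The symlink variant could be sketched like this, so the live files never move; the todo/ directory and sample tree are assumed names:

```shell
#!/bin/sh
# Sketch: a directory of symlinks is the to-do list; remove each link
# once its target has been processed. The live files stay put.
process() { cat "$1" > /dev/null; }   # stand-in for real processing

mkdir -p live todo
printf 'a\n' > live/one.txt
printf 'b\n' > live/two.txt

# build the to-do list (absolute targets so the links resolve anywhere)
for f in live/*; do
    ln -sf "$(pwd)/$f" "todo/${f##*/}"
done

# work through the list; whatever is left in todo/ is still to do
for link in todo/*; do
    [ -e "$link" ] || continue
    process "$link"
    rm "$link"
done
```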
nw: like I said, it's a classic solution in, e.g., trading banks (trades come in as files initially).
Also, create a new dir e.g. every month for a) ease of finding stuff, b) avoiding the limit on the number of files per dir in the long run.
If this is really a long-term solution, you also need to archive off eventually, or you may run out of inodes, possibly even before running out of disk space.
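A per-month directory is a one-liner; the YYYY-MM layout here is just one plausible naming scheme:

```shell
#!/bin/sh
# Sketch: file completed work under archive/YYYY-MM so no single
# directory grows without bound.
month=$(date +%Y-%m)
mkdir -p "archive/$month"

printf 'x\n' > trade.txt       # sample completed file
mv trade.txt "archive/$month/"
```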
I'm sorry, I've used that term for decades, but then I'm a serious dinosaur.
In general, a 'filespec' is a file specification -- /path1/.../pathN/filename.type
If there is a network involved -- username@hostname:/path1/.../pathN/filename.type
Quote:
nw: like I said, it's a classic solution in, e.g., trading banks (trades come in as files initially).
Also, create a new dir e.g. every month for a) ease of finding stuff, b) avoiding the limit on the number of files per dir in the long run.
If this is really a long-term solution, you also need to archive off eventually, or you may run out of inodes, possibly even before running out of disk space.
All good points that I likely wouldn't have considered until things started failing.
To restate my original requirement, I need to make tar-balls from sets of files. These runs can take lots of wall-clock time. That means that there are lots of opportunities for the run to get interrupted by power or network troubles. It is okay to have tar-ball-1, tar-ball-2, ..., tar-ball-N of varied sizes. My primary concern is that I be able to (1) resume processing after an interruption, and (2) avoid processing input files repeatedly.
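Those two requirements can be met by combining the tar step with the 'done' move: an input only leaves the queue after its archive closed cleanly. A sketch, with one file per tar-ball for brevity (a real run would batch many files per archive; all names are illustrative):

```shell
#!/bin/sh
# Sketch: restartable tar-ball creation. Each input is archived, then
# moved to done/ only if tar succeeded. After an interruption, rerunning
# the script picks up whatever is still in input/ -- nothing is repeated.
mkdir -p input done out
for i in 1 2 3; do printf 'data\n' > "input/f$i"; done   # sample inputs

n=0
for f in input/*; do
    [ -e "$f" ] || continue
    n=$((n + 1))
    tar -cf "out/ball-$n.tar" "$f" && mv "$f" done/
done
```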
Y: so use a 'done' dir for ones that are complete, immediately after they are completed. This solves the restartability issue.
You may even (paranoia mode) touch a 'done' marker file just after completing a tar but before mv'ing the tar-ball to the done dir.
This deals with the faint possibility of failure at the last possible millisecond.
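That paranoid marker might be sketched as follows (the stage/ and safe/ names are assumptions); the && chain means a crash between any two steps leaves an unambiguous state behind:

```shell
#!/bin/sh
# Sketch: touch a .done flag after tar completes but before the move,
# so a failure in the final instant is still detectable on restart.
mkdir -p stage safe
printf 'x\n' > stage/input.txt              # sample input

tar -cf stage/ball.tar -C stage input.txt \
    && touch stage/ball.tar.done \
    && mv stage/ball.tar stage/ball.tar.done safe/
```

On restart: a tar-ball with no .done flag is suspect and should be rebuilt; one accompanied by its flag is known good.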
Quote:
The paranoid programmer assumes the system is out to get them and acts accordingly