LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 07-21-2016, 11:31 AM   #1
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771
Blog Entries: 3

Rep: Reputation: 108
seeking HOWTO -- script mark files 'done' in a long list of files to process


There are all sorts of ways to generate a list of files that you want to process within a script. Does anyone have an "elegant" way to mark off each file as it is completed?

My challenge occurs when the desired processing involves creation of a tar-archive or similar container. In those cases, the open-modify-close operations on the container result in a huge amount of overhead. In addition, there are often unwanted side effects with the resulting content of the container.

Using a loop:
Code:
    # create list-of-files
    # Get a filespec from the list
    # process it
    # mark-it done
works well for operations such as filtering photo image files or video, altering standard parameters in documents, bulk changes to source code, and so on.

NOTE -- In ancient times, MS-DOS had a command 'xcopy' that could mark files when the copy completed.

Thanks in advance,
~~~ 0;-Dan
 
Old 07-22-2016, 05:09 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,849

Rep: Reputation: 7309
looks like you need some kind of make tool (like make itself)
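
A rough shell sketch of the core idea behind make, for readers who have not used it: redo the work for a file only when the file is newer than a stamp left by the previous run. The names `process_file`, `process_if_stale`, and the stamp directory are hypothetical and not from this thread; make itself expresses the same rule in a Makefile instead.

```shell
#!/bin/sh
# Sketch of make's up-to-date check using stamp files.
# process_file and process_if_stale are illustrative names (assumptions).

process_file() {
    # Placeholder for the real work; here it only reports the file name.
    printf 'processing %s\n' "$1"
}

process_if_stale() (
    src=$1
    stamp_dir=$2
    mkdir -p "$stamp_dir" || exit 1
    stamp=$stamp_dir/$(basename "$src").stamp
    # Work is needed when there is no stamp yet,
    # or when the source was modified after the stamp was written.
    if [ ! -e "$stamp" ] || [ "$src" -nt "$stamp" ]; then
        process_file "$src" && touch "$stamp"
    fi
)
```

Calling `process_if_stale /some/file /some/stamps` twice in a row should do the work only the first time, unless the file changes in between.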
 
Old 07-22-2016, 07:39 AM   #3
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Quote:
Originally Posted by SaintDanBert
MS-DOS had a command 'xcopy' that could mark files when the copy completed.
Smacks of rsync.

What the heck does "mark-it done" mean?

Last edited by Habitual; 07-22-2016 at 07:47 AM.
 
Old 08-03-2016, 02:53 AM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
If you are working your way down a list there's no need to 'mark it done'....
If you mean you may (for some odd reason) end up re-generating the list part way through or similar, I'd just create a 'done' dir and move each file into there immediately after you have finished with it. This is (part of) a classic technique for processing continuously incoming files.
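
A minimal sketch of the 'done' dir technique, assuming a hypothetical `process_file` command and caller-supplied directory names; none of these names come from the thread.

```shell
#!/bin/sh
# Sketch: move each file into a 'done' directory right after processing it.
# process_file and process_all are illustrative names (assumptions).

process_file() {
    # Placeholder for the real work; here it only reports the file name.
    printf 'processing %s\n' "$1"
}

process_all() (
    todo_dir=$1
    done_dir=$2
    mkdir -p "$done_dir" || exit 1
    for f in "$todo_dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and the empty-glob case
        if process_file "$f"; then
            mv -- "$f" "$done_dir"/  # only successfully processed files are moved
        fi
    done
)
```

After an interruption, running `process_all incoming done` again only sees the files still left in `incoming`, so nothing already moved is processed twice.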
 
Old 08-03-2016, 10:09 AM   #5
tfjonesjr
LQ Newbie
 
Registered: Jan 2009
Location: Austin, TX
Distribution: CentOS, Ubuntu
Posts: 1

Rep: Reputation: 1
I've had the same challenge and ended up renaming each file when it's been processed. I usually prefix the filename with "done-". The benefit of that is users can monitor the folder and see that files have or have not been processed. You can also have a file processed again by manually renaming and removing the "done-" prefix.
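
A minimal sketch of the rename approach, assuming a hypothetical `process_file` command; the function name `process_and_rename` is also just for illustration.

```shell
#!/bin/sh
# Sketch: prefix each processed file with "done-" and skip such files on later runs.
# process_file and process_and_rename are illustrative names (assumptions).

process_file() {
    # Placeholder for the real work; here it only reports the file name.
    printf 'processing %s\n' "$1"
}

process_and_rename() (
    cd "$1" || exit 1
    for f in *; do
        [ -f "$f" ] || continue
        case $f in
            done-*) continue ;;      # already processed on an earlier run
        esac
        if process_file "$f"; then
            mv -- "$f" "done-$f"
        fi
    done
)
```

To have a file processed again, rename it by hand to drop the "done-" prefix and rerun `process_and_rename` on the folder.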
 
Old 08-08-2016, 12:59 PM   #6
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771

Original Poster
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by Habitual
Smacks of rsync.

What the heck does "mark-it done" mean?
mark-it-done ===>
  • get a filespec from the to-do list
  • process that filespec
  • somehow record that you worked that filespec
    • delete the filespec from the to-do list
    • alter the to-do list entry for that filespec
    • write a separate done-list with that filespec
    • ...
  • repeat for all un-processed filespecs

If processing must be restarted, you can avoid processing items with the mark-it-done status and resume work with those that you have not processed.
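
The "separate done-list" variant of the steps above can be sketched roughly as follows. It assumes a to-do list with one filespec per line (so no newlines inside names) and a hypothetical `process_file` command; `run_todo` is an illustrative name.

```shell
#!/bin/sh
# Sketch: record each finished filespec in a done-list and skip recorded ones on restart.
# process_file and run_todo are illustrative names (assumptions).

process_file() {
    # Placeholder for the real work; here it only reports the filespec.
    printf 'processing %s\n' "$1"
}

run_todo() (
    todo_list=$1
    done_list=$2
    touch "$done_list" || exit 1
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        # -F fixed string, -x whole line, -q quiet: exact match against the done-list
        if grep -Fxq -- "$f" "$done_list"; then
            continue
        fi
        if process_file "$f"; then
            printf '%s\n' "$f" >> "$done_list"
        fi
    done < "$todo_list"
)
```

The to-do list is never modified; a restart simply reruns `run_todo todo.list done.list` and picks up at the first filespec that is not yet recorded.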

~~~ 0;-Dan
 
Old 08-08-2016, 01:03 PM   #7
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771

Original Poster
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by chrism01
If you are working your way down a list there's no need to 'mark it done'....
If you mean you may (for some odd reason) end up re-generating the list part way through or similar, I'd just create a 'done' dir and move each file into there immediately after you have finished with it. This is (part of) a classic technique for processing continuously incoming files.
I like this idea for one class of files that I'll be processing -- media cards (SD, CF, thumb, etc) -- but it would be trouble for a live file system.

That said, it might work to use a done folder and fill it with symlinks as the to-do list. Then I could remove the links as I process things, leaving behind what remains to-do.
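
A rough sketch of that symlink idea, assuming absolute source paths, unique base names, and a hypothetical `process_file` command; `queue_files` and `work_queue` are illustrative names.

```shell
#!/bin/sh
# Sketch: a folder of symlinks serves as the to-do list; a link is removed once
# its target has been processed, so the live file system itself is never touched.
# process_file, queue_files and work_queue are illustrative names (assumptions).

process_file() {
    # Placeholder for the real work; here it only reports the link path.
    printf 'processing %s\n' "$1"
}

# Create one symlink per source file. Link names are the base names,
# so this assumes absolute source paths and no two files sharing a name.
queue_files() (
    todo_dir=$1
    shift
    mkdir -p "$todo_dir" || exit 1
    for src in "$@"; do
        ln -s "$src" "$todo_dir/$(basename "$src")"
    done
)

work_queue() (
    todo_dir=$1
    for link in "$todo_dir"/*; do
        [ -L "$link" ] || continue
        if process_file "$link"; then
            rm "$link"               # removes only the link, not the original file
        fi
    done
)
```

Whatever links are still in the folder after an interruption are exactly the items that remain to be done.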

Nice thinking, chrism01
~~~ 0;-Dan
 
Old 08-08-2016, 01:05 PM   #8
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771

Original Poster
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by pan64
looks like you need some kind of make tool (like make itself)
Ah, the venerable make... and its descendants. I'll need to ponder this option a looooooooonnnnggggg time.

Thanks,
~~~ 0;-Dan
 
Old 08-08-2016, 01:19 PM   #9
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
what the hell is a filespec ?
 
Old 08-09-2016, 12:42 AM   #10
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
nw: like I said, it's a classic soln in e.g. trading banks (trades come in as files initially).
Also, create a new dir every e.g. month for a) ease of finding stuff, b) avoiding the limit on number of files per dir in the long run.
If this is really a long-term soln, you also need to archive off eventually, or you may run out of inodes, possibly even before running out of disk space.
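
A small sketch of the per-month 'done' dir, assuming a year-month naming scheme such as `done/2016-08`; the function name `archive_done` and the layout are illustrative, not from the thread.

```shell
#!/bin/sh
# Sketch: file each finished item under a per-month subdirectory of the done dir.
# archive_done and the YYYY-MM layout are illustrative choices (assumptions).

archive_done() (
    f=$1
    done_root=$2
    month_dir=$done_root/$(date +%Y-%m)
    mkdir -p "$month_dir" || exit 1
    mv -- "$f" "$month_dir"/
)
```

For the inode concern, `df -i` on the filesystem holding the done dir reports inode usage alongside the usual block usage.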
 
Old 08-09-2016, 02:49 PM   #11
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771

Original Poster
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by schneidz
what the hell is a filespec ?
I'm sorry, I've used that term for decades, but then I'm a serious dinosaur.
In general, a 'filespec' is a file specification -- /path1/.../pathN/filename.type
If there is a network involved -- username@hostname:/path1/.../pathN/filename.type
 
Old 08-09-2016, 02:55 PM   #12
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771

Original Poster
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by chrism01
nw: like I said, it's a classic soln in e.g. trading banks (trades come in as files initially).
Also, create a new dir every e.g. month for a) ease of finding stuff, b) avoiding the limit on number of files per dir in the long run.
If this is really a long-term soln, you also need to archive off eventually, or you may run out of inodes, possibly even before running out of disk space.
All good points that I'd likely not have considered until things started failing.

To restate my original requirement, I need to make tar-balls from sets of files. These runs can take lots of wall-clock time. That means that there are lots of opportunities for the run to get interrupted by power or network troubles. It is okay to have tar-ball-1, tar-ball-2, ..., tar-ball-N of varied sizes. My primary concern is that I be able to (1) resume processing after an interruption, and (2) avoid processing input files repeatedly.
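
One possible shape for such a run, as a sketch only: pack the to-do list into numbered tar-balls of a fixed batch size and record each packed batch in a done-list, so a rerun skips what is already archived. It assumes one path per line in the to-do list and a tar that accepts `-T` (GNU tar and bsdtar do); `make_batches`, `done.list`, and the `.part` suffix are illustrative names.

```shell
#!/bin/sh
# Sketch: resumable batching of a to-do list into tar-ball-1.tar, tar-ball-2.tar, ...
# make_batches, done.list, batch.tmp and the .part suffix are assumptions.

make_batches() (
    todo_list=$1      # one path per line
    out_dir=$2
    batch_size=$3
    done_list=$out_dir/done.list
    batch=$out_dir/batch.tmp
    mkdir -p "$out_dir" || exit 1
    touch "$done_list" || exit 1
    n=1
    while :; do
        # Next batch = first $batch_size to-do entries not yet in the done-list.
        while IFS= read -r f; do
            [ -n "$f" ] || continue
            grep -Fxq -- "$f" "$done_list" || printf '%s\n' "$f"
        done < "$todo_list" | head -n "$batch_size" > "$batch"
        [ -s "$batch" ] || break
        # Pick the first unused tar-ball number.
        while [ -e "$out_dir/tar-ball-$n.tar" ]; do n=$((n + 1)); done
        # Write under a temporary name so an interrupted run never leaves
        # a half-written file that looks like a finished tar-ball.
        tar -cf "$out_dir/tar-ball-$n.tar.part" -T "$batch" || exit 1
        mv "$out_dir/tar-ball-$n.tar.part" "$out_dir/tar-ball-$n.tar"
        cat "$batch" >> "$done_list"
    done
    rm -f "$batch"
)
```

Each tar-ball is written once and never reopened, which avoids the open-modify-close overhead. If the run dies between the `mv` and the done-list update, that one batch is packed again into the next tar-ball on restart; nothing is lost, but that batch is duplicated.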

Thanks,
~~~ 0;-Dan
 
Old 08-10-2016, 03:35 AM   #13
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Y: so use a 'done' dir and move each one into it immediately after it is completed. This solves the restartability issue.
You may even (paranoia mode) touch a done file just after completing a tar but before mv'ing the tar-ball to the done dir.
This deals with the faint possibility of failure right at the last possible millisecond.
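
A sketch of that paranoia-mode sequence, under the assumption that a marker file next to the tar-ball means "tar finished, mv not yet confirmed". The function name `finish_or_redo`, the `.done` suffix, and the directory arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: tar, then touch a marker, then mv. On restart the marker tells a
# complete-but-unmoved tar-ball apart from a half-written one.
# finish_or_redo and the .done marker suffix are illustrative (assumptions).

finish_or_redo() (
    name=$1         # tar-ball base name, e.g. tar-ball-1
    src_dir=$2      # directory to archive (must not contain work_dir)
    work_dir=$3
    done_dir=$4
    tarball=$work_dir/$name.tar
    marker=$work_dir/$name.done
    mkdir -p "$work_dir" "$done_dir" || exit 1
    [ -e "$done_dir/$name.tar" ] && exit 0   # already finished and moved
    if [ ! -e "$marker" ]; then
        # No marker: any tar-ball in work_dir may be incomplete, so rebuild it.
        rm -f "$tarball"
        tar -cf "$tarball" -C "$src_dir" . || exit 1
        touch "$marker"
    fi
    mv "$tarball" "$done_dir"/ && rm -f "$marker"
)
```

On restart, a tar-ball with a marker is simply moved, while one without a marker is rebuilt from scratch.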


Quote:
The paranoid programmer assumes the system is out to get them and acts accordingly

Last edited by chrism01; 08-10-2016 at 03:47 AM.
 
  

