[SOLVED] how to automate file processing in bash?

khandu · 06-11-2012, 02:52 AM

Ok

The heading might not have made sense..

Here is what I need to do

There is a perl script which takes a .txt file as input and writes the result to STDOUT.

So in essence we run it like this at the bash prompt:

Code:

./script.pl INPUTFILE1.txt > INPUTFILE1_RESULT.txt
./script.pl INPUTFILE2.txt > INPUTFILE2_RESULT.txt
...
...
./script.pl INPUTFILEn.txt > INPUTFILEn_RESULT.txt

Now there are over 500 txt files I need to run this script on. Each should generate a result .txt file (its name can contain the original filename for easy identification).

How do I do this??

Also, to take it to the next level..

Let's say there are multiple folders which contain INPUTFILE.txt files. Is it possible to do the same thing as above recursively, and also create a new output folder (if it does not exist) for each folder processed, with _RESULT at the end of the folder name? Of course I don't want to complicate things: I can manually go into each folder and point the result to a manually created folder.

Again solution to 1st problem itself will be great

if it makes any difference, the code will run in RHEL..

I cannot modify the .pl script.. so just want to modify the execution style on bash

Cheers

business_kid · 06-11-2012, 03:14 AM

NO bash expert here, so fix the syntax:

for i in /wherever/*.txt; do
    ./script.pl "$i" > "$i.results"   # quote "$i" so names with spaces survive
done

pan64 · 06-11-2012, 03:14 AM

Code:

for f in $(cat inputfilelist)
do
    ./script.pl "$f" > "$f.out"
done

I would try something like this, but it depends on inputfilelist: the filenames in it must not contain whitespace, because the shell splits the list on spaces, tabs and newlines.
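If inputfilelist holds one name per line, a while read loop avoids the whitespace problem. A minimal sketch, assuming that layout; the process function and the sample file names are made up for the demo and stand in for the real script and inputs:

```shell
# Demo in a scratch directory; every name below is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real script: upper-cases its input file to stdout.
process() { tr '[:lower:]' '[:upper:]' < "$1"; }

echo "hello" > "my file.txt"
echo "world" > "plain.txt"
printf '%s\n' "my file.txt" "plain.txt" > inputfilelist

# Read one file name per line; IFS= and -r keep spaces and backslashes intact.
while IFS= read -r f; do
    process "$f" > "$f.out"
done < inputfilelist
```

This still breaks on file names that contain a newline, which is where a NUL-separated list becomes necessary.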

grail · 06-11-2012, 05:15 AM

Would not the best option be to simply extend the perl script to cope with the new requirement?

414N · 06-11-2012, 05:40 AM

Code:

#!/bin/sh
# Executes script.pl over all the files specified on the CLI, preserving
# the file extensions of the original files

for FILE in "$@"
do
  FILEEXT=`echo "$FILE" | rev | cut -d. -f1 | rev`
  script.pl "$FILE" > "$(basename "$FILE" ".$FILEEXT")-RESULT.$FILEEXT"
done

pan64 · 06-11-2012, 05:49 AM

oh, no! see bash variable substitution:

Code:

fileext=${FILE##*.}
basename=${FILE%.*}
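As a quick illustration of what the two expansions produce (the sample path is made up):

```shell
FILE="reports/INPUTFILE1.txt"   # hypothetical sample path

fileext=${FILE##*.}    # remove the longest prefix matching "*."  -> the extension
basename=${FILE%.*}    # remove the shortest suffix matching ".*" -> name without extension

echo "$fileext"                         # txt
echo "$basename"                        # reports/INPUTFILE1
echo "${basename}-RESULT.${fileext}"    # reports/INPUTFILE1-RESULT.txt
```

Unlike the basename command, ${FILE%.*} keeps the directory part, so the result file ends up next to the input file.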

Nominal Animal · 06-11-2012, 05:51 AM

Using Bash:

Code:

find DIR(s)... -name '*.txt' ! -name '*_RESULT.txt' -print0 | while IFS= read -rd "" source ; do
   target="${source%.*}_RESULT.txt"
   ./script.pl "$source" > "$target"
done

Above, find emits the files to work on using ASCII NULs as separators, so even the weirdest file names are handled correctly. It looks for all files ending with .txt in DIR(s)... and their subdirectories, except those ending with _RESULT.txt, so that results from this or an earlier run are not fed back into the script.

The read -rd "" uses the Bash read built-in to read them into variable source one by one in the while .. ; do ... done loop.

target gets set to the value of source but with everything after the final . replaced with _RESULT.txt.
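For the second part of the question (a separate output folder named after the input folder plus _RESULT), target could be built from the directory and file parts of source instead. This is only a sketch under assumptions: the sample path is hypothetical, and source is assumed to contain at least one slash, as it does when find is started on a directory.

```shell
# Scratch directory so the demo does not touch real data.
cd "$(mktemp -d)" || exit 1
mkdir -p data/run1

source="data/run1/INPUTFILE1.txt"   # hypothetical path, as find would print it

dir=${source%/*}                    # data/run1
name=${source##*/}                  # INPUTFILE1.txt
outdir="${dir}_RESULT"              # data/run1_RESULT
mkdir -p "$outdir"                  # creates the folder only if it is missing
target="$outdir/${name%.*}_RESULT.txt"

echo "$target"                      # data/run1_RESULT/INPUTFILE1_RESULT.txt
```

Inside the loop, the dir, name, outdir, mkdir and target lines would take the place of the single target= line. Note that result files written this way still end in .txt, so a later find over the same tree would match them unless they are excluded.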

If you want to skip any existing result files, use

Code:

    [[ -e "$target" ]] || ./script.pl "$source" > "$target"

instead. The [[ -e ... ]] test checks for existence, and || means "or else", so the script is only run if $target does not exist yet. There is a short interval between the test and the redirection in which someone else could create $target. If that is a problem, use set -C to tell Bash not to redirect into existing files.
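A rough sketch of the set -C behaviour, run in a scratch directory; result.txt and the echo commands are only stand-ins for the real target and script:

```shell
# Scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1

echo "first" > result.txt

set -C                                   # noclobber: ">" refuses to overwrite
if ! echo "second" > result.txt; then    # the shell reports an error here
    echo "result.txt already exists, skipped"
fi
set +C                                   # back to the default behaviour

cat result.txt                           # first
```

Because the existence check and the file creation happen in one step, there is no gap for another process to slip into. With set -C active, >| can still be used to overwrite a file on purpose.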

414N · 06-11-2012, 09:29 AM

Quote:

Originally Posted by pan64

oh, no! see bash variable substitution:

Code:

fileext=${FILE##*.}
basename=${FILE%.*}

Thanks for the enhancement. I still need to get the hang of bash variable substitution.

theNbomr · 06-11-2012, 10:40 AM

Quote:

Originally Posted by grail

Would not the best option be to simply extend the perl script to cope with the new requirement?

Depends on how you define 'best', I suppose. My logic is that when a tool that does something very well already exists, I want to use that. In this case, that tool is find. Using the existing perl script also adheres to this principle. It is known to do one thing well, so use it that way, and build around it.

Nominal Animal demonstrates the power of find very nicely, and also demonstrates and explains a well structured way to approach the problem: iteration over a file set, especially with recursion, suggests using find. The output of find is a list, which feeds the while loop, so there is the iteration part of the problem solved. Once that part of the problem is addressed, he assembles a couple of key variables that are the arguments to the perl script. Finally, the last thing in each iteration is to invoke the perl script with arguments that are variables with nice human-readable names.

There is more to learn from Nominal's code than just how to solve the problem. It is a good demonstration of how to structure the solution.
--- rod.