LinuxQuestions.org


garrettderner 07-24-2008 04:56 PM

Best way to process a group of files (whole directory, wildcards etc.)
 
Maybe this is more a Programming than a Newbie question, I'm not sure. I made MS-DOS apps years and years ago, and now I am trying to learn the unix way.

Although there are certainly already many similar programs, I just made a tiny utility I call "unbreak", to reformat text files for easier e-book reading. It's in C and it uses stdin/stdout. You may see it here: http://derner.com/code/unbreak.c

Now that I have it working the way I like, I want to process groups of files, without having to specify each file. I want to be able to do something like:
unbreak raw/*.txt reformatted/
Of course, in the above example the shell would pass the names of all the *.txt files in raw/ (if any exist) as parameters, followed by the directory name at the end.

OK, I could easily change the code, to have it open and process each input file listed, and save the output files in the directory named at the end. But is that the best way? Am I missing something easier or more standard? Or maybe it would be simple with some generic shell script?

Garrett

unSpawn 07-24-2008 05:59 PM

If you don't have spaces in filenames then I don't see any glaring problems with your example. I mean something like 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/' looks much more convoluted compared to your simple globbing example, doesn't it? BTW for changing linebreaks there's also 'dos2unix' and 'unix2dos'. Haven't encountered any 'downgradewhatever2osx' tho ;-p
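
Since "unbreak" reads stdin and writes stdout, a find-based version needs a small inner shell to do the redirection for each file. The following is only a rough sketch under that assumption: filter_tree is a made-up helper name, cat stands in for unbreak, and -iname is a GNU/BSD find extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: filter_tree cmd src_dir dst_dir
# Runs a stdin/stdout filter on every *.txt under src_dir and writes a
# same-named file into dst_dir. find hands each path to the inner shell
# as one argument, so names containing spaces survive.
filter_tree() {
    cmd=$1 src=$2 dst=$3
    find "$src" -type f -iname '*.txt' -exec sh -c '
        cmd=$1 dst=$2
        shift 2
        for f in "$@"; do
            "$cmd" < "$f" > "$dst/${f##*/}" || exit 1
        done
    ' sh "$cmd" "$dst" {} +
}

# Demo with cat standing in for unbreak (assumption: mktemp is available).
work=$(mktemp -d)
mkdir "$work/raw" "$work/out"
printf 'hello\n' > "$work/raw/a b.txt"
filter_tree cat "$work/raw" "$work/out"
```

Note that the output is flat: two files with the same name in different subdirectories of the source would overwrite each other.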

chrism01 07-24-2008 09:29 PM

You could change the first param to be the input dir, if you know it'll only be .txt files. Avoids the 'too many args' error if you have a LOT of files.
You could even have 3 args:

unbreak input_dir file_ext output_dir

maybe even add a 4th, new_ext, so you can tell at a glance which files have been fixed. Handy for other progs to know also...
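
A rough sketch of that interface as a shell wrapper instead of a change to the C code. batch_filter and its argument order are hypothetical, and cat stands in for unbreak. Because the glob is expanded inside the script's for loop and never passed to exec, the argument-length limit does not come into play.

```shell
#!/bin/sh
# Hypothetical wrapper: batch_filter cmd input_dir file_ext output_dir new_ext
batch_filter() {
    if [ $# -ne 5 ]; then
        echo "usage: batch_filter cmd in_dir ext out_dir new_ext" >&2
        return 2
    fi
    cmd=$1 in_dir=$2 ext=$3 out_dir=$4 new_ext=$5
    for f in "$in_dir"/*."$ext"; do
        [ -f "$f" ] || continue      # the glob matched nothing
        base=${f##*/}                # strip the directory part
        stem=${base%.*}              # strip the old extension
        "$cmd" < "$f" > "$out_dir/$stem.$new_ext" || return 1
    done
}

# Demo with cat standing in for unbreak (assumption: mktemp is available).
work=$(mktemp -d)
mkdir "$work/in" "$work/out"
printf 'x\n' > "$work/in/one file.txt"
batch_filter cat "$work/in" txt "$work/out" fixed
```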

garrettderner 07-24-2008 11:43 PM

unSpawn, thanks for mentioning spaces in filenames, that was waiting to bite me.
'downgradewhatever2osx' :D No, good luck! In OS X text can be Mac or unix style, anyone's guess.

Chris, all good ideas, thanks!

garrettderner 07-25-2008 12:23 AM

unSpawn: 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/'

Ah yes, the unix way!
http://www.simson.net/ref/ugh.pdf

garrettderner 07-28-2008 04:40 PM

I ended up making a shell script. I did start adding to "unbreak" the ability to open files in a directory, but I got bored. Anyway I needed more practice with shell scripts. So I made a general purpose script, "cmd-dir1-dir2", to use one directory as input and another as output. For instance I can do:
cmd-dir1-dir2 unbreak raw unbroke
Seems to work ok even with spaces in filenames or directories. Posting for comment, or in case it is useful to anyone else.
Code:

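#!/bin/sh
# cmd-dir1-dir2
if [ $# != 3 ]
then
 echo "  Usage: $0 cmd dir1 dir2"
 echo '  Using stdin & stdout, runs (cmd) on each file'
 echo '  in dir1, storing output files in dir2.'
 echo ''
else
 app=$1
 din=$2
 dout=$3
 # Quote the expansions so names containing spaces stay in one piece.
 for ifname in "$din"/*
 do
  # Skip the unexpanded pattern if dir1 is empty.
  [ -f "$ifname" ] || continue
  # Same file name, placed in the output directory.
  ofname="$dout/${ifname##*/}"
  # $app is left unquoted so that it may carry its own arguments.
  ($app < "$ifname" > "$ofname")
 done
fi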


chrism01 07-28-2008 07:41 PM

I'd check $? after each invocation of $app and bail with a msg if it fails. You do obey the conventions right? ie if app rtns 0, ok, else error....

I prefer deeper indents (4 spaces) but that's just me.

If doing a numeric comparison, you want '-ne' : http://www.tldp.org/LDP/abs/html/comparison-ops.html

See also this note re double sq brackets vs single: http://www.tldp.org/LDP/abs/html/tes...ml#DBLBRACKETS

garrettderner 07-29-2008 01:06 AM

Quote:

I'd check $? after each invocation of $app and bail with a msg if it fails. You do obey the conventions right? ie if app rtns 0, ok, else error....
Well in a perfect world...

It looks like my early version of "unbreak" returns 255, because I never explicitly return a value from main(). There must be many apps like that. OK, I can say "return 0" at the end, but that's sort of just pretending there is error checking in the app. Real error checking in the case of "unbreak" would mean calling ferror() after every get and put, to make sure a disk didn't fill up or go offline or something. That might be worth it, but it does complicate things. Anyway, in the script I will try following the suggestion.

Quote:

I prefer deeper indents (4 spaces) but that's just me.
So do I for readability, but I don't like editing and navigating through so many spaces. So I really like tabs. Tabs were made for indenting. Many developers seem to hate tabs with a passion; I don't know why.

Quote:

If doing a numeric comparison, you want '-ne' : http://www.tldp.org/LDP/abs/html/comparison-ops.html
Thanks, it looks like that is what I want. Should I say,
if [ $# -ne 3 ]
Isn't "$#" a string already, with the dollar sign in front?

Quote:

See also this note re double sq brackets vs single: http://www.tldp.org/LDP/abs/html/tes...ml#DBLBRACKETS
Thank you, I think I will read that about the double brackets again when I am not sleepy.

Right now I am puzzled that I can say with no problem:
if [ $? -ne 0 ]
then
...
...but if I say:

errnum=$?
if [ errnum -ne 0 ]
then
...
...it gives errors.

chrism01 07-29-2008 01:56 AM

$? and $# are special vars: http://www.tldp.org/LDP/abs/html/int...es.html#APPREF
Bookmark/read this doc. Also, http://tldp.org/LDP/Bash-Beginners-G...tml/index.html

When you write to a bash var, do not include the leading $. Do use the leading $ when reading a bash var; so:

A=25
echo $A
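
Applied to the errnum case above, a minimal sketch might look like this. The sh -c 'exit 3' line is only a stand-in for a failing app. Without the $, the test command receives the literal word errnum rather than a number, which is why it complains.

```shell
#!/bin/sh
errnum=0
sh -c 'exit 3' || errnum=$?     # write: no $ on the name being assigned
if [ "$errnum" -ne 0 ]          # read: leading $, quoted to be safe
then
    msg="failed with status $errnum"
    echo "$msg"
fi
```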

Actually, I use tabs as well: 1 tab = 4 spaces
My .vimrc says:

set tabstop=4
set shiftwidth=4
set softtabstop=4

garrettderner 07-30-2008 02:49 PM

Thank you Chris! I definitely bookmarked all the links you posted.

Thanks for the vim settings. I hope I may actually be using them one day soon. After 22 yrs of using text editors I still seem too lazy to learn vi or emacs; maybe I am a confirmed n00b? This week I decided to try again. I installed vim and gvim, and succeeded in making some simple changes to my script. But copying and pasting turned out to be beyond me. And I got lost just trying to insert a newline. So I went running back to bluefish with its menus and its nice pretty syntax highlighting.

You can tell I'm hopeless because to create .vimrc I did:
mousepad ~/.vimrc &
But I'm sure if I do get back into coding, vi or emacs will be useful or even indispensable, so maybe I'll keep working at it. One thing I'll need to find is key bindings for the dvorak keyboard layout I use. Edit command keys that are supposed to be next to each other or in locations that make sense, are not where they should be, because I am not using QWERTY.

Below is the script with exit status checking; it refuses to continue if it gets non-zero. Maybe it would be nice if it would report return code -1 as "-1" instead of "255"? I don't know how to do that, or whether it would be desirable.

Also it might be nice to put in switches for forced continuation, and for verbosity.

Garrett

Code:

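#!/bin/sh
# cmd-dir1-dir2
if [ $# -ne 3 ]
then
        echo "  Usage: $0 cmd dir1 dir2"
        echo '  Using stdin & stdout, runs (cmd) on each file'
        echo '  in dir1, storing output files in dir2.'
        echo ''
else
        app=$1
        din=$2
        dout=$3
        # Quote the expansions so names containing spaces stay in one piece.
        for ifname in "$din"/*
        do
                # Skip the unexpanded pattern if dir1 is empty.
                [ -f "$ifname" ] || continue
                # Same file name, placed in the output directory.
                ofname="$dout/${ifname##*/}"
                echo "$app < $ifname > $ofname"
                # $app is left unquoted so that it may carry its own arguments.
                ($app < "$ifname" > "$ofname")
                errnum=$?
                if [ "$errnum" -ne 0 ]
                then
                        # Report on stderr and stop at the first failure.
                        echo "  Error: $app returned $errnum" >&2
                        break
                fi
        done
fi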
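
The switches for forced continuation and verbosity mentioned above could be parsed with the standard getopts builtin. This is only a sketch: the letters -f and -v and the parse_opts helper are made up for illustration and are not part of the posted script.

```shell
#!/bin/sh
# Hypothetical option parsing for cmd-dir1-dir2:
#   -f  keep going after a failure
#   -v  print each command line before running it
parse_opts() {
    force=0 verbose=0
    OPTIND=1
    while getopts fv opt; do
        case $opt in
            f) force=1 ;;
            v) verbose=1 ;;
            *) echo "Usage: cmd-dir1-dir2 [-f] [-v] cmd dir1 dir2" >&2
               return 2 ;;
        esac
    done
    nshift=$((OPTIND - 1))   # number of option words the caller should shift
}

parse_opts -v -f unbreak raw unbroke
echo "force=$force verbose=$verbose shift=$nshift"
```

In the loop, the break would then be guarded by something like [ "$force" -eq 1 ] || break, and the echo of the command line by [ "$verbose" -eq 1 ].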


chrism01 07-30-2008 09:22 PM

Well, a good reason to know at least the basics of vi/vim is that it's been the default editor on Unix/Linux-based systems for years, so even if a system is broken, the recovery tool will probably have a cut-down version.
Also, it's very low overhead for remote work.
Last but not least, in commercial orgs, sysadmins can be hard to persuade to install your favourite.
vi (esp. vim) is a very quick editor once you get used to it.
As for copy/paste, I cheat these days. I use xterms, so I use the mouse; just highlight to copy and centre button or both buttons to paste.
Sometimes you need to add ctrl-c, ctrl-v if you copy/paste between an xterm and a browser/word processor.
Full docs here: http://vimdoc.sourceforge.net/htmldoc/

Re error val: the exit status the shell sees is only 8 bits wide (values 0-255), so a return of -1 shows up as 255; you'll just have to manage with that ;)
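
If a signed display is still wanted, the script could convert the value itself. A small sketch, where signed_status is a hypothetical helper. Bear in mind that shells conventionally report statuses above 128 for commands killed by a signal, so the mapping is ambiguous.

```shell
#!/bin/sh
# Hypothetical helper: map an 8-bit exit status back to the signed value
# a C program may have returned (255 -> -1, 254 -> -2, ...).
signed_status() {
    if [ "$1" -gt 127 ]; then
        echo $(( $1 - 256 ))
    else
        echo "$1"
    fi
}

status=0
sh -c 'exit 255' || status=$?        # stand-in for an app returning -1
shown=$(signed_status "$status")
echo "raw status $status, shown as $shown"
```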

If your app writes errors to stderr (it should), then redirect stderr to a temp file and print (cat) that if an error occurs.
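
Something along these lines, as a sketch only: run_quiet and failing_filter are made-up names, and mktemp is assumed to be installed (it is common but not part of POSIX).

```shell
#!/bin/sh
# Hypothetical helper: run_quiet cmd infile outfile
# Captures the command's stderr in a temp file and shows it only on failure.
run_quiet() {
    errlog=$(mktemp) || return 1
    if "$1" < "$2" > "$3" 2> "$errlog"; then
        rc=0
    else
        rc=$?
        echo "  Error: $1 returned $rc" >&2
        cat "$errlog" >&2
    fi
    rm -f "$errlog"
    return "$rc"
}

# Stand-in for an app that fails and complains on stderr.
failing_filter() { echo "disk full" >&2; return 3; }

rc=0
report=$(run_quiet failing_filter /dev/null /dev/null 2>&1) || rc=$?
echo "$report"
```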

garrettderner 07-31-2008 08:15 PM

Thanks for the tips and links!

Good arguments for knowing how to use vi/vim. OK, it's on my to-do list, and I'll work on it from time to time. ('Course I've also had learning Morse Code on that list for a long time and haven't done that yet, but I did at least learn my "Alpha Bravo Charlies.")

I found the key bindings for Dvorak-QWERTY: http://vim.wikia.com/wiki/VimTip1437

Garrett


All times are GMT -5.