Best way to process a group of files (whole directory, wildcards etc.)
Maybe this is more a Programming than a Newbie question, I'm not sure. I made MS-DOS apps years and years ago, and now I am trying to learn the unix way.
Although there are certainly already many similar programs, I just made a tiny utility I call "unbreak", to reformat text files for easier e-book reading. It's in C and it uses stdin/stdout. You may see it here: http://derner.com/code/unbreak.c
Now that I have it working the way I like, I want to process groups of files, without having to specify each file. I want to be able to do something like:
unbreak raw/*.txt reformatted/
Of course, in the above example the shell would expand the wildcard itself and pass the names of all the *.txt files in raw/ (if any exist) as parameters, followed by the directory name at the end.
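To see exactly what a program receives after the shell has expanded the wildcard, a tiny sketch like this can help. The function name show_args is made up for illustration; it is not part of unbreak.

```shell
#!/bin/sh
# show_args: hypothetical helper that prints the argument count and then
# each argument on its own line, in brackets so embedded spaces are visible.
show_args() {
    echo "argc=$#"
    for arg in "$@"
    do
        printf '[%s]\n' "$arg"
    done
}
```

Calling `show_args raw/*.txt reformatted/` lists every matching file followed by `reformatted/`. If nothing matches, most shells pass the literal string `raw/*.txt` instead, so a program taking its file list this way should be ready for a name that does not exist.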
OK, I could easily change the code, to have it open and process each input file listed, and save the output files in the directory named at the end. But is that the best way? Am I missing something easier or more standard? Or maybe it would be simple with some generic shell script?
Garrett
Last edited by garrettderner; 07-24-2008 at 05:01 PM.
Reason: Point to "unbreak.c" latest version instead of specific version.
If you don't have spaces in filenames then I don't see any glaring problems with your example. I mean something like "find raw -type f -iname '*.txt' -print0 | xargs -0 -I F unbreak F reformatted/" looks much more convoluted compared to your simple globbing example, doesn't it? BTW for changing linebreaks there's also 'dos2unix' and 'unix2dos'. Haven't encountered any 'downgradewhatever2osx' tho ;-p
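Since unbreak currently reads stdin and writes stdout, a variation on the find idea that stays safe with spaces in names might look like the sketch below. The function name filter_tree is hypothetical, and it assumes the output directory already exists and is not inside the input directory. It also flattens subdirectories, so two files with the same base name would overwrite each other.

```shell
#!/bin/sh
# filter_tree: hypothetical helper. Runs a stdin/stdout filter on every
# *.txt file under src_dir and writes the result, under the same base
# name, into dst_dir. Names with spaces are safe because find hands each
# path to the inner shell as one separate argument.
filter_tree() {
    filter=$1
    src_dir=$2
    dst_dir=$3
    find "$src_dir" -type f -iname '*.txt' -exec sh -c '
        filter=$1
        dst_dir=$2
        shift 2
        for path in "$@"
        do
            # $filter is unquoted on purpose so it may carry options.
            $filter < "$path" > "$dst_dir/$(basename "$path")" || exit 1
        done
    ' sh "$filter" "$dst_dir" {} +
}
```

For example, `filter_tree unbreak raw reformatted` would process every .txt file under raw/.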
You could change the first param to be the input dir, if you know it'll only be .txt files. Avoids the 'too many args' error if you have a LOT of files.
You could even have 3 args:
unbreak input_dir file_ext output_dir
maybe even add a 4th, new_ext, so you can tell at a glance which files have been fixed. Handy for other progs to know also...
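As a rough sketch of that interface, done as a shell wrapper rather than inside the C program: the name convert_dir and its argument order are only an assumption for illustration, and the command is assumed to be a stdin/stdout filter like unbreak. Because the wildcard is expanded inside the for loop and never passed to an external command, the 'too many args' limit does not come into play.

```shell
#!/bin/sh
# convert_dir: hypothetical wrapper taking
#   cmd input_dir file_ext output_dir [new_ext]
# Runs cmd (a stdin/stdout filter) on every input_dir/*.file_ext and
# writes output_dir/<name>.new_ext.
convert_dir() (
    cmd=$1
    input_dir=$2
    file_ext=$3
    output_dir=$4
    new_ext=${5:-$file_ext}     # default: keep the original extension
    for ifname in "$input_dir"/*."$file_ext"
    do
        [ -f "$ifname" ] || continue    # skip the unmatched-pattern literal
        base=$(basename "$ifname" ".$file_ext")
        $cmd < "$ifname" > "$output_dir/$base.$new_ext" || exit 1
    done
)
```

For example, `convert_dir unbreak raw txt reformatted fixed` would turn raw/book.txt into reformatted/book.fixed.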
unSpawn, thanks for mentioning spaces in filenames, that was waiting to bite me.
'downgradewhatever2osx' No, good luck! In OS X text can be Mac or unix style, anyone's guess.
I ended up making a shell script. I did start adding to "unbreak" the ability to open files in a directory, but I got bored. Anyway I needed more practice with shell scripts. So I made a general purpose script, "cmd-dir1-dir2", to use one directory as input and another as output. For instance I can do:
cmd-dir1-dir2 unbreak raw unbroke
Seems to work ok even with spaces in filenames or directories. Posting for comment, or in case it is useful to anyone else.
Code:
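#!/bin/sh
# cmd-dir1-dir2
if [ $# != 3 ]
then
    echo " Usage: $0 cmd dir1 dir2"
    echo ' Using stdin & stdout, runs (cmd) on each file'
    echo ' in dir1, storing output files in dir2.'
    echo ''
else
    app=$1
    din=$2
    dout=$3
    for ifname in "$din"/*
    do
        # Skip anything that is not a regular file (including the
        # unexpanded pattern when dir1 is empty).
        [ -f "$ifname" ] || continue
        # Build the output name with basename rather than sed, so
        # slashes or regex characters in the directory names are safe.
        ofname="$dout/$(basename "$ifname")"
        # $app is left unquoted on purpose so it may carry options.
        $app < "$ifname" > "$ofname"
    done
fi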
Last edited by garrettderner; 07-28-2008 at 06:54 PM.
Reason: Removed ".txt" from code
I'd check $? after each invocation of $app and bail with a msg if it fails. You do obey the conventions right? ie if app rtns 0, ok, else error....
Well in a perfect world...
It looks like my early version of "unbreak" returns 255, because I never explicitly return a value from main(). There must be many apps like that. OK, I can say "return 0" at the end, but that's sort of just pretending there is error checking in the app. Real error checking in the case of "unbreak" would mean calling ferror() after every get and put, to make sure a disk didn't fill up or go offline or something. Could be worth it maybe OK, but it does complicate it. Anyway, in the script I will try following the suggestion.
Quote:
I prefer deeper indents (4 spaces) but that's just me.
So do I for readability, but I don't like editing and navigating through so many spaces. So I really like tabs. Tabs were made for indenting. Many developers seem to hate tabs with a passion; I don't know why.
Thank you Chris! I definitely bookmarked all the links you posted.
Thanks for the vim settings. I hope I may actually be using them one day soon. After 22 yrs of using text editors I still seem too lazy to learn vi or emacs; maybe I am a confirmed n00b? This week I decided to try again. I installed vim and gvim, and succeeded in making some simple changes to my script. But copying and pasting turned out to be beyond me. And I got lost just trying to insert a newline. So I went running back to bluefish with its menus and its nice pretty syntax highlighting.
You can tell I'm hopeless because to create .vimrc I did:
mousepad ~/.vimrc &
But I'm sure if I do get back into coding, vi or emacs will be useful or even indispensable, so maybe I'll keep working at it. One thing I'll need to find is key bindings for the dvorak keyboard layout I use. Edit command keys that are supposed to be next to each other or in locations that make sense, are not where they should be, because I am not using QWERTY.
Below is the script with exit status checking; it refuses to continue if it gets non-zero. Maybe it would be nice if it would report return code -1 as "-1" instead of "255"? I don't know how to do that, or whether it would be desirable.
Also it might be nice to put in switches for forced continuation, and for verbosity.
Garrett
Code:
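#!/bin/sh
# cmd-dir1-dir2
if [ $# -ne 3 ]
then
    echo " Usage: $0 cmd dir1 dir2"
    echo ' Using stdin & stdout, runs (cmd) on each file'
    echo ' in dir1, storing output files in dir2.'
    echo ''
else
    app=$1
    din=$2
    dout=$3
    for ifname in "$din"/*
    do
        # Skip anything that is not a regular file (including the
        # unexpanded pattern when dir1 is empty).
        [ -f "$ifname" ] || continue
        # Build the output name with basename rather than sed, so
        # slashes or regex characters in the directory names are safe.
        ofname="$dout/$(basename "$ifname")"
        echo "$app < $ifname > $ofname"
        # $app is left unquoted on purpose so it may carry options.
        $app < "$ifname" > "$ofname"
        errnum=$?
        if [ $errnum -ne 0 ]
        then
            echo " Error: $app returned $errnum"
            break
        fi
    done
fi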
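The switches for forced continuation and verbosity mentioned in the post above could be sketched with getopts roughly as follows. The function name run_each and the option letters -f and -v are assumptions for illustration, not part of the posted script.

```shell
#!/bin/sh
# run_each: hypothetical variant of the loop with two switches:
#   -f  keep going after a failure (forced continuation)
#   -v  print each command before running it (verbose)
# Usage: run_each [-f] [-v] cmd dir1 dir2
run_each() (
    force=0
    verbose=0
    OPTIND=1
    while getopts fv opt
    do
        case $opt in
            f) force=1 ;;
            v) verbose=1 ;;
            *) echo "Usage: run_each [-f] [-v] cmd dir1 dir2" >&2; exit 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    [ $# -eq 3 ] || { echo "Usage: run_each [-f] [-v] cmd dir1 dir2" >&2; exit 2; }
    app=$1 din=$2 dout=$3
    failures=0
    for ifname in "$din"/*
    do
        [ -f "$ifname" ] || continue
        ofname="$dout/$(basename "$ifname")"
        [ "$verbose" -eq 1 ] && echo "$app < $ifname > $ofname"
        $app < "$ifname" > "$ofname"
        errnum=$?
        if [ "$errnum" -ne 0 ]
        then
            echo " Error: $app returned $errnum" >&2
            # Without -f, stop here and pass the failing status along.
            [ "$force" -eq 1 ] || exit "$errnum"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every file was processed without error.
    [ "$failures" -eq 0 ]
)
```

With this sketch, `run_each -v unbreak raw unbroke` stops at the first failure and returns that status, while `run_each -f -v unbreak raw unbroke` carries on and only reports failure at the end.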
Well, a good reason to know at least the basics of vi/vim is that it's been the default editor in Unix/Linux based systems for yrs, so even if a system is broken, the recovery tool will prob have a cut-down version.
Also, it's very low overhead for remote work.
Last but not least, in commercial orgs, sysadmins can be hard to persuade to install your favourite.
vi (esp vim) is a very quick editor once you get used to it.
As for copy/paste, I cheat these days. I use xterms, so I use the mouse; just highlight to copy and centre button or both buttons to paste.
Sometimes you need to use ctrl-c, ctrl-v as well when copying and pasting between an xterm and a browser/word processor.
Full docs here: http://vimdoc.sourceforge.net/htmldoc/
Re error val: an exit status is only 8 bits (0-255), in bash or any other shell, so a return of -1 shows up as 255 and you'll just have to manage with that
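If you really want to display "-1" instead of "255", the script could reinterpret the status as a signed 8-bit number. This is only a sketch, and the function name signed_status is made up. Bear in mind that shells conventionally report 128+N when a program is killed by signal N, so the signed reading can be misleading for values above 128.

```shell
#!/bin/sh
# signed_status: hypothetical helper that reinterprets an exit status
# (0-255) as a signed 8-bit number, so 255 prints as -1.
signed_status() {
    if [ "$1" -gt 127 ]
    then
        echo $(( $1 - 256 ))
    else
        echo "$1"
    fi
}
```

In the script the error line could then read `echo " Error: $app returned $(signed_status $errnum)"`.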
If your app writes errors to stderr (it should) then redirect that to a temp file and print (cat) it if an err occurs.
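A minimal sketch of that temp-file idea, assuming mktemp is available (it is on most Linux systems, though it is not in POSIX). The function name run_quiet is hypothetical. Stdin and stdout pass straight through, so the usual redirections still work around it.

```shell
#!/bin/sh
# run_quiet: hypothetical wrapper. Runs a command with its stderr
# captured in a temporary file, and shows that file only if the
# command fails. Returns the command's own exit status.
run_quiet() (
    errfile=$(mktemp) || exit 1
    "$@" 2> "$errfile"
    status=$?
    if [ "$status" -ne 0 ]
    then
        echo " Error: $1 returned $status" >&2
        cat "$errfile" >&2
    fi
    rm -f "$errfile"
    exit "$status"
)
```

For example, `run_quiet unbreak < "$ifname" > "$ofname"` stays silent on success and prints the captured messages on failure.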
Good arguments for knowing how to use vi/vim. OK, it's on my to-do list, and I'll work on it from time to time. ('Course I've also had learning Morse Code on that list for a long time and haven't done that yet, but I did at least learn my "Alpha Bravo Charlies.")