Best way to process a group of files (whole directory, wildcards etc.)
Maybe this is more a Programming question than a Newbie question; I'm not sure. I made MS-DOS apps years and years ago, and now I am trying to learn the Unix way.
Although there are certainly already many similar programs, I just made a tiny utility I call "unbreak" to reformat text files for easier e-book reading. It's in C and it uses stdin/stdout. You can see it here: http://derner.com/code/unbreak.c
Now that I have it working the way I like, I want to process groups of files without having to specify each file. I want to be able to do something like:
unbreak raw/*.txt reformatted/
Of course, in the above example the shell would pass as parameters the names of all the *.txt files in raw/ (if any exist), followed by the directory name at the end. OK, I could easily change the code to have it open and process each input file listed, and save the output files in the directory named at the end. But is that the best way? Am I missing something easier or more standard? Or maybe it would be simple with some generic shell script?
Garrett
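For what it's worth, the "generic shell script" route can be as small as a loop. This is a minimal sketch, assuming unbreak stays a plain stdin/stdout filter; the function name process_txt_files is made up for illustration:

```shell
#!/bin/sh
# Sketch: run a stdin/stdout filter over every *.txt file in one
# directory, writing same-named output files into another directory.
# Usage: process_txt_files CMD INDIR OUTDIR   (hypothetical helper)
process_txt_files() {
    cmd=$1 indir=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*.txt; do
        # If nothing matches, the shell leaves the pattern unexpanded,
        # so skip the literal "*.txt" name.
        [ -e "$f" ] || continue
        # Quoting "$f" keeps names containing spaces in one piece;
        # ${f##*/} strips the directory part of the path.
        "$cmd" < "$f" > "$outdir/${f##*/}" || return 1
    done
}
```

Called as process_txt_files unbreak raw reformatted, it would write raw/foo.txt to reformatted/foo.txt, so the C program itself never needs to know about files or directories.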
If you don't have spaces in filenames then I don't see any glaring problems with your example. I mean something like 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/' looks much more convoluted compared to your simple globbing example, doesn't it? BTW for changing linebreaks there's also 'dos2unix' and 'unix2dos'. Haven't encountered any 'downgradewhatever2osx' tho ;-p
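If you do go the find route, one variant that avoids xargs entirely is find's -exec ... {} + form, which hands the names to a small inline sh script without splitting them on spaces. This is a swapped-in technique, not unSpawn's exact command, and find_and_process is a hypothetical name. Because unbreak reads stdin, each file is redirected into it rather than passed as an argument. Note that this recurses into subdirectories and writes all output flat into one directory, and that the command must be a real program on disk, since it runs in a fresh sh:

```shell
#!/bin/sh
# Sketch: find-based batch run of a stdin/stdout filter.
# Usage: find_and_process CMD INDIR OUTDIR   (hypothetical helper)
# Output lands flat in OUTDIR, so same-named files found in different
# subdirectories of INDIR would overwrite each other.
find_and_process() {
    mkdir -p "$3" || return 1
    # find appends batches of matching paths after "sh CMD OUTDIR";
    # the inline script sees them as "$@" after shifting off 2 args.
    find "$2" -type f -name '*.txt' -exec sh -c '
        cmd=$1 outdir=$2
        shift 2
        for f in "$@"; do
            "$cmd" < "$f" > "$outdir/${f##*/}" || exit 1
        done
    ' sh "$1" "$3" {} +
}
```

Usage would be find_and_process unbreak raw reformatted. unSpawn's -iname (case-insensitive match) works with GNU and BSD find; -name is used here because it is the portable one.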
You could change the first param to be the input dir, if you know it'll only be .txt files. That avoids the "Argument list too long" error if you have a LOT of files.
You could even have 3 args:
unbreak input_dir file_ext output_dir
and maybe even add a 4th, new_ext, so you can tell at a glance which files have been fixed. Handy for other progs to know, too...
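Here is a sketch of that interface, done as a shell function wrapping a stdin/stdout filter rather than built into unbreak itself; convert_dir and its argument order are illustrative assumptions. Because the glob is expanded by the shell inside the loop, no long argument list is ever handed to exec, so the "Argument list too long" limit doesn't come into play:

```shell
#!/bin/sh
# Sketch: convert_dir CMD INPUT_DIR FILE_EXT OUTPUT_DIR [NEW_EXT]
# (hypothetical helper). NEW_EXT defaults to FILE_EXT.
convert_dir() {
    cmd=$1 indir=$2 ext=$3 outdir=$4 newext=${5:-$3}
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*."$ext"; do
        [ -e "$f" ] || continue      # no matches: pattern left as-is
        base=${f##*/}                # strip the directory part
        base=${base%.*}              # strip the old extension
        "$cmd" < "$f" > "$outdir/$base.$newext" || return 1
    done
}
```

For example, convert_dir unbreak raw txt unbroke fixed would turn raw/book.txt into unbroke/book.fixed.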
unSpawn, thanks for mentioning spaces in filenames, that was waiting to bite me.
'downgradewhatever2osx' :D No, good luck! In OS X, text can be Mac or Unix style; it's anyone's guess. Chris, all good ideas, thanks!
unSpawn: 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/'
Ah yes, the Unix way! http://www.simson.net/ref/ugh.pdf
I ended up making a shell script. I did start adding to "unbreak" the ability to open files in a directory, but I got bored. Anyway, I needed more practice with shell scripts. So I made a general-purpose script, "cmd-dir1-dir2", to use one directory as input and another as output. For instance, I can do:
cmd-dir1-dir2 unbreak raw unbroke
It seems to work OK, even with spaces in filenames or directory names. Posting for comment, or in case it is useful to anyone else.
Code:
#!/bin/sh
I'd check $? after each invocation of $app and bail out with a message if it fails. You do obey the conventions, right? I.e. if the app returns 0 it's OK, anything else is an error....
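The suggested check could look something like this minimal sketch; run_checked is a hypothetical helper, and $app from the original script corresponds to its first argument here:

```shell
#!/bin/sh
# Sketch: run APP on one file and report/propagate a non-zero status.
# Usage: run_checked APP INFILE OUTFILE   (hypothetical helper)
run_checked() {
    app=$1
    status=0
    "$app" < "$2" > "$3" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "$0: $app failed on '$2' (exit status $status)" >&2
        return "$status"
    fi
}
```

In the per-file loop you would then write something like run_checked "$app" "$f" "$outdir/${f##*/}" || exit, and the bare exit passes the failing status on to whoever ran the script.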
I prefer deeper indents (4 spaces), but that's just me. If you're doing a numeric comparison, you want '-ne': http://www.tldp.org/LDP/abs/html/comparison-ops.html See also this note re double vs single square brackets: http://www.tldp.org/LDP/abs/html/tes...ml#DBLBRACKETS
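To illustrate the numeric-vs-string point: -eq/-ne compare integers, while = and != compare strings. The check_args function and its usage message are only examples. Also note that [[ ]] is a bash/ksh feature that plain sh (dash on Debian/Ubuntu, for instance) doesn't have, so a #!/bin/sh script should stick to single brackets:

```shell
#!/bin/sh
# Sketch: require exactly three arguments, compared as a number.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: cmd-dir1-dir2 COMMAND INDIR OUTDIR" >&2
        return 2
    fi
}

# The two kinds of comparison disagree on values like 03 and 3:
[ 03 -eq 3 ] && echo "numerically equal"    # prints: numerically equal
[ 03 = 3 ] || echo "but different strings"  # prints: but different strings
```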
It looks like my early version of "unbreak" returns 255, because I never explicitly return a value from main(). There must be many apps like that. OK, I can say "return 0" at the end, but that's sort of just pretending there is error checking in the app. Real error checking in the case of "unbreak" would mean calling ferror() after every get and put, to make sure a disk didn't fill up or go offline or something. It could be worth it, but it does complicate the code. Anyway, in the script I will try following the suggestion.
if [ $# -ne 3 ]
Isn't "$#" a string already, with the dollar sign in front?
Right now I am puzzled that I can say, with no problem:
if [ $? -ne 0 ]
...but if I say: ...it gives errors.
$? and $# are special vars: http://www.tldp.org/LDP/abs/html/int...es.html#APPREF
Bookmark/read this doc. Also: http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
When you write to a bash var, do not include the leading $. Do use the leading $ when reading a bash var, so:
A=25
echo $A
Actually, I use tabs as well: 1 tab = 4 spaces. My .vimrc says:
set tabstop=4
set shiftwidth=4
set softtabstop=4
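Here is a small demonstration of the read/write rule and of $# and $?. One related gotcha: written on a single line, A=25 echo $A does not print 25, because the shell expands $A before the temporary assignment takes effect, so the assignment and the echo need to be separate commands. show_special is just an illustrative name:

```shell
#!/bin/sh
# Writing a variable: no $, and no spaces around the =.
A=25
# Reading it: use $ (quoting guards against spaces in the value).
echo "$A"            # prints 25

show_special() {
    # $# is the number of positional parameters (arguments).
    echo "got $# arguments"
    # $? is the exit status of the most recent command; false always
    # exits with status 1, so the echo reports 1.
    false || echo "last status was $?"
}
show_special one "two words" three   # "two words" counts as one argument
```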
Thank you Chris! I definitely bookmarked all the links you posted.
Thanks for the vim settings. I hope I may actually be using them one day soon. After 22 years of using text editors I still seem too lazy to learn vi or emacs; maybe I am a confirmed n00b? This week I decided to try again. I installed vim and gvim, and succeeded in making some simple changes to my script. But copying and pasting turned out to be beyond me, and I got lost just trying to insert a newline. So I went running back to bluefish with its menus and its nice pretty syntax highlighting. You can tell I'm hopeless, because to create .vimrc I did:
mousepad ~/.vimrc &
But I'm sure if I do get back into coding, vi or emacs will be useful or even indispensable, so maybe I'll keep working at it. One thing I'll need to find is key bindings for the Dvorak keyboard layout I use. Edit-command keys that are supposed to be next to each other, or in locations that make sense, are not where they should be, because I am not using QWERTY.
Below is the script with exit status checking; it refuses to continue if it gets a non-zero status. Maybe it would be nice if it reported return code -1 as "-1" instead of "255"? I don't know how to do that, or whether it would be desirable. Also, it might be nice to put in switches for forced continuation and for verbosity.
Garrett
Code:
#!/bin/sh
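The script itself didn't survive in this copy of the thread, so this is only a hedged sketch of the switches Garrett mentions, using the shell's built-in getopts. The letters -f (force, keep going after a failure) and -v (verbose), and the parse_opts name, are assumptions rather than anything from the original script:

```shell
#!/bin/sh
# Sketch: parse -f (force) and -v (verbose); sets $force and $verbose.
# Call as: parse_opts "$@" || exit, then: shift $((OPTIND - 1))
parse_opts() {
    force=0 verbose=0
    OPTIND=1                     # reset so getopts starts from the top
    while getopts fv opt; do
        case $opt in
            f) force=1 ;;
            v) verbose=1 ;;
            *) echo "usage: cmd-dir1-dir2 [-f] [-v] COMMAND INDIR OUTDIR" >&2
               return 2 ;;
        esac
    done
}

# Possible use inside the per-file loop (not run here):
#   [ "$verbose" -eq 1 ] && echo "processing $f" >&2
#   if ! "$app" < "$f" > "$out"; then
#       [ "$force" -eq 1 ] || exit 1
#   fi
```

After parse_opts returns, OPTIND points at the first non-option argument, which is why the caller shifts by OPTIND - 1 to get back to COMMAND INDIR OUTDIR.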
Well, a good reason to know at least the basics of vi/vim is that it's been the default editor on Unix/Linux-based systems for years, so even if a system is broken, the recovery tools will probably include a cut-down version.
Also, it's very low overhead for remote work. Last but not least, in commercial orgs, sysadmins can be hard to persuade to install your favourite. vi (esp. vim) is a very quick editor once you get used to it. As for copy/paste, I cheat these days. I use xterms, so I use the mouse: just highlight to copy, and click the centre button (or both buttons) to paste. Sometimes you need to add ctrl-c, ctrl-v when copying/pasting between an xterm and a browser/word processor. Full docs here: http://vimdoc.sourceforge.net/htmldoc/ Re the error value: exit statuses only hold values 0-255 (8 bits), so a return of -1 shows up as 255, and you'll just have to manage with that ;) If your app writes errors to stderr (it should), then redirect stderr to a temp file and print (cat) that if an error occurs.
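A minimal sketch of the temp-file idea, assuming mktemp(1) is available (it is on Linux and OS X); run_quiet is a hypothetical name. The command's stdout passes through untouched, and its stderr is printed only if it exits non-zero. The status it reports is the 8-bit value, so a C program that returns -1 will show up as 255:

```shell
#!/bin/sh
# Sketch: run a command, hold its stderr in a temp file, and show that
# output (plus the exit status) only when the command fails.
# Usage: run_quiet COMMAND [ARGS...]   (hypothetical helper)
run_quiet() {
    errfile=$(mktemp) || return 1
    status=0
    "$@" 2> "$errfile" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status:" >&2
        cat "$errfile" >&2
    fi
    rm -f "$errfile"
    return "$status"
}
```

Redirections on the call apply to the wrapped command too, so something like run_quiet unbreak < "$f" > "$outdir/${f##*/}" keeps working as a filter.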
Thanks for the tips and links!
Good arguments for knowing how to use vi/vim. OK, it's on my to-do list, and I'll work on it from time to time. ('Course, I've also had learning Morse code on that list for a long time and haven't done that yet, but I did at least learn my "Alpha Bravo Charlies.") I found the key bindings for Dvorak-QWERTY: http://vim.wikia.com/wiki/VimTip1437
Garrett