LinuxQuestions.org
Old 07-24-2008, 04:56 PM   #1
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Rep: Reputation: 0
Best way to process a group of files (whole directory, wildcards etc.)


Maybe this is more a Programming than a Newbie question, I'm not sure. I made MS-DOS apps years and years ago, and now I am trying to learn the unix way.

Although there are certainly already many similar programs, I just made a tiny utility I call "unbreak", to reformat text files for easier e-book reading. It's in C and it uses stdin/stdout. You may see it here: http://derner.com/code/unbreak.c

Now that I have it working the way I like, I want to process groups of files, without having to specify each file. I want to be able to do something like:
unbreak raw/*.txt reformatted/
Of course, in the above example the shell would pass as parameters the names of all the *.txt files in raw/, if they exist, followed by the directory name at the end.

OK, I could easily change the code to open and process each input file listed, and save the output files in the directory named at the end. But is that the best way? Am I missing something easier or more standard? Or maybe it would be simple with some generic shell script?

Garrett

Last edited by garrettderner; 07-24-2008 at 05:01 PM. Reason: Point to "unbreak.c" latest version instead of specific version.
 
Old 07-24-2008, 05:59 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 26,999
Blog Entries: 54

Rep: Reputation: 2745
If you don't have spaces in filenames then I don't see any glaring problems with your example. I mean something like 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/' looks much more convoluted compared to your simple globbing example, doesn't it? BTW for changing linebreaks there's also 'dos2unix' and 'unix2dos'. Haven't encountered any 'downgradewhatever2osx' tho ;-p
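
A minimal sketch of the find route for a filter that reads stdin and writes stdout, which is how "unbreak" is described in the first post. xargs cannot set up redirections by itself, so this variant lets find start a small inner shell instead. The function name filter_tree and its argument order are made up for illustration, and -name is used rather than -iname to stay within POSIX find.

```shell
#!/bin/sh
# filter_tree CMD SRC DST   (hypothetical helper, not part of unbreak)
# Run CMD as a stdin/stdout filter on every *.txt file under SRC and write
# each result to DST under the same base name. Names with spaces are safe
# because find hands each path to the inner shell as a separate argument.
# Caveat: files with the same name in different subdirectories collide.
filter_tree() {
    find "$2" -type f -name '*.txt' -exec sh -c '
        cmd=$1; dst=$2; shift 2
        for f in "$@"; do
            "$cmd" < "$f" > "$dst/${f##*/}" || exit 1
        done
    ' sh "$1" "$3" {} +
}
```

With that defined, something like `filter_tree unbreak raw reformatted` would cover the original example. The `{} +` form batches file names, so it also sidesteps the argument-length limit on very large directories.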
 
Old 07-24-2008, 09:29 PM   #3
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,246

Rep: Reputation: 2025
You could change the first param to be the input dir, if you know it'll only be .txt files. Avoids the 'too many args' error if you have a LOT of files.
You could even have 3 args:

unbreak input_dir file_ext output_dir

maybe even add a 4th, new_ext, so you can tell at a glance which files have been fixed. Handy for other progs to know also...
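
The four-argument idea can also be tried out as a shell wrapper before touching the C code. This is only a sketch: convert_dir is a made-up name, the command is passed as an extra first argument, and the new extension is optional. A glob inside a for loop is expanded by the shell itself and never passed to another program, so the "too many args" limit does not apply here.

```shell
#!/bin/sh
# convert_dir CMD IN_DIR EXT OUT_DIR [NEW_EXT]   (hypothetical helper)
# Run CMD as a stdin/stdout filter on every IN_DIR/*.EXT file and write the
# result to OUT_DIR, swapping the extension for NEW_EXT when one is given.
convert_dir() {
    cd_cmd=$1; cd_in=$2; cd_ext=$3; cd_out=$4
    cd_new=${5:-$cd_ext}                 # default: keep the old extension
    for cd_file in "$cd_in"/*."$cd_ext"; do
        [ -f "$cd_file" ] || continue    # no match: the pattern stays literal
        cd_base=${cd_file##*/}           # strip the directory part
        cd_base=${cd_base%."$cd_ext"}    # strip the old extension
        "$cd_cmd" < "$cd_file" > "$cd_out/$cd_base.$cd_new" || return 1
    done
}
```

For example, `convert_dir unbreak raw txt reformatted fixed` would turn raw/book.txt into reformatted/book.fixed.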
 
Old 07-24-2008, 11:43 PM   #4
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
unSpawn, thanks for mentioning spaces in filenames, that was waiting to bite me.
'downgradewhatever2osx' No, good luck! In OS X text can be Mac or unix style, anyone's guess.

Chris, all good ideas, thanks!
 
Old 07-25-2008, 12:23 AM   #5
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
unSpawn: 'find raw -type f -iname \*.txt -print0 | xargs -0 -iF unbreak 'F' reformatted/'

Ah yes, the unix way!
http://www.simson.net/ref/ugh.pdf
 
Old 07-28-2008, 04:40 PM   #6
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
I ended up making a shell script. I did start adding to "unbreak" the ability to open files in a directory, but I got bored. Anyway I needed more practice with shell scripts. So I made a general purpose script, "cmd-dir1-dir2", to use one directory as input and another as output. For instance I can do:
cmd-dir1-dir2 unbreak raw unbroke
Seems to work ok even with spaces in filenames or directories. Posting for comment, or in case it is useful to anyone else.
Code:
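#!/bin/sh
# cmd-dir1-dir2: run a stdin/stdout filter on every file in one directory
if [ $# != 3 ]
then
 echo "  Usage: $0 cmd dir1 dir2"
 echo '  Using stdin & stdout, runs (cmd) on each file'
 echo '  in dir1, storing output files in dir2.'
 echo ''
else
 app=$1
 din=$2
 dout=$3
 for ifname in "$din"/*
 do
  # same file name, but in the output directory
  ofname="$dout/${ifname##*/}"
  # $app is left unquoted so that a command with arguments still works
  ($app < "$ifname" > "$ofname")
 done
fi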

Last edited by garrettderner; 07-28-2008 at 06:54 PM. Reason: Removed ".txt" from code
 
Old 07-28-2008, 07:41 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,246

Rep: Reputation: 2025
I'd check $? after each invocation of $app and bail with a msg if it fails. You do obey the conventions right? ie if app rtns 0, ok, else error....

I prefer deeper indents (4 spaces) but that's just me.

If doing a numeric comparison, you want '-ne' : http://www.tldp.org/LDP/abs/html/comparison-ops.html

See also this note re double sq brackets vs single: http://www.tldp.org/LDP/abs/html/tes...ml#DBLBRACKETS
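
A small sketch of the difference between the two kinds of comparison. Everything in the shell is a string; -eq and -ne ask test to read both sides as integers, while = and != compare the text character by character. The helper names same_number and same_string are invented for the demonstration. For $# the two give the same answer in practice, since the shell always writes it as a plain decimal number.

```shell
#!/bin/sh
# same_number A B: succeeds when A and B are equal as integers
same_number() { [ "$1" -eq "$2" ]; }

# same_string A B: succeeds when A and B are identical text
same_string() { [ "$1" = "$2" ]; }

same_number 03 3 && echo "03 and 3 are equal as numbers"
same_string 03 3 || echo "03 and 3 are different as strings"
```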
 
Old 07-29-2008, 01:06 AM   #8
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
Quote:
I'd check $? after each invocation of $app and bail with a msg if it fails. You do obey the conventions right? ie if app rtns 0, ok, else error....
Well in a perfect world...

It looks like my early version of "unbreak" returns 255, because I never explicitly return a value from main(). There must be many apps like that. OK, I can say "return 0" at the end, but that's sort of just pretending there is error checking in the app. Real error checking in the case of "unbreak" would mean calling ferror() after every get and put, to make sure a disk didn't fill up or go offline or something. That might be worth it, but it does complicate things. Anyway, in the script I will try following the suggestion.

Quote:
I prefer deeper indents (4 spaces) but that's just me.
So do I for readability, but I don't like editing and navigating through so many spaces. So I really like tabs. Tabs were made for indenting. Many developers seem to hate tabs with a passion; I don't know why.

Quote:
If doing a numeric comparison, you want '-ne' : http://www.tldp.org/LDP/abs/html/comparison-ops.html
Thanks, it looks like that is what I want. Should I say,
if [ $# -ne 3 ]
Isn't "$#" a string already, with the dollar sign in front?

Quote:
See also this note re double sq brackets vs single: http://www.tldp.org/LDP/abs/html/tes...ml#DBLBRACKETS
Thank you, I think I will read that about the double brackets again when I am not sleepy.

Right now I am puzzled that I can say with no problem:
if [ $? -ne 0 ]
then
...
...but if I say:

errnum=$?
if [ errnum -ne 0 ]
then
...
...it gives errors.
 
Old 07-29-2008, 01:56 AM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,246

Rep: Reputation: 2025
$? and $# are special vars: http://www.tldp.org/LDP/abs/html/int...es.html#APPREF
Bookmark/read this doc. Also, http://tldp.org/LDP/Bash-Beginners-G...tml/index.html

When you write to a bash var, do not include the leading $. Do use the leading $ when reading a bash var, which is why [ errnum -ne 0 ] fails and [ $errnum -ne 0 ] works; so:

A=25
echo $A

Actually, I use tabs as well: 1 tab = 4 spaces
My .vimrc says:

set tabstop=4
set shiftwidth=4
set softtabstop=4
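
The read/write rule from this post, applied to the exit-status check that was giving errors, might look like the sketch below. The function name run_and_report is made up. The status is saved right away because $? is overwritten by the very next command.

```shell
#!/bin/sh
# run_and_report CMD [ARGS...]   (hypothetical helper)
# Run a command, save its exit status in a variable, and report on it.
run_and_report() {
    "$@"
    errnum=$?                       # writing the variable: no leading $
    if [ "$errnum" -ne 0 ]; then    # reading it: leading $, quoted
        echo "failed with status $errnum"
    else
        echo "ok"
    fi
    return "$errnum"
}
```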
 
Old 07-30-2008, 02:49 PM   #10
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
Thank you Chris! I definitely bookmarked all the links you posted.

Thanks for the vim settings. I hope I may actually be using them one day soon. After 22 yrs of using text editors I still seem too lazy to learn vi or emacs; maybe I am a confirmed n00b? This week I decided to try again. I installed vim and gvim, and succeeded in making some simple changes to my script. But copying and pasting turned out to be beyond me. And I got lost just trying to insert a newline. So I went running back to bluefish with its menus and its nice pretty syntax highlighting.

You can tell I'm hopeless because to create .vimrc I did:
mousepad ~/.vimrc &
But I'm sure if I do get back into coding, vi or emacs will be useful or even indispensable, so maybe I'll keep working at it. One thing I'll need to find is key bindings for the Dvorak keyboard layout I use. Editing command keys that are supposed to be next to each other, or in locations that make sense, are not where they should be because I am not using QWERTY.

Below is the script with exit status checking; it refuses to continue if it gets non-zero. Maybe it would be nice if it would report return code -1 as "-1" instead of "255"? I don't know how to do that, or whether it would be desirable.

Also it might be nice to put in switches for forced continuation, and for verbosity.

Garrett

Code:
#!/bin/sh
# cmd-dir1-dir2: run a stdin/stdout filter on every file in one directory
if [ $# -ne 3 ]
then
	echo "  Usage: $0 cmd dir1 dir2"
	echo '  Using stdin & stdout, runs (cmd) on each file'
	echo '  in dir1, storing output files in dir2.'
	echo ''
else
	app=$1
	din=$2
	dout=$3
	for ifname in "$din"/*
	do
		# same file name, but in the output directory
		ofname="$dout/${ifname##*/}"
		echo "$app < $ifname > $ofname"
		# $app is left unquoted so that a command with arguments still works
		($app < "$ifname" > "$ofname")
		errnum=$?
		if [ "$errnum" -ne 0 ]
		then
			echo "  Error: $app returned $errnum"
			break
		fi
	done
fi
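
One way the two switches mentioned above could be added is with the shell's built-in getopts. This is a sketch under assumptions, not a finished tool: the flag letters -f (force continuation) and -v (verbose) are invented here, and the script is wrapped in a function named cmd_dir1_dir2 (with underscores, since hyphens are not portable in function names) so it can be tried from an interactive shell.

```shell
#!/bin/sh
# cmd_dir1_dir2 [-f] [-v] cmd dir1 dir2
#   -f  keep going after a failure   (hypothetical flag)
#   -v  print each command before it runs   (hypothetical flag)
cmd_dir1_dir2() {
    force=0; verbose=0; failed=0
    OPTIND=1                        # reset so the function can be called again
    while getopts fv opt; do
        case $opt in
            f) force=1 ;;
            v) verbose=1 ;;
            *) echo "Usage: cmd_dir1_dir2 [-f] [-v] cmd dir1 dir2" >&2
               return 2 ;;
        esac
    done
    shift $((OPTIND - 1))           # drop the switches, keep the 3 arguments
    if [ $# -ne 3 ]; then
        echo "Usage: cmd_dir1_dir2 [-f] [-v] cmd dir1 dir2" >&2
        return 2
    fi
    app=$1; din=$2; dout=$3
    for ifname in "$din"/*; do
        [ -f "$ifname" ] || continue
        ofname="$dout/${ifname##*/}"
        if [ "$verbose" -eq 1 ]; then
            echo "$app < $ifname > $ofname"
        fi
        # $app is left unquoted so that a command with arguments still works
        $app < "$ifname" > "$ofname"
        errnum=$?
        if [ "$errnum" -ne 0 ]; then
            echo "  Error: $app returned $errnum" >&2
            failed=1
            [ "$force" -eq 1 ] || break
        fi
    done
    return "$failed"
}
```

Usage would then be along the lines of `cmd_dir1_dir2 -v -f unbreak raw unbroke`. The function returns non-zero if any file failed, so a caller can check $? in turn.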
 
Old 07-30-2008, 09:22 PM   #11
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,246

Rep: Reputation: 2025
Well, a good reason to know at least the basics of vi/vim is that it's been the default editor in Unix/Linux based systems for yrs, so even if a system is broken, the recovery tool will prob have a cut-down version.
Also, it's very low overhead for remote work.
Last but not least, in commercial orgs, sysadmins can be hard to persuade to install your favourite.
vi (esp vim) is a very quick editor once you get used to it.
As for copy/paste, I cheat these days. I use xterms, so I use the mouse; just highlight to copy and centre button or both buttons to paste.
Sometimes you need to add ctrl-c, ctrl-v if copy/paste between xterm and browser/word processor.
Full docs here: http://vimdoc.sourceforge.net/htmldoc/

Re error val: the shell only sees exit values 0-255 (8 bits), so a return of -1 shows up as 255; you'll just have to manage with that
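
If a signed display is still wanted, the conversion is plain arithmetic. The sketch below uses a made-up helper, to_signed, that reads the status as a signed 8-bit number. It is only a display convention: the shell cannot tell whether the program meant -1 or 255, and shells conventionally report a command killed by a signal as 128 plus the signal number, so values above 127 are not always "negative return codes".

```shell
#!/bin/sh
# to_signed STATUS   (hypothetical helper)
# Print an exit status in the range 0..255 as a signed 8-bit value.
to_signed() {
    if [ "$1" -gt 127 ]; then
        echo $(( $1 - 256 ))
    else
        echo "$1"
    fi
}
```

In the script this could be used as `echo "  Error: $app returned $(to_signed "$errnum")"`.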

If your app writes errors to stderr (it should) then redirect to a temp file and print (cat) that if an err occurs.
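
A sketch of that temp-file approach. The helper name run_quiet is invented, and mktemp is assumed to be available (it is common on Linux but not part of POSIX). Stdin and stdout are left alone, so the helper can still sit in front of a filter inside the loop.

```shell
#!/bin/sh
# run_quiet CMD [ARGS...]   (hypothetical helper)
# Run a command with its stderr captured in a temp file, and show the
# captured messages only if the command fails.
run_quiet() {
    rq_log=$(mktemp) || return 1
    "$@" 2> "$rq_log"
    rq_status=$?
    if [ "$rq_status" -ne 0 ]; then
        cat "$rq_log" >&2
    fi
    rm -f "$rq_log"
    return "$rq_status"
}
```

Inside the loop the call might then read `run_quiet $app < "$ifname" > "$ofname"`.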
 
Old 07-31-2008, 08:15 PM   #12
garrettderner
LQ Newbie
 
Registered: Jul 2008
Location: Chicago IL, USA
Distribution: antiX debian
Posts: 14

Original Poster
Rep: Reputation: 0
Thanks for the tips and links!

Good arguments for knowing how to use vi/vim. OK, it's on my to-do list, and I'll work on it from time to time. ('Course I've also had learning Morse Code on that list for a long time and haven't done that yet, but I did at least learn my "Alpha Bravo Charlies.")

I found the key bindings for Dvorak-QWERTY: http://vim.wikia.com/wiki/VimTip1437

Garrett
 
  


All times are GMT -5. The time now is 07:48 AM.
