Isolate lines in a text file and perform replacements

neville310 · 04-01-2007, 04:49 PM

I developed a shell script for renaming my mp3 files. My script uses Perl for full regular expression support (so I don’t have to escape the patterns like in sed).

The script uses Perl and regular expressions to normalize the file names. Basically, it removes junk and spaces from the file name and converts it to title case. The patterns reside in a preset file since several replacements are necessary. Each preset file deals with a slightly different renaming profile. My current development efforts involve applying these replacement patterns to the text inside playlist and XML files, so that the pointers in the playlist files stay aligned with the renamed files. Eventually, the script would help me rename my mp3s and update my iTunes database (without importing all my audio files once again), yet it has several purposes beyond mp3 renaming.

The script performs a recursive search and finds files with these extensions (m3u, sfv, and xml). It iterates (while loop) through an external file with regular expression patterns. Then it combines the patterns and places them in a variable, which is passed to Perl. The script below performs this task (yet has not been thoroughly tested). Its major shortcoming is that the Perl line works on the entire file when it should only replace the lines with mp3 pointers. Here’s the call for assistance, since I am having coder's block. Maybe someone could help me with different logic or a traditional grep solution. I just need fresh ideas for my shell script.

Code:

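# PRESET may contain over twenty-five regex patterns for a complex renaming task.
# Build the Perl code once, before looping over the files, so it does not
# grow again with every file.
CODE=""
while read -r REGEX REPLACE line
do
	CODE="$CODE; s/$REGEX/$REPLACE/g"
done < "$PRESET"

# Match only real extensions (note the \.) and keep odd file names intact.
find ./ -regex ".*\.\(m3u\|sfv\|xml\)$" -type f -print | while IFS= read -r FILE
do
	perl -pi.bak -e "$CODE" "$FILE"
done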

btw I am doing this script in Bash, because Perl is foreign territory.

indienick · 04-02-2007, 04:09 PM

I've stopped using Perl for any kind of scripting, mainly because I've gotten into Common Lisp recently, and much prefer it for...well, anything.

I'm stuck on helping you with anything purely code-wise, but I could provide an algorithm for you, to help you achieve your result (I use it all the time):
1. Declare a buffer array/list.
2. Read in each line from the file, assigning each line to a new buffer index.
3. Scan through each array index until you find the expression you want to match.
3a. If a match is found, modify the line accordingly; skip to step 4.
3b. If a match is not found, move on to the next array index.
4. Close any input streams from the file, and open up an output stream to the file.
5. Using a loop, iterate through the array, and print each array value to its own line.
6. End Of Program.

Here's some pseudo-code:

Code:

' Declarations.
DECLARE buf AS ARRAY
DECLARE dat AS FILE
DECLARE exp AS REGEX
DECLARE idx AS INTEGER

' Initializations.
dat = "/path/to/file"
exp = "expression to match"
idx = 0

' Opening the file stream for input.
OPEN dat FOR INPUT AS #1

' Collect all the lines in the file, and store them in a buffer.
DO
  LINE INPUT #1, buf[idx]
  idx = idx + 1
LOOP WHILE NOT EOF(1)

' Close the input stream (it's not needed anymore).
CLOSE #1

' Parse through the buffer for the line to edit.
' idx now holds the number of lines read, so the last valid index is idx - 1.
FOR i = 0 TO idx - 1
  IF buf[i] ~= exp THEN GOTO ModLine
NEXT i

' If no matches are found, program flows here.
PRINT "Error: No matches found."
END

' If a match is found, program flows here.
ModLine:
 ' Modify line
 ' as you need to here,
 ' then write it to file.
 OPEN dat FOR OVERWRITE AS #1
 
 FOR i = 0 TO idx - 1
   PRINT #1, buf[i]
 NEXT i

 CLOSE #1
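
For comparison, the same buffer idea could look roughly like this in Bash. This is only a sketch under stated assumptions: the temporary file, the pattern, and the "underscores become spaces" edit are all hypothetical stand-ins. Unlike the pseudo-code, it edits every matching line rather than jumping out at the first match.

```shell
#!/usr/bin/env bash
# Sketch of the buffer approach: read every line into an array, edit the
# lines that match, then write the array back out.
# The file, pattern and replacement below are hypothetical.

dat=$(mktemp)                       # stands in for "/path/to/file"
printf '%s\n' 'intro' 'old_name.mp3' 'outro' > "$dat"

exp='\.mp3$'                        # extended regex selecting the lines to edit
buf=()
matched=0

# Collect all the lines in the file, and store them in the buffer.
while IFS= read -r line || [ -n "$line" ]; do
    buf+=("$line")
done < "$dat"

# Modify each matching line in place (here: underscores become spaces).
for i in "${!buf[@]}"; do
    if [[ ${buf[i]} =~ $exp ]]; then
        buf[i]=${buf[i]//_/ }
        matched=1
    fi
done

if [ "$matched" -eq 0 ]; then
    echo "Error: No matches found." >&2
else
    # Overwrite the file with the (possibly modified) buffer.
    printf '%s\n' "${buf[@]}" > "$dat"
fi

cat "$dat"
```

Only the middle line of the sample file should change, to "old name.mp3". For large files, a line-at-a-time tool avoids holding the whole file in memory.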

However, to collect the files to process, you would want to make use of Perl's glob() function, which takes a shell-style wildcard pattern (not a regular expression) and returns the matching file names as a list, one per array index. Note that this pattern only looks in the current directory:

Code:

@files = glob("*.{m3u,sfv,xml}");

You can then iterate through the array like so:

Code:

foreach my $item (@files) {
  # Do
  # whatever
  # here
}

archtoad6 · 04-02-2007, 05:15 PM

The beauty of sed & awk is that they process a file 1 line at a time.

If the only reason you abandoned sed is escaping patterns, then perhaps you are not aware of 2 of its really wonderful features:

the -r option (extended regular expressions, so far fewer backslashes),
using ',' (or any character of your choice) in place of '/'.
(Bonus) ';' works to string multiple sed commands together w/o pipelining.
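
The three features above can be combined in one command. The following is a minimal sketch, assuming GNU sed (for -r) and a made-up playlist; the substitutions themselves are hypothetical. The /\.mp3$/ address in front restricts the commands in braces to lines ending in .mp3, which is one way to leave the rest of the file alone.

```shell
#!/usr/bin/env bash
# Sketch: sed with -r, ',' as the delimiter, ';' between commands, and an
# address so that only lines ending in .mp3 are edited.
# The playlist contents and the substitutions are made up for illustration.

playlist=$(mktemp)
printf '%s\n' '#EXTM3U my_list' 'music/my__track_01.mp3' 'cover_art.jpg' > "$playlist"

# With ',' as the delimiter, the '/' in "music/" needs no escaping.
result=$(sed -r '/\.mp3$/ { s,_+,-,g; s,music/,audio/,; }' "$playlist")

printf '%s\n' "$result"
```

With the sample data, the second line should come out as audio/my-track-01.mp3 and the other two lines should be unchanged. Adding -i.bak would edit the file in place and keep a backup, much like perl -pi.bak.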

RTM sed. However, the part about using ',' is not in the man page I just read, it's in Info. If you have Konqueror & KDE, info:/sed/The "s" Command will get you there. Otherwise:

Quote:

The `/' characters may be uniformly
replaced by any other single character within any given `s' command.
The `/' character (or whatever other character is used in its stead)
can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\'
character.

This will only work in Konqueror, no other browser.

Perhaps a small sample of your file names, profiles, & regexen would be helpful.

BTW, I just discovered http://pastebin.ca -- a really cool place to post whole files of examples & samples.

1 last Q/comment, wouldn't:

Code:

for F in *.m3u *.sfv *.xml

work as well as your find ... command?

archtoad6 · 06-19-2007, 09:20 AM

Quote:

Originally Posted by archtoad6

... If you have Konqueror & KDE, info:/sed/The "s" Command will get you there. Otherwise: This will only work in Konqueror, no other browser.

Clarification: That is, that URI will only work in Konqueror.

info:/ is a "pseudo protocol", a special feature of Konqueror; it is not really a protocol, but a "kioslave". See: http://en.wikipedia.org/wiki/Kioslave

For a list of Kioslaves, use the help: Kioslave -- "help:kioslave".