Isolate lines in a text file and perform replacements
I developed a shell script for renaming my mp3 files. The script uses Perl for full regular expression support (so I don't have to escape the patterns as in sed).
It uses regular expressions to normalize the file names: basically, it removes junk and spaces from the file name and converts it to title case. The patterns reside in a preset file, since several replacements are necessary, and each preset file handles a slightly different renaming profile. My current development effort is to apply these replacement patterns to the text inside playlist and XML files, so that the pointers in the playlist files stay aligned with the renamed files. Eventually, the script should help me rename my mp3s and update my iTunes database (without importing all my audio files once again), though it has several purposes beyond mp3 renaming.
The script searches recursively for files with the extensions m3u, sfv, and xml. It iterates (while loop) through an external file of regular expression patterns, combines the patterns into a variable, and passes that variable to Perl. The script below performs this task (but has not been thoroughly tested). Its major shortcoming is that the Perl line works on the entire file when it should only replace text on the lines with mp3 pointers. Here's the call for assistance, since I have coder's block: maybe someone could help me with different logic or a traditional grep solution. I just need fresh ideas for my shell script.
Code:
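# Match only real extensions (note the \. before the group)
find ./ -regex ".*\.\(m3u\|sfv\|xml\)$" -type f -print | while read -r FILE
do
    CODE=""   # reset for each file so the patterns do not pile up
    while read -r REGEX REPLACE line
    do
        CODE="$CODE; s/$REGEX/$REPLACE/g"
    done < "$PRESET"
    #PRESET may contain over twenty-five regex patterns for a complex renaming task
    # Plain double quotes (not typographic ones), and quote the file name
    perl -pi.bak -e "$CODE" "$FILE"
done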
btw I am doing this script in Bash, because Perl is foreign territory.
Last edited by neville310; 04-01-2007 at 04:53 PM.
I've stopped using Perl for any kind of scripting, mainly because I've gotten into Common Lisp recently, and much prefer it for...well, anything.
I can't help much with the code itself, but I can offer an algorithm that should get you to your result (I use it all the time):
1. Declare a buffer array/list.
2. Read in each line from the file, assigning each line to a new buffer index.
3. Scan through each array index until you find the expression you want to match.
3a. If a match is found, modify the line accordingly; skip to step 4.
3b. If a match is not found, move on to the next array index.
4. Close any input streams from the file, and open up an output stream to the file.
5. Using a loop, iterate through the array, and print each array value to its own line.
6. End Of Program.
Here's some pseudo-code:
Code:
' Declarations.
DECLARE buf AS ARRAY
DECLARE dat AS FILE
DECLARE exp AS REGEX
DECLARE idx AS INTEGER
' Initializations.
dat = "/path/to/file"
exp = "expression to match"
idx = 0
' Open the file stream for input.
OPEN dat FOR INPUT AS #1
' Collect all the lines in the file, and store them in a buffer.
' (Testing EOF first also copes with an empty file.)
DO WHILE NOT EOF(1)
    LINE INPUT #1, buf[idx]
    idx = idx + 1
LOOP
' Close the input stream (it's not needed anymore).
CLOSE #1
' Scan the buffer for the line to edit (valid indexes are 0 to idx - 1).
FOR i = 0 TO idx - 1
    IF buf[i] ~= exp THEN GOTO ModLine
NEXT i
' If no match is found, program flow ends up here.
PRINT "Error: No matches found."
END
' If a match is found, program flow jumps here; i is the index of the matching line.
ModLine:
' Modify buf[i]
' as you need to here,
' then write the buffer back to the file.
OPEN dat FOR OVERWRITE AS #1
FOR i = 0 TO idx - 1
    PRINT #1, buf[i]
NEXT i
CLOSE #1
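Since the original script is written in Bash, the same steps could be sketched there roughly as follows. This assumes Bash 4 or later (for mapfile); the temporary file, the expression, and the underscore-to-space edit are placeholders. Unlike the pseudo-code, which jumps out at the first match, this version modifies every matching line.

```shell
#!/usr/bin/env bash
# Sketch of the buffer algorithm in Bash; all sample data is hypothetical.
set -eu

dat=$(mktemp)                       # stand-in for /path/to/file
printf '%s\n' 'alpha' 'track_one.mp3' 'omega' > "$dat"
exp='\.mp3$'                        # expression to match

# Steps 1-2: read every line of the file into a buffer array.
mapfile -t buf < "$dat"

# Step 3: scan the buffer and modify each line that matches.
matches=0
for i in "${!buf[@]}"; do
    if [[ ${buf[i]} =~ $exp ]]; then
        buf[i]=${buf[i]//_/ }       # sample edit: underscores to spaces
        matches=$((matches + 1))
    fi
done

# Steps 4-5: write the buffer back, one array element per line.
if (( matches > 0 )); then
    printf '%s\n' "${buf[@]}" > "$dat"
else
    echo "Error: No matches found." >&2
fi

cat "$dat"
```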
To collect the files to process, you could use Perl's glob() function, which expands a shell-style wildcard pattern (not a regular expression) and returns the matching file names as a list. Note that glob() does not descend into subdirectories:
Code:
@files = glob("*.{m3u,sfv,xml}");
You can then iterate through the array like so:
Code:
foreach my $item (@files) {
    # Do
    # Whatever
    # Here
}
The beauty of sed & awk is that they process a file one line at a time.
If the only reason you abandoned sed is escaping patterns, then perhaps you are not aware of two of its really wonderful features:
the -r option (extended regular expressions, so far fewer backslashes),
using ',' (or any character of your choice) in place of '/'.
(Bonus) ';' works to string multiple sed commands together without pipelining.
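As an illustration of these features combined, here is a small sketch that assumes GNU sed. The sample playlist and both substitutions are made up; the /\.mp3$/ address in front of the braces is what limits the commands to lines ending in .mp3.

```shell
#!/usr/bin/env bash
# Sketch: sed with -r, ',' as the s delimiter, and ';' between commands.
# The playlist contents and the patterns are hypothetical.
set -eu

playlist=$(mktemp)
printf '%s\n' '#EXTM3U' 'music/my_song__live.mp3' 'covers/my_cover.jpg' > "$playlist"

# Only lines ending in .mp3 are edited; a backup is kept as <file>.bak.
# With ',' as the delimiter, the '/' in the paths needs no escaping.
sed -r -i.bak '/\.mp3$/ { s,_+, ,g; s,^music/,audio/,; }' "$playlist"

cat "$playlist"
```

Because sed works line by line, the other lines of the file pass through untouched.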
RTM sed. However, the part about using ',' is not in the man page I just read, it's in Info. If you have Konqueror & KDE, info:/sed/The "s" Command will get you there. Otherwise:
Quote:
The `/' characters may be uniformly
replaced by any other single character within any given `s' command.
The `/' character (or whatever other character is used in its stead)
can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\'
character.
This will only work in Konqueror, no other browser.
Perhaps a small sample of your file names, profiles, & regexen would be helpful.
BTW, I just discovered http://pastebin.ca -- a really cool place to post whole files of examples & samples.
... If you have Konqueror & KDE, info:/sed/The "s" Command will get you there. Otherwise: This will only work in Konqueror, no other browser.
Clarification: That is, that URI will only work in Konqueror.
info:/ is a "pseudo protocol", a special feature of Konqueror; it is not really a protocol, but a "kioslave". See: http://en.wikipedia.org/wiki/Kioslave
For a list of kioslaves, use the help kioslave: "help:kioslave".