LinuxQuestions.org - find and replace script

UnixKiwi

04-12-2007 01:16 AM

find and replace script

Is there a simple script (preferably unix, awk or perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. with the data to be replaced in one column and the new data in the second column)?

Many thanks

wjevans_7d1@yahoo.co

04-12-2007 08:49 AM

If you semi-understand Perl, why not take a stab at writing a script? If it doesn't work, come back with questions.

kshkid

04-12-2007 09:46 AM

s/pattern/replace/

Could you please post sample input and output?

cfaj

04-12-2007 11:32 AM

Convert the second file to a sed script. For example, if your second file contains:
one ONE
two TWO
Change it to:
s/one/ONE/g
s/two/TWO/g
Then you can use sed:

Code:

sed -f sedscriptfile datafile
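
If editing the second file by hand is impractical, the conversion itself can be scripted. Below is a minimal Python sketch of that step only. The function name to_sed_script is made up for illustration, and it assumes neither column contains a slash or any regex metacharacter.

```python
def to_sed_script(mapping_text):
    """Turn 'old new' lines into sed substitution commands."""
    commands = []
    for line in mapping_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        old, new = fields
        commands.append("s/%s/%s/g" % (old, new))
    return "\n".join(commands)


# The two-line example from this post.
script = to_sed_script("one ONE\ntwo TWO\n")
print(script)
```

The printed lines can be saved to a file and passed to sed -f as shown above.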

bigearsbilly

04-13-2007 04:07 AM

To substitute in files in place, keeping a .bak backup of each file:

Code:

perl -pi.bak -e 's/this/that/g' file ...

UnixKiwi

04-13-2007 06:25 PM

Quote:

Originally Posted by kshkid

Could you please post sample input and output?

Sample Input
file A
1 4
2 5
3 6

file B
3 9
5 8

Output
1 4
2 8
9 6

cfaj

04-13-2007 06:46 PM

Quote:

Originally Posted by UnixKiwi

Sample Input
file A
1 4
2 5
3 6

file B
3 9
5 8

Output
1 4
2 8
9 6

As I posted before, turn the second file into a sed script:

Code:

sed "$( awk '{ printf "s/%s/%s/g\n", $1, $2 }' "$FILEB" )" "$FILEA"

UnixKiwi

04-15-2007 11:24 PM

It worked on a simple example, but when I tried to run it on my data it went haywire.
As far as I can tell (this is a stab in the dark, but it makes sense to me), when I have two numbers like 10 and 101 and I want 10 replaced with 8, it gives me 8 and 81, i.e. it replaces parts of numbers as well as whole numbers. Is there any way to make it match whole numbers only?

bigearsbilly

04-16-2007 02:54 AM

welcome to the world of regular expressions!

what are you using?

clue?

matthewg42

04-16-2007 03:35 AM

If you want to match whole lines only, use ^ and $ at the start and end of your pattern to denote start of line and end of line. E.g.

Code:

s/^mypattern$/replacement/g;

...will replace "mypattern" with "replacement", if and only if "mypattern" fills a whole line.

Another option is to use some of the zero-length patterns Perl-style regular expressions provide. \b means "word boundary", which is very useful if you want to match whole words only. For example:

Code:

s/\bmypattern\b/replacement/g;

will replace all instances of "mypattern" with "replacement", but only where "mypattern" is a whole word. Note that this includes "mypattern." and ".mypattern" and a few others you wouldn't necessarily expect. See the perlre manual page for more information.
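
To make the difference concrete, here is a small sketch using Python's re module as a stand-in for sed or Perl (an assumption on my part: the anchors and \b behave the same way for this simple case). The sample string and the final whitespace-delimited variant are illustrative additions, not something from the posts above.

```python
import re

data = "10 101 10.5 210"

# Unanchored: every "10" substring is replaced, even inside 101 and 210.
plain = re.sub(r"10", "8", data)

# Whole-line anchors: nothing changes, because "10" is not the entire line.
anchored = re.sub(r"^10$", "8", data)

# Word boundaries: standalone "10" is replaced, and so is the "10" in
# "10.5", because "." is not a word character.
bounded = re.sub(r"\b10\b", "8", data)

# Whitespace-delimited tokens only, using lookarounds instead of \b.
tokens = re.sub(r"(?<!\S)10(?!\S)", "8", data)

print(plain)     # 8 81 8.5 28
print(anchored)  # 10 101 10.5 210
print(bounded)   # 8 101 8.5 210
print(tokens)    # 8 101 10.5 210
```

The third result shows the kind of surprise mentioned above: the word boundary treats "10.5" as containing the word "10".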

UnixKiwi

04-16-2007 06:20 PM

Here are samples of the actual datafiles I'm using this on (of course there are over 10000 records in this one and over 100000 in some of the others I have to look at). The numbers in the first 3 columns of hwt.dat (where the find and replace should occur) start at 1 and go up from there, which leads to problems in the other columns if 1, 10 etc. need to be replaced.

datafile1
789 G1985
193 G1988

hwt.dat
170 789 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 193 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988

I initially used this statement to prepare the data for the sed script:
awk '{print "s/"$1"/"$2"/g"}' datafile1 > temp5

then this to run it:
sed -f temp5 datafile2 > newdata

I've just tried incorporating the ^ and $ into the file like this:
awk '{print "s/^"$1"$/"$2"/g"}' datafile1 > temp5
However, it made no changes. I guess the issue is some conflict between the ^ and $ and the column designators (i.e. $1, $2), or I stuffed up somewhere (again).

Similarly, I also tried the second option suggested and got the same problem:
awk '{print "s/\b"$1"\b/"$2"/g"}' datafile > temp5

ntubski

04-16-2007 08:30 PM

The first option would only work if 789 were on a line by itself in hwt.dat.

The second option doesn't work because the \b's inside your awk string are interpreted by awk as the backspace character (the single quotes keep the shell from touching them): if you open temp5 with vi you'll see funny ^H's. You need to double up the backslashes:

Code:

awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5
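
The same pitfall can be reproduced in Python, which also treats \b in an ordinary string literal as a backspace. This is only an analogy to the awk string above, not the awk behaviour itself; the variable names are made up for illustration.

```python
# In an ordinary string literal, "\b" is the backspace character (0x08).
wrong = "s/\b789\b/G1985/g"

# Doubling the backslash, or using a raw string, keeps a literal \b.
doubled = "s/\\b789\\b/G1985/g"
raw = r"s/\b789\b/G1985/g"

print(repr(wrong))    # backspaces show up as \x08
print(repr(doubled))  # literal backslash followed by b
print(doubled == raw)
```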

This could still be a problem if you have a 789 somewhere in your other columns. Here's a solution in Python that only does replacements on the first 3 columns:

Code:

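#!/usr/bin/env python3
import sys

# Build the old -> new mapping from the replacement file (two columns per line).
repl = {}
with open(sys.argv[1]) as repl_file:
    for line in repl_file:
        old, new = line.split()
        repl[old] = new

# Replace only within the first three columns. The rest of each line
# (cols[3], which keeps its trailing newline) passes through untouched.
for line in sys.stdin:
    cols = line.split(None, 3)
    print(" ".join(repl.get(item, item) for item in cols[:3]), cols[3], end="")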

Disclaimer: this is probably not great code (e.g. no error checking). I don't know Python very well, so I was looking things up as I went along...

sample usage:

Code:

~/tmp% ./replace.py datafile1 <hwt.dat >newdata
~/tmp% cat newdata
170 G1985 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 G1988 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988

UnixKiwi

04-16-2007 11:08 PM

I used

Quote:

Originally Posted by ntubski

Code:

awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5

You were right, there were issues with other "789"-like numbers, but I got around that by separating the first 3 columns into a separate datafile (awk '{print $1,....), running the recode on that file and then rejoining the files.

Thanks for all your help guys
