LinuxQuestions.org
Old 04-12-2007, 01:16 AM   #1
UnixKiwi
LQ Newbie
 
Registered: Dec 2006
Posts: 28

Rep: Reputation: 15
find and replace script


Is there a simple script (preferably unix, awk or perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. with the data to be replaced in one column and the new data in the 2nd column)?

Many thanks
 
Old 04-12-2007, 08:49 AM   #2
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 31
If you semi-understand Perl, why not take a stab at writing a script? If it doesn't work, come back with questions.
 
Old 04-12-2007, 09:46 AM   #3
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Rep: Reputation: 30
s/pattern/replace/

Could you please post sample input and output?
 
Old 04-12-2007, 11:32 AM   #4
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31
Quote:
Originally Posted by UnixKiwi
Is there a simple script (preferably unix, awk or perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. with the data to be replaced in one column and the new data in the 2nd column)?

Convert the second file to a sed script. For example, if your second file contains:
one ONE
two TWO
Change it to:
s/one/ONE/g
s/two/TWO/g
Then you can use sed:
Code:
sed -f sedscriptfile datafile
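
The conversion step above can also be scripted. Here is a minimal sketch in Python, assuming the second file holds whitespace-separated old/new pairs and that neither value contains a slash or a regex metacharacter. build_sed_script is a made-up helper name, not part of any tool:

```python
def build_sed_script(mapping_lines):
    """Turn 'old new' pairs into sed substitution commands, one per line."""
    commands = []
    for line in mapping_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        old, new = fields
        commands.append("s/%s/%s/g" % (old, new))
    return "\n".join(commands) + "\n"


# The two pairs from the example above.
script = build_sed_script(["one ONE", "two TWO"])
print(script, end="")
```

Saving the printed text to a file gives something that can be passed to sed -f as shown above.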
 
Old 04-13-2007, 04:07 AM   #5
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
Substitute in files in place, making a .bak backup of each file:

Code:
perl -pi.bak -e 's/this/that/g' file ...
 
Old 04-13-2007, 06:25 PM   #6
UnixKiwi
LQ Newbie
 
Registered: Dec 2006
Posts: 28

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by kshkid
could you please post sample input and output ?
Sample Input
file A
1 4
2 5
3 6

file B
3 9
5 8

Output
1 4
2 8
9 6

Last edited by UnixKiwi; 04-13-2007 at 06:26 PM.
 
Old 04-13-2007, 06:46 PM   #7
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31
Quote:
Originally Posted by UnixKiwi
Sample Input
file A
1 4
2 5
3 6

file B
3 9
5 8

Output
1 4
2 8
9 6

As I posted before, turn the second file into a sed script:

Code:
sed "$( awk '{ printf "s/%s/%s/g\n", $1, $2 }' "$FILEB" )" "$FILEA"
 
Old 04-15-2007, 11:24 PM   #8
UnixKiwi
LQ Newbie
 
Registered: Dec 2006
Posts: 28

Original Poster
Rep: Reputation: 15
It worked on a simple example, but when I tried to run it on my data it went haywire.
As far as I can tell (and this is a stab in the dark, but it makes sense to me), when I have two numbers like 10 and 101 and I want 10 replaced with 8, it gives me 8 and 81, i.e. it replaces parts of numbers as well as whole numbers. Is there any way to make it match whole numbers only?
 
Old 04-16-2007, 02:54 AM   #9
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
welcome to the world of regular expressions!

what are you using?

clue?
 
Old 04-16-2007, 03:35 AM   #10
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
If you want to match whole lines only, use ^ and $ at the start and end of your pattern to denote start of line and end of line. E.g.
Code:
s/^mypattern$/replacement/g;
...will replace "mypattern" with "replacement", if and only if "mypattern" fills a whole line.

Another option is to use some of the zero-length patterns Perl-style regular expressions provide. \b means "word boundary", which is very useful if you want to match whole words only. For example:
Code:
s/\bmypattern\b/replacement/g;
will replace all instances of "mypattern" with "replacement", but only where "mypattern" is a whole word. Note that this includes "mypattern." and ".mypattern" and a few others you wouldn't necessarily expect. See the perlre manual page for more information.
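
The difference between a bare pattern and a \b-anchored one can be checked quickly. This is a small sketch using Python's re module, whose \b behaves like the Perl one for this purpose. The sample line is made up for illustration:

```python
import re

line = "10 101 210 10"

# Bare pattern: every "10" substring is replaced, even inside 101 and 210.
naive = re.sub(r"10", "8", line)

# Word boundaries: only the standalone 10s are replaced.
whole = re.sub(r"\b10\b", "8", line)

# Caveat: "." is not a word character, so the 10 in "10.5" still matches.
decimal = re.sub(r"\b10\b", "8", "10.5")

print(naive)    # 8 81 28 8
print(whole)    # 8 101 210 8
print(decimal)  # 8.5
```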
 
Old 04-16-2007, 06:20 PM   #11
UnixKiwi
LQ Newbie
 
Registered: Dec 2006
Posts: 28

Original Poster
Rep: Reputation: 15
Here are samples of the actual data files I'm using this on (of course there are over 10000 records in this one and over 100000 in some of the others I have to look at). The numbers in the first 3 columns of hwt.dat (where the find and replace should occur) start at 1 and go up from there, which leads to problems in other columns if 1, 10 etc. need replacing.

datafile1
789 G1985
193 G1988

hwt.dat
170 789 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 193 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988

I initially used this statement to prepare the data for the sed script
awk '{print "s/"$1"/"$2"/g"}' datafile1 > temp5

then this to run it
sed -f temp5 datafile2 > newdata

I've just tried using this to incorporate the ^ and $ into the file like this:
awk '{print "s/^"$1"$/"$2"/g"}' datafile1 > temp5
However, it made no changes. I guess the issue is some conflict between the ^ and $ and the column designators, i.e. $1, $2. Or I stuffed up somewhere (again).

Similarly, I also tried the 2nd option suggested and got the same problem
awk '{print "s/\b"$1"\b/"$2"/g"}' datafile > temp5
 
Old 04-16-2007, 08:30 PM   #12
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,774

Rep: Reputation: 2081
The first option would only work if hwt.dat had a line consisting of 789 and nothing else.

The second option doesn't work because the \b's inside your awk string get interpreted by awk as the backspace character: if you open temp5 with vi you'll see funny ^H's. You need to double up the backslashes:
Code:
 awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5
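
The same escaping trap exists in other languages, which makes it easy to inspect. This sketch is in Python rather than awk (an analogy only, assuming an ordinary non-raw string literal); it counts the backspace characters that end up in each version of the command:

```python
# Ordinary string literal: "\b" is the backspace character (0x08).
single = "s/\b789\b/G1985/g"

# Doubled backslashes: a literal backslash followed by the letter b.
doubled = "s/\\b789\\b/G1985/g"

# Number of backspace characters in each string.
print(single.count("\x08"), doubled.count("\x08"))  # 2 0
```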
This could still be a problem if you have a 789 somewhere in your other columns. Here's a solution in Python that only does replacements on the first 3 columns:

Code:
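#!/usr/bin/env python3
import sys

# Build the old -> new lookup table from the replacement file (argv[1]).
repl = {}
with open(sys.argv[1]) as repl_file:
    for line in repl_file:
        old, new = line.split()
        repl[old] = new

# Recode only the first 3 columns of each stdin line; the rest of the
# line (cols[3], which keeps its trailing newline) is passed through.
for line in sys.stdin:
    cols = line.split(None, 3)
    print(" ".join(repl.get(item, item) for item in cols[:3]), cols[3], end="")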
Disclaimer: This is probably not great code (eg no error checking), I don't know Python very well, so I was looking up stuff as I went along...

sample usage:
Code:
~/tmp% ./replace.py datafile1 <hwt.dat  >newdata                                                                          
~/tmp% cat newdata                                                                                                        
170 G1985 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 G1988 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988

Last edited by ntubski; 04-16-2007 at 10:04 PM.
 
Old 04-16-2007, 11:08 PM   #13
UnixKiwi
LQ Newbie
 
Registered: Dec 2006
Posts: 28

Original Poster
Rep: Reputation: 15
used
Quote:
Originally Posted by ntubski
Code:
 awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5
You were right, there were issues with other "789"-like numbers, but I got around that by separating the first 3 columns into a separate datafile (awk '{print $1,....), running the recode on it and then rejoining the files.

Thanks for all your help guys
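
For anyone finding this later, the split, recode and rejoin steps can be collapsed into one pass. This is a rough sketch in Python that assumes whitespace-separated columns and whole-field matches only. recode_columns and its ncols parameter are names invented here, and the last sample row is made up to show that later columns are left alone:

```python
def recode_columns(lines, mapping, ncols=3):
    """Replace whole fields found in mapping, but only in the first ncols columns."""
    out = []
    for line in lines:
        fields = line.split()
        head = [mapping.get(field, field) for field in fields[:ncols]]
        # Rejoin with single spaces; original spacing is not preserved.
        out.append(" ".join(head + fields[ncols:]))
    return out


mapping = {"789": "G1985", "193": "G1988"}
rows = [
    "170 789 172 1 53.1 495 1 1 1 97 1985",
    "149 146 193 1 69.8 446 2 2 1 21 1988",
    "1 2 3 789",  # made-up row: 789 outside the first 3 columns
]
for row in recode_columns(rows, mapping):
    print(row)
```

Because each field is looked up as a whole string, 10 never matches inside 101, so no word-boundary pattern is needed.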
 