LinuxQuestions.org: Programming forum. This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
04-12-2007, 01:16 AM
#1
LQ Newbie
Registered: Dec 2006
Posts: 28
find and replace script
Is there a simple script (preferably Unix, awk or Perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. one with the data to be replaced in the first column and the new data in the second column)?
Many thanks
04-12-2007, 08:49 AM
#2
Member
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938
If you semi-understand Perl, why not take a stab at writing a script? If it doesn't work, come back with questions.
04-12-2007, 09:46 AM
#3
Member
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383
s/pattern/replacement/
Could you please post sample input and output?
04-12-2007, 11:32 AM
#4
Member
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221
Quote:
Originally Posted by UnixKiwi
Is there a simple script (preferably Unix, awk or Perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. one with the data to be replaced in the first column and the new data in the second column)?
|
Convert the second file to a sed script. For example, if your second file contains:
one ONE
two TWO
Change it to:
s/one/ONE/g
s/two/TWO/g
Then you can use sed:
Code:
sed -f sedscriptfile datafile
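The conversion step can also be scripted. Below is a minimal Python sketch (Python being the other language used later in this thread), assuming a whitespace-separated mapping file with the old value in column 1 and the new value in column 2, and assuming neither value contains characters special to sed (such as `/`, `&`, or regex metacharacters), which holds for the plain words and numbers discussed here. The function name `mapping_to_sed` is made up for illustration.

```python
def mapping_to_sed(mapping_text: str) -> str:
    """Turn lines of 'old new' into sed commands of the form s/old/new/g.

    Assumes neither value contains characters special to sed.
    """
    commands = []
    for line in mapping_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        old, new = fields[0], fields[1]
        commands.append(f"s/{old}/{new}/g")
    # One command per line, which is the layout sed -f reads.
    return "".join(cmd + "\n" for cmd in commands)


if __name__ == "__main__":
    print(mapping_to_sed("one ONE\ntwo TWO\n"), end="")
```

Writing the returned text to a file (for example `sedscriptfile`) and passing it to `sed -f` gives the same result as the hand-written script.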
04-13-2007, 04:07 AM
#5
Senior Member
Registered: Mar 2004
Location: England
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515
Substitute in place in one or more files, keeping a .bak backup of each:
Code:
perl -pi.bak -e 's/this/that/g' file ...
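For comparison, Python's standard `fileinput` module can do a similar in-place edit with a backup. This is a minimal sketch; `substitute_in_place` is a hypothetical helper, and Python's `re` replacement syntax is not identical to Perl's (for example, a backreference in the replacement is written `\1` or `\g<1>` rather than `$1`).

```python
import fileinput
import re
import sys


def substitute_in_place(paths, pattern, replacement):
    """Apply re.sub(pattern, replacement) to every line of each file in place.

    The original of each file is kept as <name>.bak, roughly like
    perl -pi.bak -e 's/pattern/replacement/g' file ...
    """
    regex = re.compile(pattern)
    # With inplace=True, whatever is written to sys.stdout becomes the new file contents.
    with fileinput.input(files=paths, inplace=True, backup=".bak") as lines:
        for line in lines:
            sys.stdout.write(regex.sub(replacement, line))
```

Usage would look like `substitute_in_place(["file1.txt", "file2.txt"], r"this", "that")`, where the file names are placeholders.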
04-13-2007, 06:25 PM
#6
LQ Newbie
Registered: Dec 2006
Posts: 28
Original Poster
Quote:
Originally Posted by kshkid
Could you please post sample input and output?
|
Sample Input
file A
1 4
2 5
3 6
file B
3 9
5 8
Output
1 4
2 8
9 6
Last edited by UnixKiwi; 04-13-2007 at 06:26 PM.
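For this sample, the desired output is what you get by looking up each whole whitespace-separated value of file A in the file B mapping and replacing it when present. A minimal Python sketch of that idea, assuming space-separated columns (the output rejoins fields with a single space); `load_mapping` and `recode` are hypothetical names.

```python
def load_mapping(mapping_lines):
    """Build an {old: new} dict from 'old new' lines; blank lines are skipped."""
    mapping = {}
    for line in mapping_lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def recode(data_lines, mapping):
    """Replace each whole whitespace-separated value found in `mapping`.

    Every value is looked up exactly once, so a value is never matched
    inside a longer one and a replacement is never replaced again.
    """
    return [" ".join(mapping.get(tok, tok) for tok in line.split())
            for line in data_lines]


file_a = ["1 4", "2 5", "3 6"]
file_b = ["3 9", "5 8"]
for line in recode(file_a, load_mapping(file_b)):
    print(line)  # prints 1 4, then 2 8, then 9 6
```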
04-13-2007, 06:46 PM
#7
Member
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221
Quote:
Originally Posted by UnixKiwi
Sample Input
file A
1 4
2 5
3 6
file B
3 9
5 8
Output
1 4
2 8
9 6
|
As I posted before, turn the second file into a sed script:
Code:
sed "$( awk '{ printf "s/%s/%s/g\n", $1, $2 }' "$FILEB" )" "$FILEA"
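One property of a script built this way: sed applies every `s` command in turn to each line, so a later command also sees the output of earlier ones. With a hypothetical mapping where a new value is also an old value (3 -> 5 and 5 -> 8), a 3 ends up as 8. The Python sketch below mimics that sequential behaviour and contrasts it with a single-pass replacement built from one regex alternation. Both function names are made up for illustration, the values are assumed to be plain literals, and the single-pass version still matches inside longer numbers such as 13.

```python
import re


def sequential(text, pairs):
    """Apply each (old, new) substitution in order, like s/old/new/g lines in a sed script."""
    for old, new in pairs:
        text = text.replace(old, new)  # same as s/old/new/g when old is a plain literal
    return text


def single_pass(text, pairs):
    """Replace every old value in one scan, so replaced text is never rescanned."""
    mapping = dict(pairs)
    if not mapping:
        return text
    # Longest keys first, so a longer key wins over a shorter one at the same position.
    alternation = "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
    return re.sub(alternation, lambda m: mapping[m.group(0)], text)


pairs = [("3", "5"), ("5", "8")]  # hypothetical: 5 is both a new and an old value
print(sequential("3 5", pairs))   # 8 8
print(single_pass("3 5", pairs))  # 5 8
```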
04-15-2007, 11:24 PM
#8
LQ Newbie
Registered: Dec 2006
Posts: 28
Original Poster
It worked on a simple example, but when I ran it on my real data it went haywire.
As far as I can tell (a stab in the dark, but it makes sense to me), when I have two numbers like 10 and 101 and I want 10 replaced with 8, it gives me 8 and 81, i.e. it replaces parts of numbers as well as whole numbers. Is there any way to make it match whole numbers only?
04-16-2007, 02:54 AM
#9
Senior Member
Registered: Mar 2004
Location: England
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515
Welcome to the world of regular expressions!
What are you using?
Clue?
04-16-2007, 03:35 AM
#10
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
If you want to match whole lines only, use ^ and $ at the start and end of your pattern to denote the start and end of the line. E.g.
Code:
s/^mypattern$/replacement/g;
...will replace "mypattern" with "replacement", if and only if "mypattern" fills a whole line.
Another option is to use some of the zero-length patterns Perl-style regular expressions provide. \b means "word boundary", which is very useful if you want to match whole words only. For example:
Code:
s/\bmypattern\b/replacement/g;
will replace all instances of "mypattern" with "replacement", but only where "mypattern" is a whole word. Note that this includes "mypattern." and ".mypattern" and a few others you wouldn't necessarily expect. See the perlre manual page for more information.
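Both suggestions can be tried in Python's `re` module, whose `^`, `$` (with `re.MULTILINE`) and `\b` behave like Perl's for these patterns. The sample text below is made up, loosely modelled on numeric column data. It also shows the caveat about unexpected matches: `.` is not a word character, so `\b1\b` matches the 1 in `53.1`. The last substitution uses lookarounds, a technique not suggested in the thread, to forbid a neighbouring digit or `.`; Perl and Python support lookarounds, sed does not.

```python
import re

text = "10 101\n1 53.1\n10"

# ^...$ with re.MULTILINE: only a line consisting of exactly "10" changes.
anchored = re.sub(r"^10$", "8", text, flags=re.MULTILINE)

# \b...\b: "10" as a whole word changes, while "101" is left alone.
bounded = re.sub(r"\b10\b", "8", text)

# Caveat: "." is not a word character, so the 1 after the decimal point matches too.
caveat = re.sub(r"\b1\b", "X", text)

# Alternative (lookarounds, not from the thread): no digit or "." allowed on either side.
strict = re.sub(r"(?<![\d.])1(?![\d.])", "X", text)

for result in (anchored, bounded, caveat, strict):
    print(repr(result))
```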
04-16-2007, 06:20 PM
#11
LQ Newbie
Registered: Dec 2006
Posts: 28
Original Poster
Here are samples of the actual data files I'm using this on (there are over 10,000 records in this one and over 100,000 in some of the others I have to look at). The numbers in the first 3 columns of hwt.dat (where the find and replace should occur) start at 1 and go up from there, which causes problems in the other columns when 1, 10, etc. need to be replaced.
datafile1
789 G1985
193 G1988
hwt.dat
170 789 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 193 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988
I initially used this command to prepare the sed script:
awk '{print "s/"$1"/"$2"/g"}' datafile1 > temp5
then this to run it:
sed -f temp5 datafile2 > newdata
I then tried incorporating the ^ and $ into the script like this:
awk '{print "s/^"$1"$/"$2"/g"}' datafile1 > temp5
However, it made no changes. I guess the issue is some conflict between the ^ and $ and the column designators, i.e. $1 and $2. Or I stuffed up somewhere (again).
Similarly, I tried the second suggested option and got the same problem:
awk '{print "s/\b"$1"\b/"$2"/g"}' datafile > temp5
04-16-2007, 08:30 PM
#12
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,797
The first option would only work if 789 appeared on a line by itself in hwt.dat.
The second option doesn't work because awk interprets each \b inside a double-quoted string as the backspace character (the single quotes keep bash from touching it): if you open temp5 with vi you'll see funny ^H's. You need to double up the backslashes:
Code:
awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5
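Python string literals have the same trap: in an ordinary (non-raw) string, `"\b"` is also the backspace character, so code that writes a sed script needs either `"\\b"` or a raw string `r"\b"`. A minimal sketch, using the two mappings from datafile1 above; `word_boundary_sed` is a hypothetical helper, and the `\b` in the generated commands relies on GNU sed, where it is an extension.

```python
def word_boundary_sed(mapping_lines):
    """Build GNU sed commands like s/\\bOLD\\b/NEW/g from 'old new' lines."""
    commands = []
    for line in mapping_lines:
        fields = line.split()
        if len(fields) >= 2:
            # rf"..." is a raw f-string: \b stays backslash plus "b", while {...} is still filled in.
            commands.append(rf"s/\b{fields[0]}\b/{fields[1]}/g")
    return "\n".join(commands)


print(repr("\b"))   # '\x08'  (plain literal: a backspace, shown as ^H in vi)
print(repr(r"\b"))  # '\\b'   (raw literal: the backslash is kept)
print(word_boundary_sed(["789 G1985", "193 G1988"]))
```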
This could still be a problem if a 789 appears somewhere in your other columns. Here's a solution in Python that only does replacements on the first 3 columns:
Code:
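#!/usr/bin/env python3
import sys

# Read the old -> new mapping from the file named as the first argument.
repl = {}
with open(sys.argv[1]) as repl_file:
    for line in repl_file:
        if line.strip():
            old, new = line.split()
            repl[old] = new

# Replace whole values in the first 3 columns only; copy the rest of each line unchanged.
for line in sys.stdin:
    cols = line.split(None, 3)
    head = " ".join(repl.get(item, item) for item in cols[:3])
    if len(cols) > 3:
        sys.stdout.write(head + " " + cols[3])
    else:
        sys.stdout.write(head + "\n")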
Disclaimer: This is probably not great code (e.g. no error checking); I don't know Python very well, so I was looking things up as I went along...
Sample usage:
Code:
~/tmp% ./replace.py datafile1 <hwt.dat >newdata
~/tmp% cat newdata
170 G1985 172 1 53.1 495 1 1 1 97 1985
143 382 188 1 69.0 446 2 2 2 21 1988
149 146 G1988 1 69.8 446 2 2 1 21 1988
148 332 197 1 71.8 446 2 2 2 21 1988
Last edited by ntubski; 04-16-2007 at 10:04 PM.
04-16-2007, 11:08 PM
#13
LQ Newbie
Registered: Dec 2006
Posts: 28
Original Poster
I used:
Quote:
Originally Posted by ntubski
Code:
awk '{print "s/\\b"$1"\\b/"$2"/g"}' datafile > temp5
|
You were right, there were issues with other "789"-like numbers, but I got around that by separating the first 3 columns into a separate data file (awk '{print $1,....), applying the recode, and then rejoining the files.
Thanks for all your help, guys!
All times are GMT -5.