Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Is there a simple tool (preferably Unix, awk or Perl, which I semi-understand) that can find and replace data in an ASCII file according to a second file (i.e. with the data to be replaced in one column and the new data in the second column)?
Convert the second file to a sed script. For example, if your second file contains:
It worked on a simple example but when I tried to run it on my data it went haywire.
As far as I can tell (this is a stab in the dark, but it makes sense to me), when I have two numbers like 10 and 101 and I want 10 replaced with 8, it gives me 8 and 81, i.e. it replaces parts of numbers as well as whole numbers. Is there any way to make it match whole numbers only?
If you want to match whole lines only, use ^ and $ at the start and end of your pattern to denote start of line and end of line. E.g.
Code:
s/^mypattern$/replacement/g;
...will replace "mypattern" with "replacement", if and only if "mypattern" fills a whole line.
Another option is to use some of the zero-length patterns Perl-style regular expressions provide. \b means "word boundary", which is very useful if you want to match whole words only. For example:
Code:
s/\bmypattern\b/replacement/g;
will replace all instances of "mypattern" with "replacement", but only where "mypattern" is a whole word. Note that this includes "mypattern." and ".mypattern" and a few others you wouldn't necessarily expect. See the perlre manual page for more information.
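To make the difference concrete, here is a minimal sketch using Python's re module, whose \b and ^/$ behave like the Perl-style ones described above. The sample line is made up for illustration; it is not taken from the poster's data files.

```python
import re

# Hypothetical sample line; the real data files are not shown here.
line = "10 101 210"

# Plain substring replacement (what s/10/8/g does): every "10" is
# rewritten, including the ones inside longer numbers.
naive = line.replace("10", "8")

# \b matches between a word character and a non-word character,
# so only the standalone "10" is replaced.
whole_word = re.sub(r"\b10\b", "8", line)

# ^...$ only matches when the pattern fills the entire line.
whole_line_miss = re.sub(r"^10$", "8", line)
whole_line_hit = re.sub(r"^10$", "8", "10")

# A "." also counts as a word boundary, so "10.5" is affected too.
decimal = re.sub(r"\b10\b", "8", "10.5")

print(naive)            # 8 81 28
print(whole_word)       # 8 101 210
print(whole_line_miss)  # 10 101 210
print(whole_line_hit)   # 8
print(decimal)          # 8.5
```

The last case is the kind of surprise mentioned above: a word boundary is not the same thing as a column boundary.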
Samples of the actual datafiles I'm using this on (of course there are over 10000 records in this one and over 100000 in some of the others I have to look at). The numbers in the first 3 columns of hwt.dat (where the find and replace should occur) start at 1 and go up from there, which leads to problems in the other columns if 1, 10 etc. need to be replaced.
I initially used this statement to prepare the data for the sed script
awk '{print "s/"$1"/"$2"/g"}' datafile1 > temp5
then this to run it
sed -f temp5 datafile2 > newdata
I've just tried using this to incorporate the ^ and $ into the file like this:
awk '{print "s/^"$1"$/"$2"/g"}' datafile1 > temp5
However, it made no changes. I guess the issue is some conflict between the ^ and $ and the column designators, i.e. $1 and $2. Or I stuffed up somewhere (again).
Similarly, I also tried the second option suggested and got the same problem
awk '{print "s/\b"$1"\b/"$2"/g"}' datafile > temp5
The first option would only work if you had a line with 789 on a line by itself in hw.dat.
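The second option doesn't work because awk treats \b inside a string literal as the escape sequence for the backspace character (the shell passes it through untouched inside single quotes): if you open temp5 with vi you'll see funny ^H's. You need to double up the backslashes, i.e. write \\b instead of \b in the awk string.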
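Python string literals happen to treat "\b" the same way, so the effect can be shown with a small sketch. This is only an analogy to the awk behaviour, and the old/new pair is a made-up mapping entry.

```python
# Hypothetical mapping entry, for illustration only.
old, new = "10", "8"

# "\b" in an ordinary string literal is the backspace character (0x08),
# so the generated rule contains control characters, not a word boundary.
broken = "s/\b" + old + "\b/" + new + "/g"

# "\\b" is a literal backslash followed by "b", which is what the
# generated sed rule needs to contain.
fixed = "s/\\b" + old + "\\b/" + new + "/g"

print(repr(broken))  # 's/\x0810\x08/8/g'
print(fixed)         # s/\b10\b/8/g
```

The \x08 characters in the first line of output are the same ones that show up as ^H in vi.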
This could still be a problem if you have a 789 somewhere in your other columns. Here's a solution in Python that only does replacements on the first 3 columns
Code:
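#!/usr/bin/env python3
import sys

# Build the old -> new lookup table from the replacement file (first argument).
repl = {}
with open(sys.argv[1]) as repl_file:
    for line in repl_file:
        old, new = line.split()
        repl[old] = new

# Replace whole fields in the first 3 columns only; pass the rest of the line through.
# Assumes every input line has at least 4 whitespace-separated columns.
for line in sys.stdin:
    cols = line.split(None, 3)
    # cols[3] still carries the line's trailing newline, hence end="".
    print(" ".join(repl.get(item, item) for item in cols[:3]), cols[3], end="")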
Disclaimer: This is probably not great code (e.g. no error checking). I don't know Python very well, so I was looking up stuff as I went along...
You were right, there were issues with other "789"-like numbers, but I got around that by separating the first 3 columns into a separate datafile (awk '{print $1,....), using the recode and then rejoining the files