[SOLVED] Column substitution with grep inside awk inconsistent
Hi, I would like some input on the following issue:
I have two files, and need to replace a column in file1.csv with text from file2.txt.
Column 28 in file1.csv contains numbers from 0 to 20. file2.txt contains the same numbers, each followed by a descriptive label, like:
0. N/A
1. Item1
2. Item2
I use a grep-inside-awk getline command to replace '0' in file1.csv with '0. N/A' (etc.) from file2.txt, but every syntax variation I try produces the exact same result. With some debugging I can see that $28 always gets picked up correctly, and when I grep file2.txt from the command line I always get the correct result. However, getline seems to pick up that result inconsistently.
My approach works correctly for the first 4 lines in file1.csv, but then 1 record remains unaltered. It then converts 1 record OK again, but skips the following 2, the next 2 being OK again. Then it skips 1 and does 2 OK again. The skips get bigger as it gets deeper into file1.csv (336 records).
I've read about issues with getline, but fail to see an alternative way to achieve what I want. I need this to be 100% reliable.
I agree with previous posters that grep shouldn't be used here, but I suspect the problem in the original code is due to not using close(), see 5.8 Closing Input and Output Redirections.
Thanks for chiming in. I'd read about not using grep inside awk, but I don't know awk well enough to build an array from file2.txt and match the values up.
ntubski hit the nail on the head; I tried close("grep -Fw " $28 " file2.txt") but that did not seem to make a difference; once I packed the command into a cmd variable and used close(cmd) it worked as desired. While I'm aware this will be deemed a 'dirty' solution, I will go with it for now. I will need to delve deeper into awk, since performance is dreadful, with all the file opening and closing going on. But for my current purpose - a one-off conversion of less than 400 records - I can live with it.
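(A runnable sketch of the cmd/close(cmd) variant described above, assuming file1.csv and file2.txt as in the original post; the repl variable name is illustrative:)

```shell
awk -F, -v OFS=, '{
    cmd = "grep -Fw " $28 " file2.txt"
    if ((cmd | getline repl) > 0)   # guard: keep the old value if grep finds nothing
        $28 = repl
    close(cmd)                      # close the pipe so the next line starts fresh
    print
}' file1.csv
```

Without close(cmd), a later row carrying the same value reads from the already-exhausted pipe, getline returns 0, and the record passes through unaltered -- exactly the skipping pattern reported above.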
awk ' BEGIN {
    FS = OFS = ","
    # here comes the code to load file2.txt (lines like "0. N/A")
    while ((getline a < "file2.txt") > 0) {
        split(a, b, ".")
        arr[b[1]] = a      # key is the bare number, value is the whole line
    }
    close("file2.txt")
}
{
    # here you can replace $28, checking first that arr[$28] exists
    if ($28 in arr)
        $28 = arr[$28]
    print $0
} ' file1.csv
This is not tested but shows the idea (and will run much faster, since file2.txt is read only once instead of one grep per record).