[SOLVED] Column substitution with grep inside awk inconsistent
Hi, I would like some input on the following issue:
I have two files, and need to replace a column in file1.csv with text from file2.txt.
Column 28 in file1.csv contains numbers from 0 to 20. file2.txt contains the same numbers, each followed by a descriptive label, like:
0. N/A
1. Item1
2. Item2
I use a grep-inside-awk getline command to replace '0' in file1.csv with '0. N/A' (etc.) from file2.txt, but every syntax variation I try produces the exact same result. With some debugging I can see that $28 always gets picked up correctly, and when I grep file2.txt from the command line I always get the correct result. However, getline seems to pick up that result inconsistently.
My approach works correctly for the first 4 lines in file1.csv, but then 1 record remains unaltered. It then converts 1 record OK again, but skips the following 2, the next 2 being OK again. Then it skips 1 and does 2 OK again. The skips get bigger as it gets deeper into file1.csv (336 records).
I've read about issues with getline, but fail to see an alternative way to achieve what I want. I need this to be 100% reliable.
I agree with previous posters that grep shouldn't be used here, but I suspect the problem in the original code is due to not using close(), see 5.8 Closing Input and Output Redirections.
Thanks for chiming in. I'd read about not using grep inside awk, but I don't know awk well enough to build an array from file2.txt and match the values up.
ntubski hit the nail on the head; I tried close("grep -Fw " $28 " file2.txt") but that did not seem to make a difference; once I packed the command into a cmd variable and used close(cmd) it worked as desired. While I'm aware this will be deemed a 'dirty' solution, I will go with it for now. I will need to delve deeper into awk, since performance is dreadful, with all the file opening and closing going on. But for my current purpose - a one-off conversion of less than 400 records - I can live with it.
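(A runnable sketch of the cmd/close(cmd) variant described above, assuming file1.csv and file2.txt as in the original post; the repl variable name is illustrative:)

```shell
awk -F, -v OFS=, '{
    cmd = "grep -Fw " $28 " file2.txt"
    if ((cmd | getline repl) > 0)   # guard: keep the old value if grep finds nothing
        $28 = repl
    close(cmd)                      # close the pipe so the next line starts fresh
    print
}' file1.csv
```

Without close(cmd), a later row carrying the same value reads from the already-exhausted pipe, getline returns 0, and the record passes through unaltered -- exactly the skipping pattern reported above.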
awk ' BEGIN {
    FS = OFS = ","
    # here comes the code to load file2.txt (lines like "0. N/A")
    while ((getline a < "file2.txt") > 0) {
        split(a, b, ".")
        arr[b[1]] = a      # key is the bare number, value is the whole line
    }
    close("file2.txt")
}
{
    # here you can replace $28, checking first that arr[$28] exists
    if ($28 in arr)
        $28 = arr[$28]
    print $0
} ' file1.csv
This is not tested but shows the idea (and will run much faster, since file2.txt is read only once instead of one grep per record).