LinuxQuestions.org
01-28-2014, 05:20 AM   #1
r_hartman
LQ Newbie
 
Registered: Feb 2011
Location: Netherlands
Distribution: CentOS
Posts: 15

Rep: Reputation: 0
Column substitution with grep inside awk inconsistent


Hi, I would like some input on the following issue:
I have two files, and need to replace a column in file1.csv with text from file2.txt.
Column 28 in file1.csv contains numbers from 0 to 20. file2.txt contains the same numbers, each followed by a descriptive text, like:
0. N/A
1. Item1
2. Item2

I use the following command to replace '0' in file1.csv with '0. N/A' (etc.) from file2.txt:

awk 'BEGIN{FS=OFS=","}{("grep -Fw " $28 " file2.txt") | getline u; $28=u; print $0}' file1.csv

or even

awk 'BEGIN{FS=OFS=","}{("grep -Fw " $28 " file2.txt") | getline $28; print $0}' file1.csv

which produces the exact same result. With some syntax manipulation I can see that $28 always gets picked up correctly, and when I grep file2.txt from the command line I always get the correct result. However, getline seems to pick up that result inconsistently.

My approach works correctly for the first 4 lines in file1.csv, but then 1 record remains unaltered. It then converts 1 record OK again, but skips the following 2, the next 2 being OK again. Then it skips 1 and does 2 OK again. The skips get bigger as it gets deeper into file1.csv (336 records).

I've read about issues with getline, but fail to see an alternative way to achieve what I want. I need this to be 100% reliable.

Any thoughts would be appreciated.
 
01-28-2014, 06:07 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
I would not use grep inside awk, but read file2 into an array (or hash) and use that to replace $28.
 
01-28-2014, 07:38 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Ditto on no grep in awk, especially when awk already knows how to pattern match.

Maybe you could give some sample data both in and out so we may better follow what you are trying to do.

Also, please use [code][/code] tags around code and data to make it easier to read.
 
01-28-2014, 09:57 AM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
I agree with the previous posters that grep shouldn't be used here, but I suspect the problem in the original code is due to not using close(); see 5.8 Closing Input and Output Redirections in the gawk manual.
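
To illustrate the close() point: awk keeps a command pipe open under its exact command string, so when a later record produces the same string, getline reads from the already exhausted pipe, returns 0 and leaves the field untouched. The following is only a minimal sketch under assumptions that are not from the thread: the sample rows are made up, column 2 stands in for column 28, and the files live in a temporary directory created with mktemp.

```shell
# Hypothetical sample data; column 2 plays the role of column 28.
tmp=$(mktemp -d)
printf '%s\n' '0. N/A' '1. Item1' '2. Item2' > "$tmp/file2.txt"
printf '%s\n' 'a,1' 'b,2' 'c,1' > "$tmp/file1.csv"

# Without close(): the third record reuses the command string of the first,
# so getline hits end-of-file on the old pipe and $2 stays as it was.
no_close=$(awk -v lookup="$tmp/file2.txt" 'BEGIN { FS = OFS = "," }
{
    cmd = "grep -Fw " $2 " " lookup
    cmd | getline $2
    print
}' "$tmp/file1.csv")

# With close(cmd): every record starts a fresh grep.
with_close=$(awk -v lookup="$tmp/file2.txt" 'BEGIN { FS = OFS = "," }
{
    cmd = "grep -Fw " $2 " " lookup
    cmd | getline $2
    close(cmd)
    print
}' "$tmp/file1.csv")

printf 'without close():\n%s\n\nwith close():\n%s\n' "$no_close" "$with_close"
rm -rf "$tmp"
```

With this sample data, the run without close() should leave the last row as "c,1", while the run with close() turns it into "c,1. Item1". That matches the pattern in the first post, where the skips grow as more values have already been seen.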
 
01-28-2014, 12:47 PM   #5
r_hartman
LQ Newbie
 
Registered: Feb 2011
Location: Netherlands
Distribution: CentOS
Posts: 15

Original Poster
Rep: Reputation: 0
Solved

Thanks for chiming in. I'd read about not using grep inside awk, but I don't know awk well enough to build an array from file2.txt and match up the values.

ntubski hit the nail on the head; I tried close("grep -Fw " $28 " file2.txt") but that did not seem to make a difference; once I packed the command into a cmd variable and used close(cmd) it worked as desired. While I'm aware this will be deemed a 'dirty' solution, I will go with it for now. I will need to delve deeper into awk, since performance is dreadful, with all the file opening and closing going on. But for my current purpose - a one-off conversion of less than 400 records - I can live with it.

Final command used:
Code:
$ awk 'BEGIN{FS=OFS=","}
    {cmd=("grep -Fw " $28 " file2.txt")
    cmd | getline $28
    print $0
    close(cmd)}' file1.csv
Thanks again!
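
A likely reason the first close() attempt had no effect: by the time close() runs, getline has already overwritten the field, so the rebuilt string names a command that was never opened, and close() simply reports failure. The sketch below is hedged in the same way as any illustration here: the data is made up, column 2 stands in for column 28, and the variable names opened and rebuilt are hypothetical.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '0. N/A' '1. Item1' > "$tmp/file2.txt"

result=$(printf '%s\n' 'a,1' | awk -F, -v lookup="$tmp/file2.txt" '{
    opened = "grep -Fw " $2 " " lookup     # string that actually opens the pipe
    opened | getline $2                    # $2 is now "1. Item1"
    rebuilt = "grep -Fw " $2 " " lookup    # built from the new $2, so a different string

    same = (rebuilt == opened) ? "same" : "different"
    r1 = (close(rebuilt) == 0) ? "closed" : "not-closed"   # no such pipe is open
    r2 = (close(opened) == 0) ? "closed" : "not-closed"    # the real pipe
    print same, r1, r2
}')

printf '%s\n' "$result"
rm -rf "$tmp"
```

Saving the command in a variable before calling getline, as in the final command above, guarantees that close() receives exactly the string that opened the pipe.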
 
01-29-2014, 12:25 AM   #6
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Code:
awk 'BEGIN {
    FS = OFS = ","
    # load file2.txt: key is the leading number, value is the whole line
    while ((getline line < "file2.txt") > 0) {
        key = line
        sub(/\..*/, "", key)    # "0. N/A" -> "0"
        arr[key] = line
    }
    close("file2.txt")
}
{
    # replace $28 only if a description exists for its value
    if ($28 in arr)
        $28 = arr[$28]
    print $0
}' file1.csv
This is not tested, but it shows the idea (and will run much faster).
 
01-29-2014, 12:42 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
As a quick alternative to the BEGIN / while approach you can also use:
Code:
awk 'NR==FNR{arr[$1] = $0;next}{<replacement and your code here>}' file2.txt file1.csv
NR==FNR is true only while the first file is being read, so this builds the array from file2.txt and then runs the remaining block on the second (and any further) file(s).
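
Filling in the template for this thread's data might look like the sketch below. It rests on a few assumptions that are not from the posts: the sample rows are made up, column 2 stands in for column 28, and the key is taken as everything before the first period, because with the default field separator $1 would be "1." rather than "1". The FS=, and OFS=, arguments placed between the file names switch to comma-separated fields only once the lookup file has been read.

```shell
# Hypothetical sample data; column 2 plays the role of column 28.
tmp=$(mktemp -d)
printf '%s\n' '0. N/A' '1. Item1' '2. Item2' > "$tmp/file2.txt"
printf '%s\n' 'a,1' 'b,2' 'c,1' 'd,7' > "$tmp/file1.csv"

mapped=$(awk '
    NR == FNR {                  # true only while reading the first file
        key = $0
        sub(/\..*/, "", key)     # "1. Item1" -> "1"
        arr[key] = $0
        next
    }
    ($2 in arr) { $2 = arr[$2] } # unknown values are left as they are
    { print }
' "$tmp/file2.txt" FS=, OFS=, "$tmp/file1.csv")

printf '%s\n' "$mapped"
rm -rf "$tmp"
```

Every lookup is now an in-memory array access, so no external process is started per record. The row "d,7" passes through unchanged because 7 has no entry in the sample lookup file.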
 
01-29-2014, 03:00 AM   #8
r_hartman
LQ Newbie
 
Registered: Feb 2011
Location: Netherlands
Distribution: CentOS
Posts: 15

Original Poster
Rep: Reputation: 0

Thanks gentlemen. I will definitely explore your examples.
Much appreciated.

Cheers.
 
  



Tags
awk, columns, grep, substitution


