AWK parsing a CSV file with a separate list file
I am trying to find a way to parse a CSV file, pulling out lines where the fourth field matches any value in a list file. I know you can use something like "fgrep -f list input.csv", which pulls out the lines matching any entry from the list, but in my case I specifically need to match only field four. What I am currently doing is using a loop to cut out the fourth field, pass it through grep again, and then print the line to a file if it matches. I think there must be an easier way to do it in awk or maybe Perl. Performance is also crucial here, since the source file can have over 100K lines and the pattern list can have about 1000 lines.
So my code is:
Code:
echo -e "Do Run1"
As you can see, this can take forever on a large file. Thanks! |
Please show the format and some sample data for the two files.
Would you also please explain the concept behind the following line: Code:
echo -e ",${CID}," | /bin/grep -f CIDIDs.txt |
Quote:
Let's say the input CSV file is composed of the following data:
Code:
20120315,152638,0010000119,224,UT01,foobar,NVLS,D,0.00,3000,3000,0,48.4091,,,20120315886
And the pattern list has the following:
Code:
4589
Code:
20120315,102707,0015000000,325,ESMT,,NWSA,X,20.15,3000,3000,0,20.1200,,,,
In my real-world example, due to some preprocessing, the pattern list has numbers with the commas in them, like:
Code:
,4589, |
Well, using the real-world example, I would do something like:
Code:
awk -F, 'FNR==NR{list[$2];next}$4 in list' pattern input.csv |
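For readers new to the two-file idiom, the one-liner above can be unpacked roughly as in the sketch below. The file names `pattern` and `input.csv` and the sample rows come from this thread. The temporary directory, the variable `matches`, and the choice to put `,325,` in the pattern list next to `,4589,` are assumptions made only so the example has something to match.

```shell
#!/bin/sh
# Sketch only: sample data copied from the thread; the temp-dir layout is assumed.
tmpdir=$(mktemp -d)

# Pattern list in the comma-wrapped form, one value per line.
printf '%s\n' ',4589,' ',325,' > "$tmpdir/pattern"

# Two input rows: field four is 224 in the first and 325 in the second.
printf '%s\n' \
  '20120315,152638,0010000119,224,UT01,foobar,NVLS,D,0.00,3000,3000,0,48.4091,,,20120315886' \
  '20120315,102707,0015000000,325,ESMT,,NWSA,X,20.15,3000,3000,0,20.1200,,,,' \
  > "$tmpdir/input.csv"

matches=$(awk -F, '
  # First file (FNR == NR): with -F, the line ",325," splits into
  # "", "325", "", so $2 is the bare number. Store it as an array key.
  FNR == NR { list[$2]; next }
  # Second file: print the row only if field four is exactly one of the keys.
  $4 in list
' "$tmpdir/pattern" "$tmpdir/input.csv")

printf '%s\n' "$matches"
rm -rf "$tmpdir"
```

Each input line costs one hash lookup rather than a grep over the whole pattern list, which is why this should scale comfortably to a 100K-line file with about 1000 patterns.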
Quote:
20120315,102707,0015000000,325,ESMT,,NWSA,X,20.15,3000,3000,0,20.1200,,,,
When you compare it to the pattern list, it will match patterns such as:
5325
3259
In my real-world pattern list I actually have the commas in place, because they limit the match to exactly that pattern. For example, the previous line should match only a pattern of ,325, and not one such as ,3256, for instance. Is there a way to do that in the one-liner, that is, to include the commas around field four so that it is evaluated as ,325, and not just 325? |
Did you test it? The array 'list' has indexes equal to exactly each value in the pattern file. Because neither 5325 nor 3259 is equal to 325, field four of that line will not be found in the array, and hence the line will not be printed.
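A quick way to check this is the sketch below. It runs the same one-liner against a list holding only the near-misses 5325 and 3259, then against a list that also holds 325. The file names `near_miss` and `exact` and the shell variables are hypothetical, chosen for illustration. The last command is one possible variant, not something posted in this thread, that keeps the commas by storing each whole pattern line as the key and wrapping field four in commas before the lookup.

```shell
#!/bin/sh
# Sketch only: 5325 and 3259 are the near-misses named in the thread;
# file and variable names are made up for this example.
tmpdir=$(mktemp -d)

# A pattern list holding only the near-misses, in comma-wrapped form.
printf '%s\n' ',5325,' ',3259,' > "$tmpdir/near_miss"
# A pattern list that also holds the exact value.
printf '%s\n' ',5325,' ',3259,' ',325,' > "$tmpdir/exact"

row='20120315,102707,0015000000,325,ESMT,,NWSA,X,20.15,3000,3000,0,20.1200,,,,'
printf '%s\n' "$row" > "$tmpdir/input.csv"

# Array lookup is a string-equality test, so 325 is not found among 5325 and 3259.
no_hit=$(awk -F, 'FNR==NR{list[$2];next} $4 in list' "$tmpdir/near_miss" "$tmpdir/input.csv")

# With ,325, in the list, the row is printed.
hit=$(awk -F, 'FNR==NR{list[$2];next} $4 in list' "$tmpdir/exact" "$tmpdir/input.csv")

# Variant that keeps the commas: store each pattern line whole ($0) and
# wrap field four in commas before the lookup.
hit_commas=$(awk -F, 'FNR==NR{list[$0];next} ("," $4 ",") in list' "$tmpdir/exact" "$tmpdir/input.csv")

printf 'near-miss list: [%s]\nexact list: [%s]\ncomma keys: [%s]\n' "$no_hit" "$hit" "$hit_commas"
rm -rf "$tmpdir"
```

Both forms do an exact comparison on field four, so the comma-keyed variant only matters if the pattern file should be used verbatim as the set of keys.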
|
Indeed. grail's approach isn't using pattern matching. :}
|
Quote:
Alex :) |