LinuxQuestions.org

lolmon 08-31-2011 12:41 AM

awk with pipe delimited file (specific column matching and multiple pattern matching)
 
Hi all, so I am working on a bash script, and am currently stuck trying to figure out how to deal with the following file:

Code:

467487|field1|more fields|first pattern|stuff|470061|more text|text with spaces|None|Red|another_field|8/30/2011 5:12:30 PM|9/6/2011 5:12:30 PM|one more text field with spaces|651463|
468751|field1|more fields|first pattern|stff|470061|more text|text with spaces|None|Red|another_field|8/30/2011 1:12:30 AM|9/2/2011 5:12:30 PM|one more text field with spaces|651463|
450104|field1|more fields|Legend|text|4700621|more text|text with spaces|None|Red|another_field|8/30/2011 5:44:30 PM|9/1/2011 5:12:30 PM|one more text field with spaces|651463|
465318|field1|more fields|Legend|text|442061|more text|text with spaces|None|Red|another_field|8/21/2011 2:12:30 PM|9/3/2011 5:12:30 PM|one more text field with spaces|651463|

So, what I need to do is match not only 'first pattern' in the 4th column, but also the date in the 12th column. (The date that needs to be matched in this example is 8/30/2011, and it is stored in $today.)
Assuming both of these patterns match (in those specific columns; they also exist in other columns I do not care about), I also need to increment a counter by one.

Now, assuming I have been learning right, I need something like this to start with:

awk -F'|' {
'$4=="first pattern" print $0 >> count.txt
}'

Then I did a line count on count.txt to find out how many matched. Now, I know there is a way to do counts in awk so I don't need to waste clock cycles, something like adding { ++x } END { print x } at the end of my expressions. However, I do not know how to combine all this so that the counter is incremented only if both $4=="first pattern" and $12==$date are true.

Any help that could steer me in the right direction would be truly fantastic.

-lolmon

Tinkster 08-31-2011 12:56 AM

Hi, welcome to LQ!

Almost there!

Code:

awk -F'|' -v date="$date" '$4=="first pattern" && $12==date {x++}END{print x}' file

Cheers,
Tink

lolmon 08-31-2011 01:20 AM

Thank you so much! I cannot believe it was as simple as a &&. However, I am still having trouble with the date part: if I remove it, awk picks up everything matching in column 4, but if I keep it in, it finds nothing. Is awk checking that column 12 contains ONLY the date, so that the hours, minutes and seconds stop it from matching? And if so, what is the best way to fix it?

I tried using ^date, but awk does not like that format, which I assume has to do with the ==. I am looking that up right now, but I figured I should post anyway and thank you for the first part.

EDIT: That is clearly it. I removed the extra time info and the script works just as planned, so all I need to do is figure out how to tell awk that the column only has to start with the date...
EDIT2: It works if I use the ~ match operator instead of ==, and that should be fine considering the data file will always look like my example anyway.

thanks again!

Tinkster 08-31-2011 02:56 AM

Well ... the problem is that == compares the whole field, so the shell
variable $date would have to hold the exact date and time you're looking
for. If it's just a date then, yes, the '~' for the date part is adequate :}
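
For reference, here is a minimal sketch of the "column starts with the date" match discussed above. It assumes $today holds only the date (no time) and that the data looks like the sample rows from the first post. It uses awk's index() instead of ~ so the date is compared as a literal string rather than as a regex. The variable names data and count are made up for the example.

```shell
today="8/30/2011"   # assumption: date only, no time part

# The four sample rows from the first post, inlined so the sketch is self-contained.
data='467487|field1|more fields|first pattern|stuff|470061|more text|text with spaces|None|Red|another_field|8/30/2011 5:12:30 PM|9/6/2011 5:12:30 PM|one more text field with spaces|651463|
468751|field1|more fields|first pattern|stff|470061|more text|text with spaces|None|Red|another_field|8/30/2011 1:12:30 AM|9/2/2011 5:12:30 PM|one more text field with spaces|651463|
450104|field1|more fields|Legend|text|4700621|more text|text with spaces|None|Red|another_field|8/30/2011 5:44:30 PM|9/1/2011 5:12:30 PM|one more text field with spaces|651463|
465318|field1|more fields|Legend|text|442061|more text|text with spaces|None|Red|another_field|8/21/2011 2:12:30 PM|9/3/2011 5:12:30 PM|one more text field with spaces|651463|'

count=$(printf '%s\n' "$data" |
  awk -F'|' -v date="$today" '
    # index() returns 1 only when $12 begins with the literal date string
    $4 == "first pattern" && index($12, date) == 1 { x++ }
    END { print x + 0 }   # x + 0 prints 0 instead of an empty line when nothing matched
  ')

echo "$count"   # prints 2 for the sample rows
```

Only the first two rows have both 'first pattern' in column 4 and a column 12 that begins with 8/30/2011, so the count is 2.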


Cheers,
Tink

David the H. 08-31-2011 12:17 PM

Incidentally, you could also do the same thing entirely in bash, using an array.

Code:

#!/bin/bash

file="inputfile.txt"
pattern="first pattern"
today="8/30/2011"
count=0

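# Set IFS only for read so the rest of the script keeps the default splitting;
# -r stops read from treating backslashes as escapes.
while IFS="|" read -r -a line ; do

    # Array indexes start at 0, so column 4 is line[3] and column 12 is line[11].
    # The quoted variables match literally; the unquoted * matches the trailing time.
    if [[ "${line[3]}" == "$pattern" && "${line[11]}" == "$today"* ]]; then
        (( count++ ))
    fi

done <"$file"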

echo "$count"

Bash's [[ test will let you use either globbing or regex in the tests, whichever suits your purposes better.
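
To illustrate the difference, here is a small sketch comparing a glob test, a regex test, and plain equality on one value shaped like column 12. The variable names (field, glob_match, regex_match, exact_match) are made up for the example, and it needs bash, not plain sh.

```shell
#!/bin/bash
field="8/30/2011 5:12:30 PM"   # sample value shaped like column 12
today="8/30/2011"

# Glob match: the quoted "$today" is literal, the unquoted * matches the rest.
if [[ $field == "$today"* ]]; then glob_match=yes; else glob_match=no; fi

# Regex match: ^ anchors at the start; quoting "$today" keeps it literal.
if [[ $field =~ ^"$today" ]]; then regex_match=yes; else regex_match=no; fi

# Plain equality fails here because the field also carries the time.
if [[ $field == "$today" ]]; then exact_match=yes; else exact_match=no; fi

echo "glob=$glob_match regex=$regex_match exact=$exact_match"
# prints: glob=yes regex=yes exact=no
```

Either of the first two forms should do for a simple prefix check like this one; the regex form is more useful once the pattern needs alternation or character classes.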

I imagine awk would still be more efficient for a large file, however.

