Awk - Process a set of records if field $5 of line 01 is 'W', otherwise copy set to o

High-T · 02-04-2015, 04:24 PM

Hi guys,

I am trying to write a script that processes many sets of transactions.
I want to process a set only if field $5 is "W" on the line where $1 == "01" and field $3 is "YY" on the line where $1 == "07"; otherwise the set should be copied to the output unchanged.

Example of the input file:

Code:

01 08 77 78 W 9890
02 08 66 68 0 8554
07 08 YY 85 9 7545
01 08 99 87 X 8787
04 09 85 85 4 8758
09 87 88 78 7 6584
10 84 ZZ 99 8 9887

A new set always starts with $1 == "01".
The script should process only the first set, because its 5th value is "W", and put "MATCHED" at the end of it.
The unmatched set (5th value "X") should be copied to the output as it is.

Example of output file:

Code:

01 08 77 78 W 9890
02 08 66 68 0 8554
07 08 YY 85 9 7545
MATCHED
01 08 99 87 X 8787
04 09 85 85 4 8758
09 87 88 78 7 6584
10 84 ZZ 99 8 9887

And so on.
Thanks for your help.

syg00 · 02-04-2015, 04:42 PM

We are here to help, not to write your script for you.
Show us what you came up with and the problem(s) you're having, and maybe someone can help.

Should be reasonably straightforward in awk.

High-T · 02-04-2015, 04:52 PM

I am using the script below.
It builds a matching key from File1.txt, compares it with File2.txt, and writes MATCHED when a match is found. I need to add extra filtering to check that the type is W and that line 07 has YY.

The script should run the matching code only when those criteria are met (i.e. W and YY); if not, it should copy the set to the OUT file.
I did not want to complicate things, which is why I did not post the script I am using.
I just wanted a small piece of code that I can fit into my script, but here it is in case it helps.

Code:

awk '
BEGIN {
    OFS="\t"
    OUT = "File1.txt"

    valid_columns["01"] = "01"
    valid_columns["02"] = "02"
    valid_columns["03"] = "03"
    valid_columns["04"] = "04"
    valid_columns["05"] = "05"
    valid_columns["06"] = "06"
    valid_columns["07"] = "07"
}

NR == FNR {
    if (NF) {
        master_key[substr($0,1,14)] = $0
    }
    next
}

! $1 in valid_first_comments {
    next
}

$1 == "01" {
    line_accumulator = $0 "\n"
    key = $4 $3 $2
}

$1 != "01" && $1 != "07" {
    line_accumulator = line_accumulator $0 "\n"
}

$1 == "07" {
    output_line = line_accumulator $0

    key = key $4

    if ( key in master_key ) {
        print output_line > OUT
        print "MATCHED", master_key[key] > OUT
    }
}

END {
}
' File2.txt File1.txt

syg00 · 02-04-2015, 06:36 PM

I suspect if you have a close look at the following, your code will get a lot further.

Code:

! $1 in valid_first_comments{
		next
}
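
Two things to look at there. The array filled in BEGIN is called valid_columns, while this pattern tests valid_first_comments, which is never populated. Also, unary ! binds more tightly than in, so awk reads the test as (!$1) in array. A minimal sketch of the difference, using a made-up two-line input and a hypothetical array named ok:

```shell
# Hypothetical sample input: one line whose first field is listed, one that is not.
input='01 a
99 b'

# Unparenthesized: parsed as (!$1) in ok, so "next" is never taken for these lines.
loose=$(printf '%s\n' "$input" | awk 'BEGIN { ok["01"] } ! $1 in ok { next } { print }')

# Parenthesized: skips every line whose first field is not an index of ok.
strict=$(printf '%s\n' "$input" | awk 'BEGIN { ok["01"] } !($1 in ok) { next } { print }')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Note that with the parenthesized form and an array holding only 01 to 07, lines such as 09 and 10 in the sample data would be skipped, so check whether that is really what you want.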

grail · 02-04-2015, 06:57 PM

So you seem to have a reasonable understanding of awk ... where exactly is your problem with checking columns other than $1?

I also do not see anything in your code that shows how records are delimited, i.e. RS would need to be set to something. I would suggest that this may need to be different for each file.

High-T · 02-05-2015, 11:47 AM

Grail, I have declared that the records are delimited by tabs:

OFS="\t"

I have come up with this option. It is not finalized yet and I am still working on it.

Code:

awk '
BEGIN {
    OFS="\t"
    OUT = "File1.txt"

    valid_order_tp = "W"

    valid_tender_tp = "YY"

    valid_columns["01"] = "01"
    valid_columns["02"] = "02"
    valid_columns["03"] = "03"
    valid_columns["04"] = "04"
    valid_columns["05"] = "05"
    valid_columns["06"] = "06"
    valid_columns["07"] = "07"
}

NR == FNR {
    if (NF) {
        key[substr($0,1,14)] = $0
    }
    next
}

! $1 in valid_comments {
    next
}

$1 == "01" {
    valid_order = $5
}

$1 == "07" {
    valid_tender = $3
}

{if ( valid_order = valid_order_tp && valid_tender in valid_tender_tp)
do

$1 == "01" {
    line_accumulator = $0 "\n"
    key = $4 $3 $2
}

$1 != "01" && $1 != "07" {
    line_accumulator = line_accumulator $0 "\n"
}

$1 == "07" {
    output_line = line_accumulator $0

    key = key $4

    if ( comparison_key in master_key ) {
        print output_line > OUT
        print "MATCHED", master_key[key] > OUT
    }
}
done

END {
}
' File2.txt File1.txt

grail · 02-06-2015, 10:14 AM

OFS = Output Field Separator

This has no effect on how the input file is read. What I mean by record separator is the RS variable, which lets you say where a complete record ends; in your case, each record should start with 01 at the start of the line.
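
As a small illustration with one line of your sample data: OFS only shows up when awk joins fields on output, while splitting of the input is governed by FS and RS. A sketch (the shell variable names are just for the demo):

```shell
line='01 08 77 78 W 9890'

# $0 is printed untouched, so OFS does not appear in the output.
as_read=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { print $0 }')

# Fields listed with commas in print are joined with OFS (a tab here).
joined=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { print $1, $5 }')

printf '%s\n%s\n' "$as_read" "$joined"
```

The first line comes out exactly as it was read; only the second one contains a tab.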

Code:

valid_tender in valid_tender_tp

valid_tender_tp is not an array, so I doubt this part of your 'if' will work.
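
Roughly, in tests the 'in' operator checks whether a string is an index of an array, whereas two plain strings are compared with == (a single = assigns). A sketch with made-up variable names:

```shell
result=$(awk 'BEGIN {
    wanted = "YY"                    # scalar: compare with ==
    allowed["YY"]; allowed["ZZ"]     # array: test membership with in
    tender = "YY"
    if (tender == wanted)  print "scalar match"
    if (tender in allowed) print "array match"
}')
printf '%s\n' "$result"
```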

I also note that at no time do you reset your variables, so the value from the previous '07' line will still be set when the next '01' is hit, and the test will still be true.
In fact, as you never reset it, once you hit your desired scenario it will always be true.
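
Putting these points together, one possible approach (a sketch only, run against the sample from your first post and not your real files) is to buffer each set, remember the $5 of its 01 line and whether a 07 line with $3 == "YY" was seen, and reset both whenever a new 01 line arrives. The function name emit_set and the variable names are made up for illustration, and the master_key lookup from your script is left out:

```shell
# Sample input copied from the first post.
input='01 08 77 78 W 9890
02 08 66 68 0 8554
07 08 YY 85 9 7545
01 08 99 87 X 8787
04 09 85 85 4 8758
09 87 88 78 7 6584
10 84 ZZ 99 8 9887'

output=$(printf '%s\n' "$input" | awk '
# Print the buffered set, followed by MATCHED when both conditions held.
function emit_set() {
    if (buf == "") return
    printf "%s", buf
    if (order_tp == "W" && tender_ok) print "MATCHED"
}
$1 == "01" {              # a new set starts: emit the old one, reset state
    emit_set()
    buf = ""; order_tp = $5; tender_ok = 0
}
$1 == "07" && $3 == "YY" { tender_ok = 1 }
{ buf = buf $0 "\n" }     # buffer every line of the current set
END { emit_set() }        # the last set has no following 01 line
')

printf '%s\n' "$output"
```

Because the state is reset on every 01 line, a W/YY set cannot leak its flags into the set that follows it.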