LinuxQuestions.org

cgcamal 04-29-2010 04:20 PM

Problems to exclude lines and bad filter within AWK script
 
Hi everyone,

I'm looking for help fixing two problems in my script.
(I'm using bash on Cygwin.)

I have the following source file ($7 is empty):

Code:

HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,

I have this script:
Code:

awk 'BEGIN { FS = OFS = "," } #-1) Set "," as the field separator

{ $7 = $2; sub(/\/.*/,"",$7) } #-2) Copy $2 into $7, then delete "/Sub data1/Sub data2" from $7

NR == 1 { $7="NEW_HEADER" } #-3) Change the header text of column 7 on the first line

{/HEADER/||/pattern1/||/pattern2/ } #-4) Filter: search for these patterns in every column

{$2 !~ /pattern4|pattern5|pattern6/ } #-5) Exclude pattern4, pattern5 and pattern6

{print $1, $7, $3, $4, $5, $6, $2}' inputfile #-6) Print the 7 columns in a different order

And I get the following output:
Code:

HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2 # Why is this line printed?
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2

Question 1:
Step 5 of the script (#-5), that is " {$2 !~ /pattern4|pattern5|pattern6/ }", is meant to delete lines containing pattern4, pattern5 or pattern6 in column 2, but it does not seem to work.

What am I doing wrong in this case?

Question 2:
In the output, the line I marked with a comment (the one starting with pattern3,pattern8) is present but should not appear, because it contains neither HEADER nor pattern1 nor pattern2.

Why does this happen?

Maybe somebody can help me with this.

Thanks in advance.

colucix 04-29-2010 05:04 PM

Code:

{ $2 !~ /pattern4|pattern5|pattern6/ }
This line of code actually does... nothing! It is simply a test which returns TRUE or FALSE, but no action is taken and it cannot serve to "delete" lines. The same for
Code:

{ /HEADER/||/pattern1/||/pattern2/ }
again it is just a test and it doesn't print out anything.

Maybe you want something like:
Code:

/HEADER/||/pattern1/||/pattern2/ {
  if ( $2 !~ /pattern4|pattern5|pattern6/ )
    print $1, $7, $3, $4, $5, $6, $2
}

That is: only for lines matching "HEADER", "pattern1" or "pattern2", if $2 doesn't match "pattern4", "pattern5" or "pattern6", print out something. Hope this helps.
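
To see the difference on a small scale, here is a minimal sketch. The three-line sample is made up for illustration and is not the input file from the first post. It runs the same test once inside braces and once as the pattern in front of the braces:

```shell
# Hypothetical sample data, not the thread's input file.
sample='a,pattern4
b,pattern7
c,pattern5'

# Test inside braces: evaluated, result discarded, so every line is printed.
inside=$(printf '%s\n' "$sample" | awk -F, '{ $2 !~ /pattern4|pattern5/ } { print }')

# Test used as the rule's pattern: only lines passing the test are printed.
outside=$(printf '%s\n' "$sample" | awk -F, '$2 !~ /pattern4|pattern5/ { print }')

printf 'inside braces:\n%s\n\nas pattern:\n%s\n' "$inside" "$outside"
```

The first command should echo all three lines, the second only the "b,pattern7" line.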

PTrenholme 04-29-2010 05:20 PM

For question 2, the "conditional" part of an AWK stanza applies to the bracketed statements following the condition.

In other words, /condition/ {statement1; statement2; ...} is processed only when the "condition" is true. Since the print statement HAS NO CONDITION, it is executed for every line, because "no condition" is defined as "true."
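
A small sketch of that rule, assuming a made-up two-line input: the first stanza has a condition, the second has none.

```shell
# Hypothetical two-line input, for illustration only.
out=$(printf 'keep me\nskip me\n' | awk '
/keep/ { print "matched: " $0 }   # runs only when the record matches /keep/
       { print "always:  " $0 }   # no condition, so it runs for every record
')
printf '%s\n' "$out"
```

The "always" line should show up once per input record, the "matched" line only for the record containing "keep".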

As to question 1, I believe your syntax is incorrect.

Try:
Code:

!/HEADER|pattern1|pattern2/{next;}
$2 ~ /pattern4|pattern5|pattern6/{next;}
{print $1, $7, $3, $4, $5, $6, $2}


cgcamal 04-29-2010 05:57 PM

PTrenholme,

Many thanks for your explanation. Now I get the point; I'm clearer about the AWK condition and its function.

Hey colucix, many thanks, it works :-)!!!

But while testing your suggestion I noticed something that may have given me bad results in the past.

Why does putting the brace "{" on the same line or on the next one change the output so obviously? (I explain my question below.)

I've noticed that if I run this script (the right one):

Code:

awk  '
 BEGIN { FS = OFS = "," }
 
 { $7 = $2; sub(/\/.*/,"",$7) }
 
 NR == 1 { $7="NEW_HEADER" }
 
/HEADER/||/pattern1/||/pattern2/ { # With the brace in the same line as /HEADER/...

if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile

I get the correct and desired output:
Code:

HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2

But if I put the brace "{" on the line below "/HEADER/||/pattern1/||/pattern2/", I mean:
Code:

/HEADER/||/pattern1/||/pattern2/
{
if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile

instead of putting the brace "{" immediately after "/HEADER/||/pattern1/||/pattern2/":
Code:

/HEADER/||/pattern1/||/pattern2/ {
the output is wrong and contains repeated lines, as can be seen below:
Code:

HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,NEW_HEADER
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,pattern7
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,pattern6
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,pattern9
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,pattern5
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,pattern7
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,pattern2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,pattern4

Many thanks in advance.

Regards.

PTrenholme 04-29-2010 06:47 PM

In the main body of an AWK program, the syntax is test {action}, with the "test" part defaulting to EMPTY (always true) and the "action" part defaulting to {print $0}. Therefore your second program contains two stanzas in the main body: a "test" followed by the (default) {print $0}, which prints every matching record as it stands, and another stanza consisting of an (implied) EMPTY test followed by your if/print block, which is evaluated for every input record. That is why some lines are printed twice.
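
As a rough illustration (the two-line input below is invented for the example), the same program text behaves differently depending on where the brace sits:

```shell
# Hypothetical input, for illustration only.
input='pattern1,x
other,y'

# Brace on the same line: one rule, the action runs only for matching records.
same_line=$(printf '%s\n' "$input" | awk -F, '/pattern1/ { print "action: " $1 }')

# Brace on the next line: two rules. The bare pattern prints matching records
# unchanged, and the unconditioned action runs for every record.
next_line=$(printf '%s\n' "$input" | awk -F, '/pattern1/
{ print "action: " $1 }')

printf 'same line:\n%s\n\nnext line:\n%s\n' "$same_line" "$next_line"
```

In the second run the matching record is printed once by the bare pattern and once more by the action, which is the same kind of duplication seen in the output above.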

Note that you can use a backslash at the end of an input line to continue it to the next line. So
Code:

/HEADER/||/pattern1/||/pattern2/ \
{
if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile

should work.

cgcamal 04-29-2010 08:16 PM

PTrenholme,

Thank you very much for clarifying these points. On my first attempt to test your suggestion I got an error even though I had put the backslash after ....pattern2/, but the error reported on the console was easy to understand (sometimes I just have no idea what a printed error means).


Code:

awk: cmd. line:7: /HEADER/||/pattern1/||/pattern2/ \
awk: cmd. line:7:                                  ^ backslash not last character on line

There was an (invisible, haha) space acting as the last character. I deleted that space and the script
now works correctly.
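
For what it's worth, a stray space like that can be hunted down with awk itself. This is only a sketch: the two-line program text fed in through printf is a made-up stand-in for a real script file, and the regular expression assumes the invisible character is a space, a tab or a carriage return.

```shell
# Hypothetical script text: line 1 ends with a backslash followed by a space.
report=$(printf '/HEADER/||/pattern1/ \\ \n{ print }\n' |
  awk '/\\[ \t\r]+$/ { print NR ": whitespace after trailing backslash" }')

printf '%s\n' "$report"
```

With a real file you would pass the file name to awk instead of piping text into it.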

I've learned many interesting things from the answers you both gave. I now understand that many times I was unable to get a working script because of apparently small details like the ones you've explained. This will help me a lot in the future.

Many thanks.

Best regards.

grail 04-29-2010 11:57 PM

I know you have your solution, but thought I would show you how you could neaten it up a bit:
Code:

awk 'BEGIN{OFS=FS=","}
NR==1{$7="NEW_HEADER"}
NR>1{$7=$2;gsub(/\/.*/,"",$2)}
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/{print}' inputfile
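
One thing that may be worth checking before swapping versions: this shorter form strips $2 in place and prints the record in its original column order, so the data rows should come out the same as with the reordered print, but the header row ends up labelled differently. A minimal sketch comparing the two on the first two lines of the input file (the variable names are only for this example):

```shell
# Two-line sample taken from the thread's input file.
sample='HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,'

# Earlier version: strip a copy in $7, then print the columns reordered.
reordered=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "," }
{ $7 = $2; sub(/\/.*/, "", $7) }
NR == 1 { $7 = "NEW_HEADER" }
/HEADER/||/pattern1/||/pattern2/ {
  if ($2 !~ /pattern4|pattern5|pattern6/)
    print $1, $7, $3, $4, $5, $6, $2
}')

# Shorter version: strip $2 in place, keep the full value in $7, print $0.
in_place=$(printf '%s\n' "$sample" | awk 'BEGIN{OFS=FS=","}
NR==1{$7="NEW_HEADER"}
NR>1{$7=$2;gsub(/\/.*/,"",$2)}
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/{print}')

printf 'reordered:\n%s\n\nin place:\n%s\n' "$reordered" "$in_place"
```

If the header labels matter, the NR==1 rule can be adjusted to set $2 and $7 as needed.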


cgcamal 04-30-2010 12:38 AM

Hey grail, how are you doing?

Great, great! My script is neatened up using your suggestion. It's much better, much faster and much shorter than the first one.

I now see several things more clearly about how to make better use of AWK and how to find syntax errors. Many thanks again for your great and kind help.


All times are GMT -5.