LinuxQuestions.org
Old 04-29-2010, 04:20 PM   #1
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Rep: Reputation: 16
Problems to exclude lines and bad filter within AWK script


Hi everyone,

I'm looking for some help fixing two problems in my script.
(I'm using bash on Cygwin.)

I have the following source file ($7 is empty):

Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,
I have this script:
Code:
awk 'BEGIN { FS = OFS = "," } #-1) Set "," as field separator

{ $7 = $2; sub(/\/.*/,"",$7) } #-2) (To copy $2 data into $7 and after that deletes "/Sub data1/Sub data2" from $7)

NR == 1 { $7="NEW_HEADER" } #-3) Changing first line header text in column 7 

{/HEADER/||/pattern1/||/pattern2/ } #-4) Filter to search these patterns in every column

{$2 !~ /pattern4|pattern5|pattern6/ } #-5 To exclude pattern4, pattern5 and pattern6

{print $1, $7, $3, $4, $5, $6, $2}' inputfile #-6) Print in different order the 7 columns
And I get the following output:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2 # Why is this line printed?
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2
Question 1:
Line 5 of the script, that is "{$2 !~ /pattern4|pattern5|pattern6/ }", is meant to delete lines containing pattern4, pattern5 or pattern6 in column 2, but it does not seem to work.

What am I doing wrong in this case?

Question 2:
In the output, the line marked with a comment (highlighted in red in the original post) is present but should not appear, because it contains neither HEADER nor pattern1 nor pattern2.

Why does this happen?

Maybe somebody could help me with this.

Thanks in advance.
 
Old 04-29-2010, 05:04 PM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,503

Rep: Reputation: 1957
Code:
{ $2 !~ /pattern4|pattern5|pattern6/ }
This line of code actually does... nothing! Because it sits inside braces, it is just a test that evaluates to TRUE or FALSE; no action is taken on the result, so it cannot "delete" lines. The same goes for
Code:
{ /HEADER/||/pattern1/||/pattern2/ }
again it is just a test and it doesn't print out anything.

Maybe you want something like:
Code:
/HEADER/||/pattern1/||/pattern2/ {
  if ( $2 !~ /pattern4|pattern5|pattern6/ )
    print $1, $7, $3, $4, $5, $6, $2
}
That is: only for lines matching "HEADER", "pattern1" or "pattern2", if $2 doesn't match "pattern4", "pattern5" or "pattern6", print out something. Hope this helps.
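To illustrate the difference, here is a minimal sketch on made-up three-line input (the sample words are assumptions, not data from the thread). The first program wraps the test in braces, so its result is discarded and nothing is printed; the second uses the same test as a pattern, so it decides which records reach print.

```shell
# Made-up sample input (not from the thread).
input='apple
banana
cherry'

# Test inside braces: evaluated for every record, result thrown away, no output.
inside=$(printf '%s\n' "$input" | awk '{ $0 ~ /banana/ }')

# Same test used as a pattern: the action runs only for matching records.
as_pattern=$(printf '%s\n' "$input" | awk '$0 ~ /banana/ { print }')

printf 'inside braces: [%s]\n' "$inside"
printf 'as pattern:    [%s]\n' "$as_pattern"
```

The brackets in the printed lines are only there to make the empty result visible.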
 
1 members found this post helpful.
Old 04-29-2010, 05:20 PM   #3
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,150

Rep: Reputation: 330
For question 2, the condition part of an AWK rule applies only to the braced statements that immediately follow it.

In other words, /condition/ { statement1; statement2; ... } is executed only when the condition is true. Since your print statement HAS NO CONDITION, it is executed for every line, because "no condition" is treated as "true".

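As to question 1, I believe your syntax is incorrect: a test inside braces does not skip anything. Use next in a rule of its own to drop the lines you don't want.

Try:
Code:
!/HEADER|pattern1|pattern2/ || $2 ~ /pattern4|pattern5|pattern6/ { next }
{ print $1, $7, $3, $4, $5, $6, $2 }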
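A minimal sketch of the next idiom on made-up data (the keep/drop records are assumptions, not from the thread): the first rule skips unwanted records, and the second rule, which has no pattern, runs for everything that was not skipped.

```shell
# Made-up sample input: three comma-separated records (not from the thread).
input='keep,a
drop,b
keep,c'

# Rule 1: skip any record whose first field is "drop".
# Rule 2: has no pattern, so it runs for every record that was not skipped.
kept=$(printf '%s\n' "$input" | awk -F, '$1 == "drop" { next } { print $2 }')

printf '%s\n' "$kept"
```

Because next abandons the current record immediately, any rule written after it never sees the skipped lines.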
 
1 members found this post helpful.
Old 04-29-2010, 05:57 PM   #4
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16
PTrenholme,

Many thanks for your explanation. Now I get the point; I'm much clearer about AWK conditions and how they work.

Hey colucix, many thanks, it works :-)!!!

But while testing your suggestion I noticed something that may have given me bad results in the past.

Why does putting the brace "{" on the same line or on the next one change the output so obviously? (I explain my question below.)

I've noticed that if I run this script (the right one):

Code:
awk  '
 BEGIN { FS = OFS = "," }
 
 { $7 = $2; sub(/\/.*/,"",$7) }
 
 NR == 1 { $7="NEW_HEADER" }
 
/HEADER/||/pattern1/||/pattern2/ { # With the brace in the same line as /HEADER/...

if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile
I get the correct and desired output:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
But if I put the brace "{" on the line below "/HEADER/||/pattern1/||/pattern2/", like this:
Code:
/HEADER/||/pattern1/||/pattern2/ 
{
if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile
instead of putting the brace "{" immediately after "/HEADER/||/pattern1/||/pattern2/":
Code:
/HEADER/||/pattern1/||/pattern2/ {
the output is wrong and contains repeated lines, as can be seen below:
Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,NEW_HEADER
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,pattern7
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,pattern6
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,pattern9
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,pattern5
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,pattern7
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,pattern2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,pattern4
Many thanks in advance.

Regards.
 
Old 04-29-2010, 06:47 PM   #5
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,150

Rep: Reputation: 330
In the main body of an AWK program, each rule has the form pattern { action }. Either part may be omitted: a missing pattern matches every record, and a missing action defaults to { print $0 }. A newline after the pattern ends the rule, so your second program contains two rules in the main body: the pattern alone, which gets the default action and prints each matching record as it stands, and a separate block with no pattern, whose if/print runs for every input record. That is why the matching lines appear twice.

Note that you can use a backslash at the end of an input line to continue it to the next line. So
Code:
/HEADER/||/pattern1/||/pattern2/ \
{
if ($2 !~ /pattern4|pattern5|pattern6/)

 print $1, $7, $3, $4, $5, $6, $2}' inputfile
should work.
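The three brace placements can be compared side by side. This is only a sketch on made-up input (x1, y2, x3 are assumptions, not data from the thread), with a "hit" prefix added so the two kinds of output lines are easy to tell apart.

```shell
# Made-up sample input (not from the thread).
input='x1
y2
x3'

# Brace on the same line: one rule, runs only for matching records.
same_line=$(printf '%s\n' "$input" | awk '/x/ { print "hit " $0 }')

# Brace on the next line: two rules. The bare pattern prints matching
# records as they are; the pattern-less block then runs for every record.
next_line=$(printf '%s\n' "$input" | awk '/x/
{ print "hit " $0 }')

# Backslash as the very last character of the line: one rule again.
continued=$(printf '%s\n' "$input" | awk '/x/ \
{ print "hit " $0 }')

printf 'same line:\n%s\n\nnext line:\n%s\n\ncontinued:\n%s\n' \
  "$same_line" "$next_line" "$continued"
```

The first and third variants print only the two "hit" lines for x1 and x3, while the second prints five lines: each matching record once as is, plus a "hit" line for every record.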
 
2 members found this post helpful.
Old 04-29-2010, 08:16 PM   #6
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16

PTrenholme,

Thanks a lot for clarifying these points. On my first attempt to test your suggestion I got an error even though I had put the backslash after ....pattern2/, but the error reported by the console was easy to understand. (Sometimes I have no idea what a printed error means.)


Code:
awk: cmd. line:7: /HEADER/||/pattern1/||/pattern2/ \
awk: cmd. line:7:                                  ^ backslash not last character on line
There was a space (invisible, haha) after the backslash acting as the last character. I deleted that space and the script now works correctly.
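Since the stray space is invisible, one way to spot it is to search for a backslash followed only by whitespace at the end of a line. The following is a minimal sketch, assuming the awk program has been saved to a file; the temporary file and its two-line content are made up for illustration.

```shell
# Hypothetical script file with a space after the backslash on line 1.
tmp=$(mktemp)
printf '/x/ \\ \n{ print }\n' > "$tmp"

# Report line numbers where a backslash is followed by trailing blanks.
bad=$(grep -n '\\[[:space:]][[:space:]]*$' "$tmp")
printf 'before: %s\n' "$bad"

# Strip trailing blanks from every line, then count offending lines again.
sed 's/[[:space:]]*$//' "$tmp" > "$tmp.fixed"
still_bad=$(grep -c '\\[[:space:]][[:space:]]*$' "$tmp.fixed" || true)
printf 'after:  %s matching lines\n' "$still_bad"

rm -f "$tmp" "$tmp.fixed"
```

The grep step only reports the problem; the sed step is one possible fix and removes all trailing whitespace, not just the space after the backslash.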

I've learned some very interesting things from both of your answers. I now understand that many times I couldn't get a working script because of apparently small details like the ones you've explained. This will help me a lot in the future.

Many thanks.

Best regards.
 
Old 04-29-2010, 11:57 PM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,518

Rep: Reputation: 1896
I know you have your solution, but thought I would show you how you could neaten it up a bit:
Code:
awk 'BEGIN{OFS=FS=","}
NR==1{$7=$2;$2="NEW_HEADER"}
NR>1{$7=$2;gsub(/\/.*/,"",$2)}
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/{print}' inputfile
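For reference, the combined filter can be checked end to end on a few rows of the sample data from the first post. This is only a sketch: it moves the header name into $7 before relabelling $2, which is an assumption about the intended header order based on the desired output shown earlier in the thread.

```shell
# Header plus three rows taken from the sample data in the first post.
input='HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,'

result=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = FS = "," }
NR == 1 { $7 = $2; $2 = "NEW_HEADER" }      # header: move name to $7, relabel $2
NR > 1  { $7 = $2; sub(/\/.*/, "", $2) }    # data: full copy in $7, short form in $2
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/ { print }')

printf '%s\n' "$result"
```

Only the header and the first data row survive: the second row is dropped because its $2 is pattern6, and the third because it contains none of HEADER, pattern1 or pattern2.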
 
2 members found this post helpful.
Old 04-30-2010, 12:38 AM   #8
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72

Original Poster
Rep: Reputation: 16

Hey grail, how are you doing?

Great, great! My script is neater using your suggestion. It's much better, much faster and much shorter than the first one.

I now see several things more clearly about how to use AWK better and how to find syntax errors. Many thanks again for your great and kind help.
 
  

