04-29-2010, 04:20 PM
#1
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72
Problems excluding lines and a faulty filter in an AWK script
Hi everyone,
I'm looking for some help fixing two problems in my script.
(I'm using bash on Cygwin.)
I have the following source file (column 7, $7, is empty):
Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,
I have this script:
Code:
awk 'BEGIN { FS = OFS = "," } #-1) Set "," as field separator
{ $7 = $2; sub(/\/.*/,"",$7) } #-2) (To copy $2 data into $7 and after that deletes "/Sub data1/Sub data2" from $7)
NR == 1 { $7="NEW_HEADER" } #-3) Changing first line header text in column 7
{/HEADER/||/pattern1/||/pattern2/ } #-4) Filter to search these patterns in every column
{$2 !~ /pattern4|pattern5|pattern6/ } #-5 To exclude pattern4, pattern5 and pattern6
{print $1, $7, $3, $4, $5, $6, $2}' inputfile #-6) Print in different order the 7 columns
And I get the following output:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2 # Why is this line printed?
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2
Question 1:
Line 5 of the script, " {$2 !~ /pattern4|pattern5|pattern6/ }", is meant to drop lines whose column 2 contains pattern4, pattern5 or pattern6, but it does not seem to work.
What am I doing wrong here?
Question 2:
In the output, the line marked with a comment should not appear, because it contains neither HEADER nor pattern1 nor pattern2.
Why does this happen?
Maybe somebody can help me with this.
Thanks in advance.

04-29-2010, 05:04 PM
#2
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,893
Code:
{ $2 !~ /pattern4|pattern5|pattern6/ }
This line of code actually does... nothing! Inside braces it is just an expression that evaluates to true or false; the result is discarded, no action is taken, and it cannot "delete" lines. The same goes for
Code:
{ /HEADER/||/pattern1/||/pattern2/ }
Again, it is just a test and it doesn't print anything.
Maybe you want something like:
Code:
/HEADER/||/pattern1/||/pattern2/ {
    if ( $2 !~ /pattern4|pattern5|pattern6/ )
        print $1, $7, $3, $4, $5, $6, $2
}
That is: only for lines matching "HEADER", "pattern1" or "pattern2", and only if $2 doesn't match "pattern4", "pattern5" or "pattern6", print the reordered fields. Hope this helps.
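To see the difference in isolation, here is a small sketch. The three-line input and the variable names are made up for illustration; only the placement of the test changes between the two awk calls.

```shell
# Hypothetical three-line sample standing in for the real input file.
data='a,pattern1
b,pattern9
c,pattern2'

# Test inside braces: it is evaluated and discarded, so the
# unconditional print that follows fires for every line.
inside=$(printf '%s\n' "$data" | awk -F, '{ /pattern1/ || /pattern2/ } { print $1 }')

# Test in the pattern position: the action runs only for matching lines.
outside=$(printf '%s\n' "$data" | awk -F, '/pattern1/ || /pattern2/ { print $1 }')

printf 'inside braces:\n%s\n' "$inside"
printf 'as a pattern:\n%s\n' "$outside"
```

The first call prints a, b and c; the second prints only a and c.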

04-29-2010, 05:20 PM
#3
Senior Member
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 3,923
For question 2, the condition (pattern) part of an AWK rule applies only to the braced statements that immediately follow it.
In other words, /condition/ { statement1; statement2; ... } is processed only when the condition is true. Since your print statement HAS NO CONDITION, it is executed for every line, because "no condition" is defined as "true".
As to question 1, a test placed inside braces is not a condition. To drop lines, put the test in the condition position and use next to skip the record.
Try:
Code:
!/HEADER|pattern1|pattern2/ { next }        # skip lines with none of the wanted patterns
$2 ~ /pattern4|pattern5|pattern6/ { next }  # skip lines whose column 2 is excluded
{ print $1, $7, $3, $4, $5, $6, $2 }
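As a rough sketch of how next filters records: a rule ending in next abandons the current record, so any later rule never sees it. The four rows below are invented for the demonstration and are not the real input file.

```shell
# Hypothetical rows: a header, one wanted row, one with an excluded
# column 2, and one that matches none of the wanted patterns.
rows='HEADER_1,HEADER_2,HEADER_3
pattern2,pattern7/Sub data1,pattern8
pattern3,pattern6/Sub data1,pattern1
pattern3,pattern8/Sub data1,pattern9'

kept=$(printf '%s\n' "$rows" | awk -F, '
!/HEADER|pattern1|pattern2/        { next }   # nothing wanted anywhere on the line
$2 ~ /pattern4|pattern5|pattern6/  { next }   # column 2 is on the exclusion list
{ print $1 }                                   # only surviving records get here
')
printf '%s\n' "$kept"
```

Only the header and the first data row reach the final print.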

04-29-2010, 05:57 PM
#4
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72
Original Poster
PTrenholme,
Many thanks for your explanation. Now I get the point; the AWK condition and its function are much clearer to me.
Hey colucix, many thanks, it works :-)!!!
But while testing your suggestion I noticed something that may have given me bad results in the past.
Why does putting the brace "{" on the same line or on the next one change the output so drastically? (I explain my question below.)
I've noticed that if I run this script (the right one):
Code:
awk '
BEGIN { FS = OFS = "," }
{ $7 = $2; sub(/\/.*/,"",$7) }
NR == 1 { $7="NEW_HEADER" }
/HEADER/||/pattern1/||/pattern2/ { # With the brace in the same line as /HEADER/...
    if ($2 !~ /pattern4|pattern5|pattern6/)
        print $1, $7, $3, $4, $5, $6, $2
}' inputfile
I get the correct and desired output:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
But if I put the brace "{" on the line below "/HEADER/||/pattern1/||/pattern2/", that is:
Code:
/HEADER/||/pattern1/||/pattern2/
{
if ($2 !~ /pattern4|pattern5|pattern6/)
print $1, $7, $3, $4, $5, $6, $2}' inputfile
instead of putting the brace "{" immediately after "/HEADER/||/pattern1/||/pattern2/":
Code:
/HEADER/||/pattern1/||/pattern2/ {
the output is wrong and contains repeated lines, as can be seen below:
Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,NEW_HEADER
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,pattern7
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,pattern6
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,pattern9
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,pattern5
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,pattern7
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,pattern2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,pattern4
Many thanks in advance.
Regards.

04-29-2010, 06:47 PM
#5
Senior Member
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 3,923
In the main body of an AWK program, each rule has the form pattern { action }. A missing pattern matches every input record, and a missing action defaults to { print }, which prints the whole record. Therefore your second program contains two rules in the main body: a pattern followed by the (default) print action, which prints each matching record in its original field order, and a second rule with an (implied) empty pattern whose block is executed for every input record.
Note that you can use a backslash at the end of a line to continue it onto the next line. So
Code:
/HEADER/||/pattern1/||/pattern2/ \
{
if ($2 !~ /pattern4|pattern5|pattern6/)
print $1, $7, $3, $4, $5, $6, $2}' inputfile
should work.
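A small sketch of all three layouts side by side, using a made-up two-line input (the data and variable names are only for illustration):

```shell
# Hypothetical two-line sample.
data='x,keep
y,drop'

# Brace on the same line as the pattern: one rule.
same_line=$(printf '%s\n' "$data" | awk -F, '/keep/ {
    print $1 }')

# Brace on the next line: two rules. The bare pattern prints the whole
# matching record (default action), then the block runs for every record.
next_line=$(printf '%s\n' "$data" | awk -F, '/keep/
{
    print $1 }')

# Backslash as the very last character joins the lines into one rule again.
continued=$(printf '%s\n' "$data" | awk -F, '/keep/ \
{
    print $1 }')

printf 'same line:\n%s\n' "$same_line"
printf 'next line:\n%s\n' "$next_line"
printf 'continued:\n%s\n' "$continued"
```

The first and third forms print only x; the second prints x,keep followed by x and y, which is the same kind of duplication seen in the wrong output above.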

04-29-2010, 08:16 PM
#6
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72
Original Poster
PTrenholme,
Thank you very much for clarifying these points. On my first attempt to test your suggestion I got an error even though I had put the backslash after ...pattern2/, but the error reported by the console was easy to understand. (Sometimes I have no idea what a printed error means.)
Code:
awk: cmd. line:7: /HEADER/||/pattern1/||/pattern2/ \
awk: cmd. line:7: ^ backslash not last character on line
There was a space (invisible, haha) acting as the last character. I deleted that space and the script now works correctly.
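One possible way to spot such an invisible character is to search the script for a backslash followed by spaces or tabs at the end of a line. This is only a sketch: the temporary file and its two lines are invented for the demonstration, and it assumes the awk program is stored in a file that grep can read.

```shell
# Hypothetical script file with a stray space after the backslash on line 1.
tmp=$(mktemp)
printf '%s\n' '/HEADER/ \ ' '{ print }' > "$tmp"

# Report the numbers of lines where a backslash is followed by
# trailing whitespace instead of being the last character.
hits=$(grep -nE '\\[[:space:]]+$' "$tmp" | cut -d: -f1)
printf 'lines with trailing whitespace after a backslash: %s\n' "$hits"

rm -f "$tmp"
```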
I've learned very interesting things from both of your answers. I now understand that many times I was unable to get a working script because of apparently small details like the ones you've explained. This will help me a lot in the future.
Many thanks.
Best regards.

04-29-2010, 11:57 PM
#7
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
I know you already have your solution, but I thought I would show you how you could tidy it up a bit:
Code:
awk 'BEGIN{OFS=FS=","}
NR==1{$7="NEW_HEADER"}
NR>1{$7=$2;gsub(/\/.*/,"",$2)}
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/{print}' inputfile
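One difference from the earlier scripts is the header row: here HEADER_2 stays in column 2 and NEW_HEADER lands in column 7, while the data rows come out the same. If the header should match the earlier output exactly, a possible variant (a sketch, not the only way) swaps the two labels on the first line. The input below is a four-line excerpt of the sample from the first post.

```shell
# Excerpt of the sample rows from the first post (column 7 is empty).
input='HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,'

out=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = FS = "," }
NR == 1 { $7 = $2; $2 = "NEW_HEADER" }     # header: move the old label to column 7
NR > 1  { $7 = $2; sub(/\/.*/, "", $2) }   # data: full value to $7, stripped value in $2
/HEADER|pattern1|pattern2/ && $2 !~ /pattern4|pattern5|pattern6/')
printf '%s\n' "$out"
```

A pattern with no action prints the record, so the explicit {print} can be dropped, and sub is enough because the regular expression already matches through to the end of the field.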

04-30-2010, 12:38 AM
#8
Member
Registered: Nov 2008
Location: Tegucigalpa
Posts: 72
Original Poster
Hey grail, how are you doing?
Great, great! My script is much neater using your suggestion. It's much better, much faster and much shorter than the first one.
Several things are clearer to me now about how to use AWK better and how to find syntax errors. Many thanks again for your great and kind help.