LinuxQuestions.org
Old 05-25-2010, 06:48 AM   #1
grob115
Member
 
Registered: Oct 2005
Posts: 542

Rep: Reputation: 32
Awk to extract phrase between two words on a line?


Hi, I am trying to find a way to extract the phrase between the words "Connection" and "is" (i.e. the underlined words below). Can awk do this, and if so, how? Is it the best command to use?

Code:
[06:25:00][i] Connection at Plant A is live
[06:25:00][i] Connection at Building_C is not live
[07:25:00][i] Connection at Terminal D is down
 
Old 05-25-2010, 06:54 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Code:
awk '{print $4,$5}' file
Edit: sorry, just noticed the underscore in the second line. Try:
Code:
sed -r 's/.*Connection(.*)is.*/\1/' file

Last edited by grail; 05-25-2010 at 06:59 AM.
 
Old 05-25-2010, 07:00 AM   #3
grob115
Member
 
Registered: Oct 2005
Posts: 542

Original Poster
Rep: Reputation: 32
Sorry, I wasn't aware the underline masked the space/underscore. Here's a better representation.

[06:25:00][i] Connection at Plant A is live
[06:25:00][i] Connection at Building_C is not live
[07:25:00][i] Connection at Terminal D is down

Notice the 2nd line has an underscore between Building and C, so the phrase does not always sit in the same columns.
 
Old 05-25-2010, 07:04 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Also here is a better awk:
Code:
awk 'gsub(/.*Connection|is.*/,"")' file
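A possible stricter variant, offered as a sketch rather than the definitive answer: it anchors on the spaces around the two marker words, so a location containing the letters "is" (a hypothetical "Paris", say) is not cut short, and the result has no surrounding spaces. The sample data is copied from the first post; the function name extract_phrase and the variable names are only illustrative.

```shell
# Sample log lines from the original post.
log='[06:25:00][i] Connection at Plant A is live
[06:25:00][i] Connection at Building_C is not live
[07:25:00][i] Connection at Terminal D is down'

# Remove everything up to and including "Connection ", then everything from
# the first " is" that is followed by a space or the end of the line.
# Using the two sub() calls as the pattern means only lines where both
# substitutions succeeded are printed.
extract_phrase() {
    awk 'sub(/^.*Connection /, "") && sub(/ is( .*)?$/, "") { print }'
}

result=$(printf '%s\n' "$log" | extract_phrase)
printf '%s\n' "$result"
```

For comparison, the gsub one-liner above leaves a leading and a trailing space around each phrase, because the spaces next to "Connection" and "is" are not part of either pattern.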
 
Old 05-25-2010, 07:14 AM   #5
grob115
Member
 
Registered: Oct 2005
Posts: 542

Original Poster
Rep: Reputation: 32
grail, thanks. You are the master. Will try it out.
 
Old 05-25-2010, 08:22 AM   #6
grob115
Member
 
Registered: Oct 2005
Posts: 542

Original Poster
Rep: Reputation: 32
BTW, sorry one more question. If the command returns many lines and some of the lines repeat, how do I select a distinct set of lines only?
 
Old 05-25-2010, 08:46 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Well, here is one that is a little trickier to work out:
Code:
awk 'gsub(/.*Connection|is.*/,"") && !_[$0]++' file
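For anyone weighing the alternatives, here is a small sketch comparing this idiom with piping through sort -u. The input is made up for illustration, and "seen" is simply a more readable stand-in for the "_" array name. The main difference is ordering: sort -u reorders the lines, while the awk form keeps the first occurrence of each line in input order.

```shell
# Hypothetical sample: already-extracted phrases, one of them repeated.
phrases='at Plant A
at Building_C
at Plant A
at Terminal D'

# sort -u drops duplicates but returns the lines in sorted order.
# LC_ALL=C pins the collation so the order is predictable.
sorted_unique=$(printf '%s\n' "$phrases" | LC_ALL=C sort -u)

# The awk idiom prints a line only the first time it is seen,
# so the original order is preserved.
ordered_unique=$(printf '%s\n' "$phrases" | awk '!seen[$0]++')

printf 'sort -u:\n%s\n\nawk:\n%s\n' "$sorted_unique" "$ordered_unique"
```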
 
Old 05-25-2010, 10:12 AM   #8
grob115
Member
 
Registered: Oct 2005
Posts: 542

Original Poster
Rep: Reputation: 32
Um... thanks. Haven't seen this !_[$0]++ before. I'll dig around for some explanation. But yeah thanks!
 
Old 05-25-2010, 10:35 AM   #9
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by grob115
Haven't seen this !_[$0]++ before. I'll dig around for some explanation.
Not easy to find so I'll help some. "!" is logical negation. "_" is an array name. awk arrays are associative ("content addressable") -- you can use strings as their indexes. In awk the number zero is logical "false" and any other number is logical "true". Uninitialised variables, when referenced as numbers, have the value 0. "++" in that position is the post-increment operator: the old value is used in the expression, then the element is incremented. String those concepts together!
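To make those pieces concrete, here is a rough trace that spells out each step of !_[$0]++ separately. The three-line input and the names seen, before, and keep are invented for the illustration; the real one-liner does all of this in a single expression.

```shell
# Hypothetical input with one repeated line.
input='a
b
a'

# For each line: read the counter, negate it, then increment it.
trace=$(printf '%s\n' "$input" | awk '{
    before = seen[$0] + 0      # an unset element counts as 0
    keep   = !before           # 1 the first time a line is seen, 0 afterwards
    seen[$0]++                 # the increment takes effect after the test
    printf "%s before=%d keep=%d\n", $0, before, keep
}')
printf '%s\n' "$trace"
# prints:
# a before=0 keep=1
# b before=0 keep=1
# a before=1 keep=0
```

Only the lines where keep is 1 would be printed by the one-liner, which is why the second "a" disappears.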
 
Old 05-26-2010, 12:07 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
@catkin - thanks, very comprehensive explanation
 
Old 05-26-2010, 08:53 PM   #11
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Rep: Reputation: 31
Hi, I walked by and saw this interesting thread about awk. I did the following to test:

Code:
$ awk '!_[$0]++' file
and it miraculously prints only the first occurrence of each line in the file. Why does it 'print'? Is it the default behaviour when a condition evaluates true?

Code:
$ awk '!_[$0]++ {print}' file
Thanks.
 
Old 05-26-2010, 09:35 PM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Quote:
Is it a default behaviour if a condition evaluates true?
Yes

Explanation found here: http://www.gnu.org/manual/gawk/html_...ml#Very-Simple

Quote:
In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.
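A quick way to see the rule quoted above in action, as a minimal sketch with made-up input: a pattern with no action, the same pattern with the print written out, and an action with no pattern.

```shell
# Hypothetical three-line input.
input='keep this
drop this
keep that'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/keep/')

# The same rule with the action spelled out; the output is identical.
explicit=$(printf '%s\n' "$input" | awk '/keep/ { print }')

# Action only: with no pattern, the action runs for every input line.
action_only=$(printf '%s\n' "$input" | awk '{ print NR }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$explicit" "$action_only"
```

So awk '!_[$0]++' file and awk '!_[$0]++ {print}' file behave the same way: the expression is the pattern, and print is the default action.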
 
Old 05-26-2010, 09:46 PM   #13
NetRock
Member
 
Registered: Mar 2010
Posts: 134

Rep: Reputation: 16
Hi grail.....
YOU ROCK..!! Keep doing the good job!! share your knowledge & THE UNIVERSE will return to you MUCH MORE than you share....
 
  


All times are GMT -5.
