LinuxQuestions.org
Old 04-22-2004, 10:44 PM   #1
Helene
LQ Newbie
 
Registered: Apr 2004
Posts: 19

search for specific text in fields using awk


Hi,

I have the following code:
awk -F';' ' { if ( $1 ~/'Test'/ ) print $0 }' myFile

The file looks like this:
Test;cat
Testing;dog
noTest;fish

Running my awk command prints all three of these records. I only want to list the records whose first field is exactly 'Test', in this case Test;cat. Should I use a regular expression here?

What does actually ~ mean? "contains?"

I hope someone can help!

- Helene
 
Old 04-23-2004, 12:04 AM   #2
meonkeys
Member
 
Registered: Apr 2004
Location: Minneapolis
Distribution: Ubuntu
Posts: 45

from the 'gawk' man page

Try this:
Code:
awk -F';' '{ if ( $1 ~/\yTest\y/ ) print $0 }' myFile
The \y "matches the empty string at either the beginning or the end of a word" according to the gawk(1) manpage. If you don't have GNU awk, check the manpage for your version of awk.

Quote:
What does actually ~ mean? "contains?"
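It means "matches": $1 ~ /Test/ is true when the regex matches anywhere in $1. With an unanchored pattern that behaves like "contains", which is why all three of your records are printed.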
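To see the difference between "matches" and "equals", here is a minimal sketch that feeds the sample records from the first post through awk with printf, so no file is needed. The variable names (data, matched, unmatched) are only for illustration.

```shell
# Sample records from the first post, one per line.
data='Test;cat
Testing;dog
noTest;fish'

# ~ is true when the regex matches anywhere in the field,
# so every record whose first field contains "Test" is printed.
matched=$(printf '%s\n' "$data" | awk -F';' '$1 ~ /Test/')

# !~ is the negation: true when the regex does not match.
unmatched=$(printf '%s\n' "$data" | awk -F';' '$1 !~ /ing/')

printf '%s\n' "$matched"
printf '%s\n' "$unmatched"
```

The first command prints all three records; the second drops only the record whose first field contains "ing".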
 
Old 04-23-2004, 12:13 AM   #3
rkef
Member
 
Registered: Mar 2004
Location: bursa
Posts: 110

This is a rather contrived situation, but it illustrates the point that you write the regex test to match your goal and the data; there is no catch-all regex. So, in this case you'd want to do:

awk -F';' '{ if ($1 ~/^Test$/) print $0 }' myFile

I'm not sure if you're familiar with ^ and $, so I'll explain them anyway. ^ and $ are anchors. ^ indicates that you want to match "at the beginning of the input". So, "^cat" matches the text "cat" and "catherine", but not "fatcat" or " cat".

$ works similarly, but it means match "at the end of the input". So, "cat$" will match "cat" and "fatcat", but not "catherine" or "catcatcatt".

As you can see, I used them together in fixing your regex above. Now it only matches if the text is exactly "Test".
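As a quick check of the anchored version, here is a small sketch run against the sample records from the first post (piped in with printf instead of read from myFile). It also shows a plain string comparison with ==, which is my own addition rather than something suggested above; for an exact match on a whole field it gives the same result without any regex.

```shell
# Sample records from the first post, one per line.
data='Test;cat
Testing;dog
noTest;fish'

# Anchored regex: field 1 must start and end with "Test",
# i.e. it must be exactly "Test".
anchored=$(printf '%s\n' "$data" | awk -F';' '$1 ~ /^Test$/')

# Plain string comparison: no regex involved at all.
equal=$(printf '%s\n' "$data" | awk -F';' '$1 == "Test"')

printf '%s\n' "$anchored"
printf '%s\n' "$equal"
```

Both commands print only Test;cat. The == form is worth considering when the text you are looking for contains characters that are special in a regex, such as "." or "*".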

HTH

p.s. I'm not sure on the history of the tilde's (~) use in regexes, but generally it means "approximately". Perhaps it's just a convenient way of making it clear that we're dealing with regexes? (syntactic sugar?)

p.p.s. The poster above points out another useful construct in regexes. Note, though, that it will also match fields containing multiple words, with "Test" being one of them (e.g. the line "monkey Test zappa;fish" will also match). That may work just as well as my solution. But in more complicated situations you'd probably have to choose one or the other (or something completely different).
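The difference between a word match and a whole-field match can be sketched as follows. Since \y is specific to GNU awk, this sketch stands in for it with a bracket expression that should work in other awks too: "Test" must be preceded by the start of the field or a non-word character, and followed by a non-word character or the end of the field. That pattern is my approximation of \y for this data, not an exact equivalent in every case.

```shell
# Sample records plus the multi-word line mentioned above.
data='Test;cat
Testing;dog
noTest;fish
monkey Test zappa;fish'

# Word match: "Test" as a separate word anywhere in field 1.
word=$(printf '%s\n' "$data" |
  awk -F';' '$1 ~ /(^|[^A-Za-z0-9_])Test([^A-Za-z0-9_]|$)/')

# Whole-field match: field 1 is exactly "Test".
whole=$(printf '%s\n' "$data" | awk -F';' '$1 ~ /^Test$/')

printf '%s\n' "$word"
printf '%s\n' "$whole"
```

The word match prints Test;cat and monkey Test zappa;fish, while the whole-field match prints only Test;cat.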

Last edited by rkef; 04-23-2004 at 12:16 AM.
 
  


All times are GMT -5. The time now is 04:38 PM.
