LinuxQuestions.org
12-23-2008, 03:34 PM   #16
PTrenholme
Senior Member
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,147

Of course not. There are at least two ways to do that.

The on-line manual (info gawk) describes both methods:

The short way, although less flexible, is to just add the statement IGNORECASE = 1; after the first curly bracket in the BEGIN block. (Or just add a -v IGNORECASE=1 to the command line invoking the program.) Then all comparisons in the whole program will be case insensitive.

The long, and more flexible, way is like this:

In the "String Functions" subsection under "Functions," at the end of the list you'll see the toupper() and tolower() functions mentioned. All you need to do to make your test case insensitive is to change the match($1, words, val) to match(toupper($1), words, val).

And, if you want to make sure that the words regular expression is all upper case, add a words = toupper(words); before the last closing curly bracket of the BEGIN block. (Changing the test to be match(toupper($0), toupper(words), val) would also work, but then you'd be calling toupper(words) redundantly for every input line. While toupper is a fairly efficient function, doing it once instead of for every line seems more prudent.)
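
For what it's worth, here is a minimal sketch of the toupper() idea. The helper name match_first_field, the word list and the input lines are all made up for illustration, and it uses the plain ~ operator instead of the three-argument match() so that it should run under any POSIX awk, not only gawk.

```shell
# Hypothetical helper: print input lines whose first field matches any
# word in the alternation "$1", ignoring case.
match_first_field() {
    awk -v words="$1" '
        BEGIN { words = toupper(words) }   # upper-case the pattern once
        toupper($1) ~ words                # upper-case the field on every line
    '
}

# Made-up word list and input, for illustration only.
printf '%s\n' 'Inter one' 'switch two' 'ROUTER three' |
    match_first_field 'inter|router'
```

This prints the "Inter one" and "ROUTER three" lines. The match is unanchored, so a first field that merely contains one of the words would match as well.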
 
02-05-2009, 08:51 AM   #17
wtaicken
LQ Newbie
Original Poster
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

What I now notice is that the word INTERFACE is picked up, when in actual fact it needs to find INTER, one of the lookup words on the list. How can I prevent that happening?
 
02-05-2009, 11:37 AM   #18
PTrenholme
Quote:
Originally Posted by wtaicken
What I now notice is that the word INTERFACE is picked up, when in actual fact it needs to find INTER, one of the lookup words on the list. How can I prevent that happening?
If you go to the info gawk manual and search for "word," you'll find that "\<" and "\>" are the gawk regular-expression operators for "start of word" and "end of word." Now put that together with the comment I made that the contents of the "words to find" file are, in fact, regular expressions, and you'll see that there are two ways to proceed:

  1. Change the "words to search for" file contents so that any string that should only match as a separate word is surrounded by the "\<" and "\>" strings, or
  2. Change the part of the code that reads in the list of "words" to look like this:
Code:
  # Build a regular expression that will match any word in the "fields" file.
  # Note that the "words" in the "fields" file may, themselves, be regular expressions.
  # Each entry is wrapped in \< and \> so that it only matches as a whole word.
  while ((getline < fields) > 0) {
    words = (words) ? words "|(\\<" $0 "\\>)" : "(\\<" $0 "\\>)"
  }
By the way, the grep command has a built-in option to read patterns from a file, so a simple grep -iwf [file of words] [file to search] [file to search ...] might be all you need. (The "iw" in the "-iwf" option list specifies "ignore case" and "only match whole words.")

Personally I like awk, but you might prefer something already coded. See info grep for details.
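
As a rough illustration of the grep route, the sketch below builds two throwaway files (their names and contents are invented for this example) and runs grep -iwf against them.

```shell
# Hypothetical file names and contents, for illustration only.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'INTER' 'router' > "$tmp/words.txt"
printf '%s\n' 'the INTERFACE is down' 'inter vlan link' 'Router reset' > "$tmp/input.txt"

# -i  ignore case
# -w  only match whole words
# -f  read the patterns from a file, one per line
grep -iwf "$tmp/words.txt" "$tmp/input.txt"
```

Only the "inter vlan link" and "Router reset" lines are printed. The INTERFACE line is skipped because -w rejects a match that is immediately followed by another word character.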

Last edited by PTrenholme; 02-06-2009 at 05:55 PM. Reason: Typos
 
02-06-2009, 04:44 AM   #19
wtaicken
Original Poster
Tried that, adding in /< & /> but it doesn't seem to work, causing the skip statement
Code:
"No line in any input file matched any word in the field list.";
Could the syntax be slightly wrong...........?
I note in info gawk that the beginning & end are given as \< & \>, however tried that & it doesn't make any difference

cheers
 
02-06-2009, 05:54 PM   #20
PTrenholme
Quote:
Originally Posted by wtaicken
Tried that, adding in /< & /> but it doesn't seem to work, causing the skip statement
Code:
"No line in any input file matched any word in the field list.";
Could the syntax be slightly wrong...........?
I note in info gawk that the beginning & end are given as \< & \>, however tried that & it doesn't make any difference

cheers
Oops! Some typos fixed above. Basically, to get a back-slash into a string, you need to escape it. So \< needs to be "\\<" inside the quotes. And, yes, I had used forward slashes in the post. Sorry.
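
To see the escaping in isolation, here is a small sketch. The word INTER and the two sample lines are just placeholders. The first command should behave the same in any awk; the second part needs gawk, since \< and \> are gawk operators, so it is skipped if gawk is not installed.

```shell
# Any awk: a doubled backslash in a string constant yields one literal
# backslash, so this prints (\<INTER\>)
awk 'BEGIN { print "(\\<" "INTER" "\\>)" }'

# gawk only: used as a dynamic regexp, that string matches INTER as a
# whole word, so INTERFACE is not picked up.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'INTERFACE up' 'INTER up' |
        gawk 'BEGIN { re = "(\\<" "INTER" "\\>)" } $0 ~ re'
fi
```

With a single backslash in the string ("\<"), the backslash is consumed while the string is read and never reaches the regular expression, which would explain why nothing matched as a whole word.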

Last edited by PTrenholme; 02-06-2009 at 05:56 PM.