Of course not. There are at least two ways to do that.
The on-line manual (info gawk) describes both methods:
The short way, although less flexible, is to just add the statement IGNORECASE = 1; after the first curly bracket in the BEGIN block. (Or just add a -v IGNORECASE=1 to the command line invoking the program.) Then all comparisons in the whole program will be case insensitive.
The long, and more flexible, way is like this:
In the "String Functions" subsection under "Functions," at the end of the list you'll see the toupper() and tolower() functions mentioned. All you need to do to make your test case insensitive is to change the match($1, words, val) to match(toupper($1), words, val).
And, if you want to make sure that the words regular expression is all upper case, add a words = toupper(words); before the last closing curly bracket of the BEGIN block. (Changing the test to be match(toupper($0), toupper(words), val) would also work, but then you'd be calling toupper(words) redundantly for every input line. While toupper is a fairly efficient function, doing it once instead of for every line seems more prudent.)
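Here is a rough sketch of the toupper() approach described above. The word list inter|beta and the input lines are hypothetical stand-ins for the real "fields" file and data, and the sketch uses the two-argument form of match() rather than the three-argument gawk form from the thread, purely to keep it short.

```shell
# Hypothetical word list; in the real program it is built from the "fields" file.
toupper_out=$(printf '%s\n' 'inter alpha' 'Beta gamma' 'delta' |
  awk 'BEGIN { words = "inter|beta"; words = toupper(words) }  # upper-case the pattern once
       match(toupper($1), words) { print $1 }')                # upper-case each first field
echo "$toupper_out"
```

The pattern is converted once in BEGIN, and only the input field is converted per line, which is the trade-off mentioned in the post.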
What I now notice is that the word INTERFACE is picked up, when in actual fact it needs to find INTER, one of the lookup words on the list. How can I prevent that happening?
If you go to the info gawk manual and search for "word," you'll find that "\<" and "\>" are the gawk symbols for "start of word" and "end of word." Now put that together with the comment I made that the contents of the "words to find" file are, in fact, regular expressions, and you'll see that there are two ways to proceed:
Change the "words to search for" file contents so that any string that should only match as a separate word is surrounded by the "\<" and "\>" strings, or
change the part of the code that reads in the list of "words" to look like this:
Code:
# Build a regular expression that will match any word in the "fields" file.
# Note that the "words" in the "fields" file may, themselves, be regular expressions.
# Comparing getline's result with 0 stops the loop on end of file or a read error.
while ((getline < fields) > 0) {
    words = (words) ? words "|(\\<" $0 "\\>)" : "(\\<" $0 "\\>)";
}
By the way, grep has a built-in option to read patterns from a file, so a simple grep -iwf [file of words] [file to search] [file to search . . .] might be all you need. (The "iw" in the "-iwf" option list specifies "ignore case" and "only match whole words"; the "f" says that the next argument is the file of patterns.)
Personally I like awk, but you might prefer something already coded. See info grep for details.
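A quick sketch of the grep alternative, with made-up file contents (the two temporary files are hypothetical stand-ins for your word list and your data):

```shell
# Hypothetical word list and data file.
words_file=$(mktemp)
text_file=$(mktemp)
printf '%s\n' 'inter' > "$words_file"
printf '%s\n' 'INTERFACE one' 'use INTER here' > "$text_file"

# -i: ignore case, -w: match whole words only, -f: read patterns from a file
grep_out=$(grep -iwf "$words_file" "$text_file")
rm -f "$words_file" "$text_file"
echo "$grep_out"
```

The lower-case pattern still finds INTER, and the INTERFACE line is not reported because of -w.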
Last edited by PTrenholme; 02-06-2009 at 05:55 PM.
Reason: Typos
Tried that, adding in /< & /> but it doesn't seem to work, causing the skip statement
Code:
"No line in any input file matched any word in the field list.";
Could the syntax be slightly wrong?
I note in info gawk that the beginning and end markers are given as \< and \>; however, I tried that and it doesn't make any difference.
cheers
Oops! Some typos fixed above. Basically, to get a back-slash into a string, you need to escape it. So \< needs to be "\\<" inside the quotes. And, yes, I had used forward slashes in the post. Sorry.
Last edited by PTrenholme; 02-06-2009 at 05:56 PM.