I need to do some scripting to read through a text file and find the last occurrence of a word in the file that corresponds to a lookup list. Once the line containing that word has been found, I need to extract the last numerical character from that line and substitute it for another character in another text file, which will then be appended to the first.
e.g. the original file will contain something like
ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2
What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use awk or grep for the first bit?
I'd grep it, since if you're only looking for the CRITIC lines, it'll just return those. Doing "grep CRITIC <filename>" would work.
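A plain grep for CRITIC only covers a single word. If the lookup list is kept in a file with one word per line, something along these lines might handle the first part with grep alone. This is only a sketch: the function name last_match_value and the temporary file names are made up for illustration, and it assumes a grep that supports -w (whole-word matches) alongside -f (read patterns from a file).

```shell
#!/bin/sh
# Hypothetical helper: print the last field of the last line in $2
# that contains any whole word listed (one per line) in $1.
last_match_value() {
    grep -w -f "$1" "$2" | tail -n 1 | awk '{ print $NF }'
}

# Demo with a shortened copy of the sample data from the question.
tmp=$(mktemp -d)
printf 'ARCHIVE\nCRITIC\n' > "$tmp/fields"
printf 'ARCHIVE 1\nstore\nbegin\nCRITIC 1 2\n' > "$tmp/comp_test"
last_match_value "$tmp/fields" "$tmp/comp_test"
rm -rf "$tmp"
```

Note that this matches a listed word anywhere on the line, not just at the start.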
You're all ignoring the "list of words in another file" part of the OP's problem.
Consider this possibility:
Code:
$ cat fields
ARCHIVE
CRITIC
$ cat comp_test
ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2
$ gawk -f comp.awk -v fields=fields comp_test
2
<edit>
Sorry. There's an error in this code. See my post below for commented corrected code.
</edit>
Using this code:
Code:
$ cat comp.awk
#!/bin/gawk
BEGIN {
    if (!fields) {
        printf "Usage: gawk -v fields=list-of-words -F " ARGV[0] " file-to-search\n";
        exit 1;
    }
    while (getline < fields) {
        words = (words) ? words "|(" $0 ")" : "(" $0 ")";
    }
}

{
    if ($0 ~ words) matched = $0;
}

END {
    if (matched) {
        printf $ NF "\n";    # <edit> This is not correct. </edit>
    } else {
        printf "No line in any input file matched any word in the field list.\n";
    }
}
Last edited by PTrenholme; 12-08-2008 at 10:34 PM.
Reason: Logic error in code
Quote:
You're all ignoring the "list of words in another file" part of the OP's problem.
Not really ... he only asked for the extraction part.
Quote:
What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use awk or grep for the first bit?
And he didn't mention any specifics whatsoever about what the criteria for that replacement might be, either.
Um, Tink, look in the last quote you posted: "... since that's one of the words in my lookup list." I think that's a fairly clear indication that the OP had a "list" of words, not just a single word, in mind. The "CRITIC" part was just an example of a match from the list. (That's why I used a two-word list in my example code.)
<edit>
And so I looked at my code and realized I was reporting $ NF, which is the last field in the last line of the file, not the matching line. Here's a corrected version of the code with some added comments:
Code:
#!/bin/gawk -f
BEGIN {
    if (!fields) {
        print "Usage: gawk -v fields=list-of-words -f comp.awk file-to-search";
        skip = 1;
        exit;
    }
    # Build a regular expression that will match any word in the "fields" file.
    # Note that the "words" in the "fields" file may, themselves, be regular expressions.
    # The "> 0" test stops the loop at end of file or if the file cannot be read.
    while ((getline < fields) > 0) {
        words = (words) ? words "|(" $0 ")" : "(" $0 ")";
    }
}

# Read the input file and check each line for a match in the word list.
{
    if (skip) next;                # "break" is not valid outside a loop
    if (match($0, words, val)) {   # Use the "match" function to extract the matched string
        matched = $0;              # Save the line containing the match, overwriting any prior value
        matched_str = val[0];      # Save the matching token
        matched_val = $ NF;        # And the last field in the line. Other "values" could be selected by, e.g., $1, $2, etc.
    }
}

# All done. Report the matched information, if any.
END {
    if (matched) {
        print "\"" matched "\" contained \"" matched_str "\" and was the last line containing any word in the list. The last field in that string is";
        # Placing the field value on the last output line for later use.
        print matched_val;
    } else if (!skip) {
        print "No line in any input file matched any word in the field list.";
    }
}
Last edited by PTrenholme; 12-08-2008 at 10:30 PM.
OK, thanks, that works a treat! I did mean a word from a lookup list. Sorry if it was a bit vague to earlier posters.
Can I embed this within another parent script, and if so, what would the syntax be? The parent script cd's to a specific directory (supplied on the command line) and loops through all files, performing various actions. The above is the first action, and its output will be used to substitute for characters in another block of text, which will ultimately be appended to the original file. Hope that's clear!
Note that the print . . . stuff in the final section of the sample code I provided can be simplified to just produce the output you want so you don't need the pipe into the tail command.
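For the embedding question, a parent script could capture the awk output with command substitution and hand it to sed. The following is only a rough sketch under several assumptions: the function name process_dir, its three arguments (directory, lookup list, template file containing a literal x) and the inline awk program (a cut-down stand-in for comp.awk) are all made up for illustration, and the extracted value is assumed to be a plain number so it is safe inside the sed expression.

```shell
#!/bin/sh
# Sketch of a parent script. Assumed (hypothetical) arguments:
#   $1 = directory to process, $2 = lookup list, $3 = template containing "x"
process_dir() {
    dir=$1
    fields=$2
    template=$3
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Command substitution captures the awk output in a shell variable.
        # Calling the thread's script instead, e.g.
        #   gawk -v fields="$fields" -f comp.awk "$f" | tail -n 1
        # should slot in here the same way.
        val=$(awk -v fields="$fields" '
            BEGIN {
                while ((getline line < fields) > 0) {
                    words = (words) ? words "|(" line ")" : "(" line ")"
                }
            }
            $0 ~ words { v = $NF }
            END { if (v != "") print v }
        ' "$f")
        [ -n "$val" ] || continue      # no match: leave the file alone
        # Replace every "x" in the template with the value and append.
        sed "s/x/$val/g" "$template" >> "$f"
    done
}

# Only run when the arguments were actually supplied on the command line.
if [ "$#" -ge 3 ]; then
    process_dir "$1" "$2" "$3"
fi
```

The lookup list and the template should live outside the directory being processed, otherwise the loop will treat them as input files too.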
I need to ensure this awk script carries out the matching process only against the first word on the line. Currently it looks for the last occurrence of a word anywhere on the line, which is messing up my results.
The current syntax is
Code:
if (match($0, words, val)) { # Use the "match" function to extract the matched string
How can I modify this to look only at the first word in the line? Will swapping $0 for $1 work?
Yes, substituting $1 for $0 in the call to the match function will match the regular expression in words to the first input field rather than the whole input line.
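One thing to watch: with $1 the regular expression can still match part of the first word, so CRITIC would also hit a line starting with CRITICAL. If that matters, anchoring the pattern is one option. Here is a minimal sketch, using plain awk and the ~ operator rather than gawk's three-argument match; the function name first_word_value is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: test only the first field of each line in $2 against
# the word list in $1, and require the pattern to match the whole field.
first_word_value() {
    awk -v fields="$1" '
        BEGIN {
            while ((getline line < fields) > 0) {
                words = (words) ? words "|(" line ")" : "(" line ")"
            }
            words = "^(" words ")$"     # whole-field match only
        }
        $1 ~ words { v = $NF }          # remember the most recent hit
        END { if (v != "") print v }
    ' "$2"
}
```

Called as first_word_value fields comp_test, it prints the last field of the last line whose first word is in the list, or nothing if no line qualifies.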
I now note that the script will only pick up matches in the same case. If I wanted to look for matches in either upper or lower case, and the list to look up against is in uppercase, do I have to add the words in lowercase?
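Nobody answered this in the thread, but adding lowercase copies to the list should not be necessary. One possible approach is to upper-case the field before comparing it, using awk's toupper function; gawk also has an IGNORECASE variable that can be set in the BEGIN block. The sketch below takes the toupper route and assumes the lookup list holds plain upper-case words (not regular expressions with lower-case parts); the function name first_word_value_nocase is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: case-insensitive first-word lookup. The list in $1 is
# assumed to be upper case, so each first field is upper-cased before testing.
first_word_value_nocase() {
    awk -v fields="$1" '
        BEGIN {
            while ((getline line < fields) > 0) {
                words = (words) ? words "|(" line ")" : "(" line ")"
            }
        }
        toupper($1) ~ ("^(" words ")$") { v = $NF }
        END { if (v != "") print v }
    ' "$2"
}
```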