LinuxQuestions.org

Programming forum: Pattern matching in a text file - use of AWK?? (http://www.linuxquestions.org/questions/programming-9/pattern-matching-in-a-text-file-use-of-awk-689131/)

wtaicken 12-08-2008 11:00 AM

Pattern matching in a text file - use of AWK??
 
I need to do some scripting to read through a text file and find the last occurrence of a word in the file that corresponds to a lookup list. When the line containing that word has been found, I need to extract the last numerical character from that line and substitute it for another character in another text file, which will then be appended to the first.

For example, the original file will contain something like:

ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2


What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use AWK or GREP for the first bit?

W

TB0ne 12-08-2008 12:44 PM


I'd grep it, since if you're only looking for the CRITIC lines, it'll just return those. Doing "grep CRITIC <filename>" would work.

x_terminat_or_3 12-08-2008 12:59 PM

...and to get the last occurrence in your grep output, pipe it to tail, like this:

grep CRITIC filename | tail -n 1

Then pipe all that to sed or awk.
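Putting those pieces together, here is a minimal sketch of the whole pipeline as a bash function. The function name `last_field_of_last_match` is hypothetical, the keyword is treated as a fixed string (`grep -F`), and the value you want is assumed to be the last whitespace-separated field of the line:

```shell
# Hypothetical helper: print the last field of the last line containing KEYWORD.
# Usage: last_field_of_last_match KEYWORD FILE
last_field_of_last_match() {
    local keyword="$1" file="$2"
    # grep -F: treat the keyword as a fixed string, not a regex
    # tail -n 1: keep only the last matching line
    # awk: print that line's last whitespace-separated field
    grep -F -- "$keyword" "$file" | tail -n 1 | awk '{ print $NF }'
}
```

For the sample file in the first post, `last_field_of_last_match CRITIC file` should print `2`. If nothing matches, it prints nothing.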

Tinkster 12-08-2008 01:16 PM

Or in awk
Code:

awk '/CRITIC/{line=$0} END{$0=line; print $NF}' file

jan61 12-08-2008 03:31 PM

Hi,


You can probably save time by reversing the file first with tac, because awk can then stop at the first match:
Code:

tac file | awk '/CRITIC/{print $NF; exit;}'
Jan

Tinkster 12-08-2008 03:36 PM

Good idea - would be worthwhile to time executions.

PTrenholme 12-08-2008 03:44 PM

You're all ignoring the "list of words in another file" part of the OP's problem.

Consider this possibility:
Code:

$ cat fields
ARCHIVE
CRITIC
$ cat comp_test
ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2
$ gawk -f comp.awk -v fields=fields comp_test
2

Edit: Sorry, there's an error in this code. See my post below for corrected, commented code.

Using this code:
Code:

$ cat comp.awk
#!/bin/gawk -f
BEGIN {
  if (!fields) {
    printf "Usage: gawk -v fields=list-of-words -f comp.awk file-to-search\n";
    exit 1;
  }
  while ((getline < fields) > 0) {
    words = (words) ? words "|(" $0 ")" : "(" $0 ")";
  }
}

{
  if ($0 ~ words)  matched = $0;
}

END {
  if (matched) {
    printf NF "\n";   # Edit: this is not correct. NF is the field count of the last line read, not a value from the matched line.
  }
  else {
    printf "No line in any input file matched any word in the field list.\n";
  }
}



Tinkster 12-08-2008 04:31 PM

Quote:

Originally Posted by PTrenholme (Post 3368475)
You're all ignoring the "list of words in another file" part of the OP's problem.

Not really ... he only asked for the extraction part.
Quote:

What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use AWK or GREP for the first bit?
And he didn't mention any specifics whatsoever about what the criteria for that replacement might be, either.



Cheers,
Tink

PTrenholme 12-08-2008 09:36 PM

Um, Tink, look in the last quote you posted: "... since that's one of the words in my lookup list." I think that's a fairly clear indication that the OP had a "list" of words, not just a single word, in mind. The "CRITIC" part was just an example of a match from the list. (That's why I used a two-word list in my example code.)

Edit: I looked at my code again and realized I was reporting NF from the last line of the file rather than the last field ($NF) of the matching line. Here's a corrected version of the code with some added comments:
Code:

#!/bin/gawk -f
BEGIN {
  if (!fields) {
    print "Usage: gawk -v fields=list-of-words -f comp.awk file-to-search" > "/dev/stderr";
    skip = 1;
    exit 1;    # exit in BEGIN skips all input and jumps straight to END
  }
  # Build a regular expression that will match any word in the "fields" file
  # Note that the "words" in the "fields" file may, themselves, be regular expressions.
  while ((getline line < fields) > 0) {
    sub(/[ \t]+$/, "", line);    # ignore trailing blanks on each word
    if (line != "")
      words = (words) ? words "|(" line ")" : "(" line ")";
  }
  close(fields);
  if (!words) {
    print "No words read from " fields > "/dev/stderr";
    skip = 1;
    exit 1;
  }
}

# Read the input file and check each line for a match in the word list
{
  if (match($0, words, val)) { # Use the (gawk-only) 3-argument "match" function to extract the matched string
    matched = $0;          # Save the line containing the match, overwriting any prior value
    matched_str = val[0];  # Save the matching token
    matched_val = $NF;     # And the last field in the line. Other "values" could be selected by, e.g., $1, $2, etc.
  }
}

# All done. Report the matched information, if any.
END {
  if (matched) {
    print "\"" matched "\" contained \"" matched_str "\" and was the last line containing any word in the list. The last field in that string is"
    # Placing the field value on the last output line for later use.
    print matched_val;
  }
  else if (!skip) {
    print "No line in any input file matched any word in the field list.";
  }
}
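A grep-only alternative (not from the thread) that also honours the lookup list: grep can read its patterns from a file with `-f`. This sketch assumes bash (for the `<( )` process substitution) and that each list entry is an extended regular expression; `last_match_from_list` is a hypothetical name. Blank entries are dropped first, because an empty pattern would match every line:

```shell
# Sketch: last field of the last line matching any pattern in a word-list file.
# Usage: last_match_from_list WORD_LIST FILE
last_match_from_list() {
    local list="$1" file="$2"
    # Strip trailing whitespace and drop blank lines from the list first:
    # an empty pattern would match every line.
    grep -E -f <(sed -e 's/[[:space:]]*$//' -e '/^$/d' "$list") -- "$file" \
        | tail -n 1 \
        | awk '{ print $NF }'
}
```

Unlike the gawk script, this reports only the value, not which word matched.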



wtaicken 12-09-2008 05:16 AM

OK, thanks, that works a treat! I did mean a word from a lookup list; sorry if that was a bit vague for the earlier posters.

Can I embed this within another parent script, and if so, what would the syntax be? The parent script cd's to a specific directory (supplied on the command line) and spools through all the files, performing various actions. This is the first action; its output will be used to substitute for characters in another block of text, which will ultimately be appended to the original file. Hope that's clear!
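A sketch of what such a parent script could look like, under assumptions not stated in the thread: the word list and template live outside the directory being processed, the template marks the spot with a placeholder string (here a hypothetical `@TOKEN@`), and only the first word of each line is compared against the list. The extraction uses a small inline awk program instead of comp.awk so that the sketch is self-contained, and `process_dir` is a hypothetical name. Rather than `cd`, it loops over `"$dir"/*`, which has the same effect for plain files:

```shell
# Sketch of a parent script: for every regular file in DIR, find the last line
# whose FIRST word is in WORD_LIST, take that line's last field, substitute it
# for PLACEHOLDER in TEMPLATE, and append the result to the file.
# Assumes the placeholder and token contain no "/", "&" or regex metacharacters.
# Usage: process_dir DIR WORD_LIST TEMPLATE PLACEHOLDER
process_dir() {
    local dir="$1" list="$2" template="$3" placeholder="$4"
    local f token
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories etc.
        token=$(awk -v listfile="$list" '
            BEGIN {
                while ((getline line < listfile) > 0)
                    if (split(line, w) > 0) want[w[1]] = 1   # first word of each entry
                close(listfile)
            }
            ($1 in want) { last = $NF }    # last field of the latest matching line
            END { print last }
        ' "$f")
        [ -n "$token" ] || continue        # no match: leave the file alone
        # Substitute the token into a copy of the template and append it.
        sed "s/$placeholder/$token/g" "$template" >> "$f"
    done
}
```

Usage would be something like `process_dir /path/to/dir words.txt template.txt @TOKEN@`.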

PTrenholme 12-09-2008 09:12 AM

As to the embedding: if you're using a bash shell, you are, in effect, already embedded.

Anyhow, the syntax is the same as it would be on a command line. Something like this:
Code:

#!/bin/bash
word_list="$1"
file_name="$2"
set -o pipefail   # so a gawk failure is not hidden by tail's exit status
token=$(gawk -f comp.awk -v fields="$word_list" "$file_name" | tail -n 1)
[ $? -ne 0 ] && echo "error" >&2 && exit 1

Note that the print statements in the END section of the sample code I provided can be simplified to produce just the output you want, so you don't need the pipe into tail.
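As a sketch of that simplification, here is the same matching logic with an END rule that prints only the value, wrapped in a hypothetical `extract_token` function. It uses only POSIX awk features, so it should run under gawk or another awk. Instead of printing a message, it signals "no match" (status 1) or an empty or unreadable word list (status 2) through its exit status, so a `$?` check in the caller is meaningful:

```shell
# Sketch: print only the last field of the last line matching any listed word.
# Usage: token=$(extract_token WORD_LIST FILE) || echo "no match" >&2
extract_token() {
    awk -v fields="$1" '
        BEGIN {
            # Build one alternation regex from the word list, skipping blank lines.
            while ((getline line < fields) > 0) {
                sub(/[ \t]+$/, "", line)
                if (line != "")
                    words = words ? words "|(" line ")" : "(" line ")"
            }
            close(fields)
            if (words == "") { bad = 1; exit }   # empty or unreadable list
        }
        $0 ~ words { matched_val = $NF; found = 1 }   # last match wins
        END {
            if (bad) exit 2
            if (!found) exit 1
            print matched_val
        }
    ' "$2"
}
```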

wtaicken 12-09-2008 12:29 PM

Ok, that works. Ta v much

wtaicken 12-15-2008 03:49 AM

I need to make sure this awk script matches only against the first word on the line. Currently it matches a listed word anywhere on the line, which is messing up my results.

The current line is:
Code:

if (match($0, words, val)) { # Use the "match" function to extract the matched string
How can I modify this to look only at the first word on the line? Will swapping $0 for $1 work?

Any help gratefully received

PTrenholme 12-15-2008 11:08 AM

Yes, substituting $1 for $0 in the call to the match function will match the regular expression in words to the first input field rather than the whole input line.
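One caveat: matching against $1 still succeeds if a listed word occurs anywhere inside the first field, so CRITIC would also match a first word of CRITICAL. If only whole-word matches are wanted, one way is to anchor the alternation with `^( )$`. A sketch in POSIX awk, using the hypothetical function name `first_word_token`:

```shell
# Sketch: match only when the WHOLE first field matches one of the listed
# words (treated as regexes), by anchoring the alternation with ^(...)$.
# Prints the last field of the last such line.
# Usage: first_word_token WORD_LIST FILE
first_word_token() {
    awk -v fields="$1" '
        BEGIN {
            while ((getline line < fields) > 0) {
                sub(/[ \t]+$/, "", line)
                if (line != "")
                    words = words ? words "|" line : line
            }
            close(fields)
            if (words == "") exit          # empty list: nothing can match
            anchored = "^(" words ")$"     # e.g. ^(ARCHIVE|CRITIC)$
        }
        $1 ~ anchored { last = $NF }
        END { if (last != "") print last }
    ' "$2"
}
```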

wtaicken 12-23-2008 04:44 AM

I now notice that the script only picks up matches in the same case. If I want to match either upper or lower case, and the lookup list is in uppercase, do I have to add the words in lowercase too?

W
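The thread stops without an answer here. Two options, offered as a sketch rather than the posters' solution. In gawk specifically, setting `IGNORECASE = 1` in the BEGIN block makes regular-expression matching, including `match()`, case-insensitive. Portably, you can upper-case both the list entries and the first field before comparing, so no lowercase copies of the words are needed. The version below does the latter and treats list entries as literal words rather than regexes; `first_word_token_nocase` is a hypothetical name:

```shell
# Sketch: case-insensitive first-word match against a word list.
# Both the list entries and $1 are upper-cased before comparing.
# Usage: first_word_token_nocase WORD_LIST FILE
first_word_token_nocase() {
    awk -v fields="$1" '
        BEGIN {
            # Store each list entry, upper-cased, as an array key.
            while ((getline line < fields) > 0)
                if (split(line, w) > 0)
                    want[toupper(w[1])] = 1
            close(fields)
        }
        (toupper($1) in want) { last = $NF }   # compare case-insensitively
        END { if (last != "") print last }
    ' "$2"
}
```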


All times are GMT -5.