Old 12-08-2008, 11:00 AM   #1
wtaicken
LQ Newbie
 
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

Rep: Reputation: 15
Pattern matching in a text file - use of AWK??


I need to do some scripting to read through a text file and find the last occurrence of any word from a lookup list. Once the line containing that word has been found, I need to extract the last number on that line and substitute it for a placeholder character in another text file, which will then be appended to the first.

e.g. the original file will contain something like

ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2


What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use AWK or GREP for the first bit?

W
 
Old 12-08-2008, 12:44 PM   #2
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,132

Rep: Reputation: 2456
I'd grep it, since if you're only looking for the CRITIC lines, it'll just return those. Doing "grep CRITIC <filename>" would work.
 
Old 12-08-2008, 12:59 PM   #3
x_terminat_or_3
Member
 
Registered: Mar 2007
Location: Plymouth, UK
Distribution: Fedora Core, RHEL, Arch
Posts: 342

Rep: Reputation: 35
...and to get the last occurrence in your grep output, pipe it to tail

like this:

grep CRITIC filename | tail -n 1

then pipe all that to sed/awk
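The grep, tail and awk steps described above might be chained roughly as follows. This is only a sketch: `last_field_of_last_match` is a made-up helper name, and the sample file is rebuilt from the first post.

```shell
# Hypothetical helper: print the last field of the last line matching a word.
last_field_of_last_match() {
    word=$1
    file=$2
    # grep keeps only the matching lines, tail keeps the final one,
    # and awk prints its last whitespace-separated field.
    grep -- "$word" "$file" | tail -n 1 | awk '{ print $NF }'
}

# Sample data taken from the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ARCHIVE 1
store
begin
*********************************
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2
EOF

result=$(last_field_of_last_match CRITIC "$sample")
echo "$result"
rm -f "$sample"
```

Note that grep matches the word anywhere on the line, not just in the first column.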
 
Old 12-08-2008, 01:16 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,962
Blog Entries: 11

Rep: Reputation: 865
Or in awk
Code:
awk '/CRITIC/{line=$0} END{$0=line; print $NF}' file
 
Old 12-08-2008, 03:31 PM   #5
jan61
Member
 
Registered: Jun 2008
Posts: 235

Rep: Reputation: 46
Moin,

Quote:
Originally Posted by Tinkster
Or in awk
Code:
awk '/CRITIC/{line=$0} END{$0=line; print $NF}' file
You can probably save time by reversing the file first, because you can then stop analysing the file at the first match:
Code:
tac file | awk '/CRITIC/{print $NF; exit;}'
Jan
 
Old 12-08-2008, 03:36 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,962
Blog Entries: 11

Rep: Reputation: 865
Good idea - would be worthwhile to time executions.
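A rough way to set up such a comparison is sketched below. The function names, the generated test file and its size are all arbitrary choices made for illustration; only the two awk one-liners come from the posts above.

```shell
# Forward scan (post #4): read the whole file, remember the last match.
forward_last() {
    awk '/CRITIC/ { line = $0 } END { $0 = line; print $NF }' "$1"
}

# Reverse scan (post #5): read from the end, stop at the first match.
reverse_last() {
    tac "$1" | awk '/CRITIC/ { print $NF; exit }'
}

# Throwaway test file: one match at the start, filler, one match at the end.
big=$(mktemp)
awk 'BEGIN {
    print "CRITIC 1 1"
    for (i = 0; i < 100000; i++) print "filler line " i
    print "CRITIC 1 2"
}' > "$big"

# Both approaches should agree on the answer before any timing is trusted.
fwd=$(forward_last "$big")
rev=$(reverse_last "$big")
echo "forward: $fwd  reverse: $rev"
rm -f "$big"
```

Under bash, running `time forward_last "$big"` and `time reverse_last "$big"` before the final `rm` gives rough wall-clock figures. The reverse scan can only be expected to win when the last match sits near the end of a large file.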
 
Old 12-08-2008, 03:44 PM   #7
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,147

Rep: Reputation: 330
You're all ignoring the "list of words in another file" part of the OP's problem.

Consider this possibility:
Code:
$ cat fields
ARCHIVE                         
CRITIC                          
$ cat comp_test
ARCHIVE 1                          
store                              
begin                              
*********************************  
* Retrieve interface by default into INTERFACE001
ARCHIVE 1
retrieve
begin
*********************************
CRITIC 1 2
$ gawk -f comp.awk -v fields=fields comp_test
2
<edit>
Sorry. There's an error in this code. See my post below for commented corrected code.
</edit>
Using this code:
PHP Code:
$ cat comp.awk
#!/bin/gawk -f
BEGIN {
  if (!fields) {
    printf "Usage: gawk -v fields=list-of-words -F " ARGV[0] " file-to-search\n";
    exit 1;
  }
  while ((getline < fields) > 0) {
    words = (words) ? words "|(" $0 ")" : "(" $0 ")";
  }
}

{
  if ($0 ~ words)  matched = $0;
}

END {
  if (matched) {
    printf $NF "\n"; # <edit>This is not correct.</edit>
  }
  else {
    printf "No line in any input file matched any word in the field list.\n";
  }
}


Last edited by PTrenholme; 12-08-2008 at 10:34 PM. Reason: Logic error in code
 
Old 12-08-2008, 04:31 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,962
Blog Entries: 11

Rep: Reputation: 865
Quote:
Originally Posted by PTrenholme
You're all ignoring the "list of words in another file" part of the OP's problem.
Not really ... he only asked for the extraction part.
Quote:
What I want to do is find CRITIC, since that's the last occurrence of one of the words in my lookup list. I then need to extract the number 2 and substitute it for something like x in another text file. I guess I can do the last part using sed, but should I use AWK or GREP for the first bit?
And didn't mention any specifics whatsoever as to what
the criteria for that replacement might be, either.



Cheers,
Tink
 
Old 12-08-2008, 09:36 PM   #9
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,147

Rep: Reputation: 330
Um, Tink, look in the last quote you posted: "... since that's one of the words in my lookup list." I think that's a fairly clear indication that the OP had a "list" of words, not just a single word, in mind. The "CRITIC" part was just an example of a match from the list. (That's why I used a two-word list in my example code.)

<edit>
And so I looked at my code and realized I was reporting $NF, which is the last field in the last line of the file, not the matching line. Here's a corrected version of the code with some added comments:
PHP Code:
#!/bin/gawk -f
BEGIN {
  if (!fields) {
    print "Usage: gawk -v fields=list-of-words -f comp.awk file-to-search";
    skip = 1;
    exit;
  }
  # Build a regular expression that will match any word in the "fields" file
  # Note that the "words" in the "fields" file may, themselves, be regular expressions.
  while ((getline < fields) > 0) {
    words = (words) ? words "|(" $0 ")" : "(" $0 ")";
  }
}

# Read the input file and check each line for a match in the word list
{
  if (skip) next;
  if (match($0, words, val)) { # Use the "match" function to extract the matched string
    matched = $0;          # Save the line containing the match, overwriting any prior value
    matched_str = val[0];  # Save the matching token
    matched_val = $NF;     # And the last field in the line. Other "values" could be selected by, e.g., $1, $2, etc.
  }
}

# All done. Report the matched information, if any.
END {
  if (matched) {
    print "\"" matched "\" contained \"" matched_str "\" and was the last line containing any word in the list. The last field in that string is"
    # Placing the field value on the last output line for later use.
    print matched_val;
  }
  else if (!skip) {
    print "No line in any input file matched any word in the field list.";
  }
}
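To make the regular-expression step more concrete, here is a stripped-down sketch that only prints the pattern the BEGIN loop would build from a two-word list. The temporary file and the `word` variable are illustrative choices, and plain awk is used, so nothing here depends on gawk extensions.

```shell
# Two-word list, as in the "fields" example file.
fields=$(mktemp)
printf 'ARCHIVE\nCRITIC\n' > "$fields"

pattern=$(awk -v fields="$fields" 'BEGIN {
    # Read the list line by line; "> 0" stops on end of file or a read error.
    while ((getline word < fields) > 0)
        words = (words) ? words "|(" word ")" : "(" word ")"
    print words
}')
echo "$pattern"
rm -f "$fields"
```

Each list entry becomes one parenthesised alternative, which is why entries may themselves be regular expressions.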


Last edited by PTrenholme; 12-08-2008 at 10:30 PM.
 
Old 12-09-2008, 05:16 AM   #10
wtaicken
LQ Newbie
 
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

Original Poster
Rep: Reputation: 15
OK, thanks, that works a treat! I did mean a word from a lookup list... sorry if it was a bit vague to earlier posters.

Can I embed this within another parent script, and if so, what would the syntax be? The parent script cd's to a specific directory (supplied on the command line) and loops through all the files, performing various actions. The script above is the first action, and its output will be substituted for characters in another block of text, which will ultimately be appended to the original file. Hope that's clear!
 
Old 12-09-2008, 09:12 AM   #11
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,147

Rep: Reputation: 330
As to the embedding, if you're using a bash shell, you are, in effect, already embedded. . .

Anyhow, the syntax is the same as it would be on a command line. Something like this:
Code:
#!/bin/bash
set -o pipefail   # let $? reflect a gawk failure, not just tail's exit status
word_list="$1"
file_name="$2"
token=$(gawk -f comp.awk -v fields="$word_list" "$file_name" | tail -n 1)
[ $? -ne 0 ] && echo "error" >&2 && exit 1
Note that the print . . . stuff in the final section of the sample code I provided can be simplified to just produce the output you want so you don't need the pipe into the tail command.
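Putting the pieces together, a parent script along the lines the original poster describes might look something like the sketch below. Everything named here is hypothetical: `extract_token` is a simplified plain-awk stand-in for the `gawk -f comp.awk ...` call (it compares the first word exactly rather than using regular expressions), `%N%` is an invented placeholder, and the template text is made up. It also assumes the extracted token is a plain number, since anything containing `/` or `&` would need escaping before being handed to sed.

```shell
# Stand-in for: gawk -f comp.awk -v fields="$word_list" "$file" | tail -n 1
# Prints the last field of the last line whose first word is in the list.
extract_token() {
    awk 'NR == FNR  { want[$1]; next }     # first file: load the word list
         $1 in want { val = $NF }          # second file: remember last match
         END        { if (val != "") print val }' "$1" "$2"
}

# For every regular file in a directory, fill the token into the template
# and append the result to that file.
process_dir() {
    dir=$1; word_list=$2; template=$3
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        token=$(extract_token "$word_list" "$f")
        [ -n "$token" ] || continue        # no match: leave the file alone
        sed "s/%N%/$token/g" "$template" >> "$f"
    done
}

# Small demonstration in a scratch directory.
work=$(mktemp -d)
mkdir "$work/data"
printf 'ARCHIVE\nCRITIC\n' > "$work/words"
printf 'LEVEL %%N%%\n' > "$work/template"
printf 'ARCHIVE 1\nstore\nCRITIC 1 2\n' > "$work/data/a.txt"

process_dir "$work/data" "$work/words" "$work/template"
appended=$(tail -n 1 "$work/data/a.txt")
echo "$appended"
rm -rf "$work"
```

Keeping the word list and the template outside the directory being processed avoids appending to them by accident.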
 
Old 12-09-2008, 12:29 PM   #12
wtaicken
LQ Newbie
 
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

Original Poster
Rep: Reputation: 15
Ok, that works. Ta v much
 
Old 12-15-2008, 03:49 AM   #13
wtaicken
LQ Newbie
 
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

Original Poster
Rep: Reputation: 15
I need to ensure this awk script just carries out the matching process with the first word on the line. Currently it looks for the last occurrence of a word anywhere on the line, which is messing up my results

The current syntax is
Code:
if (match($0, words, val)) { # Use the "match" function to extract the matched string
How can I modify this to look only at the first word on the line? Will swapping $0 for $1 work?

Any help gratefully received
 
Old 12-15-2008, 11:08 AM   #14
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,147

Rep: Reputation: 330
Yes, substituting $1 for $0 in the call to the match function will match the regular expression in words to the first input field rather than the whole input line.
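One caveat is that a regular-expression match against $1 is still a substring match, so a first word such as CRITICAL would also match CRITIC. If that matters, the pattern can be anchored. A minimal sketch, assuming the pattern has the same shape the script builds from the word list (the sample lines and the `first_word_value` name are invented):

```shell
words='(ARCHIVE)|(CRITIC)'   # pattern as built from the word list

first_word_value() {
    # Anchor the alternation so the whole first field must equal a listed word.
    awk -v words="$words" \
        '$1 ~ ("^(" words ")$") { val = $NF } END { print val }' "$1"
}

sample=$(mktemp)
printf 'CRITIC 1 2\nCRITICAL 9 9\n* see CRITIC 5\n' > "$sample"

value=$(first_word_value "$sample")
echo "$value"
rm -f "$sample"
```

Here only the first sample line qualifies: CRITICAL fails the anchored test, and the third line has `*` as its first word.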
 
Old 12-23-2008, 04:44 AM   #15
wtaicken
LQ Newbie
 
Registered: Dec 2008
Location: Dorset, UK
Distribution: Ubuntu 7.1
Posts: 25

Original Poster
Rep: Reputation: 15
I now note that the script will only pick up matches in the same case. If I wanted to look for matches in either upper or lower case, and the list to lookup against is in uppercase, do I have to add words in lowercase?
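One possible approach that avoids duplicating the list is to convert the first field to upper case before testing it, using awk's standard `toupper()` function. The sketch below assumes the list stays in upper case and uses invented sample lines. gawk also offers an `IGNORECASE` variable (for example `-v IGNORECASE=1`) that makes its regular-expression matching case-insensitive, which may be the simpler change for the existing gawk script.

```shell
words='(ARCHIVE)|(CRITIC)'   # upper-case pattern, as built from the list

sample=$(mktemp)
printf 'archive 1\nCritic 1 2\nother 7\n' > "$sample"

# Upper-case the first field before testing it, so the list can stay as is.
value=$(awk -v words="$words" \
    'toupper($1) ~ ("^(" words ")$") { val = $NF } END { print val }' "$sample")
echo "$value"
rm -f "$sample"
```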

W
 
  

