LinuxQuestions.org - [SOLVED] get two/more specific words on a line and print next few lines


Currently I search for the last occurrence of one word in a file with the following:

Code:

grep -A 20 -e 'address 5' input | tail -n 21 > output

This works well for finding the last occurrence of one word: it prints that line and the next 20 lines. But what if I want to search for 2 or 3 words, e.g. the last occurrence of "address 5" and "usb" together on one line? For example, the following line:

Code:

Apr 20 15:58:47 box2 kernel: usb 1-3: USB disconnect, address 5

Now I want to find the line that contains both "address 5" and "usb", and it must be the last line in the file where they occur together. Is there also a generic way for 3 words, 4 words, and so on?

Please note: I want the last occurrence of "address 5" AND "usb" in the file, plus the next 20 lines. Thanks in advance.

Use a regular expression in the pattern for grep. For instance:

Code:

grep -e 'USB.*address 5' /var/log/messages

That matches lines containing 'USB' followed, after any number of characters, by 'address 5'.

Quote:

Originally Posted by Diantre (Post 4661431)

Use a regular expression in the pattern for grep. For instance:

Code:

grep -e 'USB.*address 5' /var/log/messages

That will match lines that contain any amount of characters between 'USB' and 'address 5'

Nitpick: this grep finds the line or lines where "address 5" follows "USB." OP asked for a way to find lines which contain both strings. Your RegEx fits his example but his words were more general. Perhaps a more elaborate RegEx will serve the purpose. However, if OP wants to identify lines which contain 3 or 4 strings without specifying their sequential relationship the complexity is compounded.

Daniel B. Martin
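One way to write the "more elaborate RegEx" for two strings is to list both orders as alternatives. This is a sketch, not from the thread; the sample lines and the helper names 'sample', 'match_either_order' and 'match_all' are hypothetical. Because the number of orderings grows factorially (3 words already need 6), the sketch also shows the usual alternative: one grep per word, each filtering the output of the previous one, which gives AND in any order for any number of words.

```shell
#!/usr/bin/env bash
# Hypothetical sample log lines (not taken from a real log).
sample() {
  printf '%s\n' \
    'Apr 20 15:58:47 box2 kernel: usb 1-3: USB disconnect, address 5' \
    'Apr 20 16:01:02 box2 kernel: address 5 reassigned to usb 2-1' \
    'Apr 20 16:02:10 box2 kernel: usb 2-1: new device' \
    'Apr 20 16:03:00 box2 kernel: eth0: address 5 set'
}

# Two strings, either order: both orderings as ERE alternatives.
match_either_order() {
  grep -E 'usb.*address 5|address 5.*usb'
}

# Any number of fixed strings, any order: build a pipeline with one
# grep -F per word (logical AND). With no words left, just pass lines on.
match_all() {
  if [ $# -eq 0 ]; then cat; return; fi
  local word=$1
  shift
  grep -F -- "$word" | match_all "$@"
}

sample | match_either_order
sample | match_all usb 'address 5'
```

Note that 'USB.*address 5' is case-sensitive and order-sensitive: on the hypothetical sample it matches only the first line, while both sketches above match the first two. Add -i to grep if case should be ignored.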

Hi.

I have used glark for this kind of work:

Code:

NAME
      glark - Search text files for complex regular expressions

SYNOPSIS
      glark [options] expression file ...

DESCRIPTION
      Similar to "grep", "glark" offers: Perl-compatible regular expressions,
      color highlighting of matches, context around matches, complex
      expressions ("and" and "or"), grep output emulation, and automatic
      exclusion of non-text files. Its regular expressions should be familiar
      to persons experienced in Perl, Python, or Ruby. File may also be a
      list of files in the form of a path.

( ... )

It was in the Debian repositories. The web site is http://www.incava.org/projects/glark

The code is a Ruby script and fairly slow: of the 20+ utilities and languages that I compared for string searches, it was near the bottom for speed, e.g. 10 times slower than sed. Its flexibility, however, is amazing.

Best wishes ... cheers, makyo

Quote:

Originally Posted by Kashif_Bash (Post 4661351)

... can be any generic way, like 3 words ,4 words???

This is not elegant, but it works. For testing I used the Edgar Allan Poe poem "The Raven". This code seeks the last line containing "chamber", "visitor" and "entrance" in any order, and prints that line and the following 4 lines.

Code:

egrep -A 4 "chamber" "$InFile" \
| egrep -A 4 "visitor"         \
| egrep -A 4 "entrance"        \
| tail -5                      \
> "$OutFile3"

Daniel B. Martin
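A caveat on the pipeline above: each intermediate 'egrep -A 4' also passes on its context lines (and '--' separators), so the next stage can match its word on a following line rather than on the same line. Below is a sketch of a different technique (awk, not the poster's method), assuming the words are plain strings and the file fits in memory: remember the last line that contains every word, then print it and the lines after it. The helper name 'last_all_words' is hypothetical.

```shell
#!/usr/bin/env bash
# last_all_words AFTER FILE WORD...   (hypothetical helper)
# Print the LAST line of FILE containing every WORD (fixed strings, any
# order), plus the AFTER lines that follow it. Exit status 1 if no line matches.
last_all_words() {
  local after=$1 file=$2
  shift 2
  awk -v after="$after" '
    BEGIN {
      # ARGV[1] is the file; ARGV[2] onward are the words. Copy the words
      # into w[] and delete them so awk does not open them as input files.
      n = 0
      for (i = 2; i < ARGC; i++) { w[++n] = ARGV[i]; delete ARGV[i] }
    }
    {
      line[NR] = $0                 # keep every line for the END block
      ok = 1
      for (i = 1; i <= n; i++)      # fixed-string test for each word
        if (index($0, w[i]) == 0) { ok = 0; break }
      if (ok) last = NR             # remember the latest full match
    }
    END {
      if (!last) exit 1             # no line contained all the words
      for (i = last; i <= last + after && i <= NR; i++) print line[i]
    }
  ' "$file" "$@"
}

# Example for the original question (path assumed):
# last_all_words 20 /var/log/messages usb 'address 5'
```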

Quote:

Originally Posted by danielbmartin (Post 4661776)

Yes, you are right. A regular expression to find 3 or 4 words could be something like this:

Code:

grep -e 'USB\|address\|mount' /var/log/messages

That would find lines containing 'USB' or 'address' or 'mount'.

If, for instance, one wants lines that contain 'address' after 'USB', or 'mounted' after 'EXT4', it could be written:

Code:

grep -e 'USB.*address\|EXT4.*mounted' /var/log/messages

More alternatives can be added with the alternation operator ('\|' in grep's basic regex syntax, '|' with egrep or grep -E).
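The '\|' operator in a basic grep regex is a GNU extension. Below is a sketch of the same alternation written portably, either as an extended regex (grep -E, i.e. egrep) or as one -e option per pattern; the sample lines and the helper name 'sample' are hypothetical. All three commands print the same two lines, and each is still an OR of the alternatives.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines.
sample() {
  printf '%s\n' \
    'kernel: usb 1-3: USB disconnect, address 5' \
    'kernel: EXT4-fs (sdb1): mounted filesystem' \
    'kernel: eth0: link up'
}

# GNU basic regex: \| is alternation (a GNU extension).
sample | grep -e 'USB.*address\|EXT4.*mounted'

# Portable equivalents: extended regex with |, or one -e per pattern.
sample | grep -E 'USB.*address|EXT4.*mounted'
sample | grep -e 'USB.*address' -e 'EXT4.*mounted'
```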

Quote:

Originally Posted by Diantre (Post 4661931)

Code:

grep -e 'USB\|address\|mount' /var/log/messages

That would find lines containing 'USB' or 'address' or 'mount'.

With respect, you have shown how to use grep with OR. OP wants to use grep with AND. He said

Quote:

like last occurrence of "address 5" and "usb", together in one line

Daniel B. Martin

Hi.

Here is an example of glark on chapter one of Moby Dick:

Code:

#!/usr/bin/env bash

# @(#) s1        Demonstrate multiple-pattern match, any order, glark.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C glark specimen

FILE=${1-data1}

# Use edges if specimen not available.
pl " Sample of input data file $FILE:"
specimen $FILE
# edges $FILE

pl " Results:"
glark --after-context=1 --no-highlight \( me --and swayed --and that \) $FILE

exit 0

producing:

Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
glark version 1.8.0
specimen (local) 1.17

-----
 Sample of input data file data1:
Edges: 5:0:5 of 206 lines in file "data1"
# Moby Dick, Chapter 1 The Loomings.  Page numbers removed.
# obtained from gopher at University of Minnesota, 94.09.16.

Call me Ishmael.  Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
  ---
flood-gates of the wonder-world swung open, and in the wild
conceits that swayed me to my purpose, two and two there floated
into my inmost soul, endless processions of the whale, and, mid
most of them all, one grand hooded phantom, like a snow hill in
the air.

-----
 Results:
  203 : conceits that swayed me to my purpose, two and two there floated
  204 + into my inmost soul, endless processions of the whale, and, mid

The leading line numbers can be omitted with an option. The same technique with tail could be used to get only the last matched segment ... cheers, makyo

Quote:

Originally Posted by Kashif_Bash (Post 4661351)

Please note: I want to get last occurrence of word "address 5" AND "usb"

Quote:

Originally Posted by danielbmartin (Post 4661948)

With respect, you have shown how to use grep with OR. OP wants to use grep with AND.

Thank you for being respectful. Also with respect, I believe I already mentioned how to match a line with two specific strings:

Code:

grep -e 'USB.*address 5' /var/log/messages

That would be something like an AND, "match 'USB' AND 'address 5'". I also mentioned the alternation, which is an OR as you point out:

Code:

grep -e 'USB.*address\|EXT4.*mounted' /var/log/messages

Match lines with "'USB' AND 'address' OR 'EXT4' AND 'mounted'".

But perhaps I'm completely misunderstanding the point of the OP. Would the OP be so kind as to further comment on the answers?

OK, sorry for any confusion I'm causing. Here is the exact scenario:

I have the following line in the log file:

Quote:

Apr 24 14:32:15 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3

After the above line, the log file contains the USB details I'm trying to get.

I only have two strings to search for:

Quote:

"usb 1-3" and "address 3"

Now I want to find the last line in the log file that contains both of these strings, and then read the next 16 lines (these lines contain the USB details such as size, serial number, manufacturer, etc.).

@danielbmartin: egrep didn't work.
I tried this:

Quote:

egrep -A 4 "usb 1-3" /var/log/messages |egrep -A 4 "address 3" |tail -8 > tempfile

and got this:

Quote:

Apr 18 20:55:24 box2 kernel: usb 4-1: New USB device found, idVendor=0411, idProduct=0105
Apr 18 20:55:24 box2 kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=5
--
Apr 24 14:32:15 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3
Apr 24 14:32:15 box2 kernel: usb 1-3: New USB device found, idVendor=0411, idProduct=0105
Apr 24 14:32:15 box2 kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Apr 24 14:32:15 box2 kernel: usb 1-3: Product: USB-SATA Bridge
Apr 24 14:32:15 box2 kernel: usb 1-3: Manufacturer: BUFFALO

I hope I made it clear. If not, please let me know. Thanks for the efforts, everyone.

Try this:

Code:

egrep -A 16 'usb 1-3.*address 3' logfile

That should print every line containing 'usb 1-3' followed, after any number of characters, by 'address 3'. The '-A 16' option also prints the 16 lines after each match.
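'egrep -A 16' prints a block for every match in the file. To keep only the last one, one option (a sketch, not from the thread) is to take the line number of the last match with 'grep -n' and then print that line and the following ones with sed. Unlike piping 'grep -A' into 'tail', this cannot pick up lines from an earlier block when the last match has fewer than 16 lines after it. The helper name 'last_block' is hypothetical.

```shell
#!/usr/bin/env bash
# last_block PATTERN AFTER FILE   (hypothetical helper)
# Print only the LAST line matching PATTERN (an ERE) in FILE, plus the
# AFTER lines following it. Returns 1 if nothing matches.
last_block() {
  local pattern=$1 after=$2 file=$3 n
  # Line number of the last matching line (empty if there is none).
  n=$(grep -n -E -- "$pattern" "$file" | tail -n 1 | cut -d: -f1)
  [ -n "$n" ] || return 1
  # Print lines n through n+after (sed stops quietly at end of file).
  sed -n "${n},$((n + after))p" "$file"
}

# Example with the pattern from this thread (log path assumed):
# last_block 'usb 1-3.*address 3' 16 /var/log/messages
```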

I just added a post to your other related thread.

http://www.linuxquestions.org/questi...ic-word-940850

Questions concerning a single topic should really be kept to one thread, to keep down the amount of duplicated effort.