LinuxQuestions.org

Kashif_Bash 04-24-2012 01:17 AM

get two/more specific words on a line and print next few lines
 
Currently I search for the last occurrence of one phrase in a file, using the following:

Code:

grep -A 20 -e 'address 5' input | tail -n 21 > output

This works great for finding the last occurrence of one phrase and printing that line plus the next 20 lines. But what if I want to search for 2 or 3 words, like the last occurrence of "address 5" and "usb" together in one line? For example, as in the following line:

Code:

Apr 20 15:58:47 box2 kernel: usb 1-3: USB disconnect, address 5
Now I want to find the line that contains both "address 5" and "usb", and it should be the last line in the file where these words occur together. Or is there a generic way that also works for 3 words, 4 words?

Please note: I want to get the last occurrence of "address 5" AND "usb" in the file, plus the next 20 lines. Thanks in advance.

Diantre 04-24-2012 03:05 AM

Use a regular expression in the pattern for grep. For instance:

Code:

grep -e 'USB.*address 5' /var/log/messages
That will match lines that contain 'USB', followed by any number of characters, followed by 'address 5'.
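
To tie this back to the original goal (the last such line plus the lines after it), the same pattern can be dropped into the grep/tail pipeline from the first post. A minimal sketch, assuming GNU grep; the sample log lines and the context count of 1 are made up for illustration:

```shell
# Hypothetical sample log, created only for this illustration.
log=$(mktemp)
printf '%s\n' \
  'Apr 20 15:00:00 box2 kernel: usb 1-3: USB disconnect, address 5' \
  'Apr 20 15:00:01 box2 kernel: first detail line' \
  'Apr 20 15:30:00 box2 kernel: eth0: link up' \
  'Apr 20 15:58:47 box2 kernel: usb 1-3: USB disconnect, address 5' \
  'Apr 20 15:58:48 box2 kernel: second detail line' > "$log"

# -A 1 prints one context line after each match; tail -n 2 then keeps
# only the last match and its context line (context count + 1).
last_block=$(grep -A 1 -e 'USB.*address 5' "$log" | tail -n 2)
printf '%s\n' "$last_block"
rm -f "$log"
```

The tail count has to be the -A count plus one. If fewer lines than that follow the last match, tail will also pick up lines that belong to an earlier match.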

danielbmartin 04-24-2012 10:07 AM

Quote:

Originally Posted by Diantre (Post 4661431)
Use a regular expression in the pattern for grep. For instance:

Code:

grep -e 'USB.*address 5' /var/log/messages
That will match lines that contain 'USB', followed by any number of characters, followed by 'address 5'.

Nitpick: this grep finds the line or lines where "address 5" follows "USB." OP asked for a way to find lines which contain both strings. Your RegEx fits his example but his words were more general. Perhaps a more elaborate RegEx will serve the purpose. However, if OP wants to identify lines which contain 3 or 4 strings without specifying their sequential relationship the complexity is compounded.

Daniel B. Martin
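
For lines that contain every string regardless of order, one option is to chain plain greps without any context options, since each stage only passes on lines that already matched all earlier stages. A small sketch; the sample lines are invented for illustration:

```shell
# Hypothetical sample log, created only for this illustration.
log=$(mktemp)
printf '%s\n' \
  'kernel: usb 1-3: USB disconnect, address 5' \
  'kernel: address 5 now assigned to usb 2-1' \
  'kernel: usb 4-1: New USB device found' \
  'kernel: eth0: address 5 ignored' > "$log"

# Each grep filters the output of the previous one, so a line survives
# only if it contains both strings, in either order.
both=$(grep -e 'usb' "$log" | grep -e 'address 5')
printf '%s\n' "$both"
rm -f "$log"
```

More strings can be added by appending more grep stages. This only selects the matching lines themselves; it does not print the lines that follow them in the file.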

makyo 04-24-2012 10:22 AM

Hi.

I have used glark for this kind of work:
Code:

NAME
      glark - Search text files for complex regular expressions

SYNOPSIS
      glark [options] expression file ...

DESCRIPTION
      Similar to "grep", "glark" offers: Perl-compatible regular expressions,
      color highlighting of matches, context around matches, complex
      expressions ("and" and "or"), grep output emulation, and automatic
      exclusion of non-text files. Its regular expressions should be familiar
      to persons experienced in Perl, Python, or Ruby. File may also be a
      list of files in the form of a path.
( ... )

It was in the Debian repositories. The web site is http://www.incava.org/projects/glark

The code is a Ruby script, and it is fairly slow. Of the 20+ utilities and languages that I compared for string searches, it was near the lower end for speed, e.g. 10 times slower than sed. The flexibility, however, is amazing.

Best wishes ... cheers, makyo

danielbmartin 04-24-2012 10:39 AM

Quote:

Originally Posted by Kashif_Bash (Post 4661351)
... can be any generic way, like 3 words ,4 words???

This is not elegant but it works. For testing I used the Edgar Allan Poe poem "The Raven". This code seeks the last line containing "chamber" and "visitor" and "entrance" in any order, then prints that line and the following 4 lines.

Code:

egrep -A 4 "chamber" "$InFile" \
| egrep -A 4 "visitor"  \
| egrep -A 4 "entrance" \
| tail -n 5             \
> "$OutFile3"

Daniel B. Martin
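
A caveat on chaining -A: every later egrep also sees the context lines that the earlier one let through, so a word can be matched on a context line instead of on the line that holds the other words. A sketch that avoids this by testing all words against the same line, assuming a POSIX awk and sed; the sample text is a made-up stand-in, not the actual poem, and 2 following lines are printed instead of 4 to keep it short:

```shell
# Hypothetical stand-in text, created only for this illustration.
poem=$(mktemp)
printf '%s\n' \
  'a visitor at my chamber entrance' \
  'only this' \
  'entrance to the chamber, said the visitor' \
  'line after 1' \
  'line after 2' \
  'chamber only' > "$poem"

# Pass 1: remember the number of the last line containing all three words.
start=$(awk '/chamber/ && /visitor/ && /entrance/ { n = NR } END { print n + 0 }' "$poem")

# Pass 2: print that line and the 2 lines after it (nothing if no match).
result=''
if [ "$start" -gt 0 ]; then
  result=$(sed -n "${start},$((start + 2))p" "$poem")
fi
printf '%s\n' "$result"
rm -f "$poem"
```

Because awk's && applies to a single input line, the order of the words on the line does not matter, and adding a fourth word is just one more condition.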

Diantre 04-24-2012 12:48 PM

Quote:

Originally Posted by danielbmartin (Post 4661776)
Nitpick: this grep finds the line or lines where "address 5" follows "USB." OP asked for a way to find lines which contain both strings. Your RegEx fits his example but his words were more general. Perhaps a more elaborate RegEx will serve the purpose. However, if OP wants to identify lines which contain 3 or 4 strings without specifying their sequential relationship the complexity is compounded.

Yes, you are right. A regular expression to find 3 or 4 words could be something like this:

Code:

grep -e 'USB\|address\|mount' /var/log/messages
That would find lines containing 'USB' or 'address' or 'mount'.

If, for instance, one wants to find lines that contain 'address' following 'USB' or 'mounted' following 'EXT4', it could be written:

Code:

grep -e 'USB.*address\|EXT4.*mounted' /var/log/messages
More regular expressions can be added using the alternation operator, written \| in GNU grep's basic regular expressions as above (or a bare | with grep -E).
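
Alternation can also express "both strings, in either order" by listing each ordering as one alternative. A sketch using grep -E, where | needs no backslash; the sample lines are invented for illustration:

```shell
# Hypothetical sample log, created only for this illustration.
log=$(mktemp)
printf '%s\n' \
  'kernel: usb 1-3: USB disconnect, address 5' \
  'kernel: address 5 released by usb 1-3' \
  'kernel: usb 2-1: new device' \
  'kernel: address 5 is free' > "$log"

# First alternative: usb before address 5. Second: address 5 before usb.
either_order=$(grep -E 'usb.*address 5|address 5.*usb' "$log")
printf '%s\n' "$either_order"
rm -f "$log"
```

This stays a single grep, so -A still works on it, but the number of alternatives grows quickly: two strings need 2 orderings, three strings need 6.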

danielbmartin 04-24-2012 01:01 PM

Quote:

Originally Posted by Diantre (Post 4661931)
Code:

grep -e 'USB\|address\|mount' /var/log/messages
That would find lines containing 'USB' or 'address' or 'mount'.

With respect, you have shown how to use grep with OR. OP wants to use grep with AND. He said
Quote:

like last occurrence of "address 5" and "usb", together in one line
Daniel B. Martin

makyo 04-24-2012 01:29 PM

Hi.

Here is an example of glark on chapter one of Moby Dick:
Code:

#!/usr/bin/env bash

# @(#) s1        Demonstrate multiple-pattern match, any order, glark.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C glark specimen

FILE=${1-data1}

# Use edges if specimen not available.
pl " Sample of input data file $FILE:"
specimen $FILE
# edges $FILE

pl " Results:"
glark --after-context=1 --no-highlight \( me --and swayed --and that \) $FILE

exit 0

producing:
Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
glark version 1.8.0
specimen (local) 1.17

-----
 Sample of input data file data1:
Edges: 5:0:5 of 206 lines in file "data1"
# Moby Dick, Chapter 1 The Loomings.  Page numbers removed.
# obtained from gopher at University of Minnesota, 94.09.16.

Call me Ishmael.  Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
  ---
flood-gates of the wonder-world swung open, and in the wild
conceits that swayed me to my purpose, two and two there floated
into my inmost soul, endless processions of the whale, and, mid
most of them all, one grand hooded phantom, like a snow hill in
the air.

-----
 Results:
  203 : conceits that swayed me to my purpose, two and two there floated
  204 + into my inmost soul, endless processions of the whale, and, mid

Leading line identification can be omitted with an option. The same technique with tail could be used to get the last segment matched ... cheers, makyo

Diantre 04-24-2012 01:34 PM

Quote:

Originally Posted by Kashif_Bash (Post 4661351)
Please note: I want to get last occurrence of word "address 5" AND "usb"

Quote:

Originally Posted by danielbmartin (Post 4661948)
With respect, you have shown how to use grep with OR. OP wants to use grep with AND.

Thank you for being respectful. Also with respect, I believe I already mentioned how to match a line with two specific strings:

Code:

grep -e 'USB.*address 5' /var/log/messages
That would be something like an AND, "match 'USB' AND 'address 5'". I also mentioned the alternation, which is an OR as you point out:

Code:

grep -e 'USB.*address\|EXT4.*mounted' /var/log/messages
Match lines with "'USB' AND 'address' OR 'EXT4' AND 'mounted'".

But perhaps I'm completely misunderstanding the point of the OP. Would the OP be so kind as to further comment on the answers?

Kashif_Bash 04-24-2012 05:31 PM

OK, sorry for any confusion I'm causing. Here is the exact scenario:

I have the following line in the log file:

Quote:

Apr 24 14:32:15 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3
In the log file, the lines after the one above contain the USB details that I'm trying to get.

I only have these two strings to search with:

Quote:

"usb 1-3" and "address 3"
Now I want to find the last line in the log file where these strings occur together, and then read the next 16 lines (as these lines contain the USB details such as size, serial number, manufacturer, etc.).

@danielbmartin: egrep didn't work.
I tried this:

Quote:

egrep -A 4 "usb 1-3" /var/log/messages |egrep -A 4 "address 3" |tail -8 > tempfile
and got this:
Quote:

Apr 18 20:55:24 box2 kernel: usb 4-1: New USB device found, idVendor=0411, idProduct=0105
Apr 18 20:55:24 box2 kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=5
--
Apr 24 14:32:15 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3
Apr 24 14:32:15 box2 kernel: usb 1-3: New USB device found, idVendor=0411, idProduct=0105
Apr 24 14:32:15 box2 kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Apr 24 14:32:15 box2 kernel: usb 1-3: Product: USB-SATA Bridge
Apr 24 14:32:15 box2 kernel: usb 1-3: Manufacturer: BUFFALO

I hope that makes it clear. If not, do let me know. Thanks to everyone for your efforts.

Diantre 04-24-2012 05:56 PM

Try this:

Code:

egrep -A 16 'usb 1-3.*address 3' logfile
That should give you all lines containing 'usb 1-3', then any number of characters, then 'address 3'. The '-A 16' option displays the next 16 lines after each match.
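
Note that the command above prints every match, not only the last one. One way to keep just the last match and what follows it is to reverse the file, stop at the first hit, and reverse back. A sketch assuming GNU coreutils tac and GNU grep's -m option; the first two sample lines are invented, the rest follow the log excerpt earlier in the thread:

```shell
# Hypothetical sample log, created only for this illustration.
log=$(mktemp)
printf '%s\n' \
  'Apr 18 20:55:24 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3' \
  'Apr 18 20:55:24 box2 kernel: usb 1-3: Product: older device' \
  'Apr 24 14:32:15 box2 kernel: usb 1-3: new high speed USB device using ehci_hcd and address 3' \
  'Apr 24 14:32:15 box2 kernel: usb 1-3: Product: USB-SATA Bridge' \
  'Apr 24 14:32:15 box2 kernel: usb 1-3: Manufacturer: BUFFALO' > "$log"

# tac reverses the line order, so the last match in the file comes first.
# -m 1 stops at that match; -B 16 keeps up to 16 lines "before" it, which
# are the lines after it in the original order. The final tac restores order.
detail=$(tac "$log" | grep -m 1 -B 16 -e 'usb 1-3.*address 3' | tac)
printf '%s\n' "$detail"
rm -f "$log"
```

Unlike piping through tail, this does not pull in lines from an earlier match when fewer than 16 lines follow the last one.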

David the H. 04-26-2012 12:15 AM

I just added a post to your other related thread.

http://www.linuxquestions.org/questi...ic-word-940850

Questions concerning a single topic should really be kept to one thread, to keep down the amount of duplicated effort.


All times are GMT -5.