[SOLVED] Search different strings in a single (same) line

shivaa · 11-01-2012, 09:17 AM

Suppose I have a large text file with content like this:

Quote:

This is for testing Linux, as it is better then any other operating system.
This is for testing Windows, as it is better then any other operating system.
This is for testing Solaris, as it is better then any other operating system.
This is for testing Mac, as it is better then any other operating system.
This is for testing Linux, as it is better then any other operating system.
.... And so on!

I want to filter and display only those lines that contain all three strings (i.e. words) "Solaris", "better" and "system". How can I achieve this?

One more thing: egrep '(string1|string2)' or grep -E '(string1|string2)' displays every line that contains either string1 or string2 (or both), but I want only the lines that contain both strings. This can be achieved by running grep once for each string, but I don't want to do that. Is there an alternative?

sag47 · 11-01-2012, 09:50 AM

By default grep uses Basic Regular Expressions (BRE). This means certain regular expression characters lose their special meaning, so you'll have to use the backslash escape character to give them their special meaning again...

Code:

grep 'Solaris\|better\|system' somefile

That matches lines containing any one of the words. To match only the lines containing all of the words you'll need multiple grep statements, because each subsequent grep filters out the lines that don't match its word.

Code:

grep 'Solaris' somefile | grep 'better' | grep 'system'
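The difference between the two commands above is easier to see on a tiny file. This is only a sketch: the temp directory and the name sample.txt are made up for the demo, and the OR case uses the -E spelling so it does not depend on the backslashed form.

```shell
# Build a small sample file (hypothetical name, lives in a temp directory).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
This is for testing Linux, as it is better then any other operating system.
This is for testing Windows, as it is better then any other operating system.
This is for testing Solaris, as it is better then any other operating system.
This is for testing Solaris, nothing else.
EOF

# OR: count the lines containing at least one of the words.
or_count=$(grep -c -E 'Solaris|better|system' "$sample")

# AND: each grep drops the lines that lack its word.
and_lines=$(grep 'Solaris' "$sample" | grep 'better' | grep 'system')

echo "OR matched $or_count lines"
echo "AND matched: $and_lines"

rm -rf "$tmpdir"
```

Every line in the sample contains at least one of the words, so the OR form counts all four, while the pipeline keeps only the line that has all three.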

More on regex and grep
From the grep (GNU grep) 2.10 man page...

Quote:

grep understands three different versions of regular expression syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE). In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful.

Quote:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Code:

   Matcher Selection
       -E, --extended-regexp
              Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

       -F, --fixed-strings
              Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.  (-F is
              specified by POSIX.)

       -G, --basic-regexp
              Interpret PATTERN as a basic regular expression (BRE, see below).  This is the default.

       -P, --perl-regexp
              Interpret PATTERN as a Perl regular expression (PCRE, see below).  This is highly experimental and grep  -P
              may warn of unimplemented features.

Read your local grep man page to see which options are available to you. Not all greps are equal.
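A quick way to see the basic versus extended difference described in the man page is to run the same alternation three ways. This is a rough sketch that assumes GNU grep (the backslashed \| in basic syntax is a GNU extension); the three-word input is invented for the demo.

```shell
text='alpha
beta
gamma'

# BRE (the default): a bare | is a literal character, so nothing matches.
# "|| true" only hides grep's non-zero exit status when there is no match.
bre_plain=$(printf '%s\n' "$text" | grep -c 'alpha|gamma' || true)

# BRE with the backslashed form: alternation.
bre_escaped=$(printf '%s\n' "$text" | grep -c 'alpha\|gamma')

# ERE (-E): a bare | is alternation.
ere=$(printf '%s\n' "$text" | grep -c -E 'alpha|gamma')

echo "BRE plain: $bre_plain, BRE escaped: $bre_escaped, ERE: $ere"
```

With GNU grep the first count is 0 and the other two are 2.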

shivaa · 11-01-2012, 10:09 AM

Thanks @sag47. But the actual situation is a little different. The file contains lakhs of entries with different patterns. So is using grep 3 or 4 or more times the only option to get the desired output? Or is there a better alternative?

sag47 · 11-01-2012, 11:04 AM

Quote:

Originally Posted by shivaa

Thanks @sag47. But the actual situation is a little different. The file contains lakhs of entries with different patterns. So is using grep 3 or 4 or more times the only option to get the desired output? Or is there a better alternative?

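grep operates on a stream, unlike programs such as sort (which must read all of its input before it can produce any output). The greps in a pipeline run at the same time, the file is read from disk only once, and each later grep sees only the lines that passed the previous one, so chaining several greps adds little overhead compared with running just one.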

That's why grep can work with "tail -f" for example.
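To illustrate the streaming point, here is a small sketch where a slow producer stands in for "tail -f". The produce function and the temp file are hypothetical, and --line-buffered is assumed to be available (it is a GNU grep option). It is needed because grep buffers its output in blocks when writing to a pipe, so without it the matches from a slow stream would only show up once a buffer fills or the input ends.

```shell
# A producer that emits one line per second (stands in for tail -f).
produce() {
  for os in Linux Solaris Mac; do
    echo "testing $os, it is better than any other system"
    sleep 1
  done
}

out=$(mktemp)

# Each grep passes a matching line on as soon as it arrives;
# tee shows it on screen and keeps a copy in $out.
produce \
  | grep --line-buffered 'Solaris' \
  | grep --line-buffered 'better' \
  | grep --line-buffered 'system' \
  | tee "$out"
```

The Solaris line should appear about a second in, before the producer has finished, rather than at the end.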

shivaa · 12-07-2012, 11:47 AM

Solved with:

Code:

awk '/Solaris/ && /better/ && /system/ {print $0}' filename
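For anyone reusing the awk one-liner above: printing the line is awk's default action, so the {print $0} part can be left out. If the search words might contain regex characters, index() can be used to test for plain substrings instead. A minimal sketch, with an invented sample file and hypothetical variable names a, b and c:

```shell
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
This is for testing Linux, as it is better then any other operating system.
This is for testing Solaris, as it is better then any other operating system.
This is for testing Solaris, nothing else.
EOF

# Regex form: a line is printed only if all three patterns match it.
regex_hits=$(awk '/Solaris/ && /better/ && /system/' "$tmpfile")

# Fixed-string form: index() returns 0 when the substring is absent.
fixed_hits=$(awk -v a='Solaris' -v b='better' -v c='system' \
  'index($0, a) && index($0, b) && index($0, c)' "$tmpfile")

echo "$regex_hits"
echo "$fixed_hits"

rm -f "$tmpfile"
```

Both forms read the file once in a single process and print only the second line of the sample.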