LinuxQuestions.org


newbie0101 01-12-2012 04:56 AM

bash print only found words
 
Hello, I have a file that contains a lot of strings.
My goal is to print only the words I need.
For example, I am searching for the word fragments ab, cd and ef.
Those words might be, for example, able, abnormal... whatever.
When I use:
Code:

egrep '(ab|cd|ef)' file.txt
it prints the whole lines that contain them, but I want only the words printed, not the lines that contain ab, cd or ef.

How can I do that?
Thanks

druuna 01-12-2012 05:24 AM

Hi,

Using the -o option prints only the matching part. But...

In your example, using the -o option would only print the ab (cd or ef) part and not the complete word. You need to expand the regular expression to do that. Have a look at this:
Code:

egrep -o '\<[a-z]*(ab|cd|ef)[a-z]*\>' infile
- the \< and \> are word boundaries; they make sure only complete words are matched,
- the [a-z]* parts match any other letters before and after the fragment, so the rest of the word is printed too.
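To see the difference the expansion makes, here is a minimal sketch that runs the bare alternation and the expanded pattern side by side. It assumes GNU grep (where egrep is equivalent to grep -E, and \< and \> are supported); only_fragments and only_words are hypothetical helper names made up for this illustration, and both read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: -o with the bare alternation prints only the
# matched fragment (ab, cd or ef), not the word around it.
only_fragments() {
    grep -oE '(ab|cd|ef)'
}

# Hypothetical helper: the expanded pattern with word boundaries
# prints every complete lowercase word that contains a fragment.
only_words() {
    grep -oE '\<[a-z]*(ab|cd|ef)[a-z]*\>'
}

printf 'this is not abnormal\ncoral reef\n' | only_fragments   # prints: ab, then ef
printf 'this is not abnormal\ncoral reef\n' | only_words       # prints: abnormal, then reef
```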

Example:
Code:

$ cat infile
this is not abnormal
coral reef
acda and whatever
an abnormal reef

$ egrep -o '\<[a-z]*(ab|cd|ef)[a-z]*\>' infile
abnormal
reef
acda
abnormal
reef

Hope this helps.

newbie0101 01-12-2012 06:09 AM

Thank you, it works! I just added [a-z,0-9] because I also needed numbers.

druuna 01-12-2012 07:46 AM

Hi,
Quote:

Originally Posted by newbie0101 (Post 4572689)
Thank you, it works! I just added [a-z,0-9] because I also needed numbers.

[a-z,0-9] should be [a-z0-9], unless you need to include the comma in the search pattern: inside the brackets the comma is a literal character, not a separator.
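A quick way to see why the comma matters: because it is literal inside the brackets, [a-z,0-9] lets a "word" run across commas. A minimal sketch, assuming GNU grep; with_comma and without_comma are hypothetical helper names and the input line is made up for illustration.

```shell
#!/bin/sh
# The comma inside the brackets is a literal character, so
# "abc,123" is treated as one single word.
with_comma() {
    grep -oE '\<[a-z,0-9]*(ab|cd|ef)[a-z,0-9]*\>'
}

# Without the comma, the comma separates the tokens again.
without_comma() {
    grep -oE '\<[a-z0-9]*(ab|cd|ef)[a-z0-9]*\>'
}

echo 'abc,123 ef2' | with_comma      # prints: abc,123 then ef2
echo 'abc,123 ef2' | without_comma   # prints: abc then ef2
```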

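You might want to use [[:alnum:]] instead. In the C locale this is the same as [0-9A-Za-z], so unlike [a-z0-9] it also matches uppercase letters; in other locales it can match accented letters as well.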
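A minimal sketch of the difference, assuming GNU grep. LC_ALL=C is set on each grep call so that [[:alnum:]] means exactly [0-9A-Za-z]; the helper names lower_words and alnum_words and the sample line are hypothetical. The fragments themselves (ab, cd, ef) are still lowercase, so only the letters around them are affected.

```shell
#!/bin/sh
# Lowercase letters and digits only: "Table" and "Reef" are skipped,
# because the word boundary \< cannot start after an uppercase letter.
lower_words() {
    LC_ALL=C grep -oE '\<[a-z0-9]*(ab|cd|ef)[a-z0-9]*\>'
}

# [[:alnum:]] also accepts uppercase letters around the fragment.
alnum_words() {
    LC_ALL=C grep -oE '\<[[:alnum:]]*(ab|cd|ef)[[:alnum:]]*\>'
}

echo 'Table 42abc x9cd9 Reef' | lower_words   # prints: 42abc, x9cd9
echo 'Table 42abc x9cd9 Reef' | alnum_words   # prints: Table, 42abc, x9cd9, Reef
```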

BTW: You're welcome :)

newbie0101 01-13-2012 02:15 AM

One more question :)
If the file contains, for example, the word "able" several times on the same or different lines, how can I print it only once?
Right now I am getting some words several times.

sycamorex 01-13-2012 03:09 AM

Quote:

Originally Posted by newbie0101 (Post 4573407)
One more question :)
If the file contains, for example, the word "able" several times on the same or different lines, how can I print it only once?
Right now I am getting some words several times.

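You could pipe it to 'sort -u'. Plain 'uniq' only removes duplicate lines that are next to each other, so the output has to be sorted first ('sort | uniq' does the same as 'sort -u').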
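A minimal sketch of the de-duplication, using the sample input from earlier in the thread (recreated with printf) and assuming GNU grep. It shows that uniq on its own only collapses adjacent duplicate lines, while sorting first prints each word once; the awk line is a common alternative that keeps the first-seen order instead of sorting. sample_input and find_words are hypothetical helper names.

```shell
#!/bin/sh
# The example input from earlier in the thread.
sample_input() {
    printf 'this is not abnormal\ncoral reef\nacda and whatever\nan abnormal reef\n'
}

# Print every matching word, one per line (duplicates included).
find_words() {
    grep -oE '\<[a-z]*(ab|cd|ef)[a-z]*\>'
}

# uniq alone: the repeated "abnormal" and "reef" are not adjacent, so all 5 lines stay.
sample_input | find_words | uniq

# Sort first so duplicates become adjacent, then drop them: abnormal, acda, reef.
sample_input | find_words | sort -u

# Keep only the first occurrence of each word, in input order: abnormal, reef, acda.
sample_input | find_words | awk '!seen[$0]++'
```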


All times are GMT -5.