Old 08-30-2012, 11:19 AM   #1
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Rep: Reputation: Disabled
how to use grep to search for words/sentences starting with hyphen?


Hello,

I am trying to use grep to select lines by searching for words that start with a number. grep runs into a problem when the number is negative and so has a minus sign (hyphen) in front of it.

The data file looks like this:

Code:
 -2 LEU  OMEGA  -180.0 +/-    0.0  1.000
 -2 LEU  PHI     155.0 +/-   69.6  0.420
 -2 LEU  CHI1   -105.9 +/-   49.4  0.772
 -2 LEU  CHI2    128.3 +/-   30.4  0.872
 -2 LEU  CHI31   -55.3 +/-   89.4  0.243
 -2 LEU  CHI32   -31.6 +/-   86.0  0.333
 -2 LEU  PSI      81.2 +/-  101.1  0.122
 -1 GLY  OMEGA  -180.0 +/-    0.0  1.000
 -1 GLY  PHI     159.6 +/-   65.0  0.581
 -1 GLY  PSI     144.5 +/-   79.4  0.316
  0 SER  OMEGA  -180.0 +/-    0.0  1.000
  0 SER  PHI     -77.0 +/-    3.1  0.999
  0 SER  CHI1   -130.8 +/-  101.3  0.204
  0 SER  CHI2    156.9 +/-   83.8  0.295
  0 SER  PSI     -32.2 +/-    4.7  0.997
  1 MET  OMEGA  -180.0 +/-    0.0  1.000
  1 MET  PHI      46.2 +/-   87.4  0.316
  1 MET  CHI1   -121.0 +/-   92.3  0.438
  1 MET  CHI2    178.0 +/-   59.5  0.558
  1 MET  CHI3    150.4 +/-   76.4  0.373
The following command works:
Code:
[aswani@maruthi]$ grep -E "\<1 MET" ./1ZU2.angle
  1 MET  OMEGA  -180.0 +/-    0.0  1.000
  1 MET  PHI      46.2 +/-   87.4  0.316
  1 MET  CHI1   -121.0 +/-   92.3  0.438
  1 MET  CHI2    178.0 +/-   59.5  0.558
  1 MET  CHI3    150.4 +/-   76.4  0.373
However the following command doesn't work!
Code:
[aswani@maruthi]$ grep -E "\<-1 GLY" ./1ZU2.angle
[aswani@maruthi]$
This command doesn't work either!
Code:
[aswani@maruthi]$ grep -E "-1 GLY" ./1ZU2.angle
grep: invalid option -- ' '
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
However the following commands work!
Code:
[aswani@maruthi]$ grep -E " -1 GLY" ./1ZU2.angle
 -1 GLY  OMEGA  -180.0 +/-    0.0  1.000
 -1 GLY  PHI     159.6 +/-   65.0  0.581
 -1 GLY  PSI     144.5 +/-   79.4  0.316
[aswani@maruthi]$ grep -E -- "-1 GLY" ./1ZU2.angle
 -1 GLY  OMEGA  -180.0 +/-    0.0  1.000
 -1 GLY  PHI     159.6 +/-   65.0  0.581
 -1 GLY  PSI     144.5 +/-   79.4  0.316
[aswani@maruthi]$

I am wondering how I could use grep to look for words which start with a hyphen!


I am unable to figure out the right expression so that I can incorporate it in my shell script!


The anchor " \< " doesn't seem to work when the first character is a hyphen in the expression!


Any suggestions will be of great help to me.

regards,
Aswani
 
Old 08-30-2012, 11:54 AM   #2
raskin
Senior Member
 
Registered: Sep 2005
Location: Russia
Distribution: NixOS (http://nixos.org)
Posts: 1,893

Rep: Reputation: 68
Problem: grep sees the leading "-" and tries to parse the pattern as options.

You can try one of two solutions:

-- tells grep that the options are over. So,
Code:
grep -E -- "-1 GLY" ./1ZU2.angle
Second option:
Code:
grep -E "[-]1 GLY" ./1ZU2.angle
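
A minimal sketch of both suggestions side by side. The temporary file and the "11 GLY" line are made up for illustration; only the "-1 GLY" and "1 MET" lines come from the data posted above.

```shell
# Hypothetical sample in the same layout as 1ZU2.angle
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
 -1 GLY  PHI     159.6 +/-   65.0  0.581
  1 MET  PHI      46.2 +/-   87.4  0.316
 11 GLY  PHI     -60.0 +/-   10.0  0.900
EOF

# "--" ends option parsing, so "-1 GLY" is taken as the pattern
with_dashes=$(grep -E -- "-1 GLY" "$tmp")

# A bracket expression keeps the pattern from starting with a hyphen
with_bracket=$(grep -E "[-]1 GLY" "$tmp")

echo "$with_dashes"
echo "$with_bracket"
rm -f "$tmp"
```

Both commands should print the same single "-1 GLY" line. The "11 GLY" line is not matched because the pattern requires a hyphen directly before the 1.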
 
Old 08-30-2012, 12:22 PM   #3
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by raskin View Post
Problem: grep sees "-", tries to parse arguments.

You can try one of two solutions:

-- tells grep that arguments are over. So,
Code:
grep -E -- "-1 GLY" ./1ZU2.angle
Second option:
Code:
grep -E "[-]1 GLY" ./1ZU2.angle
Thanks for the reply.

But the problem is that I need to use the anchoring character "\<" so as to avoid hits which are not true. For example "1 GLY" also matches "11 GLY". In order to overcome that I would use "\<1 GLY". However I am not able to use these anchoring characters "\<" in the above two ways that you suggested, i.e. when the hyphen is present in the search expression.
 
Old 08-30-2012, 12:39 PM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
Your examples do not exactly match what you asked for. Yes, searching for a single 1 without a boundary can also match multiple 1's, as in 11. However, once the minus sign is in the pattern, that ambiguity goes away: the pattern requires the hyphen directly before the digit and a space directly after it, so:
Code:
grep -E -- "-1 GLY" ./1ZU2.angle
This will only ever match -1 and never -11.
 
Old 08-30-2012, 05:53 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
Another way to do it is to use the -e option. It also allows you to use multiple patterns in one command.

Code:
grep -E -e "-1 GLY" ./1ZU2.angle
As for the second problem, both <space> and "-" are non-word characters, so there is no word boundary there to match. The characters considered word characters are "a-zA-Z0-9_" (exact sequence depends on locale). You need a transition between word and non-word characters in order to use "\<", "\>" and "\b".

It appears to be unnecessary to use them anyway, as grail pointed out. You appear to have an exact, unique pattern to match. Unless, of course, there can be multiple "-1 GLY" entries on the line, in which case you'll have to extend the pattern to something that targets the one you want uniquely.

BTW, consider using the -F option when you have a fixed string to match, and don't need a regex.
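
A short sketch of the three points above, assuming GNU grep (where "\<" is available) and a temporary sample file made up of three lines from the posted data.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
 -1 GLY  PHI     159.6 +/-   65.0  0.581
  0 SER  PHI     -77.0 +/-    3.1  0.999
  1 MET  PHI      46.2 +/-   87.4  0.316
EOF

# Each -e introduces one pattern; a line is printed if any pattern matches
multi=$(grep -E -e "-1 GLY" -e "\<1 MET" "$sample")

# -F treats the pattern as a literal string; -e still protects the leading hyphen
fixed=$(grep -F -e "-1 GLY" "$sample")

# "\<" must be followed by a word character, so "\<-" can never match.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
boundary=$(grep -c -E "\<-1 GLY" "$sample" || true)

echo "$multi"
echo "$fixed"
echo "lines matching \\<-1 GLY: $boundary"
rm -f "$sample"
```

The first command selects the "-1 GLY" and "1 MET" lines in one pass, which is one way to use several patterns at once.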
 
1 members found this post helpful.
Old 08-31-2012, 12:44 AM   #6
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail View Post
Your examples are not exactly accurate to what you asked for. Yes searching for a single 1 without boundary can match multiple 1's like 11, however, by placing the minus in to
allow searching for a negative number negates the possibility of multiples as you indicate there is a space and then characters, so:
Code:
grep -E -- "-1 GLY" ./1ZU2.angle
This will only ever match -1 and never -11.
Thanks for the reply. Yes, I agree that my examples don't exactly match what I asked for. That is because I encounter both kinds of situations, i.e. looking for words starting with negative numbers as well as words starting with positive numbers. (I have not pasted the entire data file as it is quite large.) If I do not put "\<" in the expression, I get wrong search results for words starting with positive numbers (for example, "1 MET" also matches "11 MET"). If I do put "\<" in the search expression, I get no results for search words that start with negative numbers.

It looks like I need to use "\<" in a way that works with words starting with a hyphen. Once I figure out the right expression I will then put it in a larger shell script.

Thanks and regards,
Aswani
 
Old 08-31-2012, 01:06 AM   #7
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,737

Rep: Reputation: 1265
what about:
grep -E -- '-?\<[0-9] GLY' file
 
Old 08-31-2012, 01:37 AM   #8
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by David the H. View Post
Another way to do it is to use the -e option. It also allows you to use multiple patterns in one command.

Code:
grep -E -e "-1 GLY" ./1ZU2.angle
As for the second problem, Both <space> and "-" are non-word characters, so there is no word boundary there to match. The characters considered word characters are "a-zA-Z0-9_" (exact sequence depends on locale). You need a transition between word and non-word characters in order to use "\<", "\>" and "\b".

It appears to be unnecessary to use them anyway, as grail pointed out. You appear to have an exact, unique pattern to match. Unless, of course, there can be multiple "-1 GLY" entries on the line, in which case you'll have to extend the pattern to something that targets the one you want uniquely.

BTW, consider using the -F option when you have a fixed string to match, and don't need a regex.
Hi David,
Thanks for the reply.

As I explained in my reply to Grail, the search strings are not unique, and they are sometimes a substring of a larger string. So I need to use "\<" in the search expression.

According to you, we cannot use "\<" with non-word characters like "-". Is there a way to overcome this? I am actually reading the search strings from another file, which looks like this.
Code:
[aswani@maruthi]$ head -n 10 1ZU2.seq
  -4 GLY
  -3 PRO
  -2 LEU
  -1 GLY
   0 SER
   1 MET
   2 ASP
   3 THR
   4 GLU
   5 THR
I read line by line from this file and use it to search for the required lines from the data file. So I am looking for an expression which will work for both kinds of strings, those starting with a positive number as well as those starting with a negative number (and hence a hyphen).

The bigger script looks similar to this:
Code:
(while read i
do
        grep -E "\<$i  PHI" 1ZU2.angle > phi.out 
        n=$?
       
        if [ $n -eq 0 ]
        then cat phi.out >> 1ZU2.phi
        fi

        if [ $n -eq 1 ]
        then echo "$i  PHI      ---  ---    ---   ---" >> 1ZU2.phi
        fi

done)<1ZU2.seq

How can I put multiple patterns in one command with the -e option?

Many thanks for your help.

regards,
Aswani
 
Old 08-31-2012, 02:23 AM   #9
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,737

Rep: Reputation: 1265
Quote:
Originally Posted by aswani View Post
According to you, we can not use the characters "\<" with non-word characters like "-". Is there a way to over come this?
what about:
grep -E -- '-?\<[0-9] GLY' file
This definitely works. On the other hand, why do you want to use grep? There are other possibilities, for example awk, perl...
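
A sketch of how the pieces of that expression behave, assuming GNU grep. Here [0-9] is replaced by the literal digit 1, and the " 1 GLY" and "11 GLY" lines are invented for illustration; they are not in the posted data.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
 -1 GLY  PHI     159.6 +/-   65.0  0.581
  1 GLY  PHI     -60.0 +/-   10.0  0.900
 11 GLY  PHI     -70.0 +/-   12.0  0.800
EOF

# "-?"    : zero or one hyphen
# "\<"    : start of a word, here directly before the digit
# "1 GLY" : the literal residue number and name
matched=$(grep -E -- '-?\<1 GLY' "$sample")

echo "$matched"
rm -f "$sample"
```

With this sample, the "-1 GLY" and "1 GLY" lines are printed and "11 GLY" is not. Because the hyphen is optional, the expression does not tell -1 and 1 apart.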
 
Old 08-31-2012, 05:04 AM   #10
raskin
Senior Member
 
Registered: Sep 2005
Location: Russia
Distribution: NixOS (http://nixos.org)
Posts: 1,893

Rep: Reputation: 68
What about
Code:
grep -E -- "(^|[^A-Za-z_0-9])-[0-9] GLY"
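
A sketch that extends this idea to search keys of either sign. It is a variation, not the expression above: the hyphen is added to the excluded set so that "1 GLY" does not match inside "-1 GLY". The helper name find_key and the "1 GLY" and "11 GLY" lines are hypothetical.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
 -1 GLY  PHI     159.6 +/-   65.0  0.581
  1 GLY  PHI     -60.0 +/-   10.0  0.900
 11 GLY  PHI     -70.0 +/-   12.0  0.800
EOF

# Hypothetical helper: print lines of file $2 where key $1 sits at the start
# of the line or after a character that is neither a word character nor "-"
find_key() {
    grep -E -- "(^|[^A-Za-z0-9_-])$1" "$2"
}

neg=$(find_key "-1 GLY" "$sample")
pos=$(find_key "1 GLY" "$sample")

echo "$neg"
echo "$pos"
rm -f "$sample"
```

The same expression then serves both kinds of key, on the assumption that the keys contain no regex metacharacters.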
 
Old 08-31-2012, 06:01 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
How about a little trickery:
Code:
#!/bin/bash

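# read strips leading blanks, so negative entries begin with "-"
while read -r line
do
    # "\<" cannot match before a hyphen, so add it only for non-negative entries
    [[ $line =~ ^- ]] || line="\<$line"

    if grep -qE -- "$line  PHI" 1ZU2.angle
    then
        echo "found $line"
    fi
done < 1ZU2.seq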
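
An alternative sketch that avoids the conditional, assuming the residue number is always the first field on the line, as in the posted sample. It anchors the pattern at the start of the line instead of using "\<". The temporary directory and its file names are made up; the placeholder line is the one from the original poster's script.

```shell
workdir=$(mktemp -d)
cat > "$workdir/angle" <<'EOF'
 -1 GLY  PHI     159.6 +/-   65.0  0.581
  0 SER  PHI     -77.0 +/-    3.1  0.999
  1 MET  PHI      46.2 +/-   87.4  0.316
EOF
cat > "$workdir/seq" <<'EOF'
  -1 GLY
   0 SER
   1 MET
   2 ASP
EOF

result=$(
    # read strips leading blanks, so $key is e.g. "-1 GLY" or "1 MET"
    while read -r key
    do
        # "^[[:space:]]*" anchors the key to the first field, whatever its sign
        if ! grep -E -- "^[[:space:]]*$key  PHI" "$workdir/angle"
        then
            # No PHI line for this residue: emit the placeholder instead
            echo "$key  PHI      ---  ---    ---   ---"
        fi
    done < "$workdir/seq"
)

echo "$result"
rm -rf "$workdir"
```

With this sample, the three residues present in the data print their PHI lines and "2 ASP" prints the placeholder. Anchoring also keeps "1 MET" from matching a "-1 MET" line, which "\<1 MET" alone would not prevent.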
 
1 members found this post helpful.
Old 09-01-2012, 07:40 AM   #12
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
what about:
grep -E -- '-?\<[0-9] GLY' file
this definitely work. From the other hand why do you want to use grep, there are other possibilities, for example awk, perl...
Hi,
Thanks for your reply and the expression. I couldn't follow the expression fully (the question mark part!). Moreover, as I mentioned in one of my previous replies to Grail, I encounter two different kinds of words. The above expression will not work for some of them.

Nevertheless, I overcame this problem with a conditional expression, as in Grail's latest reply in this thread.

I know how to use awk to some extent. But for some reason, I feel that grep is often faster. I don't know if I am right! That is why I prefer to use grep wherever possible.

best wishes,
Aswani
 
Old 09-01-2012, 07:44 AM   #13
aswani
LQ Newbie
 
Registered: Jun 2011
Location: Bangalore, India
Distribution: Open SuSe11.4
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail View Post
How about a little trickery:
Code:
#!/bin/bash

while read -r line
do
		[[ $line =~ ^- ]] || line="\<$line"

		if grep -qE -- "$line  PHI" 1ZU2.angle
		then
				echo found $line
		fi
done<1ZU2.seq

Hello Grail,

Many thanks for the reply and your time. The trick seems to work well. I figured out what you are doing in the script too.

I was just wondering why I didn't think about it myself! That would have saved a lot of time for you guys.

thanks and regards,

Aswani
 
  



Tags
egrep, grep, shell script, word

