LinuxQuestions.org


upendra_35 11-28-2012 05:10 PM

how to select the table based on regular expression
 
I have a big table consisting of five columns (see below). I want to filter the table so that only the rows whose third column ends in ".1" remain in the final table.

I tried grep ".1$", but somehow the second row gets included as well. Can someone help me with this?

Thanks
Upendra

PHP Code:

AT1G53670       gene:2024816    AT1G53670.1     located in      chloroplast stroma      GO:0009570
AT1G53670       gene:4515100791 AT1G53670.2     has     peptide-methionine (S)-S-oxide reductase activity       GO:0008113
AT1G53670       gene:2024816    AT1G53670.1     has protein modification of type        N-terminal protein myristoylation       GO:0006499


towheedm 11-28-2012 09:17 PM

Firstly, you should use code tags and not PHP tags.

I'm not quite certain whether you would like to show the entire line that contains .1 in the third field or just the contents of the third field that contains .1

In either case, your regex ".1$" means: find any line that ends (the $) with any character (the .) followed by a 1. Since the . is a regex meta-character, it must be escaped if you want to match it literally. So it's strange that grep would return any lines with the regex given.
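
To see the difference between an unescaped and an escaped dot, both patterns can be run against a few made-up identifiers. This is only a sketch: the sample strings and variable names below are hypothetical and not taken from the table above.

```shell
# Hypothetical sample values, one per line; only the first truly ends in ".1".
sample=$(printf '%s\n' 'AT1G53670.1' 'AT1G5367011' 'AT1G53670.2')

# Unescaped dot: "." matches any character, so the trailing "11" also matches.
unescaped_count=$(printf '%s\n' "$sample" | grep -c '.1$')

# Escaped dot: only a literal ".1" at the end of the line matches.
escaped_count=$(printf '%s\n' "$sample" | grep -c '\.1$')

echo "unescaped: $unescaped_count, escaped: $escaped_count"
# prints "unescaped: 2, escaped: 1"
```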

To list any line that contains a .1 use:
Code:

grep "\.1" < /path/to/file
If you need to return just the third field, you will need to use sed or awk. Even a simple cut can work (assuming your field delimiter is a tab):
Code:

grep "\.1" < /path/to/table | cut -f3
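
One caveat: "\.1" matches anywhere on the line, so a value such as ".10", or a ".1" in another column, would also be selected. A possible variation, assuming tab-separated fields and using a made-up two-row sample, is to cut the third field first and then anchor the pattern to its end:

```shell
# Hypothetical tab-separated rows; the second has ".1" only as part of ".10".
rows=$(printf '%s\t%s\t%s\n' \
  'AT1G53670' 'gene:2024816' 'AT1G53670.1' \
  'AT1G53670' 'gene:2024816' 'AT1G53670.10')

# Unanchored pattern: both rows match because ".10" contains ".1".
loose=$(printf '%s\n' "$rows" | grep '\.1' | cut -f3)

# Cut first, then anchor with $ so only a trailing ".1" matches.
strict=$(printf '%s\n' "$rows" | cut -f3 | grep '\.1$')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```
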
Hope it helps.

upendra_35 11-28-2012 09:34 PM

Thanks.....

I didn't realize that ".1$" looks for .1 at the end of the line, so that might be the reason I was getting the 2nd row with my grep pattern. Anyway, a few minutes ago I figured out what I would like to use:
Code:

.1$/b/
which filters the table based on the third column.

Thanks anyway for your help

David the H. 11-30-2012 08:19 AM

Although you should use grep whenever it is sufficient, because it's lighter, awk is generally the tool to use when working with columnized data.

Code:

awk '$3 ~ /[.]1$/ { print }'
If field 3 matches the given /regex/ pattern, then print the whole line.

Notice also, BTW, that in regex "." means "any character", and so it needs to be either escaped or bracketed to make it literal.

And actually, since the default action on a positive match is to print the line, the "{ print }" part can be left off in this particular case.
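
As a rough illustration of both points, here is a sketch that runs the shortened form against three rows modelled on the table in the first post. The tab delimiter, the shortened column values, and the variable names are assumptions made for the example; awk's default whitespace splitting also works as long as the first two columns contain no spaces.

```shell
# Three tab-separated rows modelled on the table in the first post
# (columns shortened; the real file's delimiter is an assumption).
table=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  'AT1G53670' 'gene:2024816' 'AT1G53670.1' 'located in' 'GO:0009570' \
  'AT1G53670' 'gene:4515100791' 'AT1G53670.2' 'has' 'GO:0008113' \
  'AT1G53670' 'gene:2024816' 'AT1G53670.1' 'has protein modification of type' 'GO:0006499')

# -F'\t' splits on tabs only, so fields containing spaces stay intact.
# With no action block, awk prints every line whose third field ends in ".1".
matching_rows=$(printf '%s\n' "$table" | awk -F'\t' '$3 ~ /[.]1$/')

# Print only the third field of the matching rows.
third_fields=$(printf '%s\n' "$table" | awk -F'\t' '$3 ~ /[.]1$/ { print $3 }')

printf '%s\n' "$matching_rows"
```

Here the first and third rows are printed and the ".2" row is dropped.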

Here are a few useful awk references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/man...ode/index.html
http://www.pement.org/awk/awk1line.txt
http://www.catonmat.net/blog/awk-one...ined-part-one/

