LinuxQuestions.org - [SOLVED] how to select the table based on regular expression

how to select the table based on regular expression

I have a big table consisting of five columns (see below). I want to filter the table so that only the rows with ".1" in the 3rd column remain in the final table.

I tried to use grep ".1$", but somehow the second row gets included as well. Can someone help me with this?

Thanks
Upendra

PHP Code:

AT1G53670       gene:2024816    AT1G53670.1     located in      chloroplast stroma      GO:0009570       
AT1G53670       gene:4515100791 AT1G53670.2     has     peptide-methionine (S)-S-oxide reductase activity       GO:0008113       
AT1G53670       gene:2024816    AT1G53670.1     has protein modification of type        N-terminal protein myristoylation       GO:0006499

Firstly, you should use code tags and not PHP tags.

I'm not quite certain whether you would like to show the entire line that contains .1 in the third field or just the contents of the third field that contains .1

In either case, your regex ".1$" means: find any line that ends (the $) with any character (the .) followed by a 1. Since the . is a regex meta-character, it must be escaped if you want to match it literally. So it's strange that grep would return any lines with the regex given.
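To see the difference between the unescaped and the escaped dot, here is a minimal sketch. The three sample strings are made up for illustration and are not rows from the table:

```shell
# Hypothetical sample lines: only the first ends in a literal ".1".
sample='AT1G53670.1
AT1G53671
AT1G53670.2'

# Unescaped dot: "any character, then 1, at end of line".
# This also matches AT1G53671, because "7" counts as "any character".
unescaped=$(printf '%s\n' "$sample" | grep '.1$')

# Escaped dot: "a literal period, then 1, at end of line".
escaped=$(printf '%s\n' "$sample" | grep '\.1$')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```

With these inputs the unescaped pattern should return two lines and the escaped one only the first.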

To list any line that contains a .1 use:

Code:

grep "\.1" < /path/to/file

If you need to return just the third field you will need to use SED or AWK. Even a simple cut can work (assuming your field delimiter is a tab):

Code:

grep "\.1" < /path/to/table | cut -f3
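One caveat: "\.1" matches anywhere on the line, so it would also pick up an ID ending in something like .10. A rough sketch of a tighter pattern follows, assuming tab-separated input. The third row (including its gene number) is invented purely to show the over-match:

```shell
# Hypothetical tab-separated rows; the last one is a made-up ".10" entry.
rows=$(printf '%s\t%s\t%s\n' \
  AT1G53670 gene:2024816 AT1G53670.1 \
  AT1G53670 gene:4515100791 AT1G53670.2 \
  AT1G53670 gene:0000000 AT1G53670.10)

# Loose pattern: any ".1" on the line, so ".10" slips through.
loose=$(printf '%s\n' "$rows" | grep '\.1' | cut -f3)

# Tighter pattern: ".1" must be followed by whitespace or end of line.
strict=$(printf '%s\n' "$rows" | grep -E '\.1([[:space:]]|$)' | cut -f3)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

This still does not tie the match to the third field specifically; it only stops ".1" from matching the start of a longer number.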

Hope it helps.

Quote:

Originally Posted by towheedm (Post 4839173)

Thanks.....

I didn't realize that ".1$" looks for .1 at the end of the line, so that might be the reason I was getting the 2nd row with my grep pattern. Anyway, a few minutes ago I figured out what I would like to use:

Code:

.1$/b/

which filters the table based on the third column.

Thanks anyway for your help

grep is the lighter tool, so use it when it is enough, but in general awk is the tool to use when working with columnized data.

Code:

awk '$3 ~ /[.]1$/ { print }'

If field 3 matches the given /regex/ pattern, then print the whole line.

Notice also, BTW, that in regex "." means "any character", and so it needs to be either escaped or bracketed to make it literal.

And actually, since the default action on a positive match is to print the line, the "{ print }" part can be left off in this particular case.
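As a small sketch of that last point, both forms below should give the same output. The -F'\t' option is my addition and assumes the table is tab-delimited, so that fields containing spaces (like "located in") stay in one piece; the two rows are trimmed versions of the sample data:

```shell
# Two tab-separated rows based on the sample table (shortened to 4 fields).
rows=$(printf '%s\t%s\t%s\t%s\n' \
  AT1G53670 gene:2024816 AT1G53670.1 'located in' \
  AT1G53670 gene:4515100791 AT1G53670.2 has)

# Explicit action: print the line when field 3 ends in a literal ".1".
with_print=$(printf '%s\n' "$rows" | awk -F'\t' '$3 ~ /[.]1$/ { print }')

# No action given: awk falls back to printing the matching line.
without_print=$(printf '%s\n' "$rows" | awk -F'\t' '$3 ~ /[.]1$/')

printf '%s\n' "$with_print"
printf '%s\n' "$without_print"
```

To print only the third field instead of the whole line, the action would be { print $3 }.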

Here are a few useful awk references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/man...ode/index.html
http://www.pement.org/awk/awk1line.txt
http://www.catonmat.net/blog/awk-one...ined-part-one/