LinuxQuestions.org

vikram_cvk 12-20-2006 08:14 PM

GREP - reg exp to find words ending with .V and .TO
 
Hello Experts,

I'm trying to extract the words ending with .V or .TO from the following list:

AVWI.OB
AVX
AVX-P.V
AVXT.OB
AVY
AVZ
AVZ.TO
AW
AWB.V
AWBC
AWC
AWF
AWG
AWX.V
AXA

I'm new to grep and regular expressions, and I'm breaking my head trying to figure out what the right grep command would be.

I tried the following commands, but I'm not able to construct the correct regular expression:

$ grep -o '[ A-Z][a-z].V' StockList30.txt
$ grep -o '[ A-Z][a-z].TO' StockList30.txt

Please help me construct the right grep command to retrieve the words ending with .V and .TO.

Thanks in advance.

Regards
Swiftguy

kbrede 12-20-2006 08:26 PM

egrep -e '\.TO|\.V' test.txt
works for me.
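
A small sketch on the plain list from the first post: the pattern above matches `.TO` or `.V` anywhere in a line, so anchoring it with `$` should restrict matches to lines that actually end in one of those suffixes. The temporary file name `stocklist.txt` is an assumption for the demo, not the real file.

```shell
# Demo only: recreate part of the list from the first post in a temp file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/stocklist.txt" <<'EOF'
AVWI.OB
AVX
AVX-P.V
AVXT.OB
AVZ.TO
AWB.V
AWBC
AWX.V
AXA
EOF

# -E enables extended regexes; \. is a literal dot, $ anchors at end of line.
ending=$(grep -E '\.(V|TO)$' "$tmpdir/stocklist.txt")
printf '%s\n' "$ending"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

On this list the anchored and unanchored patterns select the same lines; the anchor only matters if `.V` or `.TO` can also appear in the middle of a line.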

vikram_cvk 12-20-2006 09:37 PM

Hello kbrede,

Thanks for the solution. I'm trying to extract the words ending with .V or .TO from an HTML file. How can I display only the matching strings instead of the complete line (i.e. strip out the HTML code)?

This is the result when I tried your solution on an HTML file:

$ egrep -e '\.TO|\.V' StockList30.html

Sample output:

<td><font color="Black"><a href="http://charts.tradingchief.com/DBDemos/CustomChart.aspx?Symbol=AVV.V">Chart</a></font></td><td><font color="Black">AVV.V</font></td><td><font color="Black">AVANTEC TECHNOLOGIES INC. (Tier</font></td><td><font color="Black">2006-12-19</font></td><td><font color="Black">0.05</font></td><td><font color="Black">0.05</font></td><td><font color="Black">0.05</font></td><td><font color="Black">0.05</font></td><td><font color="Black">0.05</font></td><td><font color="Black">14000.00</font></td><td><font color="Red">-9.09 %</font></td><td><font color="Black">CDNX</font></td>


<td><font color="Black"><a href="http://charts.tradingchief.com/DBDemos/CustomChart.aspx?Symbol=AVZ.TO">Chart</a></font></td><td><font color="Black">AVZ.TO</font></td><td><font color="Black">AMVESCAP Inc</font></td><td><font color="Black">2006-12-20</font></td><td><font color="Black">13.00</font></td><td><font color="Black">12.99</font></td><td><font color="Black">12.99</font></td><td><font color="Black">12.85</font></td><td><font color="Black">12.90</font></td><td><font color="Black">24078.00</font></td><td><font color="Red">-0.77 %</font></td><td><font color="Black">Toronto</font></td>


I want only the symbols themselves, i.e. AVZ.TO and AVV.V, to be displayed.

Please help me out.

Thanking you,

Regards

zetabill 12-20-2006 09:55 PM

Code:

grep -o '[A-Z]*\.V\|[A-Z]*\.TO' StockList30.html
Is this a homework question?

vikram_cvk 12-22-2006 10:22 AM

Hi,

This is not homework! I'm new to grep and I need a quick solution to extract data for a small task at hand.

Thanks for the solution.

zetabill 12-22-2006 01:47 PM

Okay... It just sounded a lot like a homework question I once had in my bash class some moons ago. I'm glad that it worked for you though. That's why we're here.

For future reference:
Code:

grep -o 'regular expression' StockList30.html
will print only the part of each line in the HTML file that matches the regular expression, instead of the whole line. Since you were trying to find something that isn't part of another word or phrase (i.e. it's wrapped in tags or whitespace), this works just fine... but I think you got that already.
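
To illustrate the difference, here is a minimal sketch using a made-up one-line HTML fragment. The markup and the symbol `ABC.V` are placeholders, not taken from the real file.

```shell
# Hypothetical one-line HTML fragment standing in for a row of the real file.
line='<td><font color="Black">ABC.V</font></td><td>2006-12-19</td>'

# Without -o, grep prints the whole matching line.
whole=$(printf '%s\n' "$line" | LC_ALL=C grep '[A-Z]*\.V')

# With -o, grep prints only the text that matched the pattern.
only=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[A-Z]*\.V')

printf 'whole line: %s\nmatch only: %s\n' "$whole" "$only"
```

`LC_ALL=C` is there only so that `[A-Z]` means plain ASCII capitals regardless of the locale settings on the machine.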
Code:

[A-Z]*\.V
This will look for zero or more capital letters followed by a ".V". The backslash is to escape the period because it is a special character and escaping it tells grep that you're looking for a period and not any single character.
Code:

\|
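Think of this as a separator... an OR if you will. In grep's default (basic) syntax it needs a backslash to act as OR; with egrep or grep -E a plain | does the same job, as in kbrede's command.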
Code:

[A-Z]*\.TO
Same as before but with the TO extension.
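
One caveat, sketched under assumptions: `[A-Z]*` stops at a hyphen, so a symbol such as AVX-P.V from the original list would come out as `P.V`. Also, in the HTML sample each symbol appears twice per row (once in the chart link and once in the table cell), so every match is printed twice. A possible variant adds `-` to the bracket expression, uses `-E` so the alternation needs no backslash, and pipes the result through `sort -u` to drop duplicates. The rows below are trimmed stand-ins for the real file, and the AVX-P.V row is hypothetical.

```shell
# Trimmed stand-in rows; the AVX-P.V row is hypothetical (that symbol appears
# in the plain list from the first post, not in the HTML sample).
html='<td><a href="CustomChart.aspx?Symbol=AVZ.TO">Chart</a></td><td>AVZ.TO</td>
<td><a href="CustomChart.aspx?Symbol=AVX-P.V">Chart</a></td><td>AVX-P.V</td>'

# [A-Z-]+ allows capitals and hyphens; (V|TO) is the alternation in ERE syntax.
# sort -u removes the duplicate produced by the link and the table cell.
symbols=$(printf '%s\n' "$html" | LC_ALL=C grep -oE '[A-Z-]+\.(V|TO)' | LC_ALL=C sort -u)
printf '%s\n' "$symbols"
```

Whether hyphens should be allowed depends on what the real symbols look like, so treat the bracket expression as a starting point rather than a definitive pattern.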


All times are GMT -5.