extract string between two different characters
Hi
I have this code:
Code:
<span class="resultsAd">VOLVO C70 2.3</span><br>
and I want to extract this string from it:
Code:
VOLVO C70 2.3
Quote:
Although sed can be used for the specific example you gave, it is rather hard (close to impossible) to use sed when the HTML open and close tags aren't on the same line. This would work for your specific example:
Code:
sed -r 's%.*sAd">(.*)</sp.*%\1%' input
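To illustrate the caveat about tags that span several lines, here is a minimal sketch of one possible workaround: join the lines with tr first, then apply the same sed expression. This is my own assumption, not something suggested in the thread, and the function name extract_span is hypothetical. It uses -E rather than -r because both GNU and BSD sed accept it. Since the pattern is greedy, it only makes sense when the input contains a single resultsAd span.

```shell
#!/bin/sh
# Hypothetical helper: replace newlines with spaces so that a tag pair
# split across lines ends up on one line, then apply the sed expression
# from the reply above.
extract_span() {
    tr '\n' ' ' | sed -E 's%.*sAd">(.*)</sp.*%\1%'
}

# Sample input where the text between the tags is broken over two lines.
printf '<span class="resultsAd">VOLVO\nC70 2.3</span><br>\n' | extract_span
```

For anything more complex than this, a real HTML parser is the safer choice.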
This would work for your specific example:
Code:
cut -d\> -f2- $InFile \ |
This would work for your specific example:
Code:
awk -F "<|>" '{print $3}' "$InFile" >"$OutFile"
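To see why $3 is the right field, it may help to dump every field awk produces when it splits on < or >. The sketch below does that for the sample line from the question; the function name third_field is hypothetical and only wraps the awk call shown above.

```shell
#!/bin/sh
line='<span class="resultsAd">VOLVO C70 2.3</span><br>'

# Print each field with its index. Field 1 is the empty string before the
# first "<", field 2 is the tag content, field 3 is the text we want.
printf '%s\n' "$line" |
    awk -F '<|>' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'

# Hypothetical wrapper around the awk command from the reply above.
third_field() {
    awk -F '<|>' '{ print $3 }'
}

printf '%s\n' "$line" | third_field
```

Note that this depends on the wanted text always following the first tag on the line; extra tags before it would shift the field number.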
I would use a program that is already able to extract the text. The lynx text-based web browser is excellent for this type of thing.
Code:
syd@computer:~/Desktop$ lynx --dump a.html
sed -rne 's/(<.*">)(.*)(<\/.*>)/\2/p'
sed 's/<[^>]*>//g' test
EDIT: add the -i switch to edit the file in place, so the changes are permanent: sed -i 's/<[^>]*>//g' test
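A quick way to try the tag-stripping expression without touching any file is to feed it the sample line on standard input. This is just a sketch; the function name strip_tags is hypothetical. The expression deletes everything from a < up to the next >, so it only handles tags that open and close on the same line.

```shell
#!/bin/sh
# Hypothetical wrapper: remove every <...> sequence from each input line.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# Sample line from the question; only the text between the tags remains.
printf '<span class="resultsAd">VOLVO C70 2.3</span><br>\n' | strip_tags
```

Once the output looks right, the same expression can be used with -i as described above.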
If you have Ruby, you can use Nokogiri to parse your HTML.
Code:
require 'rubygems'
Code:
# ruby test.rb