LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - General

02-14-2016, 08:09 PM #1
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Rep: Reputation: 0
Extract text from text


I have a bash script (.sh file) that puts weather conditions on an image. I am having problems getting the -30 near the end of the text below. The text file I am getting the info from is fairly large, but these are the lines I am trying to extract it from:

</dd>
<dt>
<a href="http://ec.gc.ca/meteo-weather/default.asp?lang=En&amp;n=5FBF816A-1" title="Wind Chill">Wind Chill</a>:
</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-30</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">-22</dd>
</dl>



I have tried the following but can't get my head around what I am doing.


This finds the right part of the file but does not return the value I want (-30):


windchillStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -A1 "Wind Chill" | tail -1)

windchill=$(echo $windchillStr | cut -d ">" -f 2 | cut -d "</dd" -f 1)
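For reference, two things go wrong in the grep/cut attempt above. The value sits two lines below the `Wind Chill` link, so `grep -A1 ... | tail -1` hands back the `</dt>` line, and `cut -d` only accepts a single-character delimiter, so `cut -d "</dd"` stops with an error. A minimal sketch of a corrected version, assuming GNU grep and the line layout in the snippet above; `windchill_grep` is a hypothetical helper name:

```shell
# windchill_grep FILE: print the first wind-chill value using only grep and cut.
# Assumes the <dd> holding the value is two lines below the "Wind Chill" link.
windchill_grep() {
  local line
  # -m1 stops at the first match; -A2 also prints the two lines after it,
  # so the last line is <dd ...>-30</dd> rather than </dt>.
  line=$(grep -m1 -A2 "Wind Chill" "$1" | tail -n 1)
  # cut takes a one-character delimiter: split on '>' first, then on '<'.
  printf '%s\n' "$line" | cut -d '>' -f 2 | cut -d '<' -f 1
}
```

Usage: `windchill=$(windchill_grep "$weatherFile")`. `printf '%s\n'` is used instead of `echo` so the line is passed through exactly as captured.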

Any help would be appreciated.
 
02-14-2016, 08:48 PM #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
I might be inclined to some simple sed
Code:
winchill=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$weatherfile")
Look for the text, skip down two lines, then keep only what is between > and <.
KISS.

Does make some assumptions about the layout.
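Spelled out, here is what each piece of that sed command does. This is just the command from the post wrapped in a hypothetical function, `windchill_all`; note that `-r` (extended regular expressions) is a GNU sed option:

```shell
# windchill_all FILE: print every wind-chill value, one per line.
windchill_all() {
  # -n                 print nothing unless a command asks for it
  # -r                 extended regex, so ( ) capture without backslashes
  # /Wind Chill/{...}  run the block only on lines containing "Wind Chill"
  # n;n                move to the next line, twice (from the link line to the <dd> line)
  # s/.*>(.*)<.*/\1/p  keep what sits between '>' and '<' (for <dd ...>-30</dd>, that is -30), then print
  sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$1"
}
```

On a page with two Wind Chill entries this prints two lines, one per match.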

Last edited by syg00; 02-14-2016 at 08:49 PM. Reason: last sentence
 
02-14-2016, 09:40 PM #3
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
I might be inclined to some simple sed
Code:
winchill=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$weatherfile")
Look for the text, skip down two lines, then keep only what is between > and <.
KISS.

Does make some assumptions about the layout.
Hi

Thanks for the quick response, but that does not find it either.

winchill=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$weatherFile")

I have attached the file I am searching.
Attached Files
File Type: txt weather_temp_live.txt (51.3 KB, 21 views)
 
02-14-2016, 10:09 PM #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Did you look at what got assigned?
There are two matches in that file, which complicates things a little, though not much if you always want the first occurrence.
I might be inclined to use a bash array to hold all the returned fields.
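A sketch of the array idea, assuming bash 4 or later for `mapfile`; the function and array names are hypothetical:

```shell
# load_windchills FILE: put every wind-chill value into the array "windchills".
load_windchills() {
  # mapfile -t stores one line per element and strips the newlines;
  # < <(...) feeds it the output of the sed command from earlier in the thread.
  mapfile -t windchills < <(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$1")
}
```

After `load_windchills "$weatherFile"`, `${windchills[0]}` is the first value and `${#windchills[@]}` says how many were found.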
 
02-14-2016, 10:31 PM #5
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
Did you look at what got assigned?
There are two matches in that file, which complicates things a little, though not much if you always want the first occurrence.
I might be inclined to use a bash array to hold all the returned fields.

Yes, the query didn't pick up anything because it looked at the first occurrence. I used this to find the correct place to start from:

windchillStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile")


This pulls up the following text:
Quote:
<dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd> <dt>Date: </dt> <dd class="mrgn-bttm-0">11:00 PM EST Sunday 14 February 2016</dd> </dl> <dl class="dl-horizontal visible-xs wxo-dl-cnd"> <dt>Wind:</dt> <dd class="longContent mrgn-bttm-0 wxo-metric-hide"> <abbr title="South-Southwest">SSW</abbr> 9 <abbr title="kilometres per hour">km/h</abbr> </dd> <dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden"> <abbr title="South-Southwest">SSW</abbr> 6 <abbr title="miles per hour">mph</abbr> </dd> <dt> <a href="http://ec.gc.ca/meteo-weather/default.asp?lang=En&amp;n=5FBF816A-1" title="Wind Chill">Wind Chill</a>: </dt> <dd class="mrgn-bttm-0 wxo-metric-hide">-30</dd> <dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">-22</dd> </dl> </div> <div class="row brdr-tp"> <div id="wxo-conditionscontainer" class="mrgn-tp-sm"> <p id="wxo-detailstag" class="visible-xs mrgn-lft-md">Conditions details</p> <div id="wxo-conditiondetails"> <dl class="col-sm-6 dl-horizontal visible-xs wxo-dl"> <dt>Observed at: </dt> <dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd> <dt>Date: </dt> <dd class="mrgn-bttm-0">11:00 PM EST Sunday 14 February 2016</dd> </dl> <dl class="col-sm-6 dl-horizontal wxo-dl mrgn-bttm-0"> <dt>Condition:</dt> <dd class="mrgn-bttm-0">Clear</dd> <dt>Pressure:</dt> <dd class="mrgn-bttm-0 wxo-metric-hide">103.4&nbsp;<abbr title="kilopascals">kPa</abbr> </dd> <dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">30.5&nbsp;inches</dd> <dt>Tendency:</dt> <dd class="mrgn-bttm-0">falling</dd> <dt>Visibility:</dt> <dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr> </dd> <dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd> </dl> <dl class="col-sm-6 dl-horizontal wxo-dl mrgn-bttm-0"> <dt>Temperature:</dt> <dd class="mrgn-bttm-0 wxo-metric-hide">-22.7&deg;<abbr title="Celsius">C</abbr> </dd> <dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">-8.9&deg;<abbr title="Fahrenheit">F</abbr> </dd> <dt>Dewpoint:</dt> <dd class="mrgn-bttm-0 wxo-metric-hide">-27.8&deg;<abbr title="Celsius">C</abbr> </dd> <dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">-18.0&deg;<abbr title="Fahrenheit">F</abbr> </dd> <dt>Humidity:</dt> <dd class="mrgn-bttm-0">64%</dd>
I tried this, but it didn't work:

windchill=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' "$windchillStr")
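Two things work against that last command. After the script, sed treats every remaining argument as an input file name, so `sed ... "$windchillStr"` tries to open a file named after the captured text instead of reading the text itself. Also, `echo $windchillStr` without quotes (as in the first post) joins all the lines into one, which is likely why the pasted output above is a single long line; with no following lines, `n;n` has nothing to step to. A minimal sketch that feeds the variable on standard input with a bash here-string; `windchill_from_string` is a hypothetical name:

```shell
# windchill_from_string TEXT: run the thread's sed command on text held in a variable.
windchill_from_string() {
  # <<< sends "$1" to sed's standard input; the quotes keep the line breaks intact.
  sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p}' <<< "$1"
}
```

Usage: `windchill=$(windchill_from_string "$windchillStr")`, with `windchillStr` captured from grep as before.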

Last edited by dingo-den; 02-14-2016 at 10:51 PM.
 
02-14-2016, 10:34 PM #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Very strange. I get two -30 values ("-30 -30") assigned to the variable when I run it on your file, which is what it should do.
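The "-30 -30" is the two matches coming back as two lines in one variable, with an unquoted `echo` joining them. A small illustration using a stand-in value instead of the real file:

```shell
# Stand-in for what the sed command returns when the page has two matches.
winchill=$(printf '%s\n' -30 -30)

echo $winchill                      # unquoted: both values on one line, "-30 -30"
printf '%s\n' "$winchill"           # quoted: one value per line
printf '%s\n' "$winchill" | wc -l   # number of values found
```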
 
02-14-2016, 10:56 PM #7
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
Very strange. I get two -30 values ("-30 -30") assigned to the variable when I run it on your file, which is what it should do.
What commands did you run, and how do I get just one -30?
 
02-15-2016, 06:54 AM #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
I used the command I gave you earlier.

Ahhhh, hang on: are you still using windchillStr? Don't.
Use my command in place of both. Sorry, I should have been clearer about that. As for only one result, which one? What if there are more: 3, 5, 20, ...
Data providers change their format every so often, too.
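One way to deal with the "which one?" question is to ask for a match by position. A sketch using awk (a different tool from the sed in this thread), assuming the same layout with the value two lines below the label; `windchill_nth` is a hypothetical name:

```shell
# windchill_nth FILE N: print the Nth wind-chill value (1 = first); nothing if absent.
windchill_nth() {
  awk -v want="$2" '
    /Wind Chill/ { hit = NR + 2 }   # the value sits two lines below the label
    NR == hit {
      if (++count == want) {
        sub(/^.*">/, "")            # drop the opening <dd ...> tag
        sub(/<.*$/, "")             # drop the closing </dd> and anything after it
        print
        exit
      }
    }' "$1"
}
```

`windchill_nth "$weatherFile" 1` prints the first value, `2` the second, and nothing is printed when there is no such match.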
 
02-15-2016, 01:40 PM #9
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
one thought:
if you want to get content from a html page, why don't you use the same logic that is at work inside your browser?
it's called xpath (ok i'm simplifying things here, but just read the intro to the tutorial) and there's a command line utility called "xmllint" (included in a package usually called "libxml2" iirc; on Debian it's "libxml2-utils").

i was using the same sed/grep-logic for my weather script - until i bit the bullet and changed it all to xmllint.

have a look at my weather conky & shell script in my github stuff link below.
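For this page, an XPath version might look like the following sketch. It assumes an xmllint build that supports `--xpath` (on Debian, xmllint is in the libxml2-utils package) and that the value is the first `<dd>` after the `<dt>` holding the Wind Chill link; `windchill_xpath` is a hypothetical name:

```shell
# windchill_xpath FILE: first wind-chill value, selected by XPath instead of line positions.
windchill_xpath() {
  # --html       use libxml2's forgiving HTML parser
  # string(...)  text of the first matching node in document order
  # 2>/dev/null  hide the parser's warnings about the page markup
  xmllint --html --xpath \
    'string(//dt[contains(., "Wind Chill")]/following-sibling::dd[1])' \
    "$1" 2>/dev/null
}
```

Because this selects by element rather than by line number, it should keep working if the page's line breaks move around.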
 
02-15-2016, 07:52 PM #10
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
I used the command I gave you earlier.

Ahhhh, hang on: are you still using windchillStr? Don't.
Use my command in place of both. Sorry, I should have been clearer about that. As for only one result, which one? What if there are more: 3, 5, 20, ...
Data providers change their format every so often, too.
I am just using your code, and it does not find the Wind Chill temperature when run on the file I attached. It returns nothing at all.

The file format is pretty static. It changes in the summer to use Humidex instead of Wind Chill, but nothing major has changed in the last few years.
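Since the page swaps Wind Chill for Humidex in the summer, one option is to accept either label. A sketch that assumes the Humidex entry is laid out like the Wind Chill one (value two lines below the label), which this thread doesn't confirm; `feels_like` is a hypothetical name:

```shell
# feels_like FILE: first Wind Chill or Humidex value, whichever the page shows.
# ASSUMPTION: the Humidex block has the same layout as the Wind Chill block.
feels_like() {
  # Wind Chill|Humidex is an extended-regex alternation (enabled by -r);
  # q stops sed after the first matching block.
  sed -nr '/Wind Chill|Humidex/{n;n;s/.*>(.*)<.*/\1/p;q}' "$1"
}
```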

Last edited by dingo-den; 02-15-2016 at 08:17 PM.
 
02-15-2016, 08:21 PM #11
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0

Quote:
Originally Posted by ondoho
one thought:
if you want to get content from a html page, why don't you use the same logic that is at work inside your browser?
it's called xpath (ok i'm simplifying things here, but just read the intro to the tutorial) and there's a command line utility called "xmllint" (included in a package usually called "libxml2" iirc; on Debian it's "libxml2-utils").

i was using the same sed/grep-logic for my weather script - until i bit the bullet and changed it all to xmllint.

have a look at my weather conky & shell script in my github stuff link below.
I might have to look into this at some point, but at 66, learning new things does not come easy.
 
02-16-2016, 12:47 PM #12
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by dingo-den
but at 66, learning new things does not come easy
neither does it at 44, with no IT background.
 
02-17-2016, 07:03 AM #13
dingo-den
Member
 
Registered: Aug 2005
Location: Apr-Oct Enderby, BC Canada Oct-Mar Somewhere warm
Distribution: Debian 6 and 7
Posts: 46

Original Poster
Rep: Reputation: 0
Answer found.

The answer to my problem is below:
Code:
## Just add the 'q' command to make
## sed quit after it prints the first match
windchill=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p;q}' "$weatherFile")
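Building on that answer, a sketch that also covers the summer months, when the page has no Wind Chill entry at all. The `N/A` placeholder and the function name `get_windchill` are hypothetical choices:

```shell
# get_windchill FILE: first wind-chill value, or N/A when the page has none.
get_windchill() {
  local value
  # q makes sed stop after the first Wind Chill match.
  value=$(sed -nr '/Wind Chill/{n;n;s/.*>(.*)<.*/\1/p;q}' "$1")
  # ${value:-N/A} substitutes N/A when sed printed nothing.
  printf '%s\n' "${value:-N/A}"
}
```

Usage: `windchill=$(get_windchill "$weatherFile")`.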


I also found that I could use the .xml file instead of the web page text.
Code:
data=$(wget -qO- http://www.weather.gc.ca/rss/city/on-118_e.xml)
windchill=$(echo "$data" |sed -nr '/Wind Chill:/{s/.*>\s(.*)\s<.*/\1/p;q}')
echo "Wind Chill: $windchill"
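The same sed can also read the feed straight from a pipe, so the downloaded text doesn't need to go through a variable and `echo`. A sketch only: the thread doesn't show the feed markup, so this assumes (as the regex in the answer implies) that the value has whitespace on both sides, e.g. a line shaped like `<b>Wind Chill:</b> -30 <br/>`. `[[:space:]]` is the POSIX spelling of GNU sed's `\s`; `windchill_from_feed` is a hypothetical name:

```shell
# windchill_from_feed: read feed text on stdin and print the first Wind Chill value.
# ASSUMPTION: the value sits between "> " and " <" on the "Wind Chill:" line.
windchill_from_feed() {
  sed -nr '/Wind Chill:/{s/.*>[[:space:]](.*)[[:space:]]<.*/\1/p;q}'
}
```

Usage: `windchill=$(wget -qO- http://www.weather.gc.ca/rss/city/on-118_e.xml | windchill_from_feed)`.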
 
02-17-2016, 08:11 PM #14
sgosnell
Senior Member
 
Registered: Jan 2008
Location: Baja Oklahoma
Distribution: Debian Stable and Unstable
Posts: 1,943

Rep: Reputation: 542
In the US, I would never use a webpage to get weather. That's the hard way. It's easy enough to get actual text weather reports from the NWS for any airport which reports weather, and they're easier to parse. But I don't know about the Great White North, things may be very different up there. :-P
 
  


Tags
bash


