LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
07-15-2018, 12:24 AM   #1
gilesaj001
Member
 
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68

Rep: Reputation: 0
Help with Text manipulation


I am not a programmer, but I have put together a web page and I am having problems with some of the coding.

I have a text file that I download with weather information in it and I want to extract certain bits of it.

Quote:
<dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd>
<dt>Date: </dt>
<dd class="mrgn-bttm-0">9:00 PM EDT Saturday 14 July 2018</dd>
</dl>
<div class="row no-gutters wb-eqht brdr-tp">
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col1">
<dt>Condition:</dt>
<dd class="mrgn-bttm-0">Partly Cloudy</dd>
<dt>Pressure:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">101.4 <abbr title="kilopascals">kPa</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">29.9 inches</dd>
<dt>Tendency:</dt>
<dd class="mrgn-bttm-0">Falling</dd>
</dl></div>
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col2">
<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">23.3°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">73.9°
<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">19.4°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">66.9°<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">78%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West">W</abbr> 10 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="West">W</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
<a href="https://www.canada.ca/en/environment-climate-change/services/seasonal-weather-hazards/spring-summer.html#heat_and_humidity">Humidex</a>:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">30</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">87</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
I need to extract the Visibility value (24 in the snippet above).

I can successfully retrieve the Temperature, which has the same structure, with the following:

Code:
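# Take the 30 lines after the station name, find "Temperature" and keep the line after it
tempStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -A1 "Temperature" | tail -1)

# Keep the text between the first ">" and the next "<" (e.g. 23.3°)
temperature=$(echo "$tempStr" | cut -d ">" -f 2 | cut -d "<" -f 1)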
But when I try the same for Visibility, it does not find anything for the first part, visibStr.

There must be a better way to do this, and any help would be appreciated.

Last edited by gilesaj001; 07-15-2018 at 12:25 AM.
 
07-15-2018, 01:53 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 20,844

Rep: Reputation: 4008
Quote:
Originally Posted by gilesaj001
But when I try the same for Visibility, it does not find anything for the first part, visibStr.
Do you understand what those commands are doing? I suspect the "-A30" is insufficient: it only prints the 30 lines after the station name, and Visibility may be further down than that.
Quote:
There must be a better way to do this
Indeed, innumerable ways. But if it works and it's a "one-off", is it worth learning a new "language" just for this? If so, do a search for "cli xml parsing linux" or similar. There are stand-alone tools as well as packaged modules for languages such as Python and Perl.
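One way to check the "-A30" suspicion is to let grep report line numbers. In the 2018 snippet as pasted, the Visibility label sits 42 lines below the station name while Temperature is only 16 lines below, which would explain why one works and the other does not. The helper below is a minimal sketch; the function name label_gap is hypothetical and the search strings are copied from the snippet.

```shell
#!/usr/bin/env bash
# Sketch: report how many lines separate the station name from the
# "<dt>Visibility:</dt>" label. If the gap is larger than 30, "grep -A30"
# stops printing before it reaches the Visibility line.
# label_gap is a hypothetical helper name, not part of any tool.
label_gap() {
    local file=$1
    local station visib
    # grep -n prefixes each match with "NUMBER:"; keep the first match's number
    station=$(grep -n "Ottawa Macdonald-Cartier Int'l Airport" "$file" | head -n 1 | cut -d: -f1)
    visib=$(grep -n "<dt>Visibility:</dt>" "$file" | head -n 1 | cut -d: -f1)
    if [ -z "$station" ] || [ -z "$visib" ]; then
        echo "station name or Visibility label not found" >&2
        return 1
    fi
    echo $((visib - station))
}

# Usage:
#   label_gap "$weatherFile"
```

If the number printed is above 30, either raise the -A value or search for the label directly instead of for the station name.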
 
07-15-2018, 02:07 AM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,964

Rep: Reputation: 5217
Beautiful Soup is a famous one for Python.
 
07-15-2018, 02:52 AM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 6,903
Blog Entries: 3

Rep: Reputation: 3585
You'll need an actual HTML parser for dealing with HTML. The Python one mentioned above works. There are also Perl parsers like HTML::TreeBuilder::XPath or HTML::TokeParser, and there are even stand-alone XPath utilities. The latter might suit you if you don't like scripting.

Just from the page snippet provided, the following XPath might work:

Code:
//dl[dt="Visibility:"]/dd[@class="mrgn-bttm-0 wxo-metric-hide"][2]
That will give you '24 km' as a result.

However, relying on the relative position [2] means hoping that they don't change their layout.
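A sketch of running an XPath query from the shell, assuming the xmllint tool from libxml2 is installed (on Ubuntu it comes in the libxml2-utils package). Instead of the position [2], it looks for the <dt> whose text is "Visibility:" and takes the first <dd> after it, which in the snippet is the metric value. The function name visibility_xpath is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: extract the metric Visibility value with XPath instead of line counting.
# Assumes xmllint (libxml2-utils on Ubuntu) is installed; visibility_xpath is a
# made-up name for illustration.
visibility_xpath() {
    local file=$1
    # --html parses with libxml2's forgiving HTML parser instead of strict XML.
    # The query picks the <dt> whose text is "Visibility:", takes the first <dd>
    # that follows it and collapses its whitespace, giving e.g. "24 km".
    # Parser warnings about untidy HTML are discarded via 2>/dev/null.
    xmllint --html --xpath \
        'normalize-space(//dt[normalize-space()="Visibility:"]/following-sibling::dd[1])' \
        "$file" 2>/dev/null
}

# Usage:
#   visibility=$(visibility_xpath "$weatherFile")
```

This still assumes the metric <dd> comes first after its label, but it no longer depends on how many other <dd> elements share the same class.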

Last edited by Turbocapitalist; 07-15-2018 at 02:54 AM.
 
07-15-2018, 08:54 PM   #5
gilesaj001
Member
 
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68

Original Poster
Rep: Reputation: 0
I ended up using the following for the second part of the code.

Code:
visibStr=$(grep -A1 "<dt>Visibility:</dt>" "$weatherFile" | tail -1)
This thread can be closed.
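The same fix can be wrapped in a small function that takes the label as a parameter and strips the tags with sed instead of the two cut calls. This is a sketch: dd_after is a hypothetical name, and it only works where the value is on the line right after the <dt> label. In the snippet that is not the case for Wind, whose value starts one line further down, or for Humidex, whose <dt> is split over two lines.

```shell
#!/usr/bin/env bash
# Sketch generalising the fix above: print the line after "<dt>LABEL:</dt>"
# with every HTML tag and the surrounding whitespace removed.
# dd_after is a hypothetical helper name.
dd_after() {
    local label=$1 file=$2
    # -m1 stops at the first match; -A1 still prints the line after it;
    # tail keeps only that following line.
    grep -m1 -A1 "<dt>$label:</dt>" "$file" | tail -n 1 |
        sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Usage (only for values on the line right after the label):
#   visibility=$(dd_after "Visibility" "$weatherFile")    # "24 km" in the 2018 snippet
#   temperature=$(dd_after "Temperature" "$weatherFile")  # "23.3°C" in the 2018 snippet
```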
 
03-30-2023, 08:48 PM   #6
gilesaj001
Member
 
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68

Original Poster
Rep: Reputation: 0
I am back again: the site I get the weather info from must have changed something yesterday, as none of my weather info works any more.
As I have said, this sed stuff scrambles my 73-year-old brain and I just can't get it.

So I am trying to extract the following info from the file, whose text is below:

Temperature
Tendency
Visibility
Humidex
Condition
Humidity
Wind (speed)
Wind (direction)
Pressure

An example of what I was using for Temperature is as follows:

Code:
weatherFile=/home/www/localhost/htdocs/scripts/weather/weather_temp_live.txt

tempStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -A1 "Temperature" | tail -1)

temperature=$(echo "$tempStr" | cut -d ">" -f 2 | cut -d "<" -f 1)
In the file below, that should return -2.9.

Any help would be appreciated as I am not a programmer.


The shortened format of weather_temp_live.txt is now as follows:

Code:
class="panel-heading"><h2>Current Conditions<span class="small visible-print-inline-block pull-right">Observed at: Ottawa Macdonald-Cartier Int'l Airport   1:00 AM EDT Wednesday 29 March 2023</span>
</h2></summary><ul class="hidden-print hidden-xs pull-right list-inline mrgn-rght-sm mrgn-bttm-0 wxo-moveup_cur">
<li>
<a class="wxo-metric-hide" href="/past_conditions/index_e.html?station=yow">Past 24 hours</a><a class="wxo-imperial-hide wxo-city-hidden" href="/past_conditions/index_e.html?station=yow">Past 24 hours</a>
</li>
<li class="brdr-lft"><a href="/map_e.html?layers=radar&amp;zoom=-1&amp;center=45.33,-75.58">Weather Radar</a></li>
<li class="brdr-lft"><a href="/satellite/index_e.html#goes_east">Satellite</a></li>
<li class="brdr-lft"><a href="/lightning/index_e.html">Lightning</a></li>
</ul>
<div class="row no-gutters wb-eqht hidden-print">
<div class="col-sm-2 brdr-rght  text-center">
<img width="60" height="51" class="center-block mrgn-tp-md" src="/weathericons/31.gif" alt="Mainly Clear"><p class="visible-xs text-center">Mainly Clear</p>
<div>
<p class="text-center mrgn-tp-md mrgn-bttm-sm lead hidden-xs"><span class="wxo-metric-hide">-3°<abbr title="Celsius">C</abbr></span><span class="wxo-imperial-hide wxo-city-hidden">27°<abbr title="Fahrenheit">F</abbr></span></p>
<p class="text-center mrgn-tp-md mrgn-bttm-sm conds-lead visible-xs hidden-print"><span class="wxo-metric-hide">-3°<abbr title="Celsius">C</abbr></span><span class="wxo-imperial-hide wxo-city-hidden">27°<abbr title="Fahrenheit">F</abbr></span></p>
<ul class="list-inline list-unstyled text-center wxo-imperial-hide wxo-city-hidden hidden-print">
<li><a class="wxo-btn-metric-toggle" href="/city/pages/on-118_metric_e.html" title="Convert to Metric Units">°C</a></li>
<li class="brdr-lft">°<abbr title="Fahrenheit">F</abbr>
</li>
</ul>
<ul class="list-inline list-unstyled text-center wxo-metric-hide hidden-print">
<li>°<abbr title="Celsius">C</abbr>
</li>
<li class="brdr-lft"><a class="wxo-btn-imperial-toggle" href="/city/city_imperial_e.html?id=on-118" data-link-id="on-118_metric_e.html" title="Convert to Imperial Units">°F
                        </a></li>
</ul>
</div>
</div>
<div class="col-sm-10 text-center">
<dl class="dl-horizontal mrgn-bttm-0 hidden-xs wxo-conds-tmp mrgn-tp-sm">
<dt>Observed at:</dt>
<dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd>
<dt>Date: </dt>
<dd class="mrgn-bttm-0">1:00 AM EDT Wednesday 29 March 2023</dd>
</dl>
<div class="row no-gutters wb-eqht brdr-tp">
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col1">
<dt>Condition:</dt>
<dd class="mrgn-bttm-0">Mainly Clear</dd>
<dt>Pressure:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">102.1 <abbr title="kilopascals">kPa</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">30.2 inches</dd>
<dt>Tendency:</dt>
<dd class="mrgn-bttm-0">Falling</dd>
</dl></div>
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col2">
<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-2.9°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">26.8°
					<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-7.5°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">18.5°<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">71%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="Southeast">SE</abbr> 4 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="Southeast">SE</abbr> 2 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
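A sketch that copes with values spread over several lines (Wind) and labels split across lines (Humidex): strip all tags, trim whitespace, drop empty lines, then print the first line after the one that reads "Label:". It assumes, as in the sample above, that each value comes straight after its <dt> label and that the metric value comes before the imperial one; html_field is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: turn the raw HTML into plain text, one non-empty item per line,
# then print the line that follows the first "LABEL:" line.
# Assumes each value directly follows its <dt> label (true in the sample);
# html_field is a hypothetical helper name.
html_field() {
    local label=$1 file=$2
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$file" |
        grep -v '^$' |
        awk -v want="$label:" '$0 == want { if ((getline line) > 0) print line; exit }'
}

# Usage:
#   weatherFile=/home/www/localhost/htdocs/scripts/weather/weather_temp_live.txt
#   temperature=$(html_field "Temperature" "$weatherFile")  # "-2.9°C" in the sample
#   wind=$(html_field "Wind" "$weatherFile")                # "SE 4 km/h" in the sample
#   windDir=$(echo "$wind" | awk '{print $1}')              # "SE"
#   windSpeed=$(echo "$wind" | awk '{print $2}')            # "4"
```

The same call works for Condition, Pressure, Tendency, Humidity, Humidex and Visibility, since each of those labels is followed by its metric value once the tags are gone.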
 
03-30-2023, 10:43 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,241

Rep: Reputation: 2713
Y: things change...

I honestly agree with posts #2, #3 and #4: use a dedicated tool, or a language that has good parsing modules, e.g. Perl or Python.

Doing it by hand can be educational, but also tedious...
 
03-31-2023, 07:19 AM   #8
Michael Uplawski
Senior Member
 
Registered: Dec 2015
Location: Non. Je suis propriétaire – No. I am an owner.
Distribution: Apple-selling shops, markets and direct marketing
Posts: 1,493
Blog Entries: 39

Rep: Reputation: 774
An XML-parser can be useful in many contexts. If you must learn a new technology to get along, choose one of those.
 
03-31-2023, 08:19 AM   #9
teckk
Senior Member
 
Registered: Oct 2004
Distribution: Arch
Posts: 4,796
Blog Entries: 6

Rep: Reputation: 1711
Quote:
I am back again as the site I get the weather info from must have changed something yesterday
Post the url for the weather web page.
 
03-31-2023, 08:22 AM   #10
teckk
Senior Member
 
Registered: Oct 2004
Distribution: Arch
Posts: 4,796
Blog Entries: 6

Rep: Reputation: 1711
https://forecast.weather.gov/data/obhistory/CYOW.html
https://w1.weather.gov/data/obhistory/CYOW.html
 
03-31-2023, 08:36 AM   #11
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 6,903
Blog Entries: 3

Rep: Reputation: 3585
That one returns JSON, e.g.

https://api.weather.gov/points/39.7456,-97.0892

So a proper JSON parser will be needed. Again Perl or Python can be recommended for doing that.
 
03-31-2023, 09:25 AM   #12
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,261

Rep: Reputation: 6838
Quote:
Originally Posted by Turbocapitalist
That one returns JSON, e.g.

https://api.weather.gov/points/39.7456,-97.0892

So a proper JSON parser will be needed. Again Perl or Python can be recommended for doing that.
Good find. I would suggest Python too, but you can actually use jq to parse JSON documents, so
Code:
curl https://api.weather.gov/points/39.7456,-97.0892 | jq ...
can be used too
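A sketch of the jq route, assuming the points response carries the forecast URL at .properties.forecast (that is my reading of the API's JSON, so check it against a real response). forecast_url is a hypothetical name; it reads the JSON on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the forecast URL from an api.weather.gov "points" response.
# The path .properties.forecast is an assumption about the response layout;
# jq prints "null" if that field is missing. forecast_url is a made-up name.
forecast_url() {
    # -r prints the string without JSON quotes
    jq -r '.properties.forecast'
}

# Usage:
#   curl -s https://api.weather.gov/points/39.7456,-97.0892 | forecast_url
```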
 
03-31-2023, 11:41 PM   #13
gilesaj001
Member
 
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by teckk
Post the url for the weather web page.

http://weather.gc.ca/city/pages/on-118_metric_e.html
 
04-04-2023, 01:14 AM   #14
gilesaj001
Member
 
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68

Original Poster
Rep: Reputation: 0
I have the weather info updated using the website https://wttr.in/Ottawa. I followed an example and have it running, but the info does not seem to be up to date. It is now Tue Apr 4 01:45:42 EDT 2023 in Ottawa, but the weather info from the wttr.in site is "2023-04-03 10:56 PM", "observation_time": "02:56 AM", almost a day behind.
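The two wttr.in timestamps quoted above may not disagree. If "observation_time" is in UTC, as it appears to be, then 02:56 AM UTC on 4 April is 10:56 PM EDT on 3 April (EDT is UTC-4), which matches the "2023-04-03 10:56 PM" local stamp; at 01:45 EDT the observation would be just under three hours old rather than a day. A minimal sketch of the conversion with GNU date; utc_to_eastern is a hypothetical name, and the UTC reading of observation_time is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: convert a UTC date/time to Eastern time (EST/EDT) with GNU date.
# Assumption: wttr.in's "observation_time" is UTC. The POSIX TZ rule below
# encodes Eastern time directly, so no tz database is needed.
# utc_to_eastern is a hypothetical helper name.
utc_to_eastern() {
    # LC_ALL=C keeps the AM/PM marker in English regardless of locale
    LC_ALL=C TZ='EST5EDT,M3.2.0,M11.1.0' date -d "$1 UTC" '+%Y-%m-%d %I:%M %p %Z'
}

# Usage:
#   utc_to_eastern '2023-04-04 02:56'    # prints 2023-04-03 10:56 PM EDT
```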

https://dingo-den.com

I have downloaded the weather.gc.ca weather info into a file called weather-data.txt

Code:
lynx -dump "https://weather.gc.ca/city/pages/on-118_metric_e.html" > weather-data.txt
But I still have the same problem even though the format is better.

I have been working on this for about two days and would like to use the weather.gc.ca data, as I know it is correct, so any help would be appreciated.

The weather.gc.ca output file is attached.

The data that I am interested in is below. Just one example, as an Ubuntu shell script, of getting this data from the file will get me started.

Code:
Observed at:
          Ottawa Macdonald-Cartier Int'l Airport

   Date:
          1:00 AM EDT Tuesday 4 April 2023

   Condition:
          Mostly Cloudy

   Pressure:
          101.5 kPa

   Tendency:
          Rising

   Temperature:
          4.3°C

   Dew point:
          -0.9°C

   Humidity:
          69%

   Wind:
          NNW 21 km/h

   Visibility:
          24 km

   Mostly Cloudy

   4°C

   Condition:
          Mostly Cloudy

   Pressure:
          101.5 kPa

   Tendency:
          Rising

   Temperature:
          4.3°C

   Dew point:
          -0.9°C

   Humidity:
          69%

   Wind:
          NNW 21 km/h
Attached Files
File Type: txt weather-data.txt (33.5 KB, 4 views)
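For the lynx -dump output, where each label sits on its own line and the value on the next non-blank line, a short awk program is enough. A minimal sketch, assuming the layout shown above; lynx_field is a hypothetical name. Because it stops at the first match, the repeated second block of conditions in the dump is ignored.

```shell
#!/usr/bin/env bash
# Sketch for "lynx -dump" output: find the first line that reads "LABEL:"
# (ignoring indentation) and print the next non-blank line, trimmed.
# Assumes the label/value layout shown in the sample; lynx_field is made up.
lynx_field() {
    local label=$1 file=$2
    awk -v want="$label:" '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim both ends of the line
        found && $0 != "" { print; exit }            # first non-blank line after the label
        $0 == want { found = 1 }                     # remember that the label was seen
    ' "$file"
}

# Usage:
#   temperature=$(lynx_field "Temperature" weather-data.txt)   # "4.3°C" in the sample
#   visibility=$(lynx_field "Visibility" weather-data.txt)     # "24 km" in the sample
```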
 
04-04-2023, 01:23 AM   #15
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,261

Rep: Reputation: 6838
Code:
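# Print from the line containing the lynx link marker [37] through the line containing [39]
# ("[[]" is a bracket expression matching a literal "[")
awk '/[[]37]/,/[[]39]/'
is almost perfect; you just need to cut a few lines.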
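Rather than depending on the [37] and [39] link numbers, which may shift whenever the page changes, the whole dump can be turned into Label=value lines. A sketch, assuming labels start with a capital letter, end with a colon and sit alone on their line as in the sample in post #14; lynx_pairs is a hypothetical name. Only the first occurrence of each label is kept, since the dump repeats the conditions.

```shell
#!/usr/bin/env bash
# Sketch: print every "Label:" / value pair of a lynx dump as "Label=value",
# keeping only the first occurrence of each label.
# Assumes labels look like "Dew point:" alone on a line; lynx_pairs is made up.
lynx_pairs() {
    awk '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim both ends
        $0 == "" { next }                            # skip blank lines
        label != "" {                                # this line is the value of the pending label
            if (!(label in seen)) { seen[label] = 1; print label "=" $0 }
            label = ""
            next
        }
        /^[A-Z][A-Za-z ]*:$/ { label = substr($0, 1, length($0) - 1) }   # drop the colon
    ' "$1"
}

# Usage:
#   visibility=$(lynx_pairs weather-data.txt | sed -n 's/^Visibility=//p')
```

A label prefixed with a lynx link marker (for example "[12]Humidex:") would not match the pattern, so such lines need the marker stripped first.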
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration