07-15-2018, 12:24 AM
#1
Member
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68
Help with Text manipulation
I am not a programmer, but I have put together a web page and I am having problems with some of the coding.
I have a text file that I download with weather information in it and I want to extract certain bits of it.
Quote:
<dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd>
<dt>Date: </dt>
<dd class="mrgn-bttm-0">9:00 PM EDT Saturday 14 July 2018</dd>
</dl>
<div class="row no-gutters wb-eqht brdr-tp">
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col1">
<dt>Condition:</dt>
<dd class="mrgn-bttm-0">Partly Cloudy</dd>
<dt>Pressure:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">101.4 <abbr title="kilopascals">kPa</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">29.9 inches</dd>
<dt>Tendency:</dt>
<dd class="mrgn-bttm-0">Falling</dd>
</dl></div>
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col2">
<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">23.3°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">73.9°
<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">19.4°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">66.9°<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">78%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West">W</abbr> 10 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="West">W</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
<a href="https://www.canada.ca/en/environment-climate-change/services/seasonal-weather-hazards/spring-summer.html#heat_and_humidity">Humidex</a>:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">30</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">87</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
I need to extract the Visibility value (24) shown above.
I successfully retrieve the Temperature, which has the same structure, with the following:
Code:
tempStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -A1 "Temperature" | tail -1)
temperature=$(echo "$tempStr" | cut -d ">" -f 2 | cut -d "<" -f 1)
But when I try the same for Visibility it does not find anything for the first part, visibStr.
There must be a better way to do this and any help would be appreciated.
Last edited by gilesaj001; 07-15-2018 at 12:25 AM .
07-15-2018, 01:53 AM
#2
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 20,844
Quote:
Originally Posted by
gilesaj001
But when I try the same for Visibility it does not find anything for the first part, visibStr.
Do you understand what those commands are doing? I suspect the "-A30" is insufficient.
Quote:
There must be a better way to do this
Indeed - innumerable ways, but if it works and it's a "one-off", is it worth learning a new "language" just for this? If so, do a search for "cli xml parsing linux" or similar. There are stand-alone tools as well as packaged modules for languages such as Python and Perl.
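One way to check the "-A30" suspicion is to measure how far below the station line the Visibility label actually sits. The following is only a sketch: the sample file is made up (one station line, 40 filler lines, then the label), and the variable names start, label, distance, short and long are mine, not from the original script.

```shell
#!/bin/sh
# Hypothetical sample file: station line, 40 filler lines, then the label.
weatherFile=$(mktemp)
{
    echo "<dd>Ottawa Macdonald-Cartier Int'l Airport</dd>"
    i=1
    while [ "$i" -le 40 ]; do echo "<p>filler $i</p>"; i=$((i + 1)); done
    echo "<dt>Visibility:</dt>"
    echo '<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>'
} > "$weatherFile"

# Line numbers of the first station match and the first Visibility label.
start=$(grep -n "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | head -n 1 | cut -d: -f1)
label=$(grep -n "<dt>Visibility:</dt>" "$weatherFile" | head -n 1 | cut -d: -f1)
distance=$((label - start))
echo "Visibility label is $distance lines below the station line"

# Count Visibility matches inside a 30-line window and inside a wider one.
# "|| true" keeps the script going when grep -c finds nothing.
short=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -c "Visibility" || true)
long=$(grep -A"$((distance + 1))" "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -c "Visibility" || true)
echo "matches with -A30: $short, with a wider window: $long"
rm -f "$weatherFile"
```

Run against the real file, the same two grep -n lines show whether the label falls outside the 30-line window.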
07-15-2018, 02:07 AM
#3
LQ Guru
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 10,964
Beautiful Soup is a famous one for Python.
07-15-2018, 02:52 AM
#4
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 6,903
You'll need an actual HTML parser for dealing with HTML. The Python one mentioned above works. There are also Perl parsers like HTML::TreeBuilder::XPath or HTML::TokeParser. And there are even separate XPath utilities. The latter might be for you if you don't like scripting.
Just from the page snippet provided, the following XPath might work:
Code:
//dl[dt="Visibility:"]/dd[@class="mrgn-bttm-0 wxo-metric-hide"][2]
That will give you '24 km' as a result.
However, the positional index [2] only works as long as they don't change their layout.
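For anyone who wants to try an XPath from the shell, here is a sketch using xmllint (from libxml2), which is one of several XPath-capable tools and not necessarily the one meant above. It assumes xmllint is installed, runs on a simplified sample of the markup, and uses a different expression that takes the first dd after the Visibility dt instead of a positional [2].

```shell
#!/bin/sh
# Simplified sample of the page markup (the real Humidex <dt> also wraps a link).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<dl class="dl-horizontal wxo-conds-col3">
<dt>Humidex:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">30</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">87</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
</dl>
EOF

# First <dd> after the Visibility <dt>, with whitespace normalised.
xpath='normalize-space(//dt[.="Visibility:"]/following-sibling::dd[1])'

if command -v xmllint >/dev/null 2>&1; then
    # --html tolerates real-world HTML; parser warnings are discarded.
    visibility=$(xmllint --html --xpath "$xpath" "$sample" 2>/dev/null)
    echo "Visibility: $visibility"
else
    echo "xmllint is not installed; skipping" >&2
fi
rm -f "$sample"
```

Anchoring on the label text rather than on a position should survive small layout changes better, though any scraper breaks if the labels themselves change.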
Last edited by Turbocapitalist; 07-15-2018 at 02:54 AM .
07-15-2018, 08:54 PM
#5
Member
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68
Original Poster
I ended up using the following for the second part of the code.
Code:
visibStr=$(grep -A1 "<dt>Visibility:</dt>" "$weatherFile" | tail -1)
This thread can be closed.
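For later readers, the two steps put together look roughly like this. It is a sketch on a three-line sample, reusing the cut commands from the Temperature example in the first post; the final trim is my addition, because the value in this markup is followed by a space before the abbr tag.

```shell
#!/bin/sh
# Small sample containing only the Visibility lines from the quoted page.
weatherFile=$(mktemp)
cat > "$weatherFile" <<'EOF'
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
EOF

# Step 1: the line directly after the label.
visibStr=$(grep -A1 "<dt>Visibility:</dt>" "$weatherFile" | tail -1)
# Step 2: text between the first ">" and the next "<".
visibility=$(echo "$visibStr" | cut -d ">" -f 2 | cut -d "<" -f 1)
# Drop the single trailing space left in front of <abbr>.
visibility=${visibility% }
echo "Visibility: $visibility"
rm -f "$weatherFile"
```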
03-30-2023, 08:48 PM
#6
Member
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68
Original Poster
I am back again: the site I get the weather info from must have changed something yesterday, as none of my weather info works any more.
As I have said, this sed stuff scrambles my 73-year-old brain and I just can't get it.
So I am trying to extract the following info from the file, the text of which is below:
Temperature
Tendency
Visibility
Humidex
Condition
Tendency
Humidity
Wind (as in strength)
Wind (as in direction)
Pressure
An example of what I was using for Temperature is as follows:
Code:
weatherFile=/home/www/localhost/htdocs/scripts/weather/weather_temp_live.txt
tempStr=$(grep -A30 "Ottawa Macdonald-Cartier Int'l Airport" "$weatherFile" | grep -A1 "Temperature" | tail -1)
temperature=$(echo "$tempStr" | cut -d ">" -f 2 | cut -d "<" -f 1)
With the file below, that should return -2.9 (the metric Temperature value).
Any help would be appreciated as I am not a programmer.
The shortened format of weather_temp_live.txt is now as follows:
Code:
class="panel-heading"><h2>Current Conditions<span class="small visible-print-inline-block pull-right">Observed at: Ottawa Macdonald-Cartier Int'l Airport 1:00 AM EDT Wednesday 29 March 2023</span>
</h2></summary><ul class="hidden-print hidden-xs pull-right list-inline mrgn-rght-sm mrgn-bttm-0 wxo-moveup_cur">
<li>
<a class="wxo-metric-hide" href="/past_conditions/index_e.html?station=yow">Past 24 hours</a><a class="wxo-imperial-hide wxo-city-hidden" href="/past_conditions/index_e.html?station=yow">Past 24 hours</a>
</li>
<li class="brdr-lft"><a href="/map_e.html?layers=radar&zoom=-1&center=45.33,-75.58">Weather Radar</a></li>
<li class="brdr-lft"><a href="/satellite/index_e.html#goes_east">Satellite</a></li>
<li class="brdr-lft"><a href="/lightning/index_e.html">Lightning</a></li>
</ul>
<div class="row no-gutters wb-eqht hidden-print">
<div class="col-sm-2 brdr-rght text-center">
<img width="60" height="51" class="center-block mrgn-tp-md" src="/weathericons/31.gif" alt="Mainly Clear"><p class="visible-xs text-center">Mainly Clear</p>
<div>
<p class="text-center mrgn-tp-md mrgn-bttm-sm lead hidden-xs"><span class="wxo-metric-hide">-3°<abbr title="Celsius">C</abbr></span><span class="wxo-imperial-hide wxo-city-hidden">27°<abbr title="Fahrenheit">F</abbr></span></p>
<p class="text-center mrgn-tp-md mrgn-bttm-sm conds-lead visible-xs hidden-print"><span class="wxo-metric-hide">-3°<abbr title="Celsius">C</abbr></span><span class="wxo-imperial-hide wxo-city-hidden">27°<abbr title="Fahrenheit">F</abbr></span></p>
<ul class="list-inline list-unstyled text-center wxo-imperial-hide wxo-city-hidden hidden-print">
<li><a class="wxo-btn-metric-toggle" href="/city/pages/on-118_metric_e.html" title="Convert to Metric Units">°C</a></li>
<li class="brdr-lft">°<abbr title="Fahrenheit">F</abbr>
</li>
</ul>
<ul class="list-inline list-unstyled text-center wxo-metric-hide hidden-print">
<li>°<abbr title="Celsius">C</abbr>
</li>
<li class="brdr-lft"><a class="wxo-btn-imperial-toggle" href="/city/city_imperial_e.html?id=on-118" data-link-id="on-118_metric_e.html" title="Convert to Imperial Units">°F
</a></li>
</ul>
</div>
</div>
<div class="col-sm-10 text-center">
<dl class="dl-horizontal mrgn-bttm-0 hidden-xs wxo-conds-tmp mrgn-tp-sm">
<dt>Observed at:</dt>
<dd class="mrgn-bttm-0">Ottawa Macdonald-Cartier Int'l Airport</dd>
<dt>Date: </dt>
<dd class="mrgn-bttm-0">1:00 AM EDT Wednesday 29 March 2023</dd>
</dl>
<div class="row no-gutters wb-eqht brdr-tp">
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col1">
<dt>Condition:</dt>
<dd class="mrgn-bttm-0">Mainly Clear</dd>
<dt>Pressure:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">102.1 <abbr title="kilopascals">kPa</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">30.2 inches</dd>
<dt>Tendency:</dt>
<dd class="mrgn-bttm-0">Falling</dd>
</dl></div>
<div class="col-sm-4 brdr-rght-city"><dl class="dl-horizontal wxo-conds-col2">
<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-2.9°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">26.8°
<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-7.5°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">18.5°<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">71%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="Southeast">SE</abbr> 4 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="Southeast">SE</abbr> 2 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
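A sketch of one way to pull several of the listed fields from markup like the above with a single helper. The function get_field is hypothetical (my name, not from the thread). It assumes each label appears as a one-line "<dt>Label:</dt>" and that the metric value is in the first dd after it, which holds for the lines shown here but not for the Humidex label, whose dt wraps a link.

```shell
#!/bin/sh
# Hypothetical helper: print the text of the first <dd> that follows
# "<dt>LABEL:</dt>", with all tags removed and whitespace tidied.
get_field() {
    # $1 = label without the colon, $2 = file to read
    awk -v label="$1" '
        !found && index($0, "<dt>" label ":</dt>") { found = 1; next }
        found {
            buf = buf " " $0                      # collect the value lines
            if ($0 ~ /<\/dd>/) {                  # first <dd> is complete
                gsub(/<[^>]*>/, "", buf)          # drop every tag
                gsub(/^[ \t]+|[ \t]+$/, "", buf)  # trim both ends
                gsub(/[ \t]+/, " ", buf)          # collapse inner spaces
                print buf
                exit
            }
        }
    ' "$2"
}

# Sample lines copied from the post above.
weatherFile=$(mktemp)
cat > "$weatherFile" <<'EOF'
<dt>Condition:</dt>
<dd class="mrgn-bttm-0">Mainly Clear</dd>
<dt>Pressure:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">102.1 <abbr title="kilopascals">kPa</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">30.2 inches</dd>
<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-2.9°<abbr title="Celsius">C</abbr>
</dd>
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="Southeast">SE</abbr> 4 <abbr title="kilometres per hour">km/h</abbr>
</dd>
EOF

condition=$(get_field "Condition" "$weatherFile")
pressure=$(get_field "Pressure" "$weatherFile")
temperature=$(get_field "Temperature" "$weatherFile")
wind=$(get_field "Wind" "$weatherFile")
windDir=${wind%% *}                              # first word: direction
windSpeed=$(echo "$wind" | cut -d " " -f 2)      # second word: strength
rm -f "$weatherFile"
printf '%s | %s | %s | %s %s\n' "$condition" "$pressure" "$temperature" "$windDir" "$windSpeed"
```

Because the helper collects lines until the closing dd, it also copes with Wind, where the value sits on its own line rather than on the dd line.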
03-30-2023, 10:43 PM
#7
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,241
Y: things change...
I honestly agree with posts 2, 3 and 4 - use a dedicated tool or a language that has good modules for parsing, e.g. Perl or Python.
Doing it by hand can be educational, but also tedious ....
03-31-2023, 07:19 AM
#8
Senior Member
Registered: Dec 2015
Location: Non. Je suis propriétaire – No. I am an owner.
Distribution: Apple-selling shops, markets and direct marketing
Posts: 1,493
An XML-parser can be useful in many contexts. If you must learn a new technology to get along, choose one of those.
03-31-2023, 08:19 AM
#9
Senior Member
Registered: Oct 2004
Distribution: Arch
Posts: 4,796
Quote:
I am back again as the site I get the weather info from must have changed something yesterday
Post the url for the weather web page.
03-31-2023, 08:22 AM
#10
Senior Member
Registered: Oct 2004
Distribution: Arch
Posts: 4,796
03-31-2023, 08:36 AM
#11
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 6,903
Quote:
Originally Posted by
teckk
That one returns JSON, e.g.
https://api.weather.gov/points/39.7456,-97.0892
So a proper JSON parser will be needed. Again Perl or Python can be recommended for doing that.
03-31-2023, 09:25 AM
#12
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,261
Quote:
Originally Posted by
Turbocapitalist
Good find. I would suggest Python too, but you can also use jq to parse JSON documents, so
Code:
curl https://api.weather.gov/points/39.7456,-97.0892 | jq ...
can be used too.
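To give an idea of what a jq filter looks like, here is a sketch on a made-up JSON string. The field names (properties, temperature, value, textDescription) are assumptions for illustration only and are not taken from the response of the URL above. It also assumes jq is installed.

```shell
#!/bin/sh
# Hypothetical JSON; the layout is invented for this example.
json='{"properties":{"temperature":{"value":4.3,"unitCode":"wmoUnit:degC"},"textDescription":"Mostly Cloudy"}}'

if command -v jq >/dev/null 2>&1; then
    # -r prints raw strings without the surrounding JSON quotes.
    temperature=$(printf '%s' "$json" | jq -r '.properties.temperature.value')
    condition=$(printf '%s' "$json" | jq -r '.properties.textDescription')
    echo "Temperature: $temperature, Condition: $condition"
else
    echo "jq is not installed; skipping" >&2
fi
```

With a real API the filter path has to follow whatever structure that service returns, so it is worth looking at the raw output first.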
04-04-2023, 01:14 AM
#14
Member
Registered: Apr 2017
Location: Australia
Distribution: Ubuntu
Posts: 68
Original Poster
I have the weather info updated using this website
https://wttr.in/Ottawa . I followed an example and have it running but the info does not seem to be up to date. It is now Tue Apr 4 01:45:42 EDT 2023 in Ottawa but the weather info from the wttr web site is "2023-04-03 10:56 PM", "observation_time": "02:56 AM", almost a day behind.
https://dingo-den.com
I have downloaded the weather.gc.ca weather info into a file called weather-data.txt
Code:
lynx -dump "https://weather.gc.ca/city/pages/on-118_metric_e.html" > weather-data.txt
But I still have the same problem even though the format is better.
I have been working on this for about two days and would like to get the weather.gc.ca data as I know it is correct so any help would be appreciated.
The file output for the weather.gc.ca is attached:
The data that I am interested in is below. Just one example in Ubuntu script syntax to get this data from the file will get me started.
Code:
Observed at:
Ottawa Macdonald-Cartier Int'l Airport
Date:
1:00 AM EDT Tuesday 4 April 2023
Condition:
Mostly Cloudy
Pressure:
101.5 kPa
Tendency:
Rising
Temperature:
4.3°C
Dew point:
-0.9°C
Humidity:
69%
Wind:
NNW 21 km/h
Visibility:
24 km
Mostly Cloudy
4°C
Condition:
Mostly Cloudy
Pressure:
101.5 kPa
Tendency:
Rising
Temperature:
4.3°C
Dew point:
-0.9°C
Humidity:
69%
Wind:
NNW 21 km/h
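Since the dump puts each label on one line and its value on the next, a small helper can read them as pairs. This is a sketch only: get_value is a hypothetical name, the sample is copied from the data above, and it assumes the real weather-data.txt keeps the same label-then-value layout (leading spaces and blank lines are tolerated).

```shell
#!/bin/sh
# Hypothetical helper: print the first non-empty line after "LABEL:".
get_value() {
    # $1 = label without the colon, $2 = file to read
    awk -v label="$1:" '
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }  # trim the line
        want && line != "" { print line; exit }            # value line
        line == label { want = 1 }                         # label line seen
    ' "$2"
}

# Sample lines copied from the post above.
dataFile=$(mktemp)
cat > "$dataFile" <<'EOF'
Condition:
Mostly Cloudy
Pressure:
101.5 kPa
Temperature:
4.3°C
Humidity:
69%
Wind:
NNW 21 km/h
Visibility:
24 km
EOF

temperature=$(get_value "Temperature" "$dataFile")
condition=$(get_value "Condition" "$dataFile")
wind=$(get_value "Wind" "$dataFile")
visibility=$(get_value "Visibility" "$dataFile")
rm -f "$dataFile"
printf '%s | %s | %s | %s\n' "$condition" "$temperature" "$wind" "$visibility"
```

Only the first occurrence of each label is used, so the repeated block further down the dump does no harm.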
04-04-2023, 01:23 AM
#15
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,261
Code:
awk '/[[]37]/,/[[]39]/'
is almost perfect; you just need to cut a few lines.
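A sketch of how that awk range behaves, and one way to cut the extra lines. The sample is invented: what the lines carrying [37] and [39] actually contain in the attached file is not shown in this thread, so the link texts here are placeholders.

```shell
#!/bin/sh
# Hypothetical dump; the [36], [37] and [39] lines are placeholders.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[36]Lightning
[37]Past 24 hours
Condition:
Mostly Cloudy
Temperature:
4.3°C
[39]Humidex
Footer text
EOF

# The range prints from the first line containing "[37]" through the next
# line containing "[39]"; sed then drops the first and last of those lines.
block=$(awk '/[[]37]/,/[[]39]/' "$dump" | sed '1d;$d')
printf '%s\n' "$block"
rm -f "$dump"
```

The link numbers come from lynx and can shift whenever the page gains or loses a link, so they are worth re-checking if the output suddenly looks wrong.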