LinuxQuestions.org
Old 05-01-2019, 07:54 AM   #16
individual (Member; Registered: Jul 2018; Posts: 315)
Quote:
Originally Posted by gilesaj001 View Post
I never did this before. I use mainly gui except for commands that I use all the time on the server.
That would explain a lot. OK, go ahead and run rm -rf pup-master, as that directory contains the source code for the program, but not the compiled program itself.
Assuming you have a 64 bit system, download this version of it.
If you have a 32 bit system, download this one.
Unzip it and place it somewhere. I have a bin directory in my home directory that I place one-off programs in. If you decide to do that, add "$HOME/bin" to your PATH.
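A minimal sketch of that setup, assuming a bash login shell and that the unzipped binary ends up at ~/bin/pup (the file locations here are examples, not part of the original instructions):
Code:
# One-time setup: create ~/bin and make sure the binary is executable
mkdir -p "$HOME/bin"
chmod +x "$HOME/bin/pup"

# Make ~/bin searchable in future bash sessions
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Verify that the shell now finds it
command -v pup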
 
Old 05-01-2019, 07:58 AM   #17
teckk (LQ Guru; Registered: Oct 2004; Distribution: Arch; Posts: 5,140)
https://www.dropbox.com/s/gxy3vd7o3r...alert.txt?dl=0
I downloaded that source to test.html. You won't get the content you want from that page unless you run the scripts on it, so curl alone won't help; you'll need something that runs scripts. There is also a login that must be passed before you can get the text, so even if you filled out that form with curl, you still wouldn't get the content, because it is delivered by script.

I got the source with scripts run using Python/QtWebEngine. You could use BeautifulSoup, Selenium, Node.js, whatever you want.
Code:
#! /usr/bin/env python

#Get source with scripts run using Python3/PyQt5/qt5-webengine
#Usage:
#script.py <url> <local filename>
#or script.py and answer prompts

import sys
from PyQt5.QtWebEngineWidgets import (QWebEnginePage, 
                        QWebEngineProfile, QWebEngineView)
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl

agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64; rv:65.0)'
        ' Gecko/20100101 Firefox/65.0')

class Source(QWebEnginePage):
    def __init__(self, url, _file):
        self.app = QApplication([])
        QWebEnginePage.__init__(self)
        
        self.agent = QWebEngineProfile(self)
        self.agent.defaultProfile().setHttpUserAgent(agent)
        
        self._file = _file
        self.load(QUrl(url))
        self.loadFinished.connect(self.on_load_finished)
        self.app.exec_()
        
    def on_load_finished(self):
        self.html = self.toHtml(self.write_it)

    def write_it(self, data):
        self.html = data
        with open (self._file, 'w') as f:
            f.write (self.html)
        print ('\nFinished\nFile saved to ' + (self._file))
        self.app.quit()

if __name__ == '__main__':
    #Open with arguments or prompt for input
    if len(sys.argv) > 2:
        url = (sys.argv[1])
        _file = (sys.argv[2])
    else:
        url = input('Enter/Paste url for source: ')
        _file = input('Enter output file name: ')
    Source(url, _file)
I opened that file with dillo, and could see all the info I wanted.
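A minimal usage sketch, assuming the script above is saved as get_source.py (the file names and URL are examples only):
Code:
# On a headless server, Qt may need: export QT_QPA_PLATFORM=offscreen

# Render the page with its scripts run and save the resulting HTML
python get_source.py "https://weather.gc.ca/city/pages/on-118_metric_e.html" test.html

# The saved file is plain HTML/text and can be inspected or parsed locally
less test.html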

You would do better to post your source to someplace like:
Code:
cat test.html | curl -F 'sprunge=<-' http://sprunge.us
http://sprunge.us/StUKF3
And that is what I did with that source.
http://sprunge.us/StUKF3

Then it's just a text file and you can parse it easily enough.

Parse that however you wish. Get it from there, save it to file.html, and parse the HTML file. An HTML file is just text. You can use awk to parse the tags in an HTML file, or a little Python. In other words, get that source into a file and practice parsing it.
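A rough illustration of that idea (a sketch only; the tag and attribute names depend on the page actually saved to file.html):
Code:
# Print the text inside every <title> tag: split records on "<" and fields on ">"
awk -v RS='<' -F'>' '/^title/ {print $2}' file.html

# List every href attribute in the file
grep -o 'href="[^"]*"' file.html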
 
Old 05-01-2019, 08:15 AM   #18
dugan (LQ Guru; Registered: Nov 2003; Location: Canada; Distribution: distro hopper; Posts: 11,235)
To download Pup, click on "Releases" and then click on "pup_v0.4.0_linux_amd64.zip". That zip file should have an executable binary in it.

Most distros have ~/.local/bin in the PATH. You put the pup binary there. You might need to run "rehash" afterwards.
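A rough sketch of those steps, assuming ~/.local/bin already exists and is on the PATH (the release file name below matches the one mentioned above; newer releases may differ):
Code:
# After downloading pup_v0.4.0_linux_amd64.zip from the project's Releases page:
unzip pup_v0.4.0_linux_amd64.zip -d "$HOME/.local/bin"
chmod +x "$HOME/.local/bin/pup"

# "rehash" is a csh/zsh builtin; in bash, "hash -r" clears the command lookup cache
hash -r
command -v pup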
 
Old 05-01-2019, 08:45 AM   #19
gilesaj001 (Original Poster; Member; Registered: Apr 2017; Location: Australia; Distribution: Ubuntu; Posts: 79)
Thanks, I have it working now, but the text is too much. I will have to look at it further and learn how pup works.

Thanks for all your help

EDIT: This is what I ended up doing on the web page:

Code:
<p>"If there is an Alert or Warning it will appear under this text"</p>

<?php
$message = 0;
$message = shell_exec("PATH to script/find-warnings.sh 2>&1");
if (empty($message)) {
    echo "No Alerts in Effect";
} else {
    print_r($message);
}
?>


This is what I used to get the name of the alert to put on the image:

Code:
WARN2=`grep "col-xs-10" $weatherFile | awk -F'>' '{print $2}' | sed 's|</div| |g'`
WARN3=`echo $WARN2 | cut -c1-26`
echo "Warning is this " $WARN3

http://dingo-den.com/index.php?nav=cam1

Last edited by gilesaj001; 05-02-2019 at 03:39 AM. Reason: added information
 
Old 09-19-2022, 04:05 AM   #20
gilesaj001 (Original Poster)
Quote:
Originally Posted by gilesaj001 View Post
The contents of the file is what you posted above: -rwxr-xr-x 1 root root 298 May 1 07:42 test.sh

#!/bin/bash

baseUrl="https://weather.gc.ca"

weatherData="$(curl -s $baseUrl/city/pages/on-118_metric_e.html)"
alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
[[ -n "$alertUrl" ]] || exit
alertData="$(curl -s $baseUrl$alertUrl | pup 'ul + p text{}')"

echo "$alertData"
This script has been working since I started using it in May last year. For some reason it no longer finds the text I am looking for.
The line that is not working, I think, is
Code:
 alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
I am not a programmer and tried to read up on pup, but could not make heads or tails of it.

As of the time of posting there is a weather statement on the website that should be picked up but isn't.

Any help appreciated.
 
Old 09-19-2022, 08:01 AM   #21
boughtonp (Senior Member; Registered: Feb 2007; Location: UK; Distribution: Debian; Posts: 3,610)
Quote:
Originally Posted by gilesaj001 View Post
For some reason it no longer finds the text I am looking for.
The line that is not working I think is
What PRECISELY is not working - i.e. what text is it finding instead? What leads you to believe it is that line that's failing?

Quote:
Code:
 alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
The pup command is simply two instructions - the first bit, ".alert-item > a", is standard CSS selector syntax that filters down to a specific "<a>" tag. (The "a" tag is used for hyperlinks.)

The second part "attr{href}" is Pup-specific, but it simply reads the value of the href attribute of the selected tag, which means it will output the URL.

It's very possible the HTML structure has changed slightly and caused it to fail - either by failing to select, or by selecting multiple matches (and then head outputting the wrong one; as an aside, that head syntax is obsolete and should be "head -n1" instead).

However, it could also easily be one of the other parts failing - maybe curl is not succeeding; having -s without -S means that any errors there would be suppressed. It's a good idea to use both ("-sS") so that a message is printed to stderr if something unexpected happens.
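Putting those two suggestions together, the relevant lines might look like this (a sketch only, not a tested fix):
Code:
# -sS: stay quiet on success but still print an error message if the fetch fails
weatherData="$(curl -sS "$baseUrl/city/pages/on-118_metric_e.html")"

# head -n1: the modern spelling of "first line only"
alertUrl="$(pup '.alert-item > a attr{href}' <<< "$weatherData" | head -n1)"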


Last edited by boughtonp; 09-19-2022 at 08:02 AM.
 
Old 09-19-2022, 08:33 AM   #22
gilesaj001 (Original Poster)
This is the data:

Code:
weatherData="$(curl -s $baseUrl/city/pages/on-118_metric_e.html)"
echo $weatherData
I had to put the output in a file on my server because it was too big: http://dingo-den.com/weather_text.txt

Code:
alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
echo $alertUrl
There is nothing in "alertUrl".

It is supposed to find
Code:
 href="/warnings/report_e.html?on41#1251147931110540001202209180503wz8889cwto"
Thanks for your help.

Last edited by gilesaj001; 09-19-2022 at 10:34 PM.
 
Old 09-22-2022, 10:49 PM   #23
gilesaj001 (Original Poster)
OK, I have been trying to get the data from the file but have been unsuccessful.

What I want to do is grab the first occurrence of an href in this file: http://dingo-den.com/weather_text.txt

That starts with
Code:
/warnings/report_e.html
and save the complete URL to a variable. The href in the file is
Code:
 href="/warnings/report_e.html?on41#1251147931110540001202209180503wz8889cwto"
and it changes when the warning changes. In this case I want to save
Code:
/warnings/report_e.html?on41#1251147931110540001202209180503wz8889cwto
to a variable called alertUrl

It does not have to use pup; I am fine using anything that will extract the URL.

As I said before, I am not a programmer, just a 73-year-old fart who has been playing around with computers since the '60s but never could code a damn.

Any help appreciated.

Last edited by gilesaj001; 09-22-2022 at 10:51 PM.
 
Old 09-23-2022, 10:14 AM   #24
teckk (LQ Guru)
A couple of quick examples. You are going to have to put your nose into the docs for whichever tool you want to use. You could use re to parse the output further.

Code:
from html.parser import HTMLParser
import urllib.request

url = "https://weather.gc.ca/city/pages/on-118_metric_e.html"

class LinkScrape(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    link = attr[1]
                    print('- ' + link)

if __name__ == '__main__':
    request_object = urllib.request.Request(url)
    page_object = urllib.request.urlopen(request_object)
    link_parser = LinkScrape()
    link_parser.feed(page_object.read().decode('utf-8'))
Code:
#!/usr/bin/python

from bs4 import BeautifulSoup
import requests
import re

url = "https://weather.gc.ca/city/pages/on-118_metric_e.html"

page = requests.get(url)    
data = page.text
soup = BeautifulSoup(data, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))
 
Old 09-23-2022, 10:53 AM   #25
teckk (LQ Guru)
Another example with simple tools.
Code:
url="https://weather.gc.ca/city/pages/on-118_metric_e.html"

weatherData=$(curl "$url")

echo "$weatherData"

echo "$weatherData" | grep -io '<a[^>]\+href[ ]*=[ \t]*"[^"]\+"'
You need all of the links, not just the ones that start with http.
 
Old 09-23-2022, 10:10 PM   #26
gilesaj001 (Original Poster)
Quote:
Originally Posted by teckk View Post
Another example with simple tools.
Code:
url="https://weather.gc.ca/city/pages/on-118_metric_e.html"

weatherData=$(curl "$url")

echo "$weatherData"

echo "$weatherData" | grep -io '<a[^>]\+href[ ]*=[ \t]*"[^"]\+"'
You need all of the links, not just the ones that start with http.
I tried your first two options and they had multiple errors. I am running on the command line and there is no GUI installed on the server.


This one ran and did show all the href links and there were a lot of them. Now all I need to find is the one that starts with
Code:
/warnings/report_e.html
and save the whole URL to a variable.
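One way to do that last step (a sketch only, assuming GNU grep/sed and the href format shown earlier in the thread, with $weatherData holding the curl output):
Code:
# Keep only the first href that starts with /warnings/report_e.html
# and strip the surrounding href="..." so just the URL remains
alertUrl="$(echo "$weatherData" \
    | grep -o 'href="/warnings/report_e\.html[^"]*"' \
    | head -n1 \
    | sed 's/^href="//; s/"$//')"

echo "$alertUrl"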
 
Old 09-30-2022, 10:03 PM   #27
gilesaj001 (Original Poster)
The script is working again. I found that when I updated pup to a new version it was installed in a different place than the original, and because there were then two versions of pup on the system it didn't work. I deleted the old one, set the PATH to the new one, and it works as it used to.
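For anyone who hits the same thing, one way to spot a duplicate binary is to ask the shell where it is looking (a general troubleshooting sketch, not specific to this setup):
Code:
# List every pup the shell can see, in PATH order; the first one listed wins
type -a pup

# Show which one a script will actually run
command -v pup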

Thank you to those who tried to help. It must be exhausting to try and help an old fart like me who does not know programming.
 
  

