[SOLVED] Extract text from a text file to put in a variable
I have never done this before. I mainly use the GUI, except for commands I use all the time on the server.
That would explain a lot. Ok, go ahead and type rm -rf pup-master, as that contains the source code for the program, but not the program.
Assuming you have a 64 bit system, download this version of it.
If you have a 32 bit system, download this one.
Unzip it and place it somewhere. I have a bin directory in my home directory that I place one-off programs in. If you decide to do that, add "$HOME/bin" to your PATH.
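A minimal sketch of that PATH change, assuming a bash-style shell (the directory name is just the convention described above):

```shell
# Create a personal bin directory and add it to the search path.
# Put the export line in ~/.bashrc to make it permanent.
mkdir -p "$HOME/bin"
export PATH="$PATH:$HOME/bin"
# Confirm the directory is now searched:
echo "$PATH" | grep -q "$HOME/bin" && echo "HOME/bin is on PATH"
```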
https://www.dropbox.com/s/gxy3vd7o3r...alert.txt?dl=0
I downloaded that source to test.html. You won't get the content you want from that page unless you run the scripts on that page, so curl won't help. You'll need something that runs scripts. And then there is a login that must be passed before you can get the text. So even if you filled out that form with curl, you still aren't going to get the content, because it is script-delivered.
I got the source with scripts run using python/webengine. You could use soup, selenium, nodejs, whatever you want.
Code:
#!/usr/bin/env python3
# Get source with scripts run using Python3/PyQt5/qt5-webengine
# Usage:
#   script.py <url> <local filename>
# or run script.py and answer the prompts
import sys
from PyQt5.QtWebEngineWidgets import QWebEnginePage, QWebEngineProfile
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl

agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64; rv:65.0)'
         ' Gecko/20100101 Firefox/65.0')

class Source(QWebEnginePage):
    def __init__(self, url, _file):
        self.app = QApplication([])
        QWebEnginePage.__init__(self)
        QWebEngineProfile.defaultProfile().setHttpUserAgent(agent)
        self._file = _file
        self.load(QUrl(url))
        self.loadFinished.connect(self.on_load_finished)
        self.app.exec_()

    def on_load_finished(self):
        # toHtml is asynchronous; it delivers the page source to the callback
        self.toHtml(self.write_it)

    def write_it(self, data):
        self.html = data
        with open(self._file, 'w') as f:
            f.write(self.html)
        print('\nFinished\nFile saved to ' + self._file)
        self.app.quit()

if __name__ == '__main__':
    # Open with arguments or prompt for input
    if len(sys.argv) > 2:
        url = sys.argv[1]
        _file = sys.argv[2]
    else:
        url = input('Enter/Paste url for source: ')
        _file = input('Enter output file name: ')
    Source(url, _file)
I opened that file with dillo, and could see all the info I wanted.
You would do better to post your source to someplace like:
Then it's just a text file and you can parse easy enough.
Parse that however you wish. Get it from there, save it to file.html, and parse the HTML file. An HTML file is just text. You can use awk to parse tags in an HTML file, or a little Python. In other words, get that source to a file and practice parsing that file.
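As a rough sketch of that idea, here is awk pulling the text out of a saved HTML fragment. The file name, class name, and alert text are made up for illustration:

```shell
# Write a tiny sample of the kind of tag the thread is after.
cat > sample.html <<'EOF'
<div class="col-xs-10">Severe Thunderstorm Watch</div>
EOF
# Split each line on < and >; for a simple one-tag-per-line fragment,
# field 3 is the text between the opening and closing tags.
alert=$(awk -F'[<>]' '/col-xs-10/ {print $3}' sample.html)
echo "$alert"
rm sample.html
```

This only holds up for simple, one-tag-per-line HTML; for anything nested, a real parser (pup, BeautifulSoup) is safer.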
Thanks, I have it working now, but the text is too much. I will have to look at it further and learn how pup works.
Thanks for all your help
EDIT: This is what I ended up doing on the web page:
<p> "If there is an Alert or Warning it will appear under this text"</p>
<?php
$message = shell_exec("PATH to script/find-warnings.sh 2>&1");
if (empty($message)) {
    echo "No Alerts in Effect";
} else {
    print_r($message);
}
?>
This is what I used to get the name of the Alert to put on the image.
WARN2=$(grep "col-xs-10" "$weatherFile" | awk -F'>' '{print $2}' | sed 's|</div| |g')
WARN3=$(echo "$WARN2" | cut -c1-26)
echo "Warning is this $WARN3"
The contents of the file are what you posted above: -rwxr-xr-x 1 root root 298 May 1 07:42 test.sh
#!/bin/bash
baseUrl="https://weather.gc.ca"
weatherData="$(curl -s $baseUrl/city/pages/on-118_metric_e.html)"
alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
[[ -n "$alertUrl" ]] || exit
alertData="$(curl -s $baseUrl$alertUrl | pup 'ul + p text{}')"
echo "$alertData"
This script has been working since I started using it in May last year. For some reason it no longer finds the text I am looking for.
The line that is not working I think is
Code:
alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
I am not a programmer, and I tried to read up on pup but could not make heads or tails of it.
As of the time of posting there is a weather statement on the web site that should be picked up but isn't.
For some reason it no longer finds the text I am looking for.
The line that is not working I think is
What PRECISELY is not working - i.e. what text is it finding instead? What leads you to believe it is that line that's failing?
Quote:
Code:
alertUrl="$(pup '.alert-item > a attr{href}' <<< $weatherData | head -1)"
The pup command is simply two instructions - the first bit ".alert-item > a" is standard CSS selector syntax to filter to a specific <a> tag. (The a tag is used for hyperlinks.)
The second part "attr{href}" is Pup-specific, but it simply reads the value of the href attribute of the selected tag, which means it will output the URL.
It's very possible the HTML structure has changed slightly and caused it to fail - either by failing to select, or by selecting multiple (and then head outputting the wrong one; as an aside that head syntax is obsolete and should be "head -n1" instead).
However, it could also easily be one of the other parts failing - maybe curl is not succeeding; having -s without -S means that any errors there would be suppressed - it's a good idea to use both ("-sS") so that a message is printed to stderr if something unexpected happens.
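As an illustration of what that pup line extracts, here is a rough stand-in using sed on a fabricated HTML fragment (pup's real CSS matching is far more robust than a regex; the markup and URL below are made up):

```shell
# A fabricated fragment shaped like the alert markup under discussion.
html='<div class="alert-item"><a href="/warnings/on-118">Alert</a></div>'
# Grab the first href that follows an alert-item class, roughly as the
# pup selector '.alert-item > a attr{href}' would.
alertUrl=$(printf '%s\n' "$html" | sed -n 's/.*alert-item[^>]*>.*href="\([^"]*\)".*/\1/p' | head -n1)
echo "$alertUrl"
```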
Couple of quick examples. You are going to have to put your nose in the docs for the tool that you want to use. You could use re to parse that further.
Code:
from html.parser import HTMLParser
import urllib.request

url = "https://weather.gc.ca/city/pages/on-118_metric_e.html"

class LinkScrape(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Print the href of every <a> tag encountered
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    link = attr[1]
                    print('- ' + link)

if __name__ == '__main__':
    request_object = urllib.request.Request(url)
    page_object = urllib.request.urlopen(request_object)
    link_parser = LinkScrape()
    link_parser.feed(page_object.read().decode('utf-8'))
Code:
#!/usr/bin/python
from bs4 import BeautifulSoup
import requests
import re

url = "https://weather.gc.ca/city/pages/on-118_metric_e.html"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, 'html.parser')

# Print the href of every link on the page
for link in soup.find_all('a'):
    print(link.get('href'))
The script is working again. I found that when I updated pup to a new version, it was put in a different place than the original, and because there were two versions of pup it didn't work. I deleted the old one, set the PATH to the new one, and it works as it used to.
Thank you to those that tried to help. It must be exhausting to try and help an old fart like me that does not know programming.