As you can see, some parts of the file name on the website remain constant, yet others (including the date string) change.
I am seeking assistance in drafting a line for the download script that finds the *news*.mp3 file on the website and downloads it to my system, where the script will cp/mv the file to a dedicated location and save it under a "known" filename.
The file is located on a Webpage - so we will have to navigate HTML.
As for the file name: yes, sometimes it is 32bit (hence the 32) and sometimes 64bit (hence the 64). I have also seen files that contain neither 32 nor 64, which is why I am trying to narrow down and locate the file (current Sunday date) by searching for *news* and *.mp3.
I am struggling to get it to work, especially when the file name format changes (depending on the uploader, which we cannot control).
Do you have access to the file system there or do you have to use HTTP / HTTPS?
I'm not sure of another way than to just guess at the names that might be there and try them all each time using --input-file with wget or --files-from with rsync.
Code:
. . .
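# date of the most recent past Sunday as YYYYMMDD
# (with GNU date, running this on a Sunday yields the previous Sunday)
d=$(date -d 'last sunday' +'%Y%m%d')
echo "D=$d"
# mktemp replaces the deprecated tempfile utility
tmp=$(mktemp --suffix="-$d")
dir=$(mktemp --directory --suffix="-$d")
# clean up temp file and directory upon any type of EXIT
trap 'rm -f "$tmp"; rm -rf "$dir";' EXIT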
# make an exhaustive list of possible file names
cat << EOF > "$tmp"
$(date -d $d +"%d-%m-%Y-xxx_news.mp3")
$(date -d $d +"%d-%m-%Y_xxx_news_64.mp3")
$(date -d $d +"%d%m%Y_xxx_news_64.mp3")
$(date -d $d +"%d%m%Y-xxx_news_64.mp3")
$(date -d $d +"xxx_news_64-%d-%m_%Y.mp3")
$(date -d $d +"xxx_news_32-%d-%m_%Y.mp3")
$(date -d $d +"xxx_news_64_%d-%m_%Y.mp3")
$(date -d $d +"xxx_news_32_%d-%m_%Y.mp3")
$(date -d $d +"xxx_news-64_%d%m%Y.mp3")
$(date -d $d +"xxx_news_64_%d%m%Y.mp3")
EOF
. . .
Then a mv or cp from the temporary directory using wildcards can convert the file (hopefully there is just one) to a standardized name.
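The wildcard rename mentioned above could be sketched roughly as follows. This is only an illustration: the function name standardize_news and the destination path are hypothetical, and it assumes the downloaded file name contains a lower-case "news" and ends in .mp3. It refuses to act when zero or several files match, so a surprise in the download directory does not overwrite anything silently.

```shell
#!/bin/sh
# Hypothetical helper: move the single *news*.mp3 found in a download
# directory to a standardized name. All names here are illustrative.
standardize_news() {
    src_dir=$1
    dest=$2
    # Expand the wildcard into the function's positional parameters.
    set -- "$src_dir"/*news*.mp3
    # An unmatched glob stays literal, so check the file really exists.
    if [ ! -e "$1" ]; then
        echo "no matching file in $src_dir" >&2
        return 1
    fi
    # Refuse to guess when more than one candidate was downloaded.
    if [ "$#" -ne 1 ]; then
        echo "expected one file, found $#" >&2
        return 1
    fi
    mv -- "$1" "$dest"
}
```

With the script above, a call might look like standardize_news "$dir" /some/known/location/news.mp3, where the destination is whatever fixed name suits your setup.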
It would be simpler if the file naming were standardised, but this is the dilemma: the only parts of the file name that I see remaining constant are the .mp3 extension and the word "news".
The placement of the date string and the "-" or "_" has been proving problematic. The date string (if kept in the same position) is fine, but there seems to be no consistency in the file naming process.
I am just mindful of not being too selective (grep), as there could be many other files with the mp3 extension that I don't want to pick up. I am aiming to grab the latest (dated) file and download that to my system.
It would be far simpler if the file name were standardised, and having to list every potential filename will make it hard. Knowing my luck, the filename will change again and I will miss it because it is not covered in the list of filename variants.
Originally the file could be found on an HTML page, but I have done further digging.
Finding the original file location has been quite the task. It is attached to an RSS feed, and I believe I have found the original feed.
I have been looking at the "enclosure type" segment of the feed to extract the audio file, and this is where the filename changes from week to week. The only parts that seem to stay the same, and which I would like to search by, are ".mp3" and "news" or "qnews". I am thinking that as long as "news" and ".mp3" are in the same line, the script should be able to download the file from the storage location (which does not appear to change), save it locally on my system and cp/mv it to a location/filename that suits.
I appreciate your assistance in working with me to overcome this conundrum of searching for a file by two variables.
Cheers
Thanks for providing sample data. A better 'key' to search for would be type="audio/mpeg".
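As a rough sketch of searching on that key from the shell: the function below prints the url attribute of every enclosure whose type is audio/mpeg. It assumes each enclosure tag sits on a single line and uses double-quoted attributes, and that your grep supports -o (GNU and BSD grep do). The function name enclosure_urls is made up for illustration. This is still text scraping rather than real XML parsing, so treat it as a quick check, not a robust solution.

```shell
#!/bin/sh
# Print the url attribute of every <enclosure> whose type is audio/mpeg.
# Assumes each enclosure tag is on one line with double-quoted attributes.
enclosure_urls() {
    # Isolate the matching enclosure tags, then pull out the url value.
    grep -o '<enclosure[^>]*type="audio/mpeg"[^>]*>' \
        | sed -n 's/.*url="\([^"]*\)".*/\1/p'
}
```

If the feed lists the newest item first (an assumption worth checking against the real feed), something like enclosure_urls < saved-feed-file | head -n 1 would give just the latest URL.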
I am assuming that the latest would (should) be named correctly with the date string. That is highly dependent on the person doing the uploading and file naming.
I see your point in searching by "audio/mpeg" - with the string and location.
I will try and get something working in a bash environment.... I think that I am going to have to use AWK to get it working in a bash script?
I will probably exit the script after the first download (assuming that it is the most recent as loaded to the RSS feed).
I am just having trouble working out where I am going wrong here. I am trying to adapt an existing script to this purpose (clearly not working, but I'm giving it a go).
The middle part there with AWK is scraping the feed and thus brittle. It will break when the spacing or other layout changes. You might consider a simple perl script in that section to properly parse the feed instead:
Code:
#!/usr/bin/perl -T
use strict;
use warnings;
use XML::Feed;

# read the feed from the file given as the first argument, else from stdin
my $file = shift || '/dev/stdin';
my $feed = XML::Feed->parse($file)
    or die(XML::Feed->errstr);

foreach my $entry ($feed->entries) {
    # skip entries that have no attached media file
    my $enclosure = $entry->enclosure or next;
    print $enclosure->url, qq(\n);
}
exit(0);
From there you can send the output to curl or wget, or call one of them from within perl after additional parsing or pattern matching.
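For the pattern-matching step on the shell side, a minimal sketch could look like the following. It reads a list of URLs on standard input, keeps those whose file name contains "news" and ends in ".mp3" (ignoring case), and prints only the first one. The function name first_news_mp3 is hypothetical, and the pattern assumes the URLs carry no query string after the file name and that the first match is the newest.

```shell
#!/bin/sh
# Read URLs on stdin, keep those whose last path component contains
# "news" and ends in ".mp3" (case-insensitive), print only the first.
first_news_mp3() {
    grep -i 'news[^/]*\.mp3$' | head -n 1
}
```

Piping the perl script's output through first_news_mp3 and storing the result in a variable gives a single URL that can then be handed to curl or wget.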
I'm sorry, you've lost me. I am fairly new to this and thought I would give it a go.
I've really only had limited experience with bash scripts. I didn't think a Perl script could be run inside a bash script (/bin/bash).
The shell script can call the perl script, just like it could call any other program or script. So if you had the above perl script in /usr/local/bin/newsfeed.pl then you could call it like this:
Code:
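#!/bin/sh
PATH=/bin:/usr/bin:/usr/local/bin
set -e
# fetch the feed
lynx -source https://wmrct5.podcaster.de/qnews.rss > news.rss
# newsfeed.pl prints one URL per entry; keep only the first (assumed newest)
news_work=$(newsfeed.pl news.rss | head -n 1)
process="Get NEWS PODCAST File - $news_work"
echo "Fetching $news_work from Server"
# -k skips TLS certificate verification; drop it if the server's certificate is valid
curl -SkLo news.mp3 "$news_work"
exit 0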
Of course, since perl grew up around this kind of thing, you could do it all (including the rename) within perl with not too many lines extra.