LinuxQuestions.org
Old 07-10-2020, 01:48 AM   #1
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Rep: Reputation: Disabled
Question Script - Search by part name and extension


Hi!

I am trying to draft up a script to search a website for a file name that changes week to week.

The file name's extension is .mp3 (simple!)
One part of the file name does not change (again, simple).

The tricky part is that the file name is uploaded in a different format each week. Sometimes it is:

#FILE="`date -dsunday +'%d-%m-%Y'`-xxx_news.mp3"
#FILE="`date -dsunday +'%d-%m-%Y'`_xxx_news_64.mp3"
#FILE="`date -dsunday +'%d%m%Y'`_xxx_news_64.mp3"
#FILE="`date -dsunday +'%d%m%Y'`-xxx_news_64.mp3"
#FILE="xxx_news_64-`date -dsunday +'%d-%m-%Y'`.mp3"
#FILE="xxx_news_32-`date -dsunday +'%d-%m-%Y'`.mp3"
#FILE="xxx_news_64_`date -dsunday +'%d-%m-%Y'`.mp3"
#FILE="xxx_news_32_`date -dsunday +'%d-%m-%Y'`.mp3"
#FILE="xxx_news_64-`date -dsunday +'%d%m%Y'`.mp3"
#FILE="xxx_news_64_`date -dsunday +'%d%m%Y'`.mp3"

As you can see, some parts of the file name on the website remain constant, yet others (including the date string) change.

I am seeking assistance in drafting a line for the download script that can find the *news*.mp3 file on the website and download it to my system, where the script will cp/mv the file to a dedicated location and save it under a "known" filename.

Thoughts/comments/suggestions?

Thank you
 
Old 07-10-2020, 03:12 AM   #2
individual
Member
 
Registered: Jul 2018
Posts: 315
Blog Entries: 1

Rep: Reputation: 233
What format are the file links being served in? Is it HTML, JSON, plain text? What language did you want/need to write your script in?

Is the line '#FILE="`date -dsunday +'%d-%m-%Y'`-xxx_news.mp3"' supposed to have a '_32' in it? The other lines had either _32 or _64.
 
Old 07-10-2020, 03:17 AM   #3
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks for your reply.

The file is located on a Webpage - so we will have to navigate HTML.

As for the file name - yes, sometimes it is 32bit (hence the 32) or sometimes 64bit (hence the 64). I have seen some files that have not had either 32/64 contained ... that is why I am trying to narrow down and locate the file (current Sunday date) and search by *news* and *.mp3.

I am struggling to get it to work, especially when the file name format changes (depending on the uploader, which we cannot control).

Thanks
 
Old 07-10-2020, 03:20 AM   #4
individual
Member
 
Registered: Jul 2018
Posts: 315
Blog Entries: 1

Rep: Reputation: 233
Could you provide the actual HTML from the website, or a close mock-up if that's not possible? What programming language are you using?
 
Old 07-10-2020, 03:25 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
Do you have access to the file system there or do you have to use HTTP / HTTPS?

I'm not sure of another way than to just guess at the names that might be there and try them all each time using --input-file with wget or --files-from with rsync.
Code:
. . .

d=$(date -d 'last sunday' +'%Y%m%d')

echo "D=$d"

# mktemp is used for both; tempfile is Debian-specific and deprecated
tmp=$(mktemp --suffix="-$d")
dir=$(mktemp --directory --suffix="-$d")

# clean up temp file and directory upon any type of EXIT
trap 'rm -f "$tmp"; rm -rf "$dir";' 0

# make an exhaustive list of possible file names
# (the patterns follow the list given in post #1)
cat << EOF > "$tmp"
$(date -d "$d" +"%d-%m-%Y-xxx_news.mp3")
$(date -d "$d" +"%d-%m-%Y_xxx_news_64.mp3")
$(date -d "$d" +"%d%m%Y_xxx_news_64.mp3")
$(date -d "$d" +"%d%m%Y-xxx_news_64.mp3")
$(date -d "$d" +"xxx_news_64-%d-%m-%Y.mp3")
$(date -d "$d" +"xxx_news_32-%d-%m-%Y.mp3")
$(date -d "$d" +"xxx_news_64_%d-%m-%Y.mp3")
$(date -d "$d" +"xxx_news_32_%d-%m-%Y.mp3")
$(date -d "$d" +"xxx_news_64-%d%m%Y.mp3")
$(date -d "$d" +"xxx_news_64_%d%m%Y.mp3")
EOF

. . .
Then a mv or cp from the temporary directory using wildcards can convert the file (hopefully there is just one) to a standardized name.
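A minimal sketch of that last step, assuming the download landed in a directory of its own. The function name standardize_news and its two arguments are made up for illustration; it refuses to guess when the wildcard matches no file or more than one.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): move the single *news*.mp3
# file found in a directory to a fixed destination name.
standardize_news() {
    src_dir=$1   # directory the download landed in
    dest=$2      # the "known" file name to save under

    # Expand the wildcard into the function's positional parameters.
    set -- "$src_dir"/*news*.mp3

    # An unmatched glob stays literal, so check that the file really exists.
    if [ ! -e "$1" ]; then
        echo "no *news*.mp3 file found in $src_dir" >&2
        return 1
    fi

    # Refuse to guess when more than one candidate matched.
    if [ "$#" -gt 1 ]; then
        echo "expected one match, found $#" >&2
        return 2
    fi

    mv -- "$1" "$dest"
}
```

It could then be called as standardize_news "$dir" /path/to/news.mp3, where the destination path is only a placeholder.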
 
Old 07-10-2020, 03:51 AM   #6
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
Debian has an awesome tool, uscan (part of the package devscripts). It probably can be repurposed for this. See options --watchfile and --package.

I would think of a watch file similar to this:
Code:
version=4
opts="uversionmangle=s/^(\d\d)-?(\d\d)-?(\d{4})$/$3$2$1/" \
https://example.com/path/to/foo.html \
files/(\d\d-?\d\d-?\d{4})?[-_]?\w+_news[-_]?(?:32|64)?[-_]?(\d\d-?\d\d-?\d{4})?\.mp3 \
20200101

Last edited by shruggy; 07-10-2020 at 07:58 AM.
 
Old 07-10-2020, 07:25 AM   #7
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks for the update.....

It would be simpler if the file naming were standardised, but that is the dilemma. The only parts of the file name that I see remaining constant are the .mp3 extension and the word "news".

The placement of the date string and the "-" or "_" has been proving problematic. The date string (if kept in the same position) is fine but there seems to be no consistency in the file naming process.

I am just mindful of not being too selective (grep); at the same time there could be many other files with the mp3 extension that I don't want to pick up. I am aiming to grab the latest (dated) file and download that to my system.
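One way to keep the match narrow without listing every variant, as a rough sketch only: require "news", the ".mp3" ending, and the wanted date in either the DD-MM-YYYY or DDMMYYYY form. The function name match_news and its day/month/year arguments are invented for this example, and it assumes GNU or BSD grep with -E.

```shell
#!/bin/sh
# Hypothetical filter: read candidate file names on stdin and print only
# those that contain "news", end in ".mp3", and carry the given date
# written either as DD-MM-YYYY or as DDMMYYYY.
match_news() {
    day=$1
    month=$2
    year=$3

    # The optional hyphens cover both date spellings seen so far.
    date_re="${day}-?${month}-?${year}"

    # First keep *news*.mp3 names, then require the date not to be part
    # of a longer run of digits.
    grep -E 'news.*\.mp3$' | grep -E "(^|[^0-9])${date_re}([^0-9]|\$)"
}
```

For example, match_news 05 07 2020 < names.txt would print only the names in names.txt (a placeholder file) that belong to 5 July 2020.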
 
Old 07-10-2020, 08:06 AM   #8
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
As individual said in #4 above, if you could provide (the relevant part of) the HTML code of the webpage, that would be helpful. Or maybe the link.

I'm thinking of something similar to this naming pattern. Am I right?
 
Old 07-10-2020, 06:22 PM   #9
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Thanks for the feedback.

It would be far simpler if the file name was standardised, but having to list every potential filename will make it hard. Knowing my luck, the filename will change again and I will miss it because it's not covered in the filename list.

Originally the file could be found on an HTML page, but I have done further digging.
Finding the original file location has been quite the task. It is attached to an RSS feed, and I believe I have found the original feed.

https://wmrct5.podcaster.de/qnews.rss

I have been looking at the "enclosure type" segment of the feed to extract the audio file - hence this is where the filename changes from week to week. The only part that seems the same, and which I would like to search by, would be ".mp3" and "news" or "qnews". I am thinking as long as "news" and ".mp3" are in the same line, it should be able to download the file from the storage location (which does not appear to change) and then save it locally on my system and cp/mv to a location/filename that suits.

Appreciate your assistance and working with me to overcome this conundrum in searching for a file by two variables

Cheers
 
Old 07-10-2020, 08:02 PM   #10
individual
Member
 
Registered: Jul 2018
Posts: 315
Blog Entries: 1

Rep: Reputation: 233
Thanks for providing sample data. A better 'key' to search for would be type="audio/mpeg".
Code:
<enclosure type="audio/mpeg" length="5761233" url="https://wmrct5.podcaster.de/qnews/media/05072020-vk4_qnews_64.mp3"/>
You can use AWK, Perl, or Shell operators to isolate those URLs. Here is an example using Perl.
Code:
perl -nE 'm!audio/mpeg! && m!url="([^"]+)"! && say $1' qnews.rss
Which returns:
Code:
https://wmrct5.podcaster.de/qnews/media/05072020-vk4_qnews_64.mp3
https://wmrct5.podcaster.de/qnews/media/28062020-vk4_qnews_64.mp3
https://wmrct5.podcaster.de/qnews/media/vk4_qnews_32-21-06-2020.mp3
https://wmrct5.podcaster.de/qnews/media/14062020-vk4_qnews_64.mp3
https://wmrct5.podcaster.de/qnews/media/07062020-vk4_qnews_64.mp3
EDIT:
I think it's safe to assume the first matched URL is the latest episode. With that in mind, just exit the script after printing the first match.
Code:
perl -nE 'm!audio/mpeg! && m!url="([^"]+)"! && say $1 and exit' qnews.rss
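
For anyone who would rather stay in the shell, roughly the same thing can be sketched with sed. This is only an illustration: the function name first_enclosure_url is made up, and it assumes (like the Perl one-liner) that each enclosure element sits on a single line of the feed.

```shell
#!/bin/sh
# Hypothetical helper: print the url="..." value from the first line of
# the feed file ($1) that contains type="audio/mpeg", then stop reading.
first_enclosure_url() {
    sed -n '/type="audio\/mpeg"/{s/.*url="\([^"]*\)".*/\1/p;q;}' "$1"
}
```

Called as first_enclosure_url qnews.rss, it prints at most one URL, which can then be handed to curl or wget.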

Last edited by individual; 07-10-2020 at 08:11 PM. Reason: Removed redundant information. Added the result of my example.
 
Old 07-10-2020, 10:03 PM   #11
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
I am assuming that the latest would (should) be named correctly with the date string, but that is highly dependent on the person doing the uploading and file naming.
I see your point in searching by "audio/mpeg" - with the string and location.

I will try and get something working in a bash environment.... I think that I am going to have to use AWK to get it working in a bash script?
I will probably exit the script after the first download (assuming that it is the most recent as loaded to the RSS feed).

Could I use curl or wget?

curl 'https://wmrct5.podcaster.de/qnews.rss' | awk '/audio/mpeg/{system("wget -nc "$2);exit}' FS="

Then output the file to /save/file/here/news.mp3

I'm just having trouble working out where I am going wrong here. I'm trying to adapt an existing script to this purpose (clearly not working, but I'm giving it a go...)
 
Old 07-10-2020, 11:45 PM   #12
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
Can someone check this and offer any tips/suggestions? It seems to work OK, although it is a little clunky.


/usr/bin/lynx -source https://wmrct5.podcaster.de/qnews.rss > news.rss

news_work=`grep -i mp3 news.rss | cut -d'"' -f6 | head -n1`
process="Get NEWS PODCAST File - $news_work"
echo "Fetching $news_work from Server "

/usr/bin/curl -SkLo news.mp3 $news_work

Last edited by orangepeel190; 07-10-2020 at 11:52 PM.
 
Old 07-11-2020, 12:11 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
The middle part there with AWK is scraping the feed and thus brittle. It will break when the spacing or other layout changes. You might consider a simple perl script in that section to properly parse the feed instead:

Code:
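#!/usr/bin/perl -T
use strict;
use warnings;
use XML::Feed;

# read the feed from the file given as first argument, or from stdin
my $file = shift || '/dev/stdin';

my $feed = XML::Feed->parse($file)
    or die(XML::Feed->errstr);

# print the enclosure URL of each entry, one per line
foreach my $entry ($feed->entries) {
    # skip entries that carry no enclosure
    my $enclosure = $entry->enclosure or next;
    print $enclosure->url,qq(\n);
}

exit(0);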
From there you can send the output to curl or wget, or call one of them from within perl after additional parsing or pattern matching.
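
As a hedged sketch of that hand-off in plain shell: the two function names below are invented for illustration, and the first one assumes a grep that supports -m (GNU and BSD grep do). It takes the first URL whose file name contains "news" and ends in ".mp3", then saves it under a fixed name with curl.

```shell
#!/bin/sh
# Hypothetical helper: print the first URL on stdin whose last path
# component contains "news" and ends in ".mp3".
pick_news_url() {
    grep -E -m 1 '/[^/]*news[^/]*\.mp3$'
}

# Hypothetical helper: download URL $1 and save it as file $2.
# --fail makes curl return an error on HTTP failures instead of
# saving the server's error page.
download_news() {
    curl --fail --silent --show-error --location --output "$2" "$1"
}
```

Assuming the Perl script above is saved as newsfeed.pl, something like url=$(newsfeed.pl news.rss | pick_news_url) && download_news "$url" news.mp3 would tie the pieces together.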
 
Old 07-11-2020, 12:45 AM   #14
orangepeel190
Member
 
Registered: Aug 2016
Posts: 69

Original Poster
Rep: Reputation: Disabled
I'm sorry, you've lost me. I'm fairly new to this and thought I'd give it a go.
I've really only had limited experience with bash scripts and didn't think a Perl script could be run inside a bash script (/bin/bash).
 
Old 07-11-2020, 03:45 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
The shell script can call the perl script, just like it could call any other program or script. So if you had the above perl script in /usr/local/bin/newsfeed.pl then you could call it like this:

Code:
#!/bin/sh

PATH=/bin:/usr/bin:/usr/local/bin

set -e

lynx -source https://wmrct5.podcaster.de/qnews.rss > news.rss

# newsfeed.pl prints one URL per line; keep only the first one
news_work=$(newsfeed.pl news.rss | head -n 1)
process="Get NEWS PODCAST File - $news_work"
echo "Fetching $news_work from Server "

curl -SkLo news.mp3 "$news_work"

exit 0
Of course, since perl grew up around this kind of thing, you could do it all (including the rename) within perl with not too many lines extra.
 
  

