download radio broadcasts in mp3 format
Posted 01-26-2019 at 08:11 AM by Michael Uplawski
Updated 03-09-2019 at 02:44 AM by Michael Uplawski (remark on general interest)
Edit: The procedure used in the script below applies generally to any situation where you want to fetch a piece of the “Web” without a full-blown Web browser, wherever curl or wget can replace it.
No, this is not spectacular.
When I want to listen to a particular broadcast on France Culture (www.franceculture.fr) because I missed it in the morning, I am confronted with a page that wants to open more than 25 connections to sites outside the Radio France servers. This is probably due to the Web developers' choice to use Google libraries (that's what “everybody” does).
Among those sites are (of course) doubleclick.net, ads.twitter.com and others that have nothing to do with my radio broadcast.
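To get a rough idea of which foreign hosts a page refers to, the saved HTML can be searched for absolute URLs. This is only a sketch: the function name `list_external_hosts` is hypothetical, it looks at `src` and `href` attributes only, and it cannot see connections that scripts open at run time, so the real number in a browser may well be higher.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on standard input and print the distinct
# hosts found in absolute src/href URLs that do not belong to the domain
# given as $1 (the domain itself and its subdomains are filtered out).
list_external_hosts() {
    grep -oE "(src|href)=[\"']https?://[^/\"']+" \
        | sed -E 's#.*://##' \
        | awk -v own="$1" '{
              n = length($0) - length(own)
              # skip the own domain and anything ending in ".<own domain>"
              if ($0 == own || (n > 0 && substr($0, n) == "." own)) next
              print
          }' \
        | sort -u
}
```

With a page saved beforehand, for example by curl, a call could look like `list_external_hosts franceculture.fr < page.html`, where `page.html` is an assumed file name.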
Reading the source code of the page to find URLs to download is cumbersome. But it has the advantage of remaining a rather reliable procedure, as the stations of Radio France do not often make significant changes to their Web sites.
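The manual search through the page source can be reduced to a pattern match. The following is a minimal sketch, assuming that the audio address appears literally in the markup and ends in `.mp3`; the function name `list_mp3_urls` is hypothetical and not part of my script, which uses CSS selectors instead.

```shell
#!/bin/bash
# Hypothetical helper: print every distinct http(s) URL ending in .mp3
# that occurs in the HTML read from standard input.
list_mp3_urls() {
    grep -oE "https?://[^\"'<> ]+\.mp3" | sort -u
}
```

A page fetched with `curl -s "$PAGE_URL"` (an assumed variable holding the address of the player page) can be piped straight into the function. A pattern like this is cruder than a selector, but it needs nothing beyond grep.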
As I am a (“learned”) computer scientist, I do not do by hand anything that works reliably in the same way every time.
The script follows. Its messages are in French, but if you care for France Culture, this will not shock you. What may shock you are the calls to nokogiri and torify. I cannot know what you will make of that. Either do not use the script, or adapt it to your needs...
But nokogiri is really great.
Code:
#!/bin/bash
# This script downloads radio-broadcasts in mp3-format from
# the sites of Radio-France.
# The only argument to the script is the URL to a player-page,
# i.e. the page for 1 broadcast, showing a play-button on top.
#
# ©2019-2019 Michael Uplawski <michael.uplawski@uplawski.eu>
# Use this script at your own risk, modify it as you please.
# But maybe leave the copyright-notice intact. Thank You.

SC=`basename "$0"`

if [ $# -ne 1 ]
then
    clear
    echo -e "ERREUR ! Il faut l'URL d'une page avec un audio-player"
    echo -e "Exemple :\n\t"$SC" https://www.franceculture.fr/emissions/la-fabrique-mediatique/defiance-envers-les-medias-quelles-solutions-22"
    exit 1
fi

# --------- SOME DEFINITIONS ----------
# The command to extract an mp3-file from a page
EXTR_CULT='puts $_.at_css("div.heading-zone-wrapper>div.heading-zone-player-button>button.replay-button/@data-asset-source")'
EXTR_INTER='puts $_.at_css("div.cover-emission-actions-buttons-wrapper>button.replay-button/@data-url")'
EXTR=""

if [[ $1 == *"franceinter"* ]]
then
    EXTR=$EXTR_INTER
elif [[ $1 == *"franceculture"* ]]
then
    EXTR=$EXTR_CULT
else
    echo -e "ERREUR ! Téléchargements sont possibles seulement des sites de"
    echo -e "France-Culture ou France-Inter !"
    exit 2
fi

# extract the URL of the mp3
mp3=`torify curl -s "$1" | nokogiri -e "$EXTR"`
# extract the title of the broadcast
title=`torify curl -s "$1" | nokogiri -e 'puts $_.at_css("title/text()")'`
# replace each run of whitespace and punctuation by one underscore
title=`echo "$title"|tr -s "[:space:][:punct:]" _`

# Output-file
OFL="$title".mp3
echo "$OFL"

# --------> ACTION <---------
# download the mp3
torify wget -c "$mp3" --output-document="$OFL"
# <-------- END ACTION --------->

#EOF
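The step that turns the page title into a file name can be tried on its own. The sketch below uses the same `tr -s` call as the script, wrapped in a hypothetical function `sanitize_title`. It deviates from the script in one respect, which is my assumption about what is nicer and not something the script does: leading and trailing underscores are trimmed, because `echo` appends a newline that `tr` would otherwise turn into a final underscore.

```shell
#!/bin/bash
# Hypothetical helper: collapse every run of whitespace and punctuation
# in $1 into a single underscore, then trim one leading and one trailing
# underscore so that the file name does not start or end with "_".
sanitize_title() {
    local t
    t=$(printf '%s' "$1" | tr -s '[:space:][:punct:]' '_')
    t=${t#_}
    t=${t%_}
    printf '%s\n' "$t"
}

sanitize_title "France Culture: replay (22)"   # prints France_Culture_replay_22
```

In the script, the output file would then be set with `OFL="$(sanitize_title "$title").mp3"`.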
Total Comments 1
Comments
Nokogiri is a nice addition to web scraping tools like pup and webscraper.io. I don't know how to use any of the three, but I hope I get the opportunity to use them and play with them someday.
I came to know pup when Microsoft took over GitHub and removed youtube-dl for copyright infringement (maybe for fear of a lawsuit from one of its many adversaries? But that's another topic). As a reaction, people started writing replacement software, and one of them was a simple bash script of under 50 lines that would download media from YouTube using "pup" and "jq", two tools I had never heard of before.
Then, searching for web scraping tools, I also discovered webscraper.io, which takes on the task of identifying the paths for you, with a fair degree of precision, all with a visual tool like the developer bar of your favorite browser that lets you point and click on the elements of the page to inspect their HTML and CSS properties, along with their paths and other useful information.
Thanks for the nice share!
Posted 07-28-2022 at 08:41 AM by ychaouche