Old 09-05-2021, 01:02 PM   #1
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Rep: Reputation: Disabled
Open URL and save page using terminal without X11 - Firefox or Chrome


Hello guys, is there any way, from the terminal without X11 loaded, to open a URL and save the page as a file, without opening the GUI app in an X11 session?

I want to do it with Firefox or Chrome. I tried httpie, but it loads only about 50% of the webpage.
 
Old 09-05-2021, 01:11 PM   #2
GentleThotSeaMonkey
Member
 
Registered: Dec 2016
Posts: 194
Blog Entries: 3

Rep: Reputation: 65
wget or curl, or lynx maybe?
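For reference, minimal invocations of each suggestion (example.com is a placeholder; how complete the saved page is depends on how much of it the site builds with javascript):
Code:
# curl and wget save the response body exactly as the server sends it
curl -L -o page.html 'https://example.com/page'
wget -O page.html 'https://example.com/page'

# lynx renders the page to plain text instead of saving the HTML
lynx -dump 'https://example.com/page' > page.txt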
 
2 members found this post helpful.
Old 09-05-2021, 01:33 PM   #3
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
wget and curl cannot get it; it's from Dropbox.

Edit: curl can download the links from Dropbox, but I want the full webpage only.

Last edited by pedropt; 09-05-2021 at 01:39 PM.
 
Old 09-05-2021, 01:46 PM   #4
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,715
Blog Entries: 11

Rep: Reputation: 3747
curl or wget should be able to do it; with suitable options they can make the same request and receive the same response as the browser would.
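As a rough sketch of what "suitable options" can mean (the useragent and Accept values here are illustrative assumptions, not a verified recipe for Dropbox):
Code:
# impersonate a browser so the server returns the page it would give a GUI client
curl -L \
  -A 'Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0' \
  -H 'Accept: text/html,application/xhtml+xml' \
  -o page.html "$url"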
 
Old 09-05-2021, 02:14 PM   #5
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
Try it: https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0

and if you can, then get me all the links to the wav files inside that folder, with whatever tool you think will work over the terminal only.

Note: I don't need the links parsed from the webpage; I just want the webpage fully downloaded, with all the wav file links inside. I can only get 30 of the 50, and that is with httpie, because with curl or wget it's impossible. I don't want to download the files; I just want the webpage downloaded as HTML, and complete.


Here is a simple script that does all of this. I inserted httpie as the tool that fetches the HTML from Dropbox, but you can use any tool you want; just swap out the http call for whatever tool you prefer.
Code:
#!/bin/bash
# remove leftovers from a previous run
rm -f out.file tmp.file

echo -ne "Enter dropbox url : "
read -r url

# reject direct-download links (?dl=1) and anything that is not a shared folder
if [[ $url == *"?dl=1"* ]]
then
    echo "Invalid Dropbox url"
    exit 1
elif [[ $url != *"https://www.dropbox.com/sh/"* ]]
then
    echo "Invalid Dropbox url"
    exit 1
fi

echo -ne "Retrieving links ...."
# fetch the page with httpie; swap this line for wget/curl if preferred
http "$url" -o tmp.file

# extract the unique share links (URLs ending in ?dl=0)
grep -Eo 'https?://[^"[:space:]]+\?dl=0' tmp.file | sort -u > out.file

a1=$(wc -l < out.file)
clear
echo "Got $a1 Links"
echo "--------------------------------------------------------------------------"
cat out.file
echo "--------------------------------------------------------------------------"
exit 0

Last edited by pedropt; 09-05-2021 at 02:30 PM.
 
Old 09-05-2021, 02:18 PM   #6
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,854

Rep: Reputation: 5692
https://www.dropbox.com/install?os=lnx
https://superuser.com/questions/4706...g-wget-command
 
Old 09-05-2021, 02:39 PM   #7
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
Quote:
pan64 wrote :
https://www.dropbox.com/install?os=lnx
https://superuser.com/questions/4706...g-wget-command
I don't want to download the links; I just want to get the HTML where the links are. I think I was clear when I started the thread.
 
Old 09-05-2021, 04:39 PM   #8
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,715
Blog Entries: 11

Rep: Reputation: 3747
That page is delivered as, and loaded with, javascript with Micro$oft copyright notices, but apparently it checks the useragent string to decide whether to deliver just the page or to helpfully push the whole archive down the user's pipe...

Use wget with suitable options, in this case the -U option and your browser's useragent string, and it will give you just the page without the files. Note that the page you get will also be obnoxiously delivered as javascript from which you will have to extract the links, but the links are there.

Anticipating your next question, "How can I get just the links as HTML without the javascript?"... probably use a different storage platform. I think you will need some post-processing of the "page" (i.e. the downloaded script) to extract those links.
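A sketch of that two-step approach (the useragent placeholder needs a real browser string, and the sed unescaping pass is an assumption about how the links are embedded in the delivered script; adjust it to whatever you actually find in the file):
Code:
# 1. fetch with a browser useragent so Dropbox serves the page, not the archive
wget -U 'Mozilla/5.0 ...your browser ua string...' -O page.html "$url"

# 2. crude post-processing: undo JSON-style "\/" escaping, then pull the dl=0 links
sed 's|\\/|/|g' page.html | grep -Eo 'https?://[^"[:space:]]+\?dl=0' | sort -u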

Last edited by astrogeek; 09-05-2021 at 04:55 PM.
 
Old 09-05-2021, 05:00 PM   #9
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
Quote:
That page is loaded with javascript with Micro$oft copyright notices, but apparently it checks the useragent string to decide whether to deliver just the page or to helpfully push the whole archive down the user's pipe...

Use wget with suitable options, in this case the -U option and your browser's useragent string and it will give you just the page without the files. Note that the page you get will also be obnoxiously delivered as javascript from which you will have to extract the links, but the links are there.
Did you try it, or is it just a guess?

I just tested it here and got the same output as with httpie:
Quote:
wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36" "$url" -O tmp.file
 
Old 09-05-2021, 05:07 PM   #10
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,715
Blog Entries: 11

Rep: Reputation: 3747
I just tried it as a guess (after looking at the page source in my browser) and I got the page.

Code:
wget https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0 -O page.html -U 'Mozilla...my ua string here'
The page as delivered is 482471 bytes of script from which the page is to be rendered via javascript.

Without the -U option I get approx 11MB of zipped archive, all those wav files.

Last edited by astrogeek; 09-05-2021 at 05:18 PM.
 
Old 09-06-2021, 01:19 AM   #11
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 17,783
Blog Entries: 11

Rep: Reputation: 5383
The problem is javascript - wget, curl & co. do not handle it; only browsers and dedicated tools (e.g. phantomjs) do.

You can try using your browser in headless mode (with Firefox, simply add --headless to the command). If that doesn't work, you'll have to resort to phantomjs or some Python modules (BeautifulSoup, if memory serves) etc. Rudimentary coding will be required; phantomjs ships with examples, and web searches will turn up more.
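For the Chrome side of the original question, headless mode can print the DOM after javascript has run, with no X11 at all. A sketch (the binary may be chromium, chromium-browser, or google-chrome depending on the distribution):
Code:
# dump the javascript-rendered DOM to a file
chromium --headless --disable-gpu --dump-dom "$url" > page.html

# firefox's headless mode only exposes a screenshot, not a DOM dump
firefox --headless --screenshot page.png "$url"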

Last edited by ondoho; 09-06-2021 at 01:20 AM.
 
Old 09-06-2021, 12:26 PM   #12
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
Running it from a shell I got this:
Quote:
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
But after starting an X11 session and running it in a terminal, I got the same output as with wget and httpie, which is 31 links instead of 50.
 
Old 09-07-2021, 12:21 AM   #13
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 17,783
Blog Entries: 11

Rep: Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383Reputation: 5383
Quote:
Originally Posted by pedropt View Post
Running it from a shell I got this:


But after starting an X11 session and running it in a terminal, I got the same output as with wget and httpie, which is 31 links instead of 50.
What "it"???
 
Old 09-07-2021, 12:27 PM   #14
pedropt
Member
 
Registered: Aug 2014
Distribution: Devuan
Posts: 299

Original Poster
Rep: Reputation: Disabled
IT =
However, this javascript tool is out of the question because it requires an open X11 session, and I want everything to work over a shell.
I opened a ticket with the httpie developers on GitHub; maybe they will find a way to work around this issue.
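Two possible workarounds for the "could not connect to display" error above, neither tried in this thread (save_page.js is a stand-in for whatever phantomjs script is in use, and both lines assume the relevant packages are installed):
Code:
# run the tool inside a throwaway virtual X server (xvfb-run ships with Xvfb)
xvfb-run -a phantomjs save_page.js "$url"

# or, since phantomjs is Qt-based, try Qt's offscreen platform plugin
QT_QPA_PLATFORM=offscreen phantomjs save_page.js "$url"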
 