Open a URL and save the page from the terminal without X11 - Firefox or Chrome
Hello guys, is there any way, from the terminal without X11 loaded, to open a URL and save it as a file, without opening a GUI app in an X11 session?
I want to do it with Firefox or Chrome. I tried httpie, but it only loads about 50% of the webpage.
wget or curl, or lynx maybe?
wget and curl cannot get it; it's from Dropbox.
Edit: curl can download the links from Dropbox, but I want the full webpage only.
Curl or wget should be able to do it; with suitable options they can make the same request and receive the same response as the browser would.
Try it: https://www.dropbox.com/sh/erv1tyczt...HqxmYz_5a?dl=0
and if you can, then get me all the links to the wav files inside that folder, with whatever tool you think will work, over the terminal only. Note: I don't need the links parsed out of the webpage; I just want the webpage fully downloaded, with all the wav file links inside it. I can only get 30 of the 50, and that is with httpie, because with curl or wget it's impossible. I don't want to download the files; I just want the webpage downloaded as HTML, and complete. Here is a simple script that does all the stuff. I inserted httpie as the tool that gets the HTML from Dropbox, but you can use any tool you want; just adjust the code at line 18 to whatever tool you prefer. Code:
#!/bin/bash
That page is delivered as, and loaded with, JavaScript carrying Micro$oft copyright notices, but apparently it checks the user-agent string to decide whether to deliver just the page or to helpfully push the whole archive down the user's pipe...
Use wget with suitable options, in this case the -U option with your browser's user-agent string, and it will give you just the page without the files. Note that the page you get will also be obnoxiously delivered as JavaScript, from which you will have to extract the links, but the links are there. Anticipating your next question, "How can I get just the links as HTML without the JavaScript?"... probably use a different storage platform. I think you will need some post-processing of the "page" (i.e. the downloaded script) to extract those links.
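One hedged sketch of that post-processing step, assuming the wav links appear in the downloaded file as plain https URLs (possibly with JavaScript-escaped slashes). The filename page.html, the sample heredoc content, and the URL pattern are all assumptions for illustration, not the real Dropbox markup:

```shell
# Hedged sketch: pull wav-file links out of a saved Dropbox page.
# Assumes the page was saved as page.html and the links appear as
# https URLs ending in .wav, possibly with \/ escaped slashes.

# Stand-in sample so the pipeline can be demonstrated offline;
# replace this heredoc with your real downloaded page.
cat > page.html <<'EOF'
var files = ["https:\/\/dl.dropboxusercontent.com\/s\/abc\/kick01.wav",
             "https:\/\/dl.dropboxusercontent.com\/s\/def\/snare02.wav"];
EOF

# Undo JavaScript's escaped slashes, then grep out every .wav URL.
sed 's|\\/|/|g' page.html | grep -oE 'https://[^"]+\.wav' | sort -u
```

The sed step matters because Dropbox-style JSON embedded in script tags commonly escapes `/` as `\/`, which would otherwise break the grep pattern.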
I just tested it here and got the same output as httpie.
I just tried it as a guess (after looking at the page source in my browser) and I got the page.
Code:
wget https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0 -O page.html -U 'Mozilla...my ua string here'
Without the -U option I get approx. 11 MB of zipped archive, all those wav files.
The problem is JavaScript - wget, curl & co. do not handle it; only browsers and dedicated tools (e.g. phantomjs) do.
You can try using your browser in headless mode (with Firefox, simply add --headless to the command); if that doesn't work, you'll have to resort to phantomjs or some Python modules (BeautifulSoup, if memory serves) etc. Rudimentary coding will be required, as will web searches for examples.
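For the headless route, a sketch of the command involved, with the caveat that --dump-dom is a Chrome/Chromium flag (Firefox's --headless can take screenshots with --screenshot but has no DOM-dump option). The command is only assembled and printed here, not executed, since no particular browser is assumed to be installed:

```shell
# Hedged sketch: render the page headlessly (no X11 session needed) and
# save the DOM after JavaScript has run. --dump-dom exists in
# Chrome/Chromium; Firefox's --headless has no equivalent flag.
URL='https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0'

# The actual invocation would be:
#   chromium --headless --disable-gpu --dump-dom "$URL" > page.html
# Here we only build and print the command, since no browser is
# assumed to be present on this machine.
cmd=(chromium --headless --disable-gpu --dump-dom "$URL")
printf '%s\n' "${cmd[*]}"
```

Headless Chromium runs without any display server, which is exactly the no-X11 constraint in the original question.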
Running it from a shell got this:
However, this JavaScript route is out of the question because it requires an open X11 session, and I want everything to work from a shell. I filed a ticket with the httpie developers on GitHub; maybe they will find a way to work around this issue.