Can I use "cURL" or "wget" to click on the links on a page?
Hello,
On a web page, I want to click on all the links. Can I use "cURL" or "wget" for this task? I saw https://askubuntu.com/questions/6390...tiple-webpages and also found the "wget" command below: Code:
$ wget -r -p -k http://website
Thank you. |
No, you cannot 'click' with wget or curl. You may be able to simulate a 'click' by transmitting the data that would have been transmitted as a result of the 'click', but to do that you would first need to analyze the traffic that results from the aforementioned 'click'.
Besides, some websites are implemented almost entirely in JavaScript, and clicks may be processed entirely by JavaScript. Neither of those CLI tools can process JavaScript. |
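To illustrate what "simulating a click" means in practice: a minimal local sketch, assuming the link is really a form submission whose traffic you have already inspected (the HTML sample, field name, and endpoint here are all invented):

```shell
# Stand-in for a page fetched with: curl -s URL
html='<form action="/search" method="get"><input name="q" value="linux"></form>'

# Find out where the "click" (form submit) would send its data
action=$(printf '%s' "$html" | grep -o 'action="[^"]*"' | cut -d'"' -f2)
echo "$action"    # /search

# Replaying that request yourself is the closest curl gets to a "click":
# curl -s "https://example.com${action}?q=linux"
```

If the page builds the request in JavaScript instead of a plain form, this approach breaks down and you would need to read the script or watch the browser's network traffic.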
So, I am assuming you want to "crawl" and "scrape" website(s). You can do that, but I doubt it will be with a single line...
Here are some references (and there are many more if you search): https://www.petergroom.com/index.php...ape-a-web-page https://data36.com/web-scraping-tuto...age-with-bash/ |
I don't know why you would want to do that, but the first step would be to get the hyperlinks.
Code:
url="https://www.amazon.com/s?k=linux&i=stripbooks-intl-ship&ref=nb_sb_noss"
You could intercept the requests that a web browser's engine makes and print them. Scraping for some content would be easier and more doable. |
Thank you.
I found the command below (using -s so curl's progress meter doesn't leak into the pipeline): Code:
$ curl -s URL | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 |
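To see what that pipeline actually extracts, here is the same grep/cut stage run against a small inline HTML sample instead of a live URL (the hrefs are invented):

```shell
# Stand-in for the output of: curl -s URL
html='<a href="/text/page1">One</a> <a href="/img/pic.png">Pic</a> <a href="#top">Top</a>'

# Pull out href="..." attributes (skipping fragment-only links), then keep the value
printf '%s\n' "$html" | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2
# /text/page1
# /img/pic.png
```

Note the `#top` link is dropped because the `[^"#]+` character class refuses to match fragments.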
The easiest solution would be to append
Code:
grep ^/text/ |
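Appending that filter keeps only the links whose path starts with /text/. A quick local check (sample hrefs invented):

```shell
html='<a href="/text/a">A</a> <a href="/img/b.png">B</a>'

# Same pipeline as above, with the suggested filter appended
printf '%s\n' "$html" | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep ^/text/
# /text/a
```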
Thanks.
I used the command below: Code:
$ curl -s https://www.URL.com | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep ^/text/ > out.txt |
Code:
sed 's|^|https://www.url.com|'
Code:
sed -n 's|^/text/|https://www.url.com/text/|p' |
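Both sed forms turn the relative paths into absolute URLs; the second also filters, printing only lines that start with /text/. A local comparison (input paths invented):

```shell
# Prepend the base URL to every line
printf '/text/a\n/other/b\n' | sed 's|^|https://www.url.com|'
# https://www.url.com/text/a
# https://www.url.com/other/b

# Substitute-and-print: only lines matching ^/text/ survive
printf '/text/a\n/other/b\n' | sed -n 's|^/text/|https://www.url.com/text/|p'
# https://www.url.com/text/a
```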
Thank you.
I did: Code:
$ curl -s https://www.URL.com | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep ^/text/ | sed 's|^|https://www.URL.com|' > out.txt |
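Once out.txt holds the absolute URLs, "clicking" every link just means fetching each one in a read loop. A sketch, with the extraction stage run on an inline sample instead of a live site (sample hrefs and the pages/ directory are invented):

```shell
# Stand-in for: curl -s https://www.URL.com
html='<a href="/text/a">A</a> <a href="/text/b">B</a>'

# Same pipeline as in the post, writing absolute URLs to out.txt
printf '%s\n' "$html" | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 \
    | grep ^/text/ | sed 's|^|https://www.URL.com|' > out.txt
cat out.txt
# https://www.URL.com/text/a
# https://www.URL.com/text/b

# Fetch ("click") each collected link into pages/ (-q quiet, -P target directory)
# while read -r link; do
#     wget -q -P pages/ "$link"
# done < out.txt
```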