Old 03-25-2021, 11:23 AM   #1
n00b_noob
Member
 
Registered: Sep 2020
Posts: 436

Rep: Reputation: Disabled
Can I use "cURL" or "wget" to click on the links on a page?


Hello,
On a web page, I want to "click" on all the links. Can I use "cURL" or "wget" for this task?
I saw https://askubuntu.com/questions/6390...tiple-webpages and also found this "wget" command:
Code:
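# -r recurses through linked pages, -p fetches page requisites (images, CSS), -k converts links for local viewing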
$ wget -r -p -k http://website
But it downloads the whole website, and I just want to follow the links on a single page. For example, consider the https://www.amazon.com/s?k=linux&i=s...ref=nb_sb_noss URL: it shows a list of books, and I want to use cURL or wget to "click" on each book on that page.

Thank you.
 
Old 03-25-2021, 12:28 PM   #2
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
No, you cannot 'click' with wget or curl. You may be able to simulate a 'click' by transmitting the data that the click would have transmitted, but for that you would first need to analyze the traffic that results from such a click.

Besides, some websites are implemented almost entirely in JavaScript, and clicks may be processed entirely by JavaScript. Neither of those CLI tools can execute JavaScript.
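For example, if analyzing the traffic showed that a click sends a POST request, you could replay it with curl along these lines. The endpoint and form fields below are made up; the real ones have to come from the captured traffic:
Code:
# Hypothetical replay of the request a 'click' would send.
# The endpoint, fields, and values must come from real captured traffic.
curl -s 'https://example.com/click-handler' \
     -H 'Referer: https://example.com/list' \
     -d 'item_id=12345' \
     -d 'action=open'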

Last edited by crts; 03-25-2021 at 12:32 PM.
 
Old 03-25-2021, 02:00 PM   #3
dc.901
Senior Member
 
Registered: Aug 2018
Location: Atlanta, GA - USA
Distribution: CentOS/RHEL, openSuSE/SLES, Ubuntu
Posts: 1,005

Rep: Reputation: 370
So, I am assuming you want to "crawl" and "scrape" website(s). You can do that, but I doubt it will be a one-liner...

Here are some references (and there are many more if you search):
https://www.petergroom.com/index.php...ape-a-web-page
https://data36.com/web-scraping-tuto...age-with-bash/
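As a rough sketch of the idea (example.com is a placeholder, and real pages usually need more careful parsing than a grep):
Code:
# Minimal crawl sketch: fetch a page, extract its absolute links,
# then fetch each linked page in turn.
url="https://example.com/"

curl -s "$url" |
grep -oE 'href="http[^"#]+"' |
cut -d'"' -f2 |
while read -r link; do
    wget -q "$link"
done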
 
Old 03-25-2021, 02:51 PM   #4
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,138
Blog Entries: 6

Rep: Reputation: 1827
I don't know why you would want to do that, but the first step would be to get the hyperlinks.
Code:
url="https://www.amazon.com/s?k=linux&i=stripbooks-intl-ship&ref=nb_sb_noss"

agent="Mozilla/5.0 (Windows NT 10.1; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"

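# Fetch the page with a browser-like User-Agent; -k rewrites links in the saved file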
wget -k -U "$agent" "$url" -O myfile.html

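# Pull every absolute http/https URL out of the saved page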
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" myfile.html
And that is not everything, just what is written into the page source. Look at myfile.html: there are lots of server-side scripts on that page that render other pages.

You could intercept the requests that a web browser's engine makes and print them.

Scraping the page for some specific content would be easier and doable.
 
Old 03-27-2021, 01:45 AM   #5
n00b_noob
Member
 
Registered: Sep 2020
Posts: 436

Original Poster
Rep: Reputation: Disabled
Thank you.
I found this command:
Code:
$ curl -s URL | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2
It extracted all the links, but I need to change it: I only want the links that start with the string "/text/". How can I change the above command to show only those links?
 
Old 03-27-2021, 06:49 AM   #6
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
The easiest solution would be to append
Code:
grep '^/text/'
to the pipeline. That displays only the lines that start with /text/.
 
1 member found this post helpful.
Old 03-29-2021, 10:03 AM   #7
n00b_noob
Member
 
Registered: Sep 2020
Posts: 436

Original Poster
Rep: Reputation: Disabled
Thanks.
I used this command:
Code:
$ curl -s https://www.URL.com | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep '^/text/' > out.txt
The out.txt file includes the links that I wanted. How can I add "https://www.URL.com" to the beginning of each line in the out.txt file?
 
Old 03-29-2021, 05:46 PM   #8
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by n00b_noob View Post
The out.txt file includes the links that I wanted. How can I add "https://www.URL.com" to the beginning of each line in the out.txt file?
Add this to the pipeline:
Code:
sed 's|^|https://www.url.com|'
Or better, you can incorporate the grep into the sed:
Code:
sed -n 's|^/text/|https://www.url.com/text/|p'
With -n, sed prints only the lines where the substitution succeeded (that is what the trailing p flag does), so the separate grep is no longer needed. The | delimiter just avoids having to escape the slashes in the URL.

I suggest you read the sed guide. The link is in my signature.

Last edited by berndbausch; 03-29-2021 at 05:47 PM.
 
1 member found this post helpful.
Old 03-31-2021, 07:01 AM   #9
n00b_noob
Member
 
Registered: Sep 2020
Posts: 436

Original Poster
Rep: Reputation: Disabled
Thank you.

I did:
Code:
$ curl -s https://www.URL.com | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep '^/text/' | sed 's|^|https://www.URL.com|' > out.txt
$ wget -i out.txt -O /dev/null
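The wget step fetches every URL in out.txt and discards the responses (-O /dev/null), which is effectively the "click on every link" behaviour from the original question. The temporary file can also be skipped by piping straight into wget, a minimal variation of the same pipeline:
Code:
$ curl -s https://www.URL.com | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | sed -n 's|^/text/|https://www.URL.com/text/|p' | wget -i - -O /dev/null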
 
  

