LinuxQuestions.org
Linux - Software This forum is for Software issues.
06-23-2021, 07:01 AM   #1
lvm_
Member

Registered: Jul 2020
Posts: 925

Rep: Reputation: 337
DOM curl


The web is rapidly transitioning to the CSR model. Old-school tools for client-side web automation like curl and wget are becoming less and less useful, and yet I cannot find a good CSR web automation tool. Are there any? What I mean is essentially a modern browser engine with the GUI replaced by a scripting backend for parsing the rendered DOM and simulating user input. Yes, I know curl can be used to interact with the REST APIs that supply data to CSR scripts, but analysing how these APIs work is usually time-consuming.
 
06-25-2021, 07:19 AM   #2
Guttorm
Senior Member

Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 447
https://github.com/puppeteer/puppeteer

The API is huge. An easier start is to look at the examples:

https://github.com/puppeteer/puppete...main/examples/
 
06-29-2021, 05:16 AM   #3
ondoho
LQ Addict

Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by lvm_
The web is rapidly transitioning to the CSR model. Old-school tools for client-side web automation like curl and wget are becoming less and less useful, and yet I cannot find a good CSR web automation tool. Are there any? What I mean is essentially a modern browser engine with the GUI replaced by a scripting backend for parsing the rendered DOM and simulating user input. Yes, I know curl can be used to interact with the REST APIs that supply data to CSR scripts, but analysing how these APIs work is usually time-consuming.
What is this CSR model?
I don't see it here: https://en.wikipedia.org/wiki/CSR

That said, wget and curl do not execute JavaScript, which can make scraping impossible.
I have used 'phantomjs' for such tasks, or simply 'chromium --headless' (download the DOM first, then use other tools to inspect it).
Python has tools for that, too.
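
A minimal sketch of the "download the DOM first, then inspect it" approach in Python. It assumes a binary named `chromium` is on PATH (the name differs between distributions) and relies on Chromium's `--dump-dom` switch, which prints the DOM after the page's scripts have run. The names `dump_dom`, `LinkCollector` and `extract_links` are made up for this illustration.

```python
import subprocess
from html.parser import HTMLParser


def dump_dom(url, browser="chromium", timeout=60):
    """Return the serialized DOM of `url` as headless Chromium sees it.

    Assumption: a Chromium/Chrome binary called `browser` is on PATH.
    """
    result = subprocess.run(
        [browser, "--headless", "--dump-dom", url],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Something like `extract_links(dump_dom("https://example.org/"))` would then list the links of the rendered page. Any other HTML parser can take the place of `LinkCollector`.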
 
06-29-2021, 06:30 AM   #4
Michael Uplawski
Senior Member
 
Registered: Dec 2015
Posts: 1,622
Blog Entries: 40

Rep: Reputation: Disabled
You want to take a look at WebDriver and implementations like Selenium or Watir. Maybe the Molybdenum extension for Firefox is still alive; if so, it may be sufficient for small projects. I saw an enterprise base its whole testing environment on Molybdenum. It worked until approximately 150 consecutive tests had been integrated, and then the system choked.

... choked on a multi-tier Web-Application. *)


But even for simple tasks, I would rather try anything WebDriver-based. It is fun enough to motivate you to do more.

*) At that moment, they recruited me to sort out the mess. Today I know I was dumb. But ask me about WebDriver, if need be.
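
For a first impression of WebDriver, here is a rough sketch using the Python bindings of Selenium. It assumes the `selenium` package, Firefox and geckodriver are installed. `make_headless_firefox` and `rendered_text` are hypothetical helper names, not part of Selenium.

```python
def make_headless_firefox():
    """Start a headless Firefox session through WebDriver.

    Assumption: selenium, Firefox and geckodriver are installed.
    """
    # Imported lazily so the rest of the module works without selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def rendered_text(url, css_selector, driver_factory=make_headless_firefox):
    """Load `url` in a real browser and return the text of matching elements."""
    driver = driver_factory()
    try:
        driver.get(url)
        # "css selector" is the WebDriver locator strategy name
        # (the value of selenium's By.CSS_SELECTOR).
        elements = driver.find_elements("css selector", css_selector)
        return [element.text for element in elements]
    finally:
        # Always close the browser, even if the page failed to load.
        driver.quit()
```

A call such as `rendered_text("https://example.org/", "h1")` returns the text of the headings as the browser rendered them. Pages that keep building the DOM after the load event may additionally need an explicit wait before the lookup.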

Last edited by Michael Uplawski; 06-29-2021 at 06:32 AM. Reason: Kraut2English
 
06-29-2021, 06:39 AM   #5
lvm_
Member

Registered: Jul 2020
Posts: 925

Original Poster
Rep: Reputation: 337
Quote:
Originally Posted by ondoho
What is this CSR model?
Client-side rendering. The worst thing that happened to the web.
 
06-29-2021, 08:10 AM   #6
teckk
LQ Guru

Registered: Oct 2004
Distribution: Arch
Posts: 5,137
Blog Entries: 6

Rep: Reputation: 1826
Quote:
I cannot find a good CSR web automation tool. Are there any?
You are either going to have to run the scripts on the client and print the result, or get the page source and follow it back one step at a time. I just posted a lightweight Python script to help with that. You don't have to use tkinter. You can print items to the shell instead. That's why I made the script modular like that. Copy and paste just the parts that you want.
https://www.linuxquestions.org/quest...er-4175697045/

A web browser's engine that already knows how to run scripts is ideal for this. Have it print all of the requests that the engine is making.

youtube-dl will extract video data for a lot of sites.

Another option is to open the browser's web inspector, refresh the page, and look for the item you want among the requests.
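
One way to make the inspector route less tedious: the network panel in Firefox and Chromium can save the recorded requests as a HAR file (JSON), which a few lines of Python can filter. This is only a sketch under that assumption. `json_requests` and `json_requests_from_file` are names invented here, and matching "json" in the MIME type is a simple heuristic for spotting API calls.

```python
import json


def json_requests(har):
    """Return (method, url) for every HAR entry whose response was JSON."""
    found = []
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" in mime.lower():
            request = entry["request"]
            found.append((request["method"], request["url"]))
    return found


def json_requests_from_file(path):
    """Load a HAR file saved from the web inspector and filter it."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return json_requests(json.load(handle))
```

The URLs this prints are the candidates to replay with curl once the interesting one has been identified.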
 
06-30-2021, 01:31 AM   #7
ondoho
LQ Addict

Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by lvm_
Client-side rendering. The worst thing that happened to the web.
Ah, JavaScript. Well, mostly. So my previous reply applies. Seriously, these tools exist, and there are more than just a couple of them. Maybe your search fu is weak.

edit: more specific advice would require more specifics from your side.

Last edited by ondoho; 06-30-2021 at 01:32 AM.
 
  


All times are GMT -5.
