Over the years I've found myself using links more and more for web browsing, for load times if no other reason. Oh yeah, and blithely getting past popup ads without realizing they're there until I re-visit a site in a JavaScript-enabled browser.
Today a site I go to every few days to read Mass Effect stories came up with this new bit of stupidity:
Quote:
Please enable cookies.
One more step
Please complete the security check to access www.fanfiction.net
Please stand by, while we are checking your browser...
Please turn JavaScript on and reload the page.
Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
What can I do to prevent this in the future?
If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.
If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
Cloudflare Ray ID: 60458458db48386c * Your IP: xx.xx.xx.xx * Performance & security by Cloudflare
In the above, I made one alteration to the original quoted text: I replaced the displayed IP with "xx.xx.xx.xx".
I'm pretty sure that I can get links to accept cookies and promptly discard them, which would keep that part of the problem from manifesting; wget can do it, so it must be possible.
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
Quote:
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
I haven't heard of such a thing, and even if it exists, it probably won't help much, because those elements of the site that depend on javascript won't work. And some sites are heavily dependent on javascript.
A web search for "mimic javascript" turned up a lot of articles about something called "mimic," but nothing about how to prevaricate about the status of javascript in your browser. A search for "pretend javascript" was similarly fruitless.
I've encountered some sites that don't work at all unless they can take advantage of javascript. For example, many sites put up by candidates during campaigns (sites they know may not be needed after election day) are entirely javascript. Without javascript enabled, they are just big empty browser windows.
Quote:
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
That is definitely impossible.
The best compromise I can offer you is to look into replacing Lynx with Browsh.
Do be aware that the "new bit of stupidity" is probably this:
It's not possible because the site relies on something it only gets after executing a bit of js.
I know you asked because of links, but: consider a second browser for problematic sites. Javascript is not evil incarnate; it can be used sparingly (an addon like uMatrix helps greatly to avoid running it indiscriminately, and also shows you that javascript isn't the only "thing" to be avoided).
Another possibility is edbrowse, but it's a very unconventional browser geared towards blind users. If you're comfortable using ed you may give edbrowse a try. Otherwise, Browsh is definitely a better option.
Update. Sorry, just checked this. edbrowse cannot bypass the CAPTCHA on that page. It uses Duktape to handle JS, and obviously that's not enough in this case.
I mostly use edbrowse as a quick'n'dirty scripting/webscraping tool.
Your issue is the domain owner has chosen to put CloudFlare as a MitM to "protect" their site from bad bots/etc.
You either need a second browser to obtain the relevant cookies (which you then set in your standard browser), or a proxy/mirror server to appear as a browser that you can then browse with Links/whatever.
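One way to hand cookies from one tool to another is the Netscape cookies.txt format, which curl (`--cookie-jar`/`--cookie`), wget (`--save-cookies`/`--load-cookies`), and Python's http.cookiejar all understand. A minimal sketch of the round trip; the cookie name and value here are made up for illustration (a real Cloudflare clearance cookie would have to come from a full browser session):

```python
# Sketch: save a cookie in Netscape cookies.txt format, then load it back
# as a second tool would. The cf_clearance value below is invented.
import http.cookiejar
import os
import tempfile
import time

jar = http.cookiejar.MozillaCookieJar()
cookie = http.cookiejar.Cookie(
    version=0, name='cf_clearance', value='example-token',
    port=None, port_specified=False,
    domain='.fanfiction.net', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=True, expires=int(time.time()) + 3600,
    discard=False, comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)

path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
jar.save(path, ignore_discard=True, ignore_expires=True)

# A second tool (or another process) can now load the same file:
jar2 = http.cookiejar.MozillaCookieJar()
jar2.load(path, ignore_discard=True, ignore_expires=True)
print([c.name for c in jar2])  # ['cf_clearance']
```

The same cookies.txt could then be passed to curl with `--cookie cookies.txt`, or dropped where your text browser expects its cookie file.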
Here's the first hit from a relevant search: https://github.com/VeNoMouS/cloudscraper - no idea of its quality/effectiveness; check the tags on the right for others.
I tried to see if that site would respond to different user agents.
Lets play:
Code:
agentA="Mozilla/5.0 (Windows NT 10.1; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"
agentB="Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
agentC="Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A403 Safari/602.1"
agentD="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Fail
curl -A "$agentA" -L https://www.fanfiction.net -o test1.html
# Fail
curl -A "$agentB" -L https://www.fanfiction.net -o test2.html
# Fail
curl -A "$agentC" -L https://www.fanfiction.net -o test3.html
# Fail
curl -A "$agentD" -L https://www.fanfiction.net -o test4.html
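For what it's worth, the same user-agent override can be set up from Python's urllib. The sketch below only builds the request object without sending it, since Cloudflare would presumably reject the actual fetch just as it did curl's:

```python
# Sketch: attach a spoofed User-Agent to a urllib request (built but not
# sent). agentC is the same mobile-Safari string used with curl above.
import urllib.request

agentC = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) '
          'AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 '
          'Mobile/14A403 Safari/602.1')

req = urllib.request.Request('https://www.fanfiction.net',
                             headers={'User-Agent': agentC})

# urllib normalizes header names with str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))
```

Sending it would be `urllib.request.urlopen(req)`, which in this case just gets you the same 403 challenge page.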
To show the OP that it can be done, here is an example using PyQt5 and WebKit to get that page quickly, with images and scripts turned off. It might not be as neat as I could have made it, but it works.
Code:
#!/usr/bin/python
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication, QMainWindow
from PyQt5.QtWebKitWidgets import QWebPage, QWebView
from PyQt5.QtWebKit import QWebSettings

# Spoofed user agent: mobile Safari on an iPhone.
a = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X)'
     ' AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0'
     ' Mobile/14A403 Safari/602.1')


class MainWindow(QMainWindow):
    def __init__(self, url):
        super(MainWindow, self).__init__()
        self.resize(1200, 1000)
        self.fs = 20
        self.progress = 0

        # Override userAgentForUrl on this page instance so every request
        # reports the spoofed agent. Because the function is assigned as a
        # plain instance attribute, Qt calls it as agent(url); we ignore
        # the argument and just return the string.
        self.page = QWebPage(self)

        def agent(url):
            return a

        self.page.userAgentForUrl = agent

        self.view = QWebView(self)
        self.view.setPage(self.page)
        self.view.setUrl(url)
        self.view.settings().setAttribute(
            QWebSettings.JavascriptEnabled, False)
        self.view.settings().setAttribute(
            QWebSettings.AutoLoadImages, False)
        self.view.settings().globalSettings().setFontSize(
            QWebSettings.MinimumFontSize, self.fs)
        self.view.titleChanged.connect(self.adjustTitle)
        self.view.loadProgress.connect(self.setProgress)
        self.view.loadFinished.connect(self.finishLoading)
        self.setCentralWidget(self.view)

    def adjustTitle(self):
        if 0 < self.progress < 100:
            self.setWindowTitle("%s (%s%%)" % (self.view.title(), self.progress))
        else:
            self.setWindowTitle(self.view.title())

    def setProgress(self, p):
        self.progress = p
        self.adjustTitle()

    def finishLoading(self):
        self.progress = 100
        self.adjustTitle()


if __name__ == '__main__':
    app = QApplication(sys.argv)
    if len(sys.argv) > 1:
        url = QUrl(sys.argv[1])
    else:
        url = QUrl('https://www.fanfiction.net')
    browser = MainWindow(url)
    browser.show()
    sys.exit(app.exec_())
You could also dump that page to a file and open the file in a web browser like lynx, or you could import webbrowser and open the page in that.
Example:
Code:
from webbrowser import register, get, GenericBrowser
register('dillo', None, GenericBrowser('dillo'))
get('dillo').open(<something>)
I was able to open fanfiction.net, too, with javascript and 3rd-party requests OFF.
Clicking some random links to get to some content, it seems to work well enough.
Maybe Cloudflare just doesn't like links' user agent?