Over the years I've found myself using links more and more for web browsing, for load times if no other reason. Oh yeah, and blithely getting past popup ads without realizing they're there until I re-visit a site in a JavaScript-enabled browser.
Today a site I go to every few days to read Mass Effect stories came up with this new bit of stupidity:
Quote:
Please enable cookies.
One more step
Please complete the security check to access www.fanfiction.net
Please stand by, while we are checking your browser...
Please turn JavaScript on and reload the page.
Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
What can I do to prevent this in the future?
If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.
If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
Cloudflare Ray ID: 60458458db48386c * Your IP: xx.xx.xx.xx * Performance & security by Cloudflare
In the above, I made one alteration to the original quoted text: I replaced the displayed IP with "xx.xx.xx.xx".
I'm pretty sure that I can get links to accept cookies and promptly discard them, which would keep that part of the problem from manifesting; wget can do it, so it must be possible.
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
Quote:
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
I haven't heard of such a thing, and even if it exists, it probably won't help much, because those elements of the site that depend on javascript won't work. And some sites are heavily dependent on javascript.
A web search for "mimic javascript" turned up a lot of articles about something called "mimic," but nothing about how to prevaricate about the status of javascript in your browser. A search for "pretend javascript" was similarly fruitless.
I've encountered some sites that don't work at all unless they can take advantage of javascript. For example, many sites put up by candidates during campaigns (sites they know may not be needed after election day) are entirely javascript. Without javascript enabled, they are just big empty browser windows.
Quote:
Does anyone know how to get links (or any non-JavaScript browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude" to the site it's trying to fetch from?
That is definitely impossible.
The best compromise I can offer you is to look into replacing Lynx with Browsh.
Do be aware that the "new bit of stupidity" is probably this:
It's not possible because the site relies on something it only gets after executing a bit of js.
I know you asked because of links, but: consider a second browser for problematic sites. Javascript is not evil incarnate; it can be used sparingly (an addon like uMatrix helps greatly to avoid running it indiscriminately, and also shows you that javascript isn't the only "thing" to be avoided).
Another possibility is edbrowse, but it's a very unconventional browser geared towards blind users. If you're comfortable using ed you may give edbrowse a try. Otherwise, Browsh is definitely a better option.
Update. Sorry, just checked this. edbrowse cannot bypass the CAPTCHA on that page. It uses Duktape to handle JS, and obviously that's not enough in this case.
I mostly use edbrowse as a quick'n'dirty scripting/webscraping tool.
Your issue is the domain owner has chosen to put CloudFlare as a MitM to "protect" their site from bad bots/etc.
You either need a second browser to obtain the relevant cookies (which you then set in your standard browser), or a proxy/mirror server to appear as a browser that you can then browse with Links/whatever.
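One way to hand cookies from one tool to another is the Netscape cookies.txt format, which curl (`--cookie-jar`/`--cookie`), wget (`--save-cookies`/`--load-cookies`), and Python's http.cookiejar all understand. A minimal sketch of the round trip; the cookie name and value here are made up for illustration (a real Cloudflare clearance cookie would have to come from a full browser session):

```python
# Sketch: save a cookie in Netscape cookies.txt format, then load it back
# as a second tool would. The cf_clearance value below is invented.
import http.cookiejar
import os
import tempfile
import time

jar = http.cookiejar.MozillaCookieJar()
cookie = http.cookiejar.Cookie(
    version=0, name='cf_clearance', value='example-token',
    port=None, port_specified=False,
    domain='.fanfiction.net', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=True, expires=int(time.time()) + 3600,
    discard=False, comment=None, comment_url=None, rest={})
jar.set_cookie(cookie)

path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')
jar.save(path, ignore_discard=True, ignore_expires=True)

# A second tool (or another process) can now load the same file:
jar2 = http.cookiejar.MozillaCookieJar()
jar2.load(path, ignore_discard=True, ignore_expires=True)
print([c.name for c in jar2])  # ['cf_clearance']
```

The same cookies.txt could then be passed to curl with `--cookie cookies.txt`, or dropped where your text browser expects its cookie file.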
Here's the first hit from a relevant search: https://github.com/VeNoMouS/cloudscraper - no idea of its quality/effectiveness; check the tags on the right for others.
I tried to see if that site would respond to different user agents.
Lets play:
Code:
agentA="Mozilla/5.0 (Windows NT 10.1; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"
agentB="Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
agentC="Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A403 Safari/602.1"
agentD="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Fail
curl -A "$agentA" -L https://www.fanfiction.net -o test1.html
# Fail
curl -A "$agentB" -L https://www.fanfiction.net -o test2.html
# Fail
curl -A "$agentC" -L https://www.fanfiction.net -o test3.html
# Fail
curl -A "$agentD" -L https://www.fanfiction.net -o test4.html
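For what it's worth, the same user-agent override can be set up from Python's urllib. The sketch below only builds the request object without sending it, since Cloudflare would presumably reject the actual fetch just as it did curl's:

```python
# Sketch: attach a spoofed User-Agent to a urllib request (built but not
# sent). agentC is the same mobile-Safari string used with curl above.
import urllib.request

agentC = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) '
          'AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 '
          'Mobile/14A403 Safari/602.1')

req = urllib.request.Request('https://www.fanfiction.net',
                             headers={'User-Agent': agentC})

# urllib normalizes header names with str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))
```

Sending it would be `urllib.request.urlopen(req)`, which in this case just gets you the same 403 challenge page.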
To show the OP that it can be done, here is an example using PyQt5 and WebKit to get that page quickly, with images and scripts turned off. It might not be as neat as I could have made it, but it works.
Code:
#!/usr/bin/python
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication, QMainWindow
from PyQt5.QtWebKitWidgets import QWebPage, QWebView
from PyQt5.QtWebKit import QWebSettings

# Spoofed user agent: mobile Safari on an iPhone.
a = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X)'
     ' AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0'
     ' Mobile/14A403 Safari/602.1')


class MainWindow(QMainWindow):
    def __init__(self, url):
        super(MainWindow, self).__init__()
        self.resize(1200, 1000)
        self.fs = 20
        self.progress = 0

        # Override userAgentForUrl on this page instance so every request
        # reports the spoofed agent. Because the function is assigned as a
        # plain instance attribute, Qt calls it as agent(url); we ignore
        # the argument and just return the string.
        self.page = QWebPage(self)

        def agent(url):
            return a

        self.page.userAgentForUrl = agent

        self.view = QWebView(self)
        self.view.setPage(self.page)
        self.view.setUrl(url)
        self.view.settings().setAttribute(
            QWebSettings.JavascriptEnabled, False)
        self.view.settings().setAttribute(
            QWebSettings.AutoLoadImages, False)
        self.view.settings().globalSettings().setFontSize(
            QWebSettings.MinimumFontSize, self.fs)
        self.view.titleChanged.connect(self.adjustTitle)
        self.view.loadProgress.connect(self.setProgress)
        self.view.loadFinished.connect(self.finishLoading)
        self.setCentralWidget(self.view)

    def adjustTitle(self):
        if 0 < self.progress < 100:
            self.setWindowTitle("%s (%s%%)" % (self.view.title(), self.progress))
        else:
            self.setWindowTitle(self.view.title())

    def setProgress(self, p):
        self.progress = p
        self.adjustTitle()

    def finishLoading(self):
        self.progress = 100
        self.adjustTitle()


if __name__ == '__main__':
    app = QApplication(sys.argv)
    if len(sys.argv) > 1:
        url = QUrl(sys.argv[1])
    else:
        url = QUrl('https://www.fanfiction.net')
    browser = MainWindow(url)
    browser.show()
    sys.exit(app.exec_())
You could also dump that page to a file and open the file in a web browser like lynx, or you could import webbrowser and open the page in that.
Example:
Code:
from webbrowser import register, get, GenericBrowser
register('dillo', None, GenericBrowser('dillo'))
get('dillo').open(<something>)
I was able to open fanfiction.net, too, with javascript and 3rd-party requests OFF.
Clicking some random links to get to some content, it seems to work well enough.
Maybe Cloudflare just doesn't like links' user agent?