LinuxQuestions.org
Old 12-19-2020, 08:36 PM   #1
jr_bob_dobbs
Member
 
Registered: Mar 2009
Distribution: Slackware,Linux From Scratch
Posts: 483
Blog Entries: 82

Rep: Reputation: 104
is there a way to fake having javascript on?


Over the years I've found myself using links more and more for web browsing, for load times if no other reason. Oh yeah, and for blithely getting past popup ads without realizing they're there until I revisit a site in a javoids-on browser.

Today a site I go to every few days to read Mass Effect stories came up with this new bit of stupidity:
Quote:
Please enable cookies.

One more step Please complete the security check to access www.fanfiction.net

Please stand by, while we are checking your browser...

Please turn JavaScript on and reload the page.

Why do I have to complete a CAPTCHA?

Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.

What can I do to prevent this in the future?

If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.

If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.

Cloudflare Ray ID: 60458458db48386c * Your IP: xx.xx.xx.xx * Performance & security by Cloudflare
In the above, I made one alteration from the original quoted text: I replaced the displayed IP with "xx.xx.xx.xx".

I'm pretty sure that I can get links to accept cookies and promptly discard them, thus keeping that part of the problem from manifesting, if only because wget can do it, so it is possible.
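For what it's worth, the accept-then-discard idea can be sketched in Python with the standard library's http.cookiejar. This is only a sketch under assumptions: the FakeResponse class and the example.org URL are made up so that it runs without touching the network, and it does nothing about the JavaScript check itself.

```python
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for a server reply, so no network is needed."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        # CookieJar.extract_cookies() reads the headers through info()
        return self._headers


# The jar lives in memory only; nothing is ever written to disk.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Simulate the site handing us a cookie (example.org is a placeholder).
request = urllib.request.Request("https://example.org/")
jar.extract_cookies(FakeResponse("session=abc123; Path=/"), request)
accepted = [cookie.name for cookie in jar]

# "Promptly discard": empty the jar once the page has been fetched.
jar.clear()
remaining = len(jar)

print(accepted, remaining)
```

With a real site, opener.open(url) would fill the jar the same way, and the cookies would be gone as soon as the jar is cleared or the script exits.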

Does anyone know how to get links (or any non-javoids browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude," to the site it's trying to fetch from?

Thank you.
 
Old 12-19-2020, 08:48 PM   #2
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 16,954
Blog Entries: 27

Rep: Reputation: 5180
Quote:
Does anyone know how to get links (or any non-javoids browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude," to the site it's trying to fetch from?
I haven't heard of such a thing, and, even if it exists, it probably won't help much, because those elements of the site that depend on javascript won't work. And some sites are heavily dependent on javascript.

A web search for "mimic javascript" turned up a lot of articles about something called "mimic," but nothing about how to prevaricate about the status of javascript in your browser. A search for "pretend javascript" was similarly fruitless.

I've encountered some sites that don't work at all unless they can take advantage of javascript. For example, many sites put up by candidates during campaigns, sites that they know may not be needed after election day, are entirely javascript. Without javascript enabled, they are just big empty browser windows.

Just my two cents.
 
Old 12-19-2020, 11:46 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 9,929

Rep: Reputation: 4500
Quote:
Originally Posted by jr_bob_dobbs
Does anyone know how to get links (or any non-javoids browser) to say "yeah, I have javascript on, really I do, trust me on this one, dude," to the site it's trying to fetch from?
That is definitely impossible.

The best compromise I can offer you is to look into replacing links with Browsh.

Do be aware that the "new bit of stupidity" is probably this:

https://support.cloudflare.com/hc/en...Bot-Management

Last edited by dugan; 12-20-2020 at 12:02 AM.
 
Old 12-20-2020, 04:09 AM   #4
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 16,657
Blog Entries: 10

Rep: Reputation: 4923
It's not possible because the site relies on something it only gets after executing a bit of js.
I know you asked because of links, but: consider a 2nd browser for problematic sites. javascript is not evil incarnate, it can be used sparingly (an addon like uMatrix helps greatly to not run it indiscriminately, and also shows you that javascript isn't the only "thing" to be avoided).
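To make the "something it only gets after executing a bit of js" point concrete, here is a toy model in Python. It is not Cloudflare's actual mechanism; the function names, the hash-based puzzle and the X-Has-JavaScript header are all invented for illustration. The point is only that the server checks a computed answer, so a header that merely claims javascript is on proves nothing.

```python
import hashlib
import secrets


def issue_challenge():
    """Server side: hand out a fresh random value with the challenge page."""
    return secrets.token_hex(8)


def run_page_script(nonce):
    """Stands in for the work the page's script would do in a real browser."""
    return hashlib.sha256(nonce.encode()).hexdigest()


def server_accepts(nonce, answer):
    """Server side: only a correctly computed answer gets through."""
    return answer == run_page_script(nonce)


nonce = issue_challenge()

# A client that only *claims* to have javascript (hypothetical header)
# has no answer to send back, so the claim is worthless.
claimed_headers = {"X-Has-JavaScript": "yes"}
passes_without_js = server_accepts(nonce, None)

# A client that actually ran the script can return the expected answer.
passes_with_js = server_accepts(nonce, run_page_script(nonce))

print(passes_without_js, passes_with_js)
```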

Last edited by ondoho; 12-20-2020 at 04:25 AM.
 
Old 12-20-2020, 04:17 AM   #5
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,491

Rep: Reputation: Disabled
Update. Sorry, just checked this. edbrowse cannot bypass the CAPTCHA on that page. It uses Duktape to handle JS, and obviously that's not enough in this case.

Another possibility is edbrowse, but it's a very unconventional browser geared towards blind users. If you're comfortable using ed you may give edbrowse a try. Otherwise, Browsh is definitely a better option.

I mostly use edbrowse as a quick'n'dirty scripting/webscraping tool.

Last edited by shruggy; 12-20-2020 at 08:30 AM.
 
Old 12-20-2020, 07:32 AM   #6
boughtonp
Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 967

Rep: Reputation: 740

Your issue is the domain owner has chosen to put CloudFlare as a MitM to "protect" their site from bad bots/etc.

You either need a second browser to obtain the relevant cookies (which you then set in your standard browser), or a proxy/mirror server to appear as a browser that you can then browse with Links/whatever.
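A rough sketch of the first option, in Python: copy the cookie out of the second browser and send it along with your own request. The cookie name cf_clearance and both placeholder values are assumptions here, and as far as I know the clearance is typically tied to the user agent (and IP) that earned it, so the same user-agent string has to be sent as well.

```python
import urllib.request


def request_with_browser_cookies(url, cookies, user_agent):
    """Build a request that carries cookies copied from another browser."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {"Cookie": cookie_header, "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


# Placeholders: paste the real values from the second browser.
copied_cookies = {"cf_clearance": "PASTE-VALUE-FROM-SECOND-BROWSER"}
browser_agent = "PASTE-THE-SECOND-BROWSER-USER-AGENT"

req = request_with_browser_cookies(
    "https://www.fanfiction.net", copied_cookies, browser_agent)

# Nothing is sent yet; urllib.request.urlopen(req) would perform the fetch.
print(req.get_header("Cookie"))
```

Whether the site accepts the borrowed cookie is up to Cloudflare; this only shows how the request would be put together.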

Here's the first hit from a relevant search: https://github.com/VeNoMouS/cloudscraper - no idea of its quality/effectiveness; check the tags on the right for others.

Depending on how fresh the content is/needs to be, you might also be able to use the Wayback Machine - latest version of the main page is available at: https://web.archive.org/web/9/https://www.fanfiction.net
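The Wayback Machine address above follows a simple pattern, which can be sketched as a small Python helper. The function name is made up, and my understanding (an assumption, not something checked against their documentation) is that the number is a timestamp prefix and the archive redirects to the nearest capture, so "9" ends up at the most recent one.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(target, timestamp="9"):
    """Build a Wayback Machine address for `target`.

    `timestamp` is assumed to be a (possibly partial) YYYYMMDDhhmmss value.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{target}"


latest = wayback_url("https://www.fanfiction.net")
print(latest)
```

The resulting address can be handed to links like any other URL.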


Last edited by boughtonp; 12-20-2020 at 07:33 AM.
 
Old 12-20-2020, 02:43 PM   #7
teckk
Senior Member
 
Registered: Oct 2004
Distribution: FreeBSD Arch
Posts: 3,274

Rep: Reputation: 983
I was able to open https://www.fanfiction.net with Palemoon, with images and scripts turned off.

I tried to see if that site would respond to different user agents.

Let's play:
Code:
agentA="Mozilla/5.0 (Windows NT 10.1; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"

agentB="Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"

agentC="Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A403 Safari/602.1"

agentD="Mozilla/5.0 (compatible; Googlebot/2.1); + http://www.google.com/bot.html"

# Fail (-L follows redirects, -I requests only the headers)
curl -A "$agentA" -LI https://www.fanfiction.net -o test1.html

# Fail
curl -A "$agentB" https://www.fanfiction.net -o test2.html

# Fail
curl -A "$agentC" https://www.fanfiction.net -o test3.html

# Fail
curl -A "$agentD" https://www.fanfiction.net -o test4.html
Ok, let's try something else:
Code:
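from urllib import request
from urllib.error import HTTPError

agentWin = ('Mozilla/5.0 (Windows NT 10.1; Win64; x64; rv:82.0) '
            'Gecko/20100101 Firefox/82.0')

user_agent = {'User-Agent': agentWin}

url = 'https://www.fanfiction.net'
req = request.Request(url, data=None, headers=user_agent)
try:
    # urlopen raises HTTPError for 4xx/5xx responses
    with request.urlopen(req) as resp:
        print(resp.status)
        html = resp.read().decode('utf-8', errors='replace')
        print(html[:500])
except HTTPError as err:
    print(err.code, err.reason)
Nope, Cloudflare. Nope!

Ok, you hacked me off, Cloudflare.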

To show the OP that it can be done, here is an example using PyQt5 and webkit to get that page quickly with images and scripts turned off. Might not be as neat as I could have made it, but it works.
Code:
#!/usr/bin/python

import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication, QMainWindow
from PyQt5.QtWebKitWidgets import QWebPage, QWebView
from PyQt5.QtWebKit import QWebSettings
from PyQt5.QtGui import QFont

a = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) '
        ' AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0'
            ' Mobile/14A403 Safari/602.1')
            
class MainWindow(QMainWindow):
    def __init__(self, url):
        super(MainWindow, self).__init__()
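        self.resize(1200,1000)
        self.fs = 20  # minimum font size used below
        # Set before any signal fires, so adjustTitle() can always read it
        self.progress = 0
        # Page object whose reported user agent is replaced with the string above
        self.userAgent = QWebPage(self)
        def agent(url):
            # Called with the requested URL; always report the same agent
            return a
        self.userAgent.userAgentForUrl = agent
        self.agent = a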
        
        self.view = QWebView(self)
        self.view.setPage(self.userAgent)
        self.view.setUrl(url)
        self.view.settings().setAttribute(
                QWebSettings.JavascriptEnabled, False)
        self.view.settings().setAttribute(
                QWebSettings.AutoLoadImages, False)
        self.view.settings().globalSettings().setFontSize(
                QWebSettings.MinimumFontSize, (self.fs))
                
        self.view.titleChanged.connect(self.adjustTitle)
        self.view.loadProgress.connect(self.setProgress)
        self.view.loadFinished.connect(self.finishLoading)
        self.setCentralWidget(self.view)
        
    def adjustTitle(self):
        if 0 < self.progress < 100:
            self.setWindowTitle("%s (%s%%)" % (self.view.title(), self.progress))
        else:
            self.setWindowTitle(self.view.title())  
            
    def setProgress(self, p):
        self.progress = p
        self.adjustTitle()

    def finishLoading(self):
        self.progress = 100
        self.adjustTitle()
                
if __name__ == '__main__':
    app = QApplication(sys.argv)

    if len(sys.argv) > 1:
        url = QUrl(sys.argv[1])
    else:
        url = QUrl('https://www.fanfiction.net')

    browser = MainWindow(url)
    browser.show()
    sys.exit(app.exec_())
You could also dump that page to file and open the file in a web browser, like lynx, or you could import webbrowser and open the page in it.

Example:
Code:
from webbrowser import register, get, GenericBrowser

register('dillo', None, GenericBrowser('dillo'))
get('dillo').open(<something>)
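The "dump that page to file" half could look something like the sketch below. The helper name and the sample HTML are made up; the real page source would come from whichever fetch actually gets through.

```python
import tempfile
from pathlib import Path


def dump_page(html, directory=None):
    """Write page source to a temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
            "w", suffix=".html", delete=False,
            dir=directory, encoding="utf-8") as fh:
        fh.write(html)
    return Path(fh.name)


# Placeholder content standing in for the fetched page.
page = dump_page("<html><body><p>saved copy</p></body></html>")
page_uri = page.as_uri()
print(page_uri)

# The file can then be opened with any browser, for example:
#   import webbrowser; webbrowser.open(page_uri)
# or from a shell with: lynx /path/to/the/file.html
```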
 
Old 12-21-2020, 01:51 AM   #8
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 16,657
Blog Entries: 10

Rep: Reputation: 4923
I was able to open fanfiction.net, too, with javascript and 3rd-party requests OFF.
Clicking some random links to get to some content, it seems to work well enough.
Maybe Cloudflare just doesn't like links' user agent?
 
Old 12-31-2020, 12:52 PM   #9
hazel
LQ Guru
 
Registered: Mar 2016
Location: Harrow, UK
Distribution: LFS, AntiX, Slackware
Posts: 5,063
Blog Entries: 14

Rep: Reputation: 2865
But surely this site uses Cloudflare and I can get it on links.
 
Old 12-31-2020, 01:48 PM   #10
JSB
Member
 
Registered: Dec 2020
Posts: 45

Rep: Reputation: 18
fanfiction.net
It requires ajax.googleapis.com but also works without it.
 
  

