Can't Pull out PDF from this website, even at individual Elements...

Sotoprior · 12-09-2018, 02:09 AM

https://hobbydocbox.com/Photography/...adability.html

I'm trying to read this useful programming guide. My part at the moment is drawing some low-resolution (16x16) sprites for a concept, which I'm treating as a challenge to myself and a start on my own indie work. I need to back up the PDF hosted at the link above so I can read it offline when I'm not near an access point, which happens a lot here. None of the download options work, and inspecting elements doesn't turn up anything to download either. Has anyone had better luck downloading this PDF, and could you share a working download link? Any advice on how to download from this weird web PDF player would also be appreciated. Thanks.

jsbjsb001 · 12-09-2018, 03:01 AM

No, I had the same issue too. From what I'm seeing, it looks like a problem with the site, not you. Maybe contact the site's admin and explain the situation to them.

ondoho · 12-09-2018, 06:37 AM

not likely to succeed.
it's the sort of site that makes money by providing content that is only viewable through their (ad- and analytics-laden) web ui.
best bet is to find out where said tutorial originally came from.

////// · 12-09-2018, 07:21 AM

Quote:

Originally Posted by ondoho

it's the sort of site that makes money by providing content that is only viewable through their (ad- and analytics-laden) web ui.

i once searched for pictures of brown recluse spider bites. the site in question had a really low-resolution picture and asked for money for the large version.

i checked that site's html code and found something like this:

Code:

https://www[some.site.com]/pictures/small_spider.jpeg

i just tried changing small_spider.jpeg to large_spider.jpeg:

Code:

https://www[some.site.com]/pictures/large_spider.jpeg

and voila, it showed the large version of the spider bite.

here is the wikipedia page on them; it might be NSFW and not suitable for small children. those spiders cause ugly wounds with their venom.
https://en.wikipedia.org/wiki/Brown_recluse_spider
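
The filename swap described above can be sketched in a few lines of Python. This is only an illustration: the `example.com` URL and the `small_`/`large_` prefixes are hypothetical stand-ins, and whether a given site actually exposes a larger file under a predictable name is pure guesswork.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_size_prefix(url, old="small_", new="large_"):
    """Return url with the filename prefix `old` replaced by `new`."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if not filename.startswith(old):
        return url  # nothing to swap; leave the URL untouched
    filename = new + filename[len(old):]
    return urlunsplit(parts._replace(path=posixpath.join(directory, filename)))


# Hypothetical URL, standing in for the real site.
print(swap_size_prefix("https://www.example.com/pictures/small_spider.jpeg"))
```

Only the last path component is touched, so a `small_` that happens to appear in a directory name is left alone.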

teckk · 12-09-2018, 08:31 AM

The .pdf is located at:
https://hobbydocbox.com/storage/78/7...1/78187251.pdf

https://hobbydocbox.com/docview/78/78187251/
Open the URL in a browser, scroll to load all the pages, then print to file.pdf

Or use something that will parse scripts: a browser, python, soup, etc.
https://hobbydocbox.com/docview/78/7...1/78187251.pdf

I was able to print the above URL to file.pdf with Pale Moon. File size is 54 MB.
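
If the direct storage URL really does serve the file, fetching it from Python might look roughly like the sketch below. The function names and the default user agent are made up for illustration, and it is an assumption that the server will hand the file to a plain HTTP client at all; the check on the leading `%PDF-` bytes is there so an HTML error page does not get saved under a .pdf name.

```python
import urllib.request

PDF_MAGIC = b"%PDF-"  # every PDF file starts with these bytes


def looks_like_pdf(first_bytes):
    """Return True if the data starts with the PDF signature."""
    return first_bytes.startswith(PDF_MAGIC)


def fetch_pdf(url, out_file, user_agent="Mozilla/5.0"):
    """Stream url to out_file; return False if the body is not a PDF."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        head = response.read(len(PDF_MAGIC))
        if not looks_like_pdf(head):
            return False  # probably an HTML page or an error body
        with open(out_file, "wb") as handle:
            handle.write(head)
            while True:
                chunk = response.read(64 * 1024)
                if not chunk:
                    break
                handle.write(chunk)
    return True


# Usage (not run here): fetch_pdf(direct_pdf_url, "MyFile.pdf")
```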

Sotoprior · 12-09-2018, 12:06 PM

Quote:

Originally Posted by teckk

The .pdf is located at:
https://hobbydocbox.com/storage/78/7...1/78187251.pdf

https://hobbydocbox.com/docview/78/78187251/
Open url in Browser, scroll to get all pages, then print to file.pdf

Or use something that will parse scripts, browser, python, soup etc..
https://hobbydocbox.com/docview/78/7...1/78187251.pdf

I was able to print above url to file.pdf with Palemoon. File size is 54Mb.

Thanks. I had already tried the print trick, both from the website and on the isolated element in my Chrome browser, but it didn't work like it usually does. I don't know whether it crashed because I have no printers assigned to my laptop, or whether the PDF viewer itself is bugged, since saving the file never completed. At least someone was able to get the PDF to load fully through the print option. Thanks.

teckk · 12-09-2018, 02:07 PM

You could also use a little Python, with something that will run
the scripts on the page, then print it to .pdf.

This example uses a web browser's engine, so it should be OK to post
it here: it prints through a web browser, the same as if you
loaded the page into a web browser and printed it.

Python3, PyQt5, QtWebEngine

Code:

#!/usr/bin/env python3
# Load a page in QtWebEngine, then print the rendered result to a PDF file.

import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView, QWebEngineProfile

agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64; rv:62.0)'
         ' Gecko/20100101 Firefox/62.0')

class PdfPrint:
    def __init__(self, url, out_file):
        self.out_file = out_file

        # defaultProfile() is static, so no profile instance is needed.
        QWebEngineProfile.defaultProfile().setHttpUserAgent(agent)

        self.printer = QWebEngineView()
        # Connect before loading so the signal cannot be missed.
        self.printer.loadFinished.connect(self.print_pdf)
        self.printer.load(QUrl(url))

    def print_pdf(self, ok):
        # Show the view, then write the rendered page to the output file.
        self.printer.show()
        self.printer.page().printToPdf(self.out_file)

if __name__ == '__main__':
    app = QApplication([])

    url = ('https://hobbydocbox.com/docview/78/78187251'
           '/#file=/storage/78/78187251/78187251.pdf')

    out_file = "MyFile.pdf"

    # Keep a reference so the view is not garbage-collected.
    # Close the window to exit once the file has been written.
    printer = PdfPrint(url, out_file)
    sys.exit(app.exec_())

I was able to get it with that. (Little screen cap:)

Code:

curl https://ptpb.pw/H67f -o MyFile.jpg

The pages of that .pdf are delivered as blobs, if that helps you.
You'll need to scroll down slowly to get them all into the cache before
you try to print it.
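
The slow scroll could in principle be automated rather than done by hand. The sketch below only works out which scroll positions to visit and builds the JavaScript for each one; the function names are made up for illustration, and it assumes the document scrolls with the window, which may not hold for this viewer if it scrolls an inner element instead.

```python
def scroll_offsets(page_height, viewport_height, overlap=0):
    """Return the y offsets needed to visit the whole page top to bottom."""
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    step = viewport_height - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the viewport height")
    # Furthest position the page can scroll to.
    last = max(page_height - viewport_height, 0)
    offsets = list(range(0, last, step))
    offsets.append(last)  # always finish exactly at the bottom
    return offsets


def scroll_script(y):
    """Build the JavaScript that scrolls the window to offset y."""
    return "window.scrollTo(0, %d);" % y


print(scroll_offsets(2500, 1000))  # [0, 1000, 1500]
```

With QtWebEngine, each script could be passed to `page().runJavaScript(...)` with a pause between steps (for example through `QTimer.singleShot`) before `printToPdf` is called. How long the pause needs to be for the blobs to load is something to find out by trial.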

Otherwise you are going to have to deal with the blobs directly. I can list
them, but what are you going to do with them? It's way easier
to print them after a browser has rendered them.
https://hobbydocbox.com/6ef349df-c39...b-7ea050939620
https://hobbydocbox.com/037f89b8-f55...2-ed93025169ab
https://hobbydocbox.com/739016b6-65d...f-a3a2bd1120f7
etc.

And as a last resort, you could take a screenshot of every page
with scrot or ImageMagick.
Focused window:
scrot -u
import -screen out.png
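
For the screenshot route, a small Python loop around `scrot -u` would save retyping the command for every page. This is a rough sketch: the function names, the `page_NNN.png` naming and the delay are all made up for illustration, and it assumes you flip to the next page by hand in the focused viewer window during each pause.

```python
import subprocess
import time


def scrot_command(page_number, prefix="page"):
    """Build the scrot call that saves the focused window to a numbered file."""
    return ["scrot", "-u", "%s_%03d.png" % (prefix, page_number)]


def capture_pages(page_count, delay=3.0):
    """Take one screenshot per page, pausing so the next page can be shown."""
    for number in range(1, page_count + 1):
        time.sleep(delay)  # time to bring the next page into view
        subprocess.run(scrot_command(number), check=True)


print(scrot_command(1))  # ['scrot', '-u', 'page_001.png']
```

Zero-padded numbers keep the files in page order when they are sorted by name later.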

I can print a .pdf of that URL here multiple ways, with WebEngine and Pale Moon. It's a huge thing made of image files. Good luck.