LinuxQuestions.org
Old 12-09-2018, 02:09 AM   #1
Sotoprior
Member
 
Registered: Nov 2017
Posts: 30

Rep: Reputation: Disabled
Can't pull the PDF out of this website, even from individual elements...


https://hobbydocbox.com/Photography/...adability.html

I'm trying to read this useful programming guide. My current task is developing some low-resolution (16x16) concept sprites, which I've set myself as a challenge and as a start to my own indie work. I need to back up the PDF hosted at the link above so I can read it offline when I'm not near an access point, which happens a lot here. None of the site's download options work for me, and inspecting the page elements hasn't turned up anything I can download either. Has anyone had better luck downloading this PDF, and if so, could you share a working download link? Any advice on how to download from this odd web PDF viewer would also be appreciated. Thanks.
 
Old 12-09-2018, 03:01 AM   #2
jsbjsb001
Senior Member
 
Registered: Mar 2009
Location: Earth, unfortunately...
Distribution: Currently: OpenMandriva. Previously: openSUSE, PCLinuxOS, CentOS, among others over the years.
Posts: 3,881

Rep: Reputation: 2063
No, I had the same issue too. From what I'm seeing, it looks like a problem with the site, not you. Maybe contact the site's admin and explain the situation to them.
 
Old 12-09-2018, 06:37 AM   #3
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
not likely to succeed.
it's the sort of site that makes money by providing content that is only viewable through their (ad- and analytics-laden) web ui.
your best bet is to find out where said tutorial originally came from.
 
Old 12-09-2018, 07:21 AM   #4
//////
Member
 
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Arch Linux && OpenBSD 7.4 && Pop!_OS && Kali && Qubes-Os
Posts: 824

Rep: Reputation: 350
Quote:
Originally Posted by ondoho
it's the sort of site that makes money with providing content that is only viewable with their (ad- and analytics-laden) web ui.
i once searched for pictures of brown recluse spider bites. the site in question had a really low-resolution picture and asked for money for the large one.

i checked that site's html code and found something like this:
Code:
https://www[some.site.com]/pictures/small_spider.jpeg
i just tried changing small_spider.jpeg to large_spider.jpeg:

Code:
https://www[some.site.com]/pictures/large_spider.jpeg
and voila, it showed the large version of the spider bite.

here is the wikipedia page about them; it might be NSFW and not suitable for small children. those spiders cause ugly wounds with their venom.
https://en.wikipedia.org/wiki/Brown_recluse_spider
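
The filename swap described above can be sketched in Python. This is only an illustration: the function name, the `small_`/`large_` prefixes as defaults, and the www.example.com URL are placeholders, and a site is free to name its files any other way.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_filename_prefix(url, old="small_", new="large_"):
    """Return url with the filename's leading `old` replaced by `new`.

    The URL is returned unchanged when the filename does not start with `old`.
    """
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if not filename.startswith(old):
        return url
    new_path = posixpath.join(directory, new + filename[len(old):])
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # Placeholder URL, not a real image.
    print(swap_filename_prefix("https://www.example.com/pictures/small_spider.jpeg"))
```

Whether the rewritten URL actually exists still has to be checked by requesting it; the function only builds the candidate address.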
 
Old 12-09-2018, 08:31 AM   #5
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,136
Blog Entries: 6

Rep: Reputation: 1826
The .pdf is located at:
https://hobbydocbox.com/storage/78/7...1/78187251.pdf

https://hobbydocbox.com/docview/78/78187251/
Open the URL in a browser, scroll down to load all the pages, then print to file.pdf.

Or use something that will parse scripts: a browser, Python, Beautiful Soup, etc.
https://hobbydocbox.com/docview/78/7...1/78187251.pdf

I was able to print the above URL to file.pdf with Pale Moon. The file size is 54 MB.
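
As a rough illustration of the "parse it with Python" route, the sketch below scans static HTML for attribute values whose path ends in .pdf, using only the standard library. The class name, the sample markup, and the www.example.com URLs are made up for the example. It will not see links that a page builds with JavaScript at run time, which is why a real browser engine may still be needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PdfLinkFinder(HTMLParser):
    """Collect absolute URLs of attribute values that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            # Look only at the path, so query strings do not hide the extension.
            if value and urlsplit(value).path.lower().endswith(".pdf"):
                self.links.append(urljoin(self.base_url, value))


def find_pdf_links(html, base_url):
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<a href="/about.html">About</a>
<iframe src="/storage/docs/guide.pdf"></iframe>
<a href="files/extra.PDF?download=1">Extra</a>
</body></html>
"""

if __name__ == "__main__":
    for link in find_pdf_links(SAMPLE_HTML, "https://www.example.com/docview/1/"):
        print(link)
```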
 
Old 12-09-2018, 12:06 PM   #6
Sotoprior
Member
 
Registered: Nov 2017
Posts: 30

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by teckk
The .pdf is located at:
https://hobbydocbox.com/storage/78/7...1/78187251.pdf

https://hobbydocbox.com/docview/78/78187251/
Open the URL in a browser, scroll down to load all the pages, then print to file.pdf.

Or use something that will parse scripts: a browser, Python, Beautiful Soup, etc.
https://hobbydocbox.com/docview/78/7...1/78187251.pdf

I was able to print the above URL to file.pdf with Pale Moon. The file size is 54 MB.
Thanks. I had already tried the print trick, both on the website and on the isolated element in Chrome, but it didn't work the way it usually does. I don't know whether it crashed because no printers are assigned to my laptop, or whether the PDF viewer itself is bugged; either way, the file never finished saving. At least someone was able to get the PDF to load fully through the print option. Thanks.
 
Old 12-09-2018, 02:07 PM   #7
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,136
Blog Entries: 6

Rep: Reputation: 1826
You could also use a little Python, with something that will run
the scripts on the page, then print it to .pdf.

This example uses a web browser's engine, so it should be OK to post
it here: it prints with a web browser, the same as if you
loaded the page into a web browser and printed it.

Python3, PyQt5, QtWebEngine
Code:
#!/usr/bin/env python3

import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView, QWebEngineProfile

# Present a common desktop browser user agent to the site.
agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64; rv:62.0)'
         ' Gecko/20100101 Firefox/62.0')

class PdfPrint:
    """Load a URL in QtWebEngine and print the rendered page to a PDF."""

    def __init__(self, url, out_file):
        self.out_file = out_file

        # defaultProfile() is static, so no extra profile object is needed.
        QWebEngineProfile.defaultProfile().setHttpUserAgent(agent)

        self.printer = QWebEngineView()
        self.printer.loadFinished.connect(self.print_pdf)
        self.printer.load(QUrl(url))

    def print_pdf(self, ok):
        # Runs when the page finishes loading; printToPdf() is asynchronous.
        self.printer.show()
        self.printer.page().printToPdf(self.out_file)

if __name__ == '__main__':
    app = QApplication(sys.argv)

    url = ('https://hobbydocbox.com/docview/78/78187251'
           '/#file=/storage/78/78187251/78187251.pdf')

    out_file = "MyFile.pdf"

    # Keep a reference so the view is not garbage-collected.
    pdf_print = PdfPrint(url, out_file)
    sys.exit(app.exec_())
I was able to get it with that (a small screen capture):
Code:
curl https://ptpb.pw/H67f -o MyFile.jpg
The pages of that .pdf are delivered as blobs, if that helps you.
You'll need to scroll down slowly to get them all into the cache before
you try to print it.
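
One possible way to automate the slow scroll is to generate a small JavaScript snippet that steps down the page one viewport at a time. This is a sketch under assumptions: both function names are made up, the page height and viewport height have to be measured separately, and it only helps if the viewer scrolls the main window rather than an inner element, which has not been checked for this site.

```python
def scroll_offsets(page_height, viewport_height):
    """Return the y offsets that step through a page one viewport at a time."""
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    last = max(page_height - viewport_height, 0)
    offsets = list(range(0, last + 1, viewport_height))
    if offsets[-1] != last:
        # Make sure the final step reaches the bottom of the page.
        offsets.append(last)
    return offsets


def scroll_script(offsets, delay_ms=500):
    """Build JavaScript that scrolls to each offset, delay_ms apart."""
    steps = ", ".join(str(y) for y in offsets)
    return (
        "[%s].forEach(function (y, i) {"
        " setTimeout(function () { window.scrollTo(0, y); }, i * %d);"
        " });" % (steps, delay_ms)
    )


if __name__ == "__main__":
    print(scroll_script(scroll_offsets(2500, 1000)))
```

In a QtWebEngine script, the resulting string could be handed to the page's runJavaScript() before printing, leaving enough time for the last step to finish.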

Otherwise you are going to have to deal with the blobs. I can list
them, but what are you going to do with them? It's way easier
to print them after a browser has rendered them.
https://hobbydocbox.com/6ef349df-c39...b-7ea050939620
https://hobbydocbox.com/037f89b8-f55...2-ed93025169ab
https://hobbydocbox.com/739016b6-65d...f-a3a2bd1120f7
etc.

And as a last resort, you could take a screenshot of every page
with scrot or ImageMagick.
Focused window:
scrot -u
import -screen out.png

I can print a .pdf of that url/pdf here in multiple ways, with webengine and with Pale Moon. It's a huge thing made of image files. Good luck.
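
If the screenshot route is used, the page images can be merged into one PDF afterwards with ImageMagick's convert. The sketch below only builds the command, sorting the files so that page10 comes after page2. The file names are hypothetical and the output name is reused from the script above.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as numbers, not as text."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def build_convert_command(image_names, out_file="MyFile.pdf"):
    """Return an ImageMagick command that joins the images into one PDF."""
    ordered = sorted(image_names, key=natural_key)
    return ["convert", *ordered, out_file]


if __name__ == "__main__":
    # Hypothetical screenshot names, one per page.
    cmd = build_convert_command(["page10.png", "page2.png", "page1.png"])
    print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True). On ImageMagick 7 the command is usually magick rather than convert.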
 
  

