LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 03-10-2018, 08:58 AM   #1
newbiesforever
Senior Member
 
Registered: Apr 2006
Location: Iowa
Distribution: Debian distro family
Posts: 2,375

Rep: Reputation: Disabled
printing the text from a web page without printing the graphics


A "good" website to me will have some button or mode that allows one to print only the text if desired, not wasting ink/toner on printing the graphics. Unfortunately, any number of professionally designed sites (probably the majority, even) don't. I don't suppose Firefox (or Pale Moon, in my case) has its own option to ignore any graphics when printing? I'm looking but haven't seen it.

Of course I know the simplest way to print only the text: highlight it all, then copy and paste it into a word processor. I just had to do that for an article I wanted a hard copy of. And it's not as though toner is expensive anymore, I admit. This is one of those things I ask mainly on principle.

You know, I can guess the problem here: when these websites don't easily facilitate printing their text, it's most likely because the sites are optimized for being read on one's phone, where printing is impossible. (Doesn't that sound absurd to a non-millennial? Reading on a phone? Like listening to a piece of paper?)
 
Old 03-10-2018, 09:18 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
I thought there was a built-in function in Firefox to have a user-specified stylesheet override. Maybe there's not, but I would look for that first.

Otherwise there are a number of plug-ins or add-ons or whatever they are called that allow you to do CSS overrides selectively.
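
As a rough illustration of what such a CSS override amounts to, here is a minimal sketch that adds a print-only rule to a saved copy of a page. The function name `add_print_override` and the list of hidden elements are my own assumptions, not anything built into Firefox or a particular add-on, and background images set in CSS are not covered by this rule.

```python
import re

# Print-only rule: on screen the page is unchanged, on paper images are hidden.
PRINT_CSS = ("<style>@media print { img, svg, picture, video "
             "{ display: none !important; } }</style>")

def add_print_override(html: str) -> str:
    """Insert the print-only rule just before </head>, or prepend it if there is no head."""
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        return PRINT_CSS + html
    return html[:match.start()] + PRINT_CSS + html[match.start():]
```

Run it over the saved HTML, open the result in the browser, and print from there.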

The heart of the problem is that competency in web design is becoming as rare as hen's teeth. There are some good people out there still but fewer are working and even fewer are teaching. Even fewer are coming up through the ranks. So it's likely a terminal-stage situation we are seeing because the very few that can actually do web design have moved on or even retired. Certainly they're no longer in positions to deal with the politics necessitated by the boss' cousin or college buddy's son's assertions of skill in web design. Look, even banks have third-party objects slowing down their pages, and that includes both CSS and JS.

<grumble> If you contact the web site in question, it would be interesting to know what they say if they respond. However, most have a catalog of excuses handy. I just had to deal with yet another one that became glacially slow and bandwidth-heavy after a site 'upgrade'. Fine. However, what is not fine is that they recently inserted an Adobe Flash dependency between visitors and checkout/payment for services. I bet they're wondering why sales have all but stopped. </grumble>
 
1 members found this post helpful.
Old 03-10-2018, 03:53 PM   #3
xamaco
Member
 
Registered: Sep 2009
Location: Bastelicaccia, Corsica
Distribution: Crux
Posts: 48

Rep: Reputation: 7
A quick and easy way is to use a text-based browser such as links, w3m, etc., which allow you to dump a web page as text.

Another way is to bring up the developer tools (F12 or ctrl-shift-i in Firefox) and then edit or delete some html, css, javascript stuff in there. I do it routinely.
 
Old 03-11-2018, 10:31 AM   #4
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,138
Blog Entries: 6

Rep: Reputation: 1827
If you want a lighter page that is designed for an iPhone, and the web server serves up pages that way, then report yourself as one, either as your browser's user agent or your script's user agent.

iPhone 10
Code:
agent="Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A403 Safari/602.1"
Quote:
not wasting ink/toner on printing the graphics.
Then get the page with images turned off, save it to file that way. Print the file.

Quote:
has its own option to ignore any graphics when printing?
Sorry, I haven't gone that route for years now. I did not like being browser dependent for such tasks.

If all you want is the text, and you don't care about scripts being run
Code:
curl -A "$agent" <url> -o MyFile.html
Then print MyFile.html. You may want to turn it into a .ps, .pdf, or .txt first.
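
If curl is not at hand, the same fetch can be sketched with Python's standard library. This is only a sketch under assumptions: the names `build_request` and `save_page` are made up for illustration, the user agent is the iPhone string from above, and like curl it runs no scripts on the page.

```python
import urllib.request

# User agent string copied from the iPhone 10 example above.
AGENT = ("Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) "
         "AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 "
         "Mobile/14A403 Safari/602.1")

def build_request(url: str, agent: str = AGENT) -> urllib.request.Request:
    """Build a GET request that reports the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": agent})

def save_page(url: str, path: str, agent: str = AGENT) -> None:
    """Fetch url and write the raw response bytes to path."""
    with urllib.request.urlopen(build_request(url, agent), timeout=30) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
```

Calling save_page with a URL and "MyFile.html" should leave you with roughly the same file as the curl line.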

If you want the page to look right, but without images, then.... either turn images off in the browser before you load the page, or use a browser engine in a script to get the page without images.

Example
Code:
#!/usr/bin/env python3

# Get source with scripts run, using Python3/PyQt5/qt5-webengine
# Usage:
#     script.py <url> <local filename>
#     or script.py and answer prompts

import sys
from PyQt5.QtWebEngineWidgets import (QWebEnginePage,
        QWebEngineProfile, QWebEngineSettings)
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl

# iPhone 6 Safari
a = ('Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like Mac OS X)'
        ' AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0'
            ' Mobile/10B350 Safari/8536.25')

class Source(QWebEnginePage):
    def __init__(self, url, _file):
        # The application object must exist before any page is created
        self.app = QApplication([]) #(sys.argv)
        QWebEnginePage.__init__(self)

        # Set the user agent on the default profile, which this page uses
        QWebEngineProfile.defaultProfile().setHttpUserAgent(a)

        # Images off: set on this page's own settings so that it
        # affects the load below (a separate view would not)
        self.settings().setAttribute(
                QWebEngineSettings.AutoLoadImages, False)

        self._file = _file
        # Connect the signal before starting the load
        self.loadFinished.connect(self.on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()

    def on_load_finished(self):
        # toHtml() is asynchronous: it hands the HTML to the callback
        self.toHtml(self.write_it)

    def write_it(self, data):
        self.html = data
        with open(self._file, 'w') as f:
            f.write(self.html)
        print('\nFinished\nFile saved to ' + self._file)
        self.app.quit()

def main():
    # Open with arguments or prompt for input
    if len(sys.argv) > 2:
        url = sys.argv[1]
        _file = sys.argv[2]
    else:
        url = input('Enter/Paste url for source: ')
        _file = input('Enter output file name: ')
    Source(url, _file)

if __name__ == '__main__':
    main()
You could also use PhantomJS, Node.js, Beautiful Soup, etc. I'm using webengine for my scripts. It works just like a browser of course.

Lynx will get a text format of a page
Code:
lynx -dump url > out.txt
curl will, something like
Code:
curl url | html2text  > out.txt
In other words, 2 steps: get the page the way you want and save it to file, then print the file. Not dependent on any browser.
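
If neither lynx nor html2text is installed, the text-only step can be roughed out with Python's standard library alone. This is a sketch, not a replacement for those tools: the class name `TextOnly`, the function `html_to_text`, and the choice of which tags start a new line are all my own assumptions, and real pages will need more care (tables, links, entities in odd places).

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect visible text; tags (including images) are dropped."""
    SKIP = {"script", "style", "noscript"}          # contents ignored entirely
    BLOCK = {"p", "div", "br", "li", "tr", "blockquote", "pre",
             "h1", "h2", "h3", "h4", "h5", "h6"}    # treated as line breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html: str) -> str:
    """Return the page text, one non-empty line per block element."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Feed it the contents of the saved HTML file, write the result to out.txt, and print that.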

Last edited by teckk; 03-11-2018 at 10:34 AM.
 
Old 03-11-2018, 03:28 PM   #5
newbiesforever
Senior Member
 
Registered: Apr 2006
Location: Iowa
Distribution: Debian distro family
Posts: 2,375

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
The heart of the problem is that competency in web design is becoming as rare as hen's teeth. There are some good people out there still but fewer are working and even fewer are teaching. Even fewer are coming up through the ranks. So it's likely a terminal-stage situation we are seeing because the very few that can actually do web design have moved on or even retired. Certainly they're no longer in positions to deal with the politics necessitated by the boss' cousin or college buddy's son's assertions of skill in web design.
I had no idea. Why is this? Sorry, I haven't figured it out from what you said--why don't people want to work in web design anymore?
 
Old 03-11-2018, 11:56 PM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by newbiesforever View Post
I had no idea. Why is this? Sorry, I haven't figured it out from what you said--why don't people want to work in web design anymore?
I'm not sure why that is. There are probably more people today claiming to work in web design than ever before but the end product shows very clearly that neither the knowledge nor the skill is there. It seems that every other week a commercial site or two that I need or someone I know needs has fallen to their inept fiddlings. One site in particular that caused trouble to several different people I know went through two very major web site redesigns very recently and each time the redesign made it impossible for more and more potential customers to buy their services. It's like they're trying to go out of business.

I doubt there is a single group that can be blamed specifically for the shift in the sites and the loss of knowledge from the general population. However, there are several groups which have shown great effort in diminishing and disparaging knowledge, especially when it comes to ICT. There is a strong current of argumentum ad novitatem pervading the computing industry, especially the web sector, rather than an emphasis on finding what works or even on Usability design. That said, a lot of governments gain surveillance and control capabilities by further centralizing the WWW, as does Google, which would gain leverage to force people into their centralized AMP hosting as sites get even more bloated. However, bloating is only one factor and I'm mainly railing against the complete lack of Usability and even basic functionality such as ordering or payment.

Anyway, the statements of Vint Cerf may seem quaint to some, and aimed at the net overall and not necessarily at just the WWW by itself, but to put it another way all of the market is more money than some of the market:

https://tools.ietf.org/html/rfc3271
 
Old 03-12-2018, 06:57 AM   #7
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by Turbocapitalist View Post
There is a strong current of argumentum ad novitatem
or shiny-new-stuff-syndrome, as others call it.
my god, that was already in 2002!

incidentally, a fellow forum member recommended this excellent article:
http://idlewords.com/talks/website_obesity.htm feat. my new favorite phrase "chickenshit minimalism"
 
Old 03-12-2018, 07:31 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
Yeah. It's fairly new, 2002, but it was something that was talked about a lot online for ages and ages earlier and becoming more and more critical as time passed. I guess it could be seen coming to a head back then and needed to be brought up in an RFC. Again it's weird. Things used to be about growing a market or maximizing reach with the least effort and expense, but now it has turned on its head and is about using outrageous amounts of effort and costs to artificially narrow the potential market to a fixed subset of people. Like with many of the harmful fads on the net, doing it right would have been faster, cheaper, easier and would have produced higher return on investment through expanded market reach.

Quote:
Originally Posted by ondoho View Post
incidentally, a fellow forum member recommended this excellent article:
http://idlewords.com/talks/website_obesity.htm feat.
Thanks. That's a useful discussion of the problem. Bloat (and third-party objects) has only gotten worse since the time that post was written. It is right on target:
"Why not just serve regular HTML without stuffing it full of useless crap? The question is left unanswered."
I suppose that although Google could use its weight to reward lean pages and punish bloat, they gain if they can increase the bloat until everyone seeks refuge in AMP hosted/cached on Google's own servers.

This thread has jogged my memory: I recall one audit report from about 10 years ago where one country's audit office investigated where (IIRC) 25M EUR had gone. The money had been earmarked for "developing" a slew of web sites. The answer in the report was that it was all spent on web designers and produced no visible results. It was convenient for me at the time to follow up very superficially on the report and I found that, in the geographic area affected, it looked to me like there were few if any skilled web teams. And, I'll go way out on a limb, it sure looked like, at the time, there were not any in educational positions any more to train up skilled teams even if there might have once been some. Thus a downward cycle had started.

If someone deploys BS, it is because they have learned BS and learned that it is ok to deploy BS, and if they have learned to use and endorse BS, someone certainly was involved in teaching them that BS. Given what I've seen since, I'm fairly sure that was and remains the case -- in more than one country.
 
Old 03-12-2018, 08:53 AM   #9
onebuck
Moderator
 
Registered: Jan 2005
Location: Central Florida 20 minutes from Disney World
Distribution: Slackware®
Posts: 13,925
Blog Entries: 44

Rep: Reputation: 3159
Member response

Hi,

Some sites will provide the means to show printable text for the page. Here at LQ you will find that under 'Thread tools' as 'Show Printable Text'. You may need to search the other site(s) for this option but I know some will provide this service.

Hope this helps.
Have fun & enjoy!
 
2 members found this post helpful.
Old 03-12-2018, 04:45 PM   #10
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,982

Rep: Reputation: 3626
There used to be a setting in browsers to not load images.

Most pages are fine without them anyway. https://support.mozilla.org/en-US/questions/981640

Third-party images are even bigger bandwidth hogs.
 
1 members found this post helpful.
Old 03-12-2018, 06:39 PM   #11
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
https://addons.mozilla.org/en-US/fir...on/print-edit/
 
Old 03-13-2018, 04:55 AM   #12
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by ondoho View Post
incidentally, a fellow forum member recommended this excellent article:
http://idlewords.com/talks/website_obesity.htm feat.
The conference summary of the talk has a link to the video which, on top of the good content, turned out to be surprisingly well delivered. Here is the link to that, too, if some would rather listen than read:

http://www.webdirections.org/blog/th...besity-crisis/

Some years ago the W3C used to have some pull with developers but somehow it has given up. Google, Amazon, and Facebook seem to call the shots now. If any two of them were to decide on anything together, they'd basically cause the decision to become a de facto standard simply through their massiveness.

The web if done correctly is rather device independent.
 
Old 03-13-2018, 08:45 AM   #13
newbiesforever
Senior Member
 
Registered: Apr 2006
Location: Iowa
Distribution: Debian distro family
Posts: 2,375

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
It's fairly new, 2002,
It is? But besides computer technology being obsolete the day you buy it, due to the pace of innovation (according to non-expert popular wisdom), 2002 was the Web 1.0 era. How do you call it recent, then?




Quote:
Originally Posted by Turbocapitalist View Post
Google, Amazon, and Facebook seem to call the shots now. If any two of them were to decide on anything together, they'd basically cause the decision to become a de facto standard simply through their massiveness.
Wouldn't that be because the foundation of Web 2.0 is the full monetization of the internet?

Last edited by newbiesforever; 03-13-2018 at 08:46 AM.
 
Old 03-13-2018, 09:06 AM   #14
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
You could try the "Print as Plain Text" Firefox extension (https://addons.mozilla.org/en-US/fir...text-selected/).
 
Old 03-13-2018, 09:31 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by newbiesforever View Post
Wouldn't that be because the foundation of Web 2.0 is the full monetization of the internet?
No. But that would be a different matter. Monetization works, but as the late Pieter Hintjens pointed out, happy customers are usually profitable customers. What's common is to squeeze too hard, and that loses money quickly after the first round. The same goes for creating inefficiencies like those shown in the presentation (text or video) above, in the comparison of Pinboard vs "ACME".

Google's moves make sense, especially with regard to AMP, if they plan to capture part of the net. We've been through that before. Closed nets just don't grow. See the history of CompuServe, Prodigy, Delphi, MSN (the original version) and others. The WWW grew, and thus the Internet beneath it, because it was open: just follow the RFCs and you are in. And what are RFCs but contracts?
 
1 members found this post helpful.
  

