Old 01-22-2012, 09:11 AM   #1
Cyrolancer
Member
 
Registered: Jan 2012
Distribution: Debian
Posts: 52

Rep: Reputation: Disabled
Change pdftoppm output to 16 bits or 24 bits


Hello LQ people

I am writing a script that automatically converts webpages to PNG or JPG files. As a first step, I convert the webpages to PDF files, which I have managed to do using wkhtmltopdf. After that, I tried several programs to convert PDF to PNG or JPG.

As you all know, there is the ImageMagick library and its convert command. It is too slow for me: it takes at least 20 seconds to convert a PDF to PNG.

I tried NetPBM with the "pdftops" command, but I failed to properly convert the PDF file to a PNG file.

Later on, a friend on LQ suggested that I use "pdftoppm". It is perfect for this. I managed to convert PDF to PPM and then used NetPBM to convert to PNG or JPG (I also tried "cjpeg" instead of NetPBM).

After a few hours, I found that pdftoppm can convert a PDF file directly to a PNG file! That is awesome, because I can do it with one command. However, the problem is that the produced PNGs are very low quality, even after setting the resolution to 600 dpi (using the -r switch).

I searched for the possible cause and found that the produced PNG file is 8 bits in all of the cases above.

Is it possible to change the pdftoppm output to 16 or 24 bits?

Last edited by Cyrolancer; 01-22-2012 at 09:12 AM.
 
Old 01-22-2012, 11:07 AM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
I cannot reproduce your results. For testing, I created a colorful PDF and verified it displays OK in Evince. Then, I ran
Code:
pdftoppm -aa yes -aaVector yes -freetype yes -r 300 file.pdf basename
to generate basename-1.ppm , which I compressed to PNG using maximum compression,
Code:
pnmtopng -compress 9 basename-1.ppm > basename-1.png
all in about eleven seconds on my workstation. The end result is a 24-bit PNG image,
Code:
file basename-1.png
 basename-1.png: PNG image data, 3509 x 2480, 8-bit/color RGB, non-interlaced
Note that the 8-bit/color RGB means eight bits per component using RGB, therefore 3×8 = 24 bits of color information per pixel.
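The same figure can be read straight from the PNG header instead of relying on file. The following is a minimal Python 3 sketch, assuming a well-formed PNG; the helper names (png_bits_per_pixel, tiny_rgb_png) are made up for illustration and are not part of pdftoppm or NetPBM.

```python
import struct
import zlib

# Channels per pixel for each PNG colour type (per the PNG specification).
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

def png_bits_per_pixel(data):
    """Return (bit_depth, colour_type, bits_per_pixel) from raw PNG bytes."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # IHDR must be the first chunk: 4-byte length, 4-byte type, 13 data bytes.
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    width, height, depth, colour = struct.unpack(">IIBB", data[16:26])
    return depth, colour, depth * CHANNELS[colour]

def _chunk(ctype, body):
    """Assemble one PNG chunk (length, type, data, CRC)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

def tiny_rgb_png():
    """Build a 1x1 8-bit/colour RGB PNG in memory, for demonstration only."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    idat = zlib.compress(b"\x00\xff\x00\x00")  # filter byte + one red pixel
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))

# Colour type 2 is RGB, so 8 bits per component gives 24 bits per pixel.
print(png_bits_per_pixel(tiny_rgb_png()))
```

For a real file, passing the first 32 bytes is enough, for example png_bits_per_pixel(open("basename-1.png", "rb").read(32)).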

After that, I checked that the PNG output option in pdftoppm (the one you use) works okay:
Code:
pdftoppm -aa yes -aaVector yes -freetype yes -png -r 300 file.pdf othername
generates othername-1.png directly, and is about 6% faster, too.

Carefully comparing the othername-1.png and basename-1.png images shows that they contain the exact same data, and are almost exactly the same size. This means that the PNG output option in pdftoppm works very well, for me at least. Even eyeballing the resulting image in Gimp shows that all elements are nicely antialiased (no jaggies), color gradients are smooth (no visible steps), and so on: very satisfactory quality. And I'm very particular about my image quality.

Using the pnmcolormap tool to analyze the intermediate PPM image I created with the first command above,
Code:
pnmcolormap all basename-1.ppm >/dev/null
 pnmcolormap: making histogram...
 pnmcolormap: too many colors!
 pnmcolormap: scaling colors from maxval=255 to maxval=127 to improve clustering...
 pnmcolormap: making histogram...
 pnmcolormap: 21287 colors found
we can see that the image does have a lot of colors, 21287 in this case even after considering only the highest 7 bits per component.
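The idea behind that count can be sketched in a few lines of Python 3. This is only an approximation of what pnmcolormap does: it assumes a binary (P6) PPM with maxval 255 and no header comments, and it drops the lowest bit by shifting rather than rescaling the maxval. The function name count_colors is hypothetical.

```python
import re

def count_colors(ppm_bytes, keep_bits=8):
    """Count distinct colours, keeping only the top keep_bits per component."""
    m = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", ppm_bytes)
    if m is None:
        raise ValueError("expected a binary PPM (P6) without header comments")
    width, height, maxval = (int(g) for g in m.groups())
    if maxval != 255:
        raise ValueError("this sketch only handles maxval 255")
    pixels = ppm_bytes[m.end():m.end() + 3 * width * height]
    shift = 8 - keep_bits
    seen = set()
    for i in range(0, len(pixels), 3):
        r, g, b = pixels[i], pixels[i + 1], pixels[i + 2]
        seen.add((r >> shift, g >> shift, b >> shift))
    return len(seen)

# Two pixels that differ only in the lowest bit of the red component.
demo = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 254, 0, 0])
print(count_colors(demo))      # all 8 bits considered
print(count_colors(demo, 7))   # only the highest 7 bits considered
```

An image that still has thousands of distinct colors after the lowest bit is discarded clearly carries far more than 256 colors.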

I am using netpbm-2:10.0-12.2 and poppler-utils-0.16.7-2ubuntu2 (the latter providing the pdftoppm command).
 
Old 01-23-2012, 01:52 AM   #3
Cyrolancer
Thank you for the detailed instructions, Nominal Animal. I am using the following script to convert a webpage to JPG.

Code:
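#!/bin/bash
# Usage: script NAME  (fetches http://www.NAME.com and writes NAME.jpg)
START_TS=$(date +%s)
# Render the page to PDF inside a virtual X server
xvfb-run -a -s "-screen 0 640x480x16" wkhtmltopdf -d 600 -q "http://www.$1.com" "$1.pdf"
# Rasterize at 200 DPI, cropped to a height of 1500 pixels; output is redirected into the PNG file
pdftoppm -r 200 -png -H 1500 -freetype yes -aa yes -aaVector no "$1.pdf" > "$1.png"
# Scale to half size and convert to JPEG
convert -resize 50% "$1.png" "$1.jpg"
rm "$1.pdf" "$1.png"
END_TS=$(date +%s)
declare -i TS_DIFF=$END_TS-$START_TS
echo "http://www.$1.com is processed in $TS_DIFF seconds"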
The quality has improved a lot after setting the DPI, but there are still problems with gradients, shadows and some colors. Setting the aaVector option to "no" corrected some of these, especially with gradients and colors. There are still some problems with shadows, but I think there is no way to fix that other than using ImageMagick.

After that, I tried

Code:
convert site.pdf site.jpg
and found that the execution time is approximately the same as with the above script.

The main problem is that the "convert" command on its own produces a very low resolution JPG image, whereas the script above produces a well-defined, readable JPG image of acceptable resolution. I think it has something to do with the parameters of the "convert" command.
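One possible explanation, which is an assumption and was not verified in this thread, is the rasterization density: ImageMagick uses 72 DPI by default unless -density is given, while the script asks pdftoppm for 200 DPI. The arithmetic is easy to sketch in Python 3; the page size below is an approximate A4 sheet in points, and the rounding used by the real tools may differ by a pixel.

```python
import math

def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page measured in points (1 point = 1/72 inch)."""
    return (math.ceil(width_pt * dpi / 72.0),
            math.ceil(height_pt * dpi / 72.0))

A4_POINTS = (595, 842)  # approximate A4 portrait page, assumed for illustration

for dpi in (72, 200):
    print(dpi, raster_size(A4_POINTS[0], A4_POINTS[1], dpi))
```

At 200 DPI the image has nearly three times as many pixels along each edge as at 72 DPI. If this is the cause, something like convert -density 200 site.pdf site.jpg should bring the two results closer together.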

P.S.
poppler-utils 0.12.4-1.2
netpbm 2:10.0-12.2+b1
imagemagick 8:6.6.0.4-3

on Debian 6.0.3 (squeeze)
 
Old 01-23-2012, 08:12 PM   #4
Nominal Animal
Quote:
Originally Posted by Cyrolancer View Post
Thank you for the detailed instructions Nominal Animal. I am using such a script to convert a webpage to JPG.
Have you tried using wkhtmltoimage instead of wkhtmltopdf? The conversion to PDF seems like unnecessary complication to me. The options are described in the README_WKHTMLTOIMAGE file.
 
Old 01-24-2012, 01:52 AM   #5
Cyrolancer
I know about wkhtmltoimage, but it is not available in the Debian repos (at least for squeeze). Due to my company's policy, we are not allowed to compile programs ourselves. When the Debian repos are updated with wkhtmltoimage, I will probably switch to it.
 
Old 01-24-2012, 10:24 AM   #6
Nominal Animal
Are you allowed to write and run Python scripts?

You could use the Debian webkit libraries with Python bindings to render the HTML using xvfb-run, then save it to a PNG or JPEG file.

Assuming you have python and python-webkit installed, the webkitscreenshot.py script might be a good starting point. I think it restricts the images to top 1024x768 of the page, though. (I think it uses 1024x768 to render the page, then scales it to the desired size.)

I think it might be better to have the script "tile" the page, saving each tile as a PPM image (to avoid compression overhead). Then, stitch them back together into a single image (possibly cropping out any overlap). That way you wouldn't need to worry about the page size either, you'd always get the entire page. I guess it depends on whether you want "thumbnails" of the web pages, or entire web pages as images.
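The tiling part is mostly bookkeeping. A minimal Python 3 sketch of it, covering only the rectangle arithmetic and none of the rendering or stitching; tile_boxes is a hypothetical helper, and the sizes in the example are arbitrary.

```python
def tile_boxes(page_w, page_h, tile_w, tile_h):
    """Yield (x, y, width, height) rectangles covering the page exactly once."""
    for y in range(0, page_h, tile_h):
        for x in range(0, page_w, tile_w):
            # Tiles on the right and bottom edges are clipped to the page.
            yield (x, y, min(tile_w, page_w - x), min(tile_h, page_h - y))

boxes = list(tile_boxes(1920, 3081, 1024, 768))
for box in boxes:
    print(box)
```

Each rectangle would be rendered and saved on its own, and the offsets say where to paste it when the tiles are stitched back together.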
 
Old 01-24-2012, 03:28 PM   #7
Cyrolancer
I can use Python scripts on the servers, and it is probably possible to install any Python libraries or extensions on them.

I think I will not use wkhtmltopdf for security reasons (thanks to unSpawn and okcomputer44 for their support), and I need another option that does not use an X server to process HTML pages to PDF. I am not really sure whether python-webkit needs an X server.

After executing "apt-get install python-webkit" on an OpenVZ Debian-based virtual machine, using the official image provided on the OpenVZ website, these packages are listed for installation:

Quote:
aspell
aspell-en
dictionaries-common
hicolor-icon-theme
hunspell-en-us
iso-codes
libaspell15
libatk1.0-0
libatk1.0-data
libblas3gf
libcairo2
libdatrie1
libenchant1c2a
libffi5
libfontenc1
libgail18
libgfortran3
libgstreamer-plugins-base0.10-0
libgstreamer0.10-0
libgtk2.0-0
libgtk2.0-bin
libgtk2.0-common
libhunspell-1.2-0
libice6
libicu44
libjasper1
libjpeg62
liblapack3gf
libpango1.0-0
libpango1.0-common
libpixman-1-0
libpng12-0
libsm6
libsoup2.4-1
libthai-data
libthai0
libtiff4
libwebkit-1.0-2
libwebkit-1.0-common
libx11-6
libx11-data
libxau6
libxcb-render-util0
libxcb-render0
libxcb1
libxcomposite1
libxcursor1
libxdamage1
libxdmcp6
libxext6
libxfixes3
libxfont1
libxft2
libxi6
libxinerama1
libxrandr2
libxrender1
libxslt1.1
libxt6
python-cairo
python-gobject
python-gtk2
python-numpy
python-webkit
shared-mime-info
x-ttcidfont-conf
x11-common
xfonts-encodings
xfonts-utils
There are some X-related libraries, but it doesn't seem to install a full X server. That is good for me, and your suggestion is probably the one I am going to use.

Thank you
 
Old 01-25-2012, 12:05 AM   #8
Nominal Animal
A working Python standalone script

I got curious; I've used browsershots.org and similar before, and having a utility to grab entire pages automatically might come in handy.

I hacked away on the above-linked Python code. As you can see from the timestamps, I only did a quick hack to get it working. I think it has some bugs or at least wrinkles left.

The script itself calls Xvfb to provide a virtual X server. It does need the various X libraries mentioned above to be installed, but it does not need a running X server, or a physical display at all. I worked on it in a virtual machine running Debian with a text console only, no desktop environment installed at all, so I am sure of that. Remember to install some fonts (ttf-* packages), to get nicer web pages. Most web pages name their fonts without falling back to plain ones too gracefully, so having as many fonts installed as possible will make the pages closer to their designers' intent.

Specifically, on top of a clean, minimal Debian 6.0.3 (Squeeze) install, (graphical desktop environment explicitly unselected and thus not installed at all), I installed
Code:
sudo apt-get install python-gtk2 python-webkit xvfb ttf-freefont
with their required prerequisites only. Plus the below script, of course.

The Python script itself is still under 200 lines, so it is still relatively simple:
Code:
#!/usr/bin/env python

class WindowImage(object):
    def __init__(self, url, imagefile = "", font_size = 0,
                 font_default = "", font_serif = "",
                 font_sans_serif = "", font_monospace = ""):
        import gtk
        import webkit
        gtk.gdk.threads_init()

        window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        window.move(0, 0)
        size = (gtk.gdk.screen_width(), gtk.gdk.screen_height())
        window.resize(*size)
        webview = webkit.WebView()

        self.imagefile = imagefile
        # Default result, so vfb_image() can read .saved even if saving fails
        self.saved = False

        # webkit settings
        settings = webkit.WebSettings()
        if len(font_serif) > 0:
            settings.set_property("serif-font-family", font_serif)
        if len(font_sans_serif) > 0: 
            settings.set_property("sans-serif-font-family", font_sans_serif)
        if len(font_monospace) > 0:
            settings.set_property("monospace-font-family", font_monospace)
        if len(font_default) > 0:
            settings.set_property("default-font-family", font_default)
        if font_size > 0:
            settings.set_property("default-font-size", font_size)
        webview.set_settings(settings)

        window.add(webview)
        webview.connect("load-finished", self._loaded)
        webview.open(url)
        window.show_all()
        gtk.main()
        gtk.gdk.threads_leave()
        pass

    def _loaded(self, view, frame):
        import gtk
        try:
            width, height = view.window.get_size()
            pixmap = gtk.gdk.Pixmap(view.window, width, height)
            gc = pixmap.new_gc(function = gtk.gdk.COPY,
                               subwindow_mode = gtk.gdk.INCLUDE_INFERIORS)
            pixmap.draw_drawable(gc, view.window, 0, 0, 0, 0, width, height)
            pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
            pixbuf.get_from_drawable(pixmap, pixmap.get_colormap(), 0, 0, 0, 0, width, height)
            print "Saving %d x %d PNG image '%s'" % (width, height, self.imagefile)
            pixbuf.save(self.imagefile, "png")
            self.saved = True
        except:
            #import traceback
            #traceback.print_exc()
            pass
        gtk.main_quit()
        pass
    pass


def vfb(display_spec, dpi, server=2, screen=0):
    import subprocess
    import os
    while True:
        try:
            devnull = open("/dev/null", "w")
            proc = subprocess.Popen(
                ["Xvfb", ":%d" % server, "-dpi", "%d" % dpi,
                 "-screen", "%d" % screen, display_spec],
                shell=False, stdout=devnull, stderr=devnull)
            print "Opened Xvfb (%s @ %d DPI)" % (display_spec, dpi)
            os.environ["DISPLAY"] = ":%d.%d" % (server, screen)
            return (proc, screen)
        except:
            screen += 1
            pass
        pass
    pass

def vfb_image(url, display_spec, dpi, **args):
    proc, screen = vfb(display_spec, dpi)

    try:
        return WindowImage(url, **args).saved
    finally:
        proc.terminate()
        pass
    pass

def _main():
    screen_width = 1024
    screen_height = 600
    screen_depth = 24
    dpi = 96
    imagefile = "page.png"
    font_size = 14
    font_default = "FreeSerif"
    font_serif = "FreeSerif"
    font_sans_serif = "FreeSans"
    font_monospace = "FreeMono"

    from optparse import OptionParser
    parser = OptionParser()
    parser.usage += " URL"
    parser.add_option("-x", "--width", dest="width",
                      help="browser window width: %d" % screen_width,
                      default="%d" % screen_width)
    parser.add_option("-y", "--height", dest="height",
                      help="browser window height: %d" % screen_height,
                      default="%d" % screen_height)
    parser.add_option("-d", "--depth", dest="depth",
                      help="color depth: %d" % screen_depth,
                      default="%d" % screen_depth)
    parser.add_option("-o", "--output", dest="output",
                      help="output image file name: %s" % imagefile,
                      default=imagefile)
    parser.add_option("-z", "--dpi", dest="dpi",
                      help="dots per inch: %d" % dpi,
                      default=dpi)
    parser.add_option("-s", "--size", dest="font_size",
                      help="font size: %s" % font_size,
                      default=font_size)
    parser.add_option("-f", "--font", dest="font_default",
                      help="default font: %s" % font_default,
                      default="")
    parser.add_option("-m", "--mono", dest="font_monospace",
                      help="Monospace font: %s" % font_monospace,
                      default="")
    parser.add_option("-S", "--serif", dest="font_serif",
                      help="Serif font: %s" % font_serif,
                      default="")
    parser.add_option("-A", "--sans", dest="font_sans_serif",
                      help="Sans-serif font: %s" % font_sans_serif,
                      default="")

    opts, args = parser.parse_args()
    if len(args) == 0:
        parser.print_help()
        import sys
        sys.exit(-1)
        pass

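    # Parse numeric options with int() rather than eval(), which would
    # execute arbitrary expressions given on the command line.
    try: font_size = int(opts.font_size)
    except (TypeError, ValueError): pass

    try:
        dpi = int(opts.dpi)
        if dpi < 36:
            dpi = 36
    except (TypeError, ValueError): pass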

    if len(opts.font_default) > 0:
        font_default = opts.font_default
        font_serif = opts.font_default
        font_sans_serif = opts.font_default
        font_monospace = opts.font_monospace

    if len(opts.font_serif) > 0:
        font_serif = opts.font_serif

    if len(opts.font_sans_serif) > 0:
        font_sans_serif = opts.font_sans_serif

    if len(opts.font_monospace) > 0:
        font_monospace = opts.font_monospace

    imagefile = opts.output

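    # int() instead of eval(): invalid values keep the defaults set above
    try: screen_width = int(opts.width)
    except (TypeError, ValueError): pass

    try: screen_height = int(opts.height)
    except (TypeError, ValueError): pass

    try: screen_depth = int(opts.depth)
    except (TypeError, ValueError): pass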

    screen = "%dx%dx%d" % (screen_width, screen_height, screen_depth)

    from urlparse import urlparse
    if "://" in args[0]:
        url = urlparse(args[0]).geturl()
    elif args[0].startswith("/") or args[0].startswith("./") or args[0].startswith("../"):
        url = urlparse("file://" + args[0]).geturl()
    else:
        url = urlparse("http://" + args[0]).geturl()

    if vfb_image(url, screen, dpi,
                      imagefile = imagefile,
                      font_size = font_size,
                      font_default = font_default,
                      font_serif = font_serif,
                      font_sans_serif = font_sans_serif,
                      font_monospace = font_monospace):
        print "%s: Image saved successfully." % imagefile

if __name__ == "__main__": _main()
The script calls Xvfb itself internally, so there is no need to use xvfb-run. It will not help, it will just slow things down. I also switched the default Xvfb server number to 2, so that if you do run it on a workstation with an X server, it'll still use the Xvfb and not your real X.
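A fixed server number can still collide with another X server that is already running. X servers conventionally leave a lock file named /tmp/.X<n>-lock for display n, so a free number can be looked up before starting Xvfb. A minimal Python 3 sketch of that lookup; the function name free_display is hypothetical, and the check is racy because another server could claim the number between the check and the launch.

```python
import os

def free_display(start=2, limit=100, lock_dir="/tmp"):
    """Return the first display number >= start that has no X lock file."""
    for number in range(start, limit):
        lock = os.path.join(lock_dir, ".X%d-lock" % number)
        if not os.path.exists(lock):
            return number
    raise RuntimeError("no free display number below %d" % limit)

print(":%d" % free_display())
```

The returned number would be used in place of the fixed server=2 default when building the Xvfb command line and the DISPLAY variable.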

Edited: This version uses urlparse from urlparse to make sure the URL is correctly escaped. If you wish to refer to a local file, use the absolute or relative path (i.e. start the file name or path with /, ./ or ../). Thanks to Cyrolancer for pointing it out!
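The three cases handled by the script (full URL, local path, bare host name) can be sketched separately from the rest of the code. The version below is Python 3, where the urlparse module has become urllib.parse. It percent-encodes the path and query explicitly, which is an assumption about what "correctly escaped" should mean here, and it assumes POSIX paths. The name to_url is made up for this example.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit

def to_url(arg):
    """Turn a command-line argument into a URL a browser widget can load."""
    if "://" in arg:
        parts = urlsplit(arg)
        # Encode unsafe characters, leaving existing %XX escapes alone.
        path = quote(parts.path, safe="/%")
        query = quote(parts.query, safe="=&%+")
        return urlunsplit((parts.scheme, parts.netloc, path, query,
                           parts.fragment))
    if arg.startswith(("/", "./", "../")):
        # Local file: resolve to an absolute path first.
        return "file://" + quote(os.path.abspath(arg))
    # Anything else is treated as a host name (plus optional path).
    return to_url("http://" + arg)

for example in ("example.com/jquery.js?m", "http://example.com/a b",
                "/tmp/page.html"):
    print(to_url(example))
```

Escaping only the path and query keeps the scheme and host intact, which matters because the whole string is handed to webkit as the address to open.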

Run without parameters to see the usage. In a nutshell, the usage is
Code:
python script -o image.png URL
It is pretty fast, too. Linuxquestions Forums page:
Code:
time python url2png -x 1920 -y 1080 -o lq.png http://www.linuxquestions.org/questions/
  Saving 1920 x 3081 PNG image 'lq.png'
  lq.png: Image saved successfully
  real: 0m6.821s
  user: 0m2.136s
  sys:  0m0.536s

ls -l lq.png
  -rw-r--r-- 1 user group 719220 2012-01-25 07:08 lq.png
Most of the time is from loading the page; a local test page renders in less than a second. (If the Linuxquestions Forums page was local, it would have been rendered in less than three seconds, and it's a pretty complex page.)

If you want the script to be silent, omit the print lines.

Note that the screen width and height (specified using the -x and -y options) define the browser window size. Since the image is of the contents, the image size may be larger. If the page is taller or wider, then the image will be taller or wider, too. The layout on most pages depends on the browser window size, though.

Selecting the font size does not matter much on typical webpages, since they define their font sizes in points, not relative to the user default. Changing the DPI (option -z DPI for the script) reported by Xvfb affects only pages that use points (as opposed to pixels) to define the font size. For others, we could use the page zoom feature, but my initial tests showed it was too buggy: sometimes only part of the page would render. I think the page would need a reload or something to render properly when zoom is used.
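The reason DPI matters only for point-sized text is that a point is a physical unit (1/72 inch), while a pixel size is already in device units. A tiny Python 3 sketch of the conversion, as plain arithmetic; webkit's actual handling of units may differ from this.

```python
def points_to_pixels(points, dpi):
    """Convert a size in points (1/72 inch) to pixels at the given DPI."""
    return points * dpi / 72.0

for dpi in (96, 144):
    print("12pt at %d DPI = %.1f px" % (dpi, points_to_pixels(12, dpi)))
```

A font specified as 16px stays 16 pixels tall regardless of the -z value, which is why such pages look the same at any DPI setting.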

The main difference from the original script I linked to is that this one renders the window to a pixmap first, then converts the pixmap to a pixbuf, and finally saves the pixbuf as a PNG image directly using the pixbuf save function. The pixmap is necessary to get all page content, not just the part that is "visible" to Xvfb. If you prefer JPEG output (over PNG as used now), change the line to
Code:
            pixbuf.save(self.imagefile, "jpeg", {"quality": 97})
Security-wise, you could run the script using a dedicated user account, with very limited access to files. You could even wipe the user's home directory clean after each invocation, to make sure that even if a malicious webpage manages to cut through webkit, anything it might manage to save on your server would be wiped out anyway.
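The "wipe after each invocation" part can be approximated even without a dedicated account, by pointing HOME at a throwaway directory. A minimal Python 3 sketch, assuming the renderer only writes below HOME; it is not a substitute for a restricted user account, and run_with_scratch_home is a hypothetical helper.

```python
import os
import shutil
import subprocess
import sys
import tempfile

def run_with_scratch_home(command):
    """Run command with HOME set to a temporary directory, then delete it."""
    home = tempfile.mkdtemp(prefix="scratch-home-")
    env = dict(os.environ, HOME=home)
    try:
        status = subprocess.call(command, env=env)
    finally:
        # Remove whatever the command left behind in its home directory.
        shutil.rmtree(home, ignore_errors=True)
    return status, home

# Stand-in for the screenshot script: a child process that drops a file in HOME.
child = "import os; open(os.path.join(os.environ['HOME'], 'junk'), 'w').close()"
status, home = run_with_scratch_home([sys.executable, "-c", child])
print(status, os.path.exists(home))
```

In real use the command would be the screenshot script with an absolute output path outside the scratch directory, so that the image survives the cleanup.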

If you want to develop the script further, just start a new thread in the Programming forum -- perhaps including the above script and whatever you feel pertinent from this post, as a starting point. Feel free to use the script in any way you like.

Hope you find this useful,

Last edited by Nominal Animal; 01-26-2012 at 02:01 PM. Reason: Replaced the url = args[0] line.
 
Old 01-25-2012, 03:43 AM   #9
Cyrolancer
You are awesome, Nominal Animal! This is a superior guide for this purpose. I will try this script as soon as possible.

Thank you very much
 
Old 01-26-2012, 05:56 AM   #10
Cyrolancer
Hello again

I think I have found an error in this script. When I try to open a page that has a link such as "jquery.js?m", there are some errors like:

Quote:
** Message: console message: http://xxx/drupal.js?m @271: TypeError: Result of expression '$' [undefined] is not a function.


** Message: console message: http://xxx/jcalendar.js?m @2: TypeError: Result of expression '$' [undefined] is not a function.

** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
** (url2png.py:6798): DEBUG: NP_Initialize
** (url2png.py:6798): DEBUG: NP_Initialize succeeded
but it creates the PNG file. I think that escaping the characters can solve this problem, so I have changed the "vfb_image" function a bit.

Code:
def vfb_image(url, display_spec, dpi, **args):
    proc, screen = vfb(display_spec, dpi)

    import urllib

    try:
        return WindowImage(urllib.quote_plus(url), **args).saved
    finally:
        proc.terminate()
        pass
    pass
It still creates the PNG file, but no longer shows the above messages.

Well, I really don't know much Python. Maybe I have done something wrong, but after this small change, it works.

Small edit: I am also getting this warning:
Quote:
Xlib: extension "RANDR" missing on display ":2.0".
I am pretty sure that X has loaded the RANDR extension. Maybe this warning appears because a real X server is not being used?

Last edited by Cyrolancer; 01-26-2012 at 06:19 AM. Reason: RANDR warning added
 
Old 01-26-2012, 02:32 PM   #11
Nominal Animal
Quote:
Originally Posted by Cyrolancer View Post
I think I have found an error on this script. When I try to open a page with such a link "jquery.js?m", there are some errors like:
Ah, I forgot webkit does not escape the URL.

I think it is better to fix the URL a bit earlier, in the main function. I think the urlparse function works better, too. The fixed version above will even detect local file paths correctly, if you start the local file reference with /, ./ or ../. (After all, ./filename is always the same as filename.)

Quote:
Originally Posted by Cyrolancer View Post
Well, I really don't know Python much Maybe I have done something wrong, but after this small change, it works.
Well spotted! I used urlparse() instead so that both local (paths) and remote URLs work.

Quote:
Originally Posted by Cyrolancer View Post
Small edit: Also having warning. I am pretty sure that X has loaded RANDR extension. Maybe this warning comes out of the problem, not using a real X?
Yup, it is a harmless warning. The XRANDR extension is just not enabled for Xvfb. The extension is used to manage resolution changes, display rotation, and that sort of stuff.

The issue has been already reported with a suggested fix -- needs all of eleven lines changed -- but somebody would have to send an email to the xorg-devel mailing list, and ask it to be reviewed and included.
 
Old 01-26-2012, 03:09 PM   #12
Cyrolancer
Thank you for your corrections to the script. I only know Python by name, not how to code in it, so I will gladly accept your solution to this problem.

I just want to ask a simple question. Because I don't know much about Python, this seems easy to me; if it is complicated or hard to do, please ignore it. Is it possible to make Xvfb run silently, without printing errors? Or could the script append all errors and warnings to a text file for later investigation? I couldn't understand the part that runs Xvfb. I tried some changes but always got errors.
 
Old 01-27-2012, 12:19 PM   #13
Nominal Animal
Quote:
Originally Posted by Cyrolancer View Post
Is it possible to make xvfb run silently, without giving out errors?
Of course. The trick is that it is webkit (actually some Xlib function called by webkit) that outputs the error, not Xvfb. In other words, you need to redirect the standard output and standard error streams elsewhere in the Python code.
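Because the messages come from C libraries, reassigning sys.stdout or sys.stderr is not enough; the underlying file descriptors have to be redirected. A minimal Python 3 sketch of that technique as a reusable context manager; the name fd_redirected is made up here, and the path argument could just as well be a log file opened for appending.

```python
import contextlib
import os

@contextlib.contextmanager
def fd_redirected(fd, path=os.devnull):
    """Temporarily point file descriptor fd at path, then restore it."""
    saved = os.dup(fd)                    # keep the original destination
    target = os.open(path, os.O_WRONLY)
    try:
        os.dup2(target, fd)               # fd now writes to path
        yield
    finally:
        os.dup2(saved, fd)                # put the original back
        os.close(saved)
        os.close(target)

# Demonstration on a pipe rather than on the real standard streams.
reader, writer = os.pipe()
with fd_redirected(writer):
    os.write(writer, b"hidden")           # discarded
os.write(writer, b"shown")                # reaches the pipe again
os.close(writer)
print(os.read(reader, 64))
```

Wrapping the webkit call in with fd_redirected(1), fd_redirected(2): silences library output for that call only. Any buffered Python-level output should be flushed before entering the block.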

This version of the script supports two additional command line arguments: -v and -q.
By default, it will output the image file name to standard output if successful, or an error message to standard error otherwise.
If you use the -q option, it will never print anything.
If you use the -v option (at least once), it will include the size and the format of the image file in the standard output message.

If the URL cannot be loaded, or if the image file cannot be saved, it will exit with exit status 1. Otherwise, it will exit with exit status 0.

Code:
#!/usr/bin/env python

class WindowImage(object):
    def __init__(self, url, imagefile = "", font_size = 0,
                 font_default = "", font_serif = "",
                 font_sans_serif = "", font_monospace = ""):
        import gtk
        import webkit
        gtk.gdk.threads_init()

        window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        window.move(0, 0)
        size = (gtk.gdk.screen_width(), gtk.gdk.screen_height())
        window.resize(*size)
        webview = webkit.WebView()

        self.url = url
        self.imagefile = imagefile

        # webkit settings
        settings = webkit.WebSettings()
        if len(font_serif) > 0:
            settings.set_property("serif-font-family", font_serif)
        if len(font_sans_serif) > 0: 
            settings.set_property("sans-serif-font-family", font_sans_serif)
        if len(font_monospace) > 0:
            settings.set_property("monospace-font-family", font_monospace)
        if len(font_default) > 0:
            settings.set_property("default-font-family", font_default)
        if font_size > 0:
            settings.set_property("default-font-size", font_size)
        webview.set_settings(settings)

        window.add(webview)
        webview.connect("load-finished", self._loaded)
        webview.connect("load-error", self._failed)
        webview.open(url)
        window.show_all()
        gtk.main()
        gtk.gdk.threads_leave()

    def _loaded(self, view, frame):
        import gtk
        self.image = (0, 0, "Cannot create image.")
        try:
            width, height = view.window.get_size()
            pixmap = gtk.gdk.Pixmap(view.window, width, height)
            gc = pixmap.new_gc(function = gtk.gdk.COPY,
                               subwindow_mode = gtk.gdk.INCLUDE_INFERIORS)
            pixmap.draw_drawable(gc, view.window, 0, 0, 0, 0, width, height)
            pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
            pixbuf.get_from_drawable(pixmap, pixmap.get_colormap(), 0, 0, 0, 0, width, height)
            self.image = (0, 0, "Cannot save image file: %s" % self.imagefile )
            pixbuf.save(self.imagefile, "png")
            self.image = (width, height, "PNG")
        except:
            pass
        gtk.main_quit()

    def _failed(self, view, frame, uri, gerror):
        import gtk
        import ctypes
        msg = ctypes.cast(int(str(gerror)[13:-1],16)+8, ctypes.POINTER(ctypes.c_char_p))[0]
        self.image = (0, 0, "%s (%s)." % (msg, uri))
        gtk.main_quit()

    pass


def vfb(display_spec, dpi, server=2, screen=0):
    import subprocess
    import os
    while True:
        try:
            devnull = open(os.devnull, "w")
            proc = subprocess.Popen(
                ["Xvfb", ":%d" % server, "-dpi", "%d" % dpi,
                 "-screen", "%d" % screen, display_spec],
                shell=False, stdout=devnull, stderr=devnull)
            os.environ["DISPLAY"] = ":%d.%d" % (server, screen)
            return (proc, screen)
        except:
            screen += 1
        pass
    pass

def vfb_image(url, display_spec, dpi, **args):
    proc, screen = vfb(display_spec, dpi)
    try:
        return WindowImage(url, **args).image
    finally:
        proc.terminate()
    return (0, 0, "Webkit failed")

def _main():
    screen_width = 1024
    screen_height = 600
    screen_depth = 24
    dpi = 96
    imagefile = "page.png"
    font_size = 14
    font_default = "FreeSerif"
    font_serif = "FreeSerif"
    font_sans_serif = "FreeSans"
    font_monospace = "FreeMono"
    verbose = 1

    import os
    out = os.fdopen(os.dup(1), "w")
    err = os.fdopen(os.dup(2), "w")
    devnull = open(os.devnull, "w")
    if devnull.fileno() != 1: os.dup2(devnull.fileno(), 1)
    if devnull.fileno() != 2: os.dup2(devnull.fileno(), 2)
    if devnull.fileno() >  2: devnull.close()

    from optparse import OptionParser
    parser = OptionParser()
    parser.usage += " URL"
    parser.add_option("-v", "--verbose", dest="verbose",
                      help="verbose output", action="count",
                      default=verbose)
    parser.add_option("-q", "--quiet", dest="quiet",
                      help="no output", action="store_true",
                      default=False)
    parser.add_option("-x", "--width", dest="width",
                      help="browser window width: %d" % screen_width,
                      default="%d" % screen_width)
    parser.add_option("-y", "--height", dest="height",
                      help="browser window height: %d" % screen_height,
                      default="%d" % screen_height)
    parser.add_option("-d", "--depth", dest="depth",
                      help="color depth: %d" % screen_depth,
                      default="%d" % screen_depth)
    parser.add_option("-o", "--output", dest="output",
                      help="output image file name: %s" % imagefile,
                      default=imagefile)
    parser.add_option("-z", "--dpi", dest="dpi",
                      help="dots per inch: %d" % dpi,
                      default=dpi)
    parser.add_option("-s", "--size", dest="font_size",
                      help="font size: %s" % font_size,
                      default=font_size)
    parser.add_option("-f", "--font", dest="font_default",
                      help="default font: %s" % font_default,
                      default="")
    parser.add_option("-m", "--mono", dest="font_monospace",
                      help="Monospace font: %s" % font_monospace,
                      default="")
    parser.add_option("-S", "--serif", dest="font_serif",
                      help="Serif font: %s" % font_serif,
                      default="")
    parser.add_option("-A", "--sans", dest="font_sans_serif",
                      help="Sans-serif font: %s" % font_sans_serif,
                      default="")

    opts, args = parser.parse_args()
    if len(args) == 0:
        parser.print_help()
        import sys
        sys.exit(-1)
        pass

    try: font_size = int(opts.font_size)
    except (TypeError, ValueError): pass

    try:
        dpi = int(opts.dpi)
        if dpi < 36:
            dpi = 36
    except (TypeError, ValueError): pass

    if len(opts.font_default) > 0:
        font_default = opts.font_default
        font_serif = opts.font_default
        font_sans_serif = opts.font_default
        # The monospace font is only changed by -m/--mono (handled below).

    if len(opts.font_serif) > 0:
        font_serif = opts.font_serif

    if len(opts.font_sans_serif) > 0:
        font_sans_serif = opts.font_sans_serif

    if len(opts.font_monospace) > 0:
        font_monospace = opts.font_monospace

    imagefile = opts.output
    verbose = opts.verbose
    if opts.quiet: verbose = 0

    try: screen_width = int(opts.width)
    except (TypeError, ValueError): pass

    try: screen_height = int(opts.height)
    except (TypeError, ValueError): pass

    try: screen_depth = int(opts.depth)
    except (TypeError, ValueError): pass

    screen = "%dx%dx%d" % (screen_width, screen_height, screen_depth)

    from urlparse import urlparse
    if "://" in args[0]:
        url = urlparse(args[0]).geturl()
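    elif args[0].startswith("/") or args[0].startswith("./") or args[0].startswith("../"):
        # Resolve relative paths first; "file://./page.html" would treat "." as a host name.
        url = urlparse("file://" + os.path.abspath(args[0])).geturl()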
    else:
        url = urlparse("http://" + args[0]).geturl()

    (width, height, format) = vfb_image(url, screen, dpi,
                                        imagefile = imagefile,
                                        font_size = font_size,
                                        font_default = font_default,
                                        font_serif = font_serif,
                                        font_sans_serif = font_sans_serif,
                                        font_monospace = font_monospace)

    if (width < 1) or (height < 1):
        if verbose > 0:
            if len(format) > 0:
                err.write("%s\n" % format)
            else:
                err.write("Failed to save image: %s\n" % imagefile)
        import sys
        sys.exit(1)

    if verbose > 1:
        out.write("Saved %d x %d %s image: %s\n" % (width, height, format, imagefile))
    elif verbose > 0:
        out.write("%s\n" % imagefile)

if __name__ == "__main__": _main()
Hmm, this is getting pretty fully-featured already, and still clocks in at just 232 lines.
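For reference, the rule the script uses to turn its argument into a URL can be sketched on its own as follows. This is a minimal sketch, not part of the script above: to_url is a hypothetical helper name, the import fallback is an assumption so that it runs on both Python 2 and Python 3, and resolving relative paths with os.path.abspath is an assumption about what a file:// URL should point to.

```python
import os

try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2, as used by the script


def to_url(arg):
    """Map a command-line argument to a URL (hypothetical helper)."""
    if "://" in arg:
        # Already a full URL: pass it through unchanged.
        return urlparse(arg).geturl()
    if arg.startswith(("/", "./", "../")):
        # Looks like a local path: make it absolute so the file:// URL is valid.
        return "file://" + os.path.abspath(arg)
    # Anything else is treated as a host name, optionally followed by a path.
    return urlparse("http://" + arg).geturl()


if __name__ == "__main__":
    for arg in ("www.linuxquestions.org", "/tmp/page.html", "./page.html"):
        print("%s -> %s" % (arg, to_url(arg)))
```

Keeping the rule in one small function makes it easy to check by hand which of the three cases a given argument falls into.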
 
Old 01-27-2012, 01:32 PM   #14
Cyrolancer
Member
 
Registered: Jan 2012
Distribution: Debian
Posts: 52

Original Poster
Rep: Reputation: Disabled
The new parameters are great; they extend the capability of this tool. Thank you for your efforts.

I have a question and a modification on this script.

My modification is:

I think the verbose output should be more verbose than ever. So, I have commented out:

Code:
out.write("Saved %d x %d %s image: %s\n" % (width, height, format, imagefile))
and put this:

Code:
out.write("Status: OK\nURL: %s\nFile: %s\nSize: %dx%d\n" % (args[0], imagefile, width, height))
Seems better

And my question is:

After using such a link in the command prompt:

Code:
python url2png.py -v -x 1024 -y 768 -o testing3.png http://www.youtube.com/watch?v=p8EY6TB1Iow&feature=autoplay&list=UUsa5WTL9c9PLUi1KvfNNPyg&lf=plcp&playnext=1
it throws out something like:

Code:
[1] 6138
[2] 6139
[3] 6140
[4] 6141

Status: OK
URL: http://www.youtube.com/watch?v=p8EY6TB1Iow
File: testing3.png
Size: 1024x2120

[1]   Done                    python url2png.py -v -x 1024 -y 768 -o testing3.png http://www.youtube.com/watch?v=p8EY6TB1Iow
[2]   Done                    feature=autoplay
[3]-  Done                    list=UUsa5WTL9c9PLUi1KvfNNPyg
[4]+  Done                    lf=plcp
I think the problem is the ampersand (&) character. Is it possible to solve this problem or do we need to accept this?
 
Old 01-27-2012, 03:56 PM   #15
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by Cyrolancer
I think that, verbose output should be more verbose than ever
If you use
Code:
    if verbose > 2:
        out.write("Status: OK\n")
        out.write("Request: %s\n" % args[0])
        out.write("URL: %s\n" % url)
        out.write("Saved: %s\n" % imagefile)
        out.write("Format: %s\n" % format)
        out.write("Width: %d\n" % width)
        out.write("Height: %d\n" % height)
    elif verbose > 1:
        out.write("Saved %d x %d %s image: %s\n" % (width, height, format, imagefile))
    elif verbose > 0:
        out.write("%s\n" % imagefile)
then you can use -v to select the one-line output, and -vv to select the multiline output.
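The reason -v and -vv map to those levels is that the script declares the option with action="count" and a default of 1, so each -v adds one to that default. A minimal sketch of just that part, assuming the same two options as in the script (parse_verbosity is a hypothetical helper name, not something the script defines):

```python
from optparse import OptionParser


def parse_verbosity(argv):
    """Return the effective verbosity level for a list of arguments."""
    parser = OptionParser()
    # Each -v increments the stored value, starting from the default of 1.
    parser.add_option("-v", "--verbose", dest="verbose",
                      action="count", default=1)
    parser.add_option("-q", "--quiet", dest="quiet",
                      action="store_true", default=False)
    opts, _ = parser.parse_args(argv)
    # -q overrides any number of -v flags, as in the script.
    return 0 if opts.quiet else opts.verbose


if __name__ == "__main__":
    for argv in ([], ["-v"], ["-vv"], ["-q"]):
        label = " ".join(argv) or "(none)"
        print("%-6s -> verbose = %d" % (label, parse_verbosity(argv)))
```

So no flag gives level 1 (file name only), -v gives level 2 (the one-line summary), -vv gives level 3 (the multiline output), and -q gives 0.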

Quote:
Originally Posted by Cyrolancer
After using such a link in the command prompt:
http://www.youtube.com/watch?v=p8EY6...lcp&playnext=1
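.. your shell treats each unquoted & as a command separator that runs the preceding command in the background. It therefore sees five separate commands instead of one (the fourth being lf=plcp), which is why four background jobs were reported and the script only received the URL up to the first &.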

Use single quotes when supplying the URLs by hand, i.e.
Code:
python url2png.py -v -x 1024 -y 768 -o testing3.png 'http://www.youtube.com/watch?v=p8EY6TB1Iow&feature=autoplay&list=UUsa5WTL9c9PLUi1KvfNNPyg&lf=plcp&playnext=1'
and double quotes when the parameter is expanded from a shell variable, i.e.
Code:
read -p 'Input URL: ' URL
python url2png.py -vv -x 1024 -y 768 -o testing3.png "$URL"
This is not specific to this script in any way; this is what you always have to do when using a command-line shell.

Please read the Quoting chapter in the Bash Reference Manual for details and explanations. It's not long! After that, just remember that every command you type or write in a Bash script is first interpreted by the shell; the command only receives its arguments after the shell has processed them.
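If the script ends up being called from another Python program rather than typed by hand, the same issue can be sidestepped by not involving a shell at all. The following is a minimal sketch under a few assumptions: it needs Python 3 (shlex.quote does not exist in Python 2), and the child process here is just a tiny stand-in that echoes its first argument, not url2png.py itself.

```python
import shlex
import subprocess
import sys

url = ("http://www.youtube.com/watch?v=p8EY6TB1Iow&feature=autoplay"
       "&list=UUsa5WTL9c9PLUi1KvfNNPyg&lf=plcp&playnext=1")

# shlex.quote() wraps the string in single quotes when it contains
# characters the shell would interpret, such as & and ?.
quoted = shlex.quote(url)
print("Safe to paste into a shell:", quoted)

# Passing an argument list (and no shell=True) hands each item to the
# child process unchanged, so no quoting is needed at all.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.argv[1])", url]
received = subprocess.check_output(child).decode()
print("Child received the full URL:", received == url)
```

The quoted form is what you would paste into a terminal or a shell script; the argument list is the safer choice whenever the URL comes from somewhere you do not control.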
 