LinuxQuestions.org
Lumak's Guide to Random Things
OK I don't really have a good title yet but I figure I can post works in progress and other tips I've come across or other interesting things.

man to pdf converter

Posted 09-08-2009 at 04:21 PM by lumak
Updated 09-22-2009 at 12:31 AM by lumak

EDIT:
While it was fun to do, this was actually completely useless. Just hotkey your terminal and type "man"! Original post as follows.


I recently got a tablet PC and instantly wanted to fill it with a wealth of knowledge. This led me to want PDFs of the man pages. Luckily, man can output PostScript and there is a ps2pdf converter on most Linux distributions. But you may not want to convert them manually, and you may want a nice folder structure for them.
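
For a single page, the man to PostScript to PDF pipeline can be tried by hand. This is only a minimal sketch: the function name man_to_pdf and the output file naming are made up for illustration, and it assumes a man that supports -t (groff based, as on most Linux distributions) and Ghostscript's ps2pdf.

```shell
# man_to_pdf SECTION PAGE [OUTDIR]
# Hypothetical helper: render one man page as PostScript with "man -t"
# and feed it to ps2pdf, which reads standard input when given "-".
man_to_pdf() {
    section=$1
    page=$2
    outdir=${3:-.}
    man -t "$section" "$page" | ps2pdf - "$outdir/$page.$section.pdf"
}

# Example (needs man and ps2pdf installed):
#   man_to_pdf 1 ls /tmp     # writes /tmp/ls.1.pdf
```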

It's not perfect but go ahead and check it out.

EDIT:
Known Bugs:
- Some manual pages may contain errors when converted to ps then to pdf. These errors are not logged yet and the script continues happily.

- Some file names are name.<section>x[.gz] and others are name.<section>p[.gz]. The ones with the p are actually POSIX manual pages and the matching files with no p are Linux manual pages.

# package bug
muttbug may be an actual man page but redirects to flea with a man page link. It may link to a nonexistent file.

# package bug
compress-dummy.1.gz appears to be an empty file and is not converted.

Version numbers in the file name break the sed substitutions:
gimp-2.6.1.gz
gimptool-2.0.1.gz
gimp-console-2.6.1.gz
gimp-remote-2.6.1.gz
gimprc-2.6.5.gz

# package bug
pthread_sigmask.3p.gz redirects to man3p/sigprocmask.3p which does not exist
Additionally, pthread_sigmask.3.gz reported the same issue. Not sure why.

mount.nfs.8 and umount.nfs.8 were not converted. But for some reason [u]mount.cifs.8 were!
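
The version number and suffix problems above come from looking for the section after the first matching dot in the file name. That is probably also why mount.nfs.8 was skipped: the ".n" in ".nfs" matches the section pattern, so the page name becomes "mount". One possible workaround, sketched here and not used by the script in this post, is to strip a compression suffix first and then split on the last dot. The function names are made up for this example, and only .gz, .bz2 and .xz are assumed as compression suffixes.

```shell
# strip_compression FILE: drop the directory and a trailing .gz/.bz2/.xz
strip_compression() {
    name=${1##*/}
    case $name in
        *.gz)  name=${name%.gz} ;;
        *.bz2) name=${name%.bz2} ;;
        *.xz)  name=${name%.xz} ;;
    esac
    printf '%s\n' "$name"
}

# section_of FILE: everything after the last dot, e.g. 1, 3p, 1x
section_of() {
    name=$(strip_compression "$1")
    printf '%s\n' "${name##*.}"
}

# page_of FILE: everything before the last dot
page_of() {
    name=$(strip_compression "$1")
    printf '%s\n' "${name%.*}"
}

# page_of gimp-2.6.1.gz     -> gimp-2.6
# section_of gimp-2.6.1.gz  -> 1
# page_of mount.nfs.8       -> mount.nfs
```

Note that this returns sections such as 3p exactly as they appear in the file name, so the output directories would become man3p and so on.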



WINDOWS ISSUES WITH FILE NAMES

- The man3 section has files named ExtUtils::name::suffix.3
mann on Slackware has pkg::create and platform::shell as well.
These file names are not allowed by Windows because of the colons ':' and will get copied over with names like SIH273~1.PDF. This is a Windows issue and will not be fixed. Consider doing a rename on the files BEFORE copying to Windows.
find -name '*::*' -exec rename '::' '..' {} \;
It will need to be run 4 times for names such as CPANPLUS::Shell::Default::Plugins::CustomSource.3.pdf, because each run replaces only the first '::' in a name.

- If copying files to a Windows machine, the following files will be in naming conflict because Windows is not case sensitive. This is a Windows problem and I don't plan on fixing things that aren't issues. Additionally, Windows may copy them over in a different order than expected, so you will have to check which ones actually got copied.

DB.3.pdf - db.3.pdf
Standards.7.pdf - standards.7.pdf
CORE.3.pdf - Core.3.pdf
Errno.3.pdf - errno.3.pdf
NAN.3.pdf - nan.3.pdf
_Exit.2.pdf - _exit.2.pdf
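
Both Windows problems above can be handled or at least detected before copying. The following is a rough sketch, assuming file names without newlines. fix_colons and case_conflicts are hypothetical names, and ".." as the replacement simply follows the rename command above.

```shell
# fix_colons DIR: replace every "::" in file names under DIR with ".."
# in a single pass, so no repeated runs are needed.
fix_colons() {
    find "$1" -type f -name '*::*' | while IFS= read -r path; do
        dir=${path%/*}
        base=${path##*/}
        new=$(printf '%s\n' "$base" | sed 's/::/../g')
        mv -- "$path" "$dir/$new"
    done
}

# case_conflicts DIR: print lower-cased paths that occur more than once,
# i.e. names that would collide on a case-insensitive file system.
case_conflicts() {
    find "$1" -type f | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example:
#   fix_colons "$OUTPUT"
#   case_conflicts "$OUTPUT"
```

case_conflicts only reports the clashes. Deciding which of the two files to keep or rename is still up to you.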


END EDIT


Code:
#!/bin/bash
# man2pdf
# Convert the man page(s) specified to pdf files
# 
# This script is in the public domain and comes with no warranty
#
# Options:
# --all - convert all manpages in $MANDIR.  Currently only grabs
#         those in man[1-9n]. Optionally pass a section number
#         as a second argument to convert only those in man<section>
#
# Without the above option, the user can convert man pages by
# man2pdf [section] man_page [[section] man_page] [...]

MAN_PAGES="$@"             # If --all is not the first parameter then all
                           # the parameters are assumed to be man pages

SECTION=${2:-0}            # used with --all as the section number.

OUTPUT=${OUTPUT:-$(pwd)}   # Your output directory


MANDIR=${MANDIR:-/usr/man} # You may have your own copy some place else

LOGFILE=${OUTPUT}/output.log # currently not supported

if [ "$1" != "--all" ]; then
    # This may match several and will store them as a space separated
    # list of exact paths to file names.  "$@" passes each argument to
    # man separately instead of as one long page name.
    page_list=$(man -M "$MANDIR" -aW "$@")
else
    # Store all the file names found in the MANDIR and convert them.
    # NOTE! this will only produce a list of actual files and symbolic links
    # will be ignored.
    page_list=""
    if [[ $SECTION == [1-9n] ]]; then
        # Find all the files in the given section
        page_list=$(find "$MANDIR/man$SECTION" -type f)
    else
        # Recursively add all the files listed in each section
        for current_section in 1 2 3 4 5 6 7 8 9 n; do
            page_list="$page_list $(find "$MANDIR/man$current_section" -type f)"
        done
    fi
fi

#  Loop through all the files found 
for current in $page_list; do

    # These sed substitutions make the assumption that the section
    # name is found after the first dot in the name.  Man pages may be compressed
    # or uncompressed, as long as they are name.n

    # Find just the name of the man page and strip the section
    current_page=$(basename "$current" | sed -e 's@\.[1-9n].*@@')
    # Find the section number from the file name
    current_section=$(basename "$current" | sed -e 's@.*\.\([1-9n]\).*@\1@')

    # Quick check to ensure we will get a man page
    # We let it print here so we know our progress for the --all command
    # TODO: print filenames and errors to a log file when we come across them.
    man -M "$MANDIR" -aW "$current_section" "$current_page"

    # Just in case my regular expression magic fails.
    if [ $? -eq 0 ]; then
        # Create the storage directory.  There doesn't appear to be any harm in letting
        # this happen for every loop
        mkdir -p "${OUTPUT}/man${current_section}"

        # Our file name
        pdf_file="${OUTPUT}/man${current_section}/${current_page}.${current_section}.pdf"

        # Pipe the man page into ps2pdf
        # This line may complain about the format of some characters or specifics of some
        # man pages.  I didn't make a list and I didn't log them and I didn't check
        # the produced man pages in detail.
        man -M "$MANDIR" -t "${current_section}" "${current_page}" | ps2pdf - "${pdf_file}"
    fi
done
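
The TODO about logging could be approached along these lines. This is a sketch only: convert_logged is a made-up name, the log format is arbitrary, and the -M "$MANDIR" option is left out for brevity. It writes the PostScript to a temporary file first so that a failure in man can be told apart from a failure in ps2pdf.

```shell
# convert_logged SECTION PAGE PDF_FILE LOGFILE
# Returns 0 on success; on failure appends a line to LOGFILE and returns 1.
# Anything man or ps2pdf print on stderr also ends up in LOGFILE.
convert_logged() {
    section=$1 page=$2 pdf=$3 log=$4
    ps_tmp=$(mktemp) || return 1
    status=0
    if ! man -t "$section" "$page" >"$ps_tmp" 2>>"$log"; then
        echo "man failed: $page($section)" >>"$log"
        status=1
    elif ! ps2pdf "$ps_tmp" "$pdf" 2>>"$log"; then
        echo "ps2pdf failed: $page($section)" >>"$log"
        status=1
    fi
    rm -f "$ps_tmp"
    return "$status"
}

# Example, inside the loop of the script above:
#   convert_logged "$current_section" "$current_page" "$pdf_file" "$LOGFILE"
```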