12-13-2011, 08:54 PM   #1
marly
pdftk or bash script


I have several hundred PDF files with 100 or more pages each, and I want to split each of them into single-page PDFs. For example, I have a PDF file called "Book-Pages_01_through_102.pdf". I want to extract its pages and name the first page "book_001.pdf", the next page "book_002.pdf", and so on. Once all 102 pages of the first PDF are done, I want to move on to the next PDF file, "Book-Pages_102_through_267.pdf", extract its first page as "book_103.pdf", the next page as "book_104.pdf", and so on.

I found pdftk and it works great, but for each PDF that I split, it starts numbering the output files at 001. I haven't figured out a way to keep the numbering going, or to tell it to start numbering at 50 or some other number.

I was wondering if there is a script for this, or a way to tell pdftk what number to start numbering at. I can't be the only one who has run into this.

Thanks,

marly
 
12-13-2011, 10:43 PM   #2
Nominal Animal
Just renumber the files afterwards.

Here is a simple bash script that does that for you. Supply it with a base printf pattern, like book_%03d.pdf (see man printf for formatting details on the pattern), the number to start the numbering at, and the names of the files to be renamed:
Code:
#!/bin/bash

# Output usage if -h or --help or less than two parameters given
if [ $# -lt 2 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    exec >&2
    echo ""
    echo "Usage: $0 [ -h | --help ]"
    echo "       $0 pattern start file1 file2 ... fileN"
    echo ""
    echo "This will rename the files using printf pattern 'pattern',"
    echo "starting at integer 'start'."
    echo ""
    exit 0
fi

# Extract output filename pattern, and first page number,
PATTERN="$1"
if [ "$PATTERN" = "${PATTERN//%/}" ]; then
    echo "$1: Invalid pattern (no %d)." >&2
    exit 1
fi
FIRST=$(( $2 )) || exit $?

# and remove them from the command line parameter list.
shift 2

# No names converted yet.
COUNT=0

# Rename loop.
for OLD in "$@" ; do

    # Construct new filename.
    NEW="$(printf "$PATTERN" $((FIRST + COUNT)))"

    # Try rename; ask before overwrite.
    # Note: answering No will not abort the script.
    mv -vi "$OLD" "$NEW" || exit $?

    # Increase count.
    COUNT=$((COUNT + 1))
done

# Output a nice summary.
if [ $COUNT -gt 1 ]; then
    FIRSTFILE="$(printf "$PATTERN" $((FIRST)))"
    LASTFILE="$(printf "$PATTERN" $((FIRST + COUNT - 1)))"
    echo "Renamed $COUNT files (to $FIRSTFILE .. $LASTFILE)." >&2
elif [ $COUNT -eq 1 ]; then
    FIRSTFILE="$(printf "$PATTERN" $((FIRST)))"
    echo "Renamed 1 file (to $FIRSTFILE)." >&2
else
    echo "Renamed 0 files." >&2
fi

exit 0
For example, say you have files a-001.pdf, a-002.pdf and so on, and b-025.pdf, b-026.pdf and so on, and you wish to rename them to book-129.pdf, book-130.pdf and so on. If you have saved the above as renumber.bash and made it executable (chmod +x renumber.bash), try
Code:
./renumber.bash book-%03d.pdf 129 a-*.pdf b-*.pdf
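
If you would rather split and renumber in one go, the same idea can be sketched as a function that keeps one running counter across all input files. This is only a rough sketch, not something tested against a real collection: split_all is a made-up name, it assumes pdftk is installed and that the input files are listed in reading order, and it relies on pdftk's burst operation accepting a printf-style output pattern.

```shell
#!/bin/bash
# Sketch only: split several PDFs with "pdftk ... burst" while keeping a
# single running page counter. split_all is a hypothetical helper name.
#
# Usage: split_all PATTERN START file1.pdf [file2.pdf ...]
# PATTERN is a printf pattern such as book_%03d.pdf.
# Prints the next unused page number when done.
split_all() {
    local pattern=$1 next=$2 tmp pdf page
    shift 2
    for pdf in "$@"; do
        # Burst each input into its own scratch directory, so that the
        # pg_0001.pdf, pg_0002.pdf ... files of different inputs never clash.
        tmp=$(mktemp -d) || return 1
        pdftk "$pdf" burst output "$tmp/pg_%04d.pdf" || return 1
        # Zero-padded names make the glob expand in page order.
        for page in "$tmp"/pg_*.pdf; do
            mv -i "$page" "$(printf "$pattern" "$next")" || return 1
            next=$((next + 1))
        done
        rm -rf "$tmp"
    done
    echo "$next"
}
```

For example, split_all book_%03d.pdf 1 first.pdf second.pdf would number the pages of second.pdf right after those of first.pdf. Depending on the pdftk version, burst may also leave a doc_data.txt report behind in the current directory.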
 
12-14-2011, 12:57 AM   #3
jschiwal
Look at the poppler-tools package for a program (pdfseparate) that will extract single pages from a pdf document, into separate pdfs for each page. The format of the extracted files can have the page number added, but the format isn't as flexible as printf. You may need to extract each pdf in its own directory just to be safe.

The command "pdfseparate book-10-113.pdf book-%d.pdf" would produce the files "book-1.pdf, book-2.pdf ... book-104.pdf", so an offset would need to be added to the page numbers.

The command "pdfseparate book-10-113.pdf book-10_%d.pdf" would produce the files "book-10_1.pdf ... book-10_104.pdf".

If you included the first page in the name, you could extract both numbers with sed, and calculate the correct number. Plus the output filenames would be unique, and you wouldn't need to extract each file in a separate directory.
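
A minimal sketch of that calculation, assuming the input files are named like the ones in the first post (Book-Pages_01_through_102.pdf) and that the first number in the name is correct. The function names first_page and final_name and the book_%03d.pdf output pattern are just for illustration.

```shell
#!/bin/bash
# Sketch only: derive the final page number from the source file name.

# first_page NAME
# Print the first page number encoded in a name such as
# Book-Pages_01_through_102.pdf (prints nothing if the name does not match).
first_page() {
    printf '%s\n' "$1" |
        sed -n 's/^.*_\([0-9][0-9]*\)_through_[0-9][0-9]*\.pdf$/\1/p'
}

# final_name NAME PAGE
# Print the output name for page PAGE (counted from 1) of file NAME.
final_name() {
    local first
    first=$(first_page "$1")
    [ -n "$first" ] || return 1
    # The 10# prefix forces base 10, so that "08" or "09" are not
    # rejected as invalid octal numbers.
    printf 'book_%03d.pdf\n' $((10#$first + $2 - 1))
}
```

With these, final_name Book-Pages_08_through_20.pdf 3 prints book_010.pdf. The result is only as good as the file names: if the number in a name is off by one, so is every page extracted from that file.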