LinuxQuestions.org
Old 10-27-2006, 10:45 PM   #1
bobb_roof
LQ Newbie
 
Registered: Sep 2006
Posts: 17

Rep: Reputation: 0
pdf to text file converter??


Where can I get a PDF to text/doc/html converter for Linux and Windows? I mean a free one, not one of those that only convert the first 3 or 5 pages.
 
Old 10-27-2006, 11:04 PM   #2
Old_Fogie
Senior Member
 
Registered: Mar 2006
Distribution: SLACKWARE 4TW! =D
Posts: 1,515

Rep: Reputation: 62
I cannot for the life of me find the name of the program at the moment but I know I saw one via RSS feed the other day from freshmeat and I was meaning to go check it out. Check out freshmeat's site.
 
Old 10-28-2006, 12:23 AM   #3
Sourcetrunk
LQ Newbie
 
Registered: Oct 2006
Location: Brussels
Distribution: Xubuntu
Posts: 13

Rep: Reputation: 0
Try Xpdf: http://freshmeat.net/projects/xpdf/

It will do the trick.
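
For reference, the Xpdf package includes a command-line tool named pdftotext, which is probably the part that matters for this question. Here is a minimal sketch of calling it, assuming pdftotext is installed and on the PATH. The function name pdf_to_text and the file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert one PDF to a text file.
# Assumes pdftotext (shipped with Xpdf) is installed.
pdf_to_text() {
    in=$1
    out=$2
    if ! command -v pdftotext > /dev/null 2>&1; then
        echo "pdftotext not found in PATH" >&2
        return 127
    fi
    # -layout tries to keep the original physical layout of the page
    pdftotext -layout "$in" "$out"
}

# Example call (file names are illustrative):
# pdf_to_text report.pdf report.txt
```

Leaving out -layout gives text in reading order instead, which is often easier to post-process.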

regards,
Dimi
 
Old 10-28-2006, 01:27 AM   #4
shawnbishop
Member
 
Registered: Dec 2005
Location: South Africa
Distribution: CentOS,Ubuntu,Fedora
Posts: 249

Rep: Reputation: 30
In Linux there are command-line tools to do this. Try the following:

# pdf2text file.pdf file.txt

Or type pdf and use tab completion; it should find the correct binary.
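
The tab-completion tip can also be done non-interactively. This is only a sketch: list_commands is a hypothetical helper that walks the directories in PATH and prints every executable whose name starts with a given prefix. The exact converter name varies between systems, so expect names such as pdftotext rather than pdf2text.

```shell
#!/bin/sh
# Hypothetical helper: list executables in $PATH whose names start
# with a prefix, roughly what shell tab completion would offer.
list_commands() {
    prefix=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        for f in "$dir/$prefix"*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                printf '%s\n' "${f##*/}"
            fi
        done
    done | sort -u
    IFS=$old_ifs
}

list_commands pdf
```

Unlike real tab completion, this ignores shell builtins, aliases and functions, which does not matter for finding an installed converter.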
 
Old 10-29-2006, 01:16 AM   #5
Sourcetrunk
LQ Newbie
 
Registered: Oct 2006
Location: Brussels
Distribution: Xubuntu
Posts: 13

Rep: Reputation: 0
Hi Shawnbishop,

I did not find pdf2text on my Xubuntu box, nor anywhere in the packages. Could you give me the output of pdf2text --help or pdf2text -v so I can see which program you are talking about? It would be cool to have an integrated command for this kind of conversion!

thanks,
Dimi
 
Old 10-29-2006, 01:54 AM   #6
shawnbishop
Member
 
Registered: Dec 2005
Location: South Africa
Distribution: CentOS,Ubuntu,Fedora
Posts: 249

Rep: Reputation: 30
Hi Sourcetrunk

Just a note: "Note that scanned PDF files, or PDF files produced from raster image formats like TIFF, cannot contain 'live' text data. And pdf2text is not OCR (Optical Character Recognition) software - it needs the text data to be present in the PDF file."
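
One way to tell whether a given PDF has any such text data is to extract it and count what comes out. The sketch below assumes pdftotext is installed (it is not the pdf2text tool the quote refers to), and both function names are hypothetical.

```shell
#!/bin/sh
# Count the non-whitespace characters arriving on stdin.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# Hypothetical check: succeed if the PDF yields any extractable text.
# Assumes pdftotext is installed; "-" sends the text to stdout.
pdf_has_text() {
    n=$(pdftotext "$1" - 2>/dev/null | count_text_chars)
    [ "${n:-0}" -gt 0 ]
}

# Example call (file name is illustrative):
# pdf_has_text scan.pdf || echo "no text layer, OCR needed"
```

Whitespace is stripped before counting because an image-only PDF can still produce page-break characters and blank lines.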

Depending on your distro (Xubuntu in your case), just do an

# apt-cache search pdf

It should bring back some options; one of them will be pdf2txt. Then install it:

# apt-get install pdf2txt

If that doesn't work, here is a script from www.comp.eonworks.com:


#!/bin/bash
# #############################################################################

NAME_="pdf2txt"
HTML_="convert pdf to text"
PURPOSE_="convert pdf file to ascii text; write the converted file to disk"
SYNOPSIS_="$NAME_ [-vhlr] <file> [file...]"
REQUIRES_="standard GNU commands, ps2ascii"
VERSION_="1.0"
DATE_="2004-04-18; last update: 2005-03-03"
AUTHOR_="Dawid Michalczyk <dm@eonworks.com>"
URL_="www.comp.eonworks.com"
CATEGORY_="text"
PLATFORM_="Linux"
SHELL_="bash"
DISTRIBUTE_="yes"

# #############################################################################
# This program is distributed under the terms of the GNU General Public License

# print usage and options to stderr, then exit
usage () {
    echo >&2 "$NAME_ $VERSION_ - $PURPOSE_
Usage: $SYNOPSIS_
Requires: $REQUIRES_
Options:
-r, remove input file after conversion
-v, verbose
-h, usage and options (help)
-l, see this script"
    exit 1
}

# arg check
[ $# -eq 0 ] && { echo >&2 "missing argument, type $NAME_ -h for help"; exit 1; }

# var initializing
rmf=
verbose=

# option and argument handling
while getopts vhlr options; do
    case $options in
        r) rmf=on ;;
        v) verbose=on ;;
        h) usage ;;
        l) more "$0" ;;
        \?) echo >&2 "invalid or missing argument, type $NAME_ -h for help"; exit 1 ;;
    esac
done

shift $(( OPTIND - 1 ))

# check if required command is in $PATH variable
command -v ps2ascii > /dev/null 2>&1 ||
    { echo >&2 "the required \"ps2ascii\" command is not in your PATH"; exit 1; }

# main: convert each input file to <name>.txt next to it
# (arguments are quoted so file names with spaces survive)
for a in "$@"; do
    out="${a%.*}.txt"
    if [ -f "$out" ]; then
        echo "${NAME_}: skipping: $out file already exists"
        continue
    fi
    [[ $verbose ]] && echo "${NAME_}: converting: $a -> $out"
    # remove the input only if -r was given and the conversion succeeded
    if ps2ascii "$a" > "$out" && [[ $rmf ]]; then
        [[ $verbose ]] && echo "${NAME_}: removing: $a"
        rm -f -- "$a"
    fi
done


Use the -h switch to get help for the script.
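
The script names each output file with the shell expansion ${a%.*}.txt, which strips the last dot-suffix from the input name and appends .txt. Here is a small sketch of that step on its own; txt_name is a hypothetical function name used only for illustration.

```shell
#!/bin/sh
# Derive the output name the way the script does: strip the shortest
# trailing ".something" and append ".txt".
txt_name() {
    printf '%s\n' "${1%.*}.txt"
}

txt_name "report.pdf"        # report.txt
txt_name "my notes.v2.pdf"   # my notes.v2.txt
txt_name "README"            # README.txt
```

Assuming the script is saved as pdf2txt and made executable with chmod +x pdf2txt, a typical run would be ./pdf2txt -v *.pdf, which writes one .txt file next to each PDF and skips any that already exist.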
Cheers
 
Old 10-29-2006, 11:50 PM   #7
Sourcetrunk
LQ Newbie
 
Registered: Oct 2006
Location: Brussels
Distribution: Xubuntu
Posts: 13

Rep: Reputation: 0
Hi,

Thanks for this useful info, installing as we speak.

regards,
Dimi.
 
  


All times are GMT -5.