LinuxQuestions.org
Linux - Software: This forum is for software issues.
12-23-2009, 02:36 PM #1
geeeeky.girl
LQ Newbie
Registered: Jun 2008
Posts: 14
Reputation: 0
Batch images to pdf / pdf to txt


Hi all! And Season's Greetings!!

1/
I'm looking for some FREE and EASY-TO-USE (no recipes please) software to convert a batch of images into a pdf file. A plus would be to be able to adjust margins for printing. Another plus would be the possibility of reducing the size of the pdf file without sacrificing too much quality. It's for scanned documents, so any additional information on how to create a pdf version of a document via scanning would be appreciated.


2/
I'm also looking for FREE software to convert pdf files into .txt, .odt, .doc and/or .rtf files deleting all of the carriage returns present in the pdf file so as to facilitate editing and modification (I would then reconvert into a pdf document after proofreading and/or making changes).

I've had a look on the net and have found some software for a certain commercial operating system whose name I won't mention, but not a lot for Linux. Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.

Thanks in advance,

GG
 
12-23-2009, 02:43 PM #2
rweaver
Senior Member
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833
Reputation: 167
Scanning isn't related to document conversion. They are completely different processes involving different things.

The txt2pdf application is what you want for making pdfs from text files. The one that does the reverse is called pdftotext.
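For reference, pdftotext (shipped in the poppler-utils package on Debian-style systems, and also with xpdf) takes an input PDF and an output text file name. The function below is a minimal sketch that runs it over every PDF in the current directory; the name pdfs_to_txt is hypothetical, and it assumes the installed pdftotext supports -nopgbrk, which suppresses the form-feed character normally written between pages.

```shell
#!/bin/bash
# pdfs_to_txt (hypothetical name): convert every *.pdf in the current
# directory to a .txt file with the same base name, using pdftotext.
pdfs_to_txt() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found (install poppler-utils or xpdf)" >&2
        return 1
    fi
    local f
    for f in *.pdf; do
        [ -e "$f" ] || continue              # no PDFs: the pattern stays literal, skip it
        pdftotext -nopgbrk "$f" "${f%.pdf}.txt"   # report.pdf -> report.txt
    done
}
```

Run it from the folder holding the PDFs; each report.pdf gets a report.txt next to it.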

Quote:
Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.
The thought occurs... then maybe they shouldn't be doing the job, if following directions is out of their scope... O_o

Last edited by rweaver; 12-23-2009 at 02:46 PM.
 
12-23-2009, 03:17 PM #3
geeeeky.girl
LQ Newbie
Registered: Jun 2008
Posts: 14
Original Poster
Reputation: 0
Thank you rweaver for your reply.

However, the two things that interest me most are:

1/ Batch image conversion to pdf

2/ pdf to txt DELETING CARRIAGE RETURNS (except for changes of paragraph)

I don't agree with you:
Quote:
The thought occurs... then maybe they shouldn't be doing the job, if following directions is out of their scope... O_o
Information Technology is for everybody, including those who are not interested in following recipes, programming and tweaking. If there is no software out there that does this, perhaps it needs to be developed. Programmers need to put themselves in the shoes of users. So often, overzealous programmers develop really wicked software which is unusable from a typical user's point of view, which is a pity really, because their talent goes to waste. However, that's not the point of this thread, I'm getting a little carried away here...

So, to sum up, it's the answers to the two points above that I'm looking for, and the former interests me more. The second point is not as important. One could simply cut and paste to do pdf->txt or use OpenOffice to do txt->pdf. So if a tool doesn't eliminate the carriage returns, which are present at the end of every single line in a pdf file, it's of no interest to anybody really.

Many thanks nonetheless rweaver.

GG
 
12-23-2009, 03:24 PM #4
evo2
LQ Guru
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724
Reputation: 1705
Quote:
Originally Posted by geeeeky.girl
I've had a look on the net and have found some software for a certain commercial operating system whose name I won't mention, but not a lot for Linux. Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.
These sorts of people can often be replaced with a very short shell script.
I think with a little bit of effort you could write a script that would automate this process, i.e. you can write a script to "follow a recipe", and then the "person doing the job" would not actually have to do anything more than "click a button" that runs your script.
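One way to make such a script safe to hand to someone who will only "click a button" is to have it check up front that the tools it relies on are installed and say plainly which ones are missing. A minimal sketch; the function name missing_tools is hypothetical:

```shell
#!/bin/bash
# missing_tools (hypothetical name): print each named command that is not
# installed, one per line; the exit status is 0 only if all are available.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example use at the top of a conversion script:
# if ! missing=$(missing_tools convert pdftk); then
#     echo "Please install: $missing" >&2
#     exit 1
# fi
```

Checking first means the user gets one clear message instead of a half-finished run and a pile of stray files.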

Evo2.
 
12-23-2009, 04:09 PM #5
geeeeky.girl
LQ Newbie
Registered: Jun 2008
Posts: 14
Original Poster
Reputation: 0
Good idea evo2.

However, before doing so, I'd like to know if there is software out there that can do the job. That's the point of my post.

However, absolutely, if there isn't, your idea is a good alternative.

There MUST be software that does job 1/ (batch images to pdf). How do people convert books into pdf documents?

Many thanks evo2.

GG

Last edited by geeeeky.girl; 12-23-2009 at 04:25 PM.
 
12-23-2009, 05:03 PM #6
evo2
LQ Guru
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724
Reputation: 1705
Quote:
There MUST be software that does job 1/ (batch images to pdf).
There are a huge number of command line tools for image file conversion.

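This is the sort of script I was talking about for 1. Please note that it is completely untested.

Code:
#!/bin/bash
# Convert each .jpg in the current directory to a one-page PDF with
# ImageMagick's "convert", then join them into all.pdf with pdftk.
shopt -s nullglob               # no .jpg files: the loop body never runs
outfs=()                        # array of the single-page PDF names
for inf in *.jpg ; do
    outf="${inf%jpg}pdf"        # photo1.jpg -> photo1.pdf
    outfs+=("$outf")
    convert "$inf" "$outf"      # output format is chosen from the extension
done
if [ "${#outfs[@]}" -gt 0 ]; then
    # "cat" joins the inputs in order; "output" names the result file
    pdftk "${outfs[@]}" cat output all.pdf
    rm -f "${outfs[@]}"         # delete the temporary single-page PDFs
fi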
Basically it just converts each jpg file in the current directory into a pdf, and then joins all the pdfs into one multipage pdf.
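One thing worth checking with scanned pages: the shell expands *.jpg in alphabetical (collation) order, so scan10.jpg sorts before scan2.jpg and those pages would land out of order in all.pdf. Zero-padded names (scan01.jpg, scan02.jpg, ...) avoid this. Alternatively, the list can be put in natural order with sort -V, as in this sketch; the function name natural_jpgs is hypothetical, and -V assumes GNU sort.

```shell
#!/bin/bash
# natural_jpgs (hypothetical name): print the *.jpg files in the current
# directory in natural order (scan2.jpg before scan10.jpg).
# Assumes GNU sort (-V) and file names without embedded newlines.
natural_jpgs() {
    local f
    for f in *.jpg; do
        [ -e "$f" ] && printf '%s\n' "$f"    # skip the literal pattern if nothing matches
    done | sort -V
}

# Reading the sorted names into an array for the conversion loop:
# mapfile -t jpgs < <(natural_jpgs)
# for inf in "${jpgs[@]}"; do ...; done
```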

For 2. you've already been told what commands you can use. I'm not exactly sure when you care about the carriage returns, but you can use commands like dos2unix for that.
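Note that dos2unix only converts DOS-style CRLF line endings to Unix LF; it does not join the wrapped lines that pdftotext produces. If paragraphs in the extracted text are separated by blank lines (an assumption, since that depends on the PDF), awk's paragraph mode can join the lines inside each paragraph while keeping the paragraph breaks. A minimal sketch; the function name join_paragraphs is hypothetical:

```shell
#!/bin/bash
# join_paragraphs (hypothetical name): read text on stdin, strip carriage
# returns, and join the lines of each paragraph into a single line,
# leaving a blank line between paragraphs.
# Assumes paragraphs are separated by at least one blank line.
join_paragraphs() {
    # RS="" puts awk in paragraph mode: each record is a blank-line-separated block.
    tr -d '\r' | awk 'BEGIN { RS = ""; ORS = "\n\n" } { gsub(/\n/, " "); print }'
}

# Example: pdftotext -nopgbrk input.pdf - | join_paragraphs > input.txt
```

Words hyphenated at the end of a line will come out as "exam- ple", so a quick proofread is still needed.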

Evo2.
 
12-23-2009, 05:57 PM #7
geeeeky.girl
LQ Newbie
Registered: Jun 2008
Posts: 14
Original Poster
Reputation: 0
Intriguing.

I've done a bit of C, Pascal, C++, C#, Java, php, etc. but not a lot of bash, hardly any at all.

I've installed pdftk. Looks like a cool tool.

How about "convert" ?!

I assume your code is pseudo-code.

I know what "rm -f" does, it force deletes files, in this case, I imagine it's the file name stored in the outfs variable (which stands for out-file-s...? Plural? An array? No longer needed because it's all in all.pdf now? ... The other singular (outf)? Simple variable? A file name?)... What does "inf" stand for? Do you go through an entire directory and only allow files that end with .jpg into the for loop? I don't understand the "for" loop.

"cat" concatenates files, and I assume "output" is an abbreviation for ... Not sure...

"inf" is a variable name? An array? ... Not sure what the percent sign does...

I guess I'll have to do a few bash script tutorials...

Thanks for the inspiring tip!

Last edited by geeeeky.girl; 12-23-2009 at 06:13 PM.
 
12-23-2009, 06:29 PM #8
geeeeky.girl
LQ Newbie
Registered: Jun 2008
Posts: 14
Original Poster
Reputation: 0
(A little later...)

I've read the man pages for "pdftk".

I don't see anything about inserting an image in a pdf file, and I don't understand how to do the "convert" part in your script. It might be that I don't understand the pseudo-code, but I get the feeling that something essential is missing.

How do you do the "convert" part?
 
12-23-2009, 06:52 PM #9
evo2
LQ Guru
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724
Reputation: 1705
Quote:
Originally Posted by geeeeky.girl
(A little later...)

I've read the man pages for "pdftk".

I don't see anything about inserting an image in a pdf file, and I don't understand how to do the "convert" part in your script. It might be that I don't understand the pseudo-code, but I get the feeling that something essential is missing.

How do you do the "convert" part?
The "convert" command is part of the ImageMagick package. It can convert one image format to another, determining the output type from the file extension. Basically, each iteration of the loop calls "convert file1.jpg file1.pdf" and builds a list of all the created pdf files. Finally, at the end, all the single-page pdf files are joined together into a multipage file, and the single-page files are deleted. The script is more than just pseudo-code. It's untested, but I suspect it has a good chance of working without any modifications.
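To see what the loop does without ImageMagick or pdftk installed, it can be run as a dry run that only prints the commands it would execute. This sketch (the function name dry_run is hypothetical) also shows the ${inf%jpg}pdf expansion asked about earlier: % removes the shortest trailing match of the pattern "jpg" from the value of inf, and "pdf" is then appended. inf itself holds one file name per iteration; the list of names is kept in the outfs array.

```shell
#!/bin/bash
# dry_run (hypothetical name): print the commands the jpg-to-pdf loop
# would run, without running them.
dry_run() {
    local inf outf
    local -a outfs=()                # single-page PDF names, one per element
    for inf in *.jpg; do             # inf takes each matching file name in turn
        [ -e "$inf" ] || continue    # no .jpg files: the pattern stays literal
        outf="${inf%jpg}pdf"         # scan1.jpg -> scan1.pdf
        outfs+=("$outf")
        echo "convert $inf $outf"
    done
    [ "${#outfs[@]}" -gt 0 ] || return 0
    echo "pdftk ${outfs[*]} cat output all.pdf"
    echo "rm -f ${outfs[*]}"
}
```

In a folder containing a.jpg and b.jpg, this prints the two convert lines followed by the pdftk and rm lines.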

Cheers,

Evo2.

Last edited by evo2; 12-23-2009 at 06:53 PM.
 
12-24-2009, 12:31 AM #10
StephenMurphy
LQ Newbie
Registered: Dec 2009
Posts: 3
Reputation: 0
The issue I always come across is the conversion of PDF to doc.

There are some free online conversion sites such as http://www.zamzar.com, but they sometimes make me wait a long time for the converted files, and they do not support converting encrypted files. The conversion quality is also far from satisfying for PDF files with complicated layouts.

So I always use the desktop application AnyBizSoft PDF to Word Converter. In my long experience of searching and testing, this tool supports encrypted files and batch conversion, and preserves text, layouts, images and hyperlinks well.

You can try them and choose the one you prefer.
 
12-24-2009, 01:40 AM #11
evo2
LQ Guru
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and CentOS
Posts: 6,724
Reputation: 1705
Hi Stephen,

the OP explicitly stated they wanted "FREE" software. Since it was capitalized, I assume this means free as in freedom, not free as in free beer.

Cheers,

Evo2.
 
All times are GMT -5.