Batch images to pdf / pdf to txt

geeeeky.girl · 12-23-2009, 02:36 PM

Hi all! And Season's Greetings!!

1/
I'm looking for some FREE and EASY-TO-USE (no recipes please) software to convert a batch of images into a pdf file. A plus would be to be able to adjust margins for printing. Another plus would be the possibility of reducing the size of the pdf file without sacrificing too much quality. It's for scanned documents, so any additional information on how to create a pdf version of a document via scanning would be appreciated.

2/
I'm also looking for FREE software to convert pdf files into .txt, .odt, .doc and/or .rtf files deleting all of the carriage returns present in the pdf file so as to facilitate editing and modification (I would then reconvert into a pdf document after proofreading and/or making changes).

I've had a look on the net and have found some software for a certain commercial operating system whose name I won't mention, but not a lot for Linux. Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.

Thanks in advance,

GG

rweaver · 12-23-2009, 02:43 PM

Scanning isn't related to document conversion. They are completely different processes involving different things.

The txt2pdf application is what you want for making pdf's from text files. The one that does the reverse is called pdftotext.

Quote:

Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.

The thought occurs... then maybe they shouldn't be doing the job, if following directions is out of their scope... O_o

geeeeky.girl · 12-23-2009, 03:17 PM

Thank you rweaver for your reply.

However, the two things that interest me most are:

1/ Batch image conversion to pdf

2/ pdf to txt DELETING CARRIAGE RETURNS (except for changes of paragraph)

I don't agree with you:

Quote:

The thought occurs... then maybe they shouldn't be doing the job, if following directions is out of their scope... O_o

Information Technology is for everybody, including those who are not interested in following recipes, programming and tweaking. If there is no software out there that does this, perhaps it needs to be developed. Programmers need to put themselves in the shoes of users. So often, overzealous programmers develop really wicked software which is unusable from a typical user's point of view, which is a pity really, because their talent goes to waste. However, that's not the point of this thread, I'm getting a little carried away here...

So, to sum up, it's the answers to the two points above that I'm looking for, and the former interests me more. The second point is not as important. One could simply cut and paste to do pdf->txt or use OpenOffice to do txt->pdf. So if it doesn't eliminate the carriage returns, which are present at the end of every single line in a pdf file, it's of no interest to anybody really.

Many thanks nonetheless rweaver.

GG

evo2 · 12-23-2009, 03:24 PM

Quote:

Originally Posted by geeeeky.girl

I've had a look on the net and have found some software for a certain commercial operating system whose name I won't mention, but not a lot for Linux. Some require following recipes, which is of no use to me because the person doing the job is not an IT specialist.

This sort of person can often be replaced with a very short shell script.
I think with a little bit of effort you could write a script that would automate this process. I.e., you can write a script to "follow a recipe", and then the "person doing the job" would not actually have to do anything more than "click a button" that runs your script.

Evo2.

geeeeky.girl · 12-23-2009, 04:09 PM

Good idea evo2.

However, before doing so, I'd like to know if there is software out there that can do the job. That's the point of my post.

However, absolutely, if there isn't, your idea is a good alternative.

There MUST be software that does job 1/ (batch images to pdf). How do people convert books into pdf documents?

Many thanks evo2.

GG

evo2 · 12-23-2009, 05:03 PM

Quote:

There MUST be software that does job 1/ (batch images to pdf).

There are a huge number of command line tools for image file conversion.

This is the sort of script I was talking about for 1. Please note that it is completely untested.

Code:

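#!/bin/bash
# Needs ImageMagick (convert) and pdftk.
outfs=()                      # names of the one-page pdfs created so far
for inf in *.jpg ; do
    outf="${inf%jpg}pdf"      # page1.jpg -> page1.pdf
    outfs+=("$outf")
    convert "$inf" "$outf"    # quotes keep file names with spaces intact
done
pdftk "${outfs[@]}" cat output all.pdf
rm -f "${outfs[@]}"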

Basically it just converts each jpg file in the current directory into a pdf, and then joins all the pdfs into one multipage pdf.
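As far as I know, convert can also take several images at once and write one pdf page per image, which would skip the pdftk step. Here is a rough, equally untested sketch of that variant. The function name jpgs_to_pdf and its three arguments are made up for the example, and the default quality of 75 is only a guess at a reasonable size/quality trade-off for scans.

```shell
#!/bin/bash
# Sketch: build one multipage pdf from all .jpg files in a directory.
# jpgs_to_pdf and its arguments are names invented for this example.
jpgs_to_pdf() {
    local dir="$1" out="$2" quality="${3:-75}"
    local files=("$dir"/*.jpg)
    # An unmatched glob stays as the literal pattern, so test the first entry.
    if [ ! -e "${files[0]}" ]; then
        echo "no .jpg files in $dir" >&2
        return 1
    fi
    # Several inputs and a .pdf output name give one page per image.
    # -compress jpeg with a lower -quality should give a smaller file.
    convert "${files[@]}" -compress jpeg -quality "$quality" "$out"
}

# Run only when a directory and an output name are given on the command line.
if [ "$#" -ge 2 ]; then
    jpgs_to_pdf "$@"
fi
```

Saved as, say, jpgs2pdf.sh, it would be run as "bash jpgs2pdf.sh scans all.pdf 60". The pages come out in the order the shell sorts the file names, so numbering the scans page01.jpg, page02.jpg and so on keeps them in sequence.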

For 2. you've already been told what commands you can use. I'm not exactly sure why you care about the carriage returns, but you can use commands like dos2unix for that.
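Note that dos2unix only changes Windows-style line endings into Unix ones; it does not join lines. If the goal is to remove the line breaks inside a paragraph while keeping the paragraph breaks, something along these lines might do. It is a sketch only: it assumes the extracted text has a blank line between paragraphs, which will not be true for every pdf, and it does nothing about words hyphenated across lines. The function name unwrap_paragraphs is made up.

```shell
#!/bin/bash
# Sketch: join the lines inside each paragraph, keep the paragraph breaks.
# unwrap_paragraphs is a name invented for this example.
unwrap_paragraphs() {
    # RS="" puts awk in paragraph mode: each block of text separated by
    # blank lines is one record. Newlines inside a record become spaces.
    awk 'BEGIN { RS = ""; ORS = "\n\n" } { gsub(/\n/, " "); print }'
}

# Run only when a pdf file name is given on the command line.
if [ "$#" -ge 1 ]; then
    # "-" tells pdftotext to write the text to standard output.
    pdftotext "$1" - | unwrap_paragraphs > "${1%.pdf}.txt"
fi
```

The resulting .txt file can then be opened in OpenOffice, edited, and exported to pdf again.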

Evo2.

geeeeky.girl · 12-23-2009, 05:57 PM

Code:

#!/bin/bash
outfs=""
for inf in *.jpg ; do
    outf=${inf%jpg}pdf
    outfs="${outfs} ${outf}"
    convert $inf $outf
done
pdftk ${outfs} cat output all.pdf
rm -f ${outfs}

Intriguing.

I've done a bit of C, Pascal, C++, C#, Java, php, etc. but not a lot of bash, hardly any at all.

I've installed pdftk. Looks like a cool tool.

How about "convert" ?!

I assume your code is pseudo-code.

I know what "rm -f" does, it force deletes files, in this case, I imagine it's the file name stored in the outfs variable (which stands for out-file-s...? Plural? An array? No longer needed because it's all in all.pdf now? ... The other singular (outf)? Simple variable? A file name?)... What does "inf" stand for? Do you go through an entire directory and only allow files that end with .jpg into the for loop? I don't understand the "for" loop.

"cat" concatenates files, and I assume "output" is an abbreviation for ... Not sure...

"inf" is a variable name? An array? ... Not sure what the percent sign does...

I guess I'll have to do a few bash script tutorials...

Thanks for the inspiring tip!

geeeeky.girl · 12-23-2009, 06:29 PM

(A little later...)

I've read the man pages for "pdftk".

I don't see anything about inserting an image in a pdf file, and I don't understand how to do the "convert" part in your script. It might be that I don't understand the pseudo-code, but I get the feeling that something essential is missing.

How do you do the "convert" part?

evo2 · 12-23-2009, 06:52 PM

Quote:

Originally Posted by geeeeky.girl

(A little later...)

I've read the man pages for "pdftk".

I don't see anything about inserting an image in a pdf file, and I don't understand how to do the "convert" part in your script. It might be that I don't understand the pseudo-code, but I get the feeling that something essential is missing.

How do you do the "convert" part?

The "convert" command is part of the imagemagick package. It can convert one image format to another, and it determines the output type from the file extension. Basically, each iteration of the loop calls "convert file1.jpg file1.pdf" and adds the new pdf to a list of all the created pdf files. At the end, all the single-page pdf files are joined together into a multipage file, and the single-page files are deleted. The script is more than just pseudo-code. It's untested, but I suspect it has a good chance of working without any modifications.
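To make the loop and the percent sign a bit less mysterious, here is a small dry run that only prints the convert commands instead of running them. The file names and the helper name pdf_name are invented for the example; nothing here touches real files.

```shell
#!/bin/bash
# Dry run: show what each pass of the loop would do, without converting anything.
# pdf_name and the file names below are made up for this example.
pdf_name() {
    # ${1%jpg} is the first argument with a trailing "jpg" removed;
    # adding "pdf" after it gives the output file name.
    printf '%s\n' "${1%jpg}pdf"
}

# "for inf in ..." sets inf to each name in turn. In the real script the list
# is *.jpg, which the shell expands to every .jpg file in the directory.
for inf in page1.jpg page2.jpg "holiday scan.jpg" ; do
    outf="$(pdf_name "$inf")"
    echo "convert '$inf' '$outf'"
done
```

So "inf" and "outf" are plain variables holding one file name each (input file and output file), and "output" in the pdftk line is a pdftk keyword that introduces the name of the file to write.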

Cheers,

Evo2.

StephenMurphy · 12-24-2009, 12:31 AM

The issue I always run into is converting PDF to doc.

There are some free online conversion sites such as http://www.zamzar.com, but they sometimes make me wait a long time for the converted files, and they do not support converting encrypted files. The conversion quality is also far from satisfactory for PDF files with complicated layouts.

So I always use the desktop application AnyBizSoft PDF to Word Converter. In my long experience of searching and testing, this tool supports encrypted files and batch conversion, and preserves text, layouts, images and hyperlinks well.

You can try them and choose the one you prefer.

evo2 · 12-24-2009, 01:40 AM

Hi Stephen,

The OP explicitly stated they wanted "FREE" software. Since it was capitalized, I assume this means free as in freedom, not free as in free beer.

Cheers,

Evo2.