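You may not be doing anything wrong. If the text in the PDF is Unicode, it can contain characters beyond plain old ASCII, which only uses 7 bits (128 characters); in UTF-8 those extra characters are stored as multi-byte sequences that use the full 8 bits of each byte. Unicode can handle just about any language; some dweeb even mapped Klingon into it (unofficially, in the Private Use Area).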
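To see the 7-bit versus multi-byte difference for yourself, here is a small Python sketch (utf8_report is just an illustrative name, not part of any PDF tool) that encodes a few characters as UTF-8 and reports how many bytes each one takes. Plain ASCII stays at one byte with the high bit clear, while é, € and the musical G clef take two, three and four bytes.

```python
# Show how many bytes each character needs when encoded as UTF-8.
# ASCII characters (code points below 128) fit in 7 bits and stay one byte;
# anything beyond ASCII becomes a multi-byte sequence with the high bit set.


def utf8_report(text: str) -> list[tuple[str, int, bool]]:
    """Return (character, UTF-8 byte count, is_ascii) for each character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")
        rows.append((ch, len(encoded), ord(ch) < 128))
    return rows


if __name__ == "__main__":
    # "A", e-acute, euro sign, musical G clef (outside the Basic Multilingual Plane)
    for ch, nbytes, is_ascii in utf8_report("A\u00e9\u20ac\U0001D11E"):
        print(f"U+{ord(ch):04X}: {nbytes} byte(s), ascii={is_ascii}")
```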
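Try running pdffonts /path/to/pdf and have a look at which fonts and encodings it reports. The uni column tells you whether each font has a ToUnicode map, which text extractors use to turn glyphs back into Unicode characters.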
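If you have a lot of PDFs to check, a rough Python sketch like the one below could pull out the fonts whose uni column says "no". It assumes poppler's pdffonts is on your PATH and prints its usual table (header row, dashed separator, one row per font); parse_pdffonts, fonts_without_unicode and run_pdffonts are made-up helper names. It uses the dashed separator to find the column boundaries, because values such as "Type 1" contain spaces and would break a plain split().

```python
# A minimal sketch, assuming poppler's pdffonts is installed and prints a
# header row, a dashed separator line, then one row per font.
# All helper names here are hypothetical.
import subprocess
import sys


def parse_pdffonts(output: str) -> list[dict[str, str]]:
    """Split pdffonts output into one dict per font, keyed by column header."""
    lines = output.splitlines()
    # The dashed separator marks where each fixed-width column starts and ends.
    sep_index = next(i for i, line in enumerate(lines) if line.startswith("---"))
    header, separator = lines[sep_index - 1], lines[sep_index]

    spans = []
    start = None
    for i, ch in enumerate(separator + " "):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    # Let the last column ("object ID") run to the end of the line.
    spans[-1] = (spans[-1][0], None)

    names = [header[s:e].strip() for s, e in spans]
    fonts = []
    for line in lines[sep_index + 1:]:
        if line.strip():
            fonts.append({n: line[s:e].strip() for n, (s, e) in zip(names, spans)})
    return fonts


def fonts_without_unicode(fonts: list[dict[str, str]]) -> list[str]:
    """Names of fonts whose 'uni' column says there is no ToUnicode map."""
    return [f["name"] for f in fonts if f.get("uni") == "no"]


def run_pdffonts(pdf_path: str) -> str:
    """Run pdffonts on one file and return its text output."""
    result = subprocess.run(["pdffonts", pdf_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    fonts = parse_pdffonts(run_pdffonts(sys.argv[1]))
    for font in fonts:
        print(font["name"], font.get("encoding", "?"), "uni=" + font.get("uni", "?"))
    print("No ToUnicode map:", fonts_without_unicode(fonts))
```

Reading column names from the header rather than hard-coding them means it should still cope if your pdffonts version prints a slightly different set of columns.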
I never found pdftotext to be much use, and I avoid it because it doesn't seem to cope as well as some of the other PDF tools. Plain text is very limited compared with PDF, and Linux had to be dragged kicking and screaming into Unicode anyhow; I remember the battles. Unicode works well for those of us with an odd locale like mine (en_IE.utf8).
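If you want to check whether your own locale is UTF-8 like en_IE.utf8, something along these lines should do it. It's a Python sketch with made-up helper names; it follows the POSIX lookup order for character handling (LC_ALL, then LC_CTYPE, then LANG) and uses Python's codec registry so that spellings like "utf8" and "UTF-8" are both recognised.

```python
# A minimal sketch: find which locale governs character handling
# (LC_ALL, then LC_CTYPE, then LANG) and check whether its codeset is UTF-8.
# The helper names are hypothetical.
import codecs
import os
from typing import Mapping, Optional


def effective_ctype_locale(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None


def codeset_of(locale_name: str) -> Optional[str]:
    """Extract the codeset, e.g. 'en_IE.utf8' -> 'utf8', dropping any @modifier."""
    if "." not in locale_name:
        return None
    codeset = locale_name.split(".", 1)[1].split("@", 1)[0]
    return codeset or None


def is_utf8_codeset(codeset: Optional[str]) -> bool:
    """True if Python's codec registry treats the codeset as UTF-8."""
    if not codeset:
        return False
    try:
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False


if __name__ == "__main__":
    name = effective_ctype_locale(os.environ)
    print("locale:", name or "(unset, falls back to C/POSIX)")
    print("UTF-8:", is_utf8_codeset(codeset_of(name or "")))
```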