LinuxQuestions.org


kjcook 09-03-2014 07:01 PM

pdftotext returns "Illegal entry in bfchar block in ToUnicode CMap"
 
I am trying to convert several PDF files to text. I have tried the following commands, all of which return the same errors.

pdftotext 085318.pdf
pdftotext -layout 085318.pdf
pdftotext -layout -enc ASCII7 085318.pdf
pdftotext -enc ASCII7 085318.pdf


Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
...

Any ideas what I am doing wrong?

business_kid 09-05-2014 02:43 PM

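You may not be doing anything wrong. The error points at the ToUnicode CMap, the table inside the PDF that maps a font's character codes to Unicode, so the fault may well be in the file rather than in your commands. Also, if the text in the PDF is Unicode, it needs more than the 7 bits that plain old ASCII uses (UTF-8 takes one to four bytes per character). Unicode can handle any dialect - some dweeb even coded in Klingon.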

Try pdffonts /path/to/pdf and have a look at which fonts and encodings it's using.
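
If there are several files to check, the pdffonts table can be read by a small script. What follows is only a rough sketch, not a tested tool: the function names and the sample table are made up for illustration, and it assumes the usual pdffonts layout where the last five columns are emb, sub, uni and the two-part object ID. A font with "yes" in the uni column carries a ToUnicode map, so those are the fonts where a bad bfchar entry could be coming from.

```python
import subprocess

# Made-up sample in the usual pdffonts layout (not output from 085318.pdf).
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+Arial                         TrueType          WinAnsi          yes yes no       10  0
BCDEFG+TimesNewRoman                 CID TrueType      Identity-H       yes yes yes      12  0
"""


def parse_pdffonts(output):
    """Turn the pdffonts table into a list of dicts.

    The type column can contain spaces ("CID TrueType"), so the fixed
    columns are read from the right-hand side of each row instead.
    """
    fonts = []
    in_table = False
    for line in output.splitlines():
        if line.startswith("---"):
            in_table = True  # rows start after the dashed separator
            continue
        if not in_table:
            continue
        parts = line.rsplit(None, 5)  # rest, emb, sub, uni, object number, generation
        if len(parts) != 6:
            continue  # blank or unexpected line
        rest, emb, sub, uni, _obj, _gen = parts
        fonts.append({
            "name": rest.split()[0],
            "embedded": emb == "yes",
            "subset": sub == "yes",
            "has_tounicode": uni == "yes",
        })
    return fonts


def fonts_for(pdf_path):
    """Run pdffonts on one file (assumes pdffonts is installed and on PATH)."""
    result = subprocess.run(["pdffonts", pdf_path],
                            capture_output=True, text=True, check=True)
    return parse_pdffonts(result.stdout)


# Demo on the made-up sample; for a real file use fonts_for("/path/to/pdf").
for font in parse_pdffonts(SAMPLE):
    if font["has_tounicode"]:
        print(font["name"], "has a ToUnicode map")
```

This does not repair anything. It only narrows down which fonts carry a ToUnicode map, which may help when comparing a PDF that converts cleanly with one that throws the error.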

I never found pdftotext to be much use, and I avoid it, as it doesn't seem to manage as well as some of the other PDF tools. Plain text is very limited beside PDF, and Linux had to be dragged kicking and screaming into Unicode anyhow - I remember the battles. Unicode works well for those with an odd locale like myself (en_IE.utf8).


All times are GMT -5.