[SOLVED] Search and Replace text in PDF using borb failing with AssertionError
Hey Folks,
I have a bunch of PDF files containing text (and other content), and I need to replace some of that text with something else. I can do it manually in LibreOffice Draw, but I want to script and automate it. After spending a bit of time searching for options, the borb library for Python 3 seems to let me do exactly that.
However, when I try to replace the text using the example from their site the script fails with the following error:
Code:
Traceback (most recent call last):
  File "/home/suramya/Temp/BorbReplace.py", line 26, in <module>
    main()
  File "/home/suramya/Temp/BorbReplace.py", line 18, in main
    doc = SimpleFindReplace.sub("Manual", "", doc)
  File "/usr/local/lib/python3.10/dist-packages/borb/toolkit/text/simple_find_replace.py", line 80, in sub
    page.apply_redact_annotations()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/page/page.py", line 271, in apply_redact_annotations
    .read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 290, in read
    raise e
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 284, in read
    operator.invoke(self, operands, event_listeners)
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 271, in invoke
    self._write_chunk_of_text(
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 203, in _write_chunk_of_text
    )._write_text_bytes()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 145, in _write_text_bytes
    return self._write_text_bytes_in_hex()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 160, in _write_text_bytes_in_hex
    assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font Arial,Bold can not represent 'E'
Process finished with exit code 1
Searching the web didn't turn up any answers, and when I try a different file the font name changes but the error remains. Any idea how to fix it?
The script:
Code:
from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace

import typing


def main():

    # attempt to read a PDF
    doc: typing.Optional[Document] = None
    with open("/home/suramya/Downloads/t/MAA1.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # check whether we actually read a PDF
    assert doc is not None

    # find/replace
    doc = SimpleFindReplace.sub("Manual", "XXXX", doc)

    # store
    with open("/home/suramya/Downloads/t/MAABLR_out.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()
I am open to using other libraries as well (for Perl or Python), but so far none of the ones I have tried have worked.
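Since the goal is to run this over a bunch of files, and the error shows up with a different font name on different files, it may help to see which files fail and with which message before digging further. The following is only a sketch under assumptions: `replace_in_all` and `borb_replace` are hypothetical helper names, the directory layout is made up, and `borb_replace` simply repeats the three borb calls from the script above (`PDF.loads`, `SimpleFindReplace.sub`, `PDF.dumps`). It does not fix the AssertionError; it only keeps one bad file from stopping the whole run.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def replace_in_all(
    src_dir: Path,
    dst_dir: Path,
    replace_one: Callable[[Path, Path], None],
) -> Tuple[List[Path], Dict[Path, str]]:
    """Call replace_one(src, dst) for every *.pdf in src_dir.

    Returns (succeeded, failed); failed maps each input path to the
    AssertionError message raised for it.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    succeeded: List[Path] = []
    failed: Dict[Path, str] = {}
    for src in sorted(src_dir.glob("*.pdf")):
        dst = dst_dir / src.name
        try:
            replace_one(src, dst)
        except AssertionError as exc:
            # The traceback above shows the font problem surfacing as an
            # AssertionError, so record the message and move on.
            failed[src] = str(exc)
        else:
            succeeded.append(src)
    return succeeded, failed


def borb_replace(src: Path, dst: Path) -> None:
    """Hypothetical wrapper around the calls used in the script above."""
    # Imported lazily so the batch helper can be used without borb installed.
    from borb.pdf import PDF
    from borb.toolkit import SimpleFindReplace

    with open(src, "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)
    assert doc is not None
    doc = SimpleFindReplace.sub("Manual", "XXXX", doc)
    with open(dst, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)
```

Calling `replace_in_all(Path("in"), Path("out"), borb_replace)` and printing the `failed` dictionary would list each failing file next to its "Font ... can not represent ..." message, which is the kind of detail worth including in a bug report.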
It appears there may be no borb users here. As the problem appears when using their own code with their own example, you may do better to submit your question directly to the authors via their GitHub repo or website.
As for other options for modifying PDFs, there are many libraries and applications available for the purpose. I have not used any so have nothing to personally recommend, but your search engine of choice should produce a number of options.
Thanks, Joris, for taking the time to post here and for the fantastically fast response to my questions.
I was traveling until now with limited internet access, so I couldn't update the thread with the resolution. I have marked the thread as solved.