LinuxQuestions.org
Old 01-12-2023, 03:39 PM   #1
suramya
Member
 
Registered: Jan 2022
Location: Earth
Distribution: Debian
Posts: 242

Rep: Reputation: 98
Search and Replace text in PDF using borb failing with AssertionError


Hey Folks,

I have a bunch of PDF files containing text (and other content) in which I need to replace some of the text with something else. I can do this manually using LibreOffice Draw, but I want to script/automate it. After spending a bit of time searching for options, the borb library for Python 3 seems to allow me to do exactly that.

However, when I try to replace the text using the example from their site, the script fails with the following error:

Code:
Traceback (most recent call last):
  File "/home/suramya/Temp/BorbReplace.py", line 26, in <module>
    main()
  File "/home/suramya/Temp/BorbReplace.py", line 18, in main
    doc = SimpleFindReplace.sub("Manual", "", doc)
  File "/usr/local/lib/python3.10/dist-packages/borb/toolkit/text/simple_find_replace.py", line 80, in sub
    page.apply_redact_annotations()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/page/page.py", line 271, in apply_redact_annotations
    .read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 290, in read
    raise e
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 284, in read
    operator.invoke(self, operands, event_listeners)
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 271, in invoke
    self._write_chunk_of_text(
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 203, in _write_chunk_of_text
    )._write_text_bytes()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 145, in _write_text_bytes
    return self._write_text_bytes_in_hex()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 160, in _write_text_bytes_in_hex
    assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font Arial,Bold can not represent 'E'

Process finished with exit code 1
Searching the web didn't give me any answers, and when I try a different file the font name changes but the error remains. Any idea how to fix it?
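For readers hitting the same assertion: the last frame of the traceback shows a character-to-CID lookup returning None for 'E' in the page's font. The sketch below is a toy model of that kind of check, not borb's actual code. `ToyFont`, `encode_as_hex`, and `unsupported_chars` are hypothetical names, and the character map is made up for illustration.

```python
from typing import Dict, List, Optional


class ToyFont:
    """Hypothetical stand-in for a PDF font: maps characters to character IDs."""

    def __init__(self, name: str, char_to_cid: Dict[str, int]) -> None:
        self.name = name
        self._char_to_cid = char_to_cid

    def cid_for(self, char: str) -> Optional[int]:
        # Returns None when the font has no entry for this character.
        return self._char_to_cid.get(char)


def encode_as_hex(font: ToyFont, text: str) -> str:
    """Encode text as a hex string of 2-byte character IDs."""
    out = []
    for char in text:
        cid = font.cid_for(char)
        # Same shape as the failing check shown in the traceback.
        assert cid is not None, "Font %s can not represent '%s'" % (font.name, char)
        out.append("%04X" % cid)
    return "<" + "".join(out) + ">"


def unsupported_chars(font: ToyFont, text: str) -> List[str]:
    """List the characters the font cannot encode, without raising."""
    return sorted({c for c in text if font.cid_for(c) is None})
```

In this model, any character missing from the font's map triggers the assertion as soon as text using that font has to be written back, which would be consistent with the font name changing from file to file while the error stays the same.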

The script:

Code:
from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace

import typing

def main():

    # attempt to read a PDF
    doc: typing.Optional[Document] = None
    with open("/home/suramya/Downloads/t/MAA1.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # check whether we actually read a PDF
    assert doc is not None

    # find/replace
    doc = SimpleFindReplace.sub("Manual", "XXXX", doc)

    # store
    with open("/home/suramya/Downloads/t/MAABLR_out.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()
I am open to using other libraries as well (for Perl or Python), but so far none of the ones I have tried have worked.
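Since the goal is to process a bunch of files, a batch wrapper that records which PDFs fail (rather than stopping at the first AssertionError) may help narrow down the problem. This is only a sketch: `borb_replace` reuses the same borb calls as the script above, while `replace_in_many`, the `_out.pdf` naming, and the injectable `replace_fn` parameter are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional, Union


def borb_replace(src: Path, dst: Path, pattern: str, replacement: str) -> None:
    """Find/replace in one PDF using the same borb calls as the script above."""
    # Imported lazily so the rest of the module loads without borb installed.
    from borb.pdf import PDF
    from borb.toolkit import SimpleFindReplace

    with open(src, "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)
    if doc is None:
        raise ValueError("could not read %s" % src)
    doc = SimpleFindReplace.sub(pattern, replacement, doc)
    with open(dst, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


def replace_in_many(
    paths: Iterable[Union[str, Path]],
    out_dir: Union[str, Path],
    pattern: str,
    replacement: str,
    replace_fn: Callable[[Path, Path, str, str], None] = borb_replace,
) -> Dict[Path, Optional[str]]:
    """Run replace_fn on each file; map each input to None or an error message."""
    results: Dict[Path, Optional[str]] = {}
    for raw in paths:
        src = Path(raw)
        dst = Path(out_dir) / (src.stem + "_out.pdf")
        try:
            replace_fn(src, dst, pattern, replacement)
            results[src] = None
        except (AssertionError, ValueError) as exc:
            # Keep going; remember why this file failed.
            results[src] = str(exc)
    return results
```

Collecting the error strings per file makes it easy to see whether every failure names a font, which is useful detail to include when reporting the problem upstream.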

- Suramya
 
Old 01-14-2023, 11:01 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,005
Blog Entries: 23

Rep: Reputation: 3967
It appears there may be no borb users here. As the problem appears when using their own code with their own example, you may do better to submit your question directly to the authors via their GitHub repo or website.

As for other options for modifying PDFs, there are many libraries and applications available for the purpose. I have not used any so have nothing to personally recommend, but your search engine of choice should produce a number of options.
 
Old 01-27-2023, 07:43 PM   #3
JorisSchellekens
LQ Newbie
 
Registered: Jan 2023
Location: Ghent, Belgium
Posts: 3

Rep: Reputation: 1
Hi there,

I'm Joris Schellekens, author of **borb**.

Just wanted to end this thread on a positive note. The original poster did reach out to me on GitHub and opened an issue.

After some back and forth, the issue was ultimately resolved.

You can find said issue in the issues section of the borb GitHub repository.
 
1 member found this post helpful.
Old 01-27-2023, 11:59 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,683

Rep: Reputation: 2008
Quote:
Originally Posted by JorisSchellekens View Post
You can find said issue in the issues section of the borb GitHub repository.
Specifically this one I guess: https://github.com/jorisschellekens/borb/issues/149
 
Old 01-28-2023, 03:10 AM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,005
Blog Entries: 23

Rep: Reputation: 3967
Quote:
Originally Posted by JorisSchellekens View Post
Hi there,

I'm Joris Schellekens, author of **borb**.

Just wanted to end this thread on a positive note...
Welcome and thanks for taking the time to provide closure!

I hope you may stick around and find a comfortable place here at LQ!
 
Old 01-31-2023, 12:07 PM   #6
suramya
Member
 
Registered: Jan 2022
Location: Earth
Distribution: Debian
Posts: 242

Original Poster
Rep: Reputation: 98
Thanks, Joris, for taking the time to post here and for the fantastically fast response to my questions.
I was traveling until now with limited internet access, so I couldn't update the thread with the resolution. I have marked the thread as solved.
 
  


All times are GMT -5.
