LinuxQuestions.org


newbiesforever 02-07-2019 07:24 AM

Abiword converts PDF to Word easily if imperfectly; surprised Libreoffice won't
 
Of course I know LibreOffice Writer can convert its documents to PDF; you simply select Export to PDF. But I was hoping to do the opposite: I downloaded a PDF of a doctor's new-patient form before my appointment, and wanted to convert it to a Word document and edit it in LibreOffice. I researched this and found that AbiWord does the conversion easily. It's not perfect (the fonts and other formatting generally don't carry over), but I can use it, and the provider and staff can read it and enter it into the computer. I'll settle for that, because I don't like writing detailed answers in the limited space of pre-made forms, such as the medical-history forms I often face.

Great, I solved my issue; but could LibreOffice do it? If AbiWord can, I guessed the superior LibreOffice probably could too. To my surprise, I found a seemingly "official" statement that no, it can't: https://ask.libreoffice.org/en/quest...o-a-word-file/ . That post is going on three years old, though. I imagine the LibreOffice designers simply don't want to incorporate whatever AbiWord did, because the conversion doesn't meet their high standards: they would want their conversion to look exactly like the PDF, and AbiWord's conversion is crude.

wpeckham 02-07-2019 07:41 AM

LO can convert a document CREATED in LO between its document format and PDF. I have no problem converting a PDF with AbiWord, then handling it in LO forever after.

sevendogsbsd 02-07-2019 07:45 AM

Interesting. For a couple of years I have tried to replace LibreOffice with AbiWord and Gnumeric, but every time I try AbiWord it is horrible: the UI is black, flickers, and is unusable. This is on both Linux and FreeBSD.

Good to know it works for someone!

Turbocapitalist 02-07-2019 07:45 AM

It also depends on what is in the PDF. PDF is a terminal-stage format: your document goes there while waiting to go either to the printer or to the bit bucket. Trying to recover data from a PDF is a fool's errand.

tl;dr: Go get the original that was used to create the PDF and work with that.

newbiesforever 02-07-2019 07:52 AM

Quote:

Originally Posted by wpeckham (Post 5958981)
LO can convert a document CREATED in LO between its document format and PDF. I have no problem converting a PDF with AbiWord, then handling it in LO forever after.

I don't particularly like AbiWord either, and this is the first useful purpose I've had for it.

TB0ne 02-07-2019 08:52 AM

Quote:

Originally Posted by newbiesforever (Post 5958985)
I don't particularly like AbiWord either, and this is the first useful purpose I've had for it.

Converting a PDF back into 'text' is *NEVER* going to work 100%, unless you have a basic, single-column text document. Any formatting (dual columns, etc.) is going to throw off whatever you convert.

Personally, if you can't get hold of the source the PDF was created from, I'd use the pdftotext utility from the command line, and make peace with the fact that you're not going to get good results. When I've had to do such things and the PDFs contained images, I'd extract the images from the PDFs first, and then get the text. Copy/paste the text into LibreOffice Writer, shove in the images, and go from there. There just isn't a good way to do this with PDFs.
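The images-first, then-text workflow above can be scripted. What follows is a minimal sketch, assuming the poppler-utils versions of `pdfimages` and `pdftotext` are installed; the helper names `build_commands` and `extract` are hypothetical. `-all` asks `pdfimages` to keep each image in its native format, and `-layout` asks `pdftotext` to approximate the physical page layout, which helps with columns but still won't be perfect.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path: str, out_dir: str) -> list[list[str]]:
    """Return the two poppler-utils command lines (hypothetical helper)."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    stem = pdf.stem
    return [
        # Step 1: pull the embedded images out first; -all keeps each one in
        # its native format, written as numbered files starting with <stem>.
        ["pdfimages", "-all", str(pdf), str(out / stem)],
        # Step 2: dump the text; -layout tries to keep the page layout (columns),
        # which helps but will not reproduce the original formatting.
        ["pdftotext", "-layout", str(pdf), str(out / f"{stem}.txt")],
    ]


def extract(pdf_path: str, out_dir: str) -> None:
    """Run both steps, failing early if a tool is missing."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found; install poppler-utils")
        subprocess.run(cmd, check=True)
```

After `extract("form.pdf", "out")`, you would still open the `.txt` file, paste its contents into LibreOffice Writer, and insert the extracted images by hand, as described above.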

wpeckham 02-07-2019 07:22 PM

Quote:

Originally Posted by TB0ne (Post 5958992)
Converting a PDF back into 'text' is *NEVER* going to work 100%, unless you have a basic, single-column text document. Any formatting (dual columns, etc.) is going to throw off whatever you convert.

Personally, if you can't get hold of the source the PDF was created from, I'd use the pdftotext utility from the command line, and make peace with the fact that you're not going to get good results. When I've had to do such things and the PDFs contained images, I'd extract the images from the PDFs first, and then get the text. Copy/paste the text into LibreOffice Writer, shove in the images, and go from there. There just isn't a good way to do this with PDFs.

Good advice, but I take one exception: if you are talking about a PDF made by LO, LO leaves adequate clues in the metadata to do a (near-)perfect conversion back to LO Writer. If the PDF was created by anything else, it will lack that kind of metadata. Something may be able to read and convert it, but the result may not look the way you think it should. It is always best to have the source.
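A quick way to guess which case you are in is to look at the document-info fields that poppler's `pdfinfo` prints. This is a rough heuristic sketch, assuming `pdfinfo` is installed and that a LibreOffice export mentions LibreOffice in its Producer or Creator field; the helper names are hypothetical. A match does not prove the file carries the extra data wpeckham describes, and those fields can be missing or rewritten by other tools.

```python
import shutil
import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dict (hypothetical helper)."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values like times stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_like_libreoffice_pdf(info: dict[str, str]) -> bool:
    """Heuristic: does the Producer or Creator field mention LibreOffice?"""
    fields = (info.get("Producer", "") + " " + info.get("Creator", "")).lower()
    return "libreoffice" in fields


def check_pdf(pdf_path: str) -> bool:
    """Run pdfinfo on a file and apply the heuristic above."""
    if shutil.which("pdfinfo") is None:
        raise FileNotFoundError("pdfinfo not found; install poppler-utils")
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return looks_like_libreoffice_pdf(parse_pdfinfo(result.stdout))
```

If `check_pdf` returns False, the file most likely came from something else, and the pdftotext route is the realistic option.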

TB0ne 02-08-2019 07:10 AM

Quote:

Originally Posted by wpeckham (Post 5959201)
Good advice, but I take one exception: if you are talking about a PDF made by LO, LO leaves adequate clues in the metadata to do a (near-)perfect conversion back to LO Writer. If the PDF was created by anything else, it will lack that kind of metadata. Something may be able to read and convert it, but the result may not look the way you think it should. It is always best to have the source.

Quite correct, and great observation. The PDFs I had to work with did NOT have that metadata, so I had to improvise.


All times are GMT -5.