Is it possible to print an HTML document to PDF while preserving all links and anchors?
OS: Fedora Core 17 (x86_64)
I have a very large HTML file that I want to print/convert to a PDF file. With the file open in the browser (Firefox), I go to File > Print and choose the PDF option to print the content to a PDF file. This works well and is fast (less than 40 seconds to produce a large PDF file of about 36 MB).
The only problem is that the document contains many hypertext links (or rather anchors, since it is one big HTML document), which make navigating the document considerably easier. None of these links are preserved once the PDF file has been created.
After a lot of Googling I found a tool called wkhtmltopdf, which apparently does what I expect and preserves the links. I installed it successfully, but when I ran it from the command line to create the PDF file, it stayed on the 4th step (resolving links) for more than 2 hours. I am not sure whether it would ever finish, and even if it does, such a delay does not seem reasonable for future use, even for big documents (I will have many big HTML documents to export to PDF in the near future).
So I would like to ask your opinion: do you know a practical way under Linux to print a single HTML document to a PDF file while preserving all the links and anchors?
Thanks in advance,
You might try something like pandoc, though I don't know for certain whether it preserves links. Alternatively, you could just open the HTML file in LibreOffice/OpenOffice and export it to a PDF.
I forgot about LibreOffice. You could try something like...
I did some testing and found the conversion to be a little buggy: it cuts off the first word of my simple HTML document (test.html).
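A one-shot headless conversion of the kind discussed above might look like the following sketch. It is not the exact command from this post: the helper names (build_convert_command, convert) are made up for illustration, and it assumes a LibreOffice build whose soffice binary accepts --headless, --convert-to and --outdir (option spellings have varied between releases, so check soffice --help on your system).

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(html_path, out_dir, soffice="soffice"):
    """Return the argv list for a one-shot headless HTML -> PDF conversion.

    Hypothetical helper; the flags assume a reasonably recent LibreOffice.
    """
    return [
        soffice,
        "--headless",            # run without opening a GUI window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert(html_path, out_dir="."):
    """Run the conversion and return the path where the PDF is expected."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(build_convert_command(html_path, out_dir), check=True)
    return Path(out_dir) / (Path(html_path).stem + ".pdf")
```

Calling convert("test.html") should leave test.pdf in the current directory if the conversion succeeds. Whether links survive this route is worth checking on a small file first.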
Headless file conversion using a LibreOffice API as a service
Start LibreOffice as a foreground service.
One thing that is neat about that little experiment is that the conversion is a little faster than my original example because LibreOffice remains open as a service. The above example should work with OpenOffice using soffice.bin/soffice.
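The service approach can be sketched roughly as below. This is an illustration under assumptions, not the script from this post: it assumes soffice accepts an --accept socket string, that unoconv is installed and connects to that listener via its -s/-p options, and that port 2002 is free. The function names and the fixed startup wait are hypothetical.

```python
import subprocess
import time

HOST = "127.0.0.1"  # assumed listener address
PORT = 2002         # assumed free port


def build_listener_command(host=HOST, port=PORT, soffice="soffice"):
    """Argv for starting LibreOffice headless, listening on a UNO socket."""
    accept = f"socket,host={host},port={port};urp;"
    return [soffice, "--headless", "--nologo", "--norestore", f"--accept={accept}"]


def build_unoconv_command(html_path, host=HOST, port=PORT):
    """Argv for asking unoconv to convert one file through the listener."""
    return ["unoconv", "-s", host, "-p", str(port), "-f", "pdf", str(html_path)]


def convert_many(html_paths, startup_wait=5.0):
    """Start one listener, convert every file through it, then stop it."""
    listener = subprocess.Popen(build_listener_command())
    try:
        time.sleep(startup_wait)  # crude wait for the listener to come up
        for path in html_paths:
            subprocess.run(build_unoconv_command(path), check=True)
    finally:
        listener.terminate()
        listener.wait()
```

Keeping one listener alive across several files is what saves the startup cost on each conversion; for a single document the gain is small.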
I made a blog post about this if you want to see some extra info about this method.
Thanks a lot for the help.
I exported a test HTML file to PDF with LibreOffice and it did work (the links were preserved). However, for very big files (almost 10000 pages) it halts after three hours without having done anything.
The same happens with the command-line method you provided. In addition, running unoconv gives me a Segmentation Fault error after running the script (probably a dependency problem or a corrupted LibreOffice installation). I also did a test with a small HTML file and the links were not preserved (although they were when I exported the file directly from the LibreOffice GUI).
Anyway, thank you very much both of you for your help and your time.