I forgot about LibreOffice. You could try something like...
Code:
libreoffice --headless --convert-to pdf *.html
Usually that command is used on ODF documents (*.odt and friends), but it's worth a try.
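If you have a lot of files, it might help to convert them one at a time and check that a PDF actually came out. This is only a rough sketch: the function name convert_html_files and the output directory argument are made up for illustration, and it assumes the PDF is named after the input file.

```shell
#!/usr/bin/env bash
# Sketch: convert each HTML file separately and report files that yield no PDF.
# convert_html_files is a hypothetical helper, not part of LibreOffice.
# Usage: convert_html_files OUTDIR file1.html [file2.html ...]
convert_html_files() {
    local outdir=$1
    shift
    local failed=0 f pdf
    mkdir -p "$outdir"
    for f in "$@"; do
        # --outdir tells LibreOffice where to write the converted file.
        libreoffice --headless --convert-to pdf --outdir "$outdir" "$f"
        # Assumption: the output keeps the input's base name with a .pdf suffix.
        pdf="$outdir/$(basename "${f%.*}").pdf"
        if [ ! -s "$pdf" ]; then
            echo "no PDF produced for $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `convert_html_files ./pdf *.html` would put the results in ./pdf and return non-zero if any file came out missing or empty.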
**EDIT**
I did some testing and found the conversion to be a little buggy: it cuts off the first word of my simple HTML document (test.html).
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>This is a test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>This is my title</h1>
<p>This is some text in the page</p>
<p><a href="http://www.gleske.net/">Visit Gleske Homepage</a></p>
<ul>
<li><a href="http://www.tldp.org/">Linux Documentation Project</a></li>
<li>This is some text in a bullet.</li>
<li><a href="http://www.gimp.org/">GIMP, An image manipulation program!</a></li>
</ul>
</body>
</html>
However, I was able to convert the document through the LibreOffice API without any problems at all. Basically, you start LibreOffice as a service and it stays open in a headless environment. Then you use the unoconv client to connect to that service and run the conversion.
Headless file conversion using a LibreOffice API as a service
Start LibreOffice as a foreground service.
Code:
soffice --nologo --headless --nofirststartwizard --accept="socket,host=127.0.0.1,port=2220,tcpNoDelay=1;urp"
Then use unoconv to connect to that service and use the API to convert the HTML file.
Code:
unoconv --connection "socket,host=127.0.0.1,port=2220,tcpNoDelay=1;urp;StarOffice.ComponentContext" -f pdf *.html
Links were preserved in my experiments.
One neat thing about this experiment is that the conversion is a little faster than my original example, because LibreOffice stays open as a service. The above example should also work with OpenOffice using soffice.bin/soffice.
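If you'd rather script the two steps, something along these lines might do it. Treat it as a sketch under assumptions: the helper names (accept_string, wait_for_port, convert_with_service) are hypothetical, the host and port are just the ones from my commands above, and the port check relies on bash's /dev/tcp, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Sketch: make sure a LibreOffice listener is up, then convert through it.
# HOST/PORT mirror the commands above; all function names are hypothetical.
HOST=127.0.0.1
PORT=2220

# Build the accept string shared by soffice and (with a suffix) unoconv.
accept_string() {
    printf 'socket,host=%s,port=%s,tcpNoDelay=1;urp' "$1" "$2"
}

# Poll until host:port accepts a TCP connection; give up after N one-second tries.
wait_for_port() {
    local host=$1 port=$2 tries=${3:-30} i=0
    while [ "$i" -lt "$tries" ]; do
        # The subshell opens and immediately closes the connection (bash /dev/tcp).
        if (: < "/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage: convert_with_service file1.html [file2.html ...]
convert_with_service() {
    local accept
    accept=$(accept_string "$HOST" "$PORT")

    # Start the listener only if nothing answers on the port yet.
    if ! wait_for_port "$HOST" "$PORT" 1; then
        soffice --nologo --headless --nofirststartwizard --accept="$accept" &
        if ! wait_for_port "$HOST" "$PORT" 30; then
            echo "LibreOffice did not start listening on $HOST:$PORT" >&2
            return 1
        fi
    fi

    # The service is left running so later conversions can reuse it.
    unoconv --connection "$accept;StarOffice.ComponentContext" -f pdf "$@"
}
```

Calling `convert_with_service *.html` a second time should skip the startup, which is where the speedup comes from.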
**EDIT2**
I made a blog post about this if you want to see some extra info about this method.
SAM