Linux - Software: This forum is for Software issues.
Using wget (with the '-i' option) you can download several pages from the Internet,
which may be the various parts of a lengthy document (a manual or tutorial).
Is there software for Linux that can merge the HTML pages
into a single HTML file (or, better, into a PDF file)?
If the code is nicely formed (i.e. the closing tags </html> and </body> each sit on a separate line), this will just strip them out and merge everything into a single HTML file, which can then be converted to whatever you want.
The result of the merge won't be pretty, and it can break the markup, but that's what you get when automating this kind of stuff anyway.
The above code would still need to be modified to strip the opening tags from each subsequent file as well, but hopefully you get my point and can modify it yourself. That's the way to learn anyway. (:
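The script from that post did not survive in this copy of the thread, so here is a rough sketch of the same idea in Python rather than shell: keep only what sits between <body> and </body> in each page and wrap the pieces in a single new document. The function names (extract_body, merge_html, merge_files) and the <hr> separator are my own choices for illustration, not anything from the original post, and a regex like this will only cope with reasonably well-formed pages.

```python
import re
from pathlib import Path

# Matches everything between the opening <body ...> tag and the closing </body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the content inside <body>...</body>, or the whole text if no body is found."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def merge_html(pages, title="Merged document"):
    """Join the body content of several HTML strings into one HTML document."""
    body = "\n<hr>\n".join(extract_body(page) for page in pages)
    return (
        "<html>\n"
        f"<head><title>{title}</title></head>\n"
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


def merge_files(paths, out_path):
    """Read the given files in order and write the merged document to out_path."""
    pages = [Path(p).read_text(errors="replace") for p in paths]
    Path(out_path).write_text(merge_html(pages))
```

Something like merge_files(["part1.html", "part2.html"], "manual.html") would then produce one file with a single <html>/<body> pair (the file names here are placeholders). Per-page <head> content such as stylesheets is dropped, so the result may look plainer than the originals.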
Thank you for your reply. I followed your suggestion (but in a more drastic way):
I simply merged the HTML files as they were, without worrying about having, for example, 100 <html> tags.
For my purposes, that is, to print the manual or tutorial, it worked perfectly.
The only problem is getting the correct list of HTML files as defined by the table of contents, and merging them in that order.
But with a little script...
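One possible shape for that "little script", sketched in Python: read the table-of-contents page, collect the links in the order they appear, and use that as the merge order. This assumes the contents page links to each part with ordinary relative <a href="..."> links; the names LinkCollector and toc_order are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def toc_order(toc_html):
    """Return the local pages linked from the TOC, in order, without duplicates."""
    parser = LinkCollector()
    parser.feed(toc_html)
    ordered = []
    seen = set()
    for href in parser.links:
        page, _fragment = urldefrag(href)  # "ch2.html#s1" -> "ch2.html"
        # Skip pure "#anchor" links, external URLs, and pages already listed.
        if page and "://" not in page and page not in seen:
            seen.add(page)
            ordered.append(page)
    return ordered
```

The returned list can be fed straight to whatever does the merging (even a plain concatenation, as described above). If the contents page also links to things that are not part of the document, such as a home page, those would need to be filtered out by hand.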
FYI, any variable created inside a shell is invisible to the parent shell and is destroyed when that shell exits, so there's no need to unset them all at the end.
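A small way to check this for yourself, sketched in Python so it can be run as-is: start a shell, set a variable inside a subshell, and look for it afterwards. The variable name lq_demo_var and the helper run_shell are invented for the demonstration, and it assumes a POSIX sh is on the PATH.

```python
import os
import subprocess

# The parenthesised part runs in a subshell; the second echo runs in its parent.
SCRIPT = """
( lq_demo_var=inner; echo "inside: $lq_demo_var" )
echo "outside: ${lq_demo_var:-unset}"
"""


def run_shell(script):
    """Run a script with sh and return its standard output as a list of lines."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_shell(SCRIPT):
        print(line)
    # The Python process that launched sh never sees the variable either.
    print("in caller:", os.environ.get("lq_demo_var", "unset"))
```

The variable exists only inside the parentheses; both the outer shell and the calling process report it as unset, which is why a cleanup block of unset commands at the end of a script is unnecessary.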
I would convert the HTML pages to PDF first and then merge the PDFs. For the conversion, this one could help: HTML to PDF converter for .NET (URL removed). That tool can also merge PDF files. Merging the HTML files directly would require modifying the HTML structure.
This componentpro PDF one (URL removed) can also help.
Before posting, it is a good idea to look at the dates on the posts so you don't end up a 'necro poster'. This thread originated over 13 years ago and the last post was almost 7 years ago. I doubt the original poster is still looking, and I'm sure things have changed in the intervening years and more is possible; the world moves on.