Old 01-31-2005, 01:20 PM   #1
Registered: Sep 2004
Posts: 63

Rep: Reputation: 15
Merge Of Html Files Into A Single Html (or Pdf)

Using wget (the '-i' option) you can download several pages from the Internet,
which may represent the various parts of a lengthy document (a manual or tutorial).

Is there software for Linux that merges the HTML pages
into a single HTML file (or, better, into a PDF file)?
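
For reference, the download step described above might look like the sketch below. The URLs, the list file name `urls.txt`, and the target directory are placeholders chosen for illustration, not details from the post. `-i` makes wget read URLs from a file, and `-P` sets the directory to save into.

```shell
# Build a list of page URLs, one per line (placeholder URLs).
cat > urls.txt <<'EOF'
https://example.com/manual/page1.html
https://example.com/manual/page2.html
https://example.com/manual/page3.html
EOF

# Wrap the download in a function so nothing is fetched until it is called.
fetch_pages() {
    # -i FILE: read URLs from FILE; -P DIR: save the files under DIR.
    wget -i urls.txt -P downloads/html/site
}
```

Calling `fetch_pages` then downloads every page in the list into `downloads/html/site`.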
Old 01-31-2005, 01:43 PM   #2
Registered: Jan 2005
Location: Finland
Distribution: Ubuntu, Debian, Gentoo, Slackware
Posts: 827

Rep: Reputation: 31
Well, none that I've heard of, but IMHO such a simple task doesn't need a specific program.
I'll demonstrate via an example:
# -i already ignores case, so one pattern per closing tag is enough.
grep -iv -e "</body>" -e "</html>" downloads/html/site/page1.html > downloads/html/site/all.html
grep -iv -e "</body>" -e "</html>" downloads/html/site/page2.html >> downloads/html/site/all.html
grep -iv -e "</body>" -e "</html>" downloads/html/site/page3.html >> downloads/html/site/all.html
# ...repeat as long as necessary...
echo "</body></html>" >> downloads/html/site/all.html
If the code is nicely formed (i.e. the ending tags </html> and </body> each sit on a separate line), this will just strip them out and merge everything into a single HTML file, which can then be converted to whatever you want. Note that grep -v drops the whole line, so anything else on the same line as a closing tag is lost too.

The result of the site merge won't be pretty, and it can break the code, but that's what you get when automating this kind of stuff anyway.

The above code would still need to be modified to take out the start tags of the later files as well, but hopefully you get my point and can modify it yourself. That's the way to learn anyway. (:
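
One possible way to cover the start tags as well is sketched below. The function name `merge_html` and its argument order are made up for this illustration. It keeps the same grep approach, so it still assumes each of these tags sits on a line of its own, and the `<head>` sections of the later pages are left in.

```shell
# merge_html OUTPUT FIRST [REST...]   (hypothetical helper)
# Keeps the opening tags of FIRST only and writes one closing pair at the end.
merge_html() {
    out=$1
    first=$2
    shift 2
    # First page: drop only the lines holding the closing tags.
    grep -iv -e '</body>' -e '</html>' "$first" > "$out"
    for page in "$@"; do
        # Later pages: also drop the lines holding the opening tags.
        grep -iv -e '<html' -e '<body' -e '</body>' -e '</html>' "$page" >> "$out"
    done
    echo '</body></html>' >> "$out"
}
```

It would be called as `merge_html all.html page1.html page2.html page3.html`, with the pages given in reading order.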

Last edited by Artanicus; 01-31-2005 at 01:56 PM.
Old 02-10-2005, 06:46 PM   #3
Registered: Sep 2004
Posts: 63

Original Poster
Rep: Reputation: 15
Thank you for your reply. I have followed what you suggested (but in a more drastic way):
I simply merged the HTML files as they were, without worrying about ending up with, for example, 100 <html> tags.
For my purposes, that is, to print the manual or tutorial, it worked perfectly.

The only problem is getting the correct list of HTML files as defined by the Table of Contents, and merging them in that order.
But with a little script...
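
Such a script could be sketched as follows. This is only an illustration: the function name `toc_order` is invented, and it assumes the table-of-contents page uses lowercase, double-quoted `href="..."` links that point at `.html` files in the same directory.

```shell
# toc_order TOC_FILE   (hypothetical helper)
# Prints the .html files linked from TOC_FILE, in order of first appearance.
toc_order() {
    grep -o 'href="[^"#]*\.html' "$1" |   # each link to an .html page
        sed 's/^href="//' |               # keep only the file name
        awk '!seen[$0]++'                 # drop repeats, keep the first occurrence
}
```

With a table of contents named `index.html` (an assumed name), the pages can then be concatenated as they are, in that order: `toc_order index.html | while IFS= read -r f; do cat "$f"; done > all.html`.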
Old 05-30-2007, 01:37 PM   #4
Registered: Jul 2003
Location: Iowa
Distribution: Debian
Posts: 32

Rep: Reputation: 15
Merge HTML files to PDF

See: HTMLDOC.

It is open source.
(A paid version adds a GUI.)
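
A minimal sketch of how HTMLDOC might be used for this from the command line is shown below. The wrapper name `pages_to_pdf` is invented for illustration, and the sketch assumes the `htmldoc` command is installed. `--webpage` treats the inputs as plain web pages, `-t pdf` selects PDF output, and `-f` names the output file.

```shell
# pages_to_pdf OUTPUT.pdf PAGE.html...   (hypothetical wrapper)
# Converts the given pages, in the given order, into one PDF with HTMLDOC.
pages_to_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pages_to_pdf OUTPUT.pdf PAGE.html..." >&2
        return 2
    fi
    if ! command -v htmldoc >/dev/null 2>&1; then
        echo "htmldoc is not installed" >&2
        return 127
    fi
    out=$1
    shift
    # The remaining arguments are the input pages, in reading order.
    htmldoc --webpage -t pdf -f "$out" "$@"
}
```

For example: `pages_to_pdf manual.pdf page1.html page2.html page3.html`.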
Old 01-03-2011, 06:50 AM   #5
LQ Newbie
Registered: Jan 2011
Posts: 2

Rep: Reputation: 0
You can use this HTML to PDF converter to convert multiple HTML documents into the same PDF. There is also a sample for this on the website.
Old 06-20-2011, 08:02 AM   #6
LQ Newbie
Registered: Jun 2011
Posts: 0

Rep: Reputation: Disabled
Post Script!

I know this is an old post, but I just wrote a bash script to get this done. Here it is.
You will have to chmod +x it to run it.
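#!/bin/bash
# Merge all HTML pages in a directory into one file, beginning with a chosen page.
echo "Enter directory path of the pages:"
read -r html_path
echo "Enter complete filename of the starting page:"
read -r start_page
# List the pages (parsing ls output is fragile; see the PS below).
ls "$html_path" > "list.txt"
# Start page: keep its opening tags, drop only the lines with the closing tags.
grep -iv -e "</body>" -e "</html>" "$html_path/$start_page" > "$html_path/all_merged.html"
for i in $(< list.txt)
do
	# Skip the start page (already written) and the output file itself.
	if [ "$i" = "$start_page" ] || [ "$i" = "all_merged.html" ]; then continue; fi
	grep -iv -e "<body>" -e "<html>" -e "</body>" -e "</html>" "$html_path/$i" >> "$html_path/all_merged.html"
done
echo "</body></html>" >> "$html_path/all_merged.html"
echo "Merged file ---> $html_path/all_merged.html"
unset html_path
unset start_page
unset i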

PS: I know one shouldn't parse ls output, but for simplicity's sake I chose to ignore that.
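
For anyone who would rather not parse ls output, a glob is one alternative. The sketch below is an illustration only, and the function name `list_pages` is invented. It lets the shell expand `*.html` itself, so file names containing spaces stay intact.

```shell
# list_pages DIR   (hypothetical helper)
# Prints the name of every *.html file in DIR, one per line, without calling ls.
list_pages() {
    for f in "$1"/*.html; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # Strip the directory part and print just the file name.
        printf '%s\n' "${f##*/}"
    done
}
```

The output can be read line by line with `list_pages "$html_path" | while IFS= read -r i; do ...; done`, or the same `for f in "$html_path"/*.html` loop can be used directly in place of the `list.txt` loop.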
Old 06-20-2011, 07:28 PM   #7
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,375

Rep: Reputation: 2755
FYI, any variable created inside a script that runs in its own shell is invisible to the parent shell and is destroyed when that shell exits, so there's no need to unset them all at the end (unless the script is sourced into the current shell).
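
A quick way to see this is sketched below. The variable name `demo_var` is arbitrary, and the example assumes it is not already set in your environment.

```shell
# A child shell sets a variable; it exists only inside that child process.
sh -c 'demo_var="set in child"; echo "child sees: $demo_var"'

# Back in the parent shell, the variable was never defined.
echo "parent sees: ${demo_var:-<unset>}"
```

The second line prints `<unset>` as the value, because the assignment never reached the parent shell.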
Old 05-10-2018, 12:53 AM   #8
LQ Newbie
Registered: May 2018
Posts: 2

Rep: Reputation: 0
Merging HTML files to PDF can be done

I would convert the HTML files to PDF first and then merge them together. To do the conversion, this one could help: (URL removed) HTML to PDF converter for .NET. That tool can also merge PDF files. Merging the HTML files directly would require modifying the HTML structure.

This one can also help: (URL removed) componentpro pdf.

Last edited by jefro; 05-10-2018 at 02:50 PM.
Old 05-10-2018, 06:58 AM   #9
LQ Guru
Registered: Apr 2008
Distribution: Slackware, Ubuntu, PCLinux,
Posts: 10,607

Rep: Reputation: 2504

Before posting, it is a good idea to look at the dates on the posts so you don't end up a 'necro poster'. This thread originated over 13 years ago and the last post was almost 7 years ago. I doubt the original poster is still looking, and I'm sure things have changed in the intervening years and more is possible. The world moves on.
Old 05-10-2018, 02:51 PM   #10
Registered: Mar 2008
Posts: 22,026

Rep: Reputation: 3632
I removed the links. If you feel this is in error, contact me.
Old 05-11-2018, 11:28 AM   #11
LQ Veteran
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Downloading an Entire Web Site with wget

