Is there a way to download an html/php book with wget or something?
As an example, I am wanting to try to download this CentOS book. I found an old thread here that uses wget as follows:
Code:
wget -r -np http://www.tldp.org/LDP/abs/html/index.html
But it doesn't work for sites like the one linked above (CentOS book), I think because it is php? I'm not sure. Does anyone have a solution for this? |
There are a dozen ways to do that.
Code:
lynx -dump -listonly http://www.techotopia.com/index.php/CentOS_6_Essentials > getfile.txt
That gives you a file of links like:
Code:
http://www.techotopia.com/index.php/Performing_a_CentOS_6_Network_Installation
Then feed the list to wget:
Code:
wget -i getfile.txt
Or you could get a formatted .txt of a page with
Code:
lynx -dump http://www.techotopia.com/index.php/Performing_a_CentOS_6_Network_Installation > chapter3.txt
Or write a script to automate it. |
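For reference, here is a minimal offline sketch of what the listonly-plus-filter step produces; the dump contents below are made up to imitate lynx's output, since the live page may differ:

```shell
# lynx -dump -listonly prints a numbered "References" list; this sample
# imitates that output (the URLs are illustrative, not fetched live).
cat > dump.txt <<'EOF'
References

   1. http://www.techotopia.com/index.php/CentOS_6_Essentials
   2. http://www.techotopia.com/index.php/Performing_a_CentOS_6_Network_Installation
   3. http://www.techotopia.com/skins/common/images/poweredby.png
EOF

# Keep only article links, strip the "   1. " numbering, de-duplicate;
# the result is a clean list ready for `wget -i getfile.txt`.
grep 'index\.php/' dump.txt | sed 's/^ *[0-9]*\. //' | sort -u > getfile.txt
cat getfile.txt
```

This filters out non-article links (images, skins) in one pass, which is the same cleanup the tutorial below does stepwise with head, tail, and sed.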
Brilliant!
I know I could go page by page and just "save as pdf" but that is wayyy too slow. Is lynx dump a method that will pull each subsequent or linked to page? If not then what is the best automated method to do that? If i was going to make a getfile, is there also an autmated method for grabbing the url of each link on a page like the one linked in OP? Each chapter and section is it's own link, which can be quite a lot of copy/pasting. |
A basic tutorial for you. Do each and look at the results to get an understanding.
Code:
lynx -dump -listonly http://www.techotopia.com/index.php/CentOS_6_Essentials > dump.txt
Code:
cat dump.txt | head -n 280 > dump2.txt
Code:
cat dump2.txt | tail -n 263 > dump3.txt
Code:
cat dump3.txt | sed 's .\{6\} ' > dump4.txt
Code:
cat dump4.txt | sort -u > dump5.txt
Code:
wget -U Mozilla/5.0 -i dump5.txt
If you want to convert them on the fly, write a script using wget, curl, lynx, html2pdf, html2text etc. to do what you wish. There are also apps like htmldoc. Look at:
man cat
man wget
man head
man tail
man sort
man lynx
man sed
After understanding the above,
Code:
a=1; for i in $(cat dump5.txt); do lynx -dump $i > CentOS$a.txt; let a++; done
Code:
#! /usr/bin/env bash
Edit: Fixed wget syntax and added example script. |
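The counter loop above can be tried offline first; a sketch using placeholder lines in dump5.txt instead of live URLs (a real run would call lynx -dump on each entry):

```shell
# Stand-in for dump5.txt; in the real pipeline these would be chapter URLs.
cat > dump5.txt <<'EOF'
pageA
pageB
pageC
EOF

# Same shape as the loop in the post: number the output files CentOS1.txt,
# CentOS2.txt, ... For a real run, replace printf with: lynx -dump "$i"
a=1
for i in $(cat dump5.txt); do
    printf 'dump of %s\n' "$i" > "CentOS$a.txt"
    a=$((a + 1))    # portable equivalent of bash's `let a++`
done
ls CentOS?.txt
```

The word-splitting of `$(cat dump5.txt)` is safe here because URLs contain no spaces; for arbitrary lists, a `while read -r` loop is more robust.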
between dump 2 and 3, there is no difference in the files. Is that supposed to happen? Am I missing something?
I know the commands have different context, and I get that the "head" or "tail" is supposed to be pulling info from list entry 280 and 263 respectively, but the file "dump2.txt" and "dump3.txt" are exactly the same...? |
dump2.txt will have the first 280 lines.
dump3.txt will have the last 263 lines of that. (If your dump2.txt already has 263 lines or fewer, tail leaves it unchanged and the two files will be identical; adjust the numbers to fit your file.) Those were examples of how to edit unwanted lines out of a .txt file that you can then use with wget -i. |
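To see the trimming concretely, a quick check with seq (300 is an arbitrary stand-in line count; the point is that tail drops the first 17 of the 280 kept lines):

```shell
seq 300 > dump.txt                 # stand-in for a 300-line lynx dump
head -n 280 dump.txt > dump2.txt   # keeps lines 1-280
tail -n 263 dump2.txt > dump3.txt  # keeps lines 18-280
wc -l < dump2.txt                  # 280
wc -l < dump3.txt                  # 263
head -n 1 dump3.txt                # 18 -- the first 17 lines were trimmed
```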
So I went through the rest and, wow... very quick and streamlined, very nice! I will need to read into using these tools more to understand better what's going on. Thanks! |
lol, all that effort to save $10 :)
|