I want to download all the comics directly linked to from this page:
... but NOT any parent links, links to other authors, etc. Only those in the middle, and all their (scanned) pages.
The reason is, of course, that the page might go down some day, in which case... well, it would be inaccessible.
The optimal scenario would be that I get that index page and can browse the comics locally just as I do online, with wget rewriting the links so they work offline (the -k / --convert-links option, IIRC).
The PROBLEM I'm having is that even if I try to download one comic at a time, wget finds the link back to the index (upper left corner when viewing a comic page) and starts downloading the rest of the site. Since I don't want the rest, that's a giant waste of bandwidth for the site owner (it doesn't matter to ME as I don't have a GB/month limit, but I'm trying to be as nice as possible here).
A solution for either downloading them all via wget, or downloading one at a time (e.g. http://disneycomics.free.fr/Ducks/Ro...?loc=D2002-033 - I'll grab the URLs using regexes) would be very welcome.
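To make the "grab the URLs using regexes" part concrete, here is a minimal sketch of what I have in mind, written in Python rather than perl. It assumes the comic links are the only links on the index page that carry a loc=... query parameter, as in the example URL above; I haven't verified that. The function name, the sample markup, the show.php file name and the example.invalid host are all made up for illustration.

```python
import re
from urllib.parse import urljoin, urlparse, parse_qs

# Crude href extractor; good enough for simple hand-written HTML.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def comic_links(index_html, base_url):
    """Return absolute URLs of links that carry a loc=... parameter.

    Assumption: only the comic links have a "loc" query parameter, so
    parent links and links to other authors are filtered out.
    Order on the page is preserved and duplicates are dropped.
    """
    found = []
    for href in HREF_RE.findall(index_html):
        # Undo the HTML escaping of "&" before resolving the link.
        url = urljoin(base_url, href.replace("&amp;", "&"))
        if "loc" in parse_qs(urlparse(url).query) and url not in found:
            found.append(url)
    return found


# Hypothetical markup standing in for the real index page.
SAMPLE = """
<a href="../index.html">Back to index</a>
<a href="show.php?loc=D2002-033">A comic</a>
<a href="show.php?loc=D2003-017">Another comic</a>
<a href="/authors/someone.html">Other author</a>
"""

links = comic_links(SAMPLE, "http://example.invalid/Ducks/page.html")
for link in links:
    print(link)
```

The resulting list could then be fed to a downloader one URL at a time, without recursion, so the link back to the index never gets followed.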
Of course, if I have to download them "manually", that might cause problems with directory naming instead. Still, that too should be extractable with an ugly perl-regex hack.
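For the directory naming, a rough sketch of the idea (again in Python instead of perl) would be to take the loc value from each comic URL and use it as the directory name. This assumes the loc value identifies a comic uniquely, which is only my guess from the example URL; dir_name and the example.invalid URL are made up for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs


def dir_name(url):
    """Derive a filesystem-safe directory name from the URL's loc value.

    Assumption: the "loc" query parameter is unique per comic.
    """
    values = parse_qs(urlparse(url).query).get("loc")
    if not values:
        raise ValueError("no loc parameter in " + url)
    # Replace anything that is not a letter, digit, dot, underscore or
    # hyphen, so the value cannot introduce path separators.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", values[0])
    if name in ("", ".", ".."):
        raise ValueError("unusable loc value in " + url)
    return name


print(dir_name("http://example.invalid/Ducks/show.php?loc=D2002-033"))
```

Each comic's scanned pages would then go into their own directory, whichever way they end up being downloaded.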