links1 contains links to a bunch of directories and to links2. I am already using -m (which turns on -r -l inf, plus timestamping) to descend infinitely into the directories linked from links1, but I don't want to follow any of the links contained in links2. So I want to say "Download links1 and follow all of its links, but do not download or follow links from links2".
Hope that makes it clearer. This seems like something that should be well within wget's core functionality, so I'm still baffled as to why it isn't easier.
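For reference, the command I'm running is essentially this, with example.com standing in for my actual server:

    # mirror everything reachable from links1 (recursive, infinite depth)
    wget -m http://example.com/wgettest/links1.html

What I can't find is a way to bolt "but never download links2.html or anything it links to" onto that.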
The first one gets me 19 files, and the other gets me 7. So please try those two commands and tell us how it goes.
Yep, that makes perfect sense, because your first example keeps following links to the default recursion level of 5, whereas your second one only follows each link 1 level down. My problem isn't the level of recursion, it's which pages' links are being followed.
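To illustrate the difference (these are just illustrative commands with a placeholder URL, not the exact ones from your post):

    # follows links down to wget's default recursion depth of 5
    wget -r http://example.com/wgettest/links1.html

    # follows links only 1 level down from the starting page
    wget -r -l 1 http://example.com/wgettest/links1.html

Tightening -l only changes how far wget walks from the start page; it doesn't let me choose which pages' links get walked at all.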
To use that page as an example, what I'm trying to do is say "download pages to 5 levels of recursion, following all the links from http://babelfish.yahoo.com/translate_txt, except for the privacy page or anything linked from it". The links there look a little funny, so I'm not sure how well that example will actually work on that page, but that's the idea.
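In case it helps anyone searching later: newer wget builds (1.14 and up, I believe) have a --reject-regex option that is matched against the complete URL before downloading, so something along these lines might do it (the 'privacy' pattern is a guess, not tested against that page):

    # recurse 5 levels, but never fetch (and so never parse) any URL matching 'privacy'
    wget -r -l 5 --reject-regex 'privacy' http://babelfish.yahoo.com/translate_txt

Unlike -R, a page excluded by --reject-regex is never downloaded at all, so its links are never extracted.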
links1 contains links to A/index and to links2. links2 contains a link to B/index. The index files for each directory just contain links to the other files in the directory, i.e., wgettest/A/index.html just links to a.html, and the same for B. Have a look at the site if that helps make things clearer.
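Spelled out, the layout looks like this (B mirrors A, so I'm assuming its file is b.html):

    links1.html   ->  A/index.html, links2.html
    links2.html   ->  B/index.html
    A/index.html  ->  A/a.html
    B/index.html  ->  B/b.html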
I want a single wget command that downloads only the files links1.html, A/index.html, and A/a.html. That is, I want to recursively download everything from links1, except I don't want to download or follow anything from links2. In reality, links1 and links2 link to many more directories, so I can't include or exclude them all by name.
Should be simple, right? Can anybody do it? Feel free to try it on the demo -- since it's only 40K, I hope my server can handle it!...
I added -np for "no-parent" for those annoying links that go back upwards in the tree and cause you to download the whole site.
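For anyone following along, that's just (placeholder URL as before):

    # --no-parent (-np): never ascend above the starting directory while recursing
    wget -r -np http://example.com/wgettest/links1.html

It keeps the "parent directory" links on auto-generated index pages from pulling in everything above wgettest/.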
Are you trying to download from the root directory on the site? (i.e. example.com/*) That, I imagine, would make it harder to specifically exclude links2, especially if both the root directory AND a subdirectory have objects that point to the links2 page. Although -X still might work there, too.
As for -X's syntax, it's not clear to me whether it needs to be relative ("links2.html") or absolute ("http://example.com/links2.html"). Just try them both ways and see which one works.
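For what it's worth, the man page describes -X (--exclude-directories) as taking a comma-separated list of directory paths, matched against the directory part of the URL starting from the host root, e.g. (placeholder host again):

    # exclude everything under the /wgettest/B directory from the recursion
    wget -m -X /wgettest/B http://example.com/wgettest/links1.html

Elements of the list can contain wildcards, but they're matched as directories, so I'm not sure a filename like links2.html would ever match.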
...and they both download all six files. (Feel free to try it out on my server.)
This is the crux of the problem, I think: as I understand it, -X does the correct thing, but will only do it for directories. (That is, if links2 were a directory rather than an HTML file, something like "wget -X links2/" would work as desired.)
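To make that concrete, excluding the B directory really does prune the way I want (assuming my demo lives under /wgettest/ on the placeholder host):

    # fetches links1.html, links2.html, A/index.html and A/a.html;
    # B/index.html is never downloaded, so b.html is never seen
    wget -m -np -X /wgettest/B http://example.com/wgettest/links1.html

So the pruning machinery exists; it's just keyed to directories instead of pages.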
-R is a similar option for files, but wget -R links2.html will still follow all of links2's links.
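In other words, something like this (same placeholder host) downloads everything except the links2.html file itself:

    # links2.html is fetched, scanned for links, then deleted --
    # so B/index.html and b.html still get pulled in
    wget -m -np -R links2.html http://example.com/wgettest/links1.html

If I'm reading the manual right, downloading and then deleting HTML files that match -R is deliberate: wget needs the file's contents to continue the recursion.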
The fact that wget is prepared to do exactly what I want for directories makes me think that there must be some way to do it for files. This also seems too fundamental to be a bug in such a stable program -- surely it's more likely that I'm overlooking something? But who knows, maybe I should report it as a bug and see what happens?...