LinuxQuestions.org
Linux - Software
09-13-2016, 09:09 PM   #1
X-LFS-2010
Member
 
Registered: Apr 2016
Posts: 510

wget exclude directory does not work -X pattern(s)


(Searching was taking too long, so I wrote this up for the record, for a good cause.)

Many people want to use wget to download a site while excluding directory trees by name pattern (a glob), but following wget's man page doesn't seem to work.

Many people, including me, have had trouble figuring out how to use -X (--exclude-directories), and found the answer hard to remember when years pass between uses.

This is a hard-to-remember trick, so write it down:

# does not work for directories
$ wget -X fo*o ...

# works for dirs

$ wget -X */fo*o,*/*/fo*o,*/*/*/fo*o ...

(The forum swallowed my asterisks, so here it is again with each asterisk backslash-escaped:)

$ wget -X \*/fo\*o/,\*/\*/fo\*o/,\*/\*/\*/fo\*o/ ...

ANSWER:

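Site hack: patch wget-?/src/utils.c so that -X patterns are compared against the basename of the directory instead of its full current path, then on the command line use a plain filename pattern (just 'fo*o', nothing else). Note that the fnmatch flags argument is changed from FNM_PATHNAME to 0 (with FNM_PATHNAME set, a '*' cannot match a '/', so the pattern has to spell out every leading directory level). The following patch is against wget-1.12. There may be a simpler way by defining ?FNM_FLAGS, but this works "fine".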

<code>
--- utils.c.old 2016-09-13 07:49:11.000000000 -0400
+++ utils.c 2016-09-13 09:32:58.000000000 -0400
@@ -907,6 +907,9 @@
return *d1 == '\0' && (*d2 == '\0' || *d2 == '/');
}

+/* for basename */
+#include <libgen.h>
+
/* Iterate through DIRLIST (which must be NULL-terminated), and return the
first element that matches DIR, through wildcards or front comparison (as
appropriate). */
@@ -921,18 +924,24 @@
{
/* Remove leading '/' */
char *p = *x + (**x == '/');
+ /* SITE HACK - only if patterned ignore leading dirs cmp as file */
+ char sh_str[1024*16], *pp;
+ snprintf(sh_str, sizeof sh_str, "%s", dir); /* copy first: POSIX basename() may modify its argument */
+ pp = basename(sh_str);
+#if 0
+ printf("? %s == %s ?\n",p,pp);
+#endif
if (has_wildcards_p (p))
{
- if (matcher (p, dir, FNM_PATHNAME) == 0)
+ if (matcher (p, pp, 0) == 0)
break;
}
else
{
- if (subdir_p (p, dir))
+ if (subdir_p (p, pp))
break;
}
}
-
return *x ? true : false;
}
</code>

Here is my example of use (it relies on the patch above, since the patterns carry no */ prefixes), however odd the result may look. Prepending every level of */*/*/ to each pattern would obviously be tedious.

$ wget \
--no-remove-listing -L -r -nc -np -nH -l 10 -p --limit-rate=127k \
-X '*-alpha*,*-arm*,*-arm64*,*-hppa*,*-ia64*,*-m68k*,*-mips*,*-sparc*,*-amd64*,*-armel*,*-armhf*,*-mipsel*,*-powerpc*,*-ppc64el*,*-s390x*,*-s390*,*-kfreebsd*' \
-R '*_alpha*,*_arm*,*_arm64*,*_hppa*,*_ia64*,*_m68k*,*_mips*,*_sparc*,*_amd64*,*_armel*,*_armhf*,*_mipsel*,*_powerpc*,*_ppc64el*,*_s390*,*_kfreebsd*,*-alpha*,*-arm*,*-arm64*,*-hppa*,*-ia64*,*-m68k*,*-mips*,*-sparc*,*-amd64*,*-armel*,*-armhf*,*-mipsel*,*-powerpc*,*-ppc64el*,*-s390*,*-kfreebsd*' \
http://archive.debian.org/debian/

CONCLUSION: enjoy!
 
09-25-2016, 09:35 PM   #2
sag47
Senior Member
 
Registered: Sep 2009
Location: Raleigh, NC
Distribution: Kubuntu x64, Raspbian, CentOS
Posts: 1,861
Blog Entries: 36

The initial mistake you made was not quoting your exclude arguments properly. Bash treats the asterisk as a special character for globbing, and wrapping the argument in single quotes guarantees that bash will not try to interpret it as a glob expression. The same issue can occur with other CLI utilities like find.

Last edited by sag47; 09-25-2016 at 09:36 PM.
 
  





Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] du --exclude='pattern' only opposite? ctav01 Linux - Newbie 5 02-21-2013 01:18 PM
rsync exclude pattern genese Linux - Software 3 12-03-2012 04:26 AM
[SOLVED] how to exclude everything except a pattern with sed rafaeldeoliveiracosta Programming 12 07-23-2010 11:36 AM
Exclude file pattern from unzip command XeroXer Linux - Newbie 3 01-10-2009 07:10 AM
wget & --exclude-directories Tinkster Slackware 4 07-20-2003 10:02 PM

All times are GMT -5.