LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   wget with regular expressions. (https://www.linuxquestions.org/questions/linux-newbie-8/wget-with-regular-expressions-846368/)

xeon123 11-24-2010 05:11 PM

wget with regular expressions.
 
Hi,

I would like to download some HTML files using wget.
The files I would like to download are:
page1.html
page2.html
page3.html
page4.html
page5.html

I was expecting to download these files using the command:
wget http://localhost/page[1-5].html

However, this doesn't work.

Does anyone know a way of using regular expressions with wget in this case?

Thanks,

neonsignal 11-24-2010 05:26 PM

In general you cannot use wildcards with wget over HTTP, because HTTP servers do not provide a way of getting a list of files.

Wildcards are supported for FTP (though you would need to quote your URL, otherwise the shell will attempt to expand the wildcard characters before wget sees them).
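The quoting point can be demonstrated locally with echo standing in for wget (a sketch; the file names are made up for illustration):

```shell
# Work in a fresh directory so only our demo files match the glob
cd "$(mktemp -d)"
touch pageA.html pageB.html

# Unquoted: the shell expands the glob before the command ever runs
echo page*.html      # pageA.html pageB.html

# Quoted: the literal pattern reaches the command, which is what
# wget needs for FTP wildcard matching
echo 'page*.html'    # page*.html
```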

There are some specific arguments to wget that support wildcards (such as the accept and reject lists), but this only helps if you are doing a recursive wget (e.g., if there is a parent or index page with links to all the pages that interest you), for example:
Code:

wget -r -A 'page*.html' www.kidsolr.com
(though it will have to recurse through all the files in order to find the ones named 'page*.html', which can waste bandwidth)

markush 11-24-2010 05:31 PM

Hi pedrosacosta,

this
Code:

for i in 1 2 3 4 5; do wget http://localhost/page$i.html; done
will work for you.
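For longer ranges, the same loop can be written with seq instead of listing each number (a sketch; echo stands in for wget so the generated URLs are visible without downloading anything):

```shell
# seq generates the numbers 1 through 5;
# replace echo with wget to actually fetch the pages
for i in $(seq 1 5); do
  echo "http://localhost/page$i.html"
done
```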

Markus

Kenhelm 11-24-2010 05:52 PM

Try bash brace expansion or use curl instead of wget.
curl has an inbuilt ability to do this sort of thing.
Code:

wget http://localhost/page{1..5}.html      # bash brace expansion

curl -o 'page#1.html' 'http://localhost/page[1-5].html'
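Since brace expansion happens in the shell before wget ever runs, echo shows exactly which URLs wget would receive (a quick check, no download needed; note that brace expansion is a bash feature, not plain sh):

```shell
# bash expands {1..5} into five separate arguments before the
# command runs; wget would see five distinct URLs
echo http://localhost/page{1..5}.html
```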


