LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   wget with regular expressions. (https://www.linuxquestions.org/questions/linux-newbie-8/wget-with-regular-expressions-846368/)

xeon123 11-24-2010 05:11 PM

wget with regular expressions.
 
Hi,

I would like to download some HTML files using wget.
The files I would like to download are:
page1.html
page2.html
page3.html
page4.html
page5.html

I was expecting to download these files using the command:
wget http://localhost/page[1-5].html

However, this doesn't work.

Does anyone know a way of using regular expressions with wget in this case?

Thanks,

neonsignal 11-24-2010 05:26 PM

In general you cannot use wildcards with wget over HTTP, because HTTP servers do not provide a way of getting a list of files.

Wildcards are supported for FTP (though you would need to quote your URL, otherwise the shell will attempt to expand the wildcard characters before wget sees them).
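The quoting point can be demonstrated locally with echo standing in for wget (a sketch; the file names are made up for illustration):

```shell
# Work in a fresh directory so only our demo files match the glob
cd "$(mktemp -d)"
touch pageA.html pageB.html

# Unquoted: the shell expands the glob before the command ever runs
echo page*.html      # pageA.html pageB.html

# Quoted: the literal pattern reaches the command, which is what
# wget needs for FTP wildcard matching
echo 'page*.html'    # page*.html
```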

There are some specific arguments to wget that support wildcards (such as the accept and reject lists), but this only helps if you are doing a recursive wget (e.g., if there is a parent or index page with links to all the pages that interest you), for example:
Code:

wget -r -A 'page*.html' www.kidsolr.com
(though it will have to recurse through all the files in order to find the ones named 'page*.html', which can waste bandwidth)

markush 11-24-2010 05:31 PM

Hi pedrosacosta,

this
Code:

for i in 1 2 3 4 5; do wget http://localhost/page$i.html; done
will work for you.
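For longer ranges, the same loop can be written with seq instead of listing each number (a sketch; echo stands in for wget so the generated URLs are visible without downloading anything):

```shell
# seq generates the numbers 1 through 5;
# replace echo with wget to actually fetch the pages
for i in $(seq 1 5); do
  echo "http://localhost/page$i.html"
done
```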

Markus

Kenhelm 11-24-2010 05:52 PM

Try bash brace expansion or use curl instead of wget.
curl has an inbuilt ability to do this sort of thing.
Code:

wget http://localhost/page{1..5}.html      # bash brace expansion

curl -o 'page#1.html' 'http://localhost/page[1-5].html'
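Since brace expansion happens in the shell before wget ever runs, echo shows exactly which URLs wget would receive (a quick check, no download needed; note that brace expansion is a bash feature, not plain sh):

```shell
# bash expands {1..5} into five separate arguments before the
# command runs; wget would see five distinct URLs
echo http://localhost/page{1..5}.html
```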


