Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!


Old 11-24-2010, 05:11 PM   #1
Registered: Sep 2006
Posts: 374

Rep: Reputation: 16
wget with regular expressions.


I would like to download some HTML files using wget.
The files I would like to download are page1.html through page5.html.

I was expecting to download these files using the command:
wget http://localhost/page[1-5].html

but this command doesn't work.

Does anyone know a way of using regular expressions with wget for this case?

Old 11-24-2010, 05:26 PM   #2
Senior Member
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 53

Rep: Reputation: 360
In general you cannot use wildcards with wget, because HTTP servers do not provide a way of listing their files.

Wildcards are supported for FTP (though you would need to quote your URL, otherwise the shell will attempt to expand the wildcard characters before wget sees them).

There are some specific arguments to wget that support wildcards (such as the accept and reject lists), but these would only help you if you were doing a recursive wget (eg, if there was a parent page or index page with links to all the pages that interest you), for example:
wget -r -A 'page*.html' http://localhost/
(though it will have to recurse through all the linked files in order to find the ones matching 'page*.html', which can waste bandwidth)
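To see why the quoting matters, here is a small offline sketch: an unquoted pattern is expanded by the shell against local files before the command runs. echo stands in for wget so nothing is downloaded, and the /tmp/globdemo directory is purely for illustration:

```shell
# An unquoted wildcard is expanded by the shell against *local* files
# before the command runs; quoting passes the pattern through untouched.
mkdir -p /tmp/globdemo && cd /tmp/globdemo
touch page1.html page2.html
echo wget 'page*.html'   # prints: wget page*.html
echo wget page*.html     # prints: wget page1.html page2.html
```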

Last edited by neonsignal; 11-24-2010 at 05:36 PM.
Old 11-24-2010, 05:31 PM   #3
Senior Member
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,979

Rep: Reputation: Disabled
Hi pedrosacosta,

for i in 1 2 3 4 5; do wget http://localhost/page$i.html; done
will work for you.
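For longer ranges, a seq-based variant of the same loop stays readable; echo is substituted for wget here so the sketch runs without a web server:

```shell
# seq generates the numbers; remove the echo to perform the real downloads.
for i in $(seq 1 5); do
  echo wget "http://localhost/page$i.html"
done
```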

Old 11-24-2010, 05:52 PM   #4
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 359

Rep: Reputation: 170
Try bash brace expansion or use curl instead of wget.
curl has an inbuilt ability to do this sort of thing.
wget http://localhost/page{1..5}.html      # bash brace expansion

curl -o 'page#1.html' 'http://localhost/page[1-5].html'
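Note the difference between the two: the brace form is expanded by bash itself before the command runs, so wget simply receives five separate URLs (echo makes this visible without downloading anything; this assumes bash, not a plain POSIX sh):

```shell
# bash expands {1..5} before the command runs, producing five URLs
# on one line; wget never sees the braces.
echo wget http://localhost/page{1..5}.html
```

curl's [1-5] range, by contrast, is expanded by curl itself, which is why that URL must be quoted to keep the shell from touching the brackets.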

Last edited by Kenhelm; 11-24-2010 at 06:24 PM. Reason: Added "-o 'page#1.html' " to create output file names
1 member found this post helpful.

