LinuxQuestions.org
Linux - Newbie: this forum is for members who are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!

Old 11-24-2010, 06:11 PM   #1
xeon123
Member
 
Registered: Sep 2006
Posts: 374

Rep: Reputation: 16
wget with regular expressions.


Hi,

I would like to download some HTML files using wget.
The files I would like to download are:
page1.html
page2.html
page3.html
page4.html
page5.html

I was expecting to download these files using the command:
wget http://localhost/page[1-5].html

but this doesn't work.

Does anyone know a way to use regular expressions with wget for this case?

Thanks,
 
Old 11-24-2010, 06:26 PM   #2
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Jessie (Fluxbox WM)
Posts: 1,387
Blog Entries: 52

Rep: Reputation: 355
In general you cannot use wildcards with wget over HTTP, because HTTP servers do not provide a way to get a list of files.

Wildcards are supported for FTP (though you would need to quote your URL; otherwise the shell will attempt to expand the wildcard characters before wget sees them).

There are some wget arguments that support wildcards (such as the accept and reject lists), but these only help if you are doing a recursive wget (e.g., if there is a parent or index page with links to all the pages that interest you), for example:
Code:
wget -r -A 'page*.html' www.kidsolr.com
(though it will have to recurse through all the files to find the ones named 'page*.html', which can waste bandwidth)
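To see the quoting point concretely, here is a small sketch (using a throwaway directory and echo rather than wget, so no server is needed) of what the shell does with an unquoted wildcard:

```shell
#!/bin/sh
# Create a throwaway directory with two matching files.
cd "$(mktemp -d)"
touch page1.html page2.html

# Unquoted: the shell expands the glob against local files
# before the command ever sees it.
echo page*.html      # → page1.html page2.html

# Quoted: the pattern is passed through literally,
# which is what wget needs for FTP wildcards.
echo 'page*.html'    # → page*.html
```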

Last edited by neonsignal; 11-24-2010 at 06:36 PM.
 
Old 11-24-2010, 06:31 PM   #3
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,979

Rep: Reputation: 850
Hi pedrosacosta,

this
Code:
for i in 1 2 3 4 5; do wget http://localhost/page$i.html; done
will work for you.
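For larger ranges, seq can generate the numbers; in this sketch echo stands in for wget so the generated URLs are visible:

```shell
#!/bin/sh
# Same loop with seq generating the range; echo stands in for wget
# so you can see the URLs that would be fetched.
for i in $(seq 1 5); do
  echo "http://localhost/page$i.html"
done
# prints page1 through page5 URLs, one per line
```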

Markus
 
Old 11-24-2010, 06:52 PM   #4
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Rep: Reputation: 141
Try bash brace expansion, or use curl instead of wget.
curl has a built-in ability to do this sort of thing.
Code:
wget http://localhost/page{1..5}.html      # bash brace expansion

curl -o 'page#1.html' 'http://localhost/page[1-5].html'
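Note that the brace form is expanded by bash itself before wget ever runs; you can confirm what wget actually receives with echo (this needs bash, as plain sh has no brace expansion):

```shell
#!/bin/bash
# Brace expansion happens in the shell before the command runs;
# wget simply receives five separate URL arguments.
echo http://localhost/page{1..5}.html
```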

Last edited by Kenhelm; 11-24-2010 at 07:24 PM. Reason: Added "-o 'page#1.html' " to create output file names
 
  


