OK, I managed to get this working. It rate-limits itself to simulate a human surfer so it doesn't hog anyone's connection, ignores robots.txt (yes, I know that's kind of rude), and spoofs Firefox in case anyone blocks wget (is that necessary? I saw it in a forum post somewhere). The current command doesn't filter out ads (like DoubleClick), nor does it block JavaScript; if you know how to do that, please let me know. Here's what I'm using:
Quote:
wget -t 7 -w 5 --waitretry=14 --random-wait --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1" -m -k -p -e robots=off --span-hosts -r -l 1 --no-check-certificate 'http://del.icio.us/html/username?count=200'
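Update: skimming the wget man page, `--exclude-domains` and `--reject` look like they might handle the ad/JS problem, but this is an untested guess on my part:

```shell
# Untested: --exclude-domains should keep --span-hosts away from the ad
# servers, and --reject '*.js' drops JavaScript files. Caveats: it won't
# touch inline <script> blocks, and with -p wget may download a rejected
# file first and delete it afterwards.
wget -t 7 -w 5 --waitretry=14 --random-wait \
     --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1" \
     -m -k -p -e robots=off --span-hosts -r -l 1 --no-check-certificate \
     --exclude-domains doubleclick.net,doubleclick.com \
     --reject '*.js' \
     'http://del.icio.us/html/username?count=200'
```

If anyone knows a cleaner way (a proxy with an ad blocklist, maybe), I'm all ears.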
Unfortunately it also downloads a file for every del.icio.us tag you have, which is kind of wasteful.
Next step is to search the downloaded pages for .zip, .tgz, or .gz files and download them as well...
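A quick, untested sketch of that step: grep the archive URLs out of the mirrored HTML, dedupe them, and hand the list back to wget with `-i`. Here `mirror-demo/` and the sample page stand in for whatever directory wget actually created (`del.icio.us/` in my case):

```shell
# Stand-in for the mirrored pages (replace with the real wget output dir)
mkdir -p mirror-demo
cat > mirror-demo/page.html <<'EOF'
<a href="http://example.com/backup.tgz">backup</a>
<a href="http://example.com/notes.html">notes</a>
EOF

# -r recurse into the dir, -h suppress filenames, -o print only the match,
# -E extended regex: pull out any URL ending in .zip, .tgz, or .gz
grep -rhoE 'https?://[^" ]+\.(zip|tgz|gz)' mirror-demo/ | sort -u > archive-urls.txt
cat archive-urls.txt
# then something like: wget -w 5 --random-wait -i archive-urls.txt
```

The regex will miss relative links and anything split across lines, but it should catch the common case.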