LinuxQuestions.org
10-20-2006, 01:20 PM   #1
dbc001
Member
 
Registered: Jan 2004
Distribution: Slackware, Ubuntu
Posts: 97

Rep: Reputation: 15
Spidering del.icio.us bookmarks?


I'm trying to write a script that will create an archive copy of all my del.icio.us links once a month or so, in case a site goes down or I need offline access. I've tried wget, but I can't seem to get it to download from more than one server at a time, and curl seems to be aimed at one-shot downloads. Has anyone done anything like this? Any suggestions on where to start? (I'm still pretty new to the whole scripting thing, so any advice would be much appreciated.)

thanks!
dbc
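For the once-a-month part, a crontab entry along these lines would cover the scheduling; the script name is just a placeholder for whatever ends up wrapping the wget command worked out below:
Code:
# min hour day-of-month month day-of-week  command
# runs at 03:00 on the 1st of each month; archive-delicious.sh is hypothetical
0 3 1 * * $HOME/bin/archive-delicious.sh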
 
10-20-2006, 01:44 PM   #2
vls
Member
 
Registered: Jan 2005
Location: The grassy knoll
Distribution: Slackware,Debian
Posts: 192

Rep: Reputation: 31
Use a link like this (untested):

Code:
http://del.icio.us/username/?count=100
Give count some outrageous number to cover all the links.

http://del.icio.us/help/html
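A minimal sketch of turning that link into a dated snapshot with wget (the username, count value, and output filename are placeholders, not anything tested against del.icio.us):
Code:
# grab the whole bookmark list as one page; count just needs to exceed the real total
wget --output-document="delicious-$(date +%Y-%m).html" \
     'http://del.icio.us/username/?count=1000'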
 
10-26-2006, 01:45 PM   #3
dbc001
Member
 
Registered: Jan 2004
Distribution: Slackware, Ubuntu
Posts: 97

Original Poster
Rep: Reputation: 15
got it working...

OK, I managed to get this working. It tries to simulate a human surfer so it doesn't eat up anybody's connection, ignores robots.txt (yes, I know that's kind of rude), and spoofs Firefox in case anyone blocks wget (is that necessary? I saw it in a forum post somewhere). The current command does not filter out ads (like DoubleClick), nor does it block JavaScript; if you know how to do either, please let me know. Here's what I'm using:
Code:
wget -t 7 -w 5 --waitretry=14 --random-wait --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1" -m -k -p -e robots=off --span-hosts -r -l 1 --no-check-certificate 'http://del.icio.us/html/username?count=200'
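Roughly what those options do, for anyone finding this later: -t 7 retries each download up to seven times; -w 5, --waitretry and --random-wait space the requests out so it behaves like a person clicking around; --user-agent makes it look like Firefox; -m -k -p mirror the pages, rewrite links for offline viewing, and pull in page requisites like images and CSS; -e robots=off ignores robots.txt; --span-hosts with -r -l 1 is the part that answers the original question, since it lets wget leave del.icio.us and follow each bookmark one level deep; --no-check-certificate skips SSL certificate checks.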
Unfortunately it also downloads a file for every del.icio.us tag you have, which is kind of wasteful.

The next step is to search through the downloaded pages for .zip, .tgz, or .gz files and download those as well...
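A rough sketch of that next step, assuming the mirror ended up under the current directory (the file names here are guesses, not from the thread):
Code:
# pull archive-looking URLs out of the mirrored pages, de-duplicate, then fetch them politely
grep -rhoE --include='*.html' 'https?://[^" ]+\.(zip|tgz|gz)' . | sort -u > archive-urls.txt
wget -w 5 --random-wait --input-file=archive-urls.txt --directory-prefix=archives/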

Last edited by dbc001; 10-26-2006 at 01:46 PM.
 
  

