04-28-2009, 09:17 AM   #1
Registered: Jan 2007
Location: Athlone, ROI
Distribution: Ubuntu Hardy Desktop, Solaris 10, Workstation 2008 x64
Posts: 75

Rep: Reputation: 16
Benchmarking Web Spider

Basically, what I would like is 'wget -r' with timing information for each web page downloaded, formatted as CSV or something relatively easy to work with...

I thought this could be the kind of thing people might have in their 'toolchain', but if not, and if I build one, I'll throw it up on here.

Many Thanks

PS: I checked similar threads, and someone was looking for a client-side web server benchmark, which I assume would be like this, but didn't get any response; hope I am luckier.
04-28-2009, 10:20 AM   #2
Senior Member
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,295

Rep: Reputation: 335
Well, wget -r already prints a timestamp for each download in its output, so it should be fairly easy to write a script to format it the way you want.

wget -r "$URL" 2>&1 | egrep "^[0-9]{2}:[0-9]{2}:[0-9]{2}"
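For anyone trying this filter, here is a minimal sketch wrapped as a shell function. The function name `wget_done_lines` is made up for illustration. It assumes the log arrives on stdin, and it also accepts an optional `YYYY-MM-DD ` prefix, since newer wget releases appear to print the date in front of the time; treat that part as an assumption and check it against your own wget output.

```shell
# Sketch: keep only wget log lines that begin with a timestamp.
# Accepts a bare HH:MM:SS or an optional leading YYYY-MM-DD date
# (the date prefix is an assumption about newer wget versions).
# wget_done_lines is a hypothetical helper name, not part of wget.
wget_done_lines() {
    grep -E '^([0-9]{4}-[0-9]{2}-[0-9]{2} )?[0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Usage would look something like `wget -r http://example.com/ 2>&1 | wget_done_lines`, where the URL is only a placeholder.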
04-28-2009, 01:17 PM   #3
Registered: Jan 2007
Location: Athlone, ROI
Distribution: Ubuntu Hardy Desktop, Solaris 10, Workstation 2008 x64
Posts: 75

Original Poster
Rep: Reputation: 16
Awk Version

Using this awk script:

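BEGIN { inRecord = 0
        thisPage = ""
        startTime = ""
        endTime = ""
}

function timeDiff(startTime, endTime,     result, Eseconds, Sseconds) {
    #          seconds                  minutes                       hours
    Eseconds = substr(endTime, 7, 2)   + substr(endTime, 4, 2) * 60   + substr(endTime, 1, 2) * 3600;
    Sseconds = substr(startTime, 7, 2) + substr(startTime, 4, 2) * 60 + substr(startTime, 1, 2) * 3600;
    result = Eseconds - Sseconds;
    return result;
}

{
    if ( inRecord == 1 ) {
        if ( $0 ~ /\[following\]/ ) {
            # this means it's going somewhere else and will start a new download record
            inRecord = 0;
        } else if ( match($0, /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) ) {
            # have completed download; match() only returns a position,
            # so pull the timestamp text out with RSTART/RLENGTH
            endTime = substr($0, RSTART, RLENGTH);
            inRecord = 0;
            if ( $0 !~ /ERROR/ ) {
                # Calculate times and print record here
                print $4, timeDiff( startTime, endTime ), startTime, endTime;
            }
        }
    }
    if ( match($0, /^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/) ) {
        # starting a download; skip the leading and trailing "--"
        inRecord = 1;
        startTime = substr($0, RSTART + 2, RLENGTH - 4);
    }
}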
I get this error:
awk: syntax error near line 7
awk: bailing out near line 7
Line seven is the function definition, but either I've drunk too much coffee or I'm just being stupid, because I can't see what's wrong with it.
Any ideas?
Many Thanks

PS: I know it's nowhere near "functionally complete", lol
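One thing that might be worth checking, given the Solaris 10 in the profile: as far as I know, the default /usr/bin/awk there is the old awk, which has no user-defined functions and reports errors in exactly this "bailing out near line" style, so running the script with nawk or /usr/xpg4/bin/awk may be all that is needed. That is a guess, not something confirmed in this thread.

Separately, here is a rough sketch of the whole idea for comparison, written as a shell function around a POSIX awk program. The names `wget_log_to_csv` and `toSeconds` are made up for illustration, and it assumes the older wget log format with bare HH:MM:SS timestamps (`--HH:MM:SS--  URL` at the start of a download, a line starting with `HH:MM:SS` at the end).

```shell
# Sketch: turn an old-style wget log on stdin into CSV rows of
# url,seconds,start,end. wget_log_to_csv is a hypothetical name.
# Assumes bare HH:MM:SS timestamps as in older wget output.
wget_log_to_csv() {
    awk '
    # Convert HH:MM:SS to seconds since midnight.
    function toSeconds(t,    p) {
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    # Elapsed seconds; add a day if the download crossed midnight.
    function timeDiff(startTime, endTime,    d) {
        d = toSeconds(endTime) - toSeconds(startTime)
        return (d < 0) ? d + 86400 : d
    }
    # Start of a download: "--HH:MM:SS--  URL"
    /^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/ {
        startTime = substr($1, 3, 8)
        url = $2
        inRecord = 1
        next
    }
    # End of a download: the line begins with a bare HH:MM:SS.
    inRecord && /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
        endTime = substr($0, 1, 8)
        inRecord = 0
        if ($0 !~ /ERROR/)
            print url "," timeDiff(startTime, endTime) "," startTime "," endTime
    }
    '
}
```

Something like `wget -r http://example.com/ 2>&1 | wget_log_to_csv > timings.csv` would be the intended use (placeholder URL). A redirect simply shows up as a new start line, so the timing restarts from the redirected request. Resolution is whole seconds only, and URLs containing commas would need quoting before this is real CSV.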


benchmark, server, web, wget

All times are GMT -5. The time now is 06:26 AM.