LinuxQuestions.org
Old 04-28-2009, 08:17 AM   #1
Geneset
Member
 
Registered: Jan 2007
Location: Athlone, ROI
Distribution: Ubuntu Hardy Desktop, Solaris 10, Workstation 2008 x64
Posts: 75

Rep: Reputation: 16
Benchmarking Web Spider


Basically, what I would like is 'wget -r' with timing information for each web page downloaded, formatted as CSV or something relatively easy to work with...

I thought that this could be the kinda thing that people might have in their 'toolchain', but if not, and if I build one, I'll throw it up on here.

Many Thanks
G

PS: I checked similar threads, and some guy was looking for a client-side web server benchmark, which I assume would be like this, but he didn't get any response; hope I am luckier.
 
Old 04-28-2009, 09:20 AM   #2
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,110

Rep: Reputation: 218
Well, wget -r already prints a timestamp when each download finishes, so it should be fairly easy to write a script to format it the way you want.

wget -r http://example.org 2>&1 |egrep "^[0-9]{2}:[0-9]{2}:[0-9]{2}"
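
A minimal sketch of such a formatting script, not a tested tool. It assumes the older wget log layout that the regex above targets: a request starts with a line like "--HH:MM:SS--  URL" and finishes with a line beginning "HH:MM:SS". The function name wget_log_to_csv and the column order (url, start, end, seconds) are made up for illustration, and it needs an awk that supports user-defined functions (nawk, gawk, mawk).

```shell
# Assumed log layout (older wget releases):
#   --08:17:01--  http://example.org/      <- request starts
#   08:17:03 (12.3 KB/s) - ... saved ...   <- download finished
# Reads a wget log on stdin, writes url,start,end,seconds on stdout.
wget_log_to_csv() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t) { return substr(t,1,2)*3600 + substr(t,4,2)*60 + substr(t,7,2) }

        # Start of a request: remember the timestamp and the URL.
        /^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/ {
            t0 = substr($1, 3, 8); url = $2; next
        }

        # First timestamped line after a start line ends the record.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9] / && url != "" {
            if ($0 !~ /ERROR/) {
                t1 = substr($1, 1, 8)
                d = secs(t1) - secs(t0)
                if (d < 0) d += 86400   # assumption: at most one midnight crossed
                printf "%s,%s,%s,%d\n", url, t0, t1, d
            }
            url = ""
        }'
}

# Hypothetical usage:
#   wget -r http://example.org 2>&1 | wget_log_to_csv > timings.csv
```

Durations come out in whole seconds because the log only carries one-second resolution, and URLs containing commas would need quoting before this counts as proper CSV.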
 
Old 04-28-2009, 12:17 PM   #3
Geneset
Member
 
Registered: Jan 2007
Location: Athlone, ROI
Distribution: Ubuntu Hardy Desktop, Solaris 10, Workstation 2008 x64
Posts: 75

Original Poster
Rep: Reputation: 16
Awk Version

Using this awk script:

Code:
BEGIN { inRecord = 0
        thisPage = ""
        startTime = ""
        endTime = ""
}
function timeDiff(startTime, endTime,     result, Eseconds, Sseconds) {
    # The extra parameters after the gap act as local variables.
            # seconds           minutes                 hours (3600 s each)
    Sseconds=(substr(startTime,7,2)+(60*substr(startTime,4,2))+(3600*substr(startTime,1,2)));
    Eseconds=(substr(endTime,7,2)+(60*substr(endTime,4,2))+(3600*substr(endTime,1,2)));
    result=(Eseconds-Sseconds);
    return result;
}

{
    if ( inRecord == 1 ){
        if ( $0 ~ /\[following\]/ ) {
            #this means its going somewhere else and will start a new download record
            inRecord = 0;
        } else if ( match($0,/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) ) {
            #have completed download
            # match() returns a position, so take the text via RSTART/RLENGTH
            endTime = substr($0, RSTART, RLENGTH);
            inRecord = 0;
            if ( $0 !~ /ERROR/ ) {
                # Calculate times and print record here
                print $4,timeDiff( startTime, endTime ),startTime,endTime;
            }
        }
    } else {
        if ( match($0,/^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/) ) {
            #starting a download
            inRecord = 1;
            # skip the two leading dashes and keep the 8-character HH:MM:SS
            startTime = substr($0, RSTART+2, 8);
        }
    }
}
I get this error:
Code:
awk: syntax error near line 7
awk: bailing out near line 7
Line seven is the function definition, but either I've drunk too much coffee or I'm just being stupid, because I can't see what's wrong with it.
Any ideas?
Many Thanks

PS: I know it's nowhere near "functionally complete", lol
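
One possible explanation, offered as a guess rather than a diagnosis: the "syntax error near line N / bailing out near line N" wording is what the original /usr/bin/awk on Solaris prints, and that awk predates user-defined functions, so it would stop at the function keyword. nawk and /usr/xpg4/bin/awk do accept functions. The sketch below probes a few candidate binaries for function support; the helper names and the candidate list are hypothetical.

```shell
# Succeed if the given awk binary accepts a user-defined function.
awk_has_functions() {
    "$1" 'function inc(x) { return x + 1 } BEGIN { if (inc(1) != 2) exit 1 }' \
        </dev/null >/dev/null 2>&1
}

# Print the first candidate that exists and passes the probe.
# The candidate list is an assumption; adjust it for the system at hand.
pick_awk() {
    for candidate in nawk /usr/xpg4/bin/awk gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1 && awk_has_functions "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) && echo "using $AWK"
```

The script could then be run as "$AWK" -f script.awk, where script.awk is a placeholder for wherever the code above was saved.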
 
  



Tags
benchmark, server, web, wget




All times are GMT -5. The time now is 10:38 PM.
