Benchmarking Web Spider

Geneset · 04-28-2009, 08:17 AM

Basically, what I would like is 'wget -r' with timing information for each web page downloaded, formatted as CSV or something else relatively easy to work with.

I thought this could be the kind of thing that people might have in their 'toolchain', but if not, and if I build one, I'll post it here.

Many Thanks
G

PS: I checked similar threads and someone was looking for a client-side web server benchmark, which I assume would be like this, but they didn't get any response; I hope I am luckier.

Guttorm · 04-28-2009, 09:20 AM

Well, wget -r already prints a timestamp for each download, so it should be fairly easy to write a script to format it the way you want.

wget -r http://example.org 2>&1 | grep -E "^[0-9]{2}:[0-9]{2}:[0-9]{2}"
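
A minimal sketch of such a formatting script, not a finished tool. It assumes the older wget log layout, where each download starts with a line like "--HH:MM:SS--  URL" and finishes with a line beginning "HH:MM:SS (speed)". The function name wget_log_to_csv is made up for illustration, and URLs containing commas would need quoting.

```shell
# wget_log_to_csv: read a wget log on stdin, print "url,start,end" lines.
# Hypothetical helper; assumes a download starts with "--HH:MM:SS--  URL"
# and ends with a line beginning "HH:MM:SS ".
wget_log_to_csv() {
    awk '
        /^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/ {
            start = substr($1, 3, 8)   # strip the surrounding dashes
            url = $2
            next
        }
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9] / && url != "" {
            # Skip failed downloads, but still close the record.
            if ($0 !~ /ERROR/) print url "," start "," $1
            url = ""
        }'
}

# Usage (same command as above, piped into the helper):
# wget -r http://example.org 2>&1 | wget_log_to_csv
```

This version avoids user-defined functions, so it should also run on awk implementations that lack them.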

Geneset · 04-28-2009, 12:17 PM

Using this awk script:

Code:

BEGIN { inRecord = 0
        thisPage = ""
        startTime = ""
        endTime = ""
        OFS = ","   # comma-separated output
}
function timeDiff(startTime, endTime,     result, Eseconds, Sseconds) {
    # HH:MM:SS to seconds: seconds + 60*minutes + 3600*hours
    Sseconds=(substr(startTime,7,2)+(60*substr(startTime,4,2))+(3600*substr(startTime,1,2)));
    Eseconds=(substr(endTime,7,2)+(60*substr(endTime,4,2))+(3600*substr(endTime,1,2)));
    result=(Eseconds-Sseconds);
    return result;
}

{
    if ( inRecord == 1 ) {
        if ( $0 ~ /\[following\]/ ) {
            #this means its going somewhere else and will start a new download record
            inRecord = 0;
        } else if ( match($0,/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) ) {
            #have completed download
            #match() only returns a position, so pull the text out with substr()
            endTime = substr($0, RSTART, RLENGTH);
            inRecord = 0;
            if ( $0 !~ /ERROR/ ) {
                # Calculate times and print record here
                print thisPage,timeDiff( startTime, endTime ),startTime,endTime;
            }
        }
    } else {
        if ( $0 ~ /^--[0-9][0-9]:[0-9][0-9]:[0-9][0-9]--/ ) {
            #starting a download; the URL is assumed to be the second field
            inRecord = 1;
            thisPage = $2;
            match($0,/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/);
            startTime = substr($0, RSTART, RLENGTH);
        }
    }
}

I get this error:

Code:

awk: syntax error near line 7
awk: bailing out near line 7

Line seven is the function definition, but either I've drunk too much coffee or I'm just being stupid, because I can't see what's wrong with it.
Any ideas?
Many Thanks

PS: I know it's nowhere near "functionally complete", lol
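
For testing the time arithmetic separately from the log parsing, here is a minimal sketch. The names elapsed and toSeconds are made up for illustration, and it assumes a download never takes 24 hours or more, so a negative difference is treated as a midnight rollover.

```shell
# elapsed START END: print the seconds between two HH:MM:SS stamps.
# Hypothetical helper; assumes the interval is shorter than 24 hours.
elapsed() {
    awk -v s="$1" -v e="$2" '
        # Convert HH:MM:SS to seconds since midnight.
        function toSeconds(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN {
            d = toSeconds(e) - toSeconds(s)
            if (d < 0) d += 86400   # end stamp is past midnight
            print d
        }'
}

elapsed 08:17:01 08:17:09   # prints 8
elapsed 23:59:58 00:00:03   # prints 5
```

Like the script above, this relies on user-defined functions. The original awk, which some systems still install as plain "awk", does not support them, while nawk and gawk do, so that may be worth ruling out when a syntax error is reported at a function definition.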