LinuxQuestions.org
Old 03-13-2008, 01:24 PM   #1
blizunt7
Member
 
Registered: Mar 2004
Distribution: Fedora Core 1,2,3, RHEL3,4,5 Ubuntu
Posts: 272

Rep: Reputation: 30
webalizer bandwidth discrepancy


Hey all,
Trying to figure out how accurate webalizer is.
For a particular host of mine, it's reporting for the month of March:
4459461 K.

In httpd.conf, I have custom logs for this user.
I then wrote a script with the following logic:
Code:
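# Collect the bytes field (field 10) from every log line that mentions "Mar".
# '>' truncates the temp file; '>>' would carry values over from earlier runs.
cat /var/log/httpd/user_access* | grep Mar | awk '{print $10}' > /tmp/webtest

total=0
while read -r LINE
do
    # Apache logs "-" when no response body was sent; skip those entries.
    if [ "$LINE" = "-" ]
    then
        error=1
    else
        total=`expr $total + $LINE`
    fi
done < /tmp/webtest

total_web=$total

KB=`expr $total_web / 1024`
MB=`expr $KB / 1024`
GB=`expr $MB / 1024`
GB_round=$(echo "scale=2; $MB/1024" | bc)

echo "$KB KB"
echo "$MB MB"
echo "$GB_round GB"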
This reads field 10 from the custom logs, which appears to be the number of bytes transferred for each HTTP request.

The result of this was
Code:
9114953 KB
8901 MB
8.69 GB
As you can see, the totals are WAY off.
Am I doing something wrong in my understanding of webalizer and the http custom logs (using common)?
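For comparison, here is a minimal sketch of the same total computed in a single awk pass. It assumes Common Log Format, where whitespace splitting puts the timestamp in field 4 and the byte count in field 10. The function name sum_month_bytes is made up for illustration. Unlike a plain grep for "Mar", it matches the month inside the timestamp only, so a URL that happens to contain "Mar" in another month's entry is not counted.

```shell
#!/bin/sh
# Sum the bytes field for one month of Common Log Format entries read from stdin.
# $1 is a month/year as it appears in the timestamp, e.g. "Mar/2008".
# sum_month_bytes is a hypothetical helper name, not part of any tool.
sum_month_bytes() {
    awk -v my="$1" '
        # Field 4 looks like "[13/Mar/2008:13:24:01"; test the month there only.
        # Entries whose byte field is "-" (no body sent) are skipped.
        index($4, "/" my ":") && $10 ~ /^[0-9]+$/ { total += $10 }
        END { printf "%.0f\n", total }
    '
}
```

It could be used as `cat /var/log/httpd/user_access* | sum_month_bytes Mar/2008`, and the result divided by 1024 to compare against a figure in KB.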

Thanks all!
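One thing that may be worth ruling out: a temporary file that is appended to (>>) rather than truncated (>) keeps the values from earlier runs, so a rerun counts them again. The post does not say whether that happened here, but the two totals differ by a factor of about 2.04 (9114953 / 4459461). The sketch below uses a throwaway file and a stand-in extract function (both assumptions, not the real log pipeline) to show the effect.

```shell
#!/bin/sh
# Demonstration with a throwaway file: appending on every run inflates the sum.
tmp=$(mktemp)

# Stand-in for the "cat | grep | awk" pipeline; it always emits 100 and 200.
extract() { printf '100\n200\n'; }

extract >> "$tmp"     # first run
extract >> "$tmp"     # second run appends to the leftovers of the first
appended=$(awk '{ t += $1 } END { printf "%.0f\n", t }' "$tmp")

extract > "$tmp"      # '>' truncates first, so reruns do not accumulate
truncated=$(awk '{ t += $1 } END { printf "%.0f\n", t }' "$tmp")

echo "appended=$appended truncated=$truncated"   # prints: appended=600 truncated=300
rm -f "$tmp"
```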
 
Old 03-14-2008, 10:02 AM   #2
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Webalizer may be configured to ignore hits from some sites. This is useful to some people to prevent local testing of the web site from impacting the statistics. Having said this, my own webalizer seems to differ significantly from statistics gathered by my own code, but in the opposite direction; webalizer is reporting, at times, much more traffic than my analysis code that breaks down the log files. I am going to look into this a bit more closely. It will probably require inspecting the Webalizer source code, so don't hold your breath waiting for an explanation from that.
--- rod.
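A quick way to check whether any such filters are active is to list the uncommented Ignore* and Include* directives (for example IgnoreSite or IgnoreURL) in the Webalizer configuration. This is only a sketch: the function name show_filters is made up, and the configuration path varies by distribution, so it is passed in as an argument.

```shell
#!/bin/sh
# List active (uncommented) Ignore*/Include* directives in a Webalizer config.
# $1 is the path to the config file; the location is system-specific.
# show_filters is a hypothetical helper name.
show_filters() {
    grep -Ei '^[[:space:]]*(Ignore|Include)[A-Za-z]+[[:space:]]' "$1"
}
```

For example, `show_filters /etc/webalizer.conf` (path assumed). No output means no such directive is set in that file.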
 
Old 03-14-2008, 03:36 PM   #3
blizunt7
Member
 
Registered: Mar 2004
Distribution: Fedora Core 1,2,3, RHEL3,4,5 Ubuntu
Posts: 272

Original Poster
Rep: Reputation: 30
Hey Rod,
Thanks. How are you inspecting the log files, if you don't mind me asking? Curious if my script is missing something, or if I have the right logic.

Thanks for the reply.
 
Old 03-14-2008, 05:13 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
I am using a Perl script that analyzes a daily Apache access log file in one of the standard formats, the name of which I cannot recall. I know the format is different from the one you are using, since the style I am using has fewer than ten fields per record. Until now, I haven't looked too closely because Webalizer and my log files are about 4 hours out of sync: the log rotation occurs at 04:00 each day. My own script does not count bytes over periods greater than one day.
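How many fields a record has depends on how the line is split. awk splits on whitespace, so a quoted request such as "GET /a HTTP/1.1" counts as three fields and the timestamp as two. The sketch below (field_histogram is a made-up name) tallies the field count per line; lines whose count differs from the rest, such as malformed requests, will not have the byte count in field 10.

```shell
#!/bin/sh
# Print "<field count> <number of lines>" for log lines read from stdin,
# using awk's default whitespace splitting.
# field_histogram is a hypothetical helper name.
field_histogram() {
    awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' | sort -n
}
```

It could be run as `field_histogram < /var/log/httpd/access_log`, with the path adjusted to the log being checked.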
For what it's worth, here is my code that parses the log file:
Code:
#! /bin/perl -w
#
#       httpLogParse.pl
#       Parse apache log file, display daily hit summary
#
#================================================================

use strict;

use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:all/;

my $recordCount = 0;
my $byteTotal = 0;
my $bogusRequests = 0;
my %hits;
my %sites;

my (
    $clientIp,
    $user,
    $group,
    $timeDate,
    $request,
    $retCode,
    $byteCount,
    $referrer,
    $userAgent,
    $method,
);

    my $filespec = param( 'logfile' );
    if( ! defined( $filespec ) || $filespec eq "" ){
        $filespec = "access_log";
    }
    my $fileName = $filespec;
    $filespec =~ s/^.*\///;         # strip any leading directory path
    $filespec =~ s/\|//g;           # remove pipes (prevent hacking by popen() )
    # $filespec = "/usr/local/apache2/logs/$filespec";
    $filespec = "/var/log/httpd/$filespec";

    open( LOGFILE, '<', $filespec ) || die "Cannot open $filespec for input: $!\n";


    print header();
    print start_html( -title => "Daily httpd Log Parser" );
    print "httpLogParse ver 1.1.0<br>\n";

    print center( h2( "<A HREF=\"/cgi-bin/hitSources.pl?$fileName\">$filespec</A>" ) );
    print "<table border=\"2\">\n";

    my $todHour = 0;

    while(<LOGFILE>){
        my $logRecord = $_;

        $logRecord =~ m/^.+?\s+/g;
        # print "IP: '$&'\n";
        $clientIp = $&;
        chomp $clientIp;
        $logRecord =~ m/\G.+?\s+/g;
        # print "User: '$&'\n";
        $user = $&;
        chomp $user;

        $logRecord =~ m/\G.+?\s+/g;
        # print "Group: '$&'\n";
        $group = $&;
        chomp $group;

        $logRecord =~ m/\G.+?]\s+/g;
        # print "Datime: '$&'\n";
        $timeDate = $&;
        chomp $timeDate;

        $logRecord =~ m/\G".+?"\s+/g;
        # print "Req: '$&'\n";
        $request = $&;
        $request =~ s/\sHTTP.*$//;
        $request =~ m/^.+\s/;
        $method = $&;
        $request =~ s/^.+\s//;
        my $requestBase = $request;
        $requestBase =~ s/\?.+//;
        $hits{$requestBase}++;
        chomp $request;

        $logRecord =~ m/\G.+?\s+/g;
        # print "Retcode: '$&'\n";
        $retCode = $&;
        chomp $retCode;

        if( ! exists( $sites{ $clientIp } ) && ( $retCode == 200) ){
            $sites{ $clientIp } = 1;
        }

        $logRecord =~ m/\G.+?\s+/g;
        # print "Bytes: '$&'\n";
        $byteCount = $&;
        chomp $byteCount;
        if( $byteCount eq "- " ){ $byteCount = 0; }

        $logRecord =~ m/\G".+?"\s+/g;
        # print "Referrer: '$&'\n";
        $referrer = $&;
        chomp $referrer;

        $logRecord =~ m/\G".+?"\s+/g;
        # print "UserAgent: '$&'\n";
        $userAgent = $&;
        chomp $userAgent;

        $timeDate =~ m/:([0-9][0-9]):[0-9][0-9]:[0-9][0-9]/;
        my $tod = $&;
        if( $todHour != $1 ){
            print "<tr><td>$1</td></tr>\n";
            $todHour = $1;
        }


        print "<tr>\n";
        if( $retCode == 200 ){
            $byteTotal += $byteCount;
            print "<td bgcolor=\"#90EE90\"><A HREF=\"/cgi-bin/resolveClient.pl?$clientIp\">$clientIp</A></td>\n";
            print "<td bgcolor=\"#90EE90\">$request</td>\n";
            print "<td bgcolor=\"#90EE90\">$retCode</td>\n";
            print "<td bgcolor=\"#90EE90\">$byteCount</td>\n";
            print "<td bgcolor=\"#90EE90\">$timeDate</td>\n";

            $recordCount++;
        }
        elsif( $retCode == 304 ){
            $byteTotal += $byteCount;
            print "<td bgcolor=\"#FFFFBB\"><A HREF=\"/cgi-bin/resolveClient.pl?$clientIp\">$clientIp</A></td>\n";
            print "<td bgcolor=\"#FFFFBB\">$request</td>\n";
            print "<td bgcolor=\"#FFFFBB\">$retCode</td>\n";
            print "<td bgcolor=\"#FFFFBB\">$byteCount</td>\n";
            print "<td bgcolor=\"#FFFFBB\">$timeDate</td>\n";
            $recordCount++;
        }
        elsif( $method !~ m/^GET/ ){
            $bogusRequests++;
            print "<td bgcolor=\"#CD5C5C\"><A HREF=\"/cgi-bin/resolveClient.pl?$clientIp\">$clientIp</A></td>\n";
            print "<td bgcolor=\"#CD5C5C\"> $request</td>\n";
            print "<td bgcolor=\"#CD5C5C\"> $retCode</td>\n";
            print "<td bgcolor=\"#CD5C5C\"> $byteCount</td>\n";
            print "<td bgcolor=\"#CD5C5C\">$timeDate</td>\n";
        }
        print "</tr>\n";


    }
    close( LOGFILE );
    $filespec = param( 'logfile' );
    print "</table>\n";
    print "Total records: ", $recordCount + $bogusRequests, " ($bogusRequests bogus requests) for $byteTotal total bytes\n";
    print "<BR>\n";
    print "Total sites: ", scalar keys %sites,"<BR>\n";
    print "<table border=\"2\">\n";
    foreach my $hit ( sort keys %hits ){
            print "<tr><td><A HREF=\"/cgi-bin/hitSources.pl?$hit%20$filespec\">$hit</A></td><td> $hits{$hit}</td></tr>\n";
    }
    print "</table>\n";
    print end_html();
    exit;
This runs as a CGI, and also generates links to my other web traffic analysis tools. Maybe someone else can see a problem with this code.

--- rod.
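Note that the Perl script above adds to $byteTotal only for responses with status 200 or 304, so bytes sent with any other status (206 partial content, 404 error pages, and so on) are left out of its total. To see how much traffic falls into each status code, a rough sketch like the following could help. It assumes whitespace splitting puts the status in field 9 and the byte count in field 10, and bytes_by_status is a made-up name.

```shell
#!/bin/sh
# Print "<status> <total bytes>" per HTTP status code for log lines on stdin.
# Lines whose field 9 is not a three-digit number (e.g. malformed requests)
# are skipped; a "-" byte field counts as 0.
# bytes_by_status is a hypothetical helper name.
bytes_by_status() {
    awk '
        $9 ~ /^[0-9][0-9][0-9]$/ {
            if ($10 ~ /^[0-9]+$/) { bytes[$9] += $10 } else { bytes[$9] += 0 }
        }
        END { for (s in bytes) printf "%s %.0f\n", s, bytes[s] }
    ' | sort
}
```

Comparing the 200 and 304 rows with the sum of all rows shows how much a status filter leaves out.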
 
  

