LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 10-06-2015, 11:23 AM   #1
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Rep: Reputation: Disabled
awk to parse time and file transfer size


Hi all

I have an FTP server that I am trying to do some performance monitoring on, and I have the vsftpd logs.

Code:
[root@localhost log]# tail xferlog-20150906
Tue Oct  6 09:07:49 2015 1 192.168.10.1 1170448 /home/15.2.129.tar.gz  
Tue Sep  1 17:49:27 2015 1 192.168.10.2 0 /home/15.3.129.tar.gz  
Wed Sep  2 10:34:01 2015 1 192.168.10.11 0 /home/15.3.129.tar.gz
What I want is to find out the transfer size per day (the 8th field) and per hour. I have worked out how to get the number of transfers per day, but how to break it down by day and hour eludes me.

This is what I got so far:
Code:
## reads number of downloads per day
awk '{count[$3]++} END {for(j in count) print j,"("count[j]" bytes)"}' xferlog*


## reads total transfer size from all available logs, i.e. the last 30 days --> works now thanks to HMW
awk '{sum+=$8} END {print sum}' xferlog*


## reads number of downloads by the hour --> doesn't work
awk '{count[$4]++} END {for(j in count) print j,"("count[j]" bytes)"}' xferlog*
Can anyone help me out?

#added for clarity below
The problem I have is that if I just added all the 01:00 downloads together, I would get the 1 am downloads for every day summed together.
What I am looking for is something like this:

Sept 1st 1am downloads 5GB.
Sept 2nd 1am downloads 2GB.

I am not sure how to structure the array within awk to get that.

I also need to strip out the time from hh:mm:ss format into just hh so I can aggregate the hourly transfers. --> this part I am fairly certain I almost got it. sed to the punishment!
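A minimal sketch of that idea (assuming the field layout in the xferlog sample above): one awk array keyed on month, day, and hour together keeps each day's hours separate, and awk's substr can pull the hh out of hh:mm:ss, so no separate sed pass is needed.

```shell
# key the running sum on month + day-of-month + hour;
# substr($4, 1, 2) takes "hh" out of "hh:mm:ss"
awk '{ key = $2 " " $3 " " substr($4, 1, 2); sum[key] += $8 }
     END { for (k in sum) print k "h:", sum[k], "bytes" }' xferlog*
```

This would print one line per distinct day-and-hour, e.g. `Sep 1 17h: 50 bytes` (the order of `for (k in sum)` is unspecified in awk, so pipe to sort if you want it ordered).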

Thanks

Last edited by jzoudavy; 10-06-2015 at 02:15 PM. Reason: updated the question
 
Old 10-06-2015, 11:43 AM   #2
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Well, no expert in awk. But maybe I can help you with this part:
Quote:
reads size of total monthly transfer
So, let's say the log file looks like this:
Code:
Tue Oct  6 09:07:49 2015 1 192.168.10.1 10 /home/15.2.129.tar.gz  
Tue Sep  1 17:49:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz  
Wed Sep  2 10:34:01 2015 1 192.168.10.11 30 /home/15.3.129.tar.gz
This awk prints out the expected result (60):
Code:
awk '{ sum+=$8 } END { print sum }' vsftp.log 
60
So, using the same logic (or lack thereof, awk still eludes me most of the time!) you ought to be able to solve your third question.

Best regards,
HMW
 
1 member found this post helpful.
Old 10-06-2015, 12:19 PM   #3
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Original Poster
Rep: Reputation: Disabled
@HMW: thanks for your help, the changes worked!
 
Old 10-06-2015, 12:38 PM   #4
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by jzoudavy View Post
@HMW: thanks for your help, the changes worked!
Awesome. Please mark the thread as [SOLVED] if you consider this problem thus.

Best regards,
HMW
 
Old 10-06-2015, 01:43 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Ummmm ... I am a little confused (easily done sometimes). The OP said :- reads size of total monthly transfer. Now I am not knocking HMW's solution, as it is part way there on what I read that to mean, but the OP has come back and said this is correct. Using the example data from HMW, I would have thought the following would be the correct output:
Code:
# input data
Tue Oct  6 09:07:49 2015 1 192.168.10.1 10 /home/15.2.129.tar.gz  
Tue Sep  1 17:49:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz  
Wed Sep  2 10:34:01 2015 1 192.168.10.11 30 /home/15.3.129.tar.gz

# output per month
Oct 10
Sep 50
Please advise exactly what type of data you want, or whether the suggested solution is actually all you wanted.
 
Old 10-06-2015, 02:09 PM   #6
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Original Poster
Rep: Reputation: Disabled
Hi grail

Sorry for the confusion. The current data set I have available is from the past month, Sept 6 till Oct 6th. So that is one month for me, or at least close enough for my purposes. I should have been clearer and said the past 30 days or so.
 
Old 10-06-2015, 02:23 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
Originally Posted by jzoudavy View Post
Hi grail

Sorry for the confusion. The current data set I have available is from the past month, Sept 6 till Oct 6th. So that is one month for me, or at least close enough for my purposes. I should have been clearer and said the past 30 days or so.
And that is fine, but should you have, say, 2 months' worth in a single file, the current solution will only give you the total of all entries in the file and not break it down.
I just wanted you to be aware.

Plus, as you also mentioned by hour, you will need more code to provide that level of detail.

May I also mention, you can create a complete awk script instead of several single awk lines in a bash script:
Code:
#!/usr/bin/awk -f

<your_code_here>
Then if you make it executable you can run it as you would your bash script.
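For instance, the per-day count from the first post could become a standalone file (daily_counts.awk is a hypothetical name):

```shell
# create the hypothetical standalone awk script
cat > daily_counts.awk <<'EOF'
#!/usr/bin/awk -f
# count transfers per day-of-month (field 3) across all files on the command line
{ count[$3]++ }
END { for (j in count) print "day " j ": " count[j] " transfer(s)" }
EOF
chmod +x daily_counts.awk
# then run it just like a bash script:  ./daily_counts.awk xferlog*
```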
 
Old 10-06-2015, 03:02 PM   #8
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Original Poster
Rep: Reputation: Disabled
Hi grail

Thanks for the tip on awk scripting, I did not know about that.

I am still not sure on the logic of how to modify the code to pick up dates and hours, though.
 
Old 10-06-2015, 04:12 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Here is a rough idea of what I would do:
Code:
# using this input file
Tue Oct  6 09:07:49 2015 1 192.168.10.1 11 /home/15.2.129.tar.gz  
Tue Sep  1 17:47:27 2015 1 192.168.10.2 10 /home/15.3.129.tar.gz  
Tue Sep  1 17:48:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz  
Tue Sep  1 17:49:27 2015 1 192.168.10.2 30 /home/15.3.129.tar.gz  
Tue Sep  1 18:49:27 2015 1 192.168.10.2 40 /home/15.3.129.tar.gz  
Tue Sep  1 18:49:27 2015 1 192.168.10.2 50 /home/15.3.129.tar.gz  
Wed Sep  2 10:34:01 2015 1 192.168.10.11 60 /home/15.3.129.tar.gz

# and this code
#!/usr/bin/awk -f

NR > 1 && !( $2 in mon_sum ){
	print "Totals for the month of " month ":"

	for( d in per_day_per_hour )
		for( h in per_day_per_hour[d] )
			print " Day :- " d " hour :- " h " has a sum of :- " per_day_per_hour[d][h]

	print "Monthly total is :- " mon_sum[month]

	delete per_day_per_hour
}

{
	month = $2
	mon_sum[month] += $8

	split($4, hour, ":")

	per_day_per_hour[$3][hour[1]] += $8
}

END{
	print "Totals for the month of " month ":"

	for( d in per_day_per_hour )
		for( h in per_day_per_hour[d] )
			print " Day :- " d " hour :- " h " has a sum of :- " per_day_per_hour[d][h]

	print "Monthly total is :- " mon_sum[month]
}

# produces this output (note: per_day_per_hour[d][h] needs GNU awk 4+)
Totals for the month of Oct:
 Day :- 6 hour :- 09 has a sum of :- 11
Monthly total is :- 11
Totals for the month of Sep:
 Day :- 1 hour :- 17 has a sum of :- 60
 Day :- 1 hour :- 18 has a sum of :- 90
 Day :- 2 hour :- 10 has a sum of :- 60
Monthly total is :- 210
I would of course put the repeated stuff in a function, but you get the idea
 
1 member found this post helpful.
Old 10-07-2015, 01:41 AM   #10
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by grail View Post
ummmm ... I am a little confused (easily done some times). OP said :- reads size of total monthly transfer. Now I am not knocking HMW's solution as it is part way there on what I read that
to mean, but the OP has come back and said this is correct.
Ah, yes. Well spotted! For that level of detail I would probably have used a Python script instead. Nice awk script there grail!

Best regards,
HMW
 
Old 10-07-2015, 03:46 AM   #11
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Ok... it turns out I managed to do this in Bash (with just a little awk in there). But the logic was trickier than I expected, or maybe it's the fact that I only got four hours of sleep last night. Anyway, with this script:
Code:
#!/bin/bash

MONTH=""
HOUR=""

while read line; do
    CURR_MONTH=$(echo "$line" | awk '{ print $2 }')
    CURR_HOUR=$(echo "$line" | awk '{ print $8 }')
    if [[ $CURR_MONTH == $MONTH ]]; then
        ((HOUR=$HOUR+$CURR_HOUR))
    else # Update $MONTH and reset & add new hour(s) to new $MONTH
        if [[ $MONTH != "" ]]; then
            echo "Month $MONTH total $HOUR hours"
        fi  
        MONTH=$(echo "$line" | awk '{ print $2 }')
        HOUR=$CURR_HOUR
    fi  
done < vsftp.log

# If we have reached EOF, print out the final month
echo "Month $MONTH total $HOUR hours"

With the same infile as grail, I get this output:
Code:
$ ./parselog.sh 
Month Oct total 11 hours
Month Sep total 210 hours
My script is not as fancy as grail's, but I just wanted to give it a go based on the month variable.

Best regards,
HMW

Last edited by HMW; 10-07-2015 at 03:52 AM.
 
Old 10-07-2015, 06:25 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I like the idea, HMW, but I am generally not a fan of using outside commands in my bash scripts unless I really have to, or unless it really gives a huge speed boost.
Also, a quick nitpick: you are totaling the downloaded data, not the hours.

So here are a couple of quick variants to show what I would do in bash (again just for downloads per month):
Code:
#!/usr/bin/env bash

MONTH=""
# Below you have 2 alternative while/read combos

# option 1
#while read _ c_month _ _ _ _ _ c_size _; do

# option 2 (if you want to try option 1, comment the next 3 lines)
while read -a data; do
	c_month=${data[1]}
	c_size=${data[7]}

	if [[ $c_month == $MONTH ]]; then
		(( size += c_size ))
	else # Update $MONTH and reset the running size for the new $MONTH
		[[ $MONTH ]] && echo "Month $MONTH total $size size"

		MONTH=$c_month
		size=$c_size
	fi  
done < "$1"

# If we have reached EOF, print out the final month
echo "Month $MONTH total $size size"
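As an aside on the `(( size += c_size ))` line, since it differs from a plain assignment: inside `(( ... ))` bash evaluates integer arithmetic, and variables do not need a `$`. Outside it, `+` between variables just builds a string:

```shell
size=10; c_size=5
(( size += c_size ))      # arithmetic context: integer addition
echo "$size"              # prints 15

str=10
str=$str+5                # no (( )): plain string concatenation
echo "$str"               # prints 10+5
```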
 
1 member found this post helpful.
Old 10-07-2015, 06:53 AM   #13
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by grail View Post
Also, quick nit pick, you are totaling the downloaded data and not the hours.
Yes, I know. Your awk was more complete in that regard. I ran out of gas.

Thanks for your version, appreciate it. Especially the approach to read data (line) into an array with the -a option. Gonna save that in memory!

Thanks again for taking the time buddy!

Best regards,
HMW

Zzzzzzz...
 
Old 10-07-2015, 03:55 PM   #14
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Original Poster
Rep: Reputation: Disabled
Hey HMW and grail,

Thanks for all your help on this. I have a question:

why the double brackets?

(( size += c_size ))
((HOUR=$HOUR+$CURR_HOUR))
 
Old 10-07-2015, 05:21 PM   #15
jzoudavy
Member
 
Registered: Apr 2012
Distribution: Ubuntu, SUSE, Redhat
Posts: 188

Original Poster
Rep: Reputation: Disabled
Actually, I have a follow-up question.
I have sanitized the input a bit more to make life easier:
for the DL rate I am simply skipping everything after the decimal.

It was like this: 67402.46Kbyte/sec, but arithmetically that doesn't work, so I just used sed to separate out the decimal and the Kbyte/sec and used just 67402.

Code:
 
Oct 7 13 36 42 208430626 bytes, 67402 46 Kbyte/sec
Oct 7 13 36 53 7004609 bytes, 55082 20 Kbyte/sec
Oct 7 13 36 53 7004596 bytes, 38641 45 Kbyte/sec
Oct 7 13 36 53 7004326 bytes, 53266 48 Kbyte/sec
Oct 7 13 36 53 7003780 bytes, 48976 23 Kbyte/sec
Oct 7 13 37 23 11188721 bytes, 57261 59 Kbyte/sec
Oct 7 13 37 23 11187409 bytes, 49023 38 Kbyte/sec
Oct 7 13 38 12 2013066706 bytes, 45416 61 Kbyte/sec
Oct 7 13 38 15 2344883553 bytes, 48741 71 Kbyte/sec
Oct 6 09 07 49 1170448 bytes, 28916 61 Kbyte/sec
and modified/experimented with the script that you have both provided, shown below, but I keep getting errors at lines 16 and 21 about the brackets.
Code:
#!/bin/bash
#declare everything
MONTH=""
DAY=""
HOUR=""
DL_SIZE=0
DL_RATE=0


# read in via an array; if I use awk then DL_SIZE gets treated as a string and the whole thing becomes string concatenation
while read -a line; do 
    CURR_MONTH=${line[1]}
    CURR_DAY=${line[2]}
    CURR_HOUR=${line[3]}
    CURR_DL_SIZE=${line[6]}
    CURR_DL_RATE=${line[8]}
	
#if same month, same day and same hour, add the DL size and DL rate, DL rate for avg hourly transfer rate, which will be implemented later

    if [ [ $CURR_MONTH == $MONTH ] && [ $CURR_DAY == $DAY ] && [ $CURR_HOUR == $HOUR ] ] 
	then
        DL_SIZE=$DL_SIZE+$CURR_DL_SIZE
	DL_RATE=$DL_RATE+$CURR_DL_RATE
		
    else # Update $MONTH and $DAY and $HOUR  
        if [ [  $MONTH != "" ] ]
	then
            echo "$MONTH $Day $HOUR total $DL_SIZE bytes " # introduce average DL rate later
        fi  
        MONTH=${line[1]}
	DAY=${line[2]}
	HOUR=${line[3]}
        
    fi  
done < simplified.vsftpd.log

# If we have reched EOF, print out the final month
echo "$MONTH $Day $HOUR total $DL_SIZE bytes "
When I run it I get this output and I have no idea why... lines 16 and 32 are my if statements.

Code:
 
./ftp_analyzer.sh: line 16: [: 7: binary operator expected
./ftp_analyzer.sh: line 22: [: too many arguments
./ftp_analyzer.sh: line 16: [: too many arguments
./ftp_analyzer.sh: line 22: [: too many arguments
./ftp_analyzer.sh: line 16: [: too many arguments
./ftp_analyzer.sh: line 22: [: too many arguments
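For reference, those `[: too many arguments` errors are what bash produces when `[[` is split into `[ [`: the shell then runs the `[` command with a stray `[` as an argument. A sketch of what the corrected test and sum lines could look like (the sample values below are hypothetical stand-ins for one parsed log line):

```shell
# hypothetical sample values standing in for one parsed log line
MONTH=Sep; DAY=1; HOUR=13; DL_SIZE=0; DL_RATE=0
CURR_MONTH=Sep; CURR_DAY=1; CURR_HOUR=13
CURR_DL_SIZE=7004609; CURR_DL_RATE=55082

# [[ ]] is one token (no space between the two brackets), and several
# tests can share a single [[ ]] joined with &&
if [[ $CURR_MONTH == $MONTH && $CURR_DAY == $DAY && $CURR_HOUR == $HOUR ]]; then
    # arithmetic context; DL_SIZE=$DL_SIZE+$CURR_DL_SIZE would concatenate strings
    (( DL_SIZE += CURR_DL_SIZE ))
    (( DL_RATE += CURR_DL_RATE ))
fi

echo "$MONTH $DAY $HOUR total $DL_SIZE bytes"
```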
 
  

