Linux - Newbie: This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
What I want is to find out the transfer size per day (the 8th field) and per hour. I have worked out how to get the number of transfers per day, but how to break it down by day and hour eludes me.
This is what I got so far:
Code:
## reads number of downloads per day
awk '{count[$3]++} END {for(j in count) print j,"("count[j]" downloads)"}' xferlog*
## reads total transfer size from all available logs, i.e. the last 30 days --> works now thanks to HMW
awk '{sum+=$8} END {print sum}' xferlog*
## reads number of downloads by the hour --> doesn't work
awk '{count[$4]++} END {for(j in count) print j,"("count[j]" downloads)"}' xferlog*
Can anyone help me out?
# added below for clarity
The problem I have is that if I just added all the 01am downloads together, I would get the 1am downloads for every day combined.
What I am looking for is something like this:
I am not sure how to structure the bash array within awk to get that.
I also need to strip the time from hh:mm:ss format down to just hh so I can aggregate the hourly transfers. --> this part I am fairly certain I have almost got. sed to the punishment!
Thanks
Last edited by jzoudavy; 10-06-2015 at 02:15 PM.
Reason: updated the question
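The per-day, per-hour aggregation the question asks for can be sketched in a single awk pass by keying the sum on both fields. This is only a sketch, with the field positions assumed from the xferlog samples later in the thread ($3 = day of month, $4 = hh:mm:ss, $8 = bytes), and the sample file name is made up for the demo:

```shell
# Sample lines in the assumed xferlog layout:
# weekday month day hh:mm:ss year count ip bytes path
cat > xferlog.sample <<'EOF'
Tue Sep 1 17:47:27 2015 1 192.168.10.2 10 /home/a.tar.gz
Tue Sep 1 17:49:27 2015 1 192.168.10.2 20 /home/a.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 40 /home/a.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 60 /home/a.tar.gz
EOF
# Split hh:mm:ss on ":" and keep only t[1] (the hour), then key the
# running byte total on "day hour" so each day/hour pair sums separately.
awk '{
    split($4, t, ":")
    bytes[$3 " " t[1]] += $8
} END {
    for (k in bytes) print k, "(" bytes[k] " bytes)"
}' xferlog.sample
# Expected lines (for-in order is unspecified in awk):
# 1 17 (30 bytes)
# 1 18 (40 bytes)
# 2 10 (60 bytes)
```

No sed pass is needed for the hh extraction; awk's split() handles it inline.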
ummmm ... I am a little confused (easily done sometimes). The OP said: "reads size of total monthly transfer". Now I am not knocking HMW's solution, as it is part way there on what I read that to mean, but the OP has come back and said it is correct. Using the example data from HMW, I would have thought the following would be the correct output:
Code:
# input data
Tue Oct 6 09:07:49 2015 1 192.168.10.1 10 /home/15.2.129.tar.gz
Tue Sep 1 17:49:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 30 /home/15.3.129.tar.gz
# output per month
Oct 10
Sep 50
Please advise exactly the type of data you want or if the suggested solution is actually all you wanted?
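A minimal awk sketch that would produce exactly that per-month breakdown (same assumed layout: $2 = month name, $8 = size; the sample file name is just for the demo):

```shell
# Reproduce HMW's example data
cat > xferlog.sample <<'EOF'
Tue Oct 6 09:07:49 2015 1 192.168.10.1 10 /home/15.2.129.tar.gz
Tue Sep 1 17:49:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 30 /home/15.3.129.tar.gz
EOF
# Sum field 8 (size) keyed by field 2 (month); one output line per month.
awk '{sum[$2] += $8} END {for (m in sum) print m, sum[m]}' xferlog.sample
# Expected (for-in order is unspecified): "Oct 10" and "Sep 50"
```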
Sorry for the confusion. The current data set I have available is from the past month, Sept 6 till Oct 6. So that is one month for me, or at least close enough for my purposes. I should have been clearer and said the past 30 days or so.
And that is fine, but should you have, say, 2 months' worth in a single file, the current solution will only give you the total of all entries in the file and not break it down.
I just wanted you to be aware.
Plus, as you also mentioned by hour, you will need more code to provide that level of detail.
May I also mention that you can create a complete awk script instead of several one-line awk calls in a bash script:
Code:
#!/usr/bin/awk -f
<your_code_here>
Then if you make it executable you can run it as you would your bash script.
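For instance (the script name here is just an illustration, and the one-rule body is a stand-in for your real code):

```shell
# Write a minimal awk program with the awk-interpreter shebang.
cat > xferstats.awk <<'EOF'
#!/usr/bin/awk -f
{ sum += $8 }            # placeholder rule: total field 8
END { print sum }
EOF
chmod +x xferstats.awk   # make it executable, like any script
# A tiny two-line input where field 8 is 5 and 7
printf 'a b c d e f g 5 h\na b c d e f g 7 h\n' > xferlog.sample
./xferstats.awk xferlog.sample
# prints 12
```

This assumes awk lives at /usr/bin/awk, which is the usual location on Linux.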
# using this input file
Tue Oct 6 09:07:49 2015 1 192.168.10.1 11 /home/15.2.129.tar.gz
Tue Sep 1 17:47:27 2015 1 192.168.10.2 10 /home/15.3.129.tar.gz
Tue Sep 1 17:48:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz
Tue Sep 1 17:49:27 2015 1 192.168.10.2 30 /home/15.3.129.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 40 /home/15.3.129.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 50 /home/15.3.129.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 60 /home/15.3.129.tar.gz
# and this code
#!/usr/bin/awk -f
NR > 1 && !( $2 in mon_sum ){
    print "Totals for the month of " month ":"
    for( d in per_day_per_hour )
        for( h in per_day_per_hour[d] )
            print " Day :- " d " has a hourly sum of :- " per_day_per_hour[d][h]
    print "Monthly total is :- " mon_sum[month]
    delete per_day_per_hour
}
{
    month = $2
    mon_sum[month] += $8
    split( $4, hour, ":" )
    per_day_per_hour[$3][hour[1]] += $8
}
END{
    print "Totals for the month of " month ":"
    for( d in per_day_per_hour )
        for( h in per_day_per_hour[d] )
            print " Day :- " d " has a hourly sum of :- " per_day_per_hour[d][h]
    print "Monthly total is :- " mon_sum[month]
}
# produces this output
Totals for the month of Oct:
Day :- 6 has a hourly sum of :- 11
Monthly total is :- 11
Totals for the month of Sep:
Day :- 1 has a hourly sum of :- 60
Day :- 1 has a hourly sum of :- 90
Day :- 2 has a hourly sum of :- 60
Monthly total is :- 210
I would of course put the repeated stuff in a function, but you get the idea
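One caveat worth noting: `per_day_per_hour[d][h]` is a true multidimensional array (an array of arrays), which requires GNU awk 4.0 or newer; mawk and older awks reject the syntax. A sketch of the same idea in awk's classic comma-subscript form, which is portable to any POSIX awk (sample file name is just for the demo):

```shell
cat > xferlog.sample <<'EOF'
Tue Sep 1 17:49:27 2015 1 192.168.10.2 30 /home/15.3.129.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 40 /home/15.3.129.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 60 /home/15.3.129.tar.gz
EOF
# The comma subscript joins its indices with SUBSEP internally,
# simulating a 2-D array with a single flat one.
awk '{
    split($4, t, ":")
    sum[$3, t[1]] += $8
} END {
    for (k in sum) {
        split(k, idx, SUBSEP)    # idx[1] = day, idx[2] = hour
        print "Day " idx[1] " hour " idx[2] " total " sum[k]
    }
}' xferlog.sample
# Expected lines (for-in order is unspecified):
# Day 1 hour 17 total 30
# Day 1 hour 18 total 40
# Day 2 hour 10 total 60
```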
Quote:
ummmm ... I am a little confused (easily done sometimes). The OP said: "reads size of total monthly transfer". Now I am not knocking HMW's solution, as it is part way there on what I read that to mean, but the OP has come back and said it is correct.
Ah, yes. Well spotted! For that level of detail I would probably have used a Python script instead. Nice awk script there grail!
Ok... turns out I managed to do this in Bash (with just a little awk in there). But the logic was trickier than I expected, or maybe it is the fact that I only got four hours of sleep last night. Anyway, with this script:
Code:
#!/bin/bash
MONTH=""
HOUR=""
while read line; do
    CURR_MONTH=$(echo "$line" | awk '{ print $2 }')
    CURR_HOUR=$(echo "$line" | awk '{ print $8 }')
    if [[ $CURR_MONTH == $MONTH ]]; then
        ((HOUR=$HOUR+$CURR_HOUR))
    else # Update $MONTH and reset & add new hour(s) to new $MONTH
        if [[ $MONTH != "" ]]; then
            echo "Month $MONTH total $HOUR hours"
        fi
        MONTH=$(echo "$line" | awk '{ print $2 }')
        HOUR=$CURR_HOUR
    fi
done < vsftp.log
# If we have reached EOF, print out the final month
echo "Month $MONTH total $HOUR hours"
And with the same infile as grail, I get this output:
Code:
$ ./parselog.sh
Month Oct total 11 hours
Month Sep total 210 hours
My script is not as fancy as grail's, but I just wanted to give it a go based on the month variable.
I like the idea HMW, but I am generally not a fan of using outside commands in my bash scripts unless I really have to, or unless it gives a huge speed boost.
Also, a quick nitpick: you are totalling the downloaded data, not the hours.
So here are a couple of quick variants to show what I would do in bash (again just for downloads per month):
Code:
#!/usr/bin/env bash
MONTH=""
# Below you have 2 alternative while/read combos
# option 1
#while read _ c_month _ _ _ _ _ c_size _; do
# option 2 (if you want to try option 1, comment the next 3 lines)
while read -a data; do
    c_month=${data[1]}
    c_size=${data[7]}
    if [[ $c_month == $MONTH ]]; then
        (( size += c_size ))
    else # Update $MONTH and reset & add new size(s) to new $MONTH
        [[ $MONTH ]] && echo "Month $MONTH total $size size"
        MONTH=$c_month
        size=$c_size
    fi
done < "$1"
# If we have reached EOF, print out the final month
echo "Month $MONTH total $size size"
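For anyone following along, here is a self-contained rerun of that loop against the thread's sample data, so you can see the output without creating a log file by hand (the mktemp file is just for the demo):

```shell
#!/usr/bin/env bash
# Recreate the sample infile from earlier in the thread.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Tue Oct 6 09:07:49 2015 1 192.168.10.1 11 /home/15.2.129.tar.gz
Tue Sep 1 17:47:27 2015 1 192.168.10.2 10 /home/15.3.129.tar.gz
Tue Sep 1 17:48:27 2015 1 192.168.10.2 20 /home/15.3.129.tar.gz
Tue Sep 1 17:49:27 2015 1 192.168.10.2 30 /home/15.3.129.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 40 /home/15.3.129.tar.gz
Tue Sep 1 18:49:27 2015 1 192.168.10.2 50 /home/15.3.129.tar.gz
Wed Sep 2 10:34:01 2015 1 192.168.10.11 60 /home/15.3.129.tar.gz
EOF
MONTH=""
# Same per-month accumulation: data[1] = month, data[7] = size.
while read -a data; do
    c_month=${data[1]}
    c_size=${data[7]}
    if [[ $c_month == $MONTH ]]; then
        (( size += c_size ))
    else
        [[ $MONTH ]] && echo "Month $MONTH total $size size"
        MONTH=$c_month
        size=$c_size
    fi
done < "$infile"
echo "Month $MONTH total $size size"
rm -f "$infile"
# Prints:
# Month Oct total 11 size
# Month Sep total 210 size
```

Note this relies on the file being grouped by month, as the sample is; out-of-order months would each start a fresh total.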
Actually, I have a follow-up question.
I have sanitized the input a bit more to make life easier.
For the DL rate I am simply skipping everything after the decimal point.
It was like this: 67402.46Kbyte/sec, but that doesn't work arithmetically, so I just used sed to separate the decimal part and the Kbyte/sec, and use just 67402.
Code:
Oct 7 13 36 42 208430626 bytes, 67402 46 Kbyte/sec
Oct 7 13 36 53 7004609 bytes, 55082 20 Kbyte/sec
Oct 7 13 36 53 7004596 bytes, 38641 45 Kbyte/sec
Oct 7 13 36 53 7004326 bytes, 53266 48 Kbyte/sec
Oct 7 13 36 53 7003780 bytes, 48976 23 Kbyte/sec
Oct 7 13 37 23 11188721 bytes, 57261 59 Kbyte/sec
Oct 7 13 37 23 11187409 bytes, 49023 38 Kbyte/sec
Oct 7 13 38 12 2013066706 bytes, 45416 61 Kbyte/sec
Oct 7 13 38 15 2344883553 bytes, 48741 71 Kbyte/sec
Oct 6 09 07 49 1170448 bytes, 28916 61 Kbyte/sec
and modified/experimented with the scripts you have both provided, ending up with the version below, but I keep getting banged at lines 16 and 21 with the brackets.
Code:
#!/bin/bash
# declare everything
MONTH=""
DAY=""
HOUR=""
DL_SIZE=0
DL_RATE=0
# read in via array; if I use awk then DL_SIZE gets treated as a string and the whole thing becomes string concatenation
while read -a line; do
    CURR_MONTH=${line[1]}
    CURR_DAY=${line[2]}
    CURR_HOUR=${line[3]}
    CURR_DL_SIZE=${line[6]}
    CURR_DL_RATE=${line[8]}
    # if same month, same day and same hour, add the DL size and DL rate; DL rate is for avg hourly transfer rate, which will be implemented later
    if [ [ $CURR_MONTH == $MONTH ] && [ $CURR_DAY == $DAY ] && [ $CURR_HOUR == $HOUR ] ]
    then
        DL_SIZE=$DL_SIZE+$CURR_DL_SIZE
        DL_RATE=$DL_RATE+$CURR_DL_RATE
    else # Update $MONTH and $DAY and $HOUR
        if [ [ $MONTH != "" ] ]
        then
            echo "$MONTH $Day $HOUR total $DL_SIZE bytes " # introduce average DL rate later
        fi
        MONTH=${line[1]}
        DAY=${line[2]}
        HOUR=${line[3]}
    fi
done < simplified.vsftpd.log
# If we have reached EOF, print out the final month
echo "$MONTH $Day $HOUR total $DL_SIZE bytes "
When I run it I get this result and I have no idea why... lines 16 and 21 are my if statements.
Code:
./ftp_analyzer.sh: line 16: [: 7: binary operator expected
./ftp_analyzer.sh: line 22: [: too many arguments
./ftp_analyzer.sh: line 16: [: too many arguments
./ftp_analyzer.sh: line 22: [: too many arguments
./ftp_analyzer.sh: line 16: [: too many arguments
./ftp_analyzer.sh: line 22: [: too many arguments
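For reference, those errors come from the `[ [` spacing: `[[ ... ]]` is a single bash keyword, whereas `[ [` is parsed as the `[` command receiving a stray `[` argument, which is what produces "binary operator expected" and "too many arguments". There is also a quieter second bug: `DL_SIZE=$DL_SIZE+$CURR_DL_SIZE` builds the string "0+208430626" rather than adding. A minimal sketch of the corrected test and arithmetic, using variable names and a value taken from the script and log above:

```shell
# [[ ... ]] is one keyword: no space inside the brackets,
# and && / == work directly within it.
CURR_MONTH=Oct; MONTH=Oct
CURR_DAY=7;     DAY=7
CURR_HOUR=13;   HOUR=13
DL_SIZE=0;      CURR_DL_SIZE=208430626
if [[ $CURR_MONTH == "$MONTH" && $CURR_DAY == "$DAY" && $CURR_HOUR == "$HOUR" ]]
then
    (( DL_SIZE += CURR_DL_SIZE ))   # arithmetic add, not string concatenation
fi
echo "$DL_SIZE"   # prints 208430626
```

The inner `[[ $MONTH != "" ]]` test needs the same fix.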