[SOLVED] [Bash] Totalling & Averaging in one go

blenderfox · 07-03-2013, 04:15 AM

I want to count the files created within each minute in the current directory, then display the per-minute counts and the final average. I can do the first part with this code (not the best, but it does its job):

Code:

for a in `ls -l | awk --field-separator=" " '{ print $8 }' | sort -u`
  do
    echo "$a,`ls -l | grep $a | wc -l`"
  done

Which gives me output like this:

Code:

09:01,14
09:02,22
...
...
09:22,16

How do I total up the second fields (14, 22, ..., 16, etc.), compute the average, and get something like:

Code:

09:01,14
09:02,22
....
....
09:22,16
Average: 20

NevemTeve · 07-03-2013, 04:20 AM

Use an actual programming language such as C, Perl, or awk.
(For the first part, find(1) would be a better tool.)

blenderfox · 07-03-2013, 06:52 AM

Found one way of doing it:

Code:

if [ -f timing.tmp ]; then
  rm timing.tmp
fi

for a in `ls -l | awk --field-separator=" " '{ print $8 }' | sort -u`
  do
    echo Time=$a, Count=`ls -l | grep $a | wc -l`
    ls -l | grep $a | wc -l >>timing.tmp
  done

awk '{ s += $1 } END { print "Sum: ", s,", Avg: ", s/NR, ", Count: ", NR }' timing.tmp
  
rm timing.tmp

Example output:

Code:

Time=09:01, Count=14
Time=09:02, Count=22
Time=09:03, Count=18
...
...
Time=09:21, Count=22
Time=09:22, Count=16
Sum:  444 , Avg:  20.1818 , Count:  22

grail · 07-03-2013, 07:09 AM

Ok, so a few things:

1. Do not parse ls; see here for more. The main reason is that the awk you are using will fall in a screaming heap if any file name contains white space.

2. Why repeat the same code twice when you can just place its result in a variable: ls -l | grep $a | wc -l

3. You seem to be ok at using awk, so, as advised originally, why not use it to do the work you require?

4. There would be no need for a temp file if using awk (or one of the other languages suggested).
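
Points 3 and 4 could look something like the sketch below: a single awk pass echoes each line, totals the second field, and prints the average at the end, so no temp file is involved. The function name summarize_counts is made up for illustration, and the input is assumed to be the "HH:MM,count" lines from the first post.

```shell
# Hypothetical helper: read "HH:MM,count" lines on stdin, echo them,
# then print the average of the second field.
summarize_counts() {
  awk -F, '
    { print; sum += $2 }                                  # pass the line through, total field 2
    END { if (NR) printf "Average: %.4f\n", sum / NR }    # NR = number of lines read
  '
}

# Sample lines taken from the first post
printf '%s\n' '09:01,14' '09:02,22' '09:22,16' | summarize_counts
```

The loop from the first post could be piped straight into such a function in place of the printf line.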

H_TeXMeX_H · 07-03-2013, 07:18 AM

I recommend using 'stat' instead of 'ls'; it can display the same information, but in a more predictable manner.
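
A rough sketch of that idea, assuming GNU stat (the -c option and the %y format come from GNU coreutils and may not exist elsewhere). The function name count_per_minute is hypothetical. It keys on the full date plus hour and minute, and it copes with white space in file names because nothing is parsed out of ls.

```shell
# Hypothetical helper: count regular files per modification minute in a directory.
# Assumes GNU stat, whose %y format prints e.g. "2013-07-04 11:49:23.000000000 +0100".
count_per_minute() {
  for f in "${1:-.}"/*; do
    [ -f "$f" ] || continue              # skip directories and other non-regular files
    stat -c '%y' -- "$f" | cut -c1-16    # keep only "YYYY-MM-DD HH:MM"
  done | sort | uniq -c | awk '{ print $2 " " $3 "," $1 }'
}

count_per_minute .
```

Each output line has the form "YYYY-MM-DD HH:MM,count".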

blenderfox · 07-03-2013, 07:43 AM

Quote:

Originally Posted by grail

Ok, so a few things:

1. Do not parse ls; see here for more. The main reason is that the awk you are using will fall in a screaming heap if any file name contains white space.

2. Why repeat the same code twice when you can just place its result in a variable: ls -l | grep $a | wc -l

3. You seem to be ok at using awk, so, as advised originally, why not use it to do the work you require?

4. There would be no need for a temp file if using awk (or one of the other languages suggested).

Well, you learn by experimenting then refining. But thanks for the links, I'll definitely take a look at them.

EDIT: Actually, on the topic of refining, how would this be done totally in awk? I only know awk to a limited level, so this could help improve my knowledge.

danielbmartin · 07-03-2013, 08:32 AM

Quote:

Originally Posted by blenderfox

... how would this be done totally in awk?

This is an example of computing column-wise averages using awk. Adapt it to your own application.

With this InFile ...

Code:

20 30 50
18 32 55
22 34 60

... this awk ...

Code:

awk '{for(i=1;i<=NF;i++){num[i]++; sum[i]+=$i} print}
  END{for(i=1;i<=NF;i++) $i=sum[i]/num[i];print}' $InFile >$OutFile

... produced this OutFile ...

Code:

20 30 50
18 32 55
22 34 60
20 32 55

Daniel B. Martin
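
One possible adaptation to the original question, offered as a sketch rather than a definitive answer: find lists each file's modification time to the minute, and awk does all the counting and averaging. It assumes GNU find (for -printf), and the function name per_minute_report is hypothetical.

```shell
# Hypothetical helper: per-minute file counts plus sum and average.
# Assumes GNU find; %TY-%Tm-%Td %TH:%TM is the modification time to the minute.
per_minute_report() {
  find "${1:-.}" -maxdepth 1 -type f -printf '%TY-%Tm-%Td %TH:%TM\n' |
    sort |
    awk '
      # A new minute starts: print the finished group, then reset the counter
      $0 != prev { if (NR > 1) print prev "," n; prev = $0; n = 0; groups++ }
      { n++ }   # count this file in the current minute
      END {
        if (NR) {
          print prev "," n
          printf "Sum: %d, Avg: %.4f, Count: %d\n", NR, NR / groups, groups
        }
      }
    '
}
```

Running per_minute_report with no argument reports on the current directory. Sorting first lets awk work in a single streaming pass without holding an array of counts.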

grail · 07-03-2013, 09:56 AM

Interestingly, after looking a little further at your solutions: you do realise that you are returning only the time portion? (This is another reason not to use ls, as on my computer $8 is the file name.)
So while the time of the file may be '09:01' (from your example), the actual date could be from any point in time, i.e. 30.05.2013 09:01 and 03.07.2013 09:01 would be counted together.
One would guess that this would not be the desired output (I could be wrong).

blenderfox · 07-04-2013, 01:20 AM

Quote:

Originally Posted by grail

Interestingly, after looking a little further at your solutions: you do realise that you are returning only the time portion? (This is another reason not to use ls, as on my computer $8 is the file name.)
So while the time of the file may be '09:01' (from your example), the actual date could be from any point in time, i.e. 30.05.2013 09:01 and 03.07.2013 09:01 would be counted together.
One would guess that this would not be the desired output (I could be wrong).

Yes, I appreciate that might be the case. However, this script was meant as a quick fix. Once it worked (which it does), then comes the refining and improving, which all the contributions here are helping with, so thank you all for your comments. I'm not an expert at scripting by any means, and as with most programming and scripting, there are many ways to do the same thing, although some ways are better and/or more efficient. I forgot about the iteration and looping constructs in awk, so I have to thank danielbmartin for that.

konsolebox · 07-04-2013, 02:14 AM

I second H_TeXMeX_H's suggestion of using stat. As a proof of concept, you can have a script like this, which uses an associative array for the counts and exploits the fact that Bash expands the indices of an indexed array in ascending order, so the output is always sorted by timestamp:

Code:

#!/bin/bash

[[ BASH_VERSINFO -ge 4 ]] || {
	echo "You need at least Bash version 4.0 to run this script." >&2
	exit 1
}

declare -a LIST=()
declare -A COUNTS=()
declare -i TOTALFILES=0

for FILE in *; do
	if [[ -f $FILE ]]; then
		DATESTRING=$(exec stat -c '%y' "$FILE")
		DATESTRING=${DATESTRING%:*}
		TIMESTAMP=$(exec date -d "$DATESTRING" '+%s')
		LIST[TIMESTAMP]="$DATESTRING"
		(( ++COUNTS[$DATESTRING] ))
		(( ++TOTALFILES ))
	fi
done

for I in "${!LIST[@]}"; do
	DATESTRING=${LIST[I]}
	COUNT=${COUNTS[$DATESTRING]}
	echo "Time: $DATESTRING, Count: $COUNT"
done

TOTALTIMES=${#LIST[@]}
AVERAGE10000=$(( TOTALFILES * 10000 / TOTALTIMES ))
if [[ ${#AVERAGE10000} -gt 4 ]]; then
	INT=${AVERAGE10000:0:(-4)}
else
	INT=0
fi
DEC=0000${AVERAGE10000}; DEC=${DEC:(-4)}

echo "Sum: $TOTALFILES, Avg: $INT.$DEC, Count: $TOTALTIMES"

Example output:

Code:

Time: 2013-05-12 06:44, Count: 1
Time: 2013-05-14 12:09, Count: 1
Time: 2013-05-27 05:10, Count: 1
Time: 2013-05-27 15:02, Count: 1
Time: 2013-05-27 19:25, Count: 1
Time: 2013-05-27 19:32, Count: 1
Time: 2013-05-27 23:44, Count: 1
Time: 2013-06-05 15:12, Count: 1
Time: 2013-06-07 10:52, Count: 1
Time: 2013-06-07 17:44, Count: 1
Time: 2013-06-28 10:42, Count: 1
Time: 2013-06-28 11:21, Count: 1
Time: 2013-06-28 15:29, Count: 1
Time: 2013-06-28 17:07, Count: 1
Time: 2013-06-28 17:12, Count: 1
Time: 2013-06-28 17:14, Count: 1
Time: 2013-07-02 22:11, Count: 1
Time: 2013-07-04 11:49, Count: 1
Time: 2013-07-04 15:10, Count: 1
Sum: 19, Avg: 1.0000, Count: 19

Additional Note: Following grail's idea, I decided to base it on full datetimes instead of just hours and minutes. I also based it on the modification time rather than the creation time, since creation time seems to be unsupported or disabled on my filesystem; you could change the format argument passed to the stat command if yours supports it.

You need at least version 4.0 of Bash to run the script, since it relies on associative arrays.
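
As a sketch of what changing the stat argument might look like: GNU stat's %w format prints the birth (creation) time, or "-" when it is unknown. The hypothetical helper file_minute below tries %w first and falls back to the modification time (%y). It assumes GNU stat and is not part of the script above.

```shell
# Hypothetical helper: print a file's timestamp truncated to the minute.
# Assumes GNU stat: %w is the birth time ("-" if unknown), %y the modification time.
file_minute() {
  ts=$(stat -c '%w' -- "$1")
  if [ "$ts" = "-" ]; then
    ts=$(stat -c '%y' -- "$1")   # fall back to the modification time
  fi
  printf '%s\n' "${ts%:*}"       # drop ":SS.nnnnnnnnn +ZZZZ", as the script above does
}
```

In the script, the DATESTRING assignment could then call such a helper instead of invoking stat directly.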

blenderfox · 07-04-2013, 02:17 AM

@konsolebox - that looks perfect. A lot of extra lines of code compared to the other solutions, but like I said previously, there's always more than one way to do the same thing.

konsolebox · 07-04-2013, 02:30 AM

Quote:

Originally Posted by blenderfox

but like I said previously, there's always more than one way to do the same thing.

Obviously, that is why I'm just showing a proof of concept. Yet again, what was it that you really needed at first, and do you plan to change that now? Still, having one good solution I believe this thread could be marked as solved already.

And why do you have to look for another way? I don't think you could do it better in Awk, although you could do it better with interpreted languages.

blenderfox · 07-04-2013, 02:31 AM

Quote:

Originally Posted by konsolebox

Obviously, that is why I'm just showing a proof of concept. Yet again, what was it that you really needed at first, and do you plan to change that now? Still, having one good solution I believe this thread could be marked as solved already.

Yep, will mark it solved. Thanks for all the contributions.

grail · 07-04-2013, 06:01 AM

And here is a Ruby option:

Code:

ruby -e 'BEGIN{ list={}; 
                      list.default = 0;
                      total_f = 0
                    };
               $*.each{ |f| list[File.mtime(f).strftime("%F %R")] += 1; 
                            total_f += 1
                      };
               END{ list.each{|k,v| puts "Time: #{k}, Count: #{v}" }; 
                    puts "Sum: #{total_f}, Avg: #{total_f/list.length.to_f}, Count: #{list.length}"
                  }
               ' *

blenderfox · 07-04-2013, 06:04 AM

I'm not too familiar with Ruby, but thanks for that as well.

Quote:

Originally Posted by grail

And here is a Ruby option:

Code:

ruby -e 'BEGIN{ list={}; 
                      list.default = 0;
                      total_f = 0
                    };
               $*.each{ |f| list[File.mtime(f).strftime("%F %R")] += 1; 
                            total_f += 1
                      };
               END{ list.each{|k,v| puts "Time: #{k}, Count: #{v}" }; 
                    puts "Sum: #{total_f}, Avg: #{total_f/list.length.to_f}, Count: #{list.length}"
                  }
               ' *