LinuxQuestions.org
07-03-2013, 04:15 AM   #1
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Rep: Reputation: Disabled
[Bash] Totalling & Averaging in one go


I want to count the files created within each minute in the current directory, display the per-minute counts, and then show the final average. I can do the first part with this code (not the best, but it does its job):

Code:
for a in `ls -l | awk --field-separator=" " '{ print $8 }' | sort -u`
  do
    echo "$a,`ls -l | grep $a | wc -l`"
  done
Which gives me output like this:

Code:
09:01,14
09:02,22
...
...
09:22,16
How do I total the second field (14, 22, ..., 16) and compute the average, so that I get something like:

Code:
09:01,14
09:02,22
....
....
09:22,16
Average: 20
 
07-03-2013, 04:20 AM   #2
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,862
Blog Entries: 1

Rep: Reputation: 1869
Use an actual programming language such as C, Perl, or awk.
(For the first part, find(1) would be a better tool.)
 
07-03-2013, 06:52 AM   #3
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
Found one way of doing it:

Code:
if [ -f timing.tmp ]; then
  rm timing.tmp
fi

for a in `ls -l | awk --field-separator=" " '{ print $8 }' | sort -u`
  do
    echo Time=$a, Count=`ls -l | grep $a | wc -l`
    ls -l | grep $a | wc -l >>timing.tmp
  done

awk '{ s += $1 } END { print "Sum: ", s,", Avg: ", s/NR, ", Count: ", NR }' timing.tmp
  
rm timing.tmp
Example output:

Code:
Time=09:01, Count=14
Time=09:02, Count=22
Time=09:03, Count=18
...
...
Time=09:21, Count=22
Time=09:22, Count=16
Sum:  444 , Avg:  20.1818 , Count:  22
 
07-03-2013, 07:09 AM   #4
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Ok, so a few things:

1. Do not parse ls; see here for more. The main reason is that the awk you are using will fall in a screaming heap if any file name contains white space.

2. Why run the same pipeline twice when you can store its result in a variable: ls -l | grep $a | wc -l

3. You seem to be ok at using awk, so, as advised originally, why not use it to do the work you require?

4. There would be no need for a temp file if using awk (or one of the other languages suggested).
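
To illustrate point 4, here is a minimal sketch (not code posted in this thread) in which awk reads the lines straight from a pipe, so no temp file is involved. The function name append_average is made up for the example, and the input is assumed to be the "HH:MM,count" lines shown in post #1.

```shell
# Read "HH:MM,count" lines on stdin, echo them unchanged, then print totals.
# append_average is a hypothetical helper name, not part of any tool.
append_average() {
    awk -F, '
        { print; sum += $2 }     # pass the line through and add up field 2
        END {
            if (NR > 0)          # avoid dividing by zero on empty input
                printf "Sum: %d, Avg: %.4f, Count: %d\n", sum, sum / NR, NR
        }
    '
}
```

Piping the output of the original loop into it (`for ...; do ...; done | append_average`) should print the per-minute lines followed by one summary line.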
 
07-03-2013, 07:18 AM   #5
H_TeXMeX_H
LQ Guru

Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
I recommend using 'stat' instead of 'ls'; it can display the same information, but in a more predictable format.
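
A minimal sketch of the stat approach, assuming GNU coreutils (`stat -c` is a GNU option; BSD stat uses different flags). The function name minutes_of_files is hypothetical. It prints the modification time of each regular file, cut down to the minute, and it copes with white space in file names because nothing is parsed out of ls output.

```shell
# Print "YYYY-MM-DD HH:MM" (modification time, to the minute) for every
# regular file in the directory given as $1 (default: current directory).
# Assumes GNU stat; minutes_of_files is a made-up name for this example.
minutes_of_files() {
    dir=${1:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip directories and unmatched globs
        # GNU stat %y looks like "2013-07-03 09:01:14.000000000 +0100";
        # the first 16 characters are the date plus hours and minutes.
        stat -c '%y' -- "$f" | cut -c1-16
    done
}
```

Something like `minutes_of_files | sort | uniq -c` would then give one count per minute.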
 
07-03-2013, 07:43 AM   #6
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
Well, you learn by experimenting, then refining. But thanks for the links, I'll definitely take a look at them.

EDIT: Actually, on the topic of refining, how would this be done entirely in awk? I know awk only to a limited level, so this could help improve my knowledge.

Last edited by blenderfox; 07-03-2013 at 07:44 AM.
 
07-03-2013, 08:32 AM   #7
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by blenderfox
... how would this be done totally in awk?
This is an example of computing column-wise averages using awk. Adapt it to your own application.

With this InFile ...
Code:
20 30 50
18 32 55
22 34 60
... this awk ...
Code:
awk '{ for (i = 1; i <= NF; i++) { num[i]++; sum[i] += $i } print }
  END { for (i = 1; i <= NF; i++) $i = sum[i] / num[i]; print }' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
20 30 50
18 32 55
22 34 60
20 32 55
Daniel B. Martin
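
Adapting that idea to the original problem, here is a sketch under some assumptions: the input is one time stamp per file, one per line (for instance "YYYY-MM-DD HH:MM"), and the function name count_and_average is invented for this example. Sorting first keeps identical minutes adjacent, so awk only has to compare each line with the previous one.

```shell
# Read one time stamp per line on stdin; print a count per distinct time
# stamp, then the sum, the average per time stamp, and the number of
# distinct time stamps. count_and_average is a hypothetical name.
count_and_average() {
    sort | awk '
        # Input is sorted, so identical lines are adjacent.
        NR > 1 && $0 != prev  { print "Time: " prev ", Count: " n; n = 0 }
        NR == 1 || $0 != prev { groups++ }      # a new distinct time stamp
        { prev = $0; n++ }
        END {
            if (NR == 0) exit                   # nothing to report
            print "Time: " prev ", Count: " n   # flush the last group
            printf "Sum: %d, Avg: %.4f, Count: %d\n", NR, NR / groups, groups
        }
    '
}
```

Because the grouping relies on sorted input rather than on an awk array, the per-minute lines come out in chronological order without any extra work.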
 
1 members found this post helpful.
07-03-2013, 09:56 AM   #8
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Interestingly, after looking a little further at your solutions: you do realise that you are grouping on the time portion alone? (That is another reason not to use ls, as on my computer $8 is the file name.)
While the time of a file may be '09:01' (from your example), the actual date could be from any point in time, e.g. 30.05.2013 09:01 and 03.07.2013 09:01 would be counted together.
One would guess that this is not the desired output (I could be wrong).
 
07-04-2013, 01:20 AM   #9
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
Yes, I appreciate that might be the case. However, this script was meant as a quick fix. Once it worked (which it does), the refining and improving comes next, which all the contributions here are helping with, so thank you all for your comments. I'm not an expert at scripting by any means, and as with most programming and scripting, there are many ways to do the same thing, although some ways are better and/or more efficient. I had forgotten about the iteration and looping constructs in awk, so I have to thank danielbmartin for that.
 
07-04-2013, 02:14 AM   #10
konsolebox
Senior Member

Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I second H_TeXMeX_H's suggestion of using stat. As a proof of concept, here is a script that uses associative arrays and exploits the fact that Bash always expands the indices of an indexed array in sorted order:
Code:
#!/bin/bash

[[ BASH_VERSINFO -ge 4 ]] || {
	echo "You need at least Bash version 4.0 to run this script." >&2
	exit 1
}

declare -a LIST=()
declare -A COUNTS=()
declare -i TOTALFILES=0

for FILE in *; do
	if [[ -f $FILE ]]; then
		DATESTRING=$(exec stat -c '%y' "$FILE")
		DATESTRING=${DATESTRING%:*}
		TIMESTAMP=$(exec date -d "$DATESTRING" '+%s')
		LIST[TIMESTAMP]="$DATESTRING"
		(( ++COUNTS[$DATESTRING] ))
		(( ++TOTALFILES ))
	fi
done

for I in "${!LIST[@]}"; do
	DATESTRING=${LIST[I]}
	COUNT=${COUNTS[$DATESTRING]}
	echo "Time: $DATESTRING, Count: $COUNT"
done

TOTALTIMES=${#LIST[@]}
AVERAGE10000=$(( TOTALFILES * 10000 / TOTALTIMES ))
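# Split the scaled average into its integer part and four decimal places
# (plain arithmetic avoids negative substring lengths, which need Bash 4.2).
INT=$(( AVERAGE10000 / 10000 ))
printf -v DEC '%04d' "$(( AVERAGE10000 % 10000 ))"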

echo "Sum: $TOTALFILES, Avg: $INT.$DEC, Count: $TOTALTIMES"
Example output:
Code:
Time: 2013-05-12 06:44, Count: 1
Time: 2013-05-14 12:09, Count: 1
Time: 2013-05-27 05:10, Count: 1
Time: 2013-05-27 15:02, Count: 1
Time: 2013-05-27 19:25, Count: 1
Time: 2013-05-27 19:32, Count: 1
Time: 2013-05-27 23:44, Count: 1
Time: 2013-06-05 15:12, Count: 1
Time: 2013-06-07 10:52, Count: 1
Time: 2013-06-07 17:44, Count: 1
Time: 2013-06-28 10:42, Count: 1
Time: 2013-06-28 11:21, Count: 1
Time: 2013-06-28 15:29, Count: 1
Time: 2013-06-28 17:07, Count: 1
Time: 2013-06-28 17:12, Count: 1
Time: 2013-06-28 17:14, Count: 1
Time: 2013-07-02 22:11, Count: 1
Time: 2013-07-04 11:49, Count: 1
Time: 2013-07-04 15:10, Count: 1
Sum: 19, Avg: 1.0000, Count: 19
Additional Note: Following grail's idea, I based it on full dates and times instead of just hours and minutes. I also used the modification time instead of the creation time, since creation time seems to be unsupported or disabled on my filesystem; you could change the format argument of the stat command to use it.

And you need at least version 4.0 of Bash to run the script.
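
On the creation-time point, here is a hedged sketch of using the creation time when the filesystem reports it. It assumes GNU stat, where the `%w` format prints the birth time, or `-` when it is unknown; the function name file_minute is hypothetical. When no birth time is available it falls back to the modification time (`%y`).

```shell
# Print "YYYY-MM-DD HH:MM" for the file given as $1, preferring the birth
# (creation) time and falling back to the modification time.
# Assumes GNU stat; file_minute is a made-up name for this example.
file_minute() {
    t=$(stat -c '%w' -- "$1") || return 1
    if [ "$t" = "-" ]; then
        # GNU stat prints "-" for %w when the birth time is not available.
        t=$(stat -c '%y' -- "$1") || return 1
    fi
    printf '%s\n' "$t" | cut -c1-16    # keep the date plus hours and minutes
}
```

In the script above, the same effect could be had by swapping the `stat -c '%y'` call for a helper like this one.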

Last edited by konsolebox; 07-04-2013 at 02:27 AM.
 
07-04-2013, 02:17 AM   #11
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
@konsolebox - that looks perfect. A lot of extra lines of code compared to the other solutions, but like I said previously, there's always more than one way to do the same thing.
 
07-04-2013, 02:30 AM   #12
konsolebox
Senior Member

Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
Originally Posted by blenderfox
but like I said previously, there's always more than one way to do the same thing.
Obviously, that is why I'm just showing a proof of concept. Then again, what was it that you really needed at first, and do you plan to change that now? Still, with one good solution available, I believe this thread could already be marked as solved.

And why do you need to look for another way? I don't think you could do that better in Awk, although you could do it better with other interpreted languages.

Last edited by konsolebox; 07-04-2013 at 02:33 AM.
 
07-04-2013, 02:31 AM   #13
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by konsolebox
Obviously, that is why I'm just showing a proof of concept. Then again, what was it that you really needed at first, and do you plan to change that now? Still, with one good solution available, I believe this thread could already be marked as solved.
Yep, will mark it solved. Thanks for all the contributions.
 
07-04-2013, 06:01 AM   #14
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
And here is a Ruby option:
Code:
ruby -e 'BEGIN{ list = {};
                list.default = 0;
                total_f = 0
              };
         $*.each{ |f| list[File.mtime(f).strftime("%F %R")] += 1;
                      total_f += 1
                };
         END{ list.each{ |k, v| puts "Time: #{k}, Count: #{v}" };
              puts "Sum: #{total_f}, Avg: #{total_f/list.length.to_f}, Count: #{list.length}"
            }
        ' *
 
07-04-2013, 06:04 AM   #15
blenderfox
LQ Newbie
 
Registered: Apr 2013
Location: London, UK
Distribution: Debian, Ubuntu, Linux Mint, Fedora
Posts: 24

Original Poster
Rep: Reputation: Disabled
I'm not too familiar with Ruby, but thanks for that as well

 
  

