Greetings,
I'm new to writing shell scripts, but after searching through this forum I've gotten most of what I want working. However, I'm still stuck on one issue. I am attempting to backload data from hundreds of large CSV files into RRD. Essentially, I want to reduce the hundreds of columns in hundreds of files to the four columns I need, in one file: the date in Unix time, plus the min, max, and average of a reading (say CPU) for the day, compiling all the days' data into one file. The problem is that I can't get my head around how to get the data to return on the same line.
data:
file1.csv
4 lines of headers I don't need
1-JUL-2006 17:41:00.00,111.42,xxx,yyy,zzz,etc..
1-JUL-2006 17:42:00.00,69.87,xxx,yyy,zzz,etc..
1-JUL-2006 17:43:00.00,101.10,xxx,yyy,zzz,etc..
etc...
file2.csv
4 lines of headers I don't need
2-JUL-2006 17:41:00.00,111.42,xxx,yyy,zzz,etc..
2-JUL-2006 17:42:00.00,69.87,xxx,yyy,zzz,etc..
2-JUL-2006 17:43:00.00,101.10,xxx,yyy,zzz,etc..
etc..
What I have is:
Code:
#!/bin/bash
destdir=/home/t4data/working
for f in "$destdir"/*.csv
do
    tail -n 1 "$f" | cut -c1-12 | while read line   # gives me the date for the file
    do
        date -d "$line" +%s   # convert to unix time
    done
    tail -n +5 "$f" | awk -F, '{ print $7 }' | awk -f sum.awk   # min,max,ave
done
where sum.awk is
Code:
$ cat sum.awk
{
    if (NR == 1) {
        sum = min = max = $1;
    } else {
        sum += $1;
        min = (min < $1) ? min : $1;
        max = (max > $1) ? max : $1;
    }
}
END {
    # print "The average is: " sum/NR " min: " min " max: " max;
    print " min: " min " max: " max " average: " sum/NR;
}
Output:
1153022400
min: 44.45 max: 666.22 average: 136.055
1153108800
min: 51.28 max: 1298.38 average: 374.275
etc...
I realise why the data returns on two separate lines, and I thought it would be simple to correct, but no matter what I try I can't seem to get to my desired format of:
1153022400 min: 44.45 max: 666.22 average: 136.055
1153108800 min: 51.28 max: 1298.38 average: 374.275
etc...
Thoughts?
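One possible approach, offered as a rough sketch rather than a tested fix for the real data: capture the timestamp and the statistics with command substitution, then emit both with a single printf so they land on one line. The function name summarize_csv is made up for illustration. The sketch assumes GNU date (for -d), that the reading is in column 7 as in the script above, and that the last line of each file starts with the date followed by a space. The awk logic from sum.awk is inlined here only to keep the example self-contained.

```shell
#!/bin/bash
# Sketch: print "<epoch> min: .. max: .. average: .." on ONE line per CSV file.
# Assumptions: GNU date (-d), reading in column 7, date is the text before
# the first space on the last line. summarize_csv is a hypothetical name.

summarize_csv() {
    local f=$1 day epoch stats

    # Date portion of the last line, e.g. "1-JUL-2006"
    day=$(tail -n 1 "$f" | cut -d' ' -f1)

    # Convert to unix time (midnight of that day in the local time zone)
    epoch=$(date -d "$day" +%s)

    # Skip the 4 header lines, then compute min, max and average of column 7.
    # printf without a trailing newline keeps the result on a single line.
    stats=$(tail -n +5 "$f" | awk -F, -v col=7 '
        NR == 1 { sum = min = max = $col; next }
        {
            sum += $col
            if ($col < min) min = $col
            if ($col > max) max = $col
        }
        END { if (NR) printf "min: %s max: %s average: %s", min, max, sum / NR }')

    # Both values are now in variables, so one printf joins them on one line
    printf '%s %s\n' "$epoch" "$stats"
}

destdir=${1:-/home/t4data/working}
for f in "$destdir"/*.csv
do
    [ -e "$f" ] || continue   # glob matched nothing
    summarize_csv "$f"
done
```

Saved as, say, backload.sh (a hypothetical name), it could be run as ./backload.sh /home/t4data/working > all_days.txt to collect every day in one file. Taking the date with cut -d' ' -f1 instead of cut -c1-12 is a deliberate change so that single-digit days do not drag in the first digit of the hour.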