[SOLVED] Using BASH to automate data processing and table generation.
Here are the code examples and the results they produced.
Code:
echo
echo "Method of LQ guru grail (using awk)"
[ -e "$OutFile01" ] && rm "$OutFile01"
for file in /home/daniel/Desktop/LQfiles/dbm330i*.txt; do
    # Mean of y/x over rows with y < 100; prints nothing if no row qualifies
    awk '$2 < 100 { tot[FILENAME] += $2 / $1; count[FILENAME]++ }
         END { for (f in tot) print f, tot[f] / count[f] }' "$file" >> "$OutFile01"
done
/home/daniel/Desktop/LQfiles/dbm330i01.txt 0.99
/home/daniel/Desktop/LQfiles/dbm330i02.txt 0.376068
/home/daniel/Desktop/LQfiles/dbm330i05.txt 0.700787
Code:
echo
echo "Method of LQ member millgates (using awk),"
echo " with suggested improvements of LQ guru grail"
[ -e "$OutFile02" ] && rm "$OutFile02"
for file in /home/daniel/Desktop/LQfiles/dbm330i*.txt; do
    # Kept rows (x, y, y/x) are sorted by y and written to "$file.sorted";
    # the per-file mean is appended to "$OutFile02".
    # Note: ifname exists only inside awk, so the shell redirect must use $file.
    awk -v ifname="$file" -v ofname="$OutFile02" '
        BEGIN { sum = 0; num = 0; OFS = "\t" }
        $2 > 100 { x = $2 / $1; sum += x; num++; print $1, $2, x | "sort -nk 2" }
        END { print ifname, sum / num >> ofname }
    ' "$file" > "${file}.sorted"
done
/home/daniel/Desktop/LQfiles/dbm330i01.txt 12.25
/home/daniel/Desktop/LQfiles/dbm330i02.txt 1.47183
/home/daniel/Desktop/LQfiles/dbm330i03.txt 1.4317
/home/daniel/Desktop/LQfiles/dbm330i04.txt 1.38984
/home/daniel/Desktop/LQfiles/dbm330i05.txt 1.15724
Code:
echo
echo "Method of LQ guru grail (using bash)"
[ -e "$OutFile03" ] && rm "$OutFile03"
for f in /home/daniel/Desktop/LQfiles/dbm330i*.txt
do
    tot=0
    count=0
    while read -r x y
    do
        if (( y < 100 ))
        then
            tot=$( echo "$tot + $y / $x" | bc -l )
            (( count++ ))
        fi
    done < "$f"
    # If no row matched, count is 0: bc reports a divide-by-zero error
    # and mean is left empty.
    mean=$( echo "$tot / $count" | bc -l )
    printf '%s\t%s\n' "$f" "$mean" >> "$OutFile03"
done
/home/daniel/Desktop/LQfiles/dbm330i01.txt .99000000000000000000
/home/daniel/Desktop/LQfiles/dbm330i02.txt .37606837606837606837
/home/daniel/Desktop/LQfiles/dbm330i03.txt
/home/daniel/Desktop/LQfiles/dbm330i04.txt
/home/daniel/Desktop/LQfiles/dbm330i05.txt .70078740157480314960
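The blank entries for dbm330i03.txt and dbm330i04.txt above are consistent with no row in those files having y < 100: count stays at 0, bc reports a divide-by-zero error on stderr, and mean comes back empty (the awk version simply prints nothing for such files). Below is a minimal sketch of one way to make that case explicit, assuming two whitespace-separated numeric columns with nonzero x. The helper name mean_below and the "NA" marker are illustrative and not from the thread.

```shell
#!/bin/bash
# mean_below FILE LIMIT  (hypothetical helper, not from the thread)
# Prints the mean of y/x over rows whose y (column 2) is below LIMIT,
# or "NA" when no row qualifies, instead of dividing by zero.
mean_below() {
    awk -v limit="$2" '
        NF >= 2 && $2 < limit { tot += $2 / $1; count++ }
        END {
            if (count > 0) printf "%.6g\n", tot / count
            else           print "NA"
        }
    ' "$1"
}

for f in /home/daniel/Desktop/LQfiles/dbm330i*.txt; do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
    printf '%s\t%s\n' "$f" "$(mean_below "$f" 100)"
done
```

Doing the arithmetic in awk rather than bc is a substitution made here for brevity; the same count check could equally be placed before the final bc call in the bash loop.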
Code:
echo
echo "Method of LQ member millgates (using sed+bc)"
[ -e "$OutFile04" ] && rm "$OutFile04"
for f in /home/daniel/Desktop/LQfiles/dbm330i*.txt; do
    # sed turns each "x y" row into a bc statement; bc then evaluates sum/cnt.
    # The braces avoid "$((", which the shell may first try to parse as
    # arithmetic expansion.
    mean=$( { sed -r 's_([0-9]+)\s([0-9]+)_if(\2>100){sum+=\2/\1;cnt+=1}_' "$f"; echo "sum/cnt"; } | bc -l )
    printf '%s\t%s\n' "$f" "$mean" >> "$OutFile04"
done
/home/daniel/Desktop/LQfiles/dbm330i01.txt 12.25000000000000000000
/home/daniel/Desktop/LQfiles/dbm330i02.txt 1.47182832284479490484
/home/daniel/Desktop/LQfiles/dbm330i03.txt 1.43170481444729154959
/home/daniel/Desktop/LQfiles/dbm330i04.txt 1.38984422635584763874
/home/daniel/Desktop/LQfiles/dbm330i05.txt 1.15724328447440575586
The results are not all alike. For those who contributed code: please examine my rendition of your post to make sure I didn't botch it.
For mine, you can see that the two scripts produce the same output apart from the number of digits after the decimal point, which is an easy fix on either side. They differ from the others because I used y < 100 and they used y > 100. This was driven by the second requirement:
Quote:
2) Delete all data points whose y-coordinate (column 2) was less than a specified value, in this case 100.
I of course read this wrong (or too quickly, as is normally the case) and saw "100" and "less than".
So making mine agree with the others is again a simple change:
Code:
# Awk (one call over all files; the order of "for (f in tot)" is unspecified)
awk '$2 >= 100 { tot[FILENAME] += $2 / $1; count[FILENAME]++ }
     END { for (f in tot) print f, tot[f] / count[f] }' /home/daniel/Desktop/LQfiles/dbm330i*.txt

# Bash
#!/bin/bash
for f in /home/daniel/Desktop/LQfiles/dbm330i*.txt
do
    tot=0
    count=0
    while read -r x y
    do
        if (( y >= 100 ))
        then
            # scale=6 limits bc to six digits after the decimal point
            tot=$( echo "scale=6; $tot + $y / $x" | bc )
            (( count++ ))
        fi
    done < "$f"
    mean=$( echo "scale=6; $tot / $count" | bc )
    printf '%s\t%s\n' "$f" "$mean" >> "$OutFile03"
done
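One detail worth checking: `>= 100` keeps a row whose y is exactly 100, while `> 100` (used in the other two methods) drops it, so the methods agree only when no file contains such a row. The quoted requirement deletes points less than 100, which corresponds to `>=`. The sketch below illustrates the difference on made-up sample data; the helper mean_at_least and the sample values are hypothetical, and it assumes two whitespace-separated numeric columns with nonzero x.

```shell
#!/bin/bash
# mean_at_least FILE MIN  (hypothetical helper, not from the thread)
# Prints the mean of y/x over rows whose y (column 2) is at least MIN,
# or "NA" when no row qualifies.
mean_at_least() {
    awk -v min="$2" '
        NF >= 2 && $2 >= min { tot += $2 / $1; count++ }
        END {
            if (count > 0) printf "%.6f\n", tot / count
            else           print "NA"
        }
    ' "$1"
}

# Made-up sample data: the first row sits exactly on the threshold.
sample=$(mktemp)
printf '50 100\n100 400\n10 5\n' > "$sample"

mean_at_least "$sample" 100   # keeps y = 100 and y = 400 (same as ">= 100")
mean_at_least "$sample" 101   # for integer y, same as "> 100": keeps only y = 400
rm -f "$sample"
```

On this sample the first call averages 100/50 and 400/100 and prints 3.000000, while the second prints 4.000000, so a single boundary row is enough to make the two conditions disagree.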
Thank you, grail, for minor corrections to your code. With those changes all results are equivalent.
Apologies to you, ta0kira, for omitting your code. No offense intended. As an inexperienced player in the Linux world, I had never even heard of Rscript. Consequently I could neither understand your code nor execute it.
Let's hope that OP benefits from the many ideas presented in this thread. For sure, I did.