LinuxQuestions.org
Old 07-27-2011, 12:24 AM   #1
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Rep: Reputation: Disabled
Bash script to read csv file with multiple length columns


I've searched everywhere and I can't come up with a good solution. Unfortunately I'm kind of stuck using a shell script to achieve the following.

I have the following type of data


0.46,0.45,0.43,0.42,0.43,0.52,0.57,0.65,0.69,0.70
0.71,0.95,0.95,1.00,1.02,1.03,1.02,1.16
1.21,1.41,1.42,1.40,1.40,1.39,1.39,1.35,1.45
1.67,1.66,1.65,1.65,1.63,1.65,1.68,1.66,1.64,1.60,1.58
1.56,1.52,1.47,1.42

For each line I need to find the average, min, and max. I've seen plenty of solutions where the number of columns is fixed. Unfortunately, my lines vary in length and can get pretty large.

My thought was to read each line individually into an array, then loop through the array and find the avg, min, and max that way, but I haven't had much luck.

I can read each line using a while loop but I'm having trouble with the array part, or perhaps that's not the best solution? Any suggestions, help is appreciated.
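
For the array part, a minimal sketch of how bash can split a line of any length into an array. The helper name count_fields is made up for illustration, and this only covers the splitting, not the arithmetic:

```shell
#!/usr/bin/env bash
# Sketch: split each CSV line into a bash array, whatever its length.
# count_fields is a hypothetical helper name, not something from the thread.
count_fields() {
    local -a fields
    # IFS=',' applies only to this read; -r keeps backslashes literal,
    # -a stores the comma-separated values in the array "fields".
    while IFS=',' read -r -a fields; do
        echo "${#fields[@]} values, first=${fields[0]}, last=${fields[${#fields[@]}-1]}"
    done
}

printf '%s\n' '0.46,0.45,0.43' '1.56,1.52,1.47,1.42' | count_fields
```

Looping over "${fields[@]}" then visits each value, however many there are on the line.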
 
Old 07-27-2011, 12:34 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Does it have to be bash? Bash is not naturally suited to this type of problem, either in ease of programming or in running speed.

If it has to be bash then it is possible so please ask.
 
Old 07-27-2011, 12:36 AM   #3
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, unfortunately it has to be bash or shell. I would personally much rather use ruby or almost anything else.
 
Old 07-27-2011, 12:58 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by japena
Yes, unfortunately it has to be bash or shell. I would personally much rather use ruby or almost anything else.
Before launching into a pure bash solution ...

bash does not have fractional arithmetic capability. The usual workaround is for bash to call an external command such as bc; expr, like bash itself, only does integer arithmetic. bash is a command shell: it is a way of running commands that also has some language constructs. Are you allowed to call awk from your bash script?
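
A small illustration of the integer-only arithmetic, assuming bc is installed:

```shell
#!/usr/bin/env bash
# Shell arithmetic is integer-only: the fractional part is discarded.
echo $(( 3 / 2 ))              # prints 1

# bc handles decimals; "scale" sets how many digits follow the decimal point.
echo "scale=2; 3 / 2" | bc     # prints 1.50
```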
 
Old 07-27-2011, 01:08 AM   #5
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, I was planning on using expr to do the sum while keeping a counter to be able to divide afterwards for the average. Yes, I'm able to use awk. I saw several examples that use awk, but all of them had a fixed number of columns, usually only 2 or 3, which doesn't work for me, so I didn't see how I could use awk.
 
Old 07-27-2011, 01:15 AM   #6
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
So it would be OK to use an awk script directly ... ?
 
Old 07-27-2011, 01:16 AM   #7
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes
 
Old 07-27-2011, 01:19 AM   #8
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
OK

I've got to go now but will help with an awk solution later if nobody else has by then
 
Old 07-27-2011, 01:23 AM   #9
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Great, I would really appreciate it. I should mention that I can't use expr after all because the file doesn't only contain integers.
 
Old 07-27-2011, 01:53 AM   #10
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Hi,

I had been thinking about a pure bash solution before you stated that awk is OK to use. Well, awk is definitely the way to go. However, since I spent some time thinking about pure bash I'd still like to present a clumsy pure bash solution:
Code:
IFS=',';while read line; do set -- $line; echo "10 k 0 ${line//,/+}+${#}/ p" | dc ; done < file
You will notice that values like '0.123' are just printed as '.123'. I am not sure if there is any way to tell 'bc' to format the output like a normal person would expect it. So I tried to compute the result with 'dc'. But it has the same problem regarding the formatting.
 
Old 07-27-2011, 02:17 AM   #11
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Ok,

an awk solution that calculates average, min and max values:
Code:
awk -F ',' '{min=$1;max=$1;a=0;for (i=1;i<=NF;i++) {a+=$i;if ($i < min){min=$i};if ($i > max){max=$i}};print "average: " a/NF " min: " min " max: " max}' file
Not sure if the results are needed for further processing. If so, then you might need an alternative output format.
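
As a sketch of one such alternative, the same loop can emit a plain "average,min,max" line that is easier to feed into another command. The wrapper name row_stats and the choice of two decimal places are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# row_stats is a hypothetical wrapper; it prints "average,min,max" per input line.
row_stats() {
    awk -F ',' '
    NF == 0 { next }                 # skip empty lines to avoid dividing by zero
    {
        min = $1; max = $1; sum = 0
        for (i = 1; i <= NF; i++) {
            sum += $i
            if ($i < min) min = $i
            if ($i > max) max = $i
        }
        # printf gives control over the number of decimals in the average
        printf "%.2f,%s,%s\n", sum / NF, min, max
    }' "$@"
}

printf '%s\n' '1.56,1.52,1.47,1.42' | row_stats
```

Called as "row_stats file" it reads the file; with no argument it reads standard input.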
 
Old 07-27-2011, 05:44 AM   #12
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by crts
values like '0.123' are just printed as '.123'. I am not sure if there is any way to tell 'bc' to format the output like a normal person would expect it.
AFAIK there is no way to tell bc to do that. You could capture the bc or dc output and format it with bash's printf:
Code:
IFS=','
while read -r line
do
    set -- $line    # split the line on commas; $# is now the number of values
    avg=$( echo "10 k 0 ${line//,/+}+${#}/ p" | dc )
    printf '%1.2f\n' "$avg"    # printf restores the leading zero
done < file
unset IFS # Effectively restores the default value

Last edited by catkin; 07-27-2011 at 05:46 AM. Reason: brevity and clarity
 
Old 07-27-2011, 09:18 AM   #13
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi crts, both solutions work great. I really have to start learning awk.

Not sure I understand this line in the bash solution

echo "10 k 0 ${line//,/+}+${#}/ p" | dc


Could you tell me what the "10 k 0" and "p" are?
 
Old 07-27-2011, 09:51 AM   #14
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Quote:
Originally Posted by japena
Hi crts, both solutions work great. I really have to start learning awk.

Not sure I understand this line in the bash solution

echo "10 k 0 ${line//,/+}+${#}/ p" | dc


Could you tell me what the "10 k 0" and "p" are?
Ok,

there are some things you need to know about dc:
1. it is a Reverse Polish notation (RPN) calculator
2. division is by default an integer division, i.e. 3/2 will return 1 as result. You have to explicitly set the precision to get the fractional part.

Let's break the above statement down:
${line//,/+}
This is bash's string substitution mechanism. Suppose we have the following input
a,b
Afterwards the input will be:
a+b

As I mentioned, dc is an RPN calculator, so it would expect input in the form of
a b +
This is not yet the case, so we need to manipulate the input a bit more. Instead of a complicated reordering I simply prepend a zero and append a plus:
0 a + b +
This is indeed a valid RPN expression and equivalent to a+b (infix notation).

${#} is the number of arguments that have been "created" by 'set -- $line'. This is what we need to divide by to get the average - in the example that would be 2. In RPN this looks like
0 a + b + 2 /
This is our expression that is equivalent to (a+b)/2. After it is calculated we need to tell dc to print the result. This is what 'p' does.
The '10 k' part sets the precision. As I said, division is by default an integer division. To get the fractional part we set '10 k', which tells dc to keep 10 digits after the decimal point and truncate the rest. E.g.:
'3 2 / p' will by default print 1
'2 k 3 2 / p' will print 1.50
'10 k 3 2 / p' will print 1.5000000000
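
The breakdown above can be tried step by step. A minimal sketch, assuming dc is installed and using one of the sample lines from the first post (the variable name dc_program is made up for illustration):

```shell
#!/usr/bin/env bash
# Sample line taken from the original post.
line='1.56,1.52,1.47,1.42'

IFS=','            # split on commas for the unquoted expansion below
set -- $line       # positional parameters become the individual values
unset IFS          # back to default word splitting

# ${line//,/+} turns the commas into plus signs; $# is the number of values.
dc_program="10 k 0 ${line//,/+}+${#}/ p"
echo "$dc_program"       # prints: 10 k 0 1.56+1.52+1.47+1.42+4/ p
echo "$dc_program" | dc  # prints: 1.4925000000
```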

The calculation is a stack based operation process. If you are not familiar with RPN then this will probably look a bit confusing at first. Read the link I provided and consult the manpage of dc for more information.

PS: I had a solution with bc first, which is an infix calculator. As I mentioned in a previous post, there was the problem with the formatting, so I experimented with dc to see if it has the same problem. It does.
I only posted the bash solution because I had been thinking about it before I knew that awk is OK to use. I do not really recommend it.
I posted the dc solution instead of bc because, well, I thought if I am going to post an ugly solution then it might as well be the ugliest one I came up with.
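
The bc version itself is not shown in this thread. A rough sketch of how it might look, assuming bc is installed; the helper name avg_bc and the printf formatting step are assumptions, not the original code:

```shell
#!/usr/bin/env bash
# avg_bc is a hypothetical helper: prints the average of each CSV line using bc.
avg_bc() {
    local line avg
    local -a values
    while IFS= read -r line; do
        # Split a copy of the line into an array only to count the values.
        IFS=',' read -r -a values <<< "$line"
        # bc is infix, so the sum can be written directly: (a+b+c)/n
        avg=$(echo "scale=10; (${line//,/+}) / ${#values[@]}" | bc)
        # bc prints .44 rather than 0.44; printf restores the leading zero.
        printf '%.2f\n' "$avg"
    done
}

printf '%s\n' '0.46,0.45,0.43,0.42' | avg_bc
```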

Last edited by crts; 07-27-2011 at 10:05 AM. Reason: typos
 
Old 07-27-2011, 12:58 PM   #15
japena
LQ Newbie
 
Registered: Jul 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks for the explanation, it makes a lot more sense now. I'm not familiar with RPN, which made it that much more confusing. I'm definitely going to go with the awk solution, as it's much more elegant and easier to understand. I have to do some more formatting, but I think I can take it from here. Just one more thing: what's the proper format for writing the awk command across multiple lines?
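
One common way to lay the awk command out over multiple lines is to keep the program in its own file and run it with awk -f. A sketch under that assumption, using a temporary file in place of a real script file so the example is self-contained:

```shell
#!/usr/bin/env bash
# Write the awk program to its own file; a temporary file stands in
# for a real script file here.
script=$(mktemp)
trap 'rm -f "$script"' EXIT

cat > "$script" <<'EOF'
# Runs once per input line; NF is the number of comma-separated fields.
{
    min = $1; max = $1; a = 0
    for (i = 1; i <= NF; i++) {
        a += $i
        if ($i < min) { min = $i }
        if ($i > max) { max = $i }
    }
    print "average: " a / NF " min: " min " max: " max
}
EOF

# -f reads the program from a file instead of the command line.
printf '%s\n' '0.71,0.95,0.95,1.00' | awk -F ',' -f "$script"
```

Inside the single quotes of a one-liner the same newlines and indentation work too, since awk treats a newline like a semicolon between statements.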
 
  

