LinuxQuestions.org
Support LQ: Use code LQ3 and save $3 on Domain Registration
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 10-08-2012, 07:49 AM   #1
umix
LQ Newbie
 
Registered: Oct 2011
Posts: 13

Rep: Reputation: Disabled
sum the third field of csv file ignoring the commas in double quotes


(Source_file : first row header)
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b

Output:
Sum of the third field=110


Thanks in advance

Last edited by umix; 10-08-2012 at 07:50 AM. Reason: thanks
 
Old 10-08-2012, 09:43 AM   #2
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch
Posts: 437

Rep: Reputation: 19
Quote:
Originally Posted by umix View Post
(Source_file : first row header)
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b

Output:
Sum of the third field=110


Thanks in advance
Code:
~ $ cat j.csv
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
~ $ awk 'BEGIN {FS=","; tot=0} {tot=tot+$4; next} END {print tot}' j.csv
110
~ $
To get literally "Sum of the third field=110" as output, the END statement should be
Code:
END {print "Sum of the third field="tot}

Last edited by porphyry5; 10-08-2012 at 09:47 AM. Reason: change output
 
Old 10-08-2012, 10:00 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,251

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
My question would be if the 'col2' field could perhaps contains more / less data and hence the column being summed may not be the same, ie. that they all happen to be in the fourth comma
separated field at present, like
Code:
col1,col2,col3,col4
abc,"xy,1M,4m",20,b
xyz,"ab",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
 
Old 10-08-2012, 10:22 AM   #4
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch
Posts: 437

Rep: Reputation: 19
Quote:
Originally Posted by grail View Post
My question would be if the 'col2' field could perhaps contains more / less data and hence the column being summed may not be the same, ie. that they all happen to be in the fourth comma
separated field at present, like
Code:
col1,col2,col3,col4
abc,"xy,1M,4m",20,b
xyz,"ab",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
Code:
awk 'BEGIN {FS=","; tot=0} {tot=tot+$(NF-1); next} END {print tot}' j.csv
110
But that assumes there is always one more field in the line than the desired field, so 6 of one, half-dozen of the other.
 
Old 10-08-2012, 10:27 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,251

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Quote:
But that assumes there is always one more field in the line than the desired field, so 6 of one, half-dozen of the other.
Well this also assumes the OP's query around ignoring the commas in double quotes is saying that quotes only appear in 'col2'.

I am curious though, why the use of 'next' in your script?
 
1 members found this post helpful.
Old 10-08-2012, 12:16 PM   #6
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch
Posts: 437

Rep: Reputation: 19
Quote:
Originally Posted by grail View Post
I am curious though, why the use of 'next' in your script?
Because I'm not that certain of awk's default behavior, and I often state what to do next when it is unnecessary. I guess 'next' is only required when there are subsidiary statements one wishes to skip.
 
Old 10-08-2012, 12:32 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957Reputation: 1957
This is the kind of problem gawk's FPAT feature was designed to handle:

http://www.gnu.org/software/gawk/man...y-Content.html
 
1 members found this post helpful.
Old 10-08-2012, 03:14 PM   #8
porphyry5
Member
 
Registered: Jul 2010
Location: oregon usa
Distribution: Slackware 14.1, Arch
Posts: 437

Rep: Reputation: 19
Quote:
Originally Posted by David the H. View Post
This is the kind of problem gawk's FPAT feature was designed to handle:

http://www.gnu.org/software/gawk/man...y-Content.html
That feature is a recent addition to gawk, only in version 4 and up. I'm still using 3.1.8, and my system's man gawk doesn't even mention FPAT. But good to know, so when a final slackware 14 is released, I'll be able to use it.
 
Old 10-08-2012, 04:40 PM   #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976Reputation: 1976
For older versions of awk, assuming the double quotes are well balanced and you're not intersted in the content of the quoted fields, you can simply remove them and split the record by the remaining commas, e.g.
Code:
awk '{gsub(/"[^"]+"/,""); split($0,c,","); sum+=c[3]} END{print sum}' file
Anyway, awk is not the right tool to parse tricky CSV files (specifically you cannot build a regexp FS to exclude commas inside quotes). The perl or python modules for parsing CSV files are more suitable to accomplish these tasks.

Last edited by colucix; 10-08-2012 at 04:43 PM.
 
Old 10-09-2012, 02:53 AM   #10
umix
LQ Newbie
 
Registered: Oct 2011
Posts: 13

Original Poster
Rep: Reputation: Disabled
Thank you all for your inputs....It works and is very nice experience. I come to know UNIX is very vast and interesting subject to learn.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Replace a field in quotes within a csv file hattori.hanzo Programming 3 05-29-2012 02:10 AM
Vi/ViM - how to remove commas between quotes Micro420 Programming 6 09-25-2009 01:40 AM
Parsing a comma separated CSV file where fields have commas in to trickyflash Linux - General 7 03-26-2009 04:30 PM
AWK: change a particular field in a csv file help help help!!!! haydar68 Programming 20 08-03-2008 02:10 AM
Removing quotes from a CSV file with Perl Grafbak Programming 5 07-25-2006 04:23 AM


All times are GMT -5. The time now is 06:51 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration