Sum the third field of a CSV file, ignoring commas inside double quotes
Source file (the first row is a header):
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b

Output: Sum of the third field=110

Thanks in advance
Quote:
Code:
~ $ cat j.csv
Code:
END {print "Sum of the third field="tot}
My question would be: what if the 'col2' field contains more or less data, so that the column being summed is not always in the same position? At present the values all happen to be in the fourth comma-separated field, like
Code:
col1,col2,col3,col4
Quote:
Code:
awk 'BEGIN {FS=","; tot=0} {tot=tot+$(NF-1); next} END {print tot}' j.csv
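A minimal sketch of the same right-to-left idea, assuming that only col2 can ever contain commas (col3 and col4 never do). It adds `NR > 1` to skip the header explicitly; in the one-liner above the header's "col3" simply converts to 0. The helper name `sum_second_to_last` is hypothetical, and the sample is recreated in a temporary file so the snippet runs on its own.

```shell
#!/bin/sh
# Sketch: sum the second-to-last comma-separated field, skipping the header.
# Assumption: only col2 may contain commas; col3 and col4 never do.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
EOF

# sum_second_to_last: hypothetical helper name, used only for this example.
sum_second_to_last() {
    # NR > 1 skips the header row; $(NF-1) counts from the right,
    # so extra commas inside col2 do not shift the summed column.
    awk -F',' 'NR > 1 { tot += $(NF-1) } END { print "Sum of the third field=" tot }' "$1"
}

sum_second_to_last "$sample"
```

Because `$(NF-1)` counts from the right, a row such as `x,"a,b,c",7,y` still yields 7; the approach breaks as soon as col3 or col4 themselves contain a comma.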
Quote:
I am curious, though: why do you use 'next' in your script?
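Independently of the author's reasons, what `next` does can be shown directly: it stops processing the current record and skips any rules that follow. In the quoted script it is the last statement of the only main rule, so removing it should not change the result. The three-line input and the function names below are made up for illustration.

```shell
#!/bin/sh
# Demonstrates awk's 'next'; input and function names are illustrative only.
with_next() {
    # 'next' ends processing of the current record, so the second rule never runs.
    printf '1\n2\n3\n' | awk '{ tot += $1; next } { print "second rule saw " $0 } END { print tot }'
}
without_next() {
    # Without 'next', every record passes through both rules.
    printf '1\n2\n3\n' | awk '{ tot += $1 } { print "second rule saw " $0 } END { print tot }'
}

with_next      # prints: 6
without_next   # prints three "second rule saw ..." lines, then 6
```

`next` only matters when later rules exist, for example `NR == 1 { next }` placed before the summing rule to skip a header.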
Quote:
This is the kind of problem gawk's FPAT feature was designed to handle:
http://www.gnu.org/software/gawk/man...y-Content.html
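A sketch along the lines of the FPAT example in the gawk manual, assuming gawk 4.0 or later (POSIX awk and mawk do not support FPAT). FPAT describes what a field looks like instead of what separates fields: here, either a run of non-comma characters or a double-quoted string that may contain commas. As written, the pattern requires every field to be non-empty. `sum_third_fpat` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch using gawk's FPAT (gawk >= 4.0 assumed).
# A field is either non-comma characters or a "quoted string" that may hold commas.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
EOF

# sum_third_fpat: hypothetical helper name, used only for this example.
sum_third_fpat() {
    gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
          NR > 1 { tot += $3 }    # $3 is the real third column
          END { print "Sum of the third field=" tot }' "$1"
}

sum_third_fpat "$sample"
```

With this pattern the quotes remain part of `$2` (`"xy,1M"`), and `$3` is the third column no matter how many commas col2 contains.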
Quote:
For older versions of awk, assuming the double quotes are well balanced and you're not interested in the content of the quoted fields, you can simply remove the quoted fields and split the record on the remaining commas, e.g.
Code:
awk '{gsub(/"[^"]+"/,""); split($0,c,","); sum+=c[3]} END{print sum}' file
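If the quoted content is needed, or the awk at hand lacks FPAT, a portable alternative (a swapped-in technique, not one proposed in this thread) is to walk each record character by character, hide commas that appear inside quotes behind a placeholder, and split on what remains. This sketch assumes balanced quotes, no newlines inside fields, and that the octal `\034` character never occurs in the data; `sum_third_portable` is a hypothetical name.

```shell
#!/bin/sh
# Sketch for POSIX awk: protect quoted commas, then split on the rest.
# Assumptions: balanced quotes, no embedded newlines, "\034" absent from the data.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
col1,col2,col3,col4
abc,"xy,1M",20,b
xyz,"ab,2N",25,b
fgh,"uv,1M",30,b
abc,"xy,1M",35,b
EOF

# sum_third_portable: hypothetical helper name, used only for this example.
sum_third_portable() {
    awk '
    NR == 1 { next }                              # skip the header row
    {
        line = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch == "\"") inq = !inq            # toggle on every double quote
            else if (ch == "," && inq) ch = "\034" # hide commas inside quotes
            line = line ch
        }
        split(line, f, ",")
        gsub("\034", ",", f[2])   # col2 restored intact (not needed for the sum)
        tot += f[3]
    }
    END { print "Sum of the third field=" tot }' "$1"
}

sum_third_portable "$sample"
```

Unlike the `gsub()` one-liner, the text of col2 survives in `f[2]`, so it can still be printed or tested alongside the sum.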
Thank you all for your input. It works, and this has been a very nice experience. I have come to realize that UNIX is a vast and interesting subject to learn.