LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 03-08-2008, 09:39 PM   #1
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654
awk question on handling *.CSV "text fields" in awk


I'm still more comfortable using sed than awk, and tried to reverse a solution I gave on another post.
Code:
cat example.csv
"Jim Jackson",100,10/12/04
"John Doe",150,10/12/08
"Jane Smith",50,10/13/08
 > awk -F, 'BEGIN {OFS="\n"; ORS="\n\n"} $1 ~ /^".*"$/ {$1 = gensub(/""/,"\"","g",gensub(/"(.*)"$/,"\\1","g",$1)); print $0 }' example.csv

Jim Jackson
100
10/12/04

John Doe
150
10/12/08

Jane Smith
50
10/13/08
I was wondering if there was a better, or more standard way in awk to remove the double quotes surrounding a text field, such as converting from some types of csv files to another form.

If you have an entry like: John "Skinny" Benson, oocalc will save a csv file with ""Skinny"" in the text field, so I modified the awk command a bit to handle this situation.
 
Old 03-09-2008, 12:22 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240Reputation: 240Reputation: 240
with your sample,
Code:
# awk -F"," '{gsub(/"/,"",$1);print $1,$2,$3}' file
Jim Jackson 100 10/12/04
John Doe 150 10/12/08
Jane Smith 50 10/13/08
however, i don't think this is what you meant right? more of something like this perhaps? with commas in between double quotes? then it will be different story
Code:
"Jim , Jackson",100,10/12/04
"John , Doe",150,10/12/08
"Jane , Smith",50,10/13/08
 
Old 03-09-2008, 01:19 AM   #3
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Original Poster
Rep: Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654
No, the first field contains a name. This is a contrived example but I don't think that a persons name in a spread sheet column is something that would be very odd.

It occurred to me that such a table might contain nicknames which would naturally be enclosed in quotes, so I entered three lines in oocalc and saved it as a .csv file:
Code:
cat example2.csv
"Jim Smith",010,10/10/08
"John Doe",121,10/11/08
"James ""Skinny"" Johnson",023,10/12/08
Quote:
Originally Posted by from post#1
If you have an entry like: John "Skinny" Benson, oocalc will save a csv file with ""Skinny"" in the text field, so I modified the awk command a bit to handle this situation.
Notice that I transformed the input sample to a form that has one line per field and an empty line between records.
For the answer to someone elses post, I included an awk command to translate from this type of file structure to a csv file. After testing it, I thought that doing the opposite would be a good learning exercise.

You solution doesn't work for the third entry. However it did show me how to modify it to do things differently:
Code:
awk -F, 'BEGIN {OFS="\n"; ORS="\n\n"} $1 ~ /^".*"$/ {gsub(/^"/,"",$1);gsub(/"$/,"",$1);gsub(/""/,"\"",$1); print $1"\n"$2"\n"$3 }' example2.csv
I had been looking at gensub and substr and thinking of functions returning a string and doing something like F1(F2()). Your example shows me how to use a F1()F2()F3() form by acting on the field variable "$1" directly.

Having to assign one of the fields so that the OFS and ORS variables would be updated was something I hadn't realized before today.
compare:
Code:
> echo -e 'ab cd ef\ngh ij kl\nmn op qr\n' | awk 'BEGIN { OFS="," } {print }'
ab cd ef
gh ij kl
mn op qr

> echo -e 'ab cd ef\ngh ij kl\nmn op qr\n' | awk 'BEGIN { OFS="," } {$1=$1;print }'
ab,cd,ef
gh,ij,kl
mn,op,qr
Thanks for your help.

Last edited by jschiwal; 03-09-2008 at 01:20 AM.
 
Old 03-09-2008, 03:27 AM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240Reputation: 240Reputation: 240
no problem. i didn't help much either. just so you might want to know, gsub() takes in alternation (regexp) so similar subs can be shortened.
Code:
awk '{ gsub(/pat1|pat2|pat3/,"replacement" ) }' file
 
Old 03-09-2008, 02:54 PM   #5
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
Your code doesn't won't work when any field other then the first is quoted, and you have to know the number of fields for it to work. This will get around both these problems, however its harder to work with this since it doesn't really split the file into meaningful records/fields.

Code:
gawk -F'""' '{gsub(/"/,"",$1);print $0(RT=="\n"?"\n":"")}' OFS='"' RS='"*,"*|\n' example.csv
If you just wanted a more condensed version of what you wrote, and don't care about the issues I mentioned above, then maybe something like this..
Code:
gawk -F'"*,"*|^"' '{gsub(/""/,"\"",$2);print $2,$3,$4}' OFS='\n' ORS='\n\n' example.csv
Anyways, if you're actually doing this to parse csv files (not just to learn awk) I recommend using something like python/perl/ruby, they will be a lot more powerful at handling this.
 
Old 03-09-2008, 06:25 PM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Original Poster
Rep: Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654Reputation: 654
Thank you for the generalized solution. Yes, I am looking at this to learn more about AWK. I haven't gotten around to learning the perl language yet. When I use a flat file to store records, I use a tab separated file. The only real example I have is storing file backup information at work and then producing a pdf catalog from these files. This is using cygwin on Win2000.

Last edited by jschiwal; 03-09-2008 at 06:28 PM.
 
Old 04-17-2008, 03:53 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,965
Blog Entries: 11

Rep: Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865
jschiwal,

Can you please contact me via e-Mail?

Ta,
Tink
 
Old 05-27-2010, 06:03 AM   #8
anantdhar
LQ Newbie
 
Registered: May 2010
Posts: 1

Rep: Reputation: 0
Quote:
Originally Posted by Tinkster View Post
jschiwal,

Can you please contact me via e-Mail?

Ta,
Tink
i have a file that looks like

john
3
5
4
Mary
7
8
4
Bob
2
6
4


i need to add the integers individually so the output should be

john
12
Mary
19
Bob
12

i have a command that adds all of them but i need to get sum before it gets a character.

awk '{s+=$1} END {print s}' filename

Please help?
 
Old 05-27-2010, 06:23 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,433

Rep: Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879Reputation: 1879
Why not raise your own question instead of hijacking an ancient one???
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sed/Awk: print lines between n'th and (n+1)'th match of "foo" xaverius Programming 17 08-20-2007 11:39 AM
Replacing "function(x)" with "x" using sed/awk/smth Griffon26 Linux - General 3 11-22-2006 10:47 AM
awk "for loop" question rharvey@cox Programming 5 05-03-2006 06:12 AM
search for specific text in fields using awk Helene Programming 2 04-23-2004 12:13 AM
awk/gawk handling of "delete" ? realos Programming 1 06-26-2003 09:45 AM


All times are GMT -5. The time now is 03:22 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration