Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
03-08-2008, 09:39 PM
|
#1
|
|
Moderator
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
awk question on handling *.CSV "text fields" in awk
I'm still more comfortable using sed than awk, and tried to reverse a solution I gave on another post.
Code:
cat example.csv
"Jim Jackson",100,10/12/04
"John Doe",150,10/12/08
"Jane Smith",50,10/13/08
> awk -F, 'BEGIN {OFS="\n"; ORS="\n\n"} $1 ~ /^".*"$/ {$1 = gensub(/""/,"\"","g",gensub(/"(.*)"$/,"\\1","g",$1)); print $0 }' example.csv
Jim Jackson
100
10/12/04
John Doe
150
10/12/08
Jane Smith
50
10/13/08
I was wondering if there was a better, or more standard way in awk to remove the double quotes surrounding a text field, such as converting from some types of csv files to another form.
If you have an entry like: John "Skinny" Benson, oocalc will save a csv file with ""Skinny"" in the text field, so I modified the awk command a bit to handle this situation.
|
|
|
|
03-09-2008, 12:22 AM
|
#2
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
with your sample,
Code:
# awk -F"," '{gsub(/"/,"",$1);print $1,$2,$3}' file
Jim Jackson 100 10/12/04
John Doe 150 10/12/08
Jane Smith 50 10/13/08
however, i don't think this is what you meant right? more of something like this perhaps? with commas in between double quotes? then it will be different story
Code:
"Jim , Jackson",100,10/12/04
"John , Doe",150,10/12/08
"Jane , Smith",50,10/13/08
|
|
|
|
03-09-2008, 01:19 AM
|
#3
|
|
Moderator
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
Original Poster
|
No, the first field contains a name. This is a contrived example but I don't think that a persons name in a spread sheet column is something that would be very odd.
It occurred to me that such a table might contain nicknames which would naturally be enclosed in quotes, so I entered three lines in oocalc and saved it as a .csv file:
Code:
cat example2.csv
"Jim Smith",010,10/10/08
"John Doe",121,10/11/08
"James ""Skinny"" Johnson",023,10/12/08
Quote:
|
Originally Posted by from post#1
If you have an entry like: John "Skinny" Benson, oocalc will save a csv file with ""Skinny"" in the text field, so I modified the awk command a bit to handle this situation.
|
Notice that I transformed the input sample to a form that has one line per field and an empty line between records.
For the answer to someone elses post, I included an awk command to translate from this type of file structure to a csv file. After testing it, I thought that doing the opposite would be a good learning exercise.
You solution doesn't work for the third entry. However it did show me how to modify it to do things differently:
Code:
awk -F, 'BEGIN {OFS="\n"; ORS="\n\n"} $1 ~ /^".*"$/ {gsub(/^"/,"",$1);gsub(/"$/,"",$1);gsub(/""/,"\"",$1); print $1"\n"$2"\n"$3 }' example2.csv
I had been looking at gensub and substr and thinking of functions returning a string and doing something like F1(F2()). Your example shows me how to use a F1()F2()F3() form by acting on the field variable "$1" directly.
Having to assign one of the fields so that the OFS and ORS variables would be updated was something I hadn't realized before today.
compare:
Code:
> echo -e 'ab cd ef\ngh ij kl\nmn op qr\n' | awk 'BEGIN { OFS="," } {print }'
ab cd ef
gh ij kl
mn op qr
> echo -e 'ab cd ef\ngh ij kl\nmn op qr\n' | awk 'BEGIN { OFS="," } {$1=$1;print }'
ab,cd,ef
gh,ij,kl
mn,op,qr
Thanks for your help.
Last edited by jschiwal; 03-09-2008 at 01:20 AM.
|
|
|
|
03-09-2008, 03:27 AM
|
#4
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
no problem. i didn't help much either. just so you might want to know, gsub() takes in alternation (regexp) so similar subs can be shortened.
Code:
awk '{ gsub(/pat1|pat2|pat3/,"replacement" ) }' file
|
|
|
|
03-09-2008, 02:54 PM
|
#5
|
|
Member
Registered: Oct 2003
Distribution: Archlinux
Posts: 147
Rep:
|
Your code doesn't won't work when any field other then the first is quoted, and you have to know the number of fields for it to work. This will get around both these problems, however its harder to work with this since it doesn't really split the file into meaningful records/fields.
Code:
gawk -F'""' '{gsub(/"/,"",$1);print $0(RT=="\n"?"\n":"")}' OFS='"' RS='"*,"*|\n' example.csv
If you just wanted a more condensed version of what you wrote, and don't care about the issues I mentioned above, then maybe something like this..
Code:
gawk -F'"*,"*|^"' '{gsub(/""/,"\"",$2);print $2,$3,$4}' OFS='\n' ORS='\n\n' example.csv
Anyways, if you're actually doing this to parse csv files (not just to learn awk) I recommend using something like python/perl/ruby, they will be a lot more powerful at handling this.
|
|
|
|
03-09-2008, 06:25 PM
|
#6
|
|
Moderator
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
Original Poster
|
Thank you for the generalized solution. Yes, I am looking at this to learn more about AWK. I haven't gotten around to learning the perl language yet. When I use a flat file to store records, I use a tab separated file. The only real example I have is storing file backup information at work and then producing a pdf catalog from these files. This is using cygwin on Win2000.
Last edited by jschiwal; 03-09-2008 at 06:28 PM.
|
|
|
|
04-17-2008, 03:53 PM
|
#7
|
|
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,902
|
jschiwal,
Can you please contact me via e-Mail?
Ta,
Tink
|
|
|
|
05-27-2010, 06:03 AM
|
#8
|
|
LQ Newbie
Registered: May 2010
Posts: 1
Rep:
|
Quote:
Originally Posted by Tinkster
jschiwal,
Can you please contact me via e-Mail?
Ta,
Tink
|
i have a file that looks like
john
3
5
4
Mary
7
8
4
Bob
2
6
4
i need to add the integers individually so the output should be
john
12
Mary
19
Bob
12
i have a command that adds all of them but i need to get sum before it gets a character.
awk '{s+=$1} END {print s}' filename
Please help?
|
|
|
|
05-27-2010, 06:23 AM
|
#9
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
|
Why not raise your own question instead of hijacking an ancient one???
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 02:40 AM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|