LinuxQuestions.org
Support LQ: Use code LQ3 and save $3 on Domain Registration
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 04-02-2015, 12:45 PM   #1
gaurvrishi
Member
 
Registered: Apr 2012
Posts: 51

Rep: Reputation: Disabled
Remove Column from .CSV using AWK command


Hi,

I am trying to remove the Column from .CSV using Awk Command. Below is the input

Name,Age,Description,Location
Gopal,32,Tech Lead,"Ramnagar, Naintal"
Gaurav,24,"Linux Admin, Expert","Noda"

I want column 1,2,4 and i am using below mention command

awk -F, ' BEGIN { OFS="," }
{
print $1,$2,$4
}' new.csv > new-done.csv


but instead of getting Gaurav,24,Noda

i am getting

Name,Age,Location
Gopal,32,"Ramnagar
Gaurav,24, Expert"
 
Old 04-02-2015, 12:58 PM   #2
steeladept
LQ Newbie
 
Registered: Oct 2014
Posts: 24

Rep: Reputation: Disabled
This one is simple. The comma inside your " " are not escaped, so it thinks that is the separation point. To do this, simply insert the "\," in place of the commas that are inappropriately listed. Unfortunately, if it is a LARGE file, that isn't feasible. Instead, you will likely need to see if you can generate some other separator (pipe perhaps) that you can key on instead.
 
Old 04-02-2015, 01:06 PM   #3
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Sid, Linux Mint DE
Posts: 4,493

Rep: Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636
Parsing CSV files is awful. Because what is within quotes should be considered part of one field. So you have to implement a stateful parser.

It must have been thousands of times before, so instead of spending time to find it yourself, you should google for it. I know lots of them exist in Perl, but I have seen only a few in Awk. If I had one, I would post it. But I don't. Unfortunately.

jlinkels
 
Old 04-02-2015, 01:13 PM   #4
gaurvrishi
Member
 
Registered: Apr 2012
Posts: 51

Original Poster
Rep: Reputation: Disabled
Hi,

I have spent many hours in Google to find out the result but not able to find out. After adding the "\,"

i am getting below mention error

awk -F, ' BEGIN { OFS="\," }
{
print $1,$2,$4
}' new.csv > new-done.csv
awk: warning: escape sequence `\,' treated as plain `,'
 
Old 04-02-2015, 01:47 PM   #5
steeladept
LQ Newbie
 
Registered: Oct 2014
Posts: 24

Rep: Reputation: Disabled
You should keep the AWK statement the same as it was. The "\," goes in the input file. If it is a small file, it is fine. If not, then it either needs generated with a different separator (such as if it is generated from Excel where you can choose your separator), or you will need to work out something to handle it programmatically. I have been fortunate enough to never have dealt a lot with dirty data such as this, so I didn't know anything about the stateful parsers jlinkels speaks of. That may be the best way to go if you can't get clean data and/or a different output separator.

Turns out the above was incorrect. Excel apparently only lets you import on different separators, not generate them - at least not easily.

Last edited by steeladept; 04-02-2015 at 01:59 PM. Reason: Error correction.
 
Old 04-02-2015, 10:24 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,054

Rep: Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915
This could get ugly.
I don't like the insertion of "\," as it changes the number of fields (as awk sees it). And there may be multiple occurrences (description and location say), which exacerbates that.
I like the suggestion of a replacement character, but even that has to be handled carefully - what if the real comma separators occasionally have a blank character following? Makes it hard to construct a global change ...

Personally I'd run the input through sed first to insert the substitute character, then just use sub() in awk to take it back out when printing the fourth field. awk has regex in string replacement functions, but given the complexity of the regex, I find it easier to construct and test in sed.
 
Old 04-03-2015, 02:49 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826
I think if we read the manual and use the example provided for FPAT the whole issue will go away, see here
 
5 members found this post helpful.
Old 04-03-2015, 02:59 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,054

Rep: Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915
Indeed - why didn't we know that ?.
 
Old 04-03-2015, 08:35 AM   #9
BenCollver
Rogue Class
 
Registered: Sep 2006
Location: OR, USA
Distribution: Slackware 14.1
Posts: 201
Blog Entries: 3

Rep: Reputation: 78
If you need to process CSV data that contains embedded newlines between the double quotes, then csvformat -M may help.

http://csvkit.readthedocs.org/en/0.9...csvformat.html
 
Old 04-03-2015, 08:35 AM   #10
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Sid, Linux Mint DE
Posts: 4,493

Rep: Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636
Quote:
Originally Posted by syg00 View Post
Indeed - why didn't we know that ?.
I think because it is new in Gawk. I don't recall having seen this paragraph ever before. As per my previous post in this thread, you can be sure that I did search for CSV processing in awk.

Now I might be wrong, I don't keep the awk changelog on hand for the last 10 years. But I would be surprised if this FPAT is feature which has been there for long.

jlinkels
 
Old 04-03-2015, 08:46 AM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,054

Rep: Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915Reputation: 1915
That's always a bit of an issue - I can remember being peeved about a year and a half ago that RHEL was way behind in it's official gawk. Lots of features I was used to using on Fedora were missing.

Does Ubuntu still ship mawk ?.
 
Old 04-03-2015, 09:59 AM   #12
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 18,336

Rep: Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896Reputation: 3896
Quote:
Originally Posted by gaurvrishi View Post
Hi,
I have spent many hours in Google to find out the result but not able to find out.
I find that hard to believe, because there are hundreds of thousands of hits for "parsing csv data with awk in linux".
Quote:
After adding the "\," i am getting below mention error

awk -F, ' BEGIN { OFS="\," }
{
print $1,$2,$4
}' new.csv > new-done.csv
awk: warning: escape sequence `\,' treated as plain `,'
You don't tell us where the input data is coming from. The simplest thing would be to format your data, so that all your fields are enclosed in double-quotes, which would give you a standard pattern to look for, and a standard field-separator. Barring that, if you're using gawk:
Code:
gawk -vFPAT='[^,]*|"[^"]*"' '{print $1,$2,$3}'
...that might work. As the "Linux Admin, Expert", in the sample data, you should be able to modify the regex if needed, or see the man page.
 
Old 04-03-2015, 10:14 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826Reputation: 2826
FPAT has only been available since v4 But as you can see, a welcome feature
 
Old 04-03-2015, 12:14 PM   #14
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Sid, Linux Mint DE
Posts: 4,493

Rep: Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636Reputation: 636
Quote:
Originally Posted by grail View Post
FPAT has only been available since v4 But as you can see, a welcome feature
If my search is correct, 4.0 was released in 2011. And I am running Debian Stable on most of my machines, so I am running 4.0 only recently. Actually I am surprised that 4.0 entered Stable so soon

jlinkels
 
  


Reply

Tags
awk regex


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
need shell command to wrap text &adjust column width (csv file ) Sankaran29 Linux - Newbie 6 12-17-2012 10:33 AM
awk - rearrange column data in csv file to match columns wolverene13 Programming 9 12-21-2011 05:55 AM
[ask awk] remove certain row in a column dhodho Programming 18 06-08-2010 10:36 AM
awk gsub() command - string (column) manipulation - substitution casperdaghost Linux - Newbie 1 03-08-2010 03:12 AM
sed / awk command to print line number as column? johnpaulodonnell Linux - Newbie 2 01-22-2007 08:07 AM


All times are GMT -5. The time now is 07:19 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration