LinuxQuestions.org
Old 01-26-2006, 10:25 AM   #1
wwnexc
Member
 
Registered: Sep 2005
Location: California
Distribution: Slackware & Debian
Posts: 264

Rep: Reputation: 30
CSV and CUT


Hi,

I ran into some trouble yesterday when I was given a file of comma-separated values, from which I only need a few of its roughly 200 columns.

I tried
Code:
cat file.csv | cut -d, -f13,99 > out.csv
It worked great, but some fields, which have quotation marks (" ") around them, can contain commas as values, and those commas do NOT start a new field.

Is there any way to get cut or a similar command to select certain CSV fields while ignoring commas or other delimiters inside quotation marks?

Thanks!!
 
Old 01-26-2006, 12:04 PM   #2
haertig
Senior Member
 
Registered: Nov 2004
Distribution: Debian, Ubuntu, LinuxMint, Slackware, SysrescueCD
Posts: 2,032

Rep: Reputation: 309
The die-hard Linux way to accomplish this would be to brute-force write your own script to parse out the file.

The easy way would be to load the file into some GUI program that is CSV aware, like OpenOffice Calc, delete the columns you don't want, then resave the file. This kludge would probably be good enough if you only need to do this operation once, or very infrequently.
 
Old 01-26-2006, 01:01 PM   #3
wwnexc
Member
 
Original Poster
Quote:
Originally Posted by haertig
The die-hard Linux way to accomplish this would be to brute-force write your own script to parse out the file.

The easy way would be to load the file into some GUI program that is CSV aware, like OpenOffice Calc, delete the columns you don't want, then resave the file. This kludge would probably be good enough if you only need to do this operation once, or very infrequently.
Yeah, that's what I did this time. But in the future I'd really like a shell script to do the work for me and automate the process...
 
Old 01-26-2006, 01:51 PM   #4
Dragineez
Member
 
Registered: Oct 2005
Location: Annapolis
Distribution: Ubuntu
Posts: 275

Rep: Reputation: 32
awk

Wouldn't awk be a better fit than cut? It can filter fields by quotes.
 
Old 01-26-2006, 09:20 PM   #5
wwnexc
Member
 
Original Poster
Does anybody have an example which can actually do the task?

Quote:
Is there any way to get cut or a similar command to select certain CSV fields while ignoring commas or other delimiters inside quotation marks?
 
Old 01-26-2006, 10:06 PM   #6
Dragineez
Member
 
Tutorials Online

Awk/gawk can be considered a programming language in its own right. If you have to do string manipulation and text data file parsing, you must have it in your toolkit. Combined with the power of shell scripting and some other handy-dandy CLI utilities in every Linux distro (sed comes to mind), there is no text data file you can't slice and dice any way you want. I could try to drop an example or two into this forum, but there's no way I'd do as good a job as the gazillion and one online tutorials already available.

http://www.gnu.org/software/gawk/manual/gawk.html
http://sparky.rice.edu/~hartigan/awk.html
http://www.vectorsite.net/tsawk.html

These can look pretty daunting, but for what you're trying to do you really won't have to dive too deep into the tutorial. All you're telling awk to do is:

1) Treat commas as field delimiters
2) Fields are encapsulated by quotes, ignore commas, quotes, and single quotes within a field
3) Open file {x}
4) Extract fields 13 and 99, store in variable {y}
5) Write {y} to file {z}

Is that about right?
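The steps above can be sketched with a plain awk scan that tracks whether it is inside quotes; the filename (data.csv) and the field numbers printed (2 and 4) are placeholders, not from this thread:

```shell
# Sketch only: split each line on commas, but treat commas inside
# double quotes as part of the field; then print two chosen fields.
awk '
{
  n = 0; field = ""; inq = 0
  for (i = 1; i <= length($0); i++) {
    c = substr($0, i, 1)
    if (c == "\"")             { inq = !inq }                  # toggle quote state
    else if (c == "," && !inq) { f[++n] = field; field = "" }  # field boundary
    else                       { field = field c }
  }
  f[++n] = field                                               # last field on the line
  print f[2] "," f[4]
}' data.csv
```

Note that this drops the surrounding quote characters themselves, much like the quotewords() approach shown later in the thread.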

Last edited by Dragineez; 01-26-2006 at 10:09 PM.
 
Old 01-26-2006, 10:17 PM   #7
wwnexc
Member
 
Original Poster
Quote:
Originally Posted by Dragineez
These can look pretty daunting, but for what you're trying to do you really won't have to dive too deep into the tutorial. All you're telling awk to do is:

1) Treat commas as field delimiters
2) Fields are encapsulated by quotes, ignore commas, quotes, and single quotes within a field
3) Open file {x}
4) Extract fields 13 and 99, store in variable {y}
5) Write {y} to file {z}

Is that about right?
Yes. That's exactly it.

I took a look at the tutorials, and they all seem awfully complicated to me. I'll do my best at trying to achieve what I am trying to do, but if there is somebody out there who knows how to do it, could you please take the time to lead me onto the right track? Please.

Last edited by wwnexc; 01-26-2006 at 10:32 PM.
 
Old 01-26-2006, 10:53 PM   #8
gilead
Senior Member
 
Registered: Dec 2005
Location: Brisbane, Australia
Distribution: Slackware64 14.0
Posts: 4,123

Rep: Reputation: 162
How are you with perl? The following does what I think you want (there's my escape hatch if it doesn't). Here's my sample data file (data.csv):

Code:
1,hello world,"hello world","hello,world"
2,line two,"line,two","line two"
Here's the perl script (checker.pl):

Code:
#!/usr/bin/perl -w

use Text::ParseWords;

my $target = shift;

open ( CSVFILE, "$target" ) || die "Can't open csv file: $!\n";
while ( $thisline = <CSVFILE> ) {
  chomp $thisline;
  print "Original line:\n$thisline\n";
  # quotewords() splits on the comma but leaves quoted commas alone
  @new = quotewords(",", 0, $thisline);
  print "Separated fields:\n";
  foreach ( @new ) {
    print "$_|";
  }
  print "\n";
}
And here's the output:

Code:
$ ./checker.pl data.csv
Original line:
1,hello world,"hello world","hello,world"
Separated fields:
1|hello world|hello world|hello,world|
Original line:
2,line two,"line,two","line two"
Separated fields:
2|line two|line,two|line two|
There needs to be some code to just take the fields that you want (do you want me to add that?)...

Last edited by gilead; 01-26-2006 at 10:54 PM.
 
Old 01-26-2006, 11:00 PM   #9
wwnexc
Member
 
Original Poster
Wow!! Cool, Steve! I never knew perl could do this much (and it doesn't even take much code to do it).

It would be really great if you could actually show me how to select certain fields. I really appreciate it.

I am just wondering: what is the best way to learn perl (if you already have knowledge of loop structures and other stuff at that level)?

I've started doing some C++, but I never really got all that far. I am now starting Java, and I am thinking that perl would be a great file- and OS-oriented language.

I have always found shell scripts and batch files very helpful, as I like doing stuff that has to do with servers and other command-line-oriented projects. Perl seems to be a great asset that I would really like to add to my "toolbox".

I guess I know what I am doing next break...

Last edited by wwnexc; 01-26-2006 at 11:06 PM.
 
Old 01-26-2006, 11:35 PM   #10
gilead
Senior Member
 
I've added another line that shows each field individually as an element of the array. The Text::ParseWords module is part of most standard perl installations - it just populates an array based on each delimited line given to it. In this case, the lines are retrieved from the CSV file identified by the command line argument (which is shifted into $target).

Code:
#!/usr/bin/perl -w

use Text::ParseWords;

my $target = shift;

open ( CSVFILE, "$target" ) || die "Can't open csv file: $!\n";
while ( $thisline = <CSVFILE> ) {
  chomp $thisline;
  print "Original line:\n$thisline\n";
  # quotewords() splits on the comma but leaves quoted commas alone
  @new = quotewords(",", 0, $thisline);
  print "Separated fields:\n";
  foreach ( @new ) {
    print "$_|";
  }
  print "\n";

  print "Field 1:  $new[0]\nField 2:  $new[1]\nField 3:  $new[2]\nField 4:  $new[3]\n";
}
I've used it a fair bit (under Unix and Windows) for my last few employers, so I bought the Learning Perl book from O'Reilly. I don't know what the current edition is like, but the 2nd edition was so good I got Programming Perl and The Perl Cookbook as well.

The perl man pages are very good, but they take a bit of time... Have a look at man perl, man perlfaq4, etc.
 
Old 01-26-2006, 11:47 PM   #11
wwnexc
Member
 
Original Poster
That Perl stuff is great. It reminds me of a mix of the TI graphing calculator syntaxes (yes, odd - but that's the first thing that came to mind, not sure why) combined with C and normal shell/bat scripting.

I've got the O'Reilly Linux in a Nutshell. It's a good book, but I wouldn't suggest it for a noob to linux (it got pretty confusing in the beginning, as I came from a windows environment). It's mainly a command line reference - is the perl one similar?

With this script of yours, I'll have something to start experimenting with, something to add stuff to and take apart, in addition to the fact that it simply works great for the job it was intended to do.

Steve, you are great!

Can all perl scripts be run on windows and on linux without having to change anything except the first line?
 
Old 01-27-2006, 12:23 AM   #12
gilead
Senior Member
 
Registered: Dec 2005
Location: Brisbane, Australia
Distribution: Slackware64 14.0
Posts: 4,123

Rep: Reputation: 162Reputation: 162
I'm glad that helped. And yes, the Learning Perl reference was a good one - plenty of examples and explanations. I do recommend it.

And yes again, a lot of scripts can be run without change on both platforms. I tend to put platform-specific stuff in properties files and just use whichever one I'm running against. For example, scripts working with files may be the same once you have the file open and are processing it, but path syntax may differ (drives instead of mount points, etc).

I use Active Perl from http://www.activestate.com/ on Windows and it's very nice.

A lot of people bag perl, but it's well worth keeping around.
 
  

