LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   CSV and CUT (https://www.linuxquestions.org/questions/linux-software-2/csv-and-cut-408198/)

wwnexc 01-26-2006 09:25 AM

CSV and CUT
 
Hi,

I ran into some trouble yesterday, when i was given a file with comma separated values, of which i only need a few columns out of about 200 columns.

I tried
Code:

cat file.csv | cut -d, -f13,99 > out.csv
it worked great, but some fields, whith quotation marks (" ") around them, can contain commas as values, which DO NOT mean a new field starts.

Is there any way to get cut or a similar command to select certain csv fields, but which ignors commas or other delimiters inside of quotation marks?

Thanks!!

haertig 01-26-2006 11:04 AM

The die-hard Linux way to accomplish this would be to brute-force write your own script to parse out the file.

The easy way would be to load the file into some GUI program that is CSV aware, like OpenOffice Calc, delete the columns you don't want, then resave the file. This kludge would probably be good enough if you only need to do this operation once, or very infrequently.

wwnexc 01-26-2006 12:01 PM

Quote:

Originally Posted by haertig
The die-hard Linux way to accomplish this would be to brute-force write your own script to parse out the file.

The easy way would be to load the file into some GUI program that is CSV aware, like OpenOffice Calc, delete the columns you don't want, then resave the file. This kludge would probably be good enough if you only need to do this operation once, or very infrequently.

Yeah. I did this for this time. But in future, i am really trying to get some shell script to do the work for me, and to automate the process...

Dragineez 01-26-2006 12:51 PM

awk
 
Wouldn't awk be a better fit than cut? It can filter fields by quotes.

wwnexc 01-26-2006 08:20 PM

Does anybody have an example which can actually do the task?

Quote:

Is there any way to get cut or a similar command to select certain csv fields, but which ignors commas or other delimiters inside of quotation marks?
:Pengy:

Dragineez 01-26-2006 09:06 PM

Tutorials Online
 
Awk/gawk can be considered to be a programming language in its own right. If you have to do string manipulation and text data file parsing, you must have this in your toolkit. When combined with the power of the scripting language and some other handy-dandy CLI utilities in every Linux distro (sed comes to mind), there is no text data file you can't slice and dice any way you want to. I could try to drop an example or two into this forum, but there's no way I'd do as good a job as the gazillion and one online tutorials already available.

http://www.gnu.org/software/gawk/manual/gawk.html
http://sparky.rice.edu/~hartigan/awk.html
http://www.vectorsite.net/tsawk.html

These can look pretty daunting, but for what you're trying to do you really won't have to dive too deep into the tutorial. All you're telling awk to do is:

1) Treat commas as field delimiters
2) Fields are encapsulated by quotes, ignore commas, quotes, and single quotes within a field
3) Open file {x}
4) Extract fields 13 and 99, store in variable {y}
5) Write {y} to file {z}

Is that about right?

wwnexc 01-26-2006 09:17 PM

Quote:

Originally Posted by Dragineez
These can look pretty daunting, but for what you're trying to do you really won't have to dive too deep into the tutorial. All you're telling awk to do is:

1) Treat commas as field delimiters
2) Fields are encapsulated by quotes, ignore commas, quotes, and single quotes within a field
3) Open file {x}
4) Extract fields 13 and 99, store in variable {y}
5) Write {y} to file {z}

Is that about right?

Yes. That's exactly it.

I took a look at the tutorials, and they all seem awfully complicated to me. I'll do my best at trying to archieve what i am trying to do, but if there is somebody out there, who knows how to do it, could you please take the time to lead me onto the right tracks? Please.

gilead 01-26-2006 09:53 PM

How are you with perl? The following does what I think you want (there's my escape hatch if it doesn't :) ). Here's my sample data file (data.csv):

Code:

1,hello world,"hello world","hello,world"
2,line two,"line,two","line two"

Here's the perl script (checker.pl):

Code:

#!/usr/bin/perl -w

use Text::ParseWords;

my $target = shift;

open ( CSVFILE, "$target" ) || die "Can't open csv file: $!\n";
while ( $thisline = <CSVFILE> ) {
  chomp $thisline;
  print "Original line:\n$thisline\n";
  @new = quotewords(",", 0, $thisline);
  print "Separated fields:\n";
  foreach ( @new ) {
    print "$_|";
  }
  print "\n";
}

And here's the output:

Code:

$ ./checker.pl data.csv
Original line:
1,hello world,"hello world","hello,world"
Separated fields:
1|hello world|hello world|hello,world|
Original line:
2,line two,"line,two","line two"
Separated fields:
2|line two|line,two|line two|

There needs to be some code to just take the fields that you want (do you want me to add that?)...

wwnexc 01-26-2006 10:00 PM

Wow!! Cool Steve! I never knew perl could do this much (and it doesn't even take much code to do it)

It would be really great, if you could actually show me how to select certain fields. I really appreciate it.

I am just wondering: what is the best way to learn perl (if you already have knowlege about the loop-structures and other stuff of that level)?

I've started doing some c++, but i never really got all to far. I am now starting java, and i am thinking that perl would be a great file and os-oriented language.

I have always found Shell Scripts and Batch files very helpful, as i like doing stuff that has to do with servers and other command line oriented projects. Perl seems to be a great asset that i would really like to add to my "toolbox".

I guess i know what i am doing next break...

gilead 01-26-2006 10:35 PM

I've added another line that shows each field individually as an element from the array. The Text::ParseWords modules is part of most standard perl installations - it just populates an array based on each delimited line given to it. In this case, the lines are retrieved from the CSV file identified by the command line argument (which is shifted into $target).

Code:

#!/usr/bin/perl -w

use Text::ParseWords;

my $target = shift;

open ( CSVFILE, "$target" ) || die "Can't open csv file: $!\n";
while ( $thisline = <CSVFILE> ) {
  chomp $thisline;
  print "Original line:\n$thisline\n";
  @new = quotewords(",", 0, $thisline);
  print "Separated fields:\n";
  foreach ( @new ) {
    print "$_|";
  }
  print "\n";

  print "Field 1:  $new[0]\nField 2:  $new[1]\nField 3;  $new[2]\nField 4:  $new[3]\n";
}

I've used it a fair bit (under Unix and Windows) for my last few employers so I bought the Learning perl book from O'Reilly. I don't know what the current edition is like, but the 2nd edition was so good I got Programming Perl and The Perl Cookbook as well.

The perl man pages are very good, but they take a bit of time... Have a look at man perl, man perlfaq4, etc.

wwnexc 01-26-2006 10:47 PM

That Perl stuff is great. It reminds me of a mix of the TI graphing calculator syntaxes (yes, odd - but that's the first thing that came to mind, not sure why) compined with c and normal shell / bat scripting.

I've got the O'Reilly Linux in a Nutshell. It's a good book, but i wouldn't suggest it for a noob to linux (it got pretty confusing in the beginning, as i came from a windows environment). It's mainly a command line reference; -- Is the perl one similar?

With this script of yours, i'll have something to start experimenting with and something to add stuff to and to take apart, in addition to the fact that it simply works great for the job it was intended to do.

Steve, you are great!

Can all perl script be run on windows and on linux, without having to change anything except the first line?

gilead 01-26-2006 11:23 PM

I'm glad that helped :) And yes, the Learning Perl reference was a good one - plenty of examples and explanations, I do recommend it.

And yes again, a lot scripts can be run without change on both platforms. I tend to put platform specific stuff in properties files and just use whichever one I'm running it against. For example, scripts working with files may be the same once you have the file open and processing, but path syntax may be different (drives instead of mount points, etc).

I use Active Perl from http://www.activestate.com/ on Windows and it's very nice.

A lot of people bag perl, but it's well worth keeping around.


All times are GMT -5. The time now is 08:37 PM.