LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - General
User Name
Password
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Notices

Reply
 
LinkBack Search this Thread
Old 03-25-2009, 08:34 AM   #1
trickyflash
LQ Newbie
 
Registered: Mar 2009
Posts: 3

Rep: Reputation: 0
Question Parsing a comma separated CSV file where fields have commas in to


I have a (comma separated) CSV file which has fields with text. Some of the text fields have commas in, and the CSV file puts (only) these fields in inverted commas.

From a command (bash) prompt, I wish to write a bash script to (for each line) separate values $columnA, $columnB etc. so that I can 'play' with them.
If I use comma as a delimiter, it doesn't work. If I use " as a delimiter, same thing.
I'm sure I did this before, and probably did it the long way, but really cannot for the life of me think of the best way to do this.

The end outcome is simple...
For each line (using for line in bla bla bla), I wish to capture variables reflective of the full value of each 'column' (CSV field).
Is there a simple way to do this? Am I being thick?!!!

I've just got a netbook, with Linux, and want to make my life simpler using shell scripts.

Help gratefully appreciated.

Richard
 
Old 03-25-2009, 09:26 AM   #2
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,039

Rep: Reputation: 952Reputation: 952Reputation: 952Reputation: 952Reputation: 952Reputation: 952Reputation: 952Reputation: 952
Easy answer: use Perl.

For instance, have a look at this page from Perl's CPAN library: http://search.cpan.org/~adamk/Parse-...b/Parse/CSV.pm.

What you see here is a very well-thought-out solution to this general problem, which you can avail yourself of very simply.

Or... maybe you don't want to learn anything about installing Perl stuff (a reasonable conclusion... you just want to get the job done). So how about PHP? You'll see a six-line example here: http://www.homeandlearn.co.uk/php/php10p6.html

The key point that I am making here is... you've got lots of high-level languages, probably already-installed on your Linux box. You can use any of them in the shell through the magic of the #! "shebang" mechanism. You are not confined to "BASH shell-scripting," and in this case you'd be using that facility for something other than its intended purpose ... never a good idea.

So, since you do have so many choices available to you, don't waste your own time trying to figure out how to get a job done using an inappropriate tool. "Pointless academic exercises" are best left behind you, in your good-riddance memories of bygone college days.
 
Old 03-25-2009, 09:37 AM   #3
crabboy
Moderator
 
Registered: Feb 2001
Location: Atlanta, GA
Distribution: Slackware
Posts: 1,822

Rep: Reputation: 120Reputation: 120
I'd agree, bash is probably not the best tool to parse csv files. I saw this script, but it may need some help:

http://fixunix.com/unix/83523-how-do...sh-script.html

When I get to the point of having to use arrays in bash, I usually switch to another language. Here is a quick hacked up example of parsing csv in Java. Much cleaner and robust than the bash example above. This example uses the opencsv library for the parsing.

Code:
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import au.com.bytecode.opencsv.CSVReader;


public class ParseCSV
{
      public void loadCSV(String strFileName, char delimiter )
      {
         try
         {
            CSVReader reader = new CSVReader(new FileReader(strFileName), delimiter );
            List<String[]> csvEntries = reader.readAll();
            Iterator<String[]> Iter = csvEntries.iterator();

            while ( Iter.hasNext())
            {
               String [] strArray = (String[])Iter.next();
               for ( int i = 0; i < strArray.length; i++ )
               {
                  System.out.print("[" + strArray[i] + "]");
               }
               System.out.println("");
            }
         }
         catch ( FileNotFoundException fnfe )
         {
         }
         catch ( IOException ioe )
         {
         }

      }
      
   public static void main(String[] args)
   {
      ParseCSV p = new ParseCSV();
      p.loadCSV("file.csv", ',');
   }

}
File:
Code:
row1col1,"row1,col2","row1,col3",,row1col4,,,,"r,o,w,1,c,o,l,8"
row2col1,"row2,col2","row2,col3",,row2col4,,,,"r,o,w,2,c,o,l,8"
output:
Code:
[row1col1][row1,col2][row1,col3][][row1col4][][][][r,o,w,1,c,o,l,8]
[row2col1][row2,col2][row2,col3][][row2col4][][][][r,o,w,2,c,o,l,8]
 
Old 03-25-2009, 09:54 AM   #4
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 114Reputation: 114
In PHP you have the explode command which handles all the parsing for you, splitting a string on the character you define and returning an array.

In C++, here is an implementation of explode:

Code:
std::vector<std::string> explode(std::string const& str, std::string const& delim)
{
   using std::string;
   string::size_type start(0);
   string::size_type end(string::npos);
   std::vector<std::string> sub;

   while( (end = str.find(delim,start)) != string::npos)
   {
      sub.push_back(str.substr(start, end - start));
      start = end + delim.size();
   }
   if(start < str.size() )sub.push_back(str.substr(start, str.size() - start));
   return sub;
}
 
Old 03-25-2009, 10:03 AM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,388
Blog Entries: 2

Rep: Reputation: 900Reputation: 900Reputation: 900Reputation: 900Reputation: 900Reputation: 900Reputation: 900Reputation: 900
Because of the possibility of fields containing embedded commas as you point out, this problem is elevated from the domain of that which is best done with a shell script. Specialized modules/packages for various language bindings exist, which are well crafted to do the job. I would be inclined to use Perl, and the Parse::CSV module or the Text::CSV::Simple module.
--- rod.
 
Old 03-26-2009, 04:42 AM   #6
trickyflash
LQ Newbie
 
Registered: Mar 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Firstly, thank-you all for your input.

I'm not familiar with (how to write in) Perl, and fairly poor on php, and am conscious of introducing extra complexity.
My background dates to DOS, hence the 'love' of 'bash' shell scripts to do my 'routine' stuff.

Sounds like I might have to get to learn another language in order to do something mundane!!
Surely there must be another way! (I'm sure I'll get to learn some php/perl at some point, but not sure now is the best time to begin - with 2 small daughters, time is not on my side !!)

So options:
1. PHP script (scary due to lack of knowledge, and also unknown how can make 'part php/part bash'
2. Perl script (scary for same reasons as 1.)
3. Think up a clever way with bash (and be able to re-use it elsewhere in other bash scripts)
4. Give up on scripting it, and do it manually (tedious)!

Another reason behind bash the rest of the script was in bash, and the csv bit was to capture the variables from each column for each line.

Any more thoughts/options to make my life easier ??

R.

On a separate note.. Not sure what the requirements are for php/perl etc. and think I could spend hours and hours setting it up for something I'm unlikely to use for anything else. (the netbook eee1000 does what I need for most things, but isn't the smartest netbook, as the build with Xandros is quite restricted, and missing lots of dependencies, has a dodgy sources.list file and makes adding apps quite painful)
 
Old 03-26-2009, 05:54 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 239Reputation: 239Reputation: 239
Quote:
Originally Posted by trickyflash View Post
I have a (comma separated) CSV file which has fields with text. Some of the text fields have commas in, and the CSV file puts (only) these fields in inverted commas.

From a command (bash) prompt, I wish to write a bash script to (for each line) separate values $columnA, $columnB etc. so that I can 'play' with them.
If I use comma as a delimiter, it doesn't work. If I use " as a delimiter, same thing.
I'm sure I did this before, and probably did it the long way, but really cannot for the life of me think of the best way to do this.
a similar thread. you can get the awk script and play with it.
Code:
 # awk -f awkcsv file
First Last,"Las Vegas, NV",80 -> 3|First Last|Las Vegas, NV|80|

Last edited by ghostdog74; 03-26-2009 at 05:58 AM.
 
Old 03-26-2009, 03:30 PM   #8
trickyflash
LQ Newbie
 
Registered: Mar 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Cool. I shall give it a whirl.

Something said on this forum jogged my memory, and I discovered my old script. Not as clean cut as the awk example above. A few grep's and sed's and cut's to read and change each line, and it had the same effect. Think I'll have a play with awk and figure out how to make best use of it. (and perhaps save learning php/perl for another year!!)

Thanks to all of you.

R.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to delete Comma in a comma separated file with double quotes as quote character pklcnu Linux - Newbie 2 03-24-2009 05:50 PM
Parsing a pseudo CSV file. sharky Programming 8 11-03-2008 10:47 PM
help with comma separated values and what should be a simple script. zaber Programming 10 03-06-2008 12:58 PM
parsing fields of a file sang_froid Programming 2 10-26-2006 02:16 AM
Reading comma-separated data from file MeLassen Programming 4 04-04-2004 02:41 PM


All times are GMT -5. The time now is 08:21 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration