Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place. |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
03-25-2009, 08:34 AM
|
#1
|
|
LQ Newbie
Registered: Mar 2009
Posts: 3
Rep:
|
Parsing a comma separated CSV file where fields have commas in to
I have a (comma separated) CSV file which has fields with text. Some of the text fields have commas in, and the CSV file puts (only) these fields in inverted commas.
From a command (bash) prompt, I wish to write a bash script to (for each line) separate values $columnA, $columnB etc. so that I can 'play' with them.
If I use comma as a delimiter, it doesn't work. If I use " as a delimiter, same thing.
I'm sure I did this before, and probably did it the long way, but really cannot for the life of me think of the best way to do this.
The end outcome is simple...
For each line (using for line in bla bla bla), I wish to capture variables reflective of the full value of each 'column' (CSV field).
Is there a simple way to do this? Am I being thick?!!!
I've just got a netbook, with Linux, and want to make my life simpler using shell scripts.
Help gratefully appreciated.
Richard
|
|
|
|
03-25-2009, 09:26 AM
|
#2
|
|
Senior Member
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 4,548
|
Easy answer: use Perl.
For instance, have a look at this page from Perl's CPAN library: http://search.cpan.org/~adamk/Parse-...b/Parse/CSV.pm.
What you see here is a very well-thought-out solution to this general problem, which you can avail yourself of very simply.
Or... maybe you don't want to learn anything about installing Perl stuff (a reasonable conclusion... you just want to get the job done). So how about PHP? You'll see a six-line example here: http://www.homeandlearn.co.uk/php/php10p6.html
The key point that I am making here is... you've got lots of high-level languages, probably already-installed on your Linux box. You can use any of them in the shell through the magic of the #! "shebang" mechanism. You are not confined to "BASH shell-scripting," and in this case you'd be using that facility for something other than its intended purpose ... never a good idea.
So, since you do have so many choices available to you, don't  waste your own time trying to figure out how to get a job done using an inappropriate tool. "Pointless academic exercises" are best left behind you, in your good-riddance memories of bygone college days. 
|
|
|
|
03-25-2009, 09:37 AM
|
#3
|
|
Moderator
Registered: Feb 2001
Location: Atlanta, GA
Distribution: Slackware
Posts: 1,816
Rep: 
|
I'd agree, bash is probably not the best tool to parse csv files. I saw this script, but it may need some help:
http://fixunix.com/unix/83523-how-do...sh-script.html
When I get to the point of having to use arrays in bash, I usually switch to another language. Here is a quick hacked up example of parsing csv in Java. Much cleaner and robust than the bash example above. This example uses the opencsv library for the parsing.
Code:
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import au.com.bytecode.opencsv.CSVReader;
public class ParseCSV
{
public void loadCSV(String strFileName, char delimiter )
{
try
{
CSVReader reader = new CSVReader(new FileReader(strFileName), delimiter );
List<String[]> csvEntries = reader.readAll();
Iterator<String[]> Iter = csvEntries.iterator();
while ( Iter.hasNext())
{
String [] strArray = (String[])Iter.next();
for ( int i = 0; i < strArray.length; i++ )
{
System.out.print("[" + strArray[i] + "]");
}
System.out.println("");
}
}
catch ( FileNotFoundException fnfe )
{
}
catch ( IOException ioe )
{
}
}
public static void main(String[] args)
{
ParseCSV p = new ParseCSV();
p.loadCSV("file.csv", ',');
}
}
File:
Code:
row1col1,"row1,col2","row1,col3",,row1col4,,,,"r,o,w,1,c,o,l,8"
row2col1,"row2,col2","row2,col3",,row2col4,,,,"r,o,w,2,c,o,l,8"
output:
Code:
[row1col1][row1,col2][row1,col3][][row1col4][][][][r,o,w,1,c,o,l,8]
[row2col1][row2,col2][row2,col3][][row2col4][][][][r,o,w,2,c,o,l,8]
|
|
|
|
03-25-2009, 09:54 AM
|
#4
|
|
Senior Member
Registered: Sep 2003
Posts: 3,171
Rep: 
|
In PHP you have the explode command which handles all the parsing for you, splitting a string on the character you define and returning an array.
In C++, here is an implementation of explode:
Code:
std::vector<std::string> explode(std::string const& str, std::string const& delim)
{
using std::string;
string::size_type start(0);
string::size_type end(string::npos);
std::vector<std::string> sub;
while( (end = str.find(delim,start)) != string::npos)
{
sub.push_back(str.substr(start, end - start));
start = end + delim.size();
}
if(start < str.size() )sub.push_back(str.substr(start, str.size() - start));
return sub;
}
|
|
|
|
03-25-2009, 10:03 AM
|
#5
|
|
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,257
|
Because of the possibility of fields containing embedded commas as you point out, this problem is elevated from the domain of that which is best done with a shell script. Specialized modules/packages for various language bindings exist, which are well crafted to do the job. I would be inclined to use Perl, and the Parse::CSV module or the Text::CSV::Simple module.
--- rod.
|
|
|
|
03-26-2009, 04:42 AM
|
#6
|
|
LQ Newbie
Registered: Mar 2009
Posts: 3
Original Poster
Rep:
|
Firstly, thank-you all for your input.
I'm not familiar with (how to write in) Perl, and fairly poor on php, and am conscious of introducing extra complexity.
My background dates to DOS, hence the 'love' of 'bash' shell scripts to do my 'routine' stuff.
Sounds like I might have to get to learn another language in order to do something mundane!!
Surely there must be another way! (I'm sure I'll get to learn some php/perl at some point, but not sure now is the best time to begin - with 2 small daughters, time is not on my side !!)
So options:
1. PHP script (scary due to lack of knowledge, and also unknown how can make 'part php/part bash'
2. Perl script (scary for same reasons as 1.)
3. Think up a clever way with bash (and be able to re-use it elsewhere in other bash scripts)
4. Give up on scripting it, and do it manually (tedious)!
Another reason behind bash the rest of the script was in bash, and the csv bit was to capture the variables from each column for each line.
Any more thoughts/options to make my life easier ??
R.
On a separate note.. Not sure what the requirements are for php/perl etc. and think I could spend hours and hours setting it up for something I'm unlikely to use for anything else. (the netbook eee1000 does what I need for most things, but isn't the smartest netbook, as the build with Xandros is quite restricted, and missing lots of dependencies, has a dodgy sources.list file and makes adding apps quite painful)
|
|
|
|
03-26-2009, 05:54 AM
|
#7
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
Quote:
Originally Posted by trickyflash
I have a (comma separated) CSV file which has fields with text. Some of the text fields have commas in, and the CSV file puts (only) these fields in inverted commas.
From a command (bash) prompt, I wish to write a bash script to (for each line) separate values $columnA, $columnB etc. so that I can 'play' with them.
If I use comma as a delimiter, it doesn't work. If I use " as a delimiter, same thing.
I'm sure I did this before, and probably did it the long way, but really cannot for the life of me think of the best way to do this.
|
a similar thread. you can get the awk script and play with it.
Code:
# awk -f awkcsv file
First Last,"Las Vegas, NV",80 -> 3|First Last|Las Vegas, NV|80|
Last edited by ghostdog74; 03-26-2009 at 05:58 AM.
|
|
|
|
03-26-2009, 03:30 PM
|
#8
|
|
LQ Newbie
Registered: Mar 2009
Posts: 3
Original Poster
Rep:
|
Cool. I shall give it a whirl.
Something said on this forum jogged my memory, and I discovered my old script. Not as clean cut as the awk example above. A few grep's and sed's and cut's to read and change each line, and it had the same effect. Think I'll have a play with awk and figure out how to make best use of it. (and perhaps save learning php/perl for another year!!)
Thanks to all of you.
R.
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 01:25 AM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|