[SOLVED] AWK / SED - Parsing a CSV file with comma delimiter, and some extra needs.
In other words, I'm trying to apply the following rules:
1) Every comma-separated field (including blank ones) must be simple-quoted. A comma inside a double-quoted field is part of that field, not a separator.
2) All simple-quotes contained within a field must be doubled.
3) If a field is already enclosed in double quotes, replace them with simple quotes.
My main problem is that I don't know how to make AWK or SED ignore the commas inside double quotes when splitting the fields. I'm pretty sure that once this step is done, I can do the rest.
While parsing CSV can be done in awk, it's not straightforward
or trivial; I'd recommend using perl, python or some other language that
has modules built for that very purpose.
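For what it's worth, here is a minimal sketch of that route in Python, using the standard csv module to do the splitting. The function name requote_line and the sample line are made up for illustration, and it assumes each record sits on a single line with no space between a comma and an opening double quote.

```python
import csv

def requote_line(line):
    """Re-quote one CSV line with single quotes (hypothetical helper)."""
    # csv.reader already knows that a comma inside double quotes
    # belongs to the field, and it strips the double quotes for us.
    fields = next(csv.reader([line]), [])
    # Double any single quote inside a field, then wrap the field in single quotes.
    return ",".join("'" + field.replace("'", "''") + "'" for field in fields)

# Made-up sample line: a plain field, a double-quoted field with a comma,
# a blank field, and a field containing a single quote.
print(requote_line('a,"b,c",,d\'e'))  # prints 'a','b,c','','d''e'
```

Whether this is enough depends on the real data; multi-line records would need the whole file passed to csv.reader rather than one line at a time.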
I also thought that it might be a better idea to use a "full" programming language for this, but I didn't want to take this thread off the zero reply list.
Code:
print "'"
for every character in the line
    is it a comma?
        print "','"
    or is it a double quote?
        skip forward until the next double quote, printing each character.
    or is it a single quote?
        print "''"
    it's neither of the above
        print the current character
print "'"
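The same scan could be written out roughly like this in Python. It is only a sketch: the name requote_scan is invented, it tracks "inside double quotes" with a flag instead of skipping forward, and it also doubles single quotes found inside a double-quoted field (an assumption taken from rule 2 of the question, not from the pseudocode). A doubled double quote ("") inside a quoted field is not handled.

```python
def requote_scan(line):
    """Character-by-character version of the pseudocode (hypothetical helper)."""
    out = ["'"]              # opening quote of the first field
    in_quotes = False        # True while we are between two double quotes
    for ch in line:
        if ch == '"':
            # Drop the double quote itself and flip the state.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A real separator: close the current field and open the next one.
            out.append("','")
        elif ch == "'":
            # Assumption: single quotes are doubled inside quoted fields too.
            out.append("''")
        else:
            out.append(ch)
    out.append("'")          # closing quote of the last field
    return "".join(out)

print(requote_scan('a,"b,c",,d\'e'))  # prints 'a','b,c','','d''e'
```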
And by the way, they're "single" quotes, not "simple" quotes.
Thank you for all these accurate and fast answers. I'm already in the awk manual trying to figure out how crts' script works, and I'm also checking catkin's link.
Regards,
PenguinJr