LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   How to to strip commas from csv file but keep the fields separated? (https://www.linuxquestions.org/questions/linux-general-1/how-to-to-strip-commas-from-csv-file-but-keep-the-fields-separated-4175471830/)

keenboy 08-02-2013 04:23 AM

How to to strip commas from csv file but keep the fields separated?
 
Hi,

I have an annoying problem with a CSV file I use to process data.

My CSV file is comma separated but sometimes data in a field contains commas i.e postal addresses etc.

I need to find a way to strip the commas within the fields but keep the commas that separate the field so I can still process the column data.

The text that sometimes has a comma in it is always contained in " ", maybe this is the key to the solution?

If I use sed it will change/remove all commas. Is there a smarter way to do this?

I'd appreciate any help.

Thanks,

druuna 08-02-2013 04:47 AM

Wouldn't it be a lot easier to change the separator to something unique when creating the CSV file? Calc and excel, for example, can both export to CSV using a different separator and if the CSV file is created by a script, you can also do this.

Changing the , after the fact seems a waste of time and is not that easy to do (someone please correct me if I'm wrong), but maybe I'm missing some important information.

This might work:
Code:

sed -re 's/("[^,]+),([^,]+")/\1%\2/g' -e 's/,/ /g' -e 's/%/,/g' infile

wait_a_minute 08-02-2013 10:59 AM

Hi keenboy,

what about loading the csv-file into e.g. soffice/libreoffice etc. Then save with <tab> as field delimiter. Done. Not a cool bash script but it should do the trick.

allend 08-02-2013 12:07 PM

If the underlying CSV file is subject to change, then I would use perl with Text::CSV_XS http://search.cpan.org/~hmbrand/Text...1.01/CSV_XS.pm

schneidz 08-02-2013 12:58 PM

i havent tested it out but this c program will probably work:
Code:

[schneidz@hyper temp-cc]$ cat keenboy.c
#include "stdio.h"

main(int argc, char *argv[])
{
 int c, commaflag = 0;
 FILE * fstream;
 fstream = fopen(argv[1], "r");

 c = fgetc(fstream);
 while(c != EOF)
 {
  if(c == 34)
  commaflag = !commaflag;
  if(commaflag == 1 && c == 44)
  c = 32;
  printf("%c", c);
  c = fgetc(fstream);
 }
 fclose(fstream);
}


danielbmartin 08-04-2013 10:15 PM

Quote:

Originally Posted by keenboy (Post 5001478)
My CSV file is comma separated but sometimes data in a field contains commas i.e postal addresses etc.

I need to find a way to strip the commas within the fields but keep the commas that separate the field so I can still process the column data.

The text that sometimes has a comma in it is always contained in " ", maybe this is the key to the solution?

With this InFile ...
Code:

how,now,brown,cow
now,is,the time for, all good, men,"to, come", to the aid
of their, party
the quick brown fox, "always,struggled", with Regular Expressions

... this awk ...
Code:

awk 'BEGIN{RS="\"";ORS=RT}!(NR%2){gsub(/,/," ")}1' $InFile >$OutFile
... produced this OutFile ...
Code:

how,now,brown,cow
now,is,the time for, all good, men,to  come, to the aid
of their, party
the quick brown fox, always struggled, with Regular Expressions

Credit: this solution was adapted from that of LQ Guru grail as seen in
http://www.linuxquestions.org/questi...places-924110/

Daniel B. Martin

keenboy 08-05-2013 06:05 AM

Thanks guys. That helped lots!


All times are GMT -5. The time now is 08:52 PM.