sed or awk for cleaning up a CSV file
Hello all,
I've been tasked with accepting an odd CSV feed from one of my clients. Right now it has 26k rows, and it will grow over time. The alpha values are enclosed in double quotes; the numeric values are not. Fields are separated by commas. For now, 6 or 7 entries have a carriage return somewhere in the data, mainly in addresses, which throws off importing them into a database. I can obviously scan through, find the lines, and manually delete the useless carriage returns, and all is well. But I need this as a bash piece so I can forget about it. Example: Code:
1234,"Johnny's Pizza","111 Main St.","Yankee Town","PA","11111","US","Johnny Contact"
The header values are not wrapped in quotes. So the only qualifier I can come up with is: no carriage returns after an odd number of quotes. Is there any way to do this with sed or awk? Or something else?
Thanks, Joe
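The "odd number of quotes" rule from the question can be sketched in awk roughly as follows. This is only a minimal sketch, not something posted in the thread: the function name join_broken_rows is made up, and it assumes that an embedded line break should be replaced by a single space and that quotes inside a field, if any, are doubled (""), so they do not upset the count.

```shell
# join_broken_rows: hypothetical helper name, not from the thread.
# Reads CSV on stdin. Physical lines are glued together until the running
# count of double quotes is even, meaning no quoted field is left open.
# usage (file names are placeholders): join_broken_rows < feed.csv > feed.clean.csv
join_broken_rows() {
    awk '
    {
        sub(/\r$/, "")                     # drop a trailing CR, if any
        line = $0
        quotes += gsub(/"/, "&", line)     # gsub returns how many quotes it saw
        buf = have ? (buf " " $0) : $0     # assumption: the break becomes a space
        have = 1
        if (quotes % 2 == 0) {             # every quoted field is closed again
            print buf
            buf = ""; have = 0; quotes = 0
        }
    }
    END { if (have) print buf }            # flush an unfinished last record as is
    '
}
```

Rows without a break pass through unchanged, since their quote count is already even.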
Ah, the lovely CSV file.
Here's a little shell program, dos2unx, that will clean things up: Code:
#!/bin/sh
Hope this helps some.
Well, that might help on some other files, and I may have misspoken, as the new line is what I need to get rid of.
The files are transferred (fed) through my FTP. I was given to understand that this took care of the different styles of line feeds? My issue is the 6 to 7 rows that have line feeds/carriage returns in a data field. I need to filter just those out.
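To see which rows are affected before changing anything, something along these lines could list them. It is a rough sketch under the same assumption as the question (a line with an odd number of double quotes starts or ends a broken field); the function name find_odd_quote_lines is hypothetical.

```shell
# find_odd_quote_lines: hypothetical helper name, not from the thread.
# Prints "line number: content" for every physical line on stdin that
# holds an odd number of double quotes.
# usage (file name is a placeholder): find_odd_quote_lines < feed.csv
find_odd_quote_lines() {
    awk '
    {
        line = $0
        n = gsub(/"/, "&", line)           # count the quotes on this line
        if (n % 2 == 1) print NR ": " $0
    }'
}
```

Both halves of a split row are reported. A field broken across three or more lines would have middle lines with no quotes at all, and those would not show up here.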
Hi.
Try this Perl script: Code:
while(<>){
I use the combination of a comma and a quote as the indication of the beginning of a string. This code will not work if there is a newline in a string in the first column, because there is no comma before it. Hope this helps.
Here's the essential awk to parse such a file, including newlines in fields. It is taken from a utility that parses Outlook CSVs. Irrelevant code has been snipped from BEGIN. The msg function is not essential to how it works. The core of the functionality is in the get_field function. Header_NF is the number of fields as defined by the header. The my_getline function is essentially awk's getline, but it returns an error if there should be another line and there isn't one (a field with an embedded newline immediately before end of file).
Code:
BEGIN {
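The idea described in this post (keep reading lines while a quoted field is still open, compare against the header's field count, and fail if the file ends too early) can be sketched much more crudely as below. This is not the snipped utility: join_until_closed and count_fields are made-up names, the header field count is only used for a warning here, and replacing the embedded newline with a space is an assumption about what the import wants.

```shell
# join_until_closed: hypothetical name; a simplified stand-in, not the
# snipped utility from the post above.
# usage (file names are placeholders): join_until_closed < feed.csv > feed.clean.csv
join_until_closed() {
    awk '
    # Count comma-separated fields in rec, ignoring commas inside double
    # quotes. Sets the global "unclosed" to 1 when rec ends inside a
    # quoted field. A doubled quote ("") toggles twice, so it is harmless.
    function count_fields(rec,    i, c, n, inq) {
        n = 1; inq = 0
        for (i = 1; i <= length(rec); i++) {
            c = substr(rec, i, 1)
            if (c == "\"") inq = !inq
            else if (c == "," && !inq) n++
        }
        unclosed = inq
        return n
    }
    { sub(/\r$/, "") }                     # drop a trailing CR, if any
    NR == 1 { header_nf = count_fields($0); print; next }
    {
        rec = pending ? (rec " " $0) : $0  # embedded newline becomes a space
        pending = 1
        nf = count_fields(rec)
        if (unclosed) next                 # quoted field still open: read on
        if (nf != header_nf)
            print "warning: line " NR ": " nf " fields, expected " header_nf > "/dev/stderr"
        print rec
        pending = 0
    }
    END {
        if (pending) {                     # file ended inside a quoted field
            print "error: input ended inside a quoted field" > "/dev/stderr"
            exit 1
        }
    }'
}
```

Unlike a plain quote count, this also notices commas inside quoted fields when checking the number of fields, and it exits with a non-zero status when the last record is never closed, so a calling script can refuse to import a truncated feed.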