LinuxQuestions.org

sudowtf 12-17-2014 02:44 PM

splitting csv into array with IFS
 
I need to split CSV lines into an array of 16 fields. The CSV is comma-delimited, BUT the field values are also quoted.

I can successfully do so with IFS=',' read -a array. However, it will obviously fail for values that contain commas.

example data:
Code:

"32744","STANDARD","Lake Helen",,,"FL","Volusia County","America/New_York","386","28.98","-81.23","NA","US","0","2861",
"00650","STANDARD","Florida",,"Alt De Florida, Est De Arroyo, Repto Seoane, Urb Altos De Florida, Urb Las Flores, Urb Vegas De Florida","PR",,,"787,939","18.36","-66.56","NA","US","0","0",

My code:
Code:

zip=INSERT_ZIP_HERE
# Match the zip only in the first (quoted) field, not anywhere in the line
line=$(grep "^\"$zip\"" zip_code_database.csv)
IFS=',' read -r -a array <<< "$line"
echo "lat ${array[9]}"
echo "long ${array[10]}"

You can see this will work fine for zip 32744 but fail for zip 00650.
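
To make the failure concrete, here is a small sketch that runs the same naive split on the 00650 line from the example data (assuming bash). The commas inside the quoted fifth field shift every later index, so array[9] ends up holding the tail of the city list and array[10] holds "PR" instead of the longitude.

```bash
#!/usr/bin/env bash
# Sample line for zip 00650, copied from the example data above.
line='"00650","STANDARD","Florida",,"Alt De Florida, Est De Arroyo, Repto Seoane, Urb Altos De Florida, Urb Las Flores, Urb Vegas De Florida","PR",,,"787,939","18.36","-66.56","NA","US","0","0",'

# Naive split on every comma, including the ones inside quoted values.
IFS=',' read -r -a array <<< "$line"

# Neither index holds a coordinate any more.
echo "array[9]:  ${array[9]}"
echo "array[10]: ${array[10]}"
```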

How can I properly (and as simply as possible) split the csv line?

linosaurusroot 12-17-2014 07:41 PM

Really crudely: any field that starts with a quote but doesn't end with one needs to be joined up with its neighbours (adding the comma back each time) until a closing quote is found.
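
A minimal bash sketch of that joining idea follows. The function name split_csv_line and the global array fields are made up for illustration. It assumes each record sits on one line and that values never contain escaped quotes (""), so it is a rough approach for this data and not a general CSV parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one CSV line into the global array "fields",
# re-joining the pieces that belong to the same quoted value.
split_csv_line() {
    local line=$1 piece current='' in_quotes=0 i
    local -a pieces
    fields=()

    # Step 1: naive split on every comma.
    IFS=',' read -r -a pieces <<< "$line"

    # Step 2: glue pieces back together while inside an open quote.
    for piece in "${pieces[@]}"; do
        if (( in_quotes )); then
            current+=",$piece"              # put the lost comma back
            if [[ $piece == *'"' ]]; then   # closing quote found
                fields+=("$current")
                in_quotes=0
            fi
        elif [[ $piece == '"'* && ( $piece != *'"' || ${#piece} -eq 1 ) ]]; then
            current=$piece                  # opening quote without a close
            in_quotes=1
        else
            fields+=("$piece")              # complete or empty field
        fi
    done

    # Step 3: strip the surrounding quotes from each field.
    for i in "${!fields[@]}"; do
        fields[i]=${fields[i]#\"}
        fields[i]=${fields[i]%\"}
    done
}

# Usage with the 00650 line from the example data:
split_csv_line '"00650","STANDARD","Florida",,"Alt De Florida, Est De Arroyo, Repto Seoane, Urb Altos De Florida, Urb Las Flores, Urb Vegas De Florida","PR",,,"787,939","18.36","-66.56","NA","US","0","0",'
echo "lat ${fields[9]}"
echo "long ${fields[10]}"
```

With the quoted city list and the "787,939" area codes joined back into single fields, indexes 9 and 10 line up with latitude and longitude again.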

sudowtf 12-21-2014 10:42 AM

Well, this may be a sucky solution, but it seems to work without error:

Given http://www.unitedstateszipcodes.org/...e_database.csv

I'm replacing the comma delimiters with a character that should never appear in the data, but not replacing the commas within the quoted values:
Code:

# Replace the delimiters (",") (,") (",) with ↈ, then remove the remaining quotes ("),
# and finally replace the lingering field separators around empty values: (ↈ,) (,ↈ) (,,)
sed -e 's/","/ↈ/g' -e 's/,"/ↈ/g' -e 's/",/ↈ/g' -e 's/"//g' \
    -e 's/ↈ,/ↈↈ/g' -e 's/,ↈ/ↈↈ/g' -e 's/,,/ↈↈ/g' zip_code_database.csv > NEW_ZIP.csv

line=$(grep "^$zip" NEW_ZIP.csv)

IFS=ↈ read -r -a array <<< "$line"

echo "lat ${array[9]}"
echo "long ${array[10]}"



All times are GMT -5.