[SOLVED] How to script csv editing? Remove rows from csv file that do not contain certain text

ingram87 · 08-03-2012, 08:24 AM

I have a csv file with a column for states. How do I write a script to remove all rows that are not "NY" in that column? I'm not even sure where to begin, as I don't know of any tools to edit csv files from the command line.

If anyone knows where to point me, or knows a short script that does this, it would be much appreciated.

pixellany · 08-03-2012, 08:45 AM

The tools do not care whether it is a csv file. Are you familiar with AWK and SED? If not, read all about them here:
http://www.grymoire.com/Unix/

Also, please post a sample of what the file looks like.

MensaWater · 08-03-2012, 08:50 AM

CSV = comma separated values and is simply a text file using commas as delimiters.

Your file should look something like:
123 State Street,Albany,NY,10005
247 City Road,Atlanta,GA,30008
4957 County Avenue,Durham,NC,50333
2932 Parish Lane,New Orleans,LA,79999
4193 Borough Circle,New York,NY,10029

Since you know there is a comma before and after the value you're interested in, the simplest way is to grep for that pattern:

grep ,NY, csvfile

For the above example, that would output:
123 State Street,Albany,NY,10005
4193 Borough Circle,New York,NY,10029

You can redirect the above output into a new file:

grep ,NY, csvfile >newcsvfile

Note that you'd want to "cat csvfile" first to make sure the fields are as expected (e.g. that they are not padded with extra spaces, such as ",NY ,").
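
If the file is too long to read through with cat, a rough way to spot padded fields is to let grep list every line where a space sits directly next to a comma. This is only a sketch: the sample rows and the temporary directory are made up for illustration, and a legitimate value such as "Albany, NY" inside a single field would also be flagged.

```shell
# Hypothetical sample data; the third row has a padded state field.
workdir=$(mktemp -d)
cat > "$workdir/csvfile" <<'EOF'
123 State Street,Albany,NY,10005
247 City Road,Atlanta,GA,30008
4193 Borough Circle,New York,NY ,10029
EOF

# -n prefixes each hit with its line number; each -e adds one pattern.
# A hit means a space directly before or directly after a comma.
padded=$(grep -n -e ' ,' -e ', ' "$workdir/csvfile")
echo "$padded"
```

No output means no padding was found, so the plain ",NY," pattern should be safe to use.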

ingram87 · 08-03-2012, 08:55 AM

I'm not familiar with those tools, will look into them. Here is a sample of the csv file:

Code:

id,name,state,
50,Jamie,AL,
51,Jenifer,GA,
52,George,NY,
53,Corey,NY,
54,Leslie,TN,
55,David,NY,

ingram87 · 08-03-2012, 09:07 AM

Thank you, MensaWater. This is what I needed.

ingram87 · 08-03-2012, 09:42 AM

Instead of removing the lines that do not contain NY, how could I also send those to a new file so I can see which ones were removed?

pixellany · 08-03-2012, 09:51 AM

Code:

grep ",NY," filename > newfilename

Sends all lines containing the quoted pattern to "newfilename". Note that there is no space after the first comma, to match the sample file.

Note that this is not totally robust: it assumes that the pattern does not occur in any other field. With AWK, you can do operations based on the specific field.
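
To make that concrete, here is a small sketch comparing the two approaches on a made-up file where "NY" also shows up in the name column. The row with id 56 and the temporary directory are invented for illustration only.

```shell
# Hypothetical sample: row 56 has "NY" in the name field, not the state field.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
id,name,state,
52,George,NY,
56,NY,CA,
EOF

# grep matches ",NY," anywhere on the line, so it counts both data rows.
grep_hits=$(grep -c ',NY,' "$workdir/sample.csv")

# awk compares only the third comma-separated field, so it counts one row.
awk_hits=$(awk -F, '$3 == "NY" { n++ } END { print n + 0 }' "$workdir/sample.csv")

echo "grep matched $grep_hits rows, awk matched $awk_hits rows"
```

With state codes in a dedicated column this kind of collision is probably rare, but the field test costs nothing extra.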

In addition to the references I linked, also go here: http://tldp.org
Get the Bash Guide for Beginners and, later, the Advanced Bash-Scripting Guide.

MensaWater · 08-03-2012, 12:23 PM

Quote:

Originally Posted by ingram87

Instead of removing the lines that do not contain NY, how could I also send those to a new file so I can see which ones were removed?

When you use grep, it gives you the lines that match the pattern you're searching for. To have it instead give you every line EXCEPT those matching the pattern, use the -v option.

Code:

grep -v ",NY," filename > noNYentries
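
Putting the two commands side by side, a sketch like the following splits the file and checks that no line was lost. It recreates the sample posted earlier in a temporary directory; the output name "NYentries" is an assumption chosen to mirror "noNYentries".

```shell
# Recreate the sample from earlier in the thread in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > filename <<'EOF'
id,name,state,
50,Jamie,AL,
51,Jenifer,GA,
52,George,NY,
53,Corey,NY,
54,Leslie,TN,
55,David,NY,
EOF

grep ",NY," filename > NYentries       # lines that match
grep -v ",NY," filename > noNYentries  # every other line

# grep -c '' counts all lines, since the empty pattern matches everything.
total=$(grep -c '' filename)
kept=$(grep -c '' NYentries)
removed=$(grep -c '' noNYentries)
echo "$total lines in, $kept kept, $removed removed"
```

Note that the header line does not contain ",NY,", so it ends up in noNYentries rather than in the NY file.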

schneidz · 08-03-2012, 12:34 PM

awk solution:

Code:

awk -F , '$3 == "NY" {print $0}' ingram87.csv

MensaWater · 08-03-2012, 12:45 PM

Quote:

Originally Posted by schneidz

awk solution:

Code:

awk -F , '$3 == "NY" {print $0}' ingram87.csv

The above would be to get the records with NY.

For getting the ones WITHOUT NY:

Code:

awk -F , '$3 != "NY" {print $0}' ingram87.csv

And just as in the grep examples, you can redirect to new files with "> newfilename" at the end of the line.
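
The two awk commands can also be folded into one pass that writes both files and keeps the header row in each. This is only a sketch under a few assumptions: the first line is a header, the state is always the third field, and the output names ny.csv and removed.csv are made up here.

```shell
# Recreate the sample from earlier in the thread in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > ingram87.csv <<'EOF'
id,name,state,
50,Jamie,AL,
51,Jenifer,GA,
52,George,NY,
53,Corey,NY,
54,Leslie,TN,
55,David,NY,
EOF

awk -F, '
NR == 1    { print > "ny.csv"; print > "removed.csv"; next }  # header goes to both files
$3 == "NY" { print > "ny.csv"; next }                         # rows to keep
           { print > "removed.csv" }                          # everything else
' ingram87.csv
```

Inside awk, "print > file" truncates the file once when it is first opened and then appends for the rest of the run, so each output file is written cleanly in a single pass.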