[SOLVED] How to script csv editing? Remove rows from csv file that do not contain certain text
I have a csv file with a column for states. How do I write a script to remove all rows that are not "NY" in that column? I'm not even sure where to begin, as I don't know of any tools to edit csv files from the command line.
If anyone knows where to point me, or knows a short script that does this, it would be much appreciated.
CSV stands for comma-separated values; it is simply a text file that uses commas as delimiters.
Your file should look something like:
123 State Street,Albany,NY,10005
247 City Road,Atlanta,GA,30008
4957 County Avenue,Durham,NC,50333
2932 Parish Lane,New Orleans,LA,79999
4193 Borough Circle,New York,NY,10029
Since you know there is a comma before and after the value you're interested in, the simplest way is to grep for the pattern:
grep ,NY, csvfile
For the above example, that would output:
123 State Street,Albany,NY,10005
4193 Borough Circle,New York,NY,10029
You can redirect the above output into a new file:
grep ,NY, csvfile >newcsvfile
Note that you'd want to "cat csvfile" first to make sure the fields look as expected (e.g. that they are not padded with extra spaces, such as ",NY ,").
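If the file is too long to check by eye, something like the following might help spot padded fields. This is only a rough sketch: the second row is a made-up example with a space after NY, and the scratch directory is there just so nothing real gets touched.

```shell
# Scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Two rows based on the sample data; the second row is a hypothetical
# example with a stray space after "NY".
cat > "$workdir/csvfile" <<'EOF'
123 State Street,Albany,NY,10005
4193 Borough Circle,New York,NY ,10029
EOF

# Print (with line numbers) every line that has whitespace directly
# before or after a comma.
padded=$(grep -n -E '[[:space:]],|,[[:space:]]' "$workdir/csvfile")
printf '%s\n' "$padded"
```

Any line this prints would be missed by a plain grep for ,NY, and would need cleaning up first.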
Sends all lines containing the quoted phrase to "newfilename"
Note that this is not totally robust: it assumes that the pattern does not occur anywhere else on the line. With AWK, you can operate on a specific field instead.
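As a rough sketch of the field-based approach, assuming the state is always the third column and that no field contains a quoted comma (awk's simple comma splitting does not understand CSV quoting): the last row below is a hypothetical one added to show where the plain grep would go wrong.

```shell
# Scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Sample data from earlier in the thread, plus one hypothetical row
# where "NY" shows up in the city column rather than the state column.
cat > "$workdir/csvfile" <<'EOF'
123 State Street,Albany,NY,10005
247 City Road,Atlanta,GA,30008
4957 County Avenue,Durham,NC,50333
2932 Parish Lane,New Orleans,LA,79999
4193 Borough Circle,New York,NY,10029
1 Main Street,NY,NJ,07001
EOF

# -F, sets the field separator to a comma; $3 is the third field.
# Only rows whose third field is exactly "NY" are printed.
awk -F, '$3 == "NY"' "$workdir/csvfile" > "$workdir/newcsvfile"

cat "$workdir/newcsvfile"
```

Here grep ,NY, would also have kept the "1 Main Street" row, while the awk test only looks at the state column.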
In addition to the references I linked, also go here: http://tldp.org
Get the Bash Guide for Beginners and, later, the Advanced Bash-Scripting Guide.
Instead of removing the lines that do not contain NY, how could I also send those to a new file so I can see which ones were removed?
By default, grep prints the lines that match the pattern you're searching for. To have it print every line that does NOT match instead, use the -v option.
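Putting the two together, a minimal sketch using the sample data from earlier in the thread might look like this. The file name "removedrows" is just a placeholder chosen for the example.

```shell
# Scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Sample data from earlier in the thread.
cat > "$workdir/csvfile" <<'EOF'
123 State Street,Albany,NY,10005
247 City Road,Atlanta,GA,30008
4957 County Avenue,Durham,NC,50333
2932 Parish Lane,New Orleans,LA,79999
4193 Borough Circle,New York,NY,10029
EOF

# Rows that contain ,NY, go to one file...
grep ',NY,' "$workdir/csvfile" > "$workdir/newcsvfile"

# ...and -v inverts the match, so every other row goes to a second file.
grep -v ',NY,' "$workdir/csvfile" > "$workdir/removedrows"

cat "$workdir/removedrows"
```

Every row of the original ends up in exactly one of the two files, so comparing the line counts with wc -l is a quick sanity check.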