LinuxQuestions.org
Old 03-11-2010, 06:03 AM   #1
ziggy25
Member
 
Registered: Aug 2005
Distribution: Debian 5.2
Posts: 56

Rep: Reputation: 15
How to ignore rows with a specific character in a csv file


Hi,

I have a csv file that has around 3 million rows. The file format is shown below

Code:
"fd!","sdf","dsfds","dsfd"
"fd!","asdf","dsfds","dsfd"
"fd","sdf","rdsfds","dsfd"
"fdd!","sdf","dsfds","fdsfd"
"fd!","sdf","dsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"a2!","sdf","tdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
"fd!","sdf","dsfds","dsfd"
"fd","sdf","dsfds","dsfd"
I want to process this file so that it creates a new file containing only the rows that have 2 characters or fewer in the first column. The resulting file should look like this:

Code:
"fd","sdf","rdsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
"fd","sdf","dsfds","dsfd"
At the moment I am using SQL Loader, but it's taking too long, so I'm wondering whether this would be easier to do in Unix?

Thanks
 
Old 03-11-2010, 06:18 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
Hi,

Would this help:

sed -n '/^"[[:alpha:]]\{0,2\}",/p' csvfile
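A minimal sketch of how the pattern reads, and of writing the matches to a new file as asked in post #1. The file names csvfile and short.csv are placeholders, and the sample data is a subset of the rows from post #1.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Sample input: a few rows from post #1.
cat > csvfile <<'EOF'
"fd!","sdf","dsfds","dsfd"
"fd","sdf","rdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
EOF

# -n                   do not print lines by default
# ^"                   the line starts with the opening double quote
# [[:alpha:]]\{0,2\}   followed by zero to two letters
# ",                   then the closing quote of column 1 and the separator
# p                    print only the lines that match
sed -n '/^"[[:alpha:]]\{0,2\}",/p' csvfile > short.csv

cat short.csv
```

The input file is only read, so the original 3 million rows stay untouched; the filtered rows end up in short.csv.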

Hope this helps.
 
Old 03-11-2010, 06:23 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
awk -F"," 'length($1)<=4' file
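A short sketch of why the limit is 4 rather than 2: with -F"," the first field still carries its surrounding double quotes, so "fd" counts as 4 characters. The second command is an assumed alternative (not from the thread) that strips the quotes from a copy of the field first. Both assume the first column never contains a comma inside the quotes; the file names are placeholders.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Sample input: a few rows from post #1.
cat > file <<'EOF'
"fd!","sdf","dsfds","dsfd"
"fd","sdf","rdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
EOF

# $1 includes the quotes, so "fd" has length 4 and "fd!" has length 5.
awk -F"," 'length($1)<=4' file > short.csv

# Variant: remove the quotes from a copy of $1, then compare against 2.
# Working on a copy leaves the printed line unchanged.
awk -F"," '{ col = $1; gsub(/"/, "", col) } length(col) <= 2' file > short2.csv

cat short.csv
```

Note that the length test counts every character between the quotes, so a two-character value such as "a!" would also pass.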
 
Old 03-11-2010, 06:48 AM   #4
ziggy25
Member
 
Registered: Aug 2005
Distribution: Debian 5.2
Posts: 56

Original Poster
Rep: Reputation: 15
Thanks. Both work perfectly. Does that mean the semicolon is included as part of the characters?
Also, how can I modify the awk solution so that it only selects the rows that don't have the "!" at the end of the first column, so that the output looks like this:

Code:
"fd","sdf","rdsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
"fd","sdf","dsfds","dsfd"
I spent a whole week trying to achieve this with SQL Loader, and now you guys are telling me it's doable with a simple Unix command. I feel stupid!
 
Old 03-11-2010, 06:57 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
Hi,

Glad I (we) could be of help!

Quote:
Does that mean the semicolon is included as part of the characters?
Which semicolon [;]? Do you mean the comma [,]? Both examples output the comma [,].

Quote:
I spent a whole week trying to achieve this with SQL Loader, and now you guys are telling me it's doable with a simple Unix command. I feel stupid!
That's not being stupid!! You "just" haven't been around long enough to know what is possible. I'm still learning new stuff as I go along.

Hope this helps.
 
Old 03-11-2010, 06:59 AM   #6
ziggy25
Member
 
Registered: Aug 2005
Distribution: Debian 5.2
Posts: 56

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by druuna View Post
Hi,

Glad I (we) could be of help!

Which semicolon [;]? Do you mean the comma [,]? Both examples output the comma [,].

That's not being stupid!! You "just" haven't been around long enough to know what is possible. I'm still learning new stuff as I go along.

Hope this helps.
Sorry, I meant the double quote.
 
Old 03-11-2010, 07:02 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
Hi,

This is what the output looks like (input file contains your example in post #1):
Code:
 $ sed -n '/^"[[:alpha:]]\{0,2\}",/p' input 
"fd","sdf","rdsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
"fd","sdf","dsfds","dsfd"
The found lines are shown without any changes.
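One caveat, sketched under the assumption that first-column values may contain digits or other non-letters: [[:alpha:]] matches letters only, so a two-character value such as "a2" would be dropped. The rows "a2" and "a!" below are hypothetical additions for illustration, and the [^"] variant is an alternative, not the command posted above.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# "a2" and "a!" are hypothetical rows added to show the difference.
cat > input <<'EOF'
"fd","sdf","rdsfds","dsfd"
"a2","sdf","tdsfds","dsfd"
"a!","sdf","tdsfds","dsfd"
"a2!","sdf","tdsfds","dsfd"
EOF

# Letters only: keeps "fd"; drops "a2", "a!" and "a2!".
sed -n '/^"[[:alpha:]]\{0,2\}",/p' input > letters_only.csv

# Any character except a double quote: keeps "fd", "a2" and "a!".
sed -n '/^"[^"]\{0,2\}",/p' input > any_char.csv

cat letters_only.csv
cat any_char.csv
```

Which variant fits depends on whether a short value ending in "!" should count as valid data.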

Hope this clears things up.
 
Old 03-11-2010, 09:43 AM   #8
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by ziggy25 View Post
Thanks. Both work perfectly. Does that mean the semicolon is included as part of the characters?
Also, how can I modify the awk solution so that it only selects the rows that don't have the "!" at the end of the first column, so that the output looks like this:

Code:
"fd","sdf","rdsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
"fd","sdf","dsfds","dsfd"
I spent a whole week trying to achieve this with SQL Loader, and now you guys are telling me it's doable with a simple Unix command. I feel stupid!
Why is this line
Code:
"faaav","sdf","tdsfds","dsfd"
selected?
To check for "!" at the end of column 1 (as well as a length of at most 4, quotes included):

Code:
awk -F"," 'length($1)<=4 && $1!~/\!\"$/' file
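A sketch of how the second condition reads. `$1 !~ /regex/` is true when the first field does not match the regex, and the regex `!"$` means "an exclamation mark, then the closing double quote, at the end of the field". The backslashes before `!` and `"` should not be needed inside an awk regex, and some awk implementations may print a warning about `\!`, so they are left out here. The second command drops the length test, on the assumption that the output requested in post #4 (which includes "faaav") is what is wanted. File names are placeholders.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Sample input: a few rows from post #1.
cat > file <<'EOF'
"fd!","sdf","dsfds","dsfd"
"fd","sdf","rdsfds","dsfd"
"a2!","sdf","tdsfds","dsfd"
"faaav","sdf","tdsfds","dsfd"
"b","sdf","tdsfds","dsfd"
EOF

# Both conditions: column 1 is at most 4 characters (quotes included)
# AND column 1 does not end in !"
awk -F"," 'length($1)<=4 && $1 !~ /!"$/' file > short_no_bang.csv

# Only the "!" test: keeps every row whose first column does not end in !"
# regardless of length, so "faaav" is kept as well.
awk -F"," '$1 !~ /!"$/' file > no_bang.csv

cat short_no_bang.csv
cat no_bang.csv
```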

Last edited by ghostdog74; 03-11-2010 at 09:45 AM.
 
Old 03-13-2010, 01:38 PM   #9
ziggy25
Member
 
Registered: Aug 2005
Distribution: Debian 5.2
Posts: 56

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by ghostdog74 View Post
Code:
awk -F"," 'length($1)<=4 && $1!~/\!\"$/' file
Hi, I understood the first condition, where it checks the length, but how exactly does the second condition work?
 
  


All times are GMT -5.
