LinuxQuestions.org
Review your favorite Linux distribution.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 07-05-2012, 07:03 AM   #1
arbex5
LQ Newbie
 
Registered: Jul 2012
Location: Brazil
Distribution: Ubuntu
Posts: 13

Rep: Reputation: Disabled
Post Delete row of csv file that doesn't meet a criteria


I do need something very similar to this an old post that have a solution.
http://www.linuxquestions.org/questi...18#post4719618

I have two cvs files (geodata files)

One have 3 columns (lets call file_1) and the other 5 columns (lets call file_2)

file_1 is like this:

startIpNum,endIpNum,locId
"16777216","16777471","17"
"16777472","16777727","24328"
"16777728","16778239","49"

file_2 is like this:

geo_id,country,region,city,postalCode
32,BR,,,
19304,BR,SP,São Paulo,
24328,BR,RJ,Rio De Janeiro,
22975,BR,SP,Salto,

What I need is to create a new file (lets call file_3) that match the field locId of the file_1 with the field geo_id of the file_2
In other words, if a field locId (file_1) and a field gep_id (file_2) are equal, generate a file_3 with the row of file_1 that matched the criteria.

Example.

Using the tables above and running the script the resulting file_3 would be

startIpNum,endIpNum,locId
"16777472","16777727","24328"

Which is the locId "24328" (file_1) matched the geo_id "24328" (file_2) and generate the resulting file_3

Is this hard to make?
Thanks.
 
Old 07-05-2012, 08:46 AM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284Reputation: 284Reputation: 284
Have InFile1...
Code:
"16777216","16777471","17"
"16777472","16777727","24328"
"16777728","16778239","49"
Have InFile2...
Code:
32,BR,,,
19304,BR,SP,São Paulo,
24328,BR,RJ,Rio De Janeiro,
22975,BR,SP,Salto,
Want OutFile...
Code:
"16777472","16777727","24328"
[/QUOTE]
Code:
cut -d, -f1 $InFile2               \
|sed 's/^/"/g'                     \
|sed 's/$/"/g'                     \
|sort                              \
|join -t, -1 3 -2 1 $InFile1 -     \
|awk -F"," '{printf $2","$3","$1}' \
> $OutFile
Daniel B. Martin
 
Old 07-05-2012, 09:42 AM   #3
arbex5
LQ Newbie
 
Registered: Jul 2012
Location: Brazil
Distribution: Ubuntu
Posts: 13

Original Poster
Rep: Reputation: Disabled
Hi Daniel, thanks for answer.

Here is what I did:

I renamed the files to adjust to your command and make it easy.

Here is what I got:

join: InFile1.csv:2: is not sorted: "16777216","16777471","17"


The resulting was an empty file.
In order to simplify, InFile2.cvs can be just a column with geo_id like this:

geo_id
32
19304
24328
22975

So if the field locId in InFile1.csv has a matching field or number in InFile2.csv make a new InFile3.csv with these rows.

Thanks
 
Old 07-05-2012, 09:54 AM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284Reputation: 284Reputation: 284
Quote:
Originally Posted by arbex5 View Post
Here is what I got:

join: InFile1.csv:2: is not sorted: "16777216","16777471","17"
I suspect you ran my code against files different from those in your original post. join wants input files in sorted order. Here is a revised solution which assures sorted files. This code requires the use of a temporary work file.
Code:
cut -d, -f1 $InFile2               \
|sed 's/^/"/g'                     \
|sed 's/$/"/g'                     \
|sort                              \
> $Work1

sort -k3,3  $InFile1               \
|join -t, -1 3 -2 1 - $Work1       \
|awk -F"," '{printf $2","$3","$1}' \
> $OutFile2
Daniel B. Martin
 
Old 07-05-2012, 11:08 AM   #5
arbex5
LQ Newbie
 
Registered: Jul 2012
Location: Brazil
Distribution: Ubuntu
Posts: 13

Original Poster
Rep: Reputation: Disabled
Hi Daniel, thank you again.

I think I'm doing something wrong.
Here is the error.

join: -:2: is not sorted: "1000005632","1000006655","104820"

Here are the commands I executed.

$ cut -d, -f1 InFile2.csv |sed 's/^/"/g' |sed 's/$/"/g' |sort > Work1
$ sort -k3,3 InFile1.csv |join -t, -1 3 -2 1 - Work1 |awk -F"," '{printf $2","$3","$1}' > OutFile2

the result is a Wok1 file with sorted ascending geo_id and an empty OutFile2

The file InFile1.csv are not sorted.
Here is a snapshot of them (InFile1.csv has almost 2M rows and InFile2.csv has almost 5k rows:

InFile1.csv

startIpNum,endIpNum,locId
"16777216","16777471","17"
"16777472","16777727","24328"
"16777728","16778239","49"
"16778240","16779263","17"
"16779264","16781311","49"
"16781312","16785407","111"
"16785408","16793599","49"
"16793600","16809983","111"
"16809984","16842751","209"
.
.
.

InFile2.csv (I isolate to just geo_id column)

geo_id
32
19304
19319
22975
23787
23793
24758
24927
24930
.
.
.

Thanks again.
 
1 members found this post helpful.
Old 07-05-2012, 11:38 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

Rep: Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880Reputation: 1880
How about:
Code:
awk -F, 'NR==FNR{id[$1];next}FNR==1 || gensub(/"/,"","g",$3) in id' file2 file1
This was using original data.
 
Old 07-05-2012, 02:01 PM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284Reputation: 284Reputation: 284
Quote:
Originally Posted by arbex5 View Post
I think I'm doing something wrong.
I post a solution only after testing it. The code generates the expected output here.

Are you testing with the same small test files which were given in your original post? If not, please do that first. If the code works, then test with the large real-world files. If the code works with small test files but not with large real-world files you may be discovering something interesting in your data.

Daniel B. Martin
 
Old 07-05-2012, 02:15 PM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284Reputation: 284Reputation: 284
Quote:
Originally Posted by arbex5 View Post
Here is the error.
join: -:2: is not sorted: "1000005632","1000006655","104820"
Let's try a different line of attack. This code does not use join and does not require sorting.
Code:
cut -d, -f1 $InFile2   \
|sed 's/^/"/g'         \
|sed 's/$/"$/g'        \
|grep -f -  $InFile1   \
> $OutFile
Daniel B. Martin
 
1 members found this post helpful.
Old 07-05-2012, 05:12 PM   #9
arbex5
LQ Newbie
 
Registered: Jul 2012
Location: Brazil
Distribution: Ubuntu
Posts: 13

Original Poster
Rep: Reputation: Disabled
Thumbs up Worked pretty well

Hello Daniel, it worked pretty well.
Both codes.
The first one worked with the sample data I provided you.
The last one worked with the file itself.

Thank you so much.
Have a nice day.
 
Old 07-06-2012, 02:18 PM   #10
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284Reputation: 284Reputation: 284
OP says this code works on his real-world files.
Code:
cut -d, -f1 $InFile2   \
|sed 's/^/"/g'         \
|sed 's/$/"$/g'        \
|grep -f -  $InFile1   \
> $OutFile
I found a way to tighten up this code but (on the minus side) it is less readable.
Code:
sed -r 's/([^,]*).*/"\1"$/1' $InFile2 \
|grep -f - $InFile1                   \
> $OutFile8
This sed says:
- read the file $InFile2
- extract whatever character string precedes the first comma
- prefix it with a " (double quote)
- suffix it with a " (double quote) and a $ (dollar sign)

Daniel B. Martin
 
  


Reply

Tags
csv, delete, matching


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Multiple grep outputs appended to single row of CSV file findme.krish Linux - Newbie 5 09-12-2010 06:56 PM
Shell script to parse csv-like output, row by row utahnix Linux - General 8 12-08-2007 05:03 AM
bash script - remove header row from csv file pljvaldez Programming 5 08-30-2006 11:05 AM
Delete the first row from a text file loopoo Linux - Newbie 2 08-15-2006 02:57 AM
C++ read csv file row into vector taban1 Programming 3 11-08-2004 02:01 PM


All times are GMT -5. The time now is 12:37 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration