[SOLVED] Using sed to remove lines with duplicate ID's, but different endings...

wapitismith · 05-08-2010, 11:08 AM

I have a file that contains lines representing the nodes of a polyline but I only need the first point in each segment. With the following text:

0,"013A",0.57,260739.891,4379258.87
0,"013A",0.57,260737.674,4379258.94
0,"013A",0.57,260684.628,4379258.35
1,"013A",0.545,260769.915,4379257.84
1,"013A",0.545,260739.891,4379258.87
2,"013A",1.059,259567.126,4379293.16
2,"013A",1.059,259562.637,4379302.59
2,"013A",1.059,259534.423,4379337.52
2,"013A",1.059,259460.853,4379414.3
3,"013A",1.036,259574.096,4379278.51
3,"013A",1.036,259567.126,4379293.16
4,"013A",1,259580.147,4379253.83
4,"013A",1,259574.415,4379277.84
4,"013A",1,259574.096,4379278.51
5,"013A",0.98,259581.802,4379185.53
5,"013A",0.98,259580.147,4379253.83

I would like to have this as output:

0,"013A",0.57,260739.891,4379258.87
1,"013A",0.545,260769.915,4379257.84
2,"013A",1.059,259567.126,4379293.16
3,"013A",1.036,259574.096,4379278.51
4,"013A",1,259580.147,4379253.83
5,"013A",0.98,259581.802,4379185.53

I've tried combinations of uniq, awk, and sed, but I am stumped. I'm sure I'm too close to the problem to see the simple solution.

The problem with uniq is that the last two columns differ from line to line. I don't care about the x/y of any point after the first one.

Any assistance would be greatly appreciated.

~wapitismith~

colucix · 05-08-2010, 11:33 AM

What about this?

Code:

awk -F, '{ if ( ! ( $1$2$3 in _ )) _[$1$2$3] = $0 } END { for ( i in _ ) print _[i] }' file | sort
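A note on the trailing sort: awk does not guarantee any particular order for `for ( i in _ )`, which is why the output is piped through sort. A plain sort compares lines as text, though, so an ID of 10 would land before 2. Assuming the first field is always a number, a numeric sort on that field is probably safer. A rough sketch of the same idea (the function name `first_per_key_sorted` and the array name `first` are made up for illustration):

```shell
# Keep the first line seen for each (field1, field2, field3) combination,
# then order the result numerically by the first field.
first_per_key_sorted() {
    awk -F, '
        # Build the key with the separator in between so that
        # adjacent fields cannot run together.
        { key = $1 FS $2 FS $3 }
        # Store the line only if this key has not been stored yet.
        !(key in first) { first[key] = $0 }
        # Traversal order here is unspecified, hence the sort below.
        END { for (k in first) print first[k] }
    ' "$@" | sort -t, -k1,1n
}

# Example: IDs 10 and 2, with a repeated line for ID 2.
printf '%s\n' '10,"013A",1,5,6' '2,"013A",1,1,2' '2,"013A",1,3,4' | first_per_key_sorted
```

Called with a file name (`first_per_key_sorted file`) it reads that file; with no arguments it reads standard input.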

grail · 05-08-2010, 11:36 AM

Or maybe:

Code:

awk -F, '!_[$1]++' file
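For anyone reading this later, here is roughly how that one-liner works, spelled out. The array `_` counts how often each value of the first field has been seen. The first time an ID shows up the count is still 0, so the negation is true and awk prints the line (printing is the default action when a pattern has no body). The `++` then bumps the count, so every later line with that ID is skipped. Input order is preserved, so no sort is needed. A sketch with a more readable array name (`seen` and the function name `first_per_id` are just illustrative):

```shell
# Print only the first line encountered for each value of field 1.
first_per_id() {
    awk -F, '
        # seen[$1] is empty (treated as 0) on first sight of an ID,
        # so the pattern is true and the line is printed; the
        # post-increment then marks the ID as already seen.
        !seen[$1]++
    ' "$@"
}

# Example: two lines for ID 0, one for ID 1; only the first of each survives.
printf '%s\n' '0,"013A",0.57,1,2' '0,"013A",0.57,3,4' '1,"013A",0.545,5,6' | first_per_id
```

Note that this keys on the first field alone, which matches the sample data where each ID belongs to exactly one segment.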

colucix · 05-08-2010, 11:40 AM

Great, grail!

wapitismith · 05-08-2010, 12:30 PM

Incredibly easy! I knew I was missing something simple. Very nice responses!

Thanks!