last occurrence of duplicate record, gawk
Hi Linux pros!
I am new to programming, and I am trying to use gawk to remove duplicate lines based on the first two fields while keeping the last occurrence of each record. I've seen plenty of commands that keep the first occurrence but hardly any that keep the last one (and it seems more complicated than I thought). The file I want to process looks like this (the real one is much longer):
Code:
item1/ref.001/eur/Bel.
Code:
item1/ref.001/eur/Spa.
Code:
gawk 'BEGIN{FS="/"}
Thanks in advance!
This code should work.
Code:
awk 'BEGIN{FS="/"}
Whoops, I forgot the END... Thanks, rosehosting.com!
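The END block is where a keep-the-last approach has to print. Only after the final line has been read is it known which line was the last one for each pair. The reply's code is cut off above, so the following is not that poster's exact program. It is a sketch under a few assumptions: records always use "/" as the separator, output should follow the order in which each $1/$2 pair first appears, and the function name keep_last is made up for illustration.

```shell
#!/bin/sh
# keep_last (hypothetical name): for every distinct pair of the first two
# "/"-separated fields, print only the LAST line carrying that pair.
# Pairs are listed in the order in which they first appear in the input.
keep_last() {
    awk -F/ '
        {
            key = $1 FS $2              # join with "/" so "a"+"bc" and "ab"+"c" stay distinct
            if (!(key in last))         # first time this pair is seen:
                order[++n] = key        #   remember where it first appeared
            last[key] = $0              # later lines overwrite earlier ones
        }
        END {                           # only now is every last line known
            for (i = 1; i <= n; i++)
                print last[order[i]]
        }
    ' "$@"
}
```

Usage: `keep_last file`, or pipe data into it. Because each new line simply overwrites `last[key]`, memory grows with the number of distinct pairs, not with the size of the file.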
Can I ask you another question related to this thread? (Maybe I should create a new thread.) It isn't needed anymore, but I'm curious about one detail. While I was looking for a solution to this problem, I ran into another problem when creating an array. For each line of the input above, I appended a field containing the occurrence number of its $1/$2 pair. I then wanted to sort on that new field in descending order using the asort function. After that, I could have used the classic way of removing duplicates by keeping the first instance. The problem is that after sorting on the last field, I couldn't get the output I wanted, which was:
Code:
item1/ref.001/eur/Spa./1
Code:
BEGIN{FS=OFS="/"}
I am not sure about the array in the asort function. Can you create an array without using the split function? Thanks!
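No split() call is needed to get an array in awk. `++count[key]` creates the element (and the array) on first use and increments it. The count, sort, then keep-first idea described above can be sketched portably by swapping gawk's asort() for sort(1). One assumption differs from the description: the occurrence number is prepended as field 1 rather than appended, so the sort key does not depend on how many fields a record has. The tag is stripped again at the end. The function name keep_last_via_sort is hypothetical.

```shell
#!/bin/sh
# keep_last_via_sort (hypothetical name): tag each line with its running
# occurrence number for its ($1,$2) pair, sort so the highest number comes
# first within each pair, keep the first line per pair, then drop the tag.
keep_last_via_sort() {
    awk 'BEGIN { FS = OFS = "/" }
         { print ++count[$1 FS $2], $0 }' "$@" |    # "N/original line", N = Nth occurrence
    sort -t/ -k2,2 -k3,3 -k1,1nr |                   # group by pair, highest N first
    awk 'BEGIN { FS = OFS = "/" }
         !seen[$2 FS $3]++ {                         # first line of each pair = its last occurrence
             sub(/^[0-9]+\//, "")                    # drop the leading "N/" tag
             print
         }'
}
```

The output comes out grouped by pair in sort order, not in the original file order.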
Here is another alternative:
Code:
awk -F/ '!($1$2 in a){i=1}{a[$1$2][i++]=$0}END{for(x in a)print a[x][length(a[x])]}' file
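Arrays of arrays such as `a[x][i]` are a gawk extension (gawk 4.0 and later), and most other awk implementations reject that syntax. A portable sketch of the same store-everything-then-print-the-last idea uses a single array indexed by `(key, i)`, which awk joins with SUBSEP, plus a separate counter per key.

The per-key counter also matters for correctness. With the single shared `i` in the one-liner above, a key that reappears after other keys have been counted can get non-consecutive indices. For pairs A, B, B, B, A, the A lines land at indices 1 and 4, so `length(a[x])` no longer points at the last one. The name keep_last_subsep is hypothetical.

```shell
#!/bin/sh
# keep_last_subsep (hypothetical name): store every line per ($1,$2) pair,
# then print the last one. rec[key, i] emulates gawk rec[key][i];
# n[key] counts the lines seen for each pair.
keep_last_subsep() {
    awk -F/ '
        {
            key = $1 FS $2
            rec[key, ++n[key]] = $0     # i-th line seen for this pair
        }
        END {
            for (key in n)              # for-in order is unspecified
                print rec[key, n[key]]  # n[key] indexes the last stored line
        }
    ' "$@"
}
```

This keeps every line in memory. If only the last one matters, overwriting a single `last[key]` is cheaper.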
@grail:
Thanks! (Your code didn't work for me with awk, only with gawk.) So if I follow your logic, we can also create an array by writing
Code:
( items in array){ do stuff with array ... }
But when I tried to create an array by iterating over the input lines (hence count[$1$2]++) with my previous code,
Code:
gawk 'BEGIN{FS=OFS="/"}
Code:
item1/ref.001/eur/Spa./1
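On how arrays get created: the membership test `(key in count)` only checks whether an element exists and never adds one. A reference such as `count[$1 FS $2]++` is what creates the element, and the array along with it. The sketch below assumes the same "/"-separated input. The name count_pairs and the probe key "no/such" are made up for the demonstration.

```shell
#!/bin/sh
# count_pairs (hypothetical name): build an array purely by assignment
# (no split()) and print how many lines carry each ($1,$2) pair.
count_pairs() {
    awk -F/ '
        { count[$1 FS $2]++ }               # first reference creates the element (empty/0), ++ makes it 1
        END {
            if (("no" FS "such") in count)  # membership test only; it does not add "no/such"
                print "unexpected"
            for (key in count)              # for-in order is unspecified
                print key, count[key]
        }
    ' "$@"
}
```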
How about a simple non-awk solution?
Code:
tac infile.txt | sort -u -t '/' -k 1,2
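With GNU sort, `-u` keeps the first line of each run of lines whose keys compare equal. Reversing the file with tac first therefore makes that line the last occurrence in the original. The result is sorted by the key fields, not kept in file order. If the original order matters, one alternative is to do a keep-first dedup in awk on the reversed file and then reverse it back. This assumes tac is available (it is part of GNU coreutils), and keep_last_in_order is a hypothetical name.

```shell
#!/bin/sh
# keep_last_in_order (hypothetical name): reverse the input, keep the FIRST
# line for each ($1,$2) pair (the last one in the original), then reverse
# back so the surviving lines keep their original relative order.
keep_last_in_order() {
    tac "$@" | awk -F/ '!seen[$1 FS $2]++' | tac
}
```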