filter columns
Hi ,
This is my first time posting in this forum and i hope someone will help me with my query. Basically i have dataframe with two columns. I want to keep only those in first column that are not duplicated. If they are duplicated then i would like to keep only one based on value in column 2. I have given an example for this kind below. target_id fpkm 1 comp247393_c0_seq1 3.197885 2 comp257058_c0_seq4 1.624577 3 comp242590_c0_seq1 1.750319 4 comp77911_c0_seq1 1.293059 5 comp241426_c0_seq1 1.626589 6 comp288413_c0_seq1 14.828853 7 comp294436_c0_seq1 11.555596 8 comp63603_c0_seq1 1.982386 9 comp267138_c0_seq1 8.594494 10 comp267138_c0_seq2 11.134958 11 comp321623_c0_seq1 6.934149 In the above dataframe as you can see there are two with the same name (almost) comp267138_c0_seq1 comp267138_c0_seq2 and i want to keep only comp267138_c0_seq2 because it has higher value in column 2. Please help me with this.... |
First of all the sample output you've mentioned contains 3 fields seperatee by columns :-)... So I will assume only last 2 columns of every line as your sample output, as:
comp247393_c0_seq1 3.197885 comp257058_c0_seq4 1.624577 comp242590_c0_seq1 1.750319 comp77911_c0_seq1 1.293059 ...... ........ So, first, move this output to some file, and then you can filter the content as follow: $ <your-date> > /tmp/sampledata.txt (Moving your above output data into a file named /tmp/sampledata.txt) $ more /tmp/sampledata.txt | awk -F" " '{print $1} | awk '!_[$0]++' Also mentione that on what basis you want to keep a unique value of column on basis of 2nd column? Would you like to retain 1st column if it's corresponding 2nd column value is high, if duplicate value found in column 1st? |
If the order is not important:
Code:
awk '{ In practice, at first you remove the number at the end of the first field and use the first part of the string as index of the arrays (this ensures equality of the first fields, except for the last number). Then the conditional expressions check if the second field is greater than the value previously stored for the same element (if any) and act accordingly, by assigning the new value if the condition is true or retaining the previous value if the condition is false. Note that I used two arrays __ and _ in order to store both the first (unchanged) field and the value in the second field respectively. Hope this helps. |
Thanks both of you. Both of the solutions worked. Thanks again!
|
Quote:
Please mark this thread as solved (find it on top left corner of the page, under thread tool option), if no more queries left. Have a nice time! |
All times are GMT -5. The time now is 02:33 AM. |