LinuxQuestions.org
07-11-2012, 02:06 PM   #1
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Rep: Reputation: Disabled
Rename duplicate values in a column


Hi all,

I have a file like the one given below. I would like to rename the unique values in the first column by adding a, b, c, etc. at the end for each occurrence, and add something like NU at the beginning for non-unique lines. Any ideas on how to do this using awk/sed or any other programming language?

Code:
Contig  Data1 Data2

con1    pass   pass
con2    pass   pass
con3    pass     -
con3    fail   pass
con3    pass   fail
con4    fail   pass
con5    pass   fail
con5    fail   fail
My result file should look something like this

Code:
Contig     Data1 Data2

NU_con1    pass   pass
NU_con2    pass   pass
con3a      pass     -
con3b      fail   pass
NU_con3    pass   fail
NU_con4    fail   pass
con5a      pass   fail
con5b      fail   fail
Thanks in advance,
 
07-11-2012, 07:36 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910
I think you'll need to better explain what qualifies a line as not unique.

In your example there's only one occurrence of con1, yet you flag it
as NU. Why?
Why does NU_con3 spring into existence, but the last con5 turns into a con5b?



Cheers,
Tink
 
07-12-2012, 04:59 AM   #3
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
I am very sorry for the confusion. There is an error in the result file I gave in my previous post. Here is a better explanation.

What I want to do is give unique values one ID and non-unique values another ID. For non-unique values, I would also like the added IDs to be numbered in series. So my result file should look something like this:

Code:
Contig             Data1 Data2

con1_Uniq          pass   pass
con2_Uniq          pass   pass
con3_NotUniq_1     pass     -
con3_NotUniq_2     fail   pass
con3_NotUniq_3     pass   fail
con4_Uniq          fail   pass
con5_NotUniq_1     pass   fail
con5_NotUniq_2     fail   fail
Thanks
 
07-12-2012, 08:51 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
Is the data sorted by the first column?
 
07-12-2012, 09:12 AM   #5
Farzan Mufti
LQ Newbie
 
Registered: Feb 2007
Location: USA/Canada
Distribution: Red Hat, CentOS, Scientific, Fedora, Ubuntu, SUSE, SLES
Posts: 14

Rep: Reputation: 6
Bash script solution

Here's a solution. I have made a number of assumptions.
1. You are using a newer version of Bash (4.0 or later) that supports associative arrays. You can check by issuing the command: declare -A arr
2. The actual data starts on line 3, so I processed the file starting from line 3.
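
The check in assumption 1 could be wrapped in a small guard, roughly like this. It is only a sketch: the function name supports_assoc_arrays is made up for illustration, and it simply tries declare -A in a subshell and reports whether that worked.

```shell
#!/bin/bash
# Hypothetical helper (not part of the posted script): returns 0 if the
# running shell accepts "declare -A", i.e. supports associative arrays.
supports_assoc_arrays() {
    # The subshell keeps the probe variable from leaking into the caller;
    # stderr is discarded so an unsupported shell stays quiet.
    ( declare -A probe ) 2>/dev/null
}

if supports_assoc_arrays; then
    echo "associative arrays: supported"
else
    echo "associative arrays: not supported (Bash 4.0 or newer is needed)"
fi
```

If the guard reports "not supported", the script below will fail at its declare -A line.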

Code:
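#!/bin/bash

FILE=$1

# Find all first-column values that occur more than once (data starts on line 3)
dups=$(awk 'NR > 2 {print $1}' "$FILE" | sort | uniq -d)
# Start a counter at 1 for every duplicated value
declare -A count
for val in $dups
do
    count[$val]=1
done

# Now process the data lines one at a time (header lines 1-2 are not printed)
sed -n '3,$ p' "$FILE" | while read -r line
do
    # Get the first field; skip blank lines, which have none
    f1=$(echo "$line" | awk '{print $1}')
    [[ -z $f1 ]] && continue
    if [[ -n ${count[$f1]} ]]
    then
        # Value is duplicated: append the running number, then bump it
        echo "$line" | sed "s/^$f1/&_NotUniq_${count[$f1]}/"
        (( count[$f1]++ ))
    else
        # Value occurs only once (suffix matches the _Uniq asked for in post #3)
        echo "$line" | sed "s/^$f1/&_Uniq/"
    fi
done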

USAGE:
Code:
./script_name filename
NOTE: I have tested the code before posting.
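
For comparison, since the original question also asked about awk, the same labelling can probably be done in awk alone. This is a minimal sketch and not part of the tested script: the function name label_contigs is made up for illustration, and it assumes a single header line followed by whitespace-separated data. It reads the file twice (once to count, once to print), so the input does not need to be sorted by the first column.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): label first-column values as
# <value>_Uniq or <value>_NotUniq_<n> using two awk passes over one file.
label_contigs() {
    awk -v OFS='\t' '
        # Pass 1 (NR == FNR): count each first-column value,
        # ignoring the header line and blank lines.
        NR == FNR { if (FNR > 1 && NF) total[$1]++; next }

        # Pass 2: print the header line and blank lines unchanged.
        FNR == 1 || !NF { print; next }

        # Pass 2: relabel the first field of every data line.
        {
            key = $1
            if (total[key] > 1) {
                n = ++seen[key]          # running number per duplicated value
                $1 = key "_NotUniq_" n
            } else {
                $1 = key "_Uniq"
            }
            print                        # fields are re-joined with tabs (OFS)
        }
    ' "$1" "$1"
}

# Example call (the file name is a placeholder):
# label_contigs filename
```

Note that assigning to $1 makes awk rebuild the line with tab separators, so the data columns come out tab-separated instead of space-aligned; the header line is passed through as it was.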



From: Farzan Jameel Mufti Thursday July 12, 2012
 
  


All times are GMT -5.
