LinuxQuestions.org - script to remove the lines which are having the duplicate value in 2 fields

ajcapri

11-27-2009 01:11 AM

script to remove the lines which are having the duplicate value in 2 fields

How do I write a script that removes the lines having a duplicate value in the 1st field and the last field, and redirects the result to another file? The file name should include the current timestamp.
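For the timestamp part of the question, one common approach is to build the file name with `date`. A minimal sketch, assuming a `date` that supports the standard `%Y%m%d_%H%M%S` format; the `timestamped_name` helper and the `result` prefix are hypothetical, not from the thread:

```shell
#!/bin/bash
# Build an output file name that embeds the current date and time,
# e.g. result_20091127_011100.txt. The prefix is passed as $1 (hypothetical).
timestamped_name() {
    printf '%s_%s.txt\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}

out=$(timestamped_name result)
echo "output would go to: $out"
# some_command input.txt > "$out"
```

This format sorts chronologically and contains no spaces or colons, which keeps the name easy to use in later commands.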

catkin

11-27-2009 01:45 AM

Please post a sample of the data, the desired output and what you have tried so far.

ajcapri

11-27-2009 02:30 AM

I didn't know how to proceed with writing the script, as I am new to Unix.

The sample data could be something like:

1|a|aaaa|11
2|b|bbbb|222
3|c|cccc|333
2|d|dddd|333
3|e|aaaa|222

and the output I wish to get would be:

1|a|aaaa|11
2|b|bbbb|222
3|c|cccc|333

ajcapri

11-27-2009 03:07 AM

Also, I need to preserve the order. And what if the fields I need to check for duplication (1st and last) are alphanumeric?
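One way to keep the original order while dropping repeats is an awk associative array: awk reads the input one line at a time, and `!seen[$1]++` is true only the first time a given first-field value appears. Array subscripts in awk are strings, so alphanumeric values such as `abc` or `A12` work the same way as numbers. A minimal sketch that de-duplicates on the first field only (the function name `dedup_first_field` is hypothetical; use `$NF` instead of `$1` to key on the last field):

```shell
#!/bin/bash
# Print each line only the first time its first '|'-separated field is seen.
# Lines come out in input order; reads the named files, or stdin if none.
dedup_first_field() {
    awk -F'|' '!seen[$1]++' "$@"
}

printf '%s\n' '1|a|aaaa|11' '2|b|bbbb|222' '3|c|cccc|333' '2|d|dddd|333' '3|e|aaaa|222' |
    dedup_first_field
```

With the sample data above, this prints the three lines shown as the desired output.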

vinaytp

11-27-2009 03:17 AM

Hello ajcapri,

I have just tried out some code. I hope it works; please check.

Code:

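#!/bin/bash

file=$1

# Keep each line whose 4th (last) field contains the value of its 1st field.
while IFS= read -r line
do
    a=$(printf '%s\n' "$line" | cut -d '|' -f 1)   # 1st field
    b=$(printf '%s\n' "$line" | cut -d '|' -f 4)   # 4th field
    # -q: only set the exit status; -F: match $a as a fixed string, not a regex
    if printf '%s\n' "$b" | grep -qF -- "$a"
    then
        printf '%s\n' "$line" >> temp   # matching lines are appended to ./temp
    fi
done < "$file"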

Execute it this way:

Code:

sh filename.sh /path/to/the/file/where/pattern/is/stored

Hope it helps.

Cheers!

ajcapri

11-27-2009 05:11 AM

How do I do the same if I don't know the number of fields? In that situation, how do I find the $n value for the last field?
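Neither of the following needs the field count in advance. In bash, the `%%` and `##` parameter expansions strip text from the first `|` onward or up to the last `|`; in awk, `NF` holds the number of fields on the current line, so `$NF` is always the last field. A minimal sketch (the helper names are hypothetical):

```shell
#!/bin/bash
# Get the first and last '|'-separated fields of a line without
# knowing how many fields it has.

# Pure bash: ${1%%|*} removes everything from the first '|' onward,
# ${1##*|} removes everything up to and including the last '|'.
first_field() { printf '%s\n' "${1%%|*}"; }
last_field()  { printf '%s\n' "${1##*|}"; }

# awk: NF is the number of fields on the current line, so $NF is the last one.
last_field_awk() { printf '%s\n' "$1" | awk -F'|' '{ print $NF }'; }

line='abc|aaa|bbb|ccc'
first_field "$line"     # abc
last_field "$line"      # ccc
last_field_awk "$line"  # ccc
```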

nagendrar

11-27-2009 05:51 AM

Write the code below in filename.sh:

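#!/bin/bash
file=$1
# Keep lines whose first character equals their last character.
while IFS= read -r line
do
    len=${#line}
    name1=${line:0:1}          # first character of the line
    name2=${line:len-1:1}      # last character of the line
    if [ "$name1" = "$name2" ]
    then
        printf '%s\n' "$line" >> temp
    fi
done < "$file"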

Run this as follows (with bash, since ${line:offset:length} is a bash feature):
bash filename.sh /path/to/the/file/where/pattern/is/stored

Regards,
Nagendra Rednam

ajcapri

11-27-2009 07:11 AM

I guess the code you people have posted is for comparing the values in the first and last fields. I want to delete records based on duplicate values in the first field, and also based on duplicate values in the last field. Many thanks to the people who have been helping.

catkin

11-27-2009 07:15 AM

Quote:

Originally Posted by ajcapri (Post 3771326)

What do you mean by "duplicate values in the first field"? A field with XX would have a duplicated X. OK?

bsat

11-28-2009 02:36 AM

Try this awk script.
You can put it into a file, say "awk_script", and your data in a file, say "data".

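BEGIN { FS = "|" }                # fields are separated by '|'
{
    str = substr($NF, 1, 1)       # first character of the last field
    if ($1 == str)                # does the first field equal it?
    {
        print $0
    }
}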

Run it as follows:

awk -f awk_script data

ajcapri

11-29-2009 09:38 PM

By duplicates, I mean:

abc|aaa|bbb|ccc
def|ddd|eee|fff
ghi|ddd|eee|ccc
abc|ggg|hhh|iii
ghi|rrr|sss|fff

For this sample data, I need the last 2 rows removed, as the values in the first field ("abc" and "ghi") are already present in previously scanned rows. Furthermore, I need the 3rd and last rows removed, as the values in the last field ("ccc" and "fff") are duplicates.
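With duplicates defined this way, the first and last fields can each be tracked in their own awk array, and a line is kept only if neither key has been seen in its column before. A minimal sketch, assuming that keys from every scanned line are remembered, including lines that end up dropped (the thread does not say either way; for this sample both readings give the same result). The function name `dedup_first_last` and the `dedup_` file-name prefix are hypothetical:

```shell
#!/bin/bash
# Drop a line if its first field already appeared as a first field, or its
# last field already appeared as a last field, on any earlier line.
dedup_first_last() {
    awk -F'|' '{
        dup = ($1 in first) || ($NF in last)   # seen before in the same column?
        first[$1] = 1                          # remember both keys regardless
        last[$NF] = 1
        if (!dup) print                        # keep first occurrences, in order
    }' "$@"
}

# Hypothetical usage: read the file named by $1 and write the result to a
# file whose name includes the current date and time.
if [ $# -ge 1 ]; then
    out="dedup_$(date +%Y%m%d_%H%M%S).txt"
    dedup_first_last "$1" > "$out"
    echo "wrote $out"
fi
```

For the sample above this keeps only `abc|aaa|bbb|ccc` and `def|ddd|eee|fff`, in their original order; on the earlier numeric sample it keeps the three lines listed as the desired output.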

All times are GMT -5.