Script to remove lines that have a duplicate value in 2 fields
How do I write a script that removes the lines having a duplicate value in the 1st field and the last field, and redirects the result to another file? The output file name should contain the current timestamp.
|
Please post a sample of the data, the desired output and what you have tried so far.
|
I didn't know how to proceed with writing the script, as I am new to Unix.
The sample data could be something like:
1|a|aaaa|11
2|b|bbbb|222
3|c|cccc|333
2|d|dddd|333
3|e|aaaa|222
and the output I wish to get would be:
1|a|aaaa|11
2|b|bbbb|222
3|c|cccc|333 |
Also, I need to preserve the order. And what if the fields that need to be checked for duplication (1st and last) are alphanumeric?
|
Hello ajcapri,
I have just tried out some code. I hope it works; please check.
Code:
#!/bin/bash
Code:
sh filename.sh /path/to/the/file/where/pattern/is/stored
Cheers !!! |
How do I do the same if I don't know the number of fields? In that situation, how do I find the $n value for the last field?
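A minimal sketch of one way to reach the last field without knowing the field count: in awk the built-in variable NF holds the number of fields on the current line, so $NF is the last field. The function name last_field is hypothetical and used only for illustration; the '|' separator is taken from the sample data in this thread.

```shell
# last_field (hypothetical helper name): print the last '|'-separated
# field of every line read from standard input.
last_field() {
    # NF is the number of fields on the current line, so $NF is the last one.
    awk -F'|' '{ print $NF }'
}

# Lines with different field counts; prints 11, then ggg.
printf '%s\n' '1|a|aaaa|11' 'abc|ddd|eee|fff|ggg' | last_field
```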
|
Put the code below in filename.sh:
#!/bin/bash
# Appends to ./temp every line whose first character equals its last character.
file=$1
while IFS= read -r line
do
    name1=${line:0:1}    # first character of the line
    name2=${line: -1}    # last character of the line
    if [ "$name1" = "$name2" ]
    then
        echo "$line" >> temp
    fi
done < "$file"
Run this as follows:
bash filename.sh /path/to/the/file/where/pattern/is/store
Regards, Nagendra Rednam |
I guess the code you have posted compares the values in the first and last fields. I want to delete records based on duplicate values in the first field and also based on the last field. Many thanks to the people who have been helping.
|
Try this awk script.
You can put it into a file, say "awk_script", and your data in a file, say "data".
BEGIN { FS = "|" }
{
    str = substr($NF, 1, 1)
    if ($1 == str) {
        print $0
    }
}
Run it as follows:
awk -f awk_script data |
By duplicates I mean this:
abc|aaa|bbb|ccc
def|ddd|eee|fff
ghi|ddd|eee|ccc
abc|ggg|hhh|iii
ghi|rrr|sss|fff
For this sample data, I need the last 2 rows removed, as the values in the first field ("abc" and "ghi") are already present in previously scanned rows. Furthermore, I need the 3rd and last rows removed, as the values in the last field ("ccc" and "fff") are duplicates. |
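A possible approach to the requirement as described above, offered as a sketch rather than a tested production script: awk can keep one array of first-field values and one of last-field values, and print a line only if neither value occurred on an earlier line. It assumes that values from rows that get dropped still count as "already scanned" (that is how the "ghi" row is described above). The function name dedupe_first_last and the output file name pattern are hypothetical. The arrays are keyed by string, so alphanumeric fields work, and the input order is preserved because lines are printed as they are read.

```shell
# dedupe_first_last (hypothetical helper name): print only those lines of
# file "$1" whose first and last '|'-separated fields have not already
# appeared, in the same position, on an earlier line.
dedupe_first_last() {
    awk -F'|' '
    {
        # Duplicate if either value was seen on any earlier line,
        # including earlier lines that were themselves dropped.
        dup = ($1 in seen_first) || ($NF in seen_last)
        seen_first[$1] = 1
        seen_last[$NF] = 1
        if (!dup) print
    }' "$1"
}

# Example use, writing to a file whose name carries the current time stamp
# (the "output_" prefix is only an illustration):
#   dedupe_first_last data > "output_$(date +%Y%m%d_%H%M%S).txt"
```

For the five-row sample above this leaves only the first two rows, and for the earlier sample in this thread it leaves the first three.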