How to sort the 2nd column on the basis of the first column without repeating the value?
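#!/bin/bash
# For each distinct first-column value, print it once next to its first
# second-column value, then print the remaining values on their own lines.
for i in $(cut -d" " -f1 input | sort -u); do
    n=0
    # Anchor on the trailing space so that key 123 does not also match 1235.
    for z in $(grep "^$i " input | cut -d" " -f2 | sort); do
        if [ "$n" -eq 0 ]; then
            echo "$i $z"
        else
            echo " $z"
        fi
        n=1
    done
done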
Indeed, the ++ notation to increment the value of an array element is not valid in Solaris awk (it is implementation-specific and best avoided for compatibility between different awk flavours).
Here is a more explicit version that should work out-of-the-box for any awk installation:
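A minimal sketch of such an explicit version, not necessarily the code originally posted. It assumes two-column numeric input; the file name `sample.txt`, the sample values, and the function name `first_once` are made up for illustration, and the `%9d` width is borrowed from the program shown later in the thread.

```shell
# Hypothetical sample data; the file name and values are assumptions.
cat > sample.txt <<'EOF'
1234 3422
1234 2111
1235 2211
1236 5555
1236 1111
EOF

# Print each first-column value once, followed by its sorted second-column values.
first_once() {
    sort -u "$1" | awk '{
        c[$1] = c[$1] + 1          # explicit increment instead of c[$1]++
        if (c[$1] == 1)
            print                  # first occurrence: whole record
        else
            printf "%9d\n", $2     # later occurrences: second field only
    }'
}

first_once sample.txt
```

Note that this version still prints keys that occur only once, such as 1235 in the sample data.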
I want to remove single entries such as 1235 2211. That is, if a value in column one (1235) does not have more than one combination in the second column (1235 has only 2211), it should not appear in the output.
Following the previous solution in awk, you can pass the file as an argument twice, so that awk processes it twice in sequence: the first time it simply counts the occurrences of the first field, the second time it prints the fields accordingly. Since we apply the sort command first, we might try process substitution for the arguments of the awk command:
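A minimal sketch of the two-pass idea, assuming an awk that supports `FNR`. It reads a saved, sorted file twice instead of using process substitution, so it runs in any POSIX shell; in bash each file argument could instead be `<(sort -u sample.txt)`, at the cost of sorting twice. The file names, sample values, and the function name `drop_singles` are hypothetical.

```shell
# Hypothetical sample data; the file names and values are assumptions.
cat > sample.txt <<'EOF'
1234 3422
1234 2111
1235 2211
1236 5555
1236 1111
EOF
sort -u sample.txt > sample.sorted

# Read the same sorted file twice: count keys first, then print.
drop_singles() {
    awk '
    NR == FNR {                    # first pass: NR and FNR are still equal
        c[$1] = c[$1] + 1
        next
    }
    c[$1] > 1 {                    # second pass: keep only repeated keys
        seen[$1] = seen[$1] + 1
        if (seen[$1] == 1)
            print                  # first occurrence: whole record
        else
            printf "%9d\n", $2     # later occurrences: second field only
    }' "$1" "$1"
}

drop_singles sample.sorted
```

Only the per-key counters are kept in memory here, not the records themselves.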
Process substitution is indeed a bash feature, but it may have been introduced later, or your version on Solaris may not support it. In any case, since awk on Solaris doesn't support the FNR built-in variable, we need to change the logic and avoid using process substitution to pass the argument twice.
If the file is not huge, we can count the occurrences of the first field in the main block, storing every record in memory, and do the printing in the END section. E.g.
Code:
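# Main block: count how often each first field occurs and keep every record.
{
    c[$1] = c[$1] + 1
    rec[++n] = $0
}
# END block: replay the stored records in their original order.
END {
    for ( i = 1; i <= n; i++ ) {
        $0 = rec[i]
        _[$1] = _[$1] + 1          # occurrences of this key seen so far
        if ( c[$1] > 1 ) {         # skip keys that occur only once
            if ( _[$1] > 1 )
                printf "%9d\n", $2 # repeated key: second field only
            else
                print              # first occurrence: whole record
        }
    }
}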
Thank you. As you said, this works if the file is not huge, but I have approximately 25000000 lines, seriously. Should I still try this for a file that size? One more thing: I have to use it on Linux. Please advise.
The bottleneck is likely to be the sort. I expect it to take a long time, whereas the awk program itself should be quick. If you have to run it on Linux, you can either:
1. run the sort command alone and save the result in a new file, then use the Linux version of the awk program with the double argument (using the name of the saved file in place of process substitution)
2. use the Solaris version of the program (it works on Linux as well), so that you don't need to sort the file twice or save the result in a new file. Either way, the sorting step is what worries me in terms of execution time.
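If both the second read of the file and the in-memory record array are a concern, a single streaming pass is another possibility. This variant is not from the thread: it is a minimal sketch that relies on `sort -u` placing equal keys on adjacent lines, so only the first record of the current key has to be held back. The file name, sample values, and the function name `drop_singles_stream` are hypothetical.

```shell
# Hypothetical sample data; the file name and values are assumptions.
cat > sample.txt <<'EOF'
1234 3422
1234 2111
1235 2211
1236 5555
1236 1111
EOF

# Single pass over sorted input, holding back one record per key.
drop_singles_stream() {
    sort -u "$1" | awk '
    NR == 1 || $1 != key {         # a new key starts: hold its first record
        key = $1
        first = $0
        count = 1
        next
    }
    {
        count = count + 1
        if (count == 2)
            print first            # key is now known to repeat: emit held record
        printf "%9d\n", $2         # later occurrences: second field only
    }'
}

drop_singles_stream sample.txt
```

A key that occurs only once is simply overwritten when the next key arrives, so it never reaches the output.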
Edit: I tried it on my machine, with an Intel T2300 CPU @ 1.66GHz and 1 GB of RAM, and here is the result (reasonable, in my opinion):
Code:
$ wc -l file
25148214 file
$ time sort -u file | awk '{c[$1] = c[$1] + 1; rec[++n] = $0} END{for ( i = 1; i <= n; i++ ) {$0 = rec[i]; _[$1] = _[$1] + 1; if ( c[$1] > 1 ) if ( _[$1] > 1 ) printf "%9d\n", $2; else print }}'
<results omitted>
real 3m1.074s
user 2m59.655s
sys 0m0.658s