LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
12-14-2011, 06:16 AM   #1
zediok
LQ Newbie
 
Registered: Dec 2011
Posts: 8
Blog Entries: 2

How to sort the 2nd column on the basis of the 1st column without repeating values?


I have two columns:

1232 3454
1232 3453
1232 3454
1232 3455
1233 2222
1233 2322
1234 4545
1234 4545
1233 2222
1233 2322
1234 4545
1234 4245
1234 4545
1234 4545
1234 5664
1232 3456
1233 6767

I want the output like this:

1232 3454
3453
3455
3456
1233 2222
2322
6767

1234 4245
4545
5664

Please, please help me with this.
 
12-14-2011, 06:47 AM   #2
zQUEz
Member
 
Registered: Jun 2007
Distribution: Fedora, RHEL, Centos
Posts: 294

Assuming your input file looks like:
Code:
1232 3454
1232 3453
1232 3454
1232 3455
1233 2222
1233 2322
1234 4545
1234 4545
1233 2222
1233 2322
1234 4545
1234 4245
1234 4545
1234 4545
1234 5664
1232 3456
1233 6767
you could run a script that looks like:

Code:
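#!/bin/bash

# For each distinct key in column 1, print its column-2 values, sorted.
# Only the first line of each group shows the key; the rest are indented.
for i in $(cut -d" " -f1 input | sort -u); do
        n=0
        # "^$i " (with the trailing space) matches the whole key only,
        # so e.g. 1232 does not also pick up lines starting with 12320
        for z in $(grep "^$i " input | cut -d" " -f2 | sort); do
                if [ "$n" -eq 0 ]; then
                        echo "$i $z"
                else
                        echo "     $z"
                fi
                n=1
        done
done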
which will return (note that repeated values are kept):
Code:
1232 3453
     3454
     3454
     3455
     3456
1233 2222
     2222
     2322
     2322
     6767
1234 4245
     4545
     4545
     4545
     4545
     4545
     5664
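If repeated values should be dropped, as the original question asks, a minimal variation of the loop above is to use sort -u in the inner pipeline. This is a sketch assuming the same space-separated two-column input; the function name group_by_key is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: same idea as the loop above, but the inner
# "sort -u" drops repeated values within each key.
# Usage: group_by_key FILE
group_by_key() {
    local file=$1 key val n
    for key in $(cut -d" " -f1 "$file" | sort -u); do
        n=0
        # The trailing space in the pattern matches the whole key only
        for val in $(grep "^$key " "$file" | cut -d" " -f2 | sort -u); do
            if [ "$n" -eq 0 ]; then
                echo "$key $val"      # first value: print the key too
            else
                echo "     $val"      # later values: indented
            fi
            n=1
        done
    done
}
```

Calling group_by_key input on the sample data should print each value of a key only once. The loop still re-reads the file once per key, so it gets slow on large inputs.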
 
12-14-2011, 07:02 AM   #3
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Code:
$ sort -u file | awk '_[$1]++ {printf "%9d\n", $2; next}1'
1232 3453
     3454
     3455
     3456
1233 2222
     2322
     6767
1234 4245
     4545
     5664
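The one-liner relies on _[$1]++ returning the count before the increment: it is 0 (false) on the first line of each key, so that line falls through to the final 1 pattern and is printed whole, and non-zero (true) on later lines, so only the indented value is printed. Here is the same program spelled out with comments, as a sketch wrapped in a hypothetical function group_first_key (any Linux awk such as gawk or mawk is assumed):

```shell
#!/bin/bash
# Hypothetical wrapper around the one-liner above.
# sort -u removes duplicate lines and brings equal keys together.
group_first_key() {
    sort -u "$1" | awk '
        # seen[$1]++ yields the old count: 0 (false) on the first line
        # of a key, then 1, 2, ... (true) on every later line of that key
        seen[$1]++ {
            printf "%9d\n", $2   # later lines: value only, right-aligned
            next
        }
        # first line of a key: an always-true pattern prints the record
        1
    '
}
```

The "%9d" format right-aligns the value in 9 characters, which lines a 4-digit value up under the second column when keys are 4 digits long.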
 
12-14-2011, 10:32 AM   #4
zediok
Original Poster
Quote:
Originally Posted by zQUEz
you could run a script that looks like: [...]
Thanks for the help, it's working fine. Thanks a lot.

Could you please help me get the output like this, without the repeated values:

1232 3453
3454
3455
3456
1233 2222
2322
6767
1234 4245
4545
5664

Last edited by zediok; 12-14-2011 at 10:47 AM.
 
12-14-2011, 10:35 AM   #5
zediok
Original Poster
Quote:
Originally Posted by colucix
Code:
$ sort -u file | awk '_[$1]++ {printf "%9d\n", $2; next}1'


Thanks for the help, but it's not working :-(
 
12-14-2011, 11:36 AM   #6
colucix
Quote:
Originally Posted by zediok
Thanks for the help, but it's not working :-(
I'm curious, what is the error? Or unexpected output?
 
12-14-2011, 11:55 AM   #7
asimba
Member
 
Registered: Mar 2005
Location: 127.0.0.0
Distribution: Red Hat / Fedora
Posts: 349

Quote:
Originally Posted by zediok
Thanks for the help, but it's not working :-(
I tried it, and it works for me.
 
12-14-2011, 09:03 PM   #8
zediok
Original Poster
Quote:
Originally Posted by colucix
I'm curious, what is the error? Or unexpected output?

I made a small mistake: I was trying it on Solaris, and every time the output is:

bash-3.00# cat fi
1232 3454
1232 3453
1232 3454
1232 3455
1233 2222
1233 2322
1234 4545
1234 4545
1233 2222
1233 2322
1234 4545
1234 4245
1234 4545
1234 4545
1234 5664
1232 3456
1233 6767
bash-3.00# sort -u fi | awk '_[$1]++ {printf "%9d\n", $2; next}1'
awk: syntax error near line 1
awk: bailing out near line 1

But when I tried it on Linux, the output came out as you showed.

Thanks a lot for solving this problem :-)
 
12-15-2011, 02:32 AM   #9
colucix
Indeed, the old awk that Solaris runs by default rejects this use of the ++ notation, although the POSIX awk on Linux accepts it. To stay compatible between different awk flavours, it is better to avoid it.

Here is a more explicit version that should work out-of-the-box for any awk installation:
Code:
sort -u testfile | awk '{_[$1] = _[$1] + 1; if ( _[$1] > 1 ) printf "%9d\n", $2; else print}'
 
12-15-2011, 09:11 PM   #10
zediok
Original Poster

1232 3454
1232 3453
1232 3454
1232 3455
1233 2222
1233 2322
1234 4545
1234 4545
1233 2222
1233 2322
1234 4545
1234 4245
1234 4545
1234 4545
1234 5664
1232 3456
1233 6767
1235 2211

So far we get output like this:

1232 3453
3454
3455
3456
1233 2222
2322
6767
1234 4245
4545
5664
1235 2211


I want to remove single entries like 1235 2211. That is, if a value in column one (1235) doesn't have more than one value in the second column (1235 has only 2211), it should not appear in the output.


So the output should look like this:

1232 3453
3454
3455
3456
1233 2222
2322
6767
1234 4245
4545
5664
 
12-16-2011, 01:16 AM   #11
colucix
Following the previous solution in awk, you can pass the file as an argument twice, so that awk processes it in sequence: the first time it simply counts the occurrences of the first field, the second time it prints out the fields accordingly. Since we first apply the sort command, we can use process substitution for the arguments of the awk command:
Code:
awk 'FNR==NR {c[$1]++; next} _[$1]++ {printf "%9d\n", $2; next} c[$1]>1' <(sort -u file) <(sort -u file)
Note that process substitution doesn't work on Solaris if using /bin/sh or /bin/tcsh.
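To see why FNR==NR selects the first copy of the file: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only during the first pass. A small demo, assuming a Linux awk (gawk or mawk) and a hypothetical function name show_passes:

```shell
#!/bin/bash
# Hypothetical demo function: NR keeps counting across input files,
# FNR restarts at 1 for each file, so FNR == NR only in the first pass.
show_passes() {
    awk '{
        pass = (FNR == NR) ? "pass1" : "pass2"
        print pass, FNR, NR, $0
    }' "$1" "$1"
}
```

For a two-line file containing a and b, show_passes prints "pass1 1 1 a", "pass1 2 2 b", then "pass2 1 3 a", "pass2 2 4 b". Each process substitution is a separate file to awk, so the same logic applies there.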
 
12-16-2011, 09:36 PM   #12
zediok
Original Poster
Quote:
Originally Posted by colucix
Following the previous solution in awk, you can put the file as argument twice [...]
Note that process substitution doesn't work on Solaris if using /bin/sh or /bin/tcsh.



While running this, I get the errors shown below:

# more fi
1232 3454
1232 3453
1232 3454
1232 3455
1233 2222
1233 2322
1234 4545
1234 4545
1233 2222
1233 2322
1234 4545
1234 4245
1234 4545
1234 4545
1234 5664
1232 3456
1233 6767
1235 2211
# awk 'FNR==NR {c[$1]++; next} _[$1]++ {printf "%9d\n", $2; next} c[$1]>1' <(sort -u fi) <(sort -u fi)
syntax error: `(' unexpected
# bash
bash-3.00# pwd
/work
bash-3.00# awk 'FNR==NR {c[$1]++; next} _[$1]++ {printf "%9d\n", $2; next} c[$1]>1' <(sort -u fi) <(sort -u fi)
awk: syntax error near line 1
awk: bailing out near line 1
bash-3.00#


As you said, process substitution doesn't work on Solaris if using /bin/sh or /bin/tcsh.

So could you please tell me how I should run it?
 
12-17-2011, 03:30 AM   #13
colucix
Actually, process substitution is a bash feature, and your bash 3.00 did accept it: in the bash session the syntax error comes from awk, not from the shell. Anyway, since awk on Solaris doesn't support the FNR internal variable, we need to change the logic and avoid passing the file twice.

If the file is not huge, we can count the occurrences of the first field in the main program and do the printing in the END section, e.g.:
Code:
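{
  # Main program: count the lines of each key and store every record
  c[$1] = c[$1] + 1
  rec[++n] = $0
}

END {
  # Walk the stored records again, in their original (sorted) order
  for ( i = 1; i <= n; i++ ) {
    $0 = rec[i]              # reassigning $0 re-splits it into $1, $2

    _[$1] = _[$1] + 1        # lines of this key seen so far

    if ( c[$1] > 1 )         # skip keys that have a single value
      if ( _[$1] > 1 )
        printf "%9d\n", $2   # later lines: indented value only
      else                   # this else pairs with the inner if
        print                # first line: key and value
  }
}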
On a single (long) line it would be:
Code:
sort -u file | awk '{c[$1] = c[$1] + 1; rec[++n] = $0} END{for ( i = 1; i <= n; i++ ) {$0 = rec[i]; _[$1] = _[$1] + 1; if ( c[$1] > 1 ) if ( _[$1] > 1 ) printf "%9d\n", $2; else print }}'
 
12-17-2011, 09:32 AM   #14
zediok
Original Poster
Quote:
Originally Posted by colucix
[...] If the file is not huge in size, we can count the number of occurrences of the first field in the main program and do the printing out in the END section. [...]

Thank you. As you said, this is for a file that is not huge, but I seriously have approximately 25,000,000 lines. So should I try this for that? And one more thing: I have to use it on Linux. Please tell me.
 
12-17-2011, 12:40 PM   #15
colucix
Quote:
Originally Posted by zediok
Thank you. As you said, this is for a file that is not huge, but I seriously have approximately 25,000,000 lines. So should I try this for that? And one more thing: I have to use it on Linux. Please tell me.
The issue will most likely be the sorting part! I think it will require a lot of time, whereas the awk program should be quick. If you have to run it on Linux, you can either:

1. Run the sort command alone and save the result in a new file, then use the Linux version of the awk program with the double argument (using the name of the saved file in place of each process substitution), or

2. Use the Solaris version of the program (it works on Linux as well, though it keeps all the records in memory), so that you don't need to sort the file twice or save the result in a new file. In any case, the sorting part, I repeat, makes me worry about the execution time.
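Option 1 might look like the following sketch. It assumes a Linux awk (gawk or mawk), uses mktemp for the saved file, and wraps everything in a hypothetical function group_multi; the awk program is the two-pass one from earlier in the thread, reading the same sorted file twice:

```shell
#!/bin/bash
# Hypothetical function for option 1: sort once into a temporary file,
# then give that file to awk twice (no process substitution needed).
group_multi() {
    local tmp
    tmp=$(mktemp) || return 1
    sort -u "$1" > "$tmp"
    awk '
        # First copy of the file: FNR == NR only while reading it; count keys
        FNR == NR { count[$1]++; next }
        # Second copy: skip keys with a single value
        count[$1] < 2 { next }
        # Later lines of a key: indented value only
        seen[$1]++ { printf "%9d\n", $2; next }
        # First line of a key: key and value
        { print }
    ' "$tmp" "$tmp"
    rm -f "$tmp"
}
```

With 25 million lines the temporary file is as large as the deduplicated input, so the directory mktemp uses needs enough free space.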


Edit: I tried it on my machine (Intel T2300 CPU @ 1.66 GHz, 1 GB of RAM), and here is the result (reasonable, in my opinion):
Code:
$ wc -l file
25148214 file
$ time sort -u file | awk '{c[$1] = c[$1] + 1; rec[++n] = $0} END{for ( i = 1; i <= n; i++ ) {$0 = rec[i]; _[$1] = _[$1] + 1; if ( c[$1] > 1 ) if ( _[$1] > 1 ) printf "%9d\n", $2; else print }}'
<results omitted>

real    3m1.074s
user    2m59.655s
sys     0m0.658s
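Since sort -u already puts all lines of a key next to each other, a different technique (not one used in this thread) avoids storing all 25 million records: keep only the current group in memory and print it when the key changes. A sketch, assuming a POSIX awk with user-defined functions (gawk or mawk on Linux) and the hypothetical names group_stream and emit_group:

```shell
#!/bin/bash
# Hypothetical function: streaming variant for large files.
# sort -u output has all lines of a key next to each other, so awk only
# needs to hold the current group in memory, not the whole file.
group_stream() {
    sort -u "$1" | awk '
        # Print the buffered group, but only if the key has 2+ values
        function emit_group(   i) {
            if (n > 1) {
                print buf[1]                      # key and first value
                for (i = 2; i <= n; i++)
                    printf "%9d\n", vals[i]       # remaining values, indented
            }
            n = 0
        }
        # Key changed: flush the previous group and start a new one
        $1 != key { emit_group(); key = $1 }
        # Buffer the current line
        { n = n + 1; buf[n] = $0; vals[n] = $2 }
        # Flush the last group
        END { emit_group() }
    '
}
```

The sort still dominates the run time; this only changes how much memory awk needs, which becomes proportional to the largest group instead of the whole file.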

Last edited by colucix; 12-17-2011 at 12:49 PM.
 
  


All times are GMT -5.