LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-21-2012, 01:56 AM   #1
akeka
Member
 
Registered: May 2012
Posts: 39

Rep: Reputation: Disabled
Sort certain field but preserve other field data


Hi,

Referring to my last question at here

Code:
http://www.linuxquestions.org/questions/programming-9/faster-bash-sort-based-on-column-number-945008/
Now I want to sort the field 4th and 5th, but keep the 1st, 2nd and 3rd field data

Here's example of the input

Code:
   100000.0       191400.0         9671100.         39.00000         196.2000
   100000.0       191400.0         9671100.         28.30000         248.7000
   100000.0       191400.0         9671100.         172.3000         103.8000
   100000.0       191400.0         9671100.         331.0000         1011.900
   100000.0       191400.0         9671100.         581.5000         2068.000
   100000.0       191400.0         9671100.         740.1000         2774.700
   100000.0       191400.0         9671100.         1008.700         4157.100
   100000.0       191400.0         9671100.         1132.200         5031.100
   100001.0       191400.0         9670900.         38.20000         192.6000
   100001.0       191400.0         9670900.         86.90000         243.9000
   100001.0       191400.0         9670900.         171.0000         499.0000
   100001.0       191400.0         9670900.         331.5000         1012.500
   100001.0       191400.0         9670900.         582.6000         2071.600
   100001.0       191400.0         9670900.         740.5000         2774.400
   100001.0       191400.0         9670900.         1008.500         4152.700
   100001.0       191400.0         9670900.         1132.000         5027.200
   100002.0       191400.0         9670700.         37.40000         189.0000
   100002.0       191400.0         9670700.         85.40000         238.8000
   100002.0       191400.0         9670700.         169.6000         494.0000
   100002.0       191400.0         9670700.         332.0000         1013.300
   100002.0       191400.0         9670700.         583.7000         2075.200
   100002.0       191400.0         9670700.         740.9000         2774.100
   100002.0       191400.0         9670700.         1008.300         4148.300
   100002.0       191400.0         9670700.         1131.900         5023.600
Here's the output that I want

Code:
100000.0000	191400.0000	9671100.0000	39.0000	        103.8000
100000.0000	191400.0000	9671100.0000	28.3000	        196.2000
100000.0000	191400.0000	9671100.0000	172.3000	248.7000
100000.0000	191400.0000	9671100.0000	331.0000	1011.9000
100000.0000	191400.0000	9671100.0000	581.5000	2068.0000
100000.0000	191400.0000	9671100.0000	740.1000	2774.7000
100000.0000	191400.0000	9671100.0000	1008.7000	4157.1000
100000.0000	191400.0000	9671100.0000	1132.2000	5031.1000
100001.0000	191400.0000	9670900.0000	38.2000	        192.6000
100001.0000	191400.0000	9670900.0000	86.9000	        243.9000
100001.0000	191400.0000	9670900.0000	171.0000	499.0000
100001.0000	191400.0000	9670900.0000	331.5000	1012.5000
100001.0000	191400.0000	9670900.0000	582.6000	2071.6000
100001.0000	191400.0000	9670900.0000	740.5000	2774.4000
100001.0000	191400.0000	9670900.0000	1008.5000	4152.7000
100001.0000	191400.0000	9670900.0000	1132.0000	5027.2000
100002.0000	191400.0000	9670700.0000	37.4000	        189.0000
100002.0000	191400.0000	9670700.0000	85.4000	        238.8000
100002.0000	191400.0000	9670700.0000	169.6000	494.0000
100002.0000	191400.0000	9670700.0000	332.0000	1013.3000
100002.0000	191400.0000	9670700.0000	583.7000	2075.2000
100002.0000	191400.0000	9670700.0000	740.9000	2774.1000
100002.0000	191400.0000	9670700.0000	1008.3000	4148.3000
100002.0000	191400.0000	9670700.0000	1131.9000	5023.6000
Thanks
 
Old 05-21-2012, 02:21 PM   #2
Birei
LQ Newbie
 
Registered: Nov 2010
Posts: 17

Rep: Reputation: 6
Hi akeka,

One way using perl. I hope it can be useful. Output seems weird but fields are separated with '\t'.
Code:
$ cat infile
   100000.0       191400.0         9671100.         39.00000         196.2000
   100000.0       191400.0         9671100.         28.30000         248.7000
   100000.0       191400.0         9671100.         172.3000         103.8000
   100000.0       191400.0         9671100.         331.0000         1011.900
   100000.0       191400.0         9671100.         581.5000         2068.000
   100000.0       191400.0         9671100.         740.1000         2774.700
   100000.0       191400.0         9671100.         1008.700         4157.100
   100000.0       191400.0         9671100.         1132.200         5031.100
   100001.0       191400.0         9670900.         38.20000         192.6000
   100001.0       191400.0         9670900.         86.90000         243.9000
   100001.0       191400.0         9670900.         171.0000         499.0000
   100001.0       191400.0         9670900.         331.5000         1012.500
   100001.0       191400.0         9670900.         582.6000         2071.600
   100001.0       191400.0         9670900.         740.5000         2774.400
   100001.0       191400.0         9670900.         1008.500         4152.700
   100001.0       191400.0         9670900.         1132.000         5027.200
   100002.0       191400.0         9670700.         37.40000         189.0000
   100002.0       191400.0         9670700.         85.40000         238.8000
   100002.0       191400.0         9670700.         169.6000         494.0000
   100002.0       191400.0         9670700.         332.0000         1013.300
   100002.0       191400.0         9670700.         583.7000         2075.200
   100002.0       191400.0         9670700.         740.9000         2774.100
   100002.0       191400.0         9670700.         1008.300         4148.300
   100002.0       191400.0         9670700.         1131.900         5023.600
$ cat script.pl
use warnings;
use strict;

my ($key, $prev_key, @sort_fields);

while ( <> ) {
        chomp;

        ## Split line in fields and check they are five.
        my @f = split;
        next unless @f == 5;

        ## Change format of numbers to a float with four decimal numbers.
        @f = map { sprintf q[%.4f], $_ } @f;

        ## Get first three fields as key to save values to sort.
        my $key = join qq[\t], @f[0..2];

        ## Save values if line is the first one or the key is the same
        ## as previous line.
        if ( ! defined $prev_key || $key eq $prev_key ) {
                for my $i ( 0 .. 1 ) {
                        push @{ $sort_fields[ $i ] }, $f[ $i+3 ];
                }
                $prev_key = $key;
                next unless eof;
        }

        ## All values saved, sort them as numbers.
        for my $i ( 0 .. $#sort_fields ) {
                @{ $sort_fields[ $i ] } = sort { $a <=> $b } @{ $sort_fields[ $i ] };
        }

        ## Print to output.
        for my $i ( 0 .. $#{ $sort_fields[0] } ) {
                printf qq[%s\n], join qq[\t], $prev_key, $sort_fields[0]->[ $i ], $sort_fields[1]->[ $i ];
        }

        ## Save values of next key unless line is the last one.
        if ( ! eof ) {
                @sort_fields = ();
                for my $i ( 0 .. 1 ) {
                        push @{ $sort_fields[ $i ] }, $f[ $i+3 ];
                }
                $prev_key = $key;
        }
}
$ perl script.pl infile
100000.0000     191400.0000     9671100.0000    28.3000 103.8000
100000.0000     191400.0000     9671100.0000    39.0000 196.2000
100000.0000     191400.0000     9671100.0000    172.3000        248.7000
100000.0000     191400.0000     9671100.0000    331.0000        1011.9000
100000.0000     191400.0000     9671100.0000    581.5000        2068.0000
100000.0000     191400.0000     9671100.0000    740.1000        2774.7000
100000.0000     191400.0000     9671100.0000    1008.7000       4157.1000
100000.0000     191400.0000     9671100.0000    1132.2000       5031.1000
100001.0000     191400.0000     9670900.0000    38.2000 192.6000
100001.0000     191400.0000     9670900.0000    86.9000 243.9000
100001.0000     191400.0000     9670900.0000    171.0000        499.0000
100001.0000     191400.0000     9670900.0000    331.5000        1012.5000
100001.0000     191400.0000     9670900.0000    582.6000        2071.6000
100001.0000     191400.0000     9670900.0000    740.5000        2774.4000
100001.0000     191400.0000     9670900.0000    1008.5000       4152.7000
100001.0000     191400.0000     9670900.0000    1132.0000       5027.2000
100002.0000     191400.0000     9670700.0000    37.4000 189.0000
100002.0000     191400.0000     9670700.0000    85.4000 238.8000
100002.0000     191400.0000     9670700.0000    169.6000        494.0000
100002.0000     191400.0000     9670700.0000    332.0000        1013.3000
100002.0000     191400.0000     9670700.0000    583.7000        2075.2000
100002.0000     191400.0000     9670700.0000    740.9000        2774.1000
100002.0000     191400.0000     9670700.0000    1008.3000       4148.3000
100002.0000     191400.0000     9670700.0000    1131.9000       5023.6000
 
1 members found this post helpful.
Old 05-21-2012, 04:01 PM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
Another way is to use GNU awk:
Code:
#!/usr/bin/gawk -f
BEGIN {
    # Accept any newline convention, removing leading and trailing whitespace.
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"

    # Field are spearated by whitespace.
    FS = "[\t\v\f ]+"

    # In output, use Unix newline convention,
    ORS = "\n"

    # TABs between fields, and
    OFS = "\t"

    # four decimal digits for floating-point values with 'print'.
    OFMT = "%.4f"

    # Clear the fourth and fifth field arrays.
    split("", field4)
    split("", field5)

    # No fields yet.
    fields = 0

    # Clear current keys.
    key1 = ""
    key2 = ""
    key3 = ""
}

# Output array contents.
function output() {
    # Sort the fields.
    asort(field4)
    asort(field5)

    # Output.
    for (i = 1; i <= fields; i++)
        print key1, key2, key3, field4[i], field5[i]

    # Clear.
    fields = 0
    split("", field4)
    split("", field5)
}

# Remove leading whitespace from first input line.
(FNR == 1) {
    sub(/^[\t\v\f ]+/, "", $0)
}

# Input record:
{
    # If any of the keys don't match, this is a new set.
    if (key1 != $1 || key2 != $2 || key3 != $3) {
        # Output the previous set, if any.
        output()
        key1 = $1
        key2 = $2
        key3 = $3
    }

    # Add data to current set.
    fields++
    field4[fields] = $4 * 1.0       # Make sure it is a number
    field5[fields] = $5 * 1.0       # Make sure it is a number
}

# After the final input record, output the final set, if any.
END {
    output()
}
This considers all consecutive lines where the first three fields match to be a single set. For each set, the values in the fourth field, and the values in the fifth field, are separately sorted in ascending order. The input is streamed, with just the three common fields and the values in fourth and fifth field for the current set kept in memory.

Integers are printed without a decimal point. If you want four decimal digits on all fourth and fifth field values, or perhaps some other format, use
Code:
    printf("%s%s%s%s%s%s%.4f%s%.4f%s", key1, OFS, key2, OFS, key3, OFS, field4[i], OFS, field5[i], ORS)
in the output() function. The keys are treated as strings (and therefore will be exactly like they were in the input), but you can format them too if you want.
 
1 members found this post helpful.
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] Count occurrence of character in field and print in a new field Trd300 Linux - Newbie 5 03-21-2012 07:57 PM
[SOLVED] awk: how to print a field when field position is unknown? elfoozo Programming 12 08-18-2010 03:52 AM
bash script to sort data by field lothario Linux - Newbie 4 08-26-2009 02:23 AM
Can I use GNU sort to sort one field in order, another in reverse? zombieite Linux - Newbie 4 04-27-2009 12:23 AM
Sort File by Field - but with a Twist! ;) moo-cow Programming 8 06-12-2006 11:26 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 11:15 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration