LinuxQuestions.org

MikeyCarter 04-23-2013 11:10 AM

Sorting a file based on a parameter
 
I have a file in which a TEXT=value field appears in column 5, 8 or 9. I need to sort the lines by the level+number in the value field.

e.g.:

junk junk junk sort=first+10 junk
junk junk sort=first+38 junk junk
junk junk junk junk sort=first+08 junk

How do I sort such a beast?

cortman 04-23-2013 11:51 AM

The example you gave has the desired field in columns 4, 3, and 5?
Does the desired field also take the value "second" as well as "first" (as you portray)?

Please post a clearer example of input and desired output. Thanks.

MikeyCarter 04-23-2013 11:53 AM

Current file:
junk junk junk sort=first+10 junk
junk junk sort=first+38 junk junk
junk junk junk junk sort=first+08 junk
junk junk junk sort=first+40 junk
junk junk sort=first+6 junk junk

What I want
junk junk sort=first+6 junk junk
junk junk junk junk sort=first+08 junk
junk junk junk sort=first+10 junk
junk junk sort=first+38 junk junk
junk junk junk sort=first+40 junk

David the H. 04-23-2013 12:35 PM

Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.


The sort command will most likely do what you want in any case. Read info sort and give it a try. Post back with the results of your attempts and we'll help you with any problems you still have.

Edit: No, looking again, actually it probably won't. If the fields to sort by are not all in the same column, then you'll need something more powerful, like awk or perl.

David the H. 04-23-2013 12:57 PM

Ok, since this is non-trivial, here's a gawk solution for you. It relies on gawk's gensub function and on the PROCINFO["sorted_in"] array-traversal control; the latter needs gawk v4+.

Code:

awk 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }   # walk arrays in ascending numeric index order
    { a[gensub(/.*sort=first[+]([0-9]+).*/, "\\1", 1)] = $0 }   # index each line by its extracted number
    END { for (i in a) print a[i] }' infile.txt

What it does is extract the number from each sorting field and use it as the index of an array element that holds the whole line. Because the array indexes are traversed in sorted order, the lines are printed in the order you want.
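
To check which key the gensub call would pull out of each line before relying on the sorted output, the keys can be printed next to the lines. This is only a sketch: it uses plain awk's match() instead of gensub so that it does not depend on gawk, and the function name show_keys is made up for illustration.

```shell
# show_keys FILE: print "<key><TAB><line>" for every line of FILE.
# (show_keys is a hypothetical helper name, not something from this thread.)
show_keys() {
    awk '{
        key = ""
        # Locate the "sort=first+<digits>" field wherever it sits on the line.
        if (match($0, /sort=first[+][0-9]+/))
            key = substr($0, RSTART + 11, RLENGTH - 11)   # skip the 11 characters of "sort=first+"
        print key "\t" $0
    }' "$1"
}
```

Running show_keys infile.txt should make it easy to spot lines with an empty key (no sort=first+ field) or lines that share a key.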

cortman 04-23-2013 01:52 PM

Looks like David the H beat me to it. I initially thought sort as well but realized the OP's needs were beyond sort's scope.
Nice solution with awk.

chrism01 04-23-2013 09:04 PM

I'd have used a Perl hash, where the required field is used as the hash key. Gets a tad more fiddly if dupe keys are allowed, but it's still do-able :)

David the H. 04-25-2013 10:27 AM

Pretty much any solution just requires extracting the desired number from each line, sorting by them, then printing out the lines they are associated with.

Simple arrays are generally the easiest way to do so, but yeah, they usually won't work if there are duplicate numbers involved. Later entries would overwrite earlier ones unless you went to special lengths to keep the indexes unique. And that would probably require some extra work with associative arrays/hashes.


But here's a simple shell loop that modifies the lines so that they can be processed with sort. It should handle duplicates safely and is POSIX-portable.

Code:

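while IFS= read -r line || [ -n "$line" ] ; do    # IFS= keeps leading/trailing whitespace intact
    num=${line##*first+}                          # drop everything up to and including "first+"
    num=${num%% *}                                # drop everything after the number
    printf '%s\n' "$num $line"                    # printf avoids echo's backslash handling
done <input.txt | sort -k1n | cut -d' ' -f2-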

The only remaining limitation is that, since it hasn't been defined how lines with duplicate numbers should be ordered relative to each other, they will just come out in whatever order sort decides. '-k1n' compares the leading number numerically; lines whose numbers tie then fall back to sort's last-resort comparison of the whole line.
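
If lines that share a number should keep their original relative order (an assumption, since the thread never defines this), one option is to add the input line number as a second sort key. The sketch below is the same kind of loop with a counter added; the function name sort_stable_by_first is made up for illustration.

```shell
# sort_stable_by_first FILE: order FILE by the number after "first+";
# lines with equal numbers stay in their input order.
# (sort_stable_by_first is a hypothetical helper name.)
sort_stable_by_first() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))                 # input line number, used as the tie-breaker
        num=${line##*first+}         # drop everything up to and including "first+"
        num=${num%% *}               # drop everything after the number
        printf '%s %s %s\n' "$num" "$n" "$line"
    done < "$1" | sort -k1,1n -k2,2n | cut -d' ' -f3-
}
```

Limiting each key to a single field (-k1,1n and -k2,2n) keeps the rest of the line out of the comparison, and cut then strips both helper columns again.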

