LinuxQuestions.org


kristo5747 08-08-2012 05:50 PM

Why is sort -k not working all the time?
 
I have a script that puts a list of files in two separate arrays:

First, I get a file list from a ZIP file and fill `FIRST_Array()` with it. Second, I get a file list from a control file within the same ZIP file and fill `SECOND_Array()` with it.

Code:

                while read length date time filename
                do
                        FIRST_Array+=( "$filename" )
                        echo "$filename" >> FIRST.report.out
                done < <(/usr/bin/unzip -qql AAA.ZIP |sort -k12 -t~)

Third, I compare both arrays like so:

Code:

    diff -q <(printf "%s\n" "${FIRST_Array[@]}") <(printf "%s\n" "${SECOND_Array[@]}") |wc -l

I can tell that `diff` fails because I write each array to a file: `FIRST.report.out` and `SECOND.report.out` are simply not sorted properly.

1) FIRST.report.out (what's inside the ZIP file)


Code:

JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML

2) SECOND.report.out (what's inside the ZIP's control file)

Code:

JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML

Using sort -k12 -t~ made sense to me, since ~ is the field delimiter and the file's date ("20120808") is in the 12th position. But it is not working consistently.

The sorting gets worse when my script processes bigger ZIP files. Why is sort -k not working all the time? How can I sort both arrays?

rigor 08-11-2012 02:44 PM

Hi kristo5747!

Are you using a shell that expands ~ to your home directory, without escaping the ~?

You might want to see whether your code works any better with a backslash in front of the ~, as in \~
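
A quick way to check the tilde question, assuming bash: tilde expansion only happens when ~ starts a word, so an attached `-t~` should reach sort unchanged, while a separate `-t ~` would not. The helper `show_args` is made up for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each argument in angle brackets so word boundaries are visible.
show_args() { printf '<%s>' "$@"; printf '\n'; }

attached=$(show_args -t~)    # ~ is mid-word: passed through literally
separate=$(show_args -t ~)   # ~ is a word of its own: replaced by the home directory
escaped=$(show_args -t \~)   # backslash keeps the ~ literal

echo "attached: $attached"
echo "separate: $separate"
echo "escaped:  $escaped"
```

Quoting the delimiter, as in `-t'~'`, sidesteps the question in either spelling.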

David the H. 08-13-2012 01:33 PM

By my count the 12th field is the last one, the "240914.XML". I think you want to use "-k 11".

However, if you use "-k 11" it sorts by all fields from the 11th to the end of the line. You have to use "-k 11,11" to limit the sort to only the 11th field. (See the sort info page for details.)
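
A minimal sketch of the difference between the two key forms, assuming GNU sort. The records are invented (field 10 is a one-letter tag, field 11 a date, field 12 a file name), and `rec` and `order` are helper names made up here. `-s` turns off sort's last-resort whole-line comparison, which would otherwise hide the difference.

```bash
#!/usr/bin/env bash
export LC_ALL=C   # plain byte-order comparison, independent of the locale

# Hypothetical 12-field records; none of this data comes from the real ZIP.
rec() { printf 'site~a~b~c~d~e~f~g~h~%s~%s~%s\n' "$1" "$2" "$3"; }
input=$(rec B 20120808 zzz.XML; rec A 20120808 aaa.XML; rec C 20120807 mmm.XML)

# Report the resulting order as the string of field-10 tags.
order() { cut -d'~' -f10 | tr -d '\n'; }

# Key runs from field 11 to the end of the line, so field 12 breaks the date tie.
open_key=$(printf '%s\n' "$input" | sort -s -t'~' -k11 | order)

# Key is field 11 only; records with equal dates keep their input order.
closed_key=$(printf '%s\n' "$input" | sort -s -t'~' -k11,11 | order)

echo "-k11    : $open_key"
echo "-k11,11 : $closed_key"
```

With this input the first command should print CAB and the second CBA.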

Finally, you'll probably also want to use numerical sorting.
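
To illustrate what numerical sorting changes, here is a sketch using the four file names from the first post. For fixed-width dates such as 20120808 the lexical and numeric orders coincide, so this uses the variable-width 6th field (6, 6, 40, 2) instead. The helper name `field6` is made up.

```bash
#!/usr/bin/env bash
export LC_ALL=C

names() {
cat <<'EOF'
JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
EOF
}

# Show only the 6th field of each sorted line, joined by spaces.
field6() { cut -d'~' -f6 | paste -sd' ' -; }

lexical=$(names | sort -t'~' -k6,6 | field6)    # compares the field as text
numeric=$(names | sort -t'~' -k6,6n | field6)   # compares the field as a number

echo "lexical: $lexical"
echo "numeric: $numeric"
```

The lexical sort places 40 before 6 because it compares character by character; the `n` modifier compares the values as numbers.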


Edit: looking again, the sort command is on the raw unzip output, but you only posted the final sorted output. There's no way to tell from what you posted if it's targeting the correct field. Perhaps there's something else in it that's affecting the sort on some lines.

So how about posting the raw input from unzip too so we can compare?
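
Until the raw listing is posted, this is only a guess at how the extra columns could matter. The sketch assumes the raw lines start with size, date and time columns, as the `read length date time filename` loop suggests. `fake_listing` stands in for `unzip -qql AAA.ZIP`, and its sizes, dates and times are invented. When the sort key is identical on every line, sort falls back to comparing whole lines, and those begin with the size column.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical stand-in for `unzip -qql AAA.ZIP`; sizes, dates and times are made up.
fake_listing() {
cat <<'EOF'
     2210  08-08-2012 14:09   JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
      987  08-08-2012 14:09   JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
     1534  08-08-2012 14:09   JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
EOF
}

# Sorting the raw listing: the 12th field is the same everywhere, so the
# whole-line fallback decides, and the line with the smallest size comes first.
raw_first=$(fake_listing | sort -t'~' -k12 | head -n 1 | awk '{print $4}')

# Dropping the three leading columns first, then sorting the bare names.
names_first=$(fake_listing | awk '{print $4}' | sort | head -n 1)

echo "raw listing sorted, first name: $raw_first"
echo "names sorted, first name:       $names_first"
```

If something like this is happening, sorting the file names after the leading columns have been stripped (for both lists, with the same sort command) should give the two report files a consistent order.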

