LinuxQuestions.org

vostrushka 03-03-2012 07:52 AM

sorting words with numbers
 
I have a text report that I need to sort by the first field.
That first field has letters and numbers, like this:
aaa1a bbb 1q1q1q1
aaa10a ccc 1q1q1q1
aaa2a ddd 1q1q1q1

As you can see, regular sorting produces exactly this order: the line with "aaa10a" comes out second, not third.
Number or characters before digits, for digits and after digits can be different.
Is it possible to sort correctly in such a case, and if so, how?

Best regards
Leonid

Snark1994 03-03-2012 08:02 AM

Can you give us some actual examples? How will the programme decide which characters it needs to use to sort? You said the "Number or [of?] characters before digits, for digits and after digits can be different" - so what would you sort "a11b5a" by, 11 or 5? Is there always going to be 'aaa' before the number?

My suspicion is that you're going to have to use something like python, perl or awk to do it, unless there's more regularity to the data than you've implied.

vostrushka 03-03-2012 08:10 AM

It is always alphabetic characters at the beginning, then digits, and then alphabetic characters at the end.
I do not mind doing it in Perl or awk. I have never used Python.

Leonid
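
For a first field shaped as letters, then digits, then letters, one possible approach is to split the field into three parts and compare the middle part as a number. The following is only a minimal Python sketch under that assumption; the names `FIELD` and `field_key` are made up for illustration, and the sample lines are the ones from the first post.

```python
import re

# Assumed shape of the first field: letters, then digits, then (optional) letters.
FIELD = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]*)")

def field_key(line):
    """Build a sort key (letters, number, letters) from the first field of a line."""
    first = line.split()[0]
    match = FIELD.fullmatch(first)
    if match is None:
        # The field does not follow the assumed letters-digits-letters shape.
        raise ValueError("unexpected first field: " + first)
    prefix, number, suffix = match.groups()
    # Compare the digits as an integer so that 2 sorts before 10.
    return (prefix, int(number), suffix)

lines = [
    "aaa1a bbb 1q1q1q1",
    "aaa10a ccc 1q1q1q1",
    "aaa2a ddd 1q1q1q1",
]

for line in sorted(lines, key=field_key):
    print(line)
```

With the three sample lines, this prints the "aaa1a" line first, then "aaa2a", then "aaa10a".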

H_TeXMeX_H 03-03-2012 08:20 AM

'sort -n' would work if the numbers were at the beginning. Do the fields always begin with the exact same sequence of letters, 'aaa'? If so, strip the letters off, sort, then add them back in.
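
The strip-and-sort idea could be sketched in Python roughly as follows. This assumes every first field starts with the same fixed prefix, "aaa"; the names `PREFIX` and `numeric_part` are hypothetical. Because the prefix is only ignored while building the sort key, the lines themselves are never modified and nothing has to be added back afterwards.

```python
import re

# Assumption: every first field starts with this exact prefix.
PREFIX = "aaa"

def numeric_part(line, prefix=PREFIX):
    """Return the integer that directly follows the prefix in the first field."""
    first = line.split()[0]
    rest = first[len(prefix):]        # the field with the prefix stripped off
    match = re.match(r"\d+", rest)    # leading run of digits
    if match is None:
        raise ValueError("no digits after prefix in: " + first)
    return int(match.group())

lines = [
    "aaa1a bbb 1q1q1q1",
    "aaa10a ccc 1q1q1q1",
    "aaa2a ddd 1q1q1q1",
]

for line in sorted(lines, key=numeric_part):
    print(line)
```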

wpeckham 03-03-2012 11:04 AM

Or you could script a way to split each line into its sequences of alphabetic and numeric characters, stored in a multidimensional array.
So, from the array of lines, the line 'aaa10xx86bbz'
becomes the array 'aaa 10 xx 86 bbz'.
Then sort the array using the identified key type for each field of the sub-array (aaa alpha, 10 numeric, xx alpha, 86 numeric, bbz alpha),
and paste the sub-arrays back together by stripping the spaces.

Simple in Pascal, Perl, or C. Somewhat challenging in shell scripting, but possible (make functions WORK for you), and the scripting approach would use sort with its key field option (-k; see the man page).

Easier if the pattern of alpha-numeric-alpha is always the same, but solvable either way. Just break down the steps.
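
The steps above can be sketched in Python as a key function that breaks a field into alternating runs of non-digits and digits and compares the digit runs as numbers. This is only an illustration, not a definitive implementation: `natural_key` is a made-up name, and apart from 'aaa10xx86bbz' the sample fields are invented. Since the split is done only to build the sort key, the original lines stay whole and no pasting back together is needed.

```python
import re

def natural_key(text):
    """Split text into runs of digits and non-digits, typed for comparison."""
    key = []
    for digits, other in re.findall(r"(\d+)|(\D+)", text):
        if digits:
            # Numeric run: compare as an integer.
            key.append((1, int(digits), ""))
        else:
            # Alphabetic (non-digit) run: compare as a string.
            key.append((0, 0, other))
    return key

def first_field_key(line):
    """Sort key for a whole line, based on its first whitespace-separated field."""
    return natural_key(line.split()[0])

# Invented sample data, apart from 'aaa10xx86bbz'.
lines = [
    "aaa10xx86bbz one",
    "aaa10xx9bbz two",
    "aaa2xx86bbz three",
]

for line in sorted(lines, key=first_field_key):
    print(line)
```

Each run becomes a tuple that starts with a type flag, so a numeric run and an alphabetic run at the same position can still be compared without a type error.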

vostrushka 03-03-2012 01:25 PM

This is a good idea. Thank you. I'll try it.
Leonid


All times are GMT -5.