I have a text report that I need to sort by the first field.
That first field has letters and numbers, like this:
aaa1a bbb 1q1q1q1
aaa10a ccc 1q1q1q1
aaa2a ddd 1q1q1q1
As you can see, regular sorting produces exactly this order: the line with "aaa10a" appears second, not third.
Number or characters before digits, for digits and after digits can be different.
Is it possible to sort correctly in such a case, and if so, how?
Can you give us some actual examples? How will the programme decide which characters it needs to use to sort? You said the "Number or [of?] characters before digits, for digits and after digits can be different" - so what would you sort "a11b5a" by, 11 or 5? Is there always going to be 'aaa' before the number?
My suspicion is that you're going to have to use something like python, perl or awk to do it, unless there's more regularity to the data than you've implied.
It is always alphabetic characters at the beginning, then digits and then alphabetic characters in the end.
I do not mind doing it in Perl or awk. I have never used Python.
'sort -n' would work if the numbers were at the beginning. Do the lines always begin with the exact same sequence of letters, 'aaa'? If so, strip them off, sort, then add them back in.
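The strip, sort, add-back idea could be sketched roughly like this in Python. This is only an illustration under the assumption that every line really does start with the same prefix ('aaa' here); the function name sort_with_fixed_prefix is made up for this example.

```python
import re

def sort_with_fixed_prefix(lines, prefix="aaa"):
    """Strip a shared prefix, sort by the number that follows, restore the prefix.

    Assumes every line starts with `prefix` (hypothetical default: 'aaa').
    """
    for line in lines:
        if not line.startswith(prefix):
            raise ValueError(f"line does not start with {prefix!r}: {line!r}")

    # Step 1: strip the prefix, e.g. 'aaa10a ccc ...' -> '10a ccc ...'
    stripped = [line[len(prefix):] for line in lines]

    # Step 2: sort by the leading run of digits, compared as an integer
    def leading_number(text):
        match = re.match(r"[0-9]+", text)
        return int(match.group()) if match else 0

    stripped.sort(key=leading_number)

    # Step 3: add the prefix back
    return [prefix + rest for rest in stripped]


report = [
    "aaa1a bbb 1q1q1q1",
    "aaa10a ccc 1q1q1q1",
    "aaa2a ddd 1q1q1q1",
]
for line in sort_with_fixed_prefix(report):
    print(line)
```

With the three sample lines from the question, this puts "aaa2a" before "aaa10a". It does not help if the leading letters differ from line to line.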
Script a way to strip out the sequences of alpha and numeric characters into a multidimensional array,
so from the array of lines, the line 'aaa10xx86bbz'
becomes array 'aaa 10 xx 86 bbz'
and sort the array using the identified key type for each field of the sub-array. (aaa alpha, 10 numeric, xx alpha, 86 numeric, bbz alpha)
then paste the sub-arrays back together by stripping spaces.
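The steps above can be sketched in Python as follows. This is a minimal illustration, not a tested tool: the names split_runs and natural_key are made up here, and it assumes the sort key is the first whitespace-separated field and that a numeric run should sort before an alphabetic run when the two meet at the same position.

```python
import re

def split_runs(text):
    """Split text into alternating runs of digits and non-digits.

    'aaa10xx86bbz' -> ['aaa', '10', 'xx', '86', 'bbz']
    """
    return re.findall(r"[0-9]+|[^0-9]+", text)

def natural_key(line):
    """Build a sort key from the first field: numeric runs compare as
    integers, all other runs compare as strings."""
    fields = line.split()
    first_field = fields[0] if fields else ""
    key = []
    for run in split_runs(first_field):
        if "0" <= run[0] <= "9":
            key.append((0, int(run), ""))   # numeric run
        else:
            key.append((1, 0, run))         # alphabetic run
    return key


report = [
    "aaa1a bbb 1q1q1q1",
    "aaa10a ccc 1q1q1q1",
    "aaa2a ddd 1q1q1q1",
]
for line in sorted(report, key=natural_key):
    print(line)
```

Because the key is computed from each line rather than replacing it, the original lines come out unchanged and there is no need to paste the pieces back together afterwards. The type tag (0 or 1) at the front of each tuple keeps Python from ever comparing an integer with a string.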
Simple in Pascal, Perl, or C. It is somewhat more challenging in scripting, but possible (make functions WORK for you); the scripting approach would use sort with its key-field option (-k; see the man page).
It is easier if the pattern of alpha-numeric-alpha is always the same, but solvable either way. Just break down the steps.
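For the simpler case where the first field is always letters, then digits, then letters, a sketch in Python might look like this. The name three_part_key is hypothetical, and the pattern assumes ASCII letters only and allows the trailing letters to be empty.

```python
import re

# letters, then digits, then (possibly empty) letters
FIELD_PATTERN = re.compile(r"([A-Za-z]+)([0-9]+)([A-Za-z]*)")

def three_part_key(line):
    """Return (leading letters, number as int, trailing letters) for the
    first whitespace-separated field of the line."""
    fields = line.split()
    match = FIELD_PATTERN.fullmatch(fields[0]) if fields else None
    if match is None:
        raise ValueError(f"first field is not letters-digits-letters: {line!r}")
    letters, digits, tail = match.groups()
    return (letters, int(digits), tail)


report = [
    "aaa1a bbb 1q1q1q1",
    "aaa10a ccc 1q1q1q1",
    "aaa2a ddd 1q1q1q1",
]
for line in sorted(report, key=three_part_key):
    print(line)
```

Lines are ordered first by the leading letters, then by the number as an integer, then by the trailing letters. A line that does not fit the pattern raises an error instead of being sorted silently into the wrong place.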