How to sort by line size (number of characters in a line)
The -n flag tells it to sort numerically. Most Unix programs have a lot of interesting flags that can be used for a variety of purposes; try 'man programname'.
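As a small illustration of what -n changes (the three input numbers are made up for this example):

```shell
# Without -n, sort compares the lines as text, so "10" and "100" come before "9".
printf '10\n9\n100\n' | sort

# With -n, the lines are compared as numbers: 9, 10, 100.
printf '10\n9\n100\n' | sort -n
```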
Just got the script working:
----------------------------
data.txt:
---------
1 2
1 2 3
12345
12 34
12 345
sortme.sh
-----------
Quote:
#!/bin/sh
file="data.txt"
while IFS= read -r line   # let's read all lines one by one
do
    linesize=${#line}     # count the number of characters in the line
    # prepend the length to the line so it can be sorted on, then cut off again
    printf '%s\t%s\n' "$linesize" "$line"
done < "$file"
Now the data needs sorting:
---------------------
Quote:
chmod +rx ./sortme.sh; ./sortme.sh | sort -n | cut -f2-
output:
------------
1 2
1 2 3
12345
12 34
12 345
Last edited by fast_rizwaan; 01-08-2010 at 12:13 PM.
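For what it's worth, the same decorate, sort, undecorate idea can be written as a single pipeline. This is only a sketch, not the script from the post above: it assumes awk is available, and sort_by_length is a made-up helper name. The -s (stable) flag is supported by GNU and BSD sort but is not required by POSIX.

```shell
# sort_by_length is a hypothetical helper name, not an existing command.
# It reads the named files, or standard input when no file is given.
sort_by_length() {
    # 1. awk prepends each line's length and a tab.
    # 2. sort orders numerically on that first field; -s keeps lines of
    #    equal length in their original order.
    # 3. cut removes the length again, keeping everything after the first tab.
    awk '{ print length($0) "\t" $0 }' "$@" | sort -s -k1,1n | cut -f2-
}

# Example run on three sample lines.
printf '%s\n' '12 345' '1 2' '12345' | sort_by_length
```

Using cut -f2- rather than cut -f2 keeps lines intact even when they contain tabs of their own.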
There ya go, good solution. But if it has to meet criteria like that, you need to specify it up front or we have no idea... just about all types come here, from complete newbies to professionals.
I admit I am intrigued by this issue, and I wonder if we can do it by means of the sort options alone. Looking at the info page of sort (which is more exhaustive than the man page), I reached this solution:
Code:
sort $(seq -f "-k1.%0.0f" 100 -1 1) file
In practice it uses multiple -k options, built by command substitution. The resulting command line will be something like:
Code:
sort -k1.100 -k1.99 -k1.98 ... -k1.2 -k1.1 file
That is, every key starts in the first field, but since no end position is given, each key runs from its starting character to the end of the line (in other words, the entire line is treated as the first field, despite the presence of delimiters). The trick is that sort compares starting from the last character position back to the first, and where the Nth character does not exist (shorter lines) the key is empty and sorts first. As a result, the lines are ordered from the shortest to the longest.
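To see what the command substitution produces, here is a minimal sketch with N=5 instead of 100. It assumes GNU seq and GNU sort; the three sample lines and the LC_ALL=C setting are additions made for this example so that the result does not depend on the locale.

```shell
# Show the options generated by seq, joined on one line.
seq -f "-k1.%0.0f" 5 -1 1 | tr '\n' ' '
echo

# Use them on three short lines. The unquoted $(...) is intentional:
# the shell splits it into one -k option per word.
printf '%s\n' 'ccc' 'a' 'bb' | LC_ALL=C sort $(seq -f "-k1.%0.0f" 5 -1 1)
```

The first command prints -k1.5 -k1.4 -k1.3 -k1.2 -k1.1, and the second prints a, bb, ccc on separate lines.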
In practice we have to choose a number N greater than or equal to the number of characters in the longest line, keeping in mind that the greater N is, the longer the execution time. In my example I chose 100, which was enough for the text files I had at hand for testing.
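Instead of guessing N, it could be taken from the file itself. The following is a sketch under some assumptions: the text is plain ASCII (so awk and sort agree on what a character is), GNU seq and sort are installed, and sort_by_length_keys is a made-up helper name. For files with very long lines the generated option list becomes long as well.

```shell
# sort_by_length_keys is a hypothetical helper name, not part of any tool.
sort_by_length_keys() {
    # Length of the longest line in the file; "+ 0" prints 0 for an empty file.
    n=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1")
    # Keep at least one key so the generated option list is never empty.
    [ "$n" -ge 1 ] || n=1
    # Word splitting of the unquoted $(...) is intended: one -k option per word.
    sort $(seq -f "-k1.%0.0f" "$n" -1 1) "$1"
}

# Example run on a throwaway file.
tmp=$(mktemp)
printf '%s\n' 'ccc' 'a' 'bb' > "$tmp"
sort_by_length_keys "$tmp"
rm -f "$tmp"
```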
Anyway, I'm not completely sure it works as expected. My tests all came out correct, but if someone would like to test it and report the result, it would be much appreciated. Just out of my eager curiosity!
Just a final note: the presence of tabs in the text can be confusing, since each tab counts as a single character even if it appears as multiple spaces on the terminal screen. To avoid this "optical illusion" we can expand the tabs in the file before sorting.
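The command that accompanied this note is not preserved here. A plausible sketch, assuming the coreutils expand command is what was meant (it replaces each tab with spaces up to the next tab stop, every 8 columns by default); the two sample lines and LC_ALL=C are made up for the example:

```shell
# "a<TAB>b" is only 3 characters, but it is 9 columns wide on screen.
# After expand it is 9 characters, so it sorts after the 8-character line.
printf 'a\tb\nabcdefgh\n' | expand | LC_ALL=C sort $(seq -f "-k1.%0.0f" 100 -1 1)
```

Note that the output then contains spaces where the input had tabs.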