01-08-2010, 10:25 AM
|
#1
|
Member
Registered: Oct 2002
Location: Hyderabad, India
Distribution: Slackware 10.1
Posts: 34
Rep:
|
How to sort by line size (number of characters in a line)
Hi,
I want to sort a number of lines based on their size:
data:
-------
12345678
87654321
1234
4321
123
321
12
21
1
2
Should output as:
-----------------
1
2
12
21
123
321
1234
4321
12345678
87654321
But I'm getting this with sort:
----------------
1
12
123
1234
12345678
2
21
321
4321
87654321
----------------
Can we sort the above "data" text based on "number of characters" instead of "character order"? A small bash script would also help.
Thanks in advance.
|
|
|
01-08-2010, 10:45 AM
|
#2
|
Senior Member
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833
Rep: 
|
sort -n
Code:
core:~$ cat datafile
12345678
87654321
1234
4321
123
321
12
21
1
2
core:~$ sort datafile
1
12
123
1234
12345678
2
21
321
4321
87654321
core:~$ sort -n datafile
1
2
12
21
123
321
1234
4321
12345678
87654321
core:~$ man sort
SORT(1) User Commands SORT(1)
NAME
sort - sort lines of text files
SYNOPSIS
sort [OPTION]... [FILE]...
DESCRIPTION
Write sorted concatenation of all FILE(s) to standard output.
Mandatory arguments to long options are mandatory for short options too. Ordering options:
-b, --ignore-leading-blanks
ignore leading blanks
-d, --dictionary-order
consider only blanks and alphanumeric characters
-f, --ignore-case
fold lower case to upper case characters
-g, --general-numeric-sort
compare according to general numerical value
-i, --ignore-nonprinting
consider only printable characters
-M, --month-sort
compare (unknown) < 'JAN' < ... < 'DEC'
-n, --numeric-sort
compare according to string numerical value
-R, --random-sort
sort by random hash of keys
--random-source=FILE
get random bytes from FILE (default /dev/urandom)
-r, --reverse
reverse the result of comparisons
Other options:
-c, --check, --check=diagnose-first
check for sorted input; do not sort
-C, --check=quiet, --check=silent
like -c, but do not report first bad line
--compress-program=PROG
compress temporaries with PROG; decompress them with PROG -d
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
-m, --merge
merge already sorted files; do not sort
-o, --output=FILE
write result to FILE instead of standard output
-s, --stable
stabilize sort by disabling last-resort comparison
-S, --buffer-size=SIZE
use SIZE for main memory buffer
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-T, --temporary-directory=DIR
use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
-u, --unique
with -c, check for strict ordering; without -c, output only the first of an equal run
-z, --zero-terminated
end lines with 0 byte, not newline
--help display this help and exit
--version
output version information and exit
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.
With no FILE, or when FILE is -, read standard input.
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
AUTHOR
Written by Mike Haertel and Paul Eggert.
REPORTING BUGS
Report bugs to <bug-coreutils@gnu.org>.
COPYRIGHT
Copyright © 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for sort is maintained as a Texinfo manual. If the info and sort programs are properly installed at your site, the command
info sort
should give you access to the complete manual.
GNU coreutils 6.9.92.4-f088d-dirty January 2008 SORT(1)
The -n flag tells sort to compare numerically. Most Unix programs have a lot of interesting flags that can be used for a variety of functions; try 'man programname'.
Last edited by rweaver; 01-08-2010 at 10:54 AM.
|
|
|
01-08-2010, 11:28 AM
|
#3
|
Member
Registered: Oct 2002
Location: Hyderabad, India
Distribution: Slackware 10.1
Posts: 34
Original Poster
Rep:
|
sort -n won't work with lines that contain spaces.
example:
$ sort -n
1 2
1 2 3
12345
12 34
12 345
--------
will get us:
---------
1 2
1 2 3
12 34
12 345
12345
Instead I want:
---------------
1 2
1 2 3
12 34 <= 5 characters
12345 <= 5 characters, same size
12 345 <= 6 characters, so it goes below the 5-character lines
-----------
I did read man sort, but couldn't find an option that counts "blanks" and sorts by size (wc -c).
I'm thinking of writing a script with wc -c, sed, etc.
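A short sketch of why this happens, assuming GNU sort: -n compares the number at the start of each line (its "string numerical value"), not the length of the line. The two orders agree for plain digit strings without leading zeros, as in the first post, but not in general. The input below is made up for illustration; `numeric_sorted` is just a variable name chosen here.

```shell
# Made-up input, not from the thread.
# LC_ALL=C keeps the comparison independent of the locale.
numeric_sorted=$(printf '%s\n' 100 0005 '12 345' 12345 | LC_ALL=C sort -n)
printf '%s\n' "$numeric_sorted"
# prints:
# 0005      (value 5, but 4 characters)
# 12 345    (value 12, but 6 characters)
# 100       (value 100, but 3 characters)
# 12345
```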
|
|
|
01-08-2010, 11:39 AM
|
#4
|
Member
Registered: Oct 2002
Location: Hyderabad, India
Distribution: Slackware 10.1
Posts: 34
Original Poster
Rep:
|
Just got the script working:
----------------------------
data.txt:
---------
1 2
1 2 3
12345
12 34
12 345
sortme.sh
-----------
Quote:
file="data.txt"
while IFS= read -r line    # let's read all lines one by one
do
    linesize=${#line}      # count number of characters
    # let's prepend the size to the line, so we can sort on it and then cut it off again
    printf '%s\t%s\n' "$linesize" "$line"
done < "$file"
|
Now the output needs sorting:
---------------------
Quote:
chmod +rx ./sortme.sh; ./sortme.sh | sort -n | cut -f2-
|
output:
------------
1 2
1 2 3
12345
12 34
12 345
Last edited by fast_rizwaan; 01-08-2010 at 12:13 PM.
|
|
|
01-08-2010, 12:31 PM
|
#5
|
Senior Member
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833
Rep: 
|
There ya go, good solution. But if the result has to meet criteria like that, you need to specify them up front or we have no idea; all types come here, from complete newbies to professionals.
A shorter solution would be:
Code:
core:~/test/test20$ awk '{print length, $0}' datafile | sort -n | cut -d' ' -f2-
1
2
12
21
123
321
1234
4321
1 723
4 234
92 784
12345678
87654321
Last edited by rweaver; 01-08-2010 at 12:39 PM.
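The same length-prefix idea can be sketched with a tab as the separator and a stable sort, so that lines of equal length stay in their input order and the spacing inside each line is left untouched. This is only a sketch: `sort_by_length` is a name made up here, and it assumes a sort that supports -s (listed in the man page quoted earlier in the thread).

```shell
# Sort stdin by line length, shortest first.
# Lines of equal length keep their input order.
sort_by_length() {
    awk '{ print length($0) "\t" $0 }' |   # decorate: "<length><TAB><line>"
        sort -s -n -k1,1 |                 # compare the length column only; -s keeps ties in input order
        cut -f2-                           # undecorate: drop the length column
}

printf '%s\n' '12 345' '12345' '1 2' '12 34' '1 2 3' | sort_by_length
# prints:
# 1 2
# 12345
# 12 34
# 1 2 3
# 12 345
```

Restricting the key to `-k1,1` matters here: without it, sort falls back to comparing the whole line for ties, so equal-length lines would be reordered.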
|
|
|
01-08-2010, 02:47 PM
|
#6
|
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
|
Alternatively, what about a little Perl one-liner?
Code:
perl -e 'print sort { length $a <=> length $b } <>' data.txt
|
|
|
01-08-2010, 03:01 PM
|
#7
|
Senior Member
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833
Rep: 
|
Honestly, it could probably be shortened to a sed or awk one liner also.
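For what it's worth, an awk-only version might look like the sketch below. It is not from the thread: it groups the lines by length in an array and prints the groups from shortest to longest, which keeps equal-length lines in input order but holds the whole input in memory. `length_sort` is a made-up name.

```shell
# Group lines by length in awk, then print the groups from shortest to longest.
length_sort() {
    awk '
        {
            n = length($0)
            if (n > max) max = n
            bucket[n] = bucket[n] $0 "\n"   # append the line to the group for this length
        }
        END {
            for (i = 0; i <= max; i++)
                printf "%s", bucket[i]      # empty groups print nothing
        }
    ' "$@"
}

printf '%s\n' '12 345' '12345' '1 2' '12 34' '1 2 3' | length_sort
# prints:
# 1 2
# 12345
# 12 34
# 1 2 3
# 12 345
```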
|
|
|
01-08-2010, 03:03 PM
|
#8
|
Member
Registered: Jan 2009
Location: wherever I can make a living
Distribution: OpenBSD / Debian / Ubuntu / Win7 / OpenVMS
Posts: 440
Rep:
|
I have a Perl script that I wrote for doing this quickly on incredibly huge files, but for normal-sized stuff the 'sort' utility is definitely awesome.
|
|
|
01-08-2010, 05:53 PM
|
#9
|
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
|
I admit I am intrigued by this issue and I wonder if we can do it by means of the sort options alone. Looking at the info page of sort (which is more exhaustive than the man page) I arrived at this solution:
Code:
sort $(seq -f "-k1.%0.0f" 100 -1 1) file
In practice it uses multiple -k options, built by command substitution. The resulting command line will be something like:
Code:
sort -k1.100 -k1.99 -k1.98 ... <omitted> ... -k1.2 -k1.1 file
That is, every key starts in the first field, but since no end position is given, each key runs from its starting character to the end of the line (in other words, the key spans the rest of the line, despite the presence of delimiters). The trick is that the keys are compared starting from the last character position and working back to the first, and where the Nth character does not exist (shorter lines) the key is empty and sorts first. That is, it orders lines from the shortest to the longest.
In practice we have to choose a number N greater than or equal to the number of characters in the longest line, keeping in mind that the greater N is, the longer the execution time. In my example I chose 100, which was enough for the text files I had at hand for testing.
Anyway, I'm not completely sure it works as expected. My tests are 100% correct, but if someone would like to test it and report the result, it would be much appreciated. Just out of eager curiosity!
Just a final note: the presence of tabs in the text can be confusing, since each tab counts as a single character even though it appears as multiple spaces on the terminal screen. To avoid this "optical illusion" we can expand the file before sorting:
Code:
expand file | sort $(seq -f "-k1.%0.0f" 100 -1 1)
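To avoid guessing N, the longest line can be measured first. A minimal sketch, assuming GNU sort and seq: `keysort_by_length` is a hypothetical helper, not something from the thread, and LC_ALL=C is set so that awk and sort both count bytes. The sample data is the file from the earlier posts.

```shell
# Sort FILE by line length using only sort keys, sizing the key list
# from the longest line instead of a fixed 100.
keysort_by_length() {
    file=$1
    # N = length in bytes of the longest line (at least 1, so seq always emits a key)
    n=$(LC_ALL=C awk '{ if (length($0) > m) m = length($0) }
                      END { print (m > 0 ? m : 1) }' "$file")
    # Expands to: sort -k1.N ... -k1.2 -k1.1 FILE (word splitting is intended)
    LC_ALL=C sort $(seq -f "-k1.%0.0f" "$n" -1 1) "$file"
}

# Demo on a temporary copy of the sample data
tmp=$(mktemp)
printf '%s\n' '1 2' '1 2 3' '12345' '12 34' '12 345' > "$tmp"
keysort_by_length "$tmp"
rm -f "$tmp"
```

Lines of equal length are not kept in input order here; they end up ordered by the key comparisons themselves, so only the grouping by length should be relied on.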
|
|
|
All times are GMT -5.