LinuxQuestions.org


ahz 12-05-2006 09:16 PM

Natural sort with just Bash or core Linux commands?
 
How do you sort a directory in the natural sort order just using Bash and core Linux utilities (such as sort and ls)? (In other words, no C, C++, PHP, Perl, etc.)

In natural sort, the file names img0.jpg img1.jpg img10.jpg should appear
img0.jpg
img1.jpg
img10.jpg

Instead of the standard sort or even dictionary sort (not helpful):
img0.jpg
img10.jpg
img1.jpg

fordeck 12-05-2006 09:46 PM

Here is an example:

Quote:

$ ls *txt
sort.txt
$
$
$ cat sort.txt
img0.jpg
img10.jpg
img1.jpg
$
$
$ sort -k1.4,1.5 sort.txt
img0.jpg
img1.jpg
img10.jpg
$
Let me know if that works for you.

Regards,
Fordeck
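
A note on the key used above: -k1.4,1.5 compares characters 4 and 5 as text, so it happens to give the right order for these three names, but a name such as img2.jpg would land after img10.jpg. A minimal sketch of a numeric variant follows, assuming every name shares the same three-character prefix. The function name sort_img_numeric is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes all names start with the same 3-character prefix ("img").
# -k1.4n starts the sort key at character 4 of field 1 and compares it as a number.
sort_img_numeric() {
    LC_ALL=C sort -k1.4n
}

printf '%s\n' img0.jpg img10.jpg img1.jpg img2.jpg | sort_img_numeric
```

This still breaks as soon as the prefixes differ in length, which is the objection raised in the next posts.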

matthewg42 12-05-2006 09:49 PM

Rename your files to have leading 0's in the name so that dictionary sort order IS natural sort order.</cheap shot>

Seriously though, I'd like to know this too.
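
For what it's worth, the zero-padding approach can be scripted. Below is a rough sketch using only POSIX parameter expansion and printf. The helper name pad_first_number and the default width of 4 are assumptions for illustration, and only the first run of digits in a name is padded.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print NAME with its first run of
# digits zero-padded to WIDTH characters (default 4).
pad_first_number() (
    name=$1
    width=${2:-4}
    prefix=${name%%[0-9]*}      # text before the first digit
    rest=${name#"$prefix"}      # from the first digit onwards
    digits=${rest%%[!0-9]*}     # leading run of digits (may be empty)
    suffix=${rest#"$digits"}    # whatever follows the digits
    if [ -z "$digits" ]; then
        printf '%s\n' "$name"   # no number in the name: print it unchanged
    else
        # Strip existing leading zeros so printf does not read "08" as octal.
        while [ "${digits#0}" != "$digits" ] && [ -n "${digits#0}" ]; do
            digits=${digits#0}
        done
        printf "%s%0${width}d%s\n" "$prefix" "$digits" "$suffix"
    fi
)

pad_first_number img1.jpg       # prints img0001.jpg
pad_first_number img10.jpg      # prints img0010.jpg
```

An actual rename would be something like mv -- "$f" "$(pad_first_number "$f")" in a loop, best tried on a copy of the directory first.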

matthewg42 12-05-2006 09:51 PM

fordeck: that's nice, but what if your list is like this:
Code:

file0.dat
file10.dat
file1.dat
bananas0.dat
bananas10.dat
bananas1.dat

...and you want to do it in one go?

ahz 12-05-2006 10:18 PM

Matthew is correct. The program should automatically be able to handle any kind of file such as 123.jpg, Picture123.jpg, img123.jpg, etc. These algorithms exist in C, but as I wrote, I need Bash. :)

The main purpose of my original question is to make it easier to create DVD slide shows from OpenOffice.org Impress (and PowerPoint) using a nice, existing program written in Bash.
https://sourceforge.net/tracker/?fun...roup_id=100188
http://www.oooforum.org/forum/viewtopic.phtml?t=45483

matthewg42 12-05-2006 10:31 PM

After googling about a bit I can't find a Free Software tool for this. If you have a C program already written, consider releasing it under the GPL. :D

Seems like a massive omission though. I'm surprised there's no option in the GNU sort program.

tuxdev 12-05-2006 11:10 PM

If you've got C, shouldn't it be possible (if painful) to port that to Bash? If you don't want to do the work yourself for whatever reason, then please release the source so that maybe somebody else here can work on it.

I suppose it was omitted because it never came up as an issue. I think most would take the <cheap shot/> approach, or didn't care what order things were in. There's a secondary advantage to <cheap shot/>, it is a lot prettier to ls because everything is nicely aligned.

matthewg42 12-05-2006 11:18 PM

Quote:

Originally Posted by tuxdev
If you've got C, shouldn't it be possible (if painful) to port that to Bash? If you don't want to do the work yourself for whatever reason, then please release the source so that maybe somebody else here can work on it.

There's a Perl module on CPAN, so as long as we're happy making a program to slurp up all the input into memory and sort it that way it's a trivial matter to use the module. Maybe the utility could be called "natsort". Making an efficient program to sort huge files which don't fit in memory is another matter. One would hope that adding the feature to GNU sort would just be a matter of having a comparison function, and adding the command line option.

Quote:

Originally Posted by tuxdev
I suppose it was omitted because it never came up as an issue. I think most would take the <cheap shot/> approach, or didn't care what order things were in. There's a secondary advantage to <cheap shot/>, it is a lot prettier to ls because everything is nicely aligned.

Well, sort of. After the first time I came across this sort of thing, I made sure all my files have zero-padded numbers in their names, but it's not always the case that the person doing the sorting has control over the names of the files / input data.

tuxdev 12-06-2006 12:16 AM

Quote:

There's a Perl module on CPAN, so as long as we're happy making a program to slurp up all the input into memory and sort it that way it's a trivial matter to use the module. Maybe the utility could be called "natsort". Making an efficient program to sort huge files which don't fit in memory is another matter. One would hope that adding the feature to GNU sort would just be a matter of having a comparison function, and adding the command line option.
With scripting languages, just about every computer performance metric flies out the window. But apparently, using a C program is out of the question, and if/when such an extension is added to GNU sort, it will take time to reach ubiquity. So, we're stuck with scripting. But if there aren't any more than, say, 100-300 files, it should be okay.

Hey, this sounds like something cool to do in Lisp. I've been meaning to do something more than "Hello World", and it doesn't look like this has been done before.

Guttorm 12-06-2006 05:16 AM

Hi

PHP has this function, so if you have PHP installed, you can use this script:

Code:

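#!/usr/bin/php
<?php
// Read from the file named on the command line, or from stdin if none is given.
if ($argc == 2)
        $in_file = $argv[1];
else
        $in_file = "php://stdin";

$fp = fopen($in_file, "r")
        or die("Failed opening $in_file");
$data = array();
// Compare against false explicitly so that a final line containing only "0"
// (with no trailing newline) is not mistaken for end of input.
while (($line = fgets($fp)) !== false)
        $data[] = $line;
fclose($fp);
// natsort() sorts the array in natural order and keeps the original keys.
natsort($data);
foreach ($data as $line)
        echo $line;
?>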

You might need to set the path to php in the first line.

If you set execute rights on the script, it will run like any other script, even though it's PHP.

When running the script, you can pass a filename as a parameter; if you don't specify one, it reads from stdin.

makyo 12-06-2006 02:53 PM

Hi.

Given the data file data2:
Code:

file0.dat
file10.dat
file1.dat
bananas0.dat
bananas10.dat
bananas1.dat
img0.jpg
img10.jpg
img1.jpg
Picture123.jpg
img123.jpg
Picture06.jpg
Picture006.jpg
Picture6.jpg
Picture6.gif
Picture8600.jpg

Operated on by script s1:
Code:

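#!/bin/sh

# @(#) s1      Demonstrate key extraction for embedded numeric string.
# $Id$

# Input file: first argument, or data2 if none is given.
F=${1-data2}

# 1. sed:  prepend two keys to each name: "<alpha><rest> <number> <name>"
# 2. sort: order by the text key, then numerically by the number key
# 3. sed:  strip the keys again (everything up to the last space)
sed -e 's/^\([a-zA-Z]*\)\([0-9]*\)\(.*\)/\1\3 \2 \1\2\3/' "$F" |
sort -k 1,1 -k 2,2n |
sed -e 's/^.* //'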

will produce:
Code:

% ./s1
Picture6.gif
Picture006.jpg
Picture06.jpg
Picture6.jpg
Picture123.jpg
Picture8600.jpg
bananas0.dat
bananas1.dat
bananas10.dat
file0.dat
file1.dat
file10.dat
img0.jpg
img1.jpg
img10.jpg
img123.jpg

The alpha and numeric strings are extracted and placed ahead of the entire filename. Then fields 1 and 2 are sorted, the latter as a numeric field. Finally the extracted key fields are discarded.
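
To make the intermediate keys visible, here is the first sed stage from s1 run on its own. The wrapper name decorate is only for illustration, and the three input names are taken from data2.

```shell
#!/bin/sh
# The first sed stage from s1, wrapped in a function (hypothetical name)
# so the generated sort keys can be inspected before sorting.
decorate() {
    sed -e 's/^\([a-zA-Z]*\)\([0-9]*\)\(.*\)/\1\3 \2 \1\2\3/'
}

printf '%s\n' img10.jpg img1.jpg Picture06.jpg | decorate
# img.jpg 10 img10.jpg
# img.jpg 1 img1.jpg
# Picture.jpg 06 Picture06.jpg
```

Because the keys are separated by spaces and the last stage strips up to the final space, a file name that itself contains a space would be mangled.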

Not bullet-proof, but good enough for government work ... cheers, makyo

( edit 1: typo; missing single-quote )

matthewg42 12-06-2006 03:07 PM

makyo, that's really neat. I have to spend some time to understand it.

Having said that, it's not a general solution to the natural sort problem. It fails to work correctly when the numeric portion precedes the non-numeric part.

For example, this list won't be sorted correctly.
Code:

02Al002
001Al201
001Al3
3Al001
30Al001

One would be able to properly sort this data with a modification to the script, but it would not be possible to properly sort this data mixed with data which is alpha then numeric, or even more complex alpha, num, alpha, num etc.
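
For the general alpha, num, alpha, num case, one possibility is to compare names chunk by chunk in plain Bash. The following is a rough sketch, not a tested utility: natural_le and natural_sort are made-up names, digit runs are assumed to fit in Bash's integer range, names that differ only in leading zeros are treated as equal, and the insertion sort is quadratic, so it only suits short lists of a few hundred names.

```bash
#!/bin/bash
# Sketch of a natural-order comparison and sort in plain Bash.
# natural_le and natural_sort are hypothetical names, not from the thread.
LC_ALL=C    # compare text chunks byte by byte

# natural_le A B: succeed if A sorts at or before B in natural order.
natural_le() {
    local a=$1 b=$2 ca cb
    while [[ -n $a && -n $b ]]; do
        if [[ $a =~ ^[0-9]+ ]]; then
            ca=${BASH_REMATCH[0]}
            [[ $b =~ ^[0-9]+ ]] || return 0     # digits sort before text
            cb=${BASH_REMATCH[0]}
            # 10# forces base 10 so that "08" is not rejected as octal.
            if (( 10#$ca != 10#$cb )); then
                (( 10#$ca < 10#$cb ))
                return
            fi
        else
            [[ $a =~ ^[^0-9]+ ]]
            ca=${BASH_REMATCH[0]}
            [[ $b =~ ^[^0-9]+ ]] || return 1    # text sorts after digits
            cb=${BASH_REMATCH[0]}
            if [[ $ca != "$cb" ]]; then
                [[ $ca < $cb ]]
                return
            fi
        fi
        a=${a#"$ca"}    # drop the chunk just compared and continue
        b=${b#"$cb"}
    done
    [[ -z $a ]]         # equal so far: the name that ran out first wins
}

# natural_sort: read lines from stdin, print them in natural order.
natural_sort() {
    local -a items=()
    local line i
    while IFS= read -r line; do
        # Insertion sort: shift larger items up, then drop the line in place.
        i=${#items[@]}
        while (( i > 0 )) && ! natural_le "${items[i-1]}" "$line"; do
            items[i]=${items[i-1]}
            i=$((i - 1))
        done
        items[i]=$line
    done
    if (( ${#items[@]} > 0 )); then
        printf '%s\n' "${items[@]}"
    fi
}

printf '%s\n' 02Al002 file10.dat 001Al201 file1.dat 001Al3 3Al001 30Al001 |
    natural_sort
```

Usage would be along the lines of ls | natural_sort, with the usual caveat that file names containing newlines will confuse it.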

matthewg42 12-06-2006 05:02 PM

I just found that there is a patch for the GNU sort program:

http://sourcefrog.net/projects/natsort/textutils.diff

This adds natural sorting to GNU sort. There is also a stand-alone natural sort utility written by the guy who made the patch, called natsort.

burninGpi 12-06-2006 08:26 PM

here's a REALLY cheap way to do this:
Code:

#!/bin/bash
ls >/tmp/files
cat << EOF >/tmp/natsort.c
*** insert c code for natural sorting here ***
EOF
gcc /tmp/natsort.c -o /tmp/natsort
/tmp/natsort /tmp/files
rm -f /tmp/files /tmp/natsort


unSpawn 12-06-2006 09:20 PM

Quote:

here's a REALLY cheap way to do this

Actually it's rather expensive cuz it won't work: you'll be missing the header file.

matthewg42 12-06-2006 09:38 PM

I think the options are (in descending order of probable (usefulness * 1/hassle) / 2):
  1. Build and install natsort.
  2. Patch GNU sort. Build. Install.
  3. When you know you'll just have alpha then numeric then alpha, use the sed and sort mechanism above.
  4. Write your own.

The patch for GNU sort from the guy who did natsort is out of date. I talked to the folks in #gnu at freenode, and they said they'd probably consider committing the feature if someone can implement it neatly, and use a nice long option name like --natural-sort (it doesn't deserve a short option apparently... :) )
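
Note for later readers: more recent GNU coreutils releases include a closely related feature under a different name, version sort (sort -V, long form --version-sort), and GNU ls has -v for directory listings. A minimal sketch, assuming a GNU sort new enough to have -V; its ordering rules are designed for version strings and may not match every natural-sort implementation in edge cases. The wrapper name version_sort is just for illustration.

```shell
#!/bin/sh
# Assumes GNU sort with version-sort support (-V / --version-sort).
version_sort() {
    sort -V
}

printf '%s\n' img0.jpg img10.jpg img1.jpg | version_sort

# For listing a directory directly, GNU ls has a similar option:
#   ls -v
```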


All times are GMT -5.