LinuxQuestions.org


SilversleevesX 11-29-2011 11:17 AM

BASH sort string separated by commas
 
While the Subject line says BASH, I'm not averse to other scripting solutions (awk, perl, etc.).

Years ago, I was greatly impressed by Extensis Portfolio's innate ability to alphabetically sort keywords and supplemental categories added to those picture file types that supported IPTC and XMP metadata. I have yet to find another application, GUI or CLI, that does this in any OS on any hardware platform, and I've been searching off and on since 1996.

In the meantime, I've "taken the job in hand" to do my own descriptive metadata writing alphabetically. Here's where I've run into a problem. I don't always think alphabetically when describing a picture I wish to add keywords and supplemental categories to, and rearranging by hand can get tedious; even when it's not, it's somewhat time-consuming.

I've also encountered situations where, when adding a new key or supplemental to a picture's metadata, most GUI apps attach it to the end of the set, which means more editing on my part.

To make quick work of this editing is what I'm after. The best I've come up with in BASH shell is
Code:

echo "adult,amateur,happy,blonde,funny,waterslide,rain" | tr , "\n" | sort | tr "\n" , ; echo
Which returns
Quote:

adult,amateur,blonde,funny,happy,rain,waterslide,
Except for the trailing comma, I'd be satisfied. Call it petty, but a few months back, I rewrote three or four BASH scripts in such a way that a trailing comma was made part of the text, instead of being seen as a delimiter. I suppose editing out one comma takes far less time than rearranging whole strings of words, but frankly, I'd rather not have to.

As I mentioned before, I'm not married to, or insistent on, a BASH solution.

Looking forward to any help on this plane,

BZT

makyo 11-29-2011 11:48 AM

Hi.

You might find msort to be useful. This script shows the context, then your data, then the output data, which may need to be pre- or post-processed:
Code:

#!/usr/bin/env bash

# @(#) sh-minimal        Demonstrate record separator (,) with msort.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C msort

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results, noting missing record separator after rain:"
msort -r"," -w --comparison-type l --quiet $FILE
pe

pl " Results, adding record separator:"
sed 's/$/,/' $FILE |
msort -r"," -w --comparison-type l --quiet
pe

pl " Results, post-process, remove embedded newline attached to rain:"
msort -r","  -n1 --comparison-type l --quiet $FILE |
tr -d '\n'
pe

pl " Results, post-process, remove newline, trailing record separator:"
msort -r","  -n1 --comparison-type l --quiet $FILE |
tr -d '\n' |
sed 's/,$//'
pe

exit 0

producing:
Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny)
GNU bash 3.2.39
msort 8.44

-----
 Input data file data1:
adult,amateur,happy,blonde,funny,waterslide,rain

-----
 Results, noting missing record separator after rain:
adult,amateur,blonde,funny,happy,rain
,waterslide,

-----
 Results, adding record separator:

,adult,amateur,blonde,funny,happy,rain,waterslide,

-----
 Results, post-process, remove embedded newline attached to rain:
adult,amateur,blonde,funny,happy,rain,waterslide,

-----
 Results, post-process, remove newline, trailing record separator:
adult,amateur,blonde,funny,happy,rain,waterslide

The man page for msort is short, but the on-line documentation is extensive; see http://freecode.com/projects/msort

msort was in the Debian repository for me. It is very, very useful in complicated situations.

Best wishes ... cheers, makyo

SilversleevesX 12-25-2011 09:25 PM

Thanks for the tip on msort.

I'm going to copy your script suggestion, and as soon as I have msort installed, I'll try it out on a few strings. Good to hear it's so well documented.

BZT

Telengard 12-25-2011 10:46 PM

Here's my awk solution. Copy and paste the following code into a new file named sort-csv.

Code:

# Sorts comma delimited fields per line.
# Requires Gawk.

BEGIN {
    FS=","
    OFS=","
}

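{
    # Split the line on FS, sort the fields (asort is a gawk extension),
    # then write them back so print rebuilds the line with OFS.
    n = split($0, words)
    asort(words)
    for (i = 1; i <= n; i++) $i = words[i]
    print
}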

Single lines may be sorted from the command line by piping into gawk -f sort-csv, just as you were doing.

Code:

$ echo "adult,amateur,happy,blonde,funny,waterslide,rain" | gawk -f sort-csv
adult,amateur,blonde,funny,happy,rain,waterslide
$

If you have a file which contains lines of comma-delimited words, then you can sort every line of the file in one pass with the syntax gawk -f sort-csv file-name. Each line is sorted on its own; the order of the lines does not change.

Here's an example data file saved as my-list.csv.

Code:

strawberries,blueberries,strawberries
pencil,crayon,chalk,marker
bus,car,train,motorcycle,bicycle,skateboard

Here's how to invoke the script.

Code:

gawk -f sort-csv my-list.csv

Here's the output produced by the script.

Code:

blueberries,strawberries,strawberries
chalk,crayon,marker,pencil
bicycle,bus,car,motorcycle,skateboard,train
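
If gawk isn't installed, roughly the same per-line sort can be sketched in plain bash with tr and sort. This is only a sketch: the function name sort_csv_file is made up for illustration, and it assumes no field contains a comma or a newline. The trailing comma left by the last tr is stripped with parameter expansion.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sort the comma-separated fields of each line in file $1.
sort_csv_file() {
    local line sorted
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # One field per line, sort, then join again with commas.
        sorted=$(printf '%s\n' "$line" | tr ',' '\n' | sort | tr '\n' ',')
        # The final tr leaves exactly one trailing comma; remove it.
        printf '%s\n' "${sorted%,}"
    done < "$1"
}
```

Calling sort_csv_file my-list.csv should give the same three lines as the gawk script above. It starts a few processes per line, so for large files the gawk version will be much faster.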

HTH

Dark_Helmet 12-26-2011 12:18 AM

Ok, maybe I don't understand the problem completely, but if your only problem with your original command sequence is the trailing comma, then:
Code:

$ echo "adult,amateur,happy,blonde,funny,waterslide,rain" | tr , "\n" | sort | tr "\n" , | sed 's@\(.*\),@\1\n@'
adult,amateur,blonde,funny,happy,rain,waterslide

EDIT:
Or perhaps a simpler sed command that needs no capture group (it is still a regular expression, just anchored to the end of the line):
Code:

$ echo "adult,amateur,happy,blonde,funny,waterslide,rain" | tr , "\n" | sort | tr "\n" , | sed 's@,$@\n@'
adult,amateur,blonde,funny,happy,rain,waterslide
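
Another way to sidestep the trailing comma, instead of removing it afterwards, is to let paste do the joining, since it only puts the delimiter between lines. A minimal sketch, assuming a POSIX paste is available; the function name sort_csv_line is made up for illustration:

```shell
#!/usr/bin/env bash

# Hypothetical helper: read one comma-separated line on stdin, print it sorted.
sort_csv_line() {
    # tr puts one keyword per line, sort orders them,
    # paste -s joins all lines into one using "," as the separator.
    tr ',' '\n' | sort | paste -sd, -
}

echo "adult,amateur,happy,blonde,funny,waterslide,rain" | sort_csv_line
# prints: adult,amateur,blonde,funny,happy,rain,waterslide
```

Because paste never writes a separator after the last item, there is nothing left to clean up with sed.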


