[BASH] Sort while ignoring "The"
I have a list of maybe a thousand or more movies that I want sorted, but some of the titles begin with the words "The" or "A", which makes finding the movie you're looking for more complicated than I'd like.
Is it possible to sort the list while ignoring words like "The" or "A", or (ideally) dropping the words and appending them to the end (e.g. Movie Title, The)? Oh, there's one more thing I didn't mention... Each movie's title will begin with, let's say, a not-so-random series of symbols used as a code for matching and classification (e.g. [*][+][##][ ] The Movie Title). If needed, I could change the pattern(s) of symbols to be represented by numbers instead if that would make it easier, but I'd like to keep the symbols if possible. |
You could use sed to move "The" or "A" to the end of the name, separated from the rest of the name by a semicolon ( ; ). Then sort the list alphabetically, and finally use awk to put "The" or "A" back at the beginning of the file name.
One-liner solution...
Code:
ls -1 | sed 's/^\(The \|A \)\(.*\)/\2;\1/' | sort | awk 'BEGIN{FS=";"};$0 ~ /;/{print $2 $1};$0 !~ /;/{print $0}'
Code:
#run substitute command
Code:
#$0 is the whole line, $1 is the first field, $2 is the second field (field separator is a space by default)
EDIT: I noticed my one-liner did not account for your strange "begins with weird symbols" request, e.g. "[asdf] The Movie.file". I'll attempt to adapt my commands.
One-liner solution #2...
Code:
ls -1 | sed 's/[._]/ /g; s/^\(\[[^]]*\]\s*\)\(.*\)/\2;\1/; s/^\(The \|A \)\(.*\)/\2;\1/' | sort | awk 'BEGIN{FS=";"} NF == 1 {print $0} NF == 2 {print $2 $1} NF == 3 {print $2 $3 $1}'
Code:
#replace all periods and underscores with spaces
Code:
#set field separator to semi-colon |
There's a way to do this in pure Bash, as you require, but let's start with the concept. If you want, you could use it as a basis for scripts in Awk or other languages like Ruby instead. Other commands, or newer versions of familiar ones, might also solve this, but they may not always be available.
One way to do it is to map the strings to an associative array whose keys have already had common words like "A" and "The" and punctuation marks like *, +, #, etc. trimmed off. From there you could sort the key strings either with another indexed array or by echoing them through the sort command. Once the keys are sorted, you can use them to print the original strings in sorted order. An example would look like this:
Code:
#!/bin/bash
Code:
bash script.sh < input_list.txt > output_list.txt |
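The associative-array idea described above can be sketched roughly as follows. This is only a minimal illustration, not the script from the post: the helper names `make_key` and `sort_titles` and the sample titles are made up, and it assumes Bash 4 or newer plus an external `sort`. Titles that reduce to the same key are kept together under one entry rather than overwriting each other.

```shell
#!/usr/bin/env bash
# Sketch only: needs Bash 4+ for associative arrays.
shopt -s extglob   # must be enabled before the extglob pattern below is parsed

# make_key (hypothetical helper): derive the sort key for one title.
make_key() {
    local key=$1
    local re='^\[[^]]*\][[:blank:]]*(.*)$'
    # Peel off leading "[...]" code groups one at a time.
    while [[ $key =~ $re ]]; do
        key=${BASH_REMATCH[1]}
    done
    # Drop one leading article followed by at least one blank.
    key=${key##@(The|A)+([[:blank:]])}
    printf '%s\n' "$key"
}

# sort_titles (hypothetical helper): read titles on stdin, print them ordered by key.
sort_titles() {
    local -A by_key=()
    local line key
    while IFS= read -r line; do
        [[ -n $line ]] || continue
        key=$(make_key "$line")
        [[ -n $key ]] || key=$line           # an empty subscript is not allowed
        by_key["$key"]+=$line$'\n'           # titles sharing a key stay together
    done
    (( ${#by_key[@]} )) || return 0
    # Sort the keys case-insensitively, then print the stored original lines.
    while IFS= read -r key; do
        printf '%s' "${by_key[$key]}"
    done < <(printf '%s\n' "${!by_key[@]}" | LC_ALL=C sort -f)
}

# Demo with made-up titles.
sort_titles <<'EOF'
[*][+][##][ ] The Matrix
[##] Alien
[ ] A Beautiful Mind
Zodiac
EOF
```

Calling `make_key` in a command substitution forks once per title, which should be fine for a thousand or so entries but could be inlined if speed matters.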
Thank you both for these responses.
@konsolebox, I hadn't really thought of using associative arrays in this situation but it makes a lot more sense than what I had planned. |
You can also just associate via a delimiter:
Code:
#!/usr/bin/env bash
Kevin Barry |
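For anyone reading later, associating via a delimiter could look something like the sketch below. It is not the script from the post above: the function name `sort_by_key` and the choice of a tab as the delimiter are assumptions made for illustration. Each line gets its sort key prepended, the list is sorted on that key only, and the key is cut off again.

```shell
#!/usr/bin/env bash
# Sketch only: decorate each line with "key<TAB>", sort on the key, strip the key.
tab=$'\t'

# sort_by_key (hypothetical helper): read titles on stdin, print them sorted.
sort_by_key() {
    # Group 4 is the title without leading "[...]" codes and without a leading
    # article; "&" is the untouched original line.
    sed -E 's/^(\[[^]]*\][[:blank:]]*)*((The|A)[[:blank:]]+)?(.*)$/\4'"$tab"'&/' |
        LC_ALL=C sort -f -t "$tab" -k1,1 |
        cut -f2-
}

# Demo with made-up titles.
sort_by_key <<'EOF'
[*][+][##][ ] The Matrix
[##] Alien
[ ] A Beautiful Mind
Zodiac
EOF
```

A tab is used here instead of a semicolon on the assumption that titles never contain tabs, whereas a semicolon could plausibly show up in a title.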
An awk solution as konsolebox had suggested:
Code:
ls /path/to/files/ | awk '{m[gensub(/^(A|The)\./,"","1")]=$0}END{asorti(m,a);for(i=1;i <= length(m);i++)print m[a[i]]}' |
Just some corrections:
Code:
KEY=${KEY#@(The|A)*([[:blank:]])}
Code:
KEY=${KEY##@(The|A)+([[:blank:]])}
Code:
#!/bin/bash |
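To make the difference between the two KEY expansions above concrete, here is a small sketch. It assumes the first line is the earlier form and the second is its replacement, which the post does not state outright; the function names `strip_loose` and `strip_strict` and the sample titles are made up. With `#` and `*( )` the shortest match wins and the blank is optional, so only the bare article is removed, even from words that merely start with "The" or "A". With `##` and `+( )` at least one blank is required and all of the blanks are consumed.

```shell
#!/usr/bin/env bash
shopt -s extglob   # required for @( ), *( ) and +( ) patterns

# Hypothetical wrappers around the two expansions quoted in the post.
strip_loose()  { local KEY=$1; KEY=${KEY#@(The|A)*([[:blank:]])};  printf '%s\n' "$KEY"; }
strip_strict() { local KEY=$1; KEY=${KEY##@(The|A)+([[:blank:]])}; printf '%s\n' "$KEY"; }

# Print both results side by side; brackets make leftover blanks visible.
for title in 'The Matrix' 'Theory of Everything' 'A   Quiet Place' 'Alien'; do
    printf '%-22s loose=[%s] strict=[%s]\n' \
        "$title" "$(strip_loose "$title")" "$(strip_strict "$title")"
done
```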
You also need $BASH_VERSINFO instead of BASH_VERSINFO.
Kevin Barry |
hmmm ... that doesn't seem to work for me, I get a nasty set of error messages:
Code:
$ [[ BASH_VERSINFO >= 4 ]] && echo yes
Code:
$ (( BASH_VERSINFO >= 4 )) && echo yes |
I guess a good question to ask is: are the file names consistent? Do they always have "[stuff] name.file", or are they sometimes just "name.file" with no stuff?
|
My bad, thanks ta0kira for pointing out the oversight. Strangely enough it does seem to work without the $ at the front when using -ge for the test.
|
Much as how (( )) interprets arithmetic expressions without the need of $, so does [[ ]] with arithmetic comparisons.
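A quick way to see this for yourself is the sketch below. The variable `count` and the helper names `check_test` and `check_arith` are made up for illustration. Note that `>=` is not an operator inside [[ ]], so the [[ ]] form has to use `-ge`.

```shell
#!/usr/bin/env bash
count=7   # made-up variable, purely for illustration

# Hypothetical helpers: the same comparison written both ways.
check_test()  { [[ count -ge 5 ]]; }    # bare name, evaluated arithmetically
check_arith() { (( count >= 5 )); }     # bare name inside (( ))

check_test  && echo "[[ count -ge 5 ]] is true"
check_arith && echo "(( count >= 5 )) is true"

# A small expression also works as an operand, provided it has no spaces.
[[ count+1 -eq 8 ]] && echo "[[ count+1 -eq 8 ]] is true"

# BASH_VERSINFO is an array; the bare name refers to element 0, the major version.
if [[ BASH_VERSINFO -ge 4 ]]; then
    echo "Bash 4 or newer"
else
    echo "Bash older than 4"
fi
```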
As I recall, around 2006 or 2008, when I tried converting my [[ A -xx -B ]] expressions to (( )), (( )) just returned the same exit code no matter what the expression was. I hope that was just a mistake on my part, since I can't reproduce the error now. Still, I can't help being cautious about it.
I have been considering (( )) for a while for arithmetic comparisons that [[ ]] can't handle. I still regard [[ ]] as the main tool for conditional expressions, but for more complex comparisons that need parenthesized sub-expressions, something like (( (A + 4) % 5 < B )) is more convenient than a slower, re-evaluated sub-expression like [[ $(( (A + 4) % 5 )) -lt B ]]. Some simpler expressions, such as modifying a variable within the test, can be done in [[ ]], as in [[ A+=2 -lt X ]], but I dislike the stylistic inconsistency of not being able to add spaces.
As for (( BASH_VERSINFO >= 4 )), we can't use it as an alternative to [[ BASH_VERSINFO -ge 4 ]], since some shells could treat (( )) as a syntax error involving ( ). It's not only about the error message being shown, but also about preventing the commands that follow from being executed or misinterpreted in other shells, especially commands that cause irrevocable changes. |