LinuxQuestions.org
Old 08-18-2010, 11:46 PM   #1
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Rep: Reputation: 15
BASH Sort list by end of line to x position in each line?


I'm trying to make another file annotation script a little speedier than it has been by the up-until-now proven method of checking the last four characters in a filename before the "dot" (eg .jpg, .psd) against a list of known IPTC categories and Exiv2 command files. It occurred to me that if one script generated a list of files in directory foo, and the same or another script sorted that list by that four-letter tag, then that list could be used (instead of a for/do/done loop on the real files in the folder) by the command-file-matching script to "vomit out" which annotator file would go with file nastynewfile.jpg, f'r'instance.

The script I had been using for this task looks like this:
Code:
while read 'line';
do
	sp=$(echo $line)
	vc=$(echo $sp | cut -d"," -f1)
	cv=$(echo $sp | cut -d"," -f2)
	dv=$(echo $sp | cut -d"," -f3)

	for striM in $(ls *jpg);
	do
	
		k=$(echo $striM | cut -d'.' -f1)
		j=$(echo ${k:(-4)})
		m=$(echo ${k%????})
		if grep -q $vc <<<$striM; then
			matchX=$striM
    		echo -e "I match $matchX with command file '$cv': \033[1;36m$dv.\033[0m"
			echo -e "$matchX:$cv">>/cygdrive/c/blu/newest/tagmatch.txt
		fi
	done
done</cygdrive/c/blu/newest/nbin/catstag
cd /cygdrive/c/blu/newest
sort -t":" -k1 tagmatch.txt>tagmatch-sorted.txt
rm tagmatch.txt
mv tagmatch-sorted.txt tagmatch.txt
while the acorn of my new script looks like this:
Code:
touch templist
for file in $(ls *jpg)
do
	echo -e $file>>templist
done

g=1
f=$(cat templist | wc -l)
while [ g -le $f ];
do
Where I seem to be stuck is with how to sort the lines in templist, which may be any number of different lengths, from back to front. sort -k looked promising, except it seems only to work the other way round. I thought of invoking a
Code:
q=$(expr length $line); echo $q
n=$[q-8]; echo $n
kind of thing, but that presented the problems of how to sort by those, how to tell sort where to find them (grep?) and how to "stitch them back in" to the original list, which is what I want to sort in the first place.

Any help moving this forward would be much appreciated.

BZT
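
For the sorting question itself (order the list by the four characters before the extension, then get the original lines back), one possible approach is "decorate, sort, undecorate": prefix each line with its key, sort on the key, then strip the key again. This is only a sketch. The helper name sort_by_tag and the sample filenames are made up for illustration, and it assumes filenames contain no tab characters.

```shell
#!/bin/bash
# sort_by_tag (hypothetical helper): reads filenames on stdin and prints
# them ordered by the last four characters before the extension.
sort_by_tag() {
    local name base
    while IFS= read -r name; do
        base=${name%.*}                           # drop the extension
        printf '%s\t%s\n' "${base: -4}" "$name"   # key<TAB>original line
    done | LC_ALL=C sort -t $'\t' -k1,1 -k2 | cut -f2-
}

# Made-up sample list; in the real script this could be: sort_by_tag < templist
printf '%s\n' sillygf-002blon.jpg a-001asia.jpg zz-9beac.jpg | sort_by_tag
```

With the sample input the keys are blon, asia and beac, so the lines come out in the order a-001asia.jpg, zz-9beac.jpg, sillygf-002blon.jpg. Because the key travels on the same line as the filename, nothing has to be "stitched back in" afterwards.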
 
Old 08-19-2010, 12:22 AM   #2
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
Can you give us the contents of /cygdrive/c/blu/newest/nbin/catstag?

Also, you don't have to be so conservative when choosing variable names. You can always have $SOMETHING_LONG_AND_DESCRIPTIVE. It doesn't really affect speed, and it helps the people who are helping you understand your code more easily.

Some quick tips:
Code:
f=0
for file in *jpg
do
        let f++
	echo -e "$file"
done > templist

g=1
while [[ g -le f ]];
do
In simple commands, always place double quotes around variables ("$var").
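
To show why the quoting matters, here is a small sketch. The function count_args and the sample filename are invented for the demonstration, and the default IFS is assumed.

```shell
#!/bin/bash
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

file="my photo.jpg"               # made-up filename containing a space

unquoted=$(count_args $file)      # word splitting turns this into two arguments
quoted=$(count_args "$file")      # quoting keeps it as one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted call sees 2 arguments and the quoted call sees 1, which is the difference between a command working on one file and a command looking for two files that don't exist.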
 
Old 08-19-2010, 12:33 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
You can "speed up" your code by reducing calls to external commands. Calling cut and grep just to split a string or to search for a string inside another string is unnecessary. When you want to assign one variable to another, there is no need to use "echo". Also, don't use ls to list your files in a for loop; use shell expansion (globbing) instead. For matching patterns, you can use case/esac instead of grep.
Code:
while read -r line;
do
	sp="$line"
        OLDIFS="$IFS"
        IFS=","
        set -- $sp
	vc=$1
	cv=$2
	dv=$3
        IFS="$OLDIFS"
	for striM in *jpg
	do
	        OLDIFS="$IFS"
                IFS="."
                set -- $striM
		k=$1
		j=${k:(-4)}
		m=${k%????}
                case "$striM" in
                    *$vc* ) matchX=$striM
                        echo "whatever ... " 
                     ;;
                esac
                IFS="$OLDIFS"
	done
done</cygdrive/c/blu/newest/nbin/catstag
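
The case-instead-of-grep idea can be isolated into a tiny function. This is a sketch only; the name contains is made up here. Quoting the needle inside the pattern makes any glob characters in it match literally.

```shell
#!/bin/bash
# contains HAYSTACK NEEDLE (hypothetical helper): succeeds if NEEDLE
# occurs anywhere in HAYSTACK, without starting an external process.
contains() {
    case $1 in
        *"$2"*) return 0 ;;   # quoted needle: treated literally
        *)      return 1 ;;
    esac
}

if contains "sillygf-002-009-014blon.jpg" "blon"; then
    echo "match"
fi
```

Note that this is a substring test, just like the grep it replaces: it matches the needle anywhere in the name, not only in the four characters before the dot.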
 
Old 08-19-2010, 12:37 AM   #4
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 655
Quote:
for file in $(ls *jpg)
for file in *.jpg
does the same thing. The results will be sorted as well.

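If there will be 20,000 jpg files in the directory, the $(ls *jpg) form can fail with an "Argument list too long" error, because the shell expands the * and passes every name to ls in its argument (argv) array. The plain for file in *.jpg loop is expanded inside the shell itself and is not subject to that limit, although the whole list is still held in memory.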

If you want to remove or replace the extension of a filename, simply use variable substitution
file=picture1.jpg
annotation="${file%.jpg}.ano"

file=picture.jpg
convert "$file" "${file%.jpg}".png
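
As a sketch of how the same kind of variable substitution can pull out the pieces this thread cares about (the sample filename is taken from later in the thread; the variable names are only illustrative):

```shell
#!/bin/bash
file="sillygf-002-009-014blon.jpg"

base=${file%.*}      # remove the shortest trailing ".something"
ext=${file##*.}      # remove everything up to the last dot
tag=${base: -4}      # last four characters of the base name
stem=${base%????}    # base name without those four characters

echo "base=$base ext=$ext tag=$tag stem=$stem"
```

None of these expansions starts an external command, so they cost almost nothing inside a loop.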
Quote:
touch templist
Touching a file will create it if it doesn't exist, but won't zero it out if it does. You could use this:
: >templist

If you need to revisit the list of pictures, you might consider creating an array and iterating over the array:
pictures=(*.jpg *.JPG);
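
A minimal sketch of iterating over such an array follows. It builds a throwaway directory with mktemp so it can be run anywhere; the file names are invented, and it globs only *.jpg to keep the count predictable on case-insensitive filesystems. It also assumes you want nullglob, so that an empty directory gives an empty array instead of a literal "*.jpg".

```shell
#!/bin/bash
shopt -s nullglob                     # unmatched globs expand to nothing

workdir=$(mktemp -d)                  # scratch directory for the demo
touch "$workdir/a one.jpg" "$workdir/b.jpg"

pictures=("$workdir"/*.jpg)           # one array element per file
count=${#pictures[@]}
echo "found $count pictures"

# Quoting "${pictures[@]}" keeps names with spaces intact.
for picture in "${pictures[@]}"; do
    printf '%s\n' "${picture##*/}"    # print the name without the directory
done

rm -rf "$workdir"
```

The array can be walked as many times as needed without touching the disk again.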

Last edited by jschiwal; 08-19-2010 at 12:51 AM.
 
Old 08-19-2010, 12:39 AM   #5
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
I think also that everything will be easier if you use arrays. Your script is easy to simplify but more info about it is needed.

more tips:
Code:
jpegs=(*.jpg)
njpegs=${#jpegs[@]}
IFS=$'\n'
echo "${jpegs[*]}" | sort
...
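
As for where something like this might go: one option is to build the sorted list once, near the top of the script, and loop over it afterwards. The sketch below is an assumption about usage, not the original script. It uses made-up names in place of a real glob, relies on mapfile (bash 4 or later), and uses printf rather than changing IFS so the rest of the script is not affected.

```shell
#!/bin/bash
# In a real script this would be: jpegs=(*.jpg)
jpegs=("c-3sill.jpg" "a-1blon.jpg" "b-2asia.jpg")
njpegs=${#jpegs[@]}

# Read the sorted names back into a second array, one element per line.
mapfile -t sorted < <(printf '%s\n' "${jpegs[@]}" | LC_ALL=C sort)

echo "$njpegs files"
for file in "${sorted[@]}"; do
    echo "$file"
done
```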

Last edited by konsolebox; 08-19-2010 at 12:41 AM.
 
Old 08-19-2010, 03:33 AM   #6
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
Sample of catstag

About two dozen lines culled from /catstag

Code:
asia,asian,Asian Impressions
beac,beach,On The Beaches
biki,bikini,Bikini Girls
blac,black,Black side of beauty
blon,blonde,A few Blonde moments
boat,boat,Boating Beauties
bpnt,bpaint,Painted Bodies
flas,flash,Showing The Goods
frfm,french,French Females
glas,glasses,Girls in Glasses
gtog,gtog_gwg,Home Girls & Best Girl Buddies
indi,indian,O India!
isms,ism,Self-Shooting Sweeties
lati,latina,Latin Chattin
natu,nature,Great Outdoors
nchx,nudechix,Cute & Sexy
nero,nero,Naked Erotica
nipb,nip,Nudes-In-Public
pool,pool,Pools Rule
preg,preg,Mothers-To-Be
redh,redhead,Nice & Fiery Redheads
russ,russian,Daughters of Russia
sill,silly,Getting Goofy
snow,snow,Snow Bunnies
Also, "long and descriptive" variable names is an excellent suggestion. I've also considered the enormous possibilities in letter-number combinations for variable names. If my math is right, keeping oneself strictly to single letters (24 if you omit "i" and "o") and one- and two-digit numbers (including zero), you can easily come up with over 2400 combinations. You're hardly likely to run out of possibilities, that's for sure.

Still, descriptive, rather than contrived and thematic (my weakness when coming up with variable names is to take the latter route) or simplified like the approach I just mentioned, variable naming could be the best way to go all around. I'll definitely give it some thought.

BZT
 
Old 08-19-2010, 05:13 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Just to follow up on others' points here: using a for loop over the output of ls is subject to word splitting, which I know from past experience is a definite issue with your files.
 
Old 08-19-2010, 02:36 PM   #8
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
ghostdog:

I just tried your all-internals rewrite of my original script (I just had to add the sort -t":" stuff from my old script to make it nearly perfect). I love the speed, but I can't puzzle out how it missed 2 out of 45 JPEGs in the folder I ran it on. I had to add their filenames and command-file names to taglist.txt by hand. No great shakes, and when I think back to how plodding the old script was, I'll take the speed and deal with only about 96% accuracy any day. You got a winner with that one.

BZT
 
Old 08-19-2010, 03:33 PM   #9
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
Spoke too soon.
The gap is getting wider: now it's 46 out of 49 JPEGs, and on some runs of the script, tagmatch.txt is only 24 lines long. All the files in the folder have the requisite 4-letters-before-the-dot "hint" to the command files in the other folder, yet somehow doing this by way of internal commands and functions consistently means omitting or ignoring anywhere from 2 to 25 items that should be matching (the other script got all of them, slow as it was). I think I might need some help with reiterating over arrays, as an internal double-check before dumping to the list looks like it's necessary.

BZT

 
Old 08-19-2010, 03:54 PM   #10
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
Willing to try it .. where do I put it?

Quote:
Originally Posted by konsolebox
I think also that everything will be easier if you use arrays. Your script is easy to simplify but more info about it is needed.

more tips:
Code:
jpegs=(*.jpg)
njpegs=${#jpegs[@]}
IFS=$'\n'
echo "${jpegs[*]}" | sort
...
 
Old 08-19-2010, 04:07 PM   #11
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
Just saw what started the "gapping" - I had some unmatch-able "hinters" in the filenames. catstag had the right ones, and as I was downloading I simply didn't think to check it to see if there were more "descriptive" (there's that word again!) four-letter end-offs to the pics I had in front of me.

My bad. Now the script is perfect. I'll just have to keep checking back to catstag until the right hinters become second-nature.

BZT

 
Old 08-19-2010, 04:38 PM   #12
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Original Poster
Rep: Reputation: 15
Not perfect yet.

Got the old problem fixed, now there's a new one.
The part of the file name that this script is supposed to pay attention to is the "hinter," and if it doesn't, and you have a file with a name such as sillygf-002-009-014blon.jpg, what do you suppose this script
Code:
while read -r line;
do
	sp="$line"
        OLDIFS="$IFS"
        IFS=","
        set -- $sp
	vc=$1
	cv=$2
	dv=$3
        IFS="$OLDIFS"
	
	for striM in *jpg
	do
	        OLDIFS="$IFS"
                IFS="."
                set -- $striM
		k=$1
		j=${k:(-4)}
		m=${k%????}
                case "$striM" in
                    *$vc* ) matchX=$striM
                        echo -e "I match $matchX with command file '$cv': \033[1;36m$dv.\033[0m"
                        echo -e "$matchX:$cv">>/cygdrive/c/blu/newest/tagmatch.txt 
                     ;;
                esac
                IFS="$OLDIFS"
	done
done</cygdrive/c/blu/newest/nbin/catstag
will return for matches on it? An almost-straight export on the command line
Code:
sort -u -t":" -k1 tagmatch.txt
gave me this:
Code:
...
sillygf-002-009-014blon.jpg:blonde
sillygf-002-009-014blon.jpg:silly
...
and 50 lines in a sorted temp file, tagmatch-sorted. Recall there are only 49 JPEG files in the folder at this time.

So how to re-focus? I suspect it's worth tweaking the case statement, or adding another test that looks at that j variable, which is set to (if I read that part of the script right) the last four letters before the dot in the filename, before proceeding.
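
One way to act on that idea is to compare only the four-character hint against each key, rather than searching the whole filename. The following is a sketch under that assumption; tag_of is a made-up helper name, and the two keys are taken from the catstag sample above.

```shell
#!/bin/bash
# tag_of FILE (hypothetical helper): print the last four characters
# before the extension.
tag_of() {
    local base=${1%.*}
    printf '%s' "${base: -4}"
}

file="sillygf-002-009-014blon.jpg"
matches=()

for key in sill blon; do
    # Exact comparison against the hint, not a substring search.
    if [[ $(tag_of "$file") == "$key" ]]; then
        matches+=("$key")
    fi
done

echo "$file matches: ${matches[*]}"
```

With an exact comparison the file matches only blon, even though "sill" also appears at the start of its name.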

Back and forth like a kid on a swing. Well, at least there's good weather for it

BZT

 
Old 08-19-2010, 07:14 PM   #13
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
Honestly, I no longer wanted to do this after I saw the contents of your post. But I had already said a few things about it that I didn't want to turn into a lie, so here is what I made:
Code:
#!/bin/bash

CATSTAG=/cygdrive/c/blu/newest/nbin/catstag
TAGMATCH=/cygdrive/c/blu/newest/tagmatch.txt

declare -i I=0
declare -a TAGS0=() TAGS1=() TAGS2=()

OLDIFS=$IFS IFS=,
while read -r 'TAGS0[I]' 'TAGS1[I]' 'TAGS2[I]'; do
	(( I++ ))
done < "$CATSTAG"
unset 'TAGS0[I]' 'TAGS1[I]' 'TAGS2[I]'  # read might allocate empty value
IFS=$OLDIFS

for FILE in *.jpg *.JPG; do
	TAG=${FILE: -8:4}

	for I in "${!TAGS0[@]}"; do
		if [[ $TAG = "${TAGS0[I]}" ]]; then
			echo -e "I match $FILE with command file '${TAGS1[I]}': \033[1;36m${TAGS2[I]}.\033[0m"
			echo "$FILE:${TAGS1[I]}" >&3
		fi
	done
done 3> >(exec sort -u -t: -k1 > "$TAGMATCH")
alt.:
Code:
...
		[[ $TAG = "${TAGS0[I]}" ]] || continue
		echo -e "I match $FILE with command file '${TAGS1[I]}': \033[1;36m${TAGS2[I]}.\033[0m"
		echo "$FILE:${TAGS1[I]}" >&3
...
Please don't use the script as is and just use it as a reference to make a version of your own.

Last edited by konsolebox; 08-19-2010 at 07:41 PM.
 
Old 08-19-2010, 08:51 PM   #14
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Well, everyone else has had a crack ... here is a variation on a theme (it needs bash 4+ for the associative arrays)
Code:
#!/bin/bash

CATSTAG=/cygdrive/c/blu/newest/nbin/catstag
TAGMATCH=/cygdrive/c/blu/newest/tagmatch.txt

declare -A TAG_NAME TAG_DESC

while IFS=, read -r id name desc
do
    TAG_NAME[$id]="$name"
    TAG_DESC[$id]="$desc"
done<"$CATSTAG"

for file in *.jpg
do
    tag=${file:(-8):4}

    (( ${#TAG_NAME[$tag]} )) &&
        echo -e "I match $file with command file '${TAG_NAME[$tag]}': \033[1;36m${TAG_DESC[$tag]}.\033[0m" &&
        echo -e "$file:${TAG_NAME[$tag]}">>"$TAGMATCH"
done
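
The lookup part of this approach can be tried on its own with a couple of catstag lines fed in through a here-document. This is a self-contained sketch, not a replacement for the script: the function name lookup is made up, and it adds an explicit check for an empty tag (a filename with no characters before the dot), which is an extra precaution assumed here.

```shell
#!/bin/bash
declare -A tag_name    # associative array: needs bash 4 or later

# Two lines borrowed from the catstag sample earlier in the thread.
while IFS=, read -r id name desc; do
    tag_name[$id]=$name
done <<'EOF'
blon,blonde,A few Blonde moments
sill,silly,Getting Goofy
EOF

# lookup FILE (hypothetical helper): print the command-file name for
# FILE's four-character hint, or fail if the hint is unknown.
lookup() {
    local base=${1%.*}
    local tag=${base: -4}
    [[ -n $tag ]] || return 1                      # empty hint: nothing to look up
    [[ -n ${tag_name[$tag]+set} ]] || return 1     # hint not in the table
    printf '%s\n' "${tag_name[$tag]}"
}

lookup "sillygf-002-009-014blon.jpg"
```

Each file costs one hash lookup instead of a pass over every line of catstag, and a file can only ever match the one key that equals its hint.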
 
Old 08-19-2010, 09:30 PM   #15
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
Good variation.
 
  

