Perl regex matching
Because it is Perl, I will give it the benefit of the doubt and assume this is possible; rather than ask whether it is possible, I will ask how.
I am going to create a subroutine/function/method/whatever you want to call it to isolate the artist name in a directory of (legal) mp3s.
Say it looks like this:
directory:
Artist A - 22 - songa.mp3
Artist A - 23 - Songb.mp3
... the list goes on.
First off, I have it strip characters like '-', replace every '_' with a space ' ', reduce all double spaces to a single space, and lowercase all characters.
So what we have is:
artist a 22 songa
artist a 23 songb
artist a 24 songc
maybe with one or two out of order or with the number missing:
25 artist a songd
songe artist a
(By this point the .mp3 extension has also been stripped.)
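For reference, a minimal sketch of that cleanup step. The name normalize_name is made up for illustration, and it assumes that turning '-' and '_' into spaces before collapsing whitespace gives the same result as stripping them:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw filename into a cleaned, lowercase word string.
sub normalize_name {
    my ($name) = @_;
    $name =~ s/\.mp3\z//i;      # drop the extension
    $name =~ s/[-_]/ /g;        # '-' and '_' become spaces
    $name =~ s/\s+/ /g;         # collapse runs of whitespace into one space
    $name =~ s/^ //;            # trim a leading space, if any
    $name =~ s/ \z//;           # trim a trailing space, if any
    return lc $name;
}

print normalize_name($_), "\n"
    for 'Artist A - 22 - songa.mp3', 'Artist_A - 23 - Songb.mp3';
# prints:
#   artist a 22 songa
#   artist a 23 songb
```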
I then need a way to isolate "artist a".
In other words, I need it to run through all the strings we now have and find a common string that at least 75% of the filenames checked share. In this case all the files have "artist a" in their names, but nothing else matches across them. What would be the correct way to implement such a search?
It also needs to work on a whole-word basis: "artist a songa" and "artist a songb" should match only "artist a", not "artist a song", because each time "song" is found it is followed by another character rather than a space, so the word as a whole is thrown away and not used in the match.
It also needs to differentiate between a number that is common and a number that is not.
Say there is a band called "my number 3".
Each song title would look like this:
"my number 3 12 songname"
"my number 3 13 songnamea"
"my number 3 14 songnameb"
"my number 3 15 songnamec"
This should match "my number 3" because the 3 is also found in them all.
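To make the requirement concrete, here is a rough sketch of one way it might be done. It swaps the single-regex idea for split plus a hash of word sequences: every run of whole words in each name is counted, and the longest run found in at least the given fraction of names wins. The name common_phrase and its argument order are made up for illustration, and it assumes the names have already been cleaned as described above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest run of whole words that appears in
# at least $threshold (a fraction such as 0.75) of the cleaned names.
sub common_phrase {
    my ($threshold, @names) = @_;
    my %seen_in;    # phrase => number of names that contain it

    for my $name (@names) {
        my @w = split ' ', $name;
        my %in_this_name;
        # Enumerate every contiguous sequence of whole words in this name.
        for my $i (0 .. $#w) {
            for my $j ($i .. $#w) {
                $in_this_name{ join ' ', @w[$i .. $j] } = 1;
            }
        }
        $seen_in{$_}++ for keys %in_this_name;    # count each name only once
    }

    my $need = $threshold * @names;    # minimum number of names required
    my ($best, $best_len) = ('', 0);
    # Sorted keys make ties deterministic: the alphabetically first phrase wins.
    for my $phrase (sort keys %seen_in) {
        next if $seen_in{$phrase} < $need;
        my @pw = split ' ', $phrase;
        ($best, $best_len) = ($phrase, scalar @pw) if @pw > $best_len;
    }
    return $best;
}

my @artist_a = ('artist a 22 songa', 'artist a 23 songb', 'artist a 24 songc',
                '25 artist a songd', 'songe artist a');
my @number_3 = ('my number 3 12 songname',  'my number 3 13 songnamea',
                'my number 3 14 songnameb', 'my number 3 15 songnamec');

print common_phrase(0.75, @artist_a), "\n";    # prints "artist a"
print common_phrase(0.75, @number_3), "\n";    # prints "my number 3"
```

Because comparison is on whole words, "songa" and "songb" never contribute a shared "song", and the track numbers drop out only because each one appears in too few names, while the "3" in the band name survives. A word shared by most song titles directly after the artist name would be pulled into the result, so this is a starting point rather than a complete answer.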
I am familiar with basic matching and some other regex, and I am very familiar with Perl, finishing off a class on it with 3+ years of total programming experience, so you do not need to dumb it down too much or baby me through it.
Last edited by exodist; 11-14-2004 at 06:56 PM.