In my experience, it definitely pays to use several regular expressions when dealing with a complicated structure.
For instance, given the input data applicationName:id1/id2/id3#, let the first expression look for something that looks like "the whole thing" and extract from it applicationName and id1/id2/id3, as two strings. Then, use another regular expression to break down the second string. If you find that either of the two parts "doesn't look right," it was a "false positive" ... run the first regular expression again, looking for the next available match.
Regular expressions are very powerful, but if you try too hard to get "fancy" with them, they become quite incomprehensible and therefore unmaintainable... which defeats the essential purpose. When even the "puniest" laptop can execute tens or hundreds of millions of instructions per second, and throws away 99.5% of that resource anyway, you don't have to be excessively "efficient." Just be clear. Don't make the next programmer who follows your path (even if that programmer is "you") stop and think about what you did.
So, I'd say that the first pattern would be: "look for a whitespace-or-start-of-line, followed by one or more alphanumerics (catch this as group #1), followed by a colon, followed by a group of one or more (say...) '0-9 plus forward slash' (catch this as group #2), followed by a hash sign." That's a reasonably specific pattern that will probably obtain good matches, and it won't be complicated for the regular-expression engine to process. Then, if you simply "split" the second string on forward slash, you have only to verify that none of the pieces are empty.
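Here is a rough sketch of that two-step idea in Python (my choice of language; the same thing works in Perl or anything else with regular expressions). The names CANDIDATE and find_records are made up for illustration, and the pattern assumes, as above, that the ids are made of digits only:

```python
import re

# Step 1: whitespace or start of line, one or more alphanumerics (group 1),
# a colon, one or more digits-or-slashes (group 2), then a hash sign.
CANDIDATE = re.compile(r"(?:^|\s)([A-Za-z0-9]+):([0-9/]+)#", re.MULTILINE)


def find_records(text):
    """Yield (name, ids) for every candidate that survives the second check."""
    for match in CANDIDATE.finditer(text):
        name, id_string = match.group(1), match.group(2)
        # Step 2: split on forward slash and reject empty pieces.
        ids = id_string.split("/")
        if any(piece == "" for piece in ids):
            continue  # false positive: move on to the next available match
        yield name, ids
```

Something like list(find_records(some_text)) then gives you only the matches that passed both steps. Each step is simple enough to read at a glance, which is the whole point.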
Do write code that explicitly tests your assumptions ... that each of those split pieces conforms to what you know an "id" to be. Your programs need to respond meaningfully and informatively to any input they receive, good or bad. (Here, you are saving hours of very expensive human time.)
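As a minimal sketch of what "explicitly tests your assumptions" might look like in Python: the rule that an id is one or more digits is only my assumption here, and ID_PATTERN and check_ids are hypothetical names, so substitute whatever you actually know an "id" to be.

```python
import re

# Assumption for this sketch: an "id" is one or more digits.
ID_PATTERN = re.compile(r"[0-9]+")


def check_ids(id_string):
    """Return the list of ids, or raise ValueError naming the bad piece."""
    pieces = id_string.split("/")
    for position, piece in enumerate(pieces, start=1):
        if not ID_PATTERN.fullmatch(piece):
            raise ValueError(
                f"piece {position} of {id_string!r} is not a valid id: {piece!r}"
            )
    return pieces
```

The error message says which piece was wrong and what the whole string was, so whoever reads the log doesn't have to guess.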
Last edited by sundialsvcs; 12-08-2008 at 09:48 AM.