Your solution works, but we can discover things by looking at the details.
The main thing is the [] construct in a regular expression. It means "accept any character that appears within the brackets." The asterisk (*), as seen here:
Code:
$_ =~ / (\/U01[\w*|\d*|\/]*)/;
                          ^
                          ^
means "accept anything represented by the previous item (in this case, the [] list) as many times as it occurs, including not at all."
So you will be accepting, as many times as they occur, any of these items you placed in the [] list:
Code:
anything in the \w list, which includes letters, digits, and underscore
anything in the \d list, which includes digits
| (the pipe character)
* (asterisk)
/ (slash, which you correctly spelled \/)
.. because they're all in the list.
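If you want to see the list membership for yourself, here is a minimal sketch that tests single characters against the same [] list. The sample characters are my own choice, not taken from your script.

```perl
use strict;
use warnings;

# Test one character at a time against the bracketed list from the
# original regular expression. The sample characters are arbitrary.
my %accepted;
for my $char ('a', '7', '_', '|', '*', '/', '-', '.') {
    $accepted{$char} = ($char =~ /^[\w*|\d*|\/]$/) ? 1 : 0;
    printf "%-3s %s\n", $char, $accepted{$char} ? "accepted" : "rejected";
}
```

The pipe and the asterisk should show up as accepted, while the hyphen and the dot should be rejected.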
You don't need the pipe character, which I'm sure you placed there to indicate "or", because a [] list already implies that you're listing alternatives. Indeed, if you change your experiment so that the test string contains | somewhere (within the abc, say), you'll find that your regular expression will accept that. But since you don't really want | in the list, you could simplify the regular expression thus:
Code:
$_ =~ / (\/U01[\w*\d*\/]*)/;
You also don't need the * within the list, because the list only needs to mention each character once, and the * outside the list allows repetition. Indeed, if you change your experiment so that the test string contains * somewhere (within the abc, say), you'll find that your regular expression will accept that. But since you don't really want * in the list, you could simplify the regular expression thus:
Code:
$_ =~ / (\/U01[\w\d\/]*)/;
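Here is a small sketch of that experiment, comparing the original list with the simplified one. The test line is hypothetical (I don't have your actual input), with a stray | and * planted inside the abc part.

```perl
use strict;
use warnings;

# Hypothetical test line: the path has a stray | and * inside "abc".
my $line = "mount /U01/a|b*c/def rest";

# A match in list context returns the captured group.
my ($loose) = $line =~ / (\/U01[\w*|\d*|\/]*)/;   # original list
my ($tight) = $line =~ / (\/U01[\w\d\/]*)/;       # simplified list

print "original:   $loose\n";   # /U01/a|b*c/def
print "simplified: $tight\n";   # /U01/a
```

The original list swallows the stray characters, while the simplified one stops at the first character it doesn't know.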
Since \w includes numerical digits, and \d includes only numerical digits, you can lose the \d:
Code:
$_ =~ / (\/U01[\w\/]*)/;
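If you'd rather check than take my word for it, this quick sketch counts how many of the digits 0 through 9 fail to match \w. The variable name is just for illustration.

```perl
use strict;
use warnings;

# Collect any digit from 0 to 9 that \w does NOT match.
my @missed = grep { !/^\w$/ } 0 .. 9;

print "digits not matched by \\w: ", scalar(@missed), "\n";
```

An empty list here means \d adds nothing that \w doesn't already cover.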
You could generalize this by leaving the U01 out of the regular expression, thus making the script more generally useful:
Code:
$_ =~ / (\/[\w\/]*)/;
But perhaps some other character will appear in the directory path. ab-c is perfectly valid; so is ab.c; so are quite a few other characters. So this is better, because it accepts everything up to the next space or tab:
Code:
$_ =~ / (\/[^ \t]*)/;
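To see the difference, here is a sketch that runs both versions against a made-up line whose path contains a hyphen and a dot. The line itself is an assumption on my part.

```perl
use strict;
use warnings;

# Hypothetical test line with "-" and "." inside the path.
my $line = "mount /U01/ab-c/d.e rest";

my ($by_word)    = $line =~ / (\/[\w\/]*)/;    # word characters and slashes only
my ($by_negated) = $line =~ / (\/[^ \t]*)/;    # anything except space or tab

print "word list:    $by_word\n";      # /U01/ab
print "negated list: $by_negated\n";   # /U01/ab-c/d.e
```

The word-character list gives up at the hyphen, while the negated list keeps going until the whitespace.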
I put in the tab also as a defensive measure. If you don't want that, do this instead:
Code:
$_ =~ / (\/[^ ]*)/;
You still need the [] to let the ^ mean "a list that includes everything but".
And speaking of tabs, just to be defensive, maybe you want to do this:
Code:
$_ =~ /[ \t](\/[^ \t]*)/;
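As a rough illustration of why that helps, this sketch uses a made-up line in which the path is set off by tabs rather than spaces. Only the version that allows a leading tab finds it.

```perl
use strict;
use warnings;

# Hypothetical test line: the fields are separated by tabs, not spaces.
my $line = "mount\t/U01/abc\tdone";

my ($space_only)   = $line =~ / (\/[^ \t]*)/;       # needs a literal space
my ($space_or_tab) = $line =~ /[ \t](\/[^ \t]*)/;   # space or tab will do

print "space only:   ", (defined $space_only ? $space_only : "(no match)"), "\n";
print "space or tab: ", (defined $space_or_tab ? $space_or_tab : "(no match)"), "\n";
```

Whether your real input ever contains tabs is something only you can say, so treat this as a precaution rather than a requirement.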
Hope you had fun with this.