Parse a string in Perl
I have a string
$_ = "The following is a directory /U01/abc/def/dir3";
How do I parse $_ so that I extract just the directory and get my $dir="/U01/abc/def/dir3"; ? The directory always starts with /U01, but its length varies. How do I use a regular expression match for this? Also, the string before the directory has a fixed length, so maybe there is a function like "strpos"? Thanks!! |
There's a simple regular expression to do this, but I'd rather teach you to fish than just give you a fish.
I'm sure someone will be along any moment to give the answer away and spoil the learning (and the fun!), but let's try anyway. Normally, to remove a chunk of a string from the string, you'd use a statement like this: Code:
$some_string=~s/to_be_removed//; |
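For anyone following along, here is a minimal sketch of that substitution applied to the string from the question. The variable names $line and $dir are just for illustration, and note the trailing space in the pattern, without which the result would start with a blank.

```perl
use strict;
use warnings;

# The sample string from the question.
my $line = "The following is a directory /U01/abc/def/dir3";

# s/PATTERN// replaces the first match with nothing, i.e. deletes it.
# Assigning to a copy first leaves the original string untouched.
(my $dir = $line) =~ s/The following is a directory //;

print "$dir\n";
```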
Hi thanks so much! After replacing "to_be_removed" with "The following is a directory", I got exactly what I need: "/U01/abc/def/dir3". Thanks!
But the problem is that in my real case, the part before the directory is not a fixed string; it contains things like time stamps and hostnames. There is also more info after the directory, so $_ is more like this:
$_ = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";
and I need to extract the directory. Thanks! |
Cool. To continue the fishing lesson, let's take a further look at the question: How do I remove everything before the first slash?
To start, take a gander at these documents on regular expressions, in this order: Code:
man perlrequick
Don't worry about getting it 100% right, just post a stab at it. You're doing fine so far! |
Hi, I think I got it;)
Here is what I used:
$_ =~ / (\/U01[\w*|\d*|\/]*)/; print $1;
This extracts the directory very well. Thanks so much!! Looking forward to seeing your way to solve the problem! |
Your solution works, but we can discover things by looking at the details.
The main thing is the [] construct in a regular expression. It means "accept any one character that appears within the brackets". The asterisk (*) after the closing bracket, as seen here, means "zero or more times": Code:
$_ =~ / (\/U01[\w*|\d*|\/]*)/;
So you will be accepting, as many times as they occur, any of these items you placed in the [] list: Code:
anything in the \w list, which includes letters, digits, and underscore
You don't need the pipe character, which I'm sure you placed there to indicate "or", because a [] list already implies that you're listing alternatives. Indeed, if you change your experiment so that the test string contains | somewhere (within the abc, say), you'll find that your regular expression will accept that. But since you don't really want | in the list, you could simplify the regular expression thus: Code:
$_ =~ / (\/U01[\w*\d*\/]*)/; Code:
$_ =~ / (\/U01[\w\d\/]*)/; Code:
$_ =~ / (\/U01[\w\/]*)/; Code:
$_ =~ / (\/[\w\/]*)/; Code:
$_ =~ / (\/[^ \t]*)/; Code:
$_ =~ / (\/[^ ]*)/;
And speaking of tabs, just to be defensive, maybe you want to do this: Code:
$_ =~ /[ \t](\/[^ \t]*)/; |
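To try the point about the pipe for yourself, here is a small sketch. The test string containing | is made up for illustration, as are the variable names. It compares the original character class with the simplified one, then runs the defensive pattern against the longer sample line from earlier in the thread.

```perl
use strict;
use warnings;

# Inside [...], | is an ordinary character, not "or".
my $has_pipe = "x /U01/a|bc/def";   # made-up test string
my ($with_pipe)    = $has_pipe =~ / (\/U01[\w*|\d*|\/]*)/;  # class lists | explicitly
my ($without_pipe) = $has_pipe =~ / (\/U01[\w\/]*)/;         # class omits |

print "$with_pipe\n";      # the match runs straight through the |
print "$without_pipe\n";   # the match stops just before the |

# Defensive pattern: a space or tab, then a slash, then
# everything up to the next space or tab.
my $line = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";
my ($dir) = $line =~ /[ \t](\/[^ \t]*)/;
print "$dir\n";
```

Using a list assignment such as my ($dir) = ... picks up the captured group directly, so there is no need to read $1 afterwards.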
Thank you sooooo much! This is fun!
|
Or you could use rindex() http://perldoc.perl.org/functions/rindex.html to find the last '/', and substr() http://perldoc.perl.org/functions/substr.html to cut out the part you want, for a more readable solution.
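A rough sketch of how that might look, with one substitution stated plainly: it uses index(), the forward-searching counterpart of rindex(), to locate "/U01" and the space that follows it, since the sample line has more text after the directory. rindex() is then used as suggested to find the last '/'. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $line = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";

# index() returns the position of the first occurrence, or -1 if absent.
my $start = index($line, '/U01');
die "no /U01 in line\n" if $start < 0;

# The directory ends at the next space, or at the end of the string.
my $end = index($line, ' ', $start);
$end = length($line) if $end < 0;

my $dir = substr($line, $start, $end - $start);
print "$dir\n";

# rindex() finds the last '/', which separates the final path component.
my $last = substr($dir, rindex($dir, '/') + 1);
print "$last\n";
```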
|
If your directory path is always a field by itself, you can do away with the complicated regexp and use split (or others) instead.
Code:
$string="The following is a directory /U01/abc/def/dir3 blah blah"; |
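The rest of that snippet isn't shown, so what follows is only a guess at how a split-based approach could continue, not the poster's actual code. It assumes the path is a whitespace-separated field beginning with /U01.

```perl
use strict;
use warnings;

my $string = "The following is a directory /U01/abc/def/dir3 blah blah";

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @fields = split ' ', $string;

# Keep the first field that starts with /U01 (assumed prefix).
my ($dir) = grep { m{^/U01} } @fields;
print "$dir\n";
```

If the path always sits at a known field position, indexing into @fields directly would work as well.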
The nice thing about Perl is that there are several ways to do practically everything. Perl is known as the Swiss Army chainsaw of programming languages.
|
I think that, for correctness and portability, one should maybe use...
Code:
use File::Basename; |
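The remainder of that snippet isn't shown either. As a hedged illustration of what File::Basename offers: it does not find the path inside a longer line, but once you have the path it splits it into parent directory and last component. The variable names below are illustrative, and the results described in the comments assume a Unix-style system.

```perl
use strict;
use warnings;
use File::Basename qw(dirname basename);

# Assumes the path has already been extracted from the line.
my $path = "/U01/abc/def/dir3";

my $parent = dirname($path);    # everything before the last component
my $leaf   = basename($path);   # the last component
print "$parent\n$leaf\n";
```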
Actually, re-reading this "the string before the directory has a fixed length", just use substr().
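A minimal sketch of that, assuming the lead-in really is the fixed text from the first post (the $prefix variable is only there to avoid counting characters by hand):

```perl
use strict;
use warnings;

my $line   = "The following is a directory /U01/abc/def/dir3";
my $prefix = "The following is a directory ";   # assumed fixed-length lead-in

# With only an offset given, substr returns everything from there to the end.
my $dir = substr($line, length($prefix));
print "$dir\n";
```

This only holds for the first form of the question; with the time stamp and hostname version, the prefix length is no longer fixed.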
|
Thank you all for the helpful info.!
|