LinuxQuestions.org
Old 08-05-2007, 07:47 AM   #1
2007fld
Member
 
Registered: Mar 2007
Distribution: FD4,6
Posts: 52

Rep: Reputation: 15
parse a string in perl


I have a string
$_ = "The following is a directory /U01/abc/def/dir3";

How do I parse $_ so that I extract just the directory and get
my $dir="/U01/abc/def/dir3";


The directory always starts with /U01, but its length varies. How do I use a regular expression match for this?

Also, the string before the directory has a fixed length, so maybe there is a function like "strpos"?

Thanks!!
 
Old 08-05-2007, 08:37 AM   #2
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
There's a simple regular expression to do this, but I'd rather teach you to fish than just give you a fish.

I'm sure someone will be along any moment to give the answer away and spoil the learning (and the fun!), but let's try anyway.

Normally, to remove a chunk of a string from the string, you'd use a statement like this:

Code:
$some_string=~s/to_be_removed//;
So, care to take a first crack at what that statement should be in your case? It doesn't have to be correct, but it needs to be at least an attempt. I'll show you, step by step, how to fix it so it does exactly what you want.
 
Old 08-05-2007, 11:48 AM   #3
2007fld
Member
 
Registered: Mar 2007
Distribution: FD4,6
Posts: 52

Original Poster
Rep: Reputation: 15
Hi, thanks so much! After replacing "to_be_removed" with "The following is a directory", I got exactly what I need: "/U01/abc/def/dir3". Thanks!
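
For anyone following along, here is a minimal sketch of that substitution applied to the string from the first post. The variable names $line and $dir are only for illustration. Note that the pattern includes the trailing space; without it, the result would keep a leading space.

```perl
use strict;
use warnings;

# String from the first post.
my $line = "The following is a directory /U01/abc/def/dir3";

# Copy the string, then strip the fixed leading text (and the space after it)
# from the copy so that $line itself is left untouched.
(my $dir = $line) =~ s/^The following is a directory //;

print "$dir\n";    # prints /U01/abc/def/dir3
```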

But the problem is that in my real case, the part of the original string before the directory is not a fixed string; it's actually something with time stamps and hostnames. And after the directory, there is also some more info.

so the $_ is more like this:
$_ = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";
and I need to extract the directory.

Thanks!
 
Old 08-05-2007, 04:10 PM   #4
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
Cool. To continue the fishing lesson, let's take a further look at the question: How do I remove everything before the first slash?

To start, take a gander at these documents on regular expressions, in this order:

Code:
man perlrequick
man perlretut
man perlre
There's a lot in those. Don't read them all from cover to cover right now, although you'll eventually want to do that. Just go until you think you can post a guess at an answer to the above question. You may not even have to read all three, or even two, of these documents.

Don't worry about getting it 100% right, just post a stab at it.

You're doing fine so far!
 
Old 08-05-2007, 05:12 PM   #5
2007fld
Member
 
Registered: Mar 2007
Distribution: FD4,6
Posts: 52

Original Poster
Rep: Reputation: 15
Hi, I think I got it

Here is what I used:

$_ =~ / (\/U01[\w*|\d*|\/]*)/;
print $1;

This extracts the directory very well.
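
A small sketch of the same pattern with a check that the match actually succeeded, since $1 is only meaningful after a successful match. The names $line and $dir are just for illustration; the sample line is the one from post #3.

```perl
use strict;
use warnings;

my $line = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";

# Same pattern as above. Only read $1 inside the branch where the match
# succeeded; otherwise it may still hold a value from an earlier match.
my $dir;
if ($line =~ / (\/U01[\w*|\d*|\/]*)/) {
    $dir = $1;
    print "$dir\n";    # prints /U01/abc/def/dir3
} else {
    warn "no directory found in: $line\n";
}
```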

Thanks so much!!

Looking forward to seeing your way to solve the problem!
 
Old 08-05-2007, 07:50 PM   #6
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
Your solution works, but we can discover things by looking at the details.

The main thing is the [] construct in a regular expression. It means "accept any character that appears within the brackets." The asterisk (*), as seen here:
Code:
$_ =~ / (\/U01[\w*|\d*|\/]*)/;
                          ^
                          ^
means "accept anything represented by the previous item (in this case, the [] list) as many times as it occurs."

So you will be accepting, as many times as they occur, any of these items you placed in the [] list:
Code:
anything in the \w list, which includes letters, digits, and underscore
anything in the \d list, which includes digits
|  (the pipe character)
*  (asterisk)
/  (slash, which you correctly spelled \/)
.. because they're all in the list.

You don't need the pipe character, which I'm sure you placed there to indicate "or", because a [] list already implies that you're listing alternatives. Indeed, if you change your experiment so that the test string contains | somewhere (within the abc, say), you'll find that your regular expression will accept that. But since you don't really want | in the list, you could simplify the regular expression thus:
Code:
$_ =~ / (\/U01[\w*\d*\/]*)/;
You also don't need the * within the list, because the list only needs to mention each character once, and the * outside the list allows repetition. Indeed, if you change your experiment so that the test string contains * somewhere (within the abc, say), you'll find that your regular expression will accept that. But since you don't really want * in the list, you could simplify the regular expression thus:
Code:
$_ =~ / (\/U01[\w\d\/]*)/;
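
The experiment described above can be sketched like this. The test string with | and * inside the path is made up for the purpose; the point is only that both characters are literal members of the original [] list.

```perl
use strict;
use warnings;

# Hypothetical test line with a pipe and an asterisk inside the path.
my $line = "1 /U01/a|b*c/def/dir3 filename1";

# Original list: | and * are ordinary characters inside [], so they match.
my ($with_extras) = $line =~ / (\/U01[\w*|\d*|\/]*)/;

# Simplified list without | and *: the match stops at the first pipe.
my ($plain) = $line =~ / (\/U01[\w\/]*)/;

print "$with_extras\n";    # prints /U01/a|b*c/def/dir3
print "$plain\n";          # prints /U01/a
```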
Since \w includes numerical digits, and \d includes only numerical digits, you can lose the \d:
Code:
$_ =~ / (\/U01[\w\/]*)/;
You could generalize this by leaving the U01 out of the string, thus making the script more generally useful:
Code:
$_ =~ / (\/[\w\/]*)/;
But perhaps some other character will appear in the directory path. ab-c is perfectly valid; so is ab.c; so are quite a few other characters. So this is better, because it accepts everything up to the next space or tab:
Code:
$_ =~ / (\/[^ \t]*)/;
I put in the tab also as a defensive measure. If you don't want that, do this instead:
Code:
$_ =~ / (\/[^ ]*)/;
You still need the [] to let the ^ mean "a list that includes everything but".
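
To see the difference between the \w list and the "everything but" list, here is a short sketch. The path with a hyphen and a dot is invented for the test, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical line whose path contains a hyphen and a dot.
my $line = "2007 07 28 hostname 1 /U01/ab-c/de.f/dir3 filename1 1234byte";

# [\w\/] accepts only word characters and slashes, so it stops at the hyphen.
my ($word_class) = $line =~ / (\/[\w\/]*)/;

# [^ \t] accepts anything except space and tab, so it runs to the next space.
my ($negated) = $line =~ / (\/[^ \t]*)/;

print "$word_class\n";    # prints /U01/ab
print "$negated\n";       # prints /U01/ab-c/de.f/dir3
```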

And speaking of tabs, just to be defensive, maybe you want to do this:
Code:
$_ =~ /[ \t](\/[^ \t]*)/;
Hope you had fun with this.
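
One possible variation on that last pattern, offered as a sketch rather than as the author's version: m{...} delimiters avoid having to escape the slash, and \s / \S stand in for the space-or-tab lists. This is an assumption on my part and slightly broader than the original, because \s also matches newlines and other whitespace. The tab-separated sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical tab-separated variant of the log line from post #3.
my $line = "2007 07 28\thostname 1\t/U01/abc/def/dir3\tfilename1 1234byte";

# One whitespace character, then a slash followed by any run of
# non-whitespace characters. List assignment captures $1 into $dir.
my ($dir) = $line =~ m{\s(/\S*)};

print "$dir\n" if defined $dir;    # prints /U01/abc/def/dir3
```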
 
Old 08-05-2007, 08:27 PM   #7
2007fld
Member
 
Registered: Mar 2007
Distribution: FD4,6
Posts: 52

Original Poster
Rep: Reputation: 15
Thank you sooooo much! This is fun!
 
Old 08-06-2007, 01:28 AM   #8
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,225

Rep: Reputation: 2021
Or you could use rindex() (http://perldoc.perl.org/functions/rindex.html) to get the last '/', and substr() (http://perldoc.perl.org/functions/substr.html), for a more readable solution.
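
A rough sketch of the non-regex route, under the assumption (from the first post) that the path always starts with /U01 and ends at the next space. It uses index() rather than rindex(), because rindex($line, '/') would find the last slash, which in this sample is the one just before dir3. Variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";

# index() returns the position of the first occurrence, or -1 if not found.
my $start = index($line, "/U01");

my $dir;
if ($start >= 0) {
    # Find the first space after the start of the path.
    my $end = index($line, " ", $start);
    $end = length($line) if $end < 0;    # path runs to the end of the line
    $dir = substr($line, $start, $end - $start);
}

print "$dir\n" if defined $dir;    # prints /U01/abc/def/dir3
```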
 
Old 08-06-2007, 02:04 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
If you think your directory path is always a field by itself, you can do away with complicated regexps and use split (or others) instead:
Code:
my $string = "The following is a directory /U01/abc/def/dir3 blah blah";
# Split on runs of whitespace, then print every field that contains a slash.
my @array = split /\s+/, $string;
foreach my $item (@array) {
    print "$item\n" if $item =~ /\//;
}
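
A compact variant of the same idea, as a sketch only: it keeps the first field that begins with a slash instead of printing every field that contains one. The sample line is the one from post #3, and the names are illustrative.

```perl
use strict;
use warnings;

my $line = "2007 07 28 hostname 1 /U01/abc/def/dir3 filename1 1234byte";

# split ' ' splits on runs of whitespace and ignores leading whitespace.
# grep keeps the fields that start with a slash; the list assignment
# takes the first of them.
my ($dir) = grep { m{^/} } split ' ', $line;

print "$dir\n" if defined $dir;    # prints /U01/abc/def/dir3
```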
 
Old 08-06-2007, 08:28 AM   #10
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
The nice thing about Perl is that there are several ways to do practically everything. Perl is known as the Swiss Army chainsaw of programming languages.
 
Old 08-06-2007, 09:25 AM   #11
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,276

Rep: Reputation: 170
I think that, for correctness and portability, one should maybe use...

Code:
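use File::Basename;
# $file is assumed to hold a full path, e.g. one extracted from the log line.
my $filepath = dirname($file);     # everything before the last path component
my $filename = basename($file);    # the last path component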
 
Old 08-06-2007, 06:33 PM   #12
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,225

Rep: Reputation: 2021
Actually, re-reading this "the string before the directory has a fixed length", just use substr().
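
A minimal sketch of the substr() approach, assuming the fixed-length prefix from the first post. It takes everything after the prefix, so on a line with trailing fields (like the one in post #3) those fields would be kept too. Variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "The following is a directory /U01/abc/def/dir3";

# Length of the fixed text before the path, including the trailing space.
my $prefix_len = length "The following is a directory ";

# With two arguments, substr() returns everything from the offset onward.
my $dir = substr $line, $prefix_len;

print "$dir\n";    # prints /U01/abc/def/dir3
```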
 
Old 08-06-2007, 07:43 PM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Quote:
Originally Posted by wjevans_7d1@yahoo.co
The nice thing about Perl is that there are several ways to do practically everything.
Too much of a nice thing is not a good thing.
 
Old 08-07-2007, 02:41 PM   #14
2007fld
Member
 
Registered: Mar 2007
Distribution: FD4,6
Posts: 52

Original Poster
Rep: Reputation: 15
Thank you all for the helpful info!
 
  


All times are GMT -5.
