LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

11-19-2010, 03:46 PM   #1
kdelover
perl regex query


Hi guys,

How do I use a Perl regex to extract the hostname from an FQDN?

I have
Quote:
$host=ganymede.a.linux.com
$host=io.a.linux.com
$host=europa.a.linux.com
I just want the characters to the left of the first . (dot) in the FQDN. I could get it using the substr and split functions, but how do I get it with a regex? Thanks!

Last edited by kdelover; 11-27-2010 at 07:43 AM. Reason: [Solved]
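For comparison with the regex answers that follow, here is a minimal sketch of the split and substr approaches the question mentions. It assumes the string holds only the FQDN (no '$host=' prefix); host_by_split and host_by_substr are hypothetical helper names.

```perl
use strict;
use warnings;

# split's first argument is a pattern, so the dot must be escaped;
# the list slice [0] keeps only the first label.
sub host_by_split {
    my ($fqdn) = @_;
    return (split /\./, $fqdn)[0];
}

# index returns -1 when there is no dot; return the whole string in that case.
sub host_by_substr {
    my ($fqdn) = @_;
    my $dot = index $fqdn, '.';
    return $dot < 0 ? $fqdn : substr($fqdn, 0, $dot);
}

for my $fqdn (qw(ganymede.a.linux.com io.a.linux.com europa.a.linux.com)) {
    print host_by_split($fqdn), ' ', host_by_substr($fqdn), "\n";
}
```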
 
11-19-2010, 04:00 PM   #2
AlucardZero
Easy:
Code:
/=(.*?)\./
The hostname will be in $1.

For an explanation of the ?, see http://www.regular-expressions.info/repeat.html
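A quick sketch of the difference the '?' makes, assuming the scanned text literally contains the '$host=' prefix from the question (single quotes keep Perl from interpolating $host). lazy_host and greedy_host are hypothetical helper names.

```perl
use strict;
use warnings;

# Lazy .*? stops at the first dot after the '='.
sub lazy_host   { my ($s) = @_; return $s =~ /=(.*?)\./ ? $1 : undef }

# Greedy .* runs as far as it can, so it stops at the last dot.
sub greedy_host { my ($s) = @_; return $s =~ /=(.*)\./  ? $1 : undef }

for my $line ('$host=ganymede.a.linux.com', '$host=io.a.linux.com') {
    printf "%-28s lazy: %-10s greedy: %s\n", $line, lazy_host($line), greedy_host($line);
}
```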
 
11-19-2010, 04:21 PM   #3
kdelover (Original Poster)
Quote:
$a=~/(.*?)\./;
print $1;
This works for me, without the = sign.
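Without the '=', the pattern works on a bare FQDN. One caveat worth showing: when a match fails (for example, a name with no dot), $1 is not reset and still holds whatever the last successful match captured, so it is safer to use $1 only after checking that the match succeeded. A minimal sketch; first_label is a hypothetical helper name.

```perl
use strict;
use warnings;

# Returns the text before the first dot, or undef when there is no dot.
# The ternary tests the match result instead of trusting a possibly stale $1.
sub first_label {
    my ($fqdn) = @_;
    return $fqdn =~ /(.*?)\./ ? $1 : undef;
}

for my $fqdn (qw(ganymede.a.linux.com io.a.linux.com europa.a.linux.com localhost)) {
    my $host = first_label($fqdn);
    print defined $host ? "$host\n" : "no dot in '$fqdn'\n";
}
```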
 
11-19-2010, 08:30 PM   #4
theNbomr
Quote:
Originally Posted by kdelover
i just want the characters which are to the left of the first .(dot)
You did yourself and the readers a favor by explaining the requirement fairly clearly and unambiguously. Your description of the requirement almost translates itself into regex code.

'characters': any old character, in regex-speak, is matched by '.' (except a newline, by default). So that becomes the first regex meta-character.

Code:
.
Since you specified plural, it has to be two or more, so the '.' meta-character gets modified by something that says 'two or more':

Code:
{2,}
Since this is Perl and regexes, we should make it look like it. So far, we've got
Code:
/.{2,}/
Now we want to terminate the match with a literal '.' (dot). To match a dot, we have to escape it so it isn't interpreted as a regex meta-character, and we do that by preceding it with a '\'. We can tack that onto the regex that we've built up so far:

Code:
/.{2,}\./
So far, the match covers everything up to and including the terminating dot. To extract just the characters preceding the dot, we can enclose that part in parentheses:
Code:
/(.{2,})\./
Now we can test the scalar against the regex, and any matching string will be waiting for us in the special variable '$1' (because it was captured by the first parenthesized group).

But this is Perl, and Perl's quantifiers are greedy by default. When matching hostnames, that can be a problem: a greedy quantifier swallows the biggest possible matching substring, so '.{2,}' would run all the way to the last dot, and we want the smallest. So we can add one last bit of regex notation, and we should be done:
Code:
/(.{2,}?)\./
The question mark following the quantifier makes it non-greedy: the match stops at the first dot it can (after at least two characters), not at the last dot in the string.

At least, that's my interpretation of the question.

BTW, you did specify 'characters' (plural), but you probably meant 'character or characters', and the regex that matches that is different: '+' (one or more) instead of '{2,}'. So you've also pointed out the importance of being accurate about your spec.

--- rod.

Last edited by theNbomr; 11-19-2010 at 08:39 PM.
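The sketch below runs the greedy and non-greedy versions from this post side by side. The one-character name 'a.linux.com' is a made-up input chosen to show the 'two or more' caveat: '(.{2,}?)\.' cannot stop at the first dot there, while '(.+?)\.' can. The pattern labels and the capture helper are illustrative names.

```perl
use strict;
use warnings;

my %patterns = (
    'greedy {2,}' => qr/(.{2,})\./,    # as much as possible: up to the last dot
    'lazy {2,}?'  => qr/(.{2,}?)\./,   # as little as possible, but at least 2 chars
    'lazy +?'     => qr/(.+?)\./,      # as little as possible, at least 1 char
);

# Returns $1 if the pattern matched, undef otherwise.
sub capture {
    my ($re, $s) = @_;
    return $s =~ $re ? $1 : undef;
}

for my $fqdn (qw(ganymede.a.linux.com a.linux.com)) {
    for my $name (sort keys %patterns) {
        printf "%-12s %-22s %s\n", $name, $fqdn,
            capture($patterns{$name}, $fqdn) // '(no match)';
    }
}
```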
 
11-20-2010, 12:23 AM   #5
kdelover (Original Poster)
Thanks a bunch, Rod, for the explanation. Thanks, AlucardZero.

If I wanted the characters to the right of the last . (dot), then I guess I need to do something like this:

Quote:
$a=~/(.{1,})\.(.{1,})/;
print "$2\n";
And for the center:

Quote:
$a=~/(.{1,})\.(.{1,})\./;

Last edited by kdelover; 11-20-2010 at 12:26 AM.
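To see what those two patterns actually capture, here is a sketch on one of the hostnames from the question. Because the first group is greedy, the 'center' pattern yields the second-to-last label ('linux'), not everything between the first and last dots. The last pattern is an alternative (not from the thread) that anchors both ends, in case the whole middle part is what is wanted.

```perl
use strict;
use warnings;

my $fqdn = 'ganymede.a.linux.com';

# Greedy (.+) runs as far right as it can, so the dot it backs off to is the last one.
my ($all_but_last, $last) = $fqdn =~ /(.+)\.(.+)/;

# A second required dot makes the greedy $1 back off one more label,
# so $2 ends up as the second-to-last label.
my ($head, $middle) = $fqdn =~ /(.+)\.(.+)\./;

# Alternative: skip the first label, capture up to the final label.
my ($between) = $fqdn =~ /^[^.]+\.(.+)\.[^.]+$/;

print "last: $last\nmiddle: $middle\nbetween: $between\n";
```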
 
11-20-2010, 01:10 AM   #6
theNbomr
Yes, that looks about right to me. However, the '{1,}' notation can be replaced by the nicer '+' quantifier. Similarly, the '*' quantifier is equivalent to '{0,}', and the '?' quantifier is equivalent to '{0,1}'. There are, of course, many other notations used in regular expressions, and if you are interested, you can find many good references to the subject online.

--- rod.
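A small check of the equivalences described above, comparing each shorthand quantifier with its explicit '{min,max}' form over a few test strings. The strings and the all_equivalent name are arbitrary choices for this sketch.

```perl
use strict;
use warnings;

# Each shorthand quantifier next to its explicit {min,max} form.
my @pairs = (
    [ qr/^a+$/, qr/^a{1,}$/  ],   # one or more
    [ qr/^a*$/, qr/^a{0,}$/  ],   # zero or more
    [ qr/^a?$/, qr/^a{0,1}$/ ],   # zero or one
);

# Returns 1 if every pair agrees (both match or both fail) on every test string.
sub all_equivalent {
    for my $s ('', 'a', 'aa', 'aaa', 'b') {
        for my $pair (@pairs) {
            my ($short, $long) = @$pair;
            return 0 if ($s =~ $short ? 1 : 0) != ($s =~ $long ? 1 : 0);
        }
    }
    return 1;
}

print all_equivalent() ? "shorthand and {min,max} forms agree\n" : "mismatch\n";
```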
 
11-20-2010, 05:59 AM   #7
kdelover (Original Poster)
Quote:
Originally Posted by theNbomr
Yes, that looks about right to me, however, the '{1,}' notation can be replaced by the nicer '+' quantifier.
Hi rod,

did you mean something like this?
Quote:
$a=$1 if ($a=~/((.+?)|(.*?))\./) ;

print "$a\n";
 
11-20-2010, 08:59 AM   #8
theNbomr
I think this is more like what you originally intended:
Code:
$a =~ /(.+?)\./
It wasn't clear from your original post whether the scalar to match against included the leading '$host=', or whether that was a fragment of Perl code. If the scalar you are scanning holds just the FQDN, then you can tighten the match by anchoring it to the beginning of the string, using the '^' notation:
Code:
$a =~ m/^(.+?)\./
When I want to extract characters that lie between some specified delimiters, I tend to use a regex that says 'match the opening delimiter, followed by everything that isn't a closing delimiter':
Code:
# delimiters are '.' (dot)
$a =~ m/\.([^.]+)/
This tends to eliminate unexpected greediness of the quantifiers.

--- rod.
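A sketch of the negated-class idea applied to the original problem. Inside a character class a '.' is literal, so '[^.]' means 'any character except a dot', and no non-greedy modifier is needed. The variable names are illustrative.

```perl
use strict;
use warnings;

my $fqdn = 'ganymede.a.linux.com';

# Anchored at the start: one or more non-dot characters is the hostname.
my ($host) = $fqdn =~ /^([^.]+)/;

# "Opening delimiter, then everything that isn't a delimiter": the label after the first dot.
my ($second) = $fqdn =~ /\.([^.]+)/;

# With /g in list context, every run of non-dot characters is returned.
my @labels = $fqdn =~ /([^.]+)/g;

print "$host\n$second\n@labels\n";
```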
 
11-22-2010, 04:34 AM   #9
kdelover (Original Poster)
Thanks, Nbomr. I was trying to use a regex to extract two numeric fields that are separated by whitespace and words. I could only get as far as extracting one numeric field; I am unable to get the second, even though I tried making the regex non-greedy. Let me keep trying to get both numeric fields. Thanks again for your explanation in post #4, it has really helped me a lot.

Code:
#!/usr/bin/perl
use strict;
use warnings;
my $output=`awk '/MemFree:|SwapFree:/ {print}' /proc/meminfo`;
print "Awk Output:\n$output\n";
$output=~s/\n//;
print "After removing New line:\n$output\n";
my $regex= $1 if ($output=~m/(\w+\s+)kB/);
print "$regex\n";
Output:
Code:
Awk Output:
MemFree:         2925076 kB
SwapFree:        1366012 kB

After removing New line:
MemFree:         2925076 kBSwapFree:        1366012 kB

2925076

Last edited by kdelover; 11-22-2010 at 04:38 AM.
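The second number is missed because a match in scalar context stops after the first hit. Below is a sketch using the '/g' modifier in list context, which returns every capture; the sample string reuses the values from the output above. Capturing the field names as well yields a hash. Note also that the original '(\w+\s+)' group includes the padding space after the number, which '(\d+)\s+' leaves out.

```perl
use strict;
use warnings;

# Sample text in the same shape as the awk output above (values copied from it).
my $output = "MemFree:         2925076 kB\nSwapFree:        1366012 kB\n";

# List-context /g returns every capture; no newline stripping is needed.
my @kb = $output =~ /(\d+)\s+kB/g;

# /m lets ^ match at each line start; name/value pairs fill the hash.
my %free = $output =~ /^(\w+):\s+(\d+)\s+kB/mg;

print "@kb\n";
print "$_ => $free{$_}\n" for sort keys %free;
```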
 
11-22-2010, 01:37 PM   #10
theNbomr
Yikes! Calling awk from Perl is heretical! There is nothing you can do in awk that you cannot do in Perl, and doing it in Perl is almost certainly faster than launching an awk process.
Code:
#! /usr/bin/perl -w
#
#  LQkdelover.pl
#  Usage: LQkdelover.pl /proc/meminfo
use strict;
    while(<>){ 
        if( $_ =~ m/MemFree:|SwapFree:/ ){
            my ($param,$value,$units) = split /\s+/, $_;
            print $value,"\n";
        }
    }
Regexes used:
- match desired parameters
- split lines on 'one or more whitespace'.

--- rod.
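The same loop can use one capturing regex instead of a match followed by split. In this sketch an in-memory filehandle stands in for /proc/meminfo so that it runs anywhere; the MemTotal line is a hypothetical filler line (its value is made up), included only to show that other fields are skipped, and free_values is a hypothetical name. On a real system you would pass a handle opened on '/proc/meminfo', or keep looping over '<>' as above.

```perl
use strict;
use warnings;

# Reads meminfo-style lines from a filehandle and keeps the two fields of interest.
sub free_values {
    my ($fh) = @_;
    my %free;
    while (my $line = <$fh>) {
        # One regex both selects the line and captures name and value.
        $free{$1} = $2 if $line =~ /^(MemFree|SwapFree):\s+(\d+)/;
    }
    return \%free;
}

# Hypothetical sample; MemTotal is filler to show non-matching lines are ignored.
my $sample = "MemTotal:        9999999 kB\n"
           . "MemFree:         2925076 kB\n"
           . "SwapFree:        1366012 kB\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $free = free_values($fh);
print "$_ $free->{$_}\n" for sort keys %$free;
```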
 
11-24-2010, 09:50 AM   #11
kdelover (Original Poster)
Thanks, Rod. Yes, I know it's bad practice to use awk in Perl scripts. Well, this is what I did to get the MemFree and SwapFree values:


Code:
my $cmd='/bin/cat /proc/meminfo';
$cmd=~s/\s//g; # remove what ever white spaces are there
my ($mem,$swap)=$cmd=~m/MemFree:(\d+).*SwapFree:(\d+)/;
This gives me the MemFree and SwapFree values. I was wondering, can I make this regex even smaller? Sorry if I have been dragging this question out for too long.

Last edited by kdelover; 11-24-2010 at 10:35 AM.
 
11-24-2010, 12:22 PM   #12
theNbomr
Quote:
Originally Posted by kdelover
I was wondering, can I make this regex even smaller?
Your regex looks fine. I wouldn't try to overthink the whole thing too much. However, your use of
Code:
my $cmd='/bin/cat /proc/meminfo';
is full of holes. For starters, you should have used backticks, not single quotes. I'll assume this was a transcription error, although copy & paste tends to prevent those. Having said that, the whole 'run a shell command to dump a file' approach is completely unnecessary. Perl is quite capable of opening and reading a file. What's more, the empty '<>' operator does the right thing on its own: it reads from the files named on the command line, or from standard input if there are none. See my use of this facility in my previous post.
Code:
    while( <> ){
Perl will read its data in the same way, using this method, whether it gets a filename as $ARGV[0], or if data is provided on its standard input using IO redirection or from a pipe. For the simple script in my previous post, the following three commandlines will all work equivalently:
Code:
LQkdelover.pl /proc/meminfo
LQkdelover.pl < /proc/meminfo
cat /proc/meminfo | LQkdelover.pl
There is no need to hardcode any filenames into the application, and no need to launch a subshell to get the data.

As long as there is useful dialog, I don't see any problem extending a thread.

--- rod.

Last edited by theNbomr; 11-24-2010 at 12:24 PM.
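Putting the two suggestions together, here is a minimal sketch that reads the file from Perl instead of through a subshell and applies a single regex to the unmodified text. With '\s+' in the pattern there is no need to strip whitespace first, and '/s' lets '.*?' cross the newline between the two lines. read_free is a hypothetical name; because open also accepts a reference to a scalar, it can be tried without a real file.

```perl
use strict;
use warnings;

# Slurps the whole file, then pulls both values out with one regex.
sub read_free {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my $text = do { local $/; <$fh> };   # undef $/ = read everything at once
    close $fh;
    my ($mem, $swap) = $text =~ /MemFree:\s+(\d+).*?SwapFree:\s+(\d+)/s;
    return ($mem, $swap);
}

# Only runs where /proc/meminfo exists (Linux).
if (-r '/proc/meminfo') {
    my ($mem, $swap) = read_free('/proc/meminfo');
    print "MemFree: $mem kB, SwapFree: $swap kB\n" if defined $swap;
}
```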
 
11-27-2010, 07:42 AM   #13
kdelover (Original Poster)
Thanks Rod !!
 
11-28-2010, 08:07 PM   #14
garyg007
@theNbomr: Thank you for the excellent description of regexes.

It's the best I've seen, and I've been doing a lot of web searching for something I could understand.
 
  


All times are GMT -5.