[SOLVED] perl regex query

kdelover · 11-19-2010, 03:46 PM

Hi guys,

How do I use a Perl regex to extract the hostname from an FQDN?

I have

Quote:

$host=ganymede.a.linux.com
$host=io.a.linux.com
$host=europa.a.linux.com

I just want the characters to the left of the first . (dot) in the FQDN. I could get it using the substr and split functions, but how do I get it through a regex? Thanks!

AlucardZero · 11-19-2010, 04:00 PM

Easy:

Code:

/=(.*?)\./

The hostname will be in $1.

For an explanation of the ?, see http://www.regular-expressions.info/repeat.html

kdelover · 11-19-2010, 04:21 PM

Quote:

$a=~/(.*?)\./;
print $1;

This works for me, without the = sign.

theNbomr · 11-19-2010, 08:30 PM

Quote:

Originally Posted by kdelover

I just want the characters to the left of the first . (dot)

You did yourself and the readers a favor by explaining the requirement fairly clearly and unambiguously. Your description of the requirement almost translates itself into regex code.

'characters': any old character (strictly, anything except a newline), in regex-speak, is matched by '.'. So that becomes the first regex meta-character.

Code:

.

Since you specified plural, it has to be two or more, so the '.' meta-character gets modified by something that says 'two or more':

Code:

{2,}

Since this is Perl and regexes, we should make it look like it. So far, we've got

Code:

/.{2,}/

Now, we want to terminate the match by specifying a literal '.' (dot). To match a dot, we have to escape it from being interpreted as a regex meta-character, and we do that by preceding it with a '\'. We can tack that onto the regex that we've built up so far.

Code:

/.{2,}\./

So, we've matched anything including the terminating dot character. To extract just the characters preceding the dot, we can enclose it in parentheses:

Code:

/(.{2,})\./

Now we can test the scalar against the regex, and any matching string will be waiting for us in the special variable '$1' (because it was the 'first' parenthesized subset).

But this is Perl, and Perl regex quantifiers are greedy by default. When matching hostnames, that can be a problem, because by nature a greedy quantifier will swallow up the biggest possible matching substring, and we want the smallest. So, we can add one last bit of regex notation, and we should be done:

Code:

/(.{2,}?)\./

The question mark following the quantifier tells the regex to be 'non-greedy'. It will stop trying to match at the first, not the last, dot in the string.

At least, that's my interpretation of the question.

BTW, you did specify 'characters' (plural), but you probably meant 'character or characters', and not surprisingly, the regex that matches that is quite different. So, you've also pointed out the importance of being accurate about your spec.
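To make the greedy versus non-greedy difference concrete, here is a small sketch run against the three hostnames from the original question. The helper names (`greedy_label`, `lazy_label`, `first_label`) are made up for illustration, and the last one uses '+?' for the 'character or characters' reading.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.

# Greedy: .{2,} grabs as much as it can, so the match ends at the LAST dot.
sub greedy_label {
    my ($fqdn) = @_;
    return $fqdn =~ /(.{2,})\./ ? $1 : undef;
}

# Non-greedy: .{2,}? grabs as little as it can, so the match ends at the
# first dot that has at least two characters in front of it.
sub lazy_label {
    my ($fqdn) = @_;
    return $fqdn =~ /(.{2,}?)\./ ? $1 : undef;
}

# 'Character or characters': one or more, still non-greedy.
sub first_label {
    my ($fqdn) = @_;
    return $fqdn =~ /(.+?)\./ ? $1 : undef;
}

for my $host (qw(ganymede.a.linux.com io.a.linux.com europa.a.linux.com)) {
    printf "%-20s greedy=%-18s lazy=%-10s first=%s\n",
        $host, greedy_label($host), lazy_label($host), first_label($host);
}
```

A one-letter host shows why the plural matters: for 'a.linux.com', the '{2,}?' form returns 'a.linux', while the '+?' form returns 'a'.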

--- rod.

kdelover · 11-20-2010, 12:23 AM

Thanks a bunch, Rod, for the explanation. Thanks, AlucardZero.

If I wanted the characters to the right of the last . (dot), then I guess I need to do something like this:

Quote:

$a=~/(.{1,})\.(.{1,})/;
print "$2\n";

and for the center:

Quote:

$a=~/(.{1,})\.(.{1,})\./;

theNbomr · 11-20-2010, 01:10 AM

Yes, that looks about right to me; however, the '{1,}' notation can be replaced by the nicer '+' quantifier. The '*' quantifier would be equivalent to '{0,}' (although I'm not sure that is legal), and the '?' quantifier could be used as if it were '{0,1}'. I use the brace forms here only to describe the meaning; I haven't checked whether they work as written. There are, of course, many other notations used in regular expressions, and if you are interested, you can find many good references on the subject online.
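For what it's worth, the brace forms are legal in Perl, and a quick way to check the equivalences is to run each brace form and its shorthand against the same strings. This is only a sketch: the sample strings and the `capture` helper are invented for the test, not taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each pair holds a brace-form pattern and its shorthand equivalent.
my @pairs = (
    [ qr/^(.{1,})\./, qr/^(.+)\./ ],    # one or more
    [ qr/^(.{0,})\./, qr/^(.*)\./ ],    # zero or more
    [ qr/^(a{0,1})b/, qr/^(a?)b/  ],    # zero or one
);

# Arbitrary sample inputs (assumptions, chosen to exercise the edge cases).
my @samples = ('europa.a.linux.com', '.hidden', 'ab', 'b');

# Hypothetical helper: return the first capture, or a marker if no match.
sub capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : '<no match>';
}

my $all_equal = 1;
for my $pair (@pairs) {
    my ($brace, $short) = @$pair;
    for my $text (@samples) {
        my ($x, $y) = (capture($brace, $text), capture($short, $text));
        $all_equal = 0 if $x ne $y;
        print "$brace vs $short on '$text': '$x' / '$y'\n";
    }
}
print $all_equal ? "all pairs agree\n" : "some pairs differ\n";
```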

--- rod.

kdelover · 11-20-2010, 05:59 AM

Quote:

Originally Posted by theNbomr

Yes, that looks about right to me; however, the '{1,}' notation can be replaced by the nicer '+' quantifier. The '*' quantifier would be equivalent to '{0,}' (although I'm not sure that is legal), and the '?' quantifier could be used as if it were '{0,1}'.

--- rod.

Hi rod,

did you mean something like this?

Quote:

$a=$1 if ($a=~/((.+?)|(.*?))\./) ;

print "$a\n";

theNbomr · 11-20-2010, 08:59 AM

I think this is more like what you originally intended

Code:

$a =~ /(.+?)\./

It wasn't clear from your original post whether the scalar against which to match was to include the leading '$host=', or whether that was a fragment of Perl code. If the scalar you are scanning is just the FQDN, then you can simplify the match by anchoring it to the beginning of the string, using the '^' notation:

Code:

$a =~ m/^(.+?)\./

When I want to extract characters that lie between some specified delimiters, I tend to use a regex that says 'match the opening delimiter, followed by everything that isn't a closing delimiter':

Code:

# delimiters are '.' (dot)
$a =~ m/\.([^.]+)/

This tends to eliminate unexpected greediness of the quantifiers.
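As a rough illustration of the 'everything that isn't a delimiter' idea, the sketch below pulls the first, second, and last dot-separated pieces out of a name without any non-greedy quantifiers. The helper names are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.

# Everything before the first dot: anchored, then 'not a dot' one or more times.
sub host_part {
    my ($fqdn) = @_;
    return $fqdn =~ /^([^.]+)\./ ? $1 : undef;
}

# The piece between the first and second dots.
sub second_part {
    my ($fqdn) = @_;
    return $fqdn =~ /^[^.]+\.([^.]+)\./ ? $1 : undef;
}

# Everything after the last dot: a 'not a dot' run anchored to the end.
sub last_part {
    my ($fqdn) = @_;
    return $fqdn =~ /\.([^.]+)$/ ? $1 : undef;
}

for my $host (qw(ganymede.a.linux.com io.a.linux.com europa.a.linux.com)) {
    print join(' | ', $host, host_part($host), second_part($host), last_part($host)), "\n";
}
```

Because '[^.]' can never cross a dot, it makes no difference here whether the quantifier is greedy or not.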

--- rod.

kdelover · 11-22-2010, 04:34 AM

Thanks, theNbomr. I was trying to use a regex to extract two numeric fields which are separated by whitespace and words. I could get as far as extracting one numeric field, but I am unable to get the second, even though I tried making the regex non-greedy. Let me keep trying to get both numeric fields. Thanks again for your explanation in post #4; it has really helped me a lot.

Code:

#!/usr/bin/perl
use strict;
use warnings;
my $output=`awk '/MemFree:|SwapFree:/ {print}' /proc/meminfo`;
print "Awk Output:\n$output\n";
$output=~s/\n//;
print "After removing New line:\n$output\n";
my $regex= $1 if ($output=~m/(\w+\s+)kB/);
print "$regex\n";

Output:

Code:

Awk Output:
MemFree:         2925076 kB
SwapFree:        1366012 kB

After removing New line:
MemFree:         2925076 kBSwapFree:        1366012 kB

2925076

theNbomr · 11-22-2010, 01:37 PM

Yikes! Calling awk from Perl is heretical! There is nothing you can do in awk that you cannot do in Perl, and doing it in Perl is almost certainly faster than launching awk.

Code:

#! /usr/bin/perl -w
#
#  LQkdelover.pl
#  Usage: LQkdelover.pl /proc/meminfo
use strict;
while (<>) {
    if ( $_ =~ m/MemFree:|SwapFree:/ ) {
        my ($param, $value, $units) = split /\s+/, $_;
        print $value, "\n";
    }
}

Regexes used:
- match desired parameters
- split lines on 'one or more whitespace'.
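
If the goal is to pull both numbers out with a single regex, as attempted in the earlier post, one possible sketch is a global match in list context. The sample text is copied from the awk output shown above; the `free_values` helper is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text copied from the awk output shown earlier in the thread.
my $sample = <<'END';
MemFree:         2925076 kB
SwapFree:        1366012 kB
END

# Hypothetical helper: return name => value (in kB) for the wanted fields.
sub free_values {
    my ($text) = @_;
    # In list context, /g returns the captures of every match in one go;
    # /m lets ^ match at the start of each line.
    my %free = $text =~ /^(MemFree|SwapFree):\s+(\d+)\s+kB/mg;
    return %free;
}

my %free = free_values($sample);
print "$_ = $free{$_} kB\n" for sort keys %free;
```

There is no need to strip the newline first; the regex simply matches once per line.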

--- rod.

kdelover · 11-24-2010, 09:50 AM

Thanks, Rod. Yes, I know it's bad practice to use awk in Perl scripts. Well, this is what I did to get the MemFree and SwapFree values:

Code:

my $cmd='/bin/cat /proc/meminfo';
$cmd=~s/\s//g; # remove what ever white spaces are there
my ($mem,$swap)=$cmd=~m/MemFree:(\d+).*SwapFree:(\d+)/;

This gives me the MemFree and SwapFree values. I was wondering, can I make this regex even smaller? Sorry if I have been dragging this question out for too long.

theNbomr · 11-24-2010, 12:22 PM

Quote:

Originally Posted by kdelover

Thanks, Rod. Yes, I know it's bad practice to use awk in Perl scripts. Well, this is what I did to get the MemFree and SwapFree values:

Code:

my $cmd='/bin/cat /proc/meminfo';
$cmd=~s/\s//g; # remove what ever white spaces are there
my ($mem,$swap)=$cmd=~m/MemFree:(\d+).*SwapFree:(\d+)/;

This gives me the MemFree and SwapFree values. I was wondering, can I make this regex even smaller? Sorry if I have been dragging this question out for too long.

Your regex looks fine. I wouldn't try to overthink the whole thing too much. However, your use of

Code:

my $cmd='/bin/cat /proc/meminfo';

is full of holes. For starters, you should have used backticks, not single quotes; as written, $cmd holds the literal command string rather than the command's output. I'll assume this was a transcription error, although copy & paste tends to prevent those. Having said that, the whole 'use a script to dump a file' paradigm is completely unnecessary. Perl is quite capable of opening and reading a file. What's more, it even knows how to do the right thing with filenames given on the command line: the '<>' operator reads from those files, and falls back to standard input when none are given. See my use of this facility in my previous post.

Code:

    while( <> ){

Perl will read its data in the same way, using this method, whether it gets a filename as $ARGV[0], or if data is provided on its standard input using IO redirection or from a pipe. For the simple script in my previous post, the following three commandlines will all work equivalently:

Code:

LQkdelover.pl /proc/meminfo
LQkdelover.pl < /proc/meminfo
cat /proc/meminfo | LQkdelover.pl

There is no need to hardcode any filenames into the application, and no need to launch a subshell to get the data.
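
The same idea can be sketched with an explicit filehandle, so that one routine serves a real file, STDIN, or test data alike. The `read_free` helper is a made-up name, and the MemTotal line in the sample text is invented just to show that other lines are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read meminfo-style lines from any filehandle and
# return name => value (in kB) for MemFree and SwapFree.
sub read_free {
    my ($fh) = @_;
    my %free;
    while (my $line = <$fh>) {
        if ($line =~ /^(MemFree|SwapFree):\s+(\d+)/) {
            $free{$1} = $2;
        }
    }
    return %free;
}

# Demo on an in-memory filehandle so the sketch runs anywhere. On Linux,
# a handle opened on /proc/meminfo, or \*STDIN, could be passed instead.
# The MemTotal figure is made up; the other two lines come from the thread.
my $text = "MemTotal:        8000000 kB\n"
         . "MemFree:         2925076 kB\n"
         . "SwapFree:        1366012 kB\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my %free = read_free($fh);
close $fh;

print "$_ = $free{$_} kB\n" for sort keys %free;
```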

As long as there is useful dialog, I don't see any problem extending a thread.

--- rod.

kdelover · 11-27-2010, 07:42 AM

Thanks Rod !!

garyg007 · 11-28-2010, 08:07 PM

@theNbomr ---- Thank you for the excellent description of regex.

Best I've seen, and I've been doing a lot of web searching looking for something that was understandable to me.