LinuxQuestions.org
Old 10-18-2007, 12:38 PM   #1
Osiris990
PCRE Regex


I'm looking for a regular expression that will match options in a configuration file. The format is...

optionname = optionvalue

...where 'optionname' can be any string consisting of a-z (case-insensitive), 0-9, underscores, dashes, and periods; 'optionvalue' can be a string of any length containing any characters (excluding newlines, of course); and there can be any number of spaces, including none, before or after the equals sign.

What I have now: ^[a-zA-Z0-9\-_\.]{1,}+[\s?]+=+[\s?]+[.+]$

It doesn't seem to be working right though... Suggestions?

Also, if anyone knows of a good config file parsing API or something similar that will keep me from having to write my own, I would much prefer that.

Thanks
Shane
 
Old 10-18-2007, 01:11 PM   #2
raskin
^[-a-zA-Z0-9_.]+\s*=\s*.+$

Explanation: you do not need to insert "+" between the parts of an expression; it is a quantifier equivalent to {1,}, and "*" is equivalent to {0,}. You do not need to escape "_" or "." inside []. In POSIX regexes you cannot even escape "-" there (PCRE does accept \-), but it is taken literally at least if it is the first symbol inside [] (there are other cases, but they are all distinct from sane use for a range). \s should stand on its own, outside []; I guess [[:space:]] should have a similar effect.

PS. I do not use Perl, I cannot even write a simple regular-expression replace application in it, so I didn't test this in Perl; everything I say comes from being a sed and vim user and from 'man regex'. So there may be some subtle error.
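
To make the explanation above concrete, here is a minimal C sketch (the original poster works in C). It is an illustration under stated assumptions, not a PCRE program: it uses POSIX <regex.h> so that it builds without extra libraries, which means [[:space:]] stands in for \s. The helper name matches() and the two macro names are made up for the example. The second pattern keeps the original ending [.+], which is a character class matching a single '.' or '+', to show why the first attempt rejected ordinary values.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The corrected pattern, with [[:space:]] standing in for \s (POSIX ERE). */
#define FIXED_RE  "^[-a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*.+$"
/* Same pattern but ending in the original [.+]: a class of '.' and '+'. */
#define BROKEN_RE "^[-a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*[.+]$"

/* Hypothetical helper: 1 if line matches pattern, 0 if not, -1 on a bad pattern. */
int matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);

    /* REG_NOSUB: only a yes/no answer is needed here, no capture groups. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With these definitions, matches(FIXED_RE, "setting2=value2") returns 1, while matches(BROKEN_RE, "name = value") returns 0 because [.+] accepts only a single '.' or '+' as the whole value.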
 
Old 10-18-2007, 01:26 PM   #3
matthewg42
The expression you posted can be improved quite a lot:
  • I'd add \s* to either side of the = symbol, and also at the beginning and end of line too. This means "any whitespace"... it gives you some formatting freedom
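  • \w can be used instead of [a-zA-Z0-9_] (note that \w does not match the dashes and periods you also want to allow, so keep the explicit class if you need them)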
  • The {1,} means "1 or more times", which can more clearly be specified with +
  • + goes after an expression, not before it - unless you want to be able to use any number of = characters as your assignment operator, you don't want to put =+
  • putting + inside [square brackets] changes its meaning... I don't think you want to do that at the end of your expression.
  • Please put code in [code] tags to improve readability.
  • In Perl itself, you can put parts of an expression in (brackets) to extract those sections to $1 $2 and so on.
The total expression can be changed to this:
Code:
^\s*(\w+)\s*=\s*(.*)$
For example, in a perl program:
Code:
#!/usr/bin/perl

use strict;
use warnings;

# some test data
my @input = split /\n/, <<EOD;
setting1 = value1
setting2=value2
  setting3 = value3
invalid setting = value4
EOD

foreach (@input) {
    if (/^\s*(\w+)\s*=\s*(.*)$/) {
        printf("id=%s; value=%s\n", $1, $2);
    }
    else {
        warn "invalid format: $_\n";
    }
}
 
Old 10-18-2007, 01:56 PM   #4
makyo
Hi.

If you are willing to use INI-style config files, Perl Best Practices (Conway) suggests the modules Config::{General,Std,Tiny}, available on http://cpan.org/

Otherwise, matthewg42's code looks good for the equal-style files ... cheers, makyo

Last edited by makyo; 10-18-2007 at 02:13 PM.
 
Old 10-18-2007, 10:36 PM   #5
chrism01
I use this:

Code:
    # Process cfg file records
    while ( defined ( $cfg_rec = <CONFIG_FILE> ) )
    {
        # Remove unwanted chars
        chomp $cfg_rec;                 # newline
        $cfg_rec =~ s/#.*//;            # comments
        $cfg_rec =~ s/^\s+//;           # leading whitespace
        $cfg_rec =~ s/\s+$//;           # trailing whitespace

        next unless length($cfg_rec);   # anything left?

        # Split 'key=value' string
        ($key, $value) = split( /\s*=\s*/, $cfg_rec, 2);

        # Assign to global hash, forcing uppercase keys
        $cfg::params{uc($key)} = $value;
    }
As you can see, I end up with a hash where the name of the option (key) is forced to uppercase (so it stands out in the code and is always uppercase), but the associated hash value is unchanged apart from having leading/trailing whitespace removed.
The hash is in a package cfg, which makes it effectively 'global' as I otherwise avoid global variables.

As you can see, it also allows the cfg file to have blank lines and comments, which this routine strips out first.
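
For readers working in C rather than Perl, roughly the same strip-then-split approach can be sketched with only the standard library. This is a sketch under assumptions, not a port of the routine above: the function names trim() and parse_config_line() and the return codes are invented for illustration, the line buffer is modified in place, and, as in the Perl version, a '#' anywhere on the line starts a comment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Hypothetical helper. Returns 1 and sets *key / *value for "key = value",
 * 0 for a blank or comment-only line, -1 if the line has no '='.
 * Only the first '=' splits, so the value may itself contain '='.
 */
int parse_config_line(char *line, char **key, char **value)
{
    char *hash, *eq, *p;

    assert(line != NULL && key != NULL && value != NULL);

    hash = strchr(line, '#');            /* strip comments */
    if (hash != NULL)
        *hash = '\0';

    line = trim(line);                   /* strip whitespace and newline */
    if (*line == '\0')
        return 0;                        /* nothing left */

    eq = strchr(line, '=');              /* split at the first '=' */
    if (eq == NULL)
        return -1;
    *eq = '\0';

    *key = trim(line);
    *value = trim(eq + 1);

    for (p = *key; *p != '\0'; p++)      /* force uppercase keys */
        *p = (char)toupper((unsigned char)*p);
    return 1;
}
```

Because the key and value pointers refer into the caller's buffer, they need to be copied if the buffer is reused for the next line.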
 
Old 10-19-2007, 11:18 AM   #6
Osiris990
Er... Something I neglected to mention. I'm using PCRE in *C* not in Perl. Any snippets you give me would be most helpful if given in C... Perl is Greek to me. =/ Thanks for all the attention though, guys. =] I'll try those variations on the regex string and see if any of them work.

Edit: Also, the \s* at the beginning and end is unnecessary, as I strip whitespace from the beginning and end of the string as it's read in.

Last edited by Osiris990; 10-19-2007 at 11:20 AM.
 
Old 10-19-2007, 12:41 PM   #7
matthewg42
I think you need to escape all \ characters in your C strings (for example, write \\s where the regex needs \s).
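
As a small illustration of that advice (the names option_pattern and show_pattern are made up here): in a C string literal, every backslash meant for the regex engine has to be written twice, because the C compiler consumes one level of escaping before the regex library ever sees the string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * What the regex engine should see:   ^[-a-zA-Z0-9_.]+\s*=\s*.+$
 * What has to be typed in C source:   the same, with each \ doubled
 */
static const char option_pattern[] = "^[-a-zA-Z0-9_.]+\\s*=\\s*.+$";

/* Print the pattern exactly as the regex library will receive it. */
void show_pattern(void)
{
    /* The doubled backslashes collapse to single ones at compile time,
       so the stored string never contains two backslashes in a row. */
    assert(strstr(option_pattern, "\\\\") == NULL);
    puts(option_pattern);
}
```

Printing the string this way is a quick check that the pattern reaching the library is the one intended.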
 
Old 10-19-2007, 02:07 PM   #8
Osiris990
Okay, so with a little bit of editing, the code posted in the first reply worked. Now I'm faced with the problem of pulling the values out (similar to how you can in javascript/PHP/perl[so I hear]). I don't think it works quite the same way as Perl. =/ I've checked around Google, and I've searched through the PCRE man pages, but I've come up with nothing. Anyone know what I need to do?
 
Old 10-19-2007, 02:31 PM   #9
raskin
I think you should read 'man 3 pcreapi', about pcre_get_substring_list() and similar.
 
Old 10-22-2007, 10:53 AM   #10
Osiris990
Alright, well I seem to have that working... Ish. The problem is, I don't know how to group it right (I guess) to get it to extract the right thing. I can extract substring 0 (which is just the whole match), but the list goes no further. I get error code -7 (PCRE_ERROR_NOSUBSTRING, no substring matching that number) when I put the index at 1 or over. How would I group them so that I can have it pull out the right substrings? I want to pull out the option name and the option value, e.g.:

thisoption = this option value rules

I want to pull out the 'thisoption' and the 'this option value rules'.

Thanks,
Shane
 
Old 10-22-2007, 02:43 PM   #11
raskin
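'(' and ')' group parts of a regular expression, and each pair of parentheses becomes a numbered substring (here 1 for the option name and 2 for the option value). Like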
^([-a-zA-Z0-9_.]+)\s*=\s*(.+)$
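
A minimal C sketch of pulling out the two groups follows. One loud assumption: it uses POSIX <regex.h> rather than PCRE so that it builds without extra libraries, so [[:space:]] replaces \s and the groups come back in a regmatch_t array. With PCRE the idea is the same, except that the offsets land in the ovector passed to pcre_exec() and can be copied out with pcre_get_substring(); see 'man 3 pcreapi' for the exact calls. The names OPTION_RE, copy_group() and split_option() are invented for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Group 1 is the option name, group 2 the option value. */
#define OPTION_RE "^([-a-zA-Z0-9_.]+)[[:space:]]*=[[:space:]]*(.+)$"

/* Copy one capture group of line into out (size n, always NUL-terminated). */
static void copy_group(const char *line, const regmatch_t *m, char *out, size_t n)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (len >= n)
        len = n - 1;                 /* truncate rather than overflow */
    memcpy(out, line + m->rm_so, len);
    out[len] = '\0';
}

/* Hypothetical helper: returns 1 on a match (name/value filled in), 0 otherwise. */
int split_option(const char *line, char *name, size_t name_size,
                 char *value, size_t value_size)
{
    regex_t re;
    regmatch_t groups[3];            /* [0] whole match, [1] name, [2] value */
    int matched;

    assert(line != NULL && name_size > 0 && value_size > 0);

    if (regcomp(&re, OPTION_RE, REG_EXTENDED) != 0)
        return 0;
    matched = (regexec(&re, line, 3, groups, 0) == 0);
    if (matched) {
        copy_group(line, &groups[1], name, name_size);
        copy_group(line, &groups[2], value, value_size);
    }
    regfree(&re);
    return matched;
}
```

Without the parentheses in the pattern only entry 0 (the whole match) exists, which fits the "no substring matching that number" error reported for index 1 and above.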
 
  


All times are GMT -5.
