I'm looking for a regular expression that will match options in a configuration file. The format is...
optionname = optionvalue
...where 'optionname' can be any string consisting of a-z (case-insensitive), 0-9, underscores, dashes, and periods; 'optionvalue' can be a string of any length containing any characters (excluding newlines, of course); and there can be any number of spaces, including none, before or after the equals sign.
What I have now: ^[a-zA-Z0-9\-_\.]{1,}+[\s?]+=+[\s?]+[.+]$
It doesn't seem to be working right though... Suggestions?
Also, if anyone knows of a good config file parsing API or something similar that will keep me from having to write my own, I would much prefer that.
^[-a-zA-Z0-9_.]+\s*=\s*.+$
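Explanation: you do not need to insert "+" between the parts of the expression; it is a quantifier equivalent to {1,}, just as "*" is equivalent to {0,}. You do not need to escape "." or "_" inside [], and in POSIX bracket expressions you cannot even escape "-", because a backslash there is an ordinary character (PCRE and Perl do accept \- inside []). In every flavour "-" is taken literally if it is the first or last symbol inside [] (there are other cases, but they are all distinct from sane use for a range). \s should stand on its own rather than inside [], where a "?" is just a literal question mark; I guess [[:space:]] should have a similar effect to \s. Likewise, [.+] matches a single "." or "+" character, not "one or more of anything".
PS. I do not use Perl, and I could not even write a simple regular-expression replace program in it, so I didn't test this in Perl; everything I say comes from being a sed and vim user and from 'man regex'. So there may be some subtle error.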
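For anyone following along in C rather than Perl, here is a minimal sketch of the same match using POSIX regex.h instead of PCRE (an assumption made so that it builds against plain libc). The function name is_option_line is made up for illustration, and \s is written as [[:space:]] because POSIX extended syntax has no portable \s.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if line looks like
 * "optionname = optionvalue", 0 if it does not, and -1 if the
 * pattern fails to compile. */
int is_option_line(const char *line)
{
    /* Same shape as ^[-a-zA-Z0-9_.]+\s*=\s*.+$ with \s spelled
     * as [[:space:]] for POSIX extended regular expressions. */
    static const char *pattern =
        "^[-a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*.+$";
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Compiling the pattern on every call keeps the sketch short; a real parser would probably compile it once and reuse it for every line.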
The expression you posted can be improved quite a lot:
I'd add \s* on either side of the = symbol, and also at the beginning and end of the line. \s* means "any amount of whitespace, including none", which gives you some formatting freedom.
\w can be used instead of [a-zA-Z0-9_]. Note that \w does not match dashes or periods, so use [\w.-] in its place if option names may contain them.
{1,} means "1 or more times", which is more clearly written as +.
+ goes after the expression it repeats, not before it. Unless you want to allow any number of = characters as your assignment operator, you don't want =+.
Putting + inside [square brackets] changes its meaning: there it is just a literal + character. I don't think you want that at the end of your expression.
Please put code in [code] tags to improve readability.
In Perl itself, you can put parts of an expression in (parentheses) to capture those sections in $1, $2 and so on.
The total expression can be changed to this:
Code:
^\s*(\w+)\s*=\s*(.*)$
For example, in a perl program:
Code:
#!/usr/bin/perl
use strict;
use warnings;
# some test data
my @input = split /\n/, <<EOD;
setting1 = value1
setting2=value2
setting3 = value3
invalid setting = value4
EOD
foreach (@input) {
    if (/^\s*(\w+)\s*=\s*(.*)$/) {
        printf("id=%s; value=%s\n", $1, $2);
    }
    else {
        warn "invalid format: $_\n";
    }
}
If you are willing to use INI-style config files, Perl Best Practices (Conway) suggests the modules Config::{General,Std,Tiny}, available on http://cpan.org/
Otherwise, matthewg42's code looks good for the equal-style files ... cheers, makyo
# Process cfg file records
while ( defined ( $cfg_rec = <CONFIG_FILE> ) )
{
    # Remove unwanted chars
    chomp $cfg_rec;                 # newline
    $cfg_rec =~ s/#.*//;            # comments
    $cfg_rec =~ s/^\s+//;           # leading whitespace
    $cfg_rec =~ s/\s+$//;           # trailing whitespace
    next unless length($cfg_rec);   # anything left?

    # Split 'key=value' string
    ($key, $value) = split( /\s*=\s*/, $cfg_rec, 2);

    # Assign to global hash, forcing uppercase keys
    $cfg::params{uc($key)} = $value;
}
As you can see, I end up with a hash where the name of the option (the key) is forced to uppercase, so it stands out in the code and is always consistent, while the associated value is unchanged apart from having leading and trailing whitespace removed.
The hash lives in a package, cfg, which makes it effectively 'global'; I otherwise avoid global variables.
The routine also allows the cfg file to contain blank lines and comments, which it strips out first.
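The same strip-then-split idea needs no regex library at all in C. The following is only a sketch under assumptions: the names trim and parse_cfg_line are invented for illustration, the line buffer is modified in place, and, like the Perl above, everything after a '#' is treated as a comment (so values cannot contain '#').

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: parses one config line in place.
 * Returns 1 and sets *key and *value for "key = value",
 * 0 for blank or comment-only lines,
 * -1 if there is no '=' or the key is empty. */
int parse_cfg_line(char *line, char **key, char **value)
{
    char *hash = strchr(line, '#');
    char *eq;
    char *p;

    if (hash != NULL)
        *hash = '\0';                 /* drop the comment */
    line = trim(line);
    if (*line == '\0')
        return 0;                     /* nothing left */

    eq = strchr(line, '=');           /* split at the first '=' only */
    if (eq == NULL)
        return -1;
    *eq = '\0';
    *key = trim(line);
    *value = trim(eq + 1);
    if (**key == '\0')
        return -1;

    for (p = *key; *p != '\0'; p++)   /* force uppercase keys */
        *p = (char)toupper((unsigned char)*p);
    return 1;
}
```

Because the key and value pointers point into the caller's buffer, they are only valid until that buffer is reused; copy them out before reading the next line.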
Er... Something I neglected to mention. I'm using PCRE in *C* not in Perl. Any snippets you give me would be most helpful if given in C... Perl is Greek to me. =/ Thanks for all the attention though, guys. =] I'll try those variations on the regex string and see if any of them work.
Edit: Also, the \s* at the beginning and ends is unnecessary, as I have it strip whitespace from the beginning and end of the string as it's read in.
Okay, so with a little bit of editing, the code posted in the first reply worked. Now I'm faced with the problem of pulling the values out (similar to how you can in javascript/PHP/perl[so I hear]). I don't think it works quite the same way as Perl. =/ I've checked around Google, and I've searched through the PCRE man pages, but I've come up with nothing. Anyone know what I need to do?
Alright, well I seem to have that working... Ish. The problem is, I don't know how to group it right (I guess) to get it to extract the right thing. I can extract the substring 0 (which is just the whole thing), but the list goes no further. I get error code -7 (no substring matching that number) when I put the index at 1 or over. How would I group them so that I can have it pull out the right substrings? I want to pull out the option name and the option value I.E.:
thisoption = this option value rules
I want to pull out the 'thisoption' and the 'this option value rules'.
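Error -7 is PCRE_ERROR_NOSUBSTRING, which most likely means the pattern has no capturing parentheses, so only substring 0 (the whole match) exists. Wrapping the name and the value in ( ) creates substrings 1 and 2. Below is a minimal sketch of that grouping using POSIX regex.h rather than PCRE (an assumption, so that it builds against plain libc); split_option and copy_group are made-up names. With PCRE the same parentheses apply, and the offsets would come from the ovector filled in by pcre_exec instead of from regmatch_t.

```c
#include <regex.h>
#include <string.h>

/* Copies one capture group of subject into buf (size n > 0),
 * truncating if it does not fit. */
static void copy_group(const char *subject, const regmatch_t *m,
                       char *buf, size_t n)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (len >= n)
        len = n - 1;
    memcpy(buf, subject + m->rm_so, len);
    buf[len] = '\0';
}

/* Hypothetical helper: returns 1 and fills name/value on a match,
 * 0 on no match, -1 if the pattern fails to compile. */
int split_option(const char *line, char *name, size_t name_size,
                 char *value, size_t value_size)
{
    /* Two pairs of parentheses give groups 1 and 2;
     * group 0 is always the whole match. */
    static const char *pattern =
        "^([-a-zA-Z0-9_.]+)[[:space:]]*=[[:space:]]*(.+)$";
    regex_t re;
    regmatch_t m[3];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    copy_group(line, &m[1], name, name_size);
    copy_group(line, &m[2], value, value_size);
    return 1;
}
```

Called on "thisoption = this option value rules", this should leave "thisoption" in name and "this option value rules" in value.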