parsing a version number with regular expressions?
Hi folks,
I barely know enough regex to use 'grep' effectively, so I need a bit of assistance here. I also learn by example, so please don't tell me to RTFM. It won't help!
I am trying to parse a version number from 'git describe' which will always fit one of these formats:
If the 'git describe' output omits the 'E-g<7-char hex string>' bit, E should just be substituted with '0'. The 'g<7-char hex string>' can be safely omitted.
I have been using a bash script to do the task, but it's slow and just doesn't feel right. I have an inkling that some sort of regular expression with grep, sed, perl, or something could accomplish the same task, but with much less horrible looking scripting.
A friend helped me figure this out (the most important part is the $pattern variable):
Code:
#!/usr/bin/perl

use strict;
use warnings;

use Digest::MD5;
use File::Basename;
use File::Copy;
use File::Spec::Functions qw(rel2abs);

my $in_git     = 0;
my $scriptpath = rel2abs(dirname($0));
my $outfile    = $ARGV[0];
die "Usage: $0 <output-header>\n" unless defined $outfile;

# Fallback version string, used when we are not in a Git checkout.
my $releasever = "";
if (open RELEASE, "<", "$scriptpath/release_ver") {
    my $line = <RELEASE>;
    $releasever = $line if defined $line;
    close RELEASE;
}

print "Is this project under Git? ";
if (-d "$scriptpath/../.git") {
    print "Yes\n";
    print "Is Git installed? ";
    # system() runs the command through /bin/sh, where the bash-only
    # '&>' redirection is not portable, so use the POSIX form.
    if (system("which git > /dev/null 2>&1") == 0) {
        print "Yes\n";
        $in_git = 1;
    } else {
        print "No\n";
    }
} else {
    print "No\n";
}

my $verstring = "";
if ($in_git == 0) {
    $verstring = $releasever;
} else {
    $verstring = `git describe --tags --long 2> /dev/null || git describe --tags`;
}
chomp($verstring);

my $pattern = "([0-9]).([0-9]).([0-9])(?:(?:-([a-zA-Z]+[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]+)?)?";
$verstring =~ $pattern
    or die "Version string '$verstring' is malformed...\n";

my $major = $1;
my $minor = $2;
my $revis = $3;
my $build = $5;
my $pre   = $4;

# No commit count in the string means we are exactly on the tag.
if ( !$build ) {
    $build = "0";
}

if ( $pre ) {
    # We have a prerelease version.
    $pre = "-$pre";
} else {
    $pre = "";
}

unlink("$outfile.tmp");

my $prefix = "CC_LIB";
my $tag    = "cc";

# Write the header to a temporary file first.
open OUT, ">", "$outfile.tmp" or die $!;
print OUT <<__eof__;
#ifndef __included_${tag}_build_number_h
#define __included_${tag}_build_number_h
#define ${prefix}_VERSION_MAJOR ${major}
#define ${prefix}_VERSION_MINOR ${minor}
#define ${prefix}_VERSION_REVISION ${revis}
#define ${prefix}_VERSION_BUILD ${build}
#define ${prefix}_VERSION \"${major}.${minor}.${revis}${pre}\"
#define ${prefix}_VERSION_STRING "${verstring}"
#define ${prefix}_RESOURCE_VERSION ${major},${minor},${revis},${build}
#define ${prefix}_RESOURCE_VERSION_STRING \"${major}, ${minor}, ${revis}, ${build}\"
#endif
__eof__
close OUT or die $!;

# Compare checksums so the real header is only touched when its
# contents change (hexdigest resets $ctx, so it can be reused).
my $ctx = Digest::MD5->new;
my $md5old = "";
my $md5new = "";

if (-e $outfile) {
    open OUT, "<", $outfile or die $!;
    $ctx->addfile(*OUT);
    $md5old = $ctx->hexdigest;
    close OUT;
}

open OUT, "<", "$outfile.tmp" or die $!;
$ctx->addfile(*OUT);
$md5new = $ctx->hexdigest;
close OUT;

if ($md5old ne $md5new) {
    if (-e $outfile) {
        unlink($outfile) or die $!;
    }
    move "$outfile.tmp", $outfile or die $!;
    print "$outfile updated.\n";
} else {
    unlink("$outfile.tmp");
    print "$outfile is already up to date.\n";
}
Last edited by neunon; 07-02-2009 at 03:52 PM.
Reason: adding a note
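To look at the matching step on its own, here is a minimal sketch that wraps the same $pattern in a helper and runs it on a few strings. The helper name parse_version and the sample inputs are made up for illustration; they are not real 'git describe' output from this project.

```perl
use strict;
use warnings;

# Same pattern as in the script, unchanged.
my $pattern = "([0-9]).([0-9]).([0-9])(?:(?:-([a-zA-Z]+[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]+)?)?";

# parse_version is a hypothetical helper: it returns a hash reference
# with the captured fields, or nothing when the string does not match.
sub parse_version {
    my ($verstring) = @_;
    return unless $verstring =~ $pattern;
    return {
        major => $1,
        minor => $2,
        revis => $3,
        pre   => defined $4 ? $4 : "",     # e.g. "rc1", empty if absent
        build => defined $5 ? $5 : "0",    # commit count, "0" if absent
    };
}

# Sample inputs are illustrative only.
for my $v ("1.2.3", "1.2.3-rc1", "1.2.3-14-gabcdef0", "1.2.3-rc1-14-gabcdef0") {
    my $p = parse_version($v);
    printf "%-24s major=%s minor=%s revis=%s pre='%s' build=%s\n",
        $v, @{$p}{qw(major minor revis pre build)};
}
```

Both optional groups are independent, so a string with only the prerelease part or only the commit count still matches, and the missing piece falls back to its default.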
The Perl solution works for me, but I'd be interested to see sed or awk do the task. (Perl was a couple of orders of magnitude faster than Bash in this case, so sed/awk would be interesting.)
If my line counting is correct, the sed is 17 lines vs. 124 for the perl (7.3:1). Furthermore, in spite of the limitations in sed, my regex is more precise:
It allows for 2+ digit decimal #'s in "A.B.C".
It ensures that the delimiter for "A.B.C" really is a '.' -- I believe the unescaped '.', even when quoted, remains a wild card.
"(a|b|rc)" properly embodies the "{a,b,rc}" of the stated problem.
I was stuck with '*' in g[0-9A-Fa-f]*; Perl would allow '{7}'.
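The first two points are easy to check. A small sketch, assuming the sample strings below (illustrative, not real 'git describe' output) and using only the "A.B.C" part of each pattern; the names $loose and $tight are mine. Both are written with qr// so the backslashes reach the regex engine as typed.

```perl
use strict;
use warnings;

# "A.B.C" part of the original pattern: single digits, unescaped dots.
my $loose = qr/([0-9]).([0-9]).([0-9])/;

# Stricter variant: one or more digits, literal dots.
my $tight = qr/([0-9]+)\.([0-9]+)\.([0-9]+)/;

for my $v ("1.2.3", "10.20.30", "1x2y3") {
    my $l = $v =~ $loose ? "match" : "no match";
    my $t = $v =~ $tight ? "match" : "no match";
    printf "%-10s loose: %-9s tight: %s\n", $v, $l, $t;
}
```

Both patterns accept 1.2.3. Only the stricter one accepts 10.20.30, and only the original one accepts 1x2y3, because its '.' matches any character.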
If you keep the perl, I suggest you look at:
Code:
# Single quotes, so that Perl passes "\." through to the regex intact.
my $pattern = '([0-9]+)\.([0-9]+)\.([0-9]+)(?:(?:-(a|b|rc))?(?:-([0-9]+)-g[a-fA-F0-9]{7})?)?';
which I believe embodies my improvements to the regex. Check it carefully -- I don't really write perl. Also, I believe that both regexes could drop the 'A-F' from the 'g[a-fA-F0-9]' -- your orig. statement implies that git would not use upper case in hex #'s.
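One Perl detail is worth checking whenever a pattern is stored in a string: inside double quotes, Perl drops the backslash from an escape it does not recognise, so "\." reaches the regex engine as a bare '.'. A minimal sketch of the effect, using shortened two-field patterns and sample strings chosen purely for illustration:

```perl
use strict;
use warnings;

# Double-quoted: the backslash is dropped while the string is built.
# (Warnings are off for this one literal; with them on, Perl reports
# "Unrecognized escape \. passed through".)
my $double = do { no warnings; "([0-9]+)\.([0-9]+)" };

# Single-quoted strings and qr// both keep the backslash.
my $single   = '([0-9]+)\.([0-9]+)';
my $compiled = qr/([0-9]+)\.([0-9]+)/;

print "double-quoted string is: $double\n";
print "single-quoted string is: $single\n";

for my $v ("1.2", "1x2") {
    printf "%-4s double: %-3s single: %-3s qr: %s\n", $v,
        ($v =~ /^$double$/   ? "yes" : "no"),
        ($v =~ /^$single$/   ? "yes" : "no"),
        ($v =~ /^$compiled$/ ? "yes" : "no");
}
```

The double-quoted version accepts 1x2, while the single-quoted and qr// versions reject it. Either of the latter two forms keeps the dot literal; qr// has the added benefit of compiling the pattern once.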
BTW, I was pleasantly surprised that I could do it all in sed & not have to use awk.
BTW#2, I would be interested in seeing your original bash script.