parsing a version number with regular expressions?

neunon · 07-01-2009, 10:09 PM

Hi folks,

I barely know enough regex to use 'grep' effectively, so I need a bit of assistance here. I also learn by example, so please don't tell me to RTFM. It won't help!

I am trying to parse a version number from 'git describe' which will always fit one of these formats:

Code:

A.B.C-{a,b,rc}D-E-g<7-char hex string>
A.B.C-{a,b,rc}D
A.B.C-E-g<7-char hex string>
A.B.C

My goal is to take this:

Code:

1.2.3-rc4-5-g08fd3ae

and generate a C header like this:

Code:

#ifndef __included_build_number_h
#define __included_build_number_h

#define PROJECT_VERSION_MAJOR 1
#define PROJECT_VERSION_MINOR 2
#define PROJECT_VERSION_REVISION 3
#define PROJECT_VERSION_BUILD 5
#define PROJECT_VERSION_SHORT "1.2.3-rc4"
#define PROJECT_VERSION_LONG "1.2.3-rc4-5-g08fd3ae"

#define PROJECT_RESOURCE_VERSION 1,2,3,5
#define PROJECT_RESOURCE_VERSION_STRING "1, 2, 3, 5"

#endif

If the 'git describe' output omits the 'E-g<7-char hex string>' bit, E should just be substituted with '0'. The 'g<7-char hex string>' can be safely omitted.

I have been using a bash script to do the task, but it's slow and just doesn't feel right. I have an inkling that some sort of regular expression with grep, sed, perl, or something could accomplish the same task, but with much less horrible looking scripting.

Any ideas? Huge thanks in advance!

LiamFromLeeds · 07-02-2009, 02:35 AM

Will need some tweaking but a starter for 10...

Code:

#!/usr/bin/perl

my $vlong=$ARGV[0];
my ($buildinfo,$tmp,$test,$tmphexstr,undef)=split(/-/,$ARGV[0]);
my ($vmajor,$vminor,$vrev)=split(/\./,$buildinfo);
my $ebit;
my $vshort;
if ($tmp =~ /^a|^b|^rc/) {
        $ebit=$test;
        $hexstr=$tmphexstr;
        $vshort="$buildinfo-$tmp";
} else {
        $ebit=$tmp;
        $hexstr=$test;
        $vshort="$buildinfo";
}

unless ($ebit) {$ebit=0}


print "\n#ifndef __included_build_number_h";
print "\n#define __included_build_number_h\n";

print "\n#define PROJECT_VERSION_MAJOR $vmajor";
print "\n#define PROJECT_VERSION_MINOR $vminor";
print "\n#define PROJECT_VERSION_REVISION $vrev";

print "\n#define PROJECT_VERSION_BUILD $ebit";

print qq(\n#define PROJECT_VERSION_SHORT "$vshort");
print qq(\n#define PROJECT_VERSION_LONG "$vlong");

print qq(\n#define PROJECT_RESOURCE_VERSION $vmajor,$vminor,$vrev,$ebit);
print qq(\n#define PROJECT_RESOURCE_VERSION_STRING "$vmajor, $vminor, $vrev, $ebit");

print "\n#endif";

neunon · 07-02-2009, 03:51 PM

A friend helped me figure this out (the most important part is the $pattern variable):

Code:

#!/usr/bin/perl

use strict;
use warnings;

use File::Basename;
use File::Spec::Functions qw(rel2abs);

my $in_git = 0;
my $scriptpath = rel2abs(dirname($0));
my $outfile = $ARGV[0];

my $releasever;

if (open RELEASE, "<", "$scriptpath/release_ver") {
	$releasever = <RELEASE>;
	close RELEASE;
}

print "Is this project under Git? ";
if (-d "$scriptpath/../.git" ) {
	print "Yes\n";
	print "Is Git installed? ";
		if ( ! system("which git > /dev/null 2>&1") ) {
		print "Yes\n";
		$in_git = 1;
	} else {
		print "No\n";
		$in_git = 0;
	}
} else {
	print "No\n";
	$in_git = 0;
}

my $verstring = "";

if ($in_git == 0) {
	$verstring = $releasever;
} else {
	$verstring = `git describe --tags --long 2> /dev/null || git describe --tags`;
}

chomp($verstring);

my $pattern = "([0-9]).([0-9]).([0-9])(?:(?:-([a-zA-Z]+[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]+)?)?";

if ($verstring =~ $pattern) {
} else {
	die "Version string '$verstring' is malformed...\n";
}

my $major = $1;
my $minor = $2;
my $revis = $3;
my $build = $5;
my $pre   = $4;

if ( !$build ) {
	$build = "0";
}

if ( $pre ) {
	# We have a prerelease version.
	$pre = "-$pre";
} else {
	$pre = "";
}

unlink("$outfile.tmp");

my $prefix = "CC_LIB";
my $tag    = "cc";

open OUT, ">", "$outfile.tmp" or die $!;
print OUT <<__eof__;
#ifndef __included_${tag}_build_number_h
#define __included_${tag}_build_number_h

#define ${prefix}_VERSION_MAJOR ${major}
#define ${prefix}_VERSION_MINOR ${minor}
#define ${prefix}_VERSION_REVISION ${revis}
#define ${prefix}_VERSION_BUILD ${build}
#define ${prefix}_VERSION \"${major}.${minor}.${revis}${pre}\"
#define ${prefix}_VERSION_STRING "${verstring}"

#define ${prefix}_RESOURCE_VERSION ${major},${minor},${revis},${build}
#define ${prefix}_RESOURCE_VERSION_STRING \"${major}, ${minor}, ${revis}, ${build}\"

#endif

__eof__
close OUT or die $!;

use Digest::MD5;

my $ctx = Digest::MD5->new;

my $md5old = ""; my $md5new = "";

if (-e $outfile) {
	open OUT, "$outfile" or die $!;
	$ctx->addfile(*OUT);
	$md5old = $ctx->hexdigest;
	close OUT
}

open OUT, "$outfile.tmp" or die $!;
$ctx->addfile(*OUT);
$md5new = $ctx->hexdigest;
close OUT;

use File::Copy;

if ($md5old ne $md5new) {
	if (-e $outfile) {
		unlink($outfile) or die $!;
	}
	move "$outfile.tmp", $outfile or die $!;
	print "$outfile updated.\n";
} else {
	unlink ("$outfile.tmp");
	print "$outfile is already up to date.\n";
}
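
The tail end of that script only replaces the header when its contents have actually changed, presumably so the build doesn't recompile everything on every run. A minimal shell sketch of the same idea, using cmp instead of MD5 sums (the function name update_if_changed is made up for illustration, it is not part of the script above):

```shell
#!/bin/sh
# Sketch: replace $2 with $1 only when their contents differ.
# update_if_changed is a hypothetical helper name, not part of the Perl script.
update_if_changed() {
    tmp=$1
    out=$2
    if [ -e "$out" ] && cmp -s "$tmp" "$out"; then
        rm -f "$tmp"            # identical: keep the old file and its timestamp
        echo "$out is already up to date."
    else
        mv -f "$tmp" "$out"     # new or changed: move the fresh copy into place
        echo "$out updated."
    fi
}
```

You would write the new header to something like build_number.h.tmp and then call update_if_changed build_number.h.tmp build_number.h.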

archtoad6 · 07-04-2009, 04:04 PM

Are you happy w/ the Perl solution?

Would you like to explore sed or awk solutions?

Do you understand regexes any better?

neunon · 07-04-2009, 11:11 PM

Quote:

Originally Posted by archtoad6

Are you happy w/ the Perl solution?

Would you like to explore sed or awk solutions?

Do you understand regexes any better?

The Perl solution works for me, but I'd be interested to see sed or awk do the task. (Perl was a couple of orders of magnitude faster than Bash in this case, so sed/awk would be interesting.)

archtoad6 · 11-14-2009, 08:01 AM

Well, the sed is done. I've been thinking about it since we each last posted, but this week I finally found time to concentrate on it.

Code:

PV='PROJECT'
echo '1.2.3-rc4-5-g08fd3ae' |\
sed -r 's,([0-9]+)\.([0-9]+)\.([0-9]+)(-(a|b|rc)[0-9]+)?(-([0-9]+)-g[0-9A-Fa-f]*)?,\
#ifndef __included_build_number_h\
#define __included_build_number_h\
\
#define '${PV}'_VERSION_MAJOR \1\
#define '${PV}'_VERSION_MINOR \2\
#define '${PV}'_VERSION_REVISION \3\
#define '${PV}'_VERSION_BUILD \7\
#define '${PV}'_VERSION_SHORT "\1\.\2\.\3\4"\
#define '${PV}'_VERSION_LONG "\0"\
\
#define '${PV}'_RESOURCE_VERSION \1\,\2\,\3\,\7\
#define '${PV}'_RESOURCE_VERSION_STRING "\1\, \2\, \3\, \7"\
\
#endif,'

It's tested -- what you see produces:

Code:

#ifndef __included_build_number_h
#define __included_build_number_h

#define PROJECT_VERSION_MAJOR 1
#define PROJECT_VERSION_MINOR 2
#define PROJECT_VERSION_REVISION 3
#define PROJECT_VERSION_BUILD 5
#define PROJECT_VERSION_SHORT "1.2.3-rc4"
#define PROJECT_VERSION_LONG "1.2.3-rc4-5-g08fd3ae"

#define PROJECT_RESOURCE_VERSION 1,2,3,5
#define PROJECT_RESOURCE_VERSION_STRING "1, 2, 3, 5"

#endif
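
The sample input always carries a build count, so \7 is never empty there. For a plain 'A.B.C' or 'A.B.C-rcD' the original request wanted 0 instead. A rough sketch of one way to get that with a second sed expression. The function name build_number is only for illustration, and it assumes GNU sed (for -r):

```shell
#!/bin/sh
# Sketch: print the build count (E) from a 'git describe' string, or 0 if absent.
# build_number is a hypothetical helper; -r needs GNU sed.
# A string that does not match at all is printed unchanged.
build_number() {
    echo "$1" |
    sed -r -e 's,^[0-9]+\.[0-9]+\.[0-9]+(-(a|b|rc)[0-9]+)?(-([0-9]+)-g[0-9A-Fa-f]+)?$,\4,' \
           -e 's,^$,0,'   # an empty result means there was no build count
}
```

With this, build_number '1.2.3-rc4-5-g08fd3ae' prints 5 and build_number '1.2.3' prints 0.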

If my line counting is correct, the sed is 17 lines vs. 124 for the perl (7.3:1). Furthermore, in spite of the limitations in sed, my regex is more precise:

It allows for 2+ digit decimal #'s in "A.B.C".
It ensures that the delimiter for "A.B.C" really is a '.' -- I believe the unescaped '.', even when quoted, remains a wild card.
"(a|b|rc)" properly embodies the "{a,b,rc}" of the stated problem.
I used '*' in g[0-9A-Fa-f]*; perl would allow '{7}' (GNU sed's -r mode should accept '{7}' as well).
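
To see the wild card point for yourself, here is a quick check with grep -E. The helper name matches, the ^...$ anchors and the test strings are mine, not from the scripts above:

```shell
#!/bin/sh
# Sketch: compare an unescaped '.' with an escaped '\.' in an extended regex.
# matches is a hypothetical helper: prints yes/no for pattern $1 against string $2.
matches() {
    if printf '%s\n' "$2" | grep -Eq "$1"; then echo yes; else echo no; fi
}

loose='^[0-9].[0-9].[0-9]$'        # '.' matches any single character
strict='^[0-9]+\.[0-9]+\.[0-9]+$'  # '\.' is a literal dot; '+' allows 2+ digits

matches "$loose"  '1x2y3'     # yes
matches "$strict" '1x2y3'     # no
matches "$loose"  '10.20.30'  # no
matches "$strict" '10.20.30'  # yes
```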

If you keep the perl, I suggest you look at:

Code:

my $pattern = '([0-9]+)\.([0-9]+)\.([0-9]+)(?:(?:-((?:a|b|rc)[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]{7})?)?';

which I believe embodies my improvements to the regex. Check it carefully -- I don't really write perl. Also, I believe that both regexes could drop the 'A-F' from the 'g[a-fA-F0-9]' -- your orig. statement implies that git would not use upper case in hex #'s.
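
A quick way to check a pattern like this against all four formats from the first post, sketched with grep -E. The helper name is_version and the ^...$ anchors are my additions, and it assumes lower-case hex of exactly 7 characters, as described at the top of the thread:

```shell
#!/bin/sh
# Sketch: check 'git describe' shapes against a tightened extended regex.
# is_version is a hypothetical helper; the anchors are an addition of this sketch.
# Assumes exactly 7 lower-case hex characters after the 'g'.
is_version() {
    printf '%s\n' "$1" |
    grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(-(a|b|rc)[0-9]+)?(-[0-9]+-g[a-f0-9]{7})?$'
}

# The first four are the documented formats; the last two should be rejected.
for v in 1.2.3-rc4-5-g08fd3ae 1.2.3-rc4 1.2.3-5-g08fd3ae 1.2.3 1x2y3 1.2.3-rc; do
    if is_version "$v"; then echo "ok   $v"; else echo "bad  $v"; fi
done
```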

BTW, I was pleasantly surprised that I could do it all in sed & not have to use awk.

BTW#2, I would be interested in seeing your original bash script.