LinuxQuestions.org
Old 07-01-2009, 10:09 PM   #1
neunon
parsing a version number with regular expressions?


Hi folks,

I barely know enough regex to use 'grep' effectively, so I need a bit of assistance here. I also learn by example, so please don't tell me to RTFM. It won't help!

I am trying to parse a version number from 'git describe' which will always fit one of these formats:

Code:
A.B.C-{a,b,rc}D-E-g<7-char hex string>
A.B.C-{a,b,rc}D
A.B.C-E-g<7-char hex string>
A.B.C
My goal is to take this:
Code:
1.2.3-rc4-5-g08fd3ae
and generate a C header like this:

Code:
#ifndef __included_build_number_h
#define __included_build_number_h

#define PROJECT_VERSION_MAJOR 1
#define PROJECT_VERSION_MINOR 2
#define PROJECT_VERSION_REVISION 3
#define PROJECT_VERSION_BUILD 5
#define PROJECT_VERSION_SHORT "1.2.3-rc4"
#define PROJECT_VERSION_LONG "1.2.3-rc4-5-g08fd3ae"

#define PROJECT_RESOURCE_VERSION 1,2,3,5
#define PROJECT_RESOURCE_VERSION_STRING "1, 2, 3, 5"

#endif
If the 'git describe' output omits the 'E-g<7-char hex string>' bit, E should just be substituted with '0'. The 'g<7-char hex string>' can be safely omitted.
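The four shapes above can be pinned down with one anchored extended regex. A minimal sketch, assuming A, B, C, D and E are plain decimal integers and the hash is lowercase hex; `is_describe_version` is a hypothetical helper name, not something from this thread:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 matches one of the four
# 'git describe' shapes listed above.
# Assumption: A, B, C, D, E are decimal integers; the hash is lowercase hex.
is_describe_version() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(-(a|b|rc)[0-9]+)?(-[0-9]+-g[0-9a-f]{7})?$'
}

# Try each documented shape, plus one string that is missing the C part.
for v in 1.2.3-rc4-5-g08fd3ae 1.2.3-rc4 1.2.3-5-g08fd3ae 1.2.3 1.2-rc4; do
    if is_describe_version "$v"; then
        echo "$v: ok"
    else
        echo "$v: malformed"
    fi
done
```

Both optional groups are independent, which is what lets the same pattern cover all four formats; the `^` and `$` anchors keep trailing junk from being accepted.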

I have been using a bash script to do the task, but it's slow and just doesn't feel right. I have an inkling that some sort of regular expression with grep, sed, perl, or something could accomplish the same task, but with much less horrible looking scripting.

Any ideas? Huge thanks in advance!
 
Old 07-02-2009, 02:35 AM   #2
LiamFromLeeds
Will need some tweaking but a starter for 10...
Code:
#!/usr/bin/perl

my $vlong=$ARGV[0];
my ($buildinfo,$tmp,$test,$tmphexstr,undef)=split(/-/,$ARGV[0]);
my ($vmajor,$vminor,$vrev)=split(/\./,$buildinfo);
my $ebit;
my $hexstr;   # declared so the script also runs under "use strict"
my $vshort;
if ($tmp =~ /^a|^b|^rc/) {
        $ebit=$test;
        $hexstr=$tmphexstr;
        $vshort="$buildinfo-$tmp";
} else {
        $ebit=$tmp;
        $hexstr=$test;
        $vshort="$buildinfo";
}

unless ($ebit) {$ebit=0}


print "\n#ifndef __included_build_number_h";
print "\n#define __included_build_number_h\n";

print "\n#define PROJECT_VERSION_MAJOR $vmajor";
print "\n#define PROJECT_VERSION_MINOR $vminor";
print "\n#define PROJECT_VERSION_REVISION $vrev";

print "\n#define PROJECT_VERSION_BUILD $ebit";

print qq(\n#define PROJECT_VERSION_SHORT "$vshort");
print qq(\n#define PROJECT_VERSION_LONG "$vlong");

print qq(\n#define PROJECT_RESOURCE_VERSION $vmajor,$vminor,$vrev,$ebit);
print qq(\n#define PROJECT_RESOURCE_VERSION_STRING "$vmajor, $vminor, $vrev, $ebit");

print "\n#endif\n";
 
Old 07-02-2009, 03:51 PM   #3
neunon
A friend helped me figure this out (the most important part is the $pattern variable):
Code:
#!/usr/bin/perl

use strict;
use warnings;

use File::Basename;
use File::Spec::Functions qw(rel2abs);

my $in_git = 0;
my $scriptpath = rel2abs(dirname($0));
my $outfile = $ARGV[0];

my $releasever;

if (open RELEASE, "<", "$scriptpath/release_ver") {
	$releasever = <RELEASE>;
	close RELEASE;
}

print "Is this project under Git? ";
if (-d "$scriptpath/../.git" ) {
	print "Yes\n";
	print "Is Git installed? ";
		if ( ! system("which git > /dev/null 2>&1") ) {	# "&>" is a bashism; system() runs /bin/sh
		print "Yes\n";
		$in_git = 1;
	} else {
		print "No\n";
		$in_git = 0;
	}
} else {
	print "No\n";
	$in_git = 0;
}

my $verstring = "";

if ($in_git == 0) {
	$verstring = $releasever;
} else {
	$verstring = `git describe --tags --long 2> /dev/null || git describe --tags`;
}

chomp($verstring);

my $pattern = "([0-9]).([0-9]).([0-9])(?:(?:-([a-zA-Z]+[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]+)?)?";

if ($verstring =~ $pattern) {
} else {
	die "Version string '$verstring' is malformed...\n";
}

my $major = $1;
my $minor = $2;
my $revis = $3;
my $build = $5;
my $pre   = $4;

if ( !$build ) {
	$build = "0";
}

if ( $pre ) {
	# We have a prerelease version.
	$pre = "-$pre";
} else {
	$pre = "";
}

unlink("$outfile.tmp");

my $prefix = "CC_LIB";
my $tag    = "cc";

open OUT, ">", "$outfile.tmp" or die $!;
print OUT <<__eof__;
#ifndef __included_${tag}_build_number_h
#define __included_${tag}_build_number_h

#define ${prefix}_VERSION_MAJOR ${major}
#define ${prefix}_VERSION_MINOR ${minor}
#define ${prefix}_VERSION_REVISION ${revis}
#define ${prefix}_VERSION_BUILD ${build}
#define ${prefix}_VERSION \"${major}.${minor}.${revis}${pre}\"
#define ${prefix}_VERSION_STRING "${verstring}"

#define ${prefix}_RESOURCE_VERSION ${major},${minor},${revis},${build}
#define ${prefix}_RESOURCE_VERSION_STRING \"${major}, ${minor}, ${revis}, ${build}\"

#endif

__eof__
close OUT or die $!;

use Digest::MD5;

my $ctx = Digest::MD5->new;

my $md5old = ""; my $md5new = "";

if (-e $outfile) {
	open OUT, "$outfile" or die $!;
	$ctx->addfile(*OUT);
	$md5old = $ctx->hexdigest;
	close OUT
}

open OUT, "$outfile.tmp" or die $!;
$ctx->addfile(*OUT);
$md5new = $ctx->hexdigest;
close OUT;

use File::Copy;

if ($md5old ne $md5new) {
	if (-e $outfile) {
		unlink($outfile) or die $!;
	}
	move "$outfile.tmp", $outfile or die $!;
	print "$outfile updated.\n";
} else {
	unlink ("$outfile.tmp");
	print "$outfile is already up to date.\n";
}

Last edited by neunon; 07-02-2009 at 03:52 PM. Reason: adding a note
 
Old 07-04-2009, 04:04 PM   #4
archtoad6
Are you happy w/ the Perl solution?

Would you like to explore sed or awk solutions?

Do you understand regexes any better?
 
Old 07-04-2009, 11:11 PM   #5
neunon
Quote:
Originally Posted by archtoad6
Are you happy w/ the Perl solution?

Would you like to explore sed or awk solutions?

Do you understand regexes any better?
The Perl solution works for me, but I'd be interested to see sed or awk do the task. (Perl was a couple of orders of magnitude faster than Bash in this case, so sed/awk would be interesting.)
 
Old 11-14-2009, 08:01 AM   #6
archtoad6
Well, the sed is done. I've been thinking about it since we each last posted, but this week I finally found time to concentrate on it.
Code:
PV='PROJECT'
echo '1.2.3-rc4-5-g08fd3ae' |\
sed -r 's,([0-9]+)\.([0-9]+)\.([0-9]+)(-(a|b|rc)[0-9]+)?(-([0-9]+)-g[0-9A-Fa-f]*)?,\
#ifndef __included_build_number_h\
#define __included_build_number_h\
\
#define '${PV}'_VERSION_MAJOR \1\
#define '${PV}'_VERSION_MINOR \2\
#define '${PV}'_VERSION_REVISION \3\
#define '${PV}'_VERSION_BUILD \7\
#define '${PV}'_VERSION_SHORT "\1\.\2\.\3\4"\
#define '${PV}'_VERSION_LONG "\0"\
\
#define '${PV}'_RESOURCE_VERSION \1\,\2\,\3\,\7\
#define '${PV}'_RESOURCE_VERSION_STRING "\1\, \2\, \3\, \7"\
\
#endif,'
It's tested -- what you see produces:
Code:
#ifndef __included_build_number_h
#define __included_build_number_h

#define PROJECT_VERSION_MAJOR 1
#define PROJECT_VERSION_MINOR 2
#define PROJECT_VERSION_REVISION 3
#define PROJECT_VERSION_BUILD 5
#define PROJECT_VERSION_SHORT "1.2.3-rc4"
#define PROJECT_VERSION_LONG "1.2.3-rc4-5-g08fd3ae"

#define PROJECT_RESOURCE_VERSION 1,2,3,5
#define PROJECT_RESOURCE_VERSION_STRING "1, 2, 3, 5"

#endif
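One case the sample input does not exercise is a describe string without the "-E-g<hex>" suffix. There group 7 matches nothing, so \7 expands to an empty string and the BUILD line would be blank rather than 0. A minimal sketch of one way to default it in the shell before filling in the template; `build_number` is a hypothetical helper name, and `sed -E` is used here as the equivalent of `sed -r`:

```shell
#!/bin/sh
# Hypothetical helper: print the build count E from a describe string,
# or 0 when the "-E-g<hex>" suffix is absent.
# Note: 'build' is a plain global variable, since POSIX sh has no 'local'.
build_number() {
    # -n plus the 'p' flag prints only when the substitution succeeds,
    # so inputs without the suffix produce no output at all.
    build=$(printf '%s\n' "$1" |
        sed -E -n 's/^[0-9]+\.[0-9]+\.[0-9]+(-(a|b|rc)[0-9]+)?-([0-9]+)-g[0-9a-f]+$/\3/p')
    # ${build:-0} substitutes 0 for an empty result.
    echo "${build:-0}"
}

build_number 1.2.3-rc4-5-g08fd3ae   # prints 5
build_number 1.2.3-rc4              # prints 0
build_number 1.2.3                  # prints 0
```

The result could then be spliced into the replacement text the same way '${PV}' is, in place of \7.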
If my line counting is correct, the sed is 17 lines vs. 124 for the perl (7.3:1). Furthermore, in spite of the limitations in sed, my regex is more precise:
  • It allows for 2+ digit decimal #'s in "A.B.C".
  • It ensures that the delimiter for "A.B.C" really is a '.' -- I believe the unescaped '.', even when quoted, remains a wild card.
  • "(a|b|rc)" properly embodies the "{a,b,rc}" of the stated problem.
  • I was stuck w/ '*' in g[0-9A-Fa-f]*; perl would allow '{7}'.
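
The points about the literal '.' and the hash length can be checked from the shell. A minimal sketch with hypothetical helper names (`loose_match`, `strict_match`), assuming a grep that supports -E; as far as POSIX extended regexes go, interval expressions such as '{7}' are part of the syntax, so the same construct should also be usable with `sed -r`:

```shell
#!/bin/sh
# Hypothetical helpers contrasting a loose and a strict pattern.

# Loose: unescaped '.' matches any character; '*' accepts any hash length.
loose_match() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+.[0-9]+.[0-9]+-[0-9]+-g[0-9a-f]*$'
}

# Strict: '\.' is a literal dot; '{7}' requires exactly seven hex digits.
strict_match() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-g[0-9a-f]{7}$'
}

# One well-formed string, one with wrong separators, one with a short hash.
for v in 1.2.3-5-g08fd3ae 1x2y3-5-g08fd3ae 1.2.3-5-g08; do
    loose_match "$v" && l=yes || l=no
    strict_match "$v" && s=yes || s=no
    echo "$v loose=$l strict=$s"
done
```

Only the first string passes the strict pattern; the loose one accepts all three.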

If you keep the perl, I suggest you look at:
Code:
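# qr// keeps "\." literal (a double-quoted string would collapse it to "."); [0-9]+ captures the D in "rc4".
my $pattern = qr/([0-9]+)\.([0-9]+)\.([0-9]+)(?:(?:-((?:a|b|rc)[0-9]+))?(?:-([0-9]+)-g[a-fA-F0-9]{7})?)?/;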
which I believe embodies my improvements to the regex. Check it carefully -- I don't really write perl. Also, I believe that both regexes could drop the 'A-F' from the 'g[a-fA-F0-9]' -- your orig. statement implies that git would not use upper case in hex #'s.


BTW, I was pleasantly surprised that I could do it all in sed & not have to use awk.

BTW#2, I would be interested in seeing your original bash script.
 
  

