Forming a lex (or flex) regexp
For all you compiler hackers out there: this is a question about writing a specification for flex, the scanner generator. I am using flex 2.5.35.
My problem is to write a specification that will break up an NMEA-0183 AIS string into recognizable components. In case you don't know, AIS strings look like this:

!AIVDM,1,1,,B,15M5c<0000G?j?HK@;F005U<04KH,0*4E
!AIVDM,1,1,,B,15M>16?P00G?j9nKAFcV1ww:20Su,0*29
!AIVDM,1,1,,B,15N@wP0P00o?ruLK?UMMbOw>04KH,0*31
!AIVDM,1,1,,B,15Mj2u001vo?tV8K?<ub>8;@0D1<,0*17
!AIVDM,2,1,3,B,55P5TL01VIaAL@7WKO@mBplU@<PDhh000000001S;AJ::4A80?4i@E53,0*3E
!AIVDM,2,2,3,B,1@0000000000000,2*55

That is, roughly:

!AIVDM COMMA NUMBER COMMA NUMBER COMMA OPTIONAL_NUMBER COMMA [A | B] COMMA JUNK_TO_BE_DISCUSSED COMMA ZERO SPLAT HEXDIGIT HEXDIGIT CR LF

The problem is in the JUNK part. This is ASCII-ized binary crud similar to Base64-encoded data. It can contain basically anything except delimiters like (,!$*) etc.

The problem I am having is that my recognizers for decimal numbers are hitting sequences at the beginning of that junk, or sometimes at the end, so the junk sequence "15M>16?P00G?j9nKAFcV1ww:20Su" might hit on a number 15, followed by junk.

So the basic problem is: how do you construct a specification that will filter a really "promiscuous" field out of more "restricted" data? What is causing this (it seems) is the fact that the JUNK field can contain a lot of, well, junk that is easily mistaken for almost anything else.

Any ideas what to do about this? I can post a lex file and data input if anyone cares.

Thanks
Eric
Hard to say what you need to satisfy the bigger picture, but here's something to start with
Code:
%%
Since your definition seems to indicate that commas are used exclusively as delimiters, perhaps an easier approach would be to break out the good old strtok() function.
--- rod.
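One caveat with the comma-splitting idea: strtok() treats a run of adjacent delimiters as a single delimiter, so the empty optional field in "1,1,,B" would silently vanish and every later field would shift left by one. Here is a minimal sketch of a splitter that keeps empty fields, using only strchr() and strcspn(). The name split_fields and its signature are hypothetical, made up for illustration, and the sketch assumes the line buffer is writable and that the caller validates the individual fields afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): split `line` in place on
 * commas, keeping empty fields. Stores at most `max` field pointers in
 * `out` and returns how many were stored; any fields beyond `max` are
 * ignored. The buffer is modified: commas become '\0'. */
size_t split_fields(char *line, char **out, size_t max)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && out != NULL);

    /* Cut the line at the first CR or LF, if there is one. */
    line[strcspn(line, "\r\n")] = '\0';

    while (n < max) {
        out[n++] = p;           /* field starts here, may be empty */
        p = strchr(p, ',');     /* find the next delimiter */
        if (p == NULL)
            break;              /* that was the last field */
        *p++ = '\0';            /* terminate field, step past the comma */
    }
    return n;
}
```

Called on the second sample sentence with room for eight fields, this yields seven fields: "!AIVDM", "1", "1", an empty string, "B", the payload, and "0*29". Because the payload is picked out purely by position, its contents never get a chance to be mistaken for a number. The last field still has to be split at the '*' to separate the fill-bit count from the checksum.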