Forming a lex (or flex) regexp
For all you compiler hackers out there: this is a question about writing a specification for flex, the scanner generator. I am using flex 2.5.35.
My problem is to write a specification that will break up an NMEA-0183 AIS string into recognizable components. In case you don't know, AIS strings look like this:
That is, roughly:
!AIVDM COMMA NUMBER COMMA NUMBER COMMA OPTIONAL_NUMBER COMMA [A | B] COMMA JUNK_TO_BE_DISCUSSED COMMA ZERO SPLAT HEXDIGIT HEXDIGIT CR LF
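To make the layout concrete, here is a rough sketch of the same grammar as one anchored regular expression. It is in Python rather than flex, purely for illustration. The group names (`num1`, `num2`, `opt_num`, `channel`, `junk`, `checksum`) are my own labels, the excluded-character set for the junk field is an assumption based on the delimiters I mention below, and the sample sentence is made up: it reuses the junk payload quoted further down and carries a placeholder `00` checksum, not a real one.

```python
import re

# Field layout taken from the rough grammar above.
# Group names are illustrative labels, not official NMEA/AIS field names.
AIVDM = re.compile(
    r"!AIVDM,"
    r"(?P<num1>[0-9]+),"           # NUMBER
    r"(?P<num2>[0-9]+),"           # NUMBER
    r"(?P<opt_num>[0-9]*),"        # OPTIONAL_NUMBER (may be empty)
    r"(?P<channel>[AB]),"          # A or B
    r"(?P<junk>[^,!$*\r\n]+),"     # JUNK: anything but delimiters (assumed set)
    r"0\*(?P<checksum>[0-9A-Fa-f]{2})"  # ZERO SPLAT HEXDIGIT HEXDIGIT
    r"\r\n"                        # CR LF
)

def split_sentence(line):
    """Return the fields as a dict, or None if the line does not fit the layout."""
    m = AIVDM.fullmatch(line)
    return m.groupdict() if m else None

# Hypothetical sample sentence; the checksum digits are placeholders.
sample = "!AIVDM,1,1,,A,15M>16?P00G?j9nKAFcV1ww:20Su,0*00\r\n"
fields = split_sentence(sample)
print(fields)
```

Here the position in the sentence, not the characters themselves, decides what each field is. What I can't work out is how to express that same idea in a flex specification.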
The problem is in the JUNK part. This is ASCII-ized binary crud similar to Base64-encoded data. It can contain basically anything except delimiters such as `,`, `!`, `$` and `*`.
The problem I am having is that my recognizers for decimal numbers are matching sequences at the beginning of that junk, and sometimes at the end. For example, the junk sequence "15M>16?P00G?j9nKAFcV1ww:20Su" might be scanned as the number 15, followed by junk.
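The overlap is easy to reproduce outside flex. This small Python sketch assumes a decimal-number rule of the form `[0-9]+`, which is a stand-in and not my actual lex rule. Applied at the start of the junk sequence above, it matches happily. Whether a real scanner ends up choosing that match depends on the other rules and their order, which this sketch does not model.

```python
import re

# Assumed shape of a decimal-number rule; a stand-in, not the real lex rule.
NUMBER = re.compile(r"[0-9]+")

junk = "15M>16?P00G?j9nKAFcV1ww:20Su"

# Try the number pattern at the very start of the junk field.
m = NUMBER.match(junk)
prefix = m.group()       # the part that looks like a number
rest = junk[m.end():]    # what is left over as "junk"
print(prefix, rest)
```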
So the basic question is: how do you construct a specification that will separate a really "promiscuous" field from more "restricted" data? The cause, it seems, is that the JUNK field can contain a lot of, well, junk that is easily mistaken for almost anything else.
Any ideas what to do about this? I can post a lex file and data input if anyone cares.