LinuxQuestions.org
Old 03-11-2014, 03:01 PM   #16
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,883
Blog Entries: 13

Rep: Reputation: 4930

I didn't want to get too expansive before, but it sounded like you're parsing a protocol and now it sounds exactly like that.

My question about syntax was for that exact reason.

I use state machines to de-frame, validate checksums (if present/available) and classify packets and then have a secondary function parse the frame.

For instance, look at the NMEA 0183 GPS frames: there are several types, and all of them are ASCII, comma-delimited data.

Each frame ends with \r\n (<0x0d><0x0a>); the next one always starts with $ and runs until *, which is followed by a two-character hexadecimal checksum.

The state machine looks at every character to validate the rules of the protocol and then calls a secondary parser function which is more specific to the packet type. Or if all packets are the same format and it's not too complicated I may choose to parse right out of the state machine.

My points there:
  1. The data is continuous forever
  2. I encounter it at a random point when I start up, because it's serial data streaming from a sensor device; so I end up discarding data until I see an End of Frame ( \r\n ) and then search for a Start of Frame ( $ )
  3. I maintain my start pointer and move ahead until I see the ASTERISK, updating my running checksum as I go, and then validate my calculated checksum against the one given
  4. From that point I have a known good frame, or packet. I can then look at the characters following the $, determine what "type" of packet it is, and call a secondary function with a pointer to the start of that packet type
  5. If there are protocol violations at any point, then the packet is invalid, I ignore it and try to re-sync on the next packet
This may not be exactly what you're doing; however you are reading lines out of a file, those lines are either comments, valid data, or incorrectly formatted data. It may be useful to validate whether or not the data is in the correct form before you proceed.
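The framing steps listed above can be sketched roughly as follows. This is only an illustration, not the code I actually run: the names (nmea_framer, nmea_feed and so on) are made up for this post, the checksum is assumed to be the standard NMEA 0183 one (XOR of every character between the $ and the *), and the sketch re-syncs simply by waiting for the next $ rather than hunting for \r\n first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch of the de-framing idea only. */
enum nmea_state { WAIT_START, IN_BODY, CKSUM_HI, CKSUM_LO };

struct nmea_framer {
    enum nmea_state state;
    char    buf[96];   /* characters between '$' and '*', NUL-terminated */
    size_t  len;
    uint8_t calc;      /* running XOR of the body */
    uint8_t given;     /* checksum read from the frame */
};

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

void nmea_init(struct nmea_framer *f)
{
    f->state = WAIT_START;
    f->len = 0;
    f->calc = 0;
    f->given = 0;
}

/* Feed one received character.
   Returns 1 when a checksum-valid frame has just completed (body in f->buf),
   otherwise 0. Any protocol violation drops the frame and waits for '$'. */
int nmea_feed(struct nmea_framer *f, char c)
{
    int v;

    if (c == '$') {                 /* start of frame always re-syncs */
        f->state = IN_BODY;
        f->len = 0;
        f->calc = 0;
        return 0;
    }
    switch (f->state) {
    case WAIT_START:                /* discard until the next '$' */
        break;
    case IN_BODY:
        if (c == '*') {
            f->buf[f->len] = '\0';
            f->state = CKSUM_HI;
        } else if (c == '\r' || c == '\n' || f->len + 1 >= sizeof f->buf) {
            f->state = WAIT_START;  /* frame ended early or is too long */
        } else {
            f->buf[f->len++] = c;
            f->calc ^= (uint8_t)c;
        }
        break;
    case CKSUM_HI:
        v = hex_val(c);
        if (v < 0) {
            f->state = WAIT_START;
        } else {
            f->given = (uint8_t)(v << 4);
            f->state = CKSUM_LO;
        }
        break;
    case CKSUM_LO:
        v = hex_val(c);
        f->state = WAIT_START;
        if (v >= 0 && (f->given | v) == f->calc)
            return 1;               /* known good frame */
        break;
    }
    return 0;
}
```

The caller would feed every received byte to nmea_feed() and, whenever it returns 1, look at the start of f->buf to decide which secondary parser to call.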

From there you can validate the syntax of individual packets and convert data if required. A technique I use to convert ASCII strings of unknown length into numeric variables is to use two character pointers: start both at the first character of the field, advance one until it reaches the next delimiter, terminate the string there, and then call atoi().
Code:
    // $GPGSV,3,1,11,10,63,137,17,07,61,098,15,05,59,290,20,08,54,157,30*70
    // Say I've validated that this is $GPGSV, therefore I know in my parse function
    // I'm being passed the pointer to $ or G depending how I've coded it
    // Say it was the $, I then set up p1 and p2 accordingly.
    // (Needs <stdlib.h> for atoi(); gps_data_t is defined elsewhere.)
void parse_gpgsv(char *frame, gps_data_t *return_structure)
{
    char *p1, *p2;

    // validate pointers - omitted

    p1 = frame;
    p1 += 7;  // skip "$GPGSV," to the location where the first number is supposed to be
    p2 = p1;

    // move p2 up to the next delimiter, in these packets they are COMMAS or ASTERISKS
    // (the conditions must be joined with &&; with || the test is always true and the
    // loop never ends; the '\0' check stops at the end of the string)
    while ((*p2 != ',') && (*p2 != '*') && (*p2 != '\0')) {
        p2++;
        // One can also check each character in the string to make sure it's a digit, or hex-digit, or whatever you require
    }
    *p2 = 0;  // this terminates the string containing the satellite PRN number

    // At this point, p1 is pointing to the start of an ASCII number string and it is now NUL-terminated, I can use atoi()
    // (Note: in the sample sentence above the PRN is really the fourth field, after
    // the three count fields; it is treated as the first here only to show the technique.)
    return_structure->sats.sat_01.prn = atoi(p1);

    // I could opt to restore the value at p2 back to ',', but why bother?
    p2++;
    p1 = p2;

    ...

    // Continue using this logic while you go through the packet
}
In this case the lengths of each field are actually fixed; however there are cases where I decode variable length fields, hence the pointer method shown.

The bottom line is that there has to be a set of syntactical rules; otherwise the parser becomes too complicated.

Therefore if your data file has a set of random rules, then it will be that much more difficult to parse.

The reason why a lot of individual validations are left out is because the protocol has a frame format and checksum, therefore the state machine has already validated whether or not the packet is intact and valid. One can argue that checksums won't catch all errors; therefore other validations someone could add are things like verifying each character is a digit in that while loop and also passing the overall packet length to the secondary function so that the secondary function can check that pointer movement has not proceeded beyond the limits of the packet.
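As a rough sketch of those extra validations, something like the following checks every character and never walks past the end of the packet. The function name, its parameters and its return convention are all invented for this example, and unlike the two-pointer version it does not modify the frame.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse the unsigned decimal field starting at
 * frame[*pos]. frame_len is the overall packet length passed in by the
 * caller, so the scan can never run beyond the packet.
 * On success: stores the number in *value, moves *pos just past the
 * delimiter (',' or '*') and returns 0.
 * Returns -1 for an empty field, a non-digit character, a missing
 * delimiter, or a number too large for unsigned.
 */
int parse_uint_field(const char *frame, size_t frame_len,
                     size_t *pos, unsigned *value)
{
    size_t i = *pos;
    unsigned n = 0;

    while (i < frame_len && isdigit((unsigned char)frame[i])) {
        unsigned d = (unsigned)(frame[i] - '0');
        if (n > (UINT_MAX - d) / 10)
            return -1;              /* would overflow */
        n = n * 10 + d;
        i++;
    }
    if (i == *pos)
        return -1;                  /* empty field or not a digit */
    if (i >= frame_len)
        return -1;                  /* ran off the end, no delimiter */
    if (frame[i] != ',' && frame[i] != '*')
        return -1;                  /* stray character inside the field */

    *value = n;
    *pos = i + 1;
    return 0;
}
```

Calling it repeatedly with the same pos variable walks the packet field by field, and the first -1 tells you the packet should be dropped.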

Another case may be expandable data. We've had cases where we generate CSV files for data sets where one set is measured more often than the others. Say "A" is measured once per second, "B" every two seconds, and "C" every three seconds; we then generate a CSV like the following:
Code:
A,B,C
A,,
A,B,
A,,C
A,B,
A,,
A,B,C
As a result, when imported into Excel the columns match up properly. But to parse this data file, one has to understand that "A" is always supposed to be there, while "B" or "C" may be absent but should appear at regular intervals. When software picks up the data without knowing where it is starting, it then has to decode what packet type each line is.
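One thing to watch when reading such a file back in C: strtok() treats a run of commas as a single delimiter, so the empty "B" and "C" columns would silently disappear. A minimal sketch of a splitter that keeps empty fields is below; the function name and the in-place approach are just assumptions for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split one CSV line in place on commas, keeping
 * empty fields. A trailing \r or \n ends the line. Stores pointers to the
 * fields in fields[] and returns how many were stored (at most
 * max_fields; anything beyond that is ignored).
 */
size_t split_csv_line(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *start = line;
    char *p = line;

    if (max_fields == 0)
        return 0;

    for (;;) {
        if (*p == ',' || *p == '\0' || *p == '\r' || *p == '\n') {
            char c = *p;
            *p = '\0';              /* terminate the current field */
            fields[n++] = start;
            if (c != ',' || n == max_fields)
                break;              /* end of line, or no room left */
            start = p + 1;          /* next field begins after the comma */
        }
        p++;
    }
    return n;
}
```

After the split, a field whose first character is '\0' simply means that value was not measured in that row, so a line like "A,,C" still yields three fields with the middle one empty.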
 
Old 03-12-2014, 10:51 AM   #17
PeterUK
Member
 
Registered: May 2009
Posts: 281

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by NevemTeve View Post
Code:
if ((number1 > 0) && (number2 == 0) && (number21 == 0)) {number_to = number1;}
Do you remember writing this line?
:-)

Thanks for the following code:
Quote:
Note: You should do something like this:

Code:
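#include <ctype.h>   /* isdigit */
#include <stdio.h>   /* EOF */

/* Convert the decimal string p into *to.
   Returns 0 on success, EOF if p is empty or contains a non-digit.
   Overflow is not checked. */
int numval (const char *p, int *to)
{
    int err;
    int n= 0;
    const char *q= p;

    while (isdigit ((unsigned char)*q)) {
        n= n*10 + (*q-'0');
        ++q;
    }
    if (q==p || *q!='\0') err= EOF;   /* no digits at all, or junk after them */
    else                  err= 0;
    *to= n;
    return err;
}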
I've broken this code into parts, printed and checked what it's doing, and it is very similar to mine (of course much cleaner).
Did you make it by looking at what I was doing and come up with something better? Could you please explain how you thought your way to it?

Also, I still don't know why you are using a "const char". I have tested passing a "char *p" and it also works.

I also tested using "while (isdigit (*q))" and it also seems to be working. Why do you cast to "unsigned char"?
 
Old 03-12-2014, 12:27 PM   #18
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,871
Blog Entries: 1

Rep: Reputation: 1871
1. 'const char' is a promise to the caller that the function won't change its input.

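2. Casting to 'unsigned char' is reasonable if you think of characters that are above 127 -- or below 0, if the character type is signed. Passing a negative value other than EOF to isdigit() is undefined behaviour, so the version without the cast only appears to work because your test data is plain ASCII. (Note: in EBCDIC even the digits and letters are above 127.)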
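To make both points concrete, here is a small sketch; count_digits is a made-up function for this post, not something from the thread.

```c
#include <ctype.h>
#include <stddef.h>

/* 'const char *' documents that the function only reads the string, so
   callers may pass string literals or buffers they do not want changed;
   the compiler rejects any accidental write through s. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: a byte such as 0xE9 is negative where plain char is
           signed, and isdigit() is only defined for EOF and values that
           fit in an unsigned char. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

With plain ASCII input the cast changes nothing, which is why the version without it seemed fine in testing; it only matters once a byte above 127 shows up.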
 
  


All times are GMT -5.