LinuxQuestions.org
Playing with parsing

Posted 01-02-2013 at 07:46 PM by rainbowsally
Updated 08-06-2014 at 08:36 AM by rainbowsally (typo correction)

Today's features:
  • Creating an extensible mid-level parser that makes sense in C.
  • Looking at BNF-like syntax and C syntax similarities.
  • Tossing flex as far as we can throw it. While it's still burning.

The file set here isn't 'finalized' in any way, but it has been stable for a few days. These examples are therefore not about THIS particular set of functions but about parsing in general, although this file set will very likely become part of the mc2 parsing system, replacing the dog that is currently implemented. :-)

[Re: "dog". Wow. Name clashes. This version may have the same problem but who knows at this point. Whatever the final version is, it can't use namespaces because then it can't be used in C.]

-----------------------------
Note: These are compiled with gcc, but the files are named *.cpp. This is for the additional error checking done in C++, though the files will likely be renamed *.c after being fully tested with the more strict types and syntax checking.
-----------------------------

What's parsing?

It's taking input in one format and converting it to another. The input is usually text, but it could be anything that can be divided up into 'parcels' (parsed) and translated into another form.

Adding actions to a parser is fairly easy, and after several bouts with flex and bison I'm more convinced than ever that both of those applications need to be destroyed. Flex is the worst: it has clobbered my main file by overwriting it with its own output one too many times. Bison is a bit less of a mess, but the inherited hidden functions, the spaghetti code inherent in the system, and the difficulty of debugging and even reading the generated code make it more like machine language written for an unknown virtual machine than actual C code.

As a result, both of those applications have their own languages, and neither of them makes sense on any level but the abstract.

So then what would we require of a somewhat comprehensible parser that can recognize patterns in text?

1. It needs to tell us if the pattern matches.
2. It needs to be able to do something if it does.
3. It needs to know when an error occurs and spit out a syntax error message or some other error message that makes sense in the context of the parsing job.
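The three requirements can be seen in miniature in a hand-written rule. This is a minimal sketch, not code from parser2.cpp: the rule name `rule_print`, its tri-state return (1 = matched, 0 = not this rule, -1 = syntax error) and the error text are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: "print" followed by a space and one digit,
   e.g. "print 7".  Returns 1 on a match, 0 if the input is not this
   rule at all, and -1 on a syntax error inside the rule. */
static int rule_print(const char *in, char *out, size_t outsize,
                      char *err, size_t errsize)
{
    assert(in && out && err);

    /* 1. Does the pattern match?  If the keyword isn't here, say so. */
    if (strncmp(in, "print ", 6) != 0)
        return 0;

    if (in[6] >= '0' && in[6] <= '9') {
        /* 2. Matched: do something (here, emit a line of C). */
        snprintf(out, outsize, "putchar('%c');", in[6]);
        return 1;
    }

    /* 3. The keyword matched but the rest did not: report it. */
    snprintf(err, errsize, "syntax error at column 7: expected a digit");
    return -1;
}
```

Separating "not this rule" (0) from "this rule, but broken" (-1) is what lets a caller try other rules quietly in the first case and stop with a sensible message in the second.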

It also needs some other features that deal with issues that come up often, such as the ability to UNDO changes made when a long parse branch fails to match.

Some of these additional features, like UNDO, may seem complicated or inconvenient, but they're not.

------------------------------
These tests will be far more interesting if you trace them in a good debugger like kdbg (v. >= 5.0) or some other insight-like gdb front end.

Also, it would help if you know a bit of BNF-like syntax (and C or C++), though it shouldn't be too hard to figure out what we're talking about if you don't.
-------------------------------

All of these tests will use a variation of the mc2 Parser that will possibly replace the mc2 version.

They have no mc2 dependencies at this point, though the makefile will be far easier to generate if you have mc2, and let's face it, after all these years 'make' still rocks -- once you get past the nit-picky syntax. [So we here at the Computer Mad Science Dept. don't have a problem with oddball syntaxes per se, just the ones that cause more problems than they solve. -rs]

Here's a self-extractor for the lib functions that we will 'include' directly rather than link with our test files. It will go into a new subdirectory named src/inc.

It will create:
parser2.cpp parser2.h parser2-numbers.cpp parser2-numbers.h

It replaces about a thousand lines of code with a couple hundred lines of lzma-compressed data, which requires the now-standard lzma/xz compression utilities and the base64 utility, which is part of GNU coreutils.

This can be extracted from the command line with
Code:
sh parser-inc-files.sfxz
or made executable and run like any other shell program.

file: parser-inc-files.sfxz
purpose: utility (executable)
Code:
#!/bin/sh
# base64 self extractor
# file: inc

#####################################################
## created with new.sfxz, with two modifications
mkdir -p src    # make sure the upper subdir exists
cd src          # go there to make the src/inc folder
#####################################################

filename=inc

# don't allow user to click on the file
if ! tty >/dev/null; then
  xmessage -center "
  $(basename $0)
  must be run in a terminal 
  "
  exit
fi

# create and/or clear out tmp dir
mkdir -p $HOME/tmp/sfxz
rm -rf $HOME/tmp/sfxz/*
base64 -d << _BEOF >$HOME/tmp/sfxz/$filename.xz
/Td6WFoAAATm1rRGAgAhARYAAAB0L+Wj4J//IkVdABcLyaeZgp21tguQY3KNJqRQnbVqQxuWivXt
0VxeQ7ZgFO9mmOmoJgQ+yM4y6fiYDTb68E0gEUQ/td2mIvXuOStQwTg+b6LcsJP0+ZofSG6a8eux
/Yqru5qC/8ATIegtIqh+O4ksWHc0C9/S55RGrchNdd73Yj78+SGQ+EMz1X537YO9IIYhVMpTyHgm
O1GBP54voVywHubX0SCvu8sPw6eUKsmyrtdxRBNdK97tzpXGa7OVLy3mZNIw2CDPcgradz0/Ipbm
T0BY2N5tC3hDZ1d3090MwsWTpewmP3Shlw4/kIniEPACa8FklJRwnnPlenzwtTru6aQON7mBM7fz
HdzksBavu3stuHRx1cJbRXBz0h9IDwho4k/6bGU8N0QQFAYg1aFdbGDLwA4aCRvPA58g/RrlqvkV
TXm15mTBLYSDUaZmGV33gUJMpi6ng6VM10Az4ycS3Q5wp4yFKG8oAdkTaKL2Bqy/8bOz+C/D/yCU
F+3bV5/76CO1sUfsMqNEdkpUYGoSCo1COKjFwXzQBSVOA1o0BvhEbFnfcFOdMjsLr2Ms/3PXAPQf
HYvSvYkeLMwTW/NUC0Ysq/o8E3MWkKSwd3oVxzP6T6V8GaorsCj+Ae3kuplKM3kHMgW3ZLiqPuWm
RuCFSPwtoUKnkOXUWeUwW+GwZllwfrMKDSG0KaWJdppUWPqImSBy9RNlnMvkaotwZK7bY//CoVhJ
wQ4jFWkr9tPMRNQIfrX22hIPUTEgGjTYMtGM8DJFylGhCLmHQbkV0EZmmQFDL0y5d+nS2yvn1Il/
lnZxgpam8XaAndH2+5s1U2rNACm8C5W8biyvmcEZNSJreylCHMX+M0FytFJQKPlyHz9Be5PlFiKn
SzOFzc3BpFfFtR8eBSlPF7/D/6nVWFHvgEcB4kHhVGPD7dpPFWwwNkGuWFB5Ux3FMkbB3lVhhB1A
ERHaWR/77DuiepXlxwaU6DC8PTaSmOqt+e0+MrECDJaZ6Ce26A/rCC6jbIIulL58wux2j4E1oVQs
Vq/77AB1/OfKxVKMP9juhTczXQCQ2FW9D6Ycd2ScG+TashgKFBNLU82LiSeABS6F1r2STHSQQpj5
OUoeRfzukdnzCZ7sFUwThIWwQnJhkg2bR05PkAkjzH4MD8JXHDFf/FNHetkpjyHP6GtzA65f/Fli
Jidba+CcsNNi7me8A4aYRXTEDNsfZyhGr/p7J/t/KgWENvzxcsx7+dznfUWoKMV0NBvAeWrhLD9M
/K1xzviEp2h7Qc+UnkCnd1Dd96WS/iwDboOc0ZPVWJ3sGjNRI52Ue9jUN6ZvtXXAp5Mi55NxulKE
mVttKFjqJv8Vs9/+QFHb22zJPQ4MwDmLiUhOuJHPMeJljwhPG4JP9QxAqkY/L6b+6bTk4gbagRkA
v6UQmTh2OA/ZeTJL5laJoUnvynMp2t4oYPygz3cD7HNOghzUB5AHWQdObx5CSNz7wJj0QVE6DtWn
wZ93e3Pq41+xD5XeKocLseX7RW7WCq4havKe4j8Elq5GKjM1gjCTo4XnolztrYnjNoFgZrJkO3jH
3Hx+5IbbCZjPhR3oWaW34v0lrpf3QtgAE5D4xm+BM4iY62k5YOA9jqmoMEFdZ/KQ2RyoGm/GW+hM
sMgj5aMDUUnsUsETaaDPVWG5RbpIWFOJ1/xuyYvepOrKqi/EZherlQ072Mgt6dzCVuwtAMvvMAkz
HNI0KN8nN+TxB9B06h0s2f1AvX52Nlewk/6T5KUvsu03aamSW3uYr/PaOQBNbo5q+/yLBPO9W3sG
dt5u1qUJXDoxstzh75p8LU+oXOFbmTSwRwYJnNOKB7bi9vgnDXSZOajn+yo6ZOZfT2K8qxbVMjIL
CBQqOV22b4R+5aFGYAfXfmYYalHzv3Tcbu85bEwXJK6IqpOFOoSOHvm2nYCZwa1qjpV4YUk7La12
dy0GHEgdaekJvC0jkZVNtvmHuc9L6HdqGMEhHAX9KqH3F/4SX9VLyUUj1GsDBpxs65uLSxn9L+YI
fif4IxCTY0LOasJmI02cuQ7G2i/VPlGnT45PKIWI2/cjWPpQYe/HjHJpnUGEKm4rDcpEEeKeJaLk
9YVEN1OaQWXqSpL1Mi/MFXi6rq9oIe6XKxnkGroREdFg/qj257CcF1PFMyPThORVX/YfA4xqFAkf
b/MZALh0sHOIxZVrq0f9WDRpjlbYRdKBgkvkyAWnnlUjVRsHhi3O0IBPrcMt1kNraO+GsnkJjQaP
aSIEPiVyAH0RrPKOHSCdqB5L8V8pG9v1jv2yrmJxUTkJaSpWkgDyZpl3nmU+WIGj9xW513hM7gG7
AAjtgX/g3X+ccTcBxjL00WmRgS+BwOBI1Ij64say3d/NpQC30HqHuiOHiaDqWVSyX05lJRry/TSs
/IsWEOX/hDZjX5ceEvMIPRWN1ETGyOHUOhv5aWMwtbLZ/8J6LHik9vaKeoety1oS8b9FBWnVjJg2
ZCMw8A242U0G014ukueqYaVhYscn4S/HyrzmUc2hLFNc1ejzrGHbdmOj3iuj2wvBp/Z2l+GLnwF/
LxnvdbuDA9gHq1p5ln2c5syWTI6+C3bdA0oc1ZCd3k7wQSekPhnU6F7X8/o43H9sgI0ZJg75UOvX
CympbLs9zVM3WYqM6QfGxxrTGROFLGNha4CQqmASdM5iZPPTgHybQgVnJqPiuSpZ/D7wAWwa4jUb
25tjFpG/Y5Bd39owGG/0EpKosm2ux8NjLRj1YE5LxdIlWJj0m15LCuXil+nDCZnSeZusF/Rp9VaE
cSB9NEcm4iGWU6OW4J7wNw6JEbYGseYK55va7wKg6ey+yHl9Ffromqybvas3mI0Ydt75t2M7Hwev
APABBW9ywVe0eqHSEHOxdgNePM/uVT3up9wkX8c/tEaxnM/sP5L0kEiHkLnhZiictb5XWBIwP4lP
OeXNGk8uVrafdLnqYT6bsX7IVzS6RcvZFY6UVZ8vL3X5X/T1QwH5OFXzXWqxYtR5tooab4iwpwVp
+0rdDc++V4+OaH0bOa2NzmPi5fFszVQResmeQqD1ElKSB8D6qKgI9h5xrYLdzAa0ZHMOF6qsfCjC
sNdLdS64157npEDjeE1BTDJ3sXr9f5v+y+cU09zfpw6GPFh1/HEWabCi+/zN6XCTKs11E497ezmT
g7SLtBAHeXCeSBtWuUZ5XJxzsxF+vMa7Z3VIdMM4kn4c9mFS0XZB5r1GvwbHYm6y++b3sqQWYHxp
4XvLFSNlr1SUNBkz9BmZi94dQw6uMEvFdyssXBGuXrDBx/v10vWBJVICkT+Aec97QhsJcAc4256k
IIHe/EcclVY2U/sJ1DU3m4mI35CrJafKLDwHEPR1UvBUpDl8sFQ+qroGsdWubJjhzUrkT3ekEUWj
qSU2eSTa9CiUC2PX+pJ7zJfVmInFKeQg8N3sVyYPbzTks6Y8lOhnXvTeU0cDLfpDubmTdUaIzEsE
FS8cq5ZIzIXiDEZu1t9mxAyfqolHZ+Ml/PS++IbnDs8Zw0AzXB01QdwGdRr3CrXtpE8IW9hf+K8F
hIccj4bxd8eX5Ih2x8da1jLz8PLMjMX6KsQ9aA8mCHJivDOJt+kTHzHjW+rZhsM4hO4GoY14SeOB
RAUvCiZRYYi0mCt1X9LQuSlZee+112hkUtUoiGl7efY/di1E0cmviBYRZMb64hgeGe4JNsddHWcL
8rPAHnXIb1TDxZf4BEswJjUm1vW/5jghvWjm+d+AhTy0ZYnvj2gaNFmexWns0imilDdC5VEa8ZfY
soIU6+rnILZv7/jOxwwl6mBhNSHIkNwshDnAMSUGH5jlxyNDbUQ0xyjbnjnvag68/XlOKVzIolUY
3bYIMVA4OZdUZqL8Jr9DH1EpaeS7uApqph8krJJOCGCYZbl0GcSHbAr6hcbyEjQpuYQtNQIXUGJn
TVh1DyBuucl4pP8L4AZM55Y0Fg1fAfuphEVkiWgBAnlxS5HawXTjIrSUUZu3s4KxgXM7XHz8QQ+c
PUOSA+scJ2jfL2nByUBEm0DHCnuCByJ/LPISY8g6whNvrZoB1AHacTMoeg7iIl8BoFHityfLOUUA
hRLMjLBollxRRI7iUE+Iqw+gM9BX5uuzboZFdA5NeMo5u2LKt60nk07Z86yp2N2OppMTBQDqWrzo
ZnCmV09kulhyyi4+KAwr15U/Db8pZjN6tuB31n5ALHbmXSN2nqP5bsYhGuMFylkIUdxcbZ2tYHl+
VJ1US+jcoAuULTSWZ1YFm8DuUqqIjV8DINQaZJxSfosL3Sz2ziQnzVpLEtEdrFDM8K3OMZ9h3Zu3
2nuDVBnyycRUaL/9B+Ehl6pC3sYdMeaMEROrAEgPdZX1ElFxR1kdbmxHmc4wqFizP3gdDm91wPun
ULdx3Xt22bhyKz2KlE0Z+of+PvGq9SD0lcODIllW6DmUIKl73t1j5Wuh7sUCNDv8y/Cijgoowc9Y
cFOV0mV5IDHZJ4y2sJRLUiNp9yyGRx+KqFZ6MaC9EoqYq+FqH7rhUACv5bnkePr+K5iYIvb84s5f
oXYfEpS6DbAP3aYYOdAjQwtdnlOtON9/20cT+EZeryGdBGlxZ09QuXKB8LpwF7f1BQAe04J3obRu
8g47UJ3MO4cISv8OOsabHncN5ckR+HLkWIX6SkuIc1F4Jsga5TstP0xSUEF44L8Q7z+KL+ISgcJY
GM4o5HPjWl55Pw00ENh3V77T4QQl8P1Q1SKmMo/KusZ3L0I/LWcqCNPJ9uJI8k2S3gFHiKHIwvQl
BfMk+R+OO43XQlGNBjidhhNHo9f3q5l439UOZE/k8Dy43rKVQP4TbFu92yqHhf4vbfyR+ofg4e/N
aHtcdt/7rbvh7/2+wGri9XbcuqaOgNjmYfMkfConE8r+yhiTSCTRkW+ulDUBeHeb/se0ofJnmh2Y
j3AcVn4rtYp/DD2QNtvMKixq+MthuFXIDKDYVsxoOnS/EUhzuGlI5Nf0uRVFGLeyE4JDD1nHpkqv
7mzUc1U4K74o6FoEDq83MNxiQr9y5q7UfmkUhj8WFSvhazlKhdegzFuVzz+ezXLo6cSesehJkxIP
wkaBMyeVQREdZdAr0j2lRMpq3Pc3jOuYjiAedHnuSltVEwBEhSDIJqNvjXm/RTfv/Qqrn1Dec8Bl
ujzTZxk7vXr5VIWpFraLksnHg6zEhxOYAC/zJJ+xhd811sLmUYAsAHe8/wbYmbHHJWTTzkZKj+Nk
CWY+RwEeD/SWFCvYcV1bWYE0iZfFfOO5OkPJx9qdItU4MgwCzN3bthhN/fLT75iwaWhR3GvDC3Lk
5sR3wL3sc5whhNnWaPsbsxHuAELfnByV0dbLChCj/ud2ycQX7KDQAiWMqlfW2WP3qFBkbJOVo4Ev
wGoFWD+Uof76XUqUos7MNQDtzmVcfsgrL3f94ge5jbszJ7SStGbPvLqHm6ba05b4nINgFQyqudBS
V6Bf9DIXj+rERhd+hgkdNJH0UJrWRHswP5bd2J4vtKe7STY8G0wHTcNaxaWofkQKUUdPO3Vf/3J4
mWvkN5Gk1hAHouZe83pL+2qQpDwe4Fp2VE87ld9bkGnik57Iaao+K1ObF1q6LchY2obW5robY6yh
Hk1WfBDrd6tHnIRr4I4HQC58QSr25eXDcS0POoNfYnsF0OmjLPKEcIKp1JmIcsaqA/9pJrcDVz0X
p+VCmLg/mK8xr/ekKLaL6KLohEcUPasd0K9BQ+4PyObD+ppcVFd4Ju0Kc96vptL13Ac98EsPLZos
ERqlNqeY604r5/aGZ0NBuymssXY4Ju5K5zznECLZWIv8NlkdLW577rQNcdUpEtaIPOlzbLPo6JV3
cDa7jD9s000iKUThfVMYkszF67GY692XdezLrj/2zhsfOTlG+1G79pdAx0bGgi4PFg+cecSoK2Et
OpNYw94uWaS5xGKAOW3gIPdwzFmvkDUipO54LUXXRJq1Oapys98jFqKYClH+lMDEW4yBAwYxhgYu
ldg/tMC6egUMqVIpUrexpHbkyy9xuqR6KV86Hyk8Ili3YU3B0qes81WVMKOOd7/mlhhREWcIkkpX
pcT0wkDB7jZGaXLyaDnTYYprUCp5xkNxYC2yYDPDZ/dK0j2QFl2HXplITMG4/QwWpickSgXOjP/c
EltQMNxCaa+qp5EYThEiCeYUlznighIRlfcRhixHsTx9cyw5gCcu59IkTauCRSlyFulTf/Q51Fjq
9SCseHo2PppY2PEQQWTKjkGLSHb2pRWCrQLcXkH7X4JOo+L2xfSXaAh708ThY+HKX8BMQ2T1+saA
ggBSEU1EL6oYa8NuOmEBc2kI4vWBQbzkKUcbVncG9wXU3BL3uZzB5FnhFkH3Ejc2Ry2my8BDBdRT
vYcEC9q99pltIoOyiFZfblPULOG82YDGl9r8adAMP10bINtxVvkEFExfglwpkEzlN4ayWfqyqeEq
kvNQN/Y606VWu/kuyZIVuVb+KbTgHh+NTGtV91N1hvrbedJt/GWF0JMjKqga021xq5C1l6AludK+
thLgAMwCBJlnCxC9dnS0fck2Y4vfeWVyBUvX4579aM8gE5vHYE9C8iSSWKJg8afJw2GZFrOo95cH
0kZP7/8H/nWPQ9aLp+0eX77o/4wuXxX2vgdZ+HeHxDK8u01EO0x8gibbG/A5apXW6iQ/f6sK8b3T
WJRVwwR0n95fyEPOE6/abtMqj/hZp6PHF/Ecq6mF8plShIHOsmrtEzdOqZL3Oii26DxtWsa0Wi4n
z+fVrY811GAooUUR/X1FzBLr6XHuTzXdOwbfaoTtbMatstALFuciTsoDYniM80ZZvnYergNNGFMJ
0P5dyNhDbFHqum124vn3nB6fnE6nb7ddUpB7gTzOtEfLYJ7tfoxqyajQDHbVFOv28viK2d2lcw7w
RfHxS7r/HXhv+guENHibjiqBlRjBeLWDObhuKgl6Cg7Dt13zcY4yIgyTM5ZtvwV/KjDq4kudtA0E
Z3c9tnAoOOPnUrj4L7wYMRxuri3zDyr+0BYsnIZ90jEr2yEyVMv4IHEn6p1GBdeD8uL0y1eaztfi
aITK23s3l9whgPWRzE/djbO3gs1cgS0iMj09R7qmwKNcWvuHwe35Wvu8ZcQe8DlQj743CJVcL1ed
jF7nVJh6k5wshoCvr9/bLLYGA2x0Dv0YJIeY6LIb/ZlMjYpkmGZeBWEx5dBvGUOuo/HZ4OxaoOHC
SwFn3VotYLrTJ+sMNbr+oR2eHQB3p1bl7x1EJt1BpSY69IOrEmrrNJfM+UPdr1lCSAXUwuAw+tFb
e42GDYDXD9SdK0ztaYJkOfkfp2Ha8gxA0nu3DBIdaWoeVFtnGhfb3uxfO6P9AYAuKNWS7haewEYP
7C4t6PY8Mu/XS1CdY9OYvbfoyHcRrOuooRX6b/B49mBOywMx4ySOoef/xvtARNBmBo2WYrUU0DtE
GszlzaVN9+EekGYhrJT7974RWxRYTltlL0QR4Vygz7doONYjR8JytFseITuNwoPGbP2ZZRLDFDZv
Gu/U3Ic0brbWnvlcHBWo4x9H8/QVYVjN5J/a397m4+veeQJW0Uux3OxipULe18k8kF/+44BF0+EO
g3O2gzUW1oFQXBk54XDwDEEqdVKtDfTGZuIkTQUI8bx7bUBcwIAGEhvjcKE7dQfshkhuuyBXfC+b
1k9qkXL1axgXOfMTY98TFvfNMRwvqwsZO4hN8M1hSi7GdzZrM2JUvd+NEs+NDayLSQ2EB8BzuDQz
8Bv0ihFBJhcDfzXQ35VVbZoo2qv11nfYptOLqihoWu0L/T49U5F5zMIGi9fgolzV1MNW1/nH5LiZ
cKfq2vr3kxgLdAbg4/w3p1XDAmgIp5HjZGLtUW8kyvaGjGnNAOaly5uUx0qMrJFO/DeoGSLcwFnr
rxmybQVUNt8AAYmrI18eMTqhU0xkrEU+PL8G6VrzXIbEgYJu4Zr30QFvR6KrLN6nJ+bpxWzET1iq
2Mt3bCCDcNff5Ojw7Sv84TEzOPLt2K7+2OnAzqtMLQK3v+yUx/sYxVysy/QJCr9MI3cNaNUwW4EA
Sb46bXaeOMD4qzblwkWXxtn0fa/pBNqrRhOpND8WI6R3OrkizvaoBP8rDRsmuMIwKIdoq3TxnrEP
KO7lzmpdNsk3OX7DhI0OuwVDKrGefn+jDcDZkxMI/aKfCr+P7upUHsbWUi+/2vLNvEmKAHK4MCpR
HnV10H7pw6pwE97LrlYmnLwKaXniUa00pDt2uje7I6mhX/L8IgvsEI6KlOxlHccmhrSQwXWESshc
Snu7HaGJTOzxjGMuSgua+y5yBrvznqYJv5LXnNEYjWnJaCCURdC+vba6cO2X+ZWUstlhh8bW0q6P
ZmtcWe2Pmc/xuAPyTp2knlFyo/IhnWnYnW/ZOT6IXZkAS8vsWts68lehvBTz1ll7/+Ku6u6PDsto
HBYs3fy2dVUYmavQuLp/MSh7bduXdjUZfNC8e+BzQMKcYQB81ldanBm7jQ6h8M6j4TNLXaUOf/m0
5aWB3S5m2slj39NZsVpmAA8yfd5revOuZnVQV8tB6ITgsR9BOlCAG7wnr5FXSUUAbv/XL5QNP2Fv
67ZOqYOd0kYtG6sGKeUlpi7x8yWjj72GyLhMNjaJToX9Qm7CDvlzimYA0Mk3IXGfVVYDVkKyuoWG
PUrOHaA2c0E9Ghx7ABF3PQ1jPt18hxEdsUFuQzldNPQcmn0x6dJIrYkDTWqNDIxekYpiQZ5cc8Fu
jlEPwHp1WQLxKro4moOvgaGSl9pCw6Gy4MNDouqOVaRMxbrdT44W67Lydtv5NtQmk8eCIvTN6IAR
UCoogatGvQXs9oAGm/mLCPxunyjfHPlK3qfzSUR/Bgt5iuQCCnWFfxAtxZfRa3f//07hxccLF+11
Zz2ExF6YPQ9xOFmxPJd3hPuBFbpM99MMWnWnhAFnzvqkkae06LbXpIqz6GG8VLjQGxu//bp4M/wy
Tnxyt7DS29apf75rnHhpsfL0ePllgSo2xhJ2NBCaedB8XnqvMgUlLjEE5jtkUFdBUZLWMthjiArA
H2Jb0z8+GsycrRk9WyrCx5CVnYZ9Qy+CIJ70bIJ3yW/5wHArouG2vPrAjINtIGuwQDtbXPnu1Vao
hHGng4Tnt+psNHJb5Z9hJpz4t1vM1VW7OemT1rB+5EIHr94VS20SH3GNYjskIbuXoinGYM7VW7ON
HgNF09oK6xTqS6Ocn69xlX/xBnpV4sA0q7cOQ1a6oEYxr6jh8ZkrzOxxjfXyOd8Fq5FZ8QmRBaPS
LPFLgcidiNi3r0Mmy2btFyR2pL0em4JeKDtt8FPtLBt/d4bH7iHxuNL3o5xVtAVSX8hxfkU9kTem
ZJZKgxawQmaHjgbSEQUIlc3KGjdNKhKEGTwhAWPhBJ35779Xxq+Kxgh1S4V91J3kfL9p3/+jqGCu
sKgfdeIQ84edrL4pqS1+fdnsZldg8UtI4eX9gRaCwWvmBE3ds0nbaYBkOQGE8mRi7x7XrmneftmC
riPSQJcdK89B5QPRDg5TiAzwH8CsLj3HIcytoEHH+QBa8DylXuO3ZL25qeh1BSx9wJGMXZWnE13D
DUkGCqKO5je1ONwnHt1w1lyN9DhRRSx8SfGGxjXfoMO8SMKBLXeWobbhcYzVGje3aQHXrG3mCbTN
uikbYzTF4EqO+3UTyaAwnb63zrQQYbwSBEXHFxgY+xcXmWTimUj5Yf7W4NYdkbm7ZAfvTHhL+ikI
yk9PtM/NWogxHtFpGF3T9UKwdk77rfirbecCqsa9rzM2hhAw8BNs//T8p7QbVRs0I4jSBJWQIV0P
+BMeEU93sFwrw4bnyRlavEcACfRv9ltE+qsLFCSLmjcS40k77qBoMPp9hnNDSwp0N5B9KigAwbUr
kQN0GGteYWeHZPctUcJ0E2ngzFjIEF0IsEy89w+1ZODkcLPXiW42zV/MaMPC+I0/tmmWOaV9eYVi
9xXdwFqzLdqVyLA6n8hNFHaaieh0xYUuNmpffn+N8+35DicoV0uvhPq3a0n8XMv7FIwFFygFNnjy
Cid8hDsTga7IkaSQ8vYHmgOKuEcIA5lifeRaSUhPCpEBs6WTFIFAj3rngYvqVn9IZEj9snmGFvLO
NZ7AbihvYSWWy4mikZwguE7lLBUxzi6kaNs64NHbyav+zwOS2/nSbCUmm9ovqR4pdF88zIRoG5Fn
pAHuTIfiK1eGgCL1JoIiTPZd0ki6X4aNpGaUiilVY8eqCa9dD8XcQjwARApg4nft6wox5JAD6V+G
yt7A6HlcmPgYymKb3PYIFzktr+zhyxNlK0qccdtoYaMDl0ydKNZKVmwcBWodgF6D9X3AloEK67jv
lr86dCrMthNxFQKerEMSG3rkGSBrZ7unJybE3bTJ06R9US2LH3wb7CW40jnVLNJ2jdX7kuCXFswI
H4dI8Kd7TqR87h34jfdfUmtJ+UmfuTBHaNmoXApaQHgNG28/v0YcEdtIVeTuzo6k8gT7goB6qVsY
q35WKSEfGRoZWmUaCc4Z5rVfmsnCqGumPJ46Z4JK7gqvgSN4KtXoiyIYOFqzuj426mQIgmuHs8bl
hOSEuYwAYTIY/3DaBCp2cj4pzfhRrPqWhiEIXDZkbpsCBvAa9SRaE3Oja+aTcdNIqQ41AWJzxRnQ
BVZiz7jcUl5M+6f+4QBfYEFtNXeHNDssakR73S44E9V8TG43m7fPxre3FkCoqzE6U78IKuIwtQua
U15T7TIDx694UuJbHg15atOcaOKLc2xrDD0ut3eEVTQxyGo6pyVEozasYYB3bWJyp59FASQgLMfY
pJ4jfY8P5gWBfpEOF5DqK4tYqrZDAOmAdGfUp+55yf04TDPiPO2r/nIslKXHNBWPnAElQirR1Zss
AV+3H/yMEOQU1ApTycwMk3T1HpkSQFzSIsGmbEuncg4qiY7qCjUysfuFcqXkZbw5nnzLXyocraEH
rQPQz8VRMAGE8LqUOT7wtvy+3mqowPFsBsrv/gEiQysyIF7d0anG1DxjD1kLDzt8DNva1kHsvHLa
cKAHV5bwiJAqHoVVPtTwYh6DPhX7i6f+W0qsChOL/uu7Ar/xQddDgPWf5pRk2CSpsVkewJk8CefU
wVgKpiHWXXiJliA/MlNof2283lqvKHdxt5mUDFOOScBc+tTsKb1+7vZgSs/pDrVMxUQPUKmu73ca
V0yueZfa5gxD6/6xOhjgnTycUNnGhMzjTDO672q23QlPhIgUoytsmPj/2CHhGR+qChLOYBnI9KzQ
PsWtZadSf+FHRcc/1nSiuayeguANipU73yiauPP0934fZ//HEHK3XVly0AYhEiTxmOlHmnIdq6jK
L2QZOt/LrrJubwQMo7E7K8rcek9ed8QYwGVSz7bFh7cMubB0x9mCd5mVKj0odRdV+ubkfV1lteIH
9DJuaD46IvCqZWElNYnSKYaTEA9LVsji1MME4Pvoq2Asl65PPQaVB2eomcr6AQuOSpVdLVVZAVB9
Yruc70KQpeHi7YY5QQW+hg3kP46c6B6lT5cIrogqDnvcWry8Ee8G1fxfFMJHK337Utz+Dz0KD59t
LdBnSltxS8Pj556jjcn9gsCe/edVOxEH+R4IlcDTN6Nh91N7lQ3A13MgpADJ4/yvzKwDXt9r3XvV
U9KLr8bwrr4Jvcf9MJnVCpXY+9ZC3u04XrARVIZTx8wH11tJKgBU4NfpBsgxIaob3AIivABJrVVA
S49UE+jRvzh4w4+RiFNuEnXHHOmAOt2YigAAAAAAU0l64+YP7Q8AAeFEgMACAIqg/v2xxGf7AgAA
AAAEWVo=
_BEOF

(cd $HOME/tmp/sfxz && tar -xaf inc.xz)
rm -f $HOME/tmp/sfxz/inc.xz
 
cat << '_BEOF' > $HOME/tmp/sfxz/post-extract
#!/bin/sh
# The delimiter above is quoted so that $1, $key and $HOME are expanded
# when post-extract runs, not while this file is being written.

is_yes() # returns OK if first char is Y or y
{
  case "$1" in
    [Yy]*) return 0 ;; # true
  esac
  return 1 # false
}

if [ -e inc ]; then
  printf "
  Overwrite existing file(s)? [N/y]:"
  read key
  if ! is_yes "$key"; then
    echo "Aborting.."
    exit 0
  fi
  rm -rf inc # mv can't replace a non-empty directory
fi

mv "$HOME/tmp/sfxz/inc" .
_BEOF

sh $HOME/tmp/sfxz/post-extract
rm -rf $HOME/tmp/sfxz/*
rmdir $HOME/tmp/sfxz 2>/dev/null || true
rmdir $HOME/tmp 2>/dev/null || true
[The code for new.sfxz which was used to generate this self extractor is available here at the blog. http://www.linuxquestions.org/questi...c-tools-35216/ at the bottom of the page.]

And here are the makefile definitions (mc2.def), which will update the makefile to add new files as we go. (This is the MULTI type; see the help in mc2 if you don't know how or why this works.)

file: mc2.def
purpose: Makefile definitions for automatically generating makefiles
Code:
# mc2.def template created with Makefile Creator 'mc2'

## sandbox path and other new variables
PREFIX = $(HOME)/usr32
BUILDDIR := $(PWD)

# output name override
OUTNAME = MULTI

SRCDIR = src
OBJDIR = o
BINDIR = .

CC = gcc # straight c, -std=c99
#CC = g++

# compile function overrides
COMPILE = $(CC) -m32 -c -o # COMPILE <output_file> ...
 CFLAGS = -Wall -g3 # debug
# CFLAGS = -Wall -O2 # optimized
INCLUDE = -I $(SRCDIR) -I$(PREFIX)/include -I /usr/include 

# link function overrides
LINK = $(CC) -m32 -o # LINK <output_file> ...
LDFLAGS = 
LIB = -L/usr/lib -L$(PREFIX)/lib

# additional targets
#mc2-update:
#  @mc2 -update

# note: recipe lines below must begin with a TAB character, not spaces
semiclean:
	@rm -f $(OBJ)
	@rm -f *~ */*~ */*/*~ */*/*/*~

strip:
	@strip $(MAIN)
	@$(MAKE) semiclean

clean:
	@rm -f $(MAIN)
	@rm -f $(OBJ)
	@rm -f *.kdevelop.pcs *.kdevses
	@rm -f *~ */*~ */*/*~ */*/*/*~ tmp.mak
 
# Note: If you install into PREFIX, make sure to include and link 
# against your working copy so you don't accidentally get the 
# installed copy's libs and headers instead of the ones you are 
# working on.

force: # used to force execution
[mc2 is in the libLQ-qt download here at the blog.
http://www.linuxquestions.org/questi...upport-34783/]

To initialize your first Makefile, type "mc2 -init". After this "make update" will add/remove files and set the names for the output files.

Before we start, let's take a quick look at the headers. At the top of parser2.h there are some 'defines' that we can pretty much ignore, a couple of typedefs that are still quite experimental, and the main parser object. That object is the target of our debugger watch point, because it shows the state of the machine (there are only two states) and the positions of the input and output pointers, cleverly named "ip" and "op", which is what we'll call them from here on.

Once your watch point is set to look at 'obj', all the fields at the top will be visible, and this will make a WHOLE LOT of sense!
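To give an idea of what such an object holds, here is a stripped-down sketch that assumes a fixed-size output buffer. The names `PARSER_OBJ`, `parser_init()`, `parser_free()` and `copy_char()` are hypothetical; the real fields are whatever parser2.h declares.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser object. */
typedef struct {
    int         state;   /* last match result: 1 = matched, 0 = not  */
    const char *ip;      /* input pointer: next char to examine      */
    char       *op;      /* output pointer: next free output byte    */
    char       *obuf;    /* start of the output buffer               */
    size_t      osize;   /* capacity of the output buffer            */
} PARSER_OBJ;

/* Point obj at 'input' and give it a zeroed output buffer. */
static int parser_init(PARSER_OBJ *obj, const char *input, size_t osize)
{
    assert(obj && input && osize > 0);
    obj->state = 0;
    obj->ip    = input;
    obj->obuf  = calloc(osize, 1);   /* zeroed, so output starts as "" */
    obj->op    = obj->obuf;
    obj->osize = osize;
    return obj->obuf != NULL;
}

static void parser_free(PARSER_OBJ *obj)
{
    free(obj->obuf);
    obj->obuf = obj->op = NULL;
}

/* Move one char from input to output if there is input and room;
   the state records whether it happened. */
static int copy_char(PARSER_OBJ *obj)
{
    obj->state = *obj->ip != '\0'
              && (size_t)(obj->op - obj->obuf) + 1 < obj->osize;
    if (obj->state) {
        *obj->op++ = *obj->ip++;
        *obj->op = '\0';
    }
    return obj->state;
}
```

Watching a struct like this while stepping shows `ip` and `op` marching forward and `state` flipping between its two values.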

The next struct we see is PPTRS_OBJ. This too will be visible in a debugger, but it's handled by SAVE_PTRS() and RESTORE_PTRS() and can safely be ignored unless you have nested them incorrectly, which shouldn't happen often once you understand what the macros do. (This is based on a concept from the Savannah non-GNU bnf project code, which has probably disappeared by now, but which was a hugely valuable experience in parsing and grammar generation.)
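The save/restore idea can be sketched like this. These macros are hypothetical stand-ins, not the ones in parser2.h: in this sketch SAVE_PTRS() declares a local snapshot named `saved_`, so only one can live in a given scope, and RESTORE_PTRS() rewinds both pointers to it. `copy_assign()` is an invented example of a "long branch" that undoes its own output when it fails.

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in for the parser object (hypothetical layout). */
typedef struct { const char *ip; char *op; } OBJ;

/* Snapshot of both pointers, playing the role of PPTRS_OBJ. */
typedef struct { const char *ip; char *op; } PPTRS;

/* SAVE_PTRS declares a local snapshot named saved_ (one per scope);
   RESTORE_PTRS rewinds both pointers and re-terminates the output. */
#define SAVE_PTRS(o)    PPTRS saved_ = { (o)->ip, (o)->op }
#define RESTORE_PTRS(o) ((o)->ip = saved_.ip, (o)->op = saved_.op, *(o)->op = '\0')

static int is_lower(char c) { return c >= 'a' && c <= 'z'; }

/* Copy "name=digit" (e.g. "abc=7") to the output; undo on failure. */
static int copy_assign(OBJ *o)
{
    assert(o != NULL);
    SAVE_PTRS(o);

    while (is_lower(*o->ip))                 /* copy the name  */
        *o->op++ = *o->ip++;
    if (o->op > saved_.op && *o->ip == '=') {
        *o->op++ = *o->ip++;                 /* copy the '='   */
        if (*o->ip >= '0' && *o->ip <= '9') {
            *o->op++ = *o->ip++;             /* copy the digit */
            *o->op = '\0';
            return 1;
        }
    }
    RESTORE_PTRS(o);                         /* branch failed: undo */
    return 0;
}
```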

Below those are some utilities to initialize a new parser object (always named 'obj' in the debugger view) and a few oddball features for dealing with errors, placing user-supplied text in the output, and getting text from the output back into a user-supplied string buffer. The rest is almost intuitive.

There are three types of match tests. One is the 'is*' type. That just reports whether or not a pattern at the input matches something predefined or that you supply.

To simplify customizing the parser with new is* functions, the builtin 'is()' takes a string as a parameter. It's barely more than a memcmp(), so don't expect anything brilliant to be going on behind your back. Simplicity is what we're after here, and the only complexity comes from the need to get our hands on the innards 'simply'.
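An is() of this kind needs little more than the following sketch. The `OBJ` layout and `is_comment_start()` are hypothetical, and this version uses strncmp() rather than a bare memcmp() so that a pattern longer than the remaining input cannot read past the input's terminating NUL.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *ip; int state; } OBJ;   /* hypothetical */

/* Does the input at ip start with 'pattern'?  Sets and returns the
   state; ip is not moved. */
static int is(OBJ *o, const char *pattern)
{
    assert(o && pattern);
    o->state = strncmp(o->ip, pattern, strlen(pattern)) == 0;
    return o->state;
}

/* A custom is* built on is(): the start of either kind of C comment. */
static int is_comment_start(OBJ *o)
{
    return is(o, "//") || is(o, "/*");
}
```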

Besides 'is' there are a couple other things that are often done when a match is encountered.

1. is* just reports whether a pattern matches and sets and returns the internal state.
2. skip* also sets the state and returns 'true' if the pattern matches, but it can be used to replace (or translate) input patterns to something else, even binary bytes.
3. copy* just copies input that matches directly to the output.
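Here are the three kinds of test sketched for a plain literal string. The names `is_str()`, `skip_str()` and `copy_str()` and the `OBJ` layout are hypothetical. Because the replacement in this sketch is a C string, it cannot emit NUL bytes, unlike the binary case mentioned above.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *ip; char *op; int state; } OBJ;  /* hypothetical */

/* is: does the input start with s?  Sets the state, moves nothing. */
static int is_str(OBJ *o, const char *s)
{
    assert(o && s);
    o->state = strncmp(o->ip, s, strlen(s)) == 0;
    return o->state;
}

/* skip: on a match, consume s and emit 'repl' in its place
   (pass "" to drop it).  This is the translate case. */
static int skip_str(OBJ *o, const char *s, const char *repl)
{
    if (!is_str(o, s))
        return 0;
    o->ip += strlen(s);
    while (*repl)
        *o->op++ = *repl++;
    *o->op = '\0';
    return 1;
}

/* copy: a skip whose replacement is the matched text itself. */
static int copy_str(OBJ *o, const char *s)
{
    return skip_str(o, s, s);
}
```

For example, translating "begin x end" with skip_str("begin", "{"), copy_str(" x ") and skip_str("end", "}") leaves "{ x }" in the output.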

There are a few other tricks that will come up often enough, but for starters, the new vocabulary (besides fairly standard C stuff) consists of SAVE_PTRS(), RESTORE_PTRS(), put(), and the is*, skip* and copy* functions. We hope these are intuitive enough to be easily remembered, but if not, you can create your own quite easily, as you will soon see.

Also, before we get started: as a general rule, the simplest parse block to base the others on is the 'copy*' type. There are exceptions, especially for simple pattern matching, but in general you may want to copy the whole thing and then undo the parts you don't want to save. Here's what I'm talking about (in parser2.cpp).

See
Code:
is_cname(),
skip_cname(), and
copy_cname()
Since more than one part is involved in this type of parse (the first char has a different 'match rule' than the rest), the easiest (and not much slower) way to handle this common task is to copy first and then undo what you don't want to save.

As it turns out, 'is_cname()' can be optimized down to a single simple test, but for 'skip_cname()' it's easiest to copy, then delete the output and just return the flag.

[Note: The obj.op pointer is absolute, so it can change when the buffer is resized. The *_cname functions use the 'safe' method of getting and setting obj.op (op2offset() and offset2op()), though it's generally safe to use obj.op directly for anything that can't advance op by more than about 4K. See the memory reallocation test and routines, and where they're called. The source code IS the documentation for this parser.]
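Why the offset round trip matters is easiest to see with a growable buffer. This is a minimal sketch assuming a single realloc()'d output buffer: `op2offset()` and `offset2op()` borrow their names from the note above, but their bodies here, along with `reserve()` and this `put()`, are guesses rather than the parser2 code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer. */
typedef struct { char *obuf; char *op; size_t osize; } OBJ;

static size_t op2offset(const OBJ *o)       { return (size_t)(o->op - o->obuf); }
static void   offset2op(OBJ *o, size_t off) { o->op = o->obuf + off; }

/* Make room for n more bytes plus a terminator.  realloc() may move
   the block, leaving the old op dangling, so op is saved as an offset
   first and rebuilt from the new base afterwards. */
static int reserve(OBJ *o, size_t n)
{
    size_t off = op2offset(o), need = off + n + 1;
    char *p;

    if (need <= o->osize)
        return 1;
    p = realloc(o->obuf, need * 2);
    if (p == NULL)
        return 0;
    o->obuf  = p;
    o->osize = need * 2;
    offset2op(o, off);
    return 1;
}

/* Append s to the output, growing the buffer as needed. */
static int put(OBJ *o, const char *s)
{
    size_t n;

    assert(o && s);
    n = strlen(s);
    if (!reserve(o, n))
        return 0;
    memcpy(o->op, s, n + 1);   /* includes the terminating NUL */
    o->op += n;
    return 1;
}
```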

BTW, the cname functions only apply to normal "C" style names. Underscores and alphanumerics are legal in the body of a name, but a name can't start with a digit.
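Putting the cname rule and the copy-then-undo approach together gives something like the following sketch. All names and the `OBJ` layout are hypothetical; because this buffer never moves, a plain pointer is enough to mark the output, whereas the parser2 versions use op2offset()/offset2op() for the reason given in the note above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct { const char *ip; char *op; int state; } OBJ;  /* hypothetical */

/* First char: letter or underscore.  Body: letters, digits, underscore. */
static int cname_first(int c) { return isalpha(c) || c == '_'; }
static int cname_body(int c)  { return isalnum(c) || c == '_'; }

/* is_cname: the first character alone decides whether a name starts here. */
static int is_cname(OBJ *o)
{
    assert(o != NULL);
    o->state = cname_first((unsigned char)*o->ip);
    return o->state;
}

/* copy_cname: copy the whole name to the output. */
static int copy_cname(OBJ *o)
{
    if (!is_cname(o))
        return 0;
    do {
        *o->op++ = *o->ip++;
    } while (cname_body((unsigned char)*o->ip));
    *o->op = '\0';
    return 1;
}

/* skip_cname: copy first, then undo the output; ip stays advanced. */
static int skip_cname(OBJ *o)
{
    char *mark = o->op;          /* where the output was before copying */
    if (!copy_cname(o))
        return 0;
    o->op = mark;                /* discard what copy_cname() wrote */
    *o->op = '\0';
    return 1;
}
```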

Also, if you are writing a disassembler, an 8-bit flex/bison type of parser might make sense. Might. But for taking disassembly and decompiling it to C, I'd rather see what's going on, eh?

So: copy*, skip*, and is* are the main low level types we'll be playing with.

More (and perhaps clearer) examples of these three function types are in the parser2-numbers source code, which contains all the elements needed to find, copy or translate signed integers and floating point numbers with or without exponents (minus the conversion method itself, for which sscanf() is one possibility).
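For a taste of what a number recognizer involves, here is a hedged sketch that only measures a number and reports whether it is floating point; it does not convert it (strtod() or sscanf() could do that afterwards). `scan_number()` and `digits_at()` are hypothetical names, and the grammar in the comment is an assumption, not necessarily the one in parser2-numbers.cpp.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count consecutive decimal digits starting at s. */
static size_t digits_at(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical recognizer.  Returns the length of the number at s
   (0 if there is none) and sets *is_float if a '.' or an exponent
   was part of it.  Grammar assumed here:
     number   ::= [sign] ( digits [ '.' {digit} ] | '.' digits ) [exponent]
     exponent ::= ('e' | 'E') [sign] digits                              */
static size_t scan_number(const char *s, int *is_float)
{
    size_t i = 0, d, frac, save;

    assert(s && is_float);
    *is_float = 0;
    if (s[i] == '+' || s[i] == '-')
        i++;
    d = digits_at(s + i);
    i += d;
    if (s[i] == '.' && (d > 0 || isdigit((unsigned char)s[i + 1]))) {
        *is_float = 1;
        i++;
        frac = digits_at(s + i);
        d += frac;
        i += frac;
    }
    if (d == 0)
        return 0;                 /* no digits at all: not a number */
    if (s[i] == 'e' || s[i] == 'E') {
        save = i++;               /* remember where the exponent began */
        if (s[i] == '+' || s[i] == '-')
            i++;
        if (digits_at(s + i) > 0) {
            *is_float = 1;
            i += digits_at(s + i);
        } else {
            i = save;             /* "1e" or "1e+": undo, keep just "1" */
        }
    }
    return i;
}
```

The exponent branch is the same save/undo pattern in miniature: a trailing "e" is only consumed if digits actually follow it.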

Cont'd in the next blog post: "Parser Test 1"