Please bear with me.
For the following:
Code:
$$.value = generate_while($2.value, $3.value); |
A PC doesn't have the same limitations. One thing that will be necessary, though, is to have a solid definition of what the pseudo-assembly language is, even to the point of having a "simulator" to evaluate the results of your translator (it would help tremendously in debugging). I do apologize for not being fully consistent. The $$ variable is actually a struct entry *; I made an error using $$.value. The production rule for a while should be:
Code:
...: WHILE expr statements WEND { $$ = generate_while($2, $3); }
with the generator function declared along the lines of:
Code:
struct entry *generate_while(struct entry *expr, struct entry *statements); |
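As a minimal sketch of what generate_while might do: it allocates a parse-tree node whose children are the condition expression and the loop body. The WHILE_t tag and the exact field layout of struct entry are assumptions for illustration, not the thread's actual definitions:

```c
#include <stdlib.h>

enum entry_type { WHILE_t = 1 };  /* assumed tag; the real enum has many more */

struct entry {
    int entry_type;           /* what kind of node this is          */
    struct entry *left;       /* condition expression subtree       */
    struct entry *right;      /* statement-list subtree (loop body) */
};

/* Build a WHILE node whose children are the condition and the body. */
struct entry *generate_while(struct entry *expr, struct entry *statements)
{
    struct entry *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->entry_type = WHILE_t;
    node->left = expr;
    node->right = statements;
    return node;
}
```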
Thanks Jpollard, you are a real life saver!
|
Would YYSTYPE be defined at the top of the rules.y bison file as:
Code:
%{
#define YYSTYPE struct entry *
%} |
Would there in fact be a need for two or more structure types, one for
Code:
struct entry { |
Characters, arrays, and integers are all elements of the underlying language. They get an entry_type value just like variables and statements do. One way to handle an array is to treat it just like an operator (add/sub/mul ...), only the entry type for indexing would be an INDEX_t. I'm assuming, for the time being, the grammar allows for multiple dimensions.

As a bit of expansion, an array (declared as A[M,N], where M,N are the sizes of the corresponding dimensions - actual allocation calls for a block of memory of size M*N elements) would have an index operation done for a use of A[x,y] such that A[x,y] is equivalent to A[x*N+y] (which happens to be why the original BASIC only had a single dimension for an array). This works for arrays of strings just as well as for integers (using an array of pointers to the string values), and it does assume the symbol table entry for A indicates the data type.

Please note - there is a bit of vagueness here. It is entirely possible to just have a tree {INDEX_t, NULL, ptr to var x, NULL, ptr to expression list} and, during the interpretation/code generation, use the parent entry {VARIABLE_t, ptr to symbol table entry for variable A, ptr to index tree}, and extract the M,N values from the symbol table as needed. If the "ptr to index tree" is NULL, then there are no subscripts involved. Note: "ptr to var x/y" could be expression subtrees.

There are several ways to represent the indexing in the parse tree. In the above, I'm allowing for an "expression list", which can be used for either indexing (for example "A[x,y]") or for function parameter lists (for example "function_name(x,y)"). HOW the list is used is up to the parent entry structure.
An array reference A[x,y] would be broken down into a tree like {VARIABLE_t, ptr to symbol table entry for A, NULL, {INDEX_t, NULL, NULL, ptr to list}}, where "ptr to list" would be a tree representing the expression list, like {EXPR_t, ptr to expression subtree, NULL, ptr to next EXPR_t entry}, which allows for evaluation during code generation. This directly corresponds to a statement list (same type of BNF rules). It also happens to allow for as many dimensions as you want, just as it allows for as many function parameters as you want.

In the case of a function, the tree would look almost the same - the difference is at the top of the tree. Instead of a VARIABLE_t at the beginning, you have FUNCTION_t, and instead of INDEX_t, you have PARAM_t (parameter entry type). If the function call has no parameters, then the pointer would be NULL.

You could even have no syntactic difference between an array and a function (provided the function/array definition appears in the program source code BEFORE its use). This would make program constructs like "A(x,y)" be either an array or a function call, as determined by the symbol table entry for A. If A is an array, then it is indexing; if A is a function, then it is a parameter list. Personally (my opinion) it is easier to read code that does make a syntax difference. Visually, the code "A(x,y)" would be a function call, and "A[x,y]" an array, without needing to hunt for the declaration of A to find out which it is. |
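The row-major flattening described above (A[x,y] being equivalent to A[x*N+y]) can be sketched directly; the function name is an invention for illustration:

```c
/* Flatten a two-dimensional subscript into a linear element index,
 * row-major: for an array declared A[M,N], element A[x,y] lives at
 * offset x*N + y within the underlying block of M*N elements. */
int flatten_index(int x, int y, int n)
{
    return x * n + y;
}
```

The same idea extends to more dimensions by multiplying out each trailing dimension size, which is how an expression list of any length can drive the index computation.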
Quote:
Would YYSTYPE be defined at the top of the rules.y bison file as:
Code:
%{
#define YYSTYPE struct entry *
%} |
I believe it needs to also be included in the scanner, so that the scanner can generate some entries for it. I tend to put it in an include file that can then be included in both (this is mostly so the scanner can add an identifier to a symbol table, and return an entry structure that connects the parse tree with the symbol table entry).
Note - the structure as it stands is not quite complete. It has enough for things within the parse tree, but things like "ptr to variable" usually mean it has to be pointing into a symbol table (which we haven't discussed yet). It is still a pointer, but for a variable there is a different structure that needs to be defined - one that takes account of initialization, number and size of dimensions (if any), and data type (at a minimum, integer and string). And then there is debugging information (what source code line it was defined on, for instance, or whether a value has ever been assigned...), as well as how function parameters are handled (wouldn't want a function name used as a variable...). |
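A sketch of the kind of symbol-table record that paragraph calls for - every field name here is an assumption for illustration, not the thread's actual definition:

```c
enum data_type { TYPE_INTEGER, TYPE_STRING };   /* minimum set discussed */

struct variable {
    char *name;            /* identifier text                        */
    enum data_type type;   /* integer or string, at a minimum        */
    int ndims;             /* 0 for a scalar                         */
    int dims[2];           /* size of each dimension, if any         */
    int initialized;       /* has a value ever been assigned?        */
    int defined_on_line;   /* source line of the declaration (debug) */
};
```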
For the time being I am eliminating functions, just using gosub with parameters passed as global variables. Do you have an idea as to how the entry structure should look? What about:
Code:
struct entry { |
I don't think vst or vtype would ever get used - the grammar itself specifies whether something is a terminal or not. The entry_type field would have the same (or equivalent) values. As for the parse tree, most terminals (and non-terminals) are discarded. The only terminal symbols that would remain are references to variables in the symbol table, or a constant value (whether char, integer, string, or whatever), and these would be identified in the entry type.
Identifiers are terminal symbols... but they are already identified in the entry type (which could be an int rather than an unsigned char if there are more than 256 different values; I thought there would only be 25 to 30 different entry types). A symbol table entry might be:
Code:
struct symbol {
The field "symbol" can point to any string at all, which is why it can also hold constant string values (even if that string is a sequence of numerical digits)... With that consideration, the only way a variable would exist is if it is in a variable table. Now this would allow an entry structure like:
Code:
struct entry {
I don't think you want to completely drop functions - I suspect you are going to want some built-in functions (like substr, index, ...). They aren't that bad - the most complex thing is that the defined parameter list of a function declaration becomes a "symbol table", and symbols within the function body are first looked up in that table and, if not defined there, are then looked for in the global table. Mapping the parameter values to the code becomes a relatively simple task depending on the target interpreter. Most languages even have a table for variables defined and used only within the body. The complexity is deallocating it after the parsing is complete. Some just drop it with the expectation that the translator will exit - which releases all the memory anyway. |
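The two-level lookup described for function bodies (parameter table first, global table second) could be sketched like this. The shape of struct symbol is a guess, since the thread's code was cut off - a linked list is assumed purely for brevity:

```c
#include <string.h>
#include <stddef.h>

struct symbol {
    char *symbol;          /* points to any string at all */
    struct symbol *next;   /* simple chaining (assumed)   */
};

/* Search one table (here a linked list) for a name; NULL if absent. */
static struct symbol *find_in(struct symbol *table, const char *name)
{
    for (; table != NULL; table = table->next)
        if (strcmp(table->symbol, name) == 0)
            return table;
    return NULL;
}

/* Look a name up in the function's parameter table first,
 * falling back to the global table, as described above. */
struct symbol *lookup(struct symbol *params, struct symbol *globals,
                      const char *name)
{
    struct symbol *s = find_in(params, name);
    return s != NULL ? s : find_in(globals, name);
}
```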
1) Where are the constants stored? I will be using a signed 32-bit int, an unsigned char for a character, and a pointer to a string (16-bit).
2) I have never written a bison/flex program. Would you be interested in looking over my code? I could pay a reasonable amount for your services. If you would be interested please contact me at schmitt_alvin[at]yahoo[dot]com . Thank you for all your help. Alvin... |
What had crossed my mind about constants (both numeric and string) was that these could be stored just like other symbols - in different tables. That way if the code used the same constant it would be identified as a duplicate, and only stored once.
Storage would be just like a variable name (a string) - so each "symbol" table would accumulate those constants: symbol names in a table for variables, string constants in a string table, and so on. Besides the elimination of duplicates (more significant for 32-bit integer and string constants), it could also permit more optimal storage in the target machine memory. Integers (of various sizes) can have their own alignment requirements, where character strings don't. The various sized integers can be stored in the same table, but when they get created for the target machine code they can be sorted according to storage size - all characters together, or put inline, since duplicating a byte (even if extended to 16 bits) is smaller than using addresses - so all the characters would be together, and the integers together. But it is just a thought.

I've started throwing some grammar around again myself, and may make a translator as an example. The target machine interpreter is also something I've done before. My old one was implemented as a protocol interpreter in a driver for a PDP-11 for a serial line block storage device (a DEC TU58 tape cartridge, which worked like the old Ftape drive - right down to the same block limits, and cheaper/more reliable than a floppy at the time).

As for #2, you have an interesting problem, and it has gotten me off my duff to poke around at small languages again. I wouldn't mind looking your code over. If you think it worth it, let's just see afterwards about any payment you think reasonable. Otherwise it's free. |
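The duplicate-elimination idea above can be sketched as a small "intern" function over a constant table - a minimal linear-scan version, with the table shape invented for illustration:

```c
#define MAX_CONSTS 64

struct const_table {
    int count;
    long value[MAX_CONSTS];
};

/* Return the index of a 32-bit integer constant in the table, adding
 * it only if it is not already there - so the same constant is stored
 * once no matter how many times the source code uses it. */
int intern_int(struct const_table *t, long v)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (t->value[i] == v)
            return i;          /* duplicate: reuse the existing slot */
    t->value[t->count] = v;
    return t->count++;
}
```

The same scheme works for string constants by comparing with strcmp instead of ==, which is exactly why constants can live in symbol-style tables.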
Please contact me at schmitt_alvin[at]yahoo[dot]com if you like. Alvin...
|
After the parser has processed the language and created the tree, I guess the tree is sent to the mcu for processing as a pseudo program (interpreted?). How would the tree be sent to the micro? By walking the tree? Should the struct entry tree be in a predefined array, since sending that array would be easier than walking the tree? Since the tree is defined on a PC with infinite storage (!?), a very large array (larger than the total 53kb of ram on the mcu) could be defined, and the struct entry allocator would know what storage was used (allocated linearly from index 0 on), so it would be simple to send the array as one contiguous byte stream to the micro. What do you think?
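The proposal above - allocate nodes linearly from an array so the whole thing can be shipped as one byte stream - can be sketched by replacing child pointers with array indices. The node shape and pool size are assumptions for illustration:

```c
#define MAX_NODES 1024

/* A pointer-free node: children are array indices, with -1 for "none",
 * so the pool needs no address fix-up after being copied to the mcu. */
struct flat_entry {
    int entry_type;
    int left;     /* index of left child, or -1  */
    int right;    /* index of right child, or -1 */
};

struct flat_entry pool[MAX_NODES];
int pool_used = 0;

/* Allocate nodes linearly from index 0 on, as proposed above;
 * returns the new node's index, or -1 if the pool is full. */
int flat_alloc(int type, int left, int right)
{
    if (pool_used >= MAX_NODES)
        return -1;
    pool[pool_used].entry_type = type;
    pool[pool_used].left = left;
    pool[pool_used].right = right;
    return pool_used++;
}
```

Only the first pool_used entries would need to be transmitted, and the stream is position-independent because all references are indices.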
|
The tree is "sort of" sent to the mcu. It isn't in a suitable form as it uses local addresses, and a rather large tree structure for the program.
What should be done (still on the development PC) is that the tree gets translated into a format directly related to the target machine. There are several ways to do that, and it depends on the target as to what form it takes:

1) Machine code - the parse tree can be converted to base assembler code for other development tools (such as an assembler), which make the final creation of the machine code. This usually has the best runtime, but sometimes can be a booger to debug.

2) Instructions for a virtual machine that actually runs on the target machine. This approach is like the Java virtual machine - you have to have a virtual machine program running on the target, and provide the application binary to it.

3) An in-between mix of the two. This one makes heavy use of runtime support (if the target has such a limited stack, then function/subroutine calls and returns are emulated by the runtime using a software-implemented stack). Beyond the emulated capability (such as some arithmetic operations, string operations, input/output), the generated code is combined with the runtime (either during translation of the assembler code, or by having some kind of linker do the work).

#3 is likely the best choice, as it allows a minimal virtual machine plus the advantage of mostly direct execution. There are two forms of this:

a) a threaded-code virtual machine - makes code generation relatively easy, as problems of register availability limits are solved by the virtual machine, and the code generation can be of several different types.

b) a "run time library" that defines things the hardware doesn't provide - a general stack, for instance: runtime routines can emulate having a fully functioning stack of nearly any depth, subroutine calls/returns using that general stack, and extra arithmetic capability (multiplication/division, even floating point is possible that way).
Of these two options (3a and 3b), 3a is the easiest to implement in an emulator for debugging purposes; 3b has the best speed performance. I don't know how familiar you are with threaded code interpreters, though. They can have nearly the same speed as direct execution, with a dispatch overhead of roughly one to five instructions, depending on the specific CPU target. A PDP-11 with a 64k address range could do it with a single instruction (it had an indirect jump with an auto-increment register addressing mode); others need two to five instructions to do it. |
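For readers unfamiliar with threaded code, here is a toy dispatch loop in C, using an array of handler addresses and function-pointer calls in place of the PDP-11's indirect auto-increment jump. The opcode set is invented for illustration:

```c
/* Each "instruction" is just the address of its handler; the inner
 * loop fetches the next address and jumps through it - the C-level
 * equivalent of the PDP-11's indirect jump with auto-increment. */
typedef void (*op_fn)(void);

static int acc;        /* toy accumulator                */
static int running;    /* cleared by the halt "opcode"   */

static void op_inc(void)  { acc++; }
static void op_dbl(void)  { acc *= 2; }
static void op_halt(void) { running = 0; }

/* Run a threaded program: an array of handler addresses. */
int run(op_fn *program, int start)
{
    op_fn *ip = program;   /* virtual instruction pointer */
    acc = start;
    running = 1;
    while (running)
        (*ip++)();         /* fetch next handler address, call it */
    return acc;
}
```

The entire virtual machine is the two-line while loop, which is why a threaded-code interpreter can stay so small on a memory-limited mcu.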