Java: A Conceptual Question
I am attempting to write an interpreter (not a compiler) in Java. I am going to invent my own very basic language, and the Java program will interpret a file written in that language. My "language" is going to be called SPL (Small Programming Language) and it is basically going to be useless (simple mathematical operations, simple variable assignment/declaration). What I am trying to get out of this exercise is a good understanding of how one might parse code using Java in such a way that it can be executed. Here is what I am thinking right now...
My dad gave me the idea of using a recursive algorithm that searches through an expression for the operator of least precedence and then runs the algorithm again on either side of that operator until it reaches the base case. The base case is an expression with no operator (either a number or a variable); in the case of a variable, a simple method would return its value. EX: x * 5 + y * 2

algorithm(x * 5 + y * 2) ---> algorithm(x * 5) + algorithm(y * 2) ---> algorithm(x) * algorithm(5) + algorithm(y) * algorithm(2)

You can see that at the end the algorithm has hit its base case and the expression will be calculated. My idea is to parse expressions into a tree. EX: x * 5 + y * 2
Code:
        +
      /   \
     *     *
    / \   / \
   x   5 y   2
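The recursive idea above can be sketched in Java. This is only a hypothetical, minimal version (the class and method names `SplEval`/`eval` are my own inventions); it assumes whitespace-optional single-character operators and handles no parentheses or unary minus:

```java
import java.util.Map;

public class SplEval {

    // Evaluate an expression by finding the operator of least precedence,
    // recursing on both sides, and bottoming out at a number or variable.
    static int eval(String expr, Map<String, Integer> vars) {
        expr = expr.trim();
        // Scan right-to-left for the operator of least precedence, so that
        // same-precedence operators group left-associatively (8 - 2 - 1 = 5).
        int split = -1, bestPrec = Integer.MAX_VALUE;
        for (int i = expr.length() - 1; i >= 0; i--) {
            char c = expr.charAt(i);
            int prec = (c == '+' || c == '-') ? 1
                     : (c == '*' || c == '/') ? 2 : 0;
            if (prec != 0 && prec < bestPrec) {
                bestPrec = prec;
                split = i;
            }
        }
        if (split == -1) {
            // Base case: no operator left, so it's a number or a variable.
            return Character.isDigit(expr.charAt(0))
                    ? Integer.parseInt(expr)
                    : vars.get(expr);
        }
        int left  = eval(expr.substring(0, split), vars);
        int right = eval(expr.substring(split + 1), vars);
        switch (expr.charAt(split)) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default:  return left / right;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = Map.of("x", 3, "y", 4);
        System.out.println(eval("x * 5 + y * 2", vars)); // 3*5 + 4*2 = 23
    }
}
```

Splitting at the *least*-precedent operator first is what makes this work: the last operation to be applied becomes the root of the implicit tree, exactly as in the drawing above.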
So I thought I would just bounce these ideas off the community and see if anyone has ideas of their own. Please keep the discussion on a conceptual level! This is not some homework assignment, just an interesting problem :-) EDIT: The tree notation is messed up; I hope it is still clear :-( EDIT 2: Thanks, GrapefruiTgirl, for the tip on how to fix the notation! |
Yes, that is much better! Before, it would all cluster to the left side of the post.
|
Well, compiling and parsing algorithms have been well discussed, and I'm sure there are Java implementations. Look at tools (not in Java) such as yacc, bison, and lex.
The general approach most often used is called shift/reduce. The parser considers one symbol at a time. It can either shift ("push") the symbol onto a stack of symbols that it maintains, or it can reduce, which means grabbing a set of symbols from the stack and replacing them with just one. Along the way, it calls routines in the back end of your compiler.

A grammar tells the parser what the language looks like, and it serves as a sort of "road map" that guides the parser in its decision of what to do next: shift, reduce, or go into error recovery. The advantage of this approach over recursive descent is that the complexity is pushed into the grammar, which is simply a data structure. Your code remains simple: whether the language is simple or complex, the compiler itself remains simple.

Incidentally, both compilers and interpreters usually work this way. Even an interpreter will "semi-compile" the source code into something that can be executed efficiently. The representation might be a "p-code" of some sort, or it might be a tree. Enjoy! |
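To make the shift/reduce idea concrete for plain arithmetic, here is a hypothetical Java sketch (class name `ShiftReduce` and the token format, whitespace-separated with no parentheses, are my own assumptions). The small precedence table plays the role the grammar plays in a real parser; tools like yacc/bison generate far more general tables:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class ShiftReduce {
    // Precedence table: the "road map" for shift-vs-reduce decisions.
    static final Map<Character, Integer> PREC =
            Map.of('+', 1, '-', 1, '*', 2, '/', 2);

    static int eval(String expr) {
        Deque<Integer> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            char c = tok.charAt(0);
            if (tok.length() == 1 && PREC.containsKey(c)) {
                // Reduce while the operator on the stack binds at least
                // as tightly as the incoming one, then shift the new one.
                while (!ops.isEmpty() && PREC.get(ops.peek()) >= PREC.get(c)) {
                    reduce(values, ops);
                }
                ops.push(c);
            } else {
                values.push(Integer.parseInt(tok)); // shift a number
            }
        }
        while (!ops.isEmpty()) reduce(values, ops); // reduce what remains
        return values.pop();
    }

    // Pop one operator and two values, replace them with one value.
    static void reduce(Deque<Integer> values, Deque<Character> ops) {
        int right = values.pop(), left = values.pop();
        switch (ops.pop()) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            default:  values.push(left / right); break;
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3 * 4 - 5")); // 9
    }
}
```

Note how "reduce" here literally replaces a group of stack entries (two values and an operator) with a single value, which is the essence of the technique described above.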
Thanks for your reply! Let me make sure I understand the shift/reduce idea you bring up. So the interpreter has a stack of operators it has previously encountered (let's say the stack is simply two + operators). If the interpreter encounters an operator of higher precedence (let's say a multiplication symbol), it will shift, pushing the multiplication symbol onto the stack? And then if it encounters an operator of equal precedence (let's say another addition operator), it will reduce all the + operators into one step on the stack?
Also, a question about grammar: using the method I suggested, both the compiler and the grammar could be complex, but using the shift/reduce method, only the grammar might be complex? |
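For what it's worth, with a simple precedence scheme the decision asked about above comes down to one comparison: reduce while the operator already on the stack binds at least as tightly as the incoming one (so equal precedence reduces, which gives left-associativity), otherwise shift. A tiny hypothetical sketch (names `Decide`/`decide` are invented for illustration):

```java
public class Decide {
    // Toy precedence: + and - bind loosely, * and / bind tightly.
    static int prec(char op) {
        return (op == '+' || op == '-') ? 1 : 2;
    }

    // What should the parser do when `incoming` arrives while
    // `onStack` sits on top of the operator stack?
    static String decide(char onStack, char incoming) {
        return prec(onStack) >= prec(incoming) ? "reduce" : "shift";
    }

    public static void main(String[] args) {
        System.out.println(decide('+', '*')); // shift: * binds tighter than +
        System.out.println(decide('*', '+')); // reduce: finish the * first
        System.out.println(decide('+', '+')); // reduce: left-associative
    }
}
```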
I am not really sure what an LALR parser is, but I had an interesting look at abstract syntax trees using this. |