Java: A Conceptual Question
I am attempting to write an interpreter (not a compiler) in Java. I am going to invent my own very basic language, and the Java program will interpret a file written in that language. My "language" is going to be called SPL (Small Programming Language) and it is basically going to be useless (simple mathematical operations, simple variable assignment/declaration). What I am trying to get out of this exercise is a good understanding of how one might parse code using Java in such a way that it can be executed. Here is what I am thinking right now...
My dad gave me the idea of using a recursive algorithm that searches through an expression for the operator of least precedence and then runs the algorithm again on either side of that operator until it reaches the base case. The base case is an expression with no operator (either a number or a variable); in the case of a variable, a simple method would return its value. EX: x * 5 + y * 2

algorithm(x * 5 + y * 2) ---> algorithm(x * 5) + algorithm(y * 2) ---> algorithm(x) * algorithm(5) + algorithm(y) * algorithm(2)

You can see that at the end the algorithm has hit its base case and the expression will be calculated. My idea is to parse expressions into a tree. EX: x * 5 + y * 2
Code:
        +
      /   \
     *     *
    / \   / \
   x   5 y   2
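The recursive idea above can be sketched in Java. This is only a hypothetical, minimal version (the class and method names `SplEval`/`eval` are my own inventions); it assumes whitespace-optional single-character operators and handles no parentheses or unary minus:

```java
import java.util.Map;

public class SplEval {

    // Evaluate an expression by finding the operator of least precedence,
    // recursing on both sides, and bottoming out at a number or variable.
    static int eval(String expr, Map<String, Integer> vars) {
        expr = expr.trim();
        // Scan right-to-left for the operator of least precedence, so that
        // same-precedence operators group left-associatively (8 - 2 - 1 = 5).
        int split = -1, bestPrec = Integer.MAX_VALUE;
        for (int i = expr.length() - 1; i >= 0; i--) {
            char c = expr.charAt(i);
            int prec = (c == '+' || c == '-') ? 1
                     : (c == '*' || c == '/') ? 2 : 0;
            if (prec != 0 && prec < bestPrec) {
                bestPrec = prec;
                split = i;
            }
        }
        if (split == -1) {
            // Base case: no operator left, so it's a number or a variable.
            return Character.isDigit(expr.charAt(0))
                    ? Integer.parseInt(expr)
                    : vars.get(expr);
        }
        int left  = eval(expr.substring(0, split), vars);
        int right = eval(expr.substring(split + 1), vars);
        switch (expr.charAt(split)) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default:  return left / right;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = Map.of("x", 3, "y", 4);
        System.out.println(eval("x * 5 + y * 2", vars)); // 3*5 + 4*2 = 23
    }
}
```

Splitting at the *least*-precedent operator first is what makes this work: the last operation to be applied becomes the root of the implicit tree, exactly as in the drawing above.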
So I thought I would just bounce these ideas off the community and see if anyone has ideas of their own. Please keep the discussion on a conceptual level! This is not some homework assignment, just an interesting problem :-) EDIT: The tree notation is messed up; I hope it is still clear :-( EDIT 2: Thanks, GrapefruiTgirl, for the tip on how to fix the notation! |
Yes, that is much better! Before, it would all cluster to the left side of the post.
|
Well, compiling and parsing algorithms have been well discussed, and I'm sure there are Java implementations. Look at tools (not in Java) such as yacc, bison, and lex.
The general approach most often used is called shift/reduce. The parser considers one symbol at a time. It can either shift ("push") the symbol onto a stack of symbols that it maintains, or it can reduce, which means grabbing a set of symbols from the stack and replacing them with just one. Along the way, it calls routines in the back end of your compiler.

A grammar tells the parser what the language looks like, and it serves as a sort of "road map" that guides the parser in its decision of what to do next: shift, reduce, or go into error recovery. The advantage of this approach over recursive descent is that the complexity is pushed into the grammar, which is simply a data structure. Your code remains simple: whether the language is simple or complex, the compiler itself remains simple.

Incidentally, both compilers and interpreters usually work this way. Even an interpreter will "semi-compile" the source code into something that can be executed efficiently. The representation might be a "p-code" of some sort, or it might be a tree. Enjoy! |
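To make the shift/reduce idea concrete for plain arithmetic, here is a hypothetical Java sketch (class name `ShiftReduce` and the token format, whitespace-separated with no parentheses, are my own assumptions). The small precedence table plays the role the grammar plays in a real parser; tools like yacc/bison generate far more general tables:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class ShiftReduce {
    // Precedence table: the "road map" for shift-vs-reduce decisions.
    static final Map<Character, Integer> PREC =
            Map.of('+', 1, '-', 1, '*', 2, '/', 2);

    static int eval(String expr) {
        Deque<Integer> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            char c = tok.charAt(0);
            if (tok.length() == 1 && PREC.containsKey(c)) {
                // Reduce while the operator on the stack binds at least
                // as tightly as the incoming one, then shift the new one.
                while (!ops.isEmpty() && PREC.get(ops.peek()) >= PREC.get(c)) {
                    reduce(values, ops);
                }
                ops.push(c);
            } else {
                values.push(Integer.parseInt(tok)); // shift a number
            }
        }
        while (!ops.isEmpty()) reduce(values, ops); // reduce what remains
        return values.pop();
    }

    // Pop one operator and two values, replace them with one value.
    static void reduce(Deque<Integer> values, Deque<Character> ops) {
        int right = values.pop(), left = values.pop();
        switch (ops.pop()) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            default:  values.push(left / right); break;
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3 * 4 - 5")); // 9
    }
}
```

Note how "reduce" here literally replaces a group of stack entries (two values and an operator) with a single value, which is the essence of the technique described above.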
Thanks for your reply! Let me make sure I understand the shift/reduce idea you bring up. So the interpreter has a stack of operators it has previously encountered (let's say the stack is simply two + operators). If the interpreter encounters an operator of higher precedence (let's say a multiplication symbol), it will shift, pushing the multiplication symbol onto the stack? And then if it encounters an operator of equal precedence (let's say another addition operator), it will reduce all the + operators into one step on the stack?
Also, a question about grammar: using the method I suggested, both the compiler and the grammar could be complex, but using the shift/reduce method, only the grammar might be complex? |
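For what it's worth, with a simple precedence scheme the decision asked about above comes down to one comparison: reduce while the operator already on the stack binds at least as tightly as the incoming one (so equal precedence reduces, which gives left-associativity), otherwise shift. A tiny hypothetical sketch (names `Decide`/`decide` are invented for illustration):

```java
public class Decide {
    // Toy precedence: + and - bind loosely, * and / bind tightly.
    static int prec(char op) {
        return (op == '+' || op == '-') ? 1 : 2;
    }

    // What should the parser do when `incoming` arrives while
    // `onStack` sits on top of the operator stack?
    static String decide(char onStack, char incoming) {
        return prec(onStack) >= prec(incoming) ? "reduce" : "shift";
    }

    public static void main(String[] args) {
        System.out.println(decide('+', '*')); // shift: * binds tighter than +
        System.out.println(decide('*', '+')); // reduce: finish the * first
        System.out.println(decide('+', '+')); // reduce: left-associative
    }
}
```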
I am not really sure what an LALR parser is, but I had an interesting look at abstract syntax trees using this. |