For those of you just joining us, in my "Java In Depth" column over the past couple months, I've been discussing how one might go about building an interpreter in Java. In the first interpreter column, we covered some of the desirable attributes of an interpreter; in the second column we discussed both parsing and the layout of a class package for implementing the interpreter. In this column we'll look at running the interpreter, and the support classes necessary to accomplish that. Finally, I'll wrap up the series here with a discussion of how an interpreter can be hooked into other Java classes, thus enhancing their abilities.
Reviewing the relevant bits
Let me start by sketching out what we've covered so far and point out those parts of the design that will become more important as we discuss execution mode. For a more detailed description of these classes, refer to my previous columns or to the source code links that are in the
below.
There are three foundation classes in the implementation of the interpreter, Program
, Statement
, and Expression
. The following shows how the three are related:
Program
ThisProgram
class glues together the parsing and execution components of the parser. This class defines two principal methods,load
andrun
. Theload
method reads statements from an input stream and parses them into a collection of statements, therun
method iterates over the collection and executes each of the statements. TheProgram
class also provides a collection of variables for the program to use as well as a stack for storing data.
Statement
TheStatement
class contains a single parsed statement. This class is actually subclassed into a specific type of statement (PRINT, GOTO, IF, and so on) but all statements contain the methodexecute
which is called to execute the statement in the context of aProgram
class instance.
Expression
TheExpression
class contains the parse tree of an expression. During execution, thevalue
method is used to evaluate the expression and return its value. LikeStatement
, theExpression
class is primarily designed to be subclassed by specific types of expressions.
All of these classes work together to form the basis of an interpreter. The Program
class simultaneously encapsulates the parsing operation and execution operation, whereas the Statement
and Expression
classes encapsulate the actual computational concepts of the language we've implemented. For these three articles on building interpreters, the example language has been BASIC.
Facilities for computation
There are two executable classes in the interpreter,
Statement
and
Expression
. First let's take a look at
Expression
.
The instances of Expression
are created by the method expression
in the class ParseExpression
. The ParseExpression
class implements the expression parser in this interpreter. This class is a peer of the ParseStatement
class, which uses the statement
method to parse BASIC statements. Instances of Expression
have an internal type
that identifies what operator the instance represents, and two methods, value
and stringValue
, that return the computed value of the expression. Additionally, when an Expression instance is created, it is nominally given two parameters representing the left and right side of the expression's operation. Shown in source form, the first part of the Expression is as follows:
class Expression { Expression arg1, arg2; int oper; final static int OP_ADD = 1; // Addition '+' final static int OP_SUB = 2; // Subtraction '-' final static int OP_MUL = 3; // Multiplication '*' final static int OP_DIV = 4; // Division '/' [... etc for all of the operators ...] final static int OP_BNOT = 19; // Boolean negation '.NOT.' final static int OP_NEG = 20; // Unary minus
As you can see in the code above, there are the instance variables, an operator type named oper, and two halves of the operation in arg1 and arg2, and then some constants to define the various types. Also, there are two constructors that are used by the expression parser; these create a new expression from two expressions. Their source is shown below:
protected Expression(int op, Expression a, Expression b) throws BASICSyntaxError { arg1 = a; arg2 = b; oper = op; /* * If the operator is a boolean, both arguments must be boolean. */ if (op > OP_GE) { if ( (! (arg1 instanceof BooleanExpression)) || (! (arg2 instanceof BooleanExpression)) ) throw new BASICSyntaxError(typeError); } else { if ((arg1 instanceof BooleanExpression) || (arg2 instanceof BooleanExpression)) throw new BASICSyntaxError(typeError); } } protected Expression(int op, Expression a) throws BASICSyntaxError { arg2 = a; oper = op; if ((oper == OP_BNOT) && (! (arg2 instanceof BooleanExpression))) throw new BASICSyntaxError(typeError); }
The first constructor builds an arbitrary expression object, and the second one builds a "unary" expression object -- such as unary minus. One thing to note is that if there is only one argument, arg2 is used to store its value.
The method that is used the most frequently in the Expression
class is value
, which is defined as follows:
double value(Program pgm) throws BASICRuntimeError { switch (oper) { case OP_ADD : return arg1.value(pgm) + arg2.value(pgm); case OP_SUB : return arg1.value(pgm) - arg2.value(pgm); ... etc for all of the other operator types. ...
You can see that each expression object represents a tuple consisting of an operator and one or two arguments. The fun part of designing the expression execution engine in this way is that when you construct this set of expression tuples based on the Expression
object, you can compute the value of the expression by simply invoking the value
method. The value
method recursively invokes the value
method of the two arguments that compose this expression, applies the operation to them, and returns the result. This design was used so that expressions would be easy to understand.
To keep the class structure clean, all computational units -- from constants to trigonometric functions -- are subclasses of Expression
. This idea, stolen shamelessly from Lisp, completely encapsulates the notion of "causing" an evaluation to occur, from the actual implementation of "how" that evaluation occurs. To demonstrate how this principle is applied, we need only look at some of the specialized subclasses of Expression
.
Constants in my version of BASIC, which I named COCOA are represented by the class ConstantExpression
, which subclasses Expression
and simply stores the numeric value in a member value. The source code to ConstantExpression
is shown conceptually below. I say "conceptually" because I did choose to bundle what would have been StringConstantExpression
and NumericConstantExpression
into a single class. So the real class includes a constructor for creating a constant with a string argument and for returning its value as a string. The following code shows how the ConstantExpression
class handles numeric constants.
class ConstantExpression extends Expression { private double v; ConstantExpression(double a) { super(); v = a; } double value(Program pgm) throws BASICRuntimeError { return v; } }
The code shown above replaces the more complicated constructors of Expression
with a simple store of an instance variable; the value
method is simply replaced with a return of the stored value.
It is true that you could code the Expression
class to accept in its constructors a constant that would save you a class. However, one advantage of designing Expression
the way I have is that the code in Expression
remains maximally generic. I find this style of coding helps me eliminate the complexity of special cases and thus when I am "done" with the Expression
code, I can move on to other aspects of expressions without revisiting the base class again and again. The advantage becomes clearer when we delve into another subclass of Expression
named FunctionExpression
.
In the class FunctionExpression
, there were two design requirements I felt should be met to keep the interpreter flexible. The first was to implement the standard BASIC functions; the other was to encapsulate the parsing of the functions arguments into the same class that implemented these functions. The second requirement, parsing, was motivated by a desire to make this BASIC extensible by creating additional function libraries that could be passed to the parser as subclasses of FunctionExpression
. Further, those passed-in classes could be used by the parser to increase the number of functions available to the user's program.
The FunctionExpression
class is only moderately more complicated than a ConstantExpression
and is shown in condensed form below:
1 class FunctionExpression extends Expression { 2 [ ... parsing stuff removed ...] 3 double value(Program p) throws BASICRuntimeError { 4 try { 5 switch (oper) { 6 case RND : 7 if (r == null) 8 r = p.getRandom(); 9 return (r.nextDouble() * arg2.value(p)); 10 case INT : 11 return Math.floor(arg2.value(p)); 12 case SIN : 13 return Math.sin(arg2.value(p)); 14 [ ... and so on for all of the functions implemented ... ] 15 default : 16 throw new BASICRuntimeError("Unknown or non-numeric function."); 17 } 18 } catch (Exception e) { 19 if (e instanceof BASICRuntimeError) 20 throw (BASICRuntimeError) e; 21 else 22 throw new BASICRuntimeError("Arithmetic Exception."); 23 } 24 }
The above source shows how the value
method is implemented. The oper variable is reused to hold the function identity, and the Expression objects referenced by arg1 and arg2 are used as arguments to the functions themselves. Finally, there is a large switch statement that dispatches the request. One interesting aspect is that the value
method does catch the potential arithmetic exceptions and converts them into instances of BASICRuntimeError
. The parsing code in FunctionExpression
is shown below, again condensed to save space. (Remember, all of the source code is available using links in the Resources section.)
1 static FunctionExpression parse(int ty, LexicalTokenizer lt) throws BASICSyntaxError { 2 FunctionExpression result; 3 Expression a; 4 Expression b; 5 Expression se; 6 Token t; 7 8 t = lt.nextToken(); 9 if (! t.isSymbol('(')) { 10 if (ty == RND) { 11 lt.unGetToken(); 12 return new FunctionExpression(ty, new ConstantExpression(1)); 13 } else if (ty == FRE) { 14 lt.unGetToken(); 15 return new FunctionExpression(ty, new ConstantExpression(0)); 16 } 17 throw new BASICSyntaxError("Missing argument for function."); 18 } 19 switch (ty) { 20 case RND: 21 case INT: 22 case SIN: 23 case COS: 24 case TAN: 25 case ATN: 26 case SQR: 27 case ABS: 28 case CHR: 29 case VAL: 30 case STR: 31 case SPC: 32 case TAB: 33 case LOG: 34 a = ParseExpression.expression(lt); 35 if (a instanceof BooleanExpression) { 36 throw new BASICSyntaxError(functions[ty].toUpperCase()+" function cannot accept boolean expression."); 37 } 38 if ((ty == VAL) && (! a.isString())) 39 throw new BASICSyntaxError(functions[ty].toUpperCase()+" requires a string valued argument."); 40 result = new FunctionExpression(ty, a); 41 break; [ ... other cases omitted for brevity ... ] 42 default: 43 throw new BASICSyntaxError("Unknown function on input."); 44 45 } 46 t = lt.nextToken(); 47 if (! t.isSymbol(')')) { 48 throw new BASICSyntaxError("Missing closing parenthesis for function."); 49 } 50 return result; 51 }
Note that this code takes advantage of the fact that the expression parser in ParseStatement
has already figured out that it is looking at an expression and has passed the identity of the expression in as parameter ty. This parser then need only locate the opening parenthesis and the closing parenthesis that contain the argument(s). But look carefully: In lines #9 through #18, the parser allows some functions to have no arguments (in this case RND and FRE). This demonstrates the flexibility afforded by having the function subparser built into this class, rather than forcing all functions to conform to some predefined template. Given a function type in the parameter ty, the switch statement selects a branch that can parse the arguments required for that function, be they strings, numbers, other expressions, and so on.