Writing an Interpreter
Thoughts on Assignment 6

CS F331 Programming Languages
CSCE A331 Programming Language Concepts
Lecture Slides
Monday, March 27, 2017

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks
ggchappell@alaska.edu

© 2017 Glenn G. Chappell

Review (Forth: Details): Stacks

A Forth implementation adhering to the ANSI standard is actually required to have four stacks.
  Data stack
    Holds integer values. These are also used as pointers and booleans. This is the stack we have been dealing with.
  Floating-point stack
    Holds floating-point values.
  Return stack
    Holds return addresses for words that are called.
  Locals stack
    Holds local variables.
Why are the return stack & locals stack separate? I do not know.

27 Mar 2017 CS F331 / CSCE A331 Spring 2017 2

Review (Forth: Details): Floating Point [1/2]

Forth includes support for floating-point computations. Floating-point values (essentially the same as C/C++ values of type double) are stored on a separate stack: the floating-point stack.

Floating-point literals must contain e or E. Each of these pushes a value on the floating-point stack.
  -4e
  1.2E
  1.2e17

See float.fs.

Review (Forth: Details): Floating Point [2/2]

Words that handle floating-point values are often named the same as the corresponding integer-handling words, with an f prepended.
  f.    \ Like .
  f.s   \ Like .s

In the stack effects below, the F: prefix means that the stack-effect notation refers to the floating-point stack.
  fdup ( F: x -- x x )      \ Like dup; also fdrop fswap ...
  f+   ( F: x y -- x+y )    \ Like +; also f- f* f/

Here are some other floating-point-handling words.
  f**   ( F: x y -- pow[x,y] )   \ x raised to the y power
  fsqrt ( F: x -- sqrt[x] )
  fexp  ( F: x -- exp[x] )       \ Also flog fsin fcos ...
  1/f   ( F: x -- 1/x )

Review (Forth: Details): Other Features

Some Forth features that we do not have time to cover:
  Exceptions
    Forth has a notion of exception that can be used for error handling.
  Defining new flow-of-control words
    Some of the words we have covered are special: if else endif begin while repeat ?do loop recurse. These affect the flow of control in ways that we do not know how to duplicate. But Forth does allow us to write such words ourselves.
  Defining new defining words
    Some other words are special in another way: variable constant : ; (colon and semicolon). These allow new words to be defined. Forth allows us to write this kind of word as well.

Review (Forth: Details): Specifying Semantics [1/4]

Recall:
  Syntax = structure (of code)
  Semantics = meaning (of code)

Grammatical Notes
  "Semantics" is an uncountable noun (like "butter"). It is mostly used in the singular (so "Semantics is", not "Semantics are").
  "Semantic" is the corresponding adjective.

Review (Forth: Details): Specifying Semantics [2/4]

How is semantics used?
  A programmer needs to know the semantics of a PL in order to write correct code.
  The design of a compiler needs to be based on the semantics of the source PL, so that correct object code can be generated. Similarly, the design of an interpreter needs to be based on the semantics of the source PL, so that correct actions can be performed.
  Semantics is useful in optimization: altering code so as to improve performance, while keeping semantics the same.
  Semantics is used in verification: checking that code performs the actions it is supposed to.

Review (Forth: Details): Specifying Semantics [3/4]

Semantics is generally divided into two kinds: static and dynamic.

Static semantics includes the aspects of semantics that can be checked before a program executes. This includes:
  Typing, in statically typed PLs.
  Dependencies (what relies on what).
  Other things like whether all cases in a switch are distinct.

Dynamic semantics refers to the semantics of a running program: what statements do, and what expressions compute. In a dynamically typed PL, this also includes typing.

Review (Forth: Details): Specifying Semantics [4/4]

We have looked at methods for formally specifying syntax: in particular, phrase-structure grammars.

Formal semantics refers to methods for formally specifying semantics. These generally involve mathematical notations.

We looked briefly at four formal-semantics methods. We are not covering notation.
  Attribute grammars. Specify static semantics via attributes added to AST nodes.
  Operational semantics. Specify dynamic semantics of a PL in terms of the semantics of some other PL or abstract machine (usually the latter).
  Axiomatic semantics. Specify dynamic semantics in terms of logical statements about program state.
  Denotational semantics. Specify dynamic semantics by representing state & values with mathematical objects, and commands & computations with functions.

Writing an Interpreter: Introduction [1/3]

Recall: a compiler takes code in one PL (the source PL) and translates it into code in another PL (the target PL).
  Source PL -> Compiler -> Target PL

An interpreter takes code in its source PL and executes it.
  Source PL -> Interpreter

Writing an Interpreter: Introduction [2/3]

Compilation and interpretation are not mutually exclusive. Many modern interpreters begin by compiling to an intermediate representation (IR), perhaps a byte code, which is then interpreted directly.

  Standard Lua Interpreter:
    Lua -> Lua Compiler -> Lua Byte Code -> Lua Byte Code Interpreter

Writing an Interpreter: Introduction [3/3]

Regardless of whether there is a compilation step, virtually all interpreters will use some kind of IR. This might be:
  An abstract syntax tree (AST).
  A PL-specific byte code. E.g., Lua byte code, Python byte code.
  A general-purpose byte code. E.g., LLVM, Java Virtual Machine (JVM) byte code.
  Some other programming language. JavaScript is used in this way very often.

It is possible that more than one IR is used. The source code is translated to the first IR, then the first IR is translated to the second, etc.

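The idea of translating an AST to a second IR and then interpreting that IR can be sketched in a few lines of Lua. Everything here is invented for illustration: the AST shape, the instruction names (PUSH, LOAD, NEG, ADD, MUL), and the functions compile and run. It is not how the standard Lua interpreter represents its byte code.

```lua
-- Hypothetical AST for this sketch: a number, a variable name (string),
-- { "neg", operand }, or { op, left, right } with op "+" or "*".

-- Translate an AST into a flat list of stack-machine instructions.
local function compile(ast, code)
    code = code or {}
    if type(ast) == "number" then
        code[#code+1] = { "PUSH", ast }
    elseif type(ast) == "string" then
        code[#code+1] = { "LOAD", ast }
    elseif ast[1] == "neg" then
        compile(ast[2], code)
        code[#code+1] = { "NEG" }
    else
        compile(ast[2], code)
        compile(ast[3], code)
        code[#code+1] = { ast[1] == "+" and "ADD" or "MUL" }
    end
    return code
end

-- Execute the instruction list using a value stack.
local function run(code, vars)
    local stack = {}
    for _, instr in ipairs(code) do
        local op = instr[1]
        if op == "PUSH" then
            stack[#stack+1] = instr[2]
        elseif op == "LOAD" then
            stack[#stack+1] = vars[instr[2]] or 0
        elseif op == "NEG" then
            stack[#stack] = -stack[#stack]
        else
            local y = table.remove(stack)
            local x = table.remove(stack)
            stack[#stack+1] = (op == "ADD") and (x + y) or (x * y)
        end
    end
    return stack[1]
end

-- (a + 2) * -b with a = 3, b = 4
local code = compile({ "*", { "+", "a", 2 }, { "neg", "b" } })
print(run(code, { a = 3, b = 4 }))   --> -20
```

The first pass is recursive over the tree; the second is a simple loop over a list, which is one reason a byte-code IR tends to be attractive for execution.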
Writing an Interpreter: Processing an AST [1/2]

The AST produced by the parser will need to be processed, either by interpreting it directly, or by generating another IR from it. How is this done?

An AST is a rooted tree. Code that deals with a rooted tree usually proceeds as follows.
  Handle the root node.
  Make a function call (often a recursive call) on each subtree of the root.

[Figure: the AST for (a + 2) * -b]
  *
    +
      a
      2
    -
      b

Writing an Interpreter: Processing an AST [2/2]

Suppose we wish to write a function that evaluates an AST representing a numeric expression, like the tree for (a + 2) * -b. Our function will take an AST and return the numeric value of the expression. It could work something like this:
  If the root node represents a numeric literal:
    Convert the literal to a number and return it.
  Else if the root node represents a numeric variable:
    Get the variable's current value and return it.
  Else if the root node represents a binary operator:
    Get the value of the left subtree (recursive call).
    Get the value of the right subtree (recursive call).
    Apply the appropriate operation and return the result.
  Else if the root node represents a unary operator:
    Get the value of the subtree (recursive call).
    Apply the appropriate operation and return the result.

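The outline above might translate into Lua roughly as follows. This is only a sketch: the node layout (fields kind, text, name, op, left, right, operand) is made up for the example and is not the format produced by any particular parser.

```lua
-- Hypothetical node shapes, invented for this sketch:
--   { kind = "lit", text = "2" }
--   { kind = "var", name = "a" }
--   { kind = "bin", op = "+", left = <node>, right = <node> }
--   { kind = "un",  op = "-", operand = <node> }

-- Evaluate a numeric-expression AST; vars maps variable names to numbers.
local function eval(node, vars)
    if node.kind == "lit" then
        return tonumber(node.text)
    elseif node.kind == "var" then
        return vars[node.name]
    elseif node.kind == "bin" then
        local l = eval(node.left, vars)     -- recursive call
        local r = eval(node.right, vars)    -- recursive call
        if node.op == "+" then return l + r
        elseif node.op == "-" then return l - r
        elseif node.op == "*" then return l * r
        else return l / r
        end
    else                                    -- unary operator
        local v = eval(node.operand, vars)  -- recursive call
        if node.op == "-" then return -v else return v end
    end
end

-- The tree for (a + 2) * -b
local tree = {
    kind = "bin", op = "*",
    left = {
        kind = "bin", op = "+",
        left = { kind = "var", name = "a" },
        right = { kind = "lit", text = "2" },
    },
    right = { kind = "un", op = "-", operand = { kind = "var", name = "b" } },
}

print(eval(tree, { a = 3, b = 4 }))   --> -20
```

Note that the function never looks at precedence or associativity; the shape of the tree already settles the order of evaluation.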
Writing an Interpreter: Representing State

While an interpreter is executing a program, there will need to be some representation of program state: values of variables, the call stack, etc.

In a PL with static typing and scope, the compiler/linker can determine the types and scopes of all variables and the types of all unnamed values. These can be laid out in memory (for local values, in a stack frame). Thus, at runtime, a reference to a value will simply be a reference to a particular memory location.

In a dynamic PL, it is common to place variables in an associative structure with the variable name as key. Usually a hash table is used, with a separate hash table for each scope.

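A minimal Lua sketch of the "one hash table per scope" approach follows. The names new_scope, lookup, and assign are hypothetical, and the assignment rule (update the nearest scope that already holds the name, otherwise create it in the innermost scope) is just one possible design among several.

```lua
-- Each scope is a table of name -> value, plus a link to its enclosing scope.
local function new_scope(parent)
    return { vars = {}, parent = parent }
end

-- Look a name up, searching outward from the innermost scope.
local function lookup(scope, name)
    while scope do
        local v = scope.vars[name]
        if v ~= nil then return v end
        scope = scope.parent
    end
    return nil
end

-- Assign to the nearest scope that already has the name;
-- otherwise create the name in the innermost scope.
local function assign(scope, name, value)
    local s = scope
    while s do
        if s.vars[name] ~= nil then
            s.vars[name] = value
            return
        end
        s = s.parent
    end
    scope.vars[name] = value
end

local globals = new_scope(nil)
local inner = new_scope(globals)
assign(globals, "x", 1)
assign(inner, "y", 2)     -- new name: lives in the inner scope only
assign(inner, "x", 10)    -- existing name: updates the global x
print(lookup(inner, "x"), lookup(inner, "y"), lookup(globals, "y"))
```

A Lua table serves as the hash table here; a language with only global identifiers needs just one such table.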
Writing an Interpreter: Runtime System

There will need to be a runtime system (often simply runtime): additional code that programs will need to use at runtime. This might include:
  Program initialization and shutdown.
  I/O.
  Memory management.
  Interfaces to operating system functionality (e.g., files, threads, interprocess communication).
  Implementations of PL commands that perform complex operations (e.g., advanced floating-point computations, operations involving multiple data items like sorting or matrix operations).

Thoughts on Assignment 6: Introduction

You have written a lexer and parser for the Kanchil programming language. For Assignment 6 you will complete the trilogy by writing an interpreter that takes an AST and executes it.

As with the previous two parts, this will be written in Lua: a module interpit, which exports a single function: interp.

A complete specification of the semantics of Kanchil and requirements on your implementation will be given in the Assignment 6 description. These slides contain some relevant ideas & examples.

Thoughts on Assignment 6: The Goal

Once again, here is a sample Kanchil program.

  # Subroutine &fibo
  # Given %k, set %fibk to F(%k),
  # where F(n) = nth Fibonacci no.
  sub &fibo
      set %a: 0  # Consecutive Fibos
      set %b: 1
      set %i: 0  # Loop counter
      while %i < %k
          set %c: %a+%b  # Advance
          set %a: %b
          set %b: %c
          set %i: %i+1  # ++counter
      end
      set %fibk: %a  # Result
  end

  # Get number of Fibos to output
  print "How many Fibos to print: "
  input %n
  cr

  # Print requested number of Fibos
  set %j: 0  # Loop counter
  while %j < %n
      set %k: %j
      call &fibo
      print "F("
      print %j
      print ") = "
      print %fibk
      cr
      set %j: %j + 1
  end

Thoughts on Assignment 6: Function interp [1/2]

interpit.interp takes four parameters:
  ast
    The AST to execute, in the format returned by parseit.parse.
  state
    The current state: values of simple variables, arrays, and subroutines. This is passed so that Kanchil code can be entered interactively, line by line, and handled as a series of separate programs, each getting its state from the earlier code.
  incall, outcall
    Functions to call to do string input (read line) & output. These are passed so that Kanchil code can interact with files and other programs. In particular, this allows me to test your work.

interpit.interp will return the new state.

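To see why passing incall and outcall makes testing possible, consider the following Lua sketch. It assumes that incall takes no arguments and returns one line as a string, and that outcall takes a single string; the Assignment 6 description is the authority on the real conventions. The function demo_interp is only a stand-in with the same four parameters, not the real interpit.interp, and make_incall and make_outcall are hypothetical helpers.

```lua
-- Build an incall that returns canned input lines, one per call.
local function make_incall(lines)
    local i = 0
    return function()
        i = i + 1
        return lines[i] or ""
    end
end

-- Build an outcall that appends every output string to a table.
local function make_outcall(captured)
    return function(str)
        captured[#captured+1] = str
    end
end

-- Stand-in for interpit.interp (NOT the real thing): ignores the AST,
-- prompts, reads one line, echoes it, and returns the state unchanged.
local function demo_interp(ast, state, incall, outcall)
    outcall("Name? ")
    local line = incall()
    outcall("Hello, " .. line)
    return state
end

local out = {}
local state = { v = {}, a = {}, s = {} }
state = demo_interp(nil, state, make_incall({ "world" }), make_outcall(out))
print(table.concat(out))   --> Name? Hello, world
```

Because all I/O goes through the two passed functions, a test program can feed in known input and compare the captured output against what is expected, with no console involved.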
Thoughts on Assignment 6: Function interp [2/2]

You will need to write a number of helper functions. I suggest that, at the very least, you plan to write:
  A function that takes the AST for a statement list and executes it, updating the state appropriately.
  A function that takes the AST for a numeric expression, evaluates it, and returns its value.
Both of these will be recursive.

The function that executes a statement list will be called to execute a program, or a subroutine, or the body of an if-statement or while-statement.

Note that the function that evaluates an expression does not need to be concerned with precedence and associativity; these are already encoded in the AST. The evaluation function may need to read the state, but it will not change it; Kanchil expressions have no side effects.

Thoughts on Assignment 6: State

State will be stored as a Lua table with three members: v, a, s, holding simple variables, arrays, and subroutines, respectively. For example:
  The value of simple variable %abc will be in state.v["%abc"].
  The value of array item %abc[2] will be in state.a["%abc"][2].
  The AST for subroutine &abc will be in state.s["&abc"].

All identifiers are global and have dynamic scope. Once a variable/subroutine is given a value, it has that value everywhere in the code. Thus, only one state table is needed.

Kanchil has no fatal runtime errors. Thus, undefined variables are treated as if they have a default value.
  The default value for simple variables and array items is 0.
  The default AST for a subroutine is { STMT_LIST }.

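One way the default values might be handled is sketched below in Lua. The accessor names (get_simple, get_array_item, set_array_item, get_sub) are hypothetical, and STMT_LIST is given a placeholder value here; in the assignment it would be the constant used by the parser.

```lua
local STMT_LIST = 1   -- placeholder value, for this sketch only

local state = { v = {}, a = {}, s = {} }

-- Simple variable: default 0 if never set.
local function get_simple(name)
    return state.v[name] or 0
end

-- Array item: default 0 if the array or the item does not exist.
local function get_array_item(name, index)
    local arr = state.a[name]
    if arr == nil then return 0 end
    return arr[index] or 0
end

-- Setting an array item must create the array table on first use.
local function set_array_item(name, index, value)
    if state.a[name] == nil then
        state.a[name] = {}
    end
    state.a[name][index] = value
end

-- Subroutine: default is an empty statement list.
local function get_sub(name)
    return state.s[name] or { STMT_LIST }
end

set_array_item("%abc", 2, 7)
print(get_simple("%x"), get_array_item("%abc", 2), get_array_item("%abc", 3))
```

The `or 0` idiom is safe here because stored values are always numbers, and every number (including 0) counts as true in Lua; only nil and false trigger the default.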
Thoughts on Assignment 6: Utilities

I provide a runtime system for Kanchil, along with the following utility functions, which should not be modified.
  numtostr
    Convert a number to a string. Used in numeric output.
  strtonum
    Convert a string to a number. Used in numeric input.
  numtoint
    Convert a number to an integer value. Used after every numeric computation.
  booltoint
    Convert a Lua boolean to an integer.

In addition, the passed incall & outcall should be used to do string I/O.

And all of Lua is available to be used.

Thoughts on Assignment 6: Numeric & Boolean Values

Kanchil has no separate boolean type. When a Kanchil number is treated as a boolean, it is true if it is nonzero (~= 0) and false otherwise.

For the majority of Kanchil operators, the computation performed is that done by the corresponding Lua operator, followed by a call to numtoint or booltoint, as appropriate.

Two small exceptions:
  The Kanchil != operator corresponds to the Lua ~= operator.
  Unlike Kanchil, Lua has no unary + operator. The Kanchil unary + operator simply returns its operand unchanged.

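The rules above might be sketched in Lua as follows. The functions demo_numtoint and demo_booltoint are stand-ins written for this example; the provided numtoint and booltoint may behave differently (in particular, truncation toward zero is only a guess), so the real utilities should be used in the assignment. The helper names apply_binop, apply_unop, and is_true are hypothetical, and only a few operators are shown.

```lua
-- Stand-ins for the provided utilities (assumed behavior, not the real code).
local function demo_numtoint(n)
    if n >= 0 then return math.floor(n) else return math.ceil(n) end
end

local function demo_booltoint(b)
    if b then return 1 else return 0 end
end

-- Apply a Kanchil binary operator: do the Lua operation, then convert.
local function apply_binop(op, x, y)
    if op == "+" then
        return demo_numtoint(x + y)
    elseif op == "<" then
        return demo_booltoint(x < y)
    elseif op == "!=" then
        return demo_booltoint(x ~= y)   -- Kanchil != is Lua ~=
    end
end

-- Apply a Kanchil unary operator.
local function apply_unop(op, x)
    if op == "+" then
        return x                        -- Lua has no unary +; return operand
    end
end

-- A Kanchil number used as a condition: nonzero means true.
local function is_true(n)
    return n ~= 0
end

print(apply_binop("<", 1, 2), apply_binop("!=", 3, 3), apply_unop("+", 7))
```

The test `n ~= 0` matters because Lua itself treats 0 as true; writing `if n then` for a Kanchil condition would make every while-loop run forever.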
Thoughts on Assignment 6: General Principles

Be DRY! If a function is already written, then it can be used.

You may assume the AST is formatted correctly.

Write all functions local to interpit.interp.
  Don't pass around state, incall, outcall. Do pass the AST.
  Pre-declaring local functions:
    local f
    function f( )  -- NO "local"

L-Values
  As the argument of input, or the LHS of set, an L-value is something whose value is changed.
  As part of an argument of print, the RHS of set, or an array index, an L-value is something that is evaluated as part of an expression.

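Pre-declaration matters when local functions call each other. The small Lua example below uses two mutually recursive functions, is_even and is_odd, chosen purely for illustration.

```lua
-- Pre-declare both names so that each function body can see the other.
local is_even, is_odd

function is_even(n)      -- NO "local": this assigns to the local declared above
    if n == 0 then return true end
    return is_odd(n - 1)
end

function is_odd(n)       -- likewise assigns to the pre-declared local
    if n == 0 then return false end
    return is_even(n - 1)
end

print(is_even(10), is_odd(7))   --> true    true
```

If is_even were instead written as `local function is_even`, its body would be compiled before is_odd was declared, so the name is_odd inside it would refer to a global and the call would fail at runtime.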
Thoughts on Assignment 6: How I Did It [1/2]

I wrote six new functions, all local to interpit.interp:
  interp_stmt_list
  interp_stmt
  process_lvalue
  get_lvalue
  set_lvalue
  eval_expr

Handling L-Values
  When an L-value is encountered, I call process_lvalue, which returns a description of the L-value (its identifier, whether it is an array reference, and, if so, the index).
  If I need the value of the L-value, then I pass this description to get_lvalue, which returns the numeric value.
  If I need to set the L-value, then I pass the description and the new value to set_lvalue.

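The three L-value functions could look something like the Lua sketch below. It rests on several assumptions that are not from the slides: the AST shapes { VARID_VAL, "%abc" } and { ARRAY_REF, { VARID_VAL, "%abc" }, index_expr }, the placeholder constant values, the fields of the description table (id, is_array, index), and a cut-down eval_expr that handles only numeric literals and L-values. The actual AST format is whatever parseit.parse produces. The functions are shown at top level to keep the example self-contained; in the assignment they would be local to interpit.interp.

```lua
local NUMLIT_VAL, VARID_VAL, ARRAY_REF = 1, 2, 3   -- placeholder values
local state = { v = {}, a = {}, s = {} }

local eval_expr, process_lvalue, get_lvalue, set_lvalue

-- Reduced evaluator: numeric literals and L-values only.
function eval_expr(ast)
    if ast[1] == NUMLIT_VAL then
        return tonumber(ast[2])
    else
        return get_lvalue(process_lvalue(ast))
    end
end

-- Return a description of the L-value; an array index is evaluated here.
function process_lvalue(ast)
    if ast[1] == ARRAY_REF then
        return { id = ast[2][2], is_array = true, index = eval_expr(ast[3]) }
    else
        return { id = ast[2], is_array = false }
    end
end

-- Read the described L-value, using the default 0 if it is undefined.
function get_lvalue(lv)
    if lv.is_array then
        local arr = state.a[lv.id]
        return arr and arr[lv.index] or 0
    else
        return state.v[lv.id] or 0
    end
end

-- Write the described L-value, creating the array table if needed.
function set_lvalue(lv, value)
    if lv.is_array then
        state.a[lv.id] = state.a[lv.id] or {}
        state.a[lv.id][lv.index] = value
    else
        state.v[lv.id] = value
    end
end

-- Equivalent of:  set %n: 2   then   set %abc[%n]: 42
set_lvalue(process_lvalue({ VARID_VAL, "%n" }), 2)
local item = { ARRAY_REF, { VARID_VAL, "%abc" }, { VARID_VAL, "%n" } }
set_lvalue(process_lvalue(item), 42)
print(eval_expr(item), state.a["%abc"][2])   --> 42    42
```

Separating "describe" from "get" and "set" means the array-index logic is written once and shared by input, set, and expression evaluation.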
Thoughts on Assignment 6: How I Did It [2/2]

Writing eval_expr
  This function takes an AST and returns the value of the expression.
  It is called for the RHS of a set statement, a non-string argument of print, and an array index. It calls itself recursively.
  It is written in the form of a number of cases:
    ast[1] is NUMLIT_VAL
    ast[1] is BOOLLIT_VAL
    ast[1] is VARID_VAL
    ast[1] is ARRAY_REF
    ast[1] is a table, and:
      ast[1][1] is UN_OP
      ast[1][1] is BIN_OP

Thoughts on Assignment 6: Write It

TO DO
  Begin writing module interpit. Implementations were written in class for:
    Cr statements.
    Print statements whose argument is a string literal.
    Sub statements (subroutine definitions).
    Call statements (subroutine calls).

Done. See interpit.lua.