Writing an Interpreter Thoughts on Assignment 6


Writing an Interpreter
Thoughts on Assignment 6

CS F331 Programming Languages
CSCE A331 Programming Language Concepts
Lecture Slides
Monday, March 27, 2017

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks
ggchappell@alaska.edu

© 2017 Glenn G. Chappell

Review: Forth: Details - Stacks

A Forth implementation adhering to the ANSI standard is actually required to have four stacks.

- Data stack: holds integer values. These are also used as pointers and booleans. This is the stack we have been dealing with.
- Floating-point stack: holds floating-point values.
- Return stack: holds return addresses for words that are called.
- Locals stack: holds local variables.

Why are the return stack & locals stack separate? I do not know.

27 Mar 2017 CS F331 / CSCE A331 Spring 2017 2

Review: Forth: Details - Floating Point [1/2]

Forth includes support for floating-point computations. Floating-point values (essentially the same as C/C++ values of type double) are stored on a separate stack: the floating-point stack.

Floating-point literals must contain e or E. These push a value on the floating-point stack.

    -4e
    1.2E
    1.2e17

See float.fs.

Review: Forth: Details - Floating Point [2/2]

Words that handle floating-point values are often named the same as the corresponding integer-handling words, with an f prepended.

    f.    \ Like .
    f.s   \ Like .s

In stack-effect notation, "F:" means that the notation refers to the floating-point stack.

    fdup  ( F: x -- x x )     \ Like dup; also fdrop fswap ...
    f+    ( F: x y -- x+y )   \ Like +; also f- f* f/

Here are some other floating-point-handling words.

    f**   ( F: x y -- pow[x,y] )   \ x raised to the y power
    fsqrt ( F: x -- sqrt[x] )
    fexp  ( F: x -- exp[x] )       \ Also flog fsin fcos ...
    1/f   ( F: x -- 1/x )

Review: Forth: Details - Other Features

Some Forth features that we do not have time to cover:

- Exceptions. Forth has a notion of exception that can be used for error handling.
- Defining new flow-of-control words. Some of the words we have covered are special: if else endif begin while repeat ?do loop recurse. These affect the flow of control in ways that we do not know how to duplicate. But Forth does allow us to write such words ourselves.
- Defining new defining words. Some other words are special in another way: variable constant : ;. These allow new words to be defined. Forth allows us to write this kind of word as well.

Review: Forth: Details - Specifying Semantics [1/4]

Recall:

- Syntax = structure (of code)
- Semantics = meaning (of code)

Grammatical notes:

- "Semantics" is an uncountable noun (like "butter"). It is mostly used in the singular (so "Semantics is", not "Semantics are").
- "Semantic" is the corresponding adjective.

Review: Forth: Details - Specifying Semantics [2/4]

How is semantics used?

- A programmer needs to know the semantics of a PL in order to write correct code.
- The design of a compiler needs to be based on the semantics of the source PL, so that correct object code can be generated.
- Similarly, the design of an interpreter needs to be based on the semantics of the source PL, so that correct actions can be performed.
- Semantics is useful in optimization: altering code so as to improve performance, while keeping semantics the same.
- Semantics is used in verification: checking that code performs the actions it is supposed to.

Review: Forth: Details - Specifying Semantics [3/4]

Semantics is generally divided into two kinds: static and dynamic.

Static semantics includes the aspects of semantics that can be checked before a program executes. This includes:

- Typing, in statically typed PLs.
- Dependencies (what relies on what).
- Other things, like whether all cases in a switch are distinct.

Dynamic semantics refers to the semantics of a running program: what statements do, and what expressions compute. In a dynamically typed PL, this also includes typing.

Review: Forth: Details - Specifying Semantics [4/4]

We have looked at methods for formally specifying syntax, in particular phrase-structure grammars. Formal semantics refers to methods for formally specifying semantics. These generally involve mathematical notations.

We looked briefly at four formal-semantics methods. We are not covering notation.

- Attribute grammars. Specify static semantics via attributes added to AST nodes.
- Operational semantics. Specify dynamic semantics of a PL in terms of the semantics of some other PL or abstract machine (usually the latter).
- Axiomatic semantics. Specify dynamic semantics in terms of logical statements about program state.
- Denotational semantics. Specify dynamic semantics by representing state & values with mathematical objects, and commands & computations with functions.

Writing an Interpreter - Introduction [1/3]

Recall: a compiler takes code in one PL (the source PL) and translates it into code in another PL (the target PL).

    Source PL -> Compiler -> Target PL

An interpreter takes code in its source PL and executes it.

    Source PL -> Interpreter

Writing an Interpreter - Introduction [2/3]

Compilation and interpretation are not mutually exclusive. Many modern interpreters begin by compiling to an intermediate representation (IR), perhaps a byte code, which is then interpreted directly.

    Standard Lua Interpreter:
        Lua -> Lua Compiler -> Lua Byte Code -> Lua Byte Code Interpreter

Writing an Interpreter - Introduction [3/3]

Regardless of whether there is a compilation step, virtually all interpreters will use some kind of IR. This might be:

- An abstract syntax tree (AST).
- A PL-specific byte code. E.g., Lua byte code, Python byte code.
- A general-purpose byte code. E.g., LLVM, Java Virtual Machine (JVM) byte code.
- Some other programming language. JavaScript is used in this way very often.

It is possible that more than one IR is used. The source code is translated to the first IR, then the first IR is translated to the second, etc.
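As a rough illustration of the "compile to an IR, then interpret the IR" idea, here is a minimal Lua sketch. Everything in it is made up for the example: the node layout ({ "num", n } or { "add" / "mul", left, right }), the instruction names, and the functions compile and run. It is not how the standard Lua interpreter works internally; it only shows the two-stage shape.

```lua
-- Stage 1: compile a tiny expression tree to a flat instruction list (the IR).
-- Hypothetical node layout: { "num", n } or { "add" | "mul", left, right }.
local function compile(node, code)
    code = code or {}
    if node[1] == "num" then
        code[#code + 1] = { "push", node[2] }
    else
        compile(node[2], code)            -- code for the left operand
        compile(node[3], code)            -- code for the right operand
        code[#code + 1] = { node[1] }     -- then the operation itself
    end
    return code
end

-- Stage 2: interpret the instruction list using an operand stack.
local function run(code)
    local stack = {}
    for _, instr in ipairs(code) do
        local op = instr[1]
        if op == "push" then
            stack[#stack + 1] = instr[2]
        else
            local y = table.remove(stack)
            local x = table.remove(stack)
            if op == "add" then
                stack[#stack + 1] = x + y
            elseif op == "mul" then
                stack[#stack + 1] = x * y
            else
                error("unknown instruction: " .. tostring(op))
            end
        end
    end
    return stack[#stack]
end

-- (1 + 2) * 4
local code = compile({ "mul", { "add", { "num", 1 }, { "num", 2 } }, { "num", 4 } })
print(#code, run(code))   -- 5 instructions; the result is 12
```

The instruction list plays the role of a byte code here: once it exists, the tree is no longer needed to run the program.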

Writing an Interpreter - Processing an AST [1/2]

The AST produced by the parser will need to be processed, either by interpreting it directly, or by generating another IR from it. How is this done?

An AST is a rooted tree. Code that deals with a rooted tree usually proceeds as follows.

- Handle the root node.
- Make a function call (often a recursive call) on each subtree of the root.

Example: the AST for (a + 2) * -b.

          *
        /   \
       +     -
      / \    |
     a   2   b

Writing an Interpreter - Processing an AST [2/2]

Suppose we wish to write a function that evaluates an AST representing a numeric expression, like the tree for (a + 2) * -b. Our function will take an AST and return the numeric value of the expression.

It could work something like this:

- If the root node represents a numeric literal:
  - Convert the literal to a number and return it.
- Else if the root node represents a numeric variable:
  - Get the variable's current value and return it.
- Else if the root node represents a binary operator:
  - Get the value of the left subtree (recursive call).
  - Get the value of the right subtree (recursive call).
  - Apply the appropriate operation and return the result.
- Else if the root node represents a unary operator:
  - Get the value of the subtree (recursive call).
  - Apply the appropriate operation and return the result.
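The case analysis above can be sketched in Lua roughly as follows. The node layout (tables with kind, value, name, op, and child fields) is invented for this illustration and is not the AST format produced by parseit; variable values are assumed to live in a plain table passed in as vars.

```lua
-- Hypothetical AST node layout, for illustration only:
--   { kind = "num", value = "2" }
--   { kind = "var", name = "a" }
--   { kind = "bin", op = "+", left = <node>, right = <node> }
--   { kind = "un",  op = "-", operand = <node> }

local binops = {
    ["+"] = function(x, y) return x + y end,
    ["-"] = function(x, y) return x - y end,
    ["*"] = function(x, y) return x * y end,
}

local unops = {
    ["+"] = function(x) return x end,
    ["-"] = function(x) return -x end,
}

-- Evaluate an expression tree; vars maps variable names to numbers.
local function eval(node, vars)
    if node.kind == "num" then
        return tonumber(node.value)          -- literal: convert and return
    elseif node.kind == "var" then
        return vars[node.name]               -- variable: look up current value
    elseif node.kind == "bin" then
        local lhs = eval(node.left, vars)    -- recursive call, left subtree
        local rhs = eval(node.right, vars)   -- recursive call, right subtree
        return binops[node.op](lhs, rhs)
    elseif node.kind == "un" then
        return unops[node.op](eval(node.operand, vars))
    end
    error("unknown node kind: " .. tostring(node.kind))
end

-- The tree for (a + 2) * -b
local tree = {
    kind = "bin", op = "*",
    left = { kind = "bin", op = "+",
             left  = { kind = "var", name = "a" },
             right = { kind = "num", value = "2" } },
    right = { kind = "un", op = "-",
              operand = { kind = "var", name = "b" } },
}

print(eval(tree, { a = 3, b = 4 }))   -- (3 + 2) * -4 = -20
```

Note that the function never looks at precedence: the shape of the tree already says which operation is applied to what.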

Writing an Interpreter - Representing State

While an interpreter is executing a program, there will need to be some representation of program state: values of variables, the call stack, etc.

- In a PL with static typing and scope, the compiler/linker can determine the types and scopes of all variables and the types of all unnamed values. These can be laid out in memory (for local values, in a stack frame). Thus, at runtime, a reference to a value will simply be a reference to a particular memory location.
- In a dynamic PL, it is common to place variables in an associative structure with the variable name as key. Usually a hash table is used, with a separate hash table for each scope.
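One way to picture "a separate hash table for each scope" is a chain of Lua tables, each linked to its enclosing scope. The sketch below is only an illustration of that general idea; the names new_scope, lookup, and define are hypothetical, and this is not the scheme Kanchil uses.

```lua
-- Each scope is a table of variables plus a link to the enclosing scope.
local function new_scope(parent)
    return { vars = {}, parent = parent }
end

-- Search the innermost scope first, then walk outward.
local function lookup(scope, name)
    while scope do
        local v = scope.vars[name]
        if v ~= nil then
            return v
        end
        scope = scope.parent
    end
    return nil   -- not defined in any scope
end

-- Define (or overwrite) a variable in the given scope only.
local function define(scope, name, value)
    scope.vars[name] = value
end

local global = new_scope(nil)
define(global, "x", 1)
define(global, "y", 2)

local inner = new_scope(global)
define(inner, "x", 10)   -- shadows the global x

print(lookup(inner, "x"), lookup(inner, "y"), lookup(global, "x"))
```

Here the inner lookup of x finds 10, the inner lookup of y falls through to the global table and finds 2, and the global x is still 1.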

Writing an Interpreter - Runtime System

There will need to be a runtime system (often simply runtime): additional code that programs will need to use at runtime. This might include:

- Program initialization and shutdown.
- I/O.
- Memory management.
- Interfaces to operating system functionality (e.g., files, threads, interprocess communication).
- Implementations of PL commands that perform complex operations (e.g., advanced floating-point computations, operations involving multiple data items like sorting or matrix operations).

Thoughts on Assignment 6 - Introduction

You have written a lexer and parser for the Kanchil programming language. For Assignment 6 you will complete the trilogy by writing an interpreter that takes an AST and executes it.

As with the previous two parts, this will be written in Lua: a module interpit, which exports a single function: interp.

A complete specification of the semantics of Kanchil and requirements on your implementation will be given in the Assignment 6 description. These slides contain some relevant ideas & examples.

Thoughts on Assignment 6 - The Goal

Once again, here is a sample Kanchil program.

    # Subroutine &fibo
    # Given %k, set %fibk to F(%k),
    # where F(n) = nth Fibonacci no.
    sub &fibo
        set %a: 0  # Consecutive Fibos
        set %b: 1
        set %i: 0  # Loop counter
        while %i < %k
            set %c: %a+%b  # Advance
            set %a: %b
            set %b: %c
            set %i: %i+1  # ++counter
        end
        set %fibk: %a  # Result
    end

    # Get number of Fibos to output
    print "How many Fibos to print: "
    input %n
    cr

    # Print requested number of Fibos
    set %j: 0  # Loop counter
    while %j < %n
        set %k: %j
        call &fibo
        print "F("
        print %j
        print ") = "
        print %fibk
        cr
        set %j: %j + 1
    end

Thoughts on Assignment 6 - Function interp [1/2]

interpit.interp takes four parameters:

- ast: The AST to execute, in the format returned by parseit.parse.
- state: The current state: values of simple variables, arrays, and subroutines. This is passed so that Kanchil code can be entered interactively, line by line, and handled as a series of separate programs, each getting its state from the earlier code.
- incall, outcall: Functions to call to do string input (read line) & output. These are passed so that Kanchil code can interact with files and other programs. In particular, this allows me to test your work.

interpit.interp will return the new state.
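To see why passing incall and outcall is useful, consider the following sketch of a test harness. The function interp_sketch is a stand-in written for this example: it accepts a made-up list of commands rather than a real AST, so it says nothing about how interpit.interp itself should be written. The point is only that the caller decides where input comes from and where output goes.

```lua
-- Stand-in for interpit.interp, handling only a made-up command list.
-- The real function takes an AST in the format returned by parseit.parse.
local function interp_sketch(cmds, state, incall, outcall)
    for _, cmd in ipairs(cmds) do
        if cmd.op == "print" then
            outcall(cmd.text)
        elseif cmd.op == "input" then
            state.v[cmd.var] = tonumber(incall()) or 0
        elseif cmd.op == "printvar" then
            outcall(tostring(state.v[cmd.var] or 0))
        end
    end
    return state
end

-- Test harness: canned input lines, with output captured in a table.
local inputs = { "42" }
local outputs = {}
local function incall() return table.remove(inputs, 1) end
local function outcall(s) outputs[#outputs + 1] = s end

local state = { v = {}, a = {}, s = {} }
state = interp_sketch({
    { op = "print", text = "n? " },
    { op = "input", var = "%n" },
    { op = "printvar", var = "%n" },
}, state, incall, outcall)

print(table.concat(outputs))   -- n? 42
```

For interactive use, the same function could presumably be given callbacks built on io.read and io.write instead; nothing in the interpreter would need to change.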

Thoughts on Assignment 6 - Function interp [2/2]

You will need to write a number of helper functions. I suggest that, at the very least, you plan to write:

- A function that takes the AST for a statement list and executes it, updating the state appropriately.
- A function that takes the AST for a numeric expression, evaluates it, and returns its value.

Both of these will be recursive.

The function that executes a statement list will be called to execute a program, or a subroutine, or the body of an if-statement or while-statement.

Note that the function that evaluates an expression does not need to be concerned with precedence and associativity; these are already encoded in the AST. The evaluation function may need to read the state, but it will not change it; Kanchil expressions have no side effects.
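The interplay between the two helpers might look something like the sketch below. The statement and expression formats here are invented for the example (they are not the parseit AST), as are the names eval, exec_stmt, and exec_list; only set, print, and while are handled. Notice that the body of a while-statement is just another statement list, so the same function executes it.

```lua
local vars, out = {}, {}

-- Expressions (made-up format): a number, a variable name, or { op, lhs, rhs }.
local function eval(e)
    if type(e) == "number" then return e end
    if type(e) == "string" then return vars[e] or 0 end
    local x, y = eval(e[2]), eval(e[3])
    if e[1] == "+" then return x + y end
    if e[1] == "<" then return (x < y) and 1 or 0 end
    error("unknown operator: " .. tostring(e[1]))
end

local exec_list   -- pre-declared: exec_stmt and exec_list call each other

-- Statements (made-up format): { "set", name, expr }, { "print", expr },
-- { "while", expr, statement_list }.
local function exec_stmt(s)
    if s[1] == "set" then
        vars[s[2]] = eval(s[3])
    elseif s[1] == "print" then
        out[#out + 1] = tostring(eval(s[2]))
    elseif s[1] == "while" then
        while eval(s[2]) ~= 0 do
            exec_list(s[3])   -- the loop body is itself a statement list
        end
    end
end

function exec_list(list)
    for _, s in ipairs(list) do
        exec_stmt(s)
    end
end

exec_list({
    { "set", "i", 0 },
    { "while", { "<", "i", 3 }, {
        { "print", "i" },
        { "set", "i", { "+", "i", 1 } },
    } },
})
print(table.concat(out, " "))   -- 0 1 2
```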

Thoughts on Assignment 6 - State

State will be stored as a Lua table with three members: v, a, s, holding simple variables, arrays, and subroutines, respectively. For example:

- The value of simple variable %abc will be in state.v["%abc"].
- The value of array item %abc[2] will be in state.a["%abc"][2].
- The AST for subroutine &abc will be in state.s["&abc"].

All identifiers are global and have dynamic scope. Once a variable/subroutine is given a value, it has that value everywhere in the code. Thus, only one state table is needed.

Kanchil has no fatal runtime errors. Thus, undefined variables are treated as if they have a default value.

- The default value for simple variables and array items is 0.
- The default AST for a subroutine is { STMT_LIST }.
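A small sketch of reading from such a state table with the default values applied is shown below. The accessor names (get_simple, get_item, get_sub) are hypothetical, and STMT_LIST is given a placeholder value here so that the snippet runs on its own; in the actual assignment the constant is assumed to be defined alongside the parser.

```lua
-- Placeholder for the STMT_LIST constant (assumption: the real value is
-- defined elsewhere, alongside the parser).
local STMT_LIST = 1

local function new_state()
    return { v = {}, a = {}, s = {} }
end

-- Simple variable: default 0 if never set.
local function get_simple(state, name)
    return state.v[name] or 0
end

-- Array item: default 0 if the array or the item does not exist.
local function get_item(state, name, index)
    local arr = state.a[name]
    return arr and arr[index] or 0
end

-- Subroutine: default is an empty statement list.
local function get_sub(state, name)
    return state.s[name] or { STMT_LIST }
end

local state = new_state()
state.v["%abc"] = 7
state.a["%abc"] = { [2] = 5 }

print(get_simple(state, "%abc"), get_simple(state, "%nope"))
print(get_item(state, "%abc", 2), get_item(state, "%abc", 3), get_item(state, "%nope", 0))
print(#get_sub(state, "&nope"))
```

Because the defaults are applied on read, the state table itself only ever contains values that the program actually set.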

Thoughts on Assignment 6 - Utilities

I provide a runtime system for Kanchil, along with the following utility functions, which should not be modified.

- numtostr: Convert a number to a string. Used in numeric output.
- strtonum: Convert a string to a number. Used in numeric input.
- numtoint: Convert a number to an integer value. Used after every numeric computation.
- booltoint: Convert a Lua boolean to an integer.

In addition, the passed incall & outcall should be used to do string I/O. And all of Lua is available to be used.

Thoughts on Assignment 6 - Numeric & Boolean Values

Kanchil has no separate boolean type. When a Kanchil number is treated as a boolean, it is true if it is nonzero (~= 0) and false otherwise.

For the majority of Kanchil operators, the computation performed is that done by the corresponding Lua operator, followed by a call to numtoint or booltoint, as appropriate.

Two small exceptions:

- The Kanchil != operator corresponds to the Lua ~= operator.
- Unlike Kanchil, Lua has no unary + operator. The Kanchil unary + operator simply returns its operand unchanged.
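The "Lua operator, then numtoint or booltoint" pattern could be organized as a table of functions, as in the sketch below. The versions of numtoint and booltoint here are stand-ins written so the snippet runs by itself; the provided utilities should be used in the real module and may behave differently (truncation toward zero is an assumption). The operator list is only a sample, and apply_binop and is_true are hypothetical names. Division by zero is not handled here.

```lua
-- Stand-ins for the provided utilities; the real versions may differ.
local function numtoint(n)
    -- Assumption: truncate toward zero.
    if n >= 0 then return math.floor(n) end
    return math.ceil(n)
end

local function booltoint(b)
    if b then return 1 end
    return 0
end

-- Each entry: the corresponding Lua operator, then the conversion.
local binops = {
    ["+"]  = function(x, y) return numtoint(x + y) end,
    ["-"]  = function(x, y) return numtoint(x - y) end,
    ["*"]  = function(x, y) return numtoint(x * y) end,
    ["/"]  = function(x, y) return numtoint(x / y) end,
    ["=="] = function(x, y) return booltoint(x == y) end,
    ["!="] = function(x, y) return booltoint(x ~= y) end,   -- Lua spells it ~=
    ["<"]  = function(x, y) return booltoint(x < y) end,
}

local function apply_binop(op, x, y)
    local f = binops[op]
    if f == nil then
        error("unsupported operator: " .. tostring(op))
    end
    return f(x, y)
end

-- A number used as a boolean is true exactly when it is nonzero.
local function is_true(n)
    return n ~= 0
end

print(apply_binop("/", 7, 2), apply_binop("!=", 1, 2), apply_binop("<", 2, 1))
```

A table like this keeps the exceptions (such as != versus ~=) in one place, so the expression evaluator itself does not need a long chain of comparisons.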

Thoughts on Assignment 6 - General Principles

- Be DRY! If a function is already written, then it can be used.
- You may assume the AST is formatted correctly.
- Write all functions local to interpit.interp.
  - Don't pass around state, incall, outcall.
  - Do pass the AST.
  - Pre-declaring local functions:

        local f
        function f( )  -- NO "local"

L-Values

- As the argument of input, or the LHS of set, an L-value is something whose value is changed.
- As part of an argument of print, the RHS of set, or an array index, an L-value is something that is evaluated as part of an expression.
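Pre-declaration matters when two local functions call each other: the name of the second must already be a local variable when the first is compiled. A minimal, self-contained illustration follows; is_even and is_odd are just example names.

```lua
local is_even, is_odd   -- pre-declare both names as locals

function is_even(n)     -- NO "local" here: this assigns to the local above
    if n == 0 then return true end
    return is_odd(n - 1)
end

function is_odd(n)
    if n == 0 then return false end
    return is_even(n - 1)
end

print(is_even(10), is_odd(7))   -- true    true
```

If is_even were instead written with "local function" and is_odd had not been pre-declared, the reference to is_odd inside is_even would refer to a global variable, not to the local function defined afterward.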

Thoughts on Assignment 6 - How I Did It [1/2]

I wrote six new functions, all local to interpit.interp:

- interp_stmt_list
- interp_stmt
- process_lvalue
- get_lvalue
- set_lvalue
- eval_expr

Handling L-Values

- When an L-value is encountered, I call process_lvalue, which returns a description of the L-value (its identifier, whether it is an array reference, and, if so, the index).
- If I need the value of the L-value, then I pass this description to get_lvalue, which returns the numeric value.
- If I need to set the L-value, then I pass the description and the new value to set_lvalue.
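The describe-then-get-or-set pattern might be sketched as follows. This is a guess at the general shape, not the instructor's code: the L-value node format ({ "simple", name } or { "array", name, index }) and the fields of the description table are made up, and the array index is a plain number here, whereas a real interpreter would evaluate an index expression with eval_expr.

```lua
local state = { v = {}, a = {}, s = {} }

-- Hypothetical L-value nodes (not the parseit format):
--   { "simple", "%x" }   or   { "array", "%a", 2 }
-- Returns a description: identifier, whether it is an array reference,
-- and, if so, the index.
local function process_lvalue(node)
    if node[1] == "simple" then
        return { name = node[2], is_array = false }
    end
    return { name = node[2], is_array = true, index = node[3] }
end

-- Read the current value, using the default 0 for anything undefined.
local function get_lvalue(desc)
    if desc.is_array then
        local arr = state.a[desc.name]
        return arr and arr[desc.index] or 0
    end
    return state.v[desc.name] or 0
end

-- Store a new value, creating the array table on first use.
local function set_lvalue(desc, value)
    if desc.is_array then
        state.a[desc.name] = state.a[desc.name] or {}
        state.a[desc.name][desc.index] = value
    else
        state.v[desc.name] = value
    end
end

local x = process_lvalue({ "simple", "%x" })
local item = process_lvalue({ "array", "%a", 2 })

set_lvalue(x, 5)
set_lvalue(item, get_lvalue(x) + 1)

print(get_lvalue(x), get_lvalue(item), state.a["%a"][2])
```

One benefit of the split is that the code deciding what an L-value refers to is written once, and both reading and writing reuse it.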

Thoughts on Assignment 6 - How I Did It [2/2]

Writing eval_expr

- This function takes an AST and returns the value of the expression.
- It is called for the RHS of a set statement, a non-string argument of print, and an array index. It calls itself recursively.
- Written in the form of a number of cases:
  - ast[1] is NUMLIT_VAL
  - ast[1] is BOOLLIT_VAL
  - ast[1] is VARID_VAL
  - ast[1] is ARRAY_REF
  - ast[1] is a table, and:
    - ast[1][1] is UN_OP
    - ast[1][1] is BIN_OP

Thoughts on Assignment 6 - Write It

TO DO

- Begin writing module interpit. Implementations were written in class for:
  - Cr statements.
  - Print statements whose argument is a string literal.
  - Sub statements (subroutine definitions).
  - Call statements (subroutine calls).

Done. See interpit.lua.