Ways to implement a language

Interpreters

Implementing PLs
Most of the course is learning fundamental concepts for using PLs: syntax vs. semantics vs. idioms, and powerful constructs like closures, first-class objects, iterators (streams), multithreading, ...
An educated computer scientist should also know some things about implementing PLs:
Implementing something requires fully understanding its semantics.
Things like closures and objects are not magic.
Many programming tasks are like implementing PLs. Example: the "connect-the-dots programming language" from 141.

Ways to implement a language
Two fundamental ways to implement a programming language X:
Write an interpreter in another language Y. Better names: evaluator, executor. Immediately executes the input program as it's read.
Write a compiler in another language Y to a third language Z. Better name: translator. Takes a program in X and produces an equivalent program in Z.

First programming language?


Interpreters vs. compilers
Interpreters take one "statement" of code at a time and execute it in the language of the interpreter. Like having a human interpreter with you in a foreign country.
Compilers translate code in language X into code in language Z and save it for later (typically to a file on disk). Like having a person translate a document into a foreign language for you.

Reality is more complicated
Evaluation (interpreter) and translation (compiler) are your options, but in modern practice we can have multiple layers of both.
An example with Java: Java was designed to be platform independent; any program written in Java should be able to run on any computer. This is achieved with the "Java Virtual Machine" (JVM), an idealized computer for which people have written interpreters that run on "real" computers.

Example: Java
Java programs are compiled to an "intermediate representation" called bytecode. Think of bytecode as an instruction set for the JVM.
Bytecode is then interpreted by a (software) interpreter, itself running as machine code.
Complication: the bytecode interpreter can compile frequently-used functions to machine code if it desires.
The CPU itself is an interpreter for machine code.

Sermon
Interpreter versus compiler versus combinations is about a particular language implementation, not the language definition.
So clearly there is no such thing as a "compiled language" or an "interpreted language": programs cannot see how the implementation works.
Unfortunately, you hear these phrases all the time: "C is faster because it's compiled and LISP is interpreted." Nonsense: I can write a C interpreter or a LISP compiler, regardless of what most implementations happen to do.
Please politely correct your bosses, friends, and other professors.

Okay, they do have one point
In a traditional implementation via compiler, you do not need the language implementation (the compiler) to run the program, only to compile it, so you can just ship the binary.
But Racket, Scheme, LISP, JavaScript, Ruby, etc. have eval:
At run-time, create some data (in Racket a list, in JavaScript a string) and treat it as a program, then run that program.
Since we don't know ahead of time what data will be created, and therefore what program it will represent, we need a language implementation at run-time to support eval. It could be an interpreter, a compiler, or a combination.
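The point about needing the implementation at run-time can be sketched in Python, which also has eval and also ships its implementation with every program (the variable names here are made up for illustration):

```python
# Build a program as data at run-time, then hand it to the
# language implementation. Until this moment, nothing knew
# which program these pieces would form.
parts = ["4", "+", "2"]        # data created at run-time
program = " ".join(parts)      # "4 + 2": a program, stored as a string
result = eval(program)         # requires the implementation at run-time
print(result)                  # 6
```

Shipping a bare binary would not be enough here: whatever runs this code must carry an interpreter (or compiler) for the language along with it.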

Eval/Apply Digression
eval: built into Racket, traditionally part of all LISP-ish interpreters.
quote: also built-in. Happens behind the scenes when you use the single quote operator: '

Further digression: quoting
Quoting, (quote ...) or '(...), is a special form that makes everything underneath symbols and lists, not variables and calls. But then calling eval on it looks up symbols as code, so quote and eval are inverses.
These two expressions build the same list:
(list 'begin (list 'print "hi") (list '+ 4 2))
(quote (begin (print "hi") (+ 4 2)))

Back to implementing a language
"((lambda (x) (+ x x)) 7)"
[Pipeline diagram: the source string goes through parsing (possible errors/warnings) to an abstract syntax tree of Call, Function, and Var nodes; then static checking (what is checked depends on the PL; again possible errors/warnings); then the rest of the implementation.]

Skipping those steps
If the language to be interpreted (X) is very close to the interpreter language (Y), then take advantage of this!
Skip parsing? Maybe Y already has this. These abstract syntax trees (ASTs) are already ideal structures for passing to an interpreter.
We can also, for simplicity, skip static checking: assume subexpressions have correct types, and do not worry about (add #f "hi").
For dynamic errors in the embedded language, the interpreter can give an error message (e.g., divide by zero).

Write Racket in Racket

Heart of the interpreter
mini-eval: evaluates an expression to a value (will call apply to handle functions).
mini-apply: takes a function and argument values and evaluates its body (calls eval).

(define (mini-eval expr env)
  ; is this a certain type of expression? if so, then call
  ; our special handler for that type of expression.
  )
What kinds of expressions will we have?
numbers
variables (symbols)
math functions +, -, *, etc.
others as we need them

How do we evaluate a (literal) number? Just return it!
Pseudocode for the first line of mini-eval: if this expression is a number, then return it.

How do we handle (add 3 4)?
Need two functions:
One to detect that an expression is an addition.
One to evaluate the expression.

(add 3 4)
Is this expression an addition expression?
(equal? 'add (car expr))
Evaluate an addition expression:
(+ (cadr expr) (caddr expr))

You try
Add subtraction (e.g., sub).
Add multiplication (mul).
Add division (div).
Add exponentiation (exp).
It's your programming language, so you may name these commands whatever you want.

(add 3 (add 4 5)) Why doesn't this work?

(add 3 (add 4 5))
How should our language evaluate this sort of expression?
We could forbid this kind of expression: insist the things to be added always be numbers.
Or, we could allow the things to be added to be expressions themselves. This needs a recursive call to mini-eval inside eval-add.

You try
Fix your math commands so that they recursively evaluate their arguments.
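As a point of comparison (not the course solution), the recursive arithmetic evaluator can be sketched in Python, with nested lists like ['add', 3, ['add', 4, 5]] playing the role of the quoted Racket expressions and mini_eval mirroring the slides' mini-eval:

```python
# A minimal arithmetic-only mini-eval: numbers evaluate to themselves,
# and each math command recursively evaluates both of its arguments.
def mini_eval(expr):
    if isinstance(expr, (int, float)):      # a literal number: just return it
        return expr
    op = expr[0]
    a = mini_eval(expr[1])                  # recursive call fixes nesting
    b = mini_eval(expr[2])
    if op == 'add': return a + b
    if op == 'sub': return a - b
    if op == 'mul': return a * b
    if op == 'div': return a / b
    raise ValueError("unknown operator: " + str(op))

print(mini_eval(['add', 3, ['add', 4, 5]]))   # 12
```

Without the recursive calls, evaluating ['add', 3, ['add', 4, 5]] would try to add the number 3 to a list, which is exactly why the flat version of eval-add fails.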

Adding Variables

Implementing variables
Represent a frame as a hashtable. Racket's hashtables:
(define ht (make-hash))
(hash-set! ht key value)
(hash-has-key? ht key)
(hash-ref ht key)

Implementing variables
Represent an environment as a list of frames.
[Diagram: a local frame (hash table mapping x -> 7, y -> 1) points to the global frame (hash table mapping x -> 2, y -> 3).]

Implementing variables
Two things we can do with a variable in our programming language:
Define a variable.
Get the value of a variable.

Getting the value of a variable
New type of expression: a symbol.
Whenever mini-eval sees a symbol, it should look up the value of the variable corresponding to that symbol.

Getting the value of a variable
(define (lookup-variable-value var env)
  ; Pseudocode:
  ; If our current frame has the variable bound,
  ; then get its value and return it.
  ; Otherwise, if our current frame has a frame
  ; pointer, then follow it and try the lookup
  ; there.
  ; Otherwise, throw an error.
  )

Getting the value of a variable
(define (lookup-variable-value var env)
  (cond ((null? env)                 ; must check this first, or (car env) fails
         (error "unbound variable" var))
        ((hash-has-key? (car env) var)
         (hash-ref (car env) var))
        (else
         (lookup-variable-value var (cdr env)))))
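The same lookup can be sketched in Python, treating an environment as a list of dicts (frames), innermost frame first, mirroring the Racket list of hashtables (illustrative analogue, not the course code):

```python
# Walk the chain of frames; the first frame that binds the
# variable wins, so inner frames shadow outer ones.
def lookup_variable_value(var, env):
    if not env:                        # ran out of frames: unbound variable
        raise NameError("unbound variable: " + var)
    if var in env[0]:                  # bound in the current frame
        return env[0][var]
    return lookup_variable_value(var, env[1:])   # follow the frame pointer

env = [{'x': 7, 'y': 1}, {'x': 2, 'y': 3}]       # local frame, then global
print(lookup_variable_value('x', env))           # 7, not 2: shadowing
```

Note the empty-environment check comes first, for the same reason it must in the Racket version: taking the car of an empty list is an error.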

Defining a variable
mini-eval needs to handle expressions that look like (define variable expr1), where expr1 can contain sub-expressions.
Add two functions to the evaluator:
definition?: tests if an expr fits the form of a definition.
eval-definition: extract the variable, recursively evaluate expr1, and add a binding to the current frame.

Implementing conditionals
We will have one conditional in our mini-language: ifzero.
Syntax: (ifzero expr1 expr2 expr3)
Semantics: evaluate expr1 and test if it's equal to zero. If yes, evaluate and return expr2. If no, evaluate and return expr3.

Implementing conditionals
Add functions ifzero? and eval-ifzero.

Designing our interpreter around mini-eval
(define (mini-eval expr env) ...)
Determines what type of expression expr is, then dispatches the evaluation of the expression to the appropriate function:
number? -> evaluate in place
symbol? -> lookup-variable-value
add?/subtract?/multiply? -> appropriate math function
definition? -> eval-definition
ifzero? -> eval-ifzero

Today
Two more pieces to add:
Closures (lambda? / eval-lambda)
Function calls (call? / eval-call)

Implementing closures
In Mini-Racket, all (user-defined) functions and closures will have a single argument.
Syntax: (lambda var expr)
Semantics: creates a new closure (anonymous function) of the single argument var, whose body is expr.

(lambda var expr)
Need a new data structure to represent a closure.
Why can't we just represent one as the list (lambda var expr) above?
Hint: what is missing? Think of environment diagrams.

(lambda var expr)
We choose to represent closures using a list of four components:
The symbol 'closure
The argument variable (var)
The body (expr)
The environment in which this closure was defined.

Evaluate at top level: (lambda x (add x 1))
[Diagram: in the global environment, a closure with Arg: x and Code: (add x 1).]
Our evaluator should return '(closure x (add x 1) (#hash()))

Write lambda? and eval-lambda
lambda? is easy.
eval-lambda should:
Extract the variable name and the body, but don't evaluate the body (not until we call the function).
Return a list of the symbol 'closure, the variable, the body, and the current environment.

(define (eval-lambda expr env)
  (list 'closure
        (cadr expr)   ; the variable
        (caddr expr)  ; the body
        env))

Function calls
First we need the other half of the eval/apply paradigm.

Remember from environment diagrams: to evaluate a function call, make a new frame with the function's arguments bound to their values, then run the body of the function using the new environment for variable lookups.

Apply
(define (mini-apply closure argval) ...)
Pseudocode:
Make a new frame mapping the closure's argument (i.e., the variable name) to argval.
Make a new environment consisting of the new frame pointing to the closure's environment.
Evaluate the closure's body in the new environment.

Apply
(define (mini-apply closure argval)
  (let ((new-frame (make-hash)))
    (hash-set! new-frame <arg name> argval)
    (let ((new-env <construct new environment>))
      <eval body of closure in new-env>)))

Apply
(define (mini-apply closure argval)
  (let ((new-frame (make-hash)))
    (hash-set! new-frame (cadr closure) argval)
    (let ((new-env (cons new-frame (cadddr closure))))
      (mini-eval (caddr closure) new-env))))

Function calls
Syntax: (call expr1 expr2)
Semantics:
Evaluate expr1 (must evaluate to a closure).
Evaluate expr2 to a value (the argument value).
Apply the closure to the value (and return the result).

You try it
Write call? (easy).
Write eval-call (a little harder):
Evaluate expr1 (must evaluate to a closure).
Evaluate expr2 to a value (the argument value).
Apply the closure to the value (and return the result).
When done, you now have a Turing-complete language!

; expr looks like
; (call expr1 expr2)
(define (eval-call expr env)
  (mini-apply <eval the function>
              <eval the argument>))

(define (eval-call expr env)
  (mini-apply (mini-eval (cadr expr) env)
              (mini-eval (caddr expr) env)))
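Putting eval-lambda, eval-call, and mini-apply together, the whole closure machinery can be sketched in Python (the ('closure', var, body, env) representation follows the slides; the rest is an illustrative analogue, not the course code):

```python
# mini_eval handles numbers, symbols, add, lambda, and call;
# mini_apply makes the new frame and evaluates the body in it.
def mini_eval(expr, env):
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):                      # a variable: look it up
        for frame in env:
            if expr in frame:
                return frame[expr]
        raise NameError("unbound variable: " + expr)
    op = expr[0]
    if op == 'add':
        return mini_eval(expr[1], env) + mini_eval(expr[2], env)
    if op == 'lambda':                             # (lambda var body)
        return ('closure', expr[1], expr[2], env)  # capture defining env
    if op == 'call':                               # (call f-expr arg-expr)
        return mini_apply(mini_eval(expr[1], env),
                          mini_eval(expr[2], env))
    raise ValueError("unknown form: " + str(op))

def mini_apply(closure, argval):
    _, var, body, env = closure
    new_frame = {var: argval}                      # bind argument to value
    return mini_eval(body, [new_frame] + env)      # extend closure's env

# ((lambda (x) (add x 1)) 41)
print(mini_eval(['call', ['lambda', 'x', ['add', 'x', 1]], 41], []))   # 42
```

Because mini_apply extends the environment stored inside the closure, rather than the environment at the call site, the sketch gives lexical scope for free, which is exactly the point of the next slide.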

Magic in higher-order functions
The "magic": how is the right environment around for lexical scope when functions may return other functions, store them in data structures, etc.?
Lack of magic: the interpreter uses a closure data structure to keep the environment it will need to use later.

Is this expensive?
Time to build a closure is tiny: make a list with four items.
Space to store closures might be large if the environment is large.

Interpreter steps
Parser: takes code and produces an intermediate representation (IR), e.g., an abstract syntax tree.
Static checking: typically includes syntactic analysis and type checking.
The interpreter then directly runs code in the IR.

Compiler steps
Parser
Static checking
Code optimizer: takes the AST and alters it to make the code execute faster.
Code generator: produces code in the output language (and saves it, as opposed to running it).

Code optimization
// Test if n is prime
boolean isprime(int n) {
  for (int x = 2; x < sqrt(n); x++) {
    if (n % x == 0) return false;
  }
  return true;
}

Code optimization
// Test if n is prime
boolean isprime(int n) {
  double temp = sqrt(n);
  for (int x = 2; x < temp; x++) {
    if (n % x == 0) return false;
  }
  return true;
}

Common code optimizations
Replacing constant expressions with their evaluations.
Ex: a game that displays an 8 by 8 grid, where each cell will be 50 pixels by 50 pixels on the screen.
int CELL_WIDTH = 50;
int BOARD_WIDTH = 8 * CELL_WIDTH;

Common code optimizations
Replacing constant expressions with their evaluations.
Ex: a game that displays an 8 by 8 grid, where each cell will be 50 pixels by 50 pixels on the screen.
int CELL_WIDTH = 50;
int BOARD_WIDTH = 400;
References to these variables would probably be replaced with constants as well.
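Constant folding of this kind can be sketched with Python's ast module. The one-rule folder below is a hypothetical illustration (it handles only multiplication of two constants), not how a production optimizer is organized:

```python
import ast

# Replace BinOp(Constant, Mult, Constant) nodes with the
# pre-computed Constant, after folding any children first.
class FoldConstants(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)           # fold nested expressions first
        if (isinstance(node.left, ast.Constant) and
                isinstance(node.right, ast.Constant) and
                isinstance(node.op, ast.Mult)):
            return ast.copy_location(
                ast.Constant(node.left.value * node.right.value), node)
        return node

tree = ast.parse("BOARD_WIDTH = 8 * 50")
folded = ast.fix_missing_locations(FoldConstants().visit(tree))
print(ast.unparse(folded))                 # BOARD_WIDTH = 400
```

This is exactly an "alter the AST to make the code execute faster" step: the multiplication happens once, at compile time, instead of at every run.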

Common code optimizations
Reordering code to improve cache performance.
for (int x = 0; x < HUGE_NUMBER; x++) {
  huge_array[x] = f(x);
  another_huge_array[x] = g(x);
}

Common code optimizations
Reordering code to improve cache performance.
for (int x = 0; x < HUGE_NUMBER; x++) {
  huge_array[x] = f(x);
}
for (int x = 0; x < HUGE_NUMBER; x++) {
  another_huge_array[x] = g(x);
}

Common code optimizations
Loops: unrolling, combining/distribution, changing nesting.
Finding common subexpressions and replacing them with a reference to a temporary variable: (a + b)/4 + (a + b)/3
Recursion: replace with iteration if possible. That's what tail-recursion optimization does!
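The tail-recursion point can be made concrete in Python (illustrative functions; note that Python itself does not perform this optimization, which is why the loop form matters there):

```python
# Tail-recursive sum: the recursive call is the last thing done,
# so a compiler can reuse the current stack frame...
def sum_to_rec(n, acc=0):
    return acc if n == 0 else sum_to_rec(n - 1, acc + n)

# ...which amounts to rewriting it as this iteration.
def sum_to_iter(n):
    acc = 0
    while n != 0:
        acc += n
        n -= 1
    return acc

print(sum_to_rec(100), sum_to_iter(100))   # 5050 5050
```

The two functions compute the same value; the optimization's payoff is that the loop version uses constant stack space regardless of n.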

Why don't interpreters do these optimizations?
Usually, there's not enough time. We need the code to run NOW!
Sometimes, they can optimize a little (e.g., tail recursion).

Code generation
The last phase of compilation.
Choose what operations to use in the output language and what order to put them in (instruction selection, instruction scheduling).
If output is in a low-level language: pick which variables are stored in which registers (register allocation).
Include debugging code? (Store "true" function/variable names and line numbers?)

Java
Uses both interpretation and compilation!
Step 1: Compile Java source to bytecode. Bytecode is "machine code" for a made-up computer, the Java Virtual Machine (JVM).
Step 2: An interpreter interprets the bytecode.
Historically, the bytecode interpreter made Java code execute very slowly (1990s).

Just-in-time compilation
Bytecode interpreters historically would translate each bytecode command into machine code and immediately execute it.
A just-in-time compiler has two optimizations:
It caches bytecode -> machine code translations so it can re-use them later.
It dynamically compiles sections of bytecode into machine code "when it thinks it should."
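A toy model of these two optimizations can be written in Python (all names here are hypothetical and the "compilation" is simulated): a call count decides when an operation is hot enough to compile, and a cache lets later calls re-use the translation.

```python
COMPILE_THRESHOLD = 2    # "when it thinks it should": after 2 calls
call_counts = {}         # how often each operation has run
compiled_cache = {}      # cached "machine code" translations

def run(op_name, interpret, compile_op, arg):
    if op_name in compiled_cache:                  # re-use cached translation
        return compiled_cache[op_name](arg)
    call_counts[op_name] = call_counts.get(op_name, 0) + 1
    if call_counts[op_name] >= COMPILE_THRESHOLD:  # hot: compile and cache
        compiled_cache[op_name] = compile_op()
        return compiled_cache[op_name](arg)
    return interpret(arg)                          # cold: just interpret

compile_square = lambda: (lambda x: x * x)         # stand-in for codegen
result = [run("square", lambda x: x * x, compile_square, n) for n in range(4)]
print(result)   # [0, 1, 4, 9]
```

The first call is interpreted, the second triggers compilation, and every call after that hits the cache: the same shape as the startup-versus-steady-state trade-off on the next slide.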

JIT: a classic trade-off
Startup is slightly slower: the JIT needs time to do some initial dynamic compilation.
Once the program starts, it runs faster than a regular interpreter, because some sections are now compiled.