Public-Service Announcement

Similar documents
61A Lecture 26. Monday, October 31

INTERPRETATION AND SCHEME LISTS 12

INTERPRETATION AND SCHEME LISTS 12

Lecture #24: Programming Languages and Programs

Syntax and Grammars 1 / 21

Chapter 9: Dealing with Errors

A Simple Syntax-Directed Translator

Parsing CSCI-400. Principles of Programming Languages.

Syntax-Directed Translation. Lecture 14

CS61A Notes Week 13: Interpreters

A simple syntax-directed

Topic 3: Syntax Analysis I

EXCEPTIONS, CALCULATOR 10

SCHEME AND CALCULATOR 5b

Variables, expressions and statements

INTERPRETERS AND TAIL CALLS 9

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4

6.184 Lecture 4. Interpretation. Tweaked by Ben Vandiver Compiled by Mike Phillips Original material by Eric Grimson

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

INTERPRETERS 8. 1 Calculator COMPUTER SCIENCE 61A. November 3, 2016

Context Free Grammars and Recursive Descent Parsing

Introduction to Parsing. Lecture 8

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Syntax Analysis. Chapter 4

LECTURE 3. Compiler Phases

ASTS, GRAMMARS, PARSING, TREE TRAVERSALS. Lecture 14 CS2110 Fall 2018

Syntax Errors; Static Semantics

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

Lecture 10 Parsing 10.1

Introduction to Lexical Analysis

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

Semantic Analysis. Lecture 9. February 7, 2018

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

ASTs, Objective CAML, and Ocamlyacc

SCHEME INTERPRETER GUIDE 4

Homework & Announcements

Project 1: Scheme Pretty-Printer

Compiler Construction D7011E

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

A programming language requires two major definitions A simple one pass compiler

syntax tree - * * * * * *

CSE 413 Final Exam. June 7, 2011

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Getting Started. Office Hours. CSE 231, Rich Enbody. After class By appointment send an . Michigan State University CSE 231, Fall 2013

CS 842 Ben Cassell University of Waterloo

Building Compilers with Phoenix

SCHEME The Scheme Interpreter. 2 Primitives COMPUTER SCIENCE 61A. October 29th, 2012

CSE 413 Final Exam Spring 2011 Sample Solution. Strings of alternating 0 s and 1 s that begin and end with the same character, either 0 or 1.

CSE P 501 Exam 8/5/04

Chapter 4: Syntax Analyzer

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

CSE 3302 Programming Languages Lecture 2: Syntax

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical Analysis. Lecture 2-4

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Intro. Scheme Basics. scm> 5 5. scm>

Monty Python and the Holy Grail (1975) BBM 101. Introduction to Programming I. Lecture #03 Introduction to Python and Programming, Control Flow

Interpreters and Tail Calls Fall 2017 Discussion 8: November 1, 2017 Solutions. 1 Calculator. calc> (+ 2 2) 4

Introduction to Parsing. Lecture 5

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Lexical Analysis. Lecture 3-4

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Compilers CS S-01 Compiler Basics & Lexical Analysis

COMP455: COMPILER AND LANGUAGE DESIGN. Dr. Alaa Aljanaby University of Nizwa Spring 2013

Sprite an animation manipulation language Language Reference Manual

This book is licensed under a Creative Commons Attribution 3.0 License

Lexical and Syntax Analysis

Full file at

Compiler Theory. (Semantic Analysis and Run-Time Environments)

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Lecture 2: SML Basics

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Lexical Analysis. Lecture 3. January 10, 2018

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

GBIL: Generic Binary Instrumentation Language. Language Reference Manual. By: Andrew Calvano. COMS W4115 Fall 2015 CVN

CS S-01 Compiler Basics & Lexical Analysis 1

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

6.037 Lecture 4. Interpretation. What is an interpreter? Why do we need an interpreter? Stages of an interpreter. Role of each part of the interpreter

CS1 Lecture 3 Jan. 22, 2018

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Basic Concepts. Computer Science. Programming history Algorithms Pseudo code. Computer - Science Andrew Case 2

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS. Regrades 10/6/15. Prelim 1. Prelim 1. Expression trees. Pointers to material

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

The Three Rules. Program. What is a Computer Program? 5/30/2018. Interpreted. Your First Program QuickStart 1. Chapter 1

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

Transcription:

Public-Service Announcement Hi CS 61A students! Did you know that the university has a farm? The UC Gill Tract Community Farm is located just two miles from campus and is a community space, open to all. Come volunteer during open hours for free organic produce, or drop by during our farm stand hours on Sunday to purchase some produce without volunteering. Here are UCGTCF open hours: S 12-5 (farmstand 3-5) M 11-2 Tu 3-6 W 11-2 Th 3-6 We ll see you there soon! Please visit gilltractfarm.org for more information. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 1

Lecture #23: Programming Languages and Programs A programming language, is a notation for describing computations or processes. These range from low-level notations, such as machine language or simple hardware description languages, where the subject matter is typically finite bit sequences and primitive operations on them that correspond directly to machine instructions or gates,......to high-level notations,such aspython, inwhichthesubject matter can be objects and operations of arbitrary complexity. The universe of implementations of these languages is layered: Python can be implemented in C, which in turn can be implemented in assembly language, which in turn is implemented in machine language, which in turn is implemented with gates, which in turn are implemented with transistors. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 2

Metalinguistic Abstraction We ve created abstractions of actions functions and of things classes. Metalinguistic abstraction refers to the creation of languages abstracting description. Programming languages are one example. Programming languages are effective: they can be implemented. These implementations interpret utterances in that language, performing the described computation or controlling the described process. The interpreter may be hardware (interpreting machine-language programs) or software (a program called an interpreter), or (increasingly common) both. To be implemented, though, the grammar and meaning of utterances in the programming language must be defined precisely. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 3

Source: John Denero. A Sample Language: Calculator Prefix notation expression language for basic arithmetic Python-like syntax, with more flexible built-in functions. calc> add(1, 2, 3, 4) 10 calc> mul() 1 calc> sub(100, mul(7, add(8, div(-12, -3)))) 16.0 calc> -(100, *(7, +(8, /(-12, -3)))) 16.0 Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 4

Expression types: Syntax and Semantics of Calculator A call expression is an operator name followed by a comma-separated list of operand expressions, in parentheses A primitive expression is a number Operators: The add (or + operator returns the sum of its arguments The sub (-) operator returns either the additive inverse of a single argument, or the sum of subsequent arguments subtracted from the first. The mul (*) operator returns the product of its arguments. The div (/) operator returns the real-valued quotient of a dividend and divisor. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 5

Expression Trees (again) Our calculator program represents expressions as trees (see Lecture #10). It consists of a parser, which produces expression trees from input text, and an evaluator, which performs the computations represented by the trees. You can use the term interpreter to refer to both, or to just the evaluator. To create an expression tree: class Exp(object): """A call expression in Calculator.""" def init (self, operator, operands): self.operator = operator self.operands = operands Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 6

Expression Trees By Hand As usual, we have defined (in 23.py) the methods repr and str to produce reasonable representations of expression trees: >>> Exp( add, [1, 2]) Exp( add, [1, 2]) >>> str(exp( add, [1, 2])) add(1, 2) >>> Exp( add, [1, Exp( mul, [2, 3, 4])]) Exp( add, [1, Exp( mul, [2, 3, 4])]) >>> str(exp( add, [1, Exp( mul, [2, 3, 4])])) add(1, mul(2, 3, 4)) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 7

Evaluation Evaluation discovers the form of an expression and then executes a corresponding evaluation rule. Primitive expressions (literals) evaluate to themselves Call expressions are evaluated recursively, following the tree structure: Evaluate each operand expression, collecting values as a list of arguments. Apply the named operator to the argument list. def calc eval(exp): """Evaluate a Calculator expression.""" if type(exp) in (int, float): return exp elif type(exp) == Exp: arguments = list(map(calc eval, exp.operands)) return calc apply(exp.operator, arguments) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 8

Applying Operators Calculator has a fixed set of operators that we can enumerate def calc apply(operator, args): """Apply the named operator to a list of args. if operator in ( add, + ): return sum(args) if operator in ( sub, - ): if len(args) == 0: raise TypeError(operator + requires at least 1 argument ) if len(args) == 1: return -args[0] return sum(args[:1] + [-arg for arg in args[1:]]) etc. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 9

Read-Eval-Print Loop The user interface to many programming languages is an interactive loop that Reads an expression from the user Parses the input to build an expression tree Evaluates the expression tree Prints the resulting value of the expression def read eval print loop(): """Run a read-eval-print loop for calculator.""" while True: try: expression tree = calc parse(input( calc> )) print(calc eval(expression tree)) except: print error message and recover Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 10

Recap: The strategy. Calculator Example: Parsing Parsing: Convert text into expression trees. Evaluation: Recursively traverse the expression trees calculating a result add(2, 2) = Exp( add, (2, 2)) = 4 Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 11

Parsing: Lexical and Syntactic Analysis To parse a text is to analyze it into its constituents and to describe their relationship or structure. Thus, we can parse an English sentence into nouns, verbs, adjectives, etc., and determine what plays the role of subject, what is plays the role of object of the action, and what clauses or words modify what. When processing programming languages, we typically divide task into two stages: Lexical analysis (aka tokenization): Divide input string into meaningful tokens, such as integer literals, identifiers, punctuation marks. Syntactic analysis: Convert token sequence into trees that reflect their meaning. def calc parse(line): # From lect23.py """Parse a line of calculator input. Return expression tree.""" tokens = tokenize(line) expression tree = analyze(tokens) return expression tree Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 12

Tokens Purpose of tokenize is to perform a transformation like this: >>> tokenize( add(2, mul(4, 6)) ) [ add, (, 2,,, mul, (, 4,,, 6, ), ) ] In principle, we could dispense with this step and go from text to trees directly, but We choose these particular chunks because they correspond to how we think about and describe the text, and thus make analysis simpler: We say the word add, not the character a followed by the character b... We don t mention spaces at all. In production compilers, the lexical analyzer typically returns more information, but the simple tokens will do for this problem. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 13

Quick-and-Dirty Tokenizing For our simple purposes, we can use a few simple Python routines to do the job. For example, if all our tokens were separated by whitespace, we could use the.split() method on strings to break up the input, after first using the.strip() method to remove any leading or trailing whitespace: >>> " add ( 2, 2 ) ".strip().split() [ add, (, 2,,, 2, ) ] [Gee. HowdidIfindoutabouttheseusefulmethods? Whatprompted me to go looking?] So now, we just need to get a string with everything separated. Since integer literals and words (like add or + ) are not supposed to be next to each other in the syntax, it would suffice to surround any punctuation characters with spaces. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 14

Quick-and-Dirty Tokenizing: The Code Option 1: use the.replace method on strings: def tokenize(line): """Convert a string into a list of tokens.""" spaced = line.replace( (, ( ).replace( ), ) ) \.replace(,,, ) return spaced.strip().split() Option 2: same as Option 1, but use a loop to make it more easily extensible: spaced = line for c in "(),": spaced = spaced.replace(c, + c + ) Option 3: Import the package re, and use pattern replacement: spaced = re.sub(r ([(),]), r \1, line) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 15

Syntactic Analysis: Find the Recursion Consider the definition of a calculator expression: A numeral, or Anoperator,followedbya (, followedbyasequenceofcalculator expressions separated by commas, followed by a right parenthesis. The recursion in the definition suggests the recursive structure of our analyzer. This particular syntax has two useful properties: By looking at the first token of a calculator expression, we can tell which of the two branches above to take, and By looking at the token immediately after each operand, we can tell when we ve come to the end of an operand list. That is, we can predict on the basis of the next (as-yet unprocessed) token, what we ll find next. Allows us to build a predictive recursive-descent parser that uses one token of lookahead. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 16

Analysis from the Top Plan: organize our program into two mutually recursive functions: one for expressions, and one for operand lists. Each of these will input a list of tokens and consume (remove) the tokens comprising the expression or list it finds, returning tree(s). def analyze(tokens): """Return the translation of a prefix of tokens that forms a calculator expression into a tree, removing the tokens used.""" token = analyze token(tokens.pop(0)) if type(token) in (int, float): return token else: return Exp(token, analyze operands(tokens)) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 17

Handling Operands def analyze operands(tokens): """Assuming that tokens starts with a comma-separated list of expressions surrounded by (...), return their translations into a list of trees, removing all the tokens thus used.""" Notes: operands = [] while tokens.pop(0)!= ) : operands.append(analyze(tokens)) return operands Every trip through the while loop splits off an operand. The while condition has the side effect of removing the next token. This token is ( on the first trip,, after each operand but the last, and ) after the last operand. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 18

Detail: Token Coercion The analyze token function converts numerals (text) into Python numbers. In actual compilers, this is often done by the lexical analyzer, but the boundary between lexer and parser is moveable. def analyze token(token): """Return the numeric value of token if it can be analyzed as a number, and otherwise token itself.""" try: return int(token) # Why try this first? except ValueError: try: return float(token) except ValueError: return token Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 19

Limitations of Predictive Parsers Not all languages lend themselves to predictive parsing. Consider the English sentence: Subject of the sentence {}}{ The horse raced past the barn fell. This is an example of a garden-path sentence: You expect (might reasonably predict) that the subject is The horse, and ends just before raced. But raced heremeans thatwasraced, whichyou can ttelluntil you get to the last word. One can use backtracking in this case (like the maze program). Requires a different program structure. Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 20

Dealing with Errors Code so far has assumed correct input. In real life, one must be less trusting. known operators = { add, sub, mul, div, +, -, *, / } def analyze(tokens): if not tokens: raise SyntaxError( unexpected end of line ) token = analyze token(tokens.pop(0)) if type(token) in (int, float): return token if token in known operators: return Exp(token, analyze operands(tokens)) else: raise SyntaxError( unexpected + token) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 21

Dealing with Errors with a Little More Style Error-checking code clutters the program, so we might opt for something a bit clearer. known operators = { add, sub, mul, div, +, -, *, / } def analyze(tokens): token = next token(tokens, known operators) if type(token) in (int, float): return token else: return Exp(token, analyze operands(tokens)) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 22

Catching Errors in next token def next token(tokens, allowed): if len(tokens) == 0: token, name = None, *ENDLINE* else: token = name = analyze token(tokens.pop()) if token in allowed or \ (type(token) in [int, float] and int in allowed): return token else: raise SyntaxError( unexpected token: + name) Last modified: Mon Mar 20 15:24:06 2017 CS61A: Lecture #23 23