CS 842 Ben Cassell University of Waterloo

Recursive Descent Re-Cap Top-down parser that works down the parse tree, following the formal grammar. Built from mutually recursive procedures; typically these procedures represent the production rules of the grammar. Lends itself very well to implementation in both functional languages and simple, non-functional equivalents.

Sample Recursive Descent Parser Consider the language defined as follows:
  A → a + B
  B → A
  B → a
Generates words such as: a + a, a + a + a, a + a + a + a. Based on samples available at http://www.cs.engr.uky.edu/~lewis/essays/compilers/rec-des.html

parsea([parse, input])
  if (isfirst(input, 'a') and isfirst(rest(input), '+')) then
    return parseb([parse, rest(rest(input))])
  else
    return [false, ]

parseb([parse, input])
  if (isfirst(input, '(')) then
    let [parse2, input2] = parsea([parse, rest(input)])
    if (parse2 and isfirst(input2, ')')) then
      return [parse, rest(input2)]
    else
      return [false, ]
  else if (isfirst(input, 'a')) then
    return [parse, rest(input)]
  else
    return [false, ]
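The pseudocode above can be rendered as a short Python sketch. The names, the token-list representation, and the None-for-failure convention are my own illustrative choices, not the original code; the parenthesis branch for B is carried over from the sample.

```python
# Illustrative Python sketch of the recursive descent pseudocode above.

def is_first(tokens, tok):
    # True when the next token is `tok`.
    return bool(tokens) and tokens[0] == tok

def parse_a(tokens):
    # A -> a + B
    if is_first(tokens, "a") and is_first(tokens[1:], "+"):
        return parse_b(tokens[2:])
    return None  # failure

def parse_b(tokens):
    # B -> ( A ) | a
    if is_first(tokens, "("):
        rest = parse_a(tokens[1:])
        if rest is not None and is_first(rest, ")"):
            return rest[1:]
        return None
    if is_first(tokens, "a"):
        return tokens[1:]
    return None

def accepts(tokens):
    # The whole input is accepted when A consumes every token.
    return parse_a(list(tokens)) == []
```

Each procedure consumes a prefix of the token list and hands the remainder back, exactly as the mutually recursive procedures in the pseudocode do.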

Types of Recursive Descent Parsers At a high level there are two primary types of recursive descent parser: predictive parsers and backtracking parsers.

Predictive Parsing Parses the class of LL(k) grammars. Uses the k tokens of lookahead to determine which production rule to choose. Cannot handle some ambiguous grammars (depending on lookahead) nor left recursion, which is fine because the LL(k) class contains no such grammars! It should be noted that removing left recursion does not necessarily produce an LL(k) grammar. Predictive parsing of LL(k) grammars runs in linear time.

Breaking Predictive Parsing Left recursion:
  A → A a
  A → ϵ
Ambiguous grammar:
  A → A + A
  A → a

Backtracking Parsers Although k-lookahead parsers are very fast, they are also limited. Backtracking parsers attempt production rules in turn, rewinding on an error and trying the other alternatives. Backtracking parsers can handle a much larger variety of languages, but might not terminate on non-LL(k) languages. Even when they do terminate, backtracking parsers can require exponential time if implemented naively.

A Potentially Slow Example Consider the ambiguous grammar:
  A → a A A
  A → ϵ
This will return an exponential number of valid parses for any series of a's!
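A tiny Python experiment makes the blow-up concrete. Reading the grammar as A → a A A | ϵ, a naive backtracking recognizer that reports one right position per derivation produces Catalan-many complete parses of aⁿ (the function names are mine):

```python
def parse_A(pos, s):
    # Return one entry per derivation of A starting at `pos`,
    # giving the position where that derivation stops.
    results = [pos]                          # A -> epsilon
    if pos < len(s) and s[pos] == "a":       # A -> a A A
        for mid in parse_A(pos + 1, s):
            results.extend(parse_A(mid, s))
    return results

def complete_parses(n):
    # Number of derivations that consume all of "a" * n.
    s = "a" * n
    return sum(1 for p in parse_A(0, s) if p == n)
```

complete_parses follows the Catalan numbers (1, 1, 2, 5, 14, ...), so without memoization the parser re-derives the same substrings an exponential number of times.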

Memoization Consider the following implementation of the Fibonacci numbers in Haskell:

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

As calls to this function get larger, they will become significantly slower.

Now, consider this implementation instead:

fib :: Int -> Integer
fib = (map fibrec [0..] !!)
  where fibrec 0 = 0
        fibrec 1 = 1
        fibrec n = fib (n - 2) + fib (n - 1)

Similar techniques can be manually applied to backtracking parsers to significantly improve performance. Why run the same production rule multiple times if you already know what it will produce? Based on samples available at http://www.haskell.org/haskellwiki/memoization

Extending Memoization Wouldn't it be great if this memoization technique could be extended into a general form that could simply be plugged in to our other functions? Maybe we could even use similar techniques to solve some of the problems with recursive descent parsers!

Papers
Techniques for Automatic Memoization with Applications to Context-Free Parsing. Peter Norvig (UC Berkeley), 1991.
Memoization in Top-Down Parsing. Mark Johnson (Brown), 1995.

Techniques for Automatic Memoization Claims that an algorithm similar to Earley's algorithm can be obtained from a backtracking parser using memoization. Earley's algorithm: a top-down dynamic programming algorithm. Maintains a set of states to examine. Starts with only the top rule; as input is processed, new states are added to the set by prediction, scanning and completion. Has O(n³) time complexity, where n is the length of the string, and O(n²) time complexity when the grammar is unambiguous.
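For reference, Earley's algorithm itself fits in a few dozen lines. Here is a minimal Python recognizer for ϵ-free grammars; the state representation and all names are my own sketch, not taken from the paper:

```python
def earley_recognize(grammar, start, tokens):
    # grammar: dict mapping a nonterminal to a list of right-hand sides,
    # each a tuple of symbols; anything not in `grammar` is a terminal.
    # A state is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                          # predict
                    for r in grammar[sym]:
                        state = (sym, r, 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # complete
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, o2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
               for rhs in grammar[start])
```

The three branches are exactly the prediction, scanning and completion steps described above; the chart sets bound the work per position, which is where the polynomial complexity comes from.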

Memoizing Functions in General Consider the following code from the paper:

(defun memo (fn)
  (let ((table (make-hash-table)))
    #'(lambda (x)
        (multiple-value-bind (val found) (gethash x table)
          (if found
              val
              (setf (gethash x table) (funcall fn x)))))))

Problems with the Implementation The function fn being memoized must take exactly one argument and return one value. This is probably too restrictive to be useful. Also, more importantly, what if fn makes any recursive calls? Recursive calls will go to the original version of fn, not the memoized version. This mostly defeats the point of memoizing, especially for functional parsing.

One Possible Solution Globally rebind what fn points to:

(defun memoize (fn-name)
  (setf (symbol-function fn-name)
        (memo (symbol-function fn-name))))

This is highly useful, but still has a limitation: memoized functions can only take one argument. It should accept arbitrary arguments, and we should be able to index on arbitrary combinations of them.
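In Python the same name-rebinding trick falls out of decorator syntax: decorating fib rebinds the global name, so the recursive calls inside the body already go through the memoized wrapper. A minimal sketch (mine, not from the paper):

```python
def memo(fn):
    # Cache results keyed by the full argument tuple.
    table = {}
    def wrapper(*args):
        if args not in table:
            table[args] = fn(*args)
        return table[args]
    return wrapper

@memo  # rebinds the name `fib`, so the recursion below hits the cache
def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)
```

Without the decorator, fib(60) would make on the order of Fibonacci(60) calls; with it, only 61 distinct calls reach the original body.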

Updated Version

(defun memo (fn &key (key #'first) (test #'eql) name)
  (let ((table (make-hash-table :test test)))
    (setf (get name 'memo) table)
    #'(lambda (&rest args)
        (let ((k (funcall key args)))
          (multiple-value-bind (val found) (gethash k table)
            (if found
                val
                (setf (gethash k table) (apply fn args))))))))

Updated Version (Continued)

(defun memoize (fn-name &key (key #'first) (test #'eql))
  (setf (symbol-function fn-name)
        (memo (symbol-function fn-name)
              :name fn-name :key key :test test)))

Notes About the New Version The hash table is stored on the property list of the function name, meaning it can be inspected, cleared or otherwise modified. Useful when the working set changes. The default key function is first, which is fine for a single argument; in Lisp, identity can be used to hash on the full argument list. The test defaults to eql, and can be changed to equal or other tests as desired. equal, for instance, requires more computational overhead but will treat structurally equal lists as the same key.
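These notes translate directly to other languages. A Python analogue (illustrative, not Norvig's code) can take a key function, accept a caller-supplied table, and expose the table for inspection and clearing:

```python
def memo(fn, key=lambda args: args[0], table=None):
    # `key` picks what to index on (default: the first argument, as in
    # the Lisp version); pass key=lambda args: args to hash on all of them.
    table = {} if table is None else table
    def wrapper(*args):
        k = key(args)
        if k not in table:
            table[k] = fn(*args)
        return table[k]
    wrapper.table = table  # exposed, like the table on the property list
    return wrapper
```

Exposing wrapper.table plays the role of storing the table on the property list: callers can clear it when the working set changes.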

Using Memoize to Parse: A Simple Top-Down Parser

(defun parse (tokens start-symbol)
  (if (eq (first tokens) start-symbol)
      (list (make-parse :tree (first tokens) :rem (rest tokens)))
      (mapcan #'(lambda (rule)
                  (extend-parse (lhs rule) nil tokens (rhs rule)))
              (rules-for start-symbol))))

(defun extend-parse (lhs rhs rem needed)
  (if (null needed)
      (list (make-parse :tree (cons lhs rhs) :rem rem))
      (mapcan #'(lambda (p)
                  (extend-parse lhs
                                (append rhs (list (parse-tree p)))
                                (parse-rem p)
                                (rest needed)))
              (parse rem (first needed)))))
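The same parser is easy to transliterate. Here is a Python sketch with a tiny hard-wired grammar; the grammar and all names are my own illustration, and, as in the Lisp version, a token equal to a grammar symbol counts as a matching terminal:

```python
from collections import namedtuple

Parse = namedtuple("Parse", ["tree", "rem"])

GRAMMAR = {                      # hypothetical toy grammar
    "S":  [["NP", "VP"]],
    "NP": [["Kim"], ["Sandy"]],
    "VP": [["likes", "NP"]],
}

def rules_for(symbol):
    return GRAMMAR.get(symbol, [])

def parse(tokens, symbol):
    if tokens and tokens[0] == symbol:           # terminal match
        return [Parse(tokens[0], tokens[1:])]
    results = []
    for rhs in rules_for(symbol):
        results.extend(extend_parse(symbol, [], tokens, rhs))
    return results

def extend_parse(lhs, trees, rem, needed):
    if not needed:
        return [Parse((lhs, *trees), rem)]
    results = []
    for p in parse(rem, needed[0]):
        results.extend(extend_parse(lhs, trees + [p.tree], p.rem, needed[1:]))
    return results

def parser(tokens, start):
    # Keep only the parses that consume the whole input.
    return [p.tree for p in parse(tuple(tokens), start) if not p.rem]
```

parse returns a Parse (tree plus unconsumed remainder) for every way to derive a prefix; parser filters for an empty remainder, mirroring the Lisp parse/parser split.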

Adding Memoization

(memoize 'rules-for)
(memoize 'parse :test #'equal :key #'identity)

(defun parser (tokens start-symbol)
  (clear-memoize 'parse)
  (mapcar #'parse-tree
          (remove-if-not #'null
                         (parse tokens start-symbol)
                         :key #'parse-rem)))

parse returns all valid parses of all prefixes of the input; parser keeps only the complete ones.

Limitations The algorithm is claimed (without proof) to be equivalent to Earley's, but with O(n⁴) complexity. The asymptotic complexity is worse because of the use of equal hashing over the remaining tokens. O(n³) is achieved by adding a different type of hash table (a compromise between eql and equal) and a memoize that allows user-specified hash getter and putter functions. Explicitly cannot handle left recursion (the author mentions this directly). Hash tables that are not very carefully implemented can result in very poor performance.

Silver Linings The parser is exceedingly simple (15 lines of code or fewer). The technique applies beyond parsing: the paper shows automatic memoization applied in several languages, including Scheme and Pascal.

Memoization in Top-Down Parsing Goal: discover why left recursion fails for memoized parsers and present a memoization technique that can handle it. Takeaway: instead of returning the set of right string positions of a category as a single value, return them incrementally.

Symbols Used Throughout Uses symbols similar to The Functional Treatment of Parsing (Leermakers, 1993):
S: Sentence (S → NP VP)
N: Noun ("student", "professor")
V: Verb ("likes", "knows")
Det: Determiner ("every", "no")
PN: Proper Name ("Kim", "Sandy")
NP: Noun phrase (NP → PN | Det N)
VP: Verb phrase (VP → V NP | V S)
The VP rule in Figure 1 of the paper has a typo: (seq (V S)) should read (seq V S) to properly represent VP → V S.

Formalizing Grammars Johnson creates a recursive descent parser quite similar to Norvig's, and defines higher-order functions to simplify the process. reduce: Recursively applies a function across a list of arguments. For example, (reduce f x '(1 2 3)) reduces to (f (f (f x 1) 2) 3). union: Constructs a duplicate-free list from two lists. terminal: Maps a substring to a terminal if it matches, otherwise to the empty result. seq: Recognizes the concatenation of substrings recognized by two functions.

Formalizing Grammars (Continued) alt: Recognizes the union of the substrings recognized by two functions. epsilon: Recognizes the empty string. opt: Recognizes optional elements. k*: Recognizes the Kleene star of an element. recognize: Returns true if the string passed in can be parsed from the start symbol.

Language Problem Consider the following examples:

(define S (seq NP VP))
(define VP (alt (seq V NP) (seq V S)))

In Scheme, written this way, these definitions hit a mutual recursion problem: the binding will fail because S refers to VP before VP is defined. The fix is fairly straightforward:

(define-syntax vacuous
  (syntax-rules ()
    ((vacuous fn) (lambda args (apply fn args)))))

(define S (vacuous (seq NP VP)))
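The suffix-returning combinators and the late-binding fix have a direct Python analogue (all names here are mine). A recognizer maps a token tuple to the list of unconsumed suffixes, and wrapping a forward reference in a lambda plays the role of vacuous:

```python
def terminal(word):
    return lambda toks: [toks[1:]] if toks and toks[0] == word else []

def seq(f, g):
    # Every suffix g leaves after consuming what f left.
    return lambda toks: [rest2 for rest1 in f(toks) for rest2 in g(rest1)]

def alt(f, g):
    return lambda toks: f(toks) + g(toks)

PN = alt(terminal("Kim"), terminal("Sandy"))
V  = alt(terminal("likes"), terminal("knows"))
NP = PN
# VP is not defined yet: delay the lookup with a lambda, the analogue
# of Johnson's `vacuous` macro.
S  = seq(NP, lambda toks: VP(toks))
VP = alt(seq(V, NP), seq(V, lambda toks: S(toks)))

def recognize(rec, tokens):
    # Accept when some parse consumes the entire input.
    return any(rest == () for rest in rec(tuple(tokens)))
```

Writing S = seq(NP, VP) directly would raise a NameError, since VP does not exist yet; the lambda defers the name lookup until the recognizer actually runs, just as vacuous defers the application of fn.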

More Involved Problems The provided rules act as a top-down parser; results are returned as a list of suffixes. We get the usual left-recursion problems, and a non-trivial amount of recomputation occurs by default. Memoization can prevent the recomputation. The presented memo function is a Scheme version of the Norvig technique. Now we can write: (define S (memo (vacuous (seq NP VP)))) As in the Norvig paper, however, this still doesn't allow the parsing of left-recursive grammars.

What's the Fundamental Problem? To memoize a result, it first needs to be fully computed, but under left recursion computing the result requires that very result, which is impossible. Instead, memoize calls as they are made and produce the results lazily as needed. To do this, Johnson combines memoization with Continuation-Passing Style (CPS): provide each function a continuation to return its results to. This effectively reverses the direction of computation from bottom-up to top-down.

Sample CPS Function Traditional definition:

(define (square x) (* x x))

CPS definition:

(define (square cont x) (cont (* x x)))
(square display 10)
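The same shape in Python, for comparison (an illustrative sketch):

```python
def square(cont, x):
    # Instead of returning x * x, hand it to the continuation.
    cont(x * x)

square(print, 10)  # prints 100
```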

How do the Parsing Functions Change? A rule for a category A is now represented by a function that takes a continuation c and a left string position l, and evaluates (c r) zero or more times, once for each right string position r such that A can derive the string from l to r. In other words, instead of returning a set of string positions, we simply call the continuation on each result position. The terminal "will" could now be written as:

(define (future-aux continuation pos)
  (if (and (pair? pos)
           (eq? (car pos) 'will))
      (continuation (cdr pos))))

The rule VP → V NP | V S could be written:

(define (VP continuation pos)
  (begin
    (V (lambda (pos1) (NP continuation pos1)) pos)
    (V (lambda (pos1) (S continuation pos1)) pos)))

Simplifying and Recognizing In the previous example, the lambda expression tells the function V to pass a parsed V's right string position on to NP and S. Johnson redefines alt, seq and terminal to simplify this process. The recognize function now simply passes a continuation that records whether or not the input is successfully parsed. (The example uses a set!, but this is not necessary.) Problem: this still fails to terminate on left recursion, even in the memoized version!

The Secret Sauce: Updating Memo Memo table entries can't simply record fully reduced results; in CPS, the results are what get passed forward to the continuation. The new memo function maps a set of argument values to a list of caller continuations and a list of result values. Result values are propagated and updated as new values are returned: values that are not subsumed by previous ones are added to the entry's result list and passed to every stored continuation. When a memoized function is called, it records the new continuation and replays any results already known for the provided arguments.
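This scheme can be sketched compactly in Python (all names are mine, and a simple "already seen" check stands in for the paper's subsumption test). Each table entry stores the continuations and result positions seen so far; new results are flushed to every stored continuation, and new callers are replayed the stored results, which is what lets the left-recursive grammar A → A a | a below terminate:

```python
def cps_memo(fn):
    table = {}  # pos -> (continuations, result positions)
    def memoized(cont, pos):
        if pos not in table:
            table[pos] = ([cont], [])
            def record(result):
                conts, results = table[pos]
                if result not in results:    # stand-in for subsumption
                    results.append(result)
                    for c in list(conts):    # flush to every caller
                        c(result)
            fn(record, pos)                  # run the body exactly once
        else:
            conts, results = table[pos]
            conts.append(cont)               # remember the new caller
            for r in list(results):          # replay known results
                cont(r)
    return memoized

def make_A(tokens):
    # Left-recursive grammar: A -> A a | a
    def term_a(cont, pos):
        if pos < len(tokens) and tokens[pos] == "a":
            cont(pos + 1)
    def A_body(cont, pos):
        A(lambda mid: term_a(cont, mid), pos)  # A -> A a
        term_a(cont, pos)                      # A -> a
    A = cps_memo(A_body)
    return A

def recognizes(n):
    tokens = ("a",) * n
    A = make_A(tokens)
    results = []
    A(results.append, 0)  # top-level continuation collects right positions
    return len(tokens) in results
```

The left-recursive call A(..., pos) hits the table entry created for the outer call, so instead of recursing forever it just registers its continuation; each new right position then drives that continuation forward one token at a time.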

Left Recursion is Fixed! The underlying (unmemoized) functions are never called more than once on the same arguments; repeated calls fall back on the lazily evaluated continuation and result lists stored in the hash table. Even a left-recursive grammar can look up the continuation and result list and pass computation forward as the CPS calls enumerate the parses of the input string. Progress is always made because a left-recursive call does not need to drill down again to produce a concrete result, proving that the lazy shall inherit the Earth!

The End