exp ::= (match exp clause :::) j clause :::) j * clause :::) j (match-let ([pat exp] :::) body) j (match-let* ([pat exp] :::) body) j (match-letrec ([

Similar documents
Pattern Matching for Scheme

Lecture 09: Data Abstraction ++ Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree.

Scheme Quick Reference

Scheme Quick Reference

Introduction to Scheme

User-defined Functions. Conditional Expressions in Scheme

MIT Scheme Reference Manual

The Typed Racket Guide

Notes on Higher Order Programming in Scheme. by Alexander Stepanov

Typed Racket: Racket with Static Types

A Brief Introduction to Scheme (II)

6.184 Lecture 4. Interpretation. Tweaked by Ben Vandiver Compiled by Mike Phillips Original material by Eric Grimson

Functional Programming Languages (FPL)

Essentials of Programming Languages Language

Essentials of Programming Languages Language

Functional Programming. Pure Functional Programming

Scheme Tutorial. Introduction. The Structure of Scheme Programs. Syntax

INTRODUCTION TO SCHEME

Evaluating Scheme Expressions

Revised 5 Report on the Algorithmic Language Scheme

An Introduction to Scheme

Documentation for LISP in BASIC

Title. Authors. Status. 1. Abstract. 2. Rationale. 3. Specification. R6RS Syntax-Case Macros. Kent Dybvig

Kent Dybvig, Will Clinger, Matthew Flatt, Mike Sperber, and Anton van Straaten

The Typed Racket Reference

Summer 2017 Discussion 10: July 25, Introduction. 2 Primitives and Define

Functional Programming. Pure Functional Languages

Typed Scheme: Scheme with Static Types

11/6/17. Functional programming. FP Foundations, Scheme (2) LISP Data Types. LISP Data Types. LISP Data Types. Scheme. LISP: John McCarthy 1958 MIT

CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures

Programming Languages

Functional Programming. Pure Functional Languages

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

6.037 Lecture 4. Interpretation. What is an interpreter? Why do we need an interpreter? Stages of an interpreter. Role of each part of the interpreter

SMURF Language Reference Manual Serial MUsic Represented as Functions

Functional Languages. Hwansoo Han

Syntax: Meta-Programming Helpers

Overview. CS301 Session 3. Problem set 1. Thinking recursively

Computer Science 21b (Spring Term, 2015) Structure and Interpretation of Computer Programs. Lexical addressing

COP4020 Programming Assignment 1 - Spring 2011

Programming Languages

CS 360 Programming Languages Interpreters

Principles of Programming Languages Topic: Functional Programming Professor L. Thorne McCarty Spring 2003

LECTURE 16. Functional Programming

CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018

Scheme in Scheme: The Metacircular Evaluator Eval and Apply

Programming Languages: Application and Interpretation

Below are example solutions for each of the questions. These are not the only possible answers, but they are the most common ones.

The Typed Racket Guide

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Modern Programming Languages. Lecture LISP Programming Language An Introduction

Building a system for symbolic differentiation

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017

Symbolic Programming. Dr. Zoran Duric () Symbolic Programming 1/ 89 August 28, / 89

Comp 311: Sample Midterm Examination

6.001 Notes: Section 8.1

Parser Tools: lex and yacc-style Parsing

How to Design Programs Languages

6.034 Artificial Intelligence February 9, 2007 Recitation # 1. (b) Draw the tree structure corresponding to the following list.

Concepts of programming languages

CS 342 Lecture 8 Data Abstraction By: Hridesh Rajan

Fundamentals of Artificial Intelligence COMP221: Functional Programming in Scheme (and LISP)

Building a system for symbolic differentiation

Spring 2018 Discussion 7: March 21, Introduction. 2 Primitives

JRM s Syntax-rules Primer for the Merely Eccentric

Metaprogramming assignment 3

CS 342 Lecture 7 Syntax Abstraction By: Hridesh Rajan

Parser Tools: lex and yacc-style Parsing

CS 314 Principles of Programming Languages

Scheme. Functional Programming. Lambda Calculus. CSC 4101: Programming Languages 1. Textbook, Sections , 13.7

Organization of Programming Languages CS3200/5200N. Lecture 11

Redex: Practical Semantics Engineering

Z Notation. June 21, 2018

Fall 2017 Discussion 7: October 25, 2017 Solutions. 1 Introduction. 2 Primitives

Deferred operations. Continuations Structure and Interpretation of Computer Programs. Tail recursion in action.

CSE 341 Section 7. Ethan Shea Autumn Adapted from slides by Nicholas Shahan, Dan Grossman, Tam Dang, and Eric Mullen

Using Symbols in Expressions (1) evaluate sub-expressions... Number. ( ) machine code to add

Functional Programming

SCHEME 7. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. October 29, 2015

The SPL Programming Language Reference Manual

Programming Languages Third Edition

Principles of Programming Languages

Project 5 - The Meta-Circular Evaluator

CSSE 304 Assignment #13 (interpreter milestone #1) Updated for Fall, 2018

The PCAT Programming Language Reference Manual


A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

CSE 413 Midterm, May 6, 2011 Sample Solution Page 1 of 8

Scheme: Strings Scheme: I/O

Functional Languages. CSE 307 Principles of Programming Languages Stony Brook University

Functional Programming. Big Picture. Design of Programming Languages

Principles of Programming Languages

Project 2: Scheme Interpreter

C311 Lab #3 Representation Independence: Representation Independent Interpreters

FP Foundations, Scheme

Principles of Programming Languages COMP251: Functional Programming in Scheme (and LISP)

CS 314 Principles of Programming Languages

Python I. Some material adapted from Upenn cmpe391 slides and other sources

2 Revised 5 Scheme INTRODUCTION Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and r

CSE 341 Section 7. Eric Mullen Spring Adapted from slides by Nicholas Shahan, Dan Grossman, and Tam Dang

Transcription:

Pattern Matching for Scheme Andrew K. Wright and Bruce F. Duba Department of Computer Science Rice University Houston, TX 77251-1892 Version 1.12, May 24, 1995 Please direct questions or bug reports regarding this software to wright@research.nj.nec.com. The most recent version of this software can be obtained by anonymous FTP from site ftp.nj.nec.com in le pub/wright/match.tar.gz. 1 Pattern Matching for Scheme Pattern matching allows complicated control decisions based on data structure to be expressed in a concise manner. Pattern matching is found in several modern languages, notably Standard ML, Haskell and Miranda. This document describes several pattern matching macros for Scheme, and an associated mechanism for dening new forms of structured data. The basic form of pattern matching expression is: (match exp [pat body] :::) where exp is an expression, pat is a pattern, and body is one or more expressions (like the body of a lambda-expression). 1 The match form matches its rst subexpression against a sequence of patterns, and branches to the body corresponding to the rst pattern successfully matched. For example, the following code denes the usual map function: (dene map (lambda (f l) (match l [() ()] [(x. y) (cons (f x) (map f y))]))) The rst pattern () matches the empty list. The second pattern (x :y)matches a pair, binding x to the rst component of the pair and y to the second component of the pair. 1.1 Pattern Matching Expressions The complete syntax of the pattern matching expressions follows: 1 The notation \hthing i :::" indicates that hthing i is repeated zero or more times. The notation \hthing 1 j thing 2 i" means an occurrence of either thing 1 or thing 2. Brackets \[]" are extended Scheme syntax, equivalent toparentheses \()". 1

exp ::= (match exp clause :::) j clause :::) j * clause :::) j (match-let ([pat exp] :::) body) j (match-let* ([pat exp] :::) body) j (match-letrec ([pat exp] :::) body) j (match-let var ([pat exp] :::) body) j (match-dene pat exp) clause ::= [pat body] j [pat (=> identier) body] Figure 1 gives the full syntax for patterns. The next subsection describes the various patterns. The match-lambda and match-lambda* forms are convenient combinations of match and lambda, and can be explained as follows: [pat body] :::) = (lambda (x) (match x [pat body] :::)) * [pat body] :::) = (lambda x (match x [pat body] :::)) where x is a unique variable. The match-lambda form is convenient when dening a single argument function that immediately destructures its argument. The match-lambda* form constructs a function that accepts any number of arguments the patterns of match-lambda* should be lists. The match-let, match-let*, match-letrec, and match-dene forms generalize Scheme's let, let*, letrec, and dene expressions to allow patterns in the binding position rather than just variables. For example, the following expression: (match-let ([(x y z) (list 123)]) body) binds x to 1, y to 2, andz to 3 in body. These forms are convenient for destructuring the result of a function that returns multiple values as a list or vector. As usual for letrec and dene, pattern variables bound by match-letrec and match-dene should not be used in computing the bound value. The match, match-lambda, and match-lambda* forms allow the optional syntax (=> identier) between the pattern and the body of a clause. When the pattern match for such a clause succeeds, the identier is bound to a failure procedure of zero arguments within the body. If this procedure is invoked, it jumps back to the pattern matching expression, and resumes the matching process as if the pattern had failed to match. The body must not mutate the object being matched, otherwise unpredictable behavior may result. 1.2 Patterns Figure 1 gives the full syntax for patterns. Explanations of these patterns follow. identier (excluding the reserved names?, $, =,, and, or, not, set!, get!, :::, and ::k for nonnegative integers k): matches anything, and binds avariable of this name to the matching value in the body. : matches anything, without binding any variables. (), #t, #f, string, number, character, 's-expression: These constant patterns match themselves, ie., the corresponding value must be equal? to the pattern. (pat 1 :::pat n ): matches a proper list of n elements that match pat 1 through pat n. 2

Pattern : Matches : pat ::= identier anything, and binds identier as a variable j anything j () itself (the empty list) j #t itself j #f itself j string an equal? string j number an equal? number j character an equal? character j 's-expression an equal? s-expression j 'symbol an equal? symbol (special case of s-expression) j (pat 1 :::pat n ) a proper list of n elements j (pat 1 :::pat n : pat n+1 ) a list of n or more elements j (pat 1 :::pat n pat n+1 ::k) a proper list of n + k or more elements a j #(pat 1 :::pat n ) avector of n elements j #(pat 1 :::pat n pat n+1 ::k) avector of n + k or more elements j #&pat abox j ($ struct pat1 :::patn) a structure j (= eld pat) a eld of a structure j (and pat 1 :::pat n ) if all of pat 1 through pat n match j (or pat 1 :::pat n ) if any ofpat 1 through pat n match j (not pat 1 :::pat n ) if none of pat 1 through pat n match j (? predicate pat 1 :::pat n ) if predicate true and pat 1 through pat n all match j (set! identier ) anything, and binds identier as a setter j (get! identier ) anything, and binds identier as a getter j `qp a quasipattern Quasipattern: Matches : qp ::= () itself (the empty list) j #t itself j #f itself j string an equal? string j number an equal? number j character an equal? character j identier an equal? symbol j (qp 1 :::qp n ) a proper list of n elements j (qp 1 :::qp n : qp n+1 ) a list of n or more elements j (qp 1 :::qp n qp n+1 ::k) a proper list of n + k or more elements j #(qp 1 :::qp n ) avector of n elements j #(qp 1 :::qp n qp n+1 ::k) avector of n + k or more elements j #&qp abox j,pat a pattern j,@pat a pattern, spliced Figure 1: Pattern Syntax a The notation ::k denotes a keyword consisting of three consecutive dots (ie., \:::"), or two dots and an non-negative integer (eg., \::1", \::2"), or three consecutive underscores (ie., \ "), or two underscores and a non-negative integer. The keywords \::k" and\ k" are equivalent. The keywords \:::", \ ", \::0", and \ 0" are equivalent. 3

(pat 1 :::pat n : pat n+1 ): matches a (possibly improper) list of at least n elements that ends in something matching pat n+1. (pat 1 :::pat n pat n+1 :::): matches a proper list of n or more elements, where each element of the tail matches pat n+1. Each pattern variable in pat n+1 is bound to a list of the matching values. For example, the expression: (match '(let ([x 1][y 2]) z) [('let ((binding values) :::) exp) body]) binds binding to the list '(x y), values tothelist'(12), and exp to 'z in the body of the matchexpression. For the special case where pat n+1 is a pattern variable, the list bound to that variable may share with the matched value. (pat 1 :::pat n pat n+1 ): This pattern means the same thing as the previous pattern. (pat 1 :::pat n pat n+1 ::k): This pattern is similar to the previous pattern, but the tail mustbeat least k elements long. The pattern keywords ::0 and::: are equivalent. (pat 1 :::pat n pat n+1 k): This pattern means the same thing as the previous pattern. #(pat 1 :::pat n ): matches a vector of length n, whose elements match pat 1 through pat n. #(pat 1 :::pat n pat n+1 :::): matches a vector of length n or more, where each element beyond n matches pat n+1. #(pat 1 :::pat n pat n+1 ::k): n matches pat n+1. matches a vector of length n + k or more, where each element beyond #&pat: matches a box containing something matching pat. ($ struct pat 1 :::pat n ): matches a structure declared with dene-structure or dene-conststructure. See Section 2. (= eld pat): is intended for selecting a eld from a structure. \eld" may beany expression it is applied to the value being matched, and the result of this application is matched against pat. (and pat 1 :::pat n ): matches if all of the subpatterns match. At least one subpattern must be present. This pattern is often used as (and x pat) to bind x to to the entire value that matches pat (cf. \as-patterns" in ML or Haskell). (or pat 1 :::pat n ): matches if any of the subpatterns match. At least one subpattern must be present. All subpatterns must bind the same set of pattern variables. (not pat 1 :::pat n ): matches if none of the subpatterns match. At least one subpattern must be present. The subpatterns may not bind any pattern variables. 4

(? predicate pat 1 :::pat n ): In this pattern, predicate must be an expression evaluating to a single argument function. This pattern matches if predicate applied to the corresponding value is true, and the subpatterns pat 1 :::pat n all match. The predicate should not have side eects, as the code generated by the pattern matcher may invoke predicates repeatedly in any order. The predicate expression is bound in the same scope as the match expression, ie., free variables in predicate are not bound by pattern variables. (set! identier): matches anything, and binds identier to a procedure of one argument that mutates the corresponding eld of the matching value. This pattern must be nested within a pair, vector, box, or structure pattern. For example, the expression: (dene x (list 1 (list 2 3))) (match x [( ( (set! setit))) (setit 4)]) mutates the cadadr of x to 4, so that x is '(1 (2 4)). matches anything, and binds identier to a procedure of zero arguments that (get! identier): accesses the corresponding eld of the matching value. This pattern is the complement toset!. As with set!, this pattern must be nested within a pair, vector, box, or structure pattern. Quasipatterns: Quasiquote introduces a quasipattern, in which identiers are considered to be symbolic constants. Like Scheme's quasiquote for data, unquote (,) and unquote-splicing (,@) escape back to normal patterns. 1.3 Match Failure If no clause matches the value, the default action is to invoke the procedure match:error with the value that did not match. The default denition of match:error calls error with an appropriate message: > (match 1 [2 2]) Error: no clause matched 1. For most situations, this behavior is adequate, but it can be changed either by redening match:error, or by altering the value of the variable match:error-control. Valid values for match:error-control are: match:error-control: error action: 'error (default) call (match:error unmatched-value) 'match call (match:error unmatched-value '(match expression :::)) 'fail call match:error or die in car, cdr,... 'unspecied return unspecied value Setting match:error-control to 'match causes the entire match expression to be quoted and passed as a second argument tomatch:error. The default denition of match:error then prints the match expression before calling error this can help identify which expression failed to match. This option causes the macros to generate somewhat larger code, since each match expression includes a quoted representation of itself. Setting match:error-control to 'fail permits the macros to generate faster and more compact code than 'error or 'match. The generated code omits pair? tests when the consequence is to fail in car or cdr rather than call match:error. Finally, ifmatch:error-control is set to 'unspecied, non-matching expressions will either fail in car or cdr, or return an unspecied value. This results in still more compact code, but is unsafe. 5

2 Data Denition The ability to dene new forms of data proves quite useful in conjunction with pattern matching. This macro package includes a slightly altered 2 version of Chez Scheme's dene-structure macro for dening new forms of data [1], and a similar dene-const-structure macro for dening immutable data. The following expression denes a new kind of data named struct: (dene-structure (struct arg 1 :::arg n )) A struct is a composite data structure with n elds named arg 1 through arg n. The dene-structure macro declares the following procedures for constructing and manipulating data of type struct: Procedure Name: make-struct struct? struct-arg 1, :::, struct-arg n set-struct-arg 1!, :::, set-struct-arg n! struct-1, :::, struct-n set-struct-1!, :::, set-struct-n! Function: constructor requiring n arguments predicate named selectors named mutators numeric selectors numeric mutators The eld name (underscore) is special: no named selectors or mutators are dened for such a eld. Such unnamed elds can only be accessed through the numeric selectors or mutators, or through pattern matching. A second form of denition: (dene-structure (struct arg 1 :::arg n )([init 1 exp1 ] ::: [init m expm])) declares m additional elds init 1 through init m with initial values exp 1 through exp m. The expressions exp 1 through exp m are evaluated in order each time make-struct is invoked. Finally, the macro dene-const-structure: (dene-const-structure (struct arg 1 :::arg n )) (dene-const-structure (struct arg 1 :::arg n )([init 1 exp1 ] ::: [init m expm])) is similar to dene-structure, butallows immutable elds. If a eld name arg i is simply a variable, no (named or numeric) mutator is declared for that eld. If a eld name has the form (! x) where x is a variable, then that eld is mutable. Hence (dene-structure (Foo a b)) abbreviates (dene-const-structure (Foo (! a) (! b))). By default, structures are implemented as vectors whose rst component is the name of the structure as a symbol. Thus a Foo structure of one eld will match both the patterns ($ Foo x) and #('Foo x). Setting the variable match:structure-control to 'disjoint causes subsequent denestructure denitions to create structures that are disjoint from all other data, including vectors. In this case, Foo structures will no longer match the pattern #('Foo x). 3 2 This macro generates additional numeric selector and mutator names for use by the pattern matcher, recognizes as an unnamed eld, and optionally allows structures to be disjoint from vectors. Chez Scheme does not provide dene-const-structure. 3 Disjoint structures are implemented as vectors whose rst component is a unique symbol (an uninterned symbol for Chez Scheme). The procedure vector? is modied to return false for such vectors (hence the 'disjoint option cannot be used with Chez Scheme's optimize-level set higher than 1). For completeness the other vector operations (vector-ref, vector-set!, etc.) should also be modied to reject structures, but we don't bother. 6

3 Code Generation Pattern matching macros are compiled into if-expressions that decompose the value being matched with standard Scheme procedures, and test the components with standard predicates. Rebinding or lexically shadowing the names of any of these procedures will change the semantics of the match macros. The names that should not be rebound or shadowed are: null? pair? number? string? symbol? boolean? char? procedure? vector? box? list? equal? car cdr cadr cdddr... vector-length vector-ref unbox reverse length call/cc Additionally, the code generated to match a structure pattern like ($Foo pat 1 :::pat n ) refers to the names Foo?, Foo-1 through Foo-n, and set-foo-1! through set-foo-n!. These names also should not be shadowed. 4 Examples This section illustrates the convenience of pattern matching with some examples. The following function recognizes s-expressions that represent the standard Y operator: (dene Y? [('lambda (f1 ) ('lambda (y1 ) ((('lambda (x1 )(f2 ('lambda (z1 )((x2 x3 ) z2 )))) ('lambda (a1 )(f3 ('lambda (b1 )((a2 a3 ) b2 ))))) y2 ))) (and (symbol? f1 )(symbol? y1 )(symbol? x1 )(symbol? z1 )(symbol? a1 )(symbol? b1 ) (eq? f1 f2 )(eq? f1 f3 )(eq? y1 y2 ) (eq? x1 x2 )(eq? x1 x3 )(eq? z1 z2 ) (eq? a1 a2 )(eq? a1 a3 )(eq? b1 b2 ))] [ #f])) Writing an equivalent piece of code in raw Scheme is tedious. The following code denes abstract syntax for a subset of Scheme, a parser into this abstract syntax, and an unparser. (dene-structure (Lam args body)) (dene-structure (Var s)) (dene-structure (Const n)) (dene-structure (App fun args)) 7

(dene parse [(and s (? symbol?) (not 'lambda)) (make-var s)] [(? number? n) (make-const n)] [('lambda (and args ((? symbol?) :::)(not (? repeats?))) body) (make-lam args (parse body))] [(f args :::) (make-app (parse f ) (map parse args))] [x (error x "invalid expression")])) (dene repeats? (lambda (l) (and (not (null? l)) (or (memq (car l) (cdr l)) (repeats? (cdr l)))))) (dene unparse [($ Var s) s] [($ Const n) n] [($ Lam args body) `(lambda,args,(unparse body))] [($ App f args) `(,(unparse f ),@(map unparse args))])) With pattern matching, it is easy to ensure that the parser rejects all incorrectly formed inputs with an error message. With match-dene, it is easy to dene several procedures that share a hidden variable. The following code denes three procedures, inc, value, and reset, that manipulate a hidden counter variable: (match-dene (inc value reset) (let ([val 0]) (list (lambda () (set! val (+ 1 val))) (lambda () val) (lambda () (set! val 0))))) Although this example is not recursive, the bodies could recursively refer to each other. The following code is taken from the macro package itself. The procedure match:validate-pattern checks the syntax of match patterns, and converts quasipatterns into ordinary patterns. 8

(dene match:validate-pattern (lambda (pattern) (letrec ([simple? (lambda (x) (or (string? x) (boolean? x) (char? x) (number? x) (null? x)))] [ordinary [(? simple? p) p] [' ' ] [(? match:pattern-var? p) p] [('quasiquote p) (quasi p)] [(and p ('quote )) p] [('? pred ps :::)`(?,pred,@(map ordinary ps))] [('and ps..1) `(and,@(map ordinary ps))] [('or ps..1) `(or,@(map ordinary ps))] [('not ps..1) `(not,@(map ordinary ps))] [('$ (? match:pattern-var? r) ps :::) `($,r,@(map ordinary ps))] [(and p ('set! (? match:pattern-var?))) p] [(and p ('get! (? match:pattern-var?))) p] [(p (? match:dot-dot-k? ddk)) `(,(ordinary p),ddk)] [(x. y) (cons (ordinary x) (ordinary y))] [(? vector? p) (apply vector (map ordinary (vector->list p)))] [(? box? p) (box (ordinary (unbox p)))] [p (match:syntax-err pattern "syntax error in pattern")])] [quasi [(? simple? p) p] [(? symbol? p) `(quote,p)] [('unquote p) (ordinary p)] [(('unquote-splicing p).()) (ordinary p)] [(('unquote-splicing p). y) (append (ordlist p) (quasi y))] [(p (? match:dot-dot-k? ddk)) `(,(quasi p),ddk)] [(x. y) (cons (quasi x) (quasi y))] [(? vector? p) (apply vector (map quasi (vector->list p)))] [(? box? p) (box (quasi (unbox p)))] [p (match:syntax-err pattern "syntax error in pattern")])] [ordlist [() ()] [(x. y) (cons (ordinary x) (ordlist y))] [p (match:syntax-err pattern "invalid use of unquote-splicing in pattern")])]) (ordinary pattern)))) 5 Known Bugs A structure pattern like ($ foo a b c) is not checked to ensure that there are enough elds present for a foo object. This should be xed in the future. 9

Acknowledgments Several members of the Rice programming languages community exercised the implementation and suggested enhancements. We thank Matthias Felleisen, Cormac Flanagan, Amit Patel, and Amr Sabry for their contributions. References [1] Dybvig, R. K. The Scheme Programming Language. Prentice-Hall, Englewood Clis, New Jersey, 1987. 10