Data Abstraction. An Abstraction for Inductive Data Types. Philip W. L. Fong.

Data Abstraction
An Abstraction for Inductive Data Types

Philip W. L. Fong
pwlfong@cs.uregina.ca
Department of Computer Science
University of Regina
Regina, Saskatchewan, Canada

Introduction

This lecture covers [EOPL2] Sect. 2.2. We skip Sect. 2.1 for now.

An Abstraction for Inductive Data Types p.1/43

Variant Records

Aggregate data types:
- Array: element selection by indexing
- Record: element (i.e., field) selection by field name
- Union
- Discriminated union (i.e., variant record): variant discrimination via a tag field

Example: Variant Records

    struct BinTree {
        enum { T_LEAF, T_INTERIOR } tag;   /* tells which variant is stored */
        union {
            struct {
                int datum;
            } leaf;                        /* valid when tag == T_LEAF */
            struct {
                const char *symbol;
                const struct BinTree *left;
                const struct BinTree *right;
            } interior;                    /* valid when tag == T_INTERIOR */
        } node;
    };

Variant Records for Scheme

[EOPL2] provides a facility, datatype (written define-datatype in code), for supporting the use of variant records in Scheme.

Note: datatype is not a standard feature of Scheme.

Example: Binary Trees

Grammar:

    ⟨bintree⟩ ::= ⟨number⟩
              ::= (⟨symbol⟩ ⟨bintree⟩ ⟨bintree⟩)

Design Goals

1. Constructors that allow us to build variants of binary trees
2. A predicate to test if a given value is a binary tree
3. Some way of determining, given a binary tree, whether it is a leaf or an interior node
4. Some way of extracting the components of each variant

Using datatype

A datatype for binary trees:

    (define-datatype bintree bintree?
      (leaf-node
        (datum number?))
      (interior-node
        (key symbol?)
        (left bintree?)
        (right bintree?)))

- A predicate (bintree? x) is defined to test if x is a bintree.
- A bintree can be either a leaf-node or an interior-node.
- A leaf-node has only one field, called datum, which stores a number. The type of a field is specified by naming a predicate, such as number? in the case of datum.
- A constructor procedure (leaf-node n) is defined so that one can construct a leaf-node out of a number n.
- An interior-node consists of 3 fields: key, left, and right. Their types are, respectively, symbol, bintree, and bintree.
- A constructor (interior-node S B1 B2) is defined so that one can create an interior-node out of a symbol S and two bintrees B1 and B2.
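The pieces generated by the definition can be exercised directly. The following is a minimal sketch, assuming Racket's eopl language (selected by the #lang eopl line) as the provider of define-datatype; the names small-leaf and small-tree are hypothetical and chosen only for illustration.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype.

(define-datatype bintree bintree?
  (leaf-node
    (datum number?))
  (interior-node
    (key symbol?)
    (left bintree?)
    (right bintree?)))

;; Constructors build values of the datatype (hypothetical names).
(define small-leaf (leaf-node 7))
(define small-tree
  (interior-node 'root small-leaf (leaf-node 8)))

;; The predicate accepts values built by either constructor
;; and rejects everything else, including look-alike lists.
(display (bintree? small-leaf)) (newline)
(display (bintree? small-tree)) (newline)
(display (bintree? 7)) (newline)
(display (bintree? '(root 7 8))) (newline)
```

Under these assumptions the four lines printed should be #t, #t, #f, and #f: a bare number or a quoted list is not a bintree, even though it resembles the grammar, because only the constructors produce bintree values.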

Example

    > (define BT
        (interior-node 'a
          (leaf-node 3)
          (interior-node 'b
            (leaf-node 5)
            (leaf-node 0))))
    > (bintree? BT)
    #t

The tree BT:

        a
       / \
      3   b
         / \
        5   0

Variant Discrimination & Component Extraction: 1st Attempt

Need a way of determining if a given bintree is a leaf-node:

    (leaf-node? tree)

Need a way of extracting the datum field of a given leaf-node:

    (leaf-node-datum leaf)

Similar facilities for interior-node:

    (interior-node? tree)
    (interior-node-key node)
    (interior-node-left node)
    (interior-node-right node)

Pattern Matching

The designers of the datatype facility did not take that path, since it would have led to clumsy code. Instead, they adopted a variant discrimination and component extraction mechanism known as pattern matching, which was popularized by functional programming languages such as ML.

Pattern matching for datatypes is provided via the cases syntactic form.
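To see why the first attempt reads as clumsy, it may help to write it out. The sketch below assumes Racket's eopl language; define-datatype does not generate leaf-node?, leaf-node-datum, interior-node-left, or interior-node-right, so they are written here by hand on top of cases (with its else clause), and leaf-sum-with-accessors is a hypothetical name for a traversal in that style.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype and cases.

(define-datatype bintree bintree?
  (leaf-node (datum number?))
  (interior-node (key symbol?) (left bintree?) (right bintree?)))

;; Hand-written discriminator (not generated by define-datatype).
(define leaf-node?
  (lambda (tree)
    (cases bintree tree
      (leaf-node (datum) #t)
      (else #f))))

;; Hand-written accessors; each must reject the wrong variant itself.
(define leaf-node-datum
  (lambda (tree)
    (cases bintree tree
      (leaf-node (datum) datum)
      (else (eopl:error 'leaf-node-datum "not a leaf-node: ~s" tree)))))

(define interior-node-left
  (lambda (tree)
    (cases bintree tree
      (interior-node (key left right) left)
      (else (eopl:error 'interior-node-left "not an interior-node: ~s" tree)))))

(define interior-node-right
  (lambda (tree)
    (cases bintree tree
      (interior-node (key left right) right)
      (else (eopl:error 'interior-node-right "not an interior-node: ~s" tree)))))

;; Accessor style: the variant test and each extraction are separate steps,
;; and nothing ties the accessor used to the test that guards it.
(define leaf-sum-with-accessors
  (lambda (tree)
    (if (leaf-node? tree)
        (leaf-node-datum tree)
        (+ (leaf-sum-with-accessors (interior-node-left tree))
           (leaf-sum-with-accessors (interior-node-right tree))))))

(display (leaf-sum-with-accessors
          (interior-node 'a
            (leaf-node 3)
            (interior-node 'b (leaf-node 5) (leaf-node 0)))))
(newline)
```

The call at the end should print 8. Every accessor repeats a variant check that the caller has usually already made, which is the redundancy that a single cases expression avoids.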

Example: leaf-sum

    (leaf-sum tree)

Argument(s):
    tree: a bintree
Return:
    The sum of all data at the leaf nodes of tree.

Example: leaf-sum

    (define leaf-sum
      (lambda (tree)
        (cases bintree tree
          (leaf-node (N) N)
          (interior-node (S L R)
            (+ (leaf-sum L) (leaf-sum R))))))

- Write a (cases ...) construct to discriminate variants. Identify the type of variant record (i.e., bintree) to be discriminated.
- Apply the cases construct to an instance of the datatype (i.e., tree).
- Identify all the variants (i.e., leaf-node and interior-node). For each variant, also identify a list of local variables to be bound to the components of that variant (i.e., (N) in the case of leaf-node, and (S L R) in the case of interior-node).
- If tree is a leaf-node, the local variable listed in (N) is created and bound to the corresponding component of the leaf-node. With this binding in effect, the expression N is evaluated, and its value is returned as the value of the cases construct.
- If tree is an interior-node, the local variables listed in (S L R) are created and bound to the corresponding components of the interior-node. With these bindings in effect, the expression (+ (leaf-sum L) (leaf-sum R)) is evaluated, and its value is returned as the value of the cases construct.
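The evaluation described above can be traced on the tree BT from the earlier example. This is a sketch assuming Racket's eopl language; leaf-count is a hypothetical companion procedure, added only to show that the same two-clause shape carries over to other traversals.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype and cases.

(define-datatype bintree bintree?
  (leaf-node (datum number?))
  (interior-node (key symbol?) (left bintree?) (right bintree?)))

(define leaf-sum
  (lambda (tree)
    (cases bintree tree
      (leaf-node (N) N)
      (interior-node (S L R)
        (+ (leaf-sum L) (leaf-sum R))))))

;; Hypothetical companion: counts the leaves instead of summing them.
(define leaf-count
  (lambda (tree)
    (cases bintree tree
      (leaf-node (N) 1)
      (interior-node (S L R)
        (+ (leaf-count L) (leaf-count R))))))

(define BT
  (interior-node 'a
    (leaf-node 3)
    (interior-node 'b
      (leaf-node 5)
      (leaf-node 0))))

;; (leaf-sum BT)
;;   = (+ (leaf-sum (leaf-node 3)) (leaf-sum (interior-node 'b ...)))
;;   = (+ 3 (+ 5 0))
(display (leaf-sum BT)) (newline)    ; prints 8
(display (leaf-count BT)) (newline)  ; prints 3
```

Both procedures have one clause per variant of bintree, so the code mirrors the grammar of the data it processes.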

Example: flip-bintree

    (flip-bintree tree)

Argument(s):
    tree: a bintree
Return:
    A bintree obtained by swapping the left and right subtrees of every interior node in tree.

Example:

        a                a
       / \              / \
      3   b     =>     b   3
         / \          / \
        5   0        0   5

Example: flip-bintree

    (define flip-bintree
      (lambda (tree)
        (cases bintree tree
          (leaf-node (N) (leaf-node N))
          (interior-node (S L R)
            (interior-node S
              (flip-bintree R)
              (flip-bintree L))))))
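Values built by the constructors are not convenient to read when printed, so checking flip-bintree is easier with a converter back to the list notation of the bintree grammar. The sketch below assumes Racket's eopl language; bintree->list is a hypothetical helper, not part of the lecture.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype and cases.

(define-datatype bintree bintree?
  (leaf-node (datum number?))
  (interior-node (key symbol?) (left bintree?) (right bintree?)))

(define flip-bintree
  (lambda (tree)
    (cases bintree tree
      (leaf-node (N) (leaf-node N))
      (interior-node (S L R)
        (interior-node S
          (flip-bintree R)
          (flip-bintree L))))))

;; Hypothetical helper: renders a bintree in the list notation of the
;; grammar, i.e. a number or (symbol bintree bintree).
(define bintree->list
  (lambda (tree)
    (cases bintree tree
      (leaf-node (N) N)
      (interior-node (S L R)
        (list S (bintree->list L) (bintree->list R))))))

(define BT
  (interior-node 'a
    (leaf-node 3)
    (interior-node 'b
      (leaf-node 5)
      (leaf-node 0))))

(display (bintree->list BT)) (newline)                 ; prints (a 3 (b 5 0))
(display (bintree->list (flip-bintree BT))) (newline)  ; prints (a (b 0 5) 3)
```

Flipping twice should give back a tree that renders the same as the original, which makes a quick sanity check for the procedure.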

Abstract Syntax

Concrete Syntax

Grammar for Lambda Calculus:

    ⟨expression⟩ ::= ⟨identifier⟩
                 ::= (lambda (⟨identifier⟩) ⟨expression⟩)
                 ::= (⟨expression⟩ ⟨expression⟩)

The brackets, the list representation, and keywords such as lambda are introduced to make the language easy for humans to write. They make the language inconvenient for language processors (e.g., interpreters) to work with.

Example: Revisiting occurs-free?

A variable x occurs free in a lambda calculus expression E iff

1. E is a variable reference such that E is the same as x; or
2. E is of the form (lambda (y) E1), where y is different from x and x occurs free in E1; or
3. E is of the form (E1 E2) such that x occurs free in either E1 or E2.

Example: Revisiting occurs-free?

An implementation for concrete syntax:

    (define occurs-free?
      (lambda (x E)
        (cond
          ((symbol? E) (eqv? x E))
          ((eqv? (car E) 'lambda)
           (let ((y (caadr E))
                 (E1 (caddr E)))
             (and (not (eqv? y x))
                  (occurs-free? x E1))))
          (else
           (let ((E1 (car E))
                 (E2 (cadr E)))
             (or (occurs-free? x E1)
                 (occurs-free? x E2)))))))
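A few sample calls make the three clauses of the definition concrete. This sketch assumes Racket's eopl language only so that it runs as a standalone module; the procedure itself uses nothing beyond standard list operations, and the sample expressions are illustrative choices, not taken from the lecture.

```scheme
#lang eopl
;; Assumption: run as a Racket module; only standard list operations are used.

(define occurs-free?
  (lambda (x E)
    (cond
      ((symbol? E) (eqv? x E))              ; clause 1: variable reference
      ((eqv? (car E) 'lambda)               ; clause 2: (lambda (y) E1)
       (let ((y (caadr E))
             (E1 (caddr E)))
         (and (not (eqv? y x))
              (occurs-free? x E1))))
      (else                                 ; clause 3: (E1 E2)
       (let ((E1 (car E))
             (E2 (cadr E)))
         (or (occurs-free? x E1)
             (occurs-free? x E2)))))))

(display (occurs-free? 'x 'x)) (newline)                      ; prints #t
(display (occurs-free? 'x '(lambda (x) x))) (newline)         ; prints #f
(display (occurs-free? 'x '(lambda (y) (x y)))) (newline)     ; prints #t
(display (occurs-free? 'x '((lambda (x) x) x))) (newline)     ; prints #t
```

Note how the code has to dig the bound variable and the body out of the list with caadr and caddr; the meaning of each position is implicit in the list layout.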

Abstract Syntax

Rather than working with clumsy concrete syntax, a language processor (e.g., an interpreter) usually works with a more convenient internal representation called an abstract syntax tree. The idea is that most of the concrete syntax information is discarded, leaving only the essential information that reflects the abstract syntactic structure of a program.

Abstract Syntax Trees

Concrete syntax:

    (lambda (x) (f (f x)))

Abstract syntax tree:

    lambda-exp
      id:   x
      body: app-exp
              rator: var-exp
                       id: f
              rand:  app-exp
                       rator: var-exp
                                id: f
                       rand:  var-exp
                                id: x
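The same tree can be written as nested constructor calls. The sketch below assumes Racket's eopl language and uses the expression datatype that this lecture defines for the lambda calculus; ast and ast-size are hypothetical names introduced for illustration.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype and cases.

(define-datatype expression expression?
  (var-exp
    (id symbol?))
  (lambda-exp
    (id symbol?)
    (body expression?))
  (app-exp
    (rator expression?)
    (rand expression?)))

;; AST for (lambda (x) (f (f x))), built directly with the constructors.
(define ast
  (lambda-exp 'x
    (app-exp (var-exp 'f)
             (app-exp (var-exp 'f)
                      (var-exp 'x)))))

;; Hypothetical helper: counts the nodes of an AST, one per variant record.
(define ast-size
  (lambda (E)
    (cases expression E
      (var-exp (id) 1)
      (lambda-exp (id body) (+ 1 (ast-size body)))
      (app-exp (rator rand) (+ 1 (ast-size rator) (ast-size rand))))))

(display (expression? ast)) (newline)  ; prints #t
(display (ast-size ast)) (newline)     ; prints 6
```

The six nodes counted are one lambda-exp, two app-exp, and three var-exp, matching the tree drawn above.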

Parsing

A parser transforms a source program (in concrete syntax) to an abstract syntax tree (AST):

    program text --> Parser --> AST --> Processor

Defining Abstract Syntax Trees

The art and science of building a parser belong to a course on compiler construction (CS410). A shallow coverage of this topic is given in Chapter 3 of [EOPL2].

In this lecture, we are concerned with the following question: Given a concrete syntax, how does one define a data structure for representing the corresponding abstract syntax trees?

Example: Lambda Calculus

    ⟨expression⟩ ::= ⟨identifier⟩                           var-exp (id)
                 ::= (lambda (⟨identifier⟩) ⟨expression⟩)   lambda-exp (id body)
                 ::= (⟨expression⟩ ⟨expression⟩)            app-exp (rator rand)

- One datatype for each nonterminal being defined.
- One variant for each production.
- One field for each occurrence of a nonterminal in a production.

Example: Lambda Calculus

    (define-datatype expression expression?
      (var-exp
        (id symbol?))
      (lambda-exp
        (id symbol?)
        (body expression?))
      (app-exp
        (rator expression?)
        (rand expression?)))

Example: Revisiting occurs-free?

An implementation based on processing abstract syntax trees:

    (define occurs-free?
      (lambda (x E)
        (cases expression E
          (var-exp (y) (eqv? y x))
          (lambda-exp (y E1)
            (and (not (eqv? y x))
                 (occurs-free? x E1)))
          (app-exp (E1 E2)
            (or (occurs-free? x E1)
                (occurs-free? x E2))))))
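To run this version on program text, the concrete syntax first has to be converted to an AST. The following is a minimal sketch of such a parser for the lambda calculus grammar, assuming Racket's eopl language; the names parse-expression and unparse-expression and the error message are illustrative choices, and the parser assumes well-formed input in which lambda is not used as a variable name.

```scheme
#lang eopl
;; Assumption: Racket's `eopl` language supplies define-datatype, cases,
;; and eopl:error.

(define-datatype expression expression?
  (var-exp (id symbol?))
  (lambda-exp (id symbol?) (body expression?))
  (app-exp (rator expression?) (rand expression?)))

;; Concrete syntax (a symbol or a list) to AST: one branch per production.
(define parse-expression
  (lambda (datum)
    (cond
      ((symbol? datum) (var-exp datum))
      ((pair? datum)
       (if (eqv? (car datum) 'lambda)
           (lambda-exp (caadr datum)
                       (parse-expression (caddr datum)))
           (app-exp (parse-expression (car datum))
                    (parse-expression (cadr datum)))))
      (else (eopl:error 'parse-expression
                        "invalid concrete syntax: ~s" datum)))))

;; AST back to concrete syntax: one clause per variant.
(define unparse-expression
  (lambda (E)
    (cases expression E
      (var-exp (id) id)
      (lambda-exp (id body)
        (list 'lambda (list id) (unparse-expression body)))
      (app-exp (rator rand)
        (list (unparse-expression rator) (unparse-expression rand))))))

(define occurs-free?
  (lambda (x E)
    (cases expression E
      (var-exp (y) (eqv? y x))
      (lambda-exp (y E1)
        (and (not (eqv? y x))
             (occurs-free? x E1)))
      (app-exp (E1 E2)
        (or (occurs-free? x E1)
            (occurs-free? x E2))))))

(display (occurs-free? 'x (parse-expression '(lambda (y) (x y))))) (newline) ; prints #t
(display (occurs-free? 'x (parse-expression '(lambda (x) x)))) (newline)     ; prints #f
(display (unparse-expression (parse-expression '(lambda (x) (f (f x))))))    ; prints (lambda (x) (f (f x)))
(newline)
```

All knowledge of the list layout is confined to parse-expression and unparse-expression; occurs-free? names the components it needs and never calls car or cdr.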

Working with Abstract Syntax

The language processors developed in this course manipulate abstract syntax trees rather than concrete syntax. Throughout this course, whenever a language grammar (i.e., concrete syntax) is given, we define a corresponding set of datatypes to represent its abstract syntax trees.

Lecture Summary

This lecture covers [EOPL2] Sect. 2.2.

- Variant records via datatype
- Pattern matching via cases
- Concrete syntax vs. abstract syntax
- Parsing: conversion of concrete syntax to abstract syntax trees
- Defining abstract syntax trees using datatype
- Manipulating abstract syntax trees using cases