ASTs, Objective CAML, and Ocamlyacc

ASTs, Objective CAML, and Ocamlyacc Stephen A. Edwards Columbia University Fall 2012

Parsing and Syntax Trees

Parsing decides whether the program is part of the language. On its own that is not very useful: we want more than a yes/no answer.

Like most parser generators, ocamlyacc produces parsers that can include actions: pieces of code that run when a rule is matched.

Top-down parsers: actions are executed while a rule is being parsed.
Bottom-up parsers (ocamlyacc): actions are executed when a rule is reduced.

Actions

Simple languages can be interpreted just with parser actions.

Parser (ocamlyacc input, module Calc_parse):

    %token PLUS MINUS TIMES DIVIDE
    %token EOF
    %token <int> LIT

    %left PLUS MINUS
    %left TIMES DIVIDE

    %start top
    %type <int> top

    %%

    top : expr EOF { $1 }

    expr : expr PLUS expr   { $1 + $3 }
         | expr MINUS expr  { $1 - $3 }
         | expr TIMES expr  { $1 * $3 }
         | expr DIVIDE expr { $1 / $3 }
         | LIT              { $1 }

Scanner (ocamllex input, module Calc_scan):

    { open Calc_parse }

    rule token = parse
      [' ' '\n']      { token lexbuf }
    | '+'             { PLUS }
    | '-'             { MINUS }
    | '*'             { TIMES }
    | '/'             { DIVIDE }
    | ['0'-'9']+ as s { LIT(int_of_string s) }
    | eof             { EOF }

Main program:

    let _ =
      let lexbuf = Lexing.from_channel stdin in
      print_int (Calc_parse.top Calc_scan.token lexbuf);
      print_newline ()
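To see the "interpret with actions only" idea without running the generator, here is a minimal hand-written sketch. It is a top-down stand-in, not what ocamlyacc generates: the token type mirrors the %token declarations above, while parse_atom, parse_term, and parse_expr are hypothetical names, and operator precedence is encoded in the function nesting rather than with %left. Each function computes a value while it parses, so no tree is ever built.

```ocaml
(* Token type mirroring the %token declarations in the slide; written by
   hand here, whereas ocamlyacc would generate it in Calc_parse. *)
type token = PLUS | MINUS | TIMES | DIVIDE | LIT of int | EOF

(* Each parsing function returns the value of what it parsed together
   with the remaining tokens, so evaluation happens during parsing. *)
let parse_atom = function
  | LIT n :: rest -> (n, rest)
  | _ -> failwith "expected a literal"

(* term: atoms joined by TIMES or DIVIDE, left-associative *)
let parse_term tokens =
  let rec loop acc = function
    | TIMES :: rest ->
        let (v, rest') = parse_atom rest in
        loop (acc * v) rest'
    | DIVIDE :: rest ->
        let (v, rest') = parse_atom rest in
        loop (acc / v) rest'
    | rest -> (acc, rest)
  in
  let (v, rest) = parse_atom tokens in
  loop v rest

(* expr: terms joined by PLUS or MINUS, left-associative *)
let parse_expr tokens =
  let rec loop acc = function
    | PLUS :: rest ->
        let (v, rest') = parse_term rest in
        loop (acc + v) rest'
    | MINUS :: rest ->
        let (v, rest') = parse_term rest in
        loop (acc - v) rest'
    | rest -> (acc, rest)
  in
  let (v, rest) = parse_term tokens in
  loop v rest

(* top: an expression followed by end of input *)
let top tokens =
  match parse_expr tokens with
  | (v, [EOF]) -> v
  | _ -> failwith "syntax error"

(* Token sequence for the input 1 + 2 * 3; prints 7. *)
let () =
  Printf.printf "%d\n" (top [LIT 1; PLUS; LIT 2; TIMES; LIT 3; EOF])
```

The point of the comparison is that the arithmetic lives in the same place as the grammar, which is what the ocamlyacc actions do with $1 and $3.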

Actions

Even in a parser for an interpreter, actions usually build a data structure that represents the program.

    Separates parsing from translation.
    Makes modification easier by minimizing interactions.
    Allows parts of the program to be analyzed in different orders.

Bottom-up parsers can only build bottom-up data structures.

    Children known first, parents later.
    Context of an object only established later.

What To Build?

Typically, an Abstract Syntax Tree that represents the program.

Represents the syntax of the program almost exactly, but easier for later passes to deal with.

Punctuation, whitespace, other irrelevant details omitted.

Abstract vs. Concrete Trees

Like scanning and parsing, objective is to discard irrelevant details.

E.g., comma-separated lists are nice syntactically, but later stages probably just want lists.

AST structure almost a direct translation of the grammar.

Abstract vs. Concrete Trees

    expr  : mexpr PLUS mexpr { ... } ;
    mexpr : atom TIMES atom  { ... } ;
    atom  : INT              { ... } ;

Input: 3 + 5 * 4

Concrete Parse Tree:

                expr
          /      |      \
      mexpr     PLUS     mexpr
        |              /   |   \
      atom         atom  TIMES  atom
        |            |            |
      INT(3)       INT(5)       INT(4)

Abstract Syntax Tree:

          PLUS
         /    \
     INT(3)   TIMES
              /   \
          INT(5)  INT(4)
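The difference between the two trees can be sketched as OCaml values. The types below are assumptions made for illustration (the slides do not define them): concrete keeps every nonterminal and token, while ast keeps only operators and operands.

```ocaml
(* Illustrative types; none of these names come from the slides. *)
type token = INT of int | PLUS | TIMES

(* Concrete parse tree: every grammar symbol appears as a node. *)
type concrete =
  | Node of string * concrete list   (* nonterminal name and its children *)
  | Leaf of token                    (* terminal *)

(* Abstract syntax tree: only operators and operands survive. *)
type ast =
  | Int of int
  | Plus of ast * ast
  | Times of ast * ast

(* The input 3 + 5 * 4 as a concrete parse tree. *)
let concrete_tree =
  Node ("expr", [
    Node ("mexpr", [ Node ("atom", [ Leaf (INT 3) ]) ]);
    Leaf PLUS;
    Node ("mexpr", [
      Node ("atom", [ Leaf (INT 5) ]);
      Leaf TIMES;
      Node ("atom", [ Leaf (INT 4) ]) ]) ])

(* The same input as an abstract syntax tree. *)
let abstract_tree = Plus (Int 3, Times (Int 5, Int 4))

(* Count nodes to compare the sizes of the two trees. *)
let rec concrete_size = function
  | Leaf _ -> 1
  | Node (_, children) ->
      List.fold_left (fun n c -> n + concrete_size c) 1 children

let rec ast_size = function
  | Int _ -> 1
  | Plus (a, b) | Times (a, b) -> 1 + ast_size a + ast_size b

(* Prints: concrete: 11 nodes, abstract: 5 nodes *)
let () =
  Printf.printf "concrete: %d nodes, abstract: %d nodes\n"
    (concrete_size concrete_tree) (ast_size abstract_tree)
```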

Part I Designing ASTs

Designing an AST Structure

    Sequences of things
    Removing unnecessary punctuation
    Additional grouping
    How to factor

One Way to Handle Comma-Separated Lists

AST types:

    type args = Arg of arg | Args of args * arg
    type arg = ...

Grammar rules:

    args : LPAREN arglist RPAREN { $2 }

    arglist : arglist COMMA arg { Args($1, $3) }
            | arg               { Arg($1) }

Tree built for int gcd(int a, int b, int c):

            Args
           /    \
        Args    int c
       /    \
     Arg    int b
      |
    int a

Drawbacks:

    Many unnecessary nodes
    Branching suggests recursion
    Harder for later routines to get the data they want
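The last drawback can be made concrete with a small sketch. The slide elides the definition of arg, so a (type, name) pair of strings is assumed here purely so the code compiles; to_list is a hypothetical helper that a later pass would need just to see the arguments in order.

```ocaml
(* arg is elided in the slide; a (type, name) pair of strings is an
   assumption made only so this sketch compiles. *)
type arg = string * string

type args =
  | Arg of arg
  | Args of args * arg

(* What the actions build for: int gcd(int a, int b, int c) *)
let gcd_args =
  Args (Args (Arg ("int", "a"), ("int", "b")), ("int", "c"))

(* A later pass that just wants the arguments in order has to recurse
   down the left spine and rebuild a list itself. *)
let to_list args =
  let rec go acc = function
    | Arg a -> a :: acc
    | Args (rest, a) -> go (a :: acc) rest
  in
  go [] args

(* Prints the three arguments in source order, one per line. *)
let () =
  List.iter (fun (ty, name) -> Printf.printf "%s %s\n" ty name)
    (to_list gcd_args)
```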

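A Better Way to Handle Comma-Separated Lists

Better to choose a simpler structure for the tree: use lists.

AST types:

    type args = Args of arg list
    type arg = ...

Grammar rules:

    args : LPAREN arglist RPAREN { Args(List.rev $2) };

    arglist : arglist COMMA arg { $3::$1 }
            | arg               { [$1] }

The arglist actions prepend each new argument, so the list is built in reverse; List.rev restores source order.

Tree built for int gcd(int a, int b, int c):

              Args
            /  |   \
      int a  int b  int c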
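A short sketch of why the List.rev is needed. The actions are written as ordinary functions with hypothetical names (arglist_single, arglist_cons, args_action), and arg is assumed to be a plain string because the slide elides it. The steps follow the order in which a bottom-up parser would reduce the left-recursive arglist rule.

```ocaml
(* arg is elided in the slide; a plain string stands in for it here. *)
type arg = string
type args = Args of arg list

(* The grammar actions, written as ordinary functions. The names are
   hypothetical; ocamlyacc inlines the actions into the parser. *)
let arglist_single a = [a]                  (* action [$1]   *)
let arglist_cons lst a = a :: lst           (* action $3::$1 *)
let args_action lst = Args (List.rev lst)   (* action Args(List.rev $2) *)

(* Reductions for (int a, int b, int c), in the order a bottom-up
   parser performs them: leftmost argument first. *)
let step1 = arglist_single "int a"
let step2 = arglist_cons step1 "int b"
let step3 = arglist_cons step2 "int c"

(* step3 holds the arguments in reverse; args_action puts them back
   in source order. *)
let result = args_action step3
```

Prepending with :: is constant time per argument, while appending to the end of a list would cost time proportional to its length at every reduction, which is presumably why the slide builds the list backwards and reverses once.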

Removing Unnecessary Punctuation

Punctuation makes the syntax readable, unambiguous.

Information represented by structure of the AST.

Things typically omitted from an AST:

    Parentheses
        Grouping and precedence/associativity overrides.
    Separators (commas, semicolons)
        Mark divisions between phrases. Probably want a list of items in the AST.
    Extra keywords
        while-do, if-then-else. Just want a While constructor with two children.
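As an illustration, here is a small AST in which none of the three kinds of punctuation appear. All constructor names are assumptions chosen for this sketch, and the source fragments in the comments use an invented concrete syntax.

```ocaml
(* Illustrative AST; constructor names are assumptions, not from the slides. *)
type expr =
  | Lit of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Less of expr * expr

type stmt =
  | Assign of string * expr
  | While of expr * stmt   (* condition and body; the keywords are gone *)
  | Block of stmt list     (* separated statements become a list *)

(* Parenthesized sum times 3, versus 1 plus the product of 2 and 3:
   there is no parenthesis node, only a different tree shape. *)
let grouped   = Mul (Add (Lit 1, Lit 2), Lit 3)
let ungrouped = Add (Lit 1, Mul (Lit 2, Lit 3))

(* while i < 10 do { i = i + 1; j = j + j } *)
let loop =
  While (Less (Var "i", Lit 10),
         Block [ Assign ("i", Add (Var "i", Lit 1));
                 Assign ("j", Add (Var "j", Var "j")) ])
```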

How to factor

Two possible ways to represent binary operators:

    type expr =
        Plus of expr * expr
      | Minus of expr * expr
      | Times of expr * expr
      | ...

    type binop = Plus | Minus | Times | ...
    type expr =
        Binop of expr * binop * expr
      | ...

Each has advantages and disadvantages.

Main question is how nice the later code looks: is each operator a special case, or can you handle them all at once?
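The trade-off shows up in the code that walks the tree. In this sketch the two factorings sit side by side, so the constructors of the first carry a 1 suffix and the literal constructors (Lit1, Lit2) are added to fill in the elided "..." cases; those names, along with apply and count_ops, are assumptions.

```ocaml
(* Factoring 1: one constructor per operator. *)
type expr1 =
  | Lit1 of int
  | Plus1 of expr1 * expr1
  | Minus1 of expr1 * expr1
  | Times1 of expr1 * expr1

(* Every operator is its own case. *)
let rec eval1 = function
  | Lit1 n -> n
  | Plus1 (a, b) -> eval1 a + eval1 b
  | Minus1 (a, b) -> eval1 a - eval1 b
  | Times1 (a, b) -> eval1 a * eval1 b

(* Factoring 2: a single Binop constructor carrying an operator tag. *)
type binop = Plus | Minus | Times

type expr2 =
  | Lit2 of int
  | Binop of expr2 * binop * expr2

(* Map an operator tag to the integer function it denotes. *)
let apply = function
  | Plus -> ( + )
  | Minus -> ( - )
  | Times -> ( * )

(* All operators are handled by one case. *)
let rec eval2 = function
  | Lit2 n -> n
  | Binop (a, op, b) -> apply op (eval2 a) (eval2 b)

(* A pass that treats all operators alike, here counting them, needs
   only one case per node kind with factoring 2. *)
let rec count_ops = function
  | Lit2 _ -> 0
  | Binop (a, _, b) -> 1 + count_ops a + count_ops b
```

With the first factoring, count_ops would need one nearly identical case per operator; with the second, a pass that treats each operator differently pays for an extra match on the tag.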

Part II Walking ASTs

It's easy in OCaml

    type operator = Add | Sub | Mul | Div

    type expr =
        Binop of expr * operator * expr
      | Lit of int

    let rec eval = function
        Lit(x) -> x
      | Binop(e1, op, e2) ->
          let v1 = eval e1 and v2 = eval e2 in
          match op with
            Add -> v1 + v2
          | Sub -> v1 - v2
          | Mul -> v1 * v2
          | Div -> v1 / v2
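A usage sketch for the evaluator. The type and eval definitions are repeated from the slide so the block is self-contained; to_string and example are hypothetical additions showing that a second walker over the same tree has the same shape.

```ocaml
type operator = Add | Sub | Mul | Div

type expr =
  | Binop of expr * operator * expr
  | Lit of int

(* The evaluator from the slide. *)
let rec eval = function
  | Lit x -> x
  | Binop (e1, op, e2) ->
      let v1 = eval e1 and v2 = eval e2 in
      (match op with
       | Add -> v1 + v2
       | Sub -> v1 - v2
       | Mul -> v1 * v2
       | Div -> v1 / v2)

(* Hypothetical second walker: print the tree fully parenthesized. *)
let rec to_string = function
  | Lit x -> string_of_int x
  | Binop (e1, op, e2) ->
      let sym =
        match op with Add -> "+" | Sub -> "-" | Mul -> "*" | Div -> "/" in
      "(" ^ to_string e1 ^ " " ^ sym ^ " " ^ to_string e2 ^ ")"

(* The AST for 3 + 5 * 4. *)
let example = Binop (Lit 3, Add, Binop (Lit 5, Mul, Lit 4))

(* Prints: (3 + (5 * 4)) = 23 *)
let () = Printf.printf "%s = %d\n" (to_string example) (eval example)
```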

Comments on ASTs

Two ways to handle optional clauses.

With an option type:

    type stmt =
        If of expr * stmt * stmt option
      | ...

    let rec eval = function
        If(e, s1, None) -> ...
      | If(e, s1, Some(s2)) -> ...

With the option type, the optional clause can also be examined inside a single case:

    let rec eval = function
        If(e, s1, s2) -> ...
          match s2 with
            None -> ...
          | Some(s) -> ...
      ...

(* or *) with separate constructors:

    type stmt =
        If of expr * stmt
      | IfElse of expr * stmt * stmt
      | ...

    let rec eval = function
        If(e, s) -> ...
      | IfElse(e, s1, s2) -> ...
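Both representations can be filled in enough to run. Everything the slide elides is an assumption here: expr is reduced to a boolean literal, a Print statement is added so there is something to execute, and the walkers (run1, run2) return the list of printed strings instead of performing output. The numeric suffixes only keep the two versions apart in one file.

```ocaml
(* expr and the Print statements are hypothetical stand-ins for the
   parts the slide leaves out. *)
type expr = Bool of bool

(* Representation 1: a single If whose else branch is optional. *)
type stmt1 =
  | Print1 of string
  | If1 of expr * stmt1 * stmt1 option

(* Returns the strings the statement would print, in order. *)
let rec run1 = function
  | Print1 s -> [s]
  | If1 (Bool c, s1, None) -> if c then run1 s1 else []
  | If1 (Bool c, s1, Some s2) -> if c then run1 s1 else run1 s2

(* Representation 2: separate constructors for if and if-else. *)
type stmt2 =
  | Print2 of string
  | If2 of expr * stmt2
  | IfElse2 of expr * stmt2 * stmt2

let rec run2 = function
  | Print2 s -> [s]
  | If2 (Bool c, s) -> if c then run2 s else []
  | IfElse2 (Bool c, s1, s2) -> if c then run2 s1 else run2 s2

(* The same two source statements in each representation. *)
let no_else1 = If1 (Bool false, Print1 "then", None)
let with_else1 = If1 (Bool false, Print1 "then", Some (Print1 "else"))
let no_else2 = If2 (Bool false, Print2 "then")
let with_else2 = IfElse2 (Bool false, Print2 "then", Print2 "else")
```

Either way the compiler's exhaustiveness check forces every walker to say what happens when the else clause is missing; the choice mostly affects whether that decision is a pattern in the function header or a nested match.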