Building Compilers with Phoenix

Similar documents
EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Building Compilers with Phoenix

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Semantic Analysis. Lecture 9. February 7, 2018

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Top down vs. bottom up parsing

Context-free grammars (CFG s)

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Syntax-Directed Translation. Lecture 14

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Lecture 8: Deterministic Bottom-Up Parsing

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

Compiler construction in4303 lecture 3

How do LL(1) Parsers Build Syntax Trees?

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Wednesday, August 31, Parsers

Lecture 7: Deterministic Bottom-Up Parsing

3. Parsing. Oscar Nierstrasz

Parsing Expression Grammars and Packrat Parsing. Aaron Moss

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Compiler construction lecture 3

Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

Monday, September 13, Parsers

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Table-Driven Top-Down Parsers

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

LECTURE 3. Compiler Phases


CSE 401 Midterm Exam Sample Solution 2/11/15

Grammars and Parsing. Paul Klint. Grammars and Parsing

Chapter 3: Lexing and Parsing

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

COMP3131/9102: Programming Languages and Compilers

CS606- compiler instruction Solved MCQS From Midterm Papers

Syntax-Directed Translation

CS5363 Final Review. cs5363 1

LL(1) Conflict Resolution in a Recursive Descent Compiler Generator 1

Downloaded from Page 1. LR Parsing

CS 4120 Introduction to Compilers

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Better Extensibility through Modular Syntax. Robert Grimm New York University

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Syntax Errors; Static Semantics

CSE 401 Midterm Exam Sample Solution 11/4/11

Using an LALR(1) Parser Generator

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Semantic actions for declarations and expressions

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Bottom-Up Parsing. Lecture 11-12

Semantic actions for declarations and expressions. Monday, September 28, 15

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout. CS453 Shift-reduce Parsing 1

CS 314 Principles of Programming Languages

Abstract Syntax Trees & Top-Down Parsing

VIVA QUESTIONS WITH ANSWERS

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Syntax and Grammars 1 / 21

Syntax. 2.1 Terminology

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Grammars and Parsing, second week

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

shift-reduce parsing

A programming language requires two major definitions A simple one pass compiler

Syntax. In Text: Chapter 3

Lecture 16: Static Semantics Overview 1

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Compiler Construction

1. Consider the following program in a PCAT-like language.

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Time : 1 Hour Max Marks : 30

Syntax Analysis MIF08. Laure Gonnord

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Writing a Simple DSL Compiler with Delphi. Primož Gabrijelčič / primoz.gabrijelcic.org

Transcription:

Building Compilers with Phoenix Parser Generators: ANTLR

History of ANTLR ANother Tool for Language Recognition Terence Parr's dissertation: Obtaining Practical Variants of LL(k) and LR(k) for k > 1 PCCTS: Purdue Compiler Construction Set Master's Thesis: YUCC (1988), LL(1)" written in C++ a.k.a. ANTLR v1 "linear approximate LL(k)" ANTLR v2: Rewrite in Java (1997) can generate parsers in many other languages ANTLR v3: Rewrite (2007) new parser algorithm: LL(*) auto-backtracking for conflicting/ambiguous productions... 2

LR Parsing Left-to-right, preferring Right-most derivations LR(k): Parser needs k tokens look-ahead Language L is said to be LR(k) if LR(k) parser exists Many programming languages are LR(1) typically bottom-up parsing: parser is a state machine, with an additional stack of nested nonterminals and terminals as input is encountered, pieces are combined into non-terminals typically table-driven: based on look-ahead and state: decide to shift/reduce/accept Many variants LALR (look-ahead LR): yacc SLR (simple LR) GLR (generalized LR) 3

LL parsing Left-to-right, preferring Left-most derivations LL(k): needs k tokens lookahead Programming languages are often not LL(k) for any k typically implemented as top-down parsers parser algorithm: construct parser table up-front for any non-terminal N and combination of k look-ahead tokens T1..Tk, select production P maintain stack S of non-terminals Given look-ahead tokens and top of stack, select production for non-terminal N, push N, then recurse for terminal T, consume T when production is complete, pop N 4

LL(k) Sometimes, two or more tokens look-ahead necessary decl type ID ';' type ID '=' value ';' type 'int' 'long' 'float' 'double' Problem: for alphabet A, (#A) N different lookahead sequences are possible algorithm considered impractical problem with left-recursion remains Linear approximated LL(k): Instead of checking all (#A) N combinations, parser forms equivalence classes as sequence of token sets: {'int','long','float','double'} {ID} {';'} cannot parse all LL(k) grammars experience: if "real-language" grammar is LL(k), l-a LL(k) can parse it 5

LL(*) Sometimes, cannot parse with LL(k) for any fixed k method type ID '(' args ')' ';' type ID '(' args ')' '{' statements '}' args args (',' arg)* args type ID Solution: express prefixes of alternative as regular language generate DFA to determine alternative Limitation: prefix language must not use recursive productions maximum recursion depth configurable (-Xm), default 4 6

Conflict resolution in ANTLR Syntactic predicates: prefix alternative with "try" production alternatives tested in textual order, first matching alternative used e.g. Java for vs. foreach forcontrol : (forvarcontrol) => forvarcontrol forinit? ';' expression? ';' forupdate?; forvarcontrol: 'final'? annotation? type Identifier ':' expression; Semantic predicates: prefix alternative with boolean predicate (in parser's target language) Backtracking 7

Semantic Predicates Example: classify identifiers in C declaration : decl_specs declarator ';' ; decl_specs : (storage_class type_qual type_spec)+ ; type_spec : 'void' 'int'... type_id; type_id : ID; declarator : ID '*' declarator ; Conflict: In "static const Foo", is Foo type_id or declarator? Solution: type_id: {istypename(input.lt(1).gettext())}? ID ; Other application: optional language features statement : for_stmt while_stmt expr_stmt... {have_aop()}? proceed_stmt 8

Backtracking New in ANTLR v3 option backtrack: per grammar or per non-terminal implicit syntactic predicates on all alternatives optimization: omit predictive parsing if LL(*) already disambiguates the grammar memoization: avoid re-parsing the same pieces over and over again due to backtracking option memoize 9

ANTLR Parsers grammar files (.g): grammar name, global options, productions EBNF: alternatives ( ), repeating (?, *, +) various annotations to describe semantic actions return type declaration tree construction (!, ^) arbitrary semantic actions ({...}) common options: language (Java, C, C++, C#, Objective-C, Python, Ruby) output (AST, template) k (ANTLR v2) 10

ANTLR Lexers Lexical rules: rule name starts with capital letter additional constructs for character ranges ('a'..'z') common option: greedy (default: true) token attributes: text, type, line, pos,... channel: divide tokens into different channels whitespace: set channel to Token.HIDDEN_CHANNEL 11

Semantic Actions Return values and rule arguments r [int a, String b] returns [int c, String d] :... { $c = $a; $d = $b } label parts of a production with symbolic names expr : left=expr '+' right=term; Arbitrary semantic actions Creation of abstract syntax tree (output=ast) default: flat list of all tokens mark terminals with ^ to indicate they should be parent node of the production mark terminals with! to omit node from tree classes used for nodes can be set with ASTLabelType option 12

Generating with ANTLR generating without ANTLR: just process AST tree rewriting: write "tree parser" to convert one tree into another templating: StringTemplate library 13