Building Compilers with Phoenix

Building Compilers with Phoenix Syntax-Directed Translation

Structure of a Compiler
  character stream
    → Lexical Analyzer → token stream
    → Syntax Analyzer → syntax tree
    → Semantic Analyzer → syntax tree
    → Intermediate Code Generation → intermediate representation
    → Machine-Independent Optimizer → intermediate representation
    → Code Generator → target machine code
    → Machine-Dependent Optimizer → target machine code

Syntax Definition
- (E)BNF: (Extended) Backus-Naur Form
- context-free grammars
  - terminal symbols: provided by scanner/lexical analysis (tokens / lexemes)
  - nonterminal symbols: syntactic variables
  - productions
    - head / left side: nonterminal
    - arrow
    - body / right side: sequence of terminals and/or nonterminals, possibly ε
- BNF: notational convenience: | (or)
- EBNF: additional operators: [optional], {zero-or-more}, (group)
  - alternatively: * (Kleene star), +, ?

BNF Example
  list  → list '+' digit
  list  → list '-' digit
  list  → digit
  digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Derivations
- start symbol
- derivation step: replace a nonterminal with the right-hand side of one of its productions
- word (of a language): sequence of terminals derivable from the start symbol
- language: set of all words

Parse Trees
- tree of terminals and nonterminals
- start symbol at the root
- terminals in the leaves
- children: right-hand side of a production

Parse tree for 9-5+2:
  list
    list
      list
        digit: 9
      '-'
      digit: 5
    '+'
    digit: 2

Ambiguity
- Multiple parse trees for a single word of the language

  string → string '+' string
         | string '-' string
         | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
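A minimal sketch of why the ambiguity matters: under the grammar above, the word 9-5+2 has two parse trees, and they evaluate to different values. The nested-tuple tree representation and the helper names are assumptions made for this illustration.

```python
# Two parse trees for the word 9-5+2 under the ambiguous grammar
#   string -> string '+' string | string '-' string | '0' | ... | '9'
# Inner nodes are tuples (left, operator, right); leaves are ints.
# This representation is an assumption made for the illustration.

def evaluate(tree):
    """Evaluate a tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r

def flatten(tree):
    """Read the leaves left to right, i.e. recover the derived word."""
    if isinstance(tree, int):
        return str(tree)
    left, op, right = tree
    return flatten(left) + op + flatten(right)

left_grouped = ((9, "-", 5), "+", 2)    # (9-5)+2
right_grouped = (9, "-", (5, "+", 2))   # 9-(5+2)

print(flatten(left_grouped), evaluate(left_grouped))    # 9-5+2 6
print(flatten(right_grouped), evaluate(right_grouped))  # 9-5+2 2
```

Both trees derive the same word, so the grammar alone does not say which value is meant.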

Associativity
- favor one parse tree over another in case of ambiguity
- left-associative vs. right-associative
- in BNF: left-recursive vs. right-recursive productions

  init   → letter '=' init | letter
  letter → 'a' | 'b' | 'c' | ... | 'z'

Operator Precedence
- Resolution of ambiguity for different (binary) operators
- different nesting of nonterminals

  factor → digit | '(' expr ')'
  term   → term '*' factor | term '/' factor | factor
  expr   → expr '+' term | expr '-' term | term

Syntax-directed Translation
- Syntax analysis: form parse tree in memory
- Semantic analysis: attach attributes to nodes in the tree (attribute grammar)
- Synthesis: output attribute values
- Alternatively: don't represent the syntax tree in memory, but represent the tree hierarchy only in the call stack / value stack

Example: Postfix Notation
Annotated parse tree for 9-5+2 (attribute t holds the postfix form):
  expr.t = 95-2+
    expr.t = 95-
      expr.t = 9
        term.t = 9
          9
      '-'
      term.t = 5
        5
    '+'
    term.t = 2
      2
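The annotation can be sketched as a post-order walk over an explicit parse tree. The `Node` class, the `annotate` function and the rule "expr.t = left.t + right.t + operator" are assumptions chosen to reproduce the attribute values shown in the example; they are not taken from Phoenix.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    symbol: str                                   # "expr", "term", or a terminal such as "+" or "9"
    children: List["Node"] = field(default_factory=list)
    t: str = ""                                   # synthesized attribute: postfix string

def annotate(node):
    """Post-order traversal: compute the children's t first, then the node's own t."""
    for child in node.children:
        annotate(child)
    if not node.children:                         # terminal: t is its own text
        node.t = node.symbol
    elif len(node.children) == 3:                 # expr -> expr op term
        left, op, right = node.children
        node.t = left.t + right.t + op.t
    else:                                         # expr -> term, term -> digit
        node.t = node.children[0].t
    return node.t

def term(digit):
    """Helper (hypothetical): build the subtree term -> digit."""
    return Node("term", [Node(digit)])

# Parse tree for 9-5+2
tree = Node("expr", [
    Node("expr", [
        Node("expr", [term("9")]),
        Node("-"),
        term("5"),
    ]),
    Node("+"),
    term("2"),
])

print(annotate(tree))  # 95-2+
```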

Tree Traversals
- depth-first vs. breadth-first
  - recursive traversal: depth-first
- top-down vs. bottom-up
  - both useful in parsing
- preorder vs. postorder
  - both are cases of depth-first traversal
  - potentially: consider a node both before and after visiting its children
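A short sketch of the three traversal orders on a small tree for 9-5+2. The `(label, children)` tuple layout and the function names are assumptions made for the illustration.

```python
from collections import deque

# Each node is (label, list_of_children); a sketch representation.
tree = ("+", [("-", [("9", []), ("5", [])]), ("2", [])])

def preorder(node):
    """Depth-first: visit the node before its children."""
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)

def postorder(node):
    """Depth-first: visit the node after its children."""
    label, children = node
    for child in children:
        yield from postorder(child)
    yield label

def breadth_first(root):
    """Level by level, using a queue instead of recursion."""
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        yield label
        queue.extend(children)

print("".join(preorder(tree)))       # +-952
print("".join(postorder(tree)))      # 95-2+
print("".join(breadth_first(tree)))  # +-295
```

The post-order result is the postfix form of the expression, which is why post-order evaluation fits syntax-directed translation.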

Top-Down Parsing
- Top-down processing of an imaginary parse tree
  - an in-memory parse tree might get created through post-order creation of tree nodes
- one function created per nonterminal
- lookahead token: not-yet-consumed input token
  - e.g. global variable, member of parser object
  - possibly multiple lookahead tokens
- select alternative of production according to lookahead
  - for terminals, consume lookahead
  - for nonterminals, descend into the appropriate function (recursive-descent parsing)
  - proceed reading additional tokens for the right-hand side of the selected production
- ambiguous grammars: may need to backtrack
  - ambiguity vs. conflict
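A minimal recursive-descent sketch for the expr/term/factor grammar, with one method per nonterminal and a single lookahead token. Assumptions: tokens are single characters, the left-recursive productions are written as loops so the parser terminates, and the tree is returned as nested tuples `(op, left, right)`. The class and method names are hypothetical.

```python
class Parser:
    def __init__(self, text):
        # Sketch lexer: every non-blank character is one token.
        self.tokens = [c for c in text if not c.isspace()]
        self.pos = 0

    @property
    def lookahead(self):
        """The not-yet-consumed input token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        """Consume the lookahead if it is the expected terminal."""
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        # expr -> expr '+' term | expr '-' term | term, written as a loop
        node = self.term()
        while self.lookahead in ("+", "-"):
            op = self.lookahead
            self.match(op)
            node = (op, node, self.term())   # node is created after its children
        return node

    def term(self):
        # term -> term '*' factor | term '/' factor | factor, written as a loop
        node = self.factor()
        while self.lookahead in ("*", "/"):
            op = self.lookahead
            self.match(op)
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> digit | '(' expr ')'
        tok = self.lookahead
        if tok is not None and tok in "0123456789":
            self.match(tok)
            return int(tok)
        if tok == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.lookahead is not None:
        raise SyntaxError(f"trailing input at token {parser.lookahead!r}")
    return tree

print(parse("9-5+2"))    # ('+', ('-', 9, 5), 2)
print(parse("2+3*4"))    # ('+', 2, ('*', 3, 4))
```

The loops keep '+' and '-' left-associative, and the nesting of expr, term and factor gives '*' and '/' higher precedence.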

Predictive Parsing
- avoid backtracking by always knowing which alternative to choose
- requires constraints on the grammar
- FIRST set: set of all possible first tokens of an alternative
  - if alternative a starts with terminal t: FIRST(a) = { t }
  - if alternative a starts with nonterminal e: FIRST(a) = FIRST(e)
  - ε-productions: if ε can be derived from e, then also include FIRST(e2) in FIRST(a), where e2 follows e
- FIRST sets of all alternatives need to be disjoint
- reformulate the grammar if FIRST sets overlap
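The FIRST-set rules can be sketched as a fixed-point computation. Assumptions: a grammar is a dict mapping each nonterminal to a list of alternatives, an alternative is a tuple of symbols, the empty tuple stands for ε, and any symbol that is not a key of the dict is a terminal. The check below covers only the FIRST-disjointness condition stated on the slide.

```python
EPS = "ε"

def first_of_sequence(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in seq:
        if sym not in first:              # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:         # sym cannot vanish, stop here
            return result
    result.add(EPS)                       # every symbol can derive ε (or seq is empty)
    return result

def compute_first(grammar):
    """Iterate until no FIRST set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def first_sets_disjoint(grammar):
    """True if, for every nonterminal, the FIRST sets of its alternatives do not overlap."""
    first = compute_first(grammar)
    for alternatives in grammar.values():
        seen = set()
        for alt in alternatives:
            f = first_of_sequence(alt, first)
            if seen & f:
                return False
            seen |= f
    return True

digits = [(d,) for d in "0123456789"]
left_recursive = {
    "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": digits,
}
reformulated = {
    "list": [("digit", "rest")],
    "rest": [("+", "digit", "rest"), ("-", "digit", "rest"), ()],
    "digit": digits,
}

print(first_sets_disjoint(left_recursive))  # False
print(first_sets_disjoint(reformulated))    # True
```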

Left Recursion
- Left-recursive production: a recursive-descent parser will overflow the stack
- Reformulate left recursion:

    A → Aα | Aβ | γ

  becomes

    A → γR
    R → αR | βR | ε
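The rewrite can be sketched as a small function. It handles only immediate left recursion of a single nonterminal, and it assumes alternatives are tuples of symbols with the empty tuple standing for ε. The function name and the default name of the new nonterminal are hypothetical.

```python
def eliminate_left_recursion(nonterminal, alternatives, new_name=None):
    """Rewrite A -> Aα | Aβ | γ into A -> γR and R -> αR | βR | ε."""
    rest = new_name or nonterminal + "'"
    # Tails α, β of the left-recursive alternatives Aα, Aβ
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The remaining alternatives γ
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: list(alternatives)}     # nothing to do
    return {
        nonterminal: [alt + (rest,) for alt in others],
        rest: [alt + (rest,) for alt in recursive] + [()],
    }

result = eliminate_left_recursion(
    "list",
    [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    new_name="rest",
)
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
# list -> digit rest
# rest -> + digit rest | - digit rest | ε
```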

Left Factorization
- Alternatives overlap in their FIRST sets
- Extract the common prefix into a separate nonterminal:

    A → αX | αY | β

  becomes (EBNF, with grouping)

    H → α
    A → H (X | Y) | β

  or (plain BNF)

    H → α
    A → H T | β
    T → X | Y

Abstract Syntax Trees
- unambiguous representations of the program
- abstract away unnecessary punctuation
- use nodes specialized for the language

[Figure: abstract syntax tree for 9-5+2, with operator nodes '-' and '+' and leaves 9, 5, 2]
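A sketch of "nodes specialized for the language": two node classes for a language of digits and binary operators, plus an evaluator. The class names `Num` and `BinOp` are hypothetical, and the tree assumes the usual left-associative reading of 9-5+2.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str              # "+" or "-"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

def evaluate(node):
    """Evaluate the tree; no parentheses or precedence rules are needed at this stage."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    raise ValueError(f"unknown operator {node.op!r}")

# AST for 9-5+2, i.e. (9-5)+2: the grouping is encoded in the tree shape
ast = BinOp("+", BinOp("-", Num(9), Num(5)), Num(2))
print(evaluate(ast))  # 6
```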

Lexical Analysis
- Might apply BNF and syntax-directed processing to lexical analysis as well
- however:
  - Lexis is often simpler than syntax (using regular languages, not arbitrary context-free ones)
    - analysis possible using finite automata
  - Lexis is often ambiguous, e.g. "staticpublicintfoo"
    - ambiguities are broken in a local fashion, e.g. prefer the longest match
  - Lexer needs to drop white space tokens (including comments)
  - Lexer needs to group lexemes into token classes (e.g. identifier), with the original lexeme as value
  - Lexer needs to consider "reserved" words (keywords)
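A minimal longest-match lexer sketch using Python's `re` module. The token classes, the regular expressions and the `//` comment syntax are assumptions made for the illustration, not a definition of any particular language.

```python
import re

# (token class, regular expression); a hypothetical token set
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[-+*/=(){};]"),
]
PATTERNS = [(name, re.compile(rx)) for name, rx in TOKEN_SPEC]
SKIPPED = {"WHITESPACE", "COMMENT"}     # dropped, never passed to the parser

def tokenize(text):
    """Return a list of (token class, lexeme) pairs, preferring the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_match = None, None
        for name, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens

print(tokenize("staticpublicintfoo"))
# [('IDENTIFIER', 'staticpublicintfoo')]
print(tokenize("x = 42 // answer"))
# [('IDENTIFIER', 'x'), ('OPERATOR', '='), ('NUMBER', '42')]
```

Longest match is what makes "//" a comment rather than two division operators, and what keeps "staticpublicintfoo" a single identifier.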

Recognizing Keywords
- Often keywords use identifier syntax
- solution: recognize identifiers, then check whether the identifier is a keyword
  - binary search, hashing (perfect hash functions)
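Both lookup strategies can be sketched in a few lines. The keyword list is a hypothetical example. Note that Python's `frozenset` is an ordinary hash table, used here as a stand-in; it is not a perfect hash function.

```python
from bisect import bisect_left

# Hypothetical keyword list, kept sorted for binary search
KEYWORDS_SORTED = sorted(["else", "if", "int", "public", "return", "static", "while"])
KEYWORDS_HASHED = frozenset(KEYWORDS_SORTED)

def is_keyword_binary_search(word):
    """Binary search in the sorted keyword table."""
    i = bisect_left(KEYWORDS_SORTED, word)
    return i < len(KEYWORDS_SORTED) and KEYWORDS_SORTED[i] == word

def classify(lexeme):
    """Called after the lexer has recognized an identifier-shaped lexeme."""
    kind = "KEYWORD" if lexeme in KEYWORDS_HASHED else "IDENTIFIER"
    return (kind, lexeme)

print(classify("static"))                  # ('KEYWORD', 'static')
print(classify("staticpublicintfoo"))      # ('IDENTIFIER', 'staticpublicintfoo')
print(is_keyword_binary_search("while"))   # True
```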