The Phases of a Compiler. Course Overview. In Chapter 4: Syntax Analysis. Multi Pass Compiler. PART I: overview material

Course Overview
PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review

The Phases of a Compiler
Source Program → Syntax Analysis (this chapter) → Abstract Syntax Tree → Contextual Analysis → Decorated Abstract Syntax Tree → Code Generation → Object Code

In Chapter 4: Syntax Analysis
- Scanning: recognize "words" or "tokens" in the input
- Parsing: recognize the structure of the program
  - Different parsing strategies
  - How to construct a recursive descent parser
  - AST construction
- Use of theoretical "tools":
  - Regular expressions and finite state machines
  - Grammars
  - Extended BNF notation
  - First sets and Follow sets

Syntax Analysis
The job of syntax analysis is to read the source program (a text file) and determine its structure.
Subphases:
- Scanning
- Parsing
- Constructing an internal representation of the source text that shows its structure (usually an AST)
Note: A single-pass compiler usually does not explicitly construct an AST.

Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler: the Compiler Driver depends on three components.
- Syntactic Analyzer (this chapter): input Source Text, output AST
- Contextual Analyzer: input AST, output Decorated AST
- Code Generator: input Decorated AST, output Object Code

Syntax Analysis: dataflow chart
Source Program (stream of characters) → Scanner → Stream of Tokens → Parser → Abstract Syntax Tree
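The Scanner stage of the dataflow chart (stream of characters in, stream of tokens out) can be sketched with one regular expression per token class. This is a minimal illustration, not the course's scanner: the token kind names, the keyword subset and the character classes are assumptions loosely modelled on Mini Triangle, where "!" starts a comment that runs to the end of the line.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, spelling)

# Hypothetical subset of Mini Triangle keywords; each keyword gets its own kind.
KEYWORDS = {"let", "var", "const", "in", "begin", "end",
            "if", "then", "else", "while", "do"}

# One regular expression per token class. Python tries the alternatives in
# order, so ":=" (becomes) must be listed before ":" (colon).
TOKEN_SPEC = [
    ("skip", r"\s+|![^\n]*"),          # blanks, and "!" comments up to end of line
    ("intlit", r"[0-9]+"),
    ("ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("becomes", r":="),
    ("colon", r":"),
    ("semicolon", r";"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("op", r"[-+*/<>=\\]"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_SPEC))


def scan(source: str) -> List[Token]:
    """Divide a stream of characters into a list of tokens ending with 'eot'."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, spelling = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                    # separators and comments produce no token
        if kind == "ident" and spelling in KEYWORDS:
            kind = spelling             # reserved word, e.g. "let"
        tokens.append((kind, spelling))
    tokens.append(("eot", ""))          # end-of-text marker
    return tokens


if __name__ == "__main__":
    program = "let var y: Integer\nin !new year\n   y := y+1"
    for token in scan(program):
        print(token)
```

On the program in the `__main__` block the token kinds come out as let, var, ident, colon, ident, in, ident, becomes, ident, op, intlit, eot; the comment produces no token. Keeping the scanner separate means the parser only ever sees this token list, never raw characters.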

(1) Scan: Divide Input into Tokens
An example Mini Triangle source program:
    let var y: Integer
    in !new year
       y := y+1
The scanner turns it into a stream of tokens (the comment !new year is discarded):
    let | var | ident. y | colon : | ident. Integer | in | ident. y | becomes := | ident. y | op. + | intlit 1 | eot
Tokens are "words" in the input, for example keywords, operators, identifiers, literals, etc.

(2) Parse: Determine Structure of Program
The parser analyzes the structure of the token stream with respect to the grammar of the language.
[Figure: parse tree of the example program. Root: Program; internal nodes include single-Declaration, Declaration, Type-denoter, V-name, Expression and primary-Expression; the leaves are the tokens (Ident, colon, becomes, Op, Int.Lit, eot).]

(3) AST Construction
[Figure: abstract syntax tree of the example program, with nodes Program, LetCommand, VarDecl, SimpleType, AssignCommand, SimpleVar, BinaryExpr, VNameExp and Int.Expr over the leaves Ident (y), Ident (Integer), Op (+) and Int.Lit (1).]

Grammars
RECAP: The syntax of a language can be specified by means of a CFG (Context Free Grammar). A CFG can be expressed in BNF (Backus-Naur Form).
Mini Triangle grammar in BNF:
    Program ::= single-Command
    Command ::= single-Command | Command ; single-Command
    single-Command ::= ... | begin Command end | ...

Grammars (continued)
For our convenience, we will use EBNF or Extended BNF rather than simple BNF.
EBNF = BNF + regular expressions
Mini Triangle in EBNF (* means 0 or more occurrences of the preceding item):
    Program ::= single-Command
    Command ::= single-Command ( ; single-Command )*
    single-Command ::= ... | begin Command end | ...

Regular Expressions
REs are a notation for expressing a set of strings of terminal symbols.
Different kinds of RE:
    ε       The empty string
    t       Generates only the string t
    X Y     Generates any string xy such that x is generated by X and y is generated by Y
    X | Y   Generates any string which is generated either by X or by Y
    X*      The concatenation of zero or more strings generated by X
    (X)     Used for grouping
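Python's `re` module uses the same RE operators (juxtaposition for X Y, `|` for alternatives, `*` for repetition, parentheses for grouping, and the empty pattern for ε), so the RE kinds can be tried out directly. The helper name `generates` and the sample patterns are illustrative assumptions; `[1-9]` and `[0-9]` are `re` shorthands for (1|2|...|9) and (0|1|...|9).

```python
import re

# RE notation written in Python's re syntax. The empty pattern "" plays the role of ε.
PATTERNS = {
    "epsilon": r"",
    "M(r|s)": r"M(r|s)",
    "(foo|bar)*": r"(foo|bar)*",
    "(foo|bar)(foo|bar)*": r"(foo|bar)(foo|bar)*",
    "digit strings": r"[0-9]*",             # (0|1|...|9)*
    "natural number": r"0|[1-9][0-9]*",     # 0 | (1..9)(0..9)*
}


def generates(pattern: str, s: str) -> bool:
    """True if the RE generates exactly s (the whole string must match)."""
    return re.fullmatch(pattern, s) is not None


if __name__ == "__main__":
    samples = ["", "Mr", "Ms", "foobar", "barbarfoo", "0", "42", "042"]
    for name, pattern in PATTERNS.items():
        print(f"{name:22} generates {[s for s in samples if generates(pattern, s)]}")
```

`re.fullmatch` is used rather than `re.match` because an RE "generates" a string only if the whole string is matched, not just a prefix.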

RE: Examples
What sets of strings does each of the following REs generate?
1. ε
2. M(r | s)
3. (foo | bar)*
4. (foo | bar)(foo | bar)*
5. (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)*
6. 0 | (1..9)(0..9)*

Regular Expressions
The languages that can be defined by REs and CFGs have been extensively studied by theoretical computer scientists. These are some important conclusions and terminology:
- RE is a weaker formalism than CFG: any language expressible by an RE can be expressed by a CFG, but not the other way around!
- The languages expressible as REs are called regular languages.
- Generally, a language that exhibits self embedding cannot be expressed by an RE.
- Programming languages exhibit self embedding. (Examples: an expression can contain another expression, and a command can contain another command.)

Extended BNF
Extended BNF combines BNF with RE.
A production in EBNF looks like LHS ::= RHS, where LHS is a non-terminal symbol and RHS is an extended regular expression.
An extended RE is just like a regular expression, except that it is composed of the terminals and non-terminals of the grammar.
Simply put, EBNF adds to BNF these notations:
- ( ... ) for the purpose of grouping
- * for denoting 0 or more repetitions

Extended BNF: an Example (a simple expression language)
    Expression ::= PrimaryExp (Operator PrimaryExp)*
    PrimaryExp ::= Literal | Identifier | ( Expression )
    Identifier ::= Letter (Letter | Digit)*
    Literal ::= Digit Digit*
    Letter ::= a | b | c | ... | z
    Digit ::= 0 | 1 | 2 | 3 | 4 | ... | 9

A little bit of useful theory
We will now look at a few useful bits of theory. These will be necessary later when we implement parsers.
- Grammar transformations: a grammar can be transformed in a number of ways without changing its meaning (i.e. its language, or the set of strings that it generates).
- The definition and computation of starter sets (first sets), follow sets, and nullable symbols.

Grammar Transformations
Left factorization:
    X Y | X Z   becomes   X ( Y | Z )
Example, with X = if Expression then single-Command, Y = ε and Z = else single-Command:
    single-Command ::= if Expression then single-Command
                     | if Expression then single-Command else single-Command
becomes
    single-Command ::= if Expression then single-Command ( ε | else single-Command )
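The EBNF for the simple expression language maps almost one-to-one onto a recursive descent parser: one method per non-terminal, a `while` loop for each `( ... )*`, and a test on the current character (its starter set) to choose between alternatives. The sketch below is a hypothetical illustration (class and method names are made up); it parses characters directly instead of tokens, so blanks are not allowed, and it assumes `Operator ::= + | - | * | /` because the grammar above does not define Operator. As in the EBNF, there is no operator precedence; the repetition is folded into a left-leaning tree.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"   # Letter ::= a | b | ... | z
DIGITS = "0123456789"                    # Digit  ::= 0 | 1 | ... | 9
OPERATORS = "+-*/"                       # assumed: Operator is not defined in the grammar


class ParseError(Exception):
    pass


class ExprParser:
    """Recursive descent parser for the expression EBNF, one method per non-terminal.

    Works directly on characters (no blanks allowed) and returns an AST
    made of nested tuples.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def at(self, chars: str) -> bool:
        """True if the current character is one of chars (False at end of input)."""
        return self.pos < len(self.text) and self.text[self.pos] in chars

    def parse(self):
        tree = self.parse_expression()
        if self.pos != len(self.text):
            raise ParseError(f"unexpected {self.text[self.pos]!r} at offset {self.pos}")
        return tree

    def parse_expression(self):
        # Expression ::= PrimaryExp (Operator PrimaryExp)*
        tree = self.parse_primary()
        while self.at(OPERATORS):             # the * loop: repeat while an Operator starts
            op = self.text[self.pos]
            self.pos += 1
            tree = (op, tree, self.parse_primary())   # left-leaning tree
        return tree

    def parse_primary(self):
        # PrimaryExp ::= Literal | Identifier | ( Expression )
        # The alternative is picked by the starter set of each alternative.
        if self.at(DIGITS):
            return self.parse_literal()
        if self.at(LETTERS):
            return self.parse_identifier()
        if self.at("("):
            self.pos += 1
            tree = self.parse_expression()
            if not self.at(")"):
                raise ParseError(f"expected ')' at offset {self.pos}")
            self.pos += 1
            return tree
        raise ParseError(f"expected a primary expression at offset {self.pos}")

    def parse_identifier(self):
        # Identifier ::= Letter (Letter | Digit)*   (caller has checked the Letter)
        start = self.pos
        self.pos += 1
        while self.at(LETTERS + DIGITS):
            self.pos += 1
        return ("id", self.text[start:self.pos])

    def parse_literal(self):
        # Literal ::= Digit Digit*   (caller has checked the first Digit)
        start = self.pos
        self.pos += 1
        while self.at(DIGITS):
            self.pos += 1
        return ("lit", int(self.text[start:self.pos]))
```

For example, `ExprParser("a1+(2*b)-3").parse()` returns `("-", ("+", ("id", "a1"), ("*", ("lit", 2), ("id", "b"))), ("lit", 3))`, while inputs such as `a+` or `(a` raise `ParseError`.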

Grammar Transformations (continued)
Elimination of left recursion:
    N ::= X | N Y   becomes   N ::= X Y*
Example:
    Identifier ::= Letter | Identifier Letter | Identifier Digit
becomes (after left factorization)
    Identifier ::= Letter | Identifier (Letter | Digit)
becomes
    Identifier ::= Letter (Letter | Digit)*

Substitution of non-terminal symbols:
    N ::= X
    M ::= α N β
becomes
    N ::= X
    M ::= α X β
Example:
    single-Command ::= for controlvar := Expression direction Expression do single-Command
    direction ::= to | downto
becomes
    single-Command ::= for controlvar := Expression (to | downto) Expression do single-Command

Starter Sets (a.k.a. First Sets)
Informal definition: the starter set of an RE X is the set of terminal symbols that can occur as the start of any string generated by X.
Example: starters[ (+ | - | ε)(0 | 1 | ... | 9)+ ] = {+, -, 0, 1, ..., 9}
Formal definition:
    starters[ε] = {}
    starters[t] = {t}                          (where t is any terminal symbol)
    starters[X Y] = starters[X]                (if X doesn't generate ε)
    starters[X Y] = starters[X] ∪ starters[Y]  (if X generates ε)
    starters[X | Y] = starters[X] ∪ starters[Y]
    starters[X*] = starters[X]

Derivations
Replacing a non-terminal:
    E ::= T | E + T
    T ::= i | ( E )
    S => E => E + T => T + T => i + T => i + i
This is a left-most derivation (it replaces the left-most non-terminal at each step). Can you find the corresponding right-most derivation? Can you find a derivation that is neither left-most nor right-most?

Sentential forms
A sentential form is a sequence of grammar symbols that can be derived from the start symbol; every string in S => E => E + T => T + T => i + T => i + i is one.
A sentence is a sentential form that contains only terminal symbols, that is, a string that can be generated using the grammar.

Ambiguous grammars
A grammar is ambiguous if some sentence has more than one distinct parse tree.
Equivalently, a grammar is ambiguous if some sentence has more than one left-most derivation, or more than one right-most derivation.
Does i + i + i demonstrate an ambiguity?
With the productions E ::= E + E | i:
    E => E + E => i + E => i + i
    E => E + E => i + E => i + E + E => i + i + E => i + i + i
    E => E + E => E + E + E => i + E + E => i + i + E => i + i + i
The last two are distinct left-most derivations of the same sentence i + i + i, so this grammar is ambiguous.
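On small inputs, ambiguity can also be checked mechanically by counting parse trees. The sketch below uses a hypothetical helper `count_parse_trees` with grammars written as Python dicts; it tries every way of splitting the token list among the symbols of each production. It assumes no ε-productions and no cycles of unit productions, which holds for both grammars compared here: E ::= E + E | i, and E ::= T | E + T, T ::= i | ( E ) from the Derivations slide.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]   # non-terminal -> right-hand sides

AMBIGUOUS: Grammar = {"E": [("E", "+", "E"), ("i",)]}
UNAMBIGUOUS: Grammar = {"E": [("T",), ("E", "+", "T")],
                        "T": [("i",), ("(", "E", ")")]}


def count_parse_trees(grammar: Grammar, start: str, sentence: List[str]) -> int:
    """Number of distinct parse trees deriving sentence from start.

    Assumes no ε-productions and no cycles of unit productions, so every
    recursive call works on a smaller span or on a different non-terminal.
    """
    tokens = tuple(sentence)

    @lru_cache(maxsize=None)
    def count_symbol(sym: str, lo: int, hi: int) -> int:
        if sym not in grammar:                       # terminal symbol
            return 1 if hi - lo == 1 and tokens[lo] == sym else 0
        return sum(count_seq(rhs, lo, hi) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs: Tuple[str, ...], lo: int, hi: int) -> int:
        # Ways to split tokens[lo:hi] among the symbols of rhs, each part non-empty.
        if len(rhs) == 1:
            return count_symbol(rhs[0], lo, hi)
        total = 0
        for mid in range(lo + 1, hi - len(rhs) + 2):
            left = count_symbol(rhs[0], lo, mid)
            if left:
                total += left * count_seq(rhs[1:], mid, hi)
        return total

    return count_symbol(start, 0, len(tokens))


if __name__ == "__main__":
    sentence = "i + i + i".split()
    print(count_parse_trees(AMBIGUOUS, "E", sentence))
    print(count_parse_trees(UNAMBIGUOUS, "E", sentence))
```

For i + i + i the first grammar gives two trees, matching the two left-most derivations above, while the second gives exactly one. Counting only exposes ambiguity on the inputs tried; it cannot prove a grammar unambiguous in general.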

Augmented grammars
We augment grammars to ensure that we can recognize and handle the end of the input string:
    S' ::= S $
Here $ denotes the end-of-file token.

Nullable, First sets (starter sets), and Follow sets
- A non-terminal is nullable if it derives the empty string.
- First(N) or starters(N) is the set of all terminals that can begin a sentence derived from N.
- Follow(N) is the set of terminals that can follow N in some sentential form.
Next we will see algorithms to compute each of these.
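All three sets can be computed together by applying their defining rules repeatedly until nothing changes (a fixed point). The following is a minimal sketch of that common iterative approach, not necessarily the algorithm presented later in the course; the grammar representation (a dict of right-hand-side tuples, with `()` for an ε-production) and the function name are assumptions. The helper `first_of_seq` follows the same rule as starters[X Y]: symbols further right contribute only while everything before them can generate ε.

```python
from typing import Dict, List, Set, Tuple

# Non-terminal -> list of right-hand sides; () is an ε-production.
# Any symbol that is not a key is a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]


def nullable_first_follow(grammar: Grammar):
    """Return (nullable, first, follow), computed by iterating to a fixed point."""
    nullable: Set[str] = set()
    first: Dict[str, Set[str]] = {n: set() for n in grammar}
    follow: Dict[str, Set[str]] = {n: set() for n in grammar}

    def first_of_seq(symbols: Tuple[str, ...]) -> Tuple[Set[str], bool]:
        """First set of a symbol string, and whether the whole string is nullable."""
        result: Set[str] = set()
        for sym in symbols:
            if sym not in grammar:              # a terminal starts the string
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True                     # every symbol can vanish

    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                f, eps = first_of_seq(rhs)
                if eps and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                # For A ::= α B β: First(β) ⊆ Follow(B), and
                # Follow(A) ⊆ Follow(B) when β is nullable.
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    f_beta, beta_nullable = first_of_seq(rhs[i + 1:])
                    new = f_beta | (follow[lhs] if beta_nullable else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return nullable, first, follow


# The expression grammar from the Derivations slide, augmented with S ::= E $.
AUGMENTED: Grammar = {"S": [("E", "$")],
                      "E": [("T",), ("E", "+", "T")],
                      "T": [("i",), ("(", "E", ")")]}

if __name__ == "__main__":
    print(nullable_first_follow(AUGMENTED))
```

On this augmented grammar nothing is nullable, First(E) = First(T) = {i, (}, and Follow(E) = Follow(T) = {$, +, )}. Follow(S) stays empty because S is the new start symbol and never appears on a right-hand side.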