Syntax Analysis Part I

Chapter 4: Context-Free Grammars
Slides adapted from: Robert van Engelen, Florida State University

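Position of a Parser in the Compiler Model
[Figure: Source Program → Lexical Analyzer → (token, tokenval) → Parser and rest of front-end → Intermediate representation. The parser requests tokens with "Get next token". The lexical analyzer reports lexical errors, the parser and the rest of the front-end report syntax and semantic errors, and the figure also shows the Symbol Table.]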

The Parser
A parser implements a context-free grammar:
- Check syntax (= string recognizer)
- Report syntax errors accurately
- Invoke semantic actions
  - For static semantics checking, e.g. type checking of expressions, functions, etc.
  - For syntax-directed translation of the source code to an intermediate representation

Syntax-Directed Translation
One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods.
Possible IR output:
- Abstract syntax trees (ASTs)
- Three-address code (3AC)
- Register transfer list notation (RTN)

Error Handling
A good compiler should assist in identifying and locating errors:
- Lexical errors: important, compiler can easily recover and continue
- Syntax errors: most important for compiler, can almost always recover
- Static semantic errors: important, can sometimes recover
- Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required
- Logical errors: hard or impossible to detect

Viable-Prefix Property
The viable-prefix property of LL/LR parsers allows early detection of syntax errors.
- Goal: detect an error as soon as possible, without consuming unnecessary input
- How: detect an error as soon as the prefix of the input read so far does not match a prefix of any string in the language
Examples from the slide figure, each marked "Error is detected here" at the end of the valid prefix:
- for (;)
- DO 10 I = 1;0

Error Recovery Strategies
- Panic mode: discard input until a token in a set of designated synchronizing tokens is found
- Phrase-level recovery: perform local correction on the input to repair the error
- Error productions: augment the grammar with productions for erroneous constructs
- Global correction: choose a minimal sequence of changes to obtain a global least-cost correction
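Panic mode is the simplest of these strategies to sketch. The Python fragment below is a minimal illustration, assuming tokens are plain strings and that `;` and `}` form the synchronizing set. The function name `panic_mode_skip` and the sample token list are hypothetical and do not come from the slides.

```python
def panic_mode_skip(tokens, pos, sync_tokens):
    """Return the index of the first synchronizing token at or after pos.

    If no synchronizing token is found, return len(tokens), which means
    that all remaining input was discarded.
    """
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1
    return pos


# Assumed example: the parser notices an error at index 4 (the stray "*").
tokens = ["x", "=", "1", "+", "*", "2", ";", "y", "=", "3", ";"]
resume_at = panic_mode_skip(tokens, 4, {";", "}"})
```

After the call, a parser would typically continue just past `tokens[resume_at]`, here at the start of the next statement.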

Context-Free Grammar: How It Works
Write a grammar representing the structure of a thesis:
- A thesis consists of a thesis title followed by one or more chapters
- A chapter consists of a chapter title followed by one or more sections
- A section consists of a section title followed by one or more lines of text

How It Works
T → t-title Cs
Cs → C | C Cs
C → c-title Ss
Ss → S | S Ss
S → s-title Ls
Ls → line | line Ls
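The thesis grammar above can be turned into a small recognizer with one function per nonterminal. This is only a sketch: it assumes the input is already a list of token strings named exactly as the terminals of the grammar, and the function name `parse_thesis` is made up for illustration.

```python
def parse_thesis(tokens):
    """Return True if tokens can be derived from T in the thesis grammar."""
    pos = 0  # index of the next unread token

    def expect(tok):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == tok:
            pos += 1
            return True
        return False

    def peek(tok):
        return pos < len(tokens) and tokens[pos] == tok

    def lines():       # Ls -> line | line Ls
        if not expect("line"):
            return False
        while expect("line"):
            pass
        return True

    def section():     # S -> s-title Ls
        return expect("s-title") and lines()

    def sections():    # Ss -> S | S Ss
        if not section():
            return False
        while peek("s-title"):
            if not section():
                return False
        return True

    def chapter():     # C -> c-title Ss
        return expect("c-title") and sections()

    def chapters():    # Cs -> C | C Cs
        if not chapter():
            return False
        while peek("c-title"):
            if not chapter():
                return False
        return True

    # T -> t-title Cs, and the whole input must be consumed
    return expect("t-title") and chapters() and pos == len(tokens)
```

For example, `parse_thesis(["t-title", "c-title", "s-title", "line"])` accepts, while a thesis whose chapter has no section is rejected.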

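How It Works
Parse tree for a thesis with two chapters, each containing one section with one line of text (children are indented under their parent):
T
  t-title
  Cs
    C
      c-title
      Ss
        S
          s-title
          Ls
            line
    Cs
      C
        c-title
        Ss
          S
            s-title
            Ls
              line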

Context-Free Grammar
A context-free grammar (CFG) is a 4-tuple G = (N, T, P, S) where
- T is a finite set of tokens (terminal symbols)
- N is a finite set of nonterminals
- P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N ∪ T)*
- S ∈ N is a designated start symbol

Example
G = ({E, T, F}, {+, -, *, /, (, ), id}, P, E)
Productions in P:
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
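As an illustration, the 4-tuple of this example can be written down directly as Python data and checked against the conditions in the definition. The dictionary layout for P and the helper name `is_well_formed` are choices made for this sketch, not something the slides prescribe.

```python
# The example grammar G = (N, T, P, S) as plain Python data.
N = {"E", "T", "F"}
T = {"+", "-", "*", "/", "(", ")", "id"}
P = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
S = "E"


def is_well_formed(N, T, P, S):
    """Check that (N, T, P, S) satisfies the definition of a CFG."""
    if N & T or S not in N:
        return False  # N and T must be disjoint, and S must be a nonterminal
    for lhs, alternatives in P.items():
        if lhs not in N:
            return False  # the left-hand side must be a single nonterminal
        for rhs in alternatives:
            if any(sym not in N | T for sym in rhs):
                return False  # the right-hand side must be in (N ∪ T)*
    return True
```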

Notational Conventions
- Terminals: a, b, c, … ∈ T
  specific terminals: 0, 1, id, +
- Nonterminals: A, B, C, … ∈ N
  specific nonterminals: expr, term, stmt

Notational Conventions
- Grammar symbols: X, Y, Z ∈ (N ∪ T)
- Strings of terminals: u, v, w, x, y, z ∈ T*
- Strings of grammar symbols: α, β, γ ∈ (N ∪ T)*

Derivations
Given a CFG, we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation:
- We begin with the start symbol
- In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal

Derivations
Mathematically, the one-step derivation ⇒ is a binary relation defined by
α A β ⇒ α γ β
where A → γ is a production in the grammar.

Derivations
In addition, we define:
- ⇒ is leftmost (written ⇒lm) if α does not contain a nonterminal
- ⇒ is rightmost (written ⇒rm) if β does not contain a nonterminal
- Reflexive-transitive closure ⇒* (zero or more steps)
- Positive closure ⇒+ (one or more steps)
The language generated by G is defined by
L(G) = { w ∈ T* | S ⇒+ w }

Example
Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with productions
P = E → E + E
    E → E * E
    E → ( E )
    E → - E
    E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id
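A leftmost derivation step can be sketched in a few lines of Python: find the leftmost nonterminal in the sentential form and replace it with the chosen right-hand side. The sketch assumes the single-nonterminal grammar of this example, and the names `leftmost_step` and `derive` are illustrative only.

```python
# Right-hand sides of the productions for E in the example grammar.
PRODUCTIONS = [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]


def leftmost_step(form, rhs):
    """Apply E -> rhs to the leftmost nonterminal of the sentential form."""
    if rhs not in PRODUCTIONS:
        raise ValueError("not a production of the grammar")
    for i, sym in enumerate(form):
        if sym == "E":  # E is the only nonterminal
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(choices):
    """Return the sentential forms of a leftmost derivation starting from E."""
    form = ["E"]
    trace = [" ".join(form)]
    for rhs in choices:
        form = leftmost_step(form, rhs)
        trace.append(" ".join(form))
    return trace


# E =>lm E + E =>lm id + E =>lm id + id
trace = derive([["E", "+", "E"], ["id"], ["id"]])
```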

Exercise
Which of the strings are in the language of the given CFG?
- abcba
- acca
- aba
- abcbcba
S → aXa
X → ε | bY
Y → ε | cXc

Parse Trees
- The root of the tree is labeled by the start symbol
- Each leaf of the tree is labeled by a terminal (= token) or ε
- Each interior node is labeled by a nonterminal
- If A → X₁ X₂ … Xₙ is a production, then node A has immediate children X₁, X₂, …, Xₙ, where each Xᵢ is a (non)terminal or ε

Example
E ⇒ - E ⇒ - (E) ⇒ - (E + E) ⇒ - (id + E) ⇒ - (id + id)
Parse tree (children are indented under their parent):
E
  -
  E
    (
    E
      E
        id
      +
      E
        id
    )

Ambiguity
An ambiguous grammar produces more than one leftmost derivation (or more than one parse tree) for the same sentence.
Consider the string id + id * id and the productions E → E + E, E → E * E, E → id:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
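One way to make the ambiguity concrete is to count the parse trees of a token sequence. The sketch below does this for the three productions E → E + E, E → E * E, E → id only. The function name `count_parse_trees` and the memoized span-splitting approach are choices made for this illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0  # E -> id
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator of the production at the root
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


n_trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

A count greater than one for any sentence shows that the grammar is ambiguous. Here `n_trees` is 2, matching the two leftmost derivations above.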

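Ambiguity
Different parse trees for the same sentence correspond to different interpretations, in this case for the precedence of the arithmetic operators:
- Tree 1, root E → E + E: the left E derives id and the right E derives id * id, which reads as id + (id * id)
- Tree 2, root E → E * E: the left E derives id + id and the right E derives id, which reads as (id + id) * id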

Exercise
Which of the following CFGs are ambiguous?
1. S → SS | a | b
2. E → E + E | id
3. S → Sa | Sb | ε
4. E → E | E + E
   E → - E | id | ( E )

Chomsky Hierarchy: Language Classification
A grammar G is said to be
- Regular if it is right linear, where each production is of the form A → w B or A → w, or left linear, where each production is of the form A → B w or A → w
- Context free if each production is of the form A → α, where A ∈ N and α ∈ (N ∪ T)*
- Context sensitive if each production is of the form α A β → α γ β, where A ∈ N, α, γ, β ∈ (N ∪ T)*, |γ| > 0
- Unrestricted

Chomsky Hierarchy
L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)
where L(T) = { L(G) | G is of type T }, that is, the set of all languages generated by grammars G of type T.
Examples:
- Every finite language is regular! (construct a FSA for strings in L(G))
- L1 = { a^n b^n | n ≥ 1 } is context free
- L2 = { a^n b^n c^n | n ≥ 1 } is context sensitive
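To see why L1 is context free, note that it is generated by the productions S → a S b | a b. This grammar is a standard one and is assumed here, since the slides do not spell it out. A hedged Python sketch of a recognizer that follows these two productions directly (the name `in_L1` is illustrative):

```python
def in_L1(s):
    """Recognize { a^n b^n | n >= 1 } by following S -> a S b | a b."""
    if s == "ab":                                   # S -> a b
        return True
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        return in_L1(s[1:-1])                       # S -> a S b
    return False
```

The nesting of matching a's and b's is what a finite automaton cannot track for unbounded n, and what the recursive production S → a S b expresses.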

Parsing
Parsing is the process of
- Determining if a string of tokens can be generated by a grammar
- Producing the relevant parse tree (forest)
Top-down parsing constructs a parse tree from root to leaves.
Bottom-up parsing constructs a parse tree from leaves to root.

Parsing
Universal parsing algorithms work for any CFG:
- Recursive descent uses backtracking and takes exponential time
- Tabular methods take O(n^3) time to parse a string of n tokens
  - Cocke-Younger-Kasami
  - Earley
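As a rough illustration of a tabular method, here is a minimal Cocke-Younger-Kasami recognizer. CYK requires the grammar to be in Chomsky normal form, so the sketch uses a small assumed CNF grammar for { a^n b^n | n ≥ 1 } (S → A B | A X, X → S B, A → a, B → b). The grammar, the rule encoding, and the function name `cyk` are all assumptions of this example.

```python
def cyk(tokens, unary, binary, start):
    """Return True if start derives tokens, for a grammar in Chomsky normal form.

    unary  : list of (A, a) for productions A -> a
    binary : list of (A, B, C) for productions A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][l] holds the nonterminals that derive tokens[i:i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {A for A, a in unary if a == tok}
    for length in range(2, n + 1):              # span length
        for i in range(n - length + 1):         # span start
            for split in range(1, length):      # size of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for A, B, C in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# Assumed CNF grammar for { a^n b^n | n >= 1 }.
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]
```

The three nested loops over span length, start position, and split point are where the cubic running time comes from.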

Parsing
CFGs for programming languages are restricted (unambiguous, etc.) and can be parsed in linear time.
Two main families of algorithms:
- LL parsing uses a top-down strategy
- LR parsing uses a bottom-up strategy

Push-Down Automata
A push-down automaton (PDA) implements a context-free grammar:
- Reads the input left to right from a buffer
- Uses an auxiliary storage called a stack, allowing push and pop operations

Push-Down Automata
- A configuration of the PDA completely describes the state of the computation, and is a pair (σ, β) where
  - σ is the stack
  - β is the buffer
- A transition, or action, simulates a move from one configuration to the next
- A computation of the PDA is a sequence of configurations obtained by applying actions

Top Down PDA
1. Predict: for each production A → X₁ X₂ … Xₙ, if A is at the top of the stack, replace it with Xₙ Xₙ₋₁ … X₂ X₁, so that X₁ ends up on top
2. Match: if terminal symbol a is the first symbol of the buffer and the top-most symbol of the stack, remove both symbols
3. The initial configuration is (S, a₁ a₂ … aₙ)
4. The final configuration is (ε, ε)

Example
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id

Trace for the input id*id+id (the top of the stack is on the right):

Stack        Buffer      Action
E            id*id+id    predict 1
T + E        id*id+id    predict 2
T + T        id*id+id    predict 3
T + F * T    id*id+id    predict 4
T + F * F    id*id+id    predict 6
T + F * id   id*id+id    match
T + F *      *id+id      match
T + F        id+id       predict 6
T + id       id+id       match
T +          +id         match
T            id          predict 4
F            id          predict 6
id           id          match
ε            ε           accept
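The trace above can be replayed mechanically. The sketch below simulates the top-down PDA for the six-production grammar, taking the sequence of actions as given input (it plays the role of an oracle rather than computing the choices). The names `GRAMMAR` and `run_top_down` and the encoding of actions are assumptions of this illustration.

```python
# Productions of the example grammar, numbered as on the slide.
GRAMMAR = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}


def run_top_down(tokens, actions, start="E"):
    """Replay predict/match actions; return True if (ε, ε) is reached."""
    stack = [start]          # the top of the stack is the last element
    buffer = list(tokens)
    for action in actions:
        if action == "match":
            if not stack or not buffer or stack[-1] != buffer[0]:
                return False
            stack.pop()
            buffer.pop(0)
        else:                # an integer means "predict <production number>"
            lhs, rhs = GRAMMAR[action]
            if not stack or stack[-1] != lhs:
                return False
            stack.pop()
            stack.extend(reversed(rhs))   # push Xn ... X1 so X1 is on top
    return not stack and not buffer


actions = [1, 2, 3, 4, 6, "match", "match", 6, "match", "match", 4, 6, "match"]
accepted = run_top_down(["id", "*", "id", "+", "id"], actions)
```

Because production 1 is left recursive, a real top-down parser could not pick these predictions by looking at the stack top alone. The sketch only checks that a supplied sequence of actions is a valid computation.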

Bottom Up PDA
1. Reduce: for each production A → X₁ X₂ … Xₙ, if X₁ X₂ … Xₙ is at the top of the stack, replace it with A
2. Shift: remove the first terminal symbol a from the buffer and push it onto the stack
3. The initial configuration is (ε, a₁ a₂ … aₙ)
4. The final configuration is (S, ε)

Example
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id

Trace for the input id*id+id (the top of the stack is on the right):

Stack        Buffer      Action
ε            id*id+id    shift
id           *id+id      reduce 6
F            *id+id      reduce 4
T            *id+id      shift
T *          id+id       shift
T * id       +id         reduce 6
T * F        +id         reduce 3
T            +id         reduce 2
E            +id         shift
E +          id          shift
E + id       ε           reduce 6
E + F        ε           reduce 4
E + T        ε           reduce 1
E            ε           accept
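The shift-reduce trace above can be replayed in the same spirit. This sketch simulates the bottom-up PDA for the six-production grammar, again taking the action sequence as given rather than computing it. The names `GRAMMAR` and `run_bottom_up` and the action encoding are assumptions of this illustration.

```python
# Productions of the example grammar, numbered as on the slide.
GRAMMAR = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}


def run_bottom_up(tokens, actions, start="E"):
    """Replay shift/reduce actions; return True if (S, ε) is reached."""
    stack = []               # the top of the stack is the last element
    buffer = list(tokens)
    for action in actions:
        if action == "shift":
            if not buffer:
                return False
            stack.append(buffer.pop(0))
        else:                # an integer means "reduce <production number>"
            lhs, rhs = GRAMMAR[action]
            if stack[-len(rhs):] != rhs:
                return False  # the right-hand side is not on top of the stack
            del stack[-len(rhs):]
            stack.append(lhs)
    return stack == [start] and not buffer


actions = ["shift", 6, 4, "shift", "shift", 6, 3, 2,
           "shift", "shift", 6, 4, 1]
accepted = run_bottom_up(["id", "*", "id", "+", "id"], actions)
```

Note that after the first `reduce 4` the stack holds T, and both `reduce 2` and `shift` are applicable. Choosing between them is exactly the nondeterminism that LR parsing resolves.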

LL and LR Parsing
- The top down and bottom up PDAs are nondeterministic: several actions might be possible at a given configuration
- Most of LL and LR parsing can be understood as based on the previous PDAs, with an additional oracle that provides the correct action