ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Similar documents
Syntax Analysis. Chapter 4

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Chapter 4: Syntax Analyzer

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

A Simple Syntax-Directed Translator

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

CSE302: Compiler Design

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

CS 314 Principles of Programming Languages

Syntax Analysis Check syntax and construct abstract syntax tree

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Part 5 Program Analysis Principles and Techniques

Ambiguity and Errors Syntax-Directed Translation

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

2.2 Syntax Definition

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Introduction to Parsing. Lecture 5

COP 3402 Systems Software Syntax Analysis (Parser)

A simple syntax-directed

Syntax Analysis Part I

Context-free grammars (CFG s)

EECS 6083 Intro to Parsing Context Free Grammars

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Introduction to Parsing. Lecture 5

Context-Free Grammars

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Syntax. In Text: Chapter 3

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Introduction to Parsing. Lecture 8

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Principles of Programming Languages COMP251: Syntax and Grammars

Introduction to Parsing

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

CMPT 755 Compilers. Anoop Sarkar.

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Dr. D.M. Akbar Hussain

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

VIVA QUESTIONS WITH ANSWERS

Parsing II Top-down parsing. Comp 412

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Compiler Design Concepts. Syntax Analysis

Defining syntax using CFGs

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

UNIT-4 (COMPILER DESIGN)


Recursive Descent Parsers

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Project 2 Interpreter for Snail. 2 The Snail Programming Language

This book is licensed under a Creative Commons Attribution 3.0 License

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Syntax. A. Bellaachia Page: 1

CPS 506 Comparative Programming Languages. Syntax Specification

Compilers Course Lecture 4: Context Free Grammars

CS153: Compilers Lecture 4: Recursive Parsing

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Error Handling Syntax-Directed Translation Recursive Descent Parsing

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE 3302 Programming Languages Lecture 2: Syntax

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax Errors; Static Semantics

CMSC 330: Organization of Programming Languages. Context Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Related Course Objec6ves

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Context-Free Grammars

Chapter 3. Describing Syntax and Semantics

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Transcription:

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών Lecture 5a Syntax Analysis lias Athanasopoulos eliasathan@cs.ucy.ac.cy

Syntax Analysis Συντακτική Ανάλυση Context-free Grammars (CFGs) Derivations Parse trees Top-down Parsing Ambiguities

Syntax Analysis Syntax analysis (parsing) is the process of determining if a string of tokens can be generated by a grammar I gave him the book sentence subject: I verb:gave object: him article: the indirect object noun phrase noun: book

Lexical-Syntax Analysis Source code (character stream) { if (b == 0) a = b; while (a!= 1) { printf( %I,I--); } } Token stream { if ( b == 0 ) a = b ; Lexical Analysis Syntax Analysis Syntax tree block if_stmt while_stmt expr expr variable == constant variable!= b 0 expr a variable = variable a b constant 1 block...

The Role of the Parser source program lexical analyzer token get next token parser parse tree rest of front end symbol table

Syntax Analysis Operation Input A stream of tokens taken from lexical analysis Output Syntax tree which determines the token relations and the syntax correctness (are all parentheses balanced?) Semantic analysis takes care of types int x = true; int y; z = f(y);

Syntax rror Handling Lexical Misspelling an identifier, keyword, or operator Syntactic Arithmetic expression with unbalanced parenthesis Semantic Operator applied to an incompatible operand Logical Infinitely recursive call

rror Handler Requirements It should report the presence of errors clearly and accurately It should recover from each error quickly enough to be able to detect subsequent errors It should not significantly slow down the processing of correct programs

What happens when an error is detected? Many strategies, none clearly dominates Not adequate for the parser to quit upon detecting the first error Subsequent parsing may reveal additional errors Usually, the compiler attempts error recovery Reasonable hope that the rest of the program can be parsed rror recovery should be realized correctly Otherwise many errors can be generated

xample While recovering from an error a compiler may skip the declaration of a variable zap At a later point when zap is used the compiler should not generate a syntactic error, but just the missing declaration Since, there should be no entry at the symbol table Conservative strategy Once an error is detected, filter out close errors (consume enough tokens to exit the error area)

rror-recovery Strategies Panic mode Once an error is detected, consume tokens until a synchronizing token is detected Synchronizing tokens are usually delimiters (end, ;), which have a clear meaning Simple and cannot enter an infinite loop Phrase level Attempt to correct the error by taking action Insert a missing semicolon, replace a comma with a semicolon, etc. Can create infinite loops if actions are not applied correctly Hard to cope with cases where the error has occurred before the point of detection

rror-recovery Strategies rror productions Common errors can be augmented to the grammar of the language The parser can then detect errors, since these errors are part of the language Global correction Attempt to correct an error with the least possible actions Given an incorrect input string x and grammar G, find a valid y, which can be derived from x with the least amount of changes The closest correct program may not be the one the programmer had in mind

Γραμματικές Χωρίς Συμφραζόμενα CONTXT-FR GRAMMARS

Regular xpressions Limitations Regular expressions can be transformed easily to NFA (and then to DFA) Discovering and classifying tokens using regular expressions is easy and efficient Regular expressions cannot be used for syntax analysis

Regular xpressions Limitations Match all balanced parentheses: () (()) ()()() (())()((()())) You need an NFA with an infinite number of states For 5 nested parentheses you need the following NFA S ( ( ( ( ( ) ) ) ) )

Context-free Grammar (CFG) Γραμματική Χωρίς Συμφραζόμενα 1. A set of tokens, known as terminal symbols. Terminals are the basic symbols from which strings are formed. The word token is a synonym for terminal when we are talking about programming languages (e.g., tokens like if, then, and else are all terminals) 2. A set of nonterminals. Nonterminals are syntactic variables that denote sets of strings. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language defined by the grammar.

Context-free Grammar (CFG) Γραμματική Χωρίς Συμφραζόμενα 3. A set of productions (κανόνες παραγωγής) where each production consists of a nonterminal, called the left side of the production, an arrow, and a sequence of tokens and/or nonterminals, called the right side of the production. The productions of the grammar specify the manner in which the terminals and nonterminals can be combined to form strings. ach production consists of a nonterminal, followed by an arrow (sometimes the symbol ::== is used in place of the arrow), followed by a string of nonterminals and terminals. 4. A designation of one of the nonterminals as the start symbol In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.

xample 1 xpressions of digits separated by plus and minus signs 9-5+2, 3-1, 7 list è list + digit (2.2) list è list digit (2.3) list è digit (2.4) digit è 0 1 2 3 4 5 6 7 8 9 (2.5) The three first productions can be grouped: list è list + digit list digit digit Terminals/Tokens: + - 0 1 2 3 4 5 6 7 8 9 Nonterminals: list, digit Sart symbol: list

xample 1 The ten productions for the nonterminal digit allow it to stand for any of the tokens 0, 1,..., 9 From 2.4 a single digit by itself is a list 2.2 and 2.3 express the fact that if we take any list and follow it by a plus or minus sign and then another digit we have a new list 9-5+2 9 is a list by production 2.4, since 9 is a digit 9-5 is a list by production 2.3, since 9 is a list and 5 is a digit 9-5+2 is a list by production 2.2, since 9-5 is a list and 2 is a digit

xample 2 Begin nd block in Pascal begin... (* Pascal code *) end block opt_stmts stmt_list è begin opt_stmts end è stmt_list ε è stmt_list ; stmt stmt (stmt is not expanded at this point)

xample 3 Simple arithmetic expressions expr è expr op expr expr è (expr) expr è -expr expr è id op è + op è - op è * op è / op è ^ qual with: è A () - id A è + - * / ^

Derivation Παραγωγή è A () - id The production è - signifies that an expression preceded by a minus sign is also an expression We can thus generate more complex expressions from simpler expressions by just replacing with -

Derivation Παραγωγή xamples è () => - ( derives ) * => ()* or *() => - => (-) => -(id) => Derives in one step => * Derives in zero ore more steps => + Derives in one or more steps

Leftmost - Rightmost è A () - id A è + - * / ^ (G1) The string -(id + id) is a sentence of grammar G1 Leftmost derivation =>-=>-()=>-(+)=>-(id+)=>-(id+id) lm lm lm lm lm Rightmost derivation =>-=>-()=>-(+)=>-(+id)=>-(id+id) rm rm rm rm rm

Grammars and Languages Given a grammar G with a start symbol S, A string of only terminals, w, is in L(G) + iff S => w The string w is called a sentence of G L(G) is the language generated by G and includes all w (strings composed by terminals of G) A language that can be generated by a grammar is a context-free grammar If two grammars generate the same language, then they are equivalent

Parse Trees A parse tree may be viewed as a graphical representation for a derivation that filters out the choice regarding replacement order. =>-=>-()=>-(+)=>-(id+)=>-(id+id) lm lm lm lm lm - ( ) + id id

Constructing the Parse Tree - - => => ( ) - => ( ) - => => ( ) - ( ) + + + id id id

Ambiguity Αμφισημία A grammar that produces more than one parse tree for some sentence is said to be ambiguous For certain types of parsers, it is desirable that the grammar be made unambiguous For some applications we shall also consider methods whereby we can use certain ambiguous grammars, together with disambiguating rules that throw away undesirable parse trees