CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis


http://xkcd.com/859

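Structure of a Compiler
Source Language → Lexical Analyzer → Syntax Analyzer (today!) → Semantic Analyzer → Intermediate Code Generator → Intermediate Code → Code Optimizer → Target Code Generator → Target Language
Front End: Lexical Analyzer through Intermediate Code Generator
Back End: Code Optimizer and Target Code Generator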

Project 2: Syntax Analysis
- Your project group will be assigned before Thursday's class.
- You will be extending your lexical analysis program from Project 1.
- Choose one group member's lexer as a starting point (or write a new one!).
- You may (and in some cases should!) change the tokens you use.
- The output from Project 2 is independent of Project 1.
- Do not worry about checking for errors beyond those listed in the project.
- Correctness is still jobs #1, 2, and 3.

Where is Syntax Analysis?
Source: if (idx == 0) idx = 750;
Lexical Analysis (Scanner) produces the token stream: if ( idx == 0 ) idx = 750 ;
Syntax Analysis (Parsing) produces an Abstract Syntax Tree or Parse Tree: an if node with two children, IsEq(idx, 0) and Assign(idx, 750).
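The scanner step for this one statement can be sketched in a few lines of Python. This is only an illustration: the course generates its scanner with Flex, and the pattern below (the name `TOKEN_RE` and the three token classes) is an assumption chosen to cover just the characters in the example.

```python
import re

# Hypothetical patterns covering only this example: integers, identifiers and
# keywords, and a few punctuation tokens. "==" is listed before "=" so the
# longer operator wins.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|[=();]))")

def tokenize(text):
    """Split source text into a list of lexemes, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        # Exactly one of the three groups matched; lastindex identifies it.
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens

print(tokenize("if (idx == 0) idx = 750;"))
```

The parser never sees the raw characters; it works on the list this function returns.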

Parsing Analogy
Syntax analysis for natural languages
- Identify the function of each word
- Recognize if a sentence is grammatically correct
Example: I gave Jim the card.
- sentence = subject + verb phrase
- subject = pronoun ("I")
- verb phrase = action + indirect object + object
- action = verb ("gave")
- indirect object = proper noun ("Jim")
- object = noun phrase = article ("the") + noun ("card")

Syntax Analysis Overview
Goal: Does the input token stream satisfy the syntax of the program?
What do we need to do this?
- An expressive way to describe the syntax
- A mechanism to determine if a token stream satisfies the syntax
- A structured output to be used by later components of the compiler
For lexical analysis:
- Regular expressions describe patterns for tokens
- Finite automata (generated by Flex) convert the input character stream to tokens
- A token stream is made available to later components (specifically, the parser)

Just Use Regular Expressions?
Regular expressions are easy to implement and can expressively describe tokens. Should we also use them to describe the syntax of a programming language?
NO! They do not have the power to express any non-trivial syntax.
Example:
- Nested constructs (blocks, expressions, statements)
- Detect balanced braces: {{} {} {{} { }}}
- We need unbounded counting!
- FSAs cannot count except in a strictly modulo fashion: { { { { { } } } } } ...
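The counting argument can be made concrete with a short sketch. The function name `braces_balanced` is made up for illustration. The point is the `depth` variable: it can grow without limit, which is the one thing a machine with a fixed, finite set of states cannot do.

```python
def braces_balanced(text):
    """Return True if every '{' has a matching '}' in the right order."""
    depth = 0  # unbounded counter: the part a finite automaton lacks
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
    return depth == 0

print(braces_balanced("{{} {} {{} { }}}"))
print(braces_balanced("{{}"))
```

A finite automaton with k states can only track nesting depth up to some bound tied to k, so a deep enough input always defeats it.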

Context Free Grammars
Consist of 4 components (Backus-Naur Form or BNF):
- Terminal Symbols = token or ε
- Non-terminal Symbols = syntactic variables
- Start Symbol S = special non-terminal
- Production Rules of the form LHS → RHS
  LHS = a single non-terminal
  RHS = a string of terminals and non-terminals
  Specify how non-terminals may be expanded
Example productions:
S → a S a
S → T
T → b S b
T → ε
The language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions.
L(G) = language generated by grammar G

Context Free Grammar Example
Grammar for a balanced-parentheses language:
S → ( S ) S
S → ε
- 1 non-terminal: S
- 2 terminals: "(", ")"
- Start symbol: S
- 2 production rules
If the grammar accepts a string, there is a derivation of that string using the production rules.
How do we produce the string "(())"?
S ⇒ ( S ) ε ⇒ ( ( S ) S ) ε ⇒ ( ( ε ) ε ) ε = ( ( ) )
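A minimal recognizer for this balanced-parentheses grammar, assuming its two productions are S → ( S ) S and S → ε, can be written as one recursive function. The names `parse_S` and `accepts` are illustrative and not part of the lecture. The function chooses the first production when the next character is "(" and the ε production otherwise.

```python
def parse_S(s, i=0):
    """Try to derive a prefix of s[i:] from S.

    Returns the index just past the derived prefix, or None on a syntax error.
    """
    if i < len(s) and s[i] == "(":
        # S -> ( S ) S
        j = parse_S(s, i + 1)
        if j is None or j >= len(s) or s[j] != ")":
            return None
        return parse_S(s, j + 1)
    # S -> epsilon: consume nothing
    return i

def accepts(s):
    """True if the whole string is in the language of the grammar."""
    return parse_S(s) == len(s)

print(accepts("(())"))
print(accepts("(()"))
```

Each recursive call corresponds to expanding one S in a derivation, so the call stack does the unbounded counting that a finite automaton cannot.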

More on Context Free Grammars
Shorthand: vertical bar '|' to combine multiple productions
S → a S a | T
T → b T b | ε
CFGs are powerful enough to express the syntax of most programming languages.
Derivation = successive application of productions starting from S
Acceptance = determine if there exists a derivation for an input token stream
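To see what "the language generated by a grammar" means in practice, the sketch below stores the shorthand grammar on this slide, read as S → a S a | T and T → b T b | ε, in a Python dictionary and enumerates every sentence up to a length limit by expanding the leftmost non-terminal. The dictionary layout and the function name `generate` are assumptions made for the example.

```python
from collections import deque

# Each non-terminal maps to its alternatives; [] stands for epsilon.
GRAMMAR = {
    "S": [["a", "S", "a"], ["T"]],
    "T": [["b", "T", "b"], []],
}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        idx = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminal_count = sum(1 for sym in new_form if sym not in grammar)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda w: (len(w), w))

print(generate(GRAMMAR, "S", 4))
```

With a limit of 4 this yields the empty string, aa, bb, aaaa, abba, and bbbb: some number of a's, an even number of b's, then the same number of a's.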

A Parser
Inputs: Context Free Grammar, G; Token Stream, s (from lexer)
Outputs: Error Messages; Yes, if s in L(G); No, otherwise. If yes, also output abstract syntax tree.
Syntax analyzers (parsers) = context free grammar acceptors that also output the corresponding derivation when the token stream is accepted.

Reverse Derivation
Tokens:
Op = '+' | '-' | '*' | '/'
Int = [0-9]+
Open = '('
Close = ')'
Productions:
1) Start → Expr
2) Expr → Expr Op Expr
3) Expr → Int
4) Expr → Open Expr Close
Input: (2-1) + 1
Question: Could we have produced this string from the above grammar?
- Tokenize: Open Int Op Int Close Op Int
- Production 3 (once per Int): Open Expr Op Expr Close Op Expr
- Production 2: Open Expr Close Op Expr
- Production 4: Expr Op Expr
- Production 2: Expr
- Production 1: Start
- Done
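The reverse derivation can be replayed mechanically. The sketch below assumes the two non-terminals are named Start and Expr (those names are an assumption, since only the token names are certain) and uses a naive depth-first search over all possible reductions. This is not how a production parser works, and the order in which it finds reductions may differ from the order on the slides; it only answers the question "is there some sequence of reductions that reaches Start?".

```python
import re

TOKEN_SPEC = [("Int", r"[0-9]+"), ("Op", r"[+\-*/]"), ("Open", r"\("), ("Close", r"\)")]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Assumed grammar: non-terminal names Start and Expr are illustrative.
PRODUCTIONS = [
    ("Start", ("Expr",)),
    ("Expr", ("Expr", "Op", "Expr")),
    ("Expr", ("Int",)),
    ("Expr", ("Open", "Expr", "Close")),
]

def tokenize(text):
    """Turn the input string into a list of token type names."""
    kinds, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kinds.append(match.lastgroup)
        pos = match.end()
    return kinds

def reduce_to_start(form, failed=None):
    """Search for reductions leading from form to ('Start',).

    Returns the list of sentential forms visited, or None if no sequence exists.
    """
    form = tuple(form)
    failed = set() if failed is None else failed
    if form == ("Start",):
        return [form]
    if form in failed:
        return None
    for lhs, rhs in PRODUCTIONS:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                # Replace the matched RHS with its LHS and keep searching.
                rest = reduce_to_start(form[:i] + (lhs,) + form[i + n:], failed)
                if rest is not None:
                    return [form] + rest
    failed.add(form)
    return None

for step in reduce_to_start(tokenize("(2-1) + 1")):
    print(" ".join(step))
```

Running it on an incomplete input such as "(2-" returns None, which corresponds to the parser answering "No".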

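True Derivation
Productions: 1) Start → Expr 2) Expr → Expr Op Expr 3) Expr → Int 4) Expr → Open Expr Close
Start
⇒ Expr (Production 1)
⇒ Expr Op Expr (Production 2)
⇒ Open Expr Close Op Expr (Production 4)
⇒ Open Expr Op Expr Close Op Expr (Production 2)
⇒ Open Int Op Int Close Op Int (Production 3, once per Int)
= (2-1) + 1
Can we convert this to a parse tree?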

Parse Tree
- Internal Nodes: Non-terminals
- Leaves: Terminals
- Edges: from the non-terminal on the LHS of a production to the nodes from the RHS of that production
- Captures the derivation of the string

Parse Tree Construction
Productions: 1) Start → Expr 2) Expr → Expr Op Expr 3) Expr → Int 4) Expr → Open Expr Close
Walk the derivation of (2-1) + 1 from the top. Each time a production is applied, the symbols on its RHS become the children of the LHS node:
Start
  Expr
    Expr
      Open (
      Expr
        Expr
          Int 2
        Op -
        Expr
          Int 1
      Close )
    Op +
    Expr
      Int 1
Done!
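A parse tree like this one can be held in a very small data structure. The sketch below is one possible representation, not the one required by the project: the `Node` class, its field names, and the non-terminal names Start and Expr are all assumptions. The tree is built by hand to mirror the derivation of (2-1) + 1, and `leaves` checks the defining property that reading the leaves left to right gives back the token stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    symbol: str                      # non-terminal name or token type
    children: List["Node"] = field(default_factory=list)
    lexeme: Optional[str] = None     # set only on leaves (terminals)

def leaf(symbol, lexeme):
    return Node(symbol, [], lexeme)

def leaves(node):
    """Collect the leaves of the tree from left to right."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found

# Expr -> Expr Op Expr for "2-1"
inner = Node("Expr", [Node("Expr", [leaf("Int", "2")]),
                      leaf("Op", "-"),
                      Node("Expr", [leaf("Int", "1")])])
# Expr -> Open Expr Close for "(2-1)"
grouped = Node("Expr", [leaf("Open", "("), inner, leaf("Close", ")")])
# Start -> Expr, Expr -> Expr Op Expr for "(2-1) + 1"
tree = Node("Start", [Node("Expr", [grouped,
                                    leaf("Op", "+"),
                                    Node("Expr", [leaf("Int", "1")])])])

print(" ".join(n.symbol for n in leaves(tree)))
print("".join(n.lexeme for n in leaves(tree)))
```

Internal nodes carry only a symbol and children; leaves carry a token type and the matched text, which is what later phases need.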

Simplifying the Tree
Starting from the parse tree for ( 2-1 ) + 1:
- Attach the lexemes ( 2 - 1 ) + 1 to the token leaves.
- Drop the Open and Close leaves; the shape of the tree already records the grouping.
- Replace each remaining token node (Int, Op) with its lexeme.
- Move each operator up so that it becomes the parent of its two operands.
- Label the top of the tree ROOT.
Result:
ROOT
  +
    -
      2
      1
    1

Original Input: (2-1) + 1
Parse Tree: Start( Expr( Expr( Open, Expr( Expr(Int 2), Op -, Expr(Int 1) ), Close ), Op +, Expr(Int 1) ) )
Abstract Syntax Tree: ROOT( +( -(2, 1), 1 ) )
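The step from parse tree to abstract syntax tree can be sketched as one recursive function. Everything here is an assumption made for illustration: trees are plain tuples, leaves are (token type, lexeme) pairs, internal nodes are (non-terminal, child, ...), and the non-terminals are named Start and Expr. The function drops the parentheses, collapses single-child Expr nodes, and promotes each operator to be the parent of its operands.

```python
def to_ast(node):
    """Convert a tuple-based parse tree into a nested-tuple AST."""
    symbol, *rest = node
    if symbol == "Int":
        return int(rest[0])                 # leaf: keep only the value
    children = rest
    if symbol == "Start":
        return ("ROOT", to_ast(children[0]))
    # Remaining case: symbol == "Expr"
    if len(children) == 1:
        return to_ast(children[0])          # Expr -> Int
    if children[0][0] == "Open":
        return to_ast(children[1])          # Expr -> Open Expr Close
    left, op, right = children              # Expr -> Expr Op Expr
    return (op[1], to_ast(left), to_ast(right))

# Parse tree for (2-1) + 1, written out by hand.
PARSE_TREE = (
    "Start",
    ("Expr",
     ("Expr",
      ("Open", "("),
      ("Expr", ("Expr", ("Int", "2")), ("Op", "-"), ("Expr", ("Int", "1"))),
      ("Close", ")")),
     ("Op", "+"),
     ("Expr", ("Int", "1"))),
)

print(to_ast(PARSE_TREE))
```

The result keeps six nodes (ROOT, +, -, 2, 1, 1) where the parse tree had eighteen, and nothing needed to evaluate or translate the expression is lost.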