Lecture 4: Syntax Specification

Similar documents
Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Languages and Compilers

CPS 506 Comparative Programming Languages. Syntax Specification

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

COP 3402 Systems Software Syntax Analysis (Parser)

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Chapter 2 :: Programming Language Syntax

Non-deterministic Finite Automata (NFA)

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Building Compilers with Phoenix

Lexical Analysis. Introduction

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CSE 3302 Programming Languages Lecture 2: Syntax

Part 5 Program Analysis Principles and Techniques

Defining syntax using CFGs

Dr. D.M. Akbar Hussain

Describing Syntax and Semantics

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

This book is licensed under a Creative Commons Attribution 3.0 License

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Dr. D.M. Akbar Hussain

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Optimizing Finite Automata

Lecture 10 Parsing 10.1

Chapter 3. Describing Syntax and Semantics

Formal Languages. Formal Languages

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Introduction to Lexing and Parsing

CSE 401 Midterm Exam 11/5/10

A simple syntax-directed

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

Lecture 3: Lexical Analysis

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Syntax. In Text: Chapter 3

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Programming Language Definition. Regular Expressions

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Specifying Syntax COMP360

Chapter 3. Describing Syntax and Semantics ISBN

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

3. Context-free grammars & parsing

Homework & Announcements

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

CS 314 Principles of Programming Languages. Lecture 3

CS415 Compilers. Lexical Analysis

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

List of Figures. About the Authors. Acknowledgments

CS 314 Principles of Programming Languages

Context-Free Languages and Parse Trees

Week 2: Syntax Specification, Grammars

Chapter 3. Describing Syntax and Semantics ISBN

CSCI312 Principles of Programming Languages!

CSE 105 THEORY OF COMPUTATION

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Lexical Analysis. Prof. James L. Frankel Harvard University

Syntax. A. Bellaachia Page: 1

Context-Free Grammar (CFG)

CSCI312 Principles of Programming Languages!

EECS 6083 Intro to Parsing Context Free Grammars

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Chapter 3. Describing Syntax and Semantics

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS 314 Principles of Programming Languages

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Introduction to Parsing. Lecture 5

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Syntax. 2.1 Terminology

Programming Lecture 3

Identifying Hierarchical Structure in Sequences. Sean Whalen

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CMSC 330: Organization of Programming Languages

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Context-Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

A programming language requires two major definitions A simple one pass compiler

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

CSE 401/M501 Compilers

Transcription:

The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1

Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses) The syntax of a programming language Describes its form» i.e. Organization of tokens (elements) Formal notation» Context Free Grammars (CFGs( CFGs) 3 Review: Formal definition of tokens A set of tokens is a set of strings over an alphabet {read, write, +, -,, *, /, :=, 1, 2,, 10,, 3.45e-3, 3, } A set of tokens is a regular set that can be defined by comprehension using a regular expression For every regular set, there is a deterministic finite automaton (DFA) that can recognize it i.e. determine whether a string belongs to the set or not Scanners extract tokens from source code in the same way DFAs determine membership 4 2

Review: Regular Expressions A regular expression (RE) is: A single character The empty string, ε The concatenation of two regular expressions» Notation: RE 1 RE 2 (i.e. RE 1 followed by RE 2 ) The union of two regular expressions» Notation: RE 1 RE 2 The closure of a regular expression» Notation: RE*» * is known as the Kleene star» * represents the concatenation of 0 or more strings 5 Review: Token Definition Example Numeric literals in Pascal Definition of the token unsigned_number digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit* unsigned_number unsigned_integer ( (. unsigned_integer ) ε ) ( ( e ( + ε) unsigned_integer ) ε ) Recursion is not allowed! 6 3

Exercise digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit* unsigned_number unsigned_integer ( (. unsigned_integer ) ε ) ( ( e ( + ε) unsigned_integer ) ε ) Regular expression for Decimal numbers number 7 Exercise digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit* unsigned_number unsigned_integer ( (. unsigned_integer ) ε ) ( ( e ( + ε) unsigned_integer ) ε ) Regular expression for Decimal numbers number ( + ε) unsigned_integer ( (. unsigned_integer ) ε ) 8 4

Exercise digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit* unsigned_number unsigned_integer ( (. unsigned_integer ) ε ) ( ( e ( + ε) unsigned_integer ) ε ) Regular expression for Identifiers identifier 9 Exercise digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit* unsigned_number unsigned_integer ( (. unsigned_integer ) ε ) ( ( e ( + ε) unsigned_integer ) ε ) Regular expression for Identifiers identifier letter ( letter digit _ )* letter a b c z 10 5

Context Free Grammars CFGs Add recursion to regular expressions» Nested constructions Notation expression identifier number - expression ( expression ) expression operator expression operator + - * /» Terminal symbols» Non-terminal symbols» Production rule (i.e. substitution rule) terminal symbol terminal and non-terminal symbols 11 Backus-Naur Form Backus-Naur Form (BNF) Equivalent to CFGs in power CFG expression identifier number - expression ( expression ) expression operator expression operator + - * / BNF expression identifier number - expression ( expression ) expression operator expression operator + - * / 12 6

Extended Backus-Naur Form Extended Backus-Naur Form (EBNF) Adds some convenient symbols» Union» Kleene star *» Meta-level parentheses ( ) It has the same expressive power 13 Extended Backus-Naur Form Extended Backus-Naur Form (EBNF) It has the same expressive power BNF digit 0 digit 1 digit 9 unsigned_integer digit unsigned_integer digit unsigned_integer EBNF digit 0 1 2 3 4 5 6 7 8 9 unsigned_integer digit digit * 14 7

Derivations A derivation shows how to generate a syntactically valid string Given a CFG Example:» CFG expression identifier number - expression ( expression ) expression operator expression operator + - * /» Derivation of slope * x + intercept 15 Derivation Example Derivation of slope * x + intercept expression expression operator expression expression operator intercept expression + intercept expression operator expression + intercept expression operator x + intercept expression * x + intercept slope * x + intercept expression * slope * x + intercept» Identifiers were not derived for simplicity 16 8

Parse Trees A parse is graphical representation of a derivation Example 17 Ambiguous Grammars Alternative parse tree same expression same grammar This grammar is ambiguous 18 9

Designing unambiguous grammars Specify more grammatical structure In our example, left associativity and operator precedence» 10 4 3 means (10 4) 3» 3 + 4 * 5 means 3 + (4 * 5) 19 Example Parse tree for 3 + 4 * 5 Exercise: parse tree for - 10 / 5 * 8 4-5 20 10

Java Language Specification Available on-line http://java.sun.com/docs/books/jls/second_edition/html/j.tit le.doc.html Examples Comments: http://java.sun.com/docs/books/jls/second_edition/html/lex ical.doc.html#48125 Multiplicative Operators: http://java.sun.com/docs/books/jls/second_edition/html/ex pressions.doc.html#239829 Unary Operators: http://java.sun.com/docs/books/jls/second_edition/html/ex pressions.doc.html#4990 21 Reading Assignment Scott s Chapter 2 Section 2.1.2 Section 2.1.3 Java language specification Chapter 2 (Grammars) Glance at chapter 3 Glance at sections 15.17, 15.18 and 15.15 22 11