CSCE 314 Programming Languages

Similar documents
CSCI312 Principles of Programming Languages!

CSE 3302 Programming Languages Lecture 2: Syntax

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

This book is licensed under a Creative Commons Attribution 3.0 License

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages

CPS 506 Comparative Programming Languages. Syntax Specification

Languages and Compilers

EECS 6083 Intro to Parsing Context Free Grammars

Syntax. In Text: Chapter 3

Principles of Programming Languages COMP251: Syntax and Grammars

Part 5 Program Analysis Principles and Techniques

COP 3402 Systems Software Syntax Analysis (Parser)

CMSC 330: Organization of Programming Languages

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CSCI312 Principles of Programming Languages!

Syntax. A. Bellaachia Page: 1

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Introduction to Lexical Analysis

CMSC 330: Organization of Programming Languages. Context Free Grammars

Introduction to Lexing and Parsing

Lexical Analysis. Chapter 2

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS 314 Principles of Programming Languages

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

3. Context-free grammars & parsing

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Introduction to Lexical Analysis

A Simple Syntax-Directed Translator

Chapter 3. Describing Syntax and Semantics ISBN

Lexical Scanning COMP360

High Level Languages. Java (Object Oriented) This Course. Jython in Java. Relation. ASP RDF (Horn Clause Deduction, Semantic Web) Dr.

Part 3. Syntax analysis. Syntax analysis 96

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Homework & Announcements

Introduction to Syntax Analysis. The Second Phase of Front-End

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Dr. D.M. Akbar Hussain

Optimizing Finite Automata

Context-free grammars (CFG s)

Syntax Analysis Part I

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Habanero Extreme Scale Software Research Project

Syntax Intro and Overview. Syntax

Building Compilers with Phoenix

Week 2: Syntax Specification, Grammars

Lexical Analysis. Lecture 2-4

Lexical Analysis. Lecture 3-4

Introduction to Syntax Analysis

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Describing Syntax and Semantics

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Formal Languages. Formal Languages

Stating the obvious, people and computers do not speak the same language.

Syntax and Grammars 1 / 21

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CMPT 755 Compilers. Anoop Sarkar.

Chapter 3. Describing Syntax and Semantics

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Chapter 3: Lexing and Parsing

Formal Languages and Compilers Lecture VI: Lexical Analysis

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

CSE302: Compiler Design

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics

22c:111 Programming Language Concepts. Fall Syntax III

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

CT32 COMPUTER NETWORKS DEC 2015

Introduction to Parsing. Lecture 5

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 8

CS 403: Scanning and Parsing

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Chapter 3. Describing Syntax and Semantics

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Transcription:

CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee 1

What Is a Programming Language? Language = syntax + semantics The syntax of a language is concerned with the form of a program: how expressions, commands, declarations etc. are put together to result in the final program. The semantics of a language is concerned with the meaning of a program: how the programs behave when executed on computers Syntax defines the set of valid programs, semantics how valid programs behave 2

Programming Language Definition Syntax: grammatical structure lexical how words are formed phrasal how sentences are formed from words Semantics: meaning of programs Informal: English documents such as reference manuals Formal: 1. Operational semantics: execution on an abstract machine, e.g., <x:=c,s>->[ s[x -> s(c)] ] 2. Denotational semantics: meaning defined as a mathematical function from input to output, definition compositional, e.g., [[x:=c]](s)->s[x -> [[c]] s ] 3. Axiomatic semantics: each construct is defined by pre- and post- conditions, e.g., {y x} z:=x; z:=z+1 {y<z} 3

Language Syntax Defines legal programs: programs that can be executed by machine Defined by grammar rules Define how to make sentences out of words For programming languages Sentences are called statements, expressions, terms, commands, and so on Words are called tokens Grammar rules describe both tokens and statements Often, grammars alone cannot capture exactly the set of valid programs. Grammars combined with additional rules are a common approach. 4

Language Syntax (Cont.) Statement is a sequence of tokens Token is a sequence of characters Lexical analyzer produces a sequence of tokens from a character sequence Parser produces a statement representation from the token sequence Statements are represented as parse trees (abstract syntax tree) characters Lexical Analyzer tokens Parser sentences 5

Backus-Naur Form (BNF) BNF is a common notation to define programming language grammars A BNF grammar G = (N, T, P, S) A set of non-terminal symbols N A set of terminal symbols T (tokens) A set of grammar rules P A start symbol S Grammar rule form (describe context-free grammars): <non-terminal> ::= <sequence of terminals and non-terminals> 6

Examples of BNF BNF rules for robot commands A robot arm accepts any command from the set {up, down, left, right} Rules: <move> ::= <command> <command> <move> <command> ::= up <command> ::= down <command> ::= left <command> ::= right Examples of accepted sequences up down left up up right 7

How to Read Grammar Rules From left to right Generates the following sequence Each terminal symbol is added to the sequence Each non-terminal is replaced by its definition For each, pick any of the alternatives Note that a grammar can be used to both generate a statement, and verify that a statement is legal The latter is the task of parsing find out if a sentence (program) is in a language, and how the grammar generates the sentence 8

Extended BNF Constructs and notation: <x> nonterminal x <x> ::= Body <x> <y> {<x>} {<x>}+ [<x>] <x> is defined by Body the sequence <x> followed by <y> the sequence of zero or more occurrences of <x> the sequence of one or more occurrences of <x> zero or one occurrence of <x> Example <expression> ::= <variable> <integer> <expression> ::= <expression> + <expression>... <statement> ::= if <expression> then <statement> { elseif <expression> then <statement> }+ [ else <statement> ] end... <statement> ::= <expression> return <expression>... 9

Example Grammar Rules (Part of C++ Grammar) A.5 Statements statement: labeled-statement expression-statement compound-statement selection-statement iteration-statement jump-statement declaration-statement try-block labeled-statement: identifier : statement case constant-expression : statement default : statement expression-statement: expression opt ; compound-statement: { statement-seq opt } statement-seq: statement statement-seq statement selection-statement: if ( condition ) statement if ( condition ) statement else statement switch ( condition ) statement condition: expression type-specifier-seq declarator = assignment-expression iteration-statement: while ( condition ) statement do statement while ( expression ) ; for ( for-init-statement ; condition opt ; expression opt ) statement for-init-statement: expression-statement simple-declaration jump-statement: break ; continue ; return expression opt ; goto identifier ; declaration-statement: block-declaration 10

Context Free Grammars A grammar G = (N, T, S, P) with the set of alphabet V is called context free if and only if all productions in P are of the form A -> B where A is a single nonterminal symbol and B is in V *. The reason this is called context free is that the production A -> B can be applied whenever the symbol A occurs in the string, no matter what else is in the string. Example: The grammar G = ( {S}, {a,b}, S, P ) where P = { S -> ab asb } is context free. The language generated by G is L(G) = { a n b n n >= 1}. 11

Concrete vs. Abstract Syntax Concrete syntax tree Result of using a PL grammar to parse a program is a parse tree Contains every symbol in the input program, and all non-terminals used in the program s derivation Abstract syntax tree (AST) Many symbols in input text are uninteresting (punctuation, such as commas in parameter lists, etc.) AST only contains meaningful information Other simplifications can also be made, e.g., getting rid of syntactic sugar, removing intermediate nonterminals, etc. 12

Ambiguity (1) A grammar is ambiguous if there exists a string which gives rise to more than one parse tree E.g., infix binary operators - ::= - Now parse 1 2-3 As (1-2)-3 As 1-(2-3) '-' '-' '-' '-' 3 1 1 2 2 3 13

Ambiguity (2) E.g., infix binary operators + and * ::= + * == Now parse 1 + 2 * 3 As (1+2)*3 As 1+(2*3) * + + * 3 1 1 2 2 3 14

Resolving Ambiguities 1. Between two calls to the same binary operator Associativity rules left-associative: a op b op c parsed as (a op b) op c right-associative: a op b op c parsed as a op (b op c) By disambiguating the grammar ::= - vs. ::= - 2. Between two calls to different binary operator Precedence rules if op1 has higher-precedence than op2 then a op1 b op2 c => (a op1 b) op2 c if op2 has higher-precedence than op1 then a op1 b op2 c => a op1 (b op2 c) 15

Resolving Ambiguities (Cont.) Rewriting the ambiguous grammar: ::= + * == Let us give * the highest precedence, + the next highest, and == the lowest ::= <sum> { == <sum> } <sum> ::= <term> <sum> + <term> <term> ::= <term> * 16

Dangling-Else Ambiguity Ambiguity in grammar is not a problem occurring only with binary operators For example, <S> ::= if <E> then <S> if <E> then <S> else <S> Now consider the string: if A then if B then X else Y 1. if A then ( if B then X else Y )? 2. if A then ( if B then X ) else Y? 17

Chomsky Hierarchy Four classes of grammars that define particular classes of languages Type 0 Phrase-structure Grammars 1. Regular grammars Type 1 2. Context free grammars Context-Sensitive 3. Context sensitive Type 2 grammars Context-Free 4. Phrase-structure Type 3 (unrestricted) grammars Regular Ordered from less expressive to more expressive (but faster to slower to parse) Regular grammars and CF grammars are of interest in theory of programming languages 18

Regular Grammar Productions are of the form A -> ab or A -> a where A, B are nonterminal symbols and a is a terminal symbol. Can contain S -> λ. Example regular grammar G = ({A, S}, {a, b, c}, S, P), where P consists of the following productions: S -> aa A -> ba ca a G generates the following words aa, aba, aca, abba, abca, acca, abbba, abbca, abcba, The language L(G) in regular expression: a(b+c)*a 19

Regular Languages The following three formalisms all express the same set of (regular) languages: 1. Regular grammars 2. Regular expressions 3. Finite state automata Not very expressive. For example, the language L = { a n b n n >= 1 } is not regular. Question: Can you relate this language L to parsing programming languages? Answer: balancing parentheses 20

Finite State Automata A finite state automaton M=(S, I, f, s 0, F) consists of: a finite set S of states a finite set of input alphabet I a transition function f: SxI -> S that assigns to a given current state and input the next state of the automaton an initial state s 0, and a subset F of S consisting of accepting (or final) states Example: 1. Regular grammar 3. FSA S -> aa A -> ba ca a 2. Regular expression a(b+c)*a S a b A c a F 21

Why a Separate Lexer? Regular languages are not sufficient for expressing the syntax of practical programming languages, so why use them? Simpler (and faster) implementation of the tedious (and potentially slow) character-bycharacter processing: DFA gives a direct implementation strategy Separation of concerns deal with low level issues (tabs, linebreaks, token positions) in isolation: grammars for parsers need not go below token level 22

Summary of the Productions 1. Phrase-structure (unrestricted) grammars A -> B where A is string in V * containing at least one nonterminal symbol, and B is a string in V *. 2. Context sensitive grammars lar -> lwr where A is a nonterminal symbol, and w a nonempty string in V *. Can contain S ->λ if S does not occur on RHS of any production. 3. Context free grammars A -> B where A is a nonterminal symbol. 4. Regular grammars A -> ab or A -> a where A, B are nonterminal symbols and a is a terminal symbol. Can contain S -> λ. 23