Syntax Intro and Overview. Syntax

Similar documents
Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Semantic Analysis. Role of Semantic Analysis

High Level Languages. Java (Object Oriented) This Course. Jython in Java. Relation. ASP RDF (Horn Clause Deduction, Semantic Web) Dr.

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

22c:111 Programming Language Concepts. Fall Syntax III

CPS 506 Comparative Programming Languages. Syntax Specification

CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages!

Syntax. In Text: Chapter 3

Habanero Extreme Scale Software Research Project

Syntax and Grammars 1 / 21

3. Context-free grammars & parsing

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Principles of Programming Languages COMP251: Syntax and Grammars

EECS 6083 Intro to Parsing Context Free Grammars

Syntax. A. Bellaachia Page: 1

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Chapter 3. Describing Syntax and Semantics ISBN

Principles of Programming Languages COMP251: Syntax and Grammars

CSCE 314 Programming Languages

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Part 5 Program Analysis Principles and Techniques

Dr. D.M. Akbar Hussain

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE 3302 Programming Languages Lecture 2: Syntax

Lexical and Syntax Analysis

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

CS 230 Programming Languages

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

P L. rogramming anguages. Fall COS 301 Programming Languages. Syntax & Semantics. UMaine School of Computing and Information Science

Syntax & Semantics. COS 301 Programming Languages

Using an LALR(1) Parser Generator

Week 2: Syntax Specification, Grammars

CSE302: Compiler Design

Describing Syntax and Semantics

Related Course Objec6ves

1 Lexical Considerations

Introduction to Lexing and Parsing

Chapter 3. Describing Syntax and Semantics

This book is licensed under a Creative Commons Attribution 3.0 License

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Lexical and Syntax Analysis. Top-Down Parsing

COP 3402 Systems Software Syntax Analysis (Parser)

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

More Assigned Reading and Exercises on Syntax (for Exam 2)

Context-free grammars (CFG s)

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

A simple syntax-directed

Chapter 3. Describing Syntax and Semantics ISBN

4. LEXICAL AND SYNTAX ANALYSIS

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Lexical Considerations

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Programming Language Syntax and Analysis

Chapter 3. Describing Syntax and Semantics

ICOM 4036 Spring 2004

BNF, EBNF Regular Expressions. Programming Languages,

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Building Compilers with Phoenix

Programming Lecture 3

Programming Languages 2nd edition Tucker and Noonan"

Syntax-Directed Translation. Lecture 14

CSE 401 Midterm Exam Sample Solution 2/11/15

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Chapter 3. Describing Syntax and Semantics

ECE251 Midterm practice questions, Fall 2010

Lexical Considerations

Decaf Language Reference Manual

The SPL Programming Language Reference Manual

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Chapter 4. Lexical and Syntax Analysis

CS 314 Principles of Programming Languages

Lexical and Syntax Analysis

CSC312 Principles of Programming Languages : Type System. Copyright 2006 The McGraw-Hill Companies, Inc.

Syntax and Semantics

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Introduction to Parsing. Lecture 8

Informatica 3 Syntax and Semantics

Syntax. 2.1 Terminology

Transcription:

Syntax Intro and Overview CS331 Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will be ambiguity in a programming language We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Concrete Actual representation scheme down to every semicolon, i.e. every lexical token Abstract Description of a program s information without worrying about specific details such as where the parentheses or semicolons go 1

BNF Grammar BNF = Backus-Naur Form to specify a grammar Equivalent to a context free grammar Set of rewriting rules (a rule that can be applied multiple times) defined on a set of nonterminal symbols, a set of terminal symbols, and a start symbol Terminals, : Basic alphabet from which programs are constructed. E.g., letters, digits, or keywords such as int, main, {, } Nonterminals, N : Identify grammatical categories Start Symbol: One of the nonterminals which identifies the principal category. E.g., Sentence for english, Program for a programming language Rewriting Rules Rewriting Rules, ρ Written using the symbols and is a separator for alternative definitions, i.e. OR is used to define a rule, i.e. IS Format LHS RHS1 RHS2 RHS3 LHS is a single nonterminal RHS is any sequence of terminals and nonterminals 2

Sample Grammars Grammar for subset of English Sentence Noun Verb Noun Jack Jill Verb eats bites Grammar for a digit Digit 0 1 2 3 4 5 6 7 8 9 Grammar for signed integers SignedInteger Sign Integer Sign + - Integer Digit Digit Integer Grammar for subset of Java Assignment Variable = Expression Expression Variable Variable + Variable Variable Variable Variable X Y Derivation Process of parsing data using a grammar Apply rewrite rules to non-terminals on the RHS of an existing rule To match, the derivation must terminate and be composed of terminals only Example Digit 0 1 2 3 4 5 6 7 8 9 Integer Digit Digit Integer Is 352 an Integer? Integer Digit Integer 3 Integer 3 Digit Integer 3 5 Integer 3 5 Digit 3 5 2 Intermediate formats are called sentential forms This was called a Leftmost Derivation since we replaced the leftmost nonterminal symbol each time (could also do Rightmost) 3

Derivation and Parse Trees The derivation can be visualized as a parse tree Integer Digit Integer 3 Digit Integer 5 Digit 2 Parse Tree Sketch for Programs 4

BNF and Languages The language defined by a BNF grammar is the set of all strings that can be derived Language can be infinite, e.g. case of integers A language is ambiguous if it permits a string to be parsed into two separate parse trees Generally want to avoid ambiguous grammars Example: Expr Integer Expr + Expr Expr * Expr Expr - Expr Parse: 3*4+1 Expr * Expr Integer * Expr 3 * Expr 3 * Expr+Expr 3 * 4 + 1 Expr + Expr Expr + Integer Expr + 1 Expr * Expr +1 3 * 4 + 1 Ambiguity Example for AmbExp Integer AmbExp AmbExp 2-3-4 5

Ambiguous IF Statement Dangling ELSE: if (x<0) if (y<0) { y=y-1 } else { y=0 }; Does the else go with the first or second if? Dangling Else Ambiguity 6

How to fix ambiguity? Use explicit grammar without ambiguity E.g., add an ENDIF for every IF Java makes a separate category for if-else vs. if: IfThenStatement If (Expr) Statement IfThenElseStatement If (Expr) StatementNoShortIf else Statement StatementNoShortIf contains everything except IfThenStatement, so the else always goes with the IfThenElse statement not the IfThenStatement Use precedence on symbols Alternative to BNF The use of regular expressions is an alternate way to express a language 7

Regex to EBNF The book uses some deviations from standard regular expressions in Extended Backus Naur Format (defined in a few slides) { M } means zero or more occurrences of M ( M N) means one of M or N must be chosen [ M ] means M is optional Use { to mean the literal { not the regex { RegEx Examples Booleans true false Integers (0-9)+ Identifiers (a-za-z){a-za-z0-9} Comments (letters/space only) // {a-za-z }( \r \n \r\n ) Regular expressions seem pretty powerful Can you write one for the language a n b n? (i.e. n a s followed by n b s) 8

Extended BNF EBNF variation of BNF that simplifies specification of recursive rules using regular expressions in the RHS of the rule Example: BNF rule Expr Term Expr + Term Expr Term Term Factor Term * Factor Term / Factor EBNF equivalent Expr Term { [+ -] Term } Term Factor { [* / ] Factor } EBNF tends to be shorter and easier to read EBNF Consider: Expr Term{ (+ -) Term } Term Factor { (* / ) Factor } Factor Identifier Literal (Expr) Parse for X+2*Y 9

BNF and Lexical Analysis Lexicon of a programming language set of all nonterminals from which programs are written Nonterminals referred to as tokens Each token is described by its type (e.g. identifier, expression) and its value (the string it represents) Skipping whitespace or comments or punctuation Categories of Lexical Tokens Identifiers Literals Includes Integers, true, false, floats, chars Keywords bool char else false float if int main true while Operators = && ==!= < <= > >= + - * / %! [ ] Punctuation ;. { } ( ) Issues to consider: Ignoring comments, role of whitespace, distinguising the < operator from <=, distinguishing identifiers from keywords like if 10

A Simple Lexical Syntax for a Small Language, Clite Primary Identifier [ "["Expression"]" ] Literal "("Expression")" Type "("Expression")" Identifier Letter { Letter Digit } Letter a b z A B Z Digit 0 1 2 9 Literal Integer Boolean Float Char Integer Digit { Digit } Boolean true false Float Integer. Integer Char ASCIICHAR Major Stages in Compilation Lexical Analysis Translates source into a stream of Tokens, everything else discarded Syntactic Analysis Parses tokens, detects syntax errors, develops abstract representation Semantic Analysis Analyze the parse for semantic consistency, transform into a format the architecture can efficiently run on Code Generation Use results of abstract representation as a basis for generating executable machine code 11

Lexical Analysis & Compiling Process Difficulties: 1 to many mapping from HL source to machine code Translation must be correct Translation should be efficient Lexical Analysis of Clite Lexical Analysis transforms a program into tokens (type, value). The rest is tossed. Example Clite program: // Simple Program int main() { int x; x = 3; } Result of Lexical Analysis: 12

Lexical Analysis (2) Result of Lexical Analysis: 1 Type: Int Value: int 2 Type: Main Value: main 3 Type: LeftParen Value: ( 4 Type: RightParen Value: ) 5 Type: LeftBrace Value: { 6 Type: Int Value: int 7 Type: Identifier Value: x 8 Type: Semicolon Value: ; 9 Type: Identifier Value: x 10 Type: Assign Value: = 11 Type: IntLiteral Value: 3 12 Type: Semicolon Value: ; 13 Type: RightBrace Value: } 14 Type: Eof Value: <<EOF>> // Simple Program int main() { int x; x = 3; } Lexical Analysis of Clite in Java public class TokenTester { public static void main (String[] args) { Lexer lex = new Lexer (args[0]); Token t; int i = 1; } do { t = lex.next(); System.out.println(i+" Type: "+t.type() +"\tvalue: "+t.value()); i++; } while (t!= Token.eofTok); } The source code for how the Lexer and Token classes are arranged is the topic of chapter 3 13

Lexical to Concrete From the stream of tokens generated by our lexical analyzer we can now parse them using a concrete syntax Concrete EBNF Syntax for Clite Program int main ( ) { Declarations Statements } Declarations { Declaration } Declaration Type Identifier [ "["Integer"]" ] {, Identifier ["["Integer"]"] }; Type int bool float char Statements { Statement } Statement ; Block Assignment IfStatement WhileStatement Block { Statements } Assignment Identifier ["["Expression"]" ] = Expression ; IfStatement if "(" Expression ")" Statement [ else Statement ] WhileStatement while "("Expression")" Statement Concrete Syntax; Higher than lexical syntax! 14

Concrete EBNF Syntax for Clite Expression Conjunction { Conjunction } Conjunction Equality { && Equality } Equality Relation [ EquOp Relation ] EquOp ==!= Relation Addition [ RelOp Addition ] RelOp < <= > >= Addition Term { AddOp Term } AddOp + - Term Factor { MulOp Factor } MulOp * / % Factor [ UnaryOp ] Primary UnaryOp -! References lexical syntax Primary Identifier [ "["Expression"]" ] Literal "("Expression")" Type "(" Expression ")" Syntax Diagram Alternate way to specify a language Popularized with Pascal Not any more powerful than BNF, EBNF, or regular expressions 15

Linking Syntax and Semantics What we ve described so far has been concrete syntax Defines all parts of the language above the lexical level Assignments, loops, functions, definitions, etc. Uses BNF or variant to describe the language An abstract syntax links the concrete syntax to the semantic level Abstract Syntax Defines essential syntactic elements without describing how they are concretely constructed Consider the following Pascal and C loops Pascal C while i<n do begin while (i<n) { i:=i+1 i=i+1; end } Small differences in concrete syntax; identical abstract construct 16

Abstract Syntax Format Defined using rules of the form LHS = RHS LHS is the name of an abstract syntactic class RHS is a list of essential components that define the class Similar to defining a variable. Data type or abstract syntactic class, and name Components are separated by ; Recursion naturally occurs among the definitions as with BNF Abstract Syntax Example Loop Loop = Expression test ; Statement body The abstract class Loop has two components, a test which is a member of the abstract class Expression, and a body which is a member of an abstract class Statement Nice by-product: If parsing abstract syntax in Java, it makes sense to actually define a class for each abstract syntactic class, e.g. class Loop extends Statement { Expression test; Statement body; } 17

Abstract Syntax of Clite Program = Declarations decpart; Statements body; Declarations = Declaration* Declaration = VariableDecl ArrayDecl VariableDecl = Variable v; Type t ArrayDecl = Variable v; Type t; Integer size Type = int bool float char Statements = Statement* Statement = Skip Block Assignment Conditional Loop Skip = Block = Statements Conditional = Expression test; Statement thenbranch, elsebranch Loop = Expression test; Statement body Assignment = VariableRef target; Expression source Expression = VariableRef Value Binary Unary Abstract Syntax of Clite VariableRef = Variable ArrayRef Binary = Operator op; Expression term1, term2 Unary = UnaryOp op; Expression term Operator = BooleanOp RelationalOp ArithmeticOp BooleanOp = && RelationalOp = =!!= < <= > >= ArithmeticOp = + - * / UnaryOp =! - Variable = String id ArrayRef = String id; Expression index Value = IntValue BoolValue FloatValue CharValue IntValue = Integer intvalue FloatValue = Float floatvalue BoolValue = Boolean boolvalue CharValue = Character charvalue 18

Java AbstractSyntax for Clite class Loop extends Statement { Expression test; Statement body; } Class Assignment extends Statement { // Assignment = Variable target; Expression source Variable target; Expression source; } Much more see the file (when available) Abstract Syntax Tree Just as we can build a parse tree from a BNF grammar, we can build an abstract syntax tree from an abstract syntax Example for: x+2*y Expression = Variable Value Binary Binary = Operator op ; Expression term1, term2 Binary node Expr 19

Sample Clite Program Compute nth fib number Abstract Syntax for Loop of Clite Program 20

Concrete and Abstract Syntax Aren t the two redundant? A little bit The concrete syntax tells the programmer exactly what to write to have a valid program The abstract syntax allows valid programs in two different languages to share common abstract representations It is closer to semantics We need both! What s coming up? Semantic analysis Do the types match? What does this mean? char a= c ; int sum=0; sum = sum = a; Can associate machine code with the abstract parse Code generation Code optimization 21