Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010


Last time: what are compilers? Phases of a compiler:
Source code → Scanner → Tokens → Parser → Syntax tree → Semantic Routines → IR → Optimizer → IR → Code Generation → Executable

Extra: Front-end vs. Back-end
Scanner + Parser + Semantic actions + (high-level) optimizations are called the front-end of a compiler.
IR-level optimizations and code generation (instruction selection, scheduling, register allocation) are called the back-end of a compiler.
Can build multiple front-ends for a particular back-end, e.g., gcc & g++, or the many front-ends that generate CIL.
Can build multiple back-ends for a particular front-end, e.g., gcc allows targeting different architectures.
Source code → Scanner → Tokens → Parser → Syntax tree → Semantic Routines (front end) → IR → Optimizer → IR → Code Generation → Executable (back end)

The MICRO compiler: a simple example

High-level structure
Single-pass compiler: no intermediate representation.
Scanner tokenizes the input stream, but is called by the parser on demand.
As the parser recognizes constructs, it invokes semantic routines.
Semantic routines build the symbol table on the fly and directly generate code for a 3-address virtual machine.

The Micro language
Tokens are defined by regular expressions.
Tokens: BEGIN, END, READ, WRITE, ID, LITERAL, LPAREN, RPAREN, SEMICOLON, COMMA, ASSIGN_OP, PLUS_OP, MINUS_OP, SCANEOF
Implicit identifier declaration (no need to predeclare variables): ID = [A-Z][A-Z0-9]*
Literals (numbers): LITERAL = [0-9]+
Comments (not passed on as tokens): --(Not(\n))*\n
Program: BEGIN {statements} END

The Micro language
One data type: all IDs are integers.
Statement: ID := EXPR
Expressions are simple arithmetic expressions which can contain identifiers. Note: no unary minus.
Input/output: READ(ID, ID, ...) and WRITE(EXPR, EXPR, ...)

Scanner
Scanner algorithm provided in the book (pages 28-29).
What the scanner can identify corresponds to what the finite automaton for a regular expression can accept.
Identifies the next token in the input stream:
Read a token (process the finite automaton until an accept state is found).
Identify its type (determine which accept state the FA is in).
Return type and value (e.g., type = LITERAL, value = 5).

Recognizing tokens
Skip spaces.
If the first non-space character is:
  letter: read until non-alphanumeric. Check for reserved words ("begin", "end"). Return the reserved word, or ID and the variable name.
  digit: read until non-digit. Return LITERAL and the number.
  ( ) ; , + : return the single character.
  : : the next character must be =. Return ASSIGN_OP.
  - : if the next character is also -, skip to the end of the line; otherwise return MINUS_OP.
Unget the next character that had to be read to find the end of IDs, reserved words, literals, and minus ops.
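The recognition rules above can be sketched as a short Python function. This is a hypothetical illustration, not the book's scanner; the names `scan` and `RESERVED` are my own, and the "unget" is implicit in how the index stops at the first non-matching character.

```python
RESERVED = {"BEGIN", "END", "READ", "WRITE"}

def scan(text):
    """Return a list of (token_type, value) pairs for a Micro source string."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isalpha():                       # ID or reserved word
            j = i
            while j < n and text[j].isalnum():  # read until non-alphanumeric
                j += 1                          # ("unget": j stops one early)
            word = text[i:j]
            tokens.append((word, word) if word in RESERVED else ("ID", word))
            i = j
        elif c.isdigit():                       # LITERAL: read until non-digit
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(("LITERAL", text[i:j]))
            i = j
        elif c == ":":                          # next character must be '='
            if i + 1 >= n or text[i + 1] != "=":
                raise ValueError("expected '=' after ':'")
            tokens.append(("ASSIGN_OP", ":="))
            i += 2
        elif c == "-":
            if i + 1 < n and text[i + 1] == "-":   # comment: skip to end of line
                while i < n and text[i] != "\n":
                    i += 1
            else:
                tokens.append(("MINUS_OP", "-"))
                i += 1
        elif c in "();,+":
            kind = {"(": "LPAREN", ")": "RPAREN", ";": "SEMICOLON",
                    ",": "COMMA", "+": "PLUS_OP"}[c]
            tokens.append((kind, c))
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r}")
    tokens.append(("SCANEOF", ""))
    return tokens
```

For example, `scan("BEGIN A := 5 + B; END")` yields BEGIN, ID, ASSIGN_OP, LITERAL, PLUS_OP, ID, SEMICOLON, END, SCANEOF, and comments introduced by `--` are dropped rather than returned as tokens.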

Parsers and Grammars
Language syntax is usually specified with context-free grammars (CFGs); Backus-Naur form (BNF) is the standard notation.
Written as a set of rewrite rules: Non-terminal ::= (set of terminals and non-terminals)
Terminals are the set of tokens.
Each rule tells how to compose a non-terminal from other non-terminals and terminals.

Micro grammar
program ::= BEGIN statement_list END
statement_list ::= statement { statement }
statement ::= ID := expression ; | READ ( id_list ) ; | WRITE ( expr_list ) ;
id_list ::= ID { , ID }
expr_list ::= expression { , expression }
expression ::= primary { add_op primary }
primary ::= ( expression ) | ID | LITERAL
add_op ::= PLUSOP | MINUSOP
system_goal ::= program SCANEOF

Relating the CFG to a program
CFGs can produce a program by applying a sequence of productions. How do we produce BEGIN id := id + id; END? Rewrite by starting with the goal production and replacing non-terminals with the rule's RHS:

program SCANEOF                          (replace program)
BEGIN statement_list END                 (replace statement_list)
BEGIN statement; {statement} END         (drop {statement})
BEGIN statement; END                     (replace statement)
BEGIN ID := expression; END              (replace expression)
BEGIN ID := primary add_op primary; END  (replace 1st primary)
BEGIN ID := ID add_op primary; END       (replace add_op)
BEGIN ID := ID + primary; END            (replace primary)
BEGIN ID := ID + ID; END

How do we go in reverse? How do we parse a program given a CFG?
Start at the goal term, rewriting productions from left to right.
If the next symbol is a terminal, make sure it matches the input token; otherwise, there is a syntax error.
If it is a non-terminal:
If there is a single choice for a production, pick it.
If there are multiple choices, choose the production that matches the next token(s) in the stream (e.g., when parsing statement, we could use the production for ID, READ, or WRITE).
Note that this means we have to look ahead in the stream to match tokens!

Quiz: how much lookahead?
program ::= BEGIN statement_list END
statement_list ::= statement { statement }
statement ::= ID := expression ; | READ ( id_list ) ; | WRITE ( expr_list ) ;
id_list ::= ID { , ID }
expr_list ::= expression { , expression }
expression ::= primary { add_op primary }
primary ::= ( expression ) | ID | LITERAL
add_op ::= PLUSOP | MINUSOP
system_goal ::= program SCANEOF

Recursive descent parsing
Idea: parse using a set of mutually recursive functions, one function per non-terminal.
Each function attempts to match any terminals in its production.
If a rule produces non-terminals, call the appropriate function.

statement ::= ID := expression ; | READ ( id_list ) ; | WRITE ( expr_list ) ;

statement() {
  token = peek_at_match();
  switch (token) {
    case ID:
      match(id);        // consume ID
      match(assign);    // consume :=
      expression();     // process non-terminal
      break;
    case READ:
      match(read);      // consume READ
      match(lparen);    // match (
      id_list();        // process non-terminal
      match(rparen);    // match )
      break;
    case WRITE:
      match(write);     // consume WRITE
      match(lparen);    // match (
      expr_list();      // process non-terminal
      match(rparen);    // match )
      break;
  }
  match(semicolon);
}

Recursive descent parsing (II)
How do we parse id_list ::= ID { , ID } ?
One idea: this is equivalent to id_list ::= ID | ID , id_list, which gives the recursion on the left; that in turn is equivalent to the loop on the right:

id_list() {                            id_list() {
  match(id);     // consume ID           match(id);     // consume ID
  if (peek_at_match() == COMMA) {        while (peek_at_match() == COMMA) {
    match(comma);                          match(comma);
    id_list();                             match(id);
  }                                      }
}                                      }

Note: in both cases, if peek_at_match() isn't COMMA, we don't consume the next token!
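A runnable sketch of both versions over a simple token list. This is my own illustration: a position index stands in for the token stream, so both functions can return the position of the first unconsumed token.

```python
# Both forms of id_list() over tokens like ["ID", "COMMA", "ID", ...].
# Each returns the index of the first token it did NOT consume.

def id_list_recursive(tokens, pos=0):
    assert tokens[pos] == "ID"; pos += 1       # match(id)
    if pos < len(tokens) and tokens[pos] == "COMMA":
        pos += 1                               # match(comma)
        pos = id_list_recursive(tokens, pos)   # recurse for the rest of the list
    return pos

def id_list_loop(tokens, pos=0):
    assert tokens[pos] == "ID"; pos += 1       # match(id)
    while pos < len(tokens) and tokens[pos] == "COMMA":
        pos += 1                               # match(comma)
        assert tokens[pos] == "ID"; pos += 1   # match(id)
    return pos
```

Both versions stop at the same position and leave the following token (e.g., a closing RPAREN) unconsumed, matching the note above.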

General rules
One function per non-terminal.
Non-terminals with multiple choices (like statement) use case or if statements to distinguish productions; the conditional is based on the first set of the non-terminal, the terminals that can distinguish between productions.
An optional list (like id_list) uses a while loop; the condition for continuing the loop is whether the next terminal(s) are in the first set.
Full parser code is in the textbook on pages 36-38.

Semantic processing
Want to generate code for a 3-address machine: OP A, B, C performs A op B → C.
Temporary variables may be created to convert more complex expressions into three-address code. Naming scheme: Temp&1, Temp&2, etc.

D = A + B * C

MULT C, B, Temp&1
ADD A, Temp&1, Temp&2
STORE Temp&2, D
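A small sketch of how temporaries could be allocated while translating D = A + B * C into three-address code. The `CodeGen` class and its method names are my own invention; only the instruction format and the Temp&n naming scheme come from the slides.

```python
class CodeGen:
    """Emit three-address instructions, allocating Temp&n names on demand."""
    def __init__(self):
        self.temp_count = 0
        self.code = []

    def new_temp(self):
        self.temp_count += 1
        return f"Temp&{self.temp_count}"

    def gen_infix(self, op, left, right):
        # Emit "OP left, right, temp" and return the temp holding the result.
        t = self.new_temp()
        self.code.append(f"{op} {left}, {right}, {t}")
        return t

    def assign(self, target, source):
        self.code.append(f"STORE {source}, {target}")

g = CodeGen()
t1 = g.gen_infix("MULT", "C", "B")  # B * C is generated first (higher precedence)
t2 = g.gen_infix("ADD", "A", t1)
g.assign("D", t2)
```

After running, `g.code` holds exactly the three instructions shown above: MULT into Temp&1, ADD into Temp&2, then STORE into D.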

Semantic action routines
To produce code, we call routines during parsing that generate three-address code. These action routines do one of two things:
Collect information about passed symbols for use by other semantic action routines. This information is stored in semantic records.
Generate code using information from semantic records and the current parse procedure.
Note: for this to work correctly, we must parse expressions according to order of operations (i.e., must parse a * expression before a + expression).

Operator Precedence
Operator precedence can be specified in the CFG: the CFG can determine the order in which expressions are parsed. For example:

expr ::= factor { + factor }
factor ::= primary { * primary }
primary ::= ( expr ) | ID | LITERAL

Because +-expressions are composed of *-expressions, we will finish dealing with the * production before we finish with the + production.
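To see why the layering enforces precedence, here is a tiny recursive-descent evaluator following the same grammar shape. It evaluates values rather than emitting code, purely to keep the sketch small; the function names mirror the non-terminals, and tokens are a plain list consumed left to right.

```python
def expr(toks):
    # expr ::= factor { + factor }
    value = factor(toks)
    while toks and toks[0] == "+":
        toks.pop(0)
        value += factor(toks)
    return value

def factor(toks):
    # factor ::= primary { * primary }
    value = primary(toks)
    while toks and toks[0] == "*":
        toks.pop(0)
        value *= primary(toks)
    return value

def primary(toks):
    # primary ::= ( expr ) | LITERAL
    if toks[0] == "(":
        toks.pop(0)
        value = expr(toks)
        if toks.pop(0) != ")":
            raise ValueError("expected ')'")
        return value
    return int(toks.pop(0))
```

Because expr only ever adds the results of complete factor calls, `2 + 3 * 4` evaluates the `*` first and parenthesized input overrides that via primary.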

Example
Annotations are inserted into the grammar, specifying when semantic routines should be called. Consider A = B + 2;

statement ::= ID = expr ; #assign
expr ::= term + term #addop
term ::= ID #id | LITERAL #num

num() and id() create semantic records containing ID names and number values.
addop() generates code for the expression, using information from the num() and id() records, and creates a temporary variable.
assign() generates code for the assignment using the temporary variable generated by addop().
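The firing order of those routines for A = B + 2 can be sketched as follows. This is a hypothetical illustration: the function and record names are mine, not the book's, and the "parse" is simulated by calling the routines in the order the annotated grammar would trigger them.

```python
code = []        # emitted three-address instructions
temp_count = 0

def new_temp():
    global temp_count
    temp_count += 1
    return f"Temp&{temp_count}"

def id_rec(name):       # #id: build a semantic record for an identifier
    return ("id", name)

def num_rec(value):     # #num: build a semantic record for a literal
    return ("num", value)

def addop(left, right): # #addop: emit ADD, return a record for the new temp
    t = new_temp()
    code.append(f"ADD {left[1]}, {right[1]}, {t}")
    return ("temp", t)

def assign(target, source):  # #assign: store the expression result
    code.append(f"STORE {source[1]}, {target[1]}")

# Parsing "A = B + 2" triggers the routines inside-out:
assign(id_rec("A"), addop(id_rec("B"), num_rec("2")))
```

The inner addop() runs before the outer assign(), so the ADD into Temp&1 is emitted first and the STORE uses the temporary it created.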

More in the book
Complete set of routines for the recursive descent parser (pages 36-38)
Semantic records for Micro (page 41, figure 2.8)
Complete annotated grammar for Micro (page 42, figure 2.9)
Semantic routines for Micro (pages 43-45)

Next time: scanners
How to specify the tokens for a language
How to construct a scanner
How to use a scanner generator

Backup slides

Annotated Micro Grammar (fig. 2.9)
Program ::= #start BEGIN Statement-list END
Statement-list ::= Statement { Statement }
Statement ::= ID := Expression ; #assign | READ ( Id-list ) ; | WRITE ( Expr-list ) ;
Id-list ::= Ident #read_id { , Ident #read_id }
Expr-list ::= Expression #write_expr { , Expression #write_expr }
Expression ::= Primary { Add-op Primary #gen_infix }
Primary ::= ( Expression ) | Ident | INTLITERAL #process_literal
Ident ::= ID #process_id
Add-op ::= PLUSOP #process_op | MINUSOP #process_op
System-goal ::= Program SCANEOF #finish

Annotated Micro Grammar
Program ::= #start BEGIN Statement-list END
Semantic routines in Chap. 2 print information about what the parser has recognized. At #start, nothing has been recognized, so this takes no action.
End of parse is recognized by the final production:
System-goal ::= Program SCANEOF #finish
In a production compiler, the #start routine might set up program initialization code (i.e., initialization of heap storage and static storage, initialization of static values, etc.).

Annotated Micro Grammar
Statement-list ::= Statement { Statement }
No semantic actions are associated with this production, because the necessary semantic actions are performed when each statement is recognized.

Annotated Micro Grammar
Statement ::= ID := Expression ; #assign | READ ( Id-list ) ; | WRITE ( Expr-list ) ;
Expr-list ::= Expression #write_expr { , Expression #write_expr }
Expression ::= Primary { Add-op Primary #gen_infix }
Primary ::= ( Expression ) | Ident | INTLITERAL #process_literal
Different semantic actions are used when the parser finds an expression. In Expr-list, it is handled with write_expr, whereas in Primary we choose to do nothing, but could express a different semantic action if there were a reason to do so. Different productions, or rules of the grammar, are reached in different ways, and we can tailor the semantic actions (and the grammar) appropriately.

Annotated Micro Grammar
Statement ::= Ident := Expression ; #assign | READ ( Id-list ) ; | WRITE ( Expr-list ) ;
Id-list ::= Ident #read_id { , Ident #read_id }
Ident ::= ID #process_id
Note that in the grammar of Fig. 2.4 there is no Ident nonterminal. By adding a nonterminal Ident, a placeholder is created to take semantic actions as the nonterminal is processed. The programs look syntactically the same, but the additional productions allow the semantics to be richer. Semantic actions create a semantic record for the ID and thereby create something for read_id to work with.