Introduction
Compiler Design, CSE 504

1 Overview
2 Syntax-Directed Translation
3 Phases of Translation

Last modified: Mon Jan 25 2016 at 00:15:02 EST
Version: 1.5 23:45:54 2013/01/28
Compiled at 12:59 on 2016/01/25

What is a Compiler?
Programming problems are easier to solve in high-level languages.
  High-level languages are closer to the problem domain.
  E.g. Java, Python, SQL, Tcl/Tk, ...
Solutions have to be executed by a machine.
  Instructions to a machine are specified in a language that reflects the cycle-by-cycle working of a processor.
Compilers are the bridges: software that translates programs written in high-level languages to efficient executable code.

An Example
Source program:

    int gcd(int m, int n) {
        if (m == 0) return n;
        else if (m > n) return gcd(n, m);
        else return gcd(n%m, m);
    }

Generated code:

    _gcd:
    LFB2:
        pushq %rbp
    LCFI0:
        movq  %rsp, %rbp
    LCFI1:
        movl  %edi, %ecx
        movl  %esi, %edi
        testl %ecx, %ecx
        jne   L11
        jmp   L3
        .align 4,0x90
    L13:
        movl  %edx, %ecx
    L11:
        movl  %edi, %edx
        cmpl  %edi, %ecx
        jg    L6
        movl  %edi, %eax
        sarl  $31, %edx
        idivl %ecx
    L6:
        movl  %ecx, %edi
        testl %edx, %edx
        jne   L13
    L3:
        movl  %edi, %eax
        leave
        ret

Requirements
In order to translate statements in a language, one needs to understand both
  the structure of the language: the way sentences are constructed in the language, and
  the meaning of the language: what each sentence stands for.
Terminology:
  Structure is called Syntax.
  Meaning is called Semantics.

Translation Strategy
A classic software engineering problem.
Objective: Translate a program in a high-level language into efficient executable code.
Strategy: Divide the translation process into a series of phases.
  Each phase manages some particular aspect of translation.
  Interfaces between phases are governed by specific intermediate forms.

Translation Process
Source Program → Syntax Analysis → Abstract Syntax Tree → Semantic Processing → Target Program

Translation Steps
1. Syntax Analysis Phase: Recognizes sentences in the program using the syntax of the language.
2. Semantic Analysis Phase: Infers information about the program using the semantics of the language.
3. Intermediate Code Generation Phase: Generates abstract code based on the syntactic structure of the program and the semantic information from Phase 2.
4. Optimization Phase: Refines the generated code using a series of optimizing transformations.
5. Final Code Generation Phase: Translates the abstract intermediate code into specific machine instructions.

Structure of a Compiler: an Analogy
Syntax-Directed Translation: the structure (syntax) of a sentence in a language is used to give it a meaning (semantics).
Example sentences:
  Bawat tao'y isinilang na may laya at magkakapantay ang taglay na dangal at karapatan.
  Green wire connect after first not cut white also red wire.
  He sailed the coffee out of the leaf.
  This sentence has four words.

Syntax
Defining and recognizing sentences in a language.
Layered approach:
  Alphabet: defines allowed symbols.
  Lexical Structure: defines allowed words.
  Syntactic Structure: defines allowed sentences.
We will later associate meaning with sentences (semantics) based on their syntactic structure.

Formal Language Specification
Solid theoretical results applied to a practical problem.
Declarative vs. Operational Notations:
  Declarative notation is used to define a language.
    Defines precisely the set of allowed objects (words/sentences).
    Examples: regular expressions, grammars.
  Operational notation is used to recognize statements in a language.
    Defines an algorithm for determining whether or not a given word/sentence is in the language.
    Example: automata.
Theory provides results on converting between the two notations.

Formal Languages
A language is a set of strings over a set of symbols.
The set of symbols of a language is called its alphabet (usually denoted by Σ).
Each string in the language is called a sentence.
Parts of sentences are called phrases.

Context-Free Grammars
A well-studied notation for defining formal languages.
A Context-Free Grammar (CFG, or grammar unless otherwise qualified) is defined over an alphabet, called terminal symbols.
A CFG is defined by a set of productions.
Each production is of the form
  X → β
where
  X is a single non-terminal symbol representing a set of phrases in the language, and
  β is a sequence of terminal and non-terminal symbols.
Example: Stmt → while Expr do Stmt
A unique non-terminal, called the start symbol, represents the set of all sentences in the language.
The language defined by a grammar G is denoted by L(G).

Example Grammar
List of digits separated by + and - signs (Example 2.1 in book):
  L → L + L
  L → L - L
  L → D
  D → 0 | 1 | ... | 9
Derivation of 9-5+2 from L:
  L ⇒ L - L
    ⇒ D - L
    ⇒ 9 - L
    ⇒ 9 - L + L
    ⇒ 9 - D + L
    ⇒ 9 - 5 + L
    ⇒ 9 - 5 + D
    ⇒ 9 - 5 + 2
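The derivation above always rewrites the leftmost non-terminal. A minimal sketch of that process follows, assuming a dictionary encoding of the grammar and a hypothetical helper `derive_leftmost` that takes the sequence of production choices as input (it replays a derivation, it does not search for one).

```python
# Grammar from the slide; the dictionary encoding is an assumption of this sketch.
GRAMMAR = {
    "L": [["L", "+", "L"], ["L", "-", "L"], ["D"]],
    "D": [[str(d)] for d in range(10)],
}


def derive_leftmost(start, choices):
    """Replay a leftmost derivation.

    `choices` is a list of (non_terminal, production_index) pairs; at each step
    the leftmost non-terminal in the sentential form is replaced.
    """
    form = [start]
    steps = [" ".join(form)]
    for nonterminal, index in choices:
        # Position of the leftmost non-terminal in the current sentential form.
        pos = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        if form[pos] != nonterminal:
            raise ValueError(f"leftmost non-terminal is {form[pos]}, not {nonterminal}")
        form[pos:pos + 1] = GRAMMAR[nonterminal][index]
        steps.append(" ".join(form))
    return steps


# Production choices matching the derivation of 9-5+2 on the slide.
CHOICES = [
    ("L", 1), ("L", 2), ("D", 9),   # L => L - L => D - L => 9 - L
    ("L", 0), ("L", 2), ("D", 5),   # => 9 - L + L => 9 - D + L => 9 - 5 + L
    ("L", 2), ("D", 2),             # => 9 - 5 + D => 9 - 5 + 2
]

for step in derive_leftmost("L", CHOICES):
    print(step)
```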

Parse Trees
Pictorial representation of derivations.
Leftmost derivation:
  L ⇒ L - L ⇒ D - L ⇒ 9 - L ⇒ 9 - L + L ⇒ 9 - D + L ⇒ 9 - 5 + L ⇒ 9 - 5 + D ⇒ 9 - 5 + 2
Parse tree (in bracket notation, each node followed by its children):
  [L [L [D 9]] - [L [L [D 5]] + [L [D 2]]]]
Rightmost derivation with the same parse tree:
  L ⇒ L - L ⇒ L - L + L ⇒ L - L + D ⇒ L - L + 2 ⇒ L - D + 2 ⇒ L - 5 + 2 ⇒ D - 5 + 2 ⇒ 9 - 5 + 2
Note: one parse tree may correspond to multiple derivations!

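Ambiguity
A grammar is ambiguous if some sentence in the language has more than one parse tree.
Two parse trees for 9-5+2 (in bracket notation):
  First tree, grouping as 9-(5+2):   [L [L [D 9]] - [L [L [D 5]] + [L [D 2]]]]
  Second tree, grouping as (9-5)+2:  [L [L [L [D 9]] - [L [D 5]]] + [L [D 2]]]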
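One way to see the ambiguity concretely is to count the parse trees of a sentence. The sketch below does this for the grammar L → L + L | L - L | D by trying every operator as the root of the tree; the function name `count_parse_trees` and the single-character token representation are assumptions of this example.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under L -> L + L | L - L | D, D -> 0..9."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from L.
        if hi - lo == 1:
            token = tokens[lo]
            return 1 if len(token) == 1 and token.isdigit() else 0
        total = 0
        # Try each operator strictly inside the span as the root production.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "-"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(list("9-5+2")))
```

A result greater than 1 for any sentence is enough to show that the grammar is ambiguous.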

Associativity and Precedence
  9-5+2 ≡ (9-5)+2
  9-5-2 ≡ (9-5)-2
  9+5+2 ≡ (9+5)+2
+ and - usually have the same precedence and are left-associative, i.e. the second parse tree in the previous slide is the correct one.
The grammar can be changed to reflect the associativity and precedence:
  L → L + D
  L → L - D
  L → D
  D → 0 | 1 | ... | 9
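To make the effect of the revised grammar visible, the following sketch parses a digit list and prints the grouping that the left-recursive productions L → L + D and L → L - D impose. The left recursion is realized as a loop, which is a common implementation choice and an assumption here; the function name `parenthesize` is likewise hypothetical.

```python
def parenthesize(text):
    """Return the fully parenthesized form implied by L -> L + D | L - D | D."""
    tokens = [ch for ch in text if not ch.isspace()]
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a digit at the start")
    tree = tokens[0]                      # L -> D
    pos = 1
    while pos < len(tokens):
        operator = tokens[pos]
        if operator not in ("+", "-"):
            raise SyntaxError(f"expected + or - but found {operator!r}")
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected a digit after the operator")
        # L -> L op D: everything parsed so far becomes the left operand.
        tree = f"({tree}{operator}{tokens[pos + 1]})"
        pos += 2
    return tree


print(parenthesize("9-5+2"))
print(parenthesize("9-5-2"))
```

Because the right operand of each production is a single D, a grouping such as 9-(5+2) cannot be produced, so every sentence has exactly one parse tree.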

Syntax-Directed Translation Schemes
A notation that attaches program fragments (also called actions) to productions in a grammar.
The intuition is: whenever a production is used in recognizing a sentence, the corresponding action will be taken.
Example:
  L → L + D   {add}
  L → L - D   {sub}
  L → D
  D → 0       {push 0}
  D → 1       {push 1}
  ...

Syntax-Directed Translation
Actions can be seen as additional leaves introduced into a parse tree.
Reading the actions left-to-right in the tree gives the translation.
Example:
  L → L + D   {add}
  L → L - D   {sub}
  L → D
  D → 0       {push 0}
  D → 1       {push 1}
  ...
Parse tree for 9-5+2 with actions (in bracket notation):
  [L [L [L [D 9 {push 9}]] - [D 5 {push 5}] {sub}] + [D 2 {push 2}] {add}]
Translation: {push 9} {push 5} {sub} {push 2} {add}
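The translation scheme above can be sketched as a small program that emits an action each time a production is recognized. This is an illustrative implementation under assumptions: the actions are represented as strings, the left-recursive productions are handled with a loop, and the `run` function is a hypothetical stack machine added only to show that the emitted actions compute the expected value.

```python
def translate(text):
    """Emit the actions of the scheme L -> L + D {add} | L - D {sub} | D, D -> d {push d}."""
    tokens = [ch for ch in text if not ch.isspace()]
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a digit at the start")
    actions = [f"push {tokens[0]}"]               # D -> d {push d}
    pos = 1
    while pos < len(tokens):
        operator = tokens[pos]
        if operator not in ("+", "-"):
            raise SyntaxError(f"expected + or - but found {operator!r}")
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected a digit after the operator")
        actions.append(f"push {tokens[pos + 1]}")  # action of the D on the right
        actions.append("add" if operator == "+" else "sub")  # action at the end of L -> L op D
        pos += 2
    return actions


def run(actions):
    """Hypothetical stack machine that executes push/add/sub actions."""
    stack = []
    for action in actions:
        if action.startswith("push "):
            stack.append(int(action.split()[1]))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if action == "add" else left - right)
    return stack.pop()


program = translate("9-5+2")
print(program)
print(run(program))
```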

Grammars for Language Specification
The language (i.e. set of allowed strings) of most programming languages can be specified using CFGs.
The grammar notation may be tedious for some aspects of a language.
For instance, an integer is defined by a grammar of the following form:
  I → P | + P | - P
  P → D | P D
  D → 0 | 1 | ... | 9
For simpler fragments, the notation of regular expressions may be used:
  I = (+|-)?[0-9]+
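For comparison, the regular expression for integers can be tried out directly. The sketch below uses Python's standard `re` module; the character class `[+-]` is just another way of writing the alternation between the two signs, and the helper name `is_integer` is an assumption of this example.

```python
import re

# Optional sign followed by one or more digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer(text):
    """True if the whole string is an integer literal in the sense of the slide."""
    return INTEGER.fullmatch(text) is not None


for candidate in ["42", "+42", "-7", "", "+", "1.5"]:
    print(repr(candidate), is_integer(candidate))
```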

Syntax Analysis in Practice
Usually divided into Lexical Analysis followed by Parsing.
Lexical Analysis:
  A lexical analyzer converts a stream of characters into a stream of tokens.
  Each token has a name (associated with terminal symbols) and a value (also called its attribute).
  A lexical analyzer is specified by a set of regular expression patterns and actions that are executed when the patterns are matched.
Parsing:
  A parser converts a stream of tokens into a tree.
  Parsing uncovers the structure of a sentence in the language.
  Parsers are specified by grammars (actually, by translation schemes, which are sets of productions associated with actions).

Syntax Analysis
Source Program → Lexical Analysis → Token Stream → Parsing → Abstract Syntax Tree

Lexical Analysis
First step of syntax analysis.
Objective: Convert the stream of characters representing the input program into a sequence of tokens.
Tokens are the words of the programming language.
Examples:
  The sequence of characters "static int" is recognized as two tokens, representing the two words "static" and "int".
  The sequence of characters "*x++" is recognized as three tokens, representing "*", "x" and "++".
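The two examples above can be reproduced with a small lexer specified by regular expression patterns. This is a sketch under assumptions: the token names (KEYWORD, ID, NUM, OP), the tiny keyword set, and the operator set are chosen for illustration only and do not describe any particular language.

```python
import re

# Hypothetical token specification: (token name, regular expression pattern).
# Order matters: keywords before identifiers, "++" before "+".
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:static|int)\b"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("NUM", r"[0-9]+"),
    ("OP", r"\+\+|[-+*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Convert a string of characters into a list of (name, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("static int"))
print(tokenize("*x++"))
```

Each token carries a name and a value, matching the description of tokens given earlier in the slides.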

Parsing
Second step of syntax analysis.
Objective: Uncover the structure of a sentence in the program from a stream of tokens.
For instance, the phrase x = +y, which is recognized as four tokens representing x, =, + and y, has the structure =(x, +(y)), i.e., an assignment expression that operates on x and the expression +(y).
Output: A tree, called the abstract syntax tree, that reflects the structure of the input sentence.
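The structure =(x, +(y)) can be recovered with a very small recursive-descent parser. The grammar used here (Assign → id = Expr, Expr → + Expr | - Expr | id) and the nested-tuple tree representation are assumptions made for this sketch; the slides do not fix either.

```python
def parse_assignment(tokens):
    """Parse a token list such as ["x", "=", "+", "y"] into a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_expr():
        token = take()
        if token in ("+", "-"):                   # Expr -> + Expr | - Expr
            return (token, parse_expr())
        if token is not None and token.isidentifier():
            return token                          # Expr -> id
        raise SyntaxError(f"unexpected token {token!r} in expression")

    target = take()
    if target is None or not target.isidentifier():
        raise SyntaxError("expected an identifier on the left of =")
    if take() != "=":
        raise SyntaxError("expected =")
    tree = ("=", target, parse_expr())            # Assign -> id = Expr
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse_assignment(["x", "=", "+", "y"]))
```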

Abstract Syntax Tree (AST)
Represents the syntactic structure of the program, hiding a few details that are irrelevant to later phases of compilation.
For instance, consider a statement of the form:
  if (m == 0) S1 else S2
where S1 and S2 stand for some block of statements.
A possible AST for this statement is:
  If
    condition: ==
      m
      0
    then: AST for S1
    else: AST for S2

Semantic Processing
Abstract Syntax Tree → Type Checking → Intermediate Code Generation → Code Optimization → Final Code Generation → Target Program
  Semantic Analysis: Type Checking.
  Code Generation: Intermediate Code Generation, Code Optimization, Final Code Generation.

Type Checking
An instance of Semantic Analysis.
Objective: Decorate the AST with semantic information that is necessary in later phases of translation.
For instance, the AST
  If
    condition: ==
      m
      0
    then: AST for S1
    else: AST for S2
is transformed into
  If
    condition: == : boolean
      m : integer
      0 : integer
    then: AST for S1
    else: AST for S2
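The decoration step can be sketched as a recursive walk that appends a type to every expression node. Everything about the representation here is an assumption: nodes are tuples whose first element names the node kind, `env` is a hypothetical map from variable names to declared types, and the statement subtrees S1 and S2 are left untouched as opaque placeholders.

```python
def decorate(node, env):
    """Return a copy of the AST with a type appended to each expression node."""
    kind = node[0]
    if kind == "num":
        return ("num", node[1], "integer")
    if kind == "var":
        if node[1] not in env:
            raise TypeError(f"undeclared variable {node[1]!r}")
        return ("var", node[1], env[node[1]])
    if kind == "==":
        left = decorate(node[1], env)
        right = decorate(node[2], env)
        if left[-1] != right[-1]:
            raise TypeError(f"cannot compare {left[-1]} with {right[-1]}")
        return ("==", left, right, "boolean")
    if kind == "if":
        condition = decorate(node[1], env)
        if condition[-1] != "boolean":
            raise TypeError("condition of if must be boolean")
        # The statement subtrees are kept as opaque placeholders in this sketch.
        return ("if", condition, node[2], node[3])
    raise ValueError(f"unknown node kind {kind!r}")


ast = ("if", ("==", ("var", "m"), ("num", 0)), "AST for S1", "AST for S2")
print(decorate(ast, {"m": "integer"}))
```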

Intermediate Code Generation
Objective: Translate each sub-tree of the decorated AST into intermediate code.
Intermediate code hides many machine-level details, but has an instruction-level mapping to many assembly languages.
The main motivation for using an intermediate code is portability.

Intermediate Code Generation, an Example
The decorated AST
  If
    condition: == : boolean
      m : integer
      0 : integer
    then: AST for S1
    else: AST for S2
is translated into

        loadint m
        loadimmed 0
        intequal
        jmpnz .L1
        jmp .L2
    .L1:
        ... code for S1
        jmp .L3
    .L2:
        ... code for S2
        jmp .L3
    .L3:
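The code shape above follows a fixed template for an if statement whose condition compares a variable with zero. A minimal sketch of such a template is given below; the function names, the list-of-strings representation of code, and the label generator are assumptions of this example, while the instruction names are the ones used on the slide.

```python
import itertools


def fresh_labels():
    """Yield .L1, .L2, ... on demand."""
    for n in itertools.count(1):
        yield f".L{n}"


def gen_if_equal_zero(variable, then_code, else_code, labels):
    """Intermediate code for: if (variable == 0) then_code else else_code."""
    l_then, l_else, l_end = next(labels), next(labels), next(labels)
    return [
        f"loadint {variable}",
        "loadimmed 0",
        "intequal",
        f"jmpnz {l_then}",     # condition holds: go to the then-branch
        f"jmp {l_else}",       # otherwise: go to the else-branch
        f"{l_then}:",
        *then_code,
        f"jmp {l_end}",
        f"{l_else}:",
        *else_code,
        f"jmp {l_end}",
        f"{l_end}:",
    ]


for line in gen_if_equal_zero("m", ["... code for S1"], ["... code for S2"], fresh_labels()):
    print(line)
```

The template deliberately emits redundant jumps; removing them is left to the optimization phase.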

Code Optimization
Objective: Improve the time and space efficiency of the generated code.
The usual strategy is to perform a series of transformations on the intermediate code, with each step representing some efficiency improvement.
  Peephole optimizations: generate new instructions by combining/expanding on a small number of consecutive instructions.
  Global optimizations: reorder, remove or add instructions to change the structure of the generated code.

Code Optimization, an Example
The intermediate code

        loadint m
        loadimmed 0
        intequal
        jmpnz .L1
        jmp .L2
    .L1:
        ... code for S1
        jmp .L3
    .L2:
        ... code for S2
        jmp .L3
    .L3:

is optimized into

        loadint m
        jmpnz .L2
    .L1:
        ... code for S1
        jmp .L3
    .L2:
        ... code for S2
    .L3:
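The transformation above can be reproduced by two peephole rules applied over a sliding window of instructions. The rules below are a sketch chosen to match this one example, not a general optimizer, and they assume that `intequal` leaves a non-zero value when its operands are equal and that `jmpnz` jumps when the value on top is non-zero.

```python
def peephole(code):
    """Apply two illustrative peephole rules to a list of instruction strings."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 5]
        # Rule 1: "x == 0 ? goto A : goto B; A:" becomes "x != 0 ? goto B; A:".
        if (
            len(window) == 5
            and window[0] == "loadimmed 0"
            and window[1] == "intequal"
            and window[2].startswith("jmpnz ")
            and window[3].startswith("jmp ")
            and window[4] == window[2].split()[1] + ":"
        ):
            out.append("jmpnz " + window[3].split()[1])
            out.append(window[4])
            i += 5
            continue
        # Rule 2: an unconditional jump to the immediately following label is dropped.
        if (
            code[i].startswith("jmp ")
            and i + 1 < len(code)
            and code[i + 1] == code[i].split()[1] + ":"
        ):
            i += 1
            continue
        out.append(code[i])
        i += 1
    return out


unoptimized = [
    "loadint m", "loadimmed 0", "intequal", "jmpnz .L1", "jmp .L2",
    ".L1:", "... code for S1", "jmp .L3",
    ".L2:", "... code for S2", "jmp .L3",
    ".L3:",
]
for line in peephole(unoptimized):
    print(line)
```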

Final Code Generation
Objective: Map instructions in the intermediate code to specific machine instructions.
Supports standard object file formats.
Generates sufficient information to enable symbolic debugging.

Final Code Generation, an Example
The optimized intermediate code

        loadint m
        jmpnz .L2
    .L1:
        ... code for S1
        jmp .L3
    .L2:
        ... code for S2
    .L3:

is translated into

        movl 8(%ebp), %esi
        testl %esi, %esi
        jne .L2
    .L1:
        ... code for S1
        jmp .L3
    .L2:
        ... code for S2
    .L3:
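The mapping in this example can be sketched as a pattern-based instruction selector. The sketch handles only the instruction shapes that appear on the slide; the function name, the `frame_offsets` table (variable name to offset from %ebp), and the choice of %esi as the scratch register are assumptions taken from, or invented around, this single example.

```python
def select_instructions(code, frame_offsets):
    """Map a small subset of the intermediate code to x86 (AT&T syntax) strings."""
    asm = []
    i = 0
    while i < len(code):
        instruction = code[i]
        if (
            instruction.startswith("loadint ")
            and i + 1 < len(code)
            and code[i + 1].startswith("jmpnz ")
        ):
            # loadint v; jmpnz L  ->  load v, test it against itself, branch if non-zero.
            variable = instruction.split()[1]
            label = code[i + 1].split()[1]
            asm.append(f"movl {frame_offsets[variable]}(%ebp), %esi")
            asm.append("testl %esi, %esi")
            asm.append(f"jne {label}")
            i += 2
        elif instruction.startswith("jmp ") or instruction.endswith(":"):
            # Unconditional jumps and labels carry over unchanged.
            asm.append(instruction)
            i += 1
        else:
            raise NotImplementedError(f"no pattern for {instruction!r}")
    return asm


intermediate = ["loadint m", "jmpnz .L2", ".L1:", "jmp .L3", ".L2:", ".L3:"]
for line in select_instructions(intermediate, {"m": 8}):
    print(line)
```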