Overview. Overview. Programming problems are easier to solve in high-level languages

Similar documents
Introduction. Compiler Design CSE Overview. 2 Syntax-Directed Translation. 3 Phases of Translation

CS558 Programming Languages. Winter 2013 Lecture 1

SOFTWARE ARCHITECTURE 5. COMPILER

Compiler Construction LECTURE # 1

A simple syntax-directed

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Compilers Crash Course

Compiler Construction D7011E

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Overview of Compiler. A. Introduction

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CS558 Programming Languages

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

2.2 Syntax Definition

CS 4120 and 5120 are really the same course. CS 4121 (5121) is required! Outline CS 4120 / 4121 CS 5120/ = 5 & 0 = 1. Course Information

CSE 582 Autumn 2002 Exam Sample Solution

A Simple Syntax-Directed Translator

Syntax-Directed Translation. Lecture 14

Machine-Level Programming II: Control Flow

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CIS 341 Midterm February 28, Name (printed): Pennkey (login id): SOLUTIONS

Second Part of the Course

Compilers. Lecture 2 Overview. (original slides by Sam

CSE 3302 Programming Languages Lecture 2: Syntax

Syntax and Grammars 1 / 21

CSE P 501 Exam 8/5/04 Sample Solution. 1. (10 points) Write a regular expression or regular expressions that generate the following sets of strings.

CS241 Computer Organization Spring Addresses & Pointers

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

The Hardware/Software Interface CSE351 Spring 2013

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

CSE 582 Autumn 2002 Exam 11/26/02

Chapter 3 Machine-Level Programming II Control Flow

Machine-Level Programming II: Control and Arithmetic

CSCE 314 Programming Languages

ASSEMBLY II: CONTROL FLOW. Jo, Heeseung

Machine Representa/on of Programs: Control Flow cont d. Previous lecture. Do- While loop. While- Do loop CS Instructors: Sanjeev Se(a

CPS 506 Comparative Programming Languages. Syntax Specification

Compiling Regular Expressions COMP360

Introduction to Compiler Design

administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?

THEORY OF COMPILATION

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Principles of Programming Languages COMP251: Syntax and Grammars

Question Bank. 10CS63:Compiler Design

Syntax Analysis Part I

Condition Codes The course that gives CMU its Zip! Machine-Level Programming II Control Flow Sept. 13, 2001 Topics

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Introduction to Compiler

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Condition Codes. Lecture 4B Machine-Level Programming II: Control Flow. Setting Condition Codes (cont.) Setting Condition Codes (cont.

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

A programming language requires two major definitions A simple one pass compiler

Syntax. A. Bellaachia Page: 1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CISC 360. Machine-Level Programming II: Control Flow Sep 17, class06

Defining syntax using CFGs

CSE302: Compiler Design

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

Code Generation. Lecture 30

COP 3402 Systems Software Syntax Analysis (Parser)

Y86 Processor State. Instruction Example. Encoding Registers. Lecture 7A. Computer Architecture I Instruction Set Architecture Assembly Language View

Lexical Analysis. Introduction

Chapter 3. Describing Syntax and Semantics

Assembly I: Basic Operations. Jo, Heeseung

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Giving credit where credit is due

Code Optimization. What is code optimization?

Optimizing Finite Automata

Introduction to Lexical Analysis

CISC 360. Machine-Level Programming II: Control Flow Sep 23, 2008

Page 1. Condition Codes CISC 360. Machine-Level Programming II: Control Flow Sep 17, Setting Condition Codes (cont.)

CS5363 Final Review. cs5363 1

Page # CISC 360. Machine-Level Programming II: Control Flow Sep 23, Condition Codes. Setting Condition Codes (cont.)

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

CSC 2400: Computer Systems. Towards the Hardware: Machine-Level Representation of Programs

CS 406/534 Compiler Construction Putting It All Together

Crafting a Compiler with C (II) Compiler V. S. Interpreter

CAS CS Computer Systems Spring 2015 Solutions to Problem Set #2 (Intel Instructions) Due: Friday, March 20, 1:00 pm

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

CF Carry Flag SF Sign Flag ZF Zero Flag OF Overflow Flag. ! CF set if carry out from most significant bit. "Used to detect unsigned overflow

CS164: Midterm I. Fall 2003

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSC 8400: Computer Systems. Machine-Level Representation of Programs

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Handout #2: Machine-Level Programs

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

LANGUAGE PROCESSORS. Presented By: Prof. S.J. Soni, SPCE Visnagar.

Lecture #16: Introduction to Runtime Organization. Last modified: Fri Mar 19 00:17: CS164: Lecture #16 1

CS 314 Principles of Programming Languages

Properties of Regular Expressions and Finite Automata

Process Layout, Function Calls, and the Heap

Process Layout and Function Calls

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines

AS08-C++ and Assembly Calling and Returning. CS220 Logic Design AS08-C++ and Assembly. AS08-C++ and Assembly Calling Conventions

COMPILER DESIGN LECTURE NOTES

Program Abstractions, Language Paradigms. CS152. Chris Pollett. Aug. 27, 2008.

Transcription:

Overview Course Objectives To learn the process of translating a modern high-level language to executable code. earn the fundamental techniques from lectures, text book and exercises from the book. Apply these techniques in practice to construct a fully working compiler for a non-trivial subset of Java called ecaf. In the end, you should be able to compile small Java-like programs with your compiler, and see it actually work! Compilers Introduction CSE 304/504 1 / 33 What is a Compiler? Overview Programming problems are easier to solve in high-level languages High-level languages are closer to the problem domain E.g. Java, Python, SQ, Tcl/Tk,... Solutions have to be executed by a machine Instructions to a machine are specified in a language that reflects to the cycle-by-cycle working of a processor Compilers are the bridges: Software that translates programs written in high-level languages to efficient executable code. Compilers Introduction CSE 304/504 2 / 33

Overview An Example int gcd(int m, int n) { if (m == 0) return n; else if (m > n) return gcd(n, m); else return gcd(n%m, m); } gcd:.2:... pushl %ebp movl %esp,%ebp cmpl $0,8(%ebp) jne.2 movl 12(%ebp),%eax jmp.1.align 16 jmp.3.align 16 movl 8(%ebp),%eax cmpl %eax,12(%ebp) jge.4 movl 8(%ebp),%eax pushl %eax Compilers Introduction CSE 304/504 3 / 33 Example (contd.) Overview gcd:.11:.8: pushl %ebp movl %esp,%ebp pushl %esi pushl %ebx movl 8(%ebp),%esi movl 12(%ebp),%ebx testl %esi,%esi jne.8 movl %ebx,%eax jmp.13.align 16 cmpl %ebx,%esi jg.14 movl %ebx,%eax.14:.13: cltd idivl %esi movl %edx,%ebx movl %esi,%ecx movl %ebx,%esi movl %ecx,%ebx jmp.11.align 16 leal -8(%ebp),%esp popl %ebx popl %esi movl %ebp,%esp popl %ebp ret Compilers Introduction CSE 304/504 4 / 33

Overview Requirements In order to translate statements in a language, one needs to understand both the structure of the language: the way sentences are constructed in the language, and the meaning of the language: what each sentence stands for. Terminology: Structure Syntax Meaning Semantics Compilers Introduction CSE 304/504 5 / 33 Translation Strategy Overview Classic Software Engineering Problem Objective: Translate a program in a high level language into efficient executable code. Strategy: ivide translation process into a series of phases Each phase manages some particular aspect of translation. Interfaces between phases governed by specific intermediate forms. Compilers Introduction CSE 304/504 6 / 33

Translation Process Syntax-irected Translation Source Program Syntax Analysis Abstract Syntax Tree Semantic Processing Target Program Compilers Introduction CSE 304/504 7 / 33 Translation Steps Syntax-irected Translation Syntax Analysis Phase: Recognizes sentences in the program using the syntax of the language Semantic Analysis Phase: Infers information about the program using the semantics of the language Intermediate Code Generation Phase: Generates abstract code based on the syntactic structure of the program and the semantic information from Phase 2. Optimization Phase: Refines the generated code using a series of optimizing transformations. Final Code Generation Phase: Translates the abstract intermediate code into specific machine instructions. Compilers Introduction CSE 304/504 8 / 33

Syntax-irected Translation Structure of a Compiler: an Analogy Syntax-irected Translation: the structure (syntax) of a sentence in a language is used to give it a meaning (semantics). ie C-ROM Informatikunterricht liefert alle wichtigen Programme und Bücher, die für den Unterricht oder zur Bearbeitung von Hausaufgaben notwendig sind. Green wire connect after first cut white also red wire. He sailed the coffee out of the leaf. This sentence has four words. Compilers Introduction CSE 304/504 9 / 33 Syntax Syntax-irected Translation efining and Recognizing Sentences in a anguage ayered approach Alphabet: defines allowed symbols exical Structure: defines allowed words Syntactic Structure: defines allowed sentences We will later associate meaning with sentences (semantics) based on their syntactic structure. Compilers Introduction CSE 304/504 10 / 33

Syntax-irected Translation Formal anguage Specification Solid theoretical results applied to a practical problem. eclarative vs. Operational Notations eclarative notation is used to define a language efines precisely the set of allowed objects (words/sentences) Examples: Regular expressions, Grammars. Operational notation is used to recognize statements in a language efines an algorithm for determining whether or not a given word/sentence is in the language Example: Automata Results from theory on converting between the two notations. Compilers Introduction CSE 304/504 11 / 33 Formal anguages Syntax-irected Translation A language is a set of strings over a set of symbols. The set of symbols of a language is called its alphabet (usually denoted by Σ. Each string in the language is called a sentence. Parts of sentences are called phrases. Compilers Introduction CSE 304/504 12 / 33

Syntax-irected Translation Context-Free Grammars A well-studied notation for defining formal languages. A Context Free Grammar (CFG, or grammar unless otherwise qualified) is defined over an alphabet, called terminal symbols. A CFG is defined by a set of productions. Each production is of the form X β where X is a single non-terminal symbol representing a set of phrases in the language, and β is a sequence of terminal and non-terminal symbols Example: Stmt while Expr do Stmt A unique non-terminal, called the start symbol, represents the set of all sentences in the language. The language defined by a grammar G is denoted by (G). Compilers Introduction CSE 304/504 13 / 33 Syntax-irected Translation Example Grammar ist of digits separated by + and signs (Example 2.1 in book): erivation of 9-5+2 from : + - 0 1... 9 = - = - = 9 - = 9 - + = 9 - + = 9-5 + = 9-5 + = 9-5 + 2 Compilers Introduction CSE 304/504 14 / 33

Syntax-irected Translation Parse Trees Pictorial representation of derivations = - = - = 9 - = 9 - + = 9 - + = 9-5 + = 9-5 + = 9-5 + 2 - + 9 5 2 = - = - + = - + = - + 2 = - + 2 = - 5 + 2 = - 5 + 2 = 9-5 + 2 Note: one parse tree may correspond to multiple derivations! Compilers Introduction CSE 304/504 15 / 33 Ambiguity Syntax-irected Translation A grammar is ambiguous if some sentence in the language has more than one parse tree. - + + - 9 5 2 9 5 2 Compilers Introduction CSE 304/504 16 / 33

Syntax-irected Translation Associativity and Precedence 9-5+2 (9-5)+2 9-5-2 (9-5)-2 9+5+2 (9+5)+2 + and usually have the same precedence and are left-associative. i.e. the second parse tree in the previous slide is the correct one The grammar can be changed to reflect the associativity and precedence: + - 0 1... 9 Compilers Introduction CSE 304/504 17 / 33 Syntax-irected Translation Syntax-irected Translation Schemes A notation that attaches program fragments (also called actions) to productions in a grammar. The intuition is, whenever a production is used in recognizing a sentence, the corresponding action will be taken. Example: + {add} - {sub} 0 {push 0} 1 {push 1}. Compilers Introduction CSE 304/504 18 / 33

Syntax-irected Translation Syntax-irected Translation Actions can be seen as additional leaves introduced into a parse tree. Example: Reading the actions left-to-right in the tree gives the translation. + {add} - {sub} 0 {push 0} 1 {push 1}. - 9 + 5 2 {push 9} {push 5} {sub} {push 2} {add} Compilers Introduction CSE 304/504 19 / 33 Syntax-irected Translation Grammars for anguage Specification The language (i.e. set of allowed strings) of most programming languages can be specified using CFGs. The grammar notation may be tedious for some aspects of a language. For instance, an integer is defined by a grammar of the following form: I P + P - P P P P 0 1... 9 For simpler fragments, the notation of regular expressions may be used. I = (+ -)?[0 9]+ Compilers Introduction CSE 304/504 20 / 33

Syntax-irected Translation Syntax Analysis in Practice Usually divided into exical Analysis followed by Parsing. exical Analysis: A lexical analyzer converts a stream of characters into a stream of tokens. Each token has a name (associated with terminal symbols) and a value (also called its attribute). A lexical analyzer is specified by a set of regular expression patterns and actions that are executed when the patterns are matched. Parsing: A parser converts a stream of tokens into a tree. Parsing uncovers the structure of a sentence in the language. Parsers are specified by grammars (actually, by translation schemes which are sets of productions associated with actions). Compilers Introduction CSE 304/504 21 / 33 Syntax Analysis Phases of Translation Source Program exical Analysis Token Stream Parsing Abstract Syntax Tree Compilers Introduction CSE 304/504 22 / 33

Phases of Translation exical Analysis First step of syntax analysis Objective: Convert the stream of characters representing input program into a sequence of tokens. Tokens are the words of the programming language. Examples: The sequence of characters static int is recognized as two tokens, representing the two words static and int. The sequence of characters *x++ is recognized as three tokens, representing *, x and ++. Compilers Introduction CSE 304/504 23 / 33 Parsing Phases of Translation Second step of syntax analysis Objective: Uncover the structure of a sentence in the program from a stream of tokens. For instance, the phrase x = +y, which is recognized as four tokens, representing x, = and + and y, has the structure =(x, +(y)), i.e., an assignment expression, that operates on x and the expression +(y). Output: A tree called abstract syntax tree that reflects the structure of the input sentence. Compilers Introduction CSE 304/504 24 / 33

Phases of Translation Abstract Syntax Tree (AST) Represents the syntactic structure of the program, hiding a few details that are irrelevent to later phases of compilation. For instance, consider a statement of the form: if (m == 0) S1 else S2 where S1 and S2 stand for some block of statements. A possible AST for this statement is: == If then else m 0 AST for S1 AST for S2 Compilers Introduction CSE 304/504 25 / 33 Semantic Processing Phases of Translation Abstract Syntax Tree Type Checking Intermediate Code Generation Code Optimization Final Code Generation Target Program Semantic Analysis Code Generation Compilers Introduction CSE 304/504 26 / 33

Phases of Translation Type Checking A instance of Semantic Analysis Objective: ecorate the AST with semantic information that is necessary in later phases of translation. For instance, the AST == If then else m 0 AST for S1 AST for S2 is transformed into If then else == : boolean 0 m: integer : integer AST for S1 AST for S2 Compilers Introduction CSE 304/504 27 / 33 Phases of Translation Intermediate Code Generation Objective: Translate each sub-tree of the decorated AST into intermediate code. Intermediate code hides many machine-level details, but has instruction-level mapping to many assembly languages. Main motivation for using an intermediate code is portability. Compilers Introduction CSE 304/504 28 / 33

Phases of Translation Intermediate Code Generation, an Example If then else == : boolean 0 m: integer : integer AST for S1 AST for S2 = loadint m loadimmed 0 intequal jmpnz.1 jmp.2.1:... code for S1 jmp.3.2:... code for S2 jmp.3.3: Compilers Introduction CSE 304/504 29 / 33 Code Optimization Phases of Translation Objective: Improve the time and space efficiency of the generated code. Usual strategy is to perform a series of transformations to the intermediate code, with each step representing some efficiency improvement. Peephole optimizations: generate new instructions by combining/expanding on a small number of consecutive instructions. Global optimizations: reorder, remove or add instructions to change the structure of generated code. Compilers Introduction CSE 304/504 30 / 33

Phases of Translation Code Optimization, an Example loadint m loadimmed 0 intequal jmpz.1 jmp.2.1:... code for S1 jmp.3.2:... code for S2 jmp.3.3: = loadint m jmpnz.2.1:... code for S1 jmp.3.2:... code for S2.3: Compilers Introduction CSE 304/504 31 / 33 Final Code Generation Phases of Translation Objective: Map instructions in the intermediate code to specific machine instructions. Supports standard object file formats. Generates sufficient information to enable symbolic debugging. Compilers Introduction CSE 304/504 32 / 33

Phases of Translation Final Code Generation, an Example loadint m jmpnz.2.1:... code for S1 jmp.3.2:... code for S2.3: = movl 8(%ebp), %esi testl %esi, %esi jne.2.1:... code for S1 jmp.3.2:... code for S2.3: Compilers Introduction CSE 304/504 33 / 33