LECTURE 3. Compiler Phases

Similar documents
Compiling and Interpreting Programming. Overview of Compilers and Interpreters

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

CST-402(T): Language Processors

Working of the Compilers

CS 321 IV. Overview of Compilation

A Simple Syntax-Directed Translator

CS 415 Midterm Exam Spring 2002

Context-Free Grammars

Syntax-Directed Translation. Lecture 14

Introduction to Compiler Design

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Lecture 14 Sections Mon, Mar 2, 2009

MIDTERM REVIEW. Lectures 1-15

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Introduction to Parsing Ambiguity and Syntax Errors

A simple syntax-directed

Introduction to Parsing Ambiguity and Syntax Errors

CSE 401 Midterm Exam Sample Solution 2/11/15

Abstract Syntax Trees Synthetic and Inherited Attributes

Introduction to Parsing. Lecture 5

Compiler Construction

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Syntax. A. Bellaachia Page: 1

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 8

Compiler Construction

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

CMSC 330: Organization of Programming Languages. Context Free Grammars

Undergraduate Compilers in a Day

Compiling Techniques

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

COP4020 Spring 2011 Midterm Exam

CPS 506 Comparative Programming Languages. Syntax Specification

The Structure of a Syntax-Directed Compiler

Syntax Errors; Static Semantics

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Project Compiler. CS031 TA Help Session November 28, 2011

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Compilers and Interpreters

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS 314 Principles of Programming Languages

Compiler Design (40-414)

Ambiguity and Errors Syntax-Directed Translation

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Lecture 09: Data Abstraction ++ Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree.

Time : 1 Hour Max Marks : 30

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Semantic Analysis. Compiler Architecture

Recursive Descent Parsers

Pioneering Compiler Design

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

2068 (I) Attempt all questions.

Formats of Translated Programs

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

COSE312: Compilers. Lecture 1 Overview of Compilers

Semantic Analysis. Lecture 9. February 7, 2018

Compilers and interpreters

Syntax and Grammars 1 / 21

CSE 3302 Programming Languages Lecture 2: Syntax

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

More On Syntax Directed Translation

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Abstract Syntax Trees & Top-Down Parsing

CS 4201 Compilers 2014/2015 Handout: Lab 1

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COP4020 Programming Languages. Semantics Prof. Robert van Engelen

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

Compilers. Compiler Construction Tutorial The Front-end

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Why are there so many programming languages?

Compiler Construction

Lecture 16: Static Semantics Overview 1

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

Problem Score Max Score 1 Syntax directed translation & type

ECE251 Midterm practice questions, Fall 2010

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Type Inference Systems. Type Judgments. Deriving a Type Judgment. Deriving a Judgment. Hypothetical Type Judgments CS412/CS413

LECTURE 7. Lex and Intro to Parsing

Chapter 4: Syntax Analyzer

CS 553 Compiler Construction Fall 2007 Project #1 Adding floats to MiniJava Due August 31, 2005

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4

Compiler Construction

1 Lexical Considerations

Transcription:

LECTURE 3 Compiler Phases

COMPILER PHASES Compilation of a program proceeds through a fixed series of phases. Each phase uses an (intermediate) form of the program produced by an earlier phase. Subsequent phases operate on lower-level code representations. Each phase may consist of a number of passes over the program representation. Pascal, FORTRAN, and C languages designed for one-pass compilation, which explains the need for function prototypes. Single-pass compilers need less memory to operate. Java and Ada are multi-pass.

COMPILER PHASES The compiling process phases are listed below. Lexical analysis Syntax analysis Semantic analysis Intermediate (machine-independent) code generation Intermediate code optimization Target (machine-dependent) code generation Target code optimization

FRONT- AND BACK-END Front-End Analysis Source Program Scanner (Lexical Analysis) Tokens Back-End Synthesis Abstract Syntax Tree Machine-Independent Code Improvement Modified Intermediate Form Parser (Syntax Analysis) Parse Tree Semantic Analysis & Intermediate Code Generation Abstract Syntax Tree Target Code Generation Assembly or Object Code Machine-Specific Code Improvement Modified Assembly or Object Code

LEXICAL ANALYSIS Lexical analysis is the process of tokenizing characters that appear in a program. A scanner (or lexer) groups characters together into meaningful tokens which are then sent to the parser. What we write: int main(){ int i = getint(), j = getint(); while(i!=j){ if(i > j) i = i j; else j = j i; } putint(i); } What the scanner picks up: i, n, t,, m, a, i, n, (, ), {.

LEXICAL ANALYSIS As the scanner reads in the characters, it produces meaningful tokens. Tokens are typically defined using regular expressions, which are understood by a lexical analyzer generator such as lex. What the scanner picks up: i, n, t,, m, a, i, n, (, ), {. The resulting tokens: int, main, (, ), {, int, i, =, getint, (, ),.

LEXICAL ANALYSIS What kinds of errors can be caught in the lexical analysis phase?

LEXICAL ANALYSIS What kinds of errors can be caught in the lexical analysis phase? Invalid tokens. An example in C: int i = @3; @ is an invalid token in C

SYNTAX ANALYSIS Syntax analysis is performed by a parser which takes the tokens generated by the scanner and creates a parse tree which shows how tokens fit together within a valid program. The structure of the parse tree is dictated by the grammar of the programming language. Typically, a context-free grammar which is a set of recursive rules for generating valid patterns of tokens.

CONTEXT-FREE GRAMMARS Context-free grammars define the syntax of a programming language. Generally speaking, context-free grammars contain recursive rewrite rules (productions) of the following form: N w where N is a non-terminal symbol and w is a string of terminal symbols and/or nonterminal symbols. Alternatively, w could also be ε, an empty string.

CONTEXT-FREE GRAMMARS Here s a little example. The starting non-terminal is program. program statement statement if ( condition ) { statement } statement assignment assignment exp = exp ; condition exp > exp condition exp < exp exp id

CONTEXT-FREE GRAMMARS What kind of code does this CFG admit? program statement statement if ( condition ) { statement } statement assignment assignment exp = exp ; condition exp > exp condition exp < exp exp id a = b; if(a > b){ b = c; } if(a > b){ if(a > c){ b = d; } }

SYNTAX ANALYSIS How can we use our CFG to create the parse tree? program statement statement if ( condition ) { statement } statement assignment assignment exp = exp ; condition exp > exp condition exp < exp exp id program statement assignment exp = exp ; a = b; id id

SYNTAX ANALYSIS A more complicated example program statement statement if ( condition ) { statement } statement assignment assignment exp = exp ; condition exp > exp condition exp < exp exp id program statement if ( condition ) { statement } exp > exp exp = exp ; id id id id

SYNTAX ANALYSIS What kinds of errors are caught by the parser?

SYNTAX ANALYSIS What kinds of errors are caught by the parser? Syntax errors: invalid sequences of tokens. Example in C: a * b = c;

SEMANTIC ANALYSIS Semantic analysis is the process of attempting to discover whether a valid pattern of tokens is actually meaningful. Even if we know that the sequence of tokens is valid, it may still be an incorrect program. For example: a = b; What if a is an int and b is a character array? To protect against these kinds of errors, the semantic analyzer will keep track of the types of identifiers and expressions in order to ensure they are used consistently.

SEMANTIC ANALYSIS Static Semantic Checks: semantic rules that can be checked at compile time. Type checking. Every variable is declared before being used. Identifiers are used in appropriate contexts. Checking function call arguments. Dynamic Semantic Checks: semantic rules that are checked at run time. Array subscript values are within bounds. Arithmetic errors, e.g. division by zero. Pointers are not dereferenced unless pointing to valid object. When a check fails at run time, an exception is raised.

SEMANTIC ANALYSIS Semantic analysis is typically performed by creating a symbol table that stores necessary information for verifying consistency. A typical intermediate form of code produced by the semantic analyzer is an Abstract Syntax Tree (AST), which is annotated with pointers to the symbol table. Let s look at our previous example and add some more rules to make it viable. int a = 2; int b = 1; int c = 4; if(a > b){ b = c; } program statement statement if ( condition ) { statement } statement assignment statement type exp = num ; statement assignment exp = exp ; condition exp > exp condition exp < exp exp id type int

SEMANTIC ANALYSIS program type = num(2) ; statement int type id(b) = num(1) ; statement int type id(c) = num(4) ; statement Our updated parse tree, which is sometimes also known as a Concrete Syntax Tree. int if ( condition ) { statement } exp > exp exp = exp ; id(a) id(b) id(b) id(c)

SEMANTIC ANALYSIS program Index Symbol Type 1 int type 2 a (1) 3 b (1) 4 c (1) = (2) 2 (3) = = if 1 (4) 4 > = (2) (3) (3) (4) Here is an equivalent AST with symbol table. The goal of the AST is to produce an intermediate form of the code that removes the artificial nodes and annotates the rest with useful information. Some compilers use alternative intermediate forms, but the goal is the same.

SYNTAX AND SEMANTIC ANALYSIS Assuming C++, what kinds of errors are these (lexical, syntax, semantic)? int = @3; int =?3; int y = 3; x = y; Hello, World! int x; double y = 2.5; x = y; void sum(int, int); sum(1,2,3); myint++ z = y/x; // y is 1, x is 0

SYNTAX AND SEMANTIC ANALYSIS Assuming C++, what kinds of errors are these (lexical, syntax, semantic)? int = @3; // Lexical int =?3; // Syntax int y = 3; x = y; // Static Semantic Hello, World! // Syntax int x; double y = 2.5; x = y; // Static Semantic void sum(int, int); sum(1,2,3); // Static Semantic myint++ // Syntax z = y/x; // y is 1, x is 0 // Dynamic Semantic

CODE OPTIMIZATION Once the AST (or alternative intermediate form) has been generated, the compiler can perform machine-independent code optimization. The goal is to modify the code so that it is quicker and uses resources more efficiently. There is an additional optimization step performed after the creation of the object code.

TARGET CODE GENERATION The goal of the Target Code Generation phase is to translate the intermediate form of the code (typically, the AST) into object code. In the case of languages that translate into assembly language, the code generator will first pass through the symbol table, creating space for the variables. Next, the code generator passes through the intermediate code form, generating the appropriate assembly code. As stated before, the compiler makes one more pass through the object code to perform further optimization.

NEXT LECTURE Regular Expressions and Context-Free Grammars