CSC488S/CSC2107S - Compilers and Interpreters. CSC 488S/CSC 2107S Lecture Notes

Similar documents
CSC488S/CSC2107S - Compilers and Interpreters. CSC 488S/CSC 2107S Lecture Notes

CS 536. Class Meets. Introduction to Programming Languages and Compilers. Instructor. Key Dates. Teaching Assistant. Charles N. Fischer.

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Compilers and Interpreters

Intermediate Representations

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

CST-402(T): Language Processors

What do Compilers Produce?

Crafting a Compiler with C (II) Compiler V. S. Interpreter

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

Compiler Construction

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Spring 2015

When do We Run a Compiler?

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

COMPILER DESIGN. For COMPUTER SCIENCE

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Formats of Translated Programs

Principles of Programming Languages. Lecture Outline

Compilers. Prerequisites

Compiler Design (40-414)

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Fall

General Course Information. Catalogue Description. Objectives

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

ECE573 Introduction to Compilers & Translators

History of Compilers The term

Introduction to Compilers and Language Design Copyright (C) 2017 Douglas Thain. All rights reserved.

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

CSE 504: Compiler Design. Code Generation

COMP 3002: Compiler Construction. Pat Morin School of Computer Science

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Early computers (1940s) cost millions of dollars and were programmed in machine language. less error-prone method needed

Stating the obvious, people and computers do not speak the same language.

Front End. Hwansoo Han

Concepts of Programming Languages

Alternatives for semantic processing

The Structure of a Syntax-Directed Compiler

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Syntax-Directed Translation. Lecture 14

Question Bank. 10CS63:Compiler Design

COMPILER DESIGN LECTURE NOTES

CS426 Compiler Construction Fall 2006

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

CMPE 152 Compiler Design

Compiler Construction

Translator Design CRN Course Administration CMSC 4173 Spring 2017

COMPILER DESIGN. Intermediate representations Introduction to code generation. COMP 442/6421 Compiler Design

Compiler Construction

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

names names identifiers variables subroutines constants

Evaluation Scheme L T P Total Credit Theory Mid Sem Exam

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

CS/SE 153 Concepts of Compiler Design

UNIT-4 (COMPILER DESIGN)

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

LESSON 13: LANGUAGE TRANSLATION

Working of the Compilers

CJT^jL rafting Cm ompiler

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

An Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1

Programming Languages Third Edition. Chapter 7 Basic Semantics

Introduction to Compiler Construction

Compiler Construction

Introduction. Interpreters High-level language intermediate code which is interpreted. Translators. e.g.

CMPE 152 Compiler Design

Intermediate Representations

CS/SE 153 Concepts of Compiler Design

Compiler Construction

CS5363 Final Review. cs5363 1

TDDD55 - Compilers and Interpreters Lesson 3

CS321 Languages and Compiler Design I. Winter 2012 Lecture 1

CMPE 152 Compiler Design

CS415 Compilers. Intermediate Represeation & Code Generation

CSCI 565 Compiler Design and Implementation Spring 2014

CSE 401/M501 Compilers

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Intermediate Code Generation

Parsing II Top-down parsing. Comp 412

Life Cycle of Source Program - Compiler Design

Outline. Lecture 17: Putting it all together. Example (input program) How to make the computer understand? Example (Output assembly code) Fall 2002

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

Programming Languages

Introduction (1) COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

Compiler Construction D7011E

Compiler Design Aug 1996

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Compilers. Computer Science 431

Intermediate Representations & Symbol Tables

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

! Broaden your language horizons! Different programming languages! Different language features and tradeoffs. ! Study how languages are implemented

Compiler Construction

Software Project. Lecturers: Ran Caneti, Gideon Dror Teaching assistants: Nathan Manor, Ben Riva

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

Compiler Design. Lecture 1

Syntax Errors; Static Semantics

Principles of Compiler Design

The Structure of a Syntax-Directed Compiler

Transcription:

CSC 488S/CSC 2107S Lecture Notes These lecture notes are provided for the personal use of students taking CSC488H1S or CSC2107S in the Winter 2012/2013 term at the University of Toronto Copying for purposes other than this use and all forms of distribution are expressly prohibited. c David B. Wortman, 2002,2003,2004,2008,2009,2010,2012,2013 c Marsha Chechik, 2005,2006,2007 CSC488S/CSC2107S - Compilers and Interpreters Instructor Prof. Dave Wortman Bahen Centre, Room 5222, dw@cdf.toronto.edu Lectures Tuesday 14:00 WB 130 Thursday 14:00 WB 130 Tutorial Thursday 13:00 RS 208 immediately after lecture and by appointment Text Charles Fischer, Ron Cytron and Richard LeBlanc Jr., Crafting a Compiler, Addison-Wesley 2009 Marking Mid term test, Final Exam, Course Project Web Page http://cdf.toronto.edu/ csc488h/winter/ Bulletin Board Read Often!! https://csc.cdf.toronto.edu/csc488h1s Slides on the Bulletin Board Handouts on the Bulletin Board 0 1 Course Project Design and Implementation of a small compiler system. Work in teams of 5 ( - 1, + 0) Five Phase Project Phase 1 Write programs in project language Phase 2 Revise grammar and build parser Phase 3 Implement symbol table and semantic checking. Phase 4 Design code generation Phase 5 Implement code generator Language specification and project details announced soon Course Outline Topic Chapters Compiler structure Ch. 1, 2 Lexical Ch. 3 Syntax Ch. 4, 5, 6 Tables & Dictionaries Ch. 8 Ch. 7, 9 Run-time Environments Ch. 12 generation Ch. 11, 13 Optimization Ch. 14 2 3

Reading Assignment Fischer, Cytron, LeBlanc Chapter 1 Section 10.1 What Do Compilers Do? Check source program for correctness Well formed lexically Well formed syntactically. Passes static semantic checks Type correctness Usage correctness Transform source program into an executable object program /* Turing program to find prime numbers <= 10,000 */ const n := 10000 var sieve, primes : array 2.. n of boolean var next : int := 2 for j := 2.. n /* initialize */ sieve[ j ] := true primes[ j ] := false end for loop exit when next > n primes[ next ] := true for j : next.. n by next sieve[ j ] := false end for loop /* find next prime */ exit when next > n or sieve[ next ] next += 1 end loop end loop Compiler bebafeca31000000000a7300082b002 330032000034000a360008350a37000 3b00082b000c000a34000a3c000a3d0 15000a42000a430007440015000a450 0013000a13000a4b00084c004e00084 292803000400015665646f434c0f000 69616d041600016e6a4c5b282f61766 756f530a4665637201656c6965740c0 6c2d050007747369000c51000753005 20656c62656d69746e6f7a206120736 015c005b73552d00206567617473657 726165796d3c203e68746e6f643c203 696c2d20000c74730c5e005d60005f0 42676e69646c6975000172656973551 000c62000c640063640065005539000 742074736567206f206120747473696 7a20656d73656e6f6a1b00012f61766 646e656c000c726101660023616a110 590023000067000c0e0001686176616 65540d006e697473616420670720657 71000c7000017200656854367720657 206568746c796164746867697661732 68542b006164206567696c797320746 Executable 4 5 Simple Generic Compiler statement Lexical, Syntax Lexical Characters Lexical Tokens Syntax Parse Tree Intermediate Language Generation Machine Symbol Table Lexical analysis reserved identifier operator identifier Syntax analysis reserved identifier operator identifier reserved identifier operator integer reserved identifier operator integer reserved variable variable variable constant variable comparison reserved identifier operator integer reserved assignment statement if then else statement identifier operator integer reserved assignment statement constant statement 6 7

, Generation analysis integer variable integer integer variable integer integer comparison Generation boolean integer variable integer constant integer integer assignment integer variable integer constant integer integer assignment load r1,x load r2,y less r1,r2 brfalse L23 load r1,=1 loadaddr r2,z store r2,r1 branch L24 L23: L24: load r1,=2 loadaddr r2,z store r2,r1 What Should a Compiler Implementor Know? Computer organization (CSC 258H) Software engineering (CSC 207H, CSC 301H, CSC 302H, CSC 410H) Software Tools (CSC 209H) File and Data structures (CSC 263H/CSC 265H) Communication Skills (CSC 290H) A large variety of programming languages (CSC 324H) Some operating systems (CSC 369H) Compiler implementation techniques (CSC 488H, ECE 489H). 8 9 Compiler Writing Requires Analytic Skills The compiler implementor(s) design the mapping from the source language to the target machine. Must be able to analyze a programming language for potential problems. Determine if language can be processed during lexical analysis, syntax analysis, semantic analysis and code generation. Must be able to analyze target machine and determine best way to implement each construct in the programming language. ming Language Designers are (usually) the Enemy Most programming language definitions are incomplete, imprecise and sometimes inconsistent. Real programs are written in language dialects. a Language designers often don t think deeply about the details of the implementation of a language, leaving lots of problems for the compiler writer. Typical problems Poor lexical structure. May require extensive buffering or lookahead during lexical analysis Difficulty syntax. Ambiguous, not suitable for normal parsing methods. May require hand written parser, backtracking or lookahead. Incompletely defined or inconsistent semantics. User friendly options that are hard to implement. Constructs that are difficult to generate good code for, make optimization difficult, require large run time support a For a discussion of the difficulties of scanning and parsing real programs see http://cacm.acm.org/magazines/2010 /2/69354-a-few-billion-lines-of-code-later/fulltext 10 11

Compiler Design Issues Like any large, long-lived program a compiler should be designed in a modular fashion that is easy to maintain over time. Need to design a software architecture for the compiler that allows it to implement the required language processing steps. A production compiler generally must implement the entire language. Student project and prototype compilers often omit the hard parts. Architecture of the compiler will be influenced by The programming language being compiled. Characteristics of the target machine(s). Compiler design goals Compiler s operating environment. Compiler project management goals ming Language Influences on Compiler Structure Declaration before use? Typed or type less? Separate compilation? modules/objects? Lexical issues, designed to be lexable? Syntactic issues, designed to be parseable? Static semantic checks required? Implementable? Run-time checking required? Size of programs to be compiled? Compatibility with OS or other languages? Dynamic creation/modification of programs? 12 13 Target Machine Influences on Compiler Structure Limited or partitioned memory RISC vs. CISC instruction set. Irregular or incomplete instruction set. Inadequate addressing modes. Hardware support for run-time checking? Poor support for high level languages. Missing instruction modes? Inadequate support for memory management? Characteristics of an Ideal Compiler User Interface Precise and clear diagnostic messages Easy to use processing options. Correctly implements the entire language Detects all statically detectable errors. Generates highly optimal code. Compiles quickly using modest system resources. Compiler Structure Well modularized. Low coupling between modules. Well documented and maintainable. High level of internal consistency checking. 14 15

Some Compiler Design Goals Correctly implement the language. Be highly diagnostic and error correcting. Produce time or space optimized code. Be able to process very large programs. Be very fast or very small. Be easily portable between environments. Have a user interface suitable for inexperienced users. Emit high quality code. Example Compiler Goals Student Compiler Interface for inexperienced users. Be highly diagnostic at compile time and run-time. Compile with blinding speed. Do no optimization Production Compiler Interface for experienced users. Produce highly optimized object code. Quick and Dirty Compiler Minimize compiler construction time Minimize project resource usage and budget. Do no optimization, omit hard parts of language. Compile to interpretive code, assembly language or high-level language. 16 17 Single Pass Compilers Good for simpler languages (e.g. prototypes and course projects). Not feasible for languages that permit backward information flow (e.g. declaration after use). Single pass compilers are usually parser-driven. The parser coroutines with the lexical analyzer to obtain tokens. The parser calls semantic analysis and code generation routines as each construct in the language is parsed. By the time each declaration or statement is completely parsed, all semantic checks have been performed and all code has been generated. Pascal is an example of a language well suited to single pass compilation. Declaration before use, well designed declaration structure, simple control structures. Scanner Data Control Request token Single Pass Compiler Architecture Token Parser IR Do code generation Do semantic analysis Symbol Table Generator 18 19

Multi Pass Compilers Multi pass compilers make several complete passes over the source program. Typically the first pass does lexical and syntax analysis and builds some intermediate representation of the program. analysis is a separate pass that processes this intermediate representation. generation is a separate pass that processes the intermediate representation. Optimization may make multiple passes over the program. Used for more complicated languages, e.g. Cobol, Ada, C, C++, Java Can couple lexical/syntax analysis for multiple languages to a common backend. Scanner Data Control Multi Pass Compiler Architecture Do scan and parse Request token Token Parser Intermediate Driver Representation Do code generation Do semantic analysis Symbol Table Generator 20 21 Interpass Communication Information flows between compiler passes Representation(s) of the program Tables Error messages Compiler flags program coordinates. Form of communication may change as program is processed. A compiler may use multiple representations of a program. Use disk resident information for large programs. Use memory resident information for compiler speed. Intermediate Representations Represent the structure of the program being compiled Declaration structure. Scope structure. Control flow structure. code structure. Used to pass information between compiler passes Compact representation desirable. Should be efficient to read and write. Provide utility to print intermediate language for compiler debugging. Backward information flow should be avoided if possible. 22 23

Intermediate Language Examples Condensed A * B + C / D - E Polish Postfix Notation A B * C D / + E - Triples 1 ( *, A, B ) 2 ( /, C, D ) 3 ( +, ( 1 ), ( 2 ) ) 4 ( -, ( 3 ), E ) Quadruples ( *, A, B, T1 ) ( /, C, D, T2 ) ( +, T1, T2, T3 ) ( -, T3, E, T4 ) Parse Tree Complete representation of the syntactic structure of the program according to some grammar. <> <term> * <factor> <> + <term> <> <term> <term> <factor> <factor> <identifier> <identifier> <factor> <identifier> <factor> E <identifier> B <identifier> D A C / Abstract Syntax Tree Similar to parse tree but only describes essential program structure. Directed Acyclic Graphs Used to represent control structure for ( expni ; expnl ; expns ) statement expni expnl false expns true statement 24 25 Compiler Design Choices Organization of compiler processing Single pass or multiple pass? Choice of compiler algorithms Lexical and syntactic analysis Static semantic analysis, code generation Optimization Compiler data representation Symbol and/or type tables, dictionaries. Memory resident compiler data? Communication between passes? Format of compiler output? Compiler Output Choices Assembly language (or other source code) Let existing assembler do some of the hard work. Makes code generation easier. Used in early C compilers. Relocatable machine code Usually an object module. Allows separate compilation, linking with existing libraries. Absolute machine code Generated code is directly executable. Interpretive pseudo code Machine instructions for some virtual machine. Used for portability and ease of compilation. High level programming language Example Specialized language C 26 27

Interpretive Systems Compiler generates a pseudo machine code that is a simple encoding of the program. The pseudo machine code is executed by another program (an interpreter) Interpreters are used for Debugging newly written programs. Student compilers that require good run-time error messages. Languages that allow dynamic program modification. Typeless languages that can t be semantically analyzed statically. Cases where run-time size must be minimized. Implementing ugly language features. Quick and dirty compilers. As a way to port programs between environments. Interpreters lose on Execution speed, usually significantly slower than machine code. May limit user data space or size of programs. May require recompilation for each run. Examples of Interpreters Pascal P Machine First compiler for Pascal compiled to a pseudo code (P-code) for a language-oriented stack machine. Compiler for Pascal was provided in P-code and source. Porting Pascal to new hardware only required writing a P-code interpreter for the new machine. 1..2 months work. P-code influenced many later pseudo codes including U-code (optimization intermediate language) and Turing internal T-code. Java Virtual Machine a Java programs are compiled to a byte-code for the Java Virtual Machine (JVM). JVM designed to make Java portable to many platforms. JVM slow execution speed has lead to the development of Just In Time (JIT) native code compilers for Java. a See Fischer, Cytron, LeBlanc Section 10.2 28 29 Compiler Design Examples Student compiler The Complete Compilation Process One pass for speed In-memory tables Compile to directly executable absolute code or to interpretive code. Included Emission Tune for compile speed and high quality diagnostics. Production Optimizing Compiler Usually multi pass Uses disk resident tables for large programs. Data structures tuned for large programs. Pre Process Lexical Storage Allocation Mach. Ind. Optimize Module Link Editor Modules Standard Libraries Usually includes heavyweight optimization. Syntax Generation Load Module Loader Mach. Dep. Optimize Execution 30 31

Project Preview Assignment 1 Assignment 3 Lexical Provided Generation Assignments 4, 5 Syntax Assignment 2 Pseudo Symbol Table Assignment 3 Pseudo Machine Provided 32