The Structure of a Compiler


A compiler performs two major tasks:

- Analysis of the source program being compiled
- Synthesis of a target program

Almost all modern compilers are syntax-directed: the compilation process is driven by the syntactic structure of the source program. A parser builds semantic structure out of tokens, the elementary symbols of programming language syntax. Recognition of syntactic structure is a major part of the analysis task.

Semantic analysis examines the meaning (semantics) of the program. Semantic analysis plays a dual role. It finishes the analysis task by performing a variety of correctness checks (for example, enforcing type and scope rules). It also begins the synthesis phase. The synthesis phase may translate source programs into some intermediate representation (IR), or it may directly generate target code.

If an IR is generated, it then serves as input to a code generator component that produces the desired machine-language program. The IR may optionally be transformed by an optimizer so that a more efficient program may be generated.

[Figure: The Structure of a Syntax-Directed Compiler. Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Abstract Syntax Tree (AST) -> Type Checker -> Decorated AST -> Translator -> Intermediate Representation (IR) -> Optimizer -> IR -> Code Generator -> Target Machine Code. Symbol Tables are shared by all phases.]

Scanner

The scanner reads the source program, character by character. It groups individual characters into tokens (identifiers, integers, reserved words, delimiters, and so on). When necessary, the actual character string comprising the token is also passed along for use by the semantic phases.

The scanner:

- Puts the program into a compact and uniform format (a stream of tokens).
- Eliminates unneeded information (such as comments).
- Sometimes enters preliminary information into symbol tables (for example, to register the presence of a particular label or identifier).
- Optionally formats and lists the source program.

Building tokens is driven by token descriptions defined using regular expression notation. Regular expressions are a formal notation able to describe the tokens used in modern programming languages. Moreover, they can drive the automatic generation of working scanners given only a specification of the tokens. Scanner generators (like Lex, Flex and JLex) are valuable compiler-building tools.

Parser

Given a syntax specification (as a context-free grammar, CFG), the parser reads tokens and groups them into language structures. Parsers are typically created from a CFG using a parser generator (like Yacc, Bison or Java CUP). The parser verifies correct syntax and may issue a syntax error message. As syntactic structure is recognized, the parser usually builds an abstract syntax tree (AST), a concise representation of program structure, which guides semantic processing.

Type Checker (Semantic Analysis)

The type checker checks the static semantics of each AST node. It verifies that the construct is legal and meaningful (that all identifiers involved are declared, that types are correct, and so on). If the construct is semantically correct, the type checker decorates the AST node, adding type or symbol table information to it. If a semantic error is discovered, a suitable error message is issued. Type checking is purely dependent on the semantic rules of the source language.
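The idea that token descriptions written as regular expressions can drive scanning may be sketched as follows. This is an illustrative toy, not the CSX scanner (which a generator like JLex would produce); the token set here — identifiers, integer literals, a few operators, and line comments — is an assumption for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A regex-driven scanner sketch. Each token class is described by a
// regular expression; the scanner repeatedly matches at the current
// position, emitting tokens and discarding whitespace and comments.
public class TinyScanner {
    private static final Pattern TOKEN = Pattern.compile(
        "(?<WS>\\s+|//[^\\n]*)"        // whitespace, line comments: discarded
      + "|(?<INT>\\d+)"                // integer literals
      + "|(?<ID>[A-Za-z_]\\w*)"       // identifiers (and reserved words)
      + "|(?<OP>[=+\\-*/();])");      // operators and delimiters

    public static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        // require each match to start exactly where the last one ended
        while (pos < src.length() && m.find(pos) && m.start() == pos) {
            if (m.group("WS") == null) {   // keep everything but whitespace
                String kind = m.group("INT") != null ? "IntLiteral"
                            : m.group("ID") != null ? "Id" : "Op";
                tokens.add(kind + "(" + m.group() + ")");
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("a = bb+abs(c-7);"));
    }
}
```

Note that the scanner also carries the matched character string along with each token kind, as the semantic phases may need it.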
It is independent of the compiler's target machine.
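The parser's job of grouping tokens into tree structure can be sketched with a tiny recursive-descent parser. The grammar here is an illustrative fragment, not CSX, and the class project would generate its parser from a CFG with Java CUP rather than write one by hand; AST nodes are shown as strings for brevity.

```java
import java.util.List;

// A recursive-descent parser sketch for the grammar
//   expr -> atom (('+'|'-') atom)*      atom -> INT | ID | ID '(' expr ')'
// It consumes a token stream and builds an AST. Note how parentheses
// disappear: they guide tree construction but are not represented in it.
public class TinyParser {
    private final List<String> toks;
    private int pos = 0;

    public TinyParser(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }
    private String next() { return toks.get(pos++); }

    // expr -> atom (('+'|'-') atom)*
    public String expr() {
        String tree = atom();
        while (peek().equals("+") || peek().equals("-")) {
            String op = next().equals("+") ? "Plus" : "Minus";
            tree = op + "(" + tree + ", " + atom() + ")";
        }
        return tree;
    }

    // atom -> INT | ID | ID '(' expr ')'
    private String atom() {
        String t = next();
        if (t.matches("\\d+")) return "IntLiteral(" + t + ")";
        if (peek().equals("(")) {          // a call: ID '(' expr ')'
            next();                        // consume '('
            String arg = expr();
            next();                        // consume ')'
            return "Call(Id(" + t + "), " + arg + ")";
        }
        return "Id(" + t + ")";
    }

    public static void main(String[] args) {
        System.out.println(new TinyParser(
            List.of("bb", "+", "abs", "(", "c", "-", "7", ")")).expr());
    }
}
```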

Translator (Program Synthesis)

If an AST node is semantically correct, it can be translated. Translation involves capturing the run-time meaning of a construct. For example, an AST for a while loop contains two subtrees, one for the loop's control expression and the other for the loop's body. Nothing in the AST shows that a while loop loops! This meaning is captured when a while loop's AST is translated: in the IR, the notion of testing the value of the loop control expression and conditionally executing the loop body becomes explicit.

The translator is dictated by the semantics of the source language. Little of the nature of the target machine need be made evident. Detailed information on the nature of the target machine (operations available, addressing, register characteristics, etc.) is reserved for the code generation phase.

In simple non-optimizing compilers (like our class project), the translator generates target code directly, without using an IR. More elaborate compilers may first generate a high-level IR (that is source language oriented) and then subsequently translate it into a low-level IR (that is target machine oriented). This approach allows a cleaner separation of source and target dependencies.

Optimizer

The IR code generated by the translator is analyzed and transformed into functionally equivalent but improved IR code by the optimizer. The term optimization is misleading: we don't always produce the best possible translation of a program, even after optimization by the best of compilers. Why? Some optimizations are impossible to do in all circumstances because they involve an undecidable problem. Eliminating unreachable ("dead") code is, in general, impossible.
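How translation makes a while loop's implicit meaning explicit can be sketched as below. The AST and IR forms (labels, `if_false`, `goto`) are hypothetical, chosen only to illustrate the idea; a real translator would use proper node and instruction classes.

```java
import java.util.ArrayList;
import java.util.List;

// The while-loop AST stores only two subtrees: the control expression
// and the body. Translation emits the explicit test, conditional
// branch, and back edge that the AST leaves implicit.
public class WhileLowering {
    private static int labels = 0;
    private static String freshLabel() { return "L" + labels++; }

    // cond: an IR operand holding the control expression's value
    // body: already-translated IR instructions for the loop body
    public static List<String> lowerWhile(String cond, List<String> body) {
        String top = freshLabel(), exit = freshLabel();
        List<String> ir = new ArrayList<>();
        ir.add(top + ":");
        ir.add("  if_false " + cond + " goto " + exit);  // explicit test
        ir.addAll(body);                                 // loop body
        ir.add("  goto " + top);                         // explicit back edge
        ir.add(exit + ":");
        return ir;
    }

    public static void main(String[] args) {
        // while (i < n) i = i + 1;   with the test's value already in t0
        for (String line : lowerWhile("t0", List.of("  i = i + 1")))
            System.out.println(line);
    }
}
```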

Other optimizations are too expensive to do in all cases. These involve NP-complete problems, believed to be inherently exponential. Assigning registers to variables is an example of an NP-complete problem.

Optimization can be complex; it may involve numerous subphases, which may need to be applied more than once. Optimizations may be turned off to speed translation. Nonetheless, a well-designed optimizer can significantly speed program execution by simplifying, moving or eliminating unneeded computations.

Code Generator

IR code produced by the translator is mapped into target machine code by the code generator. This phase uses detailed information about the target machine and includes machine-specific optimizations like register allocation and code scheduling. Code generators can be quite complex, since good target code requires consideration of many special cases.

Automatic generation of code generators is possible. The basic approach is to match a low-level IR to target instruction templates, choosing instructions which best match each IR instruction. A well-known compiler using automatic code generation techniques is the GNU C compiler. GCC is a heavily optimizing compiler with machine description files for over ten popular computer architectures, and at least two language front ends (C and C++).

Symbol Tables

A symbol table allows information to be associated with identifiers and shared among compiler phases. Each time an identifier is used, a symbol table provides access to the information collected about the identifier when its declaration was processed.
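A block-structured symbol table of the kind just described can be sketched as a stack of hash maps, one per open scope. The interface below is a hypothetical minimum, not the class project's; declaration information is a plain string here for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A minimal block-structured symbol table: declarations go into the
// innermost scope, and lookup searches from the innermost scope
// outward, so inner declarations hide outer ones.
public class SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    public void openScope()  { scopes.push(new HashMap<>()); }
    public void closeScope() { scopes.pop(); }

    // record the information collected when a declaration is processed
    public void declare(String id, String info) { scopes.peek().put(id, info); }

    // each later use of the identifier retrieves that information
    public String lookup(String id) {
        for (Map<String, String> scope : scopes)   // innermost first
            if (scope.containsKey(id)) return scope.get(id);
        return null;  // undeclared identifier: a semantic error
    }

    public static void main(String[] args) {
        SymbolTable st = new SymbolTable();
        st.openScope();
        st.declare("a", "int, local 1");
        st.openScope();
        st.declare("a", "bool, local 2");     // hides the outer a
        System.out.println(st.lookup("a"));   // inner binding
        st.closeScope();
        System.out.println(st.lookup("a"));   // outer binding again
    }
}
```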

Example

Our source language will be CSX, a blend of C, C++ and Java. Our target language will be the Java JVM, using the Jasmin assembler.

Our source line is

    a = bb+abs(c-7);

This is a sequence of ASCII characters in a text file. The scanner groups characters into tokens, the basic units of a program. After scanning, we have the following token sequence:

    Id(a) Asg Id(bb) Plus Id(abs) Lparen Id(c) Minus IntLiteral(7) Rparen Semi

The parser groups these tokens into language constructs (expressions, statements, declarations, etc.) represented in tree form:

    Asg
    +-- Id(a)
    +-- Plus
        +-- Id(bb)
        +-- Call
            +-- Id(abs)
            +-- Minus
                +-- Id(c)
                +-- IntLiteral(7)

(What happened to the parentheses and the semicolon?)

The type checker resolves types and binds declarations within scopes, decorating the AST: Id(a), Id(bb) and Id(c) are bound to local variables, and Id(abs) to a method.

Finally, JVM code is generated for each node in the tree (leaves first, then roots):

    iload 3                              ; push local 3 (bb)
    iload 2                              ; push local 2 (c)
    ldc 7                                ; push literal 7
    isub                                 ; compute c-7
    invokestatic java/lang/Math/abs(I)I  ; compute abs(c-7)
    iadd                                 ; compute bb+abs(c-7)
    istore 1                             ; store result into local 1 (a)
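The JVM is a stack machine, and the generated code can be traced by hand. The sketch below simulates the instruction sequence above on an explicit operand stack; the values chosen for bb and c are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A hand simulation of the generated JVM code: each instruction pushes
// operands or replaces the top of the stack with a computed result.
public class StackEval {
    public static int run(int bb, int c) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(bb);                     // iload 3  : push bb
        stack.push(c);                      // iload 2  : push c
        stack.push(7);                      // ldc 7    : push literal 7
        int r = stack.pop(), l = stack.pop();
        stack.push(l - r);                  // isub     : c - 7
        stack.push(Math.abs(stack.pop())); // invokestatic Math.abs(I)I
        r = stack.pop(); l = stack.pop();
        stack.push(l + r);                  // iadd     : bb + abs(c-7)
        return stack.pop();                 // istore 1 : store into a
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3));     // a = 10 + abs(3-7)
    }
}
```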

Interpreters

There are two different kinds of interpreters that support execution of programs: machine interpreters and language interpreters.

Machine Interpreters

Machine interpreters simulate the execution of a program compiled for a particular machine architecture. Java uses a bytecode interpreter to simulate the effects of programs compiled for the JVM. Programs like SPIM simulate the execution of a MIPS program on a non-MIPS computer.

Language Interpreters

Language interpreters simulate the effect of executing a program without compiling it to any particular instruction set (real or virtual). Instead, some IR form (perhaps an AST) is used to drive execution.

Interpreters provide a number of capabilities not found in compilers:

- Programs may be modified as execution proceeds. This provides a straightforward interactive debugging capability. Depending on program structure, program modifications may require reparsing or repeated semantic analysis. In Python, for example, any string variable may be interpreted as a Python expression or statement and executed.
- Interpreters readily support languages in which the type a variable denotes may change dynamically (e.g., Python or Scheme). The user program is continuously reexamined as execution proceeds, so symbols need not have a fixed type. Fluid bindings are much more troublesome for compilers, since dynamic changes in the type of a symbol make direct translation into machine code difficult or impossible.
- Interpreters provide better diagnostics. Source text analysis is intermixed with program execution, so especially good diagnostics are available, along with interactive debugging.
- Interpreters support machine independence. All operations are performed within the interpreter. To move to a new machine, we just recompile the interpreter.

However, interpretation can involve large overheads:

- As execution proceeds, program text is continuously reexamined, with bindings, types, and operations sometimes recomputed at each use.
  For very dynamic languages this can represent a 100:1 (or worse) factor in execution speed over compiled code. For more static languages (such as C or Java), the speed degradation is closer to 10:1.
- Startup time for small programs is slowed, since the interpreter must be loaded and the program partially recompiled before execution begins.

- Substantial space overhead may be involved. The interpreter and all support routines must usually be kept available. Source text is often not as compact as if it were compiled. This size penalty may lead to restrictions in the size of programs. Programs beyond these built-in limits cannot be handled by the interpreter.

Of course, many languages (including C, C++ and Java) have both interpreters (for debugging and program development) and compilers (for production work).
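A language interpreter of the kind described above, with an AST driving execution, can be sketched as follows. The node classes are hypothetical; note the interpretation overhead the text mentions: each node re-examines its operands on every evaluation, work a compiler would do once at translation time.

```java
import java.util.Map;

// An AST-driven expression interpreter sketch: execution walks the tree
// directly instead of running compiled instructions. The environment
// maps variable names to their current values.
public class AstInterp {
    interface Expr { int eval(Map<String, Integer> env); }

    record Lit(int value) implements Expr {
        public int eval(Map<String, Integer> env) { return value; }
    }
    record Var(String name) implements Expr {
        public int eval(Map<String, Integer> env) { return env.get(name); }
    }
    record Add(Expr left, Expr right) implements Expr {
        public int eval(Map<String, Integer> env) {
            return left.eval(env) + right.eval(env);  // re-examined per use
        }
    }
    record Sub(Expr left, Expr right) implements Expr {
        public int eval(Map<String, Integer> env) {
            return left.eval(env) - right.eval(env);
        }
    }

    public static void main(String[] args) {
        // bb + (c - 7), with illustrative values for bb and c
        Expr e = new Add(new Var("bb"), new Sub(new Var("c"), new Lit(7)));
        System.out.println(e.eval(Map.of("bb", 3, "c", 10)));
    }
}
```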