CS 321 IV. Overview of Compilation


Overview of Compilation Translating from a high-level language to machine code is organized into several phases, or passes. In the early days, passes communicated through files, but this is no longer necessary.

Language Specification We must first describe the language in question by giving its specification.
Syntax: admissible symbols (vocabulary) and admissible programs (sentences).
Semantics: gives meaning to sentences.
The formal specifications are often the input to tools that build translators automatically.

Compiler passes (data flow through the phases):
string of characters
-> Lexical Analyzer -> string of tokens
-> Parser -> abstract syntax tree
-> Semantic Analyzer -> abstract syntax tree
-> Source-to-source optimizer -> abstract syntax tree
-> Translator -> medium-level intermediate code
-> Optimizer -> medium-level intermediate code
-> Translator -> low-level intermediate code
-> Optimizer -> low-level intermediate code
-> Final Assembly -> executable/object code

Compiler passes (phase structure):
source program -> lexical scanner -> parser -> semantic analyzer -> translator -> optimizer -> final assembly -> target program
The first phases form the front end and the last ones the back end; the symbol table manager and the error handler are shared by all phases.

Lexical analyzer Also known as a scanner or tokenizer. Converts a stream of characters into a stream of tokens. Tokens are:
- keywords such as for, while, and class
- special characters such as +, -, (, and <
- variable name occurrences
- constant occurrences such as 1.5e-10 and true

Lexical analyzer (Cont) The lexical analyzer is usually a subroutine of the parser. Each token is a single entity. A numerical code is usually assigned to each type of token.

Lexical analyzers perform:
Line reconstruction: delete comments, delete extraneous blanks, handle margin checks, delete character/backspace, perform text substitution.
Lexical translation: translation of lexemes -> tokens, e.g. + -> Taddop, >= -> Trelop, Joe -> Tident.
Often additional information is affiliated with a token.

Example. The program fragment
if num = 0 then avg := 0 else avg := sum * num;
when lexed becomes
Tif Tid Trelop Tint Tthen Tid Tasgn Tint Telse Tid Tasgn Tid Tmulop Tid Tsemi
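As a concrete illustration, a table-driven lexer for this fragment can be sketched in a few lines. This is a minimal sketch, not the course's implementation: the regular-expression patterns and the reuse of the slide's Txxx token codes are assumptions of this example.

```python
import re

# Token patterns, tried in order; keywords and ":=" must come before the
# more general identifier and "=" rules. Names mirror the slide's codes.
TOKEN_SPEC = [
    ("Tif",    r"\bif\b"),
    ("Tthen",  r"\bthen\b"),
    ("Telse",  r"\belse\b"),
    ("Tasgn",  r":="),
    ("Trelop", r"[=<>]"),
    ("Tmulop", r"\*"),
    ("Tsemi",  r";"),
    ("Tint",   r"\d+"),
    ("Tid",    r"[A-Za-z_]\w*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Convert a character stream into a list of (token, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":      # discard whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

fragment = "if num = 0 then avg := 0 else avg := sum * num;"
print([t for t, _ in lex(fragment)])
# ['Tif', 'Tid', 'Trelop', 'Tint', 'Tthen', 'Tid', 'Tasgn', 'Tint',
#  'Telse', 'Tid', 'Tasgn', 'Tid', 'Tmulop', 'Tid', 'Tsemi']
```

Note that each lexeme carries its token code plus the matched text, the "affiliated information" mentioned above.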

Keywords are easy to translate (finite functions). Relational operators (and other classes of operators) are also easy to translate; affiliated information identifies the specific operator. Identifiers, numbers, strings, and the like are harder: there are an infinite number of them.
Finite state machines model such tokens. For identifiers: the start state s1 moves to the accepting state s2 on a letter (a..z, A..Z), and s2 loops on letters and digits (a..z, A..Z, 0..9).
Grammars also model them:
id -> aX | bX | ... | zX
X -> aX | bX | ... | zX | 0X | ... | 9X | ε
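The two-state identifier machine above can be simulated directly. A minimal sketch: the state names s1/s2 follow the slide, everything else is illustrative.

```python
# A two-state DFA for identifiers: a letter, then any letters/digits.
def is_identifier(s):
    state = "s1"                # start state
    for ch in s:
        if state == "s1":
            if ch.isalpha():
                state = "s2"    # first character must be a letter
            else:
                return False
        else:  # state == "s2"
            if not ch.isalnum():
                return False    # only letters and digits may follow
    return state == "s2"        # accept only if we reached s2

print(is_identifier("x2"), is_identifier("2x"))  # True False
```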

Parser Performs syntax analysis. Imposes syntactic structure on a sentence. Parse trees/derivation trees are used to expose the structure. These trees are often not explicitly built; simpler representations of them are often used (e.g. syntax trees, tuples). A parser accepts a string of tokens and builds a parse tree representing the program.

Parser (Cont.) The collection of all the programs in a given language is usually specified using a list of rules known as a context-free grammar. On the next page we present a very simple grammar.

Parser (Cont.)
program -> module identifier { statement-list }
statement-list -> statement ; statement-list
               |  statement
statement -> if ( expression ) then { statement-list } else { statement-list }
          |  identifier = expression
          |  while ( expression < expression ) { statement-list }
          |  input identifier
          |  output identifier
expression -> expression + term
           |  expression - term
           |  term
term -> term * factor
     |  term / factor
     |  factor
factor -> ( expression )
       |  identifier
       |  number

Parser (Cont.) A context-free grammar like the one above has four components:
- a set of tokens, known as terminal symbols
- a set of variables, or nonterminals
- a set of productions, where each production consists of a nonterminal (the left side of the production), an arrow, and a sequence of tokens and/or nonterminals (the right side of the production)
- a designation of one of the nonterminals as the start symbol

Parser (Cont.) The terminals are represented in courier font in the previous example. The start symbol in the previous grammar is 'program'. A parse tree of a simple program using the grammar above is shown on the next page.
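To make the grammar concrete, the expression part can be handled by a small recursive-descent parser. This is a sketch, not the course's parser: the left-recursive productions (expression -> expression + term) are rewritten as loops, and the nested-tuple tree encoding is an assumption of this example.

```python
# Recursive-descent parsing of: expression -> term {(+|-) term},
# term -> factor {(*|/) factor}, factor -> ( expression ) | id | number.
def parse_expression(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)       # loop makes the operators left-associative
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        node, pos = parse_expression(tokens, pos + 1)
        assert tokens[pos] == ")", "expected closing parenthesis"
        return node, pos + 1
    return tokens[pos], pos + 1        # identifier or number

tree, _ = parse_expression(["x", "+", "y", "*", "2"])
print(tree)  # ('+', 'x', ('*', 'y', '2'))
```

The tuples returned here are already closer to a syntax tree than to a parse tree: the grammar's chain productions (expression -> term -> factor) leave no trace.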

Parser (Cont.) A parse tree for the program
module a { input x; while (x > 2) { x = x / 2 }; output x }
[parse tree figure: the root program expands to module identifier(a) { statement-list }; the statement-list yields input identifier(x), then the while statement, then output identifier(x); the while statement's condition compares identifier(x) with number(2), and its body is the assignment identifier(x) = identifier(x) / number(2), with expression, term, and factor nodes in between]

Symbol Table Management The symbol table is a data structure used by all phases of the compiler to keep track of user-defined symbols (and sometimes keywords as well). During early phases (lexical and syntax analysis) symbols are discovered and put into the symbol table. During later phases symbols are looked up to validate their usage.

Symbol Table Management (Cont.) Typical symbol table activities:
- add a new name
- add information for a name
- access information for a name
- determine if a name is present in the table
- remove a name
- revert to a previous usage for a name (close a scope)

Symbol Table Management (Cont.) Many possible implementations:
- linear list
- sorted list
- hash table
- tree structure

Symbol Table Management (Cont.) Typical information fields:
- print value
- kind (e.g. reserved, typeid, varid, funcid, etc.)
- block number/level number (in statically scoped languages)
- type
- initial value
- base address
- etc.
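The activities and implementations listed above can be combined into a minimal scoped symbol table, sketched here as a stack of hash tables, one per block/level. The class and field names are illustrative, not from the course.

```python
# A scoped symbol table: innermost scope is searched first, and closing a
# scope reverts every name to its previous usage.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]             # start with the global scope

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()              # revert to previous usage of names

    def add(self, name, **info):
        self.scopes[-1][name] = info   # add a name with its information fields

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost first (static scoping)
            if name in scope:
                return scope[name]
        return None                    # name not present in the table

table = SymbolTable()
table.add("x", kind="varid", type="int")
table.open_scope()
table.add("x", kind="varid", type="float")   # inner x shadows the outer one
print(table.lookup("x")["type"])             # float
table.close_scope()
print(table.lookup("x")["type"])             # int
```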

Abstract Syntax Tree The parse tree is used to recognize the components of the program and to check that the syntax is correct. However, the parse tree is rarely built explicitly. As the parser applies productions, it usually generates the components of a simpler tree, known as the abstract syntax tree, which omits information that would be superfluous for the internal working of the compiler once a tree representing the program has been built. For example, there is no need for the semicolon once each statement is organized as a subtree.

Abstract Syntax Tree (Cont.) A typical Abstract Syntax Tree has labeled edges and follows the structure of the parse tree. The AST corresponding to the parse tree above is shown next.

Abstract Syntax Tree (Cont.) The AST for the sample program (indentation shows the parent/child structure; edge labels are in parentheses):
program
  (name) identifier(a)
  (body)
    input
      identifier(x)
    while
      (condition) <
        (left operand) identifier(x)
        (right operand) number(2)
      (body) =
        identifier(x)
        /
          identifier(x)
          number(2)
    output
      identifier(x)
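One way to make such a tree concrete is to encode it as nested tuples (node label first, children after) and walk it. This encoding is a hypothetical choice for the sketch, not the compiler's actual representation.

```python
# The sample program's AST as nested tuples.
ast = ("program", ("identifier", "a"),
       ("input", ("identifier", "x")),
       ("while",
        ("<", ("identifier", "x"), ("number", 2)),
        ("=", ("identifier", "x"),
         ("/", ("identifier", "x"), ("number", 2)))),
       ("output", ("identifier", "x")))

def size(node):
    """Count the tree nodes; leaves like names and values are not nodes."""
    if not isinstance(node, tuple):
        return 0
    return 1 + sum(size(child) for child in node[1:])

print(size(ast))  # 15
```

Walks like this one are the basic shape of every later phase: semantic analysis and translation are both recursive traversals of this structure.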

Semantic Analyzer The semantic analyzer completes the symbol table with information on the characteristics of each identifier. The symbol table is usually initialized during parsing: one entry is created for each identifier and constant. Scope is taken into account: two different variables with the same name will have different entries in the symbol table. Nodes in the AST containing identifiers or constants really contain pointers into the symbol table. The semantic analyzer completes the table using information from declarations and perhaps from the context where the identifiers occur.

Semantic Analyzer (Cont.) The semantic analyzer does:
- type checking
- flow-of-control checks
- uniqueness checks (identifiers, case labels, etc.)
One objective is to identify semantic errors statically. For example:
- undeclared identifiers
- unreachable statements
- identifiers used in the wrong context
- methods called with the wrong number of parameters or with parameters of the wrong type
- ...
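One of the static checks listed above, flagging undeclared identifiers, can be sketched over a flattened statement list. The (kind, name) encoding is an illustrative assumption, not the compiler's representation.

```python
# Walk a sequence of declarations and uses, reporting uses that are not
# preceded by a declaration (a single flat scope, for simplicity).
def check_undeclared(statements, declared):
    errors = []
    for kind, name in statements:
        if kind == "decl":
            declared.add(name)                 # record the declaration
        elif kind == "use" and name not in declared:
            errors.append(f"undeclared identifier: {name}")
    return errors

program = [("decl", "x"), ("use", "x"), ("use", "y")]
print(check_undeclared(program, set()))  # ['undeclared identifier: y']
```

A real analyzer would perform this lookup against the scoped symbol table rather than a flat set.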

Semantic Analyzer (Cont.) Some semantic errors have to be detected at run time to make sure they are always caught; the reason is that the necessary information may not be available at compile time:
- array subscript is out of bounds
- variables are not initialized
- divide by zero
Notice, however, that in some cases (but not always) these semantic errors can be detected statically.

Semantic Analyzer (Cont.) For example, it is possible to detect statically (at compile time) that a is never out of bounds in the loop
float a[] = new float[100];
for (i = 1; i < 20; i++) { a[i] = 0; }
However, static analysis would not work if the loop has the following form
float a[] = new float[100];
n = read();
for (i = 1; i < n; i++) { a[i] = 0; }

Error Management Errors can occur in all phases of the compiler: invalid input characters, syntax errors, semantic errors, etc. Good compilers will attempt to recover from errors and continue.

Source-to-source optimizers It is possible to do code improvement at the high level (source-to-source transformations). An example of such a transformation is the locality enhancement transformation presented earlier.

Translator The lexical scanner, parser, semantic analyzer, and source-to-source optimizer are collectively known as the front end of the compiler. The second part, or back end, starts by generating low-level code from the (possibly optimized) AST.

Translator Rather than generate code for a specific architecture, most compilers generate an intermediate language. Three-address code is popular:
- really a flattened tree representation
- simple
- flexible (captures the essence of many target architectures)
- can be interpreted

Translator One way of performing intermediate code generation: Attach meaning to each node of the AST. The meaning of the sentence = the meaning attached to the root of the tree.
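This syntax-directed scheme can be sketched for expressions: the "meaning" attached to each node is the name of the temporary that holds its value, and three-address instructions are emitted bottom-up. The tuple AST and temporary naming are assumptions of this sketch.

```python
# Generate three-address code from an expression tree; each call returns
# the name (temporary, identifier, or constant) holding the node's value.
def gen(node, code, counter=None):
    if counter is None:
        counter = [0]                  # fresh temporary counter per translation
    if not isinstance(node, tuple):    # leaf: identifier or number
        return str(node)
    op, left, right = node
    l = gen(left, code, counter)       # meaning of the left child
    r = gen(right, code, counter)      # meaning of the right child
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp                        # meaning of this node

code = []
result = gen(("+", "a", ("*", "b", "c")), code)
print(code)    # ['t1 = b * c', 't2 = a + t1']
print(result)  # t2
```

The meaning of the whole expression is the value returned at the root, exactly as the slide states.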

XIL An example of a medium-level intermediate language is XIL. XIL is used by IBM to compile FORTRAN, C, C++, and Pascal for the RS/6000. Compilers for Fortran 90 and C++ have been developed using XIL for other machines such as the Intel 386, Sparc, and S/370. These compilers share a common back end, the Toronto Optimizing Back End with Yorktown (TOBEY).

XIL (Cont.) XIL is free of source-language dependences and thus forms a suitable target for the compilation of a broad range of programming languages. XIL presents a model of a machine with a load/store architecture and a number of distinct register sets. These register sets can each contain an unbounded number of symbolic registers. Instructions are similar to those of RISC machines, but more flexible:
- displacements in addresses can be of any size
- addresses can contain as many index registers as desired
- call instructions can have a large number of parameter registers
- instructions can have as many result registers as is convenient

XIL (Cont.) Indexed load of a(i,j+2) in XIL:
L   r.i=i(r200,64)
M   r300=r.i,8
L   r.j=j(r200,68)
A   r310=r.j,2
M   r320=r310,400
LFL fp330=a(r200,r300,r320,30000)
[figure: the storage layout, with i at offset 64, j at offset 68, and array a at offset 30,000 from base register r200]

Optimizers Improve the quality of the code generated. Optimizers manipulating medium-level intermediate language apply machine-independent transformations:
- loop-invariant removal
- common-subexpression elimination
Optimizers manipulating low-level language apply machine-dependent transformations:
- register allocation

Optimizers (Cont.) Intermediate code is examined and improved. Can be simple:
- changing a := a + 1 to an increment instruction
- changing 3*5 to 15
Can be complicated:
- reorganizing data and data accesses for cache efficiency
Optimization can improve running time by orders of magnitude, often also decreasing program size.
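The simple case of folding constant expressions such as 3*5 into 15 can be sketched as a bottom-up rewrite of an expression tree. The tuple encoding is illustrative, matching the earlier sketches.

```python
# Constant folding: fold children first, then replace any operator whose
# operands are both constants by its computed value.
def fold(node):
    if not isinstance(node, tuple):
        return node                    # identifier or constant: nothing to do
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        ops = {"+": left + right, "-": left - right, "*": left * right}
        if op in ops:
            return ops[op]             # replace the subtree by its value
    return (op, left, right)           # rebuild with (possibly folded) children

print(fold(("+", "a", ("*", 3, 5))))   # ('+', 'a', 15)
```

Division is deliberately left out of the table here, since folding it would also have to deal with divide-by-zero, one of the run-time errors discussed earlier.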

Code Generation Generation of real executable code for a particular target machine. It is completed by the final assembly phase, but other passes contribute to it. The final output can be either:
- assembly language for the target machine (requiring execution of the assembler after the compiler completes), or
- object code ready for linking.
The target machine can be a virtual machine (such as the Java Virtual Machine, JVM), and the real executable code is then virtual code (such as Java bytecode).

Tools Tools exist to help in the development of some stages of the compiler:
- Lex (Flex): lexical analyzer generator
- Yacc (Bison): parser generator