Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

Similar documents
Today s Topics. Last Time Top-down parsers - predictive parsing, backtracking, recursive descent, LL parsers, relation to S/SL

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

A Simple Syntax-Directed Translator

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

A simple syntax-directed

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

CS 4201 Compilers 2014/2015 Handout: Lab 1

Compiler Design (40-414)

Lexical Analysis. Introduction

Chapter 10 Language Translation

Compiling Regular Expressions COMP360

Compilers. Prerequisites

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

Introduction to Compiler Construction

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

COMPILER DESIGN LECTURE NOTES

Introduction to Compiler Design

CST-402(T): Language Processors

Working of the Compilers

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Introduction to Compiler Construction

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Inside PHP Tom OSCON th July, 2012

1 Lexical Considerations

Introduction to Compiler Construction

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Front End. Hwansoo Han

Introduction to Lexical Analysis

A programming language requires two major definitions A simple one pass compiler

Compiler Code Generation COMP360

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

Compilers and Interpreters

Languages and Compilers

Undergraduate Compilers in a Day

COMPILER DESIGN LEXICAL ANALYSIS, PARSING

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

Part 5 Program Analysis Principles and Techniques

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Introduction to Compiler

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

CSE 3302 Programming Languages Lecture 2: Syntax

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Group A Assignment 3(2)


Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Typescript on LLVM Language Reference Manual

CPS 506 Comparative Programming Languages. Syntax Specification

Pioneering Compiler Design

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Compiler Construction LECTURE # 1

CS 321 IV. Overview of Compilation

The Structure of a Syntax-Directed Compiler

Tizen/Artik IoT Lecture Chapter 3. JerryScript Parser & VM

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Chapter 2 A Quick Tour

Introduction to Compilers

Parsing and Pattern Recognition

Lexical Considerations

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CS Concepts of Compiler Design

CS 415 Midterm Exam Spring SOLUTION

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CS426 Compiler Construction Fall 2006

The Structure of a Syntax-Directed Compiler

Lexical Considerations

Compilers and Code Optimization EDOARDO FUSELLA

Overview of Compiler. A. Introduction

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Downloaded from Page 1. LR Parsing

4. An interpreter is a program that

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

Intermediate Representations

tokens parser 1. postfix notation, 2. graphical representation (such as syntax tree or dag), 3. three address code

Compiler I: Syntax Analysis

Software II: Principles of Programming Languages

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

COMPILER DESIGN. For COMPUTER SCIENCE

Syntax and Grammars 1 / 21

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

More Assigned Reading and Exercises on Syntax (for Exam 2)

SOFTWARE ARCHITECTURE 5. COMPILER

Advanced Programming & C++ Language

CS453 Compiler Construction


A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end.

COMPILERS BASIC COMPILER FUNCTIONS

Computer Hardware and System Software Concepts

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

1. Lexical Analysis Phase

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Transcription:

Compiler Structure Source Program Text Phases of Compilation Compilation process is partitioned into a series of four distinct subproblems called phases, each with a separate well-defined translation task This structure is the result of more than 40 years of experience in compiler engineering Can be run in sequence, in which case they are called passes Or can be integrated in various ways, such as parallel execution as co-routines Parse Tree Annotated Parse Tree Lexical Analysis Token Stream Syntax Analysis or Postfix Token Stream Semantic Analysis or Intermediate Code Code Generation Target Machine Code or or or Scanning/ Screening Parsing Context Analysis

Phase 1: Lexical Analysis Lexical Analysis Lexical analysis, or scanning/screening, partitions source text into the words or tokens of the input language Source Program Text Lexical Analysis if x>5 then > keyword "if" identifier "x" operator ">" integer "5" keyword "then" Token Stream

Syntax Analysis Phase 2: Syntax Analysis Syntax analysis, or parsing, groups tokens together into the grammatical structures of the language, often represented as a parse tree or (equivalently) a postfix token stream Token Stream Syntax Analysis identifier "x" operator ">" > integer "5" op ">" id "x" int "5" Parse Tree or Postfix Token Stream identifier "x" integer "5" operator ">"

Phase 3: Semantic Analysis Semantic Analysis Semantic analysis, or context analysis, analyzes and verifies the meaning of the structures, and checks for semantic legality Parse Tree or Postfix Token Stream op "+" Semantic Analysis int "2" string ""hi"" Annotated Parse Tree or Intermediate Code > ERROR

Phase 4: Code Generation Code Generation Generates target machine code for analyzed structures May include optimization of structures, operations and code Annotated Parse Tree or Intermediate Code Code Generation Target Machine or Assembly Code

Alternate Phase 4: Interpretation Bytecode Interpretation Simulates the virtual machine of the intermediate code in software Executes the intermediate code directly Intermediate Code Bytecode Interpretation Execution

Merging Phases Compiler Architectures Some compilers merge or integrate one or more phases together (although they are still designed separately) to gain efficiency If all phases are merged, we have a one pass compiler One pass compilers can be very fast, but they can also be much more complex and difficult to maintain Refining Phases In order to reduce complexity and increase maintainability and reliability, very high quality production compilers (e.g. IBM s) often to the opposite - they actually split phases into even more independent sub-tasks For example, the IBM PL/I compiler actually has about 40 separate phases, each a separate pass

Tables Why is it called a Compiler? Language processing involves compiling tables of information about the program, to be used to analyze scopes, look up references and determine meaning - hence the term compiler Such tables may be shared between phases of compilation, using databases or disk files Or in other compilers each phase builds it own tables Either way, a big part of compiler engineering is the design of the table structures to support efficient analysis

The PT Pascal Compiler PT Pascal Originally designed and implemented by J. Alan Rosselet (U. of Toronto) about 1980, as a proof of concept for the S/SL compiler technology (then a new research result) Specifically designed for teaching and learning about compilers Designed to be easily portable (Pascal was at that time the standard portable language, as C is now) - it takes an expert about three days to port PT to a new machine (mostly spent understanding the new machine s assembly language) Designed to be real - generates good quality code for target machines, can compile and run itself, the S/SL Processor, and other quite large real programs Self-compiling - implemented entirely in PT Pascal itself, using S/SL

PT Compiler Structure Four Phase, Three Pass Scanner / Screener run as co-routine with Parser in Parser pass Parse tree represented as postfix stream of tokens Special PT abstract (virtual) machine code ( T-code ) PT Pascal Source Text Scanner/ Screener Parser Postfix Token Stream Semantic Analysis T-code Code Generation x86 Assembly Code or (Scanner/Screener and Parser run as co-routines) ( E.G. x + y > x y + ) (Intermediate virtual machine code designed for PT) T-code VM Simulator Interpetive execution Intel x86 Assembler x86 Machine Code (Not part of PT compiler)

Scanner PT Phase Overview: Scanner Breaks up input text into PT language tokens, ignores blanks, newlines etc. PT Pascal Source Text Scanner/ Screener Parser Postfix Token Stream Input to Scanner a :=b+ 11; Output from Scanner pidentifier "a" pcolonequals (The p prefix on token pidentifier "b" names indicates that pplus they are to be Parser pinteger "11" input stream tokens) psemicolon

PT Phase Overview: Screener Screener A filter between the Scanner and the Parser Recognizes keywords, evaluates literals, builds identifier table PT Pascal Source Text Scanner/ Screener Parser Input to Scanner const x=22; Output from Scanner pidentifier "const" (before screening) pidentifier "x" pequals pinteger "22" psemicolon Postfix Token Stream Output from Scanner/ pconst Screener to Parser pidentifier 273 (after screening) pequals pinteger 22 psemicolon

Parser PT Pascal Source Text Scanner/ Screener Parser Postfix Token Stream PT Phase Overview: Parser Context-free syntax checking (syntax error detection) Converts expressions to expression trees in postfix form, recognizes statement structure, disambiguates operators Input to Scanner/Screener a := b + 11; Input to Parser from pidentifier 632 Scanner/Screener pcolonequals pidentifier 243 pplus pinteger 11 psemicolon Output from Parser sassignmentstmt sidentifier 632 (The s prefix to token sidentifier 243 names indicates that sinteger 11 they are Semantic pass sadd input stream tokens) sexpnend

PT Phase Overview: Semantic Analyzer Semantic Phase Checks semantic constraints (e.g., type checking) Constructs symbol table, analyzes scopes, resolves references Generates T-code representation of program Postfix Token Stream Semantic Analysis T-code

PT Phase Overview: Semantic Analyzer Input to Semantic Analyzer sassignmentstatement from Parser sidentifier 632 sidentifier 243 sinteger 11 sadd sexpnend Output from tassignbegin Semantic Analyzer tliteraladdress 120 tliteraladdress 150 (PT virtual machine tfetchinteger bytecode - T-code) tliteralinteger 11 taddinteger tassigninteger

The PT Interpreter T-code Virtual Machine Simulator The T-code output by the Semantic Analyzer can be executed directly by a PT VM Simulator, to make a PT Interpreter T-code T-Code VM Simulator (Interpretive execution)

Code Generator PT Phase Overview: Code Generator Translates the T-code form of the program into machine code for the target computer Our target computer is the Intel x86, and the code generator outputs Linux x86 assembly code T-code Code Generation x86 Assembly Code

PT Phase Overview: Code Generator Input from tassignbegin Semantic Analyzer tliteraladdress 120 tliteraladdress 150 tfetchinteger tliteralinteger 11 tadd tassigninteger Output from Code Generator movl u+150, %eax addl $11, %eax (x86 assembly code) movl %eax, u+120

Summary Compiler Structure Compilers split into four phases: Lexical, Syntactic, Semantic, Code generation Phases may be implemented as separate passes, or integrated for performance PT Compiler implements four phases in three passes, using token streams between passes Uses PT virtual machine T-code as intermediate code References Text, chapter 2 Next Time The Syntax/Semantic Language (S/SL) - a domain-specific language (DSL) for specifying compiler phases