Front End. Hwansoo Han


Front End
Hwansoo Han

Traditional Two-pass Compiler

Source code → Front End → IR → Back End → Machine code (both ends report errors)

High-level functions
- Recognize legal programs and generate correct code (code the OS & linker can accept)
- Manage the storage of all variables and code

Two passes
- Use an intermediate representation (IR)
- Front end maps legal source code into IR: O(n) or O(n log n)
- Back end maps IR into target machine code: typically NP-complete
- Admits multiple front ends & multiple passes (better code)

Front End

Source code → Scanner → tokens → Parser → IR (scanner and parser report errors)

Responsibilities
- Recognize legal (& illegal) programs
- Report errors in a useful way
- Produce IR & a preliminary storage map
- Shape the code for the back end

Much of front end construction can be automated
- lex/yacc, flex/bison, Elkhound

Front End - Scanner

Source code → Scanner → tokens → Parser → IR (scanner and parser report errors)

Scanner
- Maps the character stream into words (the basic units of syntax)
- Produces tokens: a word & its part of speech
  x = x + y ; becomes <id,x> = <id,x> + <id,y> ;
- word ≅ lexeme, part of speech ≅ token type
- Typical tokens include number, identifier, +, -, new, while, if
- The scanner eliminates white space
- Produced by an automatic scanner generator
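The mapping from characters to tokens can be sketched in a few lines of Python. This is only an illustration, not the output of a scanner generator: the token specification `TOKEN_SPEC`, the keyword set `KEYWORDS`, and the function name `scan` are assumptions made for the example.

```python
import re

# Hypothetical token specification: (token type, regular expression).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/;]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"new", "while", "if"}


def scan(source):
    """Return a list of (token type, lexeme) pairs; white space is dropped."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                 # the scanner eliminates white space
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme            # keywords get their own token type
        elif kind == "op":
            kind = lexeme            # operators and punctuation are their own type
        tokens.append((kind, lexeme))
    return tokens


print(scan("x = x + y ;"))
```

Each pair corresponds to one `<type,lexeme>` token from the slide; operators and punctuation simply use the lexeme as their type.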

Front End - Parser

Source code → Scanner → tokens → Parser → IR (scanner and parser report errors)

Parser
- Recognizes context-free syntax
- Guides context-sensitive ("semantic") analysis, e.g., type checking
- Builds IR for the source program
- Produced by automatic parser generators

Back End

IR → Instruction Selection → IR → Instruction Scheduling → IR → Register Allocation → Machine code (each stage reports errors)

Responsibilities
- Translate IR into target machine code
- Choose instructions to implement each IR operation
- Decide which values to keep in registers
- Find an optimal order of instruction execution
- Ensure conformance with system interfaces

Automation has been less successful in the back end

Optimizing Compiler

Source code → Front End → IR → Middle End → IR → Back End → Machine code (all stages report errors)

Code optimizations
- Analyze the IR and rewrite (or transform) it
- The primary goal is to reduce execution time, space usage, power consumption, ...
- Must preserve the meaning of the code

Scanner in Front End

Source code → Scanner → tokens → Parser → IR (scanner and parser report errors)

Scanner
- Maps the character stream into words (the basic units of syntax)
- Produces tokens: a word & its part of speech

Scanner Generator

We want to avoid writing scanners by hand
- The scanner is the first stage in the front end
- Specifications can be expressed using regular expressions
- Build tables and code from a DFA
- E.g., lex, flex

source code → Scanner → tokens
specifications → Scanner Generator → tables or code (used by the scanner)

- Represent words as indices into a global table
- Specifications are written as regular expressions

Regular Expressions

Regular expression (RE) over an alphabet Σ
- ε is an RE denoting the set {ε}
- If a is in Σ, then a is an RE denoting {a}
- If x and y are REs denoting L(x) and L(y), then
  - x | y is an RE denoting L(x) ∪ L(y)
  - xy is an RE denoting L(x)L(y)
  - x* is an RE denoting L(x)*
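The three operators (alternation, concatenation, closure) can be tried out with Python's `re` module, as in the sketch below. The names `LETTER`, `DIGIT`, `IDENT` and `in_language` are assumptions for the example, and Python's engine uses backtracking rather than the table-driven DFA a scanner generator would build; only the notation is being illustrated.

```python
import re

# identifier = letter (letter | digit)*  uses concatenation, alternation and closure
LETTER = "[a-z]"
DIGIT = "[0-9]"
IDENT = f"{LETTER}({LETTER}|{DIGIT})*"


def in_language(regex, word):
    """True if the whole word belongs to the language denoted by regex."""
    return re.fullmatch(regex, word) is not None


print(in_language(IDENT, "x1"))      # an identifier
print(in_language(IDENT, "1x"))      # starts with a digit, so not an identifier
print(in_language("(ab)*", ""))      # closure includes the empty string
```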

Token Recognizer

Tokens are recognized by an NFA

[Figure: state diagram of a main recognizer with transitions on digit, letter, ".", "=" and other, accepting num (integer), num (real), id_or_keyword and equal. A separate keyword recognizer re-reads the string stored in the buffer to tell keywords such as program and integer apart from id.]

Automating Scanner Construction

RE → NFA (Thompson's construction)
- Build an NFA for each term
- Combine them with ε-moves

NFA → DFA (subset construction)
- Build the simulation

DFA → minimal DFA
- Hopcroft's algorithm

DFA → RE (not part of the scanner construction)
- All pairs, all paths problem
- Take the union of all paths from s0 to an accepting state

The Cycle of Constructions: RE → NFA → DFA → minimal DFA → RE
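The NFA → DFA step can be sketched as follows. The NFA here is written by hand for the expression a(b|c)* rather than produced by Thompson's construction, and the representation (a dictionary keyed by (state, symbol), with `None` standing for an ε-move) is an assumption of this example.

```python
from collections import deque

EPS = None  # label used here for epsilon-moves (an assumption of this sketch)


def epsilon_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(start, accepting, delta, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure({start}, delta)
    d_delta, d_accepting = {}, set()
    seen = {d_start}
    work = deque([d_start])
    while work:
        current = work.popleft()
        if current & accepting:
            d_accepting.add(current)
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, symbol), set())
            if not moved:
                continue                      # no transition on this symbol
            target = epsilon_closure(moved, delta)
            d_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    return d_start, d_accepting, d_delta


def dfa_accepts(dfa, word):
    """Run the DFA over word; a missing transition rejects."""
    state, accepting, delta = dfa
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Hand-written NFA for a(b|c)*: 0 -a-> 1 -eps-> 2, 2 -b-> 3, 2 -c-> 3, 3 -eps-> 2
NFA_DELTA = {(0, "a"): {1}, (1, EPS): {2}, (2, "b"): {3}, (2, "c"): {3}, (3, EPS): {2}}
DFA = subset_construction(0, {2}, NFA_DELTA, "abc")

print(dfa_accepts(DFA, "abcb"), dfa_accepts(DFA, "ba"))
```

Minimization (Hopcroft's algorithm) is not shown; the DFA produced here is not guaranteed to be minimal.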

Parser in Front End

Source code → Scanner → tokens → Parser → IR (scanner and parser report errors)

Parser
- Checks the stream of words and their parts of speech for grammatical correctness
- Determines if the input is syntactically well formed
- Guides checking at deeper levels than syntax
- Builds an IR representation of the code

Specification of Grammar

Syntax is specified with a context-free grammar (CFG), G = <S, T, N, P>: a start symbol S, terminals T, non-terminals N, and productions P

  1  Expr → Expr Op Expr
  2       | number
  3       | id
  4  Op   → +
  5       | -
  6       | *
  7       | /

  Rule   Sentential Form
         Expr
  1      Expr Op Expr
  3      <id,x> Op Expr
  5      <id,x> - Expr
  1      <id,x> - Expr Op Expr
  2      <id,x> - <num,2> Op Expr
  6      <id,x> - <num,2> * Expr
  3      <id,x> - <num,2> * <id,y>

- Such a sequence of rewrites is called a derivation
- The process of discovering a derivation is called parsing
- We denote this derivation: Expr ⇒* id - num * id

Derivations

A derivation consists of multiple rewriting steps
- At each step, we choose a non-terminal to replace
- Different choices can lead to different derivations

Two derivations are of interest
- Leftmost derivation: replace the leftmost NT at each step
- Rightmost derivation: replace the rightmost NT at each step

These are the two systematic derivations (we don't care about randomly-ordered derivations!)

The example on the preceding slide was a leftmost derivation
- Of course, there is also a rightmost derivation
- Interestingly, it turns out to be different

The Two Derivations for x - 2 * y

Leftmost derivation

  Rule   Sentential Form
         Expr
  1      Expr Op Expr
  3      <id,x> Op Expr
  5      <id,x> - Expr
  1      <id,x> - Expr Op Expr
  2      <id,x> - <num,2> Op Expr
  6      <id,x> - <num,2> * Expr
  3      <id,x> - <num,2> * <id,y>

Rightmost derivation

  Rule   Sentential Form
         Expr
  1      Expr Op Expr
  3      Expr Op <id,y>
  6      Expr * <id,y>
  1      Expr Op Expr * <id,y>
  2      Expr Op <num,2> * <id,y>
  5      Expr - <num,2> * <id,y>
  3      <id,x> - <num,2> * <id,y>

In both cases, Expr ⇒* id - num * id
- The two derivations produce different parse trees
- The parse trees imply different evaluation orders!
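The two rule sequences can be replayed mechanically. The sketch below is an illustration only: the `RULES` table encodes the seven productions of the expression grammar, the function name `derive` is made up for the example, and the lexemes attached to id and num are ignored.

```python
# Productions of the expression grammar, keyed by rule number.
RULES = {
    1: ("Expr", ["Expr", "Op", "Expr"]),
    2: ("Expr", ["num"]),
    3: ("Expr", ["id"]),
    4: ("Op", ["+"]),
    5: ("Op", ["-"]),
    6: ("Op", ["*"]),
    7: ("Op", ["/"]),
}
NONTERMINALS = {"Expr", "Op"}


def derive(rule_numbers, leftmost=True):
    """Replay a derivation, returning every sentential form along the way."""
    form = ["Expr"]
    steps = [list(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[i]}")
        form[i:i + 1] = rhs          # rewrite the chosen non-terminal
        steps.append(list(form))
    return steps


left = derive([1, 3, 5, 1, 2, 6, 3], leftmost=True)
right = derive([1, 3, 6, 1, 2, 5, 3], leftmost=False)
for form in left:
    print(" ".join(form))
print(left[-1] == right[-1])
```

Both derivations end in the same sentence, id - num * id, even though the intermediate sentential forms (and the parse trees they imply) differ.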

Leftmost vs. Rightmost Derivations

[Figure: the two parse trees for x - 2 * y, one per derivation.]

- Leftmost derivation: the parse tree evaluates as x - ( 2 * y )
- Rightmost derivation: the parse tree evaluates as ( x - 2 ) * y
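A quick numeric check shows why the difference matters. The sketch below encodes the two tree shapes as nested tuples and evaluates them; the tuple encoding, the function `evaluate`, and the sample values x = 10, y = 3 are assumptions chosen for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tree, env):
    """Evaluate a nested-tuple expression tree bottom-up."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]             # a variable name
    return tree                      # a numeric constant


LEFTMOST_TREE = ("-", "x", ("*", 2, "y"))     # x - (2 * y)
RIGHTMOST_TREE = ("*", ("-", "x", 2), "y")    # (x - 2) * y
env = {"x": 10, "y": 3}

print(evaluate(LEFTMOST_TREE, env))    # 10 - (2 * 3) = 4
print(evaluate(RIGHTMOST_TREE, env))   # (10 - 2) * 3 = 24
```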

Parsing Techniques

Top-down parsers
- LL = Left-to-right input scan, Leftmost derivation
- Start at the root of the parse tree and grow toward the leaves
- Pick a production & try to match the input
- A bad pick may require backtracking
- LL(1) grammars are backtrack-free (predictive parsing)

Bottom-up parsers
- LR = Left-to-right input scan, Rightmost derivation
- Start at the leaves and grow toward the root
- As input is consumed, encode possibilities in an internal state
- Start in a state valid for legal first tokens
- Bottom-up parsers handle a large class of grammars
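A top-down predictive parser can be sketched as a set of mutually recursive functions, one per non-terminal. The version below is a hand-written illustration, not generated code. It assumes an Expr/Term/Factor expression grammar in which left recursion has been replaced by iteration, takes plain strings as tokens, and returns nested tuples; all of these are choices made for the example.

```python
def parse(tokens):
    """Predictive (backtrack-free) recursive descent with one token of lookahead."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():                      # Expr -> Term { (+|-) Term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():                      # Term -> Factor { (*|/) Factor }
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                    # Factor -> number | id | ( Expr )
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isdigit() or token.isidentifier():
            return advance()
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("x - 2 * y".split()))
```

Because the next token alone decides which branch to take, the parser never has to backtrack.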

Bottom-Up Parser Example

The expression grammar

   1  Goal   → Expr
   2  Expr   → Expr + Term
   3         | Expr - Term
   4         | Term
   5  Term   → Term * Factor
   6         | Term / Factor
   7         | Factor
   8  Factor → number
   9         | id
  10         | ( Expr )

Handles for the rightmost derivation of x - 2 * y

  Prod'n   Sentential Form                Handle
           Goal
  1        Expr                           1,1
  3        Expr - Term                    3,3
  5        Expr - Term * Factor           5,5
  9        Expr - Term * <id,y>           9,5
  7        Expr - Factor * <id,y>         7,3
  8        Expr - <num,2> * <id,y>        8,3
  4        Term - <num,2> * <id,y>        4,1
  7        Factor - <num,2> * <id,y>      7,1
  9        <id,x> - <num,2> * <id,y>      9,1

- Read bottom to top, this is the reverse rightmost derivation (RRD)
- Handles are specified in blue on the slide

Semantic analysis

Try to find semantic errors
- Gather various information (e.g., symbol types) for later phases

Use an attribute grammar to describe semantics
- Identify attributes of language elements for semantic analysis
- Augment the grammar with semantic rules (R1, R2, ..., Rn)
  e.g., S → T1 {R1} T2 {R2} ... Tn {Rn}
- Evaluate the rules in the order given by the hierarchical syntax structure

Type checking

Type checking is a semantic analysis that detects type errors (a kind of semantic error)
- Traverse the parse tree and evaluate the semantic rules

Pascal (a strongly-typed language)

  x : integer; y : char;
  y := 'a'; x := 5;
  x + 20 + y

[Figure: parse tree for x + 20 + y annotated with types. x (id) and 20 (int_num) are integer, so x + 20 is integer; y (id) is char, so adding it to an integer yields a type error.]

A simplified example of an attribute grammar to check the type of expressions

  E → E1 + E2   {E.type = (E1.type == E2.type ? E1.type : type error)}
  E → int_num   {E.type = integer}
  E → char      {E.type = char}
  E → id        {E.type = symtable(id).type}
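The four semantic rules translate almost directly into a recursive function over the tree. The sketch below is an illustration under stated assumptions: trees are nested tuples tagged with the production used, the symbol table is a plain dictionary, and the names `type_of`, `SYMTABLE` and `TYPE_ERROR` are invented for the example.

```python
TYPE_ERROR = "type error"

# Symbol table built from the declarations  x : integer; y : char;
SYMTABLE = {"x": "integer", "y": "char"}


def type_of(node, symtable):
    """Compute E.type bottom-up, following the attribute grammar's rules."""
    kind = node[0]
    if kind == "int_num":                    # E -> int_num
        return "integer"
    if kind == "char":                       # E -> char
        return "char"
    if kind == "id":                         # E -> id
        return symtable[node[1]]
    if kind == "+":                          # E -> E1 + E2
        left = type_of(node[1], symtable)
        right = type_of(node[2], symtable)
        return left if left == right else TYPE_ERROR
    raise ValueError(f"unknown node kind {kind!r}")


# x + 20 + y, grouped as (x + 20) + y
EXPR = ("+", ("+", ("id", "x"), ("int_num", 20)), ("id", "y"))

print(type_of(EXPR[1], SYMTABLE))   # integer
print(type_of(EXPR, SYMTABLE))      # type error
```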

Intermediate Representations

Source code → Front End → IR → Middle End → IR → Back End → Machine code (all stages report errors)

- Front end: produces an intermediate representation (IR)
- Middle end: transforms the IR into an equivalent IR that runs more efficiently (potentially, multiple IRs can be used)
- Back end: transforms the IR into native code

The IR encodes the compiler's knowledge of the program
- AST, DAG
- Stack-machine code (one address code): Java bytecode
- Three address code: low-level, RISC-like IR

Types of Intermediate Representations

Structural IR
- Graphically oriented
- Heavily used in source-to-source translators
- Tend to be large
- Examples: trees, DAGs

Linear IR
- Pseudo-code for an abstract machine
- Level of abstraction varies
- Simple, compact data structures
- Easier to rearrange
- Examples: 3 address code, stack machine code

Hybrid IR
- Combination of graphs and linear code
- Example: control-flow graph

Abstract Syntax Tree

An abstract syntax tree is the procedure's parse tree with the non-terminal nodes removed

[Figure: parse tree and AST for x - 2 * y. In the AST, - is the root with children x and *, and * has children 2 and y.]

Can use a linearized form of the tree
- Easier to manipulate than pointers to nodes
- x 2 y * - in postfix form
- - x * 2 y in prefix form
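The linearized forms come from standard tree walks over the AST for x - 2 * y. In the sketch below, the `Node` class and the function names are assumptions for the example; a postorder walk (children first, then the operator) gives the postfix form, and a preorder walk (operator, then left and right children) gives the prefix form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def postfix(node):
    """Postorder walk: left subtree, right subtree, then the node itself."""
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.label]


def prefix(node):
    """Preorder walk: the node itself, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.label] + prefix(node.left) + prefix(node.right)


# AST for x - 2 * y
AST = Node("-", Node("x"), Node("*", Node("2"), Node("y")))

print(" ".join(postfix(AST)))   # x 2 y * -
print(" ".join(prefix(AST)))    # - x * 2 y
```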

Directed Acyclic Graph

A directed acyclic graph (DAG) is an AST with a unique node for each value

  z ← x - 2 * y
  w ← x / 2

[Figure: DAG for the two assignments, in which the nodes for x and 2 are shared by both expressions.]

- Makes sharing explicit
- Encodes redundancy

The same expression appearing twice means that the compiler might arrange to evaluate it just once!
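One common way to get a unique node per value is to look each node up in a table before creating it. The sketch below uses that technique (sometimes called hash-consing or value numbering); the source does not say how its DAGs are built, so the `DagBuilder` class and its methods are assumptions made purely for illustration.

```python
class DagBuilder:
    """Creates at most one node for each distinct (operator, operands) key."""

    def __init__(self):
        self.ids = {}        # key -> node id
        self.table = []      # node id -> key

    def node(self, op, *children):
        key = (op, *children)
        if key not in self.ids:          # only allocate a node for a new value
            self.ids[key] = len(self.table)
            self.table.append(key)
        return self.ids[key]

    def leaf(self, name):
        return self.node("leaf", name)


dag = DagBuilder()

# z <- x - 2 * y
z = dag.node("-", dag.leaf("x"), dag.node("*", dag.leaf("2"), dag.leaf("y")))
# w <- x / 2   (reuses the existing nodes for x and 2)
w = dag.node("/", dag.leaf("x"), dag.leaf("2"))

print(len(dag.table))   # 6 nodes; two separate ASTs would need 8
```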

Three Address Code

Several different representations of three address code
- In general, three address code has statements of the form:
  x ← y op z
- With 1 operator (op) and, at most, 3 names (x, y, z)

  z ← x - 2 * y

becomes

  t ← 2 * y
  z ← x - t

Characteristics
- Resembles many machines
- Introduces a new set of names
- Compact form
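Lowering an expression tree to this form takes one postorder walk that invents a temporary name for each interior node. The sketch below is illustrative: the nested-tuple tree encoding, the function `three_address`, and the temporary names t1, t2, ... are assumptions of the example.

```python
from itertools import count


def three_address(target, tree):
    """Lower a nested-tuple expression tree into 'x <- y op z' statements."""
    code = []
    temps = count(1)

    def lower(node, dest=None):
        if not isinstance(node, tuple):
            return str(node)                  # a name or constant is already an operand
        op, left, right = node
        l = lower(left)
        r = lower(right)
        name = dest if dest is not None else f"t{next(temps)}"
        code.append(f"{name} <- {l} {op} {r}")
        return name

    result = lower(tree, dest=target)
    if result != target:                      # the tree was a single name or constant
        code.append(f"{target} <- {result}")
    return code


for line in three_address("z", ("-", "x", ("*", 2, "y"))):
    print(line)
```

The generated temporaries are the "new set of names" the slide refers to.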

Three Address Code: Quadruples

Naïve representation of three address code
- Table of k * 4 small integers
- Simple record structure
- Easy to reorder
- Explicit names

RISC assembly code

  load  r1, y
  loadi r2, 2
  mult  r3, r2, r1
  load  r4, x
  sub   r5, r4, r3

Quadruples

  load   r1   y
  loadi  r2   2
  mult   r3   r2   r1
  load   r4   x
  sub    r5   r4   r3
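The quadruple table maps naturally onto a list of four-field records. The sketch below stores the five quadruples and runs them with a tiny interpreter; the `Quad` type, the `run` function and the use of strings instead of small integer indices are assumptions made to keep the example readable.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    dest: str
    arg1: object
    arg2: Optional[str] = None


QUADS = [
    Quad("load", "r1", "y"),
    Quad("loadi", "r2", 2),
    Quad("mult", "r3", "r2", "r1"),
    Quad("load", "r4", "x"),
    Quad("sub", "r5", "r4", "r3"),
]


def run(quads, memory):
    """Execute the quadruples in order and return the register file."""
    regs = {}
    for q in quads:
        if q.op == "load":
            regs[q.dest] = memory[q.arg1]        # load a variable from memory
        elif q.op == "loadi":
            regs[q.dest] = q.arg1                # load an immediate constant
        elif q.op == "mult":
            regs[q.dest] = regs[q.arg1] * regs[q.arg2]
        elif q.op == "sub":
            regs[q.dest] = regs[q.arg1] - regs[q.arg2]
        else:
            raise ValueError(f"unknown opcode {q.op!r}")
    return regs


memory = {"x": 10, "y": 3}
print(run(QUADS, memory)["r5"])                  # 10 - 2 * 3 = 4

# Records are easy to reorder: hoisting the load of x leaves the result unchanged.
REORDERED = [QUADS[3]] + QUADS[:3] + [QUADS[4]]
print(run(REORDERED, memory)["r5"])
```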

Summary

Front End
- Language grammars are processed (CFG, semantic analysis)
- Automatic code generators (lex/yacc, flex/bison, Elkhound)

Intermediate representations
- Structural IR: graphs (AST, DAG)
- Linear IR: pseudo-code for abstract machines (3-address code)

What's left
- Middle end: optimizations
- Back end: target-specific code generation