The Compiler's Front End (viewed from a Lab 1 perspective). Comp 412. Chapters 1 & 2 in EaC2e.


COMP 412, Fall 2017
The Compiler's Front End (viewed from a Lab 1 perspective)
Chapters 1 & 2 in EaC2e
[Figure: source code → Front End → IR → Optimizer → IR → Back End → target code]
Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Lab 1
In Lab 1, you will build a local register allocator.
Handouts and software will be online Friday.
Handouts are at the course web site under "programming exercises".
Software is on CLEAR under ~comp412/students/lab1.
Friday's lecture will cover the register allocation portion of the lab.
Q&A session early next week.
I have posted a short tutorial (PowerPoint) on the ILOC Virtual Machine and the ILOC Simulator's trace facility (under "Additional Materials").
Today's lecture should help you get started on the lab's front end.
See remarks in lecture 1 on programming language choice.

Example: Lab 1
ILOC was discussed in Chapter 1 of EaC2e, as well as in Appendix A. You should have read Chapter 1 already.
In Lab 1, the input is written in a subset of ILOC:

Syntax                    Meaning                    Latency
load   r1 => r2           r2 ← MEM(r1)               3
store  r1 => r2           MEM(r2) ← r1               3
loadI  c => r2            r2 ← c                     1
add    r1, r2 => r3       r3 ← r1 + r2               1
sub    r1, r2 => r3       r3 ← r1 - r2               1
mult   r1, r2 => r3       r3 ← r1 * r2               1
lshift r1, r2 => r3       r3 ← r1 << r2              1
rshift r1, r2 => r3       r3 ← r1 >> r2              1
output c                  prints MEM(c) to stdout    1
nop                       idles for one cycle        1

ILOC is an abstract assembly language for a simple RISC processor. We will use it at times as an IR and at other times as a target language. For details on Lab 1 ILOC, see the lab handout, the ILOC simulator document, and Appendix A in EaC2e. (ILOC appears throughout the book.)
Note that ILOC is case sensitive. The I in loadI must be an uppercase letter and the others must be lowercase letters. c represents a constant.
The same ILOC subset (syntax & meaning) will be used in Lab 3, with different per-operation latencies. You should plan to reuse your ILOC front end in Lab 3.

Lab 1
In Lab 1, you will build a register allocator for straight-line code. The allocator will:
take as input some sequence of operations in a subset of ILOC
produce as output an equivalent program in the same subset of ILOC
Your lab will need to (1) read in a file of ILOC operations, (2) check them for validity, (3) convert them into some internal representation, (4) perform renaming, allocation & assignment (next lecture), and (5) write the resulting code out to stdout.
This lecture provides a quick tour of a compiler front end, without the theory. We will go back and cover the theory in the next 3 weeks (EaC2e, Chapters 2 & 3).
Today's lecture uses Lab 1 as an extended example to explore the functionality of a compiler's front end.

The Front End
[Figure: source code → Front End → IR → Optimizer → IR → Back End → target code. Inside the front end: stream of characters → Scanner (microsyntax) → stream of tokens → Parser (syntax) → Semantic Elaboration → IR, annotations.]
Scanner looks at every character
Converts stream of chars to stream of classified words: <part of speech, word>
We sometimes call a classified word a "token"
Efficiency & scalability matter
Parser looks at every token
Determines if the stream of words forms a sentence in the source language
Fits words to some syntactic model, or grammar, for the source language

The Front End
Parser controls the process
Parser has a set of grammar rules that define the source language & guide its recognition
Parser invokes scanner to get tokens as needed
Parser invokes Semantic Elaboration to perform analysis that is deeper than syntax, e.g., build an IR, lay out storage, check types, or compute …
Scanner operates incrementally
Batch scanner would require an unbounded buffer (doesn't scale)
Scanner may need to look beyond end of current word
One of the lab 1 test codes has 128,000 lines of input.

The Scanner / Parser Interface
The scanner has a simple interface
Call to the scanner returns the next classified word
A <category, lexeme> pair, where the lexeme can be explicit or implicit
In Lab 1, the scanner can (should?) discard comments
Efficiency in handling strings
Strings are big, ugly, awkward, and slow (See §7, EaC2e)
Require space proportional to their length
Comparisons require (worst case) O(string length)
Integers are compact and fast
Space is constant, comparisons are simple & supported by the hardware
When possible, scanners should convert strings to integers
Manipulate the integers, limit use of the string to printed output
Implies an efficient map from integer to string

The Front End
Before we can build either a scanner or a parser, we need a definition of the input language.
Definition must be formal: that is, mathematically well formed
Definition must be operational: that is, it must lead to a piece of software that recognizes both valid & invalid inputs
Given the division of labor between scanner & parser, we want a similar split in the definition
Microsyntax or lexical structure specifies valid spelling for each part of speech
Syntax or grammar specifies how parts of speech fit together to form sentences
This structure explains one of the key differences between a programming language and a natural language. In a programming language, each word maps to exactly one part of speech. In a natural language, a word can have multiple parts of speech.


The Syntax of Lab 1 ILOC
The ILOC-subset operations fit one of five patterns or "rules".
An operation is

   load | store                          r1 => r2        MEMOP
or loadI                                 c => r2         LOADI
or add | sub | mult | lshift | rshift    r1, r2 => r3    ARITHOP
or output                                c               OUTPUT
or nop                                                   NOP

Scanner must recognize words & assign categories (or "parts of speech").
Parser must discover sentences in the stream of categorized words.

The Syntax of Lab 1 ILOC
Rewriting the rules in a more standard notation:

operation → MEMOP REG INTO REG
          | LOADI CONSTANT INTO REG
          | ARITHOP REG COMMA REG INTO REG
          | OUTPUT CONSTANT
          | NOP
block     → block operation
          | operation

Where
Italics (operation and block above) indicates a syntactic variable
→ means "derives"
| means "also derives"
All CAPITALS indicate a category of word (e.g., INTO contains one word, =>)
Categories may contain > 1 word
These rules form a grammar. See the digression on page 87 in EaC2e titled "Backus-Naur Form".

The Microsyntax of Lab 1 ILOC
Microsyntax is just the spelling of the words in the language.

Category    Specific Words
MEMOP       load, store
LOADI       loadI
ARITHOP     add, sub, mult, lshift, rshift
OUTPUT      output
NOP         nop
CONSTANT    A non-negative, base 10 number
REGISTER    r followed by a non-negative, base 10 number
COMMA       ,
INTO        =>

Categories are dictated by the needs of the grammar.
Scanner finds words and determines their categories (or "token type").
For simplicity's sake, assign a small integer to each of these categories (e.g., 0 to 8). Then you can use an array of strings, statically initialized, to efficiently convert the integer to a string for debugging or final output. Do not use a map or a dictionary. The array reference is a couple of instructions. The overhead on the function call is more like 30 or 40 instructions, before the real work of computing a hash and looking up the value in the table. (See §B.4 in EaC2e)
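The advice about small integers and a statically initialized array of names might look like the following minimal Python sketch. The particular numbering, the name CATEGORY_NAMES, and the helper format_token are assumptions made for illustration; the lab handout does not prescribe them.

```python
# Category numbers 0..8. The order is an assumption; any fixed order works.
MEMOP, LOADI, ARITHOP, OUTPUT, NOP, CONSTANT, REGISTER, COMMA, INTO = range(9)

# Statically initialized array of names, indexed by category number.
CATEGORY_NAMES = (
    "MEMOP", "LOADI", "ARITHOP", "OUTPUT", "NOP",
    "CONSTANT", "REGISTER", "COMMA", "INTO",
)


def format_token(category, lexeme):
    """Render a <category, lexeme> pair for debugging output (hypothetical helper)."""
    return "<" + CATEGORY_NAMES[category] + ", " + lexeme + ">"


print(format_token(ARITHOP, "add"))
print(format_token(INTO, "=>"))
```

Inside the scanner and parser, only the integers are compared; the strings appear only when something is printed.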


Scanning Lab 1 ILOC
How do we write a scanner for these syntactic categories?
All of your training, to date, tells you to call some support routine: fscanf() in C, the Scanner class in Java, the extract operator (>>) in C++, a regex library in Python (or Java, or C++), …
COMP 412 is about implementing languages, so in COMP 412, you WILL (that means you MUST) use character-by-character input.
Character-by-character algorithms have clarity & they expose complexity.
You may not use a regular expression library or built-in facility.
If you want to use regex, implement the regex routines yourself.
COMP 412 is about implementation of languages & their runtimes.
In the next two weeks, you will learn enough to build a great regex library.
You can look at the simulator's front end, but you cannot reuse it.
In fact, it uses flex and bison, which are not allowed in this lab.

Scanning Lab 1 ILOC
Let's look at an easy one. How do we recognize a NOP?
The spelling is n followed by o followed by p.
The code is as simple as the description.
Cost is O(1) per character, which is asymptotically optimal.

    c ← next character
    if c = 'n' then {
        c ← next character
        if c = 'o' then {
            c ← next character
            if c = 'p'
                then return <NOP, "nop">
                else report error
        }
        else report error
    }
    else report error
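The nop recognizer translates almost line for line into real code. Here is a minimal Python sketch, assuming the input is held in a string and that a position index stands in for "next character"; the function name and the return convention are illustrative choices, not part of the lab specification.

```python
def recognize_nop(text, pos=0):
    """Match 'nop' at text[pos:], one character at a time.

    Returns (("NOP", "nop"), new_pos) on success and None on error.
    """
    def next_char(i):
        # The character at index i, or "" once the input is exhausted.
        return text[i] if i < len(text) else ""

    c = next_char(pos)
    if c == "n":
        c = next_char(pos + 1)
        if c == "o":
            c = next_char(pos + 2)
            if c == "p":
                return ("NOP", "nop"), pos + 3
    return None  # corresponds to "report error" in the pseudocode


print(recognize_nop("nop"))
print(recognize_nop("not"))
```

Each character is examined once, so the cost stays at O(1) per character.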

Scanning Lab 1 ILOC
We can represent this recognizer as an automaton.

Transition Diagram for "nop":

    s0 --n--> s1 --o--> s2 --p--> s3   (s3 is final: NOP, a syntactic category)
    any other character: implicit transition to the error state, se

O(1) per input character, if branches are O(1).
Execution begins in the start state, s0.
On an input of n, it follows the edge labeled n, and so on, …
Implicit transition to an error state, se, on any unexpected character.
Halts when it is out of input.
States drawn with double lines indicate success (a "final" state).
See §2.2 in EaC2e.
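The same automaton can also be driven from a table rather than from nested branches. The sketch below is one possible encoding in Python; the state numbering follows the diagram, but the per-state dictionaries are an assumption chosen for brevity. A production table-driven scanner would more likely use a dense two-dimensional array indexed by state and character class.

```python
S0, S1, S2, S3, SE = 0, 1, 2, 3, 4   # SE is the error state

# One row per state; a missing entry is the implicit edge to SE.
TRANSITIONS = [
    {"n": S1},   # s0
    {"o": S2},   # s1
    {"p": S3},   # s2
    {},          # s3: any further character is an error
    {},          # se: once in error, always in error
]

# Final states and the category each one reports.
ACCEPTING = {S3: "NOP"}


def run_dfa(word):
    """Run the DFA over word; return the category, or None if word is rejected."""
    state = S0
    for c in word:
        state = TRANSITIONS[state].get(c, SE)
    return ACCEPTING.get(state)


print(run_dfa("nop"))
print(run_dfa("nopp"))
```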

Scanning Lab 1 ILOC
Recognizing the rest of the finite categories is (relatively) easy.
Still O(1) per input character.
[Figure: transition diagram for the finite categories. From the start state s0, chains of edges spell out store, sub, load, loadI, lshift, rshift, mult, add, nop, output, the comma, and =>, ending in final states labeled MEMOP, ARITHOP, LOADI, NOP, OUTPUT, COMMA, and INTO.]
Transitions to se (ERROR) are implicit from every state.
§2.5 in EaC2e discusses how to convert a transition diagram into code. The reference implementation uses a table-driven scanner.

Scanning Lab 1 ILOC
Recognizing the unbounded categories is conceptually harder.
Consider unsigned integers, a CONSTANT in Lab 1 ILOC.

    c ← next character
    n ← 0
    if c = '0'
        then return <n, CONSTANT>
    else if ('1' ≤ c ≤ '9') then {
        n ← atoi(c)
        c ← next character
        while ('0' ≤ c ≤ '9') {
            t ← atoi(c)
            n ← n * 10 + t
            c ← next character
        }
        return <n, CONSTANT>
    }
    else report error

Transition diagram:

    s0 --0--> s1                       (s1 is final: CONSTANT)
    s0 --1…9--> s2, s2 --0…9--> s2     (s2 is final: CONSTANT)
    Transitions to se (ERROR) are implicit from every state.

0…9 means any letter from 0 to 9.
Recognizes a valid unsigned integer of arbitrary magnitude.
Cycle in transition diagram admits an unbounded set of numbers.
Unsigned integers are an infinite, but recursively enumerable, set.
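A runnable version of the CONSTANT recognizer, as a sketch in Python. It assumes the input is a string indexed by position, and it advances to the next character on every loop iteration. The function name and return convention are hypothetical.

```python
def recognize_constant(text, pos=0):
    """Scan an unsigned integer without leading zeroes at text[pos:].

    Returns ((value, "CONSTANT"), new_pos), or None on error.
    """
    def next_char(i):
        # The character at index i, or "" once the input is exhausted.
        return text[i] if i < len(text) else ""

    c = next_char(pos)
    if c == "0":
        # State s1: a lone zero is a complete constant.
        return (0, "CONSTANT"), pos + 1
    if c != "" and "1" <= c <= "9":
        # State s2: accumulate digits while they keep coming.
        n = ord(c) - ord("0")
        pos += 1
        c = next_char(pos)
        while c != "" and "0" <= c <= "9":
            n = n * 10 + (ord(c) - ord("0"))
            pos += 1
            c = next_char(pos)
        return (n, "CONSTANT"), pos
    return None  # "report error"


print(recognize_constant("1024 => r2"))
```

The scanner stops at the first non-digit and leaves it for the next call, which is the "look beyond end of current word" behavior mentioned earlier.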

Scanning Lab 1 ILOC
REG follows from CONSTANT.
[Figure: transition diagram for REG. s0 goes to s1 on r; from s1, digit edges labeled 0, 1…9, and 0…9 lead to the final states s2 and s3, both labeled REG. Transitions to se (ERROR) are implicit from every state.]
This particular recognizer allows r01, r02, and so on. The recognizer for integer constants on the previous slide did not allow leading zeroes.
Registers are simple: r followed by an unsigned integer.
We concatenate a recognizer for r onto the front of the one for unsigned integer.
Resulting recognizer is still O(1) cost per character, with very low constant overhead (if done well).
We can add the recognizers for REG and CONSTANT to the recognizer on slide 12 by combining the start states & merging the edges for r. One recognizer gets all the words in Lab 1, at O(1) cost per letter.
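Concatenating a recognizer for r onto a digit recognizer might be written as below. This Python sketch takes the variant that accepts leading zeroes (so r01 is read as register 1); that choice, the function name, and the return convention are assumptions for illustration.

```python
def recognize_register(text, pos=0):
    """Scan 'r' followed by one or more digits at text[pos:].

    Returns (("REG", number), new_pos), or None on error.
    Leading zeroes are accepted, so "r01" yields register 1.
    """
    def next_char(i):
        # The character at index i, or "" once the input is exhausted.
        return text[i] if i < len(text) else ""

    if next_char(pos) != "r":
        return None
    pos += 1
    c = next_char(pos)
    if not ("0" <= c <= "9"):
        return None  # 'r' must be followed by at least one digit
    n = 0
    while "0" <= c <= "9":
        n = n * 10 + (ord(c) - ord("0"))
        pos += 1
        c = next_char(pos)
    return ("REG", n), pos


print(recognize_register("r12, r3"))
```

Storing the register number as an integer, rather than the string "r12", follows the earlier advice to convert strings to integers where possible.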

Scanning Lab 1 ILOC
These DFAs do not allow leading zeroes:
[Figure: CONSTANT: s0 goes to s1 on 0 and to s2 on 1…9; s2 loops on 0…9; s1 and s2 are final (CONSTANT). REG: s0 goes to s1 on r; s1 goes to s2 on 0 and to s3 on 1…9, with a loop on 0…9; s2 and s3 are final (REG).]
DFAs that allow leading zeroes are simpler and smaller:
[Figure: CONSTANT: s0 goes to s2 on 0…9; s2 loops on 0…9; s2 is final (CONSTANT). REG: s0 goes to s1 on r; s1 goes to s3 on 0…9; s3 loops on 0…9; s3 is final (REG).]
Transitions to se (ERROR) are implicit from every state.
Either way, the cost is O(1) per character.

Scanning Lab 1 ILOC
Need to merge these DFAs with the big DFA for the rest of ILOC.
[Figure: CONSTANT: s0 goes to s34 on 0…9; s34 loops on 0…9; s34 is final (CONSTANT). REG: s0 goes to s13 on r; s13 goes to s36 on 0…9; s36 loops on 0…9; s36 is final (REG). Transitions to se (ERROR) are implicit from every state.]
Note the state numbers
States tie into the big DFA
States & transitions in gray are in the earlier DFA
Register hangs off the r for rshift
Constant hangs off the start state
These two DFAs will extend the big DFA to handle register names and constants.
Slide corrected post-lecture.

Scanning Lab 1 ILOC
Comments are easy to handle.

    s0 --/--> s1 --/--> s2 --EOL--> s3   (s3 is final: COMMENT)
    s2 loops on any character that is not EOL

A comment begins with // and includes any character up to the end of line sequence.
Windows uses \r\n
Linux & MacOS use \n
Your code will be graded on a Linux system, so you need to get the EOL thing right.
Comment handler should invoke the scanner, recursively, from s3.
Differences in EOL trip up even the most experienced developers. For example, do not try to write a shell script on a Windows machine & copy it to CLEAR. The error messages from the shell will be less than helpful.
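A comment skipper along these lines could be sketched in Python as follows. It treats \n as the end of line, so a \r that precedes it on Windows-style input is simply consumed as part of the comment. The function name and the convention of returning the next position are assumptions; a full scanner would resume scanning from that position.

```python
def skip_comment(text, pos=0):
    """If text[pos:] starts with '//', return the index just past the end of
    that line (or len(text) if the input ends first). Otherwise return None.
    """
    n = len(text)
    # States s0 and s1: require two slashes in a row.
    if pos + 1 >= n or text[pos] != "/" or text[pos + 1] != "/":
        return None
    pos += 2
    # State s2: consume everything that is not the newline character.
    while pos < n and text[pos] != "\n":
        pos += 1
    # State s3: consume the newline itself, if present.
    if pos < n:
        pos += 1
    return pos


source = "// set up\nnop"
print(source[skip_comment(source):])
```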

Scanning Lab 1 ILOC
This seems pretty simple. How do real compilers scan their input?
The approach used in the previous slides is built on the theory of regular languages and their relationship to deterministic finite automata (DFA).
Deep theoretical foundations
General techniques
Automata-based scanners have been used in industry and research since the 1960s.
Same techniques form the basis for regular expression search in editors, wildcard matching in shells, and regex libraries in programming languages.
Pervasive technology
Next week, we will see how to construct DFAs automatically from specifications written as regular expressions. Tools such as lex, flex, and the regular expression libraries in common programming languages are based on these techniques.
For Lab 1, merge the transition diagrams on slides 17, 21, & 22, then implement with one of the schemes in §2.5.

Parsing Lab 1 ILOC
ILOC has a particularly simple format (as do most assembly codes).
The opcode determines the syntax of the rest of the line.
Dictated by the simple mechanism used to decode operations in hardware.
Once we see an ARITHOP, we know that the rest of the operation has the form REG COMMA REG INTO REG.
We can build a simple parser as a switch statement based on opcode:

    switch (category) {
      case ARITHOP:
        if (ParseREG() == FALSE) then report error
        if (ParseCOMMA() == FALSE) then report error
        ...
        if (ParseREG() == FALSE) then report error
    }

To be useful, the parser must build a representation of the parsed instructions, presumably in the missing else clauses.

Parsing Lab 1 ILOC
ILOC has a particularly simple format (as do most assembly codes).
In C, we might do something even simpler:

    switch (category) {
      case ARITHOP:
        if ((ParseREG() && ParseCOMMA() && ParseREG() && ParseINTO() && ParseREG()) == FALSE)
          then report error
          else { /* build the structure to represent this op */ }
        break;
      case MEMOP:
        ...
    }

More concise, although we sacrifice some error reporting ability.
Cannot tell user where within the line the error occurred.
Often, isolating the error to a specific line might be enough.
Avoids the long, drawn out series of if-then-else constructs.
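One way to realize "the opcode determines the rest of the line" is to record, for each opcode category, the sequence of categories that must follow it, and then check the tokens against that sequence. The Python sketch below is a table-driven variation on the switch statement, not the switch itself. It assumes one operation's tokens arrive as a list of (category, lexeme) pairs with string category names; OPERAND_SHAPES and parse_operation are hypothetical names.

```python
# Categories expected after each opcode category, taken from the grammar.
OPERAND_SHAPES = {
    "MEMOP":   ["REG", "INTO", "REG"],
    "LOADI":   ["CONSTANT", "INTO", "REG"],
    "ARITHOP": ["REG", "COMMA", "REG", "INTO", "REG"],
    "OUTPUT":  ["CONSTANT"],
    "NOP":     [],
}


def parse_operation(tokens):
    """Parse one operation from a list of (category, lexeme) pairs.

    Returns (opcode, operands), keeping only register and constant lexemes.
    Raises SyntaxError with the offending token position on bad input.
    """
    if not tokens:
        raise SyntaxError("empty operation")
    category, opcode = tokens[0]
    shape = OPERAND_SHAPES.get(category)
    if shape is None:
        raise SyntaxError("operation cannot start with " + category)
    if len(tokens) - 1 != len(shape):
        raise SyntaxError("wrong number of tokens after " + opcode)
    operands = []
    for i, expected in enumerate(shape, start=1):
        found, lexeme = tokens[i]
        if found != expected:
            raise SyntaxError("token %d: expected %s, found %s" % (i, expected, found))
        if expected in ("REG", "CONSTANT"):
            operands.append(lexeme)
    return opcode, operands


add_tokens = [("ARITHOP", "add"), ("REG", "r1"), ("COMMA", ","),
              ("REG", "r2"), ("INTO", "=>"), ("REG", "r3")]
print(parse_operation(add_tokens))
```

Unlike the concise && form, this version can report which token within the line was wrong, at the cost of a little more code.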

Parsing Lab 1 ILOC
Is this how real compilers parse their inputs?
No. Assembly languages have a particularly simple structure.
Real programming languages require more complicated mechanisms.
For example, you cannot match opening and closing brackets this way.
The simple syntax of Scheme would defeat this kind of ad hoc technique.
But, the simple structure of assembly makes it easy to parse.
We will spend three weeks on rigorous parsing techniques.
Both top-down and bottom-up parsers (recursive descent, LL(1), LR(1)).
You will become well versed in both parsing and techniques to automate the construction of parsers.
However, we cannot wait that long to start Lab 1, so get to work with these ad hoc techniques.

Building an Intermediate Representation
Scanning & parsing tell us if the input is well formed. To work on the code, your lab needs a concrete representation for it.
The design of an IR has major consequences for the rest of the compiler.
Compiler cannot easily manipulate what it does not represent.
Different data structures have different cost structures.
Common operations should be cheap & easy.

Original Code:

    loadI 4 => r1
    load r1 => r2
    add r1, r2 => r3

might be represented as either a two-dimensional array or a linked list, with one row or one node per operation:

    loadI   4    r1
    load    r1   r2
    add     r1   r2   r3

For Lab 1 (& Lab 3), the IR can almost be as simple as the pictures (wait for next class).
Before designing an IR, try to understand the whole problem.
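The two representations could be sketched in Python as follows. The field names (src1, src2, dst, next) and the use of None for an unused operand are assumptions for illustration. The lecture defers the real design to the next class, and the allocator will probably need more fields than these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """One ILOC operation as a linked-list node (hypothetical layout)."""
    opcode: str
    src1: Optional[int] = None   # register number, or the constant for loadI
    src2: Optional[int] = None
    dst: Optional[int] = None
    next: Optional["Op"] = None  # following operation, or None at the end


# Array form: one row per operation, one column per field.
rows = [
    ("loadI", 4, None, 1),
    ("load", 1, None, 2),
    ("add", 1, 2, 3),
]


def to_linked_list(table):
    """Build the linked-list form from the array form, preserving order."""
    head = None
    for row in reversed(table):
        head = Op(*row, next=head)
    return head


node = to_linked_list(rows)
while node is not None:
    print(node.opcode, node.src1, node.src2, node.dst)
    node = node.next
```

The array makes indexing by operation number cheap, while the list makes insertion cheap. Which one fits better depends on the operations the rest of the lab performs most often.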

Next Class
Once the input is scanned, parsed, and encoded into some appropriate IR, what should your lab do with it?
The basic terminology of register allocation
Sheldon Best's algorithm (a.k.a. "bottom-up local")
Maybe a hint or two
To prepare, read §13.1, §13.2 (skip §13.2.3) and §13.3.2 in EaC2e.