CS6013 - Modern Compilers: Theory and Practise (Introduction): Academic Formalities; What, When and Why of Compilers

Academic Formalities

CS6013 - Modern Compilers: Theory and Practise (Introduction)
V. Krishna Nandivada, IIT Madras

- Written assignment = 5 marks. Programming assignments = 40 marks. Midterm = 25 marks, Final = 30 marks.
- Extra marks: during lecture time, individuals can earn up to 5 additional marks. How? Ask a good question, answer a chosen question, or make a good point (0.5 marks each; at most one mark per day per person).
- Attendance requirement is as per institute norms. Non-compliance will lead to a W grade. Proxy attendance is not a help; it is actually a disservice.
- Plagiarism: a good word to know, a bad act to own. Cases will be automatically referred to the institute welfare and disciplinary committee.
- Contact (anytime): Instructor: Krishna, Email: nvk@cse.iitm.ac.in, Office: BSB 352. TA: Jyothi Vedurada, Email: vjyothi@cse

What, When and Why of Compilers

- What: A compiler is a program that reads a program in one language and translates it into an equivalent program in another language.
- When: 1952, by Grace Hopper, for A-0. 1957, the Fortran compiler, by John Backus and team.
- Why study compilers? It is good to know how the food you eat is cooked. A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. For a computer to execute programs written in such a language, the programs must be translated into a form the computer can execute.

Images of the day
Figure: Grace Hopper and John Backus

Compilers: A Sangam

Compiler construction is a microcosm of computer science:
- artificial intelligence: greedy algorithms, learning algorithms, ...
- algorithms: graph algorithms, union-find, dynamic programming, ...
- theory: DFAs for scanning, parser generators, lattice theory, ...
- systems: allocation, locality, layout, synchronization, ...
- architecture: pipeline management, hierarchy management, instruction set use, ...
- optimization: operations research, load balancing, scheduling, ...
Inside a compiler, all these and many more come together. It has probably the healthiest mix of theory and practise.

Course outline

A rough outline (we may not strictly stick to this):
- Overview of compilers
- Overview of lexical analysis and parsing
- Semantic analysis (aka type checking)
- Intermediate code generation
- Data flow analysis
- Constant propagation
- Static Single Assignment and optimizations
- Loop optimizations
- Liveness analysis
- Register allocation
- Bitwidth-aware register allocation
- Code generation
- Overview of advanced topics

Your friends: Languages and Tools

- Java: familiarity is a must. Use Eclipse to save yourself valuable coding and debugging cycles.
- JavaCC, JTB: tools you will learn to use.
- Make, Ant scripts: recommended toolkit.

Start exploring

Find the course webpage: http://www.cse.iitm.ac.in/~krishna/cs6013/
Get set. Ready, steady, go!

Acknowledgement

These slides borrow liberal portions of text verbatim from Antony L. Hosking (Purdue) and Jens Palsberg (UCLA).

Copyright (c) 2017 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to publish from hosking@cs.purdue.edu.

Compilers: a closed area?

"Optimization for scalar machines was solved years ago." But:
- Machines have changed drastically in the last 20 years.
- Changes in architecture lead to changes in compilers: new features pose new problems, changing costs lead to different concerns, and old solutions need re-engineering.
- Changes in compilers should, in turn, prompt changes in architecture.
- New languages and features keep arriving.

Expectations

What qualities are important in a compiler?
1. Correctness
2. Output runs fast
3. Compiler runs fast
4. Compile time proportional to program size
5. Support for separate compilation
6. Good diagnostics for syntax errors
7. Works well with the debugger
8. Good diagnostics for flow anomalies
9. Cross-language calls
10. Consistent, predictable optimization
Each of these shapes your expectations about this course.

Abstract view

source code -> [ compiler ] -> machine code (errors reported)

Implications: the compiler must
- recognize legal (and illegal) programs,
- generate correct code,
- manage storage of all variables and code, and
- agree on a format for object (or assembly) code.
This is a big step up from an assembler: higher-level notations.

Traditional two-pass compiler

source code -> [ front end ] -> IR -> [ back end ] -> machine code (errors reported by both phases)

Implications:
- an intermediate representation (IR) is used;
- the front end maps legal source code into IR;
- the back end maps IR onto the target machine;
- this simplifies retargeting and allows multiple front ends;
- multiple passes lead to better code.
A rough statement: most of the problems in the front end are simpler (polynomial-time solutions exist); most of the problems in the back end are harder (many are NP-complete in nature). Our focus: mainly the back end (95%) and a little bit of the front end (5%).

Phases inside the compiler

Front-end responsibilities:
- Recognize syntactically legal code; report errors.
- Recognize semantically legal code; report errors.
- Produce IR.
Back-end responsibilities:
- Optimizations, code generation.
Our target: five out of the seven phases. We will only glance over lexical and syntax analysis; read up on them yourself, or attend the undergraduate course if interested.

Lexical analysis

Also known as scanning. It reads a stream of characters and groups them into meaningful sequences called lexemes. A scanner must recognize the units of syntax.

Q: How do we specify patterns for the scanner? Examples of more complex syntax:
- white space: <ws> ::= <ws> ' ' | <ws> '\t' | ' ' | '\t'
- keywords and operators: specified as literal patterns, e.g., do, end
- identifiers: an alphabetic character followed by k alphanumerics (_, $, &, ...)
- numbers:
  - integers: 0, or a digit from 1-9 followed by digits from 0-9
  - decimals: integer '.' digits from 0-9
  - reals: (integer or decimal) 'E' (+ or -) digits from 0-9
  - complex: '(' real ',' real ')'
We need a powerful notation to specify these patterns: regular expressions. There are mature tools (e.g., flex) that generate scanners (lexical analyzers) from a given specification of the tokens, i.e., a sequence of regular expressions.
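To make the pattern-based specification above concrete, here is a minimal scanner sketch in Java (the course's implementation language), written with java.util.regex rather than a generator such as flex or JavaCC. The token classes and patterns are simplified assumptions drawn from the examples above, not the course's actual scanner.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Minimal hand-written scanner sketch; token classes follow the slide examples.
    public class TinyScanner {
        // Order matters: keywords before identifiers, reals/decimals before integers.
        private static final Pattern TOKENS = Pattern.compile(
              "(?<WS>\\s+)"
            + "|(?<KEYWORD>\\b(?:do|end)\\b)"                        // literal keyword patterns
            + "|(?<REAL>(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?E[+-][0-9]+)" // (integer or decimal) E (+|-) digits
            + "|(?<DECIMAL>(?:0|[1-9][0-9]*)\\.[0-9]+)"              // integer '.' digits
            + "|(?<INTEGER>0|[1-9][0-9]*)"                           // 0, or 1-9 followed by 0-9*
            + "|(?<ID>[A-Za-z][A-Za-z0-9_$&]*)"                      // alphabetic, then alphanumerics
            + "|(?<OP>[-+*/()])");

        public static void main(String[] args) {
            Matcher m = TOKENS.matcher("x + 2 * y");
            String[] kinds = {"KEYWORD", "REAL", "DECIMAL", "INTEGER", "ID", "OP"};
            while (m.find()) {
                if (m.group("WS") != null) continue;   // discard white space
                for (String kind : kinds) {
                    if (m.group(kind) != null) {
                        System.out.println(kind + " \"" + m.group() + "\"");
                        break;
                    }
                }
            }
        }
    }

Running it on x + 2 * y prints ID, OP, INTEGER, OP, ID in order; a generated scanner would additionally track source positions and handle error recovery.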

The role of the parser

source code -> scanner -> tokens -> parser -> IR (errors reported)

A parser:
- performs context-free syntax analysis,
- guides context-sensitive analysis,
- constructs an intermediate representation,
- produces meaningful error messages, and
- attempts error correction.

Syntax analysis using a CFG

Context-free syntax is specified with a context-free grammar. Formally, a CFG G is a 4-tuple (Vt, Vn, S, P), where:
- Vt is the set of terminal symbols in the grammar. For our purposes, Vt is the set of tokens returned by the scanner.
- Vn, the nonterminals, is a set of syntactic variables that denote sets of (sub)strings occurring in the language. These are used to impose a structure on the grammar.
- S is a distinguished nonterminal (S ∈ Vn) denoting the entire set of strings in L(G). It is sometimes called the goal symbol.
- P is a finite set of productions specifying how terminals and nonterminals can be combined to form strings in the language. Each production must have a single nonterminal on its left-hand side.
The set V = Vt ∪ Vn is called the vocabulary of G.

Notation and terminology

- a, b, c, ... ∈ Vt
- A, B, C, ... ∈ Vn
- U, V, W, ... ∈ V
- α, β, γ, ... ∈ V*
- u, v, w, ... ∈ Vt*

Derivations

- If A → γ, then αAβ ⇒ αγβ is a single-step derivation using A → γ.
- Similarly, ⇒* and ⇒+ denote derivations of ≥ 0 and ≥ 1 steps.
- If S ⇒* β, then β is said to be a sentential form of G.
- L(G) = { w ∈ Vt* | S ⇒+ w }; w ∈ L(G) is called a sentence of G.
- Note that L(G) = { β ∈ V* | S ⇒* β } ∩ Vt*.

We can view the productions of a CFG as rewriting rules. An example CFG:

1  <goal>   ::= <expr>
2  <expr>   ::= <expr> + <term>
3            |  <expr> - <term>
4            |  <term>
5  <term>   ::= <term> * <factor>
6            |  <term> / <factor>
7            |  <factor>
8  <factor> ::= num
9            |  id
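As a concrete reading of the 4-tuple definition (not part of the course toolkit), the example grammar can be written down as plain data in Java; all class and field names below are hypothetical, and the sketch assumes a recent JDK for records.

    import java.util.List;

    // A CFG as a 4-tuple (Vt, Vn, S, P); names are illustrative only.
    public class ExprGrammar {
        // One production: a single nonterminal head and a body of vocabulary symbols.
        record Production(String head, List<String> body) {}

        static final List<String> TERMINALS    = List.of("+", "-", "*", "/", "num", "id");   // Vt
        static final List<String> NONTERMINALS = List.of("goal", "expr", "term", "factor"); // Vn
        static final String START = "goal";                                                  // S
        static final List<Production> PRODUCTIONS = List.of(                                 // P
            new Production("goal",   List.of("expr")),
            new Production("expr",   List.of("expr", "+", "term")),
            new Production("expr",   List.of("expr", "-", "term")),
            new Production("expr",   List.of("term")),
            new Production("term",   List.of("term", "*", "factor")),
            new Production("term",   List.of("term", "/", "factor")),
            new Production("term",   List.of("factor")),
            new Production("factor", List.of("num")),
            new Production("factor", List.of("id")));
    }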

Deriving the derivation

For the string x + 2 * y:

<goal> ⇒ <expr>
       ⇒ <expr> + <term>
       ⇒ <expr> + <term> * <factor>
       ⇒ <expr> + <term> * <id,y>
       ⇒ <expr> + <factor> * <id,y>
       ⇒ <expr> + <num,2> * <id,y>
       ⇒ <term> + <num,2> * <id,y>
       ⇒ <factor> + <num,2> * <id,y>
       ⇒ <id,x> + <num,2> * <id,y>

We have derived the sentence x + 2 * y; we denote this <goal> ⇒* id + num * id. Such a sequence of rewrites is a derivation or a parse. The process of discovering a derivation is called parsing.

Different ways of parsing: top-down vs bottom-up

Top-down parsers:
- start at the root of the derivation tree and fill in;
- pick a production and try to match the input;
- may require backtracking;
- some grammars are backtrack-free (predictive).
Bottom-up parsers:
- start at the leaves and fill in;
- start in a state valid for legal first tokens;
- as input is consumed, change state to encode possibilities (recognize valid prefixes);
- use a stack to store both state and sentential forms.

Top-down parsing

A top-down parser starts with the root of the parse tree, labelled with the start (goal) symbol of the grammar. To build a parse, it repeats the following steps until the fringe of the parse tree matches the input string:
1. At a node labelled A, select a production A → α and construct the appropriate child for each symbol of α.
2. When a terminal is added to the fringe that does not match the input string, backtrack.
3. Find the next node to be expanded (it must have a label in Vn).
The key is selecting the right production in step 1. If the parser keeps making wrong choices (for example, repeatedly expanding a left-recursive production), the derivation process may not terminate. Why is that bad?

How much lookahead is needed?

We saw that top-down parsers may need to backtrack when they select the wrong production. Do we need arbitrary lookahead to parse CFGs?
- In general, yes: use the Earley or Cocke-Younger-Kasami algorithms.
- Fortunately, large subclasses of CFGs can be parsed with limited lookahead, and most programming-language constructs can be expressed in a grammar that falls into these subclasses.
Among the interesting subclasses are:
- LL(1): left-to-right scan, left-most derivation, 1-token lookahead; and
- LR(1): left-to-right scan, reversed right-most derivation, 1-token lookahead.
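As a small illustration of predictive top-down parsing, here is a hand-coded recursive-descent sketch in Java for the example expression grammar. The left-recursive rules for <expr> and <term> are first rewritten into an equivalent iterative form, since a top-down parser cannot expand left recursion; the class and method names are illustrative, not from the course materials.

    // Recursive-descent sketch for the expression grammar with left recursion removed:
    //   expr ::= term { (+|-) term }    term ::= factor { (*|/) factor }    factor ::= num | id
    // The token stream is a pre-scanned array of strings; names are illustrative only.
    public class RDParser {
        private final String[] tokens;
        private int pos = 0;

        RDParser(String[] tokens) { this.tokens = tokens; }

        private String peek() { return pos < tokens.length ? tokens[pos] : "$"; }
        private void expect(String t) {
            if (!peek().equals(t)) throw new RuntimeException("syntax error at token " + pos);
            pos++;
        }

        void goal()  { expr(); expect("$"); }   // <goal> ::= <expr>
        void expr()  { term(); exprRest(); }    // <expr> ::= <term> expr'
        void exprRest() {                       // iterative form of expr' ::= (+|-) <term> expr' | empty
            while (peek().equals("+") || peek().equals("-")) { pos++; term(); }
        }
        void term()  { factor(); termRest(); }  // <term> ::= <factor> term'
        void termRest() {                       // iterative form of term' ::= (*|/) <factor> term' | empty
            while (peek().equals("*") || peek().equals("/")) { pos++; factor(); }
        }
        void factor() {                         // <factor> ::= num | id
            String t = peek();
            if (t.equals("num") || t.equals("id")) pos++;
            else throw new RuntimeException("expected num or id at token " + pos);
        }

        public static void main(String[] args) {
            // the sentence id + num * id from the derivation above
            new RDParser(new String[]{"id", "+", "num", "*", "id"}).goal();
            System.out.println("parsed successfully");
        }
    }

The loops in exprRest and termRest play the role of the empty-producing expr' and term' nonterminals of the transformed grammar; with one token of lookahead (peek) no backtracking is ever needed, which is exactly the LL(1) property mentioned above.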

Bottom-up parsing

Goal: given an input string w and a grammar G, construct a parse tree by starting at the leaves and working up to the root.

Reductions vs derivations
- Reduction: at each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production.
- Recall that in a derivation, a nonterminal in a sentential form is replaced by the body of one of its productions. A reduction is the reverse of a derivation step.
- Key decisions: when to reduce, and which production rule to apply.
- Bottom-up parsing is the process of reducing a string w to the start symbol; its goal is to build the derivation tree in reverse. (A worked reduction sequence for the sentence derived earlier is given at the end of these notes.)

Parsing review

- Recursive descent: a hand-coded recursive descent parser directly encodes a grammar (typically an LL(1) grammar) into a series of mutually recursive procedures. It has most of the linguistic limitations of LL(1).
- LL(k): an LL(k) parser must be able to recognize the use of a production after seeing only the first k symbols of its right-hand side.
- LR(k): an LR(k) parser must be able to recognize the occurrence of the right-hand side of a production after having seen all that is derived from that right-hand side, with k symbols of lookahead.
There are mature tools (e.g., bison) that generate parsers from a given specification of the syntax, i.e., a grammar.

Closing remarks: parsing

- Overview of parsing. Error checking. LR parsing.
- Reading: Chapters 1, 3 and 4 from the Dragon book.
- Announcement: Assignment 1 is out; due in around 10 days.
- Next class: ?
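Worked example: reducing id + num * id. The following reduction sequence is the rightmost derivation shown earlier, read in reverse; it illustrates how a bottom-up parser rebuilds the derivation tree for x + 2 * y from the leaves up, replacing the handle by the head of a production at each step.

    id + num * id
    <factor> + num * id            reduce by <factor> ::= id
    <term> + num * id              reduce by <term> ::= <factor>
    <expr> + num * id              reduce by <expr> ::= <term>
    <expr> + <factor> * id         reduce by <factor> ::= num
    <expr> + <term> * id           reduce by <term> ::= <factor>
    <expr> + <term> * <factor>     reduce by <factor> ::= id
    <expr> + <term>                reduce by <term> ::= <term> * <factor>
    <expr>                         reduce by <expr> ::= <expr> + <term>
    <goal>                         reduce by <goal> ::= <expr>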