Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

Similar documents
Academic Formalities. CS Modern Compilers: Theory and Practise. Images of the day. What, When and Why of Compilers

CS 132 Compiler Construction

Part 5 Program Analysis Principles and Techniques

Compiler Construction

CS 314 Principles of Programming Languages

2. Lexical Analysis! Prof. O. Nierstrasz!

Lexical Analysis. Introduction

CPSC 434 Lecture 3, Page 1

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS 314 Principles of Programming Languages. Lecture 3

Compilers and Interpreters

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Acknowledgement. CS Compiler Design. Semantic Processing. Alternatives for semantic processing. Intro to Semantic Analysis. V.

Alternatives for semantic processing

Formal Languages and Compilers Lecture VI: Lexical Analysis

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

CSE302: Compiler Design

COMPILER DESIGN LECTURE NOTES

Compiler course. Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analysis. Lecture 2-4

UNIT -2 LEXICAL ANALYSIS

Lexical Analysis. Chapter 2

CSc 453 Lexical Analysis (Scanning)

CSE 3302 Programming Languages Lecture 2: Syntax

Lexical Analysis (ASU Ch 3, Fig 3.1)

Implementation of Lexical Analysis

Introduction to Lexical Analysis

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Introduction to optimizations. CS Compiler Design. Phases inside the compiler. Optimization. Introduction to Optimizations. V.

Lexical Analysis. Lecture 3-4

Compiler Design (40-414)

Lexical Analysis. Lecture 3. January 10, 2018

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Implementation of Lexical Analysis

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

Introduction to Lexical Analysis

Zhizheng Zhang. Southeast University

Figure 2.1: Role of Lexical Analyzer

Compiling Regular Expressions COMP360

Languages and Compilers

CS415 Compilers. Lexical Analysis

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Structure of Programming Languages Lecture 3

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

Compiler Construction 1. Introduction. Oscar Nierstrasz

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Lexical Analysis 1 / 52

Register allocation. CS Compiler Design. Liveness analysis. Register allocation. Liveness analysis and Register allocation. V.

Academic Formalities. CS Principles of Programming Languages. Mutual expectations. What, When and Why of POPL

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Compiler Construction

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Compiler Construction

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Scanning COMP360

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

Front End: Lexical Analysis. The Structure of a Compiler

Compilers. Prerequisites

Group A Assignment 3(2)

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

A Simple Syntax-Directed Translator

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Compiler Construction LECTURE # 3

Overview of the Class

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Chapter 3: Lexical Analysis

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CMSC 330: Organization of Programming Languages

CS 415 Midterm Exam Spring SOLUTION

1. Lexical Analysis Phase

Front End. Hwansoo Han

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Week 2: Syntax Specification, Grammars

CSE 582 Autumn 2002 Exam 11/26/02

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS 352: Compilers: Principles and Practice

Transcription:

Academic Formalities CS3300 - Language Translators Introduction V. Krishna Nandivada IIT Madras Written assignments = 20 marks. Midterm = 30 marks, Final = 50 marks. Extra marks During the lecture time - individuals can get additional 5 marks. How? - Ask a good question, answer a chosen question, make a good point! Take 0.5 marks each. Max one mark per day per person. Attance requirement as per institute norms. Non compliance will lead to W grade. Proxy attance - is not a help; actually a disservice. Plagiarism - A good word to know. A bad act to own. Fail grade guaranteed. Contact (Anytime) : Instructor: Krishna, Email: nvk@cse.iitm.ac.in, Office: BSB 352. TAs: Praveen:cs12d013@smail and Pavan cs08b040@smail. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 2 / 30 What, When and Why of Compilers Compilers A Sangam What: A compiler is a program that can read a program in one language and translates it into an equivalent program in another language. When 1952, by Grace Hopper for A-0. 1957, Fortran compiler by John Backus and team. Why? Study? It is good to know how the food you eat, is cooked. A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. For a computer to execute programs written in these languages, these programs need to be translated to a form in which it can be executed by the computer. Compiler construction is a microcosm of computer science Artificial Intelligence greedy algorithms, learning algorithms,... Algo graph algorithms, union-find, dynamic programming,... theory DFAs for scanning, parser generators, lattice theory,... systems allocation, locality, layout, synchronization,... architecture pipeline management, hierarchy management, instruction set use,... optimizations Operational research, load balancing, scheduling,... Inside a compiler, all these and many more come together. Has probably the healthiest mix of theory and practise. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 3 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 4 / 30

Mutual expectations Course outline For the class to be a mutually learning experience: What will be required from the students? An open mind to learn. Curiosity to know the basics. Explore their own thought process. Help each other to learn and appreciate the concepts. Honesty and hard work. Leave the fear of marks/grades. What are the students expectations?.... A rough outline (we may not strictly stick to this). Overview of Compilers Regular Expressions and Context Free Grammars (glance) Lexical Analysis and Parsing Type checking Intermediate Code Generation Register Allocation Code Generation Overview of advanced topics. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 5 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 6 / 30 Your fris: Languages and Tools Start exploring C and Java - familiarity a must - Use eclipse to save you valuable coding and debugging cycles. Flex, Bison, JavaCC, JTB tools you will learn to use. Make Ant Scripts recommed toolkit. Find the course webpage: http://www.cse.iitm.ac.in/ krishna/cs3300/ Find the lab webpage: http://www.cse.iitm.ac.in/ krishna/cs3300/cs3310.html Get set. Ready steady go! V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 7 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 8 / 30

Acknowledgement A common confusion: Compilers and Interpreters These slides borrow liberal portions of text verbatim from Antony L. Hosking @ Purdue and Jens Palsberg @ UCLA. Copyright c 2012 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu. What is a compiler? a program that translates an executable program in one language into an executable program in another language we expect the program produced by the compiler to be better, in some way, than the original. What is an interpreter? a program that reads an executable program and produces the results of running that program usually, this involves executing the source program in some fashion This course deals mainly with compilers Many of the same issues arise in interpreter A common (mis?) statement XYZ is an interpreted (or compiled) languaged. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 9 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 10 / 30 Compilers A closed area? Expectations Optimization for scalar machines was solved years ago Machines have changed drastically in the last 20 years Changes in architecture changes in compilers new features pose new problems changing costs lead to different concerns old solutions need re-engineering Changes in compilers should prompt changes in architecture New languages and features What qualities are important in a compiler? 1 Correct 2 Output runs fast 3 Compiler runs fast 4 Compile time proportional to program size 5 Support for separate compilation 6 Good diagnostics for syntax errors 7 Works well with the debugger 8 Good diagnostics for flow anomalies 9 Cross language calls 10 Consistent, predictable optimization Each of these shapes your expectations about this course V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 11 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 12 / 30

Abstract view Traditional two pass compiler source IR back machine source compiler machine errors Implications: errors recognize legal (and illegal) programs generate correct manage storage of all variables and agreement on format for object (or assembly) Big step up from assembler higher level notations V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 13 / 30 Implications: intermediate representation (IR). Why do we need it? maps legal into IR back maps IR onto target machine simplify retargeting allows multiple s multiple passes better A rough statement: Most of the problems in the Front- are simpler (polynomial time solution exists). Most of the problems in the Back- are harder (many problems are NP-complete in nature). Our focus: Mainly and little bit of back. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 14 / 30 Recap Yesterday: Introduction & Overview of the compilation process A Clarification: FORTRAN C++ CLU Smalltalk back back back Can we build n m compilers with n + m components? must en all the knowledge in each must represent all the features in one IR must handle all the features in each back Limited success with low-level IRs target1 target2 target3 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 15 / 30 Phases inside the compiler Front responsibilities: Recognize syntactically legal ; report errors. Recognize semantically legal ; report errors. Produce IR. Back responsibilities: Optimizations, generation. Our target five out of seven phases. glance over optimizations att the graduate course, if interested. V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 16 / 30

Lexical analysis Specifying patterns Also known as scanning. Reads a stream of characters and groups them into meaningful sequences, called lexems. Eliminates white space For each lexeme, the scanner produces an output of the form: token-type, attribute-values Example token-types: identifier, number, string, operator and... Example attribute-types: token index, token-value, line and column number and... Example scanning: position = initia + rate 60 For a typical language like C/Java the following lexemes and their values can be identified: lexeme token lexeme token + op, + position id, position = op, = initial id, initial rate id, rate op, 60 num, 60 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 17 / 30 Q: How to specify patterns for the scanner? Examples: white space <ws> ::= <ws> <ws> \t \t keywords and operators specified as literal patterns: do, V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 18 / 30 Specifying patterns Regular Expressions A scanner must recognize the units of syntax identifiers alphabetic followed by k alphanumerics (, $, &,... ) numbers integers: 0 or digit from 1-9 followed by digits from 0-9 decimals: integer. digits from 0-9 reals: (integer or decimal) E (+ or -) digits from 0-9 complex: ( real, real ) We need a powerful notation to specify these patterns Patterns are often specified as regular languages Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars Regular expressions (over an alphabet Σ): 1 ε is a RE denoting the set {ε} 2 if a Σ, then a is a RE denoting {a} 3 if r and s are REs, denoting L(r) and L(s), then: (r) is a RE denoting L(r) (r) (s) is a RE denoting L(r) L(s) (r)(s) is a RE denoting L(r)L(s) (r) is a RE denoting L(r) V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 19 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 20 / 30

Examples of Regular Expressions Generic examples of REs identifier letter (a b c... z A B C... Z) digit (0 1 2 3 4 5 6 7 8 9) id letter ( letter digit ) numbers integer (+ ε) (0 (1 2 3... 9) digit ) decimal integer. ( digit ) real ( integer decimal ) E (+ ) digit complex ( real, real ) Let Σ = {a,b} a b denotes {a,b} (a b)(a b) denotes {aa, ab, ba, bb} i.e., (a b)(a b) = aa ab ba bb a denotes {ε,a,aa,aaa,...} (a b) denotes the set of all strings of a s and b s (including ε) i.e., (a b) = (a b ) a a b denotes {a,b,ab,aab,aaab,aaaab,...} Most tokens can be described with REs We can use REs to build scanners automatically V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 21 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 22 / 30 Recognizers From a regular expression we can construct a deterministic finite automaton (DFA) Recognizer for identifier: letter other 0 1 2 3 error digit other letter digit accept Code for the recognizer Given an automata, can we write a recognizer for a token? ch=nextchar(); state=0; // initial state done=false; tokenval=""// empty while (not done) { class=charclass[ch]; state= nextstate[class,state]; switch(state) { case 1: tokenval=tokenval+ch; char=nextchar(); break; case 2: // accept state tokentype=id; done = true; break; case 3: // error tokentype=error; done=true; break; } // switch } // while return tokentype; V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 23 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 24 / 30

Tables for the recognizer Two tables control the recognizer a z A Z 0 9 other charclass: value letter letter digit other class 0 1 2 3 nextstate: letter 1 1 digit 3 1 other 3 2 To change languages, we can just change tables V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 25 / 30 So what is hard? Language features that can cause problems: reserved words PL/I had no reserved words if then then then = else; else else = then; significant blanks FORTRAN and Algol68 ignore blanks do 10 i = 1,25 do 10 i = 1.25 string constants special characters in strings newline, tab, quote, comment delimiter finite closures some languages limit identifier lengths adds states to count length FORTRAN 66 6 characters V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 26 / 30 Error recovery Automatic construction It is hard to tell (without the aid of other components), if there is a source error. For example: fi (a = f(x)) If fi a misspelling for if, of a function identifier? Since fi is a valid lexeme for the token id, the lexer must return the token id, fi. A later phase (parser or semantic analyzer) may be able to catch the error. Recovery (if the lexer is unable to proceed, that is): Panic and stop! Delete one character! Many other one character related fixes (examples?) Scanner generators automatically construct from RE-like descriptions construct a DFA use state minimization techniques emit for the scanner (table driven or direct ) A key issue in automation is an interface to the parser lex/flex is a scanner generator Takes a specification of all the patterns as a RE. emits C for scanner provides macro definitions for each token (used in the parser) V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 27 / 30 V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 28 / 30

Limits of regular languages Not all languages are regular One cannot construct DFAs to recognize these languages: L = {p k q k } L = {wcw r w Σ } Note: neither of these is a regular expression! (DFAs cannot count!) But, this is a little subtle. One can construct DFAs for: alternating 0 s and 1 s (ε 1)(01) (ε 0) sets of pairs of 0 s and 1 s (01 10)+ V.Krishna Nandivada (IIT Madras) CS3300 - Aug 2012 29 / 30