CS131: Programming Languages and Compilers. Spring 2017

Similar documents
Compiler Design Overview. Compiler Design 1

PLAGIARISM. Administrivia. Compilers. CS143 11:00-12:15TT B03 Gates. Text. Staff. Instructor. TAs. Office hours, contact info on 143 web site

Administrivia. Compilers. CS143 10:30-11:50TT Gates B01. Text. Staff. Instructor. TAs. The Purple Dragon Book. Alex Aiken. Aho, Lam, Sethi & Ullman

Formal Languages and Compilers Lecture I: Introduction to Compilers

CST-402(T): Language Processors

Compiler Design (40-414)

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

ICOM 4036 Spring 2004

Introduction to Compiler Construction

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Introduction to Compiler Construction

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Introduction to Programming Languages and Compilers. CS164 11:00-12:30 TT 10 Evans. UPRM ICOM 4029 (Adapted from: Prof. Necula UCB CS 164)

CS Compiler Construction West Virginia fall semester 2014 August 18, 2014 syllabus 1.0

Introduction to Compiler Design

CS426 Compiler Construction Fall 2006

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

Compiler Design 1. Introduction to Programming Language Design and to Compilation

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

Compilers. Lecture 2 Overview. (original slides by Sam

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Theory and Compiling COMP360

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

COMPILER DESIGN. For COMPUTER SCIENCE

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

COP 3402 Systems Software Syntax Analysis (Parser)

Syntax. In Text: Chapter 3

Programming Language Design and Implementation. Cunning Plan. Your Host For The Semester. Wes Weimer TR 9:30-10:45 MEC 214. Who Are We?

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

1DL321: Kompilatorteknik I (Compiler Design 1)

CS 314 Principles of Programming Languages

COMP455: COMPILER AND LANGUAGE DESIGN. Dr. Alaa Aljanaby University of Nizwa Spring 2013

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY

PLAGIARISM. Administrivia. Course home page: Introduction to Programming Languages and Compilers

Introduction to Programming Languages and Compilers. CS164 11:00-12:00 MWF 306 Soda

Compilers for Modern Architectures Course Syllabus, Spring 2015

CSE 3302 Programming Languages Lecture 2: Syntax

G.PULLAIH COLLEGE OF ENGINEERING & TECHNOLOGY

Lexical Scanning COMP360

CS 4201 Compilers 2014/2015 Handout: Lab 1

2068 (I) Attempt all questions.

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Chapter 3. Describing Syntax and Semantics ISBN

Pioneering Compiler Design

Introduction. Compilers and Interpreters

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF CSE COURSE PLAN

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

KALASALINGAM UNIVERSITY ANAND NAGAR, KRISHNAN KOIL DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ODD SEMESTER COURSE PLAN

CS 314 Principles of Programming Languages. Lecture 3

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Evaluation Scheme L T P Total Credit Theory Mid Sem Exam

Principles of Compiler Design Presented by, R.Venkadeshan,M.Tech-IT, Lecturer /CSE Dept, Chettinad College of Engineering &Technology

CSCI 565 Compiler Design and Implementation Spring 2014

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Working of the Compilers

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Programming Language Syntax and Analysis

Part 5 Program Analysis Principles and Techniques

Principles of Programming Languages [PLP-2015] Detailed Syllabus

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN

A programming language requires two major definitions A simple one pass compiler

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

LANGUAGE TRANSLATORS


EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Syntax. A. Bellaachia Page: 1

CS606- compiler instruction Solved MCQS From Midterm Papers

Question Bank. 10CS63:Compiler Design

COMP 3002: Compiler Construction. Pat Morin School of Computer Science

Translator Design CRN Course Administration CMSC 4173 Spring 2017

VIVA QUESTIONS WITH ANSWERS

CA Compiler Construction

Compiler Design. Dr. Chengwei Lei CEECS California State University, Bakersfield

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

GUJARAT TECHNOLOGICAL UNIVERSITY

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

EECS 6083 Intro to Parsing Context Free Grammars

CSE302: Compiler Design

Introduction to Compilers

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Component Compilers. Abstract

COMPILER DESIGN LECTURE NOTES

Chapter 3. Describing Syntax and Semantics

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Name of chapter & details

TABLE OF CONTENTS S.No DATE TOPIC PAGE No UNIT I LEXICAL ANALYSIS 1 Introduction to Compiling-Compilers 6 2 Analysis of the source program 7 3 The

Describing Syntax and Semantics

Transcription:

CS131: Programming Languages and Compilers Spring 2017

Course Information Instructor: Fu Song Office: Room 1A-504C, SIST Building Email: songfu@shanghaitech.edu.cn Class Hours : Tuesday and Thursday, 8:15--9:55 Room 1D-106, SIST Building TA: Jun Zhang, Zhiming Mei Jun Zhang Office Hours: Friday 13:30-15:30 Tuesday 15:00-17:00 Discussion: PIAZZA https://piazza.com/shanghaitech.edu.cn/spring2017/cs131 Zhiming Mei Course Information: http://sist.shanghaitech.edu.cn/faculty/songfu/course/spring2017/cs131/ 2

Preliminaries Preliminaries Required data structures and algorithms programming languages (C/C++) Textbook Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman Compilers: Principles, Techniques, and Tools Second Edition Pearson Addison-Wesley, 2007, ISBN 0-321-48681-1 Reference book and websites 1. Kenneth C. Louden, Compiler Construction Principles and Practice,2002 2. https://courses.engr.illinois.edu/cs426/ 3. http://web.stanford.edu/class/cs143/ 3

Theoretical Part Course Structure = Goals Basic knowledge of theoretical techniques Practical experience Design and program an actual compiler using tools, such as LEX/Flex, YACC/Bison 4

Grading Bonus : 10% Homework : 20% Projects : 30% Midterm : 20% Final : 30% Optional projects (max. 2 students per project) solution code + report + presentation 5

You must NOT Integrity use work from uncited sources read or posses solutions written by others allow other people to read or possess your solution Ask before action when in doubt 6

Course Outline Introduction to Compiling Lexical Analysis Regular expressions, regular definitions NFA, DFA Subset construction, Thompson s construction Syntax Analysis Context Free Grammars Top-Down Parsing, LL Parsing Bottom-Up Parsing, LR Parsing Syntax-Directed Translation Attribute Definitions Evaluation of Attribute Definitions Semantic Analysis, Type Checking Intermediate Representations Run-Time Environment Operational Semantics Intermediate Code Generation Optimization Assembler Code Generation 7

A brief history of compiler In the late1940s, the stored-program computer invented by John von Neumann,programs were written in machine language, {0,1} + 8

A brief history of compiler (cont.) Mnemonic assembly language: in 1949 numeric codes were replaced symbolic forms Mov R, 2 (R=2) Assembler: translate the symbolic codes and memory location of assembly language into the corresponding numeric codes. Macro instructions + assembly language 9

A brief history of compiler (cont.) FORTRAN language and its compiler: between 1954 and 1957, developed by the team at IBM, John Backus. (1) The structure of natural language studied by Noam Chomsky, (2) The classification of languages according to the complexity of their grammars and the power of the algorithms needed to recognize them. (3) Four levels of grammars: type 0 type 1 type2 and type3 grammars Type 0: Turing machine/recursive enumerable language Type 1: context-sensitive grammar/language Type 2: context-free grammar/language, the most useful of programming language Type 3: right-linear grammar/language, regular expressions Handbook of Formal Languages Vol 1,2,3 10

A brief history of compiler (cont.) (4) parsing problem : studied in 1960s and 1970s (5) Code improvement techniques (optimization techniques): improve compilers efficiency. (6) Compiler-compilers (parser generator): only in one part of the compiler process. YACC: written in 1975 by Steve Johnson for the UNIX system. LEX: written in 1975 by Mike Lest. (7) recent advances in compiler design: (a) application of more sophisticated algorithms for inferring and /or simplifying the information contained in a program. (with the development of more sophisticated programming languages that allow this kind of analysis), e.g. memory safe language Rust (b) development of standard windowing environments ( interactive development environment IDE) 11

Compilers A compiler is a program takes a program written in a source language and translates it into an equivalent program in a target language. source program COMPILER target program ( Normally a program written in a high-level programming language) error messages ( Normally the equivalent program in machine code relocatable object file) 12

Other Applications In addition to the development of a compiler, the techniques used in compiler design can be applicable to many problems in computer science. Techniques used in a lexical analyzer can be used in text editors, information retrieval system, and pattern recognition programs. Techniques used in a parser can be used in a query processing system such as SQL (Structured Query Language,). Many software having a complex front-end may need techniques used in compiler design. A symbolic equation solver which takes an equation as input. That program should parse the given input equation. Most of the techniques used in compiler design can be used in Natural Language Processing (NLP) systems. 13

Compiling system Skeletal source program preprocessor Source program compiler Target assembly program assembler Relocatable machine code Loader/link-editor Library, relocatable object files Absolute machine code 14

Cousins of the compiler Preprocessors delete comments, include other files, perform macro substitutions. Compilers A language translator. It executes the source program immediately. Depending on the language in use and the situation Assemblers A translator translates assembly language into object code Linkers Collects code separately compiled or assembled in different object files into a file. Connects the code for standard library functions. Connects resources supplied by the operating system of the computer. 15

Cousins of the compiler (cont.) Loaders Relocatable : the code is not completely fixed. Loaders resolve all relocatable address relative to the starting address. Editors Produce a standard file ( structure based editors) Debuggers Determine execution errors in a compiled program. 16

Major Parts of Compilers There are two major parts of a compiler: Analysis and Synthesis Front-end and Back-end In analysis phase, an intermediate representation is created from the given source program. Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase. In synthesis phase, the equivalent target program is created from this intermediate representation. Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this phase. 17

The Phases of a Compiler Analysis Synthesis Source Program Lexical Analyzer Syntax Analyzer Semantic Analyzer Intermediate Code Generator Code Optimizer Code Generator Target Program Each phase transforms the source program from one representation into another representation. They communicate with the symbol table They communicate with error handlers. 18

The Phases of a Compiler (cont.) Source program Lexical analyzer token Syntax analyzer Parse tree Semantic analyzer Symbol-table manager Intermediate code generator IR Code optimizer Error handler Code generator target program 19

First step: recognize words. Smallest unit above letters Lexical Analyzer This is a sentence. Lexical analysis is not trivial. Consider: ist his ase nte nce 20

Lexical Analyzer (cont.) Lexical Analyzer: reads the source program character by character and returns the tokens of the source program. Token describes a pattern of characters having some meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters and so on) Ex: x = y + 12 ; => tokens: x and y are identifiers = assignment operator + add operator 12 a number ; delimiter Ex: if x == y then z = 1; else z = 2; => tokens: x, y and z are identifiers = assignment operator = = compare operator 1 and 2 are number ; delimiter 21

Lexical Analyzer (cont.) Puts information about identifiers into the symbol table. Regular expressions are used to describe tokens (lexical constructs). A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer. Eg. Identifier: [a-za-z] [a-za-z0-9] + Good Id: x, y, x1 Bad Id: 1x 22

Syntax Analyzer Lexical Analyzer provides recognized words The next step is to understand sentence structure 23

Syntax Analyzer (cont.) A Syntax Analyzer (also known as parser) creates the syntactic structure (generally a parse tree) of the given program A parse tree describes a syntactic structure. Ex: identifier x = y + 12 ; identifier number In a parse tree, all terminals are at leaves. All inner nodes are non-terminals in a context free grammar. expr expr expr assign-stmt 24

Syntax Analyzer (cont.) A Syntax Analyzer (also known as parser) creates the syntactic structure (generally a parse tree) of the given program A parse tree describes a syntactic structure. Ex: if x == y then z = 1; else z = 2; relation assign-stmt assign-stmt BExpr stmt stmt In a parse tree, all terminals are at leaves. All inner nodes are non-terminals in a context free grammar. if-then-else-stmt 25

Syntax Analyzer (cont.) The syntax of a language is specified by a context free grammar (CFG) The rules in a CFG are mostly recursive. A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not. If it satisfies, the syntax analyzer creates a parse tree for the given program. Ex: We use EBNF (Extended Backus-Naur Form) to specify a CFG assign-stmt -> identifier = expr ; expr -> identifier expr -> number expr -> expr BOP expr BOP -> + - * / Good: x = y +1; Bad: x = y+1, x = + y 1 26

Syntax Analyzer vs. Lexical Analyzer Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer? Both of them do similar things; But the lexical analyzer deals with simple non-recursive constructs of the language. The syntax analyzer deals with recursive constructs of the language. The lexical analyzer simplifies the job of the syntax analyzer. The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program. The syntax analyzer works on the smallest meaningful units (tokens) in a source program to recognize meaningful structures (sentences) in our programming language. 27

Parsing Techniques Depending on how the parse tree is created, there are different parsing techniques. These parsing techniques are categorized into two groups: Top-Down Parsing Bottom-Up Parsing 28

Top-Down Parsing: Parsing Techniques (cont.) Construction of the parse tree starts at the root, and proceeds towards the leaves. Efficient top-down parsers can be easily constructed by hand. Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing). (L-left to right; L-leftmost derivation) 29

Bottom-Up Parsing: Parsing Techniques (cont.) Construction of the parse tree starts at the leaves, and proceeds towards the root. Normally efficient bottom-up parsers are created with the help of some software tools. Bottom-up parsing is also known as shift-reduce parsing LR Parsing much general form of shift-reduce parsing: LR, SLR, LALR (L-left to right; R-rightmost derivation) 30

Top-down vs. Bottom-up x = y + 12 ; assign-stmt 1 1 1 1 expr expr 2 2 2 3 4 expr identifier = identifier + number ; identifier 4 = identifier + number 1 2 expr expr 3 3 3 expr 4 4 4 assign-stmt Top-down (preorder ) Bottom-up (postorder) 31

Semantic Analyzer Once sentence structure is understood, we can try to understand meaning But meaning is too hard for compilers Compilers perform limited analysis to catch inconsistencies Eg. Jack said Jerry left his assignment at home. What does his refer to? Jack or Jerry? Even worse: Jack said Jack left his assignment at home. How many Jacks are there? Which one left the assignment? 32

Semantic Analyzer Programming languages define strict rules to avoid such ambiguities Jack =1 def f(): Jack =2 print(jack) print(jack) f() Output? >> 1 >> 2 33

Semantic Analyzer (cont.) A semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Type-checking is an important part of semantic analyzer. Normally semantic information cannot be represented by a context-free language used in syntax analyzers. 34

Rust programs Let x; x =1; Semantic Analyzer (cont.) Let x =0; x =1; Let mut x =0; x =1; Error? Let x = 1 println!("{}", x); Error? Type information + uninitialized Let x; println!("{}", x); Read uninitialized memory cannot be represented by a context-free language Let x; if true { x = 1;} println!("{}", x); Error? Let x; if true { x = 1;} else {x = 2} println!("{}", x); Semantic analysis is not panacea 35

Semantic Analyzer (cont.) Context-free grammars used in the syntax analysis are integrated with attributes (semantic rules) Ex: the result is a syntax-directed translation Attribute grammars x = y + 12 The type of the identifier x must match with type of the expression (y+12) 36

Optimization No strong counterpart in English, but akin to editing Automatically modify programs so that they Run faster Use less recourse, e.g. power, memory Small size Eg. x = y + 0 => x =y x = y + 1; z = x => z= y + 1 for (x=1; x++; x<=10) {z=10; y = y + x *z } for (x=1; x++; x<=10) { y = y + x *10 } The project has no optimization component 37

Intermediate Code Generation A compiler may produce an explicit intermediate codes representing the source program. These intermediate codes are generally machine (architecture) independent (except GCC). But the level of intermediate codes is close to the level of machine codes. 38

Intermediate Code Generation (example) Ex: x = y * fact + 1 id1 = id2 * id3 + 1 MULT temp1, id2, id3 ADD temp2, temp1, #1 MOV id1, temp2 ;Intermediates Codes (Quadraples or three address code) 39

Code Generator Produces the target language in a specific architecture. The target program is normally is a relocatable object file containing the machine codes. Ex: id1 = id2 * id3 + 1 ( assume that we have an architecture with instructions whose at least one of its operands is a machine register) ARM: MOV LDR R1, R1, id2 [id2] MULT LDR R1, R2, id3 [id3] ADD MULT R1, R3, #1 R1, R2 MOV ADD id1, R1, R3, #1 STR R1, [id1] MULT temp1, id2, id3 ADD id1, temp1, #1 40

Symbol-table management A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier In lexical analyzer the identifier is entered into the symbol table The attributes of an identifier cannot normally be determined during lexical analysis Updated and used by syntax analyzer, semantic analyzer, others 41

The grouping of phases analysis and synthesis analysis: lexical analysis, syntax analysis, semantic analysis (optimization) synthesis: code generation (optimization) front end and back end separation depend on the source language or the target language the front end: the scanner, parser, semantic analyzer, intermediate code synthesis the back end: the code generator, some optimization Source code intermediate code target code Front end back end Fig.a.1.1 42

position := initial + rate * 60 Lexical analyzer Symbol table position initial rate......... id1 := id2 + id3 * 60 syntax analyzer := id1 + id2 * id3 60 semantic analyzer := id1 + id2 id3 Intermediate code generator * inttoreal 60 Temp1 := inttoreal(60) temp2 := id3 * temp1 temp3 := id2 + temp2 id1 := temp3 Code optimizer temp1 := id3 * 60.0 id1 := id2 + temp1 Code generator MOVF R2, id3 MULF R2, #60.0 MOVF R1, id2 ADDF R1, R2 MOVF id1, R1 for MIPS: lw R2, id3 mult R2, R2, #60.0 lw R1, id2 add R1, R1, R2 sw R1, id1 43

LLVM and GCC 44

Issues Compiling is almost this simple, but there are many pitfalls. Each phase can encounter errors A phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected. It is very difficult to proceed to analyze program or to correct errors. Language design has big impact on compiler Determines what is easy and hard to compile Course theme: many trade-offs in language design 45

Summary the constitute of compiler and compiling system the concept of lexical analyzer, syntax analyzer, semantic analyzer, code generation, symbol table, intermediate representation, parse tree, token and so on. Regular expression and regular definition Context free grammar (CFG) 46