Compiler Construction I

TECHNISCHE UNIVERSITÄT MÜNCHEN, FAKULTÄT FÜR INFORMATIK
Compiler Construction I
Dr. Michael Petter, Dr. Axel Simon, SoSe 2014

Organizing
Master or Bachelor in the 6th semester, with 5 ECTS.
Prerequisites: Informatik 1 & 2, Theoretische Informatik, Technische Informatik, Grundlegende Algorithmen.
Delve deeper with: Virtual Machines, Program Optimization, Programming Languages, Praktikum Compilerbau, Hauptseminare.
Materials: TTT-based lecture recordings, the slides, a list of related literature online, tools for visualizing virtual machines, and tools for generating compiler components.

Organizing
Dates:
Lecture: Mon 14:15-15:45
Tutorial: you can vote on two dates via Moodle
Exam: one exam in the summer, none in the winter; the exam is managed via TUM-online.
Successfully completing the tutorial exercises (50% of the credits) earns a 0.3 bonus.

Preliminary content:
- Basics of regular expressions and automata
- Specification and implementation of scanners
- Reduced context-free grammars and pushdown automata
- Bottom-up syntax analysis
- Attribute systems
- Type checking
- Code generation for stack machines
- Register assignment
- Basic optimization

Topic: Introduction

Interpreter
[Diagram: the Program and the Input are fed to the Interpreter, which produces the Output.]
Pro: no precomputation on the program text is necessary, hence no or only a small startup time.
Con: program components are analyzed multiple times during execution, hence a longer runtime.

Concept of a Compiler
[Diagram: the Program is fed to the Compiler, which produces Code; the Code and the Input are fed to the Machine, which produces the Output.]
Two phases:
1 Translating the program text into machine code
2 Executing the machine code on the input

Compiler
A precomputation on the program allows:
- a more sophisticated variable management
- discovery and implementation of global optimizations
Disadvantage: the translation costs time.
Advantage: the execution of the program becomes more efficient, so it pays off for more sophisticated or repeatedly run programs.

Compiler
General compiler setup:
[Diagram: Program code enters the Compiler; the Analysis phase produces an intermediate representation, from which the Synthesis phase produces the generated Code.]

Compiler
The Analysis phase is divided into several parts:
[Diagram: Program code → Scanner → Token-Stream → Parser → Syntax tree → Type Checker → ... → (annotated) Syntax tree]
- lexical analysis: partitioning into tokens
- syntactic analysis: detecting the hierarchical structure
- semantic analysis: inferring semantic properties

Topic: Lexical Analysis

The lexical Analysis
[Diagram: the Scanner turns the input  xyz + 42  into the token stream  xyz  +  42 , each token annotated with its class (Identifier, Operator, Constant).]
A token is a sequence of characters which together form a unit. Tokens are subsumed in classes. For example:
- Names (identifiers), e.g. xyz, pi, ...
- Constants, e.g. 42, 3.14, abc, ...
- Operators, e.g. +, ...
- Reserved terms, e.g. if, int, ...
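As a small illustration (not part of the lecture slides; the class names are chosen for this sketch), the classified token stream for the input xyz + 42 could be represented like this:

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # token class, e.g. "Identifier", "Operator", "Constant"
    text: str   # the character sequence that forms the token

# The scanner would partition the input  xyz + 42  into three tokens:
tokens = [
    Token("Identifier", "xyz"),
    Token("Operator", "+"),
    Token("Constant", "42"),
]
print(tokens)
```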

The lexical Analysis
Classified tokens allow for further pre-processing:
- dropping irrelevant fragments, e.g. spacing, comments, ...
- separating pragmas, i.e. directives for the compiler which are not directly part of the program, like include-statements;
- replacing tokens of particular classes with their meaning / internal representation, e.g. constants;
- names: typically managed centrally in a symbol table, possibly compared against reserved terms (if not already done by the scanner), and possibly replaced with an index.
This post-processing component is called the Siever.
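A minimal sketch of such a siever pass (our own illustration; the token classes and the symbol-table layout are assumptions, not the lecture's):

```python
def sieve(tokens, symbol_table):
    """Sketch of a siever over (kind, text) pairs produced by the scanner."""
    result = []
    for kind, text in tokens:
        if kind in ("Whitespace", "Comment"):
            continue                                   # drop irrelevant fragments
        if kind == "Constant":
            result.append(("Constant", int(text)))     # replace by internal value
        elif kind == "Identifier":
            index = symbol_table.setdefault(text, len(symbol_table))
            result.append(("Identifier", index))       # names become table indices
        else:
            result.append((kind, text))
    return result

table = {}
print(sieve([("Identifier", "xyz"), ("Whitespace", " "),
             ("Operator", "+"), ("Whitespace", " "),
             ("Constant", "42")], table))
# [('Identifier', 0), ('Operator', '+'), ('Constant', 42)]
print(table)   # {'xyz': 0}
```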

The lexical Analysis
Discussion:
- Scanner and Siever are often combined into a single component, mostly by providing appropriate callback actions in the event that the scanner detects a token.
- Scanners are mostly not written manually, but generated from a specification.
[Diagram: Specification → Generator → Scanner]

The lexical Analysis - Generating:
Advantages:
- Productivity: the component can be produced more rapidly.
- Correctness: the component (provably) implements the specification.
- Efficiency: the generator can equip the produced component with very efficient algorithms.
Disadvantages:
- The specification is just another form of programming, admittedly a possibly simpler one.
- Generation instead of implementation pays off for routine tasks only ... and is only good for problems that are well understood.

The lexical Analysis - Generating: ... in our case:
[Diagram: the specification  0 | [1-9][0-9]*  is fed to the Generator, which produces a finite automaton with edges labelled 0, [1-9] and [0-9].]
Specification of token classes: regular expressions; generated implementation: finite automata.
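The same specification can be tried out with an ordinary regular-expression engine; here is a small sketch using Python's re module purely as a stand-in for a generated scanner:

```python
import re

# Token class from the slide: either a single 0, or a nonzero digit
# followed by arbitrarily many digits.
integer = re.compile(r"0|[1-9][0-9]*")

for text in ["0", "42", "007"]:
    print(text, "accepted" if integer.fullmatch(text) else "rejected")
# 0 accepted, 42 accepted, 007 rejected
```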

Lexical Analysis Chapter 1: Basics: Regular Expressions

Regular expressions
Basics:
- Program code is composed from a finite alphabet $\Sigma$ of input characters, e.g. Unicode.
- The set of text fragments of a token class is in general regular.
- Regular languages can be specified by regular expressions.

Definition (Regular expressions)
The set $\mathcal{E}_\Sigma$ of (non-empty) regular expressions is the smallest set $\mathcal{E}$ with:
- $\epsilon \in \mathcal{E}$ ($\epsilon$ a new symbol not from $\Sigma$);
- $a \in \mathcal{E}$ for all $a \in \Sigma$;
- $(e_1 \mid e_2),\ (e_1 \cdot e_2),\ e_1^* \in \mathcal{E}$ whenever $e_1, e_2 \in \mathcal{E}$.
(Stephen Kleene)
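As a sketch of how this inductive definition looks as a data type (our own constructor names, one per case of the definition; not from the lecture):

```python
from dataclasses import dataclass

class Regex:
    pass

@dataclass
class Eps(Regex):            # ε
    pass

@dataclass
class Sym(Regex):            # a, for a ∈ Σ
    char: str

@dataclass
class Alt(Regex):            # (e1 | e2)
    left: Regex
    right: Regex

@dataclass
class Seq(Regex):            # (e1 · e2)
    left: Regex
    right: Regex

@dataclass
class Star(Regex):           # e1*
    inner: Regex

# Example: the expression (a·b | ε)* built as a tree of constructors.
example = Star(Alt(Seq(Sym("a"), Sym("b")), Eps()))
print(example)
```

This is exactly the annotated ranked-tree representation mentioned a few slides later.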

Regular expressions ...
Example: $((a \cdot b^*) \cdot a)$, $(a \mid b)$, $((a \cdot b) \cdot (a \cdot b))$
Attention: we distinguish between characters $a, 0, \$, \ldots$ and meta-symbols $(, \mid, ), \ldots$
To avoid (ugly) parentheses, we make use of operator precedences: $*$ binds stronger than $\cdot$, which binds stronger than $\mid$, and we omit "$\cdot$".
Real specification languages offer additional constructs, e.g. $e? \equiv (\epsilon \mid e)$ and $e+ \equiv (e \cdot e^*)$, and omit $\epsilon$.

Regular expressions
Specifications need a semantics. Example:
  abab        $\{abab\}$
  $a \mid b$      $\{a, b\}$
  $ab^*a$       $\{ab^n a \mid n \ge 0\}$
For $e \in \mathcal{E}_\Sigma$ we define the specified language $[\![e]\!] \subseteq \Sigma^*$ inductively by:
$[\![\epsilon]\!] = \{\epsilon\}$
$[\![a]\!] = \{a\}$
$[\![e^*]\!] = ([\![e]\!])^*$
$[\![e_1 \mid e_2]\!] = [\![e_1]\!] \cup [\![e_2]\!]$
$[\![e_1 \cdot e_2]\!] = [\![e_1]\!] \cdot [\![e_2]\!]$
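To make the inductive definition concrete, here is a small self-contained sketch that computes $[\![e]\!]$ restricted to words of bounded length (necessary because $[\![e^*]\!]$ is infinite in general). For brevity it encodes expressions as nested tuples rather than the class hierarchy sketched above; all names are our own:

```python
# Sketch: compute [[e]] restricted to words of length <= n, following the
# inductive definition on the slide. Expressions are nested tuples, e.g.
# ("seq", ("sym", "a"), ("star", ("sym", "b")))  stands for  a·b*.

def lang(e, n):
    kind = e[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {e[1]}
    if kind == "alt":                       # [[e1|e2]] = [[e1]] ∪ [[e2]]
        return lang(e[1], n) | lang(e[2], n)
    if kind == "seq":                       # [[e1·e2]] = [[e1]] · [[e2]]
        return {u + v for u in lang(e[1], n) for v in lang(e[2], n)
                if len(u + v) <= n}
    if kind == "star":                      # [[e*]] = ([[e]])*
        words, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in lang(e[1], n)
                        if len(u + v) <= n} - words
            words |= frontier
        return words
    raise ValueError(kind)

# ab*a specifies { a b^n a | n >= 0 }; here cut off at length 5:
print(sorted(lang(("seq", ("sym", "a"),
                   ("seq", ("star", ("sym", "b")), ("sym", "a"))), 5)))
# ['aa', 'aba', 'abba', 'abbba']
```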

Keep in mind:
The operators $(\_)^*, \cup, \cdot$ are interpreted in the context of sets of words:
$(L)^* = \{w_1 \ldots w_k \mid k \ge 0,\ w_i \in L\}$
$L_1 \cdot L_2 = \{w_1 w_2 \mid w_1 \in L_1,\ w_2 \in L_2\}$
Regular expressions are internally represented as annotated ranked trees:
[Diagram: the tree for $(a \cdot b \mid \epsilon)^*$, with root $*$, below it $\mid$, whose children are $\cdot$ (with leaves $a$ and $b$) and $\epsilon$.]
Inner nodes: operator applications; leaves: particular symbols or $\epsilon$.

Regular expressions
Example: identifiers and floating-point literals in Java:

le = [a-zA-Z_\$]
di = [0-9]
Id = {le} ({le} | {di})*
Float = {di}* (\.{di} | {di}\.) {di}* ((e|E)(\+|\-)?{di}+)?

Remarks:
- le and di are character classes.
- Defined names are enclosed in { , }.
- Symbols are distinguished from meta-symbols via \.
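These definitions translate almost directly into the syntax of common regular-expression libraries; a small sketch (using Python's re, whose escaping rules differ slightly from the slide's specification language):

```python
import re

le = r"[a-zA-Z_$]"                       # letters, underscore, dollar
di = r"[0-9]"                            # digits
ident = re.compile(f"{le}({le}|{di})*")
floating = re.compile(rf"{di}*(\.{di}|{di}\.){di}*((e|E)(\+|-)?{di}+)?")

print(bool(ident.fullmatch("xyz")), bool(ident.fullmatch("42abc")))       # True False
print(bool(floating.fullmatch("3.14")), bool(floating.fullmatch("1e5")))  # True False
```

Note that 1e5 is rejected: the Float class above requires a decimal point.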

Lexical Analysis Chapter 2: Basics: Finite Automata

Finite automata
Example:
[Diagram: a small automaton whose edges are labelled with a, b, and ɛ.]
Nodes: states; edges: transitions; labels: consumed input.

Finite automata
Definition
A non-deterministic finite automaton (NFA) is a tuple $A = (Q, \Sigma, \delta, I, F)$ with:
- $Q$ a finite set of states;
- $\Sigma$ a finite alphabet of inputs;
- $I \subseteq Q$ the set of start states;
- $F \subseteq Q$ the set of final states; and
- $\delta \subseteq Q \times \Sigma \times Q$ the set of transitions (the transition relation).
(Michael Rabin, Dana Scott)

For an NFA we further define:
Definition
If $\delta : Q \times \Sigma \to Q$ is a function and $|I| = 1$, then we call $A$ deterministic (DFA).
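As an illustration of the tuple definition, an NFA can be written down directly as plain data; the concrete automaton below (accepting all words over {a, b} that end in ab) is our own example, not the one drawn on the slides:

```python
Q     = {0, 1, 2}                       # states
Sigma = {"a", "b"}                      # input alphabet
delta = {(0, "a", 0), (0, "b", 0),      # loop in state 0 on any input
         (0, "a", 1),                   # nondeterministically guess the final "ab"
         (1, "b", 2)}                   # transition relation, a subset of Q × Σ × Q
I     = {0}                             # start states
F     = {2}                             # final states
A     = (Q, Sigma, delta, I, F)
```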

Finite automata
- Computations are paths in the graph.
- Accepting computations lead from $I$ to $F$.
- An accepted word is the sequence of labels along an accepting computation.
[Diagram: the example automaton again, with an accepting computation highlighted.]

Finite automata
Once again, more formally:
- We define the transitive closure $\delta^*$ of $\delta$ as the smallest set $\delta^*$ with:
  $(p, \epsilon, p) \in \delta^*$, and
  $(p, xw, q) \in \delta^*$ if $(p, x, p_1) \in \delta$ and $(p_1, w, q) \in \delta^*$.
- $\delta^*$ characterizes, for two states $p$ and $q$, the words along the paths between them.
- The set of all accepted words, i.e. $A$'s accepted language, can be described compactly as:
  $L(A) = \{w \in \Sigma^* \mid \exists\, i \in I, f \in F : (i, w, f) \in \delta^*\}$
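The definition of L(A) suggests a direct way to test membership: follow all paths simultaneously, one consumed character at a time, handling ε-transitions with a small closure computation. A self-contained sketch (ε is written as the empty string ""; the automaton is a made-up example for a·b*·a with one ε-edge, not the one from the slides):

```python
def epsilon_closure(states, delta):
    """All states reachable from `states` via ε-transitions (label "")."""
    todo, closed = list(states), set(states)
    while todo:
        p = todo.pop()
        for (q, x, r) in delta:
            if q == p and x == "" and r not in closed:
                closed.add(r)
                todo.append(r)
    return closed

def accepts(word, delta, I, F):
    """w ∈ L(A) iff some path from I to F is labelled with w."""
    current = epsilon_closure(I, delta)
    for c in word:
        current = epsilon_closure(
            {r for (q, x, r) in delta if q in current and x == c}, delta)
    return bool(current & F)

# Example NFA for  a·b*·a  with an ε-transition thrown in:
delta = {(0, "a", 1), (1, "", 2), (2, "b", 2), (2, "a", 3)}
print(accepts("aba", delta, I={0}, F={3}))   # True
print(accepts("ab",  delta, I={0}, F={3}))   # False
```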