Compiler Construction I

Similar documents
Compiler Construction I

Compiler Construction

Compiler Construction

Compiler Construction I

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Compiler Construction I

Lexical Analysis. Introduction

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Syntax and Type Analysis

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

2. Syntax and Type Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis

Lexical Analysis 1 / 52

Compiler Construction LECTURE # 3

CS415 Compilers. Lexical Analysis

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Compiler phases. Non-tokens

Static Program Analysis

Figure 2.1: Role of Lexical Analyzer

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE302: Compiler Design

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Languages and Compilers

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Lecture 4: Syntax Specification

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CPS 506 Comparative Programming Languages. Syntax Specification

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Chapter Seven: Regular Expressions

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CMSC 330: Organization of Programming Languages. Context Free Grammars

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Compiler Construction

CT32 COMPUTER NETWORKS DEC 2015

CS 314 Principles of Programming Languages. Lecture 3

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Static Program Analysis

ECS 120 Lesson 7 Regular Expressions, Pt. 1

UNIT -2 LEXICAL ANALYSIS

Grammars and Parsing. Paul Klint. Grammars and Parsing

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Implementation of Lexical Analysis

2068 (I) Attempt all questions.

Zhizheng Zhang. Southeast University

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

1. (a) What are the closure properties of Regular sets? Explain. (b) Briefly explain the logical phases of a compiler model. [8+8]

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Compiling Regular Expressions COMP360

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 3: Lexing and Parsing

Lexical Analysis/Scanning

Lecture 3: Lexical Analysis

CSCI312 Principles of Programming Languages!

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

CSE 401 Midterm Exam Sample Solution 2/11/15

Lexical Analysis - 2

Lexical Analysis. Implementation: Finite Automata

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

CS 310: State Transition Diagrams

Structure of Programming Languages Lecture 3

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CSE 105 THEORY OF COMPUTATION

Midterm I (Solutions) CS164, Spring 2002

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Monday, August 26, 13. Scanners

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Implementation of Lexical Analysis

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Wednesday, September 3, 14. Scanners

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

UVa ID: NAME (print): CS 4501 LDI Midterm 1

Dr. D.M. Akbar Hussain

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Briefly describe the purpose of the lexical and syntax analysis phases in a compiler.

Lecture 3: CUDA Programming & Compiler Front End

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Introduction to Lexical Analysis

Compiler Construction

Chapter 3 Lexical Analysis

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Lexical Analysis. Prof. James L. Frankel Harvard University

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Transcription:

TECHNISCHE UNIVERSITÄT MÜNCHEN FAKULTÄT FÜR INFORMATIK Compiler Construction I Dr. Michael Petter SoSe 2015 1 / 58

Organizing Master or Bachelor in the 6th Semester with 5 ECTS Prerequisites Informatik 1 & 2 Theoretische Informatik Technische Informatik Grundlegende Algorithmen Delve deeper with Virtual Machines Programmoptimization Programming Languages Praktikum Compilerbau Seminars Materials: TTT-based lecture recordings the slides Related literature list online ( Wilhelm/Seidl/Hack Compiler Design) Tools for visualization of virtual machines (VAM) Tools for generating components of Compilers (JFlex/CUP) 2 / 58

Organizing Dates: Lecture: Mo. 14:15-15:45 Tutorial: You can vote on two dates via moodle Exam: One Exam in the summer, none in the winter Exam managed via TUM-online Successful (50% credits) tutorial exercises earns 0.3 bonus 3 / 58

Preliminary content Basics in regular expressions and automata Specification and implementation of scanners Reduced context free grammars and pushdown automata Bottom-Up Syntaxanalysis Attribute systems Typechecking Codegeneration for stack machines Register assignment Basic Optimization 4 / 58

Topic: Introduction 5 / 58

Interpreter Program Input Interpreter Output Pro: No precomputation on program text necessary no/small Startup-time Con: Program components are analyzed multiple times during the execution longer runtime 6 / 58

Concept of a Compiler Program Compiler Code Code Input Machine Output Two Phases: 1 Translating the program text into a machine code 2 Executing the machine code on the input 7 / 58

Compiler A precomputation on the program allows a more sophisticated variable management discovery and implementation of global optimizations Disadvantage The Translation costs time Advantage The execution of the program becomes more efficient payoff for more sophisticated or multiply running programs. 8 / 58

Compiler general Compiler setup: Program code Compiler Analysis Int. Representation Synthesis Compiler Code 9 / 58

Compiler general Compiler setup: Program code Compiler Analysis Int. Representation Synthesis Compiler Code 9 / 58

Compiler The Analysis-Phase is divided in several parts: Program code Analysis 9 / 58

Compiler The Analysis-Phase is divided in several parts: Program code Analysis Scanner Token-Stream lexicographic Analysis: Partitioning in tokens 9 / 58

Compiler The Analysis-Phase is divided in several parts: Program code Analysis Scanner Token-Stream Parser Syntax tree lexicographic Analysis: Partitioning in tokens syntactic Analysis: Detecting hierarchical structure 9 / 58

Compiler The Analysis-Phase is divided in several parts: Program code Analysis Scanner Token-Stream Parser Syntax tree Type Checker... lexicographic Analysis: Partitioning in tokens syntactic Analysis: Detecting hierarchical structure semantic Analysis: Infering semantic properties (annotated) Syntax tree 9 / 58

Topic: Lexical Analysis 10 / 58

The lexical Analysis Program code Scanner Token-Stream 11 / 58

The lexical Analysis xyz + 42 Scanner xyz + 42 11 / 58

The lexical Analysis xyz + 42 Scanner xyz + 42 A Token is a sequence of characters, which together form a unit. Tokens are subsumed in classes. For example: Names (Identifiers) e.g. xyz, pi,... Constants e.g. 42, 3.14, abc,... Operators e.g. +,... reserved terms e.g. if, int,... 11 / 58

The lexical Analysis I O C xyz + 42 Scanner xyz + 42 A Token is a sequence of characters, which together form a unit. Tokens are subsumed in classes. For example: Names (Identifiers) e.g. xyz, pi,... Constants e.g. 42, 3.14, abc,... Operators e.g. +,... reserved terms e.g. if, int,... 11 / 58

The Lexical Analysis Classified tokens allow for further pre-processing: Dropping irrelevant fragments e.g. Spacing, Comments,... Separating Pragmas, i.e. directives vor the compiler, which are not directly part of the program, like include-statements; Replacing of Tokens of particular classes with their meaning / internal representation, e.g. Constants; Names: typically managed centrally in a Symbol-table, evt. compared to reserved terms (if not already done by the scanner) and possibly replaced with an index. Siever 12 / 58

The Lexical Analysis Discussion: Scanner and Siever are often combined into a single component, mostly by providing appropriate callback actions in the event that the scanner detects a token. Scanners are mostly not written manually, but generated from a specification. Specification Generator Scanner 13 / 58

The Lexical Analysis - Generating:... in our case: Specification Generator Scanner 14 / 58

The Lexical Analysis - Generating:... in our case: 0 0 [1-9][0-9]* Generator [1 9] [0 9] Specification of Token-classes: Regular expressions; Generated Implementation: Finite automata + X 14 / 58

Lexical Analysis Chapter 1: Basics: Regular Expressions 15 / 58

Regular Expressions Basics Program code is composed from a finite alphabet Σ of input characters, e.g. Unicode The sets of textfragments of a token class is in general Regular languages can be specified by regular expressions. regular. 16 / 58

Regular Expressions Basics Program code is composed from a finite alphabet Σ of input characters, e.g. Unicode The sets of textfragments of a token class is in general Regular languages can be specified by regular expressions. regular. Definition Regular Expressions The set E Σ of (non-empty) regular expressions is the smallest set E with: ɛ E (ɛ a new symbol not from Σ); a E for all a Σ; (e 1 e 2 ), (e 1 e 2 ), e 1 E if e 1, e 2 E. Stephen Kleene 16 / 58

Regular Expressions... Example: ((a b ) a) (a b) ((a b) (a b)) 17 / 58

Regular Expressions... Example: ((a b ) a) (a b) ((a b) (a b)) Attention: We distinguish between characters a, 0, $,... and Meta-symbols (,, ),... To avoid (ugly) parantheses, we make use of Operator-Precedences: > > and omit 17 / 58

Regular Expressions... Example: ((a b ) a) (a b) ((a b) (a b)) Attention: We distinguish between characters a, 0, $,... and Meta-symbols (,, ),... To avoid (ugly) parantheses, we make use of Operator-Precedences: > > and omit Real Specification-languages offer additional constructs: and omit ɛ e? (ɛ e) e + (e e ) 17 / 58

Regular Expressions Specifications need Semantics...Example: Specification Semantics abab {abab} a b {a, b} ab a {ab n a n 0} For e E Σ we define the specified language [e] Σ inductively by: [ɛ] = {ɛ} [a] = {a} [e ] = ([[e]) [e 1 e 2 ] = [[e 1 ] [e 2 ] [e 1 e 2 ] = [[e 1 ] [e 2 ] 18 / 58

Keep in Mind: The operators (_),, are interpreted in the context of sets of words: (L) = {w 1... w k k 0, w i L} L 1 L 2 = {w 1 w 2 w 1 L 1, w 2 L 2 } 19 / 58

Keep in Mind: The operators (_),, are interpreted in the context of sets of words: (L) = {w 1... w k k 0, w i L} L 1 L 2 = {w 1 w 2 w 1 L 1, w 2 L 2 } Regular expressions are internally represented as annotated ranked trees: * (ab ɛ). ɛ a b Inner nodes: Operator-applications; Leaves: particular symbols or ɛ. 19 / 58

Regular Expressions Example: Identifiers in Java: le = [a-za-z_\$] di = [0-9] Id = {le} ({le} {di})* 20 / 58

Regular Expressions Example: Identifiers in Java: le = [a-za-z_\$] di = [0-9] Id = {le} ({le} {di})* Float = {di}* (\.{di} {di}\.) {di}*((e E)(\+ \-)?{di}+)? 20 / 58

Regular Expressions Example: Identifiers in Java: le = [a-za-z_\$] di = [0-9] Id = {le} ({le} {di})* Float = {di}* (\.{di} {di}\.) {di}*((e E)(\+ \-)?{di}+)? Remarks: le and di are token classes. Defined Names are enclosed in {, }. Symbols are distinguished from Meta-symbols via \. 20 / 58

Lexical Analysis Chapter 2: Basics: Finite Automata 21 / 58

Finite Automata Example: a ɛ ɛ b 22 / 58

Finite Automata Example: a ɛ b ɛ Nodes: States; Edges: Transitions; Lables: Consumed input; 22 / 58

Finite Automata Definition Finite Automata A non-deterministic finite automaton (NFA) is a tuple A = (Q, Σ, δ, I, F) with: Michael Rabin Dana Scott Q Σ I Q F Q δ a finite set of states; a finite alphabet of inputs; the set of start states; the set of final states and the set of transitions (-relation) 23 / 58

Finite Automata Definition Finite Automata A non-deterministic finite automaton (NFA) is a tuple A = (Q, Σ, δ, I, F) with: Michael Rabin Dana Scott Q Σ I Q F Q δ a finite set of states; a finite alphabet of inputs; the set of start states; the set of final states and the set of transitions (-relation) For an NFA, we reckon: Definition Deterministic Finite Automata Given δ : Q Σ Q a function and I = 1, then we call the NFA A deterministic (DFA). 23 / 58

Finite Automata Computations are paths in the graph. Accepting computations lead from I to F. An accepted word is the sequence of lables along an accepting computation... a ɛ b ɛ 24 / 58

Finite Automata Computations are paths in the graph. Accepting computations lead from I to F. An accepted word is the sequence of lables along an accepting computation... a b ɛ ɛ 24 / 58

Finite Automata Once again, more formally: We define the transitive closure δ of δ as the smallest set δ with: (p, ɛ, p) δ and (p, xw, q) δ if (p, x, p 1 ) δ and (p 1, w, q) δ. δ characterizes for two states p and q the words, along each path between them The set of all accepting words, i.e. A s accepted language can be described compactly as: L(A) = {w Σ i I, f F : (i, w, f ) δ } 25 / 58

Lexical Analysis Chapter 3: Converting Regular Expressions to NFAs 26 / 58

In linear time from Regular Expressions to NFAs e = ɛ ɛ ɛ a ɛ e = a a e = a b ɛ b ɛ e = ab a ɛ b ɛ e = a ɛ a ɛ ɛ Thompson s Algorithm Produces O(n) states for regular expressions of length n. Ken Thompson 27 / 58