Compiling Regular Expressions COMP360

Similar documents
Lexical Scanning COMP360

Theory and Compiling COMP360

Compiler Code Generation COMP360

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

Compilers and Interpreters

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

A simple syntax-directed

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COMPILER DESIGN LECTURE NOTES

CS 314 Principles of Programming Languages. Lecture 3

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

CS 314 Principles of Programming Languages

Lexical Analysis. Introduction

COMP455: COMPILER AND LANGUAGE DESIGN. Dr. Alaa Aljanaby University of Nizwa Spring 2013

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1


MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

Specifying Syntax COMP360

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

CSCI312 Principles of Programming Languages!

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CST-402(T): Language Processors

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

CS 4201 Compilers 2014/2015 Handout: Lab 1

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

CS606- compiler instruction Solved MCQS From Midterm Papers

COMPILER DESIGN. For COMPUTER SCIENCE

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Introduction to Compiler Construction

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

Working of the Compilers

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Compiler Design (40-414)


Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Lexical Analysis 1 / 52

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compilers. Prerequisites

Introduction to Compiler Design

CS 415 Midterm Exam Spring SOLUTION

Compiling Techniques

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

COMPILERS BASIC COMPILER FUNCTIONS

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Introduction to Compiler Construction

Introduction to Compiler Construction

Introduction to Lexing and Parsing

UNIT -2 LEXICAL ANALYSIS

Compiler Design Overview. Compiler Design 1

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

LECTURE NOTES ON COMPILER DESIGN P a g e 2

An Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Building Compilers with Phoenix

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Introduction to Compiler Construction

Introduction. Compilers and Interpreters

Group A Assignment 3(2)

Introduction to Compiler Construction

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Outline. Lecture 17: Putting it all together. Example (input program) How to make the computer understand? Example (Output assembly code) Fall 2002

Front End. Hwansoo Han

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Peace cannot be kept by force; it can only be achieved by understanding. Albert Einstein

Syntax and Grammars 1 / 21

Grammars & Parsing. Lecture 12 CS 2112 Fall 2018

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS. Regrades 10/6/15. Prelim 1. Prelim 1. Expression trees. Pointers to material

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

Introduction to Compiler

Compiler Construction

List of Figures. About the Authors. Acknowledgments

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CSE 582 Autumn 2002 Exam 11/26/02

UNIT III. The following section deals with the compilation procedure of any program.

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

2068 (I) Attempt all questions.

Lecture 3: Lexical Analysis

CA Compiler Construction

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CS 132 Compiler Construction

Transcription:

Compiling Regular Expressions COMP360

Logic is the beginning of wisdom, not the end. Leonard Nimoy

Compiler s Purpose The compiler converts the program source code into a form that can be executed by the hardware The compiler works with the language libraries and the system linker

Output of a Compiler Most language systems compile the source into an object file of machine language which is linked into an executable Some languages, like Java and C#, are compiled to an intermediate language which is interpreted A compiler can output a high level language Some systems interpret and execute the source code directly

Run Time Compilers Java and C# compilers create an intermediate language that can be interpreted by a virtual machine Most virtual machines contain a Just In Time (JIT) compiler that compiles the intermediate language into machine language for efficiency

Stages of a Compiler Source preprocessing Lexical Analysis (scanning) Syntactic Analysis (parsing) Semantic Analysis Optimization Code Generation Link to libraries

Source Preprocessing In C and C++, preprocessor statements begin with a # The preprocessor edits the source code based on the preprocessor statements #include is the same as copying the included file at that point with the editor The output of the preprocessor is expanded source code with no # statements Old C compilers had a separate preprocessor program

Lexical Analysis Lexical Analysis or scanning reads the source code (or expanded source code) It removes all comments and white space The output of the scanner is a stream of tokens Tokens can be words, symbols or character strings A scanner can be a finite state automata (FSA)

Syntactic Analysis Syntactic Analysis or parsing reads the stream of tokens created by the scanner It checks that the language syntax is correct The output of the Syntactic Analyzer is a parse tree The parser can be implemented by a context free grammar stack machine

Semantic Analysis The Semantic Analysis inputs the parse tree from the parser Language requirements not checked by the syntax are enforced This stage determines what the program is to do The output of the Semantic Analysis is an intermediate code. This is similar to assembler language, but may include higher level operations

Optimization Most compilers will attempt to optimize the intermediate code Some compilers will also optimize after code generation There are many optimizations possible such as moving computations out of loops, avoiding redundant loads and stores, efficient use of registers, etc.

Code Generation The Code Generator inputs the intermediate language and outputs machine language for the target machine The code generator is specific to the machine architecture

Linking and Loading While not truly part of the compiler, the libraries provide the functionality that is more than just a few machine language statements The linker reads the object files and outputs and executable file

At which stage will these errors be detected? String num2go; int dog cat; dog = myfunc( dog ); // A // B // C int myfunc( int cat, int cow) { }

Simple Program /* This is an example program */ A = Boy + Cat + Dog;

After Lexical Scan A = Boy + Cat + Dog ;

Parsing <statement> -> <variable> = <expression> <variable> -> A Boy Cat Dog <expression> -> <variable> + <expression> <variable>

Compile A = B + C + D Intermediate code Temp1 = B + C Temp2 = Temp1 + D A = Temp2

Simple Machine Language Load register with B Add C to register Store register in Temp1 Load register with Temp1 Add D to register Store register in Temp2 Load register with Temp2 Store register in A

Optimized Machine Language Load register with B Add C to register Store register in Temp1 Load register with Temp1 Add D to register Store register in Temp2 Load register with Temp2 Store register in A

Symbol Table Many stages of a compiler create and reference a symbol table The symbol table keeps a list of all of the names used in the program To assist debugging, the symbol table can be written into the output object file. This tells debuggers where variables are located The symbol table can be created by the scanner and updated by all other stages

Output of Each Stage Source preprocessing expanded source code Lexical Analysis List of tokens Syntactic Analysis Parse Tree Semantic Analysis Intermediate code Optimization Intermediate code Code Generation Object file Link to libraries Executable program

Machines of a Compiler Source preprocessing simple editing Lexical Analysis Finite State Automata Syntactic Analysis Push Down Automata Semantic Analysis Optimization Code Generation Link to libraries

State Tables An FSA graph can be converted to a table The table has cells for each state and each input symbol In the cell goes the next state if the DFA is in that state and receives that input symbol You can consider the state table to be an adjacency table for the graph

Converting an Example FSA Consider the regular expression that begins and ends with an a and can have an even number of b s between them a (bb)* a It can be recognized by the FSA a b b 1 2 3 4 5 a a b

Convert the Graph to a Table 1 2 3 4 5 a 2 5 0 5 0 b 0 3 4 3 0 The states are listed along the top The input symbols are along the side For that symbol while in that state, the DFA will go to the new state given in the table State zero represents a final error state

Draw a DFA for this Regular Language (bb ba) a +

Possible Solution (bb ba) a + a a b b 1 2 3 4 a b

Possible Solution (bb ba) a + a a b b 1 2 3 4 a b

Create a state table for the FSA a a b b 1 2 3 4 a b 1 2 3 4 a 0 3 4 4 b 2 3 2 0

Programming a FSA It is relatively simple to implement a Finite State Automata in a modern programming language This program can be used to recognize if a string conforms to a regular language

FSA Program state = 1 while not end of file { symbol = next input character state = statetable[ symbol, state ] if state = 0 then error } if state is a terminating state, success

Grouping Input Symbols For many FSA programs, it is useful to create an index value for the input symbols, i.e. a = 0, b = 1 Often you can have groups of symbols use the same index value, i.e. all letters have the index 2 and all numbers have the index 3

Mealy and Moore Machines The FSA we have discussed so far simply determine if the input is valid for the specified language An FSA can also produce an output A Mealy machine has an output or function associated with each transition or edge of the graph A Moore machine has an output or function associated with each state or node of the graph

Using Mealy Machines to Create Tokens Consider an FSA reading text and creating token of words separated by spaces or punctuation a..z/1 a..z/0 not a..z/2 0 = Save character as first letter in a string 1 = Save character as next letter in a string 2 = Save the string as a token, handle other symbol

Mealy Machine State Table a..z/1 a..z/0 not a..z/2 1 2 3 a.. z 2/0 2/1 2/0 not a.. z 1/x 3/2 3/x New state / Output function

Lexical Analysis with a Mealy Machine Compilers can use a Mealy machine to scan the source code The FSA recognizes and discards comments and white space Names, numbers, strings and punctuation are each output as a list of tokens A token is an object that contains one unit of the input specifying the value and type

Reading Read sections 3.1 3.3 in the textbook by Wednesday