Nondeterministic Finite Automata (NFA):

In a nondeterministic finite automaton (NFA), a state may or may not have a transition for each symbol in the alphabet, and it can even have multiple transitions for a symbol. The automaton accepts a word if there exists at least one path from q_0 to a state in F labeled with the input word. If a transition is undefined, so that the automaton does not know how to continue reading the input, the word is rejected.

Formal definition: An NFA is represented formally by a 5-tuple (Q, Σ, Δ, q_0, F), consisting of

1. a finite set of states Q,
2. a finite set of input symbols Σ,
3. a transition function Δ : Q × Σ → P(Q),
4. an initial (or start) state q_0 ∈ Q,
5. a set F ⊆ Q of states distinguished as accepting (or final) states.

Here, P(Q) denotes the power set of Q.

Let w = a_1 a_2 ... a_n be a word over the alphabet Σ. The automaton M accepts the word w if a sequence of states r_0, r_1, ..., r_n exists in Q with the following conditions:

1. r_0 = q_0
2. r_{i+1} ∈ Δ(r_i, a_{i+1}), for i = 0, ..., n-1
3. r_n ∈ F

In words, the first condition says that the machine starts in the start state q_0. The second condition says that, given each character of the string w, the machine transitions from state to state according to the transition function Δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. For w to be accepted by M it is not required that every state sequence ends in an accepting state; it is sufficient if one does. Otherwise, i.e. if it is impossible to get from q_0 to a state in F by following w, the automaton is said to reject the string.

The set of strings that M accepts is the language recognized by M, denoted L(M). We can also define L(M) in terms of the extended transition function Δ* : Q × Σ* → P(Q), defined by:

1. Δ*(r, ε) = {r}, where ε is the empty string, and
2. if x ∈ Σ*, a ∈ Σ, and Δ*(r, x) = {r_1, r_2, ..., r_k}, then Δ*(r, xa) = Δ(r_1, a) ∪ ... ∪ Δ(r_k, a).

Now L(M) = {w | Δ*(q_0, w) ∩ F ≠ ∅}.
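To make the definition concrete, here is a minimal Python sketch of the extended transition function Δ* and the acceptance test. This is a sketch only: encoding Δ as a dictionary whose absent keys stand for undefined transitions is an illustrative choice, not part of the formal model.

```python
# Delta* is computed by folding Delta over the word; M accepts w iff
# Delta*(q0, w) intersects F. Undefined transitions yield the empty set.
def delta_star(delta, states, word):
    """Extended transition function Delta*(R, w) for a set of states R."""
    for symbol in word:
        states = set().union(*(delta.get((r, symbol), set()) for r in states))
    return states

def accepts(delta, q0, final, word):
    """True iff at least one path from q0 spelled by word ends in F."""
    return bool(delta_star(delta, {q0}, word) & final)
```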

Note that there is a single initial state; this is not strictly necessary, as NFAs are sometimes defined with a set of initial states. There is an easy construction that translates an NFA with multiple initial states into an NFA with a single initial state, so the single-start convention is simply a notational convenience.

Example: Let Q = {0, 1, 2}, Σ = {a, b}, the set of accepting states A = {2}, the initial state 0, and the transition function Δ as shown in the following table (a blank entry means the transition is undefined):

State (q)    Input (a)    Next State Δ(q, a)
0            a            {1, 2}
0            b
1            a
1            b            {2}
2            a
2            b

Note that for each state there are two rows in the table, corresponding to the symbols a and b. (The state transition diagram for this finite automaton is omitted here.)
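For instance, the example automaton above can be run with the simulator sketched earlier; the blank table entries are simply absent keys.

```python
delta = {
    (0, 'a'): {1, 2},   # row "0, a -> {1, 2}"
    (1, 'b'): {2},      # row "1, b -> {2}"
}                       # all blank rows are left out
print(accepts(delta, 0, {2}, 'a'))    # True:  0 -a-> 2
print(accepts(delta, 0, {2}, 'ab'))   # True:  0 -a-> 1 -b-> 2
print(accepts(delta, 0, {2}, 'b'))    # False: no b-transition from 0
```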

Regular Expression to NFA:

Kleene's Theorem establishes that regular expressions (REs) and finite automata (FAs) are equivalent language-definition methods. Based on this theoretical result, practical algorithms have been developed that enable us to actually construct FAs from REs and to simulate the FAs with a computer program using transition tables. Following this progression, an NFA is constructed first from a regular expression, the NFA is then converted into a DFA, and finally a transition table is built. Thompson's construction algorithm is one of the algorithms that can be used to build a nondeterministic finite automaton (NFA) from an RE, and the subset construction algorithm can be applied to convert that NFA into a deterministic finite automaton (DFA).

The last step is to generate a transition table. We need a finite state machine that is deterministic (a DFA), in which each state has one unique edge per input alphabet element, so that there is no ambiguity for code generation. But a nondeterministic finite automaton (NFA), which may have more than one edge for an input alphabet element, is easier to construct using a general algorithm: Thompson's construction. Then, following a standard procedure, we convert the NFA to a DFA for coding.

Regular expression: Consider the regular expression r = (a|b)*abb, which matches {abb, aabb, babb, aaabb, bbabb, ababb, aababb, ...}.

To construct an NFA from this, use Thompson's construction. This method builds the NFA for a regular expression from the NFAs of its components, using ε-transitions. The ε-transitions act as glue or mortar for the subcomponent NFAs. An ε-transition adds nothing, since concatenation with the empty string leaves a regular expression unchanged (concatenation with ε is the identity operation). Parse the regular expression into its subexpressions involving the alphabet symbols a, b and ε:

ε, a, b, a|b, (a|b)*, ab, abb

These comprise regular expressions for single characters (ε, a, b), the alternation a|b representing the union of the sets L(a) ∪ L(b), the Kleene star (a|b)*, and the concatenations ab and abb. Subexpressions of these kinds have their own nondeterministic finite automata, from which the overall NFA is constructed. Each component NFA has its own start state and its own accepting state, and its transition diagram may have more than one edge for a symbol.

Converting a regular expression to an NFA - Thompson's Algorithm

We will use the rules which define a regular expression as a basis for the construction (the accompanying diagrams are omitted here):

1. The NFA representing the empty string ε is a start state with a single ε-transition to an accepting state.
2. If the regular expression is just a character, e.g. a, then the corresponding NFA is a start state with a single a-transition to an accepting state.
3. The union operator is represented by a choice of transitions from a node; thus a|b is a new start state with ε-transitions into the NFAs for a and for b, whose accepting states feed a new accepting state by ε-transitions.
4. Concatenation simply involves connecting one NFA to the other; e.g. for ab, the accepting state of the NFA for a is joined by an ε-transition to the start state of the NFA for b.
5. The Kleene closure must allow for taking zero or more instances of the letter from the input; thus a* wraps the NFA for a in ε-transitions that allow skipping it entirely or repeating it.
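Since the five diagrams are missing from this copy of the notes, the rules can also be read as code. The following Python sketch implements one constructor per rule; the fragment representation (numbered states, transition triples with None marking an ε-edge) is an implementation choice, not part of the algorithm.

```python
counter = 0
def new_state():
    """Fresh state number; every fragment gets its own states."""
    global counter
    counter += 1
    return counter

# Each constructor returns (start, accept, transitions); a transition is a
# (state, symbol, state) triple, and symbol None denotes an epsilon-edge.

def epsilon_nfa():                       # rule 1: the empty string
    s, f = new_state(), new_state()
    return s, f, [(s, None, f)]

def symbol_nfa(a):                       # rule 2: a single character
    s, f = new_state(), new_state()
    return s, f, [(s, a, f)]

def union_nfa(n1, n2):                   # rule 3: a|b, a choice of edges
    s, f = new_state(), new_state()
    s1, f1, t1 = n1
    s2, f2, t2 = n2
    return s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                            (f1, None, f), (f2, None, f)]

def concat_nfa(n1, n2):                  # rule 4: ab, epsilon as the glue
    s1, f1, t1 = n1
    s2, f2, t2 = n2
    return s1, f2, t1 + t2 + [(f1, None, s2)]

def star_nfa(n):                         # rule 5: a*, zero or more instances
    s, f = new_state(), new_state()
    s1, f1, t = n
    return s, f, t + [(s, None, s1), (f1, None, s1),
                      (f1, None, f), (s, None, f)]

# The running example r = (a|b)*abb:
a_or_b = union_nfa(symbol_nfa('a'), symbol_nfa('b'))
nfa = concat_nfa(concat_nfa(concat_nfa(star_nfa(a_or_b), symbol_nfa('a')),
                            symbol_nfa('b')), symbol_nfa('b'))
```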

Converting NFA to DFA:

This section describes how one may transform any nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA), for instance by using the tools under the Convert > Convert to DFA menu option. In an NFA, for any string x there may exist no path, or more than one path, from the initial state labeled with x.

Memory Management:

The Von Neumann principle for the design and operation of computers requires that a program be resident in primary memory in order to execute. A user also needs to revisit his programs often during their evolution. However, because primary memory is volatile, a user needs to store his program in some non-volatile store. All computers provide a non-volatile secondary memory available as online storage. Programs and files may be disk resident and downloaded whenever their execution is required. Therefore, some form of memory management is needed at both the primary and the secondary memory levels.

Secondary memory may store program scripts, executable process images and data files. It may store applications as well as system programs. In fact, a good part of any OS, namely the system programs which provide services (the utilities, for instance), is stored in secondary memory and requisitioned as needed.

The main motivation for management of main memory comes from the support for multiprogramming. Several executable processes reside in main memory at any given time; in other words, several programs use the main memory as their address space. Also, programs move into and out of main memory as they terminate, get suspended for some IO, or as new executables need to be loaded. So the OS has to have some strategy for main memory management. In this chapter we shall discuss the management issues and strategies for both main memory and secondary memory.

Main Memory Management:

Let us begin by examining the issues that prompt main memory management.

Allocation: First of all, the processes that are scheduled to run must be resident in memory, so they must be allocated space in main memory.

Swapping, fragmentation and compaction: If a program is moved out or terminates, it creates a hole (i.e. a contiguous unused area) in main memory. When a new process is to be moved in, it may be allocated one of the available holes. It is quite possible that at some time main memory has far too many small holes, none of which is large enough to be allocated to a new incoming process: the main memory is too fragmented. It is therefore essential to attempt compaction, whereby the OS re-allocates the existing programs into contiguous regions, creating a free area large enough for the new process.
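As a toy illustration of the holes, first-fit allocation and compaction just described (the hole list and the placement policy here are simplifications for illustration; real allocators are considerably more involved):

```python
def first_fit(holes, size):
    """Place a request in the first hole big enough; holes are (start, size)."""
    for i, (start, hole_size) in enumerate(holes):
        if hole_size >= size:
            left = [(start + size, hole_size - size)] if hole_size > size else []
            return start, holes[:i] + left + holes[i + 1:]
    return None, holes          # too fragmented: no single hole fits

def compact(holes):
    """Model compaction: all free space coalesces into one large hole.
    (Its final position depends on how the programs are relocated.)"""
    return [(0, sum(size for _, size in holes))]

holes = [(0, 20), (100, 30), (400, 25)]       # 75 units free, but scattered
print(first_fit(holes, 50)[0])                # None: no hole holds 50 units
print(first_fit(compact(holes), 50)[0])       # 0: fits after compaction
```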

Garbage collection: Some programs use dynamic data structures, dynamically using and discarding memory space. Technically, the deleted data items (from a dynamic data structure) release memory locations. In practice, however, the OS does not collect such free space immediately for allocation, because doing so affects performance; such areas are therefore called garbage. When the garbage exceeds a certain threshold, the OS no longer has enough memory available for further allocation. This entails compaction (or garbage collection) without severely affecting performance.

Protection: With many programs residing in main memory, it can happen that, due to a programming error (or with malice), some process writes into the data or instruction area of some other process. The OS ensures that each process accesses only its own allocated area, i.e. each process is protected from the other processes.

Virtual memory: Often a processor sees a large logical storage space (a virtual storage space) even though the actual main memory may not be that large. So some facility is needed to translate a logical address available to the processor into a physical address in order to access the desired data or instruction.

IO support: Most block-oriented devices are recognized as specialized files. Their buffers need to be managed within main memory alongside the other processes.

The considerations stated above motivate the study of main memory management. One of the important considerations in locating an executable program is that it should be possible to relocate it anywhere in main memory. We shall dwell upon the concept of relocation next.

Code Generation:

In computing, code generation is the process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine. The input to the code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)

A code generator is expected to have an understanding of the target machine's runtime environment and its instruction set, and should take the following things into consideration when generating code:

Target language: The code generator has to be aware of the nature of the target language into which the code is to be transformed. That language may provide some machine-specific instructions which help the compiler generate the code in a more convenient way. The target machine can have either a CISC or a RISC processor architecture.

IR type: Intermediate representation comes in various forms: an abstract syntax tree (AST), reverse Polish notation, or three-address code.

Instruction selection: The code generator takes the intermediate representation as input and maps it onto the target machine's instruction set. One IR construct can usually be translated by several different instruction sequences, so it is the code generator's responsibility to choose the appropriate instructions wisely.

Register allocation: A program has a number of values to be maintained during execution, and the target machine's architecture may not allow all of them to be kept in registers. The code generator decides which values to keep in registers, and which registers to use for them.

Ordering of instructions: Finally, the code generator decides the order in which the instructions will be executed, and creates schedules for the instructions.

Descriptors: The code generator has to track both the registers (for availability) and the addresses (locations of values) while generating code. For both, the following two descriptors are used:

Register descriptor: The register descriptor is used to inform the code generator about the availability of registers; it keeps track of the values stored in each register. Whenever a new register is required during code generation, this descriptor is consulted for register availability.

Address descriptor: Values of the names (identifiers) used in the program might be stored at different locations during execution. Address descriptors are used to keep track of the memory locations where the values of identifiers are stored. These locations may include CPU registers, heaps, stacks, memory, or a combination of them.

The code generator keeps both descriptors updated in real time. For a load statement LD R1, x, the code generator updates the register descriptor of R1 to record that it now holds the value of x, and updates the address descriptor of x to show that one instance of x is in R1.

Basic blocks consist of a sequence of three-address instructions; the code generator takes these sequences of instructions as input.

Note: If the value of a name is found at more than one place (register, cache, or memory), the register's value is preferred over the cache and main memory; likewise, the cache's value is preferred over main memory. Main memory is given the least preference.

getreg: The code generator uses the getreg function to determine the status of available registers and the locations of name values. getreg works as follows:

1. If variable y is already in register R, it uses that register.
2. Else, if some register R is available, it uses that register.
3. Else, if both the above options are impossible, it chooses a register that requires the minimal number of load and store instructions.

For an instruction x = y OP z, the code generator may perform the following actions. Let us assume that L is the location (preferably a register) where the output of y OP z is to be saved:

1. Call the function getreg to determine the location L.
2. Determine the present location (register or memory) of y by consulting the address descriptor of y. If y is not presently in register L, then generate the following instruction to copy the value of y to L: MOV y', L (where y' represents the copied value of y).
3. Determine the present location of z using the same method as in step 2 for y, and generate the following instruction: OP z', L (where z' represents the copied value of z).
4. Now L contains the value of y OP z that is intended to be assigned to x. So, if L is a register, update its descriptor to indicate that it contains the value of x, and update the descriptor of x to indicate that it is stored at location L.
5. If y and z have no further use, they can be given back to the system.
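The descriptor bookkeeping and the getreg policy can be sketched schematically as follows. The data structures, the register names and the fallback in step 3 are illustrative assumptions; a production allocator would estimate actual spill costs.

```python
register_desc = {'R1': set(), 'R2': set()}   # register -> names it holds
address_desc  = {}                           # name -> set of its locations

def getreg(y):
    for r, names in register_desc.items():   # 1. y already lives in a register
        if y in names:
            return r
    for r, names in register_desc.items():   # 2. otherwise, any free register
        if not names:
            return r
    return 'R1'                               # 3. otherwise a real allocator
                                              #    picks the cheapest register
                                              #    to spill; we just pick one

def gen_load(x):
    """Emit LD R, x and update both descriptors, as described above."""
    r = getreg(x)
    print(f'LD {r}, {x}')
    register_desc[r] = {x}                    # the register now holds x
    address_desc.setdefault(x, set()).add(r)  # one instance of x is in r
```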

Deterministic Finite Automaton (DFA):

In a DFA, for each input symbol one can determine the state to which the machine will move; hence it is called a deterministic automaton. As it has a finite number of states, the machine is called a deterministic finite machine or deterministic finite automaton.

Formal definition of a DFA: A DFA can be represented by a 5-tuple (Q, Σ, δ, q_0, F) where

1. Q is a finite set of states,
2. Σ is a finite set of symbols called the alphabet,
3. δ is the transition function, δ : Q × Σ → Q,
4. q_0 is the initial state from which any input is processed (q_0 ∈ Q),
5. F is the set of final states (F ⊆ Q).

Graphical representation of a DFA: A DFA is represented by a digraph called a state diagram. The vertices represent the states, and the arcs, labeled with input alphabet symbols, show the transitions. The initial state is denoted by a single incoming arc with no source; a final state is indicated by double circles.

Example: Let a deterministic finite automaton be given by Q = {a, b, c}, Σ = {0, 1}, q_0 = a, F = {c}, and the transition function δ as shown by the following table:

Present State    Next State for Input 0    Next State for Input 1
a                a                         b
b                c                         a
c                b                         c

(Its graphical representation, the state diagram, is omitted here.)
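Because every (state, symbol) pair has exactly one successor, simulating a DFA is a simple fold over the input. Here the example table is run directly, with state and symbol names as in the table:

```python
delta = {
    ('a', '0'): 'a', ('a', '1'): 'b',
    ('b', '0'): 'c', ('b', '1'): 'a',
    ('c', '0'): 'b', ('c', '1'): 'c',
}

def dfa_accepts(delta, q0, final, word):
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]    # exactly one move per symbol
    return state in final

print(dfa_accepts(delta, 'a', {'c'}, '10'))   # a -1-> b -0-> c : True
print(dfa_accepts(delta, 'a', {'c'}, '01'))   # a -0-> a -1-> b : False
```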

Problem Statement: Let X = (Q_x, Σ, δ_x, q_0, F_x) be an NDFA which accepts the language L(X). We have to design an equivalent DFA Y = (Q_y, Σ, δ_y, q_0, F_y) such that L(Y) = L(X). The following procedure converts the NDFA to its equivalent DFA.

Algorithm
Input: An NDFA
Output: An equivalent DFA

Step 1: Create a state table from the given NDFA.
Step 2: Create a blank state table under possible input alphabets for the equivalent DFA.
Step 3: Mark the start state of the DFA by q_0 (the same as the NDFA).
Step 4: Find out the combination of states {Q_0, Q_1, ..., Q_n} for each possible input alphabet.
Step 5: Each time we generate a new DFA state under the input alphabet columns, apply step 4 again; otherwise go to step 6.
Step 6: The states which contain any of the final states of the NDFA are the final states of the equivalent DFA.
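The six steps translate almost directly into code. Below is a minimal subset-construction sketch, in which DFA states are frozensets of NDFA states and the empty set acts as the dead state:

```python
def subset_construction(ndfa, q0, finals, alphabet):
    """ndfa maps (state, symbol) to a set of states; returns the DFA table
    and its final states, following steps 1-6 above."""
    start = frozenset([q0])                          # step 3
    dfa_delta, seen, worklist = {}, {start}, [start]
    while worklist:                                  # steps 4 and 5
        current = worklist.pop()
        for a in alphabet:
            nxt = frozenset(q2 for q in current
                               for q2 in ndfa.get((q, a), ()))
            dfa_delta[(current, a)] = nxt
            if nxt not in seen:                      # a new DFA state appeared
                seen.add(nxt)
                worklist.append(nxt)
    dfa_finals = {s for s in seen if s & finals}     # step 6
    return dfa_delta, dfa_finals
```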

Example: Let us consider the NDFA shown in the figure below (diagram omitted); its state table is as follows:

q    δ(q, 0)            δ(q, 1)
a    {a, b, c, d, e}    {d, e}
b    {c}                {e}
c    ∅                  {b}
d    {e}                ∅
e    ∅                  ∅

Using the above algorithm, we find its equivalent DFA. The state table of the DFA is shown below:

q                   δ(q, 0)            δ(q, 1)
[a]                 [a, b, c, d, e]    [d, e]
[a, b, c, d, e]     [a, b, c, d, e]    [b, d, e]
[d, e]              [e]                ∅
[b, d, e]           [c, e]             [e]
[e]                 ∅                  ∅
[c, e]              ∅                  [b]
[b]                 [c]                [e]
[c]                 ∅                  [b]

(The state diagram of the DFA is omitted here.)
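Feeding the example NDFA to the sketch above should reproduce the reachable rows of the DFA table. Note that the notes do not state the NDFA's accepting set, so {'e'} below is only an assumed stand-in:

```python
ndfa = {
    ('a', '0'): {'a', 'b', 'c', 'd', 'e'}, ('a', '1'): {'d', 'e'},
    ('b', '0'): {'c'},                     ('b', '1'): {'e'},
    ('c', '1'): {'b'},                     ('d', '0'): {'e'},
}                                          # omitted entries are empty
delta, finals = subset_construction(ndfa, 'a', {'e'}, '01')
for (state, symbol), nxt in sorted(delta.items(), key=repr):
    print(sorted(state), symbol, '->', sorted(nxt))
```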

Runtime Environment:

Before code generation, the static source text of a program needs to be related to the actions that must occur at run time to implement the program. As execution proceeds, the same name in the source text can denote different data objects in the target machine. The allocation and deallocation of data objects is managed by the runtime support package.

Each execution of a function is referred to as an activation of the procedure/function. If the function is recursive, several of its activations may be alive at the same time. A procedure is activated when called. The lifetime of an activation of a procedure is the sequence of steps between the first and last steps in the execution of the procedure body. A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended.
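A short recursion makes these definitions concrete: each call below begins a new activation before the earlier one has ended, so several activations of fact are alive at the same time.

```python
def fact(n, depth=0):
    print('  ' * depth + f'enter fact({n})')   # a new activation begins
    result = 1 if n <= 1 else n * fact(n - 1, depth + 1)
    print('  ' * depth + f'leave fact({n})')   # this activation ends
    return result

fact(3)   # activations of fact(3), fact(2), fact(1) nest last-in, first-out
```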