
CS 441 Fall 2018 Notes

Compiler: software that translates a program written in a source file into a program stored in a target file, reporting errors when found.
- Source: program written in a High Level Language.
- Target: program translated into a Low Level Language, usually Assembly, Machine Code or Virtual Machine Code.

Interpreter: software that translates the statements of a source (usually) one statement at a time into Machine Code, then executes the translated statement before translating/executing the next.

Categories of Languages:
- Machine Code: binary representation; simple commands; differs by CPU model; knowledge of the CPU/HW is required to program in this language.
- Assembly: text representation (easy for a programmer to read/write); simple commands (usually 1:1 with Machine Code); differs by CPU model; knowledge of the CPU/HW is required to program in this language.
- High Level Language: more-complex commands, each translating to 1:many Machine Code instructions; the same language across CPU models and Operating Systems (with some minor differences); in-depth knowledge of the CPU/HW is not required to program in these languages.

Structure of a Compiler:
- Scanner: reads a source, character by character, extracting lexemes that are then represented by tokens.
- Parser: extracts tokens from the Scanner and builds an Abstract Syntax Tree (AST) of the tokens, based on the rules of the language.
- Constrainer: adds context-sensitive and other information to the tokens of the AST produced by the Parser, producing a Decorated Abstract Syntax Tree (DAST). The DAST represents the full meaning (semantics) of the source program being translated.
- Code Generator: given the DAST produced by the Constrainer, writes code in the Target Language with the same meaning (semantics) as the source program.
- String Table: stores the exact spelling of identifiers and literals discovered by the Scanner.
- Symbol Table: stores semantic information on tokens.
Lexeme: the string representation of a single word or symbol extracted from the source.

Token: a simplified (integer) representation of a Lexeme. It may also be an object/structure whose members/fields describe a single word or symbol from the source.

Optimization: changes to the DAST and/or the generated Target Code that make the Target program more efficient.
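A minimal sketch of how a Scanner might turn lexemes into integer tokens while recording spellings in a String Table, assuming a toy language with identifiers, integer literals and three single-character symbols. The token kind numbers, the `StringTable` class and the `scan` function are hypothetical names chosen for this illustration, not part of any course code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical integer token kinds for this sketch only.
IDENT, INT_LIT = 1, 2
SYMBOLS = {"=": 10, "+": 11, ";": 12}


class StringTable:
    """Stores the exact spelling of each identifier/literal exactly once."""

    def __init__(self) -> None:
        self.spellings: list[str] = []
        self.index: dict[str, int] = {}

    def intern(self, lexeme: str) -> int:
        """Return the table index of `lexeme`, adding it the first time it is seen."""
        if lexeme not in self.index:
            self.index[lexeme] = len(self.spellings)
            self.spellings.append(lexeme)
        return self.index[lexeme]


@dataclass(frozen=True)
class Token:
    kind: int                # simplified integer representation of the lexeme
    spelling: Optional[int]  # String Table index, or None for symbols


def scan(source: str, table: StringTable) -> list[Token]:
    """Read `source` character by character and emit one Token per lexeme."""
    tokens: list[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(Token(IDENT, table.intern(source[i:j])))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(Token(INT_LIT, table.intern(source[i:j])))
            i = j
        elif ch in SYMBOLS:
            tokens.append(Token(SYMBOLS[ch], None))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


table = StringTable()
for tok in scan("x = x + 42;", table):
    print(tok)
print(table.spellings)
```

Both occurrences of `x` become equal tokens that share String Table entry 0, which is why the spelling is kept apart from the integer token kind.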

Grammar: defines the correct forms of sentences (programs) of a language.
- Lexical Grammar: defines the correct forms for Lexemes in terms of characters.
- Phrase/Structure Grammar: defines the correct formation of tokens into sentences (programs) for the language.

Mathematically defined as (specific types of Grammars may have extensions/modifications): G = {Σ, N, P, S}, where
- Σ, the Alphabet: the set of all possible terminals (individual words/characters) in a language.
- N, the Non-Terminals: a set of symbols that represent possible combinations of terminals/non-terminals.
- P, the Productions: a set of rules where each Production specifies a string of terminals/non-terminals that can substitute for another string of terminals/non-terminals. Ex: axa → bbyb means axa may be substituted with bbyb.
- S, the Goal Symbol: a special, single Non-Terminal (S ∈ N) that represents all possible valid sentences (programs) in a language.

Derivation: a proof that a given sentence can be generated by starting with the Goal Symbol and applying productions until the given sentence is produced.

Example (traditionally, Terminals are lower case letters and Non-Terminals are upper case letters in the mathematical model):
   Σ = { a, b, c, d }
   N = { S, X, Y, Z }
   P = { S → aX, S → bY, X → cX, Z → cZ, X → dZ, Y → bYb, Y → dZ, Z → a }
   S = S (the Goal Symbol)

Example: Derive accda
   Current Sentence   Production Used
   S                  Start
   aX                 S → aX
   acX                X → cX
   accX               X → cX
   accdZ              X → dZ
   accda              Z → a

Terminology:
- Deterministic: when more than one production could be used for a substitution, exactly one can be chosen by examining the sentence being derived for matching tokens. This usually implies no empty productions (N → ε).
- Non-Deterministic: when a grammar is NOT Deterministic by the above definition.
- Accepts: a grammar accepts a sentence when a derivation can be found for the sentence.
- Rejects: a grammar rejects a sentence when a derivation can NOT be found for the sentence.
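The derivation of accda can be reproduced mechanically. This is a minimal sketch, assuming single-character symbols and that every right-hand side starts with a terminal (true for the example grammar), so the next unmatched input character selects at most one production, which is the Deterministic case described above. The names `GRAMMAR` and `derive` are illustrative only.

```python
from typing import Optional

# The example grammar P from the notes; upper case letters are Non-Terminals.
GRAMMAR: dict[str, list[str]] = {
    "S": ["aX", "bY"],
    "X": ["cX", "dZ"],
    "Y": ["bYb", "dZ"],
    "Z": ["cZ", "a"],
}


def derive(grammar: dict[str, list[str]], goal: str,
           sentence: str) -> Optional[list[tuple[str, str]]]:
    """Leftmost derivation of `sentence`, or None if the grammar rejects it."""
    form = goal
    steps = [(form, "Start")]
    while True:
        # Position of the leftmost Non-Terminal, if any remain.
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            # Only terminals left: accept if they spell the sentence exactly.
            return steps if form == sentence else None
        # Terminals derived so far must match, and input must remain.
        if form[:i] != sentence[:i] or i >= len(sentence):
            return None
        lhs = form[i]
        # Examine the next input character to choose the production.
        choices = [rhs for rhs in grammar[lhs] if rhs[:1] == sentence[i]]
        if len(choices) != 1:
            return None
        form = form[:i] + choices[0] + form[i + 1:]
        steps.append((form, f"{lhs} -> {choices[0]}"))


for current, used in derive(GRAMMAR, "S", "accda") or []:
    print(f"{current:<8}{used}")
print(derive(GRAMMAR, "S", "accd"))
```

The printed rows match the derivation table; accd is rejected because a Z remains after the input runs out.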

Chomsky Hierarchy: a method for classifying Languages/Grammars by their complexity, indicating the type of automaton (machine) that can recognize each level.

   Class   Grammar             Recognizer
   0       Unrestricted        Turing Machine
   1       Context Sensitive   Linear Bounded Automata
   2       Context Free        Push Down Automata
   3       Regular             Finite State Automata

Classifying Grammars:
- here "symbols" means terminals and/or non-terminals
- LHS means the Left Hand Side of the arrow of a Production
- RHS means the Right Hand Side of the arrow of a Production
- |LHS| means the count of the number of symbols on the LHS (likewise |RHS|)
- For the example productions: Σ = {a, b, c, d, e}, N = {S, W, X, Y, Z}, and S is the Goal/Start Symbol

Restrictions by Grammar Name:
- Unrestricted: Productions may have any number of symbols on the LHS and RHS. Ex: abc → de
- Context Sensitive: at least 1 Non-Terminal on the LHS; |LHS| <= |RHS|, implying no N → ε allowed (except maybe S → ε). Ex: aWXb → YcdeZ
- Context Free: LHS is 1 Non-Terminal, nothing else; RHS can be anything (N → ε is always allowed, but it can make the grammar Non-Deterministic). Ex: X → aYZb
- Regular: LHS is 1 Non-Terminal, nothing else; RHS is 1 terminal, optionally followed by 1 Non-Terminal, implying no N → ε allowed (except maybe S → ε). Ex: X → aY (X → Ya is invalid)

Automata: in general, consist of:
- a tape on which symbols may be written or read
- a read/write head that accesses one position on the tape at any given time
- possible actions: Read, Write, Advance 1, Rewind 1
- a controller: an implementation of a Grammar that controls the actions of the machine. The machine has a current state, which is modified by the application of a Grammar Production.
- Added options and restrictions define the particular type of Automaton.
- Purpose: to Accept or Reject a sentence (string of terminals) as valid or invalid for the language.
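The restriction list can be applied to a single production as a sketch, assuming single-character symbols and the empty string "" standing for ε. The function `chomsky_class` is a hypothetical helper that returns the most restrictive class the production satisfies under the rules in these notes; under those rules a whole grammar's class would be the smallest number over all of its productions.

```python
def chomsky_class(lhs: str, rhs: str, nonterminals: set[str],
                  goal: str = "S") -> int:
    """Most restrictive Chomsky class (3, 2, 1 or 0) for one production lhs -> rhs."""
    single_nt = len(lhs) == 1 and lhs in nonterminals
    goal_empty = lhs == goal and rhs == ""  # the permitted S -> ε case

    # Class 3, Regular: N -> t, N -> tN, or S -> ε.
    if single_nt and (
        goal_empty
        or (len(rhs) == 1 and rhs not in nonterminals)
        or (len(rhs) == 2 and rhs[0] not in nonterminals and rhs[1] in nonterminals)
    ):
        return 3
    # Class 2, Context Free: LHS is exactly one Non-Terminal, RHS is anything.
    if single_nt:
        return 2
    # Class 1, Context Sensitive: a Non-Terminal on the LHS and |LHS| <= |RHS|.
    if any(sym in nonterminals for sym in lhs) and len(lhs) <= len(rhs):
        return 1
    # Class 0, Unrestricted: anything else.
    return 0


N = {"S", "W", "X", "Y", "Z"}
examples = [("abc", "de"), ("aWXb", "YcdeZ"), ("X", "aYZb"), ("X", "aY"), ("X", "Ya")]
for lhs, rhs in examples:
    # Expected classes, per the rules above: 0, 1, 2, 3, 2
    print(f"{lhs} -> {rhs}: class {chomsky_class(lhs, rhs, N)}")
```

The checks run from most to least restrictive, so each example lands in the highest-numbered class whose restrictions it meets.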

Turing Machine:
- An infinite tape
- Read/Write/Advance/Rewind operations
- Uses an Unrestricted Grammar
- Not commonly used, as they are inefficient for computer languages and may get into an infinite loop (never halt), since they have an infinite tape.

Linear Bounded Automata:
- A finite tape
- Read/Write/Advance/Rewind operations
- Uses a Context Sensitive Grammar

Push Down Automata:
- A finite tape
- Read, Advance, Rewind (but no Write)
- Uses a Stack of symbols, starting with Push(S)
- Uses a Context Free Grammar

Finite State Automata:
- A finite tape
- Read only (and can read the tape only once)
- Limited memory (usually just the Current State and Current Terminal, no stack)
- Uses a Regular Grammar

FSA Representations:
- Formal (Mathematical): {Σ, Q, δ, q0, F}, where
  o Σ: set of Terminals (called the Alphabet)
  o Q: set of States (Non-Terminals)
  o δ: set of Productions, where each represents a transition from one state to the next based on the current input Terminal. The formal form is δ(currentstate, currentterminal) = nextstate
  o q0: start state, q0 ∈ Q
  o F: set of Halt States, F ⊆ Q
- Graphical:
  o Circles: States (a double circle means a Halt State)
  o Arrows between states, labeled with terminals, represent productions
- Table: [figure not preserved]
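A Push Down Automaton's stack, starting with Push(S), can be sketched for the example grammar P from the Grammar notes. This is a predictive (LL(1)-style) construction swapped in for illustration, assuming each right-hand side begins with a distinct terminal; it only ever Reads and Advances, never Rewinds. The production Y → bYb must pair every extra leading b with a trailing b, which is the kind of counting a stack provides and a Finite State Automaton cannot. The name `pda_accepts` is hypothetical.

```python
# The example grammar P from the notes; upper case letters are Non-Terminals.
GRAMMAR: dict[str, list[str]] = {
    "S": ["aX", "bY"],
    "X": ["cX", "dZ"],
    "Y": ["bYb", "dZ"],
    "Z": ["cZ", "a"],
}


def pda_accepts(grammar: dict[str, list[str]], goal: str, sentence: str) -> bool:
    """Predictive PDA: a stack of symbols, starting with Push(goal)."""
    stack = [goal]  # Push(S)
    pos = 0         # tape head: index of the current input Terminal
    while stack:
        top = stack.pop()
        if top in grammar:
            # Non-Terminal on top: the current Terminal selects the production.
            if pos >= len(sentence):
                return False
            choices = [rhs for rhs in grammar[top] if rhs[:1] == sentence[pos]]
            if len(choices) != 1:
                return False
            # Push the RHS in reverse so its first symbol ends up on top.
            stack.extend(reversed(choices[0]))
        elif pos < len(sentence) and sentence[pos] == top:
            pos += 1  # Terminal matched: Read, then Advance 1
        else:
            return False
    # Accept only if the stack emptied exactly at the end of the tape.
    return pos == len(sentence)


for s in ["accda", "bbbdcabb", "bbbdab", "accdab"]:
    print(s, pda_accepts(GRAMMAR, "S", s))
```

bbbdab is rejected because the stack still holds a trailing b when the tape ends, and accdab because input remains after the stack is empty.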

- Regular Expression Meta-Symbols:
  o ab means a followed immediately by b
  o c* means 0 or more c's
  o c+ means 1 or more c's
  o c|b means c or b
  o (a|b)*c: parentheses can be used for grouping. Means a or b, repeated 0 or more times, followed by a single c.
- Algorithm: using two variables (state and c), with the States, Terminals and Productions encoded somehow:

      state = startstate
      c = readnextchar()
      while state != error and not end-of-file {
          state = get next state depending on state and c
          c = readnextchar()
      }
      if end-of-file and state is a halt state
          ACCEPT
      else
          REJECT

Also: see the Hash Table notes posted on the course website.
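The two-variable algorithm can be run against a transition table for (a|b)*c. This is a minimal sketch, assuming the FSA is encoded as Python sets and a dictionary keyed by (state, terminal), with `None` standing in for end-of-file; the state names and the `accepts` function are illustrative. Python's `re.fullmatch` is used only to cross-check the result against the regular expression.

```python
import re
from typing import Iterator, Optional

ERROR = "error"

# FSA {Σ, Q, δ, q0, F} for (a|b)*c; SIGMA and Q are listed for completeness.
SIGMA = {"a", "b", "c"}
Q = {"q0", "q1"}
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q0", ("q0", "c"): "q1"}
Q0 = "q0"
F = {"q1"}


def accepts(source: str) -> bool:
    """The notes' algorithm using two variables: `state` and the current char `c`."""
    chars: Iterator[str] = iter(source)

    def read_next_char() -> Optional[str]:
        return next(chars, None)  # None stands in for end-of-file

    state = Q0
    c = read_next_char()
    while state != ERROR and c is not None:
        # A missing table entry (including a char outside SIGMA) means ERROR.
        state = DELTA.get((state, c), ERROR)
        c = read_next_char()
    # ACCEPT only if end-of-file was reached while in a Halt State.
    return c is None and state in F


for s in ["c", "abbac", "", "ab", "cc", "abcd"]:
    print(repr(s), accepts(s), re.fullmatch(r"(a|b)*c", s) is not None)
```

Keeping δ in a table means the same loop works for any FSA; only `DELTA`, `Q0` and `F` change.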