CS 314 Principles of Programming Languages. Lecture 3


CS 314 Principles of Programming Languages
Lecture 3
Zheng Zhang
Department of Computer Science, Rutgers University
Wednesday 14th September, 2016
Zheng Zhang, CS@Rutgers University

Class Information
  First homework is due this coming Monday. The submission window is open on Sakai.
  Recitation starts this week.
  Office hours for TAs of all three sections start this week.
  http://www.cs.rutgers.edu/courses/314/classes/ fall 2016 zhang/

Review: Front end of a compiler
  [Figure: source code -> scanner -> tokens -> parser -> intermediate representation; the scanner and the parser both report errors]
  Parser: syntax & semantic analyzer, IL code generator (syntax-directed translator)
  Front End Responsibilities:
    recognize legal programs
    report errors
    produce IL
    preliminary storage map
    shape the code for the back end
  Much of front end construction can be automated.

Review: Syntax and Semantics of Prog. Languages
  The syntax of programming languages is often defined in two layers: tokens and sentences.
  tokens: basic units of the language
    Question: How to spell a token (word)?
    Answer: regular expressions
  sentences: legal combinations of tokens in the language
    Question: How to build correct sentences with tokens?
    Answer: (context-free) grammars (CFG)
  E.g., Backus-Naur form (BNF) is a formalism used to express the syntax of programming languages.

Review: Formalisms for Lexical and Syntactic Analysis
  1. Lexical Analysis: Converts source code into a sequence of tokens.
  2. Syntax Analysis: Structures tokens into a parse tree.
  Two issues in Formal Languages:
    Language Specification: a formalism to describe what a valid program (sentence) looks like.
    Language Recognition: a formalism to describe a machine and an algorithm that can verify whether a program is valid.
  For (2), we use context-free grammars to specify programming languages.
    Note: recognition, i.e., parsing algorithms using PDAs (push-down automata), will be covered in CS415.
  For (1), we use regular grammars/expressions for specification and finite (state) automata for recognition.

Review: Lexical Analysis (Scott 2.1, 2.2)
  character sequence:  if a <= b then c := 1
          | scanner
          v
  token sequence:      if  id<a>  <=  id<b>  then  id<c>  :=  num<1>
  Tokens (Terminal Symbols of CFG, Words of Lang.)
    Smallest atomic units of syntax
    Used to build all the other constructs
    Example, C:
      keywords: for if goto volatile ...
      = * / - < > = <= >= <> ( ) [ ] ; := . , ...
      number (Example: 3.14 28 ...)
      identifier (Example: b square addentry ...)
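The scanner step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's scanner: the token class names, the keyword set, the operator list and the helper name tokenize are all hypothetical, and Python's re module stands in for a hand-built automaton.

```python
import re

# Hypothetical token classes, tried in this order at each input position.
TOKEN_SPEC = [
    ("num", r"[0-9]+(?:\.[0-9]+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"<=|>=|<>|:=|[=*/\-+<>()\[\];,.]"),
    ("skip", r"\s+"),
]
KEYWORDS = {"if", "then", "for", "goto"}  # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn a character sequence into a list of printable tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # blanks only separate tokens
        if kind == "op" or lexeme in KEYWORDS:
            tokens.append(lexeme)  # keywords and operators stand for themselves
        else:
            tokens.append(f"{kind}<{lexeme}>")  # id and num carry their spelling
    return tokens


print(tokenize("if a <= b then c := 1"))
```

Multi-character operators such as <= are listed before the single-character class so that the longer spelling is tried first.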

Review: Lexical Analysis (cont.)
  Identifiers
    Names of variables, etc.
    Sequence of terminals of restricted form. Example, C: A31, but not 1A3
    Upper/lower case sensitive?
  Keywords
    Special identifiers which represent tokens in the language
    May be reserved (reserved words) or not
    E.g., C: if is reserved. E.g., FORTRAN: if is not reserved.
  Delimiters
    When does the character string for a token end?
    Example: an identifier is the longest possible character sequence that does not include a delimiter
    Few delimiters in Fortran (not even the blank ' '): DO I = 1.5 is the same as DOI=1.5
    Most languages have more delimiters, such as blank, new line, keywords, ...

Review: Regular Expressions
  A syntax (notation) to specify regular languages.

  RE r      Language L(r)
  a         {a}
  ɛ         {ɛ}
  r | s     L(r) ∪ L(s)
  rs        {rs | r ∈ L(r), s ∈ L(s)}
  r+        L(r) ∪ L(rr) ∪ L(rrr) ∪ ...   (any number of r's concatenated)
  r*        {ɛ} ∪ L(r) ∪ L(rr) ∪ L(rrr) ∪ ...   (r* = r+ | ɛ)
  ( s )     L(s)

Review: Examples of Expressions

  RE            Language
  a | bc
  (a | b)c
  aɛ
  a* | b
  ab*
  ab* | c+
  (a | b)*
  (0 | 1)*1

Review: Examples of Expressions - Solution

  RE            Language
  a | bc        {a, bc}
  (a | b)c      {ac, bc}
  aɛ            {a}
  a* | b        {ɛ, a, aa, aaa, aaaa, ...} ∪ {b}
  ab*           {a, ab, abb, abbb, abbbb, ...}
  ab* | c+      {a, ab, abb, abbb, abbbb, ...} ∪ {c, cc, ccc, ...}
  (a | b)*      {ɛ, a, b, aa, ab, ba, bb, aaa, aab, ...}
  (0 | 1)*1     binary numbers ending in 1
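These solutions can be spot-checked mechanically. The sketch below transliterates each slide RE into Python's re syntax (ɛ becomes the empty string, so aɛ is written "a") and tests a few strings for membership with re.fullmatch. The helper names in_language and check_examples and the chosen sample strings are assumptions made for illustration.

```python
import re


def in_language(regex, text):
    """True if the whole string text belongs to L(regex)."""
    return re.fullmatch(regex, text) is not None


# regex -> (strings that should be in the language, strings that should not)
EXAMPLES = {
    "a|bc": (["a", "bc"], ["ab", "c", ""]),
    "(a|b)c": (["ac", "bc"], ["a", "abc"]),
    "a": (["a"], ["", "aa"]),  # the slide's "a" followed by epsilon
    "a*|b": (["", "a", "aaa", "b"], ["ab", "bb"]),
    "ab*": (["a", "ab", "abbb"], ["", "b", "ba"]),
    "ab*|c+": (["a", "abb", "c", "ccc"], ["", "ac", "abc"]),
    "(a|b)*": (["", "a", "ba", "aab"], ["c", "abc"]),
    "(0|1)*1": (["1", "01", "1101"], ["", "0", "10"]),
}


def check_examples():
    """Check every sample string against its expected membership."""
    for regex, (members, non_members) in EXAMPLES.items():
        assert all(in_language(regex, s) for s in members), regex
        assert not any(in_language(regex, s) for s in non_members), regex
    return True


print(check_examples())
```

Python gives | the lowest precedence, matching the slide's convention, so "ab*|c+" reads as (ab*) | (c+).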

Review: Regular Expressions for Programming Languages
  Let letter stand for A | B | C | ... | Z
  Let digit stand for 0 | 1 | 2 | ... | 9
  integer constant:
  identifier:
  real constant:

Review: Recognizers for Regular Expressions
  Example 1: integer constant
    RE: digit+
    FSA: [Figure: S0 -digit-> S2; S2 -digit-> S2 (loop); S2 is final]
  Example 2: identifier
    RE: letter ( letter | digit )*
    FSA: [Figure: S0 -letter-> S2; S2 -letter-> S2 and S2 -digit-> S2 (loops); S2 is final]

  Example 3: Real constant
    RE: digit* . digit+
    FSA: [Figure: S0 -digit-> S0 (loop); S0 -.-> S1; S1 -digit-> S2; S2 -digit-> S2 (loop); S2 is final]
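To see that a regular expression and its automaton describe the same language, the real-constant example can be coded both ways and compared. This sketch assumes the RE is digit* . digit+ (optional digits, a point, then at least one digit), which is how the diagram reads. The function name fsa_accepts_real and the sample strings are hypothetical.

```python
import re

DIGITS = "0123456789"
REAL_RE = re.compile(r"[0-9]*\.[0-9]+")  # assumed reading: digit* . digit+


def fsa_accepts_real(text):
    """Simulate the diagram: S0 is the start state, S1 means the point was seen, S2 is final."""
    state = "S0"
    for ch in text:
        if state == "S0" and ch in DIGITS:
            state = "S0"  # loop on leading digits
        elif state == "S0" and ch == ".":
            state = "S1"
        elif state in ("S1", "S2") and ch in DIGITS:
            state = "S2"  # at least one digit after the point
        else:
            return False  # no arc for this character: reject
    return state == "S2"


SAMPLES = ["3.14", ".5", "28", "3.", "", "1.2.3", "a.1"]
for sample in SAMPLES:
    agrees = fsa_accepts_real(sample) == (REAL_RE.fullmatch(sample) is not None)
    print(repr(sample), fsa_accepts_real(sample), agrees)
```

Under this reading, 28 and 3. are rejected by both the RE and the automaton, because the final state is only reached after a digit that follows the point.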

Review: Finite State Automata
  [Figure: S0 -0-> S1 -0-> S3; S0 -1-> S2 -1-> S3]
  A Finite-State Automaton is a quadruple: < S, s, F, T >
    S is a set of states, e.g., {S0, S1, S2, S3}
    s is the start state, e.g., S0
    F is a set of final states, e.g., {S3}
    T is a set of labeled transitions, of the form (state, input) → state [i.e., S × Σ → S]

Review: Finite State Automata
  Transitions can be represented using a transition table:

            Input
  State     0     1
  S0        S1    S2
  S1        S3    -
  S2        -     S3

  An FSA accepts or recognizes an input string iff there is some path from its start state to a final state such that the labels on the path are that string.
  Lack of an entry in the table (or no arc for a given character) indicates an error: reject.
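The quadruple and the transition table map directly onto a dictionary lookup. A minimal sketch, assuming the four-state example automaton (start S0, final S3) and a hypothetical function name accepts:

```python
STATES = {"S0", "S1", "S2", "S3"}
START = "S0"
FINAL = {"S3"}
# (state, input) -> state; a missing key plays the role of "-" in the table.
TRANSITIONS = {
    ("S0", "0"): "S1",
    ("S0", "1"): "S2",
    ("S1", "0"): "S3",
    ("S2", "1"): "S3",
}


def accepts(text, start=START, final=FINAL, transitions=TRANSITIONS):
    """Follow one transition per input symbol and reject on a missing entry."""
    state = start
    for symbol in text:
        state = transitions.get((state, symbol))
        if state is None:
            return False  # no arc for this symbol: error, reject
    return state in final


for candidate in ["00", "11", "01", "0", "000", ""]:
    print(repr(candidate), accepts(candidate))
```

Because every (state, input) pair has at most one successor, the automaton is deterministic and the loop never needs to backtrack.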

Review: Practical Recognizers
  The recognizer should be a deterministic finite automaton (DFA)
  Read until the end of a token
  Report errors (error recovery?)

  identifier
    letter → (a | b | c | ... | z | A | B | C | ... | Z)
    digit  → (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
    id     → letter (letter | digit)*

  Recognizer for identifier (transition diagram):
  [Figure: S0 -letter-> S1; S1 -letter/digit-> S1 (loop); S1 -other-> S2; S0 -other/digit-> S3]

Review: Implementation: Tables for the recognizer
  Two tables control the recognizer.

  char_class:
              a-z      A-Z      0-9     other
    value     letter   letter   digit   other

  next_state:
    class     S0    S1    S2    S3
    letter    S1    S1    -     -
    digit     S3    S1    -     -
    other     S3    S2    -     -

  To change languages, we can just change tables.

Review: Implementation: Code for the recognizer
  char ← next_char();
  state ← S0;                /* code for S0 */
  done ← false;
  token_value ← "";          /* empty string */
  while (not done) {
    class ← char_class[char];
    state ← next_state[class, state];
    switch (state) {
      case S1:               /* building an id */
        token_value ← token_value + char;
        char ← next_char();
        if (char == EOF) done ← true;
        break;
      case S2:               /* error state */
      case S3:               /* error */
        token_type ← error;
        done ← true;
        break;
    }
  }
  return token_type;
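A runnable Python rendering of this table-driven loop might look as follows. It is a sketch under assumptions. The slide never sets the token type on success, so the sketch assumes it becomes identifier when the input ends while still building an id. As on the slide, both S2 and S3 are reported as errors. The EOF sentinel and the name recognize_identifier are hypothetical.

```python
import string

EOF = None  # hypothetical end-of-input marker


def char_class(ch):
    """First table: map a character to its class."""
    if ch is not EOF and ch in string.ascii_letters:
        return "letter"
    if ch is not EOF and ch in string.digits:
        return "digit"
    return "other"


# Second table: (class, current state) -> next state.
NEXT_STATE = {
    ("letter", "S0"): "S1", ("letter", "S1"): "S1",
    ("digit", "S0"): "S3", ("digit", "S1"): "S1",
    ("other", "S0"): "S3", ("other", "S1"): "S2",
}


def recognize_identifier(text):
    """Return (token_type, token_value) for the input string."""
    chars = iter(text)
    char = next(chars, EOF)
    state = "S0"
    token_value = ""
    token_type = "error"
    done = False
    while not done:
        state = NEXT_STATE[(char_class(char), state)]
        if state == "S1":  # building an id
            token_value += char
            char = next(chars, EOF)
            if char is EOF:
                token_type = "identifier"  # assumption: input ended inside an id
                done = True
        else:  # S2 or S3: error, as on the slide
            token_type = "error"
            done = True
    return token_type, token_value


print(recognize_identifier("A31"))
print(recognize_identifier("1A3"))
```

Only the two tables mention letters and digits, so recognizing a different token class means replacing char_class and NEXT_STATE while the loop stays the same.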

Review: Improved efficiency
  Table-driven implementation is slow relative to direct code. Each state transition involves:
    1. classifying the input character
    2. finding the next state
    3. an assignment to the state variable
    4. a trip through the case statement logic
    5. a branch (while loop)
  We can do better by encoding the state table in the scanner code:
    1. classify the input character
    2. test character class locally
    3. branch directly to next state
  This takes many fewer instructions.

Review: Implementation: Faster scanning
  S0: char ← next_char();
      token_value ← "";      /* empty string */
      class ← char_class[char];
      if (class != letter) goto S3;
  S1: token_value ← token_value + char;
      char ← next_char();
      class ← char_class[char];
      if (class != other) goto S1;
      if (char == DELIMITER) {
        token_type ← identifier;
        return token_type;
      }
      goto S2;
  S2:
  S3: token_type ← error;
      return token_type;
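The same direct-coded idea can be sketched in Python. Python has no goto, so each label becomes a straight-line region of one function and the S1 self-loop becomes a while loop. The delimiter set, the choice to treat end of input as a delimiter, and the name scan_identifier are assumptions made for illustration.

```python
import string

DELIMITERS = " \t\n;,()"  # hypothetical delimiter set
LETTERS = string.ascii_letters
LETTERS_OR_DIGITS = string.ascii_letters + string.digits


def scan_identifier(text):
    """Return (token_type, token_value), with the state logic encoded in control flow."""
    pos = 0
    # S0: the first character must be a letter, otherwise fall into S3.
    if pos == len(text) or text[pos] not in LETTERS:
        return "error", ""
    # S1: keep consuming while the class is letter or digit.
    start = pos
    pos += 1
    while pos < len(text) and text[pos] in LETTERS_OR_DIGITS:
        pos += 1
    token_value = text[start:pos]
    # The class is now "other": accept only if the character is a delimiter.
    # Assumption: running out of input counts as a delimiter.
    if pos == len(text) or text[pos] in DELIMITERS:
        return "identifier", token_value
    # S2: an id followed by a non-delimiter is an error.
    return "error", token_value


print(scan_identifier("A31 "))
print(scan_identifier("1A3"))
```

There is no state variable and no table lookup per character. The position in the code is the state, which is where the savings over the table-driven loop come from.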