CSCI312 Principles of Programming Languages


CSCI312 Principles of Programming Languages
Chapter 3: Regular Expressions and Lexer
Xu Liu

Recap
Copyright 2006 The McGraw-Hill Companies, Inc.

Clite: Lexical Syntax
Input: a stream of characters from the ASCII set, keyed by a programmer.
Output: a stream of tokens or basic symbols, classified as follows:
    Identifiers, e.g., Stack, x, i, push
    Literals, e.g., 123, 'x', 3.25, true
    Keywords: bool char else false float if int main true while
    Operators: = || && == != < <= > >= + - * / !
    Punctuation: ; , { } ( )

Clite: Concrete Syntax
Based on a parse of its tokens.
; is a statement terminator (Algol-60 and Pascal use ; as a separator).
The rule for IfStatement is ambiguous: "The else ambiguity is resolved by connecting an else with the last encountered else-less if." [Stroustrup, 1991]

Abstract Syntax Tree for z = x + 2*y; (Fig. 2.10)
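
Since the figure itself is not reproduced here, the following is a minimal sketch of how the tree for z = x + 2*y; might be represented and walked. The class names (Variable, Value, Binary, Assignment) and the helpers evaluate and execute are illustrative assumptions, not necessarily the node types used in the textbook figure.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str


@dataclass
class Value:
    value: int


@dataclass
class Binary:
    op: str          # "+" or "*" in this sketch
    left: object
    right: object


@dataclass
class Assignment:
    target: Variable
    source: object


# z = x + 2*y;  The multiplication sits deeper in the tree than the
# addition, which is how the tree records operator precedence.
tree = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", Value(2), Variable("y"))),
)


def evaluate(node, env):
    """Evaluate an expression node against a dict of variable values."""
    if isinstance(node, Value):
        return node.value
    if isinstance(node, Variable):
        return env[node.name]
    if isinstance(node, Binary):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return left + right if node.op == "+" else left * right
    raise TypeError(f"unexpected node: {node!r}")


def execute(assignment, env):
    """Run an assignment and return the updated environment."""
    env[assignment.target.name] = evaluate(assignment.source, env)
    return env
```

With x = 1 and y = 3, execute(tree, {"x": 1, "y": 3}) stores 7 in z, because 2*y is evaluated before the addition.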

Compilers and Interpreters
Source Program -> Lexical Analyzer -> Syntactic Analyzer -> Semantic Analyzer -> Code Optimizer -> Code Generator -> Machine Code

Contents
3.1 Chomsky Hierarchy
3.2 Lexical Analysis
3.3 Syntactic Analysis

3.1 Chomsky Hierarchy
Regular grammar (least powerful)
Context-free grammar (BNF)
Context-sensitive grammar
Unrestricted grammar

Regular Grammar
Simplest; least powerful
Equivalent to:
    Regular expression
    Finite-state automaton
Right regular grammar: for ω ∈ T* and B ∈ N, every production has the form
    A → ω B
    A → ω

Example
Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9

Regular Grammars
Left regular grammar: equivalent
Used in construction of tokenizers
Less powerful than context-free grammars
Not a regular language: { aⁿ bⁿ | n ≥ 1 }, i.e., a regular grammar cannot balance ( ), { }, begin end
A context-free grammar for this language:
    A → a B b
    B → a B b | ε
If n = 0 is also allowed, a single rule suffices:
    A → a A b | ε
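
To make the balancing argument concrete, here is a small sketch of a recognizer for aⁿ bⁿ that follows the context-free rule A → a A b | ε. The function names are hypothetical. The recursion depth plays the role of the counter that a finite-state machine lacks.

```python
def match_a(text, pos=0):
    """Match A -> a A b | ε starting at pos.

    Returns the position just past the matched substring, or None if an
    'a' was consumed but no balancing 'b' could be found.
    """
    if pos < len(text) and text[pos] == "a":
        inner_end = match_a(text, pos + 1)
        if inner_end is not None and inner_end < len(text) and text[inner_end] == "b":
            return inner_end + 1
        return None
    return pos  # the ε alternative: match nothing


def is_balanced(text):
    """True if text is a^n b^n for some n >= 1."""
    return len(text) > 0 and match_a(text) == len(text)
```

Each pending call to match_a remembers one unmatched a, so the call stack behaves like the stack of a pushdown automaton.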

Context-free Grammars
BNF is a stylized form of CFG
Equivalent to a pushdown automaton
For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers

Context-Sensitive Grammars
Production: α → β, with |α| ≤ |β| and α, β ∈ (N ∪ T)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals

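Undecidable Properties of CSGs
Given a string ω and grammar G, two questions arise:
    Is ω ∈ L(G)?
    Is L(G) non-empty?
Defn: Undecidable means that you cannot write a computer program that is guaranteed to halt and decide the question for every ω and G.
For context-sensitive grammars, the emptiness question is undecidable. Membership (ω ∈ L(G)) is decidable for context-sensitive grammars, though costly, and becomes undecidable for unrestricted grammars.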

Unrestricted Grammar
Equivalent to:
    Turing machine
    von Neumann machine
    C++, Java
That is, it can compute any computable function.

Review: Compilers and Interpreters
Source Program -> Lexical Analyzer -> Syntactic Analyzer -> Semantic Analyzer -> Code Optimizer -> Code Generator -> Machine Code

Lexical Analysis
Purpose: transform program representation
Input: printable ASCII characters
Output: tokens
Discard: whitespace, comments
Defn: A token is a logically cohesive sequence of characters representing a single symbol.

Example Tokens
Identifiers
Literals: 123, 5.67, 'x', true
Keywords: bool char ...
Operators: + - * / ...
Punctuation: ; , ( ) { }

Other Sequences
Whitespace: space, tab
Comments: // any-char* end-of-line
End-of-line
End-of-file

Why a Separate Phase?
Simpler, faster machine model than the parser
75% of time is spent in the lexer for a non-optimizing compiler
Differences in character sets
End-of-line convention differs

Regular Expressions
RegExpr      Meaning
x            a character x
\x           an escaped character, e.g., \n
{ name }     a reference to a name
M | N        M or N
M N          M followed by N
M*           zero or more occurrences of M
M+           one or more occurrences of M
M?           zero or one occurrence of M
[aeiou]      the set of vowels
[0-9]        the set of digits
.            any single character
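
The operators in this table can be tried directly with Python's standard re module, which uses the same notation for everything except { name }. The sketch below pairs each operator with one string it matches. Named references are emulated with ordinary string substitution, which is an assumption of this example and not a feature of re.

```python
import re

# (pattern, a string the whole pattern should match)
EXAMPLES = [
    (r"x", "x"),          # a character x
    (r"\n", "\n"),        # an escaped character
    (r"a|b", "b"),        # M or N
    (r"ab", "ab"),        # M followed by N
    (r"a*", ""),          # zero or more occurrences
    (r"a+", "aaa"),       # one or more occurrences
    (r"a?", ""),          # zero or one occurrence
    (r"[aeiou]", "e"),    # the set of vowels
    (r"[0-9]", "7"),      # the set of digits
    (r".", "q"),          # any single character
]


def all_examples_match():
    """Check that every pattern matches its example string in full."""
    return all(re.fullmatch(pattern, text) for pattern, text in EXAMPLES)


# Emulating { name }: define a named piece once, then splice it into
# larger patterns, as a Lex-style specification would do with {Digit}.
DIGIT = r"[0-9]"
INTEGER = f"(?:{DIGIT})+"
FLOAT = f"(?:{DIGIT})+\\.(?:{DIGIT})+"
```

re.fullmatch requires the pattern to cover the entire string, which is the sense of "match" used when a regular expression defines a token class.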

Clite Lexical Syntax
Category      Definition
anychar       [ -~]
Letter        [a-zA-Z]
Digit         [0-9]
Whitespace    [ \t]
Eol           \n
Eof           \004
Keyword       bool char else false float if int main true while
Identifier    {Letter}({Letter} | {Digit})*
integerlit    {Digit}+
floatlit      {Digit}+\.{Digit}+
charlit       '{anychar}'
Operator      = || && == != < <= > >= + - * / ! [ ]
Separator     ; , { } ( )
Comment       // ({anychar} | {Whitespace})* {Eol}
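
As an illustration of how these definitions drive a lexer, here is a minimal sketch built on Python's re module. It is not the textbook's Clite lexer. The names TOKEN_SPEC, MASTER and tokenize are hypothetical, the || operator is included on the assumption that Clite has a logical-or, and true/false are simply reported as keywords.

```python
import re

KEYWORDS = {"bool", "char", "else", "false", "float",
            "if", "int", "main", "true", "while"}

# Order matters: Comment must precede Operator so that "//" is not read
# as two divisions, and FloatLit must precede IntegerLit.
TOKEN_SPEC = [
    ("Comment",    r"//[^\n]*"),
    ("Whitespace", r"[ \t\n]+"),
    ("FloatLit",   r"[0-9]+\.[0-9]+"),
    ("IntegerLit", r"[0-9]+"),
    ("CharLit",    r"'[ -~]'"),
    ("Identifier", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Operator",   r"\|\||&&|==|!=|<=|>=|[=<>+\-*/!\[\]]"),
    ("Separator",  r"[;,{}()]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return a list of (category, text) pairs, discarding whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("Comment", "Whitespace"):
            continue  # these sequences are not tokens
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"  # keywords are identifiers that appear in the reserved list
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("int x = 10; // counter\n") yields a Keyword, an Identifier, an Operator, an IntegerLit and a Separator, with the comment and whitespace dropped.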

Generators
Input: usually a regular expression
Output: table (slow), code
C/C++: Lex, Flex
Java: JLex

Finite State Automata
Set of states: represented as graph nodes
Input alphabet + unique end symbol
State transition function: arcs in the graph, labelled using the alphabet
Unique start state
One or more final states

Deterministic FSA
Defn: A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.
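
The definition can be checked mechanically. The sketch below assumes an automaton is given as a list of arcs, each a (state, symbol, target) triple. That representation and the name is_deterministic are choices made for this example.

```python
from collections import defaultdict


def is_deterministic(arcs):
    """True if no state has two arcs with the same label going to different targets."""
    targets = defaultdict(set)
    for state, symbol, target in arcs:
        targets[(state, symbol)].add(target)
    return all(len(reachable) <= 1 for reachable in targets.values())


# At most one arc per (state, symbol): deterministic.
DFA_ARCS = [("S", "a", "I"), ("I", "a", "I"), ("I", "b", "F")]

# Two arcs labelled "a" leave S for different states: non-deterministic.
NFA_ARCS = [("S", "a", "I"), ("S", "a", "F"), ("I", "b", "F")]
```

An automaton that fails this check must, in effect, guess which arc to follow. That is what makes it non-deterministic.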

A Finite State Automaton for Identifiers
What is a non-deterministic FSA?

Definitions
A configuration of an FSA consists of a state and the remaining input.
A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it.
If there is no such arc, then:
    If there is no input left and the state is final, accept.
    Otherwise, error.

An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.

Example
(S, a2i$) ⊢ (I, 2i$) ⊢ (I, i$) ⊢ (I, $) ⊢ (F, )
Thus: (S, a2i$) ⊢* (F, )
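
The sequence of configurations for a2i$ can be reproduced with a table-driven simulation. This sketch assumes an identifier automaton with states S (start), I (inside an identifier) and F (final), with $ as the end symbol. The transition table and the names char_class and run are reconstructed for illustration, not copied from the slide's figure.

```python
def char_class(ch):
    """Map a character to the arc label used in the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return ch  # '$' and anything else label themselves


TRANSITIONS = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "$"): "F",
}
FINAL_STATES = {"F"}


def run(text):
    """Simulate the automaton on text followed by the end symbol '$'.

    Returns (trace, accepted), where trace is the list of configurations
    (state, remaining input) visited.
    """
    state, remaining = "S", text + "$"
    trace = [(state, remaining)]
    while remaining:
        next_state = TRANSITIONS.get((state, char_class(remaining[0])))
        if next_state is None:
            return trace, False  # no arc for the leftmost symbol: error
        state, remaining = next_state, remaining[1:]
        trace.append((state, remaining))
    return trace, state in FINAL_STATES
```

Calling run("a2i") visits (S, a2i$), (I, 2i$), (I, i$), (I, $) and (F, ), and accepts. Calling run("2ai") stops at the first configuration because no arc leaves S on a digit.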

Chomsky Hierarchy
Regular grammar (least powerful)
Context-free grammar (BNF)
Context-sensitive grammar
Unrestricted grammar

Contents
3.1 Chomsky Hierarchy
3.2 Lexical Analysis
3.3 Syntactic Analysis

Review: Compilers and Interpreters
Source Program -> Lexical Analyzer -> Syntactic Analyzer -> Semantic Analyzer -> Code Optimizer -> Code Generator -> Machine Code

Syntactic Analysis
Phase also known as: parser
Purpose is to recognize source structure
Input: tokens
Output: parse tree or abstract syntax tree
A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal.
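
To show the one-function-per-nonterminal idea, here is a minimal recursive descent sketch for a small expression grammar. The grammar is an assumption made for this example and is not the Clite grammar: Expression → Term { + Term }, Term → Factor { * Factor }, Factor → name | number | ( Expression ). Tokens are plain strings, and the result is a nested tuple standing in for an abstract syntax tree.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expression(self):
        # Expression -> Term { + Term }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # Term -> Factor { * Factor }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # Factor -> name | number | ( Expression )
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.expression()
            self.expect(")")
            return node
        if token is not None and token.isalnum():
            self.pos += 1
            return token
        raise SyntaxError(f"unexpected token {token!r}")


def parse(tokens):
    """Parse a complete token list into a nested-tuple tree."""
    parser = Parser(tokens)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

parse(["x", "+", "2", "*", "y"]) returns ("+", "x", ("*", "2", "y")). Because term is called from within expression, multiplication binds tighter than addition without any explicit precedence table.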