Lecture Notes CPSC 326 (Spring 2019): Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)


Today
- Quiz 2
- Lexer design
- Syntax Analysis: Context-Free Grammars

Assignments
- HW2 (out, due Tues)

S. Bowers 1 of 15

Implementing a Lexer for MyPL (HW 2)

Similar in spirit to HW 1. We'll create three classes:
- MyPLError, for representing all MyPL errors (mypl_error.py)
- Token: token type, lexeme, line and column numbers (mypl_token.py)
- Lexer: mainly a next_token() function (mypl_lexer.py)

We'll use Exceptions to handle errors. In file mypl_error.py:

    class MyPLError(Exception):
        def __init__(self, message, line, column):
            self.message = message
            self.line = line
            self.column = column
        def __str__(self):
            ...

To throw an exception:

    raise MyPLError('oops!', 1, 10)

To handle an exception:

    try:
        # code that could cause an exception
    except MyPLError as e:
        print(e)

(also see Lecture 2)
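Putting the two slides together, here is a runnable sketch of the error class with the elided `__str__` body filled in; the message format is an assumption, not the HW 2 spec:

```python
# Sketch of mypl_error.py; the __str__ format is an assumption,
# since the slides elide its body.
class MyPLError(Exception):
    def __init__(self, message, line, column):
        self.message = message
        self.line = line
        self.column = column

    def __str__(self):
        # one possible human-readable rendering of the error
        return 'error: %s at line %d column %d' % (self.message, self.line, self.column)

# Throwing and handling the exception:
try:
    raise MyPLError('oops!', 1, 10)
except MyPLError as e:
    print(e)   # error: oops! at line 1 column 10
```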

Representing Tokens (HW 2)

Define a set of token-type constants (in mypl_token.py):

    SEMICOLON = 'SEMICOLON'
    LPAREN = 'LPAREN'
    RPAREN = 'RPAREN'
    EOS = 'EOS'    # end of file/stream ($$)

Define a Token class (in mypl_token.py):

    class Token(object):
        def __init__(self, tokentype, lexeme, line, column):
            ...
        def __str__(self):
            ...

A token object just holds the information for one token. A lexer object converts the program text to a stream of tokens. Each token is returned by the lexer as a token object, for example:

    the_token = the_lexer.next_token()
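A runnable sketch of the Token class, with the elided bodies filled in; the attribute names follow the constructor parameters above, but the `__str__` format is an assumption:

```python
# Sketch of mypl_token.py; the __str__ format is an assumption.
SEMICOLON = 'SEMICOLON'
LPAREN = 'LPAREN'
RPAREN = 'RPAREN'
EOS = 'EOS'    # end of file/stream ($$)

class Token(object):
    def __init__(self, tokentype, lexeme, line, column):
        self.tokentype = tokentype   # one of the token-type constants
        self.lexeme = lexeme         # the matched program text
        self.line = line             # line where the token starts
        self.column = column         # column where the token starts

    def __str__(self):
        return '%s "%s" %d:%d' % (self.tokentype, self.lexeme, self.line, self.column)

t = Token(SEMICOLON, ';', 3, 7)
print(t)   # SEMICOLON ";" 3:7
```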

Lexical Analyzer (HW 2)

We'll define a Lexer class (in mypl_lexer.py):
- the lexer is initialized with the file to scan
- the next_token() method returns the next token in the input

for example:

    file_stream = open(filename, 'r')       # open source file
    the_lexer = lexer.Lexer(file_stream)    # create a lexer
    the_token = the_lexer.next_token()      # get first token

The basic structure of the Lexer class:

    class Lexer(object):
        def __init__(self, input_stream):
            self.line = 1                       # stores current line
            self.column = 0                     # stores current column
            self.input_stream = input_stream

        def __read(self):
            return self.input_stream.read(1)    # returns one character

        def __peek(self):
            pos = self.input_stream.tell()      # current stream position
            symbol = self.__read()              # read 1 character
            self.input_stream.seek(pos)         # return to old position
            return symbol

        def next_token(self):
            ...
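The tell()/seek() trick behind __peek() can be tried out on an in-memory stream; in this sketch io.StringIO stands in for an open source file, and the demo() method is added only to show the behavior:

```python
import io

class Lexer(object):
    def __init__(self, input_stream):
        self.line = 1
        self.column = 0
        self.input_stream = input_stream

    def __read(self):
        return self.input_stream.read(1)    # consume one character

    def __peek(self):
        pos = self.input_stream.tell()      # remember stream position
        symbol = self.__read()              # read 1 character
        self.input_stream.seek(pos)         # rewind, so nothing is consumed
        return symbol

    def demo(self):
        # peek does not advance the stream; read does
        return (self.__peek(), self.__read(), self.__read())

lex = Lexer(io.StringIO('AB'))
print(lex.demo())   # ('A', 'A', 'B')
```

Note that io.StringIO supports the same read/tell/seek calls as a real text file, which is also handy for unit-testing the lexer without files.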

Some hints for implementing next_token()

Can check if end of file using peek:

    if self.__peek() == '':
        return Token(token.EOS, '', self.line, self.column)

Use __read() to build the next token one character (symbol) at a time:

    symbol = self.__read()
    self.column += 1

Check if a character is a digit using the isdigit() function:

    if symbol.isdigit():
        # ... build up curr_lexeme to hold the number ...
        return Token(token.INT, curr_lexeme, self.line, self.column)

Check if a character is a letter using the isalpha() function:

    if symbol.isalpha():
        ...

Check if a character is whitespace using the isspace() function:

    if symbol.isspace():
        return self.next_token()

Use the symbol as the lexeme if the lexeme is unimportant:

    if symbol == ';':
        return Token(token.SEMICOLON, symbol, self.line, self.column)

Raise an exception if the token can't be created:

    s = 'unexpected symbol "%s"' % symbol
    raise MyPLError(s, self.line, self.column)

You don't need to modularize your next_token() function:
- it is okay in this case to have a relatively long function body
- you can modularize it if you want, but it isn't required for HW 2
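Taken together, the hints above suggest a next_token() along these lines. This is a simplified, self-contained sketch that handles only integers, identifiers, whitespace, and semicolons with module-level token-type names; the real HW 2 lexer must cover all MyPL tokens and track line/column positions more carefully:

```python
import io

# Token-type names local to this sketch (the HW uses mypl_token.py constants)
SEMICOLON, INT, ID, EOS = 'SEMICOLON', 'INT', 'ID', 'EOS'

class MyPLError(Exception):
    def __init__(self, message, line, column):
        self.message, self.line, self.column = message, line, column

class Token(object):
    def __init__(self, tokentype, lexeme, line, column):
        self.tokentype, self.lexeme = tokentype, lexeme
        self.line, self.column = line, column

class Lexer(object):
    def __init__(self, input_stream):
        self.line, self.column = 1, 0
        self.input_stream = input_stream

    def __read(self):
        return self.input_stream.read(1)

    def __peek(self):
        pos = self.input_stream.tell()
        symbol = self.__read()
        self.input_stream.seek(pos)
        return symbol

    def next_token(self):
        if self.__peek() == '':                  # end of file/stream
            return Token(EOS, '', self.line, self.column)
        symbol = self.__read()
        self.column += 1
        if symbol.isspace():                     # skip whitespace, recurse
            if symbol == '\n':
                self.line += 1
                self.column = 0
            return self.next_token()
        if symbol == ';':                        # symbol itself is the lexeme
            return Token(SEMICOLON, symbol, self.line, self.column)
        if symbol.isdigit():                     # build up an integer lexeme
            curr_lexeme = symbol
            while self.__peek().isdigit():
                curr_lexeme += self.__read()
                self.column += 1
            return Token(INT, curr_lexeme, self.line, self.column)
        if symbol.isalpha():                     # build up an identifier
            curr_lexeme = symbol
            while self.__peek().isalnum():
                curr_lexeme += self.__read()
                self.column += 1
            return Token(ID, curr_lexeme, self.line, self.column)
        raise MyPLError('unexpected symbol "%s"' % symbol, self.line, self.column)

the_lexer = Lexer(io.StringIO('x1 42;'))
print([the_lexer.next_token().tokentype for _ in range(4)])
# ['ID', 'INT', 'SEMICOLON', 'EOS']
```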

Compilation Steps (Syntax Analysis)

[Pipeline diagram: Source Program → Lexical Analysis → Token Sequence → Syntax Analysis → Abstract Syntax Tree → Semantic Analysis (front end); then Intermediate Code Generation → Intermediate Code Optimization → Machine Code Generation → Machine Code (back end), taking input and producing output on the computer]

Syntax Analysis: The Plan
1. Start with defining a language's syntax using formal grammars
2. Describe how to check syntax using recursive descent parsing
3. Extend parsing by adding abstract syntax trees (parse trees)

Formal Grammars

Grammars define rules that specify a language's syntax:
- a "language" here means a set of allowable strings

Grammars can be used within:
- Lexers (lexical analysis), e.g., numbers, strings, comments
- Parsers (syntax analysis), to check if syntax is correct

We'll look at context-free grammars:
- won't get into all the details
- typically studied in a compiler or theory (formal languages) class

Grammar Rules

Grammar rules define productions ("rewritings"):

    s → a

Here we say s produces (or "yields") a:
- s is a non-terminal symbol (the LHS of a rule), sometimes written ⟨s⟩ or S
- a is a terminal symbol
- terminal and non-terminal symbols are disjoint
- the set of terminals is the alphabet of the language
- often there is a distinguished start symbol

Concatenation:

    s → a b

s yields the string a followed by the string b

    t → u v
    u → a
    v → b

t yields the same exact string as s

Alternation:

    s → a | b

s yields the string a or b; the same as:

    s → a
    s → b

The empty string:

    s → a | ε

ε denotes the special "empty" terminal; s yields either the string a or '' (the empty string)

Kleene Star (Closure):

    s → a*

s yields the strings with zero or more a's

Recursion:

    s → (s) | ()

s yields the strings of well-balanced parentheses:
- note that this is not possible to express using * (closure)
- the opposite is true though: * can be implemented using recursion (with the empty string ε)

Q: How can we represent s → a a* using recursion?

    s → a | a s

(sometimes denoted as s → a+)

Exercise: Try to define some languages (exercise sheet)
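The balanced-parentheses point can be checked with a tiny recursive recognizer for s → (s) | (); this is a sketch, and the function names are made up here:

```python
# Recognizer for the grammar  s -> (s) | ()  (one nest of balanced parens).
def match_s(text, i=0):
    """Return the index just past one s starting at text[i], or -1 on failure."""
    if i >= len(text) or text[i] != '(':
        return -1
    if i + 1 < len(text) and text[i + 1] == ')':    # production s -> ()
        return i + 2
    j = match_s(text, i + 1)                        # production s -> (s)
    if j != -1 and j < len(text) and text[j] == ')':
        return j + 1
    return -1

def is_s(text):
    # the whole string must be consumed by one s
    return match_s(text) == len(text)

print(is_s('((()))'))   # True
print(is_s('(()'))      # False
```

Each production of the grammar becomes one case in the function, which is exactly the idea behind the recursive descent parsing coming up in the plan.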

MyPL Constants

Using grammar rules to define constant values:

    BOOLVAL   → true | false
    INTVAL    → pdigit digit* | 0
    FLOATVAL  → INTVAL . digit digit*
    STRINGVAL → " character* "
    ID        → letter ( letter | digit | _ )*
    letter    → a | ... | z | A | ... | Z
    pdigit    → 1 | ... | 9
    digit     → 0 | pdigit

where character is any symbol (letter, number, etc.) except "
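Since these constant rules use only concatenation, alternation, and closure, they are regular and map directly to regular expressions. A sketch using Python's re module; the patterns are my reading of the rules above:

```python
import re

# Regular-expression versions of the constant rules (my reading of them)
BOOLVAL   = r'true|false'
INTVAL    = r'[1-9][0-9]*|0'                 # pdigit digit* | 0
FLOATVAL  = r'(?:[1-9][0-9]*|0)\.[0-9]+'     # INTVAL . digit digit*
STRINGVAL = r'"[^"]*"'                       # " character* "
ID        = r'[a-zA-Z][a-zA-Z0-9_]*'         # letter (letter|digit|_)*

def matches(pattern, s):
    # fullmatch: the entire string must be derived by the rule
    return re.fullmatch(pattern, s) is not None

print(matches(INTVAL, '042'))      # False: no leading zeros allowed
print(matches(FLOATVAL, '3.14'))   # True
print(matches(ID, 'my_var1'))      # True
```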

Parsing: An Example Grammar

A simple list of assignment statements:

    stmt_list → stmt | stmt ';' stmt_list
    stmt      → var '=' expr
    var       → 'A' | 'B' | 'C'
    expr      → var | var '+' var | var '-' var

Note: there are many possible grammars for our language!

We can use grammars to generate strings (derivations):
1. choose a production (start symbol on the left-hand side)
2. replace with its right-hand side
3. pick a non-terminal N and a production with N on the left side
4. replace N with that production's right-hand side
5. continue until only terminals remain
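The generate-strings procedure can be sketched in code as a leftmost derivation over this grammar; the dict encoding and helper names here are mine:

```python
# Grammar as a dict: nonterminal -> list of right-hand sides (my encoding).
GRAMMAR = {
    'stmt_list': [['stmt'], ['stmt', ';', 'stmt_list']],
    'stmt':      [['var', '=', 'expr']],
    'var':       [['A'], ['B'], ['C']],
    'expr':      [['var'], ['var', '+', 'var'], ['var', '-', 'var']],
}

def derive_leftmost(form, choose):
    """Repeatedly replace the leftmost nonterminal until only terminals remain.
    choose(nonterminal) picks which production's right-hand side to use."""
    steps = [list(form)]
    while any(sym in GRAMMAR for sym in form):
        i = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + choose(form[i]) + form[i + 1:]
        steps.append(list(form))
    return steps

# Always pick the first production: derives the shortest statement list.
steps = derive_leftmost(['stmt_list'], lambda nt: GRAMMAR[nt][0])
for step in steps:
    print(' '.join(step))
```

Swapping in a different choose function (e.g. a random choice) generates other strings of the language, mirroring steps 1 to 5 above.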

Example derivation of A = B + C; B = C

    stmt_list
    ⇒ stmt ; stmt_list
    ⇒ var = expr ; stmt_list
    ⇒ A = expr ; stmt_list
    ⇒ A = var + var ; stmt_list
    ⇒ A = B + var ; stmt_list
    ⇒ A = B + C ; stmt_list
    ⇒ A = B + C ; stmt
    ⇒ A = B + C ; var = expr
    ⇒ A = B + C ; B = expr
    ⇒ A = B + C ; B = var
    ⇒ A = B + C ; B = C

This is a left-most derivation: the string is derived by always replacing the left-most non-terminal.

The opposite is a right-most derivation:

    stmt_list
    ⇒ stmt ; stmt_list
    ⇒ stmt ; stmt
    ⇒ stmt ; var = expr
    ⇒ ...

Sometimes we write ⇒* for a multi-step derivation, e.g.:

    stmt_list ⇒* var = expr ; var = expr