The Lexical Structure of Verdi TR Mark Saaltink. Release date: July 1994


The Lexical Structure of Verdi
TR
Mark Saaltink
Release date: July 1994

ORA Canada
267 Richmond Road, Suite 100
Ottawa, Ontario K1Z 6X3
CANADA

Verdi Compiler Project TR

This report formally defines the lexical structure of Verdi [1, 3], using the Z notation [4]. This formulation of lexical structure is similar to the definition in the formal definition of Turing [2].

The grammar of Verdi, like that of most programming languages, is most conveniently described using two distinct phases: lexical analysis and parsing. Verdi programs are composed of characters. Lexical analysis transforms this sequence of characters to a sequence of tokens. This sequence of tokens is then parsed according to a context-free grammar.

1 Changes to Verdi

The original description of Verdi had a lexical structure that was ambiguous; the string "\141" could be interpreted as "a" (octal 141 is 97, the ASCII code of "a") or as "141", depending on whether the escape was interpreted as a numeric representation or as a single character escape. This ambiguity has been removed by restricting the characters that may appear in a single character escape.

2 Characters

Verdi programs are written in the ASCII character set. The set $\mathit{CHAR}$ comprises some representation of these characters.

$$[\mathit{CHAR}]$$

Function $\mathit{char\_code}$ is a bijection between $\mathit{CHAR}$ and small numbers; for a character $c$, $\mathit{char\_code}(c)$ is the ASCII code of $c$.

$$
\begin{aligned}
&\mathit{char\_code} : \mathit{CHAR} \to 0 \,..\, 127 \\
&\mathit{char\_val} : 0 \,..\, 127 \to \mathit{CHAR} \\
&\mathit{char\_val} = \mathit{char\_code}^{-1}
\end{aligned}
$$

We will write characters inside single quotes; for example, $\texttt{'a'}$ denotes the lower case letter "a", which is also the value of $\mathit{char\_val}(97)$. We use the usual names for the common formatting characters:

$$
\begin{aligned}
&\mathit{cr}, \mathit{lf}, \mathit{ff}, \mathit{sp}, \mathit{tab} : \mathit{CHAR} \\
&\mathit{cr} = \mathit{char\_val}(13) \\
&\mathit{lf} = \mathit{char\_val}(10) \\
&\mathit{ff} = \mathit{char\_val}(12) \\
&\mathit{sp} = \mathit{char\_val}(32) \\
&\mathit{tab} = \mathit{char\_val}(9)
\end{aligned}
$$

Several sets of characters are used in the definitions below. The visible characters, with codes between 33 and 126 inclusive, have associated glyphs; the graphic characters also include space. The blank characters are formatting characters that leave "white space". Carriage return and line feed characters are used to end lines.
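$$
\begin{aligned}
&\mathit{Visible}, \mathit{Graphic}, \mathit{Blank}, \mathit{EndLine} : \mathbb{P}\,\mathit{CHAR} \\
&\mathit{Visible} = \mathit{char\_val}(\!|\, 33 \,..\, 126 \,|\!) \\
&\mathit{Graphic} = \mathit{char\_val}(\!|\, 32 \,..\, 126 \,|\!) = \mathit{Visible} \cup \{\mathit{sp}\} \\
&\mathit{Blank} = \{\mathit{cr}, \mathit{lf}, \mathit{ff}, \mathit{sp}, \mathit{tab}\} \\
&\mathit{EndLine} = \{\mathit{cr}, \mathit{lf}\}
\end{aligned}
$$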
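The character classes above translate directly into finite sets. The following Python sketch is only an illustration, assuming that Python one-character strings stand in for CHAR; the upper-case constant names are chosen here and do not come from the report.

```python
def char_val(code):
    """Character with the given ASCII code (defined for 0..127 only)."""
    if not 0 <= code <= 127:
        raise ValueError(f"not an ASCII code: {code}")
    return chr(code)


def char_code(c):
    """Inverse of char_val: the ASCII code of a single character."""
    code = ord(c)  # ord() itself rejects strings that are not one character long
    if code > 127:
        raise ValueError(f"not an ASCII character: {c!r}")
    return code


# The common formatting characters, named as in the text.
CR, LF, FF, SP, TAB = (char_val(n) for n in (13, 10, 12, 32, 9))

VISIBLE = frozenset(char_val(n) for n in range(33, 127))  # codes 33..126
GRAPHIC = VISIBLE | {SP}                                  # visible plus space
BLANK = frozenset({CR, LF, FF, SP, TAB})
END_LINE = frozenset({CR, LF})
```

With these definitions, `char_val(97)` is the letter "a", matching the example in the text.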

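Several kinds of digits are used in the description of numerals:

$$
\begin{aligned}
&\mathit{BinaryDigit}, \mathit{OctalDigit}, \mathit{Digit}, \mathit{HexDigit} : \mathbb{P}\,\mathit{CHAR} \\
&\mathit{BinaryDigit} = \{\texttt{'0'}, \texttt{'1'}\} = \mathit{char\_val}(\!|\, 48 \,..\, 49 \,|\!) \\
&\mathit{OctalDigit} = \{\texttt{'0'}, \texttt{'1'}, \texttt{'2'}, \texttt{'3'}, \texttt{'4'}, \texttt{'5'}, \texttt{'6'}, \texttt{'7'}\} = \mathit{char\_val}(\!|\, 48 \,..\, 55 \,|\!) \\
&\mathit{Digit} = \mathit{OctalDigit} \cup \{\texttt{'8'}, \texttt{'9'}\} = \mathit{char\_val}(\!|\, 48 \,..\, 57 \,|\!) \\
&\mathit{HexDigit} = \mathit{Digit} \cup \{\texttt{'a'}, \texttt{'b'}, \texttt{'c'}, \texttt{'d'}, \texttt{'e'}, \texttt{'f'}, \texttt{'A'}, \texttt{'B'}, \texttt{'C'}, \texttt{'D'}, \texttt{'E'}, \texttt{'F'}\}
\end{aligned}
$$

The escape characters have special uses in character literals and strings:

$$
\begin{aligned}
&\mathit{EscapeChar} : \mathbb{P}\,\mathit{CHAR} \\
&\mathit{EscapeTable} : \mathit{CHAR} \rightharpoonup \mathit{CHAR} \\
&\mathit{EscapeChar} = \mathrm{dom}\,\mathit{EscapeTable} \\
&\mathit{EscapeTable} = \{\, \texttt{'b'} \mapsto \mathit{char\_val}(8),\; \texttt{'d'} \mapsto \mathit{char\_val}(127),\; \texttt{'l'} \mapsto \mathit{lf},\; \texttt{'n'} \mapsto \mathit{lf},\; \texttt{'p'} \mapsto \mathit{ff}, \\
&\qquad \texttt{'r'} \mapsto \mathit{cr},\; \texttt{'s'} \mapsto \mathit{sp},\; \texttt{'t'} \mapsto \mathit{tab},\; \texttt{'"'} \mapsto \texttt{'"'},\; \texttt{'\textbackslash'} \mapsto \texttt{'\textbackslash'} \,\}
\end{aligned}
$$

Here $\mathit{EscapeTable}$ is a partial function, and $\mathit{ff}$ denotes the form feed character $\mathit{char\_val}(12)$.

3 Lexical Units

Two kinds of lexical units are used: tokens and separators.

$$\mathit{LexicalUnit} \mathrel{\widehat{=}} \mathit{Token} \cup \mathit{Separator}$$

3.1 Tokens

Tokens are certain sequences of characters. There are four classes of tokens in Verdi: numerals, identifiers, character literals, and strings; in addition, the left and right parentheses are special tokens.

$$
\begin{aligned}
&\mathit{Token} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Token} = \mathit{Numeral} \cup \mathit{Identifier} \cup \mathit{Character\_literal} \cup \mathit{String\_literal} \cup \{\langle \texttt{'('} \rangle, \langle \texttt{')'} \rangle\}
\end{aligned}
$$

3.2 Attributes

We will associate an "attribute" with each token. This attribute is used in the description of the semantics of Verdi, or else is used to define the abstract syntax corresponding to the concrete syntax. The attribute of a numeral is the number it represents. The attribute of an identifier is a name, which has been normalized (by converting upper case letters to lower case). The attribute of a character literal is a character. The attribute of a string is a sequence of characters. Parentheses have no attributes.

$$
\begin{aligned}
\mathit{Attribute} ::={} & \mathit{num}\langle\!\langle \mathbb{Z} \rangle\!\rangle \mid \mathit{ident}\langle\!\langle \mathrm{seq}\,\mathit{CHAR} \rangle\!\rangle \mid \mathit{char}\langle\!\langle \mathit{CHAR} \rangle\!\rangle \mid \mathit{string}\langle\!\langle \mathrm{seq}\,\mathit{CHAR} \rangle\!\rangle \\
& \mid \mathit{openparen} \mid \mathit{closeparen}
\end{aligned}
$$

Function $\mathit{attr}$ gives the attribute of a token. This function is composed in the obvious way from functions determining the attribute value for each type of token. These individual functions will be defined below.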
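For readers more used to programming-language data types, the free type Attribute can be pictured as a tagged union. The sketch below is one possible Python rendering; the class names and the `describe` helper are illustrative choices, not part of the report.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int   # the number a numeral represents


@dataclass(frozen=True)
class Ident:
    name: str    # the case-normalized identifier


@dataclass(frozen=True)
class Char:
    value: str   # exactly one character


@dataclass(frozen=True)
class String:
    value: str   # the characters of a string literal


@dataclass(frozen=True)
class OpenParen:
    pass


@dataclass(frozen=True)
class CloseParen:
    pass


Attribute = Union[Num, Ident, Char, String, OpenParen, CloseParen]


def describe(a):
    """Human-readable summary of an attribute, one case per constructor."""
    if isinstance(a, Num):
        return f"numeral {a.value}"
    if isinstance(a, Ident):
        return f"identifier {a.name}"
    if isinstance(a, Char):
        return f"character {a.value!r}"
    if isinstance(a, String):
        return f"string {a.value!r}"
    if isinstance(a, OpenParen):
        return "open parenthesis"
    return "close parenthesis"
```

Because the constructors are distinct classes, `Ident("abc")` and `String("abc")` are different attributes even though they carry the same characters, which mirrors the disjointness of the branches of a Z free type.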

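$$
\begin{aligned}
&\mathit{attr} : \mathit{Token} \to \mathit{Attribute} \\
&\mathit{attr} = (\mathit{num} \circ \mathit{num\_attr}) \cup (\mathit{ident} \circ \mathit{ident\_attr}) \cup (\mathit{char} \circ \mathit{char\_attr}) \cup (\mathit{string} \circ \mathit{string\_attr}) \\
&\qquad \cup\; \{\langle \texttt{'('} \rangle \mapsto \mathit{openparen},\; \langle \texttt{')'} \rangle \mapsto \mathit{closeparen}\}
\end{aligned}
$$

3.3 Numerals

$$
\begin{aligned}
&\mathit{Numeral} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Numeral} = \mathrm{seq}_1\,\mathit{Digit} \\
&\quad \cup\; \{\, s : \mathrm{seq}_1\,\mathit{Digit} \bullet \langle \texttt{'-'} \rangle \mathbin{\frown} s \,\} \\
&\quad \cup\; \{\, s : \mathrm{seq}_1\,\mathit{BinaryDigit};\; r : \{\texttt{'b'}, \texttt{'B'}\} \bullet \langle \texttt{'\#'}, r \rangle \mathbin{\frown} s \,\} \\
&\quad \cup\; \{\, s : \mathrm{seq}_1\,\mathit{OctalDigit};\; r : \{\texttt{'o'}, \texttt{'O'}\} \bullet \langle \texttt{'\#'}, r \rangle \mathbin{\frown} s \,\} \\
&\quad \cup\; \{\, s : \mathrm{seq}_1\,\mathit{HexDigit};\; r : \{\texttt{'h'}, \texttt{'H'}\} \bullet \langle \texttt{'\#'}, r \rangle \mathbin{\frown} s \,\}
\end{aligned}
$$

The value of a numeral is calculated in the obvious way, by positional evaluation in the numeral's radix.

$$
\begin{aligned}
&\mathit{value\_in\_radix} : \mathbb{Z} \times \mathrm{seq}\,\mathit{HexDigit} \to \mathbb{Z} \\
&\mathit{digit\_value} : \mathit{HexDigit} \to \mathbb{Z} \\
&\mathit{value\_in\_radix}(r, \langle\rangle) = 0 \\
&\mathit{value\_in\_radix}(r, s \mathbin{\frown} \langle d \rangle) = r * \mathit{value\_in\_radix}(r, s) + \mathit{digit\_value}(d) \\
&\mathit{digit\_value} = \{\texttt{'0'} \mapsto 0, \ldots, \texttt{'9'} \mapsto 9,\; \texttt{'a'} \mapsto 10,\; \texttt{'A'} \mapsto 10, \ldots, \texttt{'f'} \mapsto 15,\; \texttt{'F'} \mapsto 15\}
\end{aligned}
$$

$$
\begin{aligned}
&\mathit{num\_attr} : \mathit{Numeral} \to \mathbb{Z} \\
&\forall s : \mathrm{seq}_1\,\mathit{Digit} \bullet \mathit{num\_attr}(s) = \mathit{value\_in\_radix}(10, s) \\
&\forall s : \mathrm{seq}_1\,\mathit{Digit} \bullet \mathit{num\_attr}(\langle \texttt{'-'} \rangle \mathbin{\frown} s) = -\mathit{value\_in\_radix}(10, s) \\
&\forall s : \mathrm{seq}_1\,\mathit{BinaryDigit};\; r : \{\texttt{'b'}, \texttt{'B'}\} \bullet \mathit{num\_attr}(\langle \texttt{'\#'}, r \rangle \mathbin{\frown} s) = \mathit{value\_in\_radix}(2, s) \\
&\forall s : \mathrm{seq}_1\,\mathit{OctalDigit};\; r : \{\texttt{'o'}, \texttt{'O'}\} \bullet \mathit{num\_attr}(\langle \texttt{'\#'}, r \rangle \mathbin{\frown} s) = \mathit{value\_in\_radix}(8, s) \\
&\forall s : \mathrm{seq}_1\,\mathit{HexDigit};\; r : \{\texttt{'h'}, \texttt{'H'}\} \bullet \mathit{num\_attr}(\langle \texttt{'\#'}, r \rangle \mathbin{\frown} s) = \mathit{value\_in\_radix}(16, s)
\end{aligned}
$$

3.4 Identifiers

$$
\begin{aligned}
&\mathit{Identifier} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Identifier} = (\mathrm{seq}_1\,X) \setminus \mathit{Numeral} \\
&\quad \text{where } X = \mathit{Visible} \setminus \{\texttt{'('}, \texttt{')'}, \texttt{'"'}, \texttt{'''}, \texttt{'`'}, \texttt{';'}, \texttt{'\#'}\}
\end{aligned}
$$

It is not immediately obvious that this lexical class can be defined by a regular expression. Clearly, though, $\mathrm{seq}_1\,X$ is definable by a regular expression, as is $\mathit{Numeral}$. Furthermore, the set of regular languages is closed under complementation and intersection. So, we can conclude that $\mathit{Identifier}$ can be specified by a regular expression. Once we think to look for it, it is easy to find:

$$\mathit{Identifier} ::= \texttt{'-'} \mid X\,V^{*} \mid (D \mid \texttt{'-'})\,D^{*}\,N\,V^{*}$$

where

$$
\begin{aligned}
&V = \mathit{Visible} \setminus \{\texttt{'('}, \texttt{')'}, \texttt{'"'}, \texttt{'''}, \texttt{'`'}, \texttt{';'}, \texttt{'\#'}\} \\
&D = \mathit{Digit} \\
&N = V \setminus \mathit{Digit} \\
&X = N \setminus \{\texttt{'-'}\}
\end{aligned}
$$

(Note that in this regular expression $X$ denotes $N \setminus \{\texttt{'-'}\}$, not the set used in the definition of $\mathit{Identifier}$ above.)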
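The numeral attribute and the identifier regular expression can be checked mechanically. The Python sketch below is an illustration under stated assumptions: Python strings stand in for sequences of CHAR, the excluded character set is taken to be the seven characters legible in the text, and the helper names (`cls`, `is_identifier_by_definition`, and so on) are invented here.

```python
import re

# Digit values for all hexadecimal digits, upper and lower case.
DIGIT_VALUE = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def value_in_radix(r, s):
    """Positional value of digit string s in radix r (0 for the empty string)."""
    value = 0
    for d in s:
        value = r * value + DIGIT_VALUE[d]
    return value


NUMERAL = re.compile(r"-?[0-9]+|#[bB][01]+|#[oO][0-7]+|#[hH][0-9a-fA-F]+")
RADIX = {"b": 2, "o": 8, "h": 16}


def num_attr(numeral):
    """The integer denoted by a numeral; ValueError if it is not a numeral."""
    if not NUMERAL.fullmatch(numeral):
        raise ValueError(f"not a numeral: {numeral!r}")
    if numeral[0] == "#":
        return value_in_radix(RADIX[numeral[1].lower()], numeral[2:])
    if numeral[0] == "-":
        return -value_in_radix(10, numeral[1:])
    return value_in_radix(10, numeral)


# Assumption: the excluded characters are the seven listed in the text.
EXCLUDED = set("()\"'`;#")
V = {chr(n) for n in range(33, 127)} - EXCLUDED
D = set("0123456789")
N = V - D
X = N - {"-"}


def cls(chars):
    """Regular-expression character class matching exactly the given characters."""
    return "[" + re.escape("".join(sorted(chars))) + "]"


# Identifier ::= '-' | X V* | (D | '-') D* N V*
IDENTIFIER = re.compile(
    f"-|{cls(X)}{cls(V)}*|(?:{cls(D)}|-){cls(D)}*{cls(N)}{cls(V)}*"
)


def is_identifier(s):
    """Membership test using the regular expression."""
    return IDENTIFIER.fullmatch(s) is not None


def is_identifier_by_definition(s):
    """Membership test using the set definition: (seq1 V) minus Numeral."""
    return len(s) > 0 and all(c in V for c in s) and not NUMERAL.fullmatch(s)
```

For instance, `num_attr("#hFF")` evaluates to 255 and `num_attr("-12")` to -12, while `"-"` and `"1+"` are identifiers but `"-12"` is not. Comparing the two membership tests on all short strings over a small alphabet is a cheap sanity check of the claimed equivalence, though not a proof.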

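The attribute associated with an identifier is derived by converting all letters in the identifier to lower case. (We could equally well convert them to upper case; all that matters is that case is normalized.)

$$
\begin{aligned}
&\mathit{ident\_attr} : \mathit{Identifier} \to \mathit{Identifier} \\
&\mathit{lower} : \mathit{CHAR} \to \mathit{CHAR} \\
&\mathit{ident\_attr} = \lambda\, i : \mathit{Identifier} \bullet (\mathit{lower} \circ i) \\
&\mathit{lower} = (\mathrm{id}\,\mathit{CHAR}) \oplus \{\texttt{'A'} \mapsto \texttt{'a'}, \ldots, \texttt{'Z'} \mapsto \texttt{'z'}\}
\end{aligned}
$$

3.5 Character Literals

$$
\begin{aligned}
&\mathit{Character\_literal} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Escape} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Character\_literal} = \{\, c : \mathit{Graphic} \mid c \neq \texttt{'\textbackslash'} \bullet \langle \texttt{'''}, c, \texttt{'''} \rangle \,\} \\
&\quad \cup\; \{\, s : \mathit{Escape} \bullet \langle \texttt{'''} \rangle \mathbin{\frown} s \mathbin{\frown} \langle \texttt{'''} \rangle \,\} \\
&\mathit{Escape} = \{\, c : \mathit{EscapeChar} \bullet \langle \texttt{'\textbackslash'}, c \rangle \,\} \\
&\quad \cup\; \{\, s : \mathrm{seq}\,\mathit{OctalDigit} \mid \#s = 3 \bullet \langle \texttt{'\textbackslash'} \rangle \mathbin{\frown} s \,\}
\end{aligned}
$$

$$
\begin{aligned}
&\mathit{char\_attr} : \mathit{Character\_literal} \to \mathit{CHAR} \\
&\mathit{escape\_value} : \mathit{Escape} \to \mathit{CHAR} \\
&\forall c : \mathit{Graphic} \bullet \mathit{char\_attr}(\langle \texttt{'''}, c, \texttt{'''} \rangle) = c \\
&\forall s : \mathit{Escape} \bullet \mathit{char\_attr}(\langle \texttt{'''} \rangle \mathbin{\frown} s \mathbin{\frown} \langle \texttt{'''} \rangle) = \mathit{escape\_value}(s) \\
&\forall c : \mathit{EscapeChar} \bullet \mathit{escape\_value}(\langle \texttt{'\textbackslash'}, c \rangle) = \mathit{EscapeTable}(c) \\
&\forall s : \mathrm{seq}\,\mathit{OctalDigit} \mid \#s = 3 \bullet \mathit{escape\_value}(\langle \texttt{'\textbackslash'} \rangle \mathbin{\frown} s) = \mathit{char\_val}(\mathit{value\_in\_radix}(8, s))
\end{aligned}
$$

3.6 String Literals

$$
\begin{aligned}
&\mathit{String\_literal} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{String\_element} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{String\_literal} = \{\, s : \mathrm{seq}\,\mathit{String\_element} \bullet \langle \texttt{'"'} \rangle \mathbin{\frown} ({\frown}/\,s) \mathbin{\frown} \langle \texttt{'"'} \rangle \,\} \\
&\mathit{String\_element} = \{\, c : (\mathit{Graphic} \setminus \{\texttt{'"'}, \texttt{'\textbackslash'}\}) \bullet \langle c \rangle \,\} \cup \mathit{Escape}
\end{aligned}
$$

A given string literal can be divided into elements in only one way:

Lemma 1 Suppose $e, e' : \mathrm{seq}\,\mathit{String\_element}$, and suppose ${\frown}/\,e = {\frown}/\,e'$. Then $e = e'$.

This property makes it easy to find the attribute of a string literal. Each element is interpreted as (the body of) a character literal:

$$
\begin{aligned}
&\mathit{string\_attr} : \mathit{String\_literal} \to \mathrm{seq}\,\mathit{CHAR} \\
&\mathit{element\_char} : \mathit{String\_element} \to \mathit{CHAR} \\
&\forall s : \mathit{String\_literal};\; x : \mathrm{seq}\,\mathit{String\_element} \mid s = \langle \texttt{'"'} \rangle \mathbin{\frown} ({\frown}/\,x) \mathbin{\frown} \langle \texttt{'"'} \rangle \bullet \mathit{string\_attr}(s) = \mathit{element\_char} \circ x \\
&\forall e : \mathit{String\_element} \bullet \mathit{element\_char}(e) = \mathit{char\_attr}(\langle \texttt{'''} \rangle \mathbin{\frown} e \mathbin{\frown} \langle \texttt{'''} \rangle)
\end{aligned}
$$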
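The string attribute can be computed by a single left-to-right scan, because each element is determined by its first character (this is the content of Lemma 1). The Python sketch below illustrates that reading; the function `split_elements` and the error handling are assumptions made for the example, and octal escapes above 127 are rejected because char_val is defined only on ASCII codes.

```python
ESCAPE_TABLE = {
    "b": chr(8), "d": chr(127), "l": "\n", "n": "\n", "p": "\f",
    "r": "\r", "s": " ", "t": "\t", '"': '"', "\\": "\\",
}
OCTAL = set("01234567")
GRAPHIC = {chr(n) for n in range(32, 127)}


def split_elements(body):
    """Split the text between the quotes into string elements."""
    elements = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            if body[i + 1:i + 2] in ESCAPE_TABLE:
                length = 2                      # single character escape
            elif len(body) >= i + 4 and all(ch in OCTAL for ch in body[i + 1:i + 4]):
                length = 4                      # backslash plus three octal digits
            else:
                raise ValueError(f"bad escape at offset {i}")
        elif c in GRAPHIC and c != '"':
            length = 1                          # ordinary graphic character
        else:
            raise ValueError(f"character not allowed at offset {i}")
        elements.append(body[i:i + length])
        i += length
    return elements


def element_char(e):
    """The character denoted by one string element."""
    if e[0] != "\\":
        return e
    if len(e) == 2:
        return ESCAPE_TABLE[e[1]]
    code = int(e[1:], 8)
    if code > 127:
        raise ValueError(f"octal escape outside ASCII: {e}")
    return chr(code)


def string_attr(literal):
    """The character sequence denoted by a string literal, quotes included."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a string literal starts and ends with a double quote")
    return "".join(element_char(e) for e in split_elements(literal[1:-1]))
```

Under this reading the literal `"\141"` denotes the one-character string "a": since `1` is not an escape character, the only way to divide the body is as a single octal escape, which is how the ambiguity described in Section 1 is avoided.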

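3.7 Separators

Separators are either whitespace or comments.

$$
\begin{aligned}
&\mathit{Separator} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Whitespace} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Comment} : \mathbb{P}(\mathrm{seq}\,\mathit{CHAR}) \\
&\mathit{Separator} = \mathit{Whitespace} \cup \mathit{Comment} \\
&\mathit{Whitespace} = \{\, c : \mathit{Blank} \bullet \langle c \rangle \,\} \\
&\mathit{Comment} = \{\, c : \mathit{EndLine};\; s : \mathrm{seq}(\mathit{CHAR} \setminus \mathit{EndLine}) \bullet \langle \texttt{';'} \rangle \mathbin{\frown} s \mathbin{\frown} \langle c \rangle \,\}
\end{aligned}
$$

4 Tokenization

A sequence of characters is tokenized by first dividing it into tokens and separators, then throwing away the separators. The first stage introduces possible ambiguities: the sequence $\langle \texttt{'1'}, \texttt{'2'} \rangle$ can be divided into two one-digit numbers, or into a single number; similarly, the sequence $\langle \texttt{'a'}, \texttt{'b'} \rangle$ can be divided into a single identifier, or two identifiers. A simple principle (called maximum munch in [2]) resolves such problems: each lexical unit must be as long as possible. To put it another way, a sequence of characters is divided into two or more tokens only if necessary.

This principle can be formalized as a property of a sequence of lexical units: each unit in the sequence must be the longest possible lexical unit that is a prefix of the input. This is easily described formally (note that the input is obtained by concatenating the lexical units in the sequence). Below, $t \mathrel{\mathrm{prefix}} s$ means that $t$ is an initial segment of $s$.

$$
\begin{aligned}
&\mathit{Maximal} : \mathbb{P}(\mathrm{seq}\,\mathit{LexicalUnit}) \\
&\forall u : \mathrm{seq}\,\mathit{LexicalUnit} \bullet \mathit{Maximal}\;u \Leftrightarrow {} \\
&\quad (\#u > 0 \Rightarrow \mathit{Maximal}(\mathit{tail}(u)) \land \forall t : \mathit{LexicalUnit} \mid t \mathrel{\mathrm{prefix}} {\frown}/\,u \bullet t \mathrel{\mathrm{prefix}} \mathit{head}(u))
\end{aligned}
$$

Tokenization is a relation between an input sequence of characters and an output sequence of tokens, defined according to the above description:

$$
\begin{aligned}
&\mathit{Tokenize} : \mathrm{seq}\,\mathit{CHAR} \leftrightarrow \mathrm{seq}\,\mathit{Token} \\
&\mathit{Tokenize} = \{\, u : \mathrm{seq}\,\mathit{LexicalUnit} \mid \mathit{Maximal}\;u \bullet ({\frown}/\,u,\; u \upharpoonright \mathit{Token}) \,\}
\end{aligned}
$$

Tokenize is, in fact, a partial function. We first show that the maximal munch principle uniquely determines the lexical tokens comprising a sequence (this is also proven in [2]):

Lemma 2 Suppose $u, u' : \mathrm{seq}\,\mathit{LexicalUnit}$ are both maximal, and ${\frown}/\,u = {\frown}/\,u'$. Then $u = u'$.

We can prove this by induction on the length of $u$. Suppose first that ${\frown}/\,u$ is empty. Since $\langle\rangle \notin \mathit{LexicalUnit}$, $u$ must be empty, and because ${\frown}/\,u' = {\frown}/\,u$, $u'$ must be empty as well.

Therefore, we may assume that both $u$ and $u'$ are nonempty. Put $h = \mathit{head}(u)$ and $h' = \mathit{head}(u')$. Then $h$ and $h'$ are lexical units, and

$$h \mathrel{\mathrm{prefix}} {\frown}/\,u = {\frown}/\,u'.$$

By the maximality of $u'$,

$$\forall t : \mathit{LexicalUnit} \mid t \mathrel{\mathrm{prefix}} {\frown}/\,u' \bullet t \mathrel{\mathrm{prefix}} \mathit{head}(u').$$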

Therefore, instantiating $t$ to $h$, we have

$$h \mathrel{\mathrm{prefix}} \mathit{head}(u') = h'.$$

Similarly, since $u$ is maximal, we can show $h' \mathrel{\mathrm{prefix}} h$. Therefore $h = h'$. By the definition of maximality, both $\mathit{tail}(u)$ and $\mathit{tail}(u')$ are maximal. Moreover, since ${\frown}/\,u = {\frown}/\,u'$ and $\mathit{head}(u) = \mathit{head}(u')$, we have ${\frown}/\,\mathit{tail}(u) = {\frown}/\,\mathit{tail}(u')$. So by the induction hypothesis, $\mathit{tail}(u) = \mathit{tail}(u')$, and thus $u = u'$.

Lemma 3 Tokenize is a function.

Suppose $(s, t) \in \mathit{Tokenize}$ and $(s, t') \in \mathit{Tokenize}$. We must show $t = t'$. By the definition of Tokenize and the hypotheses, we have

$$\exists u : \mathrm{seq}\,\mathit{LexicalUnit} \mid \mathit{Maximal}\;u \bullet s = {\frown}/\,u \land t = u \upharpoonright \mathit{Token}$$

and

$$\exists u' : \mathrm{seq}\,\mathit{LexicalUnit} \mid \mathit{Maximal}\;u' \bullet s = {\frown}/\,u' \land t' = u' \upharpoonright \mathit{Token}$$

But Lemma 2 shows that $u = u'$, so clearly $t = t'$.

We should make sure that the maximum munch principle does not resolve too much ambiguity. In fact, this is so: the only possible ambiguities it resolves are as in the above examples, where identifiers or numerals are arbitrarily divided:

Lemma 4 Suppose $t$ and $t'$ are tokens, $t \neq t'$, and $t \mathrel{\mathrm{prefix}} t'$. Then $t$ and $t'$ are both in the set $\mathit{Identifier} \cup \mathit{Numeral}$.

The proof is a rather tedious consideration of cases.

References

[1] Dan Craigen. Reference manual for the language Verdi. Technical Report TR , Odyssey Research Associates, February

[2] R. C. Holt, P. A. Matthews, J. A. Rosselet, and J. R. Cordy. The Turing Programming Language: Design and Definition. University of Toronto (Draft of June 1985).

[3] Mark Saaltink. A formal description of Verdi. Technical Report TR a, Odyssey Research Associates, November

[4] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 1989.
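As an informal companion to Section 4, the Tokenize relation can be sketched as a greedy scanner: at each position, take the longest lexical unit that is a prefix of the remaining input, and drop separators. The Python code below is a minimal sketch under assumptions, not the report's definition: the regular expressions are transcribed by hand from Sections 3.3 to 3.7, the excluded identifier characters are the seven legible in the text, and the names `LEXICAL_UNITS`, `SEPARATORS`, and `tokenize` are invented here.

```python
import re

# Character classes, written as single-character regular expressions.
V = r"""(?:(?![()"'`;#])[!-~])"""        # visible, minus the excluded characters
N = r"""(?:(?![()"'`;#0-9])[!-~])"""     # V minus digits
X = r"""(?:(?![()"'`;#0-9-])[!-~])"""    # N minus '-'
ESC = r'''\\(?:[bdlnprst"\\]|[0-7]{3})'''

# Each alternative is ordered so that a greedy match is the longest unit of its class.
NUMERAL = r"-?[0-9]+|#[bB][01]+|#[oO][0-7]+|#[hH][0-9a-fA-F]+"
IDENTIFIER = "[0-9-][0-9]*" + N + V + "*|" + X + V + "*|-"
CHAR_LITERAL = "'(?:" + r"[ -\[\]-~]" + "|" + ESC + ")'"
STRING_LITERAL = '"(?:' + r"[ !#-\[\]-~]" + "|" + ESC + ')*"'
WHITESPACE = r"[\r\n\f \t]"
COMMENT = r";[^\r\n]*[\r\n]"

LEXICAL_UNITS = [
    (kind, re.compile(pattern))
    for kind, pattern in [
        ("numeral", NUMERAL),
        ("identifier", IDENTIFIER),
        ("character", CHAR_LITERAL),
        ("string", STRING_LITERAL),
        ("paren", r"[()]"),
        ("whitespace", WHITESPACE),
        ("comment", COMMENT),
    ]
]
SEPARATORS = {"whitespace", "comment"}


def tokenize(text):
    """Return the (kind, text) tokens of an ASCII input, by maximal munch.

    Raises ValueError where no lexical unit is a prefix of the remaining
    input, which corresponds to the input lying outside the domain of Tokenize.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in LEXICAL_UNITS:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:       # keep the longest unit seen so far
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"no lexical unit at offset {pos}")
        if best_kind not in SEPARATORS:        # separators are thrown away
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

For example, `tokenize("12ab")` yields the single identifier `12ab` rather than a numeral followed by an identifier, and `tokenize("(foo 12 -3 #hFF) ; note\n")` yields six tokens with the comment removed. A comment that is not terminated by a line end is not a lexical unit under the definition of Comment, so the sketch rejects such input.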


Intro to semantics; Small-step semantics Lecture 1 Tuesday, January 29, 2013 Harvard School of Engineering and Applied Sciences CS 152: Programming Languages Lecture 1 Tuesday, January 29, 2013 1 Intro to semantics What is the meaning of a program? When we write a program, we use

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

Question Points Score

Question Points Score CS 453 Introduction to Compilers Midterm Examination Spring 2009 March 12, 2009 75 minutes (maximum) Closed Book You may use one side of one sheet (8.5x11) of paper with any notes you like. This exam has

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

X Language Definition

X Language Definition X Language Definition David May: November 1, 2016 The X Language X is a simple sequential programming language. It is easy to compile and an X compiler written in X is available to simplify porting between

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Operational Semantics CMSC 330 Summer 2018 1 Formal Semantics of a Prog. Lang. Mathematical description of the meaning of programs written in that language

More information

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens

More information

Computer Science 236 Fall Nov. 11, 2010

Computer Science 236 Fall Nov. 11, 2010 Computer Science 26 Fall Nov 11, 2010 St George Campus University of Toronto Assignment Due Date: 2nd December, 2010 1 (10 marks) Assume that you are given a file of arbitrary length that contains student

More information

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

More information

.Math 0450 Honors intro to analysis Spring, 2009 Notes #4 corrected (as of Monday evening, 1/12) some changes on page 6, as in .

.Math 0450 Honors intro to analysis Spring, 2009 Notes #4 corrected (as of Monday evening, 1/12) some changes on page 6, as in  . 0.1 More on innity.math 0450 Honors intro to analysis Spring, 2009 Notes #4 corrected (as of Monday evening, 1/12) some changes on page 6, as in email. 0.1.1 If you haven't read 1.3, do so now! In notes#1

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Compiler Theory. (Semantic Analysis and Run-Time Environments) Compiler Theory (Semantic Analysis and Run-Time Environments) 005 Semantic Actions A compiler must do more than recognise whether a sentence belongs to the language of a grammar it must do something useful

More information

Consider a description of arithmetic. It includes two equations that define the structural types of digit and operator:

Consider a description of arithmetic. It includes two equations that define the structural types of digit and operator: Syntax A programming language consists of syntax, semantics, and pragmatics. We formalize syntax first, because only syntactically correct programs have semantics. A syntax definition of a language lists

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Lecture 15 Ana Bove May 23rd 2016 More on Turing machines; Summary of the course. Overview of today s lecture: Recap: PDA, TM Push-down

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 2: Lexical analysis Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 Basics of Lexical Analysis: 2 Some definitions:

More information

Generell Topologi. Richard Williamson. May 6, 2013

Generell Topologi. Richard Williamson. May 6, 2013 Generell Topologi Richard Williamson May 6, 2013 1 8 Thursday 7th February 8.1 Using connectedness to distinguish between topological spaces I Proposition 8.1. Let (, O ) and (Y, O Y ) be topological spaces.

More information

Grammars and Parsing, second week

Grammars and Parsing, second week Grammars and Parsing, second week Hayo Thielecke 17-18 October 2005 This is the material from the slides in a more printer-friendly layout. Contents 1 Overview 1 2 Recursive methods from grammar rules

More information

B.V. Patel Institute of BMC & IT, UTU 2014

B.V. Patel Institute of BMC & IT, UTU 2014 BCA 3 rd Semester 030010301 - Java Programming Unit-1(Java Platform and Programming Elements) Q-1 Answer the following question in short. [1 Mark each] 1. Who is known as creator of JAVA? 2. Why do we

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

Examination in Compilers, EDAN65

Examination in Compilers, EDAN65 Examination in Compilers, EDAN65 Department of Computer Science, Lund University 2016 10 28, 08.00-13.00 Note! Your exam will be marked only if you have completed all six programming lab assignments in

More information

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF) Chapter 3: Describing Syntax and Semantics Introduction Formal methods of describing syntax (BNF) We can analyze syntax of a computer program on two levels: 1. Lexical level 2. Syntactic level Lexical

More information

VHDL Lexical Elements

VHDL Lexical Elements 1 Design File = Sequence of Lexical Elements && Separators (a) Separators: Any # of Separators Allowed Between Lexical Elements 1. Space character 2. Tab 3. Line Feed / Carriage Return (EOL) (b) Lexical

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Lecture Notes on Static and Dynamic Semantics

Lecture Notes on Static and Dynamic Semantics Lecture Notes on Static and Dynamic Semantics 15-312: Foundations of Programming Languages Frank Pfenning Lecture 4 September 9, 2004 In this lecture we illustrate the basic concepts underlying the static

More information

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1 Defining Program Syntax Chapter Two Modern Programming Languages, 2nd ed. 1 Syntax And Semantics Programming language syntax: how programs look, their form and structure Syntax is defined using a kind

More information

(Refer Slide Time: 0:19)

(Refer Slide Time: 0:19) Theory of Computation. Professor somenath Biswas. Department of Computer Science & Engineering. Indian Institute of Technology, Kanpur. Lecture-15. Decision Problems for Regular Languages. (Refer Slide

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing Görel Hedin Revised: 2017-09-04 This lecture Regular expressions Context-free grammar Attribute grammar

More information

Data Types and Variables in C language

Data Types and Variables in C language Data Types and Variables in C language Basic structure of C programming To write a C program, we first create functions and then put them together. A C program may contain one or more sections. They are

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon

CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon Groups of up to three students may submit common solutions for each problem in this homework and in all future homeworks You are responsible

More information