Writing a Lexer. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 6, Glenn G.

Size: px
Start display at page:

Download "Writing a Lexer. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 6, Glenn G."

Transcription

1 Writing a Lexer CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 6, 2017 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks ggchappell@alaska.edu 2017 Glenn G. Chappell

2 Lua: Programming II Closures A closure is a function that carries with it (some portion of) the environment in which it was defined. In Lua, when we return a function from a function, we get a closure. A closure can form a simpler alternative to a traditional OO construction (class, objects), particularly when a class exists primarily to support a single member function. See prog2.lua. Closures are found in a number of PLs. Since the 2011 standard, C++ has had closures, in the form of lambda functions. See closure.cpp. 6 Feb 2017 CS F331 / CSCE A331 Spring

3 Lua: Programming III Coroutines [1/4] A coroutine is a function that can give up control yield) at any point, and then later be resumed. Yielding typically involves sending a value back to the caller. Coroutines are available in Lua through the standard module coroutine. To write a coroutine, write an ordinary Lua function. Each time there is a value to send back to the caller or when control should be given back to the caller call coroutine.yield, passing the yielded value, if any. When the coroutine has finished, return as usual. 6 Feb 2017 CS F331 / CSCE A331 Spring

4 Lua: Programming III Coroutines [2/4] To use a coroutine, first get the coroutine object, with coroutine.create. cor = coroutine.create(cfunc) Call coroutine.resume with the coroutine object to execute the coroutine. The first time resume is called, any additional parameters will be passed to the coroutine function. coroutine.resume returns an error code (true: okay, false: error) and the yielded value, if any. If the value of coroutine.status is not "dead", then the coroutine was successful, and any yielded value may be used. Then call coroutine.resume again to resume the coroutine and request another value, handling it as above. See prog3.lua. 6 Feb 2017 CS F331 / CSCE A331 Spring

5 Lua: Programming III Coroutines [3/4] Four questions from last time, about Lua coroutines. (1) coroutine.yield sends data out of a coroutine. Can we similarly send data into a coroutine? Yes! Any additional arguments passed to the 2nd and later calls to coroutine.resume (other than the coroutine object) become the return value(s) of coroutine.yield, inside the coroutine. (2) Can a coroutine be resumed, when its status is "dead"? No. To run the coroutine again, we must get a new coroutine object from coroutine.create. 6 Feb 2017 CS F331 / CSCE A331 Spring

6 Lua: Programming III Coroutines [4/4] (3) Can a coroutine return a value? Yes! Any arguments to return become the values yielded when coroutine.status returns "dead". (4) Must we go through the rigamarole of checking both the error flag (ok) and coroutine.status? If there is an error (ok is false) then coroutine.status will return "dead". So only coroutine.status needs to be checked to determine whether to exit a loop going through the yielded values. We may wish to check ok after leaving the loop. 6 Feb 2017 CS F331 / CSCE A331 Spring

7 Lua: Programming III Custom Iterators [1/2] A Lua for-in loop uses an iterator. Here is the form of a custom iterator, using a simplified model. function XYZ( ) function iter(dummy1, dummy2) if then return nil -- Iterator exhausted end return??? -- Next value (or values) end return iter, nil, nil end 6 Feb 2017 CS F331 / CSCE A331 Spring

8 Lua: Programming III Custom Iterators [2/2] Here is an actual custom iterator, based on the simplified model. -- count: iterator. Given a, b. Counts from a up to b. function count(a, b) function iter(dummy1, dummy2) See prog3.lua. if a > b then return nil Code that uses this iterator: end local old_a = a for i in count(2, 6) do a = a+1 io.write(i.." ") return old_a end end io.write("\n") return iter, nil, nil end The above prints: Feb 2017 CS F331 / CSCE A331 Spring

9 Overview of Lexing & Parsing Parsing Character Stream cout << ff(12.6); Lexer Lexeme Stream cout << ff(12.6); id op id lit op Parser punct op AST or Error expr binop: << Two phases: Lexical analysis (lexing) Syntax analysis (parsing) The output of a parser is often an abstract syntax tree (AST). Specifications of these can vary. expr id: cout id: ff expr funccall expr numlit: Feb 2017 CS F331 / CSCE A331 Spring

10 Introduction to Lexical Analysis Lexeme Categories [1/6] A lexer reads a character stream and outputs a lexeme stream. Each lexeme is generally placed into a category. We look at some common categories. Character Stream cout << ff(12.6); Lexer Lexeme Stream cout << ff(12.6); id op id lit op punct op 6 Feb 2017 CS F331 / CSCE A331 Spring

11 Introduction to Lexical Analysis Lexeme Categories [2/6] An identifier is a name a program gives to some entity: variable, function, type, namespace, etc. In the C++ code below, the identifiers are circled. class MyClass { public: void myfunc(type1 & aa, int bb) const { for (int ii = -37; ii <= bb; ++ii) { aa.foo(); } } }; 6 Feb 2017 CS F331 / CSCE A331 Spring

12 Introduction to Lexical Analysis Lexeme Categories [3/6] A keyword is an identifier-like lexeme that has special meaning within a programming language. In the C++ code below, the keywords are circled. class MyClass { public: void myfunc(type1 & aa, int bb) const { for (int ii = -37; ii <= bb; ++ii) { aa.foo(); } } }; 6 Feb 2017 CS F331 / CSCE A331 Spring

13 Introduction to Lexical Analysis Lexeme Categories [4/6] An operator is a word that gives an alternate method for making what is essentially a function call. The arguments of an operator are called operands. Operators are often but not always placed between their operands; such an operator is an infix operator. The arity of an operator is the number of operands it has. A unary operator has one operand. A binary operator has two operands. In the following code, the operators are += and *, which are binary operators, and -, which is a unary operator. aaa += b * -c; 6 Feb 2017 CS F331 / CSCE A331 Spring

14 Introduction to Lexical Analysis Lexeme Categories [5/6] A literal is a bare value. Here are some C++ literals f Literal "Hello there!" 'x' false Kind of Literal Numeric literal Character array literal Character literal Boolean literal { 1, 2, 3 } Initializer list literal Since C Feb 2017 CS F331 / CSCE A331 Spring

15 Introduction to Lexical Analysis Lexeme Categories [6/6] Punctuation is the category for the extra lexemes in a program that do not fit into any of the previously mentioned categories. Punctuation in C++ includes braces ({ }), semicolons (;), an ampersand (&) indicating a reference, and a colon (:) after public or private. Lexeme categories mentioned: Identifier Keyword Operator Literal Punctuation 6 Feb 2017 CS F331 / CSCE A331 Spring

16 Introduction to Lexical Analysis Reserved Words [1/3] A reserved word is a word that fits the general specification of an identifier, but is not allowed as a legal identifier in a program. Note that, while this is an important concept, reserved word is not a lexeme category. In many PLs, the keywords and the reserved words are the same. However, it is not hard to envision a variant of (say) C in which the compiler could distinguish how a word is used, based only on its position in the code. Then there could be keywords that are not reserved words. Something like the following might be legal. for (for for = 10; for; --for) ; 6 Feb 2017 CS F331 / CSCE A331 Spring

17 Introduction to Lexical Analysis Reserved Words [2/3] Since the 2011 Standard, C++ has had two keywords that are not reserved words: override and final. These have special meaning when placed after the parentheses in a memberfunction declaration, but otherwise are simply identifiers. So the following is legal C++. class Derived : public Base { virtual void override() override; // Derived member function named "override" // Overrides Base member function "override" 6 Feb 2017 CS F331 / CSCE A331 Spring

18 Introduction to Lexical Analysis Reserved Words [3/3] The programming language Fortran traditionally has no reserved words. The following is, famously, legal code in at least some versions of Fortran. IF IF THEN THEN ELSE ELSE On the other hand, there can be reserved words that are not keywords. The Java standard specifies that goto is a reserved word. However, it is not a keyword. Thus, this word cannot legally be included in a Java program at all. Note: We will use the above definitions consistently in this class. Be aware, however, that the term keyword is sometimes used to mean what we mean by reserved word. 6 Feb 2017 CS F331 / CSCE A331 Spring

19 Introduction to Lexical Analysis Lexer Operation There are essentially three ways to write a lexer. Automatically generated, based on regular grammars or regular expressions for each lexeme category. Hand-coded state machine using a table. Entirely hand-coded state machine. The first method might involve a software package like lex, which generates C code for a lexer, given input that consists mostly of regular expressions. We will write a lexer using the last method. When we are done, it should not be difficult to see how we might have used a table instead. A lexer outputs a series of lexemes. There is generally no need to store these lexemes in a data structure. Rather, the lexer can provide get-next-lexeme functionality, which the parser can then use. 6 Feb 2017 CS F331 / CSCE A331 Spring

20 Writing a Lexer Introduction We wish to write a lexer in Lua for a hypothetical PL. Lexemes are described on the In-Class Lexeme Specification. Identifiers are essentially as in C/C++. No lexeme includes any whitespace. All whitespace will usually be treated the same. Lexemes may be arbritrarily long. There might not be any delimiter between lexemes. Comments are like multiline C/C++ comments. The keywords and reserved words are the same. Note that there are PLs in which some of the above do not apply. In Python, Haskell, JavaScript, and Go, a newline can sometimes serve as something like an end-of-statement marker. In Forth, consecutive lexemes are always separated by whitespace. Lua uses different syntax for comments. 6 Feb 2017 CS F331 / CSCE A331 Spring

21 Writing a Lexer Design Decisions [1/2] We will write a Lua modula lexer, with the following interface. The module has a function lexer.lex, which is given a string ( program ) and returns an iterator that goes through lexemes. The iterator returns pairs: string and number. The string is the lexeme itself. The number represents the lexeme category. The category is an index for table lexer.catnames, whose values will be human-readable string forms of the category names. So the following prints text & category of all lexemes in program. lexer = require "lexer" for lexstr, cat in lexer.lex(program) do catstr = lexer.catnames(cat) io.write(string.format("%-10s %s\n", lexstr, catstr)) end 6 Feb 2017 CS F331 / CSCE A331 Spring

22 Writing a Lexer Design Decisions [2/2] The function returned by lexer.lex will be a closure. Whatever data are necessary for lexing (the kind of thing we might store as data members in a C++/Java object) will be stored in this closure. Lua has no character type. We represent a character as a string of length one. I have written some character-testing functions (isletter, isdigit, iswhitespace). Each of these takes a string parameter. Each returns false if this parameter does not have length exactly one. 6 Feb 2017 CS F331 / CSCE A331 Spring

23 Writing a Lexer Coding a State Machine I Core Functionality Internally, our lexer will run as a state machine. A state machine has a current state, stored in variable state. It proceeds in a series of steps. At each step, it looks at the current character in the input and the current state. It then decides what state to go to next. Variables & Utility Functions The input stream is in the given string: prog. The index of the next character to read is stored in variable pos, which starts at 1, since this is Lua. The state starts at START. The lexeme we are building is in string lexstr, which starts as an empty string (""). To add the current character to the lexeme, call add1(). To skip the character without adding it, call drop1(). When a complete lexeme has been found, set state to DONE, and set category appropriately (lexer.id, lexer.key, etc.). 6 Feb 2017 CS F331 / CSCE A331 Spring

24 Writing a Lexer Coding a State Machine I Issues [1/3] We need to be careful about invariants: statements that are always true at a particular point in a program. What should we expect to be true about variables (pos, in particular) when our iterator function is called? Whatever we decide, we need to ensure that it is true when this function returns. 6 Feb 2017 CS F331 / CSCE A331 Spring

25 Writing a Lexer Coding a State Machine I Issues [2/3] We need to be clear about what happens when we read past the end of the input. We use the string function sub to get single characters out of the input string. This function returns the empty string when it is asked to read past the end. And an empty string will always result in false when passed to a character-testing function, or when equality-compared with any single character. So anything we attempt to check about a past-the-end character will be false. 6 Feb 2017 CS F331 / CSCE A331 Spring

26 Writing a Lexer Coding a State Machine I Issues [3/3] I follow the convention that each state is named after a short string that will put me in that state. If we have read a, then we are in state LETTER, since we have read a single letter. As we write a state machine, an important question is when do we add a new state? A good guiding principle: Two situations can be handled by the same state if they would react identically to all future input. Continuing from above, we have read a and are in state LETTER. Suppose the next character is 3. Are we still in state LETTER? Applying the above principle: yes. Because whatever follows a3, we handle it the same as we would if it followed a. For example, a3_xq6 is an identifier; and so is a_xq6. 6 Feb 2017 CS F331 / CSCE A331 Spring

27 Writing a Lexer Coding a State Machine I CODE A skeleton for a lexer is in lexer.lua. This needs to be turned into a complete lexer for the lexemes specified in the In-Class Lexeme Description. We expanded lexer.lua to handle some of the lexeme categories. The lexer is still unfinished. It should be completed next time. See lexer.lua. I have also posted a simple main program for the lexer. See uselexer.lua. 6 Feb 2017 CS F331 / CSCE A331 Spring

28 Writing a Lexer TO BE CONTINUED Writing a Lexer will be continued next time. 6 Feb 2017 CS F331 / CSCE A331 Spring

Introduction to Syntax Analysis Recursive-Descent Parsing

Introduction to Syntax Analysis Recursive-Descent Parsing Introduction to Syntax Analysis Recursive-Descent Parsing CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 10, 2017 Glenn G. Chappell Department of

More information

Scheme: Data. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, April 3, Glenn G.

Scheme: Data. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, April 3, Glenn G. Scheme: Data CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, April 3, 2017 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks ggchappell@alaska.edu

More information

Haskell: Lists. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 24, Glenn G.

Haskell: Lists. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 24, Glenn G. Haskell: Lists CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 24, 2017 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks

More information

Writing an Interpreter Thoughts on Assignment 6

Writing an Interpreter Thoughts on Assignment 6 Writing an Interpreter Thoughts on Assignment 6 CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, March 27, 2017 Glenn G. Chappell Department of Computer Science

More information

Thoughts on Assignment 4 Haskell: Flow of Control

Thoughts on Assignment 4 Haskell: Flow of Control Thoughts on Assignment 4 Haskell: Flow of Control CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 27, 2017 Glenn G. Chappell Department of Computer

More information

Scheme: Expressions & Procedures

Scheme: Expressions & Procedures Scheme: Expressions & Procedures CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, March 31, 2017 Glenn G. Chappell Department of Computer Science University

More information

A Simple Syntax-Directed Translator

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

More information

B The SLLGEN Parsing System

B The SLLGEN Parsing System B The SLLGEN Parsing System Programs are just strings of characters. In order to process a program, we need to group these characters into meaningful units. This grouping is usually divided into two stages:

More information

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010 IPCoreL Programming Language Reference Manual Phillip Duane Douglas, Jr. 11/3/2010 The IPCoreL Programming Language Reference Manual provides concise information about the grammar, syntax, semantics, and

More information

Java Bytecode (binary file)

Java Bytecode (binary file) Java is Compiled Unlike Python, which is an interpreted langauge, Java code is compiled. In Java, a compiler reads in a Java source file (the code that we write), and it translates that code into bytecode.

More information

1 Lexical Considerations

1 Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler

More information

YOLOP Language Reference Manual

YOLOP Language Reference Manual YOLOP Language Reference Manual Sasha McIntosh, Jonathan Liu & Lisa Li sam2270, jl3516 and ll2768 1. Introduction YOLOP (Your Octothorpean Language for Optical Processing) is an image manipulation language

More information

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens Syntactic Analysis CS45H: Programming Languages Lecture : Lexical Analysis Thomas Dillig Main Question: How to give structure to strings Analogy: Understanding an English sentence First, we separate a

More information

Syntax Errors; Static Semantics

Syntax Errors; Static Semantics Dealing with Syntax Errors Syntax Errors; Static Semantics Lecture 14 (from notes by R. Bodik) One purpose of the parser is to filter out errors that show up in parsing Later stages should not have to

More information

Time : 1 Hour Max Marks : 30

Time : 1 Hour Max Marks : 30 Total No. of Questions : 6 P4890 B.E/ Insem.- 74 B.E ( Computer Engg) PRINCIPLES OF MODERN COMPILER DESIGN (2012 Pattern) (Semester I) Time : 1 Hour Max Marks : 30 Q.1 a) Explain need of symbol table with

More information

Decaf Language Reference

Decaf Language Reference Decaf Language Reference Mike Lam, James Madison University Fall 2016 1 Introduction Decaf is an imperative language similar to Java or C, but is greatly simplified compared to those languages. It will

More information

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking CSE450 Translation of Programming Languages Lecture 11: Semantic Analysis: Types & Type Checking Structure Project 1 - of a Project 2 - Compiler Today! Project 3 - Source Language Lexical Analyzer Syntax

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava MP 3 A Lexer for MiniJava CS 421 Spring 2010 Revision 1.0 Assigned Tuesday, February 2, 2010 Due Monday, February 8, at 10:00pm Extension 48 hours (20% penalty) Total points 50 (+5 extra credit) 1 Change

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 2: Lexical analysis Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 Basics of Lexical Analysis: 2 Some definitions:

More information

Scheme: Strings Scheme: I/O

Scheme: Strings Scheme: I/O Scheme: Strings Scheme: I/O CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Wednesday, April 5, 2017 Glenn G. Chappell Department of Computer Science University of

More information

The PCAT Programming Language Reference Manual

The PCAT Programming Language Reference Manual The PCAT Programming Language Reference Manual Andrew Tolmach and Jingke Li Dept. of Computer Science Portland State University September 27, 1995 (revised October 15, 2002) 1 Introduction The PCAT language

More information

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find CS1622 Lecture 15 Semantic Analysis CS 1622 Lecture 15 1 Semantic Analysis How to build symbol tables How to use them to find multiply-declared and undeclared variables. How to perform type checking CS

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

CSE 452: Programming Languages. Outline of Today s Lecture. Expressions. Expressions and Control Flow

CSE 452: Programming Languages. Outline of Today s Lecture. Expressions. Expressions and Control Flow CSE 452: Programming Languages Expressions and Control Flow Outline of Today s Lecture Expressions and Assignment Statements Arithmetic Expressions Overloaded Operators Type Conversions Relational and

More information

2.2 Syntax Definition

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

More information

PL Categories: Functional PLs Introduction to Haskell Haskell: Functions

PL Categories: Functional PLs Introduction to Haskell Haskell: Functions PL Categories: Functional PLs Introduction to Haskell Haskell: Functions CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Wednesday, February 22, 2017 Glenn G. Chappell

More information

Programming for Engineers Introduction to C

Programming for Engineers Introduction to C Programming for Engineers Introduction to C ICEN 200 Spring 2018 Prof. Dola Saha 1 Simple Program 2 Comments // Fig. 2.1: fig02_01.c // A first program in C begin with //, indicating that these two lines

More information

Fundamentals of Programming Session 4

Fundamentals of Programming Session 4 Fundamentals of Programming Session 4 Instructor: Reza Entezari-Maleki Email: entezari@ce.sharif.edu 1 Fall 2011 These slides are created using Deitel s slides, ( 1992-2010 by Pearson Education, Inc).

More information

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer CS664 Compiler Theory and Design LIU 1 of 16 ANTLR Christopher League* 17 February 2016 ANTLR is a parser generator. There are other similar tools, such as yacc, flex, bison, etc. We ll be using ANTLR

More information

Crayon (.cry) Language Reference Manual. Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361)

Crayon (.cry) Language Reference Manual. Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361) Crayon (.cry) Language Reference Manual Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361) 1 Lexical Elements 1.1 Identifiers Identifiers are strings used

More information

CS 6353 Compiler Construction Project Assignments

CS 6353 Compiler Construction Project Assignments CS 6353 Compiler Construction Project Assignments In this project, you need to implement a compiler for a language defined in this handout. The programming language you need to use is C or C++ (and the

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

CGS 3066: Spring 2015 JavaScript Reference

CGS 3066: Spring 2015 JavaScript Reference CGS 3066: Spring 2015 JavaScript Reference Can also be used as a study guide. Only covers topics discussed in class. 1 Introduction JavaScript is a scripting language produced by Netscape for use within

More information

COMPILER DESIGN LECTURE NOTES

COMPILER DESIGN LECTURE NOTES COMPILER DESIGN LECTURE NOTES UNIT -1 1.1 OVERVIEW OF LANGUAGE PROCESSING SYSTEM 1.2 Preprocessor A preprocessor produce input to compilers. They may perform the following functions. 1. Macro processing:

More information

C++ Programming: From Problem Analysis to Program Design, Third Edition

C++ Programming: From Problem Analysis to Program Design, Third Edition C++ Programming: From Problem Analysis to Program Design, Third Edition Chapter 4: Control Structures I (Selection) Control Structures A computer can proceed: In sequence Selectively (branch) - making

More information

Typescript on LLVM Language Reference Manual

Typescript on LLVM Language Reference Manual Typescript on LLVM Language Reference Manual Ratheet Pandya UNI: rp2707 COMS 4115 H01 (CVN) 1. Introduction 2. Lexical Conventions 2.1 Tokens 2.2 Comments 2.3 Identifiers 2.4 Reserved Keywords 2.5 String

More information

Software Engineering Concepts: Invariants Silently Written & Called Functions Simple Class Example

Software Engineering Concepts: Invariants Silently Written & Called Functions Simple Class Example Software Engineering Concepts: Invariants Silently Written & Called Functions Simple Class Example CS 311 Data Structures and Algorithms Lecture Slides Friday, September 11, 2009 continued Glenn G. Chappell

More information

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100 GATE- 2016-17 Postal Correspondence 1 Compiler Design Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key concepts,

More information

Full file at

Full file at Java Programming: From Problem Analysis to Program Design, 3 rd Edition 2-1 Chapter 2 Basic Elements of Java At a Glance Instructor s Manual Table of Contents Overview Objectives s Quick Quizzes Class

More information

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

More information

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:

More information

UNIVERSITY OF COLUMBIA ADL++ Architecture Description Language. Alan Khara 5/14/2014

UNIVERSITY OF COLUMBIA ADL++ Architecture Description Language. Alan Khara 5/14/2014 UNIVERSITY OF COLUMBIA ADL++ Architecture Description Language Alan Khara 5/14/2014 This report is submitted to fulfill final requirements for the course COMS W4115 at Columbia University. 1 P a g e Contents

More information

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava MP 3 A Lexer for MiniJava CS 421 Spring 2012 Revision 1.0 Assigned Wednesday, February 1, 2012 Due Tuesday, February 7, at 09:30 Extension 48 hours (penalty 20% of total points possible) Total points 43

More information

SFU CMPT 379 Compilers Spring 2018 Milestone 1. Milestone due Friday, January 26, by 11:59 pm.

SFU CMPT 379 Compilers Spring 2018 Milestone 1. Milestone due Friday, January 26, by 11:59 pm. SFU CMPT 379 Compilers Spring 2018 Milestone 1 Milestone due Friday, January 26, by 11:59 pm. For this assignment, you are to convert a compiler I have provided into a compiler that works for an expanded

More information

Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

More information

DEMO A Language for Practice Implementation Comp 506, Spring 2018

DEMO A Language for Practice Implementation Comp 506, Spring 2018 DEMO A Language for Practice Implementation Comp 506, Spring 2018 1 Purpose This document describes the Demo programming language. Demo was invented for instructional purposes; it has no real use aside

More information

MATVEC: MATRIX-VECTOR COMPUTATION LANGUAGE REFERENCE MANUAL. John C. Murphy jcm2105 Programming Languages and Translators Professor Stephen Edwards

MATVEC: MATRIX-VECTOR COMPUTATION LANGUAGE REFERENCE MANUAL. John C. Murphy jcm2105 Programming Languages and Translators Professor Stephen Edwards MATVEC: MATRIX-VECTOR COMPUTATION LANGUAGE REFERENCE MANUAL John C. Murphy jcm2105 Programming Languages and Translators Professor Stephen Edwards Language Reference Manual Introduction The purpose of

More information

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual Contents 1 Introduction...2 2 Lexical Conventions...2 3 Types...3 4 Syntax...3 5 Expressions...4 6 Declarations...8 7 Statements...9

More information

LECTURE 3. Compiler Phases

LECTURE 3. Compiler Phases LECTURE 3 Compiler Phases COMPILER PHASES Compilation of a program proceeds through a fixed series of phases. Each phase uses an (intermediate) form of the program produced by an earlier phase. Subsequent

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Alice E. Fischer Lecture 6: Stacks 2018 Alice E. Fischer Data Structures L5, Stacks... 1/29 Lecture 6: Stacks 2018 1 / 29 Outline 1 Stacks C++ Template Class Functions 2

More information

In Java, data type boolean is used to represent Boolean data. Each boolean constant or variable can contain one of two values: true or false.

In Java, data type boolean is used to represent Boolean data. Each boolean constant or variable can contain one of two values: true or false. CS101, Mock Boolean Conditions, If-Then Boolean Expressions and Conditions The physical order of a program is the order in which the statements are listed. The logical order of a program is the order in

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 20, 2007 Outline Recap

More information

CSC 467 Lecture 3: Regular Expressions

CSC 467 Lecture 3: Regular Expressions CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token

More information

1. Lexical Analysis Phase

1. Lexical Analysis Phase 1. Lexical Analysis Phase The purpose of the lexical analyzer is to read the source program, one character at time, and to translate it into a sequence of primitive units called tokens. Keywords, identifiers,

More information

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS 164 Spring 2009 P. N. Hilfinger CS 164: Final Examination (corrected) Name: Login: You have

More information

Syntax Intro and Overview. Syntax

Syntax Intro and Overview. Syntax Syntax Intro and Overview CS331 Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues) Today Quiz 2 Lexer design Syntax Analysis: Context-Free Grammars Assignments HW2 (out, due Tues) S. Bowers 1 of 15 Implementing a Lexer for MyPL (HW 2) Similar in spirit to HW 1 We ll create three classes:

More information

Semantics of programming languages

Semantics of programming languages Semantics of programming languages Informatics 2A: Lecture 27 John Longley School of Informatics University of Edinburgh jrl@inf.ed.ac.uk 21 November, 2011 1 / 19 1 2 3 4 2 / 19 Semantics for programming

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2010 Handout Decaf Language Tuesday, Feb 2 The project for the course is to write a compiler

More information

STATS 507 Data Analysis in Python. Lecture 2: Functions, Conditionals, Recursion and Iteration

STATS 507 Data Analysis in Python. Lecture 2: Functions, Conditionals, Recursion and Iteration STATS 507 Data Analysis in Python Lecture 2: Functions, Conditionals, Recursion and Iteration Functions in Python We ve already seen examples of functions: e.g., type()and print() Function calls take the

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements Programming Languages Third Edition Chapter 9 Control I Expressions and Statements Objectives Understand expressions Understand conditional statements and guards Understand loops and variation on WHILE

More information

Decaf Language Reference Manual

Decaf Language Reference Manual Decaf Language Reference Manual C. R. Ramakrishnan Department of Computer Science SUNY at Stony Brook Stony Brook, NY 11794-4400 cram@cs.stonybrook.edu February 12, 2012 Decaf is a small object oriented

More information

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1 Defining Program Syntax Chapter Two Modern Programming Languages, 2nd ed. 1 Syntax And Semantics Programming language syntax: how programs look, their form and structure Syntax is defined using a kind

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

Angela Z: A Language that facilitate the Matrix wise operations Language Reference Manual

Angela Z: A Language that facilitate the Matrix wise operations Language Reference Manual Angela Z: A Language that facilitate the Matrix wise operations Language Reference Manual Contents Fei Liu, Mengdi Zhang, Taikun Liu, Jiayi Yan 1. Language definition 3 1.1. Usage 3 1.2. What special feature

More information

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm 1 Overview of the Programming Project Programming assignments I IV will direct you to design and build a compiler for Cool. Each assignment

More information

The Decaf Language. 1 Lexical considerations

The Decaf Language. 1 Lexical considerations The Decaf Language In this course, we will write a compiler for a simple object-oriented programming language called Decaf. Decaf is a strongly-typed, object-oriented language with support for inheritance

More information

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm 1 Overview Programming assignments I IV will direct you to design and build a compiler for Cool. Each assignment will cover one component

More information

Programming Assignment II

Programming Assignment II Programming Assignment II 1 Overview of the Programming Project Programming assignments II V will direct you to design and build a compiler for Cool. Each assignment will cover one component of the compiler:

More information

CPSC 411, Fall 2010 Midterm Examination

CPSC 411, Fall 2010 Midterm Examination CPSC 411, Fall 2010 Midterm Examination. Page 1 of 11 CPSC 411, Fall 2010 Midterm Examination Name: Q1: 10 Q2: 15 Q3: 15 Q4: 10 Q5: 10 60 Please do not open this exam until you are told to do so. But please

More information

7. Introduction to Denotational Semantics. Oscar Nierstrasz

7. Introduction to Denotational Semantics. Oscar Nierstrasz 7. Introduction to Denotational Semantics Oscar Nierstrasz Roadmap > Syntax and Semantics > Semantics of Expressions > Semantics of Assignment > Other Issues References > D. A. Schmidt, Denotational Semantics,

More information

A Short Summary of Javali

A Short Summary of Javali A Short Summary of Javali October 15, 2015 1 Introduction Javali is a simple language based on ideas found in languages like C++ or Java. Its purpose is to serve as the source language for a simple compiler

More information

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance. 2.1 Introduction (No questions.) 2.2 A Simple Program: Printing a Line of Text 2.1 Which of the following must every C program have? (a) main (b) #include (c) /* (d) 2.2 Every statement in C

More information

Language Reference Manual

Language Reference Manual ALACS Language Reference Manual Manager: Gabriel Lopez (gal2129) Language Guru: Gabriel Kramer-Garcia (glk2110) System Architect: Candace Johnson (crj2121) Tester: Terence Jacobs (tj2316) Table of Contents

More information

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES (2018-19) (ODD) Code Optimization Prof. Jonita Roman Date: 30/06/2018 Time: 9:45 to 10:45 Venue: MCA

More information

The Structure of a Syntax-Directed Compiler

The Structure of a Syntax-Directed Compiler Source Program (Character Stream) Scanner Tokens Parser Abstract Syntax Tree (AST) Type Checker Decorated AST Translator Intermediate Representation Symbol Tables Optimizer (IR) IR Code Generator Target

More information

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements Programming, Data Structures and Algorithms Prof. Shankar Balachandran Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 05 I/O statements Printf, Scanf Simple

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1 Static Semantics Lecture 15 (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1 Current Status Lexical analysis Produces tokens Detects & eliminates illegal tokens Parsing

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization Second round: the scanner Lexical analysis Syntactical analysis Semantical analysis Intermediate code generation Optimization Code generation Target specific optimization Lexical analysis (Chapter 3) Why

More information

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer Assigned: Thursday, September 16, 2004 Due: Tuesday, September 28, 2004, at 11:59pm September 16, 2004 1 Introduction Overview In this

More information

ARG! Language Reference Manual

ARG! Language Reference Manual ARG! Language Reference Manual Ryan Eagan, Mike Goldin, River Keefer, Shivangi Saxena 1. Introduction ARG is a language to be used to make programming a less frustrating experience. It is similar to C

More information

ASTs, Objective CAML, and Ocamlyacc

ASTs, Objective CAML, and Ocamlyacc ASTs, Objective CAML, and Ocamlyacc Stephen A. Edwards Columbia University Fall 2012 Parsing and Syntax Trees Parsing decides if the program is part of the language. Not that useful: we want more than

More information

Fundamental of Programming (C)

Fundamental of Programming (C) Borrowed from lecturer notes by Omid Jafarinezhad Fundamental of Programming (C) Lecturer: Vahid Khodabakhshi Lecture 3 Constants, Variables, Data Types, And Operations Department of Computer Engineering

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

Objectives. Chapter 2: Basic Elements of C++ Introduction. Objectives (cont d.) A C++ Program (cont d.) A C++ Program

Objectives. Chapter 2: Basic Elements of C++ Introduction. Objectives (cont d.) A C++ Program (cont d.) A C++ Program Objectives Chapter 2: Basic Elements of C++ In this chapter, you will: Become familiar with functions, special symbols, and identifiers in C++ Explore simple data types Discover how a program evaluates

More information

Chapter 2: Basic Elements of C++

Chapter 2: Basic Elements of C++ Chapter 2: Basic Elements of C++ Objectives In this chapter, you will: Become familiar with functions, special symbols, and identifiers in C++ Explore simple data types Discover how a program evaluates

More information

5/3/2006. Today! HelloWorld in BlueJ. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont.

5/3/2006. Today! HelloWorld in BlueJ. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. HelloWorld in BlueJ, Cont. Today! Build HelloWorld yourself in BlueJ and Eclipse. Look at all the Java keywords. Primitive Types. HelloWorld in BlueJ 1. Find BlueJ in the start menu, but start the Select VM program instead (you

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

Chapter 2: Basic Elements of C++ Objectives. Objectives (cont d.) A C++ Program. Introduction

Chapter 2: Basic Elements of C++ Objectives. Objectives (cont d.) A C++ Program. Introduction Chapter 2: Basic Elements of C++ C++ Programming: From Problem Analysis to Program Design, Fifth Edition 1 Objectives In this chapter, you will: Become familiar with functions, special symbols, and identifiers

More information

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 20 Intermediate code generation Part-4 Run-time environments

More information

CPS122 Lecture: From Python to Java

CPS122 Lecture: From Python to Java Objectives: CPS122 Lecture: From Python to Java last revised January 7, 2013 1. To introduce the notion of a compiled language 2. To introduce the notions of data type and a statically typed language 3.

More information

Introduction to C Programming

Introduction to C Programming 1 2 Introduction to C Programming 2.6 Decision Making: Equality and Relational Operators 2 Executable statements Perform actions (calculations, input/output of data) Perform decisions - May want to print

More information

Semantics of programming languages

Semantics of programming languages Semantics of programming languages Informatics 2A: Lecture 27 Alex Simpson School of Informatics University of Edinburgh als@inf.ed.ac.uk 18 November, 2014 1 / 18 Two parallel pipelines A large proportion

More information

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler. More detailed overview of compiler front end Structure of a compiler Today we ll take a quick look at typical parts of a compiler. This is to give a feeling for the overall structure. source program lexical

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 6 Decaf Language Wednesday, September 7 The project for the course is to write a

More information