Compilation 2014 Warm-up project

Similar documents
CS 132 Compiler Construction

Fundamentals of Compilation

Topic 3: Syntax Analysis I

CPSC 411: Introduction to Compiler Construction

Compiler Construction

Topic 5: Syntax Analysis III

CS 352: Compilers: Principles and Practice

JavaCC. JavaCC File Structure. JavaCC is a Lexical Analyser Generator and a Parser Generator. The structure of a JavaCC file is as follows:

PART ONE Fundamentals of Compilation

Compiler Construction 1. Introduction. Oscar Nierstrasz

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Modern Compiler Implementation in ML

Structure of JavaCC File. JavaCC Rules. Lookahead. Example JavaCC File. JavaCC rules correspond to EBNF rules. JavaCC rules have the form:

JavaCC Rules. Structure of JavaCC File. JavaCC rules correspond to EBNF rules. JavaCC rules have the form:

Compiler Construction 1. Introduction. Oscar Nierstrasz

Introduction. ML-Modules & ML-Lex. Agenda. Overview. The Assignment. The Assignment. ML-Modules & ML-Lex 1. Erik (Happi) Johansson.

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Parsing and Pattern Recognition

Abstract Syntax Trees

Abstract Syntax. Mooly Sagiv. html://

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

CMSC 350: COMPILER DESIGN

Introduction to Lexical Analysis

Compiler phases. Non-tokens

Lexical Analysis. Chapter 2

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

The Zephyr Abstract Syntax Description Language

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

ML 4 A Lexer for OCaml s Type System

Lexical Analysis. Lecture 2-4

LECTURE 11. Semantic Analysis and Yacc

EDAN65: Compilers, Lecture 02 Regular expressions and scanning. Görel Hedin Revised:

Introduction to Lexical Analysis

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

MP 3 A Lexer for MiniJava

Compilation 2013 Parser Generators, Conflict Management, and ML-Yacc

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1

Lecture 9 CIS 341: COMPILERS

Regular Language Applications

Figure 2.1: Role of Lexical Analyzer

Chapter 3 Lexical Analysis

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Compiler course. Chapter 3 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

CSC 467 Lecture 3: Regular Expressions

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

CSE 3302 Programming Languages Lecture 2: Syntax

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Lexical Analysis. Finite Automata

Lexical Analysis. Introduction

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Implementation of Lexical Analysis

ASTs, Objective CAML, and Ocamlyacc

Outline CS4120/4121. Compilation in a Nutshell 1. Administration. Introduction to Compilers Andrew Myers. HW1 out later today due next Monday.

Programming Languages and Compilers (CS 421)

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4

Lexical Analysis. Lecture 3-4

Simple Lexical Analyzer

CSCI312 Principles of Programming Languages!

Lexical Analysis. Lecture 3. January 10, 2018

MP 3 A Lexer for MiniJava

Lexical Analysis. Finite Automata

Languages and Compilers

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Programming Languages and Compilers (CS 421)

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers

Compiler Construction

Implementation of Lexical Analysis

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers

Compiler Construction

CS 11 Ocaml track: lecture 6

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

UNIT -2 LEXICAL ANALYSIS

Parser Generators. Aurochs and ANTLR and Yaccs, Oh My. Tuesday, December 1, 2009 Reading: Appel 3.3. Department of Computer Science Wellesley College

Writing a Lexical Analyzer in Haskell (part II)

B The SLLGEN Parsing System

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Formal Languages and Compilers Lecture VI: Lexical Analysis

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

CSc 453 Compilers and Systems Software

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Lexical and Syntax Analysis

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Syntax Analysis Part IV

TDDD55- Compilers and Interpreters Lesson 2

Transcription:

Compilation 2014 Warm-up project Aslan Askarov aslan@cs.au.dk Revised from slides by E. Ernst

Straight-line Programming Language Toy programming language: no branching, no loops Skip lexing and parsing issues Focus on the meaning interpretation Syntax Stm Stm; Stm Stm id := Exp Stm print ( ExpList ) Exp id Exp num Exp Exp BinOp Exp Exp ( Stm, Exp ) (CompoundStm) (AssignStm) (PrintStm) (IdExp) (NumExp) (OpExp) (EseqExp) ExpList Exp, ExpList ExpList Exp Binop + Binop Binop Binop / (PairExpList) (LastExpList) (Plus) (Minus) (Times) (Div)

Straight-line program Source: CompoundStm a := 5 + 3; b := (print (a, a - 1),10 * a); print (b) Corresponding syntax tree: NumExp 5 a AssignStm OpExp BinOp NumExp Plus 3 CompoundStm AssignStm b EseqExp PrintStm OpExp PrintStm LastExpList IdExp PairExpList NumExp BinOp IdExp b IdExp LastExpList 10 Times a a OpExp IdExp BinOp NumExp a Minus 1

SLP syntax representation datatype SML declaration type id = string datatype binop = Plus Minus Times Div datatype stm = CompoundStm of stm * stm AssignStm of id * exp PrintStm of exp list and exp = IdExp of id NumExp of int OpExp of exp * binop * exp EseqExp of stm * exp Stm Stm; Stm Stm id := Exp Stm print ( ExpList ) Exp id Exp num Exp Exp BinOp Exp Exp ( Stm, Exp ) ExpList Exp, ExpList ExpList Exp Binop + Binop Binop Binop / (CompoundStm) (AssignStm) (PrintStm) (IdExp) (NumExp) (OpExp) (EseqExp) (PairExpList) (LastExpList) (Plus) (Minus) (Times) (Div)

SLP syntax representation Source program CompoundStm a := 5 + 3; b := (print (a, a - 1),10 * a); print (b) a AssignStm OpExp CompoundStm AssignStm PrintStm NumExp BinOp NumExp b EseqExp LastExpList SML value: val prog = CompoundStm ( AssignStm ( a", OpExp ( NumExp 5, Plus, NumExp 3)), CompoundStm ( AssignStm ("b", EseqExp ( PrintStm [IdExp "a", OpExp ( )], OpExp (NumExp 10, ))), PrintStm [IdExp "b"])) 5 Plus IdExp a IdExp a 3 PairExpList LastExpList OpExp BinOp Minus PrintStm NumExp 10 NumExp 1 OpExp BinOp Times IdExp a IdExp b

Project assignment Follow descriptions p10-12 in MCIML Modularity principles p9-10: discussed on Friday, may be ignored at first

Lexical analysis

Lexical analysis High-level source code Lexing Parsing Elaboration Low-level target code

Lexical analysis First phase in the compilation Input: stream of characters i f ( x > 0 ) \n \t t h e n 1 \n \t e l s e 0 IF LPAREN ID ( x ) GE INT (0) RPAREN THEN INT (1) ELSE INT (0) Output: stream of tokens in our language Discards comments, whitespace, newline, tab characters, preprocessor directives

Tokens Type ID Examples foo n14 a my-fun INT 73 0 070 REAL 0.0.5 10. IF if COMMA, LPAREN ( ASGMT :=

Non-tokens Type Examples comments /* dead code */ // comment (* nest (*ed*) *) preprocessor directives #define N 10 #include <stdio.h> whitespace

Token data structure Many tokens need no associated data, e.g.: IF, COMMA, LPAREN, RPAREN, ASGMT Some tokens carry an associated string: ID ( my-fun ) Some tokens carry associated data of other types: INT (73), INT (1), FLOAT (IEEE754, 1001111100 ) Tokens may include useful additional information: start/end pos in input file (line number + column, or charpos)

Q/A Consider source program var δ := 0.0 Language: case sensitive, ASCII How to report error of using δ? FileName:Line.Col: Illegal character δ

Regular expressions We can use regular expressions to specify programming language tokens Regular expressions: Expected to be well-known Syntax: symbol a choice x y concat x y empty ε repeat x*

Regular expressions used for scanning Examples if (IF); [a-z][a-z0-9]* (ID); [0-9]* (NUM); ([0-9]+. [0-9]*) ([0-9]*. [0-9]+) (REAL); ( -- [a-z]* \n ) ( \t ) (continue());. (error (); continue());

Resolving ambiguities Rule: when a string can match multiple tokens, the longest matching token wins i f x > 0 ID ( ifx ) We also need to specify priorities if we match several tokens of the same length. Usual rule: earliest declaration wins if (IF); [a-z][a-z0-9]* (ID); i f IF ID ( if )

Lexical analysis Specification: Tokens as regular exps +longest-matching rule +priorities Formalism: NFA DFA Implementation: Simulate NFA Simulate DFA linear complexity Output: Program that translates raw text into stream of tokens

Total NFA for ID,IF,NUM,REAL a-e,g-z,0-9 blank etc. ID f 0-9,a-z 2 3 4 5 i 0-9 1 blank etc. - other IF ID error REAL a-h,j-z NUM 12 13 9 error. 0-9,a-z 7. 0-9 error REAL 8 0-9 0-9 6 0-9 - \n 10 a-z whitespace 11

ML-Lex Lexer generator, built-in part of SML/NJ Accepts lexical specification, produces a scanner Example specification (* SML declarations *) type lexresult = Tokens.token fun eof() = Tokens.EOF(0,0) %% (* Lex definitions *) digits=[0-9]+ %% (* Regular Expressions and Actions *) if => (Tokens.IF(yypos,yypos+2)); [a-z][a-z0-9]* => (Tokens.ID(yytext,yypos,yypos + size yytext)); {digits} => (Tokens.NUM( Int.fromString yytext, yypos, yypos + size yytext); ({digits}. [0-9]*) ([0-9]*. {digits}) => (Tokens.REAL( Real.fromString yytext, yypos, yypos + size yytext)); ( -- [a-z]* \n ) ( \n \t )+ => (continue()); => ( ErrorMsg.error yypos Illegal character ; continue());

Lexer states Helpful when handling different kinds of tokens For ex.: use state INITIAL in general lexing (automatic) STRING when scanning the contents of a string COMMENT when scanning a comment Point: keep different concerns apart simpler Syntax:... (* Regular Expressions and Actions *) <INITIAL>if => (Tokens.IF(yypos,yypos+2)); <INITIAL>[a-z][a-z0-9]* => (Tokens.ID(yytext,yypos,yypos + size yytext));... <INITIAL> \ => (YYBEGIN STRING; continue());... <STRING>. => (continue());...

Summary Warm-up project: Program in SML Straight-line programming language, no lexing/parsing involved Express programs: use abstract syntax tree datatype Project specified on website, essentially as in the book Lexical analysis Avoid complexity in grammar. Use lexer Based on regular expressions. Implementation via NFA/DFA Theory assumed known Tools: ML-Lex Scanner generator, outputs SML code from spec Note lexer states