Book. Signatures and grammars. Syntaxes. The 4-layer architecture


Generic Language Technology (2IS15)
Syntaxes
Prof.dr. Mark van den Brand
/ Faculteit Wiskunde en Informatica 13-9-2011 PAGE 1

Book
  Software Language Engineering by Anneke Kleppe (Addison Wesley)

Definition of a (programming) language involves:
  abstract syntax, so-called signature
  concrete syntax:
    textual syntax
    graphical syntax
  semantics:
    static semantics
    dynamic semantics

The 4-layer architecture (grammar world)
  M3  (E)BNF/SDF grammar: defines the structure of (E)BNF in (E)BNF
  M2  Java grammar: defines the structure of Java in (E)BNF
  M1  Java program: describes the manipulation (algorithm) of objects in the object layer
  M0  Object layer: the objects we wish to manipulate

Abstract syntax:
  defines the basic structure of the language (skeleton)
  is the starting point for defining:
    concrete syntax
    static semantics
    dynamic semantics

Abstract syntax is a collection of constructors/functions.
It carries no information about keywords, priorities, associativities, etc.

Abstract syntax definition of Booleans:

    true  ()           -> BoolCon
    false ()           -> BoolCon
    con   (BoolCon)    -> Bool
    and   (Bool, Bool) -> Bool
    or    (Bool, Bool) -> Bool
    not   (Bool)       -> Bool

  In each rule the name on the left (for example and) is a constructor; the sorts (for example Bool) are nonterminals.

There is no standardized way of defining abstract syntax:
  SSL (specification formalism of the Synthesizer Generator)
  Signature-like
  (Meta-modeling)

SSL (grammar specification formalism of the Synthesizer Generator) describes it as follows:
  A collection of rules that define phyla and operators
  A phylum is a nonempty set of terms
  A term is the application of a k-ary operator to k terms of the appropriate phylum
  A k-ary operator is a constructor function mapping k terms to a term
  A phylum can be considered a nonterminal

    phyl_0 : op(phyl_1 phyl_2 ... phyl_k)
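The definition of terms above (a k-ary operator applied to k terms of the appropriate sort) can be turned into a small well-formedness check. The following is a minimal sketch, not part of the slides: terms are represented as nested Python tuples, and the names `SIGNATURE` and `sort_of` are illustrative assumptions.

```python
# Signature of the Booleans: operator name -> (argument sorts, result sort).
SIGNATURE = {
    "true": ((), "BoolCon"),
    "false": ((), "BoolCon"),
    "con": (("BoolCon",), "Bool"),
    "and": (("Bool", "Bool"), "Bool"),
    "or": (("Bool", "Bool"), "Bool"),
    "not": (("Bool",), "Bool"),
}


def sort_of(term):
    """Return the sort of a term, or raise ValueError if it is ill-formed.

    A term is a tuple (operator, subterm_1, ..., subterm_k).
    """
    op, *args = term
    if op not in SIGNATURE:
        raise ValueError(f"unknown operator: {op}")
    arg_sorts, result = SIGNATURE[op]
    if len(args) != len(arg_sorts):
        raise ValueError(f"{op} expects {len(arg_sorts)} arguments, got {len(args)}")
    for arg, expected in zip(args, arg_sorts):
        actual = sort_of(arg)
        if actual != expected:
            raise ValueError(f"{op}: expected a {expected}, got a {actual}")
    return result


# and(con(true), not(con(false))) is a well-formed term of sort Bool.
well_formed = ("and", ("con", ("true",)), ("not", ("con", ("false",))))
print(sort_of(well_formed))
```

A term such as `("and", ("true",), ("con", ("false",)))` is rejected, because `true` has sort BoolCon and must first be wrapped by `con` to become a Bool.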

SSL notation of the definition of the abstract syntax of Booleans:

    boolcon : True()
            | False()
    bool    : Con(boolcon)
            | And(bool bool)
            | Or(bool bool)

Signature describes it as follows:
  A collection of functions that define sorts and operators
  A sort represents a nonempty set of terms
  A term is the application of a k-ary operator to k terms of the appropriate sort
  A k-ary operator is a constructor function mapping k terms to a term
  A sort can be considered a nonterminal

    op(sort_1, sort_2, ..., sort_k) -> sort_0

Signature notation of the definition of the abstract syntax of Booleans:

    true  ()           -> BoolCon
    false ()           -> BoolCon
    con   (BoolCon)    -> Bool
    and   (Bool, Bool) -> Bool
    or    (Bool, Bool) -> Bool
    not   (Bool)       -> Bool

Given signatures it is possible to generate APIs.
Tooling for defining signatures and generating APIs:
  GOM, part of TOM (http://tom.loria.fr/wiki/index.php5/documentation:gom)
  ApiGen, part of SDF (see later)
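To give an impression of what an API derived from a signature might look like, here is a hand-written sketch with one immutable class per constructor. It does not reproduce the actual output of GOM or ApiGen; the class names and the `show` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Sort BoolCon: one class per constructor.
@dataclass(frozen=True)
class TrueCon:
    pass


@dataclass(frozen=True)
class FalseCon:
    pass


BoolCon = Union[TrueCon, FalseCon]


# Sort Bool: one class per constructor, one field per argument.
@dataclass(frozen=True)
class Con:
    value: BoolCon


@dataclass(frozen=True)
class And:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Or:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Not:
    arg: "Bool"


Bool = Union[Con, And, Or, Not]


def show(term) -> str:
    """Render a term in the prefix notation of the signature."""
    if isinstance(term, TrueCon):
        return "true"
    if isinstance(term, FalseCon):
        return "false"
    if isinstance(term, Con):
        return f"con({show(term.value)})"
    if isinstance(term, And):
        return f"and({show(term.left)},{show(term.right)})"
    if isinstance(term, Or):
        return f"or({show(term.left)},{show(term.right)})"
    if isinstance(term, Not):
        return f"not({show(term.arg)})"
    raise TypeError(f"not a Boolean term: {term!r}")


term = And(Con(TrueCon()), Not(Con(FalseCon())))
print(show(term))
```

The design choice is the usual one for such APIs: sorts become types, constructors become classes, and structural equality comes for free from the dataclass machinery.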

Definition of a (programming) language involves:
  lexical syntax, so-called tokens:
    identifiers, numbers, strings, if, then, class (keywords)
  context-free syntax, so-called production rules:
    Statement ::= if Expression then Statements else Statements fi
  static semantics:
    identification and scope resolution
    type checking
  dynamic semantics:
    operational semantics
    interpretation
    compilation

Goal: defining languages & manipulating programs
  SDF: Syntax Definition Formalism
    lexical & context-free syntax
  ASF+SDF Meta-Environment: IDE for ASF+SDF
    manuals/documentation: www.meta-environment.org
  Spoofax/IMP: Eclipse plugin for SDF
    manuals/documentation: http://strategoxt.org/spoofax

Anatomy of SDF specifications

    module A imports B C
    module B imports D
    module C
    module D

Anatomy of an SDF module

    module ModuleName
      ImportSection*
      ExportOrHiddenSection*

  ModuleName: name of this module; may be followed by parameters
  ImportSection: names of modules imported by this module; may be followed by renamings
  ExportOrHiddenSection: grammar elements that are visible from the outside (exports) or only inside the module (hiddens): imports, aliases, sorts, lexical syntax, context-free syntax, priorities, variables
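The import structure of the four modules A, B, C and D can be modelled as a small graph. The sketch below is an illustration only: it assumes that imports are followed transitively and ignores the exports/hiddens distinction, and the names `IMPORTS` and `reachable_modules` are made up for this example.

```python
# module name -> modules it imports directly (taken from the slide)
IMPORTS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
}


def reachable_modules(name, imports):
    """Return the module itself plus every module reachable through imports."""
    seen = []
    stack = [name]
    while stack:
        module = stack.pop()
        if module in seen:
            continue  # already visited; also guards against cyclic imports
        seen.append(module)
        # reversed() keeps the visiting order equal to the textual import order
        stack.extend(reversed(imports[module]))
    return seen


print(reachable_modules("A", IMPORTS))
```

Under these assumptions a specification rooted at A consists of all four modules, while one rooted at B consists of B and D only.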

SDF by examples
  Boolean language
  Pico language

Boolean constants

    module basic/boolcon
    exports
      sorts BoolCon
      context-free syntax
        "true"  -> BoolCon {cons("true")}
        "false" -> BoolCon {cons("false")}

  sorts BoolCon: the sort of Boolean constants; sorts should always start with a capital letter
  "true", "false": the constants true and false; literals should always be quoted

Booleans

    module basic/booleans
    imports basic/boolcon
    exports
      sorts Boolean
      context-free syntax
        BoolCon             -> Boolean {cons("con")}
        Boolean "|" Boolean -> Boolean {cons("or"), left}
        Boolean "&" Boolean -> Boolean {cons("and"), left}
        not(Boolean)        -> Boolean {cons("not")}
        "(" Boolean ")"     -> Boolean {bracket}
      context-free priorities
        Boolean "&" Boolean -> Boolean >
        Boolean "|" Boolean -> Boolean

  imports basic/boolcon: import the Boolean constants
  sorts Boolean: the sort of Boolean expressions
  BoolCon -> Boolean: each Boolean constant is a Boolean expression; this is also called an injection rule or chain rule
  "|" and "&": the infix operators or (|) and and (&); both are left-associative (left)
  not(Boolean): the prefix function not
  {bracket}: "(" and ")" may be used as brackets in Boolean expressions; they are ignored after parsing
  context-free priorities: & has higher priority than |
  Example: Bool & Bool | Bool is interpreted as (Bool & Bool) | Bool
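The effect of the `left` attributes and the priority of & over | can be imitated with a small hand-written parser. This is a sketch under assumptions: it uses recursive descent with one function per priority level, which is not how SDF-generated parsers work internally, and the tuple-shaped trees and function names are illustrative.

```python
import re

TOKEN = re.compile(r"\s*(true|false|not|[&|()])")


def tokenize(text):
    """Split the input into the literals of the Boolean grammar."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse a Boolean expression into nested tuples named after the cons attributes."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, found {peek()!r}")
        pos += 1

    def parse_or():
        # Lowest priority; the loop builds a left-associative tree.
        node = parse_and()
        while peek() == "|":
            expect("|")
            node = ("or", node, parse_and())
        return node

    def parse_and():
        # Higher priority than "|", also left-associative.
        node = parse_atom()
        while peek() == "&":
            expect("&")
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        token = peek()
        if token in ("true", "false"):
            expect(token)
            return ("con", token)
        if token == "not":
            expect("not")
            expect("(")
            node = ("not", parse_or())
            expect(")")
            return node
        if token == "(":
            # Brackets only group; they leave no node in the tree.
            expect("(")
            node = parse_or()
            expect(")")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = parse_or()
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return tree


print(parse("true & false | true"))
```

Parsing `true & false | true` yields an `or` node whose left child is the `and` node, matching the interpretation (Bool & Bool) | Bool given on the slide.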

Booleans (continued)

    hiddens
      context-free start-symbols Boolean
      imports basic/comments

  context-free start-symbols Boolean: the start symbol of a grammar. Without a start symbol the parser does not know how to start parsing an input sentence
  imports basic/comments: import the standard comments

Summary:
  Each module defines a language; in this case the language of Booleans (synonym: data type)
  We can use this language definition to
    create a (syntax-directed) editor for the Boolean language and create Boolean terms
    import it in another module; this makes the Boolean language available for the importing module

A toy language: Pico
  Pico has two types: natural number and string
  Variables have to be declared
  Statements: assign, if-then-else, while-do
  Expressions: natural, string, +, - and ||
    + and - have natural operands; the result is natural
    || has string operands and the result is string
  Tests (if, while) should be of type natural

    begin declare input : natural,
                  output : natural,
                  repnr : natural,
                  rep : natural;
      input := 14;
      output := 1;
      while input - 1 do
        rep := output;
        repnr := input;
        while repnr - 1 do
          output := output + rep;
          repnr := repnr - 1
        od;
        input := input - 1
      od
    end

  Here input holds the input value and output holds the output value.

What does this program compute?
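One way to answer the question is to trace the program mechanically. The sketch below transliterates it to Python under two assumptions that the slides do not spell out: a test succeeds when its value is non-zero, and subtraction on naturals is cut off at zero. The names `monus` and `run` are illustrative, and `inp` stands in for the Pico variable input to avoid shadowing a Python builtin.

```python
import math


def monus(a, b):
    """Subtraction on natural numbers, cut off at zero (an assumption about Pico)."""
    return max(a - b, 0)


def run(input_value=14):
    """Line-by-line transliteration of the Pico program."""
    inp = input_value
    output = 1
    while monus(inp, 1) != 0:          # while input - 1 do
        rep = output
        repnr = inp
        while monus(repnr, 1) != 0:    # while repnr - 1 do
            output = output + rep      # repeated addition replaces multiplication
            repnr = monus(repnr, 1)
        inp = monus(inp, 1)
    return output


print(run(14))
print(run(14) == math.factorial(14))
```

The inner loop adds rep to output exactly input - 1 times, so each pass of the outer loop multiplies output by the current value of input.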

What does this program compute?
  For the program above, with input as the input value and output as the output value: 14! = 14 * 13 * ... * 1

Why is it written in this clumsy style?
  (a) Pico has no input/output statements
  (b) Pico has no multiplication operator

Defining the syntax for Pico
  Modules involved:
    languages/pico/syntax/pico
    languages/pico/syntax/types
    languages/pico/syntax/identifiers
    basic/natcon
    basic/strcon
    basic/whitespace

    module languages/pico/syntax/pico
    imports languages/pico/syntax/identifiers
            languages/pico/syntax/types
            basic/natcon
            basic/strcon
    exports
      sorts PROGRAM DECLS ID-TYPE STATEMENT EXP
      context-free start-symbols PROGRAM
      context-free syntax
        "begin" DECLS {STATEMENT ";"}* "end" -> PROGRAM {cons("program")}
        "declare" {ID-TYPE ","}* ";"         -> DECLS {cons("decls")}
        PICO-ID ":" TYPE                     -> ID-TYPE {cons("id-type")}

        PICO-ID ":=" EXP -> STATEMENT {cons("assign")}
        "if" EXP "then" {STATEMENT ";"}* "else" {STATEMENT ";"}* "fi" -> STATEMENT {cons("cond")}
        "while" EXP "do" {STATEMENT ";"}* "od" -> STATEMENT {cons("loop")}

  The first three rules are the sorts and syntax rules for program and declarations
  {STATEMENT ";"}*: list of zero or more statements separated by ";"
  *: zero or more; +: one or more
  The last three rules are the syntax rules for statements

Syntax rules for expressions (module languages/pico/syntax/pico, continued)

        PICO-ID      -> EXP {cons("id")}
        NatCon       -> EXP {cons("nat")}
        StrCon       -> EXP {cons("str")}
        EXP "+" EXP  -> EXP {cons("plus"), left}
        EXP "-" EXP  -> EXP {cons("min"), left}
        EXP "||" EXP -> EXP {cons("conc"), left}
        "(" EXP ")"  -> EXP {bracket}
      context-free priorities
        EXP "||" EXP -> EXP >
        EXP "-" EXP  -> EXP >
        EXP "+" EXP  -> EXP

  The sort NatCon is imported from basic/natcon
  The sort StrCon is imported from basic/strcon
  Binary operators are left-associative
  The priorities of the binary operators are a disambiguation construct: is 1 - 2 + 3 read as 1 - (2 + 3), or as (1 - 2) + 3? With - above + in the priority chain, it is read as (1 - 2) + 3.

Lexical syntax: Identifiers

    module languages/pico/syntax/identifiers
    exports
      sorts PICO-ID
      lexical syntax
        [a-z] [a-z0-9]* -> PICO-ID
      lexical restrictions
        PICO-ID -/- [a-z0-9]

  [a-z]: a character class; a PICO-ID starts with a lowercase letter
  * and +: repeat zero (*) or one (+) or more times
  A lexical restriction: is aaa three, two or one identifier? -/- can be used to define longest match

Pico types

    module languages/pico/syntax/types
    exports
      sorts TYPE
      context-free syntax
        "natural" -> TYPE {cons("natural")}
        "string"  -> TYPE {cons("string")}

  sorts TYPE: the sort of possible types in a Pico program
  The constants natural and string represent the types that can be declared in a Pico program

Summary
  The module languages/pico/syntax/pico defines (together with the imported modules) the syntax for the Pico language
  This syntax can be used to
    generate a parser that can parse Pico programs
    generate a syntax-directed editor for Pico programs
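The question about aaa raised by the lexical restriction on PICO-ID can be made concrete. The sketch below first enumerates every way to cut a string into adjacent identifiers and then keeps only the cuts in which no identifier is directly followed by a character from [a-z0-9]. It illustrates the idea of the follow restriction only; the function names are assumptions, and this is not how an SDF parser implements it.

```python
import re

IDENT = re.compile(r"[a-z][a-z0-9]*")       # [a-z] [a-z0-9]* -> PICO-ID
FOLLOW = re.compile(r"[a-z0-9]")            # PICO-ID -/- [a-z0-9]


def splits(text):
    """All ways to cut text into consecutive identifiers without layout in between."""
    if text == "":
        return [[]]
    result = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if IDENT.fullmatch(head):
            for rest in splits(text[end:]):
                result.append([head] + rest)
    return result


def respects_restriction(pieces, text):
    """True if no identifier in the cut is directly followed by a letter or digit."""
    pos = 0
    for piece in pieces:
        pos += len(piece)
        if pos < len(text) and FOLLOW.fullmatch(text[pos]):
            return False
    return True


candidates = splits("aaa")
print(candidates)
print([c for c in candidates if respects_restriction(c, "aaa")])
```

Without the restriction there are four readings (three, two or one identifier); with it only the single longest identifier survives.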

An elementary symbol is:
  Literal: "abc"
  Sort (non-terminal) names: INT
  Character classes: [a-z]: one of a, b, ..., z
    ~: complement of a character class
    /: difference of two character classes
    /\: intersection of two character classes
    \/: union of two character classes

A complex symbol is:
  Repetition:
    S*          zero or more times S
    S+          one or more times S
    {S1 S2}*    zero or more times S1 separated by S2
    {S1 S2}+    one or more times S1 separated by S2
  Optional: S? zero or one occurrences of S
  Alternative: S | T an S or a T
  Tuple: <S,T> shorthand for "<" S "," T ">"
  Parameterized sorts: S[[P1, P2]]

Productions (functions):
  General form of a production (function):

    S1 S2 ... Sn -> S0 Attributes

  Lexical syntax and context-free syntax are similar, but between the symbols of a context-free production optional layout symbols may occur in the input text.
  A context-free production is equivalent to:

    S1 LAYOUT? S2 LAYOUT? ... LAYOUT? Sn -> S0

Floating point numbers

    sorts UnsignedInt SignedInt UnsignedReal Number
    lexical syntax
      [0] | ([1-9][0-9]*)                      -> UnsignedInt
      [\+\-]? UnsignedInt                      -> SignedInt
      UnsignedInt "." [0-9]+ ([eE] SignedInt)? -> UnsignedReal
      UnsignedInt [eE] SignedInt               -> UnsignedReal
      UnsignedInt | UnsignedReal               -> Number

  Match Number:                          0  1  14  0.1  3e4  3.014e-7
  Do not match Number (leading zeros):   00  01  04.1  3e04  3.14e-07
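The lexical syntax for floating point numbers translates almost one to one into regular expressions, which makes it easy to try the example strings. The sketch below is such a translation, assuming the exponent character class is meant to be [eE]; the Python names are illustrative.

```python
import re

# One regular expression per sort, mirroring the lexical productions.
UNSIGNED_INT = r"(?:0|[1-9][0-9]*)"
SIGNED_INT = rf"(?:[+-]?{UNSIGNED_INT})"
UNSIGNED_REAL = (
    rf"(?:{UNSIGNED_INT}\.[0-9]+(?:[eE]{SIGNED_INT})?"
    rf"|{UNSIGNED_INT}[eE]{SIGNED_INT})"
)
NUMBER = re.compile(rf"(?:{UNSIGNED_REAL}|{UNSIGNED_INT})")


def is_number(text):
    """True if the whole string is a Number according to the grammar."""
    return NUMBER.fullmatch(text) is not None


for text in ["0", "1", "14", "0.1", "3e4", "3.014e-7",
             "00", "01", "04.1", "3e04", "3.14e-07"]:
    print(f"{text:10} {is_number(text)}")
```

Note that 3.014e-7 is accepted although the fraction starts with 0, because the fraction is [0-9]+ rather than an UnsignedInt, whereas the exponent of 3.14e-07 is a SignedInt and therefore may not have a leading zero.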

Various ways of constructing lists

Assume: "a" -> A

  A+           a a a a

  {A ";"}+     a
               a ; a
               a ; a; a
               a ; a; a;    (not a sentence: no trailing ";" allowed)

  (A ";")+     a ;
               a ; a;
               a ; a; a;
               a ; a; a     (not a sentence: every a must be followed by ";")

  (A ";"?)+    a
               a a
               a ; a
               a ; a;
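The difference between the four list constructs can be checked with regular expressions, since with a single literal a each construct describes a regular language. The sketch below assumes that layout is optional whitespace between symbols; the pattern table and the `accepts` helper are illustrative names.

```python
import re

LAYOUT = r"\s*"  # optional layout between symbols (an assumption for this sketch)

PATTERNS = {
    'A+': rf"a(?:{LAYOUT}a)*",
    '{A ";"}+': rf"a(?:{LAYOUT};{LAYOUT}a)*",          # ";" only between elements
    '(A ";")+': rf"a{LAYOUT};(?:{LAYOUT}a{LAYOUT};)*",  # ";" after every element
    '(A ";"?)+': rf"a(?:{LAYOUT};)?(?:{LAYOUT}a(?:{LAYOUT};)?)*",  # ";" optional
}


def accepts(construct, sentence):
    """True if the sentence belongs to the language of the given list construct."""
    return re.fullmatch(PATTERNS[construct], sentence) is not None


for construct in PATTERNS:
    for sentence in ["a", "a a", "a ; a", "a ; a;", "a ; a; a", "a ; a; a;"]:
        print(f"{construct:10} {sentence:10} {accepts(construct, sentence)}")
```

The separated list `{A ";"}+` rejects a trailing separator, the terminated list `(A ";")+` requires one, and `(A ";"?)+` accepts both forms as well as elements with no separator at all.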