Generic Language Technology (2IS15)
Syntaxes
Prof.dr. Mark van den Brand / Faculteit Wiskunde en Informatica 13-9-2011 PAGE 1

Book
  Software Language Engineering, by Anneke Kleppe (Addison Wesley)

Definition of a (programming) language involves:
  abstract syntax, the so-called signature
  concrete syntax:
    textual syntax
    graphical syntax
  semantics:
    static semantics
    dynamic semantics

Grammar world: the 4-layer architecture
  M3  (E)BNF/SDF grammar: defines the structure of (E)BNF in (E)BNF
  M2  Java grammar: defines the structure of Java in (E)BNF
  M1  Java program: describes the manipulation (algorithm) of objects in the object layer
  M0  Object layer: the objects we wish to manipulate

Abstract syntax
  defines the basic structure of the language (skeleton)
  is the starting point for defining:
    concrete syntax
    static semantics
    dynamic semantics
  Abstract syntax is a collection of constructors/functions
  No information about keywords, priorities, associativities, etc.

Abstract syntax definition of Booleans
  true  ()           -> BoolCon
  false ()           -> BoolCon
  con   (BoolCon)    -> Bool
  and   (Bool, Bool) -> Bool
  or    (Bool, Bool) -> Bool
  not   (Bool)       -> Bool
  In each rule the name (e.g. and) is a constructor and the sorts (e.g. Bool) are nonterminals.

There is no standardized way of defining abstract syntax
  SSL (specification formalism of the Synthesizer Generator)
  Signature-like
  (Meta-modeling)

SSL (grammar specification formalism of the Synthesizer Generator) describes it as follows:
  A collection of rules that define phyla and operators
  A phylum is a nonempty set of terms
  A term is the application of a k-ary operator to k terms of the appropriate phylum
  A k-ary operator is a constructor function mapping k terms to a term
  A phylum can be considered a nonterminal
    phyl_0 : op(phyl_1 phyl_2 ... phyl_k)

SSL notation of the definition of the abstract syntax of Booleans:
  boolcon : True() | False()
  bool    : Con(boolcon) | And(bool bool) | Or(bool bool)

Signature describes it as follows:
  A collection of functions that define sorts and operators
  A sort represents a nonempty set of terms
  A term is the application of a k-ary operator to k terms of the appropriate sort
  A k-ary operator is a constructor function mapping k terms to a term
  A sort can be considered a nonterminal
    op(sort_1, sort_2, ..., sort_k) -> sort_0

Signature notation of the definition of the abstract syntax of Booleans:
  true  ()           -> BoolCon
  false ()           -> BoolCon
  con   (BoolCon)    -> Bool
  and   (Bool, Bool) -> Bool
  or    (Bool, Bool) -> Bool
  not   (Bool)       -> Bool

Given signatures it is possible to generate APIs
  Tooling for defining signatures and generating APIs:
    GOM, part of TOM (http://tom.loria.fr/wiki/index.php5/documentation:gom)
    ApiGen, part of SDF (see later)
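
To make the idea of generating an API from a signature concrete, the following is a minimal hand-written sketch in Python of what such an API for the Boolean signature could look like. It is not the output of GOM or ApiGen; the class names (TrueCon, FalseCon, Con, And, Or, Not) and the helper to_term are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Union


# Sort BoolCon: one class per constructor (hypothetical names).
@dataclass(frozen=True)
class TrueCon:
    pass


@dataclass(frozen=True)
class FalseCon:
    pass


BoolCon = Union[TrueCon, FalseCon]


# Sort Bool: one class per constructor, one field per argument sort.
@dataclass(frozen=True)
class Con:
    value: "BoolCon"


@dataclass(frozen=True)
class And:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Or:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Not:
    arg: "Bool"


Bool = Union[Con, And, Or, Not]


def to_term(node) -> str:
    """Render a tree in the prefix notation used by the signature."""
    if isinstance(node, TrueCon):
        return "true"
    if isinstance(node, FalseCon):
        return "false"
    if isinstance(node, Con):
        return f"con({to_term(node.value)})"
    if isinstance(node, And):
        return f"and({to_term(node.left)}, {to_term(node.right)})"
    if isinstance(node, Or):
        return f"or({to_term(node.left)}, {to_term(node.right)})"
    if isinstance(node, Not):
        return f"not({to_term(node.arg)})"
    raise TypeError(f"not a Boolean term: {node!r}")
```

Each sort becomes a union type and each k-ary constructor a class with k fields, so a term such as and(con(true), not(con(false))) is built as And(Con(TrueCon()), Not(Con(FalseCon()))). Note that nothing here mentions keywords, priorities or associativity, which is exactly the point of abstract syntax.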

Definition of a (programming) language involves:
  lexical syntax, the so-called tokens:
    identifiers, numbers, strings, if, then, class (keywords)
  context-free syntax, the so-called production rules:
    Statement ::= if Expression then Statements else Statements fi
  static semantics:
    identification and scope resolution
    type checking
  dynamic semantics:
    operational semantics
    interpretation
    compilation

Goal: defining languages & manipulating programs
  SDF: Syntax Definition Formalism
    lexical & context-free syntax
  ASF+SDF Meta-Environment: IDE for ASF+SDF
    manuals/documentation: www.meta-environment.org
  Spoofax/IMP: Eclipse plugin for SDF
    manuals/documentation: http://strategoxt.org/spoofax

Anatomy of SDF specifications
  module A imports B C
  module B imports D
  module C
  module D

Anatomy of an SDF module
  module ModuleName
    ImportSection*
    ExportOrHiddenSection*

  ModuleName: name of this module; may be followed by parameters
  ImportSection: names of modules imported by this module; may be followed by renamings
  ExportOrHiddenSection: grammar elements that are visible from the outside (exports) or only inside the module (hiddens)
  Grammar elements: imports, aliases, sorts, lexical syntax, context-free syntax, priorities, variables

SDF by examples
  Boolean language
  Pico language

Boolean Constants
  module basic/boolcon
  exports
    sorts BoolCon
    context-free syntax
      "true"  -> BoolCon {cons("true")}
      "false" -> BoolCon {cons("false")}

  BoolCon is the sort of Boolean constants. Sorts should always start with a capital letter.
  The constants are true and false. Literals should always be quoted.

Booleans
  module basic/booleans
  imports basic/boolcon
  exports
    sorts Boolean
    context-free syntax
      BoolCon -> Boolean {cons("con")}

  The imports section imports the Boolean constants.
  Boolean is the sort of Boolean expressions.
  Each Boolean constant is a Boolean expression; such a rule is also called an injection rule or chain rule.

Booleans (continued)
      Boolean "|" Boolean -> Boolean {cons("or"), left}
      Boolean "&" Boolean -> Boolean {cons("and"), left}
      not (Boolean)       -> Boolean {cons("not")}
      "(" Boolean ")"     -> Boolean {bracket}
    context-free priorities
      Boolean "&" Boolean -> Boolean >
      Boolean "|" Boolean -> Boolean

  The infix operators are & (and) and | (or). Both are left-associative (left).
  not is a prefix function.
  ( and ) may be used as brackets in Boolean expressions; they are ignored after parsing.
  The priorities state that & has higher priority than |.
  Example: Bool & Bool | Bool is interpreted as (Bool & Bool) | Bool
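
The effect of the left attribute, the bracket attribute and the priority of & over | can be illustrated with a small parser. The sketch below is a hand-written recursive-descent parser in Python, which is a swapped-in technique: SDF generates its parser from the grammar and applies priorities as disambiguation rules, it does not work this way internally. The function names (tokenize, parse) and the tuple representation of trees are assumptions made for this illustration; the node labels follow the cons attributes of the grammar.

```python
import re

# One token per match: a keyword, an operator or a bracket.
TOKEN = re.compile(r"\s*(true|false|not|[&|()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def primary():
        tok = peek()
        if tok in ("true", "false"):
            eat(tok)
            return ("con", tok)
        if tok == "not":
            eat("not")
            eat("(")
            inner = or_expr()
            eat(")")
            return ("not", inner)
        if tok == "(":
            eat("(")
            inner = or_expr()
            eat(")")
            return inner  # {bracket}: brackets leave no node in the tree
        raise SyntaxError(f"unexpected token {tok!r}")

    def and_expr():
        # "&" binds tighter than "|", so it is parsed one level deeper.
        node = primary()
        while peek() == "&":
            eat("&")
            node = ("and", node, primary())  # loop gives left associativity
        return node

    def or_expr():
        node = and_expr()
        while peek() == "|":
            eat("|")
            node = ("or", node, and_expr())
        return node

    tree = or_expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input starting at {peek()!r}")
    return tree
```

With this sketch, parse("true & false | true") yields an or node whose left child is the and node, matching the interpretation (Bool & Bool) | Bool, while explicit brackets as in "true & (false | true)" override the priority and then disappear from the tree.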

Booleans (continued)
  hiddens
    context-free start-symbols Boolean
    imports basic/comments

  Boolean is the start symbol of the grammar. Without a start symbol the parser does not know how to start parsing an input sentence.
  basic/comments imports the standard comments.

Summary
  Each module defines a language; in this case the language of Booleans (synonym: data type)
  We can use this language definition to
    create a (syntax-directed) editor for the Boolean language and create Boolean terms
    import it in another module; this makes the Boolean language available to the importing module

A toy language, Pico:
  Pico has two types: natural number and string
  Variables have to be declared
  Statements: assign, if-then-else, while-do
  Expressions: natural, string, +, - and ||
  + and - have natural operands; the result is natural
  || has string operands and the result is string
  Tests (if, while) should be of type natural

A Pico program
  begin
    declare input  : natural,
            output : natural,
            repnr  : natural,
            rep    : natural;
    input := 14;
    output := 1;
    while input - 1 do
      rep := output;
      repnr := input;
      while repnr - 1 do
        output := output + rep;
        repnr := repnr - 1
      od;
      input := input - 1
    od
  end

  The variable input holds the input value; the variable output holds the output value.
  What does this program compute?
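
One way to experiment with the Pico program is to transliterate it statement by statement into Python. The sketch below does that under two assumptions that the slides do not state explicitly: a test of type natural counts as true when its value is nonzero, and the input value is at least 1 (so the subtractions never go below zero). The function name pico_program and the parameter input_value are hypothetical.

```python
def pico_program(input_value: int = 14) -> int:
    """Python transliteration of the Pico program (a sketch, not a Pico interpreter)."""
    if input_value < 1:
        # Assumption: the program is only meaningful for inputs >= 1.
        raise ValueError("input_value must be at least 1")

    inp = input_value          # "input" in the Pico program
    output = 1
    while inp - 1 != 0:        # assumed: a nonzero natural means "true"
        rep = output
        repnr = inp
        while repnr - 1 != 0:  # adds rep to output (repnr - 1) times
            output = output + rep
            repnr = repnr - 1
        inp = inp - 1
    return output
```

Calling pico_program() with small arguments such as 1, 2, 3 and 4 and comparing the results is a quick way to see the pattern: the inner loop performs a multiplication by repeated addition, and the outer loop counts the input down to 1.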

What does the Pico program above compute?
  14! = 14 * 13 * ... * 1
Why is it written in this clumsy style?
  (a) Pico has no input/output statements
  (b) Pico has no multiplication operator

Defining the syntax for Pico
  Modules involved:
    languages/pico/syntax/pico
    languages/pico/syntax/identifiers
    languages/pico/syntax/types
    basic/natcon
    basic/strcon
    basic/whitespace

Pico syntax
  module languages/pico/syntax/pico
  imports
    languages/pico/syntax/identifiers
    languages/pico/syntax/types
    basic/natcon
    basic/strcon
  exports
    sorts PROGRAM DECLS ID-TYPE STATEMENT EXP
    context-free start-symbols PROGRAM
    context-free syntax
      "begin" DECLS {STATEMENT ";"}* "end" -> PROGRAM {cons("program")}
      "declare" {ID-TYPE ","}* ";"         -> DECLS {cons("decls")}
      PICO-ID ":" TYPE                     -> ID-TYPE {cons("id-type")}

      PICO-ID ":=" EXP -> STATEMENT {cons("assign")}
      "if" EXP "then" {STATEMENT ";"}* "else" {STATEMENT ";"}* "fi" -> STATEMENT {cons("cond")}
      "while" EXP "do" {STATEMENT ";"}* "od" -> STATEMENT {cons("loop")}

  The first three rules are the syntax rules for program and declarations; the last three are the syntax rules for statements.
  {STATEMENT ";"}* is a list of zero or more statements separated by ;
  * means zero or more, + means one or more

Pico syntax (continued): syntax rules for expressions
      PICO-ID -> EXP {cons("id")}
      NatCon  -> EXP {cons("nat")}
      StrCon  -> EXP {cons("str")}
      EXP "+" EXP  -> EXP {cons("plus"), left}
      EXP "-" EXP  -> EXP {cons("min"), left}
      EXP "||" EXP -> EXP {cons("conc"), left}
      "(" EXP ")"  -> EXP {bracket}
    context-free priorities
      EXP "||" EXP -> EXP >
      EXP "-" EXP  -> EXP >
      EXP "+" EXP  -> EXP

  The sort NatCon is imported from basic/natcon
  The sort StrCon is imported from basic/strcon
  Binary operators are left-associative
  The priorities of the binary operators are a disambiguation construct: they decide between 1 - (2 + 3) and (1 - 2) + 3. Since - has higher priority than +, 1 - 2 + 3 is read as (1 - 2) + 3.

Lexical syntax: Identifiers
  module languages/pico/syntax/identifiers
  exports
    sorts PICO-ID
    lexical syntax
      [a-z] [a-z0-9]* -> PICO-ID
    lexical restrictions
      PICO-ID -/- [a-z0-9]

  [a-z] is a character class: a PICO-ID starts with a lowercase letter
  Repetition: zero or more times (*) or one or more times (+)
  A lexical restriction: is aaa three, two or one identifier? -/- can be used to define longest match

Pico-Types
  module languages/pico/syntax/types
  exports
    sorts TYPE
    context-free syntax
      "natural" -> TYPE {cons("natural")}
      "string"  -> TYPE {cons("string")}

  TYPE is the sort of possible types in a Pico program
  The constants natural and string represent the types that can be declared in a Pico program

Summary
  The module languages/pico/syntax/pico defines (together with the imported modules) the syntax for the Pico language
  This syntax can be used to
    generate a parser that can parse Pico programs
    generate a syntax-directed editor for Pico programs
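
The question asked for the identifiers module (is aaa three, two or one identifier?) can be made concrete by enumerating all ways in which a string can be split into directly adjacent PICO-IDs, with and without the follow restriction PICO-ID -/- [a-z0-9]. The Python sketch below is only an illustration of the idea, not how an SDF parser implements restrictions; the function name segmentations and its parameter follow_restriction are hypothetical.

```python
import re

PICO_ID = re.compile(r"[a-z][a-z0-9]*")  # [a-z] [a-z0-9]* -> PICO-ID
FOLLOW = re.compile(r"[a-z0-9]")         # characters forbidden after a PICO-ID


def segmentations(text, follow_restriction):
    """Return every split of text into directly adjacent PICO-IDs."""
    if text == "":
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head, rest = text[:end], text[end:]
        if not PICO_ID.fullmatch(head):
            continue
        # PICO-ID -/- [a-z0-9]: reject an identifier that is directly
        # followed by a character that could have extended it.
        if follow_restriction and rest and FOLLOW.match(rest):
            continue
        for tail in segmentations(rest, follow_restriction):
            results.append([head] + tail)
    return results
```

Without the restriction, segmentations("aaa", False) returns four readings (three, two in either order, or one identifier); with the restriction, segmentations("aaa", True) leaves only the single identifier aaa, which is the longest-match behaviour the restriction is meant to define.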

An elementary symbol is:
  Literal: "abc"
  Sort (non-terminal) names: INT
  Character classes: [a-z]: one of a, b, ..., z
    ~ : complement of a character class
    / : difference of two character classes
    /\ : intersection of two character classes
    \/ : union of two character classes

A complex symbol is:
  Repetition:
    S*  zero or more times S
    S+  one or more times S
    {S1 S2}*  zero or more times S1 separated by S2
    {S1 S2}+  one or more times S1 separated by S2
  Optional: S?  zero or one occurrences of S
  Alternative: S | T  an S or a T
  Tuple: <S,T>  shorthand for "<" S "," T ">"
  Parameterized sorts: S[[P1, P2]]

Productions (functions):
  General form of a production (function):
    S1 S2 ... Sn -> S0 Attributes
  Lexical syntax and context-free syntax are similar, but between the symbols of a context-free production optional layout symbols may occur in the input text.
  A context-free production is equivalent to:
    S1 LAYOUT? S2 LAYOUT? ... LAYOUT? Sn -> S0

Floating point numbers
  sorts UnsignedInt SignedInt UnsignedReal Number
  lexical syntax
    [0] | ([1-9][0-9]*)                       -> UnsignedInt
    [\+\-]? UnsignedInt                       -> SignedInt
    UnsignedInt "." [0-9]+ ([eE] SignedInt)?  -> UnsignedReal
    UnsignedInt [eE] SignedInt                -> UnsignedReal
    UnsignedInt | UnsignedReal                -> Number

  Accepted:      0   1   14    0.1   3e4    3.014e-7
  Not accepted:  00  01  04.1  3e04  3.14e-07
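
The lexical syntax for floating point numbers can be checked against the example strings by translating each sort into a regular expression. The Python sketch below assumes that the accepted strings are exactly those that match the sort Number as a whole; the names UNSIGNED_INT, SIGNED_INT, UNSIGNED_REAL, NUMBER and is_number are hypothetical and simply mirror the sort names.

```python
import re

# [0] | ([1-9][0-9]*) -> UnsignedInt
UNSIGNED_INT = r"(?:0|[1-9][0-9]*)"
# [\+\-]? UnsignedInt -> SignedInt
SIGNED_INT = rf"(?:[+-]?{UNSIGNED_INT})"
# UnsignedInt "." [0-9]+ ([eE] SignedInt)? -> UnsignedReal
# UnsignedInt [eE] SignedInt               -> UnsignedReal
UNSIGNED_REAL = (
    rf"(?:{UNSIGNED_INT}\.[0-9]+(?:[eE]{SIGNED_INT})?"
    rf"|{UNSIGNED_INT}[eE]{SIGNED_INT})"
)
# UnsignedInt | UnsignedReal -> Number
NUMBER = re.compile(rf"(?:{UNSIGNED_INT}|{UNSIGNED_REAL})")


def is_number(text: str) -> bool:
    """True if the whole string is a Number according to the rules above."""
    return NUMBER.fullmatch(text) is not None
```

The rejected examples all fail for the same reason: UnsignedInt does not allow a leading zero, and that sort is reused for the integer part and for the exponent. The fractional part uses [0-9]+ directly, which is why 3.014e-7 is fine while 3.14e-07 is not.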

Various ways of constructing lists
  Assume: "a" -> A
  A+          one or more A's:                                   a      a a      a a a
  {A ";"}+    one or more A's separated by ";":                  a      a ; a    a ; a ; a
  (A ";")+    one or more A's, each followed by ";":             a ;    a ; a ;
  (A ";"?)+   one or more A's, each optionally followed by ";":  a      a a      a ; a    a ; a ;
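
The difference between the four list constructs is easiest to see by testing sentences against them. The Python sketch below approximates each construct with a regular expression over the sentence with all layout removed, which is a simplification chosen for this illustration; the names PATTERNS and accepts are hypothetical.

```python
import re

# Regular-expression approximations of the four SDF list constructs,
# applied to a sentence from which all whitespace has been removed.
PATTERNS = {
    'A+': r"a+",              # one or more A's
    '{A ";"}+': r"a(?:;a)*",  # separated by ";" (no trailing ";")
    '(A ";")+': r"(?:a;)+",   # each A terminated by ";"
    '(A ";"?)+': r"(?:a;?)+", # each A optionally followed by ";"
}


def accepts(construct: str, sentence: str) -> bool:
    """True if the sentence is in the language of the given list construct."""
    compact = "".join(sentence.split())
    return re.fullmatch(PATTERNS[construct], compact) is not None
```

For example, accepts('{A ";"}+', "a ; a") holds but accepts('{A ";"}+', "a ; a ;") does not, because a separator may only occur between two elements, whereas (A ";")+ requires the trailing semicolon and (A ";"?)+ allows both forms.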