Parser Generation. Prof. Dr. Ralf Lämmel Universität Koblenz-Landau Software Languages Team. Mainly with applications to! language processing

Similar documents
Software Language Processing

Parsing. Prof. Dr. Ralf Lämmel Universität Koblenz-Landau Software Languages Team

SML-SYNTAX-LANGUAGE INTERPRETER IN JAVA. Jiahao Yuan Supervisor: Dr. Vijay Gehlot

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Building Compilers with Phoenix

CPS 506 Comparative Programming Languages. Syntax Specification

Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

LECTURE 3. Compiler Phases

Functional OO Programming. Prof. Dr. Ralf Lämmel Universität Koblenz-Landau Software Languages Team

Install and Configure ANTLR 4 on Eclipse and Ubuntu

Lecture 14 Sections Mon, Mar 2, 2009

Syntax. In Text: Chapter 3

Syntax Analysis MIF08. Laure Gonnord

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

JavaCC: SimpleExamples

PL Revision overview

Introduction to Compiler Design

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

A programming language requires two major definitions A simple one pass compiler

Building Compilers with Phoenix

A simple syntax-directed

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Design and Implementation of a Java to CodeML converter

Compiler Design (40-414)

Syntax-Directed Translation. Lecture 14

COSE312: Compilers. Lecture 1 Overview of Compilers

Programming Language Syntax and Analysis

The design of the PowerTools engine. The basics

Parsing a primer. Ralf Lämmel Software Languages Team University of Koblenz-Landau

CS 230 Programming Languages

Abstract Syntax Trees Synthetic and Inherited Attributes

CSE302: Compiler Design

A Simple Syntax-Directed Translator

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Semantic Analysis. Lecture 9. February 7, 2018

Tree visitors. Context classes. CS664 Compiler Theory and Design LIU 1 of 11. Christopher League* 24 February 2016

Semantic Analysis Attribute Grammars

Programming Languages. Dr. Philip Cannata 1

4. Semantic Processing and Attributed Grammars

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

Lexical and Syntax Analysis

Lecture 12: Parser-Generating Tools

Software Tools ANTLR

Compiler Construction

Time : 1 Hour Max Marks : 30

Lexing, Parsing. Laure Gonnord sept Master 1, ENS de Lyon

Introduction to Compiler

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

CS 230 Programming Languages

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

CSCI312 Principles of Programming Languages!

Programming Languages and Compilers (CS 421)

Writing Evaluators MIF08. Laure Gonnord

CS 3304 Comparative Languages. Lecture 1: Introduction

Grammar Convergence. Ralf Lämmel and Vadim Zaytsev Software Languages Team Universität Koblenz-Landau. 13 февраля 2009 г.

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers

Index. Comma-separated values (CSV), 30 Conditional expression, 16 Continuous integration (CI), 34 35, 41 Conway s Law, 75 CurrencyPair class,

Syntax Errors; Static Semantics

Syntax. A. Bellaachia Page: 1

Chapter 3. Describing Syntax and Semantics ISBN

Lexical Analysis. Introduction

Compiler Construction

11. a b c d e. 12. a b c d e. 13. a b c d e. 14. a b c d e. 15. a b c d e

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Syntax Intro and Overview. Syntax

Writing a Simple DSL Compiler with Delphi. Primož Gabrijelčič / primoz.gabrijelcic.org

Compilation: a bit about the target architecture

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

Better Extensibility through Modular Syntax. Robert Grimm New York University

Error Handling Syntax-Directed Translation Recursive Descent Parsing

ECE251 Midterm practice questions, Fall 2010

CMSC 330: Organization of Programming Languages. Context Free Grammars


EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Little Language [Grand]

Describing Syntax and Semantics

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

JavaCUP. There are also many parser generators written in Java

COP 3402 Systems Software Syntax Analysis (Parser)

CS /534 Compiler Construction University of Massachusetts Lowell

Software Tools ANTLR

Programming Languages. Dr. Philip Cannata 1

Extending xcom. Chapter Overview of xcom

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compiler construction in4303 lecture 3

1 Lexical Considerations

Test I Solutions MASSACHUSETTS INSTITUTE OF TECHNOLOGY Spring Department of Electrical Engineering and Computer Science

of Grammarware by Grammar Transformation

Introduction to Syntax Analysis Recursive-Descent Parsing

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

Principles of Programming Languages COMP251: Syntax and Grammars

Transcription:

Parser Generation Prof. Dr. Ralf Lämmel Universität Koblenz-Landau Software Languages Team Mainly with applications to! language processing

Program generation Specification Program generator Program

Parser generation Grammar! with some! code Program generator Parser

Generative programming Generative programming is a style of computer programming that uses automated source code creation through generic frames, classes, prototypes, templates, aspects, and code generators to improve programmer productivity... It is often related to codereuse topics such as component-based software engineering and product family engineering. Retrieved from Wikipedia! on 3 May, 2011.

Language processing patterns 1. The Chopper Pattern! 2. The Lexer Pattern! 3. The Copy/Replace Pattern! 4. The Acceptor Pattern! 5. The Parser Pattern! 6. The Lexer Generation Pattern! 7. The Acceptor Generation Pattern! 8. The Parser Generation Pattern! 9. The Text-to-object Pattern! 10.The Parser Generation 2 Pattern! 11.(The Text-to-tree Pattern)! 12.(The Tree-walk Pattern)! 13.The Object-to-text Pattern manual patterns generative patterns

See! the Lexer Pattern! for comparison The Lexer Generation Pattern

The Lexer Generation Pattern Intent: Analyze text at the lexical level. Use token-level grammar as input.! Operational steps (compile time):! 1.Generate code from the lexer grammar.! 2.Write boilerplate code to drive the lexer.! 3.Write code to process token/lexeme stream.! 4.Build generated and hand-written code.

Token-level grammar for 101 COMPANY : 'company'; DEPARTMENT : 'department'; EMPLOYEE : 'employee'; MANAGER : 'manager'; ADDRESS : 'address'; SALARY : 'salary'; OPEN : '{'; CLOSE : '}'; STRING : '"' (~'"')* '"'; FLOAT : ('0'..'9')+ ('.' ('0'..'9')+)?; WS : (' ' '\r'? '\n' '\t')+;

Lexer generation...g ANTLR...java

ANTLR ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting.!! http://www.antlr.org/

Boilerplate code for lexer execution FileInputStream stream = new FileInputStream(file); ANTLRInputStream antlr = new ANTLRInputStream(stream); MyLexer lexer = new MyLexer(antlr); Token token; while ((token = lexer.nexttoken())!=token.eof_token) {... Grammarderived class }

Demo http://101companies.org/wiki/ Contribution:antlrLexer

Summary of implementation aspects Define regular expressions for all tokens.! Generate code for lexer.! Set up input and token/lexeme stream.! Implement operations by iteration over pairs.! Build generated and hand-written code.

See! the Acceptor Pattern! for comparison The Acceptor Generation Pattern

The Acceptor Generation Pattern Intent: Verify syntactical correctness of input. Use context-free grammar as input.! Operational steps (compile time):! 1.Generate code from the context-free grammar.! 2.Write boilerplate code to drive the acceptor (parser).! 3.Build generated and hand-written code.

Context-free grammar for 101 company : 'company' STRING '{' department* '}' EOF; department : 'department' STRING '{' ('manager' employee) ('employee' employee)* department* '}';! employee : STRING '{' 'address' STRING 'salary' FLOAT '}'; Wanted: a generated acceptor

Parser generation...lexer.java.g ANTLR...Parser.java

Driver for acceptor import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } } This is basically boilerplate code. Copy and Paste!

Driver for acceptor Prepare System.in as ANTLR input stream import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } }

Driver for acceptor import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } } Set up lexer! for language

Driver for acceptor import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } } Obtain token stream from lexer

Driver for acceptor import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } } Set up parser and feed with token stream of lexer

Driver for acceptor import org.antlr.runtime.*;! public class Test { public static void main(string[] args) throws Exception { ANTLRInputStream input = new ANTLRInputStream(System.in); SetyLexer lexer = new...lexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer);...Parser parser = new...parser(tokens); parser...(); } } Invoke start symbol of parser

Demo http://101companies.org/wiki/ Contribution:antlrAcceptor

Summary of implementation aspects Define regular expressions for all tokens.! Define context-free productions for all nonterminals.! Generate code for acceptor (parser).! Set up input and token/lexeme stream.! Feed parser with token stream.! Acceptor (parser) execution = call start symbol.! Build generated and hand-written code.

See! the Parser Pattern! for comparison The Parser Generation Pattern

The Parser Generation Pattern Intent: Enable structure-aware functionality. Use context-free grammar as the primary input.! Operational steps (compile time):! 1.Include semantic actions into context-free grammar.! 2.Generate code from enhanced grammar.! 3.Write boilerplate code to drive the parser.! 4.Build generated and hand-written code.

A grammar rule with a semantic action employee : STRING '{' 'address' STRING 'salary' s=float { total += Double.parseDouble($s.text); } '}'; Semantic actions are enclosed between braces. Variable local to parser execution Retrieve and parse lexeme for the number

Demo http://101companies.org/wiki/ Contribution:antlrParser

Parser descriptions can also use synthesized attributes. Add synthesized attributes to nonterminals. Add semantic actions to productions.! Compute an int expr returns [int value] value : e=multexpr {$value = $e.value;} ( '+' e=multexpr {$value += $e.value;} '-' e=multexpr {$value -= $e.value;} )*;! Perform addition & Co. (Context-free part in bold face.)

Use parsing! for AST construction The Text-to-object Pattern

The Text-to-object Pattern Intent: Enable structure-aware OO programs. Build AST objects along parsing. Operational steps (compile time):! 1.Define object model for AST.! 2.Instantiate Parser Generation pattern as follows: Use semantic actions for AST construction. AST Abstract Syntax Tree

Acceptor---for comparison company : 'company' STRING '{' department* '}' EOF;

With semantic actions for object construction included company returns [Company c]: { $c = new Company(); } Construct company object. 'company' STRING { $c.setname($string.text); } '{' Set name of company. ( topdept= department { $c.getdepts().add($topdept.d); } )* '}' ; Add all departments.

Demo http://101companies.org/wiki/ Contribution:antlrObjects

See! the Parser Pattern! and! the Parser Generation Pattern! for comparison The Parser Generation 2 Pattern! (That is, the Parser Generation Generation Pattern.)

The Parser Generation 2 Pattern Intent: Enable structure-aware OO programs. Build AST objects along parsing. Use idomatic context-free grammar as the only input.! Operational steps (compile time):! 1.Author idiomatic context-free grammar.! 2.Generate object model from context-free grammar.! 3.Generate enhanced grammar for AST construction.! 4.Generate parser and continue as before.

More declarative parser generation...g idiomatic! grammar Program generator...java object model

Idiomatic grammar for 101 Company :...;!! Department :! 'department' dname=qqstring '{'!! 'manager' manager=employee!!! subdepartments=department*!!! employees=nonmanager*!!'}';!! NonManager :...;! Employee :...; Leverage predefined token-level grammar Label all nonterminal occurrences

Demo http://101companies.org/wiki/ Contribution:yapg

Bootstrapping 1.Develop ANTLR-based parser for idiomatic grammars.! 2.Develop object model for ASTs for idiomatic grammars.! 3.Develop ANTLR generator.! 4.Develop object model generator.! 5.Define idiomatic grammar of idiomatic grammars.! 6.Generate grammar parser from said grammar.! 7.Generate object model from said grammar.!

The grammar of grammars Grammar!! : prods=production*;! Production! : lhs=id ':' rhs=expression ';';! Expression!: Sequence Choice;! Sequence!! : list=atom*;! Choice!!! : first=id rest=option*;! Atom!!!! : Terminal Nonterminal Many;! Terminal!! : symbol=qstring;! Nonterminal!: label=id '=' symbol=id;! Option!!! : ' ' symbol=id;! Many!!! : elem=nonterminal '*';

Use trees rather than POJOs! for AST construction! along parsing The Text-to-tree Pattern Mentioned in passing

The Text-to-tree Pattern Intent: Advanced stuff! Enable structure-aware grammar-based programs. Build AST trees along parsing. Use grammar with tree construction actions as input.! Operational steps (compile time):! 1.Author grammar with tree construction actions.! 2.Generate parser.

Tree construction with ANTLR department :! 'department' name=string '{'! manager! ('employee' employee)*! department*! '}'! -> ^(DEPT $name manager employee* department*)! ; ANTLR s special semantic actions for tree construction

Demo http://101companies.org/wiki/ Contribution:antlrTrees

The Tree-walk Pattern Mentioned in passing

The Tree-walk Pattern Intent: Advanced stuff! Operate on trees in a grammar-based manner. Productions describe tree structure. Semantic actions describe actions along tree walking.! Operational steps (compile time):! 1.Author tree grammar (as opposed to CFG).! 2.Add semantic actions.! 3.Generate and build tree walker.

Total: Tree walking with ANTLR Matching trees! rather than parsing text. Semantic actions as before. company :! ^(COMPANY STRING dept*)! ;!! dept :! ^(DEPT STRING manager employee* dept*)! ;!! manager :! ^(MANAGER employee)! ;!! employee :! ^(EMPLOYEE STRING STRING FLOAT)! { total += Double.parseDouble($FLOAT.text); }! ;

Cut: Tree walking with ANTLR Tree walking strategy Match trees Rebuild tree topdown : employee;!! employee :! ^(EMPLOYEE STRING STRING s=float)! -> ^(EMPLOYEE! STRING! STRING! FLOAT[! Double.toString(! Double.parseDouble(! $s.text) / 2.0d)])! ;

Demo http://101companies.org/wiki/ Contribution:antlrTrees

Summary Language processing is a programming domain.! Grammars are program specifications in that domain.! Those specifications may be enhanced semantically.! Those specifications may also be constrained idiomatically.! Manual implementation of such specifications is not productive.! Parser generators automatically implement such specifications.! Parser generation is an instance of generative programming.

A development cycle for language-based tools 1.Author language samples.! Some bits of! software engineering 2.Complete samples into test cases.! 3.Author the EBNF for the language.! 4.Refine EBNF into acceptor (using generation or not).! 5.Build and test the acceptor.! 6.Extend the acceptor with semantic actions.! 7.Build and test the language processor.