Extending the TPTP Language to Higher-Order Logic with Automated Parser Generation

Allen Van Gelder (1) and Geoff Sutcliffe (2)

(1) University of California at Santa Cruz, USA, http://www.cse.ucsc.edu/~avg
(2) University of Miami, USA, geoff@cs.miami.edu

Abstract. A stable proposal for extending the first-order TPTP (Thousands of Problems for Theorem Provers) language to higher-order logic, based primarily on lambda-calculus expressions, is presented. The purpose of the system is to facilitate sharing of theorem-proving problems in higher-order logic among many researchers. Design goals are discussed. BNF2, a new specification language, is presented. Unix/Linux scripts translate the specification document into a lex scanner and a yacc parser.

1 Introduction

The goal of this work is to extend the current TPTP (Thousands of Problems for Theorem Provers) language [9] to include adequate support for higher-order logic, while continuing to recognize the existing first-order language. It was motivated by a panel discussion at the Workshop on Experimentally Successful Automated Reasoning in Higher-Order Logic, held in conjunction with LPAR-12, December 2005. The panel discussion conveyed the desire for a common language in which various researchers could express benchmark problems in higher-order logics, to be contributed to a common library along the lines of the TPTP problem library [8]. The new language developed in this project is tentatively named HOTPTP. Some of the design goals were:

1. The rules of the language should be simple and regular, so that humans can understand them without too much trouble.

2. The rules of the language should be presented in a specification document that has sufficient formality and rigor to be unambiguous, yet is not so technical and complicated that its meaning is obscured.

3. The language should be amenable to straightforward automated parser generation with established tools such as lex and yacc, or flex and bison. These tools accept LALR(1) languages. It would be undesirable to rely on tricks and extensions that might be supported in one tool and not another.

4. It should be straightforward to set up a Prolog parser for the language, using Prolog's read() procedure to accomplish most of the parsing drudgery.

The rules of the TPTP language, as released with TPTP v3.0.0, already achieved goals (1), (2), and (4) quite well. The first step of the project was to achieve goal (3). During this initial phase, a few ambiguities were discovered in the existing language, and minor revisions of the rules were implemented to remove them without changing the underlying language. Following that work, the task of extending the language to accommodate higher-order constructs began.

Briefly, the contributions arising from this project are:

1. The development of BNF2, a new variant of Backus-Naur form. BNF2 is oriented toward the modern practice of two-level syntax for programming languages and is easy for humans to read.

2. Unix/Linux scripts to translate BNF2 into input readable by lex and yacc.

3. Stable BNF2 rules that extend the TPTP language and accept a variety of higher-order logic expressions in a human-readable language.

Software and documents are at http://www.cse.ucsc.edu/~avg/tptpparser.

2 Specifications with BNF2

The TPTP language was specified in TPTP v3.0.0 using the original standard Backus-Naur form (BNF) [6], with informal explanations to get over some rough spots. In this simple and easy-to-read format, which is found in many programming-language texts, grammar symbols are enclosed in < >, and the only meta-symbols are the production symbol "::=", the alternative symbol "|", and the repetition symbol "*"; any other character sequence stands for itself and is called a self-declared token. More sophisticated variants have been proposed over the years; see Section 5.

While trying to write scripts to translate BNF into inputs for lex and yacc, it was realized that standard BNF is ill-suited for specifying tokens. The modern two-level style of programming-language specification defines tokens using regular expressions, and defines grammar symbols using context-free production rules. A lexical analyzer parses the raw input into tokens, while the production rules treat tokens as terminal symbols. This distinction is blurred in standard BNF. Another observation about the TPTP language was that some production rules went beyond specifying the form of the input and specified a list of acceptable words. This presented a conflict, in that such words became self-declared tokens. Without making the lexical analyzer context-sensitive, such words became unavailable for user identifiers.

To overcome these limitations of standard BNF we designed BNF2, a simple extension of BNF that preserves the easy-to-read format for production rules and adds distinct formats to specify semantic rules, tokens, and macros for tokens. Semantic production rules are ignored for purposes of syntactic parsing, but are available to specify more detail about the semantic content of certain sentential forms. All the extensions are implemented by using additional meta-symbols to specify the various rule types, according to the following table.

  Meta-symbol  Rule type  Examples (some are simplified from the TPTP language)
  -----------  ---------  ------------------------------------------------------
  ::=          Grammar    <TPTP input> ::= <annotated formula> | <comment>
                          <nonassoc op> ::= <=> | => | <= | <~>
                          <formula role> ::= <lower word>
  :==          Semantic   <formula role> :== axiom | conjecture | lemma | theorem | negated_conjecture
  ::-          Token      <lower word> ::- <lower><alphanum>*
  :::          Macro      <lower> ::: [a-z]
                          <alphanum> ::: [A-Za-z0-9_]

As the table shows, a symbol that has a semantic rule must also have a normal grammar rule if it appears on the right side of any normal grammar rule. The string "<=>" and the strings following it are self-declared tokens: grammar symbols must consist of alphanumerics. The right sides of token and macro rules are lex-ready regular expressions, except that "<" and ">" need to be converted to "{" and "}".

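For illustration, the token and macro rules in the table translate almost directly into lex/flex input, following the "<" and ">" to "{" and "}" conversion just described. The fragment below is a minimal sketch of that rendering; the token names LOWER_WORD and NONASSOC_IFF, and the exact layout, are illustrative rather than taken from the generated hotptp-1.lex0.

lower      [a-z]
alphanum   [A-Za-z0-9_]
%%
{lower}{alphanum}*   { return LOWER_WORD;   /* <lower word> ::- <lower><alphanum>* */ }
"<=>"                { return NONASSOC_IFF; /* a self-declared token becomes a literal pattern */ }
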
[Fig. 1. Overview of the system to evaluate a proposed HOTPTP language. Human activities are shown in italics. Starting from the language design, a specification in BNF2 is produced. The right branch runs it through the translation scripts to inputs for lex and yacc, then to yacc/lex source code (C), which is compiled into an executable parser and tested on HOL examples, the TPTP Library, and the TSTP Library. The left branch comprises human review and a hand-coded Prolog parser with semantics. Errors at any stage lead back to revision; successful testing ends at "DONE (for now)".]

3 System Description

This section describes the system that evaluates a proposed HOTPTP language, based on a BNF2 specification document produced manually. The system includes both manual and automated elements. It is not a system for processing an arbitrary BNF2 document; its main purpose is to support TPTP-related development. Figure 1 provides an overview.

The primary automated part of the system generates an executable parser from a BNF2 specification document for HOTPTP, following the right branch of the diagram. This parser is extremely simple, to ensure that the input being checked is really in the language of the specification document. The first step in generating a parser is to translate the (ASCII text) BNF2 specification document, say hotptp-bnf2.txt, into a pair of files, hotptp-1.lex0 and hotptp-1.y, which are input files for lex (or flex) and yacc (or bison). Unix/Linux scripts accomplish this translation, invoking sed, awk, grep, sort, etc. No errors are detected during this step. There is a clear correspondence between grammar rules in hotptp-bnf2.txt and hotptp-1.y. Tokens have mnemonic names and are easy to locate in hotptp-1.lex0.

The analysis and compilation of hotptp-1.y by yacc or bison is a critical step. Grammar errors and ambiguities are often located here, after the BNF2 document has passed human inspection. The default library routines are used for all procedures that are expected to be supplied by the programmer. A standard semantic action is attached to each grammar rule; it builds a naive parse tree for each sentence, which may be printed in verbose mode. A syntax error causes "syntax error" to be written on the stderr stream and a nonzero exit code; the exit code is zero upon success.

Testing against the full TPTP Problem Library requires several minutes. All files with extension ".p" should be accepted, whereas the TSTP library contains files that are known to have syntax errors. The most volatile files are the HOL examples, which use the higher-order extensions. When a syntax error or an unexpected parse tree occurs, analysis is needed to determine whether the fault lies with the formula or with the language specification. Based on the available examples, the HOTPTP language has stabilized after about ten iterations of the left branch of the system diagram.

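To make the correspondence and the conventions just described concrete, the following is a minimal sketch of what one grammar rule might look like in the generated yacc input, together with a naive parse-tree node and the error and exit-code behavior. It is an illustrative reconstruction, not an excerpt from hotptp-1.y: the node type, the mk() helper, and the token names are hypothetical, and yylex() is assumed to come from the companion flex-generated scanner.

%{
#include <stdio.h>
#include <stdlib.h>

/* Naive parse-tree node (illustrative, not the actual generated code). */
typedef struct node { const char *label; struct node *kid[3]; } node;
#define YYSTYPE node *   /* every grammar symbol's value is a tree node */

static node *mk(const char *label, node *a, node *b, node *c)
{
    node *n = calloc(1, sizeof *n);
    n->label = label; n->kid[0] = a; n->kid[1] = b; n->kid[2] = c;
    return n;
}

int  yylex(void);               /* supplied by the flex-generated scanner */
void yyerror(const char *msg);
%}

%token IFF IMPLIES IMPLIED_BY XOR   /* mnemonic token names (illustrative) */

%%

/* <nonassoc op> ::= <=> | => | <= | <~> */
nonassoc_op : IFF         { $$ = mk("<=>", NULL, NULL, NULL); }
            | IMPLIES     { $$ = mk("=>",  NULL, NULL, NULL); }
            | IMPLIED_BY  { $$ = mk("<=",  NULL, NULL, NULL); }
            | XOR         { $$ = mk("<~>", NULL, NULL, NULL); }
            ;

%%

/* yacc/bison passes the message "syntax error" to yyerror; print it on stderr. */
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }

int main(void)
{
    return yyparse() == 0 ? 0 : 1;   /* exit code 0 on success, nonzero otherwise */
}
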
4 The HOTPTP Syntax Proposal

The complete BNF2 document for the proposed HOTPTP syntax is available as hotptp-bnf2.txt at the URL given at the end of Section 1. The following example illustrates many of the features added to express higher-order constructs. TPTP follows the Prolog convention that variables begin with capital letters, and uses "?" for ∃ and "!" for ∀.

hof(1, definition,
    set_union :=
        lambda [A: $type]:
            lambda [D: ((A-> $o)-> $o), X: A]:
                ? [S: (A-> $o)]: ( (D @ S) & (S @ X) ) ).

The new operator ":=" permits a definition at the top level of a "formula" (Hudak uses a corresponding definitional notation [5]). Other new operators are "lambda" for lambda abstraction, "@" for application, and "->" for type mappings. The colon operator ":" has several new meanings, for typing and lambda expressions. Also, "^" is a synonym for lambda and ">" is a synonym for "->". We call these new operators the HOF operators.

Logical operators and HOF operators can be mixed in λ/@ expressions, subject to using parentheses as needed. Following the general principle in the TPTP language, an apply expression, using one or more binary "@" operators, must be parenthesized; however, a unary operator and its argument need not be. Note that "@" is left-associative, "->" is right-associative, and ":" is right-associative, following the usual lambda-calculus conventions. Associativities of existing TPTP operators carry over.

The lambda expression shown above is λA:Type. λD:((A→o)→o). λX:A. ∃S:(A→o). ((D @ S) ∧ (S @ X)).

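These associativities can be encoded directly with yacc precedence declarations. The fragment below is an illustrative sketch of one way to do so, not an excerpt from hotptp-1.y; the token names BASE_TYPE, APPLY, MAPS_TO, and COLON are hypothetical.

%token BASE_TYPE
%left  APPLY     /* "@"  : left-associative  */
%right MAPS_TO   /* "->" : right-associative */
%right COLON     /* ":"  : right-associative */

%%

/* The ambiguity below is resolved by %right MAPS_TO, so that
   int -> int -> int groups as int -> (int -> int).           */
type_expr : type_expr MAPS_TO type_expr
          | BASE_TYPE
          ;
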
Variables can be typed at the point where they are bound, but not elsewhere. Typing is not required. Built-in base types are $type (the set of types), $i (the set of individuals), and $o (the set of truth values). User-defined base types can be constants or functional terms. Compound types may be built from base types with "->". Constants can be typed where they occur in a formula; the syntax is (c: (int-> int)) or (c: A), etc. Other expressions cannot be typed. For example, (g(U, V) : (int-> int-> int)) is impossible (but the apply expression (g: (int-> int-> int)) @ U @ V is accepted). Operators other than the HOF operators can be treated as constants by enclosing them in parentheses, as in (&) or (~) or (=), etc. The expression ( (&) @ X @ Y ) is accepted. A first-order style functional term can appear where a λ/@ expression is needed, but λ/@/! expressions cannot appear inside a functional term. That is, p((S @ X)) is impossible, but (S @ p(X)) is accepted.

5 Related Work and Acknowledgments

Other variants of BNF were proposed before BNF2. Extended BNF (EBNF) was designed by a standards committee to have great generality, but it is quite complicated, with about a dozen meta-symbols, and does not distinguish tokens from grammar symbols. Labeled BNF (LBNF) is designed to generate parsers automatically [3] and distinguishes tokens from grammar symbols, but is even more complicated than EBNF. HOTPTP requirements were culled from Hudak's exposition of lambda calculus [5], and from descriptions of Coq [2], LF [4], and ELF [7].

We thank Chad Brown and Chris Benzmüller for contributing examples of formulas that should be expressible in HOTPTP syntax, based on their work [1].

References

1. Benzmüller, C., Brown, C.: A Structured Set of Higher-Order Problems. In: Proc. 18th Theorem Proving in Higher Order Logics (2005) 66-81
2. Felty, D., et al.: The Coq Proof Assistant. http://pauillac.inria.fr/coq
3. Forsberg, M., Ranta, A.: The Labelled BNF Grammar Formalism. Technical report, Chalmers, Gothenburg, Sweden (2005). http://www.cs.chalmers.se/~markus/bnfc
4. Harper, R., Honsell, F., Plotkin, G.: A Framework for Defining Logics. Journal of the ACM 40 (1993) 143-184
5. Hudak, P.: Conception, Evolution, and Application of Functional Programming Languages. ACM Computing Surveys 21 (1989) 359-411
6. Naur, P., et al.: Report on the Algorithmic Language ALGOL 60. Communications of the ACM 3 (1960) 299-314
7. Pfenning, F.: ELF: A Meta-Language for Deductive Systems (System Description). In: CADE (1994)
8. Sutcliffe, G., Suttner, C.: The TPTP Problem Library: CNF Release v1.2.1. Journal of Automated Reasoning 21 (1998) 177-203
9. Sutcliffe, G., Zimmer, J., Schulz, S.: TSTP Data-Exchange Formats for Automated Theorem Proving Tools. In: Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems. IOS Press (2004) 201-215