Appendix A: Syntax Diagrams

Similar documents
A simple syntax-directed

2.2 Syntax Definition

UNIT -2 LEXICAL ANALYSIS

Part 5 Program Analysis Principles and Techniques

Syntax Analysis. Chapter 4

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

This book is licensed under a Creative Commons Attribution 3.0 License

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

The Structure of a Syntax-Directed Compiler

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Chapter 2 Basic Elements of C++

Formats of Translated Programs

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end.

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

printf( Please enter another number: ); scanf( %d, &num2);

Context-Free Grammars

Full file at

CSCI312 Principles of Programming Languages!

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CS143 Midterm Fall 2008

1 Lexical Considerations

CS 6353 Compiler Construction Project Assignments

Introduction to Lexical Analysis

ML 4 A Lexer for OCaml s Type System

Compiler Design (40-414)

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Considerations

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CPS 506 Comparative Programming Languages. Syntax Specification

The PCAT Programming Language Reference Manual

Lexical Analysis. Chapter 2

CS164: Midterm I. Fall 2003

Lecture 4: Syntax Specification

CS101 Introduction to Programming Languages and Compilers

Grammars and Parsing, second week

Principles of Programming Languages COMP251: Syntax and Grammars

Shorthand for values: variables

Lexical and Syntax Analysis

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

A programming language requires two major definitions A simple one pass compiler

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

1. Lexical Analysis Phase

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Project 1: Scheme Pretty-Printer

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

2 Review of Set Theory

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

What do Compilers Produce?

5. Control Statements

Programming Languages Third Edition. Chapter 7 Basic Semantics

Lexical Analysis (ASU Ch 3, Fig 3.1)

Introduction to Parsing Ambiguity and Syntax Errors

ARG! Language Reference Manual

The Structure of a Syntax-Directed Compiler

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntax. A. Bellaachia Page: 1

Introduction to Parsing Ambiguity and Syntax Errors

Languages and Compilers

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Syntax and Grammars 1 / 21

Compiler Construction

Consider a description of arithmetic. It includes two equations that define the structural types of digit and operator:

4. Lexical and Syntax Analysis

Parsing and Pattern Recognition

Chapter 4. Lexical and Syntax Analysis

Lexical Considerations

COP 3402 Systems Software Syntax Analysis (Parser)

Einführung in die Programmierung Introduction to Programming

Formal Languages and Compilers Lecture VI: Lexical Analysis

Variables and Constants

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

Optimizing Finite Automata

The Structure of a Compiler

Homework 1 Due Tuesday, January 30, 2018 at 8pm

Lexical Analysis. Lecture 2-4

Text Input and Conditionals

The Structure of a Syntax-Directed Compiler

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

4. Lexical and Syntax Analysis

COMPILER DESIGN LECTURE NOTES

Problem: Read in characters and group them into tokens (words). Produce a program listing. Do it efficiently.

Grammars and Parsing. Paul Klint. Grammars and Parsing

CSE 3302 Programming Languages Lecture 2: Syntax

A Simple Syntax-Directed Translator

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

KU Compilerbau - Programming Assignment

Week 2: Syntax Specification, Grammars

Petros: A Multi-purpose Text File Manipulation Language

Transcription:

A. Syntax Diagrams A-1 Appendix A: Syntax Diagrams References: Kathleen Jensen/Niklaus Wirth: PASCAL User Manual and Report, 4th Edition. Springer, 1991. Niklaus Wirth: Compilerbau (in German). Teubner, 1986. Oracle8 SQL Reference, Oracle Corporation, 1997, Part No. A58225-01. Appendix A is a short introduction to syntax diagrams. Don Chamberlin: A Complete Guide to DB2 Universal Database. Morgan Kaufmann, 1998. Section 1.1.2 is a very quick introduction to syntax diagrams.

A. Syntax Diagrams A-2 Overview 1. General Remarks about Syntax Formalisms 2. Syntax Diagrams

A. Syntax Diagrams A-3 Syntax Formalisms (1) An SQL query is a sequence of ASCII characters, but not every sequence of ASCII characters is a valid SQL query. This is not quite precise: (1) The ASCII encoding is not required. The Standard defines an SQL character set that includes letters a-za-z, digits, the space, and these special characters: "%& ()*+,-./:;<=>?_. (2) Extended character sets (with national characters) can be used. There are various formalisms for specifying clearly and unambiguously the subset of all character sequences which are legal in a language like SQL. Other character sequences are said to be syntactically incorrect / to contain syntax errors.

A. Syntax Diagrams A-4 Syntax Formalisms (2) The most important syntax formalisms are: Regular Expressions Context-Free Grammars (CFGs) Syntax Diagrams Syntax Diagrams are equivalent in their expressive power to context-free grammars. Both are more powerful than regular expressions. I.e. every language which can be defined by a regular expression can also be defined by a CFG/syntax diagram.

A. Syntax Diagrams A-5 Syntax Formalisms (3) In order to master a language like SQL, one of course need examples, but one also should to be able to read a formal specification of the language. This can be used as a reference for doubtful cases, but it will also improve the understanding of the language. E.g. syntactic categories introduced in the formal definition are often useful concepts. If one has only seen n examples and learnt nothing more general, all that one can really do are these n examples. The Oracle SQL Reference Manual as well as many other database books contain syntax diagrams. The standard uses a context free grammar.

A. Syntax Diagrams A-6 Lexical Syntax (1) The syntax of languages like SQL or Pascal is normally specified in two steps. One first defines how words (formally called tokens or lexical symbols) can be composed from single characters. This part of the definition is the lexical syntax of the language. Then the overall syntax of the language is defined in terms of these tokens. This two-step approach reduces the complexity.

A. Syntax Diagrams A-7 Lexical Syntax (2) In case of SQL, such words/tokens are, e.g. Reserved words / keywords with a specific meaning, e.g. SELECT. Identifiers used e.g. for table and column names, e.g. STUDENTS. Datatype constants/literals, e.g. DBMS or 123. Comparison and datatype operators, e.g. =, <=, +,. Punctation characters, e.g. parentheses and the comma.

A. Syntax Diagrams A-8 Lexical Syntax (3) Simpler formalisms such as regular expressions can be used for defining the lexical syntax. In this course, syntax diagrams are used for the lexical syntax, too. However, when implementing an SQL parser, one would use a tool like lex (based on regular expressions) for the lexical syntax, because it runs faster than a more powerful tool like yacc that is required for the overall syntax. Since there are much more characters than tokens, it makes sense to condensate the input first with a fast tool. Free-format languages like SQL or Pascal allow white space (blanks, line breaks, etc.) and comments between tokens. The lexical analysis phase removes white space (including coments). The overall syntax analysis sees a sequence of tokens as input.

A. Syntax Diagrams A-9 Overview 1. General Remarks about Syntax Formalisms 2. Syntax Diagrams

A. Syntax Diagrams A-10 Syntax Diagrams (1) Digit:... 0 1 2 3 4 8 9 A syntax diagram consists of: A name, in this case Digit. A start (node): Arrow that enters the diagram. A finish (node): Arrow that leaves the diagram. Oval and rectangular boxes connected by directed edges.

A. Syntax Diagrams A-11 Syntax Diagrams (2) The formal language defined by this diagram is very simple, it is the set of digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. In order to check whether an input, e.g. 2, really belongs to the language Digit defined by this diagram, one must trace a path through the diagram that corresponds to the input: 0 1 2 3 4... 8 9

A. Syntax Diagrams A-12 Syntax Diagrams (3) Such diagrams can be used in two ways: In order to generate legal words of the language Digit, one follows a path through the diagram from start to finish and prints every symbol in each oval box through which one passes. In order to determine that a given input is syntactically correct, one must find a path through the diagram such that whenever one reaches an oval, the character in the oval is the next input character, which is then read away.

A. Syntax Diagrams A-13 Syntax Diagrams (4) Every edge has only one possible direction. Sometimes it might be a bit difficult for the beginner to decide on the direction of an edge. If all directions are made explicit, the diagram looks as follows:... 0 1 2 3 4 8 9 Since this looks complicated, the arrow heads are normally not repeated on every segment of an edge.

A. Syntax Diagrams A-14 Syntax Diagrams (5) There are branching points in the diagram where one can choose different paths. E.g. after following the incoming arrow, one can choose to go down and print/read the digit 0 or go to the right for digits 1 to 9. When analyzing a given input for correctness, looking at the next input symbol will often determine a unique path that must be chosen. One tries to construct syntax diagrams such that at every branching, the ovals that can be reached in the two directions contain disjoint symbols. If one cannot find a path, the input is wrong.

A. Syntax Diagrams A-15 Syntax Diagrams (6) Syntax diagrams can contain cycles (e.g. there are infinitely many different possible SQL queries). Digit Sequence:... 0 1 8 9 Exercise: Find a path to show that 81 is correct. One can pass through the same edge several times.

A. Syntax Diagrams A-16 Syntax Diagrams (7) Previously defined syntax diagrams can be used as modules in larger ones. This is symbolized by drawing a rectangular box with the name of the syntax diagram in it. Digit Sequence: Digit I.e. a rectangular box stands for a syntactic variable or category (like verb and object ).

A. Syntax Diagrams A-17 Syntax Diagrams (8) A box can be replaced by the diagram it stands for. A box has one incoming and one outgoing edge, as has the diagram. So one can simply plug in the diagram for the box. Digit Sequence: Digit... 0 1 8 9

A. Syntax Diagrams A-18 Syntax Diagrams (9) One can view the boxes also like procedures: When a box is entered, one go to the corresponding diagram, finds a path through it, and then returns back to the original diagram where the arrow leaves the box. When syntax diagrams are used for analyzing input, each such procedure reads part of the input. E.g. when digit is called, it expects to see in the input a digit, and it reads this away. Whatever comes after it remains in the input to be analyzed by other procedures.

A. Syntax Diagrams A-19 Syntax Diagrams (10) Of course, after some time, one knows what e.g. Digit stands for, and does not have to look up the syntax diagram explicitly. Each diagram defines a formal language (set of character strings). When passing through a box, one can read/print any element of the language defined by the corresponding syntax diagram.

A. Syntax Diagrams A-20 Syntax Diagrams (11) Ovals and boxes can be freely mixed in one diagram. Integer: - Digit Sequence The diagram Integer describes a language containing, e.g., 123, -45, 007. It does not contain, e.g., +89, --5, 23-42, 0.56.

A. Syntax Diagrams A-21 Syntax Diagrams (12) Syntax diagrams can be defined recursively, e.g. the diagram for X can contain the a box labelled X. Mutual recursion is also allowed. With recursion, one cannot expand all boxes beforehand, but one can expand them on demand (whenever the box is entered). Term: Integer Term + Integer ( Term )

A. Syntax Diagrams A-22 Syntax Diagrams (13) When defining the lexical syntax (possible words or tokens), single characters appear in the ovals. Later, words/tokens like SELECT are used as the basic building blocks: SELECT Goal-List FROM Source-List WHERE Condition

A. Syntax Diagrams A-23 Syntax Diagrams (14) In the Oracle SQL Manual, boxes are used for literal symbols, and ovals for syntactic variables. I followed the original notation, as e.g. in the Pascal reference. In the book by Chamerlin about DB2, a more compact notation is used. Ovals are used for syntactic variables that are defined by a syntax diagram, uppercase (without any box) is used for key words that must appear as written, lowercase (without box) is used for token types such as column name. Furthermore, diagrams can extend over multiple lines without explicit backward arrow (Special symbols mark the start and the end of a diagram. If an arrow simply leaves the diagram on the right side, continue in the next line). Finally, a special notation is used for options that can be specified in any order.

A. Syntax Diagrams A-24 Exercise Define syntax diagrams for the command language of a text advanture game. Typical commands consist of a verb and an object, e.g. take lamp. Verbs are e.g. take, drop, examine, use. Objects are e.g. lamp, sword, rope. One can optionally use an article: take the lamp. A verb can be used with multiple objects, separated by and : take the lamp and the rope. This stands for take lamp, take rope.