Programming Language Syntax and Analysis

Similar documents
Chapter 3. Describing Syntax and Semantics ISBN

Chapter 4. Lexical and Syntax Analysis

Syntax. In Text: Chapter 3

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Describing Syntax and Semantics

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

ICOM 4036 Spring 2004

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Chapter 3. Describing Syntax and Semantics

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics

Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

Chapter 3. Describing Syntax and Semantics

Introduction. Introduction. Introduction. Lexical Analysis. Lexical Analysis 4/2/2019. Chapter 4. Lexical and Syntax Analysis.

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

4. LEXICAL AND SYNTAX ANALYSIS

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical and Syntax Analysis

Chapter 3: Syntax and Semantics. Syntax and Semantics. Syntax Definitions. Matt Evett Dept. Computer Science Eastern Michigan University 1999

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Lexical and Syntax Analysis (2)

Programming Language Definition. Regular Expressions

Syntax and Semantics

CS 230 Programming Languages

Habanero Extreme Scale Software Research Project

3. DESCRIBING SYNTAX AND SEMANTICS

Syntax. A. Bellaachia Page: 1

CPS 506 Comparative Programming Languages. Syntax Specification

Homework & Announcements

Chapter 3. Topics. Languages. Formal Definition of Languages. BNF and Context-Free Grammars. Grammar 2/4/2019

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Programming Languages

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

Chapter 3. Describing Syntax and Semantics. Introduction. Why and How. Syntax Overview

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

MIDTERM EXAM (Solutions)

Context-free grammars

CMSC 330: Organization of Programming Languages

Parsing II Top-down parsing. Comp 412

Lexical Scanning COMP360

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Dr. D.M. Akbar Hussain

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Introduction to Lexing and Parsing

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Context-free grammars (CFG s)

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

EECS 6083 Intro to Parsing Context Free Grammars

Chapter 4: LR Parsing

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Compiler Construction: Parsing

CSE 3302 Programming Languages Lecture 2: Syntax

Introduction to Parsing. Comp 412

CMSC 330: Organization of Programming Languages

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Wednesday, August 31, Parsers

Theory and Compiling COMP360

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Monday, September 13, Parsers

Software II: Principles of Programming Languages

CMSC 330: Organization of Programming Languages

Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

CMSC 330: Organization of Programming Languages. Context Free Grammars

UNIT III & IV. Bottom up parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Formal Languages. Formal Languages

CSE302: Compiler Design

P L. rogramming anguages. Fall COS 301 Programming Languages. Syntax & Semantics. UMaine School of Computing and Information Science

Syntax & Semantics. COS 301 Programming Languages


Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Chapter 2 :: Programming Language Syntax

CMSC 330: Organization of Programming Languages. Context Free Grammars

Part 5 Program Analysis Principles and Techniques

CS 406/534 Compiler Construction Parsing Part I

Languages and Compilers

Bottom-up Parser. Jungsik Choi

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Transcription:

Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University

Introduction Syntax the form or structure of the expressions, statements, and program units Semantics the meaning of the expressions, statements, and program units Syntax and semantics provide a language s definition Users of a language definition Other language designers Implementers Programmers (the users of the language) 2

The General Problem of Describing Syntax: Terminology Sentence a string of characters over some alphabet Language a set of sentences Lexeme the lowest level syntactic unit of a language (e.g., *, sum, begin) Token a category of lexemes (e.g., identifier) 3

Formal Definition of Languages Recognizers A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language Example: syntax analysis part of a compiler - Detailed discussion of syntax analysis appears in Chapter 4 Generators A device that generates sentences of a language One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator 4

BNF and Context-Free Grammars Context-Free Grammars Developed by Noam Chomsky in the mid-1950s Language generators, meant to describe the syntax of natural languages Define a class of languages called context-free languages Backus-Naur Form (1959) Invented by John Backus to describe the syntax of Algol 58 BNF is equivalent to context-free grammars 5

BNF Fundamentals In BNF, abstractions used to represent classes of syntactic structures syntactic variables nonterminal symbols, terminals Terminals lexemes or tokens A syntax rule a left-hand side (LHS), which is a nonterminal a right-hand side (RHS), which is a string of terminals and/or nonterminals 6

BNF Fundamentals (continued) Nonterminals are often enclosed in angle brackets Examples of BNF rules: <ident_list> identifier identifier, <ident_list> <if_stmt> if <logic_expr> then <stmt> Grammar a finite non-empty set of rules A start symbol a special element of the nonterminals of a grammar 7

Nonterminal symbol can have more than one RHS BNF Rules <stmt> <single_stmt> begin <stmt_list> end 8

Syntactic lists described using recursion Describing Lists <ident_list> ident ident, <ident_list> A derivation is a repeated application of rules starting with the start symbol ending with a sentence (all terminal symbols) 9

An Example Grammar <program> <stmts> <stmts> <stmt> <stmt> ; <stmts> <stmt> <var> = <expr> <var> a b c d <expr> <term> + <term> <term> - <term> <term> <var> const 10

An Example Derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const 11

Derivations Every string of symbols in a derivation is a sentential form A sentence is a sentential form that has only terminal symbols A leftmost derivation one in which the leftmost nonterminal in each sentential form is the one that is expanded A derivation may be neither leftmost nor rightmost 12

Parse Tree A hierarchical representation of a derivation <program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b 13

Extended BNF Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> <term> (+ -) const Repetitions (0 or more) are placed inside braces { } <ident> letter {letter digit} 14

BNF and EBNF BNF <expr> <expr> + <term> <expr> - <term> <term> <term> <term> * <factor> <term> / <factor> <factor> EBNF <expr> <term> {(+ -) <term>} <term> <factor> {(* /) <factor>} 15

Chap. 4 : Syntax Analysis

Syntax Analysis The syntax analysis portion of a language processor nearly always consists of two parts: A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar) A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF) 17

Advantages of Using BNF to Describe Syntax Provides a clear and concise syntax description The parser can be based directly on the BNF Parsers based on BNF are easy to maintain 18

Reasons to Separate Lexical and Syntax Analysis Simplicity less complex approaches can be used for lexical analysis; separating them simplifies the parser Efficiency separation allows optimization of the lexical analyzer Portability parts of the lexical analyzer may not be portable, but the parser always is portable 19

Lexical Analysis A lexical analyzer is a pattern matcher for character strings front-end for the parser Identifies substrings of the source program that belong together - lexemes Lexemes match a character pattern, which is associated with a lexical category called a token sum is a lexeme; its token may be IDENT 20

Lexical Analysis (continued) The lexical analyzer usually a function that is called by the parser when it needs the next token Three approaches to building a lexical analyzer: Write a formal description of the tokens and use a software tool that constructs a tabledriven lexical analyzer from such a description Design a state diagram that describes the tokens and write a program that implements the state diagram Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram 21

A naïve state diagram State Diagram Design have a transition from every state on every character in the source language such a diagram would be very large! 22

23

Implementation: SHOW front.c (pp. 185-190) Following is the output of the lexical analyzer of front.c when used on (sum + 47) / total Next token is: 25 Next lexeme is ( Next token is: 11 Next lexeme is sum Next token is: 21 Next lexeme is + Next token is: 10 Next lexeme is 47 Next token is: 26 Next lexeme is ) Next token is: 24 Next lexeme is / Next token is: 11 Next lexeme is total Next token is: -1 Next lexeme is EOF 24

The parsing Problem Goals of the parser, given an input program: Find all syntax errors; for each, produce an appropriate diagnostic message and recover quickly Produce the parse tree, or at least a trace of the parse tree, for the program 25

Two categories of parsers The Parsing Problem (continued) Top down - produce the parse tree, beginning at the root Order is that of a leftmost derivation Traces or builds the parse tree in preorder Bottom up - produce the parse tree, beginning at the leaves Order is that of the reverse of a rightmost derivation Useful parsers look only one token ahead in the input 26

Top-down Parsers Given a sentential form, xa, the parser must choose the correct A-rule to get the next sentential form in the leftmost derivation using only the first token produced by A The most common top-down parsing algorithms: Recursive descent - a coded implementation LL parsers - table driven implementation 27

Bottom-up parsers Given a right sentential form,, determine what substring of is the right-hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation The most common bottom-up parsing algorithms are in the LR family. 28

Recursive-Descent Parsing There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals 29

A grammar for simple expressions: <expr> <term> {(+ -) <term>} <term> <factor> {(* /) <factor>} <factor> id int_constant ( <expr> ) 30

/* Function expr arses strings in the language generated by the rule: <expr> <term> {(+ -) <term>} */ void expr() { /* Parse the first term */ term(); /* As long as the next token is + or -, call lex to get the next token and parse the next term */ while (nexttoken == ADD_OP nexttoken == SUB_OP){ lex(); term(); } 31

Trace of the lexical and syntax analyzers on (sum + 47) / total Next token is: 25 Next lexeme is ( Next token is: 11 Next lexeme is total Enter <expr> Enter <factor> Enter <term> Next token is: -1 Next lexeme is EOF Enter <factor> Exit <factor> Next token is: 11 Next lexeme is sum Exit <term> Enter <expr> Exit <expr> Enter <term> Enter <factor> Next token is: 21 Next lexeme is + Exit <factor> Exit <term> Next token is: 10 Next lexeme is 47 Enter <term> Enter <factor> Next token is: 26 Next lexeme is ) Exit <factor> Exit <term> Exit <expr> Next token is: 24 Next lexeme is / Exit <factor> 32

Pairwise Disjointness Test: For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A i and A j, it must be true that FIRST( i ) FIRST( j ) = Example: A a bb cab A a ab 33

Bottom-up Parsing The parsing problem is finding the correct RHS in a right-sentential form to reduce to get the previous right-sentential form in the derivation Shift-Reduce Algorithms Reduce is the action of replacing the handle on the top of the parse stack with its corresponding LHS Shift is the action of moving the next token to the top of the parse stack 34

Advantages of LR parsers: They will work for nearly all grammars that describe programming languages. They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser. They can detect syntax errors as soon as it is possible. The LR class of grammars is a superset of the class parsable by LL parsers. 35

An LR configuration stores the state of an LR parser (S 0 X 1 S 1 X 2 S 2 X m S m, a i a i+1 a n $) 36

LR parsers are table driven, where the table has two components, an ACTION table and a GOTO table The ACTION table specifies the action of the parser, given the parser state and the next token Rows are state names; columns are terminals The GOTO table specifies which state to put on top of the parse stack after a reduction action is done Rows are state names; columns are nonterminals 37

Structure of An LR Parser 38

Parser actions: For a Shift, the next symbol of input is pushed onto the stack, along with the state symbol that is part of the Shift specification in the Action table For a Reduce, remove the handle from the stack, along with its state symbols. Push the LHS of the rule. Push the state symbol from the GOTO table, using the state symbol just below the new LHS in the stack and the LHS of the new rule as the row and column into the GOTO table 39

For an Accept, the parse is complete and no errors were found. For an Error, the parser calls an error-handling routine. 40

41

Compiler-Compiler 컴파일러생성기 (compiler generator) 컴파일러 - 컴파일러 (compiler-compiler) 컴파일러자동화도구 Program written in L Language Description : L Machine Description : M Compiler - Compiler Compiler Executable form on M 42

Lexical Analyzer Generator 어휘분석기생성기 (Lexical Analyzer Generator) 어휘분석기를자동으로생성하는도구 토큰에대한표현을입력으로받아기술된형태의토큰을찾아내는어휘분석기를생성 원시프로그램 토큰표현 ( 정규표현 ) LEX Lexical Analyzer (Scanner) 토큰열 (Token streams) 43

LEX 정의및특징 Lesk & Schmidt(Bell lab., 1975) 토큰의형태를묘사한정규표현들과각정규표현이매칭되었을때처리를나타내는수행코드로구성된입력받아어휘분석의일을처리하는프로그램을출력 정규표현 + 수행코드 lex 원시프로그램 lex.yy.c 일련의토큰 44

Parser Generator 정의 언어의문법표현으로부터파서를자동으로생성하는도구 : Parser Generating System; PGS 파서생성기의기능 파서생성기입력 : 문법표현 (context-free 문법 ) 파서생성기출력 : 파서를제어하는테이블 파서 주어진문장에대한문법검사 컴파일러다음단계에서필요한정보생성 45

YACC YACC(Yet Another Compiler-Compiler) S.C. Johnson(Bell Lab., 1975) 문법규칙과그에해당하는수행코드를입력으로받아파싱 (= 구문분석 ) 을담당하는프로그램을출력 정규표현 + 수행코드 문법규칙 + 수행코드 lex yacc 원시프로그램 lex.yy.c 토큰 y.tab.c 출력 < 어휘분석 > < 구문분석 > 46

NEXT Class : Names, Bindings, Scopes, and Memory Management kkman@sangji.ac.kr