Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

Similar documents
JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

JJTree. The Compilation Task. Automated? JJTree. An easier way to create an Abstract Syntax Tree

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Lexical Analysis 1 / 52

Building Compilers with Phoenix

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Figure 2.1: Role of Lexical Analyzer

Compiler Construction

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

CSCI312 Principles of Programming Languages!

CPS 506 Comparative Programming Languages. Syntax Specification

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Implementation of Lexical Analysis

Preparing for the ACW Languages & Compilers

CMSC 330: Organization of Programming Languages. Context Free Grammars

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Lexical and Syntax Analysis

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Compilers CS S-01 Compiler Basics & Lexical Analysis

Compiler phases. Non-tokens

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Chapter 4. Lexical and Syntax Analysis

Compilers CS S-01 Compiler Basics & Lexical Analysis

Languages and Compilers

JavaCC: SimpleExamples

Optimizing Finite Automata

Implementation of Lexical Analysis

CS S-01 Compiler Basics & Lexical Analysis 1

Compil M1 : Front-End

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Lexical Analysis. Introduction

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CSE302: Compiler Design

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Introduction to Compiler Design

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Chapter 3. Describing Syntax and Semantics ISBN

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Assignment 1 (Lexical Analyzer)

Properties of Regular Expressions and Finite Automata

Using an LALR(1) Parser Generator

Finite Automata and Scanners

Chapter 3: Lexing and Parsing

Structure of Programming Languages Lecture 3

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Alternation. Kleene Closure. Definition of Regular Expressions

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

UNIT -2 LEXICAL ANALYSIS

Part 5 Program Analysis Principles and Techniques

Compiler Construction

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2

Syntax Intro and Overview. Syntax

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Chapter 3 Lexical Analysis

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Lexical Scanning COMP360

Compiler Construction LECTURE # 3

Compiler Construction

Programming Language Syntax and Analysis

Monday, August 26, 13. Scanners

CS202 Compiler Construction. Christian Skalka. Course prerequisites. Solid programming skills a must.

Wednesday, September 3, 14. Scanners

Compiler course. Chapter 3 Lexical Analysis

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CMSC 330: Organization of Programming Languages

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Introduction to Lexical Analysis

Introduction to Lexing and Parsing

Chapter 2 - Programming Language Syntax. September 20, 2017

CS 403: Scanning and Parsing

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compiler Lab. Introduction to tools Lex and Yacc

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Lexical Error Recovery

Yacc: A Syntactic Analysers Generator

Compiler Construction

Assignment 1 (Lexical Analyzer)

CS 314 Principles of Programming Languages

Compiler Construction

Compiling Regular Expressions COMP360

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Undergraduate Compilers in a Day

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Week 2: Syntax Specification, Grammars

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Name of chapter & details

Lecture 4: Syntax Specification

CS415 Compilers. Lexical Analysis

Working of the Compilers

Transcription:

Automated Tools Easier ways to create parsers The Compilation Task Input character stream Lexer Token stream Parser Abstract Syntax Tree Analyser Annotated AST Code Generator Code CC&P 2003 1 CC&P 2003 2 Automated? Automated? The initial stages of compilation are very mechanistic Lexer Text to stream of tokens Pattern matching Parser Application of grammar rules Recovery from errors AST Generation Data structure for subsequent stages The final stages of compilation are language dependant Semantic Analysis Unfortunately often ad-hoc and hard to codify Code Generation Transformation of AST Dependant on the target machine/language CC&P 2003 3 CC&P 2003 4

Lexer Generators Transforms a set of regular expressions into a finite state automaton. Attempts to find match rules that consume the maximum number of characters Jlex Produces Java Lex/Flex Produces C YACC and Bison Produces C Parser Generators Bottom up make choices after consuming all the tokens associated with the choice Use BNF grammars rather than EBNF grammars CC&P 2003 5 CC&P 2003 6 Combined Approaches What is JavaCC ANTLR ANTLR (ANother Tool for Language Recognition) Supports three stages Lexer, Parser and Tree walker Produces C/C++, Java or Sather LL(k) JavaCC Supports two stages Three with JJTree Lexer and Parser LL(k) CC&P 2003 7 JavaCC stands for "the Java Compiler Compiler a parser generator and lexical analyzer generator. hand-crafting a lexer and a parser is difficult www.experimentalstuff.com/technologies/javac C/ CC&P 2003 8

JavaCC Overview Generates a top down parser. Hence ideal for generating a parser which is LL(k). Generates a parser in Java. Hence can be integrated with any Java based application. Token specification and grammar specification structures are in the same file. Types of Productions in JavaCC There can be four different kinds of Productions. 1. Regular Expressions Used to describe the tokens (terminals) of the grammar. 2. Token Manager Declarations The declarations and statements are written into the generated Token Manager (lexer) and are accessible from within lexical actions. CC&P 2003 9 CC&P 2003 10 Types of Productions in JavaCC There can be four different kinds of Productions. 3. EBNF Standard way of specifying the productions of the grammar. 4. Java Code For something that is not context free or is difficult to write a grammar for. An Example Grammar start ::= expr ";" expr ::= (addop)? term (addop term)* addop ::= "+" "-" term ::= factor (multop factor)* multop ::= "*" "/" factor ::= primary ( "**" primary)? primary ::= number "(" expr ")" ident ( "(" rvalue ( "," rvalue )* ")" )? rvalue ::= expr string number ::= <INTEGER_LITERAL> <REAL_LITERAL> <HEX_LITERAL> ident ::= <IDENTIFIER> string ::= <STRING> CC&P 2003 11 CC&P 2003 12

Terminals REAL_LITERAL: (["0"-"9"])+ "." (["0"-"9"])+ (["e","e"] ["+","-"] (["0"-"9"])+)? INTEGER_LITERAL: ["1"-"9"] (["0"-"9"])* (["e","e"] ["+","-"] (["0"-"9"])+)? HEX_LITERAL: "0X" (["0"-"9","a"-"f","A"-"F"])+ LETTER: ["a"-"z","a"-"z"] DIGIT: ["0"-"9"] IDENTIFIER: <LETTER> (("_")? (<LETTER><DIGIT>))* How does the Lexer work? The JavaCC lexer produces a Deterministic Finite State Automata DFA = (Q,, T, Q 0,F) a set of states Q, a set of symbols (the alphabet) a transition function mapping Qx to Q a start state Q 0 that belongs to Q and a set of states F included in Q named the final states Deterministic? CC&P 2003 13 CC&P 2003 14 DFA for HEX_LITERAL DFA for INTEGER_LITERAL HEX_LITERAL: "0X" (["0"-"9","a"-"f","A"-"F"])+ ["1"-"9"] (["0"-"9"])* (["e","e"] ["+","-"] (["0"-"9"])+)? http://www.cs.virginia.edu/~nc2y/dfa/ CC&P 2003 15 CC&P 2003 16

Structure of a JavaCC Definition JavaCC Lexer features JavaCC PARSER_BEGIN(ParserName) Class ParserName definition PARSER_END(ParserName) Explicit Token definitions Grammar Rules JLex Java code to access Yylex class %% % Additional Yylex declarations % Pragmas General Token definitions %% State based Token definitions SKIP: When a production is applied NO token object is created TOKEN: When a production is applied a token object is created and passed to the parser SPECIAL_TOKEN: When a production is applied a token object is created and NOT passed to the parser CC&P 2003 17 CC&P 2003 18 JavaCC Lexer features Parser Class Definition MORE: A token is not created but the characters recognised will be part of the next created token STATE Explicit lexical state Java Code Not normally necessary PARSER_BEGIN(Expression) class Expression public static void main(string args[]) System.out.println("Reading from standard input..."); Expression t = new Expression(System.in); try t.start(); System.out.println("Thank you."); catch (Exception e) System.out.println(e.getMessage()); e.printstacktrace(); PARSER_END(Expression) With Jlex define A Lexer driver class A token class With JavaCC define A Parser class CC&P 2003 19 CC&P 2003 20

Token Definition 1 Token Definition 2 SKIP : " " "\t" "\n" "\r" Definition of token separators (white space) TOKEN : < REAL_LITERAL: (["0"-"9"])+ "." (["0"-"9"])+ (["e","e"] ["+","-"] (["0"-"9"])+)?> < INTEGER_LITERAL: ["1"-"9"] (["0"-"9"])* (["e","e"] ["+","-"] (["0"-"9"])+)?> < HEX_LITERAL: "0X" (["0"-"9","a"-"f","A"-"F"])+ > NUMBER LITERALS CC&P 2003 21 CC&P 2003 22 Token Definition 3 Token Definition 4 MORE : "\"" : WithinString <WithinString> TOKEN : <STRING: "\""> : DEFAULT <WithinString> MORE : <"\"\""> <WithinString> MORE : <~["\n","\r"]> DFA? String Literal TOKEN : < IDENTIFIER: <LETTER> (("_")? (<LETTER><DIGIT>))* > < #LETTER: ["a"-"z","a"-"z"] > < #DIGIT: ["0"-"9"] > DFA? Hidden Token Identifier CC&P 2003 23 CC&P 2003 24

Questions Create a DFA for the following definition of Real numbers (["0"-"9"])+ "." (["0"-"9"])+ (["e","e"] ["+","-"] (["0"-"9"])+)? Create a single DFA for Integer, Real and Hex numbers Describe two ways how could you implement the 'consume the maximum number of characters' requirement. CC&P 2003 25