flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you'll want to use flex to generate a scanner that divides the input into tokens that are then used by other parts of your program. 3071 1

Calculator Program. We'll start by recognizing only integers, four basic arithmetic operators, and a unary absolute value operator. The first five patterns are literal operators, written as quoted strings, and the actions, for now, just print a message saying what matched. The quotes tell flex to use the strings as is, rather than interpreting them as regular expressions. The sixth pattern matches an integer. The bracketed pattern [0-9] matches any single digit, and the following + sign means to match one or more of the preceding item, which here means a string of one or more digits. The action prints out the string that's matched, using the pointer yytext that the scanner sets after each match. The seventh pattern matches a newline character, represented by the usual C \n sequence. The eighth pattern ignores whitespace. It matches any single space or tab (\t), and the empty action code does nothing.
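The eight patterns can be imitated by hand to see what the generated scanner does. The following is a minimal C sketch, not flex output: the function name describe_tokens, the output buffer, and the message names (PLUS, MINUS, and so on) are assumptions made for illustration, and characters that match no pattern are copied through unchanged, which mirrors flex's default rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hand-written stand-in for the eight flex patterns (not flex output).
 * Appends one message per match to `out`, which holds `cap` bytes.
 * Stops early if the next message would not fit. */
void describe_tokens(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*in != '\0') {
        char line[64];
        size_t n;

        if (isdigit((unsigned char)*in)) {            /* [0-9]+ */
            size_t len = 0;
            while (isdigit((unsigned char)in[len]))
                len++;
            snprintf(line, sizeof line, "NUMBER %.*s\n", (int)len, in);
            in += len;
        } else if (*in == ' ' || *in == '\t') {       /* [ \t] with an empty action */
            in++;
            continue;
        } else {
            switch (*in) {
            case '+':  strcpy(line, "PLUS\n");    break;
            case '-':  strcpy(line, "MINUS\n");   break;
            case '*':  strcpy(line, "TIMES\n");   break;
            case '/':  strcpy(line, "DIVIDE\n");  break;
            case '|':  strcpy(line, "ABS\n");     break;
            case '\n': strcpy(line, "NEWLINE\n"); break;
            default:   line[0] = *in; line[1] = '\0'; break;  /* unmatched: echo */
            }
            in++;
        }
        n = strlen(line);
        if (used + n + 1 > cap)
            break;                                    /* out of room */
        memcpy(out + used, line, n + 1);
        used += n;
    }
}
```

Calling describe_tokens on the text "12+ 3" followed by a newline fills the buffer with four messages, one each for the number 12, the plus sign, the number 3, and the newline; the space produces nothing.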

In this simple flex program, there's no C code in the third section. The flex library (-lfl) provides a tiny main program that just calls the scanner, which is adequate for this example. By default, the terminal collects input until the user presses Enter/Return. Then the whole line is pushed to the input stream of your program. This is useful because your program does not have to interpret every keyboard event (e.g. removing letters when Backspace is pressed).

Scanner as Coroutine. Coroutines are computer-program components that allow multiple entry points for suspending and resuming execution at certain locations. Most programs with flex scanners use the scanner to return a stream of tokens that are handled by a parser. Each time the program needs a token, it calls yylex(), which reads a little input and returns the token. When it needs another token, it calls yylex() again. The scanner acts as a coroutine; that is, each time it returns, it remembers where it was, and on the next call it picks up where it left off.

The rule is actually quite simple: if action code returns, scanning resumes on the next call to yylex(); if it doesn't return, scanning resumes immediately.
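The return rule can be illustrated with a small hand-written function. This is a sketch under assumptions, not what flex generates: struct scanner and scan() are hypothetical names, and the remembered position stands in for the state that a real yylex() keeps internally between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state: where the previous call stopped. */
struct scanner {
    const char *pos;
};

/* Sketch of a yylex()-style function. Returns the next non-blank
 * character as its "token", or 0 once the input is exhausted. */
int scan(struct scanner *s)
{
    assert(s != NULL && s->pos != NULL);
    while (*s->pos != '\0') {
        char c = *s->pos++;
        if (c == ' ' || c == '\t')
            continue;   /* action does not return: scanning resumes immediately */
        return c;       /* action returns: the next call resumes at s->pos */
    }
    return 0;           /* nothing left to scan */
}
```

Each call picks up at the saved position, so scanning "a b" with repeated calls yields 'a', then 'b', then 0, and the blank in between never reaches the caller.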

Tokens and Values. When a flex scanner returns a stream of tokens, each token actually has two parts, the token and the token's value. The token is a small integer. The token numbers are arbitrary, except that token zero always means end-of-file. When bison creates a parser, bison assigns the token numbers automatically starting at 258 (this avoids collisions with literal character tokens, discussed later) and creates a .h file with definitions of the token numbers. But for now, we'll just define a few tokens by hand: NUMBER = 258, ADD = 259, SUB = 260, MUL = 261, DIV = 262, ABS = 263, EOL = 264 (end of line).

We define the token numbers in a C enum and make yylval, the variable that stores the token value, an integer, which is adequate for the first version of our calculator. For each of the tokens, the scanner returns the appropriate code for the token; for numbers, it turns the string of digits into an integer and stores it in yylval before returning.
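A hand-written stand-in for such a scanner might look like the following sketch. The enum values and yylval come from the text; the function next_token, its cursor parameter (which replaces flex's internal input state), and the choice to pass unrecognized characters through as literal character tokens are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum yytokentype {
    NUMBER = 258, ADD = 259, SUB = 260, MUL = 261,
    DIV = 262, ABS = 263, EOL = 264
};

int yylval;   /* value of the most recent NUMBER token */

/* Returns the next token code and advances *cursor past it.
 * Returns 0 at end of input. */
int next_token(const char **cursor)
{
    const char *p;
    int tok;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (*p == ' ' || *p == '\t')      /* whitespace: no token returned */
        p++;

    if (isdigit((unsigned char)*p)) {
        yylval = 0;                      /* turn the digit string into an integer */
        while (isdigit((unsigned char)*p))
            yylval = yylval * 10 + (*p++ - '0');
        tok = NUMBER;
    } else if (*p == '\0') {
        tok = 0;                         /* token zero means end-of-file */
    } else {
        switch (*p) {
        case '+':  tok = ADD; break;
        case '-':  tok = SUB; break;
        case '*':  tok = MUL; break;
        case '/':  tok = DIV; break;
        case '|':  tok = ABS; break;
        case '\n': tok = EOL; break;
        default:   tok = (unsigned char)*p; break;  /* assumption: literal token */
        }
        p++;
    }
    *cursor = p;
    return tok;
}
```

Only NUMBER tokens carry a meaningful value here; for the operator tokens the caller ignores yylval.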


An expression is a finite combination of symbols that is well-formed according to some rules. Terms are those parts of the expression between addition signs and subtraction signs. Factors are the separate parts of a multiplication or division.

Parsing. Parsing, syntax analysis, or syntactic analysis is the process of analysing a string of symbols conforming to the rules of a grammar. The term parsing comes from Latin pars, meaning part. A parse tree represents the syntactic structure of a string according to some grammar. A grammar is a set of production rules for strings in a language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them, only their form.

Grammars and Parsing. The parser's job is to figure out the relationship among the input tokens. A common way to display such relationships is a parse tree. For example, under the usual rules of arithmetic, where multiplication has higher precedence than addition, the arithmetic expression 1 * 2 + 3 * 4 + 5 would have a parse tree that groups 1 * 2 and 3 * 4 first, adds those two products, and then adds 5.

Backus-Naur Form. In order to write a parser, we need some way to describe the rules, the grammar, that the parser uses to turn a sequence of tokens into a parse tree. The standard notation is Backus-Naur Form (BNF), created around 1960 to describe Algol 60 and named after two members of the Algol 60 committee. A grammar that handles 1 * 2 + 3 * 4 + 5 can be written as two production rules:
<exp> ::= <factor> | <exp> + <factor>
<factor> ::= NUMBER | <factor> * NUMBER
Each line is a production rule. <exp> and <factor> are nonterminals, NUMBER, + and * are terminals, and ::= and | are metasymbols.
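To see how these two rules impose a grouping, here is a small C sketch that reads each left-recursive rule as a loop and records every rule application with a pair of parentheses. It is a hand-written top-down illustration, not the bottom-up method bison uses, and the names parse_exp, parse_factor, wrap and GROUP_MAX are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define GROUP_MAX 128   /* arbitrary buffer size for this sketch */

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* NUMBER: copies one run of digits into out; returns NULL if none is present. */
static const char *parse_number(const char *p, char *out)
{
    size_t n = 0;
    p = skip_blanks(p);
    while (isdigit((unsigned char)p[n]) && n < GROUP_MAX - 1) {
        out[n] = p[n];
        n++;
    }
    out[n] = '\0';
    return n > 0 ? skip_blanks(p + n) : NULL;
}

/* Rewrites out as "(out op rhs)"; returns 0 if the result does not fit. */
static int wrap(char *out, char op, const char *rhs)
{
    char tmp[GROUP_MAX];
    int n = snprintf(tmp, sizeof tmp, "(%s%c%s)", out, op, rhs);
    if (n < 0 || n >= (int)sizeof tmp)
        return 0;
    strcpy(out, tmp);
    return 1;
}

/* <factor> ::= NUMBER | <factor> * NUMBER */
static const char *parse_factor(const char *p, char *out)
{
    char num[GROUP_MAX];
    p = parse_number(p, out);
    while (p != NULL && *p == '*') {
        p = parse_number(p + 1, num);
        if (p != NULL && !wrap(out, '*', num))
            p = NULL;
    }
    return p;
}

/* <exp> ::= <factor> | <exp> + <factor>
 * Writes the grouping into out (GROUP_MAX bytes) and returns a pointer to
 * the first unconsumed character, or NULL on a syntax error. */
const char *parse_exp(const char *p, char *out)
{
    char rhs[GROUP_MAX];
    p = parse_factor(p, out);
    while (p != NULL && *p == '+') {
        p = parse_factor(p + 1, rhs);
        if (p != NULL && !wrap(out, '+', rhs))
            p = NULL;
    }
    return p;
}
```

For the input 1 * 2 + 3 * 4 + 5 the recorded grouping is (((1*2)+(3*4))+5): the <factor> rule binds each multiplication before the <exp> rule sees it, which is how the grammar encodes precedence.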

A parser is a software component that takes input data (frequently text) and builds a data structure, often some kind of parse tree, abstract syntax tree or other hierarchical structure, giving a structural representation of the input while checking for correct syntax. Bison is a general-purpose parser generator that converts a grammar description (a Bison grammar file) into a C program to parse that grammar. The Bison parser is a bottom-up parser. It tries, by shifts and reductions, to reduce the entire input down to a single grouping whose symbol is the grammar's start symbol.

As Bison reads tokens, it pushes them onto a stack along with their semantic values. The stack is called the parser stack. Pushing a token is traditionally called shifting. When the last n tokens and groupings shifted match the components of a grammar rule, they can be combined according to that rule. This is called reduction. Those tokens and groupings are replaced on the stack by a single grouping whose symbol is the result (left-hand side) of that rule. Running the rule's action is part of the process of reduction, because this is what computes the semantic value of the resulting grouping.
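The shift and reduce steps can be imitated with an explicit stack. The sketch below is hand-written for one small grammar (exp: factor | exp PLUS factor; factor: NUM | factor TIMES NUM) and hard-codes the decision of when to reduce, whereas a Bison parser takes those decisions from generated tables. The names struct item, shift_reduce and STACK_MAX are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum symbol { END, NUM, PLUS, TIMES, FACTOR, EXP };

struct item { enum symbol sym; int val; };   /* symbol plus semantic value */

#define STACK_MAX 64

/* `in` is a token array terminated by an END item.
 * Returns 1 and stores the value in *result, or 0 on a syntax error. */
int shift_reduce(const struct item *in, int *result)
{
    struct item st[STACK_MAX];               /* the parser stack */
    size_t n = 0;

    assert(in != NULL && result != NULL);
    for (;;) {
        /* END doubles as "no such stack entry"; it is never shifted. */
        enum symbol top = n >= 1 ? st[n - 1].sym : END;
        enum symbol mid = n >= 2 ? st[n - 2].sym : END;
        enum symbol low = n >= 3 ? st[n - 3].sym : END;

        if (top == NUM && mid == TIMES && low == FACTOR) {
            st[n - 3].val *= st[n - 1].val;  /* reduce factor: factor TIMES NUM */
            n -= 2;
        } else if (top == NUM) {
            st[n - 1].sym = FACTOR;          /* reduce factor: NUM (value kept) */
        } else if (top == FACTOR && in->sym != TIMES) {
            if (mid == PLUS && low == EXP) {
                st[n - 3].val += st[n - 1].val;  /* reduce exp: exp PLUS factor */
                n -= 2;
            } else {
                st[n - 1].sym = EXP;         /* reduce exp: factor */
            }
        } else if (in->sym != END && n < STACK_MAX) {
            st[n++] = *in++;                 /* shift the next token */
        } else {
            break;                           /* nothing left to shift or reduce */
        }
    }
    if (n != 1 || st[0].sym != EXP || in->sym != END)
        return 0;
    *result = st[0].val;
    return 1;
}
```

The check on the next token before reducing a factor to an exp plays the role of lookahead: when a TIMES follows, the factor stays on the stack so the multiplication is reduced first. Fed the tokens of 1 * 2 + 3 * 4 + 5, the stack ends as a single exp with value 19.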

In order for Bison to parse a language, it must be described by a grammar. This means that you specify one or more syntactic groupings and give rules for constructing them from their parts. For example, in the C language, one kind of grouping is called an expression. One rule for making an expression might be, "An expression can be made of a minus sign and another expression." Another would be, "An expression can be an integer." As you can see, rules are often recursive, but there must be at least one rule which leads out of the recursion.

Bison programs have the same three-part structure as flex programs, with declarations, rules, and C code. The first section contains token declarations, telling bison the names of the symbols in the parser that are tokens. By convention, tokens have uppercase names, although bison doesn't require it. Any symbols not declared as tokens have to appear on the left side of at least one rule in the program. When a rule has no explicit action, the default action is $$ = $1. Bison programs handle nesting.

The second section contains the rules in simplified BNF. Bison uses a single colon rather than ::=, and since line boundaries are not significant, a semicolon marks the end of a rule. Again, like flex, the C action code goes in braces at the end of each rule.

Rather than defining explicit token values in the first part, we include a header file that bison will create for us, which includes both definitions of the token numbers and a definition of yylval. We also delete the testing main routine in the third section of the scanner, since the parser will now call the scanner. The program is built with three commands:
bison -d fb1-5.y
flex fb1-5.l
gcc fb1-5.tab.c lex.yy.c -lfl
The -d flag tells bison to write an extra output file containing macro definitions for the token type names defined in the grammar.

One of the nicest things about using flex and bison to handle a program's input is that it's often quite easy to make small changes to the grammar. Our expression language would be a lot more useful if it could handle parenthesized expressions, and it would be nice if it could handle comments, using // syntax. To do this, we need only add one rule to the parser and three to the scanner. Since a dot matches anything except a newline, .* will gobble up the rest of the line.
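The effect of the comment pattern can be sketched by hand. The function below assumes the scanner rule has the form "//".* and imitates what such a rule consumes; skip_comment is a hypothetical name, not part of flex.

```c
#include <assert.h>
#include <stddef.h>

/* If p starts a // comment, returns a pointer to the newline (or end of
 * string) that ends it. The newline itself is not consumed, because a
 * dot never matches it. Otherwise returns p unchanged. */
const char *skip_comment(const char *p)
{
    assert(p != NULL);
    if (p[0] == '/' && p[1] == '/') {
        p += 2;
        while (*p != '\0' && *p != '\n')   /* .* : anything except a newline */
            p++;
    }
    return p;
}
```

Because the newline is left in place, the end-of-line pattern still sees it, so a line that ends in a comment is terminated the same way as any other line. A single slash is not treated as a comment and remains available as the division operator.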