Left to right design 1
- Rosalind Bennett
- 5 years ago
1 Left to right design 1
2 Left to right design The left to right design method suggests that the structure of the program should closely follow the structure of the input. The method is effective when the structure of the input dominates the problem. Many problems in practice have complex input structure. Even if it doesn't dominate the whole problem, the subproblem of handling the input can be solved using left-to-right design. Any program that reads input is a language recognizer or parser. 2
3 The problem Write a program to act as a simple calculator. Users type in arithmetic expressions, one per line; the program should print the value of each expression. Expressions may involve the four main arithmetic operators and parentheses. For simplicity, assume all numbers are integers. Unix comes with two calculator programs, one (dc) for postfix expressions and the other (bc) for infix expressions. Bc is built on top of dc. 3
4 Input Expression Examples: (2+3)*(4+5) (2+4)/(3*5) 4
5 Structure Description Formal notation:
x,y    x then y
x*     zero or more repetitions of x
x+     one or more repetitions of x
x|y    either x or y
[x]    either x or nothing
x: y   x is defined as y 5
6 Data Description (grammar)
file: line*,end
line: newline | (expr,newline)
expr: term,((add|sub),term)*
term: factor,((mul|div),factor)*
factor: number | (lparen,expr,rparen)
number: digit+
add: '+'   sub: '-'   mul: '*'   div: '/'
lparen: '('   rparen: ')'   newline: '\n'   end: EOF 6
7 White Space This data description does not say where white space may appear in the input, because that would make the description unnecessarily complicated. Most programs accept white space in some places and not in others. In standard terminology, a token is a unit of input such that any spaces between tokens are not significant and any spaces within a token are significant (if allowed). 7
8 Tokens We must decide what the tokens of our grammar are. Tokens are the smallest elements of the grammar. They must be defined: without reference to other tokens, without recursion, and independent of preceding tokens. The tokens in this program are: number, add, sub, mul, div, lparen, rparen, newline, end. 8
9 Two stages Traditionally, the task of recognizing the structure of the input has been done in two stages. The option of using only one stage is discussed in a later section. The first stage, lexical analysis, scanning, or tokenizing, groups characters into tokens while ignoring white space and comments (both may appear anywhere and neither is significant). 9
10 Two stages The second stage, syntactic analysis or parsing, groups tokens into higher-level entities such as expressions. In technical language, tokens are called terminal symbols, while the entities recognized by parsers are called nonterminal symbols. 10
11 Tokenizer operations The tokenizer, lexical analyser, or scanner is a function. Each time it is called, it should read the next token, and return an indication of which kind of token it is (number or add or sub etc); and, if there is more than one token of that kind, an indication of the one that was seen. For example, all plus signs are alike, but when the calculator reads in a number, it must know which number it is. 11
12 Indicators The standard way to indicate the kind of a token is via an enumerated type:
typedef enum {
    ADD, SUB, MUL, DIV, LPAREN, RPAREN, NL, END, NUMBER
} TokenKind; 12
13 Indicators In this case only one kind of token, NUMBER, needs an indicator that says which token of that kind was seen, so the value of the token can be put into an integer (we are not concerned with real numbers in this exercise). 13
14 Using unions In general, more than one kind of token may have an associated value, and these values may be of different types. For example, some tokenizers must be able to recognize both integers and identifiers. The solution is to use a union:
typedef union {
    int number;
    char *ident;
} TokenValue;
Every value of type TokenValue will have enough storage to hold either an int or a char*, but not both. 14
15 Token representation Conceptually, a token is a kind/value pair, and should be represented as a structure with two fields.
typedef struct {
    TokenKind kind;
    TokenValue value;
} Token; 15
16 Token representation Token token;
token.kind == NUMBER  =>  value is in token.value.number
token.kind == IDENT   =>  value is in token.value.ident
token.kind is something else  =>  token has no associated value
However, for simplicity people often use two separate variables for kind and value. 16
17 Tokenizer structure Tokenizer functions start with code that gets rid of nonsignificant white space and comments, if they are allowed.
c = getc(stdin);
while (c != EOF && c != '\n' && isspace(c))
    c = getc(stdin);
The first character left in the input is then often sufficient to find out what kind of token comes next. (If it isn't, we must use techniques usually used for parsing.) 17
18 Consider all the rules for tokens:
number: digit+
add: '+'   sub: '-'   mul: '*'   div: '/'
lparen: '('   rparen: ')'   newline: '\n'   end: EOF
Each token begins with a different character, so we can switch on the first non-space character to decide the token kind. 18
19 Identifiers Many grammars have some kind of identifier token. For our calculator, we might want to allow identifiers for variable names. Identifiers usually have a structure like:
ident: letter,(letter|digit)*
precisely to distinguish them from numbers by the first character. 19
20 The rest of the token Once the tokenizer has found out what kind of token is next, it must read in the rest of the token. The structure of the code that does this should follow the structure of the data description of the rest of the token.
ident: letter,(letter|digit)*
/* c is known to be a letter */
buf[i++] = c;
c = getc(stdin);
while (isalpha(c) || isdigit(c)) {
    buf[i++] = c;
    c = getc(stdin);
} 20
21 Tokenizer
TokenKind do_get_token(int *token_value) {
    int c, val;
    c = getc(stdin);
    while (c != EOF && c != '\n' && isspace(c))
        c = getc(stdin);
    switch (c) {
    case '+':  return ADD;
    case '-':  return SUB;
    case '*':  return MUL;
    case '/':  return DIV;
    case '(':  return LPAREN;
    case ')':  return RPAREN;
    case '\n': return NL;
    case EOF:  return END;
(continued) 21
22 Tokenizer (2)
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        val = c - '0';
        c = getc(stdin);
        while (c != EOF && isdigit(c)) {
            val = val * 10 + c - '0';
            c = getc(stdin);
        }
        ungetc(c, stdin);
        *token_value = val;
        return NUMBER;
    default:
        /* handle the error */
    }
} 22
23 Pushback do_get_token must remove exactly one token from the input, together with its preceding white space. We cannot find out whether a digit is the last character in a number or not until we have read the next character. This character may be e.g. +, which represents a token, so we must make sure that the next invocation of do_get_token processes it. Our code does this by calling ungetc, which arranges for the next call to getc on the same file to read the character pushed back by ungetc. 23
24 Recursive Descent Parsing 24
25 Recursive Descent Parsing The parser has a function for each nonterminal in the grammar. The structure of this function is derived from the nonterminal's definition in the grammar. The translation scheme is:
grammar rule -> function
nonterminal -> function call
terminal -> check token and consume
sequence (,) -> sequence of statements
repetitions (* and +) -> while or do statement based on next token kind
alternative (| and []) -> if or switch statement based on next token kind 25
26 Data Description (grammar)
file: line*,end
line: newline | (expr,newline)
expr: term,((add|sub),term)*
term: factor,((mul|div),factor)*
factor: number | (lparen,expr,rparen)
number: digit+
add: '+'   sub: '-'   mul: '*'   div: '/'
lparen: '('   rparen: ')'   newline: '\n'   end: EOF 26
27 Fixed one-token lookahead This scheme maintains this invariant: when the function of a nonterminal is called, the global variables hold information about the first token that may be part of that nonterminal; and when the function of a nonterminal returns, the global variables hold information about the first token beyond that nonterminal. As soon as a token is recognized, it should be consumed by a call to get_token, which sets the global variables according to the next token. Lookahead is an alternative to pushback. 27
28 The top level function To implement the lookahead, we must begin our program by looking ahead. Next we handle the top level nonterminal in our grammar: file. We handle a nonterminal with a function call.
int main(void) {   /* recognizes file: line*,end */
    get_token();
    get_file();
    return 0;
}
void get_token(void) {
    next_token_kind = do_get_token(&next_token_value);
} 28
29 file: line*,end We translate the file grammar rule to a get_file() function whose body is the translation of the RHS of the rule. We translate a nonterminal, such as line, to a call to the function for that nonterminal, such as get_line(). We translate a * repetition into a while loop whose condition tests that the next token could be the first token of what is repeated, in this case NUMBER or LPAREN. 29
30 file
file: line*,end
line: newline | (expr,newline)
expr: term,((add|sub),term)*
term: factor,((mul|div),factor)*
factor: number | (lparen,expr,rparen) 30
31 file
void get_file(void) {   /* recognizes file: line*,end */
    while (next_token_kind == NUMBER || next_token_kind == LPAREN)
        get_line();
    if (next_token_kind != END)
        ... handle the error ...
    /* no need to get a token after END */
} 31
32 Error Conditions We must consider what happens to invalid input. With this definition, if a line begins with, say, PLUS, we get an error message and get_file() returns. It would usually be better to ignore the erroneous line and keep processing.
void get_file(void) {   /* recognizes file: line*,end */
    while (next_token_kind != END)
        if (next_token_kind == NUMBER || next_token_kind == LPAREN)
            get_line();
        else
            ... print error message and skip line ...
    /* no need to get a token after END */
} 32
33 line
line: newline | (expr,newline)
The alternative construct translates to an if or switch on the next token kind. A terminal is handled by checking it and getting the next token. 33
34 line cont.
void get_line(void) {
    if (next_token_kind == NL)
        get_token();
    else {
        get_expr();
        if (next_token_kind == NL)
            get_token();
        else
            ... error ...
    }
} 34
35 Consuming tokens Code like
if (next_token_kind == something)
    get_token();
else
    handle a syntax error
is common enough that it's often worth writing a function or macro to handle it.
void consume(TokenKind tok) {
    if (next_token_kind == tok)
        get_token();
    else
        ... handle syntax error ...
} 35
36 Consuming tokens (2) Using this function simplifies the get_line() function and makes its similarity to the grammar rule more apparent:
line: newline | (expr,newline)
void get_line(void) {
    if (next_token_kind == NL)
        get_token();
    else {
        get_expr();
        consume(NL);
    }
} 36
37 Recognize an expression
expr: term,((add|sub),term)*
void get_expr(void) {
    get_term();
    while (next_token_kind == ADD || next_token_kind == SUB) {
        get_token();   /* ADD or SUB */
        get_term();
    }
}
The code for get_term() is very similar. 37
38 Recognizing a factor
factor: number | (lparen,expr,rparen)
void get_factor(void) {
    switch (next_token_kind) {
    case NUMBER:
        get_token();
        break;
    case LPAREN:
        get_token();
        get_expr();
        consume(RPAREN);
        break;
    default:
        ... error ...
    }
} 38
39 Actions This code does nothing but check the syntax of the input stream. But it is easy to extend it to perform whatever actions are required, for example: The action can compute the value of the expression. The action can create a tree structure to represent the expression. The action can generate code to evaluate the expression. 39
40 Action (2) We extend get_expr() to return the value of the expression:
int get_expr(void) {
    int val = get_term();
    while (next_token_kind == ADD || next_token_kind == SUB) {
        TokenKind op = next_token_kind;
        get_token();
        if (op == ADD)
            val += get_term();
        else
            val -= get_term();
    }
    return val;
} 40
41 Grammar Manipulation Suppose we had defined expr this way:
expr: number | (lparen,expr,rparen) | (expr,add,expr) | (expr,sub,expr) | (expr,mul,expr) | (expr,div,expr)
This description is correct, but we cannot decide which alternative to apply just by looking at the first token of an expression. Therefore we cannot derive a working parser from it using the techniques of recursive descent parsing; we must transform the grammar first. 41
42 Left Factoring Left factoring uses the rule that a,(b|c) = (a,b)|(a,c) to pull out a common initial part of several alternatives, so it is not repeated. This gives us:
expr: number | (lparen,expr,rparen) | (expr, ((add,expr) | (sub,expr) | (mul,expr) | (div,expr))) 42
43 Left Factoring We write this more manageably as:
expr: number | (lparen,expr,rparen) | (expr,rest)
rest: (add,expr) | (sub,expr) | (mul,expr) | (div,expr) 43
44 Left recursion
expr: number | (lparen,expr,rparen) | (expr,rest)
We cannot derive a working parser from this data structure description either. The problem is that one of the alternatives for expr starts with expr. If we wrote get_expr() following this grammar, when the token was other than NUMBER or LPAREN, we would immediately call get_expr(). Since we would not have consumed any tokens, the current token would still not be NUMBER or LPAREN, so we would again immediately call get_expr(). And so on. 44
45 Left recursion elimination Consider what our grammar rule will recognize:
NUMBER, or
LPAREN expr RPAREN, or
NUMBER rest, or
LPAREN expr RPAREN rest, or
NUMBER rest rest, or
LPAREN expr RPAREN rest rest, or ...
We see a pattern here: it begins with either NUMBER or LPAREN expr RPAREN, and follows with any number of repetitions of rest. So we can rewrite our rule as:
expr: factor,rest*
factor: number | (lparen,expr,rparen) 45
46 Left recursion elimination (2) The general rule is to invent a new nonterminal for the non-left-recursive alternatives:
factor: number | (lparen,expr,rparen)
Then define another new nonterminal as all of the left recursive alternatives, with the left recursive nonterminal removed. In this case it's just rest. 46
47 Left recursion elimination (3) Finally, replace the left recursive rule with one that starts with the new non-left-recursive nonterminal factor and ends with 0 or more repetitions of the other new nonterminal (just rest in this case). This gives us:
expr: factor,rest*
factor: number | (lparen,expr,rparen) 47
48 Precedence This data description divides up the input 2 + 3 * 4 as 2, followed by + 3, followed by * 4. That is, (2+3)*4. This would be OK if + and * had the same precedence, but they don't. We want the parser to treat 3 * 4 as a unit. In general, we want any sequence of factors with multiplicative operators between them to be treated as a unit. We call these units terms. 48
49 Fixing precedence We must separate the multiplicative from the additive operators:
term: factor,restterm*
restterm: (mul,term) | (div,term)
expr: term,restexpr*
restexpr: (add,expr) | (sub,expr)
After substituting the definitions of restterm and restexpr for their uses and some factoring:
term: factor,((mul|div),term)*
expr: term,((add|sub),expr)* 49
50 Associativity When matching the input 10 - 1 + 2 against
expr: term,((add|sub),expr)*
we don't want 1 + 2 to be considered an expr, because that would lead to evaluating 10 - (1 + 2), when what we want is (10 - 1) + 2. We can fix this by changing the grammar to:
term: factor,((mul|div),factor)*
expr: term,((add|sub),term)* 50
51 Compiler technology Scanning and parsing are the best understood aspects of compiler technology. They have a large body of theory, much of it developed in the sixties and seventies. Many tools exist for the automatic creation of tokenizers and parsers. Two of the best known are the scanner generator lex and the parser generator yacc, which are standard on Unix systems. The theories of scanning and parsing are covered in some detail in , and may be explored further in . These units should also introduce tools such as lex and yacc. 51
52 Parsing without tokenizing A separate tokenizer is helpful if parts of the input are to be ignored (e.g. white space, comments) and if the code to check for and parse these parts would have to be repeated at several points in the program. If all of the input is significant, or if there are only a few places in the grammar where the parts to be ignored occur, we need not have a tokenizer; the parser should view each character as a token.
file: line*
line: name,colon,pw,colon,number,colon,users,nl
users: [user,(comma,user)*] 52
More informationCSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I
CSCI 1260: Compilers and Program Analysis Steven Reiss Fall 2015 Lecture 4: Syntax Analysis I I. Syntax Analysis A. Breaking the program into logical units 1. Input: token stream 2. Output: representation
More informationCS321 Languages and Compiler Design I. Winter 2012 Lecture 4
CS321 Languages and Compiler Design I Winter 2012 Lecture 4 1 LEXICAL ANALYSIS Convert source file characters into token stream. Remove content-free characters (comments, whitespace,...) Detect lexical
More informationSyntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming
The Big Picture Again Syntactic Analysis source code Scanner Parser Opt1 Opt2... Optn Instruction Selection Register Allocation Instruction Scheduling machine code ICS312 Machine-Level and Systems Programming
More informationCompiler construction in4303 lecture 3
Compiler construction in4303 lecture 3 Top-down parsing Chapter 2.2-2.2.4 Overview syntax analysis: tokens AST program text lexical analysis language grammar parser generator tokens syntax analysis AST
More informationProgramming Language Specification and Translation. ICOM 4036 Fall Lecture 3
Programming Language Specification and Translation ICOM 4036 Fall 2009 Lecture 3 Some parts are Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics
More informationB The SLLGEN Parsing System
B The SLLGEN Parsing System Programs are just strings of characters. In order to process a program, we need to group these characters into meaningful units. This grouping is usually divided into two stages:
More informationSyntax Analysis, III Comp 412
COMP 412 FALL 2017 Syntax Analysis, III Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp
More informationSyntax-Directed Translation. Lecture 14
Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik) 9/27/2006 Prof. Hilfinger, Lecture 14 1 Motivation: parser as a translator syntax-directed translation stream of tokens parser ASTs,
More informationCS Lecture 2. The Front End. Lecture 2 Lexical Analysis
CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture
More information4. Semantic Processing and Attributed Grammars
4. Semantic Processing and Attributed Grammars 1 Semantic Processing The parser checks only the syntactic correctness of a program Tasks of semantic processing Checking context conditions - Declaration
More informationSyntax. In Text: Chapter 3
Syntax In Text: Chapter 3 1 Outline Syntax: Recognizer vs. generator BNF EBNF Chapter 3: Syntax and Semantics 2 Basic Definitions Syntax the form or structure of the expressions, statements, and program
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationError Recovery. Computer Science 320 Prof. David Walker - 1 -
Error Recovery Syntax Errors: A Syntax Error occurs when stream of tokens is an invalid string. In LL(k) or LR(k) parsing tables, blank entries refer to syntax erro How should syntax errors be handled?
More informationParser Tools: lex and yacc-style Parsing
Parser Tools: lex and yacc-style Parsing Version 6.11.0.6 Scott Owens January 6, 2018 This documentation assumes familiarity with lex and yacc style lexer and parser generators. 1 Contents 1 Lexers 3 1.1
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationProgram Assignment 2 Due date: 10/20 12:30pm
Decoration of parse tree for (1 + 3) * 2 N. Meng, S. Arthur 1 Program Assignment 2 Due date: 10/20 12:30pm Bitwise Manipulation of Hexidecimal Numbers CFG E E A bitwise OR E A A A ^ B bitwise XOR A B B
More informationLex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:
Lex & Yacc By H. Altay Güvenir A compiler or an interpreter performs its task in 3 stages: 1) Lexical Analysis: Lexical analyzer: scans the input stream and converts sequences of characters into tokens.
More informationLL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3
LL(k) Compiler Construction More LL parsing Abstract syntax trees Lennart Andersson Revision 2011 01 31 2010 Related names top-down the parse tree is constructed top-down recursive descent if it is implemented
More informationBuilding a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1
Building a Parser III CS164 3:30-5:00 TT 10 Evans 1 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka
More informationFall, 2015 Prof. Jungkeun Park
Data Structures t and Algorithms Stacks Application Infix to Postfix Conversion Fall, 2015 Prof. Jungkeun Park Copyright Notice: This material is modified version of the lecture slides by Prof. Rada Mihalcea
More informationLex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:
Lex & Yacc by H. Altay Güvenir A compiler or an interpreter performs its task in 3 stages: 1) Lexical Analysis: Lexical analyzer: scans the input stream and converts sequences of characters into tokens.
More informationCOLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR
Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.
More informationSyntax Analysis, III Comp 412
Updated algorithm for removal of indirect left recursion to match EaC3e (3/2018) COMP 412 FALL 2018 Midterm Exam: Thursday October 18, 7PM Herzstein Amphitheater Syntax Analysis, III Comp 412 source code
More informationICOM 4036 Spring 2004
Language Specification and Translation ICOM 4036 Spring 2004 Lecture 3 Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics Structure of a Compiler
More information4. Lexical and Syntax Analysis
4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal
More informationParsing. Zhenjiang Hu. May 31, June 7, June 14, All Right Reserved. National Institute of Informatics
National Institute of Informatics May 31, June 7, June 14, 2010 All Right Reserved. Outline I 1 Parser Type 2 Monad Parser Monad 3 Derived Primitives 4 5 6 Outline Parser Type 1 Parser Type 2 3 4 5 6 What
More informationAn Introduction to LEX and YACC. SYSC Programming Languages
An Introduction to LEX and YACC SYSC-3101 1 Programming Languages CONTENTS CONTENTS Contents 1 General Structure 3 2 Lex - A lexical analyzer 4 3 Yacc - Yet another compiler compiler 10 4 Main Program
More informationPrinciples of Programming Languages COMP251: Syntax and Grammars
Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2007
More informationLanguages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1
CSE 401 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2010 1/8/2010 2002-10 Hal Perkins & UW CSE B-1 Agenda Quick review of basic concepts of formal grammars Regular
More informationConcepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective
Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis
More informationParser Tools: lex and yacc-style Parsing
Parser Tools: lex and yacc-style Parsing Version 5.0 Scott Owens June 6, 2010 This documentation assumes familiarity with lex and yacc style lexer and parser generators. 1 Contents 1 Lexers 3 1.1 Creating
More informationGrammars and Parsing, second week
Grammars and Parsing, second week Hayo Thielecke 17-18 October 2005 This is the material from the slides in a more printer-friendly layout. Contents 1 Overview 1 2 Recursive methods from grammar rules
More informationDefining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1
Defining Program Syntax Chapter Two Modern Programming Languages, 2nd ed. 1 Syntax And Semantics Programming language syntax: how programs look, their form and structure Syntax is defined using a kind
More informationBuilding lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy
Building lexical and syntactic analyzers Chapter 3 Syntactic sugar causes cancer of the semicolon. A. Perlis Chomsky Hierarchy Four classes of grammars, from simplest to most complex: Regular grammar What
More informationCS415 Compilers. Lexical Analysis
CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework
More informationCOMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna
More informationCS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)
Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language
More informationGrammars & Parsing. Lecture 12 CS 2112 Fall 2018
Grammars & Parsing Lecture 12 CS 2112 Fall 2018 Motivation The cat ate the rat. The cat ate the rat slowly. The small cat ate the big rat slowly. The small cat ate the big rat on the mat slowly. The small
More information2.2 Syntax Definition
42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions
More information