# Monday, August 26, 13. Scanners

Size: px
Start display at page:

Transcription

1 Scanners

2 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can we recognize tokens? How do we write scanners?

3 Regular expressions Regular sets: set of strings defined by regular expressions Strings are regular sets (with one element): purdue So is the empty string: λ (sometimes use ɛ instead) Concatentations of regular sets are regular: purdue To avoid ambiguity, can use ( ) to group regexps together A choice between two regular sets is regular, using : (purdue ) 0 or more of a regular set is regular, using *: (purdue)* Some other notation used for convenience: Use Not to accept all strings except those in a regular set Use? to make a string optional: x? equivalent to (x λ) Use + to mean 1 or more strings from a set: x+ equivalent to xx* Use [ ] to present a range of choices: [1-3] equivalent to (1 2 3)

4 Examples of regular expressions Numbers: D = [0-9]+ Words: L = [A-Za-z]+ Literals (integers or floats): -?D+(.D*)? Identifiers: (_ L)(_ L D)* Comments (as in Micro): -- Not(\n)*\n More complex comments (delimited by ##, can use # inside comment): ##((# λ)not(#))*##

5 Finite automata Finite state machine which will only accept a string if it is in the set defined by the regular expression (a b c+)+ a a b c start state transition state final state c

6 λ transitions Transitions between states that aren t triggered by seeing another character Can optionally take the transition, but do not have to Can be used to link states together λ

7 Non-deterministic FA Note that if a finite automaton has a λ-transition in it, it may be non-deterministic (do we take the transition? or not?) More precisely, FA is non-deterministic if, from one state reading a single character could result in transition to multiple states How do we deal with non-deterministic finite automata (NFAs)?

8 Running an NFA Intuition: take every possible path through an NFA Think: parallel execution of NFA Maintain a pointer that tracks the current state Every time there is a choice, split the pointer, and have one pointer follow each choice Track each pointer simultaneously If a pointer gets stuck, stop tracking it If any pointer reaches an accept state at the end of input, accept

9 Example How does this NFA handle the string aba? 1 λ 2 a 5 a a a, b 3 b 4

10 Building a FA from a regexp Expression a λ FA a λ AB λ A B λ A B λ λ A λ λ B A* λ A λ λ Mini-exercise: how do we build an FA that accepts Not(A)?

11 NFAs to DFAs Can convert NFAs to deterministic finite automata (DFAs) No choices never a need to split pointers Initial idea: simulate NFA for all possible inputs, any time there is a new configuration of pointers, create a state to capture it Pointers at states 1, 3 and 4 new state {1, 3, 4} Trying all possible inputs is impractical; instead, for any new state, explore all possible next states (that can be reached with a single character) Process ends when there are no new states found This can result in very large DFAs!

12 Example Convert the following into a DFA 1 λ 2 a 5 a a a, b 3 b 4

13 DFA reduction DFAs built from NFAs are not necessarily optimal May contain many more states than is necessary (ab)+ (ab)(ab)* b a b a

14 DFA reduction DFAs built from NFAs are not necessarily optimal May contain many more states than is necessary (ab)+ (ab)(ab)* a a b

15 DFA reduction Intuition: merge equivalent states Two states are equivalent if they have the same transitions to the same states Basic idea of optimization algorithm Start with two big nodes, one representing all the final states, the other representing all other states Successively split those nodes whose transitions lead to nodes in the original DFA that are in different nodes in the optimized DFA

16 Example Simplify the following 1 a 2 b 3 c 4 d 5 b 6 c 7

17 Transition tables Table encoding states and transitions of FA 1 row per state, 1 column per possible character Each entry: if automaton in a particular state sees a character, what is the next state? State Character a b c a a 2 b 3 c start state transition state final state c

18 Finite automata program Using a transition table, it is straightforward to write a program to recognize strings in a regular language state = initial_state; //start state of FA while (true) { next_char = getc(); if (next_char == EOF) break; next_state = T[state][next_char]; if (next_state == ERROR) break; state = next_state; } if (is_final_state(state)) //recognized a valid string else handle_error(next_char);

19 Alternate implementation Here s how we would implement the same program conventionally next_char = getc(); while (next_char == a ) { next_char = getc(); if (next_char!= b ) handle_error(next_char); next_char = getc(); if (next_char!= c ) handle_error(next_char); while (next_char == c ) { next_char = getc(); if (next_char == EOF) return; //matched token if (next_char == a ) break; if (next_char!= c ) handle_error(next_char); } } handle_error(next_char);

20 Practical Consderations Or: what do I have to worry about if I m actually going to write a scanner?

21 Handling reserved words Keywords can be written as regular expressions. However, this leads to a big blowup in FA size Consider writing a regular expression that accepts identifiers which cannot be if, while, do, for, etc. Usually better to specify reserved words as exceptions Capture them using the identifier regex, and then decide if the token corresponds to a reserved word

22 Lookahead Up until now, we have only considered matching an entire string to see if it is in a regular language What if we want to match multiple tokens from a file? Distinguish between int a and inta We need to look ahead to see if the next character belongs to the current token If it does, we can continue If it doesn t, the next character becomes part of the next token

23 Multi-character lookahead Sometimes, a scanner will need to look ahead more than one character to distinguish tokens Examples Fortran: DO I = 1,100 (loop) vs. DO I = (variable assignment) Pascal: (literal) vs (range) D D D. D 2 solutions: Backup or special action state

24 Multi-character lookahead Sometimes, a scanner will need to look ahead more than one character to distinguish tokens Examples Fortran: DO I = 1,100 (loop) vs. DO I = (variable assignment) Pascal: (literal) vs (range) D D D. D. 2 solutions: Backup or special action state

25 General approach Remember states (T) that can be final states Buffer the characters from then on If stuck in a non-final state, back up to T, restore buffered characters to stream Example: 12.3e+q input stream e + q FA processing T Error!

26 Why can t we do this? Just build an FA which recognizes the string D+( λ.d+)(...)d+( λ.d+) and recognize the final state we are in to determine the token type? Note that this will recognize tokens of the form 12.3 and 12..3

27 Error Recovery What do we do if we encounter a lexical error (a character which causes us to take an undefined transition)? Two options Delete all currently read characters, start scanning from current location Delete first character read, start scanning from second character This presents problems with ill-formatted strings (why?) One solution: create a new regexp to accept runaway strings

28 Scanner Generators

29 Scanner generators Essentially, tools for converting regular expressions into finite automata Two popular scanner generators Lex (Flex): generates C/C++ scanners ANTLR: generates Java scanners

30 Lex (Flex) Commonly used Unix scanner generator (superseded by Flex) Flex is a domain specific language for writing scanners Features: Character classes : define sets of characters (e.g., digits) Token definitions : regex {action to take}

31 DIGIT [0-9] ID [a-z][a-z0-9]* %% Lex (Flex) {DIGIT}+! {!!! printf( "An integer: %s (%d)\n", yytext,!!! atoi( yytext ) );!! } {DIGIT}+"."{DIGIT}*!{ printf( "A float: %s (%g)\n", yytext, atof( yytext ) ); } if then begin end procedure function!{!!! printf( "A keyword: %s\n", yytext );!! } {ID}!! printf( "An identifier: %s\n", yytext );

32 Lex (Flex) The order in which tokens are defined matters! Lex will match the longest possible token ifa becomes ID(ifa), not IF ID(a) If two regexes both match, Lex uses the one defined first if becomes IF, not ID(if) Use action blocks to process tokens as necessary Convert integer/float literals to numbers Remove quotes from string literals

33 Lex (Flex) Compile lex file to C code Example of compiling high-level language to another high-level language! Compile generated scanner to produce working scanner Combine with yacc/bison to produce parser

34 ANTLR More powerful tool than Lex (can generate parsers, too, not just scanners) Same basic principles Tokens: Token definition: tokenname : regex1 regex2... Character classes: Look similar to token definitions fragment characterclassname : regex1 regex2... Can use character classes when defining tokens

35 Next Time We ve covered how to tokenize an input program But how do we decide what the tokens actually say? How do we recognize that IF ID(a) OP(<) ID(b) { ID(a) ASSIGN LIT(5) ; } is an if-statement? Next time: Parsers

### Wednesday, September 3, 14. Scanners

Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

### Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

### Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

### Compiler course. Chapter 3 Lexical Analysis

Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

### Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

### Writing a Lexical Analyzer in Haskell (part II)

Writing a Lexical Analyzer in Haskell (part II) Today Regular languages and lexicographical analysis part II Some of the slides today are from Dr. Saumya Debray and Dr. Christian Colberg This week PA1:

### Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

### Crafting a Compiler with C (V) Scanner generator

Crafting a Compiler with C (V) 資科系 林偉川 Scanner generator Limit the effort in building a scanner to specify which tokens the scanner is to recognize Some generators do not produce an entire scanner; rather,

### Figure 2.1: Role of Lexical Analyzer

Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

### Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

### Implementation of Lexical Analysis

Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

### Introduction to Lexical Analysis

Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

### Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

### Alternation. Kleene Closure. Definition of Regular Expressions

Alternation Small finite sets are conveniently represented by listing their elements. Parentheses delimit expressions, and, the alternation operator, separates alternatives. For example, D, the set of

### Implementation of Lexical Analysis

Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

### CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3 CS 536 Spring 2015 1 Scanning A scanner transforms a character stream into a token stream. A scanner is sometimes

### Lexical Analyzer Scanner

Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

### 2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

2010: Compilers Lexical Analysis: Finite State Automata Dr. Licia Capra UCL/CS REVIEW: REGULAR EXPRESSIONS a Character in A Empty string R S Alternation (either R or S) RS Concatenation (R followed by

### Compiler phases. Non-tokens

Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree

### Lexical Analyzer Scanner

Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

### Finite Automata and Scanners

Finite Automata and Scanners A finite automaton (FA) can be used to recognize the tokens specified by a regular expression. FAs are simple, idealized computers that recognize strings belonging to regular

### CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:

### Lexical Error Recovery

Lexical Error Recovery A character sequence that can t be scanned into any valid token is a lexical error. Lexical errors are uncommon, but they still must be handled by a scanner. We won t stop compilation

### CSc 453 Lexical Analysis (Scanning)

CSc 453 Lexical Analysis (Scanning) Saumya Debray The University of Arizona Tucson Overview source program lexical analyzer (scanner) tokens syntax analyzer (parser) symbol table manager Main task: to

### CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

### UNIT -2 LEXICAL ANALYSIS

OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

### Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

### Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

### 2068 (I) Attempt all questions.

2068 (I) 1. What do you mean by compiler? How source program analyzed? Explain in brief. 2. Discuss the role of symbol table in compiler design. 3. Convert the regular expression 0 + (1 + 0)* 00 first

### Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

### Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

COMP 520 Winter 2016 Scanning COMP 520: Compiler Design (4 credits) Professor Laurie Hendren hendren@cs.mcgill.ca Scanning (1) COMP 520 Winter 2016 Scanning (2) Readings Crafting a Compiler: Chapter 2,

### Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

### Compiler Construction D7011E

Compiler Construction D7011E Lecture 2: Lexical analysis Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 Basics of Lexical Analysis: 2 Some definitions:

### Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

COMP 520 Winter 2017 Scanning COMP 520: Compiler Design (4 credits) Alexander Krolik alexander.krolik@mail.mcgill.ca MWF 13:30-14:30, MD 279 Scanning (1) COMP 520 Winter 2017 Scanning (2) Announcements

### CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Spring 2015 1 Multi Character Lookahead We may allow finite automata to look beyond the next input character.

### Lexical Error Recovery

Lexical Error Recovery A character sequence that can t be scanned into any valid token is a lexical error. Lexical errors are uncommon, but they still must be handled by a scanner. We won t stop compilation

### Lexical Analysis. Chapter 2

Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

### CSc 453 Compilers and Systems Software

CSc 453 Compilers and Systems Software 3 : Lexical Analysis I Christian Collberg Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2009 Christian Collberg August 23, 2009

### Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Second round: the scanner Lexical analysis Syntactical analysis Semantical analysis Intermediate code generation Optimization Code generation Target specific optimization Lexical analysis (Chapter 3) Why

### Week 2: Syntax Specification, Grammars

CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences

### Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis Chapter 1, Section 1.2.1 Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens

### CMSC 350: COMPILER DESIGN

Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language

### Zhizheng Zhang. Southeast University

Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5

### Languages and Compilers

Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 4. Lexical Analysis (Scanning) Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office: TA-121 For

### Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens

### Lexical Analysis - 2

Lexical Analysis - 2 More regular expressions Finite Automata NFAs and DFAs Scanners JLex - a scanner generator 1 Regular Expressions in JLex Symbol - Meaning. Matches a single character (not newline)

### LECTURE 6 Scanning Part 2

LECTURE 6 Scanning Part 2 FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions. We then discussed how we can create a recognizer

### CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2 CS 536 Spring 2015 1 Reading Assignment Read Chapter 3 of Crafting a Com piler. CS 536 Spring 2015 21 The Structure

### CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

Tokenizing 1 Tasks of the Tokenizer Group the input (which is just a stream/string of characters) into tokens. (Numbers, operators, special words, strings) Eliminate comments. In C : Deal with #include,

### CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from

### Implementation of Lexical Analysis. Lecture 4

Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working

### Lexical Analysis. Lecture 3-4

Lexical Analysis Lecture 3-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 3-4 1 Administrivia I suggest you start looking at Python (see link on class home page). Please

### Lexical Analysis. Lecture 2-4

Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

### Lexical Analysis. Finite Automata. (Part 2 of 2)

# Lexical Analysis Finite Automata (Part 2 of 2) PA0, PA Although we have included the tricky file ends without a newline testcases in previous years, students made good cases against them (e.g., they

### COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

### Implementation of Lexical Analysis

Outline Implementation of Lexical nalysis Specifying lexical structure using regular expressions Finite automata Deterministic Finite utomata (DFs) Non-deterministic Finite utomata (NFs) Implementation

### CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

### CSE 401 Midterm Exam Sample Solution 2/11/15

Question 1. (10 points) Regular expression warmup. For regular expression questions, you must restrict yourself to the basic regular expression operations covered in class and on homework assignments:

### Lecture 9 CIS 341: COMPILERS

Lecture 9 CIS 341: COMPILERS Announcements HW3: LLVM lite Available on the course web pages. Due: Monday, Feb. 26th at 11:59:59pm Only one group member needs to submit Three submissions per group START

### Chapter 2 - Programming Language Syntax. September 20, 2017

Chapter 2 - Programming Language Syntax September 20, 2017 Specifying Syntax: Regular expressions and context-free grammars Regular expressions are formed by the use of three mechanisms Concatenation Alternation

### THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

### CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

### Lexical Analysis. Implementation: Finite Automata

Lexical Analysis Implementation: Finite Automata Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs)

### CS 403: Scanning and Parsing

CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

### Implementation of Lexical Analysis

Implementation of Lexical Analysis Lecture 4 (Modified by Professor Vijay Ganesh) Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested

### ECE251 Midterm practice questions, Fall 2010

ECE251 Midterm practice questions, Fall 2010 Patrick Lam October 20, 2010 Bootstrapping In particular, say you have a compiler from C to Pascal which runs on x86, and you want to write a self-hosting Java

### CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

### Compiler Construction

Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: First-Longest-Match Analysis Outline of Lecture

### Kinder, Gentler Nation

Lexical Analysis Finite Automata (Part 2 of 2) # Kinder, Gentler Nation In our post drop-deadline world things get easier. While we re here: reading quiz. #2 Summary Regular expressions provide a concise

### Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others The Big Picture Character Stream Scanner

### Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Finite automata We have looked at using Lex to build a scanner on the basis of regular expressions. Now we begin to consider the results from automata theory that make Lex possible. Recall: An alphabet

### Compiler Construction

Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Recap: First-Longest-Match Analysis The Extended Matching

### Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis Lecture 3 January 10, 2018 Announcements PA1c due tonight at 11:50pm! Don t forget about PA1, the Cool implementation! Use Monday s lecture, the video guides and Cool examples if you re

### CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

CD Assignment I 1. Explain the various phases of the compiler with a simple example. The compilation process is a sequence of various phases. Each phase takes input from the previous, and passes the output

### CS415 Compilers. Lexical Analysis

CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

### Introduction to Parsing. Lecture 5

Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

### Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

### Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

### CSC 467 Lecture 3: Regular Expressions

CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token

### Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Class 5 Lex Spec Example delim [ \t\n] ws {delim}+ letter [A-Aa-z] digit [0-9] id {letter}({letter} {digit})* number {digit}+(\.{digit}+)?(e[+-]?{digit}+)? %% {ws} {/*no action and no return*?} if {return(if);}

### Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Introduction: Definitions Lexical analysis or scanning: To read from left-to-right a source

### 1. Lexical Analysis Phase

1. Lexical Analysis Phase The purpose of the lexical analyzer is to read the source program, one character at time, and to translate it into a sequence of primitive units called tokens. Keywords, identifiers,

### Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:

### Chapter 2 :: Programming Language Syntax

Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular

### 10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

### 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator

### Parsing and Pattern Recognition

Topics in IT 1 Parsing and Pattern Recognition Week 10 Lexical analysis College of Information Science and Engineering Ritsumeikan University 1 this week mid-term evaluation review lexical analysis its

### Assignment 1 (Lexical Analyzer)

Assignment 1 (Lexical Analyzer) Compiler Construction CS4435 (Spring 2015) University of Lahore Maryam Bashir Assigned: Saturday, March 14, 2015. Due: Monday 23rd March 2015 11:59 PM Lexical analysis Lexical

### programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

### Compiler Construction

Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

### Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

### Automatic Scanning and Parsing using LEX and YACC

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

### EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

### CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018 Announcement First recitation starts this Wednesday Homework 1 will be release

### COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System