# CS 314 Principles of Programming Languages

Size: px
Start display at page:

Transcription

1 CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018

2 Announcement First recitation starts this Wednesday Homework 1 will be release tomorrow My office hour: Wednesday 5pm to 6pm, CoRE

3 Syntax and Semantics of Prog. Languages Syntax: Describes what a legal program looks like Semantics: Describes what a correct (legal) program means A formal language is a (possibly infinite) set of sentences (finite sequences of symbols) over a finite alphabet Σ of (terminal) symbols: L Σ Examples: L = { identifiers of length 2} with Σ = {a, b, c} L = {strings of only 1s and only 0s} L = {strings starting with \$ and ending with #, and any combination of 0s and 1s in between } L = {all syntactically correct Java programs} 3

4 Syntax and Semantics: How does it work? Syntactic representation of values What do the following syntactic expressions have in common? X 1010 A \$ # (2 6) 4

5 Syntax and Semantics: How does it work? Syntactic representation of values What do the following syntactic expressions have in common? X 1010 A \$ # (2 6) Answer: They are possible representations of the integer value 10 (written as a decimal number) What is computation? Possible answer: A (finite) sequence of syntactic manipulations of value representations ending in a normal form which is called the result. Normal forms cannot be manipulated any further. 5

6 Syntax and Semantics: How does it work? Here is a game (rewrite system): Input: Sequence of characters starting with \$ and ending with #, and any combination of 0s and 1s in between. Rules: You may replace a character pattern X at any position within the character sequence on the left-hand-side by the pattern Y on the righthand-side: X Y: Rule 1 \$1 1& Rule 2 \$0 0\$ Rule 3 &1 1\$ Rule 4 &0 0& Rule 5 Rule 6 \$# A &# B Replace patterns using the rules as often as you can. When you cannot replace a pattern any more, stop. 6

7 Syntax and Semantics: How does it work? Example input: Rule 1 \$1 1& \$ 0 0 # Rule 2 \$0 0\$ \$ 0 0 # is rewritten as 0 \$ 0 # by rule 2 Rule 3 &1 1\$ 0 \$ 0 # is rewritten as 0 0 \$ # by rule 2 Rule 4 &0 0& 0 0 \$ # is rewritten as 0 0 A by rule 5 Rule 5 \$# A no more rules can be applied (STOP) Rule 6 &# B More examples: \$ # \$ # \$ # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 7

8 Syntax and Semantics: How does it work? Example input: Rule 1 \$1 1& \$ 0 0 # Rule 2 \$0 0\$ \$ 0 0 # is rewritten as 0 \$ 0 # by rule 2 Rule 3 &1 1\$ 0 \$ 0 # is rewritten as 0 0 \$ # by rule 2 Rule 4 &0 0& 0 0 \$ # is rewritten as 0 0 A by rule 5 Rule 5 \$# A no more rules can be applied (STOP) Rule 6 &# B More examples: \$ # \$ # \$ # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 8

9 Syntax and Semantics: How does it work? Example input: Rule 1 \$1 1& \$ 0 0 # Rule 2 \$0 0\$ \$ 0 0 # is rewritten as 0 \$ 0 # by rule 2 Rule 3 &1 1\$ 0 \$ 0 # is rewritten as 0 0 \$ # by rule 2 Rule 4 &0 0& 0 0 \$ # is rewritten as 0 0 A by rule 5 Rule 5 \$# A no more rules can be applied (STOP) Rule 6 &# B More examples: \$ # \$ # \$ # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 9

10 Syntax and Semantics: How does it work? Example input: Rule 1 \$1 1& \$ 0 0 # Rule 2 \$0 0\$ \$ 0 0 # is rewritten as 0 \$ 0 # by rule 2 Rule 3 &1 1\$ 0 \$ 0 # is rewritten as 0 0 \$ # by rule 2 Rule 4 &0 0& 0 0 \$ # is rewritten as 0 0 A by rule 5 Rule 5 \$# A no more rules can be applied (STOP) Rule 6 &# B More examples: \$ # \$ # \$ # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 10

11 Syntax and Semantics: How does it work? Example input: Rule 1 \$1 1& \$ 0 0 # Rule 2 \$0 0\$ \$ 0 0 # is rewritten as 0 \$ 0 # by rule 2 Rule 3 &1 1\$ 0 \$ 0 # is rewritten as 0 0 \$ # by rule 2 Rule 4 &0 0& 0 0 \$ # is rewritten as 0 0 A by rule 5 Rule 5 \$# A no more rules can be applied (STOP) Rule 6 &# B More examples: \$ # \$ # \$ # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 11

12 Syntax Without Semantics? The green apple is colorless. 12

13 Syntax Without Semantics? The green apple is colorless. Syntactically correct, but semantically incorrect! 13

14 Review: Compilers Source Code Compiler Machine Code Error Implications: Recognize legal (and illegal) programs Generate correct code Manage storage of all variables and code Need format for object (or assembly) code Big step up from assembler higher level notations 14

15 Traditional Two-Pass Compilers Pass: reading and writing entire program Source Code Front End IR Back End Machine Code Implications: Intermediate representation (IR) Front end maps legal code into IR Back end maps IR onto target machine Simplify retargeting Allows multiple front ends Multiple passes better code Front end is O(n) Back end is NP-Complete Errors 15

16 Front End Source Code Tokens Scanner Parser IR Front End Responsibilities: Recognize legal programs Report errors Produce IR Preliminary storage map Errors Parser: syntax & semantic analyzer, IR code generator (syntax-directed translator) Shape the code for the back end Much of front end construction can be automated 16

17 Syntax and Semantics of Prog. Languages The syntax of programming languages is often defined in two layers: tokens and sentences tokens - basic units of the language Question: How to spell a token (word)? Answer: Regular expressions sentences - legal combinations of tokens in the language Question: How to build correct sentences with tokens? Answer: (context - free) grammars (CFG) E.g., Backus-Naur form (BNF) is a formalism used to express the syntax of programming languages. 17

18 Formalisms for Lexical and Syntactic Analysis Two issues in Formal Languages: Language Specification formalism to describe what a valid program (sentence) looks like. Language Recognition formalism to describe a machine and an algorithm that can verify that a program is valid or not. 18

19 Formalisms for Lexical and Syntactic Analysis Two issues in Formal Languages: Language Specification formalism to describe what a valid program (sentence) looks like. Language Recognition formalism to describe a machine and an algorithm that can verify that a program is valid or not. 1. Lexical Analysis: Converts source code into sequence of tokens. Regular grammar/expression to specify. Finite automata to recognize. 2. Syntax Analysis: Structures tokens into parse tree. Context-free grammars to specify. Push-down automata to recognize (will be covered in CS 415) 19

20 Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 20

21 Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 21

22 Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 22

23 Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 23

24 Lexical Analysis ( Scott 2.1, 2.2) Tokens (Terminal Symbols of CFG, Words of Lang.) Smallest atomic units of syntax Used to build all the other constructs Example, C: keywords: for if goto volatile = * / - < > == <= >= <> ( ) [ ] ; :=., number: (Example: ) identifier: (Example: b square addentry ) 24

25 Lexical Analysis (cont. ) Identifiers Names of variables, etc. Sequence of terminals of restricted form; Example, in C language: A31, but not 1A3 Upper/lower case sensitive? Keywords Special identifiers which represent tokens in the language May be reserved (reserved words) or not - E.g., C: if is reserved. Delimiters When does character string for token end? Example: identifiers are longest possible character sequence that does not include a delimiter. Most languages have more delimiters such as, new line, keywords,... 25

26 Regular Expressions A syntax (notation) to specify regular languages. RE r Language L(r) a {a} ϵ r s {ϵ} L(r) L(s) rs {rs r L(r), s L(s)} r+ L(r) L(rr) L(rrr) (any number of L(r) s concatenated) r* (r* = r + ϵ) {ϵ} L(r) L(rr) L(rrr)... (s) L(s) 26

27 Regular Expressions A syntax (notation) to specify regular languages. RE r r s Language L(r) L(r) L(s) rs {rs r L(r), s L(s)} r+ L(r) L(rr) L(rrr) r* (r* = r + ϵ) {ϵ} L(r) L(rr) L(rrr)... (any number of r s concatenated) (s) L(s) 27

28 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 28

29 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 29

30 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 30

31 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 31

32 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 32

33 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 33

34 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 34

35 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 35

36 Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 36

37 Regular Expressions for Programming Languages Let letter stand for A B C... Z Let digit stand for integer constant: identifier: real constant: 37

38 Recognizers for Regular Expressions Example 1: integer constant RE: digit + digit FSA: S0 digit S2 38

39 Recognizers for Regular Expressions(Cont.) Example 2: identifier RE: letter(letter digit) * letter FSA: S0 letter S2 digit 39

40 Recognizers for Regular Expressions(Cont.) Example 3: Real constant RE: digit*.digit + digit digit FSA: S0. S1 digit S2 40

41 Finite State Automata 0 S1 0 S0 S3 1 S2 1 A Finite-State Automaton is a quadruple: < S, s, F, T > S is a set of states, e.g., {S0, S1, S2, S3} s is the start state, e.g., S0 F is a set of final states, e.g., {S3} T is a set of labeled transitions, of the form (state, input) state [i.e., S Σ S] 41

42 Finite State Automata Transitions can be represented using a transition table: S0 0 1 S1 S2 0 1 S3 State 0 1 S0 S1 S2 S1 S3 - S2 - S3 Input An FSA accepts or recognizes an input string N iff there is some path from start state to a final state such that the labels on the path spell N. Lack of entry in the table (or no arc for a given character) indicates an error reject. 42

43 Finite State Automata Transitions can be represented using a transition table: S0 0 1 S1 S2 0 1 S3 State 0 1 S0 S1 S2 S1 S3 - S2 - S3 Input An FSA accepts or recognizes an input string N iff there is some path from start state to a final state such that the labels on the path spell N. Lack of entry in the table (or no arc for a given character) indicates an error reject. Error 43

44 Practical Recognizers Recognizer should be a deterministic finite automaton (DFA) Read until the end of a token Report errors (error recovery) identifier letter (a b c... z A B C... Z) digit ( ) id letter (letter digit) Recognizer for identifier: (transition diagram) letter other S0 S1 S2 digit other letter/digit error S3 error 44

45 Implementation: Table for the recognizer Two tables control the recognizer. char_class: a - z A - Z 0-9 other value letter letter digit other next_state: class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 To change languages, we can just change tables. 45

46 Implementation: Code for the recognizer letter other S0 S1 S2 S3 digit other error letter/digit class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 error char next_char(); state S0; done false; while( not done ) { class char_class[char]; state next_state[class,state]; switch(state) { case S1: /* building an id */ token_value token_value + char; char next_char(); if (char == EOF) done = true; break; case S2: /* error state */ case S3: /* error state */ token type = error; done = true; break; } } return token_type; 46

47 Improved efficiency Table driven implementation is slow relative to direct code. Each state transition involves: 1. Classifying the input character 2. Finding the next state 3. An assignment to the state variable 4. A trip through the case statement logic 5. A branch (while loop) We can do better by encoding the state table in the scanner code 1. Classify the input character 2. Test character class locally 3. Branch directly to next state This takes many fewer instructions. 47

48 Implementation: Faster scanning S0: char next_char(); token value ; class char_class[char]; if (class!= letter) goto S3; S1: token value token_value + char; char next_char(); class char_class[char]; if (class!= other) goto S1; if (char == DELIMITER ) token type = identifier; return token_type; goto S2; S2: S3: token type error; return token type; class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 48

49 Implementation: Faster scanning S0: char next_char(); token value ; class char_class[char]; if (class!= letter) goto S3; S1: token value token_value + char; char next_char(); class char_class[char]; if (class!= other) goto S1; if (char == DELIMITER ) token type = identifier; return token_type; goto S2; S2: S3: token type error; return token type; char next_char(); state class S0; S0 S1 S2 S3 done false; while( letter not done S1 ) { S1 class digit char_class[char]; S3 S1 state next_state[class,state]; switch(state) other S3{ S2 case S1: /* building an id */ token_value token_value + char; char next_char(); if (char == EOF) done = true; break; case S2: /* error state */ case S3: /* error state */ token type = error; done = true; break; } } return token_type; Faster code Original code 49

50 Next Lecture Things to do: Read Scott, Chapters

### CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

### Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

### CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

### CSc 453 Lexical Analysis (Scanning)

CSc 453 Lexical Analysis (Scanning) Saumya Debray The University of Arizona Tucson Overview source program lexical analyzer (scanner) tokens syntax analyzer (parser) symbol table manager Main task: to

### CPSC 434 Lecture 3, Page 1

Front end source code tokens scanner parser il errors Responsibilities: recognize legal procedure report errors produce il preliminary storage map shape the code for the back end Much of front end construction

### Part 5 Program Analysis Principles and Techniques

1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

### COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

### Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

### CS415 Compilers. Lexical Analysis

CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

### Lexical Analysis. Introduction

Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

### Week 2: Syntax Specification, Grammars

CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences

### Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

### Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

### Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

Academic Formalities CS3300 - Language Translators Introduction V. Krishna Nandivada IIT Madras Written assignments = 20 marks. Midterm = 30 marks, Final = 50 marks. Extra marks During the lecture time

### CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

### Lexical Scanning COMP360

Lexical Scanning COMP360 Captain, we re being scanned. Spock Reading Read sections 2.1 3.2 in the textbook Regular Expression and FSA Assignment A new assignment has been posted on Blackboard It is due

### Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Formal Languages and Compilers Lecture IV: Regular Languages and Finite Automata Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

### CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

### The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

### Compiling Regular Expressions COMP360

Compiling Regular Expressions COMP360 Logic is the beginning of wisdom, not the end. Leonard Nimoy Compiler s Purpose The compiler converts the program source code into a form that can be executed by the

### Non-deterministic Finite Automata (NFA)

Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design

### CSCE 314 Programming Languages

CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee 1 What Is a Programming Language? Language = syntax + semantics The syntax of a language is concerned with the form of a program: how

### CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

### CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers Overview of the Course These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Critical Facts Welcome to CS415 Compilers Topics in the

### Theory and Compiling COMP360

Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the

### Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules

### COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.

### Formal Languages. Formal Languages

Regular expressions Formal Languages Finite state automata Deterministic Non-deterministic Review of BNF Introduction to Grammars Regular grammars Formal Languages, CS34 Fall2 BGRyder Formal Languages

### CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from

### CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

### CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

CSE 311 Lecture 21: Context-Free Grammars Emina Torlak and Kevin Zatloukal 1 Topics Regular expressions A brief review of Lecture 20. Context-free grammars Syntax, semantics, and examples. 2 Regular expressions

### Compiler course. Chapter 3 Lexical Analysis

Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

### CSE302: Compiler Design

CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 01, 2007 Outline Recap

### Zhizheng Zhang. Southeast University

Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5

### CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution Exam Facts Format Wednesday, July 20 th from 11:00 a.m. 1:00 p.m. in Gates B01 The exam is designed to take roughly 90

### Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

### Lexical Analysis. Lecture 2-4

Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

### Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analysis - An Introduction Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

### programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

### Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:

### Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

### CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] 1 What is Lexical Analysis? First step of a compiler. Reads/scans/identify the characters in the program and groups

### Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

### Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

### Optimizing Finite Automata

Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

### Lecture 4: Syntax Specification

The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic

### 2. Lexical Analysis! Prof. O. Nierstrasz!

2. Lexical Analysis! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/! http://www.cs.purdue.edu/homes/hosking/!

### CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions

### Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Theoretical Part Chapter one:- - What are the Phases of compiler? Six phases Scanner Parser Semantic Analyzer Source code optimizer Code generator Target Code Optimizer Three auxiliary components Literal

### KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

1 KEY CS 441G Fall 2018 Exam 1 Matching: match the best term from the following list to its definition by writing the LETTER of the term in the blank to the left of the definition. (1 point each) A Accepts

### Lecture 3: CUDA Programming & Compiler Front End

CS 515 Programming Language and Compilers I Lecture 3: CUDA Programming & Compiler Front End Zheng (Eddy) Zhang Rutgers University Fall 2017, 9/19/2017 1 Lecture 3A: GPU Programming 2 3 Review: Performance

### Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Compilation 2012 Parsers and Scanners Jan Midtgaard Michael I. Schwartzbach Aarhus University Context-Free Grammars Example: sentence subject verb object subject person person John Joe Zacharias verb asked

### MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

CS606- Compiler Construction MidTerm Papers Solved MCQS with Reference (1 to 22 lectures) by Arslan Arshad (Zain) FEB 21,2016 0300-2462284 http://lmshelp.blogspot.com/ Arslan.arshad01@gmail.com AKMP01

### Introduction to Lexing and Parsing

Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?

### CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

CSE 5317 Lecture 4: Scanning and parsing 28 Jan 2010 Nate Nystrom University of Texas at Arlington Administrivia hcp://groups.google.com/group/uta- cse- 3302 I will add you to the group soon TA Derek White

### CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

### Compiler Design Overview. Compiler Design 1

Compiler Design Overview Compiler Design 1 Preliminaries Required Basic knowledge of programming languages. Basic knowledge of FSA and CFG. Knowledge of a high programming language for the programming

### Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

### CST-402(T): Language Processors

CST-402(T): Language Processors Course Outcomes: On successful completion of the course, students will be able to: 1. Exhibit role of various phases of compilation, with understanding of types of grammars

### 2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

2010: Compilers Lexical Analysis: Finite State Automata Dr. Licia Capra UCL/CS REVIEW: REGULAR EXPRESSIONS a Character in A Empty string R S Alternation (either R or S) RS Concatenation (R followed by

### Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

### Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Introduction: Definitions Lexical analysis or scanning: To read from left-to-right a source

### Implementation of Lexical Analysis

Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

### Introduction to Lexical Analysis

Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

### CS308 Compiler Principles Lexical Analyzer Li Jiang

CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular

### MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

### MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

### Figure 2.1: Role of Lexical Analyzer

Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

### Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

### Lexical Analyzer Scanner

Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

### R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

R1 SET - 1 1. a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA? 2. a) Design a DFA that accepts the language over = {, 1} of all strings that

### Lexical Analyzer Scanner

Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

### Chapter 2 :: Programming Language Syntax

Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular

### Lexical Analysis. Chapter 2

Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

### David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

David Griol Barres dgriol@inf.uc3m.es Computer Science Department Carlos III University of Madrid Leganés (Spain) OUTLINE Introduction: Definitions The role of the Lexical Analyzer Scanner Implementation

### 10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

### COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

### Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

### Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others The Big Picture Character Stream Scanner

### Lexical and Syntax Analysis

Lexical and Syntax Analysis In Text: Chapter 4 N. Meng, F. Poursardar Lexical and Syntactic Analysis Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input

### Structure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.

CS 441 Fall 2018 Notes Compiler - software that translates a program written in a source file into a program stored in a target file, reporting errors when found. Source Target program written in a High

### 10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

### Implementation of Lexical Analysis

Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

### Front End. Hwansoo Han

Front nd Hwansoo Han Traditional Two-pass Compiler Source code Front nd IR Back nd Machine code rrors High level functions Recognize legal program, generate correct code (OS & linker can accept) Manage

### Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

dministrivia Lexical nalysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Moving to 6 Evans on Wednesday HW available Pyth manual available on line. Please log into your account and electronically

### Chapter 3. Parsing #1

Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)

### CSE 401/M501 Compilers

CSE 401/M501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Spring 2018 UW CSE 401/M501 Spring 2018 B-1 Administrivia No sections this week Read: textbook ch. 1 and sec. 2.1-2.4

### Regular Expressions & Automata

Regular Expressions & Automata CMSC 132 Department of Computer Science University of Maryland, College Park Regular expressions Notation Patterns Java support Automata Languages Finite State Machines Turing

i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

### CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CSE450 Translation of Programming Languages Lecture 4: Syntax Analysis http://xkcd.com/859 Structure of a Today! Compiler Source Language Lexical Analyzer Syntax Analyzer Semantic Analyzer Int. Code Generator

### CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

### CS606- compiler instruction Solved MCQS From Midterm Papers

CS606- compiler instruction Solved MCQS From Midterm Papers March 06,2014 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 Final Term MCQ s and Quizzes CS606- compiler instruction If X is a

### Writing a Lexical Analyzer in Haskell (part II)

Writing a Lexical Analyzer in Haskell (part II) Today Regular languages and lexicographical analysis part II Some of the slides today are from Dr. Saumya Debray and Dr. Christian Colberg This week PA1:

### CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Scanner Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter

### Compilers and Interpreters

Overview Roadmap Language Translators: Interpreters & Compilers Context of a compiler Phases of a compiler Compiler Construction tools Terminology How related to other CS Goals of a good compiler 1 Compilers

### CPS 506 Comparative Programming Languages. Syntax Specification

CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

### Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis Lecture 3 January 10, 2018 Announcements PA1c due tonight at 11:50pm! Don t forget about PA1, the Cool implementation! Use Monday s lecture, the video guides and Cool examples if you re