Lexical and Syntax Analysis
- Cameron Sutton
- 6 years ago
Transcription
COS 301 Programming Languages
Lexical and Syntax Analysis
Sebesta, Ch. 4

Syntax analysis
- Programming languages are compiled, interpreted, or hybrid
- All have to do syntax analysis
- For a compiled language: parse trees

Overall Syntax Analysis
Program (string of characters) -> Lexical Analyzer -> Tokens & lexemes -> Syntax Analyzer -> Parse trees (decorated)

Why separate phases?
- Different difficulties:
  - Lexical analysis: simple, so a simple approach; optimize, since a lot of time is spent here
  - Syntax analysis: more complex, so a more complex approach
- Portability:
  - Syntax analyzer: portable
  - Lexical analyzer: maybe not
- But: they may not really be totally separate phases
Lexical and syntax analysis
- Lexical analysis:
  - Low-level analysis: looking for identifiers, constants
  - Needs a regular grammar
  - Finite state machine (automaton)
- Syntax analysis:
  - Needs a context-free (or attribute) grammar
  - Pushdown automaton (recursive transition network)

Lexical Analysis
- Pattern matching
- Lexical analyzer (LA): a pattern matcher
- Input: string of characters
- Looks for patterns: lexemes (e.g., myarray)
- Also determines categories of lexemes:
  - Categories = tokens (e.g., identifier)
  - Often represented by a numeric code
- Output: tokens + lexemes
- Strips out comments, whitespace
Tokens
- Identifiers
- Literals:
  - Numbers: 2, 3, 5.7, 3E4
  - Characters: 'x'
  - Strings: "foo"
  - Booleans: TRUE
- Keywords/reserved words: while, if, etc.
- Operators: +, -, *, /, **, ^, etc.
- Punctuation: ; ( ) { } [ ]

Non-token strings
- Whitespace (space, tab, ...)
  - Sometimes not just discarded (e.g., Python)
- Comments
- EOL
  - Some operating systems: EOL+newline
  - Sometimes whitespace (C, C++, Java, Lisp, ...)
  - Sometimes statement separators (FORTRAN, Basic)
- EOF

Example output
foo = foo * PI / 2;

Token       Lexeme
IDENT       foo
ASSIGN_OP   =
IDENT       foo
MULT_OP     *
IDENT       PI
DIV_OP      /
INT_LIT     2
SEMICOLON   ;
Building a lexical analyzer
- One way:
  - Write a regular grammar of the tokens
  - Give it to lex, flex, flex++, etc. -> table-driven lexical analyzer
- Another way:
  - Draw a state transition diagram for the tokens
  - Write a custom program to implement it
- Third way:
  - Draw a state transition diagram
  - Construct a table-driven implementation

Review: Chomsky hierarchy
- Four levels of languages (grammars)
- Each can be recognized/generated by an automaton (formal machine)

Language class            Automaton
Regular                   Finite-state automaton
Context-free              Pushdown automaton
Context-sensitive         Linear-bounded automaton
Recursively enumerable    Turing machine

- CFGs are needed for syntax
- Regular grammars are sufficient for lexical analysis
  - The state diagram for the LA should represent an FSA

Grammars
- Regular grammars:
  - LHS: single nonterminal
  - RHS: at most 1 nonterminal, rightmost/leftmost
- Context-free grammars: only one nonterminal on the LHS
- Context-sensitive grammars:
  - LHS: any number of terminals, nonterminals
  - Sentential form cannot shrink in a derivation
- Recursively enumerable (unrestricted) grammars
Regular grammars
- Tuple {P, T, N, S}
  - P = productions
  - T = terminals
  - N = nonterminals
  - S = start symbol(s)
- Must be right- or left-regular

Right regular grammars
- RHS contains at most 1 nonterminal
- The nonterminal must be the rightmost symbol
- Let ω ∈ T*, A, B ∈ N; productions:
  A -> ωB
  A -> ω
- E.g.: let a = an alphanumeric character, and n = numeral:
  S -> aR
  R -> aR
  R -> nR

Left regular grammars
- Same, except the nonterminal is on the left
  A -> Bω
  A -> ω
Linear grammars
- Linear grammars: both kinds of rules
- Not strictly a regular grammar: more powerful
- E.g.: balancing (), {}, begin/end
  - Regular grammar: no
  - Linear grammar: yes
- E.g.: {a^n b^n | n ≥ 1}
  S -> aAb           S -> aA
  A -> S | ε    or   A -> Sb | b
- Regular languages ⊂ linear languages ⊂ CF languages

Example regular grammar: Integers
- Right-regular grammar for whole numbers:
  <num>  -> 0 | 1 <num2> | 2 <num2> | ... | 9 <num2>
  <num2> -> 0 <num2> | 1 <num2> | 2 <num2> | ... | 9 <num2> | ε
- As EBNF:
  <num> -> (0 | (1 | ... | 9) {(0 | 1 | ... | 9)})

Finite state automata (machine)
- Automaton = abstract machine
- Two types:
  - nondeterministic FSA (NFSA)
  - deterministic FSA (DFSA)
- Only the DFSA is useful for our purposes
- Equivalent in power: any NFSA can be converted to an equivalent DFSA
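The whole-number grammar above maps directly onto a small DFSA. The following is a minimal sketch in C, assuming a string is accepted only if it is "0" or a digit string with no leading zero; the state names and the function name `is_whole_number` are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for <num> -> 0 | (1..9){0..9}. */
enum num_state { NUM_START, NUM_ZERO, NUM_DIGITS, NUM_ERROR };

/* Returns true if s is "0" or a digit string without a leading zero. */
bool is_whole_number(const char *s)
{
    assert(s != NULL);
    enum num_state state = NUM_START;
    for (; *s != '\0'; s++) {
        char c = *s;
        switch (state) {
        case NUM_START:
            if (c == '0') state = NUM_ZERO;                 /* <num> -> 0 */
            else if (c >= '1' && c <= '9') state = NUM_DIGITS;
            else state = NUM_ERROR;
            break;
        case NUM_ZERO:
            state = NUM_ERROR;                              /* nothing may follow a lone 0 */
            break;
        case NUM_DIGITS:
            state = (c >= '0' && c <= '9') ? NUM_DIGITS : NUM_ERROR;
            break;
        case NUM_ERROR:
            return false;                                   /* no way out of the error state */
        }
    }
    /* Accept only if input ended in one of the two accepting states. */
    return state == NUM_ZERO || state == NUM_DIGITS;
}
```

Each `case` corresponds to one state, and each assignment to `state` is one labeled arc; the ε-production of <num2> shows up as NUM_DIGITS being an accepting state.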
DFSA
- DFSA: formal machine, finite number of states
- Accepts input from a tape
- State + input symbol -> unique next state
- Start state, accepting (end) state(s)
- Transitions: consume (read) symbols
- Accepts a string when it reaches an accepting state and no more input is left
- Else: error

Uses of FSAs
- Language recognition
- Describe other things
- Control things (i.e., represent simple programs)

FSA as graph
- FSAs can be represented as directed graphs
- Nodes = states
- Input alphabet + end-of-input symbol
- State transition function represented by directed edges in the graph, labeled with symbols or sets of symbols
- Unique start state
- One or more final (accepting) states
Example: Vending Machine
[Figure: vending machine example. Adapted from Wulf, Shaw, Hilfinger, Flon, Fundamental Structures of Computer Science, p. 17.]

Example: Battery Charger
[Figure: battery charger example.]

Regular expressions
- Regular expressions: an alternative to regular grammars
- Specify a language at the lexical level
- Also used in text processing, web applications
- Built-in support in many languages: e.g., Perl, Ruby, Java, JavaScript, Python, .NET languages
Regular expression conventions

Regex    Meaning
x        a character x (stands for itself)
\x       an escaped character, e.g., \n
M | N    M or N
M N      M followed by N

Note: \ varies with software. Typical usage:
- Certain non-printable characters (e.g., \n = newline and \t = tab)
- ASCII hex (\xff) or Unicode hex (\xffff)
- Shorthand character classes (\w = word, \s = whitespace, \d = digit)
- Escaping a literal, e.g., \* or \.

Meta-symbols

Regex    Meaning
M+       One or more occurrences of M
M?       Zero or one occurrence of M
M*       Zero or more occurrences of M
[ ]      Surrounds a range or set: one of these
         E.g., [aeiou] the set of vowels
         E.g., [0-9] the set of digits
         E.g., [A-Za-z0-9] the set of alphanumeric characters
.        Any single character
( )      Grouping

Regex example
- Let Σ = { a, b, c }
- r = (a | b)*c
- This regex specifies repetition (0, 1, 2, etc. occurrences) of either a or b, followed by c.
- Strings that match this regular expression include: c, ac, bc, abc, aabbaabbc
Regex example
- Let Σ = { a, b, c }
- r = (a | c)*b(a | c)*
- This regular expression specifies repetition of either a or c, followed by b, followed by repetition of either a or c.
- Matches include: b, ab, bcccc, abc, aaccaab, aacabccca

Regex example: signed integers
- Leading +/- (optional)
- At least 1 digit in 0..9
- Regex: (\+|\-)?[0-9]+
- Matches include +1, 0, -0

Regex example
- Create a regular expression to represent a signed floating-point number.
- There is an optional leading sign (+ or -), followed by 1 or more digits in the range 0..9, followed by an optional decimal point and then 1 or more digits in the range 0..9.
- The \. indicates that . is the literal period and not the . metacharacter for any character.
  1. (\+|\-)?[0-9]+(\.[0-9]+)?
  2. [-+]?([0-9]+\.[0-9]+|[0-9]+)
  3. [-+]?[0-9]+\.?[0-9]* (this one will allow "9.")
- This illustrates how complex regexes can be!
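The difference between the three floating-point regexes can be checked mechanically. The sketch below uses the POSIX `<regex.h>` interface (`regcomp`, `regexec`, `regfree`) with extended syntax; it assumes a POSIX system. The helper `full_match` and the `FLOAT_RE_*` macro names are hypothetical. Regex 1 is written with an unescaped `-`, since escaping an ordinary character is not portable in POSIX extended syntax.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The three candidate regexes from the slide, as C string literals. */
#define FLOAT_RE_1 "(\\+|-)?[0-9]+(\\.[0-9]+)?"
#define FLOAT_RE_2 "[-+]?([0-9]+\\.[0-9]+|[0-9]+)"
#define FLOAT_RE_3 "[-+]?[0-9]+\\.?[0-9]*"

/* Returns true if the whole of text matches pattern (POSIX extended syntax).
   The pattern is wrapped in ^( )$ so that partial matches are rejected. */
bool full_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    char anchored[256];
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return false;                       /* pattern too long for this sketch */

    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* pattern failed to compile */

    bool ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, `full_match(FLOAT_RE_3, "9.")` succeeds while the same call with `FLOAT_RE_1` or `FLOAT_RE_2` fails, which is the distinction the slide points out.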
DFSA for regular grammar
- E.g.: a DFSA that accepts binary strings with an even number of 1 bits
- Right regular grammar:
  A -> 0A | 1B | ε
  B -> 0B | 1A
- Regex: 0*(10*10*)*
- Diagram: A is the start state and the accepting state
  A --0--> A
  A --1--> B
  B --0--> B
  B --1--> A

Regex libraries
- Many available online

Lexical analysis state transition diagram
- For recognizing/generating regular languages
- A DFSA
- Nodes = states
- Arcs = transitions between states
- Labels: input characters
- Actions (optional)
- Labels can be classes of characters (e.g., [0-9], [A-Za-z], etc.)
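The two-state machine for an even number of 1 bits is small enough to write as a transition table. This is a minimal table-driven sketch in C; the names `STATE_A`, `STATE_B`, `next_state`, and `has_even_ones` are hypothetical, and treating any character other than 0 or 1 as a rejection is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { STATE_A = 0, STATE_B = 1, NUM_STATES = 2 };

/* next_state[state][bit]: rows are states A and B, columns are inputs 0 and 1.
   Reading a 0 keeps the state; reading a 1 toggles it. */
static const int next_state[NUM_STATES][2] = {
    /* A */ { STATE_A, STATE_B },
    /* B */ { STATE_B, STATE_A },
};

/* Returns true if bits contains only '0'/'1' and an even number of '1's. */
bool has_even_ones(const char *bits)
{
    assert(bits != NULL);
    int state = STATE_A;                    /* A is the start state */
    for (; *bits != '\0'; bits++) {
        if (*bits != '0' && *bits != '1')
            return false;                   /* symbol outside the alphabet */
        state = next_state[state][*bits - '0'];
    }
    return state == STATE_A;                /* A is also the accepting state */
}
```

The loop body never changes when the language changes; only the table does, which is the appeal of the table-driven approach.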
An FSA for identifiers
- With an explicit accepting state F:
  S --Letter--> 1
  1 --Letter, Digit--> 1
  1 --ε--> F
- Could also draw as (state 1 accepting):
  S --L--> 1
  1 --L, D--> 1

What language is this?
- What language is described by this diagram?
[Figure: state diagram with start state S and arcs labeled a, m, and d.]
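The identifier FSA can be coded directly: one test for the arc leaving S and one loop for the arc on state 1. The sketch below is a minimal C version; the function name `is_identifier` is hypothetical, and using `isalpha`/`isalnum` from `<ctype.h>` for the Letter and Digit classes is an assumption (they follow the C locale).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s is a letter followed by zero or more letters or digits. */
bool is_identifier(const char *s)
{
    assert(s != NULL);
    /* State S: the first character must be a letter (this also rejects ""). */
    if (!isalpha((unsigned char)*s))
        return false;
    /* State 1: loop on letters and digits until the input is exhausted. */
    for (s++; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s))
            return false;
    }
    return true;                            /* input ended in the accepting state */
}
```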
Lexical syntax for a simple C-like language

anychar      [ -~]     Note: space (0x20) to tilde (0x7E)
Letter       [a-zA-Z]
Digit        [0-9]
Whitespace   [ \t]     Again note the literal space (0x20)
EOL          \n
EOF          \004

Lexical syntax for a simple C-like language

Keyword      bool | char | else | false | float | if | int | main | true | while
Identifier   {Letter}({Letter} | {Digit})*
integerlit   {Digit}+
floatlit     {Digit}+\.{Digit}+
charlit      '{anychar}'
Operator     = && == != < <= > >= + - * / ! [ ]
Separator    : . { } ( )
Comment      // ({anychar} | {Whitespace})* {EOL}

Some common FSA conventions
- Unlabeled arc: any other valid input symbol.
- Recognition of a token ends in a final state.
- Recognition of a non-token (e.g., whitespace, comment) transitions back to the start state.
- Recognition of the end symbol (end of file) ends in a final state.
FSA
- The automaton must be deterministic.
- Drop keywords; handle them separately with a lookup table.
- We must consider all sequences with a common prefix together, e.g.:
  - Floats and ints
  - Comments and division

DFSA for a small C-like language
- ws = whitespace, l = letter, d = digit, eoln = \n, eof = end of input; all others are literal
[Figure: DFSA fragments for whitespace, // comments, the division operator, and identifiers.]

DFSAs for a small C-like language
[Figure: DFSA fragments for ints and floats, single and double quotes, assignment and comparison, addition, and logical and bitwise AND.]
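Handling keywords with a lookup table means the DFSA recognizes every keyword as an identifier first, and a separate check reclassifies it. The following is a minimal sketch in C using the keyword list of the C-like language above; the names `keywords` and `is_keyword` are hypothetical, and a linear search is an assumption made for brevity (a sorted table with binary search or a hash table would also work).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Keyword table for the small C-like language. */
static const char *const keywords[] = {
    "bool", "char", "else", "false", "float",
    "if", "int", "main", "true", "while"
};

/* Returns true if lexeme, already recognized as an identifier, is a keyword. */
bool is_keyword(const char *lexeme)
{
    assert(lexeme != NULL);
    size_t count = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(lexeme, keywords[i]) == 0)
            return true;
    }
    return false;
}
```

A scanner would call `is_keyword` once per identifier lexeme, after the identifier states have consumed the whole lexeme, so a name such as `whilex` is never mistaken for `while`.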
Lexical Rules
<id>    ::= <letter> | <letter> <id2>
<id2>   ::= <letter> <id2> | <digit> <id2> | <letter> | <digit>
<int>   ::= <digit> | <digit> <int>
<other> ::= + | - | * | / | ( | )

State Diagram
[Figure: state diagram for the lexical rules above.]

Implementation: Lexical Analyzer from Text
front.c
- Following is the output of the lexical analyzer of front.c when used on (sum + 47) / total

Next token is: 25 Next lexeme is (
Next token is: 11 Next lexeme is sum
Next token is: 21 Next lexeme is +
Next token is: 10 Next lexeme is 47
Next token is: 26 Next lexeme is )
Next token is: 24 Next lexeme is /
Next token is: 11 Next lexeme is total
Next token is: -1 Next lexeme is EOF
Program Structure
- The program is a DFSA with global variables
- Utility routines:
  - getChar: gets the next character of input, puts it in nextChar, determines its class, and puts the class in charClass
  - getNonBlank: advances over whitespace to the first character of a token
  - addChar: puts the character from nextChar into lexeme, the place where the lexeme is being accumulated
  - lookup: determines whether the string in lexeme is a reserved word (returns a code)

front.c

    /* front.c - a lexical analyzer for simple arithmetic expressions */
    #include <stdio.h>
    #include <ctype.h>

    /* Global declarations */
    /* Variables */
    int charClass;
    char lexeme[100];
    int nextChar;          /* int, not char, so that EOF can be represented */
    int lexLen;
    int nextToken;
    FILE *in_fp;

    /* Function declarations */
    void addChar(void);
    void getChar(void);
    void getNonBlank(void);
    int lookup(char ch);
    int lex(void);

    /* Character classes */
    #define LETTER 0
    #define DIGIT 1
    #define UNKNOWN 99

    /* Token codes */
    #define INT_LIT 10
    #define IDENT 11
    #define ASSIGN_OP 20
    #define ADD_OP 21
    #define SUB_OP 22
    #define MULT_OP 23
    #define DIV_OP 24
    #define LEFT_PAREN 25
    #define RIGHT_PAREN 26

    /* main driver */
    int main(void) {
        /* Open the input data file and process its contents */
        if ((in_fp = fopen("front.in", "r")) == NULL)
            printf("ERROR - cannot open front.in \n");
        else {
            getChar();
            do {
                lex();
            } while (nextToken != EOF);
            fclose(in_fp);
        }
        return 0;
    }

    /* lookup - a function to look up operators and parentheses
       and return the token */
    int lookup(char ch) {
        switch (ch) {
        case '(': addChar(); nextToken = LEFT_PAREN;  break;
        case ')': addChar(); nextToken = RIGHT_PAREN; break;
        case '+': addChar(); nextToken = ADD_OP;      break;
        case '-': addChar(); nextToken = SUB_OP;      break;
        case '*': addChar(); nextToken = MULT_OP;     break;
        case '/': addChar(); nextToken = DIV_OP;      break;
        default:  addChar(); nextToken = EOF;         break;
        }
        return nextToken;
    }

    /* addChar - a function to add nextChar to lexeme */
    void addChar(void) {
        if (lexLen <= 98) {
            lexeme[lexLen++] = (char)nextChar;
            lexeme[lexLen] = 0;
        } else {
            printf("Error - lexeme is too long \n");
        }
    }

    /* getChar - a function to get the next character of input
       and determine its character class */
    void getChar(void) {
        if ((nextChar = getc(in_fp)) != EOF) {
            if (isalpha(nextChar))
                charClass = LETTER;
            else if (isdigit(nextChar))
                charClass = DIGIT;
            else
                charClass = UNKNOWN;
        } else {
            charClass = EOF;
        }
    }

    /* getNonBlank - a function to call getChar until it
       returns a non-whitespace character */
    void getNonBlank(void) {
        while (isspace(nextChar))
            getChar();
    }

    /* lex - a simple lexical analyzer for arithmetic expressions */
    int lex(void) {
        lexLen = 0;
        getNonBlank();
        switch (charClass) {
        case LETTER:    /* parse identifiers */
            addChar();
            getChar();
            while (charClass == LETTER || charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = IDENT;
            break;
        case DIGIT:     /* parse integer literals */
            addChar();
            getChar();
            while (charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = INT_LIT;
            break;
        case UNKNOWN:   /* parentheses and operators */
            lookup((char)nextChar);
            getChar();
            break;
        case EOF:       /* end of file */
            nextToken = EOF;
            lexeme[0] = 'E';
            lexeme[1] = 'O';
            lexeme[2] = 'F';
            lexeme[3] = 0;
            break;
        } /* end of switch */
        printf("Next token is: %d, Next lexeme is %s\n", nextToken, lexeme);
        return nextToken;
    }

Example output
(sum + 47) / total

Next token is: 25, Next lexeme is (
Next token is: 11, Next lexeme is sum
Next token is: 21, Next lexeme is +
Next token is: 10, Next lexeme is 47
Next token is: 26, Next lexeme is )
Next token is: 24, Next lexeme is /
Next token is: 11, Next lexeme is total
Next token is: -1, Next lexeme is EOF
Quiz
1. Draw a DFSA that recognizes binary strings that start with 1 and end with 0
2. Draw a DFSA that recognizes binary strings with at least three consecutive 1s
3. Below is a BNF grammar for fractional numbers:
   S  -> -FN | FN
   FN -> DL | DL.DL
   DL -> D | D DL
   D  -> 0 | 1 | ... | 9
   (a) Rewrite as EBNF
   (b) Now draw a corresponding DFSA

Done?

Quiz Answers
1. Draw a DFSA that recognizes binary strings that start with 1 and end with 0
[Figure: DFSA for question 1, with start state S.]
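One possible answer to question 1 can be written out as code. The sketch below is a minimal C version, not necessarily the machine drawn on the slide; the state names and the function name `starts_with_1_ends_with_0` are hypothetical, and rejecting non-binary characters is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names for one possible answer to question 1. */
enum q1_state {
    Q1_START,   /* nothing read yet */
    Q1_LAST_1,  /* began with 1, last symbol read was 1 */
    Q1_LAST_0,  /* began with 1, last symbol read was 0 (accepting) */
    Q1_DEAD     /* began with 0: no continuation can be accepted */
};

/* Returns true if bits is a binary string that starts with 1 and ends with 0. */
bool starts_with_1_ends_with_0(const char *bits)
{
    assert(bits != NULL);
    enum q1_state state = Q1_START;
    for (; *bits != '\0'; bits++) {
        char c = *bits;
        if (c != '0' && c != '1')
            return false;                   /* symbol outside the alphabet */
        switch (state) {
        case Q1_START:
            state = (c == '1') ? Q1_LAST_1 : Q1_DEAD;
            break;
        case Q1_LAST_1:
        case Q1_LAST_0:
            state = (c == '1') ? Q1_LAST_1 : Q1_LAST_0;
            break;
        case Q1_DEAD:
            return false;
        }
    }
    return state == Q1_LAST_0;
}
```

The dead state is what makes the machine deterministic: every state has exactly one successor for each of 0 and 1.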
DFSA for q2
2. Draw a DFSA that recognizes binary strings with at least three consecutive 1s
[Figure: DFSA for question 2, with start state S and arcs labeled 0, 1, and 1,0.]

Quiz Answers
3. Below is a BNF grammar for fractional numbers. Rewrite as EBNF:
   BNF:
   <s>  -> -<fn> | <fn>
   <fn> -> <dl> | <dl>.<dl>
   <dl> -> <d> | <d> <dl>
   EBNF:
   <s>  -> [-]<fn>
   <fn> -> <dl>[.<dl>]
   <dl> -> <d>{<d>}
   And as a DFSA:
[Figure: DFSA for question 3, with start state S and arcs labeled -, ., and 0,1,...,9.]
- Could also have had another state to handle -
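The EBNF for fractional numbers, [-] digits [. digits], can be turned into a DFSA with a separate state after the minus sign. This is a minimal sketch in C under that assumption; the state names and the function name `is_fractional_number` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for <s> -> [-]<dl>[.<dl>], with <dl> -> <d>{<d>}. */
enum frac_state { F_START, F_SIGN, F_INT, F_DOT, F_FRAC, F_ERROR };

static bool is_digit_char(char c) { return c >= '0' && c <= '9'; }

/* Returns true if s is an optionally negative digit string with an
   optional fractional part that itself has at least one digit. */
bool is_fractional_number(const char *s)
{
    assert(s != NULL);
    enum frac_state state = F_START;
    for (; *s != '\0' && state != F_ERROR; s++) {
        char c = *s;
        switch (state) {
        case F_START:
            if (c == '-') state = F_SIGN;
            else if (is_digit_char(c)) state = F_INT;
            else state = F_ERROR;
            break;
        case F_SIGN:                        /* a digit must follow the sign */
            state = is_digit_char(c) ? F_INT : F_ERROR;
            break;
        case F_INT:
            if (is_digit_char(c)) state = F_INT;
            else if (c == '.') state = F_DOT;
            else state = F_ERROR;
            break;
        case F_DOT:                         /* a digit must follow the point */
        case F_FRAC:
            state = is_digit_char(c) ? F_FRAC : F_ERROR;
            break;
        case F_ERROR:
            break;
        }
    }
    /* Accepting states: digits only, or digits after the decimal point. */
    return state == F_INT || state == F_FRAC;
}
```

Because F_DOT is not accepting, an input such as "3." is rejected, matching the grammar's requirement of at least one digit on each side of the point.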
Paul Klint Grammars and Languages are one of the most established areas of Natural Language Processing and Computer Science 2 N. Chomsky, Aspects of the theory of syntax, 1965 3 A Language...... is a (possibly
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and
More informationProf. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan
Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens
More informationJME Language Reference Manual
JME Language Reference Manual 1 Introduction JME (pronounced jay+me) is a lightweight language that allows programmers to easily perform statistic computations on tabular data as part of data analysis.
More informationProgramming Languages and Compilers (CS 421)
Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha 10/30/17
More informationRevisit the example. Transformed DFA 10/1/16 A B C D E. Start
Revisit the example ε 0 ε 1 Start ε a ε 2 3 ε b ε 4 5 ε a b b 6 7 8 9 10 ε-closure(0)={0, 1, 2, 4, 7} = A Trans(A, a) = {1, 2, 3, 4, 6, 7, 8} = B Trans(A, b) = {1, 2, 4, 5, 6, 7} = C Trans(B, a) = {1,
More informationProgramming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers
Programming Languages & Compilers Programming Languages and Compilers (CS 421) I Three Main Topics of the Course II III Elsa L Gunter 2112 SC, UIUC http://courses.engr.illinois.edu/cs421 New Programming
More informationChapter 2 - Programming Language Syntax. September 20, 2017
Chapter 2 - Programming Language Syntax September 20, 2017 Specifying Syntax: Regular expressions and context-free grammars Regular expressions are formed by the use of three mechanisms Concatenation Alternation
More informationSyntax. 2.1 Terminology
Syntax 2 Once you ve learned to program in one language, learning a similar programming language isn t all that hard. But, understanding just how to write in the new language takes looking at examples
More informationprogramming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs
Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning
More informationProgramming Lecture 3
Programming Lecture 3 Expressions (Chapter 3) Primitive types Aside: Context Free Grammars Constants, variables Identifiers Variable declarations Arithmetic expressions Operator precedence Assignment statements
More informationFigure 2.1: Role of Lexical Analyzer
Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer
More informationCompiler phases. Non-tokens
Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree
More informationCSC 467 Lecture 3: Regular Expressions
CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token
More informationWhere We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars Where We Are Programming languages Ruby OCaml Implementing programming languages Scanner Uses regular expressions Finite automata Parser
More informationFeatures of C. Portable Procedural / Modular Structured Language Statically typed Middle level language
1 History C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell Labs. C was originally first implemented on the DEC
More informationHigh Level Languages. Java (Object Oriented) This Course. Jython in Java. Relation. ASP RDF (Horn Clause Deduction, Semantic Web) Dr.
10 High Level Languages This Course Java (Object Oriented) Jython in Java Relation ASP RDF (Horn Clause Deduction, Semantic Web) Dr. Philip Cannata 1 Dr. Philip Cannata 2 Programming Languages Lexical
More informationZhizheng Zhang. Southeast University
Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5
More informationParsing and Pattern Recognition
Topics in IT 1 Parsing and Pattern Recognition Week 10 Lexical analysis College of Information Science and Engineering Ritsumeikan University 1 this week mid-term evaluation review lexical analysis its
More informationfor (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }
Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas
More informationLexical Analysis. Textbook:Modern Compiler Design Chapter 2.1
Lexical Analysis Textbook:Modern Compiler Design Chapter 2.1 A motivating example Create a program that counts the number of lines in a given input text file Solution (Flex) int num_lines = 0; %% \n ++num_lines;.
More informationCS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08
CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions
More informationHabanero Extreme Scale Software Research Project
Habanero Extreme Scale Software Research Project Comp215: Grammars Zoran Budimlić (Rice University) Grammar, which knows how to control even kings - Moliere So you know everything about regular expressions
More informationR10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?
R1 SET - 1 1. a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA? 2. a) Design a DFA that accepts the language over = {, 1} of all strings that
More informationStructure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.
CS 441 Fall 2018 Notes Compiler - software that translates a program written in a source file into a program stored in a target file, reporting errors when found. Source Target program written in a High
More informationCS308 Compiler Principles Lexical Analyzer Li Jiang
CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular
More informationSoftware II: Principles of Programming Languages
Software II: Principles of Programming Languages Lecture 4 Language Translation: Lexical and Syntactic Analysis Translation A translator transforms source code (a program written in one language) into
More informationCOMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna
More information