Lexical analysis. Concepts. Lexical analysis in perspective
- Emma Cobb
Transcription
4a Lexical analysis

Concepts
- Overview of syntax and semantics
- Step one: lexical analysis
- Lexical scanning
- Regular expressions
- DFAs and FSAs
- Lex

CMSC 331, Some material 1998 by Addison Wesley Longman, Inc.

Lexical analysis in perspective
LEXICAL ANALYZER: Transforms character stream to token stream. Also called scanner, lexer, linear analysis.
[Diagram: source program -> lexical analyzer -> token -> parser. The parser asks the lexical analyzer to "get next token"; both use the symbol table.]
This is an overview of the standard process of turning a text file into an executable program.

LEXICAL ANALYZER
- Scans input
- Removes whitespace, newlines, ...
- Identifies tokens
- Creates symbol table
- Inserts tokens into symbol table
- Generates errors
- Sends tokens to parser

PARSER
- Performs syntax analysis
- Actions dictated by token order
- Updates symbol table entries
- Creates abstract rep. of source
- Generates errors
Where we are
[Figure: the lexical analyzer splits the input Total=price+tax; into the tokens Total, =, price, +, tax and ;. The parser then builds a tree: an assignment with children id, = and Expr, where Expr has children id (price), + and id (tax).]

Basic lexical analysis terms
- Token: a classification for a common set of strings. Examples: <identifier>, <number>, <operator>, <open paren>, etc.
- Pattern: the rules which characterize the set of strings for a token. Typically defined via regular expressions.
- Lexeme: the character sequence that matches the pattern for a token.
  Identifiers: x, count, name, foo32, etc.
  Integers: -12, 101, 0, ...
  Open paren: (

Examples: token, lexeme, pattern
if (price + gst - rebate <= 10.00) gift := false

Token        Lexeme   Informal description of pattern
if           if       if
lparen       (        (
identifier   price    String consists of letters and numbers and starts with a letter
operator     +        +
identifier   gst      String consists of letters and numbers and starts with a letter
operator     -        -
identifier   rebate   String consists of letters and numbers and starts with a letter
operator     <=       Less than or equal to
constant     10.00    Any numeric constant
rparen       )        )
identifier   gift     String consists of letters and numbers and starts with a letter
operator     :=       Assignment symbol
identifier   false    String consists of letters and numbers and starts with a letter

Regular expression (REs)
- Scanners are based on regular expressions that define simple patterns
- Simpler and less expressive than BNF
- Examples of a regular expression:
  letter: a | b | c | ... | z | A | B | C | ... | Z
  digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  identifier: letter (letter | digit)*
- Basic operations are (1) set union, (2) concatenation and (3) Kleene closure
- Plus: parentheses, naming patterns
- No recursion
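The token/lexeme table above can be reproduced with a small hand-written scanner. The following is a minimal sketch using Python's re module, not the course's own code: the token names (IF, CONSTANT, and so on), the TOKEN_SPEC list, and the tokenize function are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical token names and patterns, modelled on the table above.
# Order matters: the keyword pattern is tried before the identifier pattern.
TOKEN_SPEC = [
    ("IF",         r"if\b"),
    ("CONSTANT",   r"[0-9]+(?:\.[0-9]+)?"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR",   r":=|<=|[+\-]"),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs; raise on an unexpected character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":  # whitespace is recognized but not reported
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token, lexeme in tokenize("if (price + gst - rebate <= 10.00) gift := false"):
    print(token, lexeme)
```

Each pattern plays the role of the "informal description" column, and each matched substring is the lexeme for its token.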
Regular expression (REs)
Example:
  letter: a | b | c | ... | z | A | B | C | ... | Z
  digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  identifier: letter (letter | digit)*
In letter (letter | digit)*:
- concatenation: one pattern followed by another
- set union: one pattern or another
- Kleene closure: zero or more repetitions of a pattern
Regular expressions are extremely useful in many applications. Mastering them will serve you well.

Another view
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
-- Jamie Zawinski (1997) alt.religion.emacs http://bit.ly/jwzregex

Formal language operations
Examples use L = {a, b} and M = {0, 1}.
- union of L and M
  Notation: L ∪ M
  Definition: L ∪ M = {s | s is in L or s is in M}
  Example: {a, b, 0, 1}
- concatenation of L and M
  Notation: LM
  Definition: LM = {st | s is in L and t is in M}
  Example: {a0, a1, b0, b1}
- Kleene closure of L
  Notation: L*
  Definition: L* denotes zero or more concatenations of L
  Example: all the strings consisting of a and b, plus the empty string: {ε, a, b, aa, bb, ab, ba, aaa, ...}
- positive closure of L
  Notation: L+
  Definition: L+ denotes one or more concatenations of L
  Example: all the strings consisting of a and b: {a, b, aa, bb, ab, ba, aaa, ...}
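The four operations above can be tried out on small finite languages. This is a minimal sketch in which languages are Python sets of strings; the function names are hypothetical, and because L* and L+ are infinite, the closures are truncated at an assumed length bound max_len.

```python
from itertools import product


def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s, t in product(L, M)}


def kleene_closure(L, max_len):
    """Strings of L* (zero or more concatenations) no longer than max_len."""
    result = {""}            # zero concatenations gives the empty string
    frontier = {""}
    while frontier:
        frontier = {s + t for s in frontier for t in L
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result


def positive_closure(L, max_len):
    """Strings of L+ (one or more concatenations) no longer than max_len."""
    return {s for s in concat(L, kleene_closure(L, max_len)) if len(s) <= max_len}


L = {"a", "b"}
M = {"0", "1"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(kleene_closure(L, 2)))
print(sorted(positive_closure(L, 2)))
```

With the bound set to 2, the only difference between the two closures is the empty string, which matches the examples in the table.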
Regular expression
Let Σ be an alphabet and r a regular expression; then L(r) is the language that is characterized by the rules of r.
Definition of regular expression:
- ε is a regular expression that denotes the language {ε}
- If a is in Σ, a is a regular expression that denotes {a}
- Let r and s be regular expressions with languages L(r) and L(s). Then:
  (r) | (s) is a regular expression -> L(r) ∪ L(s)
  (r)(s) is a regular expression -> L(r) L(s)
  (r)* is a regular expression -> (L(r))*
It is an inductive definition.
A regular language is a language that can be defined by a regular expression.

RE example revisited
Examples of regular expression:
  Letter: a | b | c | ... | z | A | B | C | ... | Z
  Digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  Identifier: letter (letter | digit)*
Q: Why is it a regular expression?
- Because it only uses the operations of union, concatenation and Kleene closure
- Being able to name patterns is just syntactic sugar
- Using parentheses to group things is just syntactic sugar, provided we specify the precedence and associativity of the operators (i.e., |, * and concat)

+: Another common operator
- The + operator is commonly used to mean one or more repetitions of a pattern
- For example, letter+ means one or more letters
- We can always do without this, e.g. letter+ is equivalent to letter letter*
- So the + operator is just syntactic sugar

Precedence of operators
In interpreting a regular expression:
- Parens scope sub-expressions
- * and + have the highest precedence
- Concatenation comes next
- | is lowest
- All the operators are left associative
Example: (A) | ((B)* (C)) is equivalent to A | B * C
What strings does this generate or match? Either an A, or any number of Bs followed by a C.
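The precedence example can be checked mechanically. The sketch below uses Python's re module, whose precedence for these three operators follows the same order as the rules above; the helper name matches is hypothetical.

```python
import re

# A | B*C with no parentheses: * binds tightest, then concatenation, then |.
PATTERN = re.compile(r"A|B*C")


def matches(s):
    """True if the whole string s is matched by A|B*C."""
    return PATTERN.fullmatch(s) is not None


for s in ["A", "C", "BC", "BBBC", "AC", "BB", ""]:
    print(repr(s), matches(s))

# letter+ is sugar for letter letter*: both accept exactly the same strings.
plus = re.compile(r"[A-Za-z]+")
star = re.compile(r"[A-Za-z][A-Za-z]*")
for s in ["", "x", "foo", "foo32"]:
    print(repr(s), bool(plus.fullmatch(s)) == bool(star.fullmatch(s)))
```

The string AC is rejected, which shows the expression is not read as (A | B*) C.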
Epsilon: more syntactic sugar
- Sometimes we'd like a token that represents nothing
- This makes regular expression matching more complex, but can be useful
- We use the lower case Greek letter epsilon (ε) for this special token
- Example:
  digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  sign: + | - | ε
  int: sign digit+

Properties of regular expressions
We can easily determine some basic properties of the operators involved in building regular expressions.

Property                          Description
r | s = s | r                     | is commutative
r | (s | t) = (r | s) | t         | is associative
(rs)t = r(st)                     Concatenation is associative
r(s | t) = rs | rt                Concatenation distributes over |
(s | t)r = sr | tr

RE: Still more syntactic sugar
- Zero or one instance: L? = L | ε
  Examples:
  optional_fraction -> .digits | ε
  optional_fraction -> (.digits)?
- Character classes:
  [abc] = a | b | c
  [a-z] = a | b | c | ... | z
- Systems having RE support (e.g., Java, Python, Lex, Emacs) vary in the features supported and often in the notation, but they tend to be very similar

Regular grammars / expressions
- Regular grammars and regular expressions are equivalent
- Every regular expression can be expressed by a regular grammar
- Every regular grammar can be expressed by a regular expression
- Example: an identifier must begin with a letter and can be followed by any number of letters and digits

Regular expression:
  ID: LETTER (LETTER | DIGIT)*
Regular grammar:
  ID -> LETTER ID_REST
  ID_REST -> LETTER ID_REST | DIGIT ID_REST | EMPTY
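The equivalence claimed for the identifier example can be illustrated by writing one recognizer from each description and comparing them. This is only a sketch on a handful of inputs, not a proof; the function names is_id and id_rest are hypothetical, and letters and digits are assumed to be the ASCII ones.

```python
import re
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Regular expression: ID: LETTER (LETTER | DIGIT)*
ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def id_rest(s, i):
    """ID_REST -> LETTER ID_REST | DIGIT ID_REST | EMPTY"""
    if i == len(s):
        return True                      # EMPTY
    if s[i] in LETTERS or s[i] in DIGITS:
        return id_rest(s, i + 1)         # recursion only in the rightmost position
    return False


def is_id(s):
    """ID -> LETTER ID_REST"""
    return len(s) > 0 and s[0] in LETTERS and id_rest(s, 1)


for s in ["x", "foo32", "32foo", "a_b", ""]:
    print(repr(s), is_id(s), ID_RE.fullmatch(s) is not None)
```

Because the grammar recurses only at the right end of each production, the recursive function is really a loop in disguise, which is why no stack is needed to recognize a regular language.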
Formal definition of tokens
- A set of tokens is a set of strings over an alphabet: {read, write, +, -, *, /, :=, 1, 2, ..., 10, ..., 3.45e-3, ...}
- A set of tokens is a regular set that can be defined by using a regular expression
- For every regular set, there is a finite automaton (FA) that can recognize it
- Aka deterministic Finite State Machine (FSM)
- i.e. determine whether a string belongs to the set or not
- Scanners extract tokens from source code in the same way DFAs determine membership

FSM = FA
- Finite state machine and finite automaton are different names for the same concept
- The concept is important and useful in almost every aspect of computer science
- Provides an abstract way to define a process that:
  Has a finite set of states it can be in, with a special start state and a set of accepting states
  Gets a sequence of inputs
  Each input causes the process to go from its current state to a new state (which might be the same)
  If, after the input ends, we are in one of the accepting states, the input is accepted by the FA

Example
An FA that determines whether a binary number has an odd or even number of 0's, where S1 is an accepting state.
[Figure: state diagram with states S1 and S2. Annotations: the transition label is the input that triggers it.]

Deterministic finite automaton (DFA)
- A DFA has only one choice for a given input in every state
- No states with two arcs matching the same input
- Incoming arrow identifies the start state
- State names (e.g., S1, S2) are for convenience
- Double circle identifies accepting state(s)
- For this FA, inputs are expected to be a 0 or 1
- Is this a DFA?
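The parity example can be written down as a transition table. The diagram itself is not reproduced in this transcript, so the sketch below assumes the usual layout for this example: S1 is both the start state and the accepting state, a 0 switches between S1 and S2, and a 1 leaves the state unchanged. Under that assumption the FA accepts inputs with an even number of 0's. The names TRANS, START, ACCEPTING and accepts are hypothetical.

```python
# Assumed transition table: reading 0 toggles the state, reading 1 keeps it.
TRANS = {
    "S1": {"0": "S2", "1": "S1"},
    "S2": {"0": "S1", "1": "S2"},
}
START = "S1"          # assumption: S1 is the start state
ACCEPTING = {"S1"}    # from the slide: S1 is an accepting state


def accepts(bits):
    """Run the DFA over the input; accept if it ends in an accepting state."""
    state = START
    for ch in bits:
        if ch not in TRANS[state]:
            return False          # no arc for this input: not accepted
        state = TRANS[state][ch]
    return state in ACCEPTING


for s in ["", "1", "0", "1001", "10", "12"]:
    print(repr(s), accepts(s))
```

Every state has exactly one arc per input symbol, so this table describes a DFA.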
Deterministic finite automaton (DFA)
- If an input symbol matches no arc for the current state, the input is not accepted
- This FA accepts only binary numbers that are multiples of three
[Figure: state diagram of the FA]
- Is this a DFA?

REs can be represented as DFAs
Regular expression for a simple identifier:
  Letter: a | b | c | ... | z | A | B | C | ... | Z
  Digit: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  Identifier: letter (letter | digit)*
This DFA recognizes identifiers.
[Figure: DFA for identifiers; arcs are labelled letter and 0,1,2,3,4 ... 9, and the accepting state is marked with *.]
Marking a state with a * is another way to identify an accepting state.

RE < CFG
- Every language that can be described by a RE can be described by a CFG
- Some languages can be described by a CFG but not by a RE, for example the set of palindromes made up of as and bs:
  S -> a S a | b S b | a | aa | b | bb

Token Definition
Numeric literals in Pascal, e.g. 1, 123, 10e-3, 3.14e4
Definition of token unsignednum:
  unsignedint -> digit digit*
  unsignednum -> unsignedint ((. unsignedint) | ε) ((e (+ | - | ε) unsignedint) | ε)
Note:
- Recursion restricted to leftmost or rightmost position on LHS
- Parentheses used to avoid ambiguity
[Figure: finite automaton for unsignednum; arc labels include e, + and -.]
- FAs with epsilons are NFAs
- NFAs are harder to implement, use backtracking
- Every NFA can be rewritten as a DFA (gets larger, though)
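The unsignednum definition translates almost symbol for symbol into the notation of a practical RE library. The sketch below uses Python's re module, where each "(x | ε)" becomes "(x)?"; the names UNSIGNED_INT, UNSIGNED_NUM and is_unsigned_num are hypothetical, and the sample strings are chosen for illustration.

```python
import re

# unsignedint -> digit digit*
UNSIGNED_INT = r"[0-9][0-9]*"

# unsignednum -> unsignedint ((. unsignedint) | ε) ((e (+ | - | ε) unsignedint) | ε)
UNSIGNED_NUM = re.compile(
    rf"{UNSIGNED_INT}(\.{UNSIGNED_INT})?(e[+-]?{UNSIGNED_INT})?"
)


def is_unsigned_num(s):
    """True if the whole string is an unsigned numeric literal."""
    return UNSIGNED_NUM.fullmatch(s) is not None


for s in ["1", "123", "3.14", "10e-3", "3.14e4", ".5", "1.", "e3", "1e"]:
    print(repr(s), is_unsigned_num(s))
```

Naming unsignedint once and reusing it mirrors the "naming patterns" sugar from the earlier slides: the name is expanded textually and adds no expressive power.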
Simple Problem
- Read characters consisting of as and bs, one at a time. If the input contains a double aa, print accepted, else rejected.
- An abstract solution to this can be expressed as a DFA.
[Figure: DFA with start state 1, state 2 and accepting state 3*. Arcs: 1 to 2 on a, 2 to 3 on a, 1 to 1 on b, 2 to 1 on b, 3 to 3 on a, b.]
- The DFA state transitions can be encoded as a table which specifies the new state for a given current state and input.

State transition table

                 input
current state    a    b
1                2    1
2                3    1
3*               3    3

(3* is an accepting state.)

The state transition table, initial state and set of accepting states represent the DFA:

    import sys

    state = 1                      # initial state
    ok = [3]                       # accepting states
    # trans[current state][input] gives the new state
    trans = {1: {'a': 2, 'b': 1},
             2: {'a': 3, 'b': 1},
             3: {'a': 3, 'b': 3}}

    # the string to test is the first command-line argument
    for char in sys.argv[1]:
        state = trans[state][char]
    print('accepted' if state in ok else 'rejected')

Scanner Generators
- E.g. lex, flex
- Take a table as input, return a scanner program that extracts tokens from a character stream
- Useful programming utility, especially when coupled with a parser generator (e.g., yacc)
- Standard in Unix

Lex
- Lexical analyzer generator: it writes a lexical analyzer
- Assumes each token matches a regular expression
- Needs a set of regular expressions and, for each expression, an action
- Produces a highly optimized C program
- Automatically handles many tricky problems
- flex is the GNU version of the venerable Unix tool lex
Lex example
[Diagram: foo.l -> lex -> foolex.c -> cc -> foolex; foolex reads input and produces tokens.]

> flex -ofoolex.c foo.l
> cc -ofoolex foolex.c -lfl
> more input
begin
if size>10
then size * -3.1415
end
> foolex < input
Keyword: begin
Keyword: if
Identifier: size
Operator: >
Integer: 10 (10)
Keyword: then
Identifier: size
Operator: *
Operator: -
Float: 3.1415 (3.1415)
Keyword: end

Examples
The examples to follow can be accessed on gl.
See /afs/umbc.edu/users/f/i/finin/pub/lex

% ls -l /afs/umbc.edu/users/f/i/finin/pub/lex
total 8
drwxr-xr-x 2 finin faculty 2048 Sep 27 13:31 aa
drwxr-xr-x 2 finin faculty 2048 Sep 27 13:32 defs
drwxr-xr-x 2 finin faculty 2048 Sep 27 11:35 footranscanner
drwxr-xr-x 2 finin faculty 2048 Sep 27 11:34 simplescanner

A Lex Program
A lex file has three parts separated by %% lines: definitions, rules, subroutines.

DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+ printf("integer\n");
{DIGIT}+"."{DIGIT}* printf("float\n");
{ID} printf("identifier\n");
[ \t\n]+ /* skip whitespace */
. printf("Huh?\n");
%%
main(){yylex();}

Simplest Example

%%
.|\n ECHO;
%%
main() { yylex(); }

- No definitions
- One rule
- Minimal wrapper
- Echoes input
Strings containing aa

%%
(a|b)*aa(a|b)* {printf("accept %s\n", yytext);}
[a|b]+ {printf("reject %s\n", yytext);}
.|\n ECHO;
%%
main() {yylex();}

Rules
- Each rule has a pattern and an action
- Patterns are regular expressions
- Only one action is performed
- The action corresponding to the pattern matched is performed
- If several patterns match, the one corresponding to the longest sequence is chosen
- Among the rules whose patterns match the same number of characters, the first rule is preferred

Definitions
- The definitions block allows you to name a RE
- If the name appears in curly braces in a rule, the RE will be substituted

DIGIT [0-9]
%%
{DIGIT}+ printf("int: %s\n", yytext);
{DIGIT}+"."{DIGIT}* printf("float: %s\n", yytext);
. /* skip anything else */
%%
main(){yylex();}

/* scanner for a toy Pascal-like language */
%{
#include <math.h> /* needed for call to atof() */
%}
DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+ printf("integer: %s (%d)\n", yytext, atoi(yytext));
{DIGIT}+"."{DIGIT}* printf("float: %s (%g)\n", yytext, atof(yytext));
if|then|begin|end printf("keyword: %s\n",yytext);
{ID} printf("identifier: %s\n",yytext);
"+"|"-"|"*"|"/" printf("operator: %s\n",yytext);
"{"[^}\n]*"}" /* skip one-line comments */
[ \t\n]+ /* skip whitespace */
. printf("unrecognized: %s\n",yytext);
%%
main(){yylex();}
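The rule-selection policy described under Rules (longest match wins, and the first rule wins a tie) explains why the keyword rule can safely precede the identifier rule in the toy scanner. The following is a minimal Python sketch of that policy only, not of how lex is implemented; RULES, next_token and scan are hypothetical names, and only four of the toy scanner's rules are included.

```python
import re

# Hypothetical rule list in priority order: (token name, compiled pattern).
RULES = [
    ("keyword",    re.compile(r"if|then|begin|end")),
    ("identifier", re.compile(r"[a-z][a-z0-9]*")),
    ("integer",    re.compile(r"[0-9]+")),
    ("whitespace", re.compile(r"[ \t\n]+")),
]


def next_token(text, pos):
    """Try every rule at pos; keep the longest match, first rule on a tie."""
    best = None  # (length, name, lexeme)
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly greater: a later rule must match MORE text to win.
        if m and (best is None or len(m.group()) > best[0]):
            best = (len(m.group()), name, m.group())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best[1], best[2]


def scan(text):
    """Return (token, lexeme) pairs for the whole input, skipping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        if name != "whitespace":
            out.append((name, lexeme))
        pos += len(lexeme)
    return out


print(scan("if iffy then 42"))
```

On the input if, both the keyword and identifier rules match two characters, so the earlier keyword rule wins; on iffy, the identifier rule matches four characters and wins on length.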
Flex RE syntax

x            character 'x'
.            any character except newline
[xyz]        character class; in this case, matches either an 'x', a 'y', or a 'z'
[abj-oz]     character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'z'
[^A-Z]       negated character class, i.e., any character but those in the class, e.g. any character except an uppercase letter
[^A-Z\n]     any character EXCEPT an uppercase letter or a newline
r*           zero or more r's, where r is any regular expression
r+           one or more r's
r?           zero or one r's (i.e., an optional r)
{name}       expansion of the "name" definition
"[xy]\"foo"  the literal string: '[xy]"foo' (note escaped ")
\x           if x is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \x. Otherwise, a literal 'x' (e.g., escape)
rs           RE r followed by RE s (i.e., concatenation)
r|s          either an r or an s
<<EOF>>      end-of-file
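Several of the notations in this table are shared by other RE systems. As a hedged illustration, the sketch below checks a few of them with Python's re module rather than with flex itself; flex-specific entries such as {name}, quoted literal strings and <<EOF>> are deliberately left out, and the helper name full is hypothetical.

```python
import re


def full(pattern, s):
    """True if pattern matches the whole string s (Python re, not flex)."""
    return re.fullmatch(pattern, s) is not None


# [abj-oz]: 'a', 'b', any letter from 'j' through 'o', or 'z'
print([c for c in "abcjkopzZ" if full(r"[abj-oz]", c)])

# [^A-Z\n]: anything except an uppercase letter or a newline
print(full(r"[^A-Z\n]", "q"), full(r"[^A-Z\n]", "Q"), full(r"[^A-Z\n]", "\n"))

# . matches any character except newline
print(full(r".", "x"), full(r".", "\n"))

# r*, r+, r? applied to the single character b
print(full(r"ab*", "a"), full(r"ab+", "a"), full(r"ab?", "ab"))
```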
Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective
Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis
More informationChapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective
Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley
More informationRegular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications
Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata
More informationLexical Analysis (ASU Ch 3, Fig 3.1)
Lexical Analysis (ASU Ch 3, Fig 3.1) Implementation by hand automatically ((F)Lex) Lex generates a finite automaton recogniser uses regular expressions Tasks remove white space (ws) display source program
More informationCSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1
CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions
More informationConcepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens
Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate
More informationCSC 467 Lecture 3: Regular Expressions
CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token
More informationLexical Analyzer Scanner
Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce
More informationFigure 2.1: Role of Lexical Analyzer
Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer
More informationLexical Analyzer Scanner
Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce
More informationImplementation of Lexical Analysis
Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation
More informationCS Lecture 2. The Front End. Lecture 2 Lexical Analysis
CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture
More informationImplementation of Lexical Analysis
Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation
More informationLexical Analysis. Chapter 2
Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples
More informationCSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory
More informationIntroduction to Lexical Analysis
Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular
More informationProgramming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson
Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create
More informationCSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions
CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory
More informationScanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012
Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved
More informationLexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!
Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream
More informationMonday, August 26, 13. Scanners
Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can
More informationWednesday, September 3, 14. Scanners
Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can
More informationCOMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna
More informationCS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08
CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions
More informationLexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual
Lexical Analysis Chapter 1, Section 1.2.1 Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens
More informationCSc 453 Lexical Analysis (Scanning)
CSc 453 Lexical Analysis (Scanning) Saumya Debray The University of Arizona Tucson Overview source program lexical analyzer (scanner) tokens syntax analyzer (parser) symbol table manager Main task: to
More informationLexical Analysis. Lecture 3-4
Lexical Analysis Lecture 3-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 3-4 1 Administrivia I suggest you start looking at Python (see link on class home page). Please
More informationPart 5 Program Analysis Principles and Techniques
1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape
More informationInterpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console
Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the
More informationLexical Analysis. Lecture 2-4
Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.
More informationECS 120 Lesson 7 Regular Expressions, Pt. 1
ECS 120 Lesson 7 Regular Expressions, Pt. 1 Oliver Kreylos Friday, April 13th, 2001 1 Outline Thus far, we have been discussing one way to specify a (regular) language: Giving a machine that reads a word
More informationFormal Languages and Compilers Lecture VI: Lexical Analysis
Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal
More informationLanguages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1
CSE 401 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2010 1/8/2010 2002-10 Hal Perkins & UW CSE B-1 Agenda Quick review of basic concepts of formal grammars Regular
More informationStructure of Programming Languages Lecture 3
Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis
More informationCMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters
: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Scanner Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter
More informationLexical Analysis. Introduction
Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies
More informationCompiler course. Chapter 3 Lexical Analysis
Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata
More informationParsing and Pattern Recognition
Topics in IT 1 Parsing and Pattern Recognition Week 10 Lexical analysis College of Information Science and Engineering Ritsumeikan University 1 this week mid-term evaluation review lexical analysis its
More informationWeek 2: Syntax Specification, Grammars
CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences
More informationLexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata
Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string
More informationPRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer
PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer As the first phase of a compiler, the main task of the lexical analyzer is to read the input
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back
More informationChapter 3 Lexical Analysis
Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical
More informationCOMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.
UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System
More informationCompiler phases. Non-tokens
Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree
More informationCSCI312 Principles of Programming Languages!
CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from
More informationWhere We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars
CMSC 330: Organization of Programming Languages Context Free Grammars Where We Are Programming languages Ruby OCaml Implementing programming languages Scanner Uses regular expressions Finite automata Parser
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationCS 314 Principles of Programming Languages. Lecture 3
CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information
More informationGroup A Assignment 3(2)
Group A Assignment 3(2) Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Lexical analyzer using LEX. 3.1.1 Problem Definition: Lexical analyzer for sample language using LEX. 3.1.2 Perquisite:
More informationDr. D.M. Akbar Hussain
1 2 Compiler Construction F6S Lecture - 2 1 3 4 Compiler Construction F6S Lecture - 2 2 5 #include.. #include main() { char in; in = getch ( ); if ( isalpha (in) ) in = getch ( ); else error (); while
More informationCS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2
CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2 CS 536 Spring 2015 1 Reading Assignment Read Chapter 3 of Crafting a Com piler. CS 536 Spring 2015 21 The Structure
More informationCS321 Languages and Compiler Design I. Winter 2012 Lecture 4
CS321 Languages and Compiler Design I Winter 2012 Lecture 4 1 LEXICAL ANALYSIS Convert source file characters into token stream. Remove content-free characters (comments, whitespace,...) Detect lexical
More informationLECTURE 7. Lex and Intro to Parsing
LECTURE 7 Lex and Intro to Parsing LEX Last lecture, we learned a little bit about how we can take our regular expressions (which specify our valid tokens) and create real programs that can recognize them.
More informationCSE 401/M501 Compilers
CSE 401/M501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Spring 2018 UW CSE 401/M501 Spring 2018 B-1 Administrivia No sections this week Read: textbook ch. 1 and sec. 2.1-2.4
More informationCS308 Compiler Principles Lexical Analyzer Li Jiang
CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular
More informationCMSC 330: Organization of Programming Languages
CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler
More informationLecture 4: Syntax Specification
The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic
More informationAnnouncements! P1 part 1 due next Tuesday P1 part 2 due next Friday
Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program