Regular expressions. LING78100: Methods in Computational Linguistics I

Size: px
Start display at page:

Download "Regular expressions. LING78100: Methods in Computational Linguistics I"

Transcription

1 Regular expressions LING78100: Methods in Computational Linguistics I

2 String methods Python strings have methods that allow us to determine whether a string: Contains another string; e.g., assert "and" in "vandal" Starts with another string; e.g., assert "vendetta".startswith("vend") Ends with another string; e.g., assert "vigorous".endswith("us") But what if we want to do more?

3 String matching Regular expressions are a domain-specific language (DSL) for representing strings. With a regular expression we can perform three closely-related operations: Matching: determine whether a given string has a certain desired format Parsing: extracting matching subsets (or groups ) of a given string Substitution: replace particular groups of a given string with another string

4 Notation In what follows, regular expressions are written in blue, underlined typewriter font, and strings are written in red typewriter font. (In Python, regular expressions are specified as strings, though Python compiles them into an faster internal representation.)

5 Learning regular expressions......is a life-long process. There are two parts we ll target this week: The domain-specific language used to specify regular expressions, which is common across many tools/programming languages The software interface for matching, parsing, and substitution, which is language-specific.

6 Motivating examples woo+f matches woof, wooof, woooof, wooooof, and so on. San?tan?a? matches Santa, Satan, and Santana. (\+?(\d{1,3}-)((\d{3})-)?(\d{3})-(\d{4}) matches 7-, 10-, and 11-to-digit phone numbers like and can be used to parse them into a tuple of (country code, area code, exchange, base). A substitution can be used to rewrite every instance of heck as h*ck.

7 The regular expression language

8 Printable characters A regular expression may contain printable alphanumeric characters; e.g., apple matches apple and bass matches bass.

9 The universal character class A period. matches any single character (except newline).

10 Predefined character classes \d: a digit character \s: a whitespace character (such as space, tab, newline, etc.) \w: a word character (i.e., an alphanumeric character) And, their complement classes: \D: a non-digit character \S: a non-whitespace character \W: a non- word character (This is not an exhaustive list.)

11 Ad-hoc character classes Consist of single characters in square brackets; e.g., [234] matches 2, 3, or 4.* Predefined character classes can be used in ad-hoc classes, too; e.g., [\dabc] matches one of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c}. A hyphen between two characters can be used to indicate a range; e.g., [a-z]: an lowercase ASCII character [A-Z]: an uppercase ASCII character [A-Za-Z]: an alphabetic ASCII character *In set theory terms, it s a union.

12 Boundary symbols ^: left edge (beginning of string) $: right edge (end of string) These can only be sensibly used at the start (resp. end) of a regular expression.

13 Escaping Characters used for other purposes, including. [ and ] ^ and $ \ must be escaped (preceded by a backslash) to receive a literal interpretation; e.g.: \. matches. \[ matches [ \^ matches ^

14 Basic quantification.? matches zero or one characters.* (or Kleene star ) matches zero or more characters.+ (or Kleene plus ) matches one or more characters E.g.: woo+f matches woof, wooof, woooof, wooooof, and so on. colou?r matches color and colour.

15 Bracket quantification.{n} matches N characters.{m,n} matches between M (inclusive) and M (inclusive) characters.{m,} matches M or more characters.{,n} matches 0 to N characters E.g., wo{2,4}f matches woof, wooof, and woooof.

16 Groups Parentheses are used to bound spans of symbols. There are two uses for this: We can use grouping to recover substrings of the match. E.g., for 7-digit phone numbers we can extract the exchange and the base as separate groups with (\d{3})-?(\d{4}). For the string group 1 is then 867 and group 2 is 5309 (the optional - is not part of a group). We can use grouping to indicate the scope of quantifiers: E.g., ba(na)+ matches bana, banana, bananana, etc. Note that groups can be nested.

17 The operator The operator indicates the union ( either-or ) of multiple substrings. E.g., cigar(ette illo)? matches cigar, cigarette, and cigarillo. Here, the parentheses delimit both the scope of the? quantifier and the operator.

18 Quantifier scope Quantifiers take scope over the preceding character, character class, or group. E.g., \d{3} matches any sequence of three digits. [a-z]{4} matches any sequence of four lowercase ASCII characters. ba(na)+ matches bana, banana, bananana, and so on.

19 Non-greedy matching By default, the quantifiers * (zero or more) and + (one or more) are greedy: they match the largest possible string. To make them non-greedy, we use *? and +?. E.g.: ^.*rd (greedy quantifier) matches the whole string reward the Nord. ^.*?rd (non-greedy quantifier) applied to that string just matches the reward part.

20 Review Printable characters Universal character class. Pre-defined character classes \d, \s, \w, \D, \S, \W,... Ad-hoc character classes (e.g., in square brackets) Boundary symbols ^ and $ Escaping (e.g. \[ vs. [) Quantification (e.g.,?, *, +, *?, {N},...) Groups (e.g., in parentheses) The union operator

21 Demo: re

22 Free advice Use regular expressions for moderately-complex string matching problems Use other tools such as finite-state transducers or context-free grammars when: More than a few regular expressions are required One-to-many substitution is required Linguistic insights are required Performance is required Readability is required Or, if things are getting complicated, consider just enumerating all the possible strings, if they re finite, and storing them in a set. Learn about re flags, which modify the behavior of a match; e.g.: re.ignorecase: makes matching case-insensitive re.dotall: makes. also match newline characters...

LING115 Lecture Note Session #7: Regular Expressions

LING115 Lecture Note Session #7: Regular Expressions LING115 Lecture Note Session #7: Regular Expressions 1. Introduction We need to refer to a set of strings for various reasons: to ignore case-distinction, to refer to a set of files that share a common

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Dr. Sarah Abraham University of Texas at Austin Computer Science Department. Regular Expressions. Elements of Graphics CS324e Spring 2017

Dr. Sarah Abraham University of Texas at Austin Computer Science Department. Regular Expressions. Elements of Graphics CS324e Spring 2017 Dr. Sarah Abraham University of Texas at Austin Computer Science Department Regular Expressions Elements of Graphics CS324e Spring 2017 What are Regular Expressions? Describe a set of strings based on

More information

Introduction to Regular Expressions Version 1.3. Tom Sgouros

Introduction to Regular Expressions Version 1.3. Tom Sgouros Introduction to Regular Expressions Version 1.3 Tom Sgouros June 29, 2001 2 Contents 1 Beginning Regular Expresions 5 1.1 The Simple Version........................ 6 1.2 Difficult Characters........................

More information

FSASIM: A Simulator for Finite-State Automata

FSASIM: A Simulator for Finite-State Automata FSASIM: A Simulator for Finite-State Automata P. N. Hilfinger Chapter 1: Overview 1 1 Overview The fsasim program reads in a description of a finite-state recognizer (either deterministic or non-deterministic),

More information

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9 Regular Expressions Computer Science and Engineering College of Engineering The Ohio State University Lecture 9 Language Definition: a set of strings Examples Activity: For each above, find (the cardinality

More information

Introduction to Regular Expressions

Introduction to Regular Expressions Introduction to Regular Expressions Basil L. Contovounesios blc@netsoc.tcd.ie For the Dublin University Internet Society [Netsoc] Tuesday 3 rd November, 2015 Finite Automata Imagine a mysterious black

More information

Index. caps method, 180 Character(s) base, 161 classes

Index. caps method, 180 Character(s) base, 161 classes A Abjads, 160 Abstract syntax tree (AST), 3 with action objects, 141 143 definition, 135 Action method for integers converts, 172 173 S-Expressions, 171 Action objects ASTs, 141 142 defined, 137.made attribute,

More information

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regular Expressions. Regular Expression Syntax in Python. Achtung! 1 Regular Expressions Lab Objective: Cleaning and formatting data are fundamental problems in data science. Regular expressions are an important tool for working with text carefully and eciently, and are

More information

Regular Expressions in programming. CSE 307 Principles of Programming Languages Stony Brook University

Regular Expressions in programming. CSE 307 Principles of Programming Languages Stony Brook University Regular Expressions in programming CSE 307 Principles of Programming Languages Stony Brook University http://www.cs.stonybrook.edu/~cse307 1 What are Regular Expressions? Formal language representing a

More information

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010 Compiler Design. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 1, 010 Contents In these slides we will see 1.Introduction, Concepts and Notations.Regular Expressions, Regular

More information

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland Regular Expressions Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland November 11 th, 2015 Regular expressions provide a flexible way

More information

Advanced Handle Definition

Advanced Handle Definition Tutorial for Windows and Macintosh Advanced Handle Definition 2017 Gene Codes Corporation Gene Codes Corporation 525 Avis Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere)

More information

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python Regular Expressions Steve Renals s.renals@ed.ac.uk (based on original notes by Ewan Klein) ICL 12 October 2005 Introduction Formal Background to REs Extensions of Basic REs Overview Goals: a basic idea

More information

Lecture 4: Syntax Specification

Lecture 4: Syntax Specification The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic

More information

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl)

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) Regular Expressions Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) JavaScript started supporting regular expressions in

More information

Regular Expressions!!

Regular Expressions!! Regular Expressions!! In your mat219_class project 1. Copy code from D2L to download regex-prac9ce.r, and run in the Console. 2. Open a blank R script and name it regex-notes. library(tidyverse) regular

More information

Crayon (.cry) Language Reference Manual. Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361)

Crayon (.cry) Language Reference Manual. Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361) Crayon (.cry) Language Reference Manual Naman Agrawal (na2603) Vaidehi Dalmia (vd2302) Ganesh Ravichandran (gr2483) David Smart (ds3361) 1 Lexical Elements 1.1 Identifiers Identifiers are strings used

More information

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines. Chapter 1 Summary Comments are indicated by a hash sign # (also known as the pound or number sign). Text to the right of the hash sign is ignored. (But, hash loses its special meaning if it is part of

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

Effective Programming Practices for Economists. 17. Regular Expressions

Effective Programming Practices for Economists. 17. Regular Expressions Effective Programming Practices for Economists 17. Regular Expressions Hans-Martin von Gaudecker Department of Economics, Universität Bonn Motivation Replace all occurences of my name in the project template

More information

1 Lexical Considerations

1 Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler

More information

Programming for Engineers Introduction to C

Programming for Engineers Introduction to C Programming for Engineers Introduction to C ICEN 200 Spring 2018 Prof. Dola Saha 1 Simple Program 2 Comments // Fig. 2.1: fig02_01.c // A first program in C begin with //, indicating that these two lines

More information

Regular Expressions & Automata

Regular Expressions & Automata Regular Expressions & Automata CMSC 132 Department of Computer Science University of Maryland, College Park Regular expressions Notation Patterns Java support Automata Languages Finite State Machines Turing

More information

Language Reference Manual simplicity

Language Reference Manual simplicity Language Reference Manual simplicity Course: COMS S4115 Professor: Dr. Stephen Edwards TA: Graham Gobieski Date: July 20, 2016 Group members Rui Gu rg2970 Adam Hadar anh2130 Zachary Moffitt znm2104 Suzanna

More information

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance. 2.1 Introduction (No questions.) 2.2 A Simple Program: Printing a Line of Text 2.1 Which of the following must every C program have? (a) main (b) #include (c) /* (d) 2.2 Every statement in C

More information

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual Contents 1 Introduction...2 2 Lexical Conventions...2 3 Types...3 4 Syntax...3 5 Expressions...4 6 Declarations...8 7 Statements...9

More information

Algorithmic Approaches for Biological Data, Lecture #8

Algorithmic Approaches for Biological Data, Lecture #8 Algorithmic Approaches for Biological Data, Lecture #8 Katherine St. John City University of New York American Museum of Natural History 17 February 2016 Outline More on Pattern Finding: Regular Expressions

More information

IT 374 C# and Applications/ IT695 C# Data Structures

IT 374 C# and Applications/ IT695 C# Data Structures IT 374 C# and Applications/ IT695 C# Data Structures Module 2.1: Introduction to C# App Programming Xianrong (Shawn) Zheng Spring 2017 1 Outline Introduction Creating a Simple App String Interpolation

More information

More Details about Regular Expressions

More Details about Regular Expressions More Details about Regular Expressions Basic Regular Expression Notation Summary of basic notations to match single characters and sequences of characters 1. /[abc]/ = /a b c/ Character class; disjunction

More information

CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon

CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon CS 374 Fall 2014 Homework 2 Due Tuesday, September 16, 2014 at noon Groups of up to three students may submit common solutions for each problem in this homework and in all future homeworks You are responsible

More information

STATS Data analysis using Python. Lecture 0: Introduction and Administrivia

STATS Data analysis using Python. Lecture 0: Introduction and Administrivia STATS 700-002 Data analysis using Python Lecture 0: Introduction and Administrivia Data science has completely changed our world Course goals Survey popular tools in academia/industry for data analysis

More information

CMSC 132: Object-Oriented Programming II

CMSC 132: Object-Oriented Programming II CMSC 132: Object-Oriented Programming II Regular Expressions & Automata Department of Computer Science University of Maryland, College Park 1 Regular expressions Notation Patterns Java support Automata

More information

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems aim to facilitate the solution of text manipulation problems are symbolic notations used to identify patterns in text; are supported by many command line tools; are supported by most programming languages;

More information

Essentials for Scientific Computing: Stream editing with sed and awk

Essentials for Scientific Computing: Stream editing with sed and awk Essentials for Scientific Computing: Stream editing with sed and awk Ershaad Ahamed TUE-CMS, JNCASR May 2012 1 Stream Editing sed and awk are stream processing commands. What this means is that they are

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 6 Decaf Language Wednesday, September 7 The project for the course is to write a

More information

DEMO A Language for Practice Implementation Comp 506, Spring 2018

DEMO A Language for Practice Implementation Comp 506, Spring 2018 DEMO A Language for Practice Implementation Comp 506, Spring 2018 1 Purpose This document describes the Demo programming language. Demo was invented for instructional purposes; it has no real use aside

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

Regular Expressions. Perl PCRE POSIX.NET Python Java

Regular Expressions. Perl PCRE POSIX.NET Python Java ModSecurity rules rely heavily on regular expressions to allow you to specify when a rule should or shouldn't match. This appendix teaches you the basics of regular expressions so that you can better make

More information

Full file at C How to Program, 6/e Multiple Choice Test Bank

Full file at   C How to Program, 6/e Multiple Choice Test Bank 2.1 Introduction 2.2 A Simple Program: Printing a Line of Text 2.1 Lines beginning with let the computer know that the rest of the line is a comment. (a) /* (b) ** (c) REM (d)

More information

Language Reference Manual

Language Reference Manual TAPE: A File Handling Language Language Reference Manual Tianhua Fang (tf2377) Alexander Sato (as4628) Priscilla Wang (pyw2102) Edwin Chan (cc3919) Programming Languages and Translators COMSW 4115 Fall

More information

Regular Expressions 1 / 12

Regular Expressions 1 / 12 Regular Expressions 1 / 12 https://xkcd.com/208/ 2 / 12 Regular Expressions In computer science, a language is a set of strings. Like any set, a language can be specified by enumeration (listing all the

More information

4. Inputting data or messages to a function is called passing data to the function.

4. Inputting data or messages to a function is called passing data to the function. Test Bank for A First Book of ANSI C 4th Edition by Bronson Link full download test bank: http://testbankcollection.com/download/test-bank-for-a-first-book-of-ansi-c-4th-edition -by-bronson/ Link full

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

IC Language Specification

IC Language Specification CS 301 Spring 2016 IC Language Specification The IC Language For the implementation project, you will build a compiler for an object-oriented language called IC (for Irish Coffee 1 ), which is essentially

More information

CS 230 Programming Languages

CS 230 Programming Languages CS 230 Programming Languages 09 / 20 / 2013 Instructor: Michael Eckmann Today s Topics Questions/comments? Continue Regular expressions Matching string basics =~ (matches) m/ / (this is the format of match

More information

Using Lex or Flex. Prof. James L. Frankel Harvard University

Using Lex or Flex. Prof. James L. Frankel Harvard University Using Lex or Flex Prof. James L. Frankel Harvard University Version of 1:07 PM 26-Sep-2016 Copyright 2016, 2015 James L. Frankel. All rights reserved. Lex Regular Expressions (1 of 4) Special characters

More information

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA.

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA. DECLARATIONS Character Set, Keywords, Identifiers, Constants, Variables Character Set C uses the uppercase letters A to Z. C uses the lowercase letters a to z. C uses digits 0 to 9. C uses certain Special

More information

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou Lecture Outline COMP-421 Compiler Design! Lexical Analyzer Lex! Lex Examples Presented by Dr Ioanna Dionysiou Figures and part of the lecture notes taken from A compact guide to lex&yacc, epaperpress.com

More information

Understanding Regular Expressions, Special Characters, and Patterns

Understanding Regular Expressions, Special Characters, and Patterns APPENDIXA Understanding Regular Expressions, Special Characters, and Patterns This appendix describes the regular expressions, special or wildcard characters, and patterns that can be used with filters

More information

Decaf Language Reference

Decaf Language Reference Decaf Language Reference Mike Lam, James Madison University Fall 2016 1 Introduction Decaf is an imperative language similar to Java or C, but is greatly simplified compared to those languages. It will

More information

Lecture 18 Regular Expressions

Lecture 18 Regular Expressions Lecture 18 Regular Expressions In this lecture Background Text processing languages Pattern searches with grep Formal Languages and regular expressions Finite State Machines Regular Expression Grammer

More information

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang) Bioinformatics Programming EE, NCKU Tien-Hao Chang (Darby Chang) 1 Regular Expression 2 http://rp1.monday.vip.tw1.yahoo.net/res/gdsale/st_pic/0469/st-469571-1.jpg 3 Text patterns and matches A regular

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2010 Handout Decaf Language Tuesday, Feb 2 The project for the course is to write a compiler

More information

DVA337 HT17 - LECTURE 4. Languages and regular expressions

DVA337 HT17 - LECTURE 4. Languages and regular expressions DVA337 HT17 - LECTURE 4 Languages and regular expressions 1 SO FAR 2 TODAY Formal definition of languages in terms of strings Operations on strings and languages Definition of regular expressions Meaning

More information

This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl.

This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl. NAME DESCRIPTION perlrequick - Perl regular expressions quick start Perl version 5.16.2 documentation - perlrequick This page covers the very basics of understanding, creating and using regular expressions

More information

CSE : Python Programming

CSE : Python Programming CSE 399-004: Python Programming Lecture 11: Regular expressions April 2, 2007 http://www.seas.upenn.edu/~cse39904/ Announcements About those meeting from last week If I said I was going to look into something

More information

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the

More information

VENTURE. Section 1. Lexical Elements. 1.1 Identifiers. 1.2 Keywords. 1.3 Literals

VENTURE. Section 1. Lexical Elements. 1.1 Identifiers. 1.2 Keywords. 1.3 Literals VENTURE COMS 4115 - Language Reference Manual Zach Adler (zpa2001), Ben Carlin (bc2620), Naina Sahrawat (ns3001), James Sands (js4597) Section 1. Lexical Elements 1.1 Identifiers An identifier in VENTURE

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

Regular Expressions. Upsorn Praphamontripong. CS 1111 Introduction to Programming Spring [Ref: https://docs.python.org/3/library/re.

Regular Expressions. Upsorn Praphamontripong. CS 1111 Introduction to Programming Spring [Ref: https://docs.python.org/3/library/re. Regular Expressions Upsorn Praphamontripong CS 1111 Introduction to Programming Spring 2018 [Ref: https://docs.python.org/3/library/re.html] Overview: Regular Expressions What are regular expressions?

More information

The structure of a compiler

The structure of a compiler The structure of a compiler Source code front-end Intermediate front-end representation compiler back-end machine code Front-end & Back-end C front-end Pascal front-end C front-end Intel x86 back-end Motorola

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

The PCAT Programming Language Reference Manual

The PCAT Programming Language Reference Manual The PCAT Programming Language Reference Manual Andrew Tolmach and Jingke Li Dept. of Computer Science Portland State University September 27, 1995 (revised October 15, 2002) 1 Introduction The PCAT language

More information

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo Outline More regular expressions & pattern matching: groups substitute greed RegExpr Syntax They re strings Most punctuation is special; needs to be escaped

More information

'...' "..." escaping \u hhhh hhhh '''...''' """...""" raw string Example: r"abc\txyz\n" in code;

'...' ... escaping \u hhhh hhhh '''...''' ... raw string Example: rabc\txyz\n in code; Strings Writing strings Strings can be written in single quotes, '...', or double quotes, "..." These strings cannot contain an actual newline Certain special characters can be written in these strings

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Chapter 2, Part I Introduction to C Programming

Chapter 2, Part I Introduction to C Programming Chapter 2, Part I Introduction to C Programming C How to Program, 8/e, GE 2016 Pearson Education, Ltd. All rights reserved. 1 2016 Pearson Education, Ltd. All rights reserved. 2 2016 Pearson Education,

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017 Regex, Sed, Awk Arindam Fadikar December 12, 2017 Why Regex Lots of text data. twitter data (social network data) government records web scrapping many more... Regex Regular Expressions or regex or regexp

More information

Strings are actually 'objects' Strings

Strings are actually 'objects' Strings Strings are actually 'objects' Strings What is an object?! An object is a concept that we can encapsulate data along with the functions that might need to access or manipulate that data. What is an object?!

More information

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression Chapter 1 Summary Comments are indicated by a hash sign # (also known as the pound or number sign). Text to the right of the hash sign is ignored. (But, hash loses its special meaning if it is part of

More information

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides

More information

Lecture 11: Regular Expressions. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han

Lecture 11: Regular Expressions. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Lecture 11: Regular Expressions LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Outline Language and Computers, Ch.4 Searching 4.4 Searching semi-structured data with regular expressions

More information

=~ determines to which variable the regex is applied. In its absence, $_ is used.

=~ determines to which variable the regex is applied. In its absence, $_ is used. NAME DESCRIPTION OPERATORS perlreref - Perl Regular Expressions Reference This is a quick reference to Perl's regular expressions. For full information see perlre and perlop, as well as the SEE ALSO section

More information

Language Syntax version beta Proposed Syntax

Language Syntax version beta Proposed Syntax Superpowered game development. Language Syntax version.0. beta Proposed Syntax Live/current version at skookumscript.com/docs/v.0/lang/syntax/ November, 0 Better coding through mad science. Copyright 00-0

More information

HAWK Language Reference Manual

HAWK Language Reference Manual HAWK Language Reference Manual HTML is All We Know Created By: Graham Gobieski, George Yu, Ethan Benjamin, Justin Chang, Jon Adelson 0. Contents 1 Introduction 2 Lexical Convetions 2.1 Tokens 2.2 Comments

More information

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP address as a string and do a search. But, what if you didn

More information

Lexical Analysis (ASU Ch 3, Fig 3.1)

Lexical Analysis (ASU Ch 3, Fig 3.1) Lexical Analysis (ASU Ch 3, Fig 3.1) Implementation by hand automatically ((F)Lex) Lex generates a finite automaton recogniser uses regular expressions Tasks remove white space (ws) display source program

More information

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments.

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments. Java How to Program, 9/e Education, Inc. All Rights Reserved. } Java application programming } Use tools from the JDK to compile and run programs. } Videos at www.deitel.com/books/jhtp9/ Help you get started

More information

Gabriel Hugh Elkaim Spring CMPE 013/L: C Programming. CMPE 013/L: C Programming

Gabriel Hugh Elkaim Spring CMPE 013/L: C Programming. CMPE 013/L: C Programming 1 Literal Constants Definition A literal or a literal constant is a value, such as a number, character or string, which may be assigned to a variable or a constant. It may also be used directly as a function

More information

The Little Regular Expressionist

The Little Regular Expressionist The Little Regular Expressionist Vilja Hulden August 2016 v0.1b CC-BY-SA 4.0 This little pamphlet, which is inspired by The Little Schemer by Daniel Friedman and Matthias Felleisen, aims to serve as a

More information

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract iflex: A Lexical Analyzer Generator for Icon Ray Pereda Unicon Technical Report UTR-02 February 25, 2000 Abstract iflex is software tool for building language processors. It is based on flex, a well-known

More information

Context-free grammars (CFGs)

Context-free grammars (CFGs) Context-free grammars (CFGs) Roadmap Last time RegExp == DFA Jlex: a tool for generating (Java code for) a lexer/scanner Mainly a collection of regexp, action pairs This time CFGs, the underlying abstraction

More information

LECTURE 8. The Standard Library Part 2: re, copy, and itertools

LECTURE 8. The Standard Library Part 2: re, copy, and itertools LECTURE 8 The Standard Library Part 2: re, copy, and itertools THE STANDARD LIBRARY: RE The Python standard library contains extensive support for regular expressions. Regular expressions, often abbreviated

More information

Programming in ROBOTC ROBOTC Rules

Programming in ROBOTC ROBOTC Rules Programming in ROBOTC ROBOTC Rules In this lesson, you will learn the basic rules for writing ROBOTC programs. ROBOTC is a text-based programming language Commands to the robot are first written as text

More information

"Hello" " This " + "is String " + "concatenation"

Hello  This  + is String  + concatenation Strings About Strings Strings are objects, but there is a special syntax for writing String literals: "Hello" Strings, unlike most other objects, have a defined operation (as opposed to a method): " This

More information

Who This Book Is For What This Book Covers How This Book Is Structured What You Need to Use This Book. Source Code

Who This Book Is For What This Book Covers How This Book Is Structured What You Need to Use This Book. Source Code Contents Introduction Who This Book Is For What This Book Covers How This Book Is Structured What You Need to Use This Book Conventions Source Code Errata p2p.wrox.com xxi xxi xxii xxii xxiii xxiii xxiv

More information

Introduction to C++ IT 1033: Fundamentals of Programming

Introduction to C++ IT 1033: Fundamentals of Programming 2 Introduction to C++ IT 1033: Fundamentals of Programming Budditha Hettige Department of Computer Science C++ C++ is a middle-level programming language Developed by Bjarne Stroustrup Starting in 1979

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns Perl Regular Expressions Unlike most programming languages, Perl has builtin support for matching strings using regular expressions called patterns, which are similar to the regular expressions used in

More information

Babu Madhav Institute of Information Technology, UTU 2015

Babu Madhav Institute of Information Technology, UTU 2015 Five years Integrated M.Sc.(IT)(Semester 5) Question Bank 060010502:Programming in Python Unit-1:Introduction To Python Q-1 Answer the following Questions in short. 1. Which operator is used for slicing?

More information

Object-Oriented Software Engineering CS288

Object-Oriented Software Engineering CS288 Object-Oriented Software Engineering CS288 1 Regular Expressions Contents Material for this lecture is based on the Java tutorial from Sun Microsystems: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved.

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved. C How to Program, 6/e 1992-2010 by Pearson Education, Inc. An important part of the solution to any problem is the presentation of the results. In this chapter, we discuss in depth the formatting features

More information

VARIABLES AND CONSTANTS

VARIABLES AND CONSTANTS UNIT 3 Structure VARIABLES AND CONSTANTS Variables and Constants 3.0 Introduction 3.1 Objectives 3.2 Character Set 3.3 Identifiers and Keywords 3.3.1 Rules for Forming Identifiers 3.3.2 Keywords 3.4 Data

More information

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD GROUP - B EXPERIMENT NO : 07 1. Title: Write a program using Lex specifications to implement lexical analysis phase of compiler to total nos of words, chars and line etc of given file. 2. Objectives :

More information