Text & Patterns. stat 579 Heike Hofmann

Size: px
Start display at page:

Download "Text & Patterns. stat 579 Heike Hofmann"

Transcription

1 Text & Patterns stat 579 Heike Hofmann

2 Outline Character Variables Control Codes Patterns & Matching

3 Baby Names Data The social security agency keeps track of all baby names used in applications for social security numbers (effectively all births), see There is a really cool visualization by Martin Wattenberg: see In R there is a package called babynames

4 Your turn Load the babynames data from package babynames Warm-up: what were the top 10 names for boys and girls in 2013? which names have been in use every single year? extend the data set to include ranking by year and gender (use dplyr and the function mutate)

5 Character Variables cast between character and factor as.character(x) as.factor(x) > x <- bnames$name > is.character(x) [1] FALSE > head(sort(table(bn$name), decreasing=t), 20) Jessie Leslie Guadalupe Jean Lee James John William Robert Francis Dana Charles Marion Mary Willie Johnnie Joseph Billie Carmen Joe !! > x <- as.character(bnames$name) > summary(x) Length Class Mode character character

6 Brainstorm Thinking about the data, what are some of the trends that you might want to explore? What additional variables would you need to create? What other data sources might you want to use? Pair up and brainstorm for 2 minutes.

7 Some Ideas length first/last letter rank percent vowels/consonants influential people/events (brad, angelina, barack, george,... )

8 Some useful commands... back to the reference card nchar" substring" paste" tolower, toupper" print, cat

9 Your turn find length of each baby name get first and last letter for each baby name (make sure to convert all names to lower cases before) think about how to determine number of vowels in a name

10 Advanced Find graphics to answer the following questions: Does the first/last letter change over time? - does it depend on gender? Which names are used both for girls and boys?

11 Patterns & Matches gsub (pattern, replacement, x)" grep, regexpr, gregexpr, strsplit

12 Patterns & Matches > x <- tolower(sample(unique(bn$name), size=3)) > x [1] "evette" "tabatha" "bluford" grep > grep('a', x) [1] 2 regexpr > regexpr('a', x) [1] attr(,"match.length") [1] strsplit > strsplit(x, 'a') [[1]] [1] "evette"! [[2]] [1] "t" "b" "th"! [[3]] [1] "bluford" gsub > gsub('a','', x) [1] "evette" "tbth" "bluford" gregexpr > gregexpr('a', x) [[1]] [1] -1 attr(,"match.length") [1] -1! [[2]] [1] attr(,"match.length") [1] 1 1 1! [[3]] [1] -1 attr(,"match.length") [1] -1

13 Your turn Find number of a, e, i, o and u s in each name Find percentage of vowels in name Can you spot a difference in vowels between boys and girls?

14 Regular Expressions a e a or e [aei] a or e or i [^aei] neither a nor e nor i ^[aei] a, e, or i at the beginning [aei]$ a, e, or i at the end. any character (pattern) defines substring for re-use - call by \1 \2 \3...

15 Repetition Quantifiers? preceding pattern is optional (matched 0 or 1 time) * preceding pattern is matched zero or more times + preceding pattern is matched at least once {n} preceding pattern is matched exactly n times {n, } preceding pattern is matched at least n times {n, m} preceding pattern is matched at least n times and up to m times

16 Your turn Find one pattern that helps you to Find all names that start with Joh find all names of length 2 (without using nchar()) Find all names that have a pattern of consonant-vowelconsonant-vowel-consonant-vowel-consonant... Find all names that are palindromes (e.g. Anna, Hannah, Ava,...) - is it possible to find one pattern that recognizes all palindromes?

17 Advanced Patterns [:alpha:] Any alphabetic character [:lower:] Any lowercase character [:upper:] Any uppercase character [:digit:] Any digit [:alnum:] Any alphanumeric character (alphabetic or digit) [:space:] Any white space character (space, tab, vertical tab) see?regex [:graph:] Any printable character, except space [:print:] Any printable character, including the space [:punct:] Any punctuation (i.e., a printable character that is not white space or alphanumeric) [:cntrl:] Any nonprintable character

18 Special Characters "\n" newline "\r" carriage return "\t" tabulator "\b" "\\" backslash "\a" alert see?quotes Control Codes, Escape Sequences

19 More of a challenge The IMDb database publishes data in a slightly odd format - file movies.list is an example of (the first 100) movie names published. Read the data into R using readlines first (every line is one element of the resulting vector) Use the text commands we discussed today to extract the following pieces of information: Type, Year, Title

Computing Unit 3: Data Types

Computing Unit 3: Data Types Computing Unit 3: Data Types Kurt Hornik September 26, 2018 Character vectors String constants: enclosed in "... " (double quotes), alternatively single quotes. Slide 2 Character vectors String constants:

More information

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017 Regex, Sed, Awk Arindam Fadikar December 12, 2017 Why Regex Lots of text data. twitter data (social network data) government records web scrapping many more... Regex Regular Expressions or regex or regexp

More information

TCL - STRINGS. Boolean value can be represented as 1, yes or true for true and 0, no, or false for false.

TCL - STRINGS. Boolean value can be represented as 1, yes or true for true and 0, no, or false for false. http://www.tutorialspoint.com/tcl-tk/tcl_strings.htm TCL - STRINGS Copyright tutorialspoint.com The primitive data-type of Tcl is string and often we can find quotes on Tcl as string only language. These

More information

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems aim to facilitate the solution of text manipulation problems are symbolic notations used to identify patterns in text; are supported by many command line tools; are supported by most programming languages;

More information

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland Regular Expressions Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland November 11 th, 2015 Regular expressions provide a flexible way

More information

Lecture 5, Regular Expressions September 2014

Lecture 5, Regular Expressions September 2014 Lecture 5, Regular Expressions 36-350 10 September 2014 In Our Last Thrilling Episode Characters and strings Matching strings, splitting on strings, counting strings We need a ways to compute with patterns

More information

Converting a Lowercase Letter Character to Uppercase (Or Vice Versa)

Converting a Lowercase Letter Character to Uppercase (Or Vice Versa) Looping Forward Through the Characters of a C String A lot of C string algorithms require looping forward through all of the characters of the string. We can use a for loop to do that. The first character

More information

Transform Data! The Basics Part I continued!

Transform Data! The Basics Part I continued! Transform Data! The Basics Part I continued! arrange() arrange() Order rows from smallest to largest values arrange(.data, ) Data frame to transform One or more columns to order by (addi3onal columns will

More information

Transform Data! The Basics Part I!

Transform Data! The Basics Part I! Transform Data! The Basics Part I! arrange() arrange() Order rows from smallest to largest values arrange(.data, ) Data frame to transform One or more columns to order by (addi3onal columns will be used

More information

Server-side Web Development (I3302) Semester: 1 Academic Year: 2017/2018 Credits: 4 (50 hours) Dr Antoun Yaacoub

Server-side Web Development (I3302) Semester: 1 Academic Year: 2017/2018 Credits: 4 (50 hours) Dr Antoun Yaacoub Lebanese University Faculty of Science Computer Science BS Degree Server-side Web Development (I3302) Semester: 1 Academic Year: 2017/2018 Credits: 4 (50 hours) Dr Antoun Yaacoub 2 Regular expressions

More information

Data Cleaning. Andrew Jaffe. January 6, 2016

Data Cleaning. Andrew Jaffe. January 6, 2016 Data Cleaning Andrew Jaffe January 6, 2016 Data We will be using multiple data sets in this lecture: Salary, Monument, Circulator, and Restaurant from OpenBaltimore: https: //data.baltimorecity.gov/browse?limitto=datasets

More information

CS160A EXERCISES-FILTERS2 Boyd

CS160A EXERCISES-FILTERS2 Boyd Exercises-Filters2 In this exercise we will practice with the Unix filters cut, and tr. We will also practice using paste, even though, strictly speaking, it is not a filter. In addition, we will expand

More information

STREAM EDITOR - REGULAR EXPRESSIONS

STREAM EDITOR - REGULAR EXPRESSIONS STREAM EDITOR - REGULAR EXPRESSIONS http://www.tutorialspoint.com/sed/sed_regular_expressions.htm Copyright tutorialspoint.com It is the regular expressions that make SED powerful and efficient. A number

More information

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved.

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved. C How to Program, 6/e 1992-2010 by Pearson Education, Inc. An important part of the solution to any problem is the presentation of the results. In this chapter, we discuss in depth the formatting features

More information

LESSON 4. The DATA TYPE char

LESSON 4. The DATA TYPE char LESSON 4 This lesson introduces some of the basic ideas involved in character processing. The lesson discusses how characters are stored and manipulated by the C language, how characters can be treated

More information

Introduction to String Manipulation

Introduction to String Manipulation Introduction to Computer Programming Introduction to String Manipulation CSCI-UA.0002 What is a String? A String is a data type in the Python programming language A String can be described as a "sequence

More information

UNIX / LINUX - REGULAR EXPRESSIONS WITH SED

UNIX / LINUX - REGULAR EXPRESSIONS WITH SED UNIX / LINUX - REGULAR EXPRESSIONS WITH SED http://www.tutorialspoint.com/unix/unix-regular-expressions.htm Copyright tutorialspoint.com Advertisements In this chapter, we will discuss in detail about

More information

Unstructured Data. Unstructured Data Text Representations String Operations Length, Case, Concatenation, Substring, Split String Text Mining

Unstructured Data. Unstructured Data Text Representations String Operations Length, Case, Concatenation, Substring, Split String Text Mining Lecture 23: Apr 03, 2019 Unstructured Data Unstructured Data Text Representations String Operations Length, Case, Concatenation, Substring, Split String Text Mining James Balamuta STAT 385 @ UIUC Announcements

More information

Transformations. Hadley Wickham. October 2009

Transformations. Hadley Wickham. October 2009 Transformations Hadley Wickham October 2009 1. US baby names data 2. Transformations 3. Summaries 4. Doing it by group Baby names Top 1000 male and female baby names in the US, from 1880 to 2008. 258,000

More information

Topic 3. Big C++ by Cay Horstmann Copyright 2018 by John Wiley & Sons. All rights reserved

Topic 3. Big C++ by Cay Horstmann Copyright 2018 by John Wiley & Sons. All rights reserved Topic 3 1. Defining and using pointers 2. Arrays and pointers 3. C and C++ strings 4. Dynamic memory allocation 5. Arrays and vectors of pointers 6. Problem solving: draw a picture 7. Structures 8. Pointers

More information

- c list The list specifies character positions.

- c list The list specifies character positions. CUT(1) BSD General Commands Manual CUT(1)... 1 PASTE(1) BSD General Commands Manual PASTE(1)... 3 UNIQ(1) BSD General Commands Manual UNIQ(1)... 5 HEAD(1) BSD General Commands Manual HEAD(1)... 7 TAIL(1)

More information

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010 Fall 2010 Lecture 5 Hussam Abu-Libdeh based on slides by David Slater September 17, 2010 Reasons to use Unix Reason #42 to use Unix: Wizardry Mastery of Unix makes you a wizard need proof? here is the

More information

CS Unix Tools & Scripting

CS Unix Tools & Scripting Cornell University, Spring 2014 1 February 7, 2014 1 Slides evolved from previous versions by Hussam Abu-Libdeh and David Slater Regular Expression A new level of mastery over your data. Pattern matching

More information

Configuring the RADIUS Listener LEG

Configuring the RADIUS Listener LEG CHAPTER 16 Revised: July 28, 2009, Introduction This module describes the configuration procedure for the RADIUS Listener LEG. The RADIUS Listener LEG is configured using the SM configuration file p3sm.cfg,

More information

CS214-AdvancedUNIX. Lecture 2 Basic commands and regular expressions. Ymir Vigfusson. CS214 p.1

CS214-AdvancedUNIX. Lecture 2 Basic commands and regular expressions. Ymir Vigfusson. CS214 p.1 CS214-AdvancedUNIX Lecture 2 Basic commands and regular expressions Ymir Vigfusson CS214 p.1 Shellexpansions Let us first consider regular expressions that arise when using the shell (shell expansions).

More information

Here's an example of how the method works on the string "My text" with a start value of 3 and a length value of 2:

Here's an example of how the method works on the string My text with a start value of 3 and a length value of 2: CS 1251 Page 1 Friday Friday, October 31, 2014 10:36 AM Finding patterns in text A smaller string inside of a larger one is called a substring. You have already learned how to make substrings in the spreadsheet

More information

Fundamentals of Programming. By Budditha Hettige

Fundamentals of Programming. By Budditha Hettige Fundamentals of Programming By Budditha Hettige Overview Exercises (Previous Lesson) The JAVA Programming Languages Java Virtual Machine Characteristics What is a class? JAVA Standards JAVA Keywords How

More information

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1 More Scripting and Regular Expressions Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 Regular Expression Summary Regular Expression Examples Shell Scripting 2 Do not confuse filename globbing

More information

looks at class of input and calls appropriate function

looks at class of input and calls appropriate function Lecture 18: Regular expressions in R STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University We have seen the print function: In [48]: x

More information

CSCI 2010 Principles of Computer Science. Data and Expressions 08/09/2013 CSCI

CSCI 2010 Principles of Computer Science. Data and Expressions 08/09/2013 CSCI CSCI 2010 Principles of Computer Science Data and Expressions 08/09/2013 CSCI 2010 1 Data Types, Variables and Expressions in Java We look at the primitive data types, strings and expressions that are

More information

Introduction to Regular Expressions Version 1.3. Tom Sgouros

Introduction to Regular Expressions Version 1.3. Tom Sgouros Introduction to Regular Expressions Version 1.3 Tom Sgouros June 29, 2001 2 Contents 1 Beginning Regular Expresions 5 1.1 The Simple Version........................ 6 1.2 Difficult Characters........................

More information

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1 Regular Expressions Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 POSIX character classes Some Regular Expression gotchas Regular Expression Resources Assignment 3 on Regular Expressions

More information

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9 Regular Expressions Computer Science and Engineering College of Engineering The Ohio State University Lecture 9 Language Definition: a set of strings Examples Activity: For each above, find (the cardinality

More information

6 Character Classification and Utilities Module (chars.hhf)

6 Character Classification and Utilities Module (chars.hhf) HLA Standard Library Reference 6 Character Classification and Utilities Module (chars.hhf) The HLA CHARS module contains several procedures that classify and convert various character subtypes. Conversion

More information

ISO/IEC JTC1/SC22/WG20 N

ISO/IEC JTC1/SC22/WG20 N Reference number of working document: ISO/IEC JTC1/SC22/WG20 N Date: 2001-12-25 Reference number of document: ISO/IEC DTR2 14652 Committee identification: ISO/IEC JTC1/SC22 Secretariat: ANSI Information

More information

正则表达式 Frank from https://regex101.com/

正则表达式 Frank from https://regex101.com/ 符号 英文说明 中文说明 \n Matches a newline character 新行 \r Matches a carriage return character 回车 \t Matches a tab character Tab 键 \0 Matches a null character Matches either an a, b or c character [abc] [^abc]

More information

Set and Set Operations

Set and Set Operations Set and Set Operations Introduction A set is a collection of objects. The objects in a set are called elements of the set. A well defined set is a set in which we know for sure if an element belongs to

More information

Lecture (03) Binary Codes Registers and Logic Gates

Lecture (03) Binary Codes Registers and Logic Gates Lecture (03) Binary Codes Registers and Logic Gates By: Dr. Ahmed ElShafee Binary Codes Digital systems use signals that have two distinct values and circuit elements that have two stable states. binary

More information

Lecture 15: Regular expressions in R

Lecture 15: Regular expressions in R Lecture 15: Regular expressions in R STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University options(repr.plot.width=5, repr.plot.height=3) We have seen the

More information

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points)

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points) Standard 11 Lesson 9 Introduction to C++( Up to Operators) 2MARKS 1. Why C++ is called hybrid language? C++ supports both procedural and Object Oriented Programming paradigms. Thus, C++ is called as a

More information

Registration and Login

Registration and Login Registration and Login When a parent accesses txconnect, the following Login page is displayed. The parent needs to register as a new user. How to Register as a New User The registration process is self-administered,

More information

Information technology. Specification method for cultural conventions ISO/IEC JTC1/SC22/WG20 N690. Reference number of working document:

Information technology. Specification method for cultural conventions ISO/IEC JTC1/SC22/WG20 N690. Reference number of working document: Reference number of working document: ISO/IEC JTC1/SC22/WG20 N690 Date: 1999-06-28 Reference number of document: ISO/IEC PDTR 14652 Committee identification: ISO/IEC JTC1/SC22 Secretariat: ANSI Information

More information

Muntaser Abulafi Yacoub Sabatin Omar Qaraeen. C Data Types

Muntaser Abulafi Yacoub Sabatin Omar Qaraeen. C Data Types Programming Fundamentals for Engineers 0702113 5. Basic Data Types Muntaser Abulafi Yacoub Sabatin Omar Qaraeen 1 2 C Data Types Variable definition C has a concept of 'data types' which are used to define

More information

2 nd Week Lecture Notes

2 nd Week Lecture Notes 2 nd Week Lecture Notes Scope of variables All the variables that we intend to use in a program must have been declared with its type specifier in an earlier point in the code, like we did in the previous

More information

Computer Science 236 Fall Nov. 11, 2010

Computer Science 236 Fall Nov. 11, 2010 Computer Science 26 Fall Nov 11, 2010 St George Campus University of Toronto Assignment Due Date: 2nd December, 2010 1 (10 marks) Assume that you are given a file of arbitrary length that contains student

More information

ITST Searching, Extracting & Archiving Data

ITST Searching, Extracting & Archiving Data ITST 1136 - Searching, Extracting & Archiving Data Name: Step 1 Sign into a Pi UN = pi PW = raspberry Step 2 - Grep - One of the most useful and versatile commands in a Linux terminal environment is the

More information

Data and Expressions. Outline. Data and Expressions 12/18/2010. Let's explore some other fundamental programming concepts. Chapter 2 focuses on:

Data and Expressions. Outline. Data and Expressions 12/18/2010. Let's explore some other fundamental programming concepts. Chapter 2 focuses on: Data and Expressions Data and Expressions Let's explore some other fundamental programming concepts Chapter 2 focuses on: Character Strings Primitive Data The Declaration And Use Of Variables Expressions

More information

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl)

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) Regular Expressions Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) JavaScript started supporting regular expressions in

More information

User Commands sed ( 1 )

User Commands sed ( 1 ) NAME sed stream editor SYNOPSIS /usr/bin/sed [-n] script [file...] /usr/bin/sed [-n] [-e script]... [-f script_file]... [file...] /usr/xpg4/bin/sed [-n] script [file...] /usr/xpg4/bin/sed [-n] [-e script]...

More information

Skill Area 306: Develop and Implement Computer Program

Skill Area 306: Develop and Implement Computer Program Add your company slogan Skill Area 306: Develop and Implement Computer Program Computer Programming (YPG) LOGO Skill Area 306.2: Produce Structured Program 306.2.1 Write Algorithm 306.2.2 Apply appropriate

More information

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26, Part III Shell Config Compact Course @ Max-Planck, February 16-26, 2015 33 Special Directories. current directory.. parent directory ~ own home directory ~user home directory of user ~- previous directory

More information

Configuring the RADIUS Listener Login Event Generator

Configuring the RADIUS Listener Login Event Generator CHAPTER 19 Configuring the RADIUS Listener Login Event Generator Published: December 21, 2012 Introduction This chapter describes the configuration procedure for the RADIUS listener Login Event Generator

More information

CS 106A, Lecture 9 Problem-Solving with Strings

CS 106A, Lecture 9 Problem-Solving with Strings CS 106A, Lecture 9 Problem-Solving with Strings suggested reading: Java Ch. 8.5 This document is copyright (C) Stanford Computer Science and Marty Stepp, licensed under Creative Commons Attribution 2.5

More information

Reference number of working document: Reference number of document: ISO/IEC FCD

Reference number of working document: Reference number of document: ISO/IEC FCD Reference number of working document: ISO/IEC JTC1/SC22/WG20 N634 Date: 1998-12-21 Reference number of document: ISO/IEC FCD2 14652 Committee identification: ISO/IEC JTC1/SC22 Secretariat: ANSI Information

More information

Strings. Upsorn Praphamontripong. Note: for reference when we practice loop. We ll discuss Strings in detail after Spring break

Strings. Upsorn Praphamontripong. Note: for reference when we practice loop. We ll discuss Strings in detail after Spring break Note: for reference when we practice loop. We ll discuss Strings in detail after Spring break Strings Upsorn Praphamontripong CS 1111 Introduction to Programming Spring 2018 Strings Sequence of characters

More information

Discussion 1H Notes (Week 4, April 22) TA: Brian Choi Section Webpage:

Discussion 1H Notes (Week 4, April 22) TA: Brian Choi Section Webpage: Discussion 1H Notes (Week 4, April 22) TA: Brian Choi (schoi@cs.ucla.edu) Section Webpage: http://www.cs.ucla.edu/~schoi/cs31 Passing Arguments By Value and By Reference So far, we have been passing in

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

Stat405. Tables. Hadley Wickham. Tuesday, October 23, 12

Stat405. Tables. Hadley Wickham. Tuesday, October 23, 12 Stat405 Tables Hadley Wickham Today we will use the reshape2 and xtable packages, and the movies.csv.bz2 dataset. install.packages(c("reshape2", "xtable")) 2.0 1.5 height 1.0 subject John Smith Mary Smith

More information

Fundamentals of Programming. Lecture 11: C Characters and Strings

Fundamentals of Programming. Lecture 11: C Characters and Strings 1 Fundamentals of Programming Lecture 11: C Characters and Strings Instructor: Fatemeh Zamani f_zamani@ce.sharif.edu Sharif University of Technology Computer Engineering Department The lectures of this

More information

Regular Expressions Explained

Regular Expressions Explained Found at: http://publish.ez.no/article/articleprint/11/ Regular Expressions Explained Author: Jan Borsodi Publishing date: 30.10.2000 18:02 This article will give you an introduction to the world of regular

More information

Number Systems, Scalar Types, and Input and Output

Number Systems, Scalar Types, and Input and Output Number Systems, Scalar Types, and Input and Output Outline: Binary, Octal, Hexadecimal, and Decimal Numbers Character Set Comments Declaration Data Types and Constants Integral Data Types Floating-Point

More information

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

Digital Humanities. Tutorial Regular Expressions. March 10, 2014 Digital Humanities Tutorial Regular Expressions March 10, 2014 1 Introduction In this tutorial we will look at a powerful technique, called regular expressions, to search for specific patterns in corpora.

More information

Section Sets and Set Operations

Section Sets and Set Operations Section 6.1 - Sets and Set Operations Definition: A set is a well-defined collection of objects usually denoted by uppercase letters. Definition: The elements, or members, of a set are denoted by lowercase

More information

Fundamentals of Programming Session 4

Fundamentals of Programming Session 4 Fundamentals of Programming Session 4 Instructor: Reza Entezari-Maleki Email: entezari@ce.sharif.edu 1 Fall 2011 These slides are created using Deitel s slides, ( 1992-2010 by Pearson Education, Inc).

More information

Microsoft Visual Basic 2005: Reloaded

Microsoft Visual Basic 2005: Reloaded Microsoft Visual Basic 2005: Reloaded Second Edition Chapter 4 Making Decisions in a Program Objectives After studying this chapter, you should be able to: Include the selection structure in pseudocode

More information

Computing with Strings. Learning Outcomes. Python s String Type 9/23/2012

Computing with Strings. Learning Outcomes. Python s String Type 9/23/2012 Computing with Strings CMSC 201 Fall 2012 Instructor: John Park Lecture Section 01 Discussion Sections 02-08, 16, 17 1 Learning Outcomes To understand the string data type and how strings are represented

More information

String Manipulation. Module 6

String Manipulation.  Module 6 String Manipulation http://datascience.tntlab.org Module 6 Today s Agenda Best practices for strings in R Code formatting Escaping Formatting Base R string construction Importing strings with stringi Pattern

More information

Regular expressions. LING78100: Methods in Computational Linguistics I

Regular expressions. LING78100: Methods in Computational Linguistics I Regular expressions LING78100: Methods in Computational Linguistics I String methods Python strings have methods that allow us to determine whether a string: Contains another string; e.g., assert "and"

More information

Strings and Library Functions

Strings and Library Functions Unit 4 String String is an array of character. Strings and Library Functions A string variable is a variable declared as array of character. The general format of declaring string is: char string_name

More information

psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...]

psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...] NAME SYNOPSIS DESCRIPTION OPTIONS psed - a stream editor psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...] s2p [-an] [-e script] [-f script-file] A stream editor reads the input

More information

Chapter 1: Thinking Critically Lecture notes Math 1030 Section C

Chapter 1: Thinking Critically Lecture notes Math 1030 Section C Section C.1: Sets and Venn Diagrams Definition of a set A set is a collection of objects and its objects are called members. Ex.1 Days of the week. Ex.2 Students in this class. Ex.3 Letters of the alphabet.

More information

CS 124/LINGUIST 180 From Languages to Information. Unix for Poets Dan Jurafsky

CS 124/LINGUIST 180 From Languages to Information. Unix for Poets Dan Jurafsky CS 124/LINGUIST 180 From Languages to Information Unix for Poets Dan Jurafsky (original by Ken Church, modifications by me and Chris Manning) Stanford University Unix for Poets Text is everywhere The Web

More information

Why teach it? Text Manipula2on. Spam filtering. Why we teach it 8/22/10. Header from three messages.

Why teach it? Text Manipula2on. Spam filtering. Why we teach it 8/22/10. Header from three  messages. Why teach it? Text Manipula2on These slides are a combina2on of Deb s, Duncan s and Mark s materials Expand horizon about what are data Text data are plen2ful These sources can be more compelling for students;

More information

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regular Expressions. Regular Expression Syntax in Python. Achtung! 1 Regular Expressions Lab Objective: Cleaning and formatting data are fundamental problems in data science. Regular expressions are an important tool for working with text carefully and eciently, and are

More information

String Processing CS 1111 Introduction to Programming Fall 2018

String Processing CS 1111 Introduction to Programming Fall 2018 String Processing CS 1111 Introduction to Programming Fall 2018 [The Coder s Apprentice, 10] 1 Collections Ordered, Dup allow List Range String Tuple Unordered, No Dup Dict collection[index] Access an

More information

Note: If a New Account Representative provided you a NetTeller ID at new account opening, skip this section of the enrollment process.

Note: If a New Account Representative provided you a NetTeller ID at new account opening, skip this section of the enrollment process. Thank you for choosing to bank online with First Security Bank! To begin the process of enrolling for online banking, visit our website at www.fsbmsla.com and click on the Enroll Now! link. Note: If a

More information

CS 106A, Lecture 9 Problem-Solving with Strings

CS 106A, Lecture 9 Problem-Solving with Strings CS 106A, Lecture 9 Problem-Solving with Strings suggested reading: Java Ch. 8.5 This document is copyright (C) Stanford Computer Science and Marty Stepp, licensed under Creative Commons Attribution 2.5

More information

Chapter 2 THE STRUCTURE OF C LANGUAGE

Chapter 2 THE STRUCTURE OF C LANGUAGE Lecture # 5 Chapter 2 THE STRUCTURE OF C LANGUAGE 1 Compiled by SIA CHEE KIONG DEPARTMENT OF MATERIAL AND DESIGN ENGINEERING FACULTY OF MECHANICAL AND MANUFACTURING ENGINEERING Contents Introduction to

More information

Regular Expressions!!

Regular Expressions!! Regular Expressions!! In your mat219_class project 1. Copy code from D2L to download regex-prac9ce.r, and run in the Console. 2. Open a blank R script and name it regex-notes. library(tidyverse) regular

More information

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009 Bashed One Too Many Times Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009 What is a Shell? The shell interprets commands and executes them It provides you with an environment

More information

CS2 Practical 1 CS2A 22/09/2004

CS2 Practical 1 CS2A 22/09/2004 CS2 Practical 1 Basic Java Programming The purpose of this practical is to re-enforce your Java programming abilities. The practical is based on material covered in CS1. It consists of ten simple programming

More information

UNIX II:grep, awk, sed. October 30, 2017

UNIX II:grep, awk, sed. October 30, 2017 UNIX II:grep, awk, sed October 30, 2017 File searching and manipulation In many cases, you might have a file in which you need to find specific entries (want to find each case of NaN in your datafile for

More information

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP address as a string and do a search. But, what if you didn

More information

Scientific Programming in C V. Strings

Scientific Programming in C V. Strings Scientific Programming in C V. Strings Susi Lehtola 1 November 2012 C strings As mentioned before, strings are handled as character arrays in C. String constants are handled as constant arrays. const char

More information

CMPT 125: Lecture 3 Data and Expressions

CMPT 125: Lecture 3 Data and Expressions CMPT 125: Lecture 3 Data and Expressions Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 1 Character Strings A character string is an object in Java,

More information

C OVERVIEW BASIC C PROGRAM STRUCTURE. C Overview. Basic C Program Structure

C OVERVIEW BASIC C PROGRAM STRUCTURE. C Overview. Basic C Program Structure C Overview Basic C Program Structure C OVERVIEW BASIC C PROGRAM STRUCTURE Goals The function main( )is found in every C program and is where every C program begins speed execution portability C uses braces

More information

Regular Expressions. using REs to find patterns. implementing REs using finite state automata. Sunday, 4 December 11

Regular Expressions. using REs to find patterns. implementing REs using finite state automata. Sunday, 4 December 11 Regular Expressions using REs to find patterns implementing REs using finite state automata REs and FSAs Regular expressions can be viewed as a textual way of specifying the structure of finite-state automata

More information

TECHNICAL ISO/IEC REPORT TR 14652

TECHNICAL ISO/IEC REPORT TR 14652 TECHNICAL ISO/IEC REPORT TR 14652 Final text 2002-08-12 Information technology Specification method for cultural conventions Technologies de l information Méthode de modélisation des conventions culturelles

More information

1. What type of error produces incorrect results but does not prevent the program from running? a. syntax b. logic c. grammatical d.

1. What type of error produces incorrect results but does not prevent the program from running? a. syntax b. logic c. grammatical d. Gaddis: Starting Out with Python, 2e - Test Bank Chapter Two MULTIPLE CHOICE 1. What type of error produces incorrect results but does not prevent the program from running? a. syntax b. logic c. grammatical

More information

The Java Language Rules And Tools 3

The Java Language Rules And Tools 3 The Java Language Rules And Tools 3 Course Map This module presents the language and syntax rules of the Java programming language. You will learn more about the structure of the Java program, how to insert

More information

Programming for Engineers Introduction to C

Programming for Engineers Introduction to C Programming for Engineers Introduction to C ICEN 200 Spring 2018 Prof. Dola Saha 1 Simple Program 2 Comments // Fig. 2.1: fig02_01.c // A first program in C begin with //, indicating that these two lines

More information

Computers Programming Course 5. Iulian Năstac

Computers Programming Course 5. Iulian Năstac Computers Programming Course 5 Iulian Năstac Recap from previous course Classification of the programming languages High level (Ada, Pascal, Fortran, etc.) programming languages with strong abstraction

More information

Understanding Regular Expressions, Special Characters, and Patterns

Understanding Regular Expressions, Special Characters, and Patterns APPENDIXA Understanding Regular Expressions, Special Characters, and Patterns This appendix describes the regular expressions, special or wildcard characters, and patterns that can be used with filters

More information

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014 1 If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING Lecture 8 CS2110 Spring 2014 Pointers. DO visit the java spec website 2 Parse trees: Text page 592

More information

Strings. Strings and their methods. Dr. Siobhán Drohan. Produced by: Department of Computing and Mathematics

Strings. Strings and their methods. Dr. Siobhán Drohan. Produced by: Department of Computing and Mathematics Strings Strings and their methods Produced by: Dr. Siobhán Drohan Department of Computing and Mathematics http://www.wit.ie/ Topics list Primitive Types: char Object Types: String Primitive vs Object Types

More information

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA.

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA. DECLARATIONS Character Set, Keywords, Identifiers, Constants, Variables Character Set C uses the uppercase letters A to Z. C uses the lowercase letters a to z. C uses digits 0 to 9. C uses certain Special

More information

Chapter 8: Character & String. In this chapter, you ll learn about;

Chapter 8: Character & String. In this chapter, you ll learn about; Chapter 8: Character & String Principles of Programming In this chapter, you ll learn about; Fundamentals of Strings and Characters The difference between an integer digit and a character digit Character

More information

Python allows variables to hold string values, just like any other type (Boolean, int, float). So, the following assignment statements are valid:

Python allows variables to hold string values, just like any other type (Boolean, int, float). So, the following assignment statements are valid: 1 STRINGS Objectives: How text data is internally represented as a string Accessing individual characters by a positive or negative index String slices Operations on strings: concatenation, comparison,

More information

PowerGREP. Manual. Version October 2005

PowerGREP. Manual. Version October 2005 PowerGREP Manual Version 3.2 3 October 2005 Copyright 2002 2005 Jan Goyvaerts. All rights reserved. PowerGREP and JGsoft Just Great Software are trademarks of Jan Goyvaerts i Table of Contents How to

More information

More text file manipulation: sorting, cutting, pasting, joining, subsetting,

More text file manipulation: sorting, cutting, pasting, joining, subsetting, More text file manipulation: sorting, cutting, pasting, joining, subsetting, Laboratory of Genomics & Bioinformatics in Parasitology Department of Parasitology, ICB, USP Inverse cat Last week we learned

More information