Indian Institute of Technology Kharagpur. PERL Part III. Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T.

Lecture 23: PERL Part III

On completion, the student will be able to:
- Define the string matching functions in Perl.
- Explain the different ways of specifying regular expressions.
- Define the string substitution operators, with examples.
- Illustrate the use of the special variables $', $& and $`.

String Functions

The Split Function

split breaks a string into multiple pieces at each occurrence of a delimiter and returns the pieces as a list.

    $_ = 'Red:Blue:Green:White:255';
    @details = split /:/, $_;
    foreach (@details) {
        print "$_\n";
    }

The first parameter to split is a regular expression that specifies what to split on. The second is the string to be split.

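Another example:

    $_ = 'Indranil isg@iitkgp.ernet.in 283493';
    ($name, $email, $phone) = split / /, $_;

Single quotes are needed here, because inside double quotes @iitkgp would be interpolated as an array. When split is called with no arguments, it splits $_ on runs of whitespace and ignores any leading whitespace.

The Join Function

join concatenates several elements into a single string and places a specified delimiter between them.

    $new = join ' ', $x1, $x2, $x3, $x4, $x5, $x6;
    $sep = '::';
    $new = join $sep, $x1, $x2, $x3, @abc, $x4, $x5;

An array such as @abc is flattened, so each of its elements is joined individually.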
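split and join undo each other when the same delimiter is used. The following is a minimal sketch. The helper name change_delimiter and the sample strings are illustrative assumptions. It also shows the no-argument form of split described above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a colon-separated record and re-join
# its fields with a different delimiter.
sub change_delimiter {
    my ($record, $new_sep) = @_;
    my @fields = split /:/, $record;    # first argument is a regex
    return join $new_sep, @fields;      # join takes a plain string
}

my $csv = change_delimiter('Red:Blue:Green:White:255', ', ');
print "$csv\n";    # Red, Blue, Green, White, 255

# With no arguments, split works on $_ and splits on runs of
# whitespace, ignoring leading whitespace.
$_ = '  Indranil   isg@iitkgp.ernet.in  283493';
my ($name, $email, $phone) = split;
print "$name | $email | $phone\n";    # Indranil | isg@iitkgp.ernet.in | 283493
```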

Regular Expressions: Introduction

Regular expressions are one of the most useful features of Perl. A regular expression (RegEx) is a pattern, written according to a fixed syntax, that describes a chunk of text to look for. It is a very powerful way to specify string patterns.

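An Example

Without RegEx:

    $found = 0;
    $_ = "Hello good morning everybody";
    $search = "every";
    foreach $word (split) {
        if ($word eq $search) {
            $found = 1;
            last;
        }
    }
    if ($found) {
        print "Found the word every\n";
    }

Using RegEx:

    $_ = "Hello good morning everybody";
    if ($_ =~ /every/) {
        print "Found the word every\n";
    }

This is very easy to use: the text between the forward slashes defines the regular expression. Note that the two versions are not equivalent. The loop compares whole words, so it does not find "every" (the string only contains "everybody"). By contrast, /every/ matches that text anywhere, including inside "everybody". If we use !~ instead of =~, the test succeeds when the pattern is not present in the string.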

The previous example uses literal text as the regular expression, which is the simplest form of regular expression.

Point to remember: when matching, every character is significant, including punctuation and white space. For example, /every / (with a trailing space) will not match in the previous example, because "every" is followed by "body", not by a space.

Another Simple Example

    $_ = "Welcome to IIT Kharagpur, students";
    if (/IIT K/) {
        print "IIT K is present in the string\n";
    }
    if (/Kharagpur students/) {
        print "This will not match\n";    # the string has a comma after Kharagpur
    }

When no =~ is given, as in if (/IIT K/), the pattern is matched against $_.
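The points above can be checked in one short script. This is a sketch that reuses the lecture's strings. The pattern /evening/ is an illustrative addition used to show !~.

```perl
use strict;
use warnings;

my $text = 'Hello good morning everybody';

# Word-by-word comparison: only a whole word equal to 'every' counts.
my $whole_word = grep { $_ eq 'every' } split ' ', $text;

# Regex match: the pattern may occur anywhere, even inside a word.
my $substring = ($text =~ /every/) ? 1 : 0;

# !~ succeeds when the pattern is NOT present.
my $absent = ($text !~ /evening/) ? 1 : 0;

# Every character counts, spaces included: "every " does not occur.
my $with_space = ($text =~ /every /) ? 1 : 0;

# Punctuation counts too: the comma blocks this match.
my $comma = ('Welcome to IIT Kharagpur, students' =~ /Kharagpur students/) ? 1 : 0;

print "$whole_word $substring $absent $with_space $comma\n";    # 0 1 1 0 0
```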

Types of RegEx

There are basically two types:
- Matching: checking whether a string contains a substring. The symbol m is used (it is optional if the forward slash is the delimiter).
- Substitution: replacing a substring by another substring. The symbol s is used.

Matching

The =~ Operator

=~ tells Perl to apply the regular expression on the right to the value on the left. The regular expression is enclosed in delimiters (the forward slash by default). If some other delimiter is used, the preceding m is essential.

Examples:

    $string = "Good day";
    if ($string =~ m/day/) {
        print "Match successful\n";
    }
    if ($string =~ /day/) {
        print "Match successful\n";
    }

Both forms are equivalent; the m in the first form is optional.

    $string = "Good day";
    if ($string =~ m@day@) {
        print "Match successful\n";
    }
    if ($string =~ m[day]) {
        print "Match successful\n";
    }

Both forms are equivalent. The character following m is the delimiter. If it is an opening bracket ((, [, { or <), the pattern must end with the matching closing bracket.

Character Class

Use square brackets to list a set of possible characters. The class matches any single one of them.

    my $string = "Some test string 1234";
    if ($string =~ /[0123456789]/) {
        print "Found a number\n";
    }
    if ($string =~ /[aeiou]/) {
        print "Found a vowel\n";
    }
    if ($string =~ /[0123456789ABCDEF]/) {
        print "Found a hex digit\n";
    }
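Alternative delimiters are most useful when the pattern itself contains slashes. The following sketch uses a made-up file path to compare an escaped /.../ pattern with m{...} and m!...!. It also checks two of the character classes above.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# With the default slash delimiter, each slash in the pattern must be escaped...
my $slash  = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# ...whereas a different delimiter (with a leading m) needs no escapes.
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;
my $bang   = ($path =~ m!^/usr/local/!) ? 1 : 0;

# A character class matches any ONE of the listed characters.
my $has_digit = ($path =~ /[0123456789]/) ? 1 : 0;    # 0: no digits in the path
my $has_vowel = ($path =~ /[aeiou]/)      ? 1 : 0;    # 1

print "$slash $braces $bang $has_digit $has_vowel\n";    # 1 1 1 0 1
```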

Character Class Negation

Use ^ at the beginning of a character class to match any single character that is not one of the listed values.

    my $string = "Some test string 1234";
    if ($string =~ /[^aeiou]/) {
        print "Found a consonant\n";
    }

Strictly speaking, [^aeiou] matches any character other than a lowercase vowel, including spaces, digits and uppercase letters. A test for consonants only would be /[b-df-hj-np-tv-z]/i.

Pattern Abbreviations

These are useful in common cases:

    .     Any character except newline (\n)
    \d    A digit, same as [0-9]
    \w    A word character, same as [0-9a-zA-Z_]
    \s    A whitespace character (space, tab, newline, etc.)
    \D    Not a digit, same as [^0-9]
    \W    Not a word character
    \S    Not a whitespace character
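The following sketch shows why a negated class is broader than it first looks, and how a few of the abbreviations behave. The test strings are arbitrary examples.

```perl
use strict;
use warnings;

my $s = 'aei 1';

# [^aeiou] matches ANY character that is not a lowercase vowel,
# so here it matches the space (and the digit).
my $negated = ($s =~ /[^aeiou]/) ? 1 : 0;                 # 1

# A class that lists only consonants; /i also accepts capitals.
my $consonant = ($s =~ /[b-df-hj-np-tv-z]/i) ? 1 : 0;    # 0

# Abbreviations for common classes.
my $digit   = ('abc7' =~ /\d/) ? 1 : 0;    # 1
my $space   = ('a_b'  =~ /\s/) ? 1 : 0;    # 0
my $nonword = ('a_b'  =~ /\W/) ? 1 : 0;    # 0: '_' is a word character
my $dot     = ("\n"   =~ /./)  ? 1 : 0;    # 0: . does not match newline

print "$negated $consonant $digit $space $nonword $dot\n";    # 1 0 1 0 0 0
```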

    $string = "Good and bad days";
    if ($string =~ /d..s/) {
        print "Found something like days\n";
    }
    if ($string =~ /\w\w\w\w\s/) {    # also true for the last four letters of a longer word
        print "Found a four-letter word!\n";
    }

Anchors

There are three ways to define an anchor:

    ^     anchors to the beginning of the string
    $     anchors to the end of the string (or just before a final newline)
    \b    anchors to a word boundary

    if ($string =~ /^\w/)         # does the string start with a word character?
    if ($string =~ /\d$/)         # does the string end with a digit?
    if ($string =~ /\bgood\b/)    # does the string contain the word "good"?

Multipliers

There are three multiplier characters:

    *     zero or more occurrences
    +     one or more occurrences
    ?     zero or one occurrence

Some example usages:

    $string =~ /^\w+/;        # one or more word characters at the start
    $string =~ /\d?/;         # an optional digit (always succeeds, possibly matching nothing)
    $string =~ /\b\w+\s+/;    # a word followed by one or more whitespace characters
    $string =~ /\w+\s?$/;     # a word at the end, optionally followed by one whitespace character
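Anchors and multipliers are often combined to validate a whole string, and \b solves the whole-word problem from the "every"/"everybody" example. The following is a minimal sketch. The helpers is_integer and has_word_every are hypothetical names used for illustration.

```perl
use strict;
use warnings;

# Hypothetical validator: the whole string must be an optional sign
# followed by one or more digits (^ and $ anchor both ends).
sub is_integer {
    my ($s) = @_;
    return $s =~ /^[+-]?\d+$/ ? 1 : 0;
}

# \b on both sides makes 'every' match only as a complete word.
sub has_word_every {
    my ($s) = @_;
    return $s =~ /\bevery\b/ ? 1 : 0;
}

my @ints  = map { is_integer($_) } ('-42', '4x2', '', "42\n");
my @words = map { has_word_every($_) } ('Hello good morning everybody', 'every day');

print "@ints\n";     # 1 0 0 1 ("42\n" passes: $ also matches before a final newline)
print "@words\n";    # 0 1
```

If a trailing newline must be rejected as well, \z (the absolute end of the string) can be used in place of $.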

Substitution

Basic Usage

Substitution uses the s character. The basic syntax is:

    $new =~ s/pattern_to_match/new_pattern/;

What does this do? It looks for pattern_to_match in $new and, if found, replaces it with new_pattern (the replacement text). Only the first occurrence is replaced. There is a way to replace all occurrences, which is discussed shortly.

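Examples

    $xyz = "Rama and Lakshman went to the forest";
    $xyz =~ s/lakshman/bharat/;    # no change: matching is case-sensitive
    $xyz =~ s/r\w+a/bharat/;       # no change: no lowercase r is followed by word characters and an a
    $xyz =~ s/[aeiou]/i/;          # first vowel replaced: "Rima and Lakshman went to the forest"

    $abc = "A year has 11 months \n";
    $abc =~ s/\d+/12/;             # "A year has 12 months \n"
    $abc =~ s/\n$/ /;              # replaces the trailing newline with a space

Common Modifiers

Two commonly used modifiers are:

    /i    ignore case
    /g    match/substitute all occurrences

    $string = "Ram and Shyam are very honest";
    if ($string =~ /RAM/i) {
        print "Ram is present in the string\n";
    }
    $string =~ s/m/j/g;    # Ram -> Raj, Shyam -> Shyaj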
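The following sketch contrasts a single substitution with /g and combines /g with /i. It uses the common idiom (my $copy = $s) =~ s/.../.../, which copies the string first so that the original is left untouched. The sample strings are illustrative.

```perl
use strict;
use warnings;

my $s = 'Ram and Shyam are very honest';

(my $first = $s) =~ s/m/j/;     # only the first 'm' is replaced
(my $all   = $s) =~ s/m/j/g;    # every 'm' is replaced

# /i ignores case; s///g returns the number of substitutions made.
my $text  = 'Bad, bad, BAD';
my $count = ($text =~ s/bad/good/gi);

print "$first\n$all\n$text ($count)\n";
# Raj and Shyam are very honest
# Raj and Shyaj are very honest
# good, good, good (3)
```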

Use of Memory in RegEx

We can use parentheses to capture a piece of matched text for later use, and Perl remembers the matched text. Multiple sets of parentheses can be used.

How to recall the captured text?
- Use \1, \2, \3, etc. inside the same RegEx.
- Use $1, $2, $3, etc. after the RegEx.

Examples

    $string = "Ram and Shyam are honest";
    $string =~ /^(\w+)/;
    print $1, "\n";                 # prints Ram
    $string =~ /(\w+)$/;
    print $1, "\n";                 # prints honest
    $string =~ /^(\w+)\s+(\w+)/;
    print "$1 $2\n";                # prints Ram and
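The following sketch shows both ways of recalling captured text. $1 and $2 are read after a successful match, and \1 is used inside the pattern to find a doubled word. The "name = value" format and the helper names parse_setting and doubled_word are hypothetical. A match should always be checked before $1 is used, because after a failed match $1 still holds text from the previous successful one.

```perl
use strict;
use warnings;

# Hypothetical record format "name = value": two capture groups,
# read back afterwards through $1 and $2.
sub parse_setting {
    my ($line) = @_;
    return unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    return ($1, $2);
}

# Inside the pattern, \1 refers back to group 1: find a word that is
# immediately repeated, as in "is is" (/i makes the repeat case-insensitive).
sub doubled_word {
    my ($s) = @_;
    my ($word) = $s =~ /\b(\w+)\s+\1\b/i;    # list context returns the captures
    return $word;                            # undef if there was no match
}

my ($key, $value) = parse_setting('  colour = dark blue  ');
my $dup = doubled_word('This is is a test');

print "[$key] [$value] [$dup]\n";    # [colour] [dark blue] [is]
```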

    $string = "Ram and Shyam are very poor";
    if ($string =~ /(\w)\1/) {
        print "found 2 in a row\n";    # "oo" in "poor"
    }
    if ($string =~ /(\w+).*\1/) {
        print "found repeat\n";        # "am" occurs in both "Ram" and "Shyam"
    }
    $string =~ s/(\w+) and (\w+)/$2 and $1/;    # "Shyam and Ram are very poor"

Example 1: Validating User Input

    print "Enter age (or 'q' to quit): ";
    chomp(my $age = <STDIN>);
    exit if ($age =~ /^q$/i);
    if ($age =~ /\D/) {
        print "$age is a non-number!\n";
    }

Example 2: Validation (contd.)

A file has 2 columns, name and age, delimited by one or more spaces. It can also have blank lines or comment lines (starting with #).

    open IN, '<', $file or die "Cannot open $file: $!";
    while (my $line = <IN>) {
        chomp $line;
        next if ($line =~ /^\s*$/ or $line =~ /^\s*#/);
        my ($name, $age) = split /\s+/, $line;
        print "The age of $name is $age.\n";
    }
    close IN;

Some Special Variables

$&, $` and $'

What is $&? It holds the string matched by the last successful pattern match.

What is $`? It holds the string preceding whatever was matched by the last successful pattern match.

What is $'? It holds the string following whatever was matched by the last successful pattern match.

Example:

    $_ = 'abcdefghi';
    /def/;
    print "$`:$&:$'\n";    # prints abc:def:ghi

So, in short:

    $`    the pre-match
    $&    the match itself
    $'    the post-match
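These variables are overwritten by the next successful match, so code that needs them usually copies them straight away. The following is a minimal sketch. The helper around_match is a hypothetical name. As a side note, in Perl versions before 5.20, merely mentioning $`, $& or $' anywhere in a program slowed down every successful match; recent versions no longer have this penalty.

```perl
use strict;
use warnings;

# Hypothetical helper: return (pre-match, match, post-match) for a
# pattern, or an empty list if it does not match. The values are
# copied right away, because the next successful match overwrites them.
sub around_match {
    my ($s, $re) = @_;
    return () unless $s =~ $re;
    my ($pre, $match, $post) = ($`, $&, $');
    return ($pre, $match, $post);
}

my @parts = around_match('abcdefghi', qr/def/);
print join(':', @parts), "\n";    # abc:def:ghi
```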

Solutions to Quiz Questions on Lecture 22

1. How to sort the elements of an array in numerical order?

    @num = qw(10 2 5 22 7 15);
    @new = sort { $a <=> $b } @num;

2. Write a Perl program segment to sort an array in descending order.

    @new = sort { $a <=> $b } @num;
    @new = reverse @new;

Alternatively, sort { $b <=> $a } @num sorts in descending order directly.

3. What is the difference between the functions chop and chomp?

chop removes the last character of a string, whatever it is. chomp removes the last character only if it is a newline. (More precisely, chomp removes a trailing copy of the input record separator $/, which is a newline by default.)

4. Write a Perl program segment to read a text file input.txt and generate as output another file out.txt, in which each line is preceded by its line number.

    open INP, '<', 'input.txt' or die "Error in open: $!";
    open OUT, '>', 'out.txt' or die "Error in write: $!";
    while (<INP>) {
        print OUT "$.: $_";    # $. is the current input line number
    }
    close INP;
    close OUT;

5. How does Perl check whether the result of a relational expression is TRUE or FALSE?

Only the number 0, the strings "0" and "" (the empty string), and undef are considered FALSE. Everything else is TRUE.

6. For comparison, what is the difference between lt and <?

lt compares two character strings, while < compares two numbers.

7. What is the significance of the file handle <ARGV>?

It reads, line by line, from each of the files named on the command line in turn (or from standard input if no files are named).

8. How can you exit a loop in Perl based on some condition?

Use the last keyword:

    last if ($i > 10);
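The answers above can be checked in one short script. This is a sketch under stated assumptions. In-memory filehandles (open on a reference to a scalar) stand in for input.txt and out.txt, and all sample values are made up for illustration.

```perl
use strict;
use warnings;

# Q1/Q2: numeric sort (the default sort compares strings).
my @num        = qw(10 2 5 22 7 15);
my @as_strings = sort @num;                 # 10 15 2 22 5 7
my @ascending  = sort { $a <=> $b } @num;   # 2 5 7 10 15 22
my @descending = sort { $b <=> $a } @num;   # 22 15 10 7 5 2

# Q3: chop always removes one character; chomp removes only a trailing newline.
my ($s1, $s2, $s3, $s4) = ("hello\n", "hello", "hello\n", "hello");
chomp $s1;    # "hello"
chomp $s2;    # "hello" (unchanged)
chop  $s3;    # "hello"
chop  $s4;    # "hell"

# Q4: line numbering with $., using in-memory files.
my $input  = "first\nsecond\n";
my $output = '';
open my $inp, '<', \$input  or die "Error in open: $!";
open my $out, '>', \$output or die "Error in write: $!";
while (<$inp>) {
    print {$out} "$.: $_";
}
close $inp;
close $out;    # $output is now "1: first\n2: second\n"

# Q5: false values.
my @truth = map { $_ ? 1 : 0 } (0, '0', '', undef, '0.0', '00', 'abc');    # 0 0 0 0 1 1 1

# Q6: string vs numeric comparison.
my $str_less = ('10' lt '9') ? 1 : 0;    # 1: compared character by character
my $num_less = (10 < 9)      ? 1 : 0;    # 0

# Q8: leaving a loop with last.
my $i = 0;
while (1) {
    $i++;
    last if $i > 10;
}    # $i is now 11

print "@as_strings | @ascending | @descending\n";
print "$s1 $s2 $s3 $s4\n";
print $output;
print "@truth | $str_less $num_less | $i\n";
```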

Quiz Questions on Lecture 23

1. Show an example illustrating the split function.
2. Write a Perl code segment to join three strings $a, $b, and $c, separated by the delimiter string <=>.
3. What is the difference between =~ and !~?
4. Is it possible to change the forward slash delimiter while specifying a regular expression? If so, how?
5. Write a Perl code segment to search for the presence of a vowel (and a consonant) in a given string.

6. How do you specify a RegEx matching a word that is preceded and followed by a space, starts with b, ends with d, and has the letter a somewhere in between?
7. Write a Perl command to replace all occurrences of the string bad with good in a given string.
8. Write a Perl code segment to replace all occurrences of the string bad with good in a given file.
9. Write a Perl command to exchange the first two words starting with a vowel in a given character string.
10. What are the meanings of the variables $`, $& and $'?