Java Programming String Processing 1 Copyright 2013, Oracle and/or its affiliates. All rights
Overview
This lesson covers the following topics:
- Read, search, and parse Strings
- Use StringBuilder to create Strings
- Use regular expressions to search, parse, and replace Strings
Strings
A String object can be thought of as a sequence of characters stored together. Strings, like arrays, are indexed from 0, so the last character of a String str is at index str.length()-1. There are many different ways of approaching String manipulation.
Using a FOR Loop
One way to manipulate Strings is to use a FOR loop. This code segment initializes a String and steps through its characters, printing each one to the console.
    String str = "Sample String";
    for (int index = 0; index < str.length(); index++) {
        System.out.print(str.charAt(index));
    }
Benefits to Using a FOR Loop
Stepping through a String with a FOR loop is useful if you want to:
- Search for a specific character or String inside of the String.
- Read the String backwards (from last element to first element).
- Parse the String.
Print String to Console
An easier way to print a String to the console does not involve stepping through the String:
    System.out.print(str);
Common String Methods
A few other common String methods are:
    length()
        Returns the number of characters in the String.
    charAt(int i)
        Returns the character at index i.
    substring(int start)
        Returns part of the String from index start to the end of the String.
    substring(int start, int end)
        Returns part of the String from index start to index end, but does not include the character at index end.
    replace(char oldc, char newc)
        Returns a String where all occurrences of character oldc have been replaced with newc.
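To make the table concrete, here is a minimal sketch that calls each method on the "Sample String" value used earlier. The class name CommonStringMethodsDemo is made up for this illustration.

```java
class CommonStringMethodsDemo {
    public static void main(String[] args) {
        String str = "Sample String";
        System.out.println(str.length());           // 13
        System.out.println(str.charAt(0));          // S
        System.out.println(str.substring(7));       // String
        System.out.println(str.substring(0, 6));    // Sample
        System.out.println(str.replace('S', 'Z'));  // Zample Ztring
        // None of the calls above changed str itself.
        System.out.println(str);                    // Sample String
    }
}
```

Every call returns a new value; the original String is left untouched.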
Searching and Strings
There are a few different ways to search for a specific character or String inside of a String. The first is a for loop, which can be adapted to search for, count, or replace characters or substrings contained in a String.
Searching and Strings Example
The code below uses a for loop to count the number of spaces found in the String.
    String str = "Searching for spaces";
    int count = 0;
    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) == ' ')
            count++;
    }
Since the index of a String begins at 0, we must begin searching for a ' ' at index 0. The search continues until the index reaches the last element of the String, which is at index str.length()-1. This means that i must never be greater than or equal to str.length(). If i exceeds str.length()-1, charAt throws a StringIndexOutOfBoundsException.
Calling Methods on the String
Another way to search for something in a String is to call one of the following methods on it. These methods are useful in programming problems that involve manipulating Strings.
    contains(CharSequence s)
        Returns true if the String contains s.
    indexOf(char ch)
        Returns the index within this String of the first occurrence of the specified character, or -1 if the character is not in the String.
    indexOf(String str)
        Returns the index within this String of the first occurrence of the specified substring, or -1 if the String does not contain the substring str.
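As a rough illustration of these search methods, the sketch below looks things up in the "Searching for spaces" String from the previous slide. The class StringSearchDemo and its helper countOccurrences are hypothetical names. The helper relies on the two-argument overload indexOf(String str, int fromIndex), which String also provides although the table above does not list it.

```java
class StringSearchDemo {
    // Counts non-overlapping occurrences of target in text.
    // Uses indexOf(String, int), which resumes the search at a given index.
    static int countOccurrences(String text, String target) {
        if (target.isEmpty()) {
            return 0; // assumption for this sketch: an empty target counts as no match
        }
        int count = 0;
        int from = text.indexOf(target);
        while (from != -1) {
            count++;
            from = text.indexOf(target, from + target.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "Searching for spaces";
        System.out.println(str.contains("for"));        // true
        System.out.println(str.indexOf('r'));           // 3
        System.out.println(str.indexOf("spaces"));      // 14
        System.out.println(str.indexOf('z'));           // -1
        System.out.println(countOccurrences(str, " ")); // 2
    }
}
```

Compared with the for loop on the previous slide, the helper lets indexOf do the scanning and only keeps track of where to resume.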
Reading Strings Backwards
Typically a String is read from left to right. To read a String backwards, simply change the starting index and ending index of the FOR loop that steps through the String.
    String str = "Read this backwards";
    String strBackwards = "";
    for (int i = str.length() - 1; i >= 0; i--) {
        strBackwards += str.substring(i, i + 1);
    }
Start the FOR loop at the last index of the String (which is str.length()-1), and decrease the index all the way down to the first index (which is 0).
Parsing a String
Parsing means dividing a String into a set of substrings. Typically, a sentence (stored as a String) is parsed at its spaces so that each word is stored separately. This makes it easier to rearrange the words than if they were all together in one String. You may parse a String by any character or substring.
Below are two techniques for parsing a String:
- For loop
- Split
Steps to Parsing a String with a For Loop
1. Step through the String with the for loop until you find the character or substring where you wish to parse it.
2. Store the parsed components.
3. Update the String.
4. Manipulate the parsed components as desired.
Steps to Parsing a String with a For Loop
    import java.util.*;

    public class StringParser {
        public static void main(String[] args) {
            String str = "Parse this String";
            ArrayList<String> words = new ArrayList<String>();
            // Repeat until every word has been removed from str.
            while (str.length() > 0) {
                for (int i = 0; i < str.length(); i++) {
                    if (i == str.length() - 1) {
                        // Last character reached: the rest of str is the final word.
                        words.add(str.substring(0));
                        str = "";
                        break;
                    } else if (str.charAt(i) == ' ') {
                        // Store the word before the space, then drop it from str.
                        words.add(str.substring(0, i));
                        str = str.substring(i + 1);
                        break;
                    }
                }
            }
            for (String s : words)
                System.out.print(s + ' ');
        }
    }
Parsing a String: Split
Split is a method of the String class that parses a String at each occurrence of the delimiter you pass to it, for example a space. It returns an array of Strings that contains the substrings (or words) produced by parsing the String. How to call split on a String:
    String sentence = "This is my sentence";
    String[] words = sentence.split(" ");
    // words will look like {"This", "is", "my", "sentence"}
    String[] tokens = sentence.split("i");
    // tokens will look like {"Th", "s ", "s my sentence"}
Split a String by More than One Character
It is also possible to split a String at more than one specified character if you put brackets [ ] around the characters. Here is an example.
    String sentence = "This is my sentence";
    String[] tokens = sentence.split("[ie]");
    // tokens will look like {"Th", "s ", "s my s", "nt", "nc"}
    // each token is separated by any occurrence of
    // an i or any occurrence of an e.
Notice how the brackets are used to include i and e.
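The split calls from the last two slides can be run together as in the sketch below. SplitDemo is a made-up class name, and Arrays.toString is used only to print the arrays. Note that split drops trailing empty Strings, which is why no empty token follows "nc" even though the sentence ends in an e.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        String sentence = "This is my sentence";

        // Split at every space.
        System.out.println(Arrays.toString(sentence.split(" ")));
        // [This, is, my, sentence]

        // Split at every i; the space after each "s" stays in the token.
        System.out.println(Arrays.toString(sentence.split("i")));
        // [Th, s , s my sentence]

        // Split at every i or e.
        System.out.println(Arrays.toString(sentence.split("[ie]")));
        // [Th, s , s my s, nt, nc]
    }
}
```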
StringBuilder
StringBuilder is a class that represents a String-like object. It is made of a sequence of characters, like a String. The difference between String and StringBuilder objects is that:
- StringBuilder includes methods that can modify the StringBuilder once it has been created by appending, removing, replacing, or inserting characters.
- Once created, a String cannot be changed. It is replaced by a new String instead.
Strings Cannot be Modified
It is not possible to make modifications to a String. Methods that appear to modify a String actually create a new String in memory with the specified changes; they do not modify the old one. This is why a StringBuilder is usually faster when the same text is modified many times: it can be changed in place and does not require a new String for each modification.
StringBuilder and String Shared Methods
StringBuilder shares many methods with String, including but not limited to:
- charAt(int index)
- indexOf(String str)
- length()
- substring(int start, int end)
StringBuilder Methods
StringBuilder also has some methods specific to its class, including the four below:
    append(Type t)
        Is compatible with any Java type or object; appends the String representation of the Type argument to the end of the sequence.
    delete(int start, int end)
        Removes the characters from index start up to, but not including, index end.
    insert(int offset, Type t)
        Is compatible with any Java type; inserts the String representation of the Type argument into the sequence at index offset.
    replace(int start, int end, String str)
        Replaces the characters from index start up to, but not including, index end with the characters in str.
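A small sketch of the four methods applied in sequence to one StringBuilder may help. The class StringBuilderDemo, the method build, and the sample text are invented for this illustration.

```java
class StringBuilderDemo {
    // Applies append, insert, delete, and replace to the same StringBuilder.
    static String build() {
        StringBuilder sb = new StringBuilder("Hello");
        sb.append(' ').append("World").append(42); // "Hello World42"
        sb.insert(0, ">> ");                       // ">> Hello World42"
        sb.delete(sb.length() - 2, sb.length());   // ">> Hello World"
        sb.replace(3, 8, "Howdy");                 // ">> Howdy World"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build()); // >> Howdy World
    }
}
```

Each call changes sb in place and returns the same StringBuilder, which is why the append calls can be chained.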
Methods to Search Using a StringBuilder
Searching in a StringBuilder can be done using either of the methods below:
    charAt(int index)
        Returns the character at index.
    indexOf(String str, int fromIndex)
        Returns the index of the first occurrence of str at or after fromIndex, or -1 if there is none.
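One possible use of indexOf(String str, int fromIndex) is to find each occurrence of a substring and replace it in place, as sketched below. The class StringBuilderSearchDemo and the helper replaceEvery are hypothetical names, not part of the StringBuilder API.

```java
class StringBuilderSearchDemo {
    // Replaces every occurrence of target in sb with replacement, modifying sb itself.
    static void replaceEvery(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // assumption for this sketch: nothing to do for an empty target
        }
        int at = sb.indexOf(target, 0);
        while (at != -1) {
            sb.replace(at, at + target.length(), replacement);
            // Resume after the inserted text so it is never searched again.
            at = sb.indexOf(target, at + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a-b-c");
        replaceEvery(sb, "-", "--");
        System.out.println(sb); // a--b--c
    }
}
```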
StringBuilder versus String
StringBuilder:
- Changeable.
- Easier insertion, deletion, and replacement.
- Can be more difficult to use, especially when using regular expressions introduced on the next few slides.
- Use when memory needs to be conserved.
String:
- Immutable.
- Easier concatenation.
- Visually simpler to use, similar to primitive types rather than objects.
- Use with simpler programs where memory is not a concern.
Regular Expressions
A regular expression is a character or a sequence of characters that represents a String or multiple Strings.
Regular expressions:
- Are supported by the java.util.regex package. The Pattern and Matcher classes must be imported from this package when you use them; String methods such as matches, split, and replaceAll accept regular expressions without an import.
- Use a syntax that is different from what you are used to, but allow for quicker, easier searching, parsing, and replacing of characters in a String.
String.matches(String regex)
The String class contains a method matches(String regex) that returns true if the whole String matches the given regular expression. This is similar to the String method equals. The difference is that comparing the String to a regular expression allows variability. For example, how would you write code that returns true if the String animal is "cat" or "dog" and returns false otherwise?
Equals Versus Matches
A standard answer may look something like this:
    if (animal.equals("cat"))
        return true;
    else if (animal.equals("dog"))
        return true;
    return false;
An answer using regular expressions would look something like this:
    return animal.matches("cat|dog");
The second solution is much shorter. The regular expression symbol | means "or", so the method matches checks whether animal is equal to "cat" or "dog" and returns true accordingly.
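A quick way to convince yourself that the two answers agree is to wrap the regular expression version in a method and try a few values. The class AnimalMatchDemo and the method isCatOrDog are illustrative names only.

```java
class AnimalMatchDemo {
    // True only when the whole String is exactly "cat" or exactly "dog".
    static boolean isCatOrDog(String animal) {
        return animal.matches("cat|dog");
    }

    public static void main(String[] args) {
        System.out.println(isCatOrDog("cat"));    // true
        System.out.println(isCatOrDog("dog"));    // true
        System.out.println(isCatOrDog("catdog")); // false: matches tests the whole String
        System.out.println(isCatOrDog("Cat"));    // false: matching is case sensitive
    }
}
```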
Square Brackets
Square brackets are used in regular expressions to allow for character variability. For example, suppose you want to return true if animal is equal to "dog" or "Dog", but not to a differently capitalized form such as "dOg". Using equalsIgnoreCase would not work, because it accepts every capitalization, and using equals would take multiple lines. With a regular expression, this task can be done in one line. The code below tests whether animal matches "Dog" or "dog" and returns true if it does.
    return animal.matches("[Dd]og");
Include Any Range of Characters
Square brackets aren't restricted to two character options. They can be combined with a hyphen to include any range of characters. For example, suppose you are writing a rhyming game and you want to see if the String word rhymes with banana. Here, a rhyming word is defined as a word that has all the same letters as banana, except that the first letter may be any letter of the alphabet. Your first attempt at coding may look like this:
    if (word.length() == 6)
        if (word.substring(1, 6).equals("anana"))
            return true;
    return false;
Using Square Brackets and a Hyphen
A shorter, more generic way to complete the same task is to use square brackets and a hyphen in a regular expression, as shown below.
    return word.matches("[a-z]anana");
This code returns true if word begins with any lower case letter and ends in anana. To include upper case characters we would write:
    return word.matches("[a-zA-Z]anana");
Using Square Brackets and a Hyphen
To allow the first character to be any digit or a space in addition to a lower or upper case letter, simply add a space and 0-9 inside the brackets (note the space before the 0).
    return word.matches("[ 0-9a-zA-Z]anana");
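The bracket expressions from the rhyming example can be compared side by side as in this sketch. RhymeDemo and its two method names are invented for illustration.

```java
class RhymeDemo {
    // First character must be a letter (either case), followed by "anana".
    static boolean rhymesWithBanana(String word) {
        return word.matches("[a-zA-Z]anana");
    }

    // First character may also be a digit or a space.
    static boolean looselyRhymes(String word) {
        return word.matches("[ 0-9a-zA-Z]anana");
    }

    public static void main(String[] args) {
        System.out.println(rhymesWithBanana("banana"));  // true
        System.out.println(rhymesWithBanana("Nanana"));  // true
        System.out.println(rhymesWithBanana("7anana"));  // false
        System.out.println(looselyRhymes("7anana"));     // true
        System.out.println(looselyRhymes(" anana"));     // true
        System.out.println(looselyRhymes("bananas"));    // false: one character too many
    }
}
```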
The Dot
The dot (.) represents any single character in regular expressions. For example, suppose you are writing a decoder for a top secret company and you think that you have cracked the code. You need to see if the String element consists of a digit followed by any other single character. This task is done easily with the dot, as shown below. This code returns true if element consists of a digit followed by any one character.
    return element.matches("[0-9].");
Repetition Operators
A repetition operator is any symbol in regular expressions that indicates the number of times a specified character appears in a matching String.
    *   0 or more occurrences
        return str.matches("A*");
        Returns true if str consists of zero or more A's and no other characters.
    ?   0 or 1 occurrence
        return str.matches("A?");
        Returns true if str is "" or "A".
    +   1 or more occurrences
        return str.matches("A+");
        Returns true if str is 1 or more A's in a sequence.
More Repetition Operators
    {x}     x occurrences
        return str.matches("A{7}");
        Returns true if str is a sequence of 7 A's.
    {x,y}   Between x and y occurrences
        return str.matches("A{7,9}");
        Returns true if str is a sequence of 7, 8, or 9 A's.
    {x,}    x or more occurrences
        return str.matches("A{5,}");
        Returns true if str is a sequence of 5 or more A's.
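The operators from the last two slides can be checked against a few sample Strings, as in the sketch below. RepetitionDemo and its helper check are hypothetical names; check simply forwards to String.matches.

```java
class RepetitionDemo {
    // Thin wrapper so each line below reads as "does this String match this pattern?"
    static boolean check(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("", "A*"));            // true: zero A's is allowed
        System.out.println(check("AAA", "A*"));         // true
        System.out.println(check("AAB", "A*"));         // false: B is not allowed
        System.out.println(check("AA", "A?"));          // false: at most one A
        System.out.println(check("", "A+"));            // false: at least one A is required
        System.out.println(check("AAAAAAA", "A{7}"));   // true: exactly 7
        System.out.println(check("AAAAAAAA", "A{7,9}")); // true: 8 is between 7 and 9
        System.out.println(check("AAAA", "A{5,}"));     // false: fewer than 5
    }
}
```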
Combining Repetition Operators Example 1
In the code below:
- The dot represents any character.
- The asterisk represents any number of occurrences of the character preceding it.
- Together, .* means that any number of any characters in a sequence will return true.
    return str.matches(".*");
Combining Repetition Operators Example 2
If the code below returns true, str must be a sequence of 10 digits, each between 0 and 5, optionally preceded by one character of any kind. Remember, all symbols of regular expressions may be combined with each other, as shown below, and with standard characters.
    return str.matches(".?[0-5]{10}");
Pattern
Pattern is a class in the java.util.regex package that stores the format of a regular expression. For example, to initialize a Pattern of characters as defined by the regular expression [A-F]{5,}.* you would write the following code:
    Pattern p = Pattern.compile("[A-F]{5,}.*");
The compile method returns a Pattern as defined by the regular expression given in the parameter.
Matcher
Matcher is a class in the java.util.regex package that stores a possible match between a Pattern and a String. A Matcher is initialized as follows:
    Matcher match = patternName.matcher(stringName);
The matcher method returns a Matcher object. The following code returns true if the regular expression given in the declaration of the Pattern patternName matches the whole String stringName.
    return match.matches();
Matcher: Putting it All Together
To put it all together, we have:
    Pattern p = Pattern.compile("[A-F]{5,}.*");
    Matcher match = p.matcher(stringName);
    return match.matches();
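Wrapped in a method, the three lines above can be tried against sample input. The sketch below assumes a class named PatternMatcherDemo and a method matchesPattern, both invented here; the regular expression is the one from the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compiled once and reused: five or more letters from A to F, then anything.
    private static final Pattern P = Pattern.compile("[A-F]{5,}.*");

    static boolean matchesPattern(String s) {
        Matcher match = P.matcher(s);
        return match.matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesPattern("ABCDEF123")); // true
        System.out.println(matchesPattern("AAAAA"));     // true
        System.out.println(matchesPattern("ABCD9"));     // false: only four letters from A to F
        System.out.println(matchesPattern("abcdef"));    // false: lower case is not in A-F
    }
}
```

Storing the compiled Pattern in a constant avoids recompiling the regular expression on every call.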
Benefits to Using Pattern and Matcher
This seems like a very complex way of completing the same task as the String method matches. Although that may be true, there are benefits to using a Pattern and Matcher, such as:
- Groups of characters can be captured and pulled out of a match, which makes it possible to work with dates or other fixed formats without having to create special classes for them.
- Matcher has a find() method that allows for detection of multiple instances of a pattern within the same String.
Regular Expressions and Groups
Segments of regular expressions can be grouped using parentheses, opening the group with ( and closing it with ). These groups can later be accessed with the Matcher method group(groupNumber). For example, consider reading in a sequence of dates, Strings in the format DD/MM/YYYY, and printing out each date in the format MM/DD/YYYY. Using groups would make this task quite simple.
Regular Expressions and Groups Example
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    import java.util.Scanner;

    public class RegExpressionsPractice {
        public static void main(String[] args) {
            // Group 1 (day), Group 2 (month), Group 3 (year)
            Pattern dateP = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
            Scanner in = new Scanner(System.in);
            System.out.println("Enter a Date: (dd/mm/yyyy)");
            String date = in.nextLine();
            // An empty line ends the program.
            while (!date.equals("")) {
                Matcher dateM = dateP.matcher(date);
                if (dateM.matches()) {
                    // Recalls each group of the Matcher.
                    String day = dateM.group(1);
                    String month = dateM.group(2);
                    String year = dateM.group(3);
                    System.out.println(month + "/" + day + "/" + year);
                }
                System.out.println("Enter a Date: (dd/mm/yyyy)");
                date = in.nextLine();
            }
        }
    }
Group 1 and Group 2 are defined to consist of 2 digits each. Group 3 (the year) is defined to consist of 4 digits.
Note: It is still possible to get the whole match by calling group(0).
Matcher.find()
Matcher's find method returns true if the defined Pattern matches some substring of the Matcher's String. For example, if we had a pattern defined by the regular expression [0-9], calling find() on the Matcher will return true as long as we give it a String that contains at least one digit somewhere.
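Because each call to find() continues from where the previous match ended, it can be placed in a loop to collect every match in a String. The sketch below assumes a hypothetical class FindDemo with a helper findAll; group() with no argument returns the text of the most recent match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Returns every substring of text that matches regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[0-9]+", "a1b22c333")); // [1, 22, 333]
        System.out.println(findAll("[0-9]", "abc"));        // []
    }
}
```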
Parsing a String with Regular Expressions
Recall the String method split() introduced earlier in the lesson, which splits a String at a delimiter such as a space and returns the pieces in an array of Strings. The parameter passed to split is in fact a regular expression that describes where the programmer wishes to split the String. For example, if we wished to split the String at any sequence of one or more digits, we could write something like this:
    String[] tokens = str.split("[0-9]+");
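The sketch below shows what such a split produces for a couple of inputs. RegexSplitDemo and splitOnDigits are made-up names. One detail worth knowing, shown in the second call: when the String starts with a delimiter, the resulting array begins with an empty String.

```java
import java.util.Arrays;

class RegexSplitDemo {
    // Splits str at every run of one or more digits.
    static String[] splitOnDigits(String str) {
        return str.split("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDigits("abc123def4ghi"))); // [abc, def, ghi]
        System.out.println(Arrays.toString(splitOnDigits("12ab")));          // [, ab]
    }
}
```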
Replacing with Regular Expressions
There are a few simple options for replacing substrings using regular expressions. The following two are the most commonly used methods. For use with Strings, the method replaceAll(regex, newSubstring) returns a String in which every match of the regular expression regex found in the original String has been replaced with the String newSubstring.
replaceAll Method
This method works the same way when called on a Matcher rather than a String. However, it does not require the regular expression as a parameter. It simply replaces any matches of the Pattern that was used to create the Matcher and returns the result as a new String. The example below replaces all matches identified by the Matcher with the String "abc".
    matcherName.replaceAll("abc");
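To compare the two forms, the sketch below performs the same replacement once through String.replaceAll and once through Matcher.replaceAll. The class ReplaceAllDemo, both method names, and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {
    // String version: the regular expression is passed as the first argument.
    static String maskDigits(String s) {
        return s.replaceAll("[0-9]+", "#");
    }

    // Matcher version: the regular expression comes from the Pattern.
    static String maskDigitsWithMatcher(String s) {
        Matcher m = Pattern.compile("[0-9]+").matcher(s);
        return m.replaceAll("#");
    }

    public static void main(String[] args) {
        String text = "Room 101, floor 3";
        System.out.println(maskDigits(text));            // Room #, floor #
        System.out.println(maskDigitsWithMatcher(text)); // Room #, floor #
        System.out.println(text);                        // Room 101, floor 3 (unchanged)
    }
}
```

Both versions return a new String and leave the original unchanged, consistent with Strings being immutable.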
Terminology
Key terms used in this lesson included:
- Matcher
- Parsing
- Pattern
- Regular Expression
- Regular Expression Dot
Terminology (continued)
Key terms used in this lesson included:
- Regular Expression Groups
- Regular Expression Square Brackets
- Repetition Operator
- Split
- StringBuilder
Summary
In this lesson, you should have learned how to:
- Read, search, and parse Strings
- Use StringBuilder to create Strings
- Use regular expressions to search, parse, and replace Strings