Chapter 10: Strings and Hashtables

This chapter describes the string and hashtable data types in detail. Strings hold text (words and phrases) and are used in every application that processes natural language. Hashtables are like lists, but they allow non-integer indexes: you can look up information by a key word instead of by number.

Strings

In computer science, the term character refers to a single symbol, such as a letter, digit, or punctuation mark on the keyboard. The term string refers to a sequence of characters. In most programming languages, including Python, string literals are written within quotation marks. 'dog', 'honey I ate the kids', and '327' are examples of string literals. Quotation marks distinguish string literals from variables, since variable names are also sequences of characters.

In the assignment statement:

    animal = 'dog'

animal is a variable and its value is 'dog'. If we were tracing the program, we'd write:

    animal    'dog'

Strings like '327' can be confusing to beginning programmers. Consider the following code:

    number = '327'
    doublenumber = 2*number
    print doublenumber

What do you think will be printed? If you said 654, you'd be wrong. The correct answer is 327327. The reason is that, to the computer, '327' is not a number; it is just a sequence of characters, a '3' followed by a '2' and then a '7'. '327' is a string literal, so when you assign it to the variable number, number holds a value of type string. When you multiply a string by an integer n, Python repeats the characters of the string n times. Thus, 2*number = 2*'327' = '327327'.

If you left off the quotes, you'd get your 654:

    number = 327
    doublenumber = 2*number
    print doublenumber

Python provides the int() function for converting a string like '327' into its integer equivalent, so you can write:

    number = '327'
    doublenumber = 2*int(number)
    print doublenumber

and get 654. Conversely, you can convert an integer into a string using the str() function.

ASCII Table

A string is a sequence of characters. But how does a computer store each of the characters of a string? For the string 'aaa', does it store three images that look like the letter 'a'? The answer is no. Instead, there is a mapping table, the ASCII table, that assigns a number to each symbol. So the letter 'a' is represented by the number 97, 'b' is 98, and so on. The digit '0' is represented by the number 48, '1' is 49, etc.

In languages such as C, the string 'aaa' is actually stored as four numbers:

    97 97 97 0

As the example shows, a 0 is stored to denote the end of the string. We call it the end-of-string character. The string 'cat' is:

    99 97 116 0

(Python keeps track of each string's length itself, so this terminator never appears in Python code.)

Early computer systems stored the number for each character in 8 bits, which allowed for only 2^8 = 256 possible characters. This worked fine for the English and Arabic character sets, but was insufficient for internationalization and for character sets like those used in Japan and China. Many systems now use Unicode, a far larger table that was originally designed around 16-bit numbers and has since been extended beyond that.

In most programs, the fact that the letter 'a' is really stored as 97 is inconsequential. However, there are times when the ASCII mapping is needed, such as when converting a string representation of a number to the number itself. Python provides two functions, ord(c) and chr(n), that give programmers access to the mapping. ord returns the ASCII number of a character, while chr returns the character for a given number. So ord('a') is 97, and chr(97) is 'a'.

Concatenation

The familiar use of the plus sign (+) is to add numbers; everyone understands 1+1. In Python, you can also apply the plus sign to strings. Adding two strings is called concatenation and results in the two strings being joined together. For example, the result of the expression "abc"+"def" is a new string, "abcdef".

Let's consider an example. Say we have a list holding a player's golf scores, and we want to display the scores in the form hole:score, hole:score, etc. So if the golfer scored a 3 on hole 1 and a 5 on hole 2, our code would build the string:

    hole 1:3, hole 2:5

Here's the code:

    golfscorelist = [3,5]   # could be any list of numbers
    allscores = ""          # begin with the empty string
    i = 0
    while i < len(golfscorelist):
        score = "hole " + str(i+1) + ":" + str(golfscorelist[i])   # build element
        allscores = allscores + score   # concat to allscores
        i = i+1
        if i < len(golfscorelist):
            allscores = allscores + ", "
    print allscores

We start by initializing allscores to the empty string. This is the string equivalent of the list initialization list=[]. Just as we use append to build lists, we use concatenation to incrementally build allscores into our final result.

We iterate through the list, as usual, with a while loop. On each iteration, we build a string for that score (e.g., 'hole 1:3'). The line:

    score = "hole " + str(i+1) + ":" + str(golfscorelist[i])

does this by concatenating the string "hole " with the hole number (i+1), a colon, and the score (golfscorelist[i]). Because (i+1) and golfscorelist[i] are both integers, we use Python's str function to convert them into strings.

On the second line within the while loop:

    allscores = allscores + score   # concat to allscores

we append the current score string to allscores using concatenation. After incrementing our hole counter i, we check whether any scores remain:

    i < len(golfscorelist)

If so, we know another score is coming, so we add a comma and a space to allscores.

Iterating Through a String

To Python, a string is a different type than a list, but in essence a string is a list of characters. In fact, in Python you can loop through a string just as you do a list. Take for instance this code, which counts the occurrences of the letter 'a' in a string variable word:

    word = "aabbbaaabb"   # could be any string
    i = 0
    count = 0
    while i < len(word):
        # do something with each character
        if word[i] == 'a':
            count = count+1
        i = i+1

Note the use of the function len: you can use it to find the number of characters in a string, just as you can use it to find how many elements are in a list. Note also the use of the index operator in the statement:

    if word[i] == 'a'

Just as the index operator gets an element of a list, it can also get a character of a string. If word were 'cat', then word[0] would be 'c', word[1] would be 'a', and word[2] would be 't'.

One thing you cannot do to a string, which you can do to a list, is modify an element using an index. If word is a string, the following will give you an error:

    word[i] = 'x'

In this sense, strings are immutable.

You can get a slice of a string. For instance, the following code gets the first two characters of a string:

    astring = astring[0:2]

Slices can be used to get around the immutable nature of strings. Consider this function, which replaces the ith character of a string:

    def replace(astring, i, replacement):
        return astring[0:i] + replacement + astring[i+1:len(astring)]

The code grabs the slice of the string up to (but not including) index i, concatenates the replacement character, then appends the last part of the original string (from i+1 to the length of the string). For the following code:

    word = 'cat'
    word2 = replace(word,1,'x')
    print word2

cxt would be printed. Note that the variable word is not modified: it still holds the value 'cat', and the variable word2 holds 'cxt'. We could have changed the variable word by writing:

    word = replace(word,1,'x')
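The immutability and slicing behavior described above can be checked with a short script. This is a minimal sketch rather than part of the original chapter: the variable assignment_failed is a name introduced here for illustration, and print is called with parentheses so that the script should run under Python 3 as well as the Python 2 used elsewhere in this chapter.

```python
def replace(astring, i, replacement):
    # Same slice-and-concatenate idea as the chapter's replace function;
    # astring[i + 1:] is shorthand for astring[i+1:len(astring)].
    return astring[:i] + replacement + astring[i + 1:]

word = 'cat'

# Assigning to an index is rejected because strings are immutable.
# (assignment_failed is a hypothetical flag used only in this sketch.)
try:
    word[1] = 'x'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Building a new string leaves the original untouched.
word2 = replace(word, 1, 'x')

print(assignment_failed)  # True
print(word)               # cat
print(word2)              # cxt
```

The function never changes its argument; it always returns a new string, which is why the caller has to assign the result to a variable to keep it.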

Python String Functions

Python has many built-in functions for manipulating strings, all listed at the Python web site: http://www.python.org/doc/2.5.2/lib/string-methods.html. The functions are object-oriented, meaning they are called in the form:

    somestring.function(p1,p2)

where somestring is any string and p1 and p2 are the parameters required by the function. This object-oriented way of calling a function differs from the functions we've looked at previously: the string being manipulated is considered the "object" of the function and appears to the left of a dot and the function name. We'll discuss object-oriented programming in depth in chapter 11.

The string library functions perform just about any operation one could think of. One example is the split function, which splits a string into a list of subparts based on a given delimiter. For example, consider a bookmarking site that allows a user to tag articles with keywords. Some such sites allow the user to enter the tags separated by commas. So the user might tag an article about Babe Ruth with "baseball,drinkers", meaning that the article should be categorized under "baseball" and "drinkers". The split function could be used to separate the tags:

    userinput = "baseball,drinkers"
    taglist = userinput.split(",")

After the call to split, taglist[0] would be "baseball", and taglist[1] would be "drinkers".

Other string functions include upper and lower, which return upper-case and lower-case copies of a string; find, which returns the index of the first occurrence of a substring; and startswith, which returns True if a string begins with a particular substring.

Hashtables

A hash table consists of key-value pairs. It is like a list, but whereas a list is indexed with a number, a hash table is indexed with a key, which is often a string. Hashtables are useful when data is best accessed using a keyword. One example is an English-to-Spanish mapping:

    engtospan = {}                   # initialize the hash table
    engtospan['hello'] = 'hola'      # map the key 'hello' to the value 'hola'
    engtospan['goodbye'] = 'adios'   # map the key 'goodbye' to the value 'adios'

For the hashtable engtospan, the keys are English words, and they are mapped to values that are Spanish words. Each entry in the hashtable is a key-value pair:

    key        value
    hello      hola
    goodbye    adios

In Python, hashtables are called dictionaries.

Hashtable Initialization

Recall that lists can be initialized either with an empty list:

    list = []

or with some initial data:

    list = [3,5,9]

Hashtables are initialized in a similar fashion, using {} instead of []. You can create an empty hashtable with:

    engtospan = {}

or you can create one with initial key-value pairs:

    engtospan = {'hello':'hola','goodbye':'adios','beer':'cerveza'}

Note that each entry in a hash table has two parts, the key and the value, separated by a colon. So the first entry in the code above is 'hello':'hola', with 'hello' being the key and 'hola' the value. Commas separate the entries of the table.

Hashtable Modification

Recall that new items can be added to a list using the function append. With hashtables, new items are added using an assignment statement and an index:

    engtospan["horse"] = "caballo"

The index "horse" is the key, and the value is "caballo". If this line of code followed the three-entry initialization above, the hashtable would then have four entries.

Accessing Hashtable Data

Recall that list data is accessed by indexing into the list with a number. One can print the third item of a list with:

    print list[2]

Hashtables are also accessed by indexing, but instead of accessing the ith item, one accesses data using a key (string) index. So one could print the Spanish word for "goodbye" with:

    print engtospan["goodbye"]

Or one could print the Spanish equivalent of a word input by the user with:

    engword = raw_input('Please enter an English word: ')
    spanword = engtospan[engword]
    print 'The Spanish equivalent is: ', spanword

Python also provides the keys() and values() functions for iterating through all the entries of a hashtable. For instance, one could print an entire hashtable with the following code:

    for key in engtospan.keys():
        print key+":"+engtospan[key]

Using Hashtables in App Engine

Hashtables have many uses. For instance, you could use a hash table to store data for each user in your system, using the user's id as a key.

Hashtables are also used in web programming. In the App Engine system, which we'll be studying soon, a hashtable is used to build dynamic web pages. Consider the following HTML code:

    <html>
    <body>
    <p> The interest is: {{interest}} </p>
    <p> The principal after one year is: {{principal}} </p>
    </body>
    </html>

The interest and principal variables enclosed in double curly brackets are called template variables. The template variables are the parts of the web page that show dynamic information, in this case the results of some banking interest computations performed by the web site server.

With App Engine, the web site server code is written in Python. Suppose that the user has entered the original principal and rate of a bank account into an HTML form on a different page. The App Engine controller on the server would compute the interest earned and the new principal, then put these results into a hash table called template_values:

    import os
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp import template

    class ComputeInterestHandler(webapp.RequestHandler):
        def get(self):
            principalstring = self.request.get('principal')
            ratestring = self.request.get('rate')
            principal = int(principalstring)   # convert once so it can be reused below
            interest = principal*int(ratestring)/100
            intereststring = str(interest)
            principalstring = str(principal+interest)
            template_values = {'interest':intereststring,'principal':principalstring}
            # render the page using the template engine
            path = os.path.join(os.path.dirname(__file__),'index.html')
            self.response.out.write(template.render(path,template_values))

The template_values variable is a hash table. Each key names one of the template variables in the HTML template. Each value is the data that should replace that template variable when the new dynamic page is sent to the browser. In this case, the server replaces the template variable {{interest}} with the value that was computed for it (the value in the variable intereststring), and {{principal}} with the value that was computed for it.

Summary

Strings and hashtables are fundamental data types used by programmers, along with lists, integers, floating point numbers, and booleans. In the following chapter, we will discuss classes, which allow programmers to define their own data types.
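As a recap of the hashtable operations covered in this chapter, the sketch below builds the engtospan dictionary, adds an entry, looks up keys, and then uses a dictionary to fill in template variables. It rests on several assumptions: translate and render are hypothetical helpers introduced here, render is only a toy stand-in for a template engine and is not how App Engine's template.render is implemented, and print is called with parentheses so the script should run under Python 3 as well as Python 2.

```python
import re

# Initialize with three entries, then add a fourth by assignment.
engtospan = {'hello': 'hola', 'goodbye': 'adios', 'beer': 'cerveza'}
engtospan['horse'] = 'caballo'

def translate(word):
    # Hypothetical helper: return the Spanish word, or None when the
    # key is missing, instead of letting the lookup raise a KeyError.
    if word in engtospan:
        return engtospan[word]
    return None

def render(template_text, template_values):
    # Toy stand-in for a template engine (an assumption of this sketch):
    # each {{name}} is replaced by template_values['name'].
    def lookup(match):
        return template_values[match.group(1)]
    return re.sub(r'\{\{(\w+)\}\}', lookup, template_text)

page = render('<p> The interest is: {{interest}} </p>', {'interest': '50'})

print(len(engtospan))        # 4
print(translate('goodbye'))  # adios
print(translate('cat'))      # None
print(page)                  # <p> The interest is: 50 </p>
```

Checking for the key with in before indexing avoids the error that engtospan[engword] would raise if the user typed a word that is not in the table.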

Problems

1. Write a function that takes a string representation of a positive whole number as a parameter and returns the number as an integer. You can assume that the parameter is a valid number, e.g., "327", and does not contain any non-digits, as in "3&27".

2. Write a modified version of problem 1 in which your function:
   - handles negative numbers
   - returns 0 if the given string is not a valid whole number

3. Write a Python program that uses a hash table to map US states to their capitals, e.g., 'California' would be a key, and 'Sacramento' a value. The program should initialize the hash table with three entries, add a fourth entry on the next line, then prompt the user to enter a state. The program should print the capital of the state entered by the user.