3. Sequences: Strings, Tuples, Lists 15.10.2009
Comments and hello.py

hello.py

    # Our code examples are starting to get larger.
    # I will display "real" programs like this, not as a
    # dialog with the Python interpreter.
    #
    # "Real" programs are usually more complicated and thus
    # need to be commented sometimes.
    #
    # Thus, I should say how comments are written in Python:
    # Everything that follows a hash mark (#) and is not part
    # of a string is a comment, up to the end of the line.

    print "Hello, world"  # Thus, this is the famous
                          # "hello world" program.

Sequences

In this lesson, we deal with Python's sequence types:

- Strings: str and unicode
- (Immutable) tuples: tuple
- (Mutable) lists: list

Moreover, we get to know for loops.

Sequence Example

    >>> first_name = "John"
    >>> last_name = "Gambolputty"
    >>> name = first_name + " " + last_name
    >>> print name
    John Gambolputty
    >>> print name.split()
    ['John', 'Gambolputty']
    >>> primes = [2, 3, 5, 7]
    >>> print primes[1], sum(primes)
    3 17
    >>> squares = (1, 4, 9, 16, 25)
    >>> print squares[1:4]
    (4, 9, 16)

Strings

We have seen strings a couple of times before. Python distinguishes two kinds of strings:

- Byte strings (sometimes called ASCII strings) correspond to the strings of C and C++ and have type str.
- Unicode strings correspond to Java's strings and have type unicode.

We restrict ourselves to byte strings. Byte strings are written "like this" most of the time. We will see alternative notations later.

Tuples and Lists

Tuples have been mentioned a few times. We haven't seen lists before.

Tuples and lists are containers for other objects. They are roughly comparable to vectors in C++/Java.

Tuples are written within parentheses, lists within brackets: (2, 1, "dangerous") vs. ["red", "green", "blue"].

Tuples and lists can hold arbitrary objects, of course including other tuples and lists: ([18, 20, 22, "Null"], [("spam", [])])

The main difference between tuples and lists:

- Lists are mutable (may be changed). It is possible to append, insert and delete elements.
- Tuples are immutable (may not be changed). A tuple never changes; it will always contain the same objects. (However, the contained objects themselves may change if they are mutable, e.g. when dealing with a tuple containing lists.)

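The remark about mutable objects inside an immutable tuple can be illustrated with a minimal sketch. The names menu and slot_assignment_failed are hypothetical, and print is called with a single parenthesized argument so that the code should run unchanged under both Python 2 (as used in these slides) and Python 3.

```python
# Hypothetical example data: a tuple holding a string and a list.
menu = ("spam", ["egg", "bacon"])

try:
    menu[0] = "ham"              # replacing an element of the tuple is not allowed
except TypeError:
    slot_assignment_failed = True
else:
    slot_assignment_failed = False

menu[1].append("sausage")        # the list inside the tuple is still mutable

print(slot_assignment_failed)    # True
print(menu)                      # ('spam', ['egg', 'bacon', 'sausage'])
```

The tuple still contains the same two objects as before; only the contents of the contained list have changed.
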
Sequences

Strings, tuples and lists have something in common: They contain other things, and these things appear in a certain order. Types with this property are called sequence types, and their instances are called sequences.

All sequence types support the following operations:

- Concatenation: "Gambol" + "putty" == "Gambolputty"
- Repetition: 2 * "spam" == "spam" * 2 == "spamspam"
- Indexing: "Python"[1] == "y"
- Membership test: "yth" in "Python"
- Slicing: "Monty Python's Flying Circus"[6:12] == "Python"
- Iteration: for x in "egg"

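The operations listed above are shown on strings only. As a rough sketch, the hypothetical helper exercise below applies the same six operations to a string, a tuple and a list, to suggest that they behave alike across the three types. It is written to run under both Python 2 and Python 3.

```python
def exercise(seq):
    """Apply the six common sequence operations to seq and collect the results."""
    return {
        "concatenation": seq + seq,
        "repetition": seq * 2,
        "indexing": seq[1],
        "membership": seq[0] in seq,
        "slicing": seq[1:3],
        "iteration": [item for item in seq],
    }

for sample in ("egg", ("e", "g", "g"), ["e", "g", "g"]):
    results = exercise(sample)
    print(results["slicing"])    # gg, then ('g', 'g'), then ['g', 'g']
```

Note that concatenation, repetition and slicing return a sequence of the same type as their input.
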
Concatenation

    >>> print "Gambol" + "putty"
    Gambolputty
    >>> mylist = ["spam", "egg"]
    >>> print ["spam"] + mylist
    ['spam', 'spam', 'egg']
    >>> primes = (2, 3, 5, 7)
    >>> print primes + primes
    (2, 3, 5, 7, 2, 3, 5, 7)
    >>> print mylist + primes
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: can only concatenate list (not "tuple") to list
    >>> print mylist + list(primes)
    ['spam', 'egg', 2, 3, 5, 7]

Repetition

    >>> print "*" * 20
    ********************
    >>> print [None, 2, 3] * 3
    [None, 2, 3, None, 2, 3, None, 2, 3]
    >>> print 2 * ("parrot", ["is", "dead"])
    ('parrot', ['is', 'dead'], 'parrot', ['is', 'dead'])

Indexing

Sequences can be indexed forwards and backwards. When indexing forwards, the first element has the index 0. When indexing backwards, negative indices are used, with the last element (first from the back) having the index -1.

    >>> primes = (2, 3, 5, 7, 11, 13)
    >>> print primes[1], primes[-1]
    3 13
    >>> animal = "parrot"
    >>> animal[-2]
    'o'
    >>> animal[10]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IndexError: string index out of range

Where are the Characters?

Python has no separate datatype for characters (chars). For Python, a character is simply a string of length 1.

    >>> food = "spam"
    >>> food
    'spam'
    >>> food[0]
    's'
    >>> type(food)
    <type 'str'>
    >>> type(food[0])
    <type 'str'>
    >>> food[0][0][0][0][0]
    's'

Indexing: Assigning to indices (1)

Lists can be modified by assigning to indices:

    >>> primes = [2, 3, 6, 7, 11]
    >>> primes[2] = 5
    >>> print primes
    [2, 3, 5, 7, 11]
    >>> primes[-1] = 101
    >>> print primes
    [2, 3, 5, 7, 101]

Again, the modified indices must be within bounds.

Indexing: Assigning to indices (2)

Tuples and strings are immutable:

    >>> food = "ham"
    >>> food[0] = "j"
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> pair = (10, 3)
    >>> pair[1] = 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment

Membership test: The in operator

- item in seq (seq is a tuple or list): True if seq contains item as an element.
- substr in string (string is a string): True if string contains substr as a substring.

    >>> print 2 in [1, 4, 2]
    True
    >>> if "spam" in ("ham", "eggs", "sausage"):
    ...     print "tasty"
    ...
    >>> print "m" in "spam", "ham" in "spam", "pam" in "spam"
    True False True

Slicing

Slicing is cutting a slice, i.e. a contiguous subsequence, out of a sequence:

    >>> primes = [2, 3, 5, 7, 11, 13]
    >>> print primes[1:4]
    [3, 5, 7]
    >>> print primes[:2]
    [2, 3]
    >>> print "egg, sausage and bacon"[-5:]
    bacon

Slicing: Explanation

- seq[i:j] returns the elements in the range [i, j), i.e. those at positions i, i + 1, ..., j - 1: ("do", "re", 5, 7)[1:3] == ("re", 5)
- If i is left out, the slice starts at position 0: ("do", "re", 5, 7)[:3] == ("do", "re", 5)
- If j is left out, the slice ends after the last position: ("do", "re", 5, 7)[1:] == ("re", 5, 7)
- If both are left out, the slice contains the whole sequence: ("do", "re", 5, 7)[:] == ("do", "re", 5, 7)

Slicing: Explanation (2)

There are no index errors when slicing. Bounds that lie outside the sequence are clipped to it, and a range that lies entirely beyond the end is simply empty:

    >>> "spam"[2:10]
    'am'
    >>> "spam"[-6:3]
    'spa'
    >>> "spam"[7:]
    ''

Counting from the back is also possible when slicing. For example, the last three elements of a sequence can be obtained with seq[-3:].

Slicing: Step Size

Using so-called extended slicing, we can also specify a step size (which may be negative):

    >>> numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> numbers[1:7:2]
    [1, 3, 5]
    >>> numbers[1:8:2]
    [1, 3, 5, 7]
    >>> numbers[7:2:-1]
    [7, 6, 5, 4, 3]
    >>> numbers[::-1]
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Slicing: Assigning to Slices (1)

When dealing with lists, we can also assign to slices, i.e. replace part of the list by another sequence:

    >>> dish = ["ham", "sausage", "eggs", "bacon"]
    >>> dish[1:3] = ["spam", "spam"]
    >>> print dish
    ['ham', 'spam', 'spam', 'bacon']
    >>> dish[:1] = ["spam"]
    >>> print dish
    ['spam', 'spam', 'spam', 'bacon']

Slicing: Assigning to Slices (2)

The assigned sequence need not have the same length as the slice that is assigned to. In fact, either may even be empty:

    >>> print dish
    ['spam', 'spam', 'spam', 'bacon']
    >>> dish[1:4] = ["baked beans"]
    >>> print dish
    ['spam', 'baked beans']
    >>> dish[1:1] = ["sausage", "spam", "spam"]
    >>> print dish
    ['spam', 'sausage', 'spam', 'spam', 'baked beans']
    >>> dish[2:4] = []
    >>> print dish
    ['spam', 'sausage', 'baked beans']

When using extended slicing (with a step size), the slice and assigned sequence must have the same length.

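The length rule for extended slices is stated above without an example. The following minimal sketch, using hypothetical names, shows one assignment that satisfies the rule and one that violates it. It is written to run under both Python 2 and Python 3.

```python
numbers = [0, 1, 2, 3, 4, 5]

# numbers[::2] covers positions 0, 2 and 4: three slots, three new values.
numbers[::2] = ["a", "b", "c"]
print(numbers)                     # ['a', 1, 'b', 3, 'c', 5]

try:
    numbers[::2] = ["x", "y"]      # three slots, but only two values
except ValueError:
    length_mismatch_rejected = True
else:
    length_mismatch_rejected = False

print(length_mismatch_rejected)    # True
print(numbers)                     # unchanged: ['a', 1, 'b', 3, 'c', 5]
```
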
Slicing and Lists: The del Statement

Instead of assigning an empty sequence to a slice, we can also use the del statement to remove individual elements or slices from a list:

    >>> primes = [2, 3, 5, 7, 11, "spam", 13]
    >>> del primes[-2]
    >>> primes
    [2, 3, 5, 7, 11, 13]
    >>> months = ["april", "may", "grune", "sectober", "june"]
    >>> del months[2:4]
    >>> months
    ['april', 'may', 'june']

Iteration (1)

We can iterate through a sequence by using for loops:

    >>> primes = [2, 3, 5, 7]
    >>> product = 1
    >>> for number in primes:
    ...     product = product * number
    ...
    >>> print product
    210

Iteration (2)

for works with all sequence types:

    >>> for character in "spam":
    ...     print character * 2
    ...
    ss
    pp
    aa
    mm
    >>> for ingredient in ("spam", "spam", "egg"):
    ...     if ingredient == "spam":
    ...         print "tasty!"
    ...
    tasty!
    tasty!

Iteration: Several Loop Variables

When iterating through a sequence of sequences, several loop variables can be bound at the same time:

    >>> couples = [("Jupiter", "Lys"), ("Peter", "Kelly"),
    ...            ("Bob", "Liz")]
    >>> for x, y in couples:
    ...     print x, "is cool;", y, "is irritating."
    ...
    Jupiter is cool; Lys is irritating.
    Peter is cool; Kelly is irritating.
    Bob is cool; Liz is irritating.

This is an application of tuple unpacking, which we saw earlier.

break, continue, else

When dealing with loops, the following three statements are useful:

- break terminates the loop ahead of time.
- continue terminates the current iteration of the loop ahead of time, i.e. jumps to the head of the loop and gets the next value(s) for the loop variable(s).
- Moreover, loops can have an else branch just like if statements. This branch is executed after finishing the loop unless the loop has been terminated with break.

break, continue and else also work with the previously encountered while loops.

break, continue, else: Example

break-continue-else.py

    foods_and_amounts = [("sausage", 2), ("eggs", 0),
                         ("spam", 2), ("ham", 1)]

    for food, amount in foods_and_amounts:
        if amount == 0:
            continue
        if food == "spam":
            print amount, "tasty piece(s) of spam."
            break
    else:
        print "No spam!"

    # Output:
    # 2 tasty piece(s) of spam.

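Since break, continue and else are said to work with while loops as well, here is a sketch of the same search written with a while loop. The function name find_spam and the index variable are hypothetical, and the message is returned instead of printed so that the code should run under both Python 2 and Python 3.

```python
def find_spam(foods_and_amounts):
    """Return a message about the first available spam entry, if any."""
    index = 0
    while index < len(foods_and_amounts):
        food, amount = foods_and_amounts[index]
        index = index + 1      # advance first, so continue cannot loop forever
        if amount == 0:
            continue           # skip foods that are not available
        if food == "spam":
            message = str(amount) + " tasty piece(s) of spam."
            break              # found spam: the else branch is skipped
    else:
        message = "No spam!"   # reached only if the loop ended without break
    return message

print(find_spam([("sausage", 2), ("eggs", 0), ("spam", 2), ("ham", 1)]))
print(find_spam([("spam", 0), ("ham", 1)]))
```

The first call prints "2 tasty piece(s) of spam." and the second prints "No spam!". Unlike a for loop, the while loop has to advance its own counter, so the increment is placed before any continue.
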
Modifying Lists While Iterating (1)

During a loop, the object that is being iterated on should not change its size. If it does, even though we don't get hard crashes like in C++, the result can be confusing:

    >>> numbers = [3, 5, 7]
    >>> for n in numbers:
    ...     print n
    ...     if n == 3:
    ...         del numbers[0]
    ...
    3
    7
    >>> print numbers
    [5, 7]

Modifying Lists While Iterating (2)

We can avoid the problem by iterating through a copy of the list:

    >>> numbers = [3, 5, 7]
    >>> for n in numbers[:]:
    ...     print n
    ...     if n == 3:
    ...         del numbers[0]
    ...
    3
    5
    7
    >>> print numbers
    [5, 7]

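The copy in the loop above comes from the full slice numbers[:]. A small sketch, with the hypothetical names snapshot and same_object, of why the loop is unaffected by the deletion: the slice is a separate list object. It is written to run under both Python 2 and Python 3.

```python
numbers = [3, 5, 7]
snapshot = numbers[:]                # full slice: a new list with the same elements

same_object = snapshot is numbers    # False: two distinct list objects
del numbers[0]                       # shrinks the original list only

print(same_object)                   # False
print(numbers)                       # [5, 7]
print(snapshot)                      # [3, 5, 7]
```
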
Useful Functions for for Loops

Some builtins are often used in the context of for loops:

- range and xrange
- enumerate
- zip

range and xrange

range creates lists of integers:

- range(stop) returns [0, 1, ..., stop - 1]
- range(start, stop) returns [start, start + 1, ..., stop - 1]
- range(start, stop, step) returns [start, start + step, start + 2 * step, ...], stopping before stop is reached

xrange works like range, but instead of a real list it returns an object of a special type, which is (more or less) exclusively intended for iterating through it. xrange conserves memory compared to range, as no list needs to be created.

range and xrange: Examples

    >>> range(5)
    [0, 1, 2, 3, 4]
    >>> range(3, 30, 10)
    [3, 13, 23]
    >>> for i in xrange(3, 7):
    ...     print i, "** 3 =", i ** 3
    ...
    3 ** 3 = 27
    4 ** 3 = 64
    5 ** 3 = 125
    6 ** 3 = 216

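To make the three-argument form concrete, the hypothetical helper range_values below spells out which values range(start, stop, step) contains. It is only a sketch and assumes a positive step; the builtin covers further cases. The comparison wraps range in list() so that it should work under both Python 2 and Python 3.

```python
def range_values(start, stop, step):
    """Sketch of the values in range(start, stop, step), assuming step > 0."""
    values = []
    current = start
    while current < stop:      # stop itself is never included
        values.append(current)
        current = current + step
    return values

print(range_values(3, 30, 10))                              # [3, 13, 23]
print(range_values(3, 30, 10) == list(range(3, 30, 10)))    # True
```
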
enumerate

Sometimes we need to know at which position we are within a sequence while iterating through it. For this purpose, we can use the function enumerate, which takes a sequence as an argument and returns a series of pairs (index, element):

    >>> for i, char in enumerate("spam"):
    ...     print "At position", i, "the letter is", char
    ...
    At position 0 the letter is s
    At position 1 the letter is p
    At position 2 the letter is a
    At position 3 the letter is m

Like xrange, the enumerate function does not return a real list, but is mostly intended for for loops. More specifically, it returns an iterator (an enumerate object).

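That enumerate does not return a real list can be checked directly. In this sketch the names pairs, is_list, as_list and leftover are hypothetical; the code should run under both Python 2 and Python 3.

```python
pairs = enumerate("spam")

is_list = isinstance(pairs, list)    # False: not a list
as_list = list(pairs)                # consume the pairs into a real list
leftover = list(pairs)               # nothing is left after the first pass

print(is_list)                       # False
print(as_list)                       # [(0, 's'), (1, 'p'), (2, 'a'), (3, 'm')]
print(leftover)                      # []
```

If the pairs are needed more than once, converting them with list() first, as in as_list, is one option.
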
zip (1)

The zip function accepts one or more sequences and returns a list of tuples with corresponding elements from those sequences:

    >>> detectives = ["Jupiter", "Peter", "Bob"]
    >>> girlfriends = ["Lys", "Kelly", "Liz"]
    >>> print zip(detectives, girlfriends)
    [('Jupiter', 'Lys'), ('Peter', 'Kelly'), ('Bob', 'Liz')]

zip returns a real list.

zip (2)

When several sequences should be iterated over in parallel, zip comes in handy:

    >>> for x, y, z in zip("ham", "spam", range(5, 10)):
    ...     print x, y, z
    ...
    h s 5
    a p 6
    m a 7

If the input sequences are of different lengths, the result is as long as the shortest input.