BI296: Linux and Shell Programming
Lecture 06: Compound Data Types in Python
Maoying Wu
Dept. of Bioinformatics & Biostatistics, Shanghai Jiao Tong University
Spring, 2017
Maoying Wu (CBB) BI296-Lec05 Spring, / 52

Lecture Outline
- Python Compound Data Types (Python 复合数据类型介绍)
  - String (字符串)
  - List (列表)
  - Tuple (元组)
  - Dict (字典)
  - Set (集合)
- File I/O (文件输入/输出)
  - open, with
  - read(), readline(), readlines()
- Some Important Modules (一些重要模块)
  - builtin (内置模块)
  - sys (Python 解释器系统)
  - os (操作系统)
  - time (时间)
  - re (正则表达式)

Next we will talk about...
1. String
2. List
3. Tuple
4. Dictionary
5. Set
6. File I/O

Strings: Case Issues (字符串大小写问题)

    >>> s = "this is a book"
    >>> s.capitalize()
    'This is a book'
    >>> s.upper()
    'THIS IS A BOOK'
    >>> s.title()
    'This Is A Book'
    >>> s.lower()
    'this is a book'
    >>> s.title().swapcase()
    'tHIS iS a bOOK'

String Formatting (格式化输出)

    >>> s = 'This is a book'
    >>> s.center(20)
    '   This is a book   '
    >>> s.ljust(20)
    'This is a book      '
    >>> s.rjust(20)
    '      This is a book'
    >>> s.zfill(20)
    '000000This is a book'
    >>> "{} is a {}".format('This', 'book')
    'This is a book'

Type Assertion

    >>> s = "this is a book"
    >>> s.isalpha()      # spaces are not letters
    False
    >>> s.isalnum()
    False
    >>> s.isdigit()
    False
    >>> s.islower()      # every cased character is lowercase
    True
    >>> s.isupper()
    False
    >>> s.istitle()
    False
    >>> s.isspace()
    False

String Content I

    >>> s = "this is a book"
    >>> s.find('a')
    8
    >>> s.find('t', 2)       # search from index 2 onward; -1 means not found
    -1
    >>> s.index('o', 5, 15)
    11
    >>> s.index('m')         # like find(), but raises an error instead of returning -1
    Traceback (most recent call last):
      ...
    ValueError: substring not found
    >>> s.replace('a', 'an')
    'this is an book'

String Content II

    >>> padded = "  this is a book  "   # s itself has no surrounding whitespace
    >>> padded.strip()
    'this is a book'
    >>> padded.lstrip()
    'this is a book  '
    >>> padded.rstrip()
    '  this is a book'
    >>> s.startswith('th')
    True
    >>> s.endswith('ok.')
    False

String: join and split

    >>> s.split('a')
    ['this is ', ' book']
    >>> s.split(' ')
    ['this', 'is', 'a', 'book']
    >>> 'm'.join(s.split())
    'thismismambook'
    >>> s.partition(' ')
    ('this', ' ', 'is a book')
    >>> s.rpartition(' ')
    ('this is a', ' ', 'book')

partition() and rpartition() split at the first (or last) occurrence of the separator and return a 3-tuple: head, separator, tail.

String is iterable: for ... in ...

    >>> s = "hello"
    >>> for ch in s:
    ...     print ch
    ...
    h
    e
    l
    l
    o

Next we will talk about... 2. List

List (列表)

A list is an ordered sequence of elements, enclosed by [ and ].

    >>> list_a = [1, 2, 3, 4, 5]
    >>> type(list_a)
    <type 'list'>

Element access (访问元素)

    >>> list_a[0]
    1
    >>> list_a[-1]       # negative indices count from the end
    5

Slicing (切片)

    >>> list_a[1:3]      # from index 1 up to, but not including, index 3
    [2, 3]
    >>> list_a[:0:-1]    # step -1: walk backwards, stopping before index 0
    [5, 4, 3, 2]
    >>> list_a[2:]
    [3, 4, 5]

List is iterable (可迭代): for ... in ...

    >>> from collections import Iterable   # Python 3: from collections.abc import Iterable
    >>> list_a = [1, 2, 3, 4, 5]
    >>> isinstance(list_a, Iterable)
    True
    >>> for i in list_a:
    ...     print i
    ...
    1
    2
    3
    4
    5

List is mutable (可变的): x[i] = v

    >>> x = [1, 2, 3, 4, 5]
    >>> list_id, elem_id = id(x), id(x[3])
    >>> x[3] = 10
    >>> x
    [1, 2, 3, 10, 5]
    >>> id(x) == list_id       # still the same list object
    True
    >>> id(x[3]) == elem_id    # slot 3 now refers to a different object
    False

However, this does not work. Why?

    >>> x = []
    >>> x[0] = 1
    Traceback (most recent call last):
      ...
    IndexError: list assignment index out of range

List addition and multiplication (列表的加法和乘法): s1 + s2, s1 * n

List addition generates a new list that concatenates the two lists.

    >>> x = [1, 2, 3]; y = [4, 5, 6]
    >>> x + y
    [1, 2, 3, 4, 5, 6]

List multiplication repeats the list n times.

    >>> x * 3
    [1, 2, 3, 1, 2, 3, 1, 2, 3]
    >>> 3 * x
    [1, 2, 3, 1, 2, 3, 1, 2, 3]

List comprehension (列表解析): [op(x) for x in ...]

This mirrors set-builder notation: $\{x^2 \mid x \in \{0, \dots, n-1\}\}$

    >>> square = lambda n: [x**2 for x in xrange(n)]
    >>> square(5)
    [0, 1, 4, 9, 16]

Similar to the map function:

    >>> map(lambda x: x**2, xrange(5))
    [0, 1, 4, 9, 16]

Exercise: List comprehension

Let i, j = 1, …, n.
1. Generate a list with elements [i, j].
2. Generate a list with elements [i, j] with i < j.
3. Generate a list with elements i + j with both i and j prime and i > j.
4. Write a function that evaluates an arbitrary polynomial $a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$ using a list comprehension, where you are given x and a list with coefficients coefs (hint: use enumerate).

    >>> x = [1, 2, 3, 4]
    >>> list(enumerate(x))
    [(0, 1), (1, 2), (2, 3), (3, 4)]
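One possible set of solutions, sketched for Python 3 (so range replaces xrange). The function names are hypothetical, and trial division is just one simple primality test:

```python
def all_pairs(n):
    # Part 1: every [i, j] with i, j in 1..n
    return [[i, j] for i in range(1, n + 1) for j in range(1, n + 1)]

def ordered_pairs(n):
    # Part 2: keep only the pairs with i < j
    return [[i, j] for i in range(1, n + 1) for j in range(1, n + 1) if i < j]

def is_prime(m):
    # trial division up to sqrt(m); 0 and 1 are not prime
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

def prime_sums(n):
    # Part 3: i + j where i and j are both prime and i > j
    return [i + j for i in range(1, n + 1) for j in range(1, n + 1)
            if i > j and is_prime(i) and is_prime(j)]

def poly_eval(x, coefs):
    # Part 4: coefs[k] is a_k, and enumerate yields the pairs (k, a_k)
    return sum([a * x ** k for k, a in enumerate(coefs)])

print(prime_sums(5))              # [5, 7, 8]
print(poly_eval(10, [4, 3, 2, 1]))  # 4 + 30 + 200 + 1000 = 1234
```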


filter and reduce functions

filter keeps only the elements for which the function returns True:

    >>> y = xrange(8)
    >>> x = filter(lambda x: x**2 < 40, y)
    >>> x
    [0, 1, 2, 3, 4, 5, 6]

reduce combines the elements pairwise, from left to right, into a single value:

    >>> x = reduce(lambda i, j: i + j, y)
    >>> x
    28
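In Python 3 both functions changed: filter returns a lazy iterator, and reduce moved into the functools module. A minimal sketch of the same example under Python 3:

```python
from functools import reduce  # Python 3: reduce is no longer a builtin

y = range(8)

# filter returns an iterator in Python 3, so wrap it in list() to see the values
x = list(filter(lambda v: v ** 2 < 40, y))
print(x)       # [0, 1, 2, 3, 4, 5, 6]

# reduce folds left to right: ((((0 + 1) + 2) + 3) + ...) + 7
total = reduce(lambda i, j: i + j, y)
print(total)   # 28

# Equivalent results with a list comprehension and the builtin sum
assert x == [v for v in y if v ** 2 < 40]
assert total == sum(y)
```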

Example: Find all primes up to n

    factors = lambda n: [x for x in xrange(1, n+1) \
                         if n % x == 0]
    isprime = lambda m: [1, m] == factors(m)
    allprimes = lambda n: [m for m in xrange(2, n+1) \
                           if isprime(m)]
    allprimes(37)    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

Write a function to return the first n prime numbers.
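A sketch of the first-n-primes exercise in Python 3 (first_primes is a hypothetical name). Unlike allprimes, it cannot know an upper bound in advance, so it keeps testing candidates until it has collected n primes, dividing only by the primes found so far that do not exceed the square root of the candidate:

```python
def first_primes(n):
    """Return a list of the first n prime numbers."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # prime if no already-found prime p with p*p <= candidate divides it
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```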


More control over lists (其他列表方法)
- len(xs)
- xs.append(x) and xs.extend(ys): any difference?
- xs.count(x)
- xs.insert(i, x)
- xs.sort() and sorted(xs): what's the difference?
- xs.remove(x)
- xs.pop() or xs.pop(i)

Help documentation: dir(xs), dir(list) or dir([])
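A small Python 3 sketch that answers the two questions above and exercises the other methods; the list values are arbitrary:

```python
# append vs. extend
xs = [3, 1, 2]
xs.append([4, 5])      # adds ONE element: the list [4, 5] itself
ys = [3, 1, 2]
ys.extend([4, 5])      # adds each element of the iterable
print(xs)              # [3, 1, 2, [4, 5]]
print(ys)              # [3, 1, 2, 4, 5]

# sort() vs. sorted()
zs = [3, 1, 2]
result = zs.sort()     # sorts zs in place and returns None
ws = [3, 1, 2]
new = sorted(ws)       # returns a NEW sorted list; ws is left unchanged
print(result, zs)      # None [1, 2, 3]
print(new, ws)         # [1, 2, 3] [3, 1, 2]

# len, count, insert, remove, pop
vs = [5, 6, 7, 6]
print(len(vs), vs.count(6))   # 4 2
vs.insert(1, 0)               # insert 0 before index 1 -> [5, 0, 6, 7, 6]
vs.remove(6)                  # removes only the FIRST 6 -> [5, 0, 7, 6]
last = vs.pop()               # removes and returns the last item: 6
first = vs.pop(0)             # removes and returns the item at index 0: 5
print(last, first, vs)        # 6 5 [0, 7]
```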

Copying lists
1. Create a list a with some entries.
2. Now set b = a and change b[1]. What happened to a?
3. Now set c = a[:] and change c[2]. What happened to a?
4. Now create a function set_first_elem_to_zero(l) that takes a list, sets its first entry to zero, and returns the list. What happens to the original list?
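One way to run the experiment in Python 3 (the values are arbitrary). Note that a[:] is a shallow copy: any nested lists inside it would still be shared with the original.

```python
a = [1, 2, 3]
b = a                  # b is a second name for the SAME list object
b[1] = 20
print(a, b is a)       # [1, 20, 3] True

c = a[:]               # slicing builds a new (shallow) copy
c[2] = 30
print(a, c)            # [1, 20, 3] [1, 20, 30]

def set_first_elem_to_zero(l):
    l[0] = 0           # mutates the caller's list object in place
    return l

d = set_first_elem_to_zero(a)
print(a, d is a)       # [0, 20, 3] True
```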

Exercise: Finding the longest word

Write a function that returns the longest word in a variable text that contains a sentence. While text may contain punctuation, punctuation should not be taken into account. What happens with ties?

As an example, consider: "Hello, how was the football match earlier today???"

Hints:
- s.translate(None, string.punctuation)
- s.split()
- use the builtin function max(list, key=func)
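A sketch under Python 3, where str.translate takes a table built with str.maketrans instead of the Python 2 form translate(None, chars); string.punctuation covers ASCII punctuation only. max returns the first maximal element, so ties go to the word that appears first; longest_words is a hypothetical variant that keeps all tied words:

```python
import string

# Python 3 translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def longest_word(text):
    words = text.translate(STRIP_PUNCT).split()
    # max() returns the first maximal item, so ties go to the earliest word
    return max(words, key=len)

def longest_words(text):
    # hypothetical variant: return every word of maximal length
    words = text.translate(STRIP_PUNCT).split()
    n = max(len(w) for w in words)
    return [w for w in words if len(w) == n]

print(longest_word("Hello, how was the football match earlier today???"))  # football
print(longest_words("a bb cc"))   # ['bb', 'cc']
```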

Exercise: Pivot

Write a function that takes a value x and a list ys, and returns a list that contains the value x and all elements of ys, such that all values y in ys that are smaller than x come first, then the element x, and then the rest of the values in ys.

For example, the output of f(3, [6, 4, 1, 7]) should be [1, 3, 6, 4, 7].

Hint: Use list concatenation of list comprehensions.
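A sketch that follows the hint (Python 3). Values equal to x are placed after x here, which is one choice the exercise leaves open:

```python
def pivot(x, ys):
    # smaller values first, then x, then the remaining values in their original order
    return [y for y in ys if y < x] + [x] + [y for y in ys if y >= x]

print(pivot(3, [6, 4, 1, 7]))   # [1, 3, 6, 4, 7]
```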

Exercise: Fair Split

Write a function that splits a set of integers into three disjoint subsets whose sums are as close to each other as possible.
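The exercise does not fix a method. One common heuristic, sketched below in Python 3 (fair_split is a hypothetical name), is greedy: take the numbers from largest to smallest and put each into the group with the smallest current sum. It is fast but not always optimal; finding the best split in general is a hard combinatorial problem (multiway number partitioning).

```python
def fair_split(nums, k=3):
    """Greedy heuristic: largest number first, always into the lightest group.
    Returns (groups, sums); the split is NOT guaranteed to be optimal."""
    groups = [[] for _ in range(k)]
    sums = [0] * k
    for v in sorted(nums, reverse=True):
        i = sums.index(min(sums))   # first group with the smallest current sum
        groups[i].append(v)
        sums[i] += v
    return groups, sums

groups, sums = fair_split(range(1, 9))
print(groups)   # [[8, 3, 2], [7, 4, 1], [6, 5]]
print(sums)     # [13, 12, 11]
```

On 1 to 8 (total 36) the greedy split gives sums 13, 12 and 11, although a perfect split exists: {8, 4}, {7, 5}, {6, 3, 2, 1}.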

Next we will talk about... 3. Tuple

Tuple
Similar to a list, but enclosed by ( and ).

    >>> mytuple = (1, 2, 3)
    >>> mytuple[1]
    2
    >>> mytuple[1:3]
    (2, 3)

Tuples are immutable
Unlike lists, we cannot change the elements of a tuple. An element that is itself mutable, such as a list, can still be modified in place.

    >>> mytuple = ([1, 2], [2, 3])
    >>> mytuple[0] = [3, 4]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> mytuple[0][1] = 3
    >>> mytuple
    ([1, 3], [2, 3])

Packing and unpacking

    >>> t = 1, 2, 3      # packing: the commas build a tuple
    >>> x, y, z = t      # unpacking: one name per element
    >>> print t
    (1, 2, 3)
    >>> print y
    2
    >>> print z
    3

Functions with multiple return values

    def simple():
        return 0, 1, 2      # packed into a single tuple

    print simple()          # (0, 1, 2)

Swapping two values

In other languages, you need to create a temporary variable:

    t = a
    a = b
    b = t

In Python, it is much simpler:

    a, b = b, a

Exercise: Unzip the list of tuples

Suppose we have two lists, x and y, that give the x and y coordinates of a set of points.
1. Create a list with the coordinates (x, y) as tuples. Hint: find out about the zip function.
2. You have decided that you actually need the two separate lists, but unfortunately you have thrown them away. How can we use zip to unzip the list of tuples to get the two lists back?

Hint: to pass the elements of a list to a function as separate arguments, use the special * operator:

    >>> args = [('a', 1), ('b', 2), ('c', 3)]
    >>> zip(*args)
    [('a', 'b', 'c'), (1, 2, 3)]
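A sketch of both steps in Python 3, where zip returns an iterator (hence the list() call); the coordinates are made up:

```python
x = [0, 1, 2]
y = [5, 6, 7]

# 1. Pair up the coordinates
points = list(zip(x, y))
print(points)                 # [(0, 5), (1, 6), (2, 7)]

# 2. Unzip: *points passes each tuple as a separate argument, i.e.
#    zip((0, 5), (1, 6), (2, 7)), which regroups first and second items
xs, ys = zip(*points)
print(xs, ys)                 # (0, 1, 2) (5, 6, 7)
print(list(xs) == x, list(ys) == y)   # True True
```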

Exercise: Compute the distances

Suppose we have two n-dimensional vectors, x and y, stored as tuples with n elements. Implement functions that compute the $l_1$ and $l_2$ distances between x and y. Note that n is not explicitly given.

$$l_1 = \|x - y\|_1 = \sum_{i=1}^{n} |x_i - y_i|$$

$$l_2 = \|x - y\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
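A direct translation of the two formulas into Python 3 (the function names are hypothetical). zip pairs up the coordinates, so n never has to be passed in; the length check guards against zip silently truncating the longer tuple:

```python
import math

def l1_distance(x, y):
    """Manhattan distance: sum of |x_i - y_i|."""
    if len(x) != len(y):
        raise ValueError('vectors must have the same length')
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def l2_distance(x, y):
    """Euclidean distance: square root of the sum of (x_i - y_i)**2."""
    if len(x) != len(y):
        raise ValueError('vectors must have the same length')
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(l1_distance((0, 0), (3, 4)))   # 7
print(l2_distance((0, 0), (3, 4)))   # 5.0
```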

Next we will talk about... 4. Dictionary

Dictionary (字典)

A dictionary is a collection of key-value pairs. An example: the keys are all English words, and their corresponding values are their translations in Chinese.

Lists + Dictionaries = $$$

Defining a dictionary

    >>> d = {}
    >>> d[1] = "one"
    >>> d[2] = "two"
    >>> d
    {1: 'one', 2: 'two'}
    >>> e = {1: 'one', 'hello': True}
    >>> e
    {1: 'one', 'hello': True}

More key-value pairs can be added to the dictionary at any time. Note that keys must be immutable (more precisely, hashable): numbers, strings and tuples work as keys, but lists do not.

No duplicate keys
Assigning to an existing key overwrites the old value instead:

    >>> d = {1: 'one', 2: 'two'}
    >>> d[1] = 'three'
    >>> d
    {1: 'three', 2: 'two'}


Access
We can access values by keys, but not the other way around:

    >>> d = {'one': 1, 'two': 2}
    >>> print d['one']
    1

Furthermore, we can check whether a key is in the dictionary with key in dict:

    >>> 'one' in d
    True
    >>> 1 in d          # membership tests keys, not values
    False

Deleting elements
We can remove a key-value pair by key using del, and we can empty the dictionary with clear().

    >>> d = {'one': 1, 'two': 2, 'three': 3}
    >>> del d['one']
    >>> d
    {'two': 2, 'three': 3}
    >>> d.clear()
    >>> d
    {}

All keys, values or both
Use d.keys(), d.values() and d.items():

    >>> d = {1: 'one', 2: 'two', 3: 'three'}
    >>> d
    {1: 'one', 2: 'two', 3: 'three'}
    >>> d.keys()
    [1, 2, 3]
    >>> d.values()
    ['one', 'two', 'three']
    >>> d.items()
    [(1, 'one'), (2, 'two'), (3, 'three')]

So how can you loop over dictionaries?


Small exercise
Print all key-value pairs of a dictionary.

    >>> d = {1: 'one', 2: 'two', 3: 'three'}
    >>> for key, value in d.items():
    ...     print key, value
    ...
    1 one
    2 two
    3 three

Instead of d.items(), you can use d.iteritems() as well. It yields the pairs one at a time instead of building a full list first, which saves memory for large dictionaries.

Next we will talk about... 5. Set

Sets (集合)
Sets are unordered collections of unique elements.

    >>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
    >>> fruits = set(basket)            # create a set; duplicates are dropped
    >>> fruits
    set(['orange', 'pear', 'apple', 'banana'])
    >>> 'orange' in fruits              # fast membership testing
    True
    >>> 'watermelon' in fruits
    False

Implementation: a hash table, similar to a dictionary that stores keys only.

Set operations
- Union (并集): $A \cup B$
- Intersection (交集): $A \cap B$
- Difference (差集): $A \setminus B$

    >>> x = {1, 2, 4}
    >>> y = {2, 3, 4}
    >>> x.union(y)
    {1, 2, 3, 4}
    >>> x.intersection(y)
    {2, 4}
    >>> x.difference(y)
    {1}
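The same operations are also available as operators. A Python 3 sketch; ^ (symmetric difference) is an extra that is not on the slide. The method forms accept any iterable, while the operator forms require a set on both sides:

```python
x = {1, 2, 4}
y = {2, 3, 4}

union = x | y            # same as x.union(y)
common = x & y           # same as x.intersection(y)
only_x = x - y           # same as x.difference(y)
in_one = x ^ y           # symmetric difference: in exactly one of the sets

assert union == x.union(y) == {1, 2, 3, 4}
assert common == x.intersection(y) == {2, 4}
assert only_x == x.difference(y) == {1}
assert in_one == {1, 3}

# Methods take any iterable; operators need sets on both sides
print(x.union([5]))      # a set containing 1, 2, 4 and 5
try:
    x | [5]
except TypeError as err:
    print('TypeError:', err)
```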

Set comprehensions
Similar to list comprehensions:

    >>> a = {x for x in 'abracadabra' if x not in 'abc'}
    >>> a
    set(['r', 'd'])

Next we will talk about... 6. File I/O

The file object (文件对象)
- Interaction with the file system is pretty straightforward in Python.
- It is done using file objects.
- We can instantiate a file object using open or file.
- Have a look at dir(file).

Open and read the file
open(name[, mode[, buffering]]) → file object

Write a function that opens a file (input: filename) and prints the file line by line.

Solution 1:

    def readfile(fname):
        f = open(fname, 'r')
        lines = f.readlines()        # list of lines; f.read() would return one string
        f.close()
        for line in lines:
            print line.rstrip('\n')  # each line already ends with '\n'

Solution 2:

    def readfile(fname):
        with open(fname) as f:       # the file is closed automatically
            for line in f:
                print line.rstrip('\n')

Opening a file
f = open(fname, mode, buffering)
- fname: path and filename
- mode:
  - 'r': read from file
  - 'w': write to file (an existing file is truncated)
  - 'a': append to file
  - 'b': binary file (combined with the above, e.g. 'rb')
- buffering:
  - 0: unbuffered
  - 1: line-buffered
  - any larger number: buffer size (in bytes)

We need to close a file after we are done: f.close()


Reading files
- read(): reads the entire file (or at most n characters, if n is supplied)
- readline(): reads a single line per call
- readlines(): returns a list of lines (split at newlines)

Another fast option to read a file:

    with open('f.txt', 'r') as f:
        for line in f:
            print line.rstrip('\n')
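A self-contained Python 3 sketch contrasting the three methods with plain line iteration; it writes a throwaway file via tempfile so that no real path is assumed:

```python
import os
import tempfile

# Create a throwaway three-line file so the example needs no existing path
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('line one\nline two\nline three\n')

with open(path) as f:
    head = f.read(4)        # at most 4 characters: 'line'
    tail = f.read()         # everything from the current position to the end

with open(path) as f:
    first = f.readline()    # one line per call, trailing '\n' kept
    rest = f.readlines()    # the remaining lines as a list

with open(path) as f:
    # iterating over the file object reads lazily, one line at a time
    stripped = [line.rstrip('\n') for line in f]

os.remove(path)

print(head + tail == 'line one\nline two\nline three\n')  # True
print(repr(first))   # 'line one\n'
print(rest)          # ['line two\n', 'line three\n']
print(stripped)      # ['line one', 'line two', 'line three']
```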

Writing to file
Use write() to write to a file:

    with open(filename, 'w') as f:
        f.write("hello, {}!\n".format(name))

More writing examples

    # write elements of list to file
    with open(filename, 'w') as f:
        for x in xs:
            f.write('{}\n'.format(x))

    # write elements of dictionary to file
    with open(filename, 'w') as f:
        for k, v in d.iteritems():
            f.write('{}: {}\n'.format(k, v))

Exercise: Word Count
Analyze the text file containing the complete works of William Shakespeare.
1. Find the 20 most frequently used words.
2. How many unique words are used?
3. How many words are used at least 5 times?
4. Write the 200 most common words, and their counts, into a file.
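A possible starting point in Python 3 using collections.Counter. What counts as a "word" is an assumption here (lower-cased runs of letters and apostrophes), and the function names and tab-separated output format are hypothetical:

```python
import re
from collections import Counter

def word_counts(text):
    # Assumption: a word is a lower-cased run of letters and apostrophes
    return Counter(re.findall(r"[a-z']+", text.lower()))

def analyze(text, out_path, top=200):
    counts = word_counts(text)
    top20 = counts.most_common(20)                            # question 1
    n_unique = len(counts)                                    # question 2
    n_frequent = sum(1 for c in counts.values() if c >= 5)    # question 3
    with open(out_path, 'w') as f:                            # question 4
        for word, c in counts.most_common(top):
            f.write('{}\t{}\n'.format(word, c))
    return top20, n_unique, n_frequent

print(word_counts("To be, or not to be: that is the question.")['to'])  # 2
```

For the real data you would call something like analyze(open('shakespeare.txt').read(), 'top200.txt'), where both file names are placeholders.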

