BI296: Linux and Shell Programming
Lecture 06: Compound Data Types in Python

Maoying Wu
ricket.woo@gmail.com
Dept. of Bioinformatics & Biostatistics
Shanghai Jiao Tong University
Spring, 2017

Maoying Wu (CBB) BI296-Lec05 Spring, 2017

Lecture Outline

Python compound data types
    String
    List
    Tuple
    Dict
    Set
File I/O
    open
    with
    read(), readline(), readlines()
Some important modules
    builtin (built-in module)
    sys (the Python interpreter)
    os (operating system)
    time (time)
    re (regular expressions)

Next we will talk about...

1. String
2. List
3. Tuple
4. Dictionary
5. Set
6. File I/O

Strings: Case Issues

>>> s = "this is a book"
>>> s.capitalize()
'This is a book'
>>> s.upper()
'THIS IS A BOOK'
>>> s.title()
'This Is A Book'
>>> s.lower()
'this is a book'
>>> s.title().swapcase()
'tHIS iS a bOOK'

String format (formatted output)

>>> s = 'This is a book'
>>> s.center(20)
'   This is a book   '
>>> s.ljust(20)
'This is a book      '
>>> s.rjust(20)
'      This is a book'
>>> s.zfill(20)
'000000This is a book'
>>> "{} is a {}".format('This', 'book')
'This is a book'

Type Assertion

>>> s = "this is a book"
>>> s.isalpha()
False
>>> s.isalnum()
False
>>> s.isdigit()
False
>>> s.islower()
True
>>> s.isupper()
False
>>> s.istitle()
False
>>> s.isspace()
False

String content I

>>> s = "this is a book"
>>> s.find('a')
8
>>> s.find('t', 2)
-1
>>> s.index('o', 5, 15)
11
>>> s.index('m')
Traceback (most recent call last)
ValueError: substring not found
>>> s.replace('a', 'an')
'this is an book'

String content II

>>> s.strip()
'this is a book'
>>> s.lstrip()
'this is a book'
>>> s.rstrip()
'this is a book'
>>> s.startswith('th')
True
>>> s.endswith('ok.')
False

String: join and split

>>> s.split('a')
['this is ', ' book']
>>> s.split(' ')
['this', 'is', 'a', 'book']
>>> 'm'.join(s.split())
'thismismambook'
>>> s.partition(' ')
('this', ' ', 'is a book')
>>> s.rpartition(' ')
('this is a', ' ', 'book')

String is iterable: for ... in ...

>>> s = "hello"
>>> for ch in s:
...     print ch
...
h
e
l
l
o

List

A list is an ordered sequence of elements, enclosed by [ and ].

>>> list_a = [1, 2, 3, 4, 5]
>>> type(list_a)
<type 'list'>

Accessing elements

>>> list_a[0]
1
>>> list_a[-1]
5

Slicing

>>> list_a[1:3]
[2, 3]
>>> list_a[:0:-1]
[5, 4, 3, 2]
>>> list_a[2:]
[3, 4, 5]

List is iterable: for ... in ...

>>> from collections import Iterable
>>> list_a = [1, 2, 3, 4, 5]
>>> isinstance(list_a, Iterable)
True
>>> for i in list_a:
...     print i
...
1
2
3
4
5

List is mutable: x[i] = v

>>> x = [1, 2, 3, 4, 5]
>>> id(x)
139661667677896
>>> id(x[3])
25297024
>>> x[3] = 10
>>> id(x[3])
25296784
>>> x
[1, 2, 3, 10, 5]
>>> id(x)
139661667677896

However, this does not work. Why?

>>> x = []
>>> x[0] = 5
--------------------------------------------------------
IndexError                Traceback (most recent call last)
<ipython-input-32-fe39b5ac7f1b> in <module>()

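One way to see why the assignment fails: indexing can only replace an element that already exists, and an empty list has no position 0. The sketch below is written to run under Python 3 (the slides use Python 2 syntax); the variable name message is only for illustration.

```python
# Index assignment replaces an existing element; it never grows the list.
x = []
try:
    x[0] = 5                 # an empty list has no index 0
except IndexError as err:
    message = str(err)       # the interpreter's explanation of the failure

# To add elements, use append (or insert / extend) instead.
x.append(5)
print(x)
```

After append, x[0] exists and can be reassigned like any other element.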
List addition and multiplication: s1 + s2, s1 * n

List addition generates a new list that concatenates the two lists.

>>> x = [1, 2, 3]; y = [4, 5, 6]
>>> x + y
[1, 2, 3, 4, 5, 6]

List multiplication repeats the list.

>>> x*3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> 3*x
[1, 2, 3, 1, 2, 3, 1, 2, 3]

List comprehension: [op(x) for x in ...]

$\{x^2 \mid x \in \{0, \ldots, n-1\}\}$

>>> square = lambda n: [x**2 for x in xrange(n)]
>>> square(5)
[0, 1, 4, 9, 16]

Similar to the map function:

>>> map(lambda x: x**2, xrange(5))
[0, 1, 4, 9, 16]

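For readers on Python 3, the same idea can be sketched as follows. Two things differ from the Python 2 session: xrange is called range, and map returns a lazy iterator that must be wrapped in list() to see the values. The filtered variant even_squares is an added illustration, not part of the slide.

```python
n = 5

# List comprehension: squares of 0 .. n-1
squares = [x**2 for x in range(n)]

# Equivalent with map; list() forces the lazy iterator in Python 3
squares_map = list(map(lambda x: x**2, range(n)))

# A comprehension can also filter with a trailing "if" clause
even_squares = [x**2 for x in range(n) if x % 2 == 0]

print(squares, squares_map, even_squares)
```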
Exercise: List comprehension

Let i, j = 1, ..., n.

1. Generate a list with elements [i, j].
2. Generate a list with elements [i, j] with i < j.
3. Generate a list with elements i + j with both i and j prime and i > j.
4. Write a function that evaluates an arbitrary polynomial $a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$ using a list comprehension, where you are given x and a list of coefficients coefs (hint: use enumerate).

>>> x = [1, 2, 3, 4]
>>> enumerate(x)

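To illustrate the enumerate hint, here is one possible sketch for item 4, written to run under Python 3. The function name evaluate_polynomial is hypothetical, and the sketch assumes coefs[i] holds the coefficient of x**i.

```python
def evaluate_polynomial(x, coefs):
    """Evaluate a_0 + a_1*x + ... + a_n*x**n, where coefs[i] is a_i."""
    # enumerate yields (index, value) pairs: the index is the exponent
    return sum([a * x**i for i, a in enumerate(coefs)])

# enumerate pairs each element with its position
pairs = list(enumerate([1, 2, 3, 4]))

# 1 + 2*2 + 3*2**2
value = evaluate_polynomial(2, [1, 2, 3])
print(pairs, value)
```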
filter and reduce functions

The filter function passes the list through a filter, keeping the elements for which the test is true:

>>> y = xrange(8)
>>> x = filter(lambda x: x**2 < 40, y)
>>> x
[0, 1, 2, 3, 4, 5, 6]

reduce combines the elements into a single value:

>>> x = reduce(lambda i,j: i+j, y)
>>> x
28

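The session above is Python 2. A sketch of the same computation for Python 3 follows: filter returns a lazy iterator there, and reduce has to be imported from functools (the import also works in Python 2.6 and later). The names small and total are only for illustration.

```python
from functools import reduce

y = range(8)

# Keep the values whose square is below 40
small = list(filter(lambda v: v**2 < 40, y))

# Fold the sequence into one value by repeated addition
total = reduce(lambda i, j: i + j, y)

print(small, total)
```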
Example: Find all primes up to n

factors = lambda n: [x for x in xrange(1, n+1) \
                     if n % x == 0]
isprime = lambda m: [1, m] == factors(m)
allprimes = lambda n: [m for m in xrange(2, n+1) \
                       if isprime(m)]
allprimes(37)

Exercise: write a function to return the first n prime numbers.

More control over lists (other list methods)

len(xs)
xs.append(x) and xs.extend(ys): any difference?
xs.count(x)
xs.insert(i, x)
xs.sort() and sorted(xs): what's the difference?
xs.remove(x)
xs.pop() or xs.pop(i)

Help documentation: use dir(xs), dir(list) or dir([])

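The two questions on this slide can be explored with a short experiment. The sketch below runs under Python 3 (and Python 2); the variable names are only for illustration.

```python
# append adds its argument as ONE element, even if it is a list
xs = [3, 1, 2]
xs.append([9, 8])            # xs now has a nested list at the end

# extend adds each element of its argument
ys = [3, 1, 2]
ys.extend([9, 8])

# sorted returns a new list and leaves the original untouched
zs = [3, 1, 2]
new_list = sorted(zs)
unchanged = list(zs)         # snapshot of zs before sorting in place

# sort works in place and returns None
result = zs.sort()

print(xs, ys, new_list, unchanged, zs, result)
```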
Copying lists

Create a list a with some entries.
Now set b = a. Change b[1]. What happened to a?
Now set c = a[:]. Change c[2]. What happened to a?
Now create a function set_first_elem_to_zero(l) that takes a list, sets its first entry to zero, and returns the list. What happens to the original list?

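A sketch of the experiment described above, written to run under Python 3 (and Python 2). The concrete values are arbitrary; only set_first_elem_to_zero is a name taken from the slide.

```python
a = [1, 2, 3]

b = a              # b is a second name for the SAME list object
b[1] = 20          # so this change is visible through a as well

c = a[:]           # slicing builds a new (shallow) copy
c[2] = 30          # a is not affected

def set_first_elem_to_zero(l):
    l[0] = 0       # mutates the list the caller passed in
    return l

returned = set_first_elem_to_zero(a)
print(a, b, c, returned is a)
```

Passing a[:] instead of a to the function would leave the original list unchanged.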
Exercise: Finding the longest word

Write a function that returns the longest word in a variable text that contains a sentence. The text may contain punctuation, which should not be counted as part of a word. What happens with ties? As an example, consider:

"Hello, how was the football match earlier today???"

Hints:
    s.translate(None, string.punctuation) removes punctuation (Python 2; requires import string)
    s.split() splits the sentence into words
    the built-in function max(list, key=func) picks the largest element according to func

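One possible sketch, under some assumptions: it runs under Python 3, where s.translate(None, ...) is not available, so punctuation is dropped with a generator expression instead. The function name longest_word is hypothetical.

```python
import string

def longest_word(text):
    """Return the longest word in text, ignoring punctuation."""
    # Drop every punctuation character, then split on whitespace
    cleaned = "".join(ch for ch in text if ch not in string.punctuation)
    # max with key=len compares words by length
    return max(cleaned.split(), key=len)

winner = longest_word("Hello, how was the football match earlier today???")
print(winner)
```

With ties, max returns the first of the longest words it encounters.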
Exercise: Pivot

Write a function that takes a value x and a list ys, and returns a list that contains the value x and all elements of ys, arranged so that all values y in ys that are smaller than x come first, then the element x, and then the rest of the values in ys.

For example, the output of f(3, [6, 4, 1, 7]) should be [1, 3, 6, 4, 7].

Hint: use list concatenation of list comprehensions.

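Following the hint, one possible sketch concatenates two comprehensions around [x]. The function name pivot is hypothetical; the code runs under Python 3 and Python 2.

```python
def pivot(x, ys):
    """Place x after the elements of ys smaller than x, before the rest."""
    smaller = [y for y in ys if y < x]     # keeps the original order
    rest = [y for y in ys if y >= x]       # everything else, in order
    return smaller + [x] + rest

result = pivot(3, [6, 4, 1, 7])
print(result)
```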
Exercise: Fair Split

Write a function that splits a set of integers into three disjoint sets whose sums are as similar as possible.

Tuple

Similar to a list, but enclosed by ( and ).

>>> mytuple = (1, 2, 3)
>>> mytuple[1]
2
>>> mytuple[1:3]
(2, 3)

Tuples are immutable

Unlike lists, we cannot change the elements of a tuple. A mutable object stored inside a tuple can still be modified, however.

>>> mytuple = ([1, 2], [2, 3])
>>> mytuple[0] = [3, 4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> mytuple[0][1] = 3
>>> mytuple
([1, 3], [2, 3])

Packing and unpacking

>>> t = 1, 2, 3
>>> x, y, z = t
>>> print t
(1, 2, 3)
>>> print y
2
>>> print z
3

Functions with multiple return values

def simple():
    return 0, 1, 2

print simple()   # (0, 1, 2)

Swapping two values

In other languages, you need to create a temporary variable:

t = a
a = b
b = t

In Python, it is much simpler:

a, b = b, a

Exercise: Unzip the list of tuples

Suppose we have two lists, x and y, that give the x and y coordinates of a set of points.

1. Create a list with the coordinates (x, y) as tuples. Hint: find out about the zip function.
2. You have decided that you actually need the two separate lists, but unfortunately you have thrown them away. How can we use zip to unzip the list of tuples and get the two lists back?

Hint: to pass the elements of a list as separate arguments to a function, use the special * operator:

>>> args = [('a', 1), ('b', 2), ('c', 3)]
>>> zip(*args)
[('a', 'b', 'c'), (1, 2, 3)]

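A sketch of both steps, assuming Python 3, where zip returns a lazy iterator and is therefore wrapped in list(). The coordinate values and the names xs, ys, points are made up for illustration.

```python
xs = [1, 2, 3]
ys = [4, 5, 6]

# Step 1: pair up the coordinates
points = list(zip(xs, ys))

# Step 2: zip(*points) is zip((1, 4), (2, 5), (3, 6)),
# which regroups the first and second components
xs_again, ys_again = zip(*points)

print(points, xs_again, ys_again)
```

Note that unzipping yields tuples; wrap them in list() if lists are needed.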
Exercise: Compute the distances

Suppose we have two n-dimensional vectors, x and y, stored as tuples with n elements. Implement functions that compute the $\ell_1$ and $\ell_2$ distances between x and y. Note that n is not explicitly given.

$$\ell_1 = \|x - y\|_1 = \sum_{i=1}^{n} |x_i - y_i|$$

$$\ell_2 = \|x - y\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

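One possible sketch of the two distance functions. The names l1_distance and l2_distance are hypothetical; zip pairs up the components, so n never has to be passed explicitly. The code runs under Python 3 and Python 2.

```python
import math

def l1_distance(x, y):
    """Sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    """Square root of the sum of squared component-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d1 = l1_distance((0, 0), (3, 4))
d2 = l2_distance((0, 0), (3, 4))
print(d1, d2)
```

The sketch assumes both tuples have the same length; zip silently stops at the shorter one otherwise.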
Dictionary

A dictionary is a collection of key-value pairs.
An example: the keys are all English words, and their corresponding values are the translations in Chinese.

Lists + Dictionaries = $$$

Defining a dictionary

>>> d = {}
>>> d[1] = "one"
>>> d[2] = "two"
>>> d
{1: 'one', 2: 'two'}
>>> e = {1: 'one', 'hello': True}
>>> e
{1: 'one', 'hello': True}

More key-value pairs can be added to the dictionary at any time. Note that keys should be immutable.

No duplicate keys

The old value gets overwritten instead!

>>> d = {1: 'one', 2: 'two'}
>>> d[1] = 'three'
>>> d
{1: 'three', 2: 'two'}

Access

We can access values by keys, but not the other way around.

>>> d = {'one': 1, 'two': 2}
>>> print d['one']
1

Furthermore, we can check whether a key is in the dictionary with key in dict.

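A short sketch of these points, written to run under Python 3 (and Python 2). The use of d.get with a default and the reverse lookup by comprehension are added illustrations, not part of the slide.

```python
d = {"one": 1, "two": 2}

has_key = "one" in d          # membership tests look at keys ...
has_value = 1 in d            # ... not at values

# get returns a default instead of raising KeyError for a missing key
missing = d.get("three", 0)

# Going from value to key needs an explicit search over the pairs
keys_for_2 = [k for k, v in d.items() if v == 2]

print(has_key, has_value, missing, keys_for_2)
```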
Deleting elements

We can remove a key-value pair by key using del, and we can clear the whole dictionary.

>>> d = {'one': 1, 'two': 2, 'three': 3}
>>> del d['one']
>>> d
{'two': 2, 'three': 3}
>>> d.clear()
>>> d
{}

All keys, values or both

Use d.keys(), d.values() and d.items().

>>> d = {1: 'one', 2: 'two', 3: 'three'}
>>> d
{1: 'one', 2: 'two', 3: 'three'}
>>> d.keys()
[1, 2, 3]
>>> d.values()
['one', 'two', 'three']
>>> d.items()
[(1, 'one'), (2, 'two'), (3, 'three')]

So how can you loop over dictionaries?

Small exercise

Print all key-value pairs of a dictionary.

>>> d = {1: 'one', 2: 'two', 3: 'three'}
>>> for key, value in d.items():
...     print key, value
...
1 one
2 two
3 three

In Python 2, you can use d.iteritems() instead of d.items(). It does not build the full list of pairs first, which gives better performance for large dictionaries.

Sets

A set is an unordered collection of unique elements.

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruits = set(basket)    # create a set
>>> fruits
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruits      # fast membership testing
True
>>> 'watermelon' in fruits
False

Implementation: similar to a keys-only dictionary.

Set operations

Union: $A \cup B$
Intersection: $A \cap B$
Difference: $A \setminus B$

>>> x = {1, 2, 4}
>>> y = {2, 3, 4}
>>> x.union(y)
set([1, 2, 3, 4])
>>> x.intersection(y)
set([2, 4])
>>> x.difference(y)
set([1])

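The same operations are also available as operators. The sketch below runs under Python 3 and Python 2.7; the symmetric difference is an added illustration beyond the three operations listed on the slide.

```python
x = {1, 2, 4}
y = {2, 3, 4}

union = x | y                  # same as x.union(y)
intersection = x & y           # same as x.intersection(y)
difference = x - y             # same as x.difference(y)
symmetric = x ^ y              # elements in exactly one of the two sets

print(union, intersection, difference, symmetric)
```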
Set comprehensions

Similar to list comprehensions.

>>> a = {x for x in 'abracadabra' if x not in 'abc'}
>>> a
set(['r', 'd'])

The file object

Interaction with the file system is pretty straightforward in Python.
It is done using file objects.
We can instantiate a file object using open or file.
Have a look at dir(file).

Open and read the file

open(name[, mode[, buffering]]) -> file object

Write a function that opens a file (input: filename) and prints the file line by line.

Solution 1:

def readfile(fname):
    f = open(fname, 'r')
    lines = f.readlines()   # list of lines; f.read() would return one string
    f.close()
    for line in lines:
        print line,         # trailing comma: each line already ends with a newline

Solution 2:

def readfile(fname):
    with open(fname) as f:  # the file is closed automatically
        for line in f:
            print line,

Opening a file

f = open(fname, mode, buffering)

fname: path and filename
mode:
    r    read file
    w    write to file
    a    append to file
    b    binary file
buffering:
    0        unbuffered
    1        line-buffered
    NUMBER   buffer size

We need to close a file after we are done: f.close()

Reading files

read()        Reads the entire file (or the first n characters, if n is supplied)
readline()    Reads a single line per call
readlines()   Returns a list of lines (split at newlines)

Another fast option to read a file:

with open('f.txt', 'r') as f:
    for line in f:
        print line

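To compare the three methods side by side, the sketch below first writes a small temporary file and then reads it back. It runs under Python 3 (and Python 2); the file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small three-line file to read from
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

with open(path) as f:
    whole = f.read()          # the entire file as one string

with open(path) as f:
    first = f.readline()      # one line, including its newline
    rest = f.readlines()      # the remaining lines as a list

print(repr(whole), repr(first), rest)
```

Each call continues from the current position in the file, which is why readlines() here returns only the lines after the first.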
Writing to file

Use write() to write to a file:

with open(filename, 'w') as f:
    f.write("hello, {}!\n".format(name))

More writing examples

# write elements of list to file
with open(filename, 'w') as f:
    for x in xs:
        f.write('{}\n'.format(x))

# write elements of dictionary to file
with open(filename, 'w') as f:
    for k, v in d.iteritems():
        f.write('{}: {}\n'.format(k, v))

Exercise: Word Count

Analyze the text file containing the complete works of William Shakespeare.

1. Find the 20 most frequently used words.
2. How many unique words are used?
3. How many words are used at least 5 times?
4. Write the 200 most common words, and their counts, into a file.
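A possible starting point, sketched under several assumptions: it uses collections.Counter from the standard library, treats a word as a run of lowercase letters and apostrophes (a simplification), and works on a short sample sentence rather than the actual text file. The function name word_counts is hypothetical; the code runs under Python 3 and Python 2.7.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercase words; a word is a run of letters/apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

counts = word_counts("To be, or not to be: that is the question.")

top_two = counts.most_common(2)                    # (word, count) pairs
n_unique = len(counts)                             # distinct words
n_frequent = sum(1 for c in counts.values() if c >= 2)

print(top_two, n_unique, n_frequent)
```

For the full exercise, the text would come from reading the file, most_common(20) and most_common(200) give the rankings, and the threshold 2 becomes 5.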