Python Karin Lagesen karin.lagesen@bio.uio.no
Plan for the day
  Basic data types
  Data manipulation
  Flow control and file handling
  Functions
  Biopython package
What is programming?
  Programming: ordered set of instructions
  Programming can be compared to a:
    Cooking recipe
    Ikea furniture instructions
    Lab protocol
  Language: instruction set
  Programming: combining instructions to solve a problem
How to make a program
  Need a programming language
  The programming language dictates the set of available instructions
  Several types of languages, hence several types of instruction sets
  Two main types:
    Interpreted languages
    Compiled languages
Interpreted languages
  No compilation needed
  Program interpreted on-the-fly
  Programs often called scripts
  Examples of interpreted languages:
    General purpose: Perl, Python
    Special purpose: R
  Possible disadvantage: can be slower than compiled programs
Interactive vs. batch mode
  Python can be used in the shell, interactively
    Useful for testing etc.
    Exit from Python: Ctrl-D
  Most common: save code in a text file, run it in the shell
    Called batch mode
Python data types
  Numbers: integers and floats
  Strings
  Lists
  Dictionaries
  Not taught today: sets and tuples
Two different type features
  Sequence datatypes:
    Sequential order
    Strings, lists and tuples
  Immutable datatypes:
    Cannot be changed
    Numbers, strings, tuples
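To make "immutable" concrete, here is a minimal sketch contrasting a list (mutable) with a string (immutable). It uses indexing and slicing, which are covered on later slides, and try/except, which is not covered in these slides and appears only so the example runs to completion. The variable names are illustrative.

```python
# Lists are mutable: an element can be replaced in place.
bases = ["A", "C", "G"]
bases[0] = "T"

# Strings are immutable: assigning to a position raises TypeError.
seq = "ACG"
try:
    seq[0] = "T"
    string_changed = True
except TypeError:
    string_changed = False

# To "change" a string, build a new one instead.
new_seq = "T" + seq[1:]

print(bases)
print(string_changed)
print(new_seq)
```

The original string `seq` is untouched; only the new name `new_seq` holds the modified text.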
Exercise in class
  Log in to freebee.abel.uio.no
  Create a directory called python
  Type in: module load python2
  Type in: python
  You are now in the Python interactive shell
Python operators
Python as a calculator
  Do the following:
    2+2
    4-2
    5*23
    12/6
    11/6
    11.0/6
    2**8
Python as a calculator
  [karinlag@freebee]~/teaching% python
  Python 2.7.3 (default, Jun 19 2012, 15:54:55)
  [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> 2+2
  4
  >>> 4-2
  2
  >>> 5*23
  115
  >>> 12/6
  2
  >>> 11/6
  1
  >>> 11.0/6
  1.8333333333333333
  >>> 2**8
  256
  >>>
  In Python 2.x, dividing one integer by another rounds the result down
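The division behaviour is easy to trip over, so here is a small sketch of ways to be explicit about it. The `//` operator and `float()` behave the same in Python 2.7 and Python 3; note that in Python 3 a plain `/` between two integers gives 1.8333... rather than 1.

```python
# Floor division: rounds down in both Python 2 and Python 3.
floored = 11 // 6

# "Rounds down" means toward minus infinity, not toward zero.
negative_floored = -11 // 6

# Making one operand a float gives true division in both versions.
true_div = 11.0 / 6
converted = float(11) / 6

# The remainder that floor division throws away.
remainder = 11 % 6

print(floored)
print(negative_floored)
print(true_div)
print(remainder)
```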
Creating script
  A script is code in a file which is run
  Create script file:
    Open nano
    Write in: print 2+2
    Save the file as first.py
  Run script: python first.py
  Modify: add other calculations
first.py
  [karinlag@freebee]~/teaching% cat first.py
  print 2+2
  print 4-2
  print 3*6
  print 7**8
  [karinlag@freebee]~/teaching% python first.py
  4
  2
  18
  5764801
  [karinlag@freebee]~/teaching%
Strings
  Use ', " or ''' to delineate
  Remember: same type on each end
  ''' can be used to create block text
  >>> '''This is a piece
  ... of block text'''
  'This is a piece\nof block text'
  >>>
  Newline: \n
  Tab: \t
String operations
  >>> "TTAAGAGGA".replace("T", "U")
  'UUAAGAGGA'
  >>> "TTAAGAGGA".count("G")
  3
  >>> "TTAAGAGGA".find("AG")
  3
  >>> "TTAAGAGGA".find("AGX")
  -1
  >>> "TTAAGAGGA".index("AG")
  3
  >>> "TTAAGAGGA".index("AGX")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError: substring not found
  >>> "TTAAGAGGA".split("A")
  ['TT', '', 'G', 'GG', '']
  >>> "TTA,AGA,GGA".split(",")
  ['TTA', 'AGA', 'GGA']
  >>> "TTA AGA GGA".split()
  ['TTA', 'AGA', 'GGA']
  >>>
  Note the error message! This is Python's way of telling you something went wrong.
  find returns -1 when the substring is missing; index raises an error instead.
  Repeat on freebee
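The difference between find and index matters once the result is used for something. The sketch below shows one way to handle each; the if/else and try/except constructs belong to flow control, which comes later in the day, and the variable names are illustrative.

```python
dna = "TTAAGAGGA"

# find() signals "not found" with -1, so the result can be tested directly.
position = dna.find("AGX")
if position == -1:
    find_message = "AGX not found"
else:
    find_message = "AGX found at index " + str(position)

# index() raises ValueError instead; try/except catches it.
try:
    dna.index("AGX")
    index_message = "AGX found"
except ValueError:
    index_message = "AGX not found"

# Pitfall: -1 is also a valid index (the last character), so an
# unchecked find() result should not be used for indexing.
last_base = dna[-1]

print(find_message)
print(index_message)
print(last_base)
```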
Variables
  A variable is something that has a value that may change
  Naming variables:
    Letters, numbers and _
    CasE sensitive
    Numbers may not be first
    Some words are reserved
  Convention: small letters, underscore to separate words
Reserved words
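The list of reserved words depends on the Python version. One way to see it for the interpreter you are running is the standard library's keyword module, sketched here; the name `sequence` is just an arbitrary example of a non-reserved word.

```python
import keyword

# All reserved words for the running Python version.
reserved = keyword.kwlist
print(reserved)

# iskeyword() tests a single candidate variable name.
for_is_reserved = keyword.iskeyword("for")
sequence_is_reserved = keyword.iskeyword("sequence")
print(for_is_reserved)
print(sequence_is_reserved)
```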
Using variables
  >>> t1 = "TTAAGAGGA"
  >>> t2 = "GGGG"
  >>> t1 + t2
  'TTAAGAGGAGGGG'
  >>> t1.replace("T", "U")
  'UUAAGAGGA'
  >>> t1
  'TTAAGAGGA'
  >>> t3 = t1.replace("T", "U")
  >>> t3
  'UUAAGAGGA'
  >>>
  We are using the variable instead of the string itself
  Can do the same thing to another string
  Note: replace returns a new string; t1 itself is unchanged
count.py
  Create script file:
    Open nano
    Write in:
      text = "ATGGCGGAGGA"
      nogs = text.count("G")
      print text, "contains", nogs, "Gs"
    Save the file
  Run script: python count.py
  Modify: count GGs instead of Gs
Dynamic, strong typing
  Type in the following:
  >>> t1 = "TTAAGAGGA"
  >>> t4 = 123
  >>> t1 + t4
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: cannot concatenate 'str' and 'int' objects
  >>>
  No need to specify type
  Type is interpreted as we go along
  Python objects if we do something wrong
Lists
  Ordered collection of elements
  list1 = [elem, elem, elem]
  Can hold elements of any type, including another list
  >>> list1 = ["a", "c", "b"]
  >>> list2 = ["X", "Y", list1]
  >>> list2
  ['X', 'Y', ['a', 'c', 'b']]
  Can be sorted (in place) using .sort()
  >>> list1.sort()
  >>> list1
  ['a', 'b', 'c']
  >>>
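"In place" has a practical consequence: .sort() changes the list itself and returns None, so assigning its result to a variable loses the data. A small sketch follows; the built-in sorted() is not mentioned on the slides and is shown here only as the alternative that returns a new list.

```python
letters = ["a", "c", "b"]

# .sort() reorders the list itself and returns None.
result = letters.sort()
print(result)
print(letters)

# sorted() leaves the original alone and returns a new sorted list.
original = ["a", "c", "b"]
ordered_copy = sorted(original)
print(original)
print(ordered_copy)
```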
Adding to list
  Create empty list: list1 = []
  Add to list: list1.append(elem)
    Adds elem to the end as a single element
  Extend list: list1.extend(sequence)
    Adds each element of the sequence to the end
List adding example
  >>> list1 = "A,B,C,D,E".split(",")
  >>> list1
  ['A', 'B', 'C', 'D', 'E']
  >>> list1.append('F')
  >>> list1
  ['A', 'B', 'C', 'D', 'E', 'F']
  >>> list1.extend('G')
  >>> list1
  ['A', 'B', 'C', 'D', 'E', 'F', 'G']
  >>> list1.insert(3,'G')
  >>> list1
  ['A', 'B', 'C', 'G', 'D', 'E', 'F', 'G']
  >>> list1.extend([1,2])
  >>> list1
  ['A', 'B', 'C', 'G', 'D', 'E', 'F', 'G', 1, 2]
  >>> list1.append([1,2])
  >>> list1
  ['A', 'B', 'C', 'G', 'D', 'E', 'F', 'G', 1, 2, [1, 2]]
  >>>
  Basic adding to list
  Note the difference between append and extend!
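The append/extend difference is easiest to see side by side on two identical lists. This is a minimal sketch with made-up list names.

```python
# append() adds its argument as ONE element, even if it is a list.
appended = ["A", "B"]
appended.append(["C", "D"])
print(appended)

# extend() adds each element of its argument separately.
extended = ["A", "B"]
extended.extend(["C", "D"])
print(extended)

# A string is a sequence too, so extend() adds it character by character.
from_string = ["A"]
from_string.extend("GT")
print(from_string)
```

The lengths differ accordingly: `appended` has three elements (the last one is a nested list), `extended` has four.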
List removal
  list1.remove(elem)
    Removes the first occurrence of the specified element
  list1.pop(index)
    Removes and returns the element at index; default is the last element
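A short sketch of both removal methods on an example list (the contents are arbitrary):

```python
bases = ["A", "C", "G", "T", "C"]

# remove() deletes only the first matching element.
bases.remove("C")
print(bases)

# pop() without an index removes and returns the last element.
last = bases.pop()
print(last)

# pop(0) removes and returns the first element.
first = bases.pop(0)
print(first)
print(bases)
```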
Special list: sys.argv
  How do you get input from the command line?
  Example: calculate AT content for a text that you give the script
  The script name and everything after it on the command line is in a list called sys.argv
  [karinlag@freebee]~/teaching% cat sys_argv.py
  import sys
  print sys.argv
  [karinlag@freebee]~/teaching% python sys_argv.py a b c
  ['sys_argv.py', 'a', 'b', 'c']
  [karinlag@freebee]~/teaching% python sys_argv.py 1 2
  ['sys_argv.py', '1', '2']
  [karinlag@freebee]~/teaching%
  To use: import sys on top
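Note in the output above that the arguments 1 and 2 arrive as the strings '1' and '2'. The sketch below uses an ordinary list named `argv` as a stand-in for sys.argv, so that it can be tried in the interactive shell; in a real script you would use sys.argv itself.

```python
# Stand-in for sys.argv after running: python sys_argv.py 1 2
argv = ["sys_argv.py", "1", "2"]

script_name = argv[0]   # element 0 is the script name
arguments = argv[1:]    # the rest are the actual arguments

# Arguments are strings, so + concatenates them...
as_text = arguments[0] + arguments[1]

# ...and they must be converted with int() before doing arithmetic.
as_numbers = int(arguments[0]) + int(arguments[1])

print(script_name)
print(as_text)
print(as_numbers)
```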
Sequence methods
  Work on strings, lists and tuples
  Indexing
    Index starts at zero
    Negative indices go from the right edge
  Slicing
    Can access portions of a sequence using indices
  in operator: test for membership
  Concatenation: add two together with +
  len, min, max
Indices
  >>> text = "ABCDEFG"
  >>> text[2]
  'C'
  >>> text[-2]
  'F'
  >>> text[2:4]
  'CD'
  >>> text[2:-2]
  'CDE'
  >>> text[:4]
  'ABCD'
  >>> text[4:]
  'EFG'
  >>>
     0  1  2  3  4  5  6
     A  B  C  D  E  F  G
    -7 -6 -5 -4 -3 -2 -1
  Note: for slicing, it is [from and including : to but excluding]
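Two consequences of the "from and including, to but excluding" rule are worth checking by hand: the length of a slice is simply end minus start, and splitting at any index gives two pieces that add back up to the original. A small sketch with illustrative variable names:

```python
text = "ABCDEFG"

first_three = text[0:3]     # indices 0, 1 and 2
last_three = text[-3:]      # from the third-last to the end
without_ends = text[1:-1]   # drop the first and the last character

# The slice [2:4] holds 4 - 2 = 2 elements.
slice_length = len(text[2:4])

# Splitting at index 4 loses nothing and duplicates nothing.
rejoined = text[:4] + text[4:]

# Slicing works the same way on lists.
numbers = [1, 4, 8, 2, 9]
middle = numbers[1:3]

print(first_three)
print(last_three)
print(without_ends)
print(middle)
```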
in operator
  Test if element is in sequence
  Works with strings, lists and tuples
  >>> X = [1,4,8,2,9]
  >>> X
  [1, 4, 8, 2, 9]
  >>> 5 in X
  False
  >>> 8 in X
  True
  >>>
  >>> X = "ABCDEF"
  >>> X
  'ABCDEF'
  >>> "Y" in X
  False
  >>>
  >>> "BC" in X
  True
  >>>
Concatenation
  Concatenation: + sign
  Can only concatenate same types
  >>> a = [1,2]
  >>> b = [3,4]
  >>> a + b
  [1, 2, 3, 4]
  >>> c = 56
  >>> a + c
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: can only concatenate list (not "int") to list
  >>>
  Changing types is called casting:
    Change string to number: int(string), float(string)
    Change number to string: str(number)
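Casting is how the TypeError above is avoided in practice. The following sketch shows the three conversions from the slide in use; the values are arbitrary examples.

```python
a = [1, 2]
c = 56

# Wrap the number in a list so that both sides of + are lists.
combined = a + [c]

# str() turns a number into text so it can be joined to a string.
label = "Count: " + str(c)

# int() and float() turn text into numbers.
total = int("56") + 4
ratio = float("0.25") * 2

print(combined)
print(label)
print(total)
print(ratio)
```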
len, min, max
  len: length of sequence object
  min: minimum
  max: maximum
  >>> txt1 = "ABCDEF"
  >>> len(txt1)
  6
  >>> max(txt1)
  'F'
  >>> min(txt1)
  'A'
  >>>
Sequence manipulation
  In the Python interactive shell:
    Create list [1,4,8,2,10]
    Find the maximum number
    Find out if the number 9 is in the list
    Add the number 9 to the list, repeat test
    Use sort to find the two lowest numbers
  Finding the maximum and testing with in work for strings too; append and sort do not!
Sequence manipulation
  >>> list1 = [1,4,8,2,10]     # Define list
  >>> list1
  [1, 4, 8, 2, 10]
  >>> max(list1)               # Max in list
  10
  >>> 9 in list1               # Is 9 in list?
  False
  >>> list1.append(9)          # Add 9 to list
  >>> list1
  [1, 4, 8, 2, 10, 9]
  >>> 9 in list1               # Retest for 9
  True
  >>> list1.sort()             # Sort list
  >>> list1
  [1, 2, 4, 8, 9, 10]
  >>> list1[:2]                # Get two lowest numbers
  [1, 2]
  >>>
atcontent.py
  Create script file:
    Text is ATGGCCGG, read in as sys.argv[1]
    Figure out:
      Length
      Number of As
      Number of Ts
  Run script: python atcontent.py ATGGCCGG
  Modify: how do you figure out AT content?
atcontent.py
  [karinlag@freebee]~/teaching% cat atcontent.py
  import sys
  text = sys.argv[1]
  As = text.count("A")
  Ts = text.count("T")
  print "Length of", text, "is", len(text)
  print "Nos of As is", As
  print "Nos of Ts is", Ts
  print "AT content is", (As + Ts)*1.0/len(text)
  [karinlag@freebee]~/teaching% python atcontent.py ATGGCCGG
  Length of ATGGCCGG is 8
  Nos of As is 1
  Nos of Ts is 1
  AT content is 0.25
  [karinlag@freebee]~/teaching%
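The same calculation can be wrapped in a function, which is the topic of a later part of the day. This is a sketch rather than the course's solution: the function name `at_content` is made up, and the decisions to accept lower-case input and to return 0.0 for an empty string are assumptions of this example.

```python
def at_content(sequence):
    """Return the fraction of A and T characters in sequence."""
    # Assumption: treat lower-case and upper-case letters alike.
    sequence = sequence.upper()
    # Assumption: an empty sequence has AT content 0.0
    # (this also avoids dividing by zero).
    if len(sequence) == 0:
        return 0.0
    at_count = sequence.count("A") + sequence.count("T")
    # float() forces true division, like the *1.0 trick in the script.
    return at_count / float(len(sequence))

print(at_content("ATGGCCGG"))
print(at_content("atggccgg"))
```

Both calls give 0.25, matching the script output above.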
Dictionaries
  Stores unordered, arbitrarily indexed data
  Consists of key-value pairs
  mydict = {key:value, key:value, key:value...}
  Note: keys must be immutable!
    ergo: numbers, tuples or strings
  Values may be anything, incl. another dictionary
  Mainly used for storing associations or mappings
Create, add, lookup, remove
  Creation:
    mydict = {} (empty), or
    mydict = { mykey:myval, mykey2:myval2 }
  Adding: mydict[key] = value
  Lookup: mydict[key]
  Remove: del mydict[key]
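The four operations can be tried together on a small mapping. The dictionary below, a three-entry fragment of the codon table, is just an illustrative example and the name `codon_table` is made up.

```python
# Creation with two key-value pairs.
codon_table = {"ATG": "M", "TGG": "W"}

# Adding: assign to a new key.
codon_table["TAA"] = "*"

# Lookup: index with the key.
start = codon_table["ATG"]

# Remove: del with the key.
del codon_table["TGG"]

print(start)
print(len(codon_table))
print("TGG" in codon_table)
```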
Dictionary methods
  All keys: mydict.keys() - returns list of keys
  All values: mydict.values() - returns list of values
  All key-value pairs as list of tuples: mydict.items()
  Get one specific value: mydict.get(key [, default])
    If the key is present, its value is returned
    If the key is not present, default is returned if given, else None
  Test for presence of key: key in mydict
    Returns True or False
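A short sketch of these methods on a made-up dictionary of base counts. Because dictionaries are unordered, the results are passed through sorted() so that the output is predictable; this also makes the example run the same way under Python 3, where keys(), values() and items() return views rather than lists.

```python
counts = {"A": 2, "G": 4}

all_keys = sorted(counts.keys())
all_values = sorted(counts.values())
all_pairs = sorted(counts.items())

# get() with a default never raises an error for a missing key.
c_count = counts.get("C", 0)

# Without a default, a missing key gives None.
missing = counts.get("C")

# Plain lookup of a present key works as usual through get() too.
g_count = counts.get("G", 0)

has_a = "A" in counts

print(all_keys)
print(all_values)
print(all_pairs)
print(c_count)
```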
dict_manip.py
  Create script file:
    Create this dictionary: {"A": 1, 1: "A", "B": [1,2,3]}
    Find out the following:
      How many pairs are there?
      Add "str": {1: "X"} to the dictionary
      Print the value that is stored with key "str"
  Run script: python dict_manip.py
  Modify: print the last element from the list stored under "B"
dict_manip.py
  [karinlag@freebee]~/teaching% cat dict_manip.py
  my_dict = {"A": 1, 1:"A", "B":[1,2,3]}
  print "Nos of elements", len(my_dict)
  my_dict["str"] = {1:"X"}
  print "my_dict['str']", my_dict["str"]
  print "last element in list with key 'B'", my_dict["B"][-1]
  [karinlag@freebee]~/teaching% python dict_manip.py
  Nos of elements 3
  my_dict['str'] {1: 'X'}
  last element in list with key 'B' 3
  [karinlag@freebee]~/teaching%