Programmers Cross Training
Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX
April 2, 2008
Outline
1. Brief introduction to the software
2. What we need to know before scripting
3. Going object-oriented
Part 1: Brief introduction to the Golden Helix software
- The GUI interface
- The Python interface
- Why do we need to write a script?
What is it?
- A powerful and flexible system for population-based SNP analysis
- Supports case/control, quantitative trait loci (QTL) and categorical analysis
- Has a GUI and a Python interface
- An expensive genetic analysis package we've already paid for
The GUI interface [screenshot]
The Python interface [screenshot]
Advantages of using scripts, now
1. Power of Python
   - Conditions, loops, command line arguments, ...
   - Multiple analyses in one script
   - Different analyses selected by command line arguments
   - Running (multiple jobs) in batch mode
2. Extensions
   - Analyses outside the software, e.g. statistical analyses and graphics in R
   - Control of output: automatic annotation, filters to output selected fields
   - Additional functions: mergespreadsheets(), calcindgenotypesex()
   - Code reuse: function library, adding menus to the software
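The point about conditions, loops and command line arguments can be sketched in plain Python. This is an illustration under assumptions, not code from the software. The analysis names `hwe` and `case_control` and the helper `run_analyses` are hypothetical. The analysis functions only return strings; a real script would call the software's functions in their place.

```python
import argparse


# Hypothetical analysis functions: a real script would call the
# software's scripting functions here instead of returning strings.
def hwe(project):
    return "HWE analysis on %s" % project


def case_control(project):
    return "case-control analysis on %s" % project


# Map command-line names to analysis functions.
ANALYSES = {"hwe": hwe, "case_control": case_control}


def run_analyses(argv=None):
    """Parse command-line arguments and run each requested analysis in order."""
    parser = argparse.ArgumentParser(description="Run one or more analyses.")
    parser.add_argument("project", help="project name")
    parser.add_argument("analyses", nargs="+", choices=sorted(ANALYSES),
                        help="analyses to run, in order")
    args = parser.parse_args(argv)  # argv=None means: read sys.argv[1:]
    return [ANALYSES[name](args.project) for name in args.analyses]
```

Calling `run_analyses()` with no argument at the end of a script reads `sys.argv`. This assumes the script's parameters arrive there, as they do for an ordinary Python script. The same script can then be launched several times with different arguments, for example as a batch of jobs.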
But more importantly, for the future
3. What if results look suspicious?
   - AFAIR, I selected that option.
   - I do not know, Qing has left.
   - I can have a look and re-run the script.
4. We have some additional data!
   - Again?
   - I can (modify and) re-run the script.
5. I've got a new project.
   - OK, what is the first step?
   - We have a script for a similar project, don't we?
GUI is still nice to have, of course
1. Debugging
2. Quick experimental runs
3. Viewing results/datasets
4. Quick plots
Part 2: What we need to know before scripting
- Ways of running scripts
- How to use the Python shell
- How to find command references
- A tiny example
Things to know before you start
1. What does the boss want me to do???
2. Python, Python, Python
3. The integrated functional interface
Something about Python
1. Python is a dynamic object-oriented programming language that supports ...
2. Python is easy to get started with (hours to a few days)
3. Python makes you a good programmer
4. Python is becoming more and more popular (NASA, Cisco, Google, Golden Helix, ...)
5. Resources: http://www.python.org/, Dive into Python, Thinking in Python
Ways of running scripts
- In the Python shell
- From the drop-down menu
- From the command line: -s /path/to/script.py param1 param2
  Note that the current installation requires the full path name of the script.
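A script run from the command line needs to pick up param1 and param2. The sketch below assumes they arrive in `sys.argv`, as for an ordinary Python script; please check the manual for this installation. It also turns a relative path into the full path the installation expects, using `os.path.abspath`. The helper name `parse_params` is hypothetical.

```python
import os
import sys


def parse_params(argv):
    """Split argv into the absolute script path and the user parameters.

    Assumes argv[0] is the script path and argv[1:] are the parameters
    (param1, param2, ...), as for a plain Python script.
    """
    script = os.path.abspath(argv[0])  # full path, as the installation requires
    params = argv[1:]
    return script, params


if __name__ == "__main__":
    script, params = parse_params(sys.argv)
    print("Running %s with parameters %s" % (script, params))
```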
Using the Python shell
- Acquire the current object: obj = ghi.getcurrentobject()
- Get available methods: dir(obj)
- Get help: help(obj.associationtests)
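The `dir()`/`help()` workflow works on any Python object. `ghi` only exists inside the software, so this sketch uses a hypothetical stand-in class. `public_methods` is a helper made up for this example. `dir()` also lists underscore-prefixed internals, and the helper hides them. `inspect.getdoc` returns the docstring text that `help()` displays.

```python
import inspect


class Spreadsheet:
    """Hypothetical stand-in for an object returned by ghi.getcurrentobject()."""

    def associationtests(self):
        """Run association tests (stand-in; does nothing)."""
        return None

    def hweplot(self):
        """Draw an HWE plot (stand-in; does nothing)."""
        return None


def public_methods(obj):
    """Return callable attribute names from dir(obj), skipping names starting with '_'."""
    return [name for name in dir(obj)
            if not name.startswith("_") and callable(getattr(obj, name))]


obj = Spreadsheet()
print(public_methods(obj))                   # like dir(obj), without the internals
print(inspect.getdoc(obj.associationtests))  # the docstring help() would show
```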
Finding command references
- Offline: the PDF manual in the installation directory; getting help from the Python shell
- Online: http://www.goldenhelix.com/snp_variation/Manual/manual.html
Open a project and run a study
Code:

    # open an existing project file
    ghi.openproject('/home/yxu/research/projects/mockdataset/Mock_example/Mock_example.ghp')
    # take the first object returned for 'Mock_geno'
    obj = ghi.getobject('Mock_geno')[0]
    # HWE plot
    obj.hweplot(1)
    # save the project
    ghi.saveproject()

Available at pcprhelix:yxu/research/projects/MockDataSet/Mock_tiny.py
Part 3: Going object-oriented
- Shortcomings of using functions directly
- An example
- Your work
- Additional tools
- Summary
Shortcomings of using commands directly
- Inefficient in some situations (the whole analysis must be rerun after redoing or modifying part of it)
- Redundant interface parameters
- No programming pattern
What is GWAS?
1. What is a Python class?
   - A little bit of object orientation: everything in Python is an object
   - If you still don't know... (a function library)
2. What is GWAS?
   - A wrapper for the software's project/spreadsheet management
   - A data-processing class that enforces a few strategies to handle large GWAS datasets
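Here is a concrete picture of "a class as a function library". Shared settings such as the project name and verbosity are set once in `__init__`. Every method then reads them from `self` instead of receiving them as parameters on each call, which addresses the parameter redundancy mentioned above. `MiniProject` and its methods are hypothetical; this is not the GWAS class.

```python
class MiniProject:
    """Hypothetical example: methods share settings stored on self."""

    def __init__(self, name, verbose=False):
        self.name = name        # set once here ...
        self.verbose = verbose
        self.log = []

    def _say(self, msg):
        # ... and reused by every method without being passed again
        if self.verbose:
            self.log.append("[%s] %s" % (self.name, msg))

    def load(self, dataset):
        self._say("loading %s" % dataset)
        return dataset

    def analyze(self, dataset):
        self._say("analyzing %s" % dataset)
        return "result of %s" % dataset


prj = MiniProject("Mock_example", verbose=True)
result = prj.analyze(prj.load("Mock_geno"))
```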
Design idea one
- Automatically load the existing project, spreadsheets and results when their dependencies are unchanged
- A signal is used to force a rerun of some analyses
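One possible way to implement "reuse when dependencies are unchanged, rerun on a signal" is sketched below. Each result is stored together with a fingerprint of its inputs. The step is recomputed only when the fingerprint changes or a `force` flag (the signal) is set. This is a sketch under assumptions. `StepCache` and `run` are hypothetical names, and the cache lives in a dictionary rather than in saved projects and spreadsheets.

```python
import hashlib


class StepCache:
    """Hypothetical sketch: rerun a step only if its inputs changed or force=True."""

    def __init__(self):
        self._store = {}    # step name -> (fingerprint, result)
        self.computed = []  # names of steps actually (re)computed

    @staticmethod
    def fingerprint(deps):
        # Hash the repr of the dependencies; adequate for simple values
        # such as file names, thresholds and modification times.
        return hashlib.sha1(repr(deps).encode("utf-8")).hexdigest()

    def run(self, name, deps, compute, force=False):
        fp = self.fingerprint(deps)
        cached = self._store.get(name)
        if cached is not None and cached[0] == fp and not force:
            return cached[1]           # dependencies unchanged: reuse
        result = compute()             # new, changed, or forced: recompute
        self._store[name] = (fp, result)
        self.computed.append(name)
        return result


cache = StepCache()
cache.run("hwe", ("Mock_geno", 0.005), lambda: "hwe v1")              # computed
cache.run("hwe", ("Mock_geno", 0.005), lambda: "hwe v2")              # reused
cache.run("hwe", ("Mock_geno", 0.005), lambda: "hwe v3", force=True)  # forced rerun
```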
Design idea two
- Separate stable, large data (genotypes) from variable, small data descriptions (meta information)
- A single genotype spreadsheet without demographic information (the genetic map is applied if one exists)
- Two spreadsheets, DATA_NAME_ind_info and DATA_NAME_marker_info, that keep meta information for individuals and markers
- Meta information can be changed programmatically or with outside programs such as Excel
- The info spreadsheets are read into Python so that they can be used or changed easily
- Samples and markers are chosen according to the info spreadsheets, and a new genotype spreadsheet with case-control information is created for each analysis
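The selection step can be pictured with plain Python containers. The info tables map each individual or marker to its meta fields. A new, smaller genotype table is built for one analysis while the full genotype data stay untouched. The field names follow the case_control code in this talk ('aff', 'exclude', 'HWE p-value'). The dictionary layout, the toy data and the `make_case_control` helper are hypothetical.

```python
def make_case_control(genotypes, ind_info, marker_info, min_hwe_p=0.005):
    """Return a new {individual: {marker: genotype, 'status': ...}} table.

    genotypes:   {individual: {marker: genotype}}   (stable, large; not modified)
    ind_info:    {individual: {'aff': 1 or 2, 'exclude': bool}}
    marker_info: {marker: {'HWE p-value': float}}
    """
    markers = [m for m, info in marker_info.items()
               if info["HWE p-value"] > min_hwe_p]
    table = {}
    for ind, info in ind_info.items():
        if info["exclude"] or info["aff"] not in (1, 2):
            continue  # skip excluded individuals and unknown affection status
        row = {m: genotypes[ind][m] for m in markers}
        row["status"] = "case" if info["aff"] == 2 else "control"
        table[ind] = row
    return table


# Toy data for illustration only.
genotypes = {"i1": {"rs1": "AA", "rs2": "AG"},
             "i2": {"rs1": "AG", "rs2": "GG"},
             "i3": {"rs1": "GG", "rs2": "AA"}}
ind_info = {"i1": {"aff": 2, "exclude": False},
            "i2": {"aff": 1, "exclude": False},
            "i3": {"aff": 2, "exclude": True}}
marker_info = {"rs1": {"HWE p-value": 0.40}, "rs2": {"HWE p-value": 0.001}}
cc = make_case_control(genotypes, ind_info, marker_info)
```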
Data structure (genotype) [screenshot]
Data structure (indinfo) [screenshot]
Data structure (markerinfo) [screenshot]
GWAS member functions
- setoption()
- loadgenotype() / applygeneticmap()
- loadindinfo() / loadmarkerinfo()
- calcindcallrate() / calcmarkercallrate() (more powerful than the menu option)
- calcindgenotypesex() (our own addition)
- calchwe() (copied from the software)
- extractgenotype()
- mergespreadsheets() (our own addition)
- openorcreatecasecontrolspreadsheet() / getorimportspreadsheet() (time-consuming steps...)
- associationstudy()
- saveresult() (chooses selected columns, adds annotation from external sources)
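A call rate is the fraction of genotype calls that are not missing. It is computed per individual (the "individual call rate" used in the case_control code) or per marker (the "marker call rate"). The sketch below computes only that basic quantity. It is an illustration, not the implementation of calcindcallrate()/calcmarkercallrate(). It assumes a list-of-rows genotype matrix with `None` for a missing call.

```python
def call_rates(matrix):
    """Return (per-individual, per-marker) call rates.

    matrix: list of rows, one row per individual and one column per marker;
    None marks a missing genotype call. Assumes at least one individual
    and at least one marker.
    """
    n_ind = len(matrix)
    n_mark = len(matrix[0])
    ind_rates = [sum(g is not None for g in row) / float(n_mark)
                 for row in matrix]
    marker_rates = [sum(row[j] is not None for row in matrix) / float(n_ind)
                    for j in range(n_mark)]
    return ind_rates, marker_rates


ind_rates, marker_rates = call_rates([
    ["AA", "AG", None],
    ["AG", None, None],
])
```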
MockDataSet
Code: case_control analysis

    def case_control(prj, excludesexmisspecified=False):
        '''Case-control association analysis.'''
        indinfo = prj.indinfo
        markerinfo = prj.markerinfo
        inds = indinfo.labels()
        SNPs = markerinfo.labels()
        # cases: affected (aff == 2), call rate above threshold, not excluded
        cases = [inds[x] for x in range(len(inds)) if indinfo['aff'][x] == 2
                 and indinfo['individual call rate'][x] > 0.929 and not indinfo['exclude'][x]
                 and (not excludesexmisspecified or indinfo['gen ...   # truncated on the original slide
        # controls: unaffected (aff == 1), with similar filters
        controls = [inds[x] for x in range(len(inds)) if indinfo['aff'][x] == 1
                    and indinfo['individual call rate'][x] > 0.928 and not indinfo['exclude'][x]
                    and (not excludesexmisspecified or indinfo['gen ...   # truncated on the original slide
MockDataSet (cont.)

        # markers: pass the HWE p-value and marker call rate thresholds
        markers = [SNPs[x] for x in range(len(SNPs))
                   if markerinfo['HWE p-value'][x] > 0.005
                   and markerinfo['marker call rate'][x] > 0.9]
        if prj.verbose:
            print('With %d cases, %d controls, and %d markers' %
                  (len(cases), len(controls), len(markers)))
        data = prj.openorcreatecasecontrolspreadsheet('case-control',
                                                       cases, controls, markers)
        if prj.verbose:
            print('Performing case-control association tests')
        return prj.associationstudy(data, 3, 0, bonferroni=1, fdr=1, genocounts=1,
                                    allelecounts=1, usepca=1, numcomponents=10 ...   # truncated on the original slide
MockDataSet (cont.)
Code: define and run the project

    import os

    prj = GWAS(projectName='Mock_example',
               projectpath=os.path.join(phome, 'MockDataSet'),
               datapath=os.path.join(phome, 'MockDataSet', 'data'),
               ghi=ghi)
    # these two options are turned on by default
    prj.setoption(verbose=True, cautious=True)
    # load genotype data
    prj.extractgenotype('MockData', 'Mock_geno', filename='MockD ...   # truncated on the original slide
    prj.loadgenotype(name='Mock_geno')
    prj.applygeneticmap('HelixResult.csv', C, 0, markerid=1, d ...   # truncated on the original slide
    # indinfo, markerinfo = prepareindandmarkerinfo(prj)
    # run the data analysis function whose name is projname (e.g. case_control)
    globals()[projname](prj)
    # save results
    maffilter = greaterthanfilter('Minor Allele Freq.', 0.05)
    chi2filter = lessthanfilter('Chi-Squared P', 0.001)
    resultname = '%s_result' % projname
MockDataSet (cont.)
Code: save results

    # all results
    prj.saveresult(resultname,
                   os.path.join(phome, 'MockData', '%s.csv' % resultname),
                   columns=resultcolumns)
    # results passing chi2filter
    prj.saveresult(resultname,
                   os.path.join(phome, 'MockData', '%s-chi2-0.001.csv' % resultname),
                   columns=resultcolumns, filter=chi2filter)
    # results passing both maffilter and chi2filter
    prj.saveresult(resultname,
                   os.path.join(phome, 'MockData', '%s-maf-0.05-chi2-0.001.csv' % resultname),
                   columns=resultcolumns, filter=andfilter(maffilter, chi2filter))

Available at pcprhelix:yxu/research/projects/mockdataset/
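The filters greaterthanfilter, lessthanfilter and andfilter come from gwasutil.py, whose code is not shown in these slides. The sketch below shows one way such composable filters could work. It assumes each result row is a dictionary keyed by column name and that a filter is a predicate on a row. These definitions are hypothetical re-implementations, not the library's code, and the rows are toy data.

```python
# Hypothetical re-implementations: each filter is a predicate on a row dict.
def greaterthanfilter(column, threshold):
    """Keep rows whose value in `column` is greater than `threshold`."""
    return lambda row: row[column] > threshold


def lessthanfilter(column, threshold):
    """Keep rows whose value in `column` is less than `threshold`."""
    return lambda row: row[column] < threshold


def andfilter(*filters):
    """Keep rows that pass every given filter."""
    return lambda row: all(f(row) for f in filters)


# Toy result rows for illustration only.
rows = [
    {"Marker": "rs1", "Minor Allele Freq.": 0.20, "Chi-Squared P": 0.0004},
    {"Marker": "rs2", "Minor Allele Freq.": 0.01, "Chi-Squared P": 0.0001},
    {"Marker": "rs3", "Minor Allele Freq.": 0.30, "Chi-Squared P": 0.2000},
]
maffilter = greaterthanfilter("Minor Allele Freq.", 0.05)
chi2filter = lessthanfilter("Chi-Squared P", 0.001)
selected = [r["Marker"] for r in rows if andfilter(maffilter, chi2filter)(r)]
```

Representing filters as small functions lets them be combined freely, which is how the same result can be saved several times with different row selections.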
What do I actually need to do?
- Create a project
- Prepare datasets: indinfo, markerinfo
- Select cases/controls/markers
- Call associationstudy()
- Save results
gwasutil.py
- Defines classes of utility functions
- Writes screen output to a log file
- Defines filters
- Prepares info files
- Provides data sources for saving results
- Plots figures using R
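Screen output can be written to a log file with a small "tee" object that forwards every write to both the screen and a file. The sketch below assumes this approach. gwasutil.py's own version is not shown in the slides, and `Tee` and `run_with_log` are hypothetical names.

```python
import sys


class Tee(object):
    """Forward writes to several streams, e.g. the screen and a log file."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for s in self.streams:
            s.write(text)

    def flush(self):
        for s in self.streams:
            s.flush()


def run_with_log(path, func):
    """Call func() while copying everything it prints to the log file at `path`."""
    old_stdout = sys.stdout
    with open(path, "w") as log:
        sys.stdout = Tee(old_stdout, log)
        try:
            return func()
        finally:
            sys.stdout = old_stdout  # always restore the real screen output
```

Restoring `sys.stdout` in `finally` keeps the screen output working even if the analysis raises an error.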
Summary
- The Golden Helix software is good genetic analysis software
- Python is a great programming language
- Scripting is not difficult
- Scripting is very important and valuable
Thank you!