Genome-Wide Association Study Using


Department of Epidemiology, UT MD Anderson Cancer Center, Houston, TX
April 2, 2008
Programmers Cross Training

Outline

1. has
   - Brief introduction to
   - The GUI interface of
   - The interface of
   - Why do we need to write a script?
2. to
3. Going object-oriented:

What is?

- A powerful and flexible system for population-based SNP analysis
- Supports case/control, quantitative trait loci (QTL), and categorical analysis
- Has a GUI and a scripting interface
- An expensive genetic analysis software package we've already paid for

GUI

interface

Advantages of using scripts, now

1. Power of
   - Conditions, loops, command line arguments, ...
   - Multiple analyses in a script
   - Different analyses using command line arguments
   - Running (multiple jobs) in batch mode
2. Extensions
   - Non- analyses, e.g. statistical analyses and graphics in R
   - Control of output: automatic annotation, use of filters to output selected fields
   - Additional functions: mergespreadsheets(), calcindgenotypesex()
   - Code reuse: function library, add menus to

But more importantly, for future

3. What if results look suspicious?
   - AFAIR, I selected that option.
   - I do not know, Qing has left.
   - I can have a look and re-run the script.
4. We have some additional data!
   - Again?
   - I can (modify and) re-run the script.
5. I've got a new project.
   - OK, what is the first step?
   - We have a script for a similar project, don't we?

GUI is still nice to have, of course

1. Debug
2. Quick experimental runs
3. View results/datasets
4. Quick plot

Outline

2. to
   - What do we need to know before
   - Ways of in?
   - How to use Python shell in?
   - How to find command references?
   - A tiny example

Things to know before

1. What does boss want me to do???
2. Python Python Python
3. integrated functional interface

Something about Python

1. Python is a dynamic object-oriented programming language that supports
2. Python is easy to get started with (hours to a few days)
3. Python makes you a good programmer
4. Python is becoming more and more popular (NASA, Cisco, Google, Golden Helix, ...)
5. Resources:
   - http://www.python.org/
   - Dive into Python
   - Thinking in Python

Scripting in

- In Python shell
- From drop-down menu
- From command line: -s /path/to/script.py param1 param2

Note that the current installation requires the full path name to the script.
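A script started from the command line with `param1 param2` can use those values to decide which analysis to run. The sketch below is plain Python and assumes the parameters arrive in `sys.argv` as they would for an ordinary Python script, which the slides do not confirm. The analysis names `qc` and `assoc` and the helper functions are hypothetical placeholders, not part of the software's interface.

```python
import sys


def run_qc(dataset):
    # Placeholder for a quality-control analysis (hypothetical).
    return "qc on %s" % dataset


def run_assoc(dataset):
    # Placeholder for an association analysis (hypothetical).
    return "assoc on %s" % dataset


# Map the first command line parameter to an analysis function.
ANALYSES = {"qc": run_qc, "assoc": run_assoc}


def main(argv):
    # argv[0] is the script path; the rest are param1, param2, ...
    if len(argv) < 3 or argv[1] not in ANALYSES:
        raise SystemExit("usage: script.py {qc|assoc} DATASET")
    return ANALYSES[argv[1]](argv[2])


# In a real script the last line would be: print(main(sys.argv))
```

Keeping the dispatch table in one place means a new analysis only needs a new function and one dictionary entry.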

Use Python shell in

The Python shell

- Acquire current object: obj = ghi.getcurrentobject()
- Get available methods: dir(obj)
- Get help: help(obj.associationtests)
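`dir()` and `help()` are standard Python built-ins, so the same exploration works on any object. Because the `ghi` object exists only inside the software's own Python shell, the sketch below uses a hypothetical stand-in class, `FakeSpreadsheet`, whose two methods are empty placeholders named after calls that appear on the slides.

```python
class FakeSpreadsheet(object):
    """Stand-in for the object returned by ghi.getcurrentobject() (hypothetical)."""

    def associationtests(self):
        """Run association tests (placeholder)."""
        return None

    def hweplot(self, option):
        """Draw a Hardy-Weinberg equilibrium plot (placeholder)."""
        return None


obj = FakeSpreadsheet()

# dir() lists attribute names; drop the underscore-prefixed internals.
methods = [name for name in dir(obj) if not name.startswith("_")]
print(methods)

# The docstring is the text that help() would display for a method.
print(obj.associationtests.__doc__)
```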

command references

- Offline: pdf manual under directory
- Online: Getting help from http://www.goldenhelix.com/snp_variation/Manual/manual.html

Open a project and run a study

Code

    ghi.openproject('/home/yxu/research/projects/mockdataset/Mock_example/Mock_example.ghp')
    obj = ghi.getobject('Mock_geno')[0]
    obj.hweplot(1)
    ghi.saveproject()

Available at pcprhelix:yxu/research/projects/MockDataSet/Mock_tiny.py

Outline

3. Going object-oriented:
   - Shortcomings of using functions directly
   - to
   - An example
   - Your work
   - Additional tools
   - Summary

Shortcomings of using commands directly

- Not effective in some situations (the whole analysis is rerun after any redo or modification)
- Redundant interface parameters
- No programming pattern

What is?

1. What is a Python class
   - A little bit object-oriented
   - Everything in Python is an object
   - If you still don't know... (a function library)
2. What is
   - A wrapper for the software's project/spreadsheet management
   - A data-processing class that enforces a few strategies to handle large GWAS datasets

Design idea one

- Automatically load the existing project, spreadsheets, and results when their dependencies are unchanged
- A signal is used to force a rerun of some analyses
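One common way to reuse results until their inputs change is to compare file modification times and provide a flag that forces a rerun. The slides do not say how the class detects unchanged dependencies, so the following is only a sketch under that assumption; `load_or_compute` and its parameters are hypothetical names.

```python
import os
import pickle


def load_or_compute(cache_path, dependencies, compute, force=False):
    """Return a cached result unless a dependency is newer or force is set."""
    if not force and os.path.exists(cache_path):
        cache_time = os.path.getmtime(cache_path)
        # The cache is valid only if no dependency is newer than it.
        if all(os.path.getmtime(dep) <= cache_time for dep in dependencies):
            with open(cache_path, "rb") as handle:
                return pickle.load(handle)
    # Otherwise redo the analysis and refresh the cache.
    result = compute()
    with open(cache_path, "wb") as handle:
        pickle.dump(result, handle)
    return result
```

Calling it with `force=True` plays the role of the signal that forces a rerun.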

Design idea two

Separate stable large data (genotype) from variable small data description (meta)

- A single genotype spreadsheet without demographic information (apply genetic map if it exists)
- Two spreadsheets, DATA_NAME_ind_info and DATA_NAME_marker_info, that keep meta information for individuals and markers
- Meta information can be changed programmatically or using outside programs such as Excel
- The Info spreadsheets will be read into Python so that they can be used or changed easily
- Samples and markers are chosen according to the Info spreadsheets; a new genotype spreadsheet with case-control information is created for each analysis
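The idea of choosing samples and markers from small meta tables can be sketched in plain Python. The column names below (`aff`, `individual call rate`, `exclude`, `marker call rate`, `HWE p-value`) are taken from the example code in this talk; the table layout, the `select` helper, the values, and the 0.9 cutoff for individuals are assumptions made for illustration.

```python
# Small, editable meta tables stored as column dictionaries (made-up values).
ind_info = {
    "label": ["ind1", "ind2", "ind3", "ind4"],
    "aff": [2, 1, 2, 1],  # 2 = case, 1 = control
    "individual call rate": [0.99, 0.95, 0.80, 0.97],
    "exclude": [False, False, False, True],
}
marker_info = {
    "label": ["rs1", "rs2", "rs3"],
    "marker call rate": [0.98, 0.85, 0.99],
    "HWE p-value": [0.50, 0.30, 0.001],
}


def select(info, keep):
    """Return the labels of rows for which keep(row_index) is true."""
    return [info["label"][i] for i in range(len(info["label"])) if keep(i)]


def usable(i):
    # A sample is usable if it passes the call-rate cutoff and is not excluded.
    return ind_info["individual call rate"][i] > 0.9 and not ind_info["exclude"][i]


cases = select(ind_info, lambda i: ind_info["aff"][i] == 2 and usable(i))
controls = select(ind_info, lambda i: ind_info["aff"][i] == 1 and usable(i))
markers = select(
    marker_info,
    lambda i: marker_info["marker call rate"][i] > 0.9
    and marker_info["HWE p-value"][i] > 0.005,
)
```

Because the large genotype data is never touched here, changing a cutoff or an exclusion flag only changes which labels are passed on to the analysis.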

Data structure (Genotype)

Data structure (indinfo)

Data structure (markerinfo)

GWAS member functions

- setoption()
- loadgenotype() / applygeneticmap()
- loadindinfo() / loadmarkerinfo()
- calcindcallrate() / calcmarkercallrate() (more powerful than menu option)
- calcindgenotypesex() (our own addition)
- calchwe() (copied from )
- extractgenotype()
- mergespreadsheets() (our own addition)
- openorcreatecasecontrolspreadsheet() / getorimportspreadsheet() (time consuming steps...)
- associationstudy()
- saveresult() (choose selected columns, add annotation from external sources)

MockDataSet

Code: case_control analysis

    def case_control(prj, excludesexmisspecified=False):
        '''case control association analysis'''
        indinfo = prj.indinfo
        markerinfo = prj.markerinfo
        inds = indinfo.labels()
        SNPs = markerinfo.labels()
        cases = [inds[x] for x in range(len(inds)) if indinfo['aff'][x] == 2 \
            and indinfo['individual call rate'][x] > 0.929 and not indinfo['exclude'][x] \
            and (not excludesexmisspecified or indinfo['gen ...
        controls = [inds[x] for x in range(len(inds)) if indinfo['aff'][x] == 1 \
            and indinfo['individual call rate'][x] > 0.928 and not indinfo['exclude'][x] \
            and (not excludesexmisspecified or indinfo['gen ...

MockDataSet (cont.)

        markers = [SNPs[x] for x in range(len(SNPs)) if
            markerinfo['HWE p-value'][x] > 0.005 and markerinfo['marker call rate'][x] > 0.9]
        if prj.verbose:
            print('With %d cases, %d controls, and %d markers' %
                (len(cases), len(controls), len(markers)))
        data = prj.openorcreatecasecontrolspreadsheet('case-control', cases, controls, markers)
        if prj.verbose:
            print('Performing case-control association tests')
        return prj.associationstudy(data, 3, 0, bonferroni=1, fdr=1, genocounts=1,
            allelecounts=1, usepca=1, numcomponents=10 ...

MockDataSet (cont.)

Code: define and run project

    prj = GWAS(projectName='Mock_example',
        projectpath=os.path.join(phome, 'MockDataSet'),
        datapath=os.path.join(phome, 'MockDataSet', 'data'), ghi=ghi)
    # These two options are turned on by default
    prj.setoption(verbose=True, cautious=True)
    # load genotype data
    prj.extractgenotype('MockData', 'Mock_geno', filename='MockD ...
    prj.loadgenotype(name='Mock_geno')
    prj.applygeneticmap('HelixResult.csv', 'C', 0, markerid=1, d ...
    # indinfo, markerinfo = prepareindandmarkerinfo(prj)
    # data analysis function
    eval('%s(prj)' % projname)
    # save results
    maffilter = greaterthanfilter('Minor Allele Freq.', 0.05)
    chi2filter = lessthanfilter('Chi-Squared P', 0.001)
    resultname = '%s_result' % projname

MockDataSet (cont.)

Code: save results

    prj.saveresult(resultname,
        os.path.join(phome, 'MockData', '%s.csv' % resultname),
        columns=resultcolumns)
    prj.saveresult(resultname,
        os.path.join(phome, 'MockData', '%s-chi2-0.001.csv' % resultname),
        columns=resultcolumns, filter=chi2filter)
    prj.saveresult(resultname,
        os.path.join(phome, 'MockData', '%s-maf-0.05-chi2-0.001.csv' % resultname),
        columns=resultcolumns, filter=andfilter(maffilter, chi2filter))

Available at pcprhelix:yxu/research/projects/mockdataset/
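Composable result filters such as a minor allele frequency cutoff combined with a p-value cutoff can be built from small functions that each return a row predicate. The slides do not show how `greaterthanfilter`, `lessthanfilter`, and `andfilter` are implemented, so the sketch below uses differently named, hypothetical helpers and made-up result rows to illustrate one possible design.

```python
def greater_than_filter(column, cutoff):
    """Build a predicate that keeps rows whose column value is above cutoff."""
    return lambda row: row[column] > cutoff


def less_than_filter(column, cutoff):
    """Build a predicate that keeps rows whose column value is below cutoff."""
    return lambda row: row[column] < cutoff


def and_filter(*filters):
    """Build a predicate that keeps rows accepted by every given filter."""
    return lambda row: all(f(row) for f in filters)


# Made-up association results, one dictionary per marker.
rows = [
    {"Marker": "rs1", "Minor Allele Freq.": 0.20, "Chi-Squared P": 0.0004},
    {"Marker": "rs2", "Minor Allele Freq.": 0.01, "Chi-Squared P": 0.0002},
    {"Marker": "rs3", "Minor Allele Freq.": 0.30, "Chi-Squared P": 0.4000},
]

maf_filter = greater_than_filter("Minor Allele Freq.", 0.05)
chi2_filter = less_than_filter("Chi-Squared P", 0.001)
both = and_filter(maf_filter, chi2_filter)

# Keep only markers that pass both cutoffs.
selected = [row["Marker"] for row in rows if both(row)]
```

Treating each filter as a plain function keeps the saving step independent of which cutoffs are applied.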

What do I actually need to do?

- Create a project
- Prepare datasets: indinfo, markerinfo
- Select cases/controls/markers
- Call associationstudy()
- Save results

gwasutil.py

Defined classes of utility functions

- Write to a log file of screen output
- Defined filters
- Prepare info files
- Data-sources for saving results
- Plot figures using R
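Copying screen output to a log file is often done with a small "tee" object that forwards each write to several streams. The contents of gwasutil.py are not shown on the slides, so the `Tee` class and `run_logged` helper below are hypothetical and only sketch the general technique.

```python
import io
import sys


class Tee(object):
    """Forward everything written to it to several streams at once."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def run_logged(func, log_stream):
    """Run func() while mirroring printed output to log_stream."""
    original = sys.stdout
    sys.stdout = Tee(original, log_stream)
    try:
        return func()
    finally:
        # Always restore the real stdout, even if func() fails.
        sys.stdout = original


# An in-memory log is used here; a real script would pass an open file.
log = io.StringIO()
run_logged(lambda: print("Performing case-control association tests"), log)
```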

Summary

- is a good genetic analysis software
- Python is a great programming language
- Scripting is not difficult
- Scripting is very important and valuable

Thank you!