Spotter Documentation Version 0.5, Released 4/12/2010


Purpose

Spotter is a program for delineating an association signal from a genome-wide association study using features such as recombination rates, genetic distance, linkage disequilibrium, and association p-values.

Requirements

- A program for calculating LD information called new_fugue, written by Goncalo Abecasis. It can be downloaded from: http://genome.sph.umich.edu/wiki/new_fugue
- Python version 2.5 or greater, but less than 3.0. Python 3 is a separate and incompatible branch.

Synopsis

Typical usage of Spotter:

    spotter --metal association_results_file.txt --snplist rs1,rs2,rs3

If you have a file that already contains a list of SNPs, you can do the following:

    spotter --metal association_results_file.txt --hits file_with_snps.txt

Spotter will parse out the rs### SNPs from that file and run the algorithm on each one.

Installation

Spotter is ready to run out of the box, as long as new_fugue and a Python interpreter are already installed on your system. We provide as a starting point:

- HapMap phase II CEU build 36 genotype files, for computing LD information
- HapMap recombination rates, build 36
- HapMap genetic map, build 36
- UCSC refflat table (gene information), build 36
- UCSC sequence gap table, build 36

These various pieces of data can be changed by the user.

Configuration

Users will likely wish to supply their own data. To change the data that Spotter uses, the user can edit the conf/config.xml file. Data sources listed under required_data can be provided by the user. Sources listed under created_data must be created by running bin/setup.py, a script for downloading and formatting data from the UCSC database.

Input

Spotter requires 2 pieces of information to run: a file containing association results, and a list of SNPs to use for defining regions.
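The list of SNPs can come either from a comma-separated string or from a file whose rs### identifiers are pattern matched. The following is a minimal sketch of both, not Spotter's own code: the function names `extract_rs_ids` and `parse_snplist` are hypothetical, the sketch assumes an identifier is "rs" followed by digits, and it is written for a current Python interpreter rather than the Python 2 series that Spotter itself requires.

```python
import re

# Assumption: an rs identifier is the letters "rs" followed by one or more digits.
RS_PATTERN = re.compile(r"\brs\d+\b")


def extract_rs_ids(text):
    """Return the unique rs identifiers found in text, in order of first appearance."""
    seen = set()
    ordered = []
    for match in RS_PATTERN.finditer(text):
        rs_id = match.group(0)
        if rs_id not in seen:
            seen.add(rs_id)
            ordered.append(rs_id)
    return ordered


def parse_snplist(value):
    """Split a comma-separated string such as "rs1,rs2,rs3" into identifiers."""
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    sample = "locus1\trs114141\t3\nsee also rs2 and rs114141\n"
    print(extract_rs_ids(sample))
    print(parse_snplist("rs1,rs2,rs3"))
```

Deduplicating while preserving order keeps one run per SNP even when a hits file mentions the same identifier several times; whether Spotter does the same is not stated in this document.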

The association results file should look like the following (note that this is just an example):

    snp         chr    pos          p-value
    rs114141    3      191414141    2.7343e-04

Each row should be a SNP, with its chromosome, position, and p-value (from GWAS or meta-analysis). Please take care to provide SNP positions that are from the same build as the HapMap and UCSC files provided with Spotter (build 36 / hg18). The file should be tab-delimited.

SNPs can be provided by using either the --snplist option or the --hits option (see the Options section below).

Options

General program options

-o, --out <file>
    Specify output file for report. Spotter attempts to create two files: one containing the identified intervals, and one containing information on each gene found within those intervals. For example, if you specify:

        --out my_project.txt

    Spotter will create the following two report files:

        my_project_intervals.txt
        my_project_genes.txt

--cache <file>
    Change the location of the LD cache file. Spotter caches LD on each run of the program so that it only needs to be computed once per locus. This greatly speeds up additional runs of the program. By default, Spotter creates a file called ld_cache.db in the current directory.

Options for specifying SNPs

--snplist <string>
    Run algorithm for each SNP in this list. The list must be specified in quotes, separated by commas. Example: --snplist rs1,rs2,rs3

--hits <file>
    Run algorithm for each SNP present in this file. Spotter attempts to pattern match rs### identifiers from the file.

Options for specifying association results

--metal <file>
    Association results file. See the section on Input for more information about the format of this file.

--delim <character>
    The delimiter of the association results file. Defaults to tab.

Options for specifying algorithm

--method <string>
    Specify which algorithm Spotter should use. Currently only a sliding window method is supported. See the Algorithm section below for more information.

Options for sliding window method

--slw_pval <float>
    P-value threshold for sliding window method. This should be specified as a raw p-value, and not a log-transformed one.

--slw_usepval
    Toggles the use of p-values for the sliding window. By default, p-values are not used. This should be enabled with caution, especially if running over many different loci: p-value scales are often very different for each locus.

--slw_r2 <float>
    LD (r^2) threshold. Default is 0.5.

--slw_ratethresh <float>
    Recombination rate threshold, in cM/Mb units. Default is 10.

--slw_winsize <int>
    Sliding window size, in bases. Default is 75,000 bases (75 kb).

--slw_userate
    Toggles using recombination rates instead of genetic distance.

--slw_cmdist <float>
    Genetic distance to travel beyond the LD interval, in cM. Default is 0.02.

--slw_usecm
    Toggles using genetic map distance. This is the default.

Output

Spotter creates two output files: one containing intervals identified by the algorithm, the other containing information on genes within each interval. See "--out" for controlling the names of these output files.

Algorithm

Sliding window method

Spotter's primary algorithm uses a sliding window approach. Within each window, the algorithm checks to see if a SNP exceeds either the p-value threshold or the LD threshold. If at least one SNP within the window exceeds either of these thresholds, the window slides forward and tries the same procedure again until failure. Once a window is found that does not contain a SNP exceeding either the p-value or LD threshold, the algorithm stops scanning. It then finds the last best SNP, i.e. the last SNP to pass the LD or p-value threshold, and expands beyond this point until either 1) the nearest recombination peak, or 2) a specified genetic distance.
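The scan described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Spotter's implementation: the `Snp` record, the function names, and the input layout are hypothetical; windows are assumed to advance by one full window size without overlap; "exceeding" the p-value threshold is taken to mean a p-value below it; the default p-value threshold of 1e-05 is borrowed from the worked example rather than from a documented default; and only the recombination-peak expansion is shown, with the genetic-distance alternative left out.

```python
from collections import namedtuple

# Hypothetical record: position in bases, association p-value,
# and LD (r^2) with the index SNP.
Snp = namedtuple("Snp", ["pos", "pval", "r2"])


def passes(snp, r2_thresh, pval_thresh, use_pval):
    """True if the SNP exceeds the LD threshold or, when enabled, the p-value threshold."""
    if snp.r2 > r2_thresh:
        return True
    return use_pval and snp.pval < pval_thresh


def scan_right(snps, index_pos, recomb, winsize=75000, r2_thresh=0.5,
               pval_thresh=1e-5, use_pval=False, rate_thresh=10.0):
    """Return the right-hand boundary of the interval around index_pos.

    recomb is a list of (position, rate in cM/Mb) pairs.
    """
    last_best = index_pos
    start = index_pos
    while True:
        end = start + winsize
        # Positions of SNPs inside the window (start, end] that pass a threshold.
        hits = [s.pos for s in snps
                if start < s.pos <= end
                and passes(s, r2_thresh, pval_thresh, use_pval)]
        if not hits:
            break  # first failing window: stop scanning
        last_best = max(hits)
        start = end  # assumption: slide forward by one full window
    # Expand from the last passing SNP to the nearest recombination peak.
    for pos, rate in sorted(recomb):
        if pos >= last_best and rate > rate_thresh:
            return pos
    # Assumption: with no peak to the right, stop at the last passing SNP.
    return last_best


if __name__ == "__main__":
    snps = [Snp(50000, 1e-6, 0.9), Snp(120000, 0.01, 0.6), Snp(200000, 0.5, 0.1)]
    recomb = [(60000, 25.0), (130000, 3.0), (160000, 12.0)]
    print(scan_right(snps, 1000, recomb))
```

Scanning to the left would mirror the same logic with decreasing positions, and the two boundaries together give the interval.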
The figure below shows how the algorithm works as it scans to the right from the index SNP (in this case, we choose the index SNP as the best p-value in the region). For this example, we use the following settings:

- P-value threshold: 1E-05 (--slw_pval 1e-05, --slw_usepval)
- LD threshold: r^2 > 0.4 (--slw_r2 0.5)
- Recombination peak: cM/Mb > 10 (--slw_ratethresh 10, --slw_userate)
- Sliding window size: 75 kilobases (--slw_winsize 75000)

Here, we've chosen a particular locus, and want to identify the interval near the index SNP. We start by scanning to the right of the index SNP:

- Window 1: succeeds, since SNPs underneath the window exceed both the p-value and LD thresholds
- Window 2: succeeds, since SNPs underneath the window exceed the LD threshold
- Window 3: fails, since no SNPs are above the p-value or LD threshold
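The walk-through takes the index SNP to be the best p-value in the region. A minimal sketch of that selection follows, assuming a tab-delimited association results file with one header row and the columns in the order snp, chr, pos, p-value. The function names and the region bounds are hypothetical, and this is not how Spotter itself reads its input.

```python
import csv
import io


def read_association_results(handle, delim="\t"):
    """Read rows of snp, chr, pos, p-value from an open text file."""
    reader = csv.reader(handle, delimiter=delim)
    next(reader, None)  # assumption: the first line is a header row
    rows = []
    for fields in reader:
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        snp, chrom, pos, pval = fields[:4]
        rows.append({"snp": snp, "chr": chrom,
                     "pos": int(pos), "pval": float(pval)})
    return rows


def pick_index_snp(rows, chrom, start, end):
    """Return the row with the smallest p-value in chrom:start-end, or None."""
    region = [r for r in rows
              if r["chr"] == chrom and start <= r["pos"] <= end]
    if not region:
        return None
    return min(region, key=lambda r: r["pval"])


if __name__ == "__main__":
    text = ("snp\tchr\tpos\tpvalue\n"
            "rs114141\t3\t191414141\t2.7343e-04\n"
            "rs2\t3\t191500000\t0.02\n")
    rows = read_association_results(io.StringIO(text))
    print(pick_index_snp(rows, "3", 191000000, 192000000))
```

Chromosomes are kept as strings so that labels such as X are handled the same way as numbered chromosomes.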

The last SNP to exceed either the p-value or LD threshold is shown with a red arrow. From here, we scan to the right to find the nearest recombination peak, shown with a blue arrow (it's actually quite difficult to see here, you may need to zoom in!). This exact same procedure would also be repeated to scan to the left of the index SNP (not shown here). The interval detected in this example is shown by two black lines (or the highlighted blue region in the gene track).

Figure: a visual overview of how the sliding window algorithm detects an association signal interval. The plot shows association results from a GWAS or meta-analysis of GWAS studies. Each point is a SNP, colored by its LD (r^2) value with the index SNP. The x-axis is genomic position, and the y-axis is the -log10 of the association p-value.

Visualizing Results

Plots such as the one in the figure above can be created by using our software at: http://csg.sph.umich.edu/locuszoom/. The interval selected by Spotter can be plotted using the "Highlight Region of Interest" feature under the "Plot Using Your Data" section, or by supplying histart=# and hiend=# parameters under the "Plots Using Your Data And Your Hitspec File" batch mode section.

Licensing

Spotter is written by Ryan Welch (welchr@umich.edu) and is copyrighted by the University of Michigan. This software is licensed under the MIT free software license:

Copyright (c) 2010, The University of Michigan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.