Iterative Learning of Single Individual Haplotypes from High-Throughput DNA Sequencing Data
1 Iterative Learning of Single Individual Haplotypes from High-Throughput DNA Sequencing Data
Zrinka Puljiz and Haris Vikalo
Electrical and Computer Engineering Department, The University of Texas at Austin
8th International Symposium on Turbo Codes & Iterative Information Processing, Bremen, Germany, August 18-22, 2014
Iterative Learning of Single Individual Haplotypes 1 / 22
2 Overview of the Talk
- Motivation and background
  - DNA sequencing and studies of genetic variations
- Haplotype assembly
  - data structure and problem formulation
  - graphical representation of the problem
  - existing methods
- Communication systems analogy and belief propagation
  - haplotype assembly as a decoding problem
  - belief propagation algorithm
  - performance analysis, comparison with existing methods
- Conclusions and future work
3 DNA Sequencing: Discovering the Genetic Blueprint
- Determine the order of nucleotides in a DNA sequence
- Human Genome Project: mapping the genetic blueprint
  - followed by sequencing more individuals and studies of genetic variations
4 Study of Genetic Variations in Humans
- Humans are diploid organisms with 23 pairs of chromosomes
  - chromosomes in a pair of autosomes are homologous
  - the most common type of variation is the single nucleotide polymorphism (SNP)
5 Study of Genetic Variations in Humans (Cont'd)
- Describing variations
  - SNP calling determines the locations and types of polymorphisms
  - based on the detected SNPs, perform genotype calling
  - example: A/T, A/C, G/T
- Genotypes provide only the list of unordered pairs of alleles
  - no association of alleles with one of the chromosomes in a pair
- The complete information is provided by haplotypes
  - the list of alleles at contiguous sites in a region of a chromosome
  - example: (A,C,G) and (T,A,T)
  - fundamental for many applications (personalized medicine!)
6 Single Individual Haplotyping
- Determine the haplotype of an individual using DNA sequencing
- The SNP rate is low, typically estimated to be $10^{-3}$
  - high-throughput DNA sequencing provides reads that are too short to link neighboring SNPs
  - instead, get pairs of fragments at opposite ends of a strand of known length
7 A Fragment Conflict Graph Interpretation
- Represent reads by nodes, conflicts by edges
  - fragments are in conflict if they cover a common SNP location but have different nucleotides there (so they come from different chromosomes)
- If the data is error-free, the conflict graph is bipartite
  - otherwise, the graph may contain odd cycles, in which case it is no longer bipartite
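
The conflict-graph test described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: reads are encoded as dictionaries mapping a SNP index to a 0/1 allele, and the names `in_conflict`, `conflict_graph` and `is_bipartite` are hypothetical.

```python
from collections import deque
from itertools import combinations


def in_conflict(a, b):
    """Two reads conflict if they share a SNP site and disagree there."""
    return any(a[j] != b[j] for j in a.keys() & b.keys())


def conflict_graph(reads):
    """Adjacency sets: one node per read, one edge per conflicting pair."""
    adj = {i: set() for i in range(len(reads))}
    for i, k in combinations(range(len(reads)), 2):
        if in_conflict(reads[i], reads[k]):
            adj[i].add(k)
            adj[k].add(i)
    return adj


def is_bipartite(adj):
    """BFS 2-coloring; returns (True, coloring) or (False, None)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False, None  # odd cycle found
    return True, color


# Hypothetical reads over 3 SNP sites, alleles coded 0/1.
clean = [{0: 0, 1: 1}, {1: 0, 2: 1}, {0: 0, 2: 0}]
# Same reads with one allele of the last read flipped (a sequencing error).
noisy = [{0: 0, 1: 1}, {1: 0, 2: 1}, {0: 1, 2: 0}]

ok, sides = is_bipartite(conflict_graph(clean))
print(ok)                                      # True
print(is_bipartite(conflict_graph(noisy))[0])  # False
```

In the error-free case the two color classes in `sides` correspond to the two chromosomes of origin; the single flipped allele in `noisy` turns the graph into a triangle, which cannot be 2-colored.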
8 Various Formulations of the Haplotype Assembly Problem
- If the conflict graph is not bipartite, assembly is non-trivial
- Approach: minimize the number of transformation steps needed to alter the graph so that it becomes bipartite
  - minimum edge removal (MER), minimum fragment removal (MFR), minimum SNP removal
- Minimum error correction (MEC): find the smallest number of nucleotides in reads whose flipping to a different value resolves conflicts among the fragments from the same chromosome
  - essentially, remove the odd cycles in the conflict graph by assuming the fewest possible sequencing errors
  - NP-hard; various methods: HapCut [Bansal & Bafna, 2008], HapCompass [Aguiar & Istrail, 2013], HapTree [Berger et al., 2014]
9 Minimum Error Correction Formulation
- Label bases in heterozygous sites as $h^1_i, h^2_i \in \{1, 0\}$
  - define $h = h^1 = \bar{h}^2 = [h^1_1\ h^1_2\ \dots\ h^1_n]$
- Each read is a ternary string with entries 0, 1 and x
  - organize reads into a matrix $R$; row $r_i$ is the $i$-th read
\[
R = \begin{bmatrix}
x & x & 0 & x & x & 1 \\
x & 1 & x & x & 0 & x \\
x & x & 0 & x & 0 & x \\
0 & x & x & 1 & x & x \\
1 & x & 1 & x & x & x \\
x & x & 1 & x & 0 & x \\
x & 0 & x & 0 & x & x \\
x & x & x & 0 & x & 0
\end{bmatrix}
\]
- The MEC formulation is concerned with minimizing $Z$ over $h$,
\[
Z = \sum_{i=1}^{m} \min\big(hd(r_i, h),\ hd(r_i, \bar{h})\big), \qquad hd(r_i, h) = \sum_{j=1}^{n} d(r_{i,j}, h_j)
\]
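
As a rough illustration of the MEC objective $Z$, the sketch below scores a candidate haplotype against a set of reads and, for tiny inputs only, finds a minimizer by exhaustive search. It assumes that $d(r_{i,j}, h_j)$ equals 1 when the read covers site $j$ and disagrees with $h_j$, and 0 otherwise, which the slide does not spell out. The string encoding of reads and the function names are hypothetical.

```python
from itertools import product


def hd(read, hap):
    """Mismatches between a read and a haplotype over the sites the read covers."""
    return sum(1 for r, h in zip(read, hap) if r != "x" and int(r) != h)


def mec_score(reads, hap):
    """Z: each read is charged its distance to the closer of h and its complement."""
    comp = [1 - b for b in hap]
    return sum(min(hd(r, hap), hd(r, comp)) for r in reads)


def brute_force_mec(reads):
    """Exhaustive search over all 2^n haplotypes; only viable for very small n."""
    n = len(reads[0])
    return min((mec_score(reads, list(h)), h) for h in product((0, 1), repeat=n))


# Hypothetical reads over 3 SNP sites; 'x' marks an uncovered site.
reads = ["01x", "x01", "0x0", "11x"]
score, hap = brute_force_mec(reads)
print(score, hap)
```

The exhaustive search is exponential in the number of SNP sites, consistent with the problem being NP-hard; it is included only to make the objective concrete.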
10 Structure of the Data Matrix
- Consider the error-free SNP fragment matrix
\[
R = \begin{bmatrix}
x & x & 0 & x & x & 1 \\
x & 1 & x & x & 0 & x \\
x & x & 0 & x & 0 & x \\
0 & x & x & 1 & x & x \\
1 & x & 1 & x & x & x \\
x & x & 1 & x & 0 & x \\
x & 0 & x & 0 & x & x \\
x & x & x & 0 & x & 0
\end{bmatrix}
\]
- Let $h = [\ ]$, and let the origin of the reads in $R$ be $s = [\ ]$. Then for a binary $R_{i,j}$ it holds that
\[
s_i \oplus h_j = R_{i,j}
\]
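
The relation $s_i \oplus h_j = R_{i,j}$ can be made concrete with a short sketch that generates an error-free fragment matrix from a haplotype, a read-origin vector, and a coverage pattern. The values of `h`, `s` and `coverage` below are made up for illustration and are not the ones from the slide; `fragment_matrix` is a hypothetical helper.

```python
def fragment_matrix(h, s, coverage):
    """R[i][j] = s[i] XOR h[j] where read i covers SNP j, otherwise 'x'."""
    return [
        [s_i ^ h_j if (i, j) in coverage else "x" for j, h_j in enumerate(h)]
        for i, s_i in enumerate(s)
    ]


h = [0, 1, 0]   # haplotype of one chromosome (hypothetical)
s = [0, 1, 0]   # s[i] = 0 if read i comes from h, 1 if from its complement
coverage = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)}

R = fragment_matrix(h, s, coverage)
for row in R:
    print(row)
```

Reads with `s[i] = 1` see the complemented alleles, so every binary entry of `R` is determined by one bit of `h` and one bit of `s`.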
11 Haplotype Assembly as a Decoding Problem
- Collect indices $\{(i_k, j_k)\}$ identifying positions where the $m \times n$ matrix $R$ has binary entries ($1 \le k \le M$)
- Define the code generating matrix $G$,
\[
G(l, k) = \begin{cases} 1, & \text{if } l = j_k \text{ or } l = i_k + n, \quad 1 \le k \le M, \\ 0, & \text{otherwise.} \end{cases}
\]
- Example: for a given $R$, we construct the corresponding $G$ [example matrices not recoverable from the transcription]
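
The definition of $G$ translates directly into code. The sketch below is a minimal illustration that uses 0-based indices (so row $j$ of $G$ corresponds to $h_{j+1}$ and row $n + i$ to $s_{i+1}$); the small matrix `R` and the function names are hypothetical and do not reproduce the slide's example.

```python
def binary_positions(R):
    """Row-major list of (i, j) positions where R holds a 0/1 entry."""
    return [(i, j) for i, row in enumerate(R) for j, v in enumerate(row) if v != "x"]


def generator_matrix(R):
    """(n + m) x M matrix: column k has ones in row j_k and in row n + i_k."""
    m, n = len(R), len(R[0])
    pos = binary_positions(R)
    G = [[0] * len(pos) for _ in range(n + m)]
    for k, (i, j) in enumerate(pos):
        G[j][k] = 1       # selects haplotype bit h_j
        G[n + i][k] = 1   # selects read-origin bit s_i
    return G, pos


# Hypothetical 2-read, 3-SNP fragment matrix.
R = [[0, 1, "x"],
     ["x", 0, 1]]
G, pos = generator_matrix(R)
for row in G:
    print(row)
```

Every column of `G` has exactly two ones, so each code bit is the XOR of one haplotype bit and one read-origin bit.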
12 Haplotype Assembly as a Decoding Problem (Cont'd)
- Define a message $m = [h\ s]$ and a codeword $c = mG$
  - $c$ collects the binary entries from an error-free data matrix $R$
- Due to sequencing errors, entries in $R$ are erroneously flipped
  - this can be interpreted as the effect of a binary symmetric channel on $c = mG$
  - formally, $y = c + e = [h\ s]G + e$, where $y_k = R(i_k, j_k)$
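
The encoding and channel steps can be sketched as follows, with all arithmetic over GF(2). The generator matrix `G`, the message bits and the flipped position are hypothetical values for a 2-read, 3-SNP instance; for reproducibility the channel flips an explicit set of positions rather than drawing errors at random.

```python
def encode(message, G):
    """Codeword c = m G over GF(2); message is the row vector [h s]."""
    return [sum(m_l * g for m_l, g in zip(message, col)) % 2 for col in zip(*G)]


def bsc(codeword, flips):
    """Binary symmetric channel modeled by an explicit set of flipped positions."""
    return [c ^ 1 if k in flips else c for k, c in enumerate(codeword)]


# One column per binary entry of R; rows are h_1, h_2, h_3, s_1, s_2.
G = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
h, s = [0, 1, 0], [0, 1]

c = encode(h + s, G)   # binary entries of the error-free R, in row-major order
y = bsc(c, flips={2})  # observed entries after one sequencing error
print(c, y)
```

Here `c` lists what an error-free experiment would record, and `y` is what the decoder actually sees.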
13 Graphical Model
- Graphical representation of the problem
14 Graphical Model (Cont'd)
- Haplotyping with the MEC criterion: minimum distance decoding using the parity check matrix $H$:
\[
\mathrm{MEC} = \min_{H(y + e) = 0} \|e\|_0
\]
15 Belief Propagation for Haplotype Assembly
- Graphical model for the belief propagation algorithm
16 Belief Propagation for Haplotype Assembly (Cont'd)
17 Belief Propagation for Haplotype Assembly (Cont'd)
18 Belief Propagation for Haplotype Assembly (Cont'd)
- Stopping criterion: threshold met or maximum number of iterations reached
19 Computational Complexity
- Belief propagation algorithm:
  - allow random restarts, MAXITER iterations
- Schemes relying on parity check need preprocessing:
  - parity check matrix transformation
- Complexity for each haplotype block: O((#SNP + #Reads) × (#entries in R))
  - this step depends on the locations of the binary entries in the matrix R
20 Results on 1000 Genomes Project Data
21 Performance Guarantees
- Found lower bounds on $\Pr\{\hat{h} \neq h\}$, $E[\|\hat{h} - h\|_0]$, $E[\#\text{switch errors}]$ [SVV, ITW 2014]
- Consider a haplotype of length $n$, error rate $p$, and probability of assembly error $P_e = \Pr\{\hat{h} \neq h \mid R\}$. The number of reads $m$ necessary for the assembly satisfies
\[
m \ge \frac{(1 - P_e)\, n}{2\,[1 - H(p)]}.
\]
- If $m = \Theta(n \ln n)$, one can determine $h$ accurately with high probability. Specifically, given a small target constant $\epsilon > 0$, there exists $n$ large enough such that by choosing $m = \Theta(n \ln n)$ the probability of error satisfies $P_e \le \epsilon$.
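
To get a feel for the necessary number of reads, the sketch below evaluates the bound numerically. It assumes the bound reads $m \ge (1 - P_e)\,n / (2[1 - H(p)])$ and that $H(p)$ is the binary entropy function in bits; both are readings of the slide rather than statements confirmed by it. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def min_reads(n, p, p_e):
    """Assumed lower bound on the number of reads; undefined at p = 0.5."""
    return (1 - p_e) * n / (2 * (1 - binary_entropy(p)))


# Hypothetical settings: 100 SNP sites, exact recovery required (P_e = 0).
print(min_reads(100, 0.0, 0.0))   # error-free reads
print(min_reads(100, 0.05, 0.0))  # 5% sequencing error rate
```

Under these assumptions the bound grows as the error rate approaches 0.5, where $1 - H(p)$ vanishes and no finite number of reads suffices.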
22 Summary and Future Work
- Developed a novel framework for haplotype assembly
  - rephrased assembly as a decoding problem
  - belief propagation algorithm as a solution
  - outperforms existing methods on 1000 Genomes Project data
- Several possible extensions
  - exploit possible prior SNP/genotype information
  - develop joint base/SNP/genotype calling and haplotype assembly schemes
- Explore other suitable methods and techniques
  - sparse low-rank matrix completion, spectral partitioning, correlation clustering
- Analyze limits of performance and the experimental conditions needed to achieve the desired accuracy
Mapping Reads to Reference Genome DNA carries genetic information DNA is a double helix of two complementary strands formed by four nucleotides (bases): Adenine, Cytosine, Guanine and Thymine 2 of 31 Gene
More informationWhen we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame
1 When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from
More informationStacked Denoising Autoencoders for Face Pose Normalization
Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University
More informationC LDPC Coding Proposal for LBC. This contribution provides an LDPC coding proposal for LBC
C3-27315-3 Title: Abstract: Source: Contact: LDPC Coding Proposal for LBC This contribution provides an LDPC coding proposal for LBC Alcatel-Lucent, Huawei, LG Electronics, QUALCOMM Incorporated, RITT,
More informationHaplotype Inference by Pure Parsimony with Constraint Programming
IT 09 050 Examensarbete 30 hp Oktober 2009 Haplotype Inference by Pure Parsimony with Constraint Programming Xiaoyue Pan Institutionen för informationsteknologi Department of Information Technology Abstract
More informationAccelerating InDel Detection on Modern Multi-Core SIMD CPU Architecture
Accelerating InDel Detection on Modern Multi-Core SIMD CPU Architecture Da Zhang Collaborators: Hao Wang, Kaixi Hou, Jing Zhang Advisor: Wu-chun Feng Evolution of Genome Sequencing1 In 20032: 1 human genome
More informationSection 7.12: Similarity. By: Ralucca Gera, NPS
Section 7.12: Similarity By: Ralucca Gera, NPS Motivation We talked about global properties Average degree, average clustering, ave path length We talked about local properties: Some node centralities
More informationReducing Genome Assembly Complexity with Optical Maps
Reducing Genome Assembly Complexity with Optical Maps Lee Mendelowitz LMendelo@math.umd.edu Advisor: Dr. Mihai Pop Computer Science Department Center for Bioinformatics and Computational Biology mpop@umiacs.umd.edu
More information