Accelerating the Prediction of Protein Interactions
- Janis Barton
1 Accelerating the Prediction of Protein Interactions
Alex Rodionov, Jonathan Rose, Elisabeth R.M. Tillier, Alexandr Bezginov
October 21 21
2 Motivation
The human genome has been sequenced, but we don't know what all of the genes do. Genes code for proteins, and the genome carries out its function largely through interactions between proteins. We can therefore learn about proteins and the genome by studying protein-protein interactions.
3 Motivation
The best-known way of studying interactions is in the lab (high-throughput experiments). With many proteins there are O(N^2) possible pairs, so we would like to predict which protein pairs interact before undertaking tedious and expensive lab experiments.
4 Coevolution
One way to predict protein-protein interactions is to use coevolution: two proteins that interact tend to evolve over time at similar rates.
5 Coevolution: Background
A protein is a string of amino acids. Amino acids are encoded by DNA codons, each a string of three bases from {A, G, T, C}. There are 4^3 = 64 possible codons, which encode the 20 amino acids that occur in nature (plus stop signals).
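The following sketch checks the codon arithmetic. Enumerating every three-letter string over the four DNA bases gives 4^3 = 64 codons. That is far more than the 20 amino acids they encode, so several codons map to the same amino acid.

```python
from itertools import product

# The four DNA bases; a codon is a string of three of them.
BASES = "AGTC"

# Enumerate every possible codon: 4 choices for each of 3 positions.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

print(len(codons))  # 64
```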
6 Coevolution: Background
Proteins interact physically at amino acid sites.
[Figure: protein A and protein B in contact]
7 Coevolution: Background
Mutations can cause amino acid substitutions. A substitution in one protein (A becomes A') can break its interaction with its partner, which makes the organism more likely to die.
8 Coevolution: Background
There is therefore pressure for BOTH proteins to evolve together (A' and B'), so that the interaction is maintained.
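9 Coevolution: Summary
Interaction leads to coevolution, which leaves a shared evolutionary history. Finding a shared evolutionary history is therefore evidence of interaction.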
10 MatrixMatchMaker
MatrixMatchMaker (MMM) is a method of detecting protein coevolution, developed by Elisabeth Tillier and Robert Charlebois (Department of Molecular Biophysics). It has been shown to work better than other methods, at the expense of compute time.
11 MMM: Big Picture
MMM looks at the evolutionary histories of two proteins. It measures how similar those histories are by looking for sections they have in common, and generates a numerical score indicating the strength of the evidence for coevolution.
12 MMM: Homologous Proteins
Many versions of one protein can exist, with slight variations, in different species. These homologous protein variants form a family.
[Figure: amino acid sequences of one protein in human, mouse, chicken, rabbit and frog]
13 MMM: Homologous Proteins
The differences among the homologous proteins provide an evolutionary context, or history, for the family. How can we quantify these differences and this history?
14 MMM: Distance Matrices
A protein is a sequence of amino acids, i.e. a string. Given two such strings, we can calculate a number representing how different they are (a distance). For example, MADSTHRNMILEVNDEFHT and MLEIMTHRNMILEVNRRFYY align as MAD-STHRNMILEVNDEFHT and MLEIMTHRNMILEVNRRFYY, with a distance of 0.4.
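The slide does not say which distance measure MMM uses. As an assumption for illustration, a simple p-distance reproduces the 0.4 above: 8 of the 20 aligned columns differ. A p-distance is the fraction of aligned columns in which two sequences differ, counting a gap against a residue as a difference. The function name `p_distance` is hypothetical.

```python
def p_distance(s: str, t: str) -> float:
    """Fraction of aligned columns in which two sequences differ.

    Assumes s and t are already aligned (same length, '-' marks a gap).
    Columns where both sequences have a gap are skipped; a gap against a
    residue counts as a difference.
    """
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(s, t) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    differences = sum(1 for x, y in columns if x != y)
    return differences / len(columns)


print(p_distance("MAD-STHRNMILEVNDEFHT", "MLEIMTHRNMILEVNRRFYY"))  # 0.4
```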
15 MMM: Distance Matrices
A multiple sequence alignment aligns all the members of a protein family, and all pairwise distances can then be calculated from it. These distances form a distance matrix. For example, a family of four aligned proteins:
p1 MAD-STHRNMILEVNDEFHT
p2 MVDASTHRNMILEVNDEFTI
p3 MID-MTHRNMILEVNDEFHT
p4 MLEIMTHRNMILEVNRRFYY
[Figure: 4 × 4 distance matrix over p1 to p4]
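This sketch shows how the pairwise distances of an aligned family become a distance matrix, using the four aligned sequences above. The distance measure is an assumption and not necessarily the one MMM uses. It is a hypothetical `p_distance`: the fraction of differing aligned columns, skipping columns where both sequences have a gap.

```python
def p_distance(s, t):
    # Hypothetical measure: fraction of aligned columns that differ,
    # skipping columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(s, t) if not (x == "-" and y == "-")]
    return sum(x != y for x, y in columns) / len(columns)

# The aligned family from the slide.
family = {
    "p1": "MAD-STHRNMILEVNDEFHT",
    "p2": "MVDASTHRNMILEVNDEFTI",
    "p3": "MID-MTHRNMILEVNDEFHT",
    "p4": "MLEIMTHRNMILEVNRRFYY",
}

def distance_matrix(aligned):
    """Return the protein names and the symmetric matrix of pairwise distances."""
    names = list(aligned)
    matrix = [[p_distance(aligned[a], aligned[b]) for b in names] for a in names]
    return names, matrix

names, D = distance_matrix(family)
for name, row in zip(names, D):
    print(name, " ".join(f"{d:.2f}" for d in row))
```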
16 MMM: Evolutionary History
A distance matrix implies an evolutionary history of the protein family. Submatrices represent the histories of subsets of the homologous variants.
[Figure: distance matrix over p1 to p4 with a highlighted submatrix]
17 MMM: Big Picture Revisited
MMM looks at two distance matrices, A and B, which represent the evolutionary histories of two proteins (two families of homologous proteins). It measures the similarity of the histories by looking for similar submatrices. It then outputs the size of the largest similar submatrices as a score indicating the strength of the evidence for coevolution.
18 MMM: Sub-matrix Similarity
Distances within a matrix are relative to the family, not absolute, so submatrix equality isn't enough for similarity. Instead, we use equality up to a scale factor (with some tolerance).
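19 MMM: Sub-matrix Similarity
Submatrices need to be at least 3 × 3 for the concept of similarity to make sense. Two 2 × 2 distance submatrices, with off-diagonal entries x and y, are always similar (up to the scale factor x/y). Two similar K × K submatrices create K pairings of homologous proteins. For example, the submatrix of A over a2, a8, a9 and the submatrix of B over b3, b5, b7 give the pairs (a2, b3), (a8, b5), (a9, b7).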
20 MMM: Sub-matrix Similarity
Additional constraint: both proteins in a pairing must belong to the same species. The size of the largest similar submatrices measures the amount of coevolution. The homologous proteins paired up by the submatrices are also useful for other purposes (downstream analysis tools).
21 MMM: Problem Definition
Inputs: an n × n distance matrix A over proteins a1, ..., an (entries $A_{i,j}$), an m × m distance matrix B over proteins b1, ..., bm (entries $B_{i,j}$), and a tolerance $\alpha \in [0, 1]$.
22 MMM: Problem Definition
Submatrices $A'$ of $A$ and $B'$ of $B$, each $k \times k$ (over proteins $a'_1, \dots, a'_k$ and $b'_1, \dots, b'_k$), form a match $M = \{(a'_1, b'_1), (a'_2, b'_2), \dots, (a'_k, b'_k)\}$ iff:
1: $\alpha_1 \frac{A'_{u,v}}{B'_{u,v}} \le \frac{A'_{i,j}}{B'_{i,j}} \le \frac{1}{\alpha_1} \frac{A'_{u,v}}{B'_{u,v}}$ for all $i \neq j$, $u \neq v$, where $\alpha_1 = 1 - \alpha$
2: $A'_{i,j} \neq 0$ and $B'_{i,j} \neq 0$ for all $i \neq j$
3: $\mathrm{species}(a'_i) = \mathrm{species}(b'_i)$ for all $i$
23 MMM: Problem Definition
$\alpha \to 0$: stricter matching. $\alpha \to 1$: more lenient matching.
Goal: find the set of matches of largest size.
Outputs: the largest match size, and the protein pairs of each such match.
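The three match conditions can be checked directly. This is a minimal sketch under several assumptions:
- distance matrices are stored as nested lists;
- a pairing is given as index pairs `(i, j)` (protein a_i with b_j);
- there is one species label per protein;
- the tolerance is used as $\alpha_1 = 1 - \alpha$.

The function `is_match` and its parameters are hypothetical names. The example also shows why similarity needs at least 3 × 3 submatrices: a pairing of only two proteins produces a single ratio, which always passes condition 1.

```python
from itertools import combinations

def is_match(A, B, pairs, species_a, species_b, alpha):
    """Return True if `pairs` (a list of (i, j): protein a_i paired with b_j)
    satisfies the three match conditions with tolerance alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("this sketch needs 0 <= alpha < 1")
    alpha1 = 1.0 - alpha
    # Condition 3: both proteins in every pairing come from the same species.
    if any(species_a[i] != species_b[j] for i, j in pairs):
        return False
    ratios = []
    for (i, j), (k, l) in combinations(pairs, 2):
        # Condition 2: the distances used in a ratio must be nonzero.
        if A[i][k] == 0 or B[j][l] == 0:
            return False
        ratios.append(A[i][k] / B[j][l])
    # Condition 1: every ratio lies within [alpha1 * r, r / alpha1] of every other r.
    return all(alpha1 * r <= s <= r / alpha1 for r in ratios for s in ratios)


A = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
B = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]      # B is A scaled by 1/2
B_off = [[0, 1, 2], [1, 0, 4], [2, 4, 0]]  # one entry breaks proportionality
species = ["human", "mouse", "frog"]
identity = [(0, 0), (1, 1), (2, 2)]

print(is_match(A, B, identity, species, species, 0.1))          # True
print(is_match(A, B_off, identity, species, species, 0.1))      # False
print(is_match(A, B_off, identity[:2], species, species, 0.1))  # True: a single ratio
```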
24 Initial Algorithm
Tillier et al. had already made a first attempt at an algorithm. It took 6 days to process ~6 million matrix pairs.
25 Initial Algorithm
For all protein triplets (a, b, c) from A:
  For all protein triplets (w, x, y) from B:
    If {(a, w), (b, x), (c, y)} is a match then:
      (*) For each remaining protein d from A (!) and each remaining protein z from B:
            If the current match plus (d, z) is also a match, add (d, z) and go to (*).
          Otherwise, record the current match.
          Remove the latest pair from the match and go to (!) to resume the loop.
Keep the largest matches, clearing the list when a larger match is found, and report the match list at the end.
This approach is slow and exhaustive: there is no pruning of the recursion tree.
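Below is a Python sketch of the initial exhaustive search. Recursion replaces the `goto` labels. The names `initial_search` and `is_match` are hypothetical; `is_match` is a compact restatement of the match conditions, assuming tolerance $\alpha_1 = 1 - \alpha$. Nothing is pruned, which is what makes the search slow.

```python
from itertools import combinations, permutations

def is_match(A, B, pairs, sp_a, sp_b, alpha):
    """Compact version of the three match conditions (needs alpha < 1)."""
    a1 = 1.0 - alpha
    if any(sp_a[i] != sp_b[j] for i, j in pairs):
        return False
    ratios = []
    for (i, j), (k, l) in combinations(pairs, 2):
        if A[i][k] == 0 or B[j][l] == 0:
            return False
        ratios.append(A[i][k] / B[j][l])
    return all(a1 * r <= s <= r / a1 for r in ratios for s in ratios)

def initial_search(A, B, sp_a, sp_b, alpha):
    best_size, best = 0, set()

    def record(match):
        nonlocal best_size, best
        key = frozenset(match)
        if len(match) > best_size:      # larger match found: clear the list
            best_size, best = len(match), {key}
        elif len(match) == best_size:
            best.add(key)

    def extend(match):                  # the '*' step: try to grow the match
        used_a = {i for i, _ in match}
        used_b = {j for _, j in match}
        extended = False
        for d in range(len(A)):         # the '!' loop resumes here after each return
            if d in used_a:
                continue
            for z in range(len(B)):
                if z in used_b:
                    continue
                if is_match(A, B, match + [(d, z)], sp_a, sp_b, alpha):
                    extended = True
                    extend(match + [(d, z)])
        if not extended:
            record(match)               # no pair extends it: record the match

    for tri_a in combinations(range(len(A)), 3):
        for tri_b in permutations(range(len(B)), 3):
            start = list(zip(tri_a, tri_b))
            if is_match(A, B, start, sp_a, sp_b, alpha):
                extend(start)
    return best_size, best


A = [[0, 2, 3, 5], [2, 0, 7, 11], [3, 7, 0, 13], [5, 11, 13, 0]]
B = [[x / 2 for x in row] for row in A]   # B proportional to A
species = ["human"] * 4
size, matches = initial_search(A, B, species, species, 0.1)
print(size)  # 4
```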
26 New Algorithm
Our (ECE) work begins here. The goal is a faster algorithm that maintains correctness. Big picture: recast the MMM problem as a graph problem, and use well-known, efficient algorithms to solve the subproblems.
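27 New Algorithm: Representation
Vertices are the allowable protein pairs $(a_i, b_j)$ (pairs from the same species). The edge between two vertices carries the ratio of the corresponding distance matrix entries. For example, the edge between vertices a3b1 and a4b2 has ratio $A_{3,4} / B_{1,2}$.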
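Here is a minimal sketch of building this graph. It assumes that "allowable" means the two proteins come from the same species. It also creates no edge when two vertices share a protein or a distance is zero; these assumptions follow the match conditions rather than anything stated on this slide. `build_graph` is a hypothetical name.

```python
from itertools import combinations

def build_graph(A, B, species_a, species_b):
    """Vertices: pairs (i, j) of proteins a_i and b_j from the same species.
    Edges: ratio A[i][k] / B[j][l] between vertices (i, j) and (k, l)."""
    vertices = [(i, j)
                for i in range(len(A)) for j in range(len(B))
                if species_a[i] == species_b[j]]
    edges = {}
    for (i, j), (k, l) in combinations(vertices, 2):
        if i == k or j == l:
            continue  # the pairs share a protein, so they cannot both be in a match
        if A[i][k] == 0 or B[j][l] == 0:
            continue  # ratio undefined or zero; assumed to give no edge
        edges[((i, j), (k, l))] = A[i][k] / B[j][l]
    return vertices, edges


A = [[0, 4, 6], [4, 0, 8], [6, 8, 0]]
B = [[0, 2], [2, 0]]
vertices, edges = build_graph(A, B, ["human", "mouse", "human"], ["human", "mouse"])
print(vertices)  # [(0, 0), (1, 1), (2, 0)]
print(edges)     # {((0, 0), (1, 1)): 2.0, ((1, 1), (2, 0)): 4.0}
```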
28 New Algorithm: Representation
Ratios $R_1$ and $R_2$ are compatible within tolerance $\alpha \in [0, 1]$ iff $\alpha_1 R_1 \le R_2 \le R_1 / \alpha_1$, with $\alpha_1 = 1 - \alpha$. Compatible edges are therefore edges with similar ratios.
[Figure: ratio axis showing the compatible range for $R_2$, from $\alpha_1 R_1$ to $R_1 / \alpha_1$, around $R_1$]
29 New Algorithm: Representation
A match is a clique whose edges are mutually compatible. Example match on a graph over a1 to a6 and b1 to b4: (a1, b4), (a3, b1), (a5, b3), (a6, b2).
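30 New Algorithm: Method
Every match has an edge of minimum ratio, $R_{min}$. The edges in a match are mutually compatible iff each of them is forward-compatible with the edge of minimum ratio, i.e. lies in the range $[R_{min}, R_{min} / \alpha_1]$ (with $\alpha_1 = 1 - \alpha$). If every ratio lies in that range, then for any two ratios the larger is at most $R_{min} / \alpha_1$, which is at most the smaller divided by $\alpha_1$. Conversely, mutual compatibility requires every ratio to be compatible with $R_{min}$.
[Figure: ratios $R_1, \dots, R_6$ on an axis, with $R_{min} = R_3$ and its forward-compatible range]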
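The equivalence can be checked numerically. The check takes compatibility to mean $\alpha_1 R_1 \le R_2 \le R_1 / \alpha_1$ with $\alpha_1 = 1 - \alpha$, and uses the tolerance α = 0.1 from the results. It compares the pairwise definition against the single check against the minimum ratio on random ratio lists. The function names are hypothetical.

```python
import random

def compatible(r1, r2, alpha1):
    # alpha1 * r1 <= r2 <= r1 / alpha1, with the left inequality
    # rewritten equivalently as r1 <= r2 / alpha1.
    return r2 <= r1 / alpha1 and r1 <= r2 / alpha1

def mutually_compatible(ratios, alpha1):
    # Definition: every pair of edge ratios is compatible.
    return all(compatible(r, s, alpha1) for r in ratios for s in ratios)

def forward_compatible_with_min(ratios, alpha1):
    # Shortcut: every ratio lies in [r_min, r_min / alpha1].
    r_min = min(ratios)
    return all(r <= r_min / alpha1 for r in ratios)

alpha1 = 1.0 - 0.1  # tolerance alpha = 0.1
rng = random.Random(0)
agree = 0
for _ in range(10_000):
    ratios = [rng.uniform(0.5, 2.0) for _ in range(rng.randint(1, 6))]
    agree += mutually_compatible(ratios, alpha1) == forward_compatible_with_min(ratios, alpha1)
print(agree)  # 10000
```

The shortcut needs one comparison per edge instead of one per pair of edges. This is what lets the method fix the minimum-ratio edge first and then filter everything else against it.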
31 New Algorithm: Method
Go through every edge e in the graph and assume that e is the edge of minimum ratio of some match(es). Then work backwards to find the largest of those matches. Once e is picked, this amounts to finding the maximum cliques of a subgraph H, where:
V(H) = the vertices adjacent to both endpoints of e through edges forward-compatible with e
E(H) = the edges forward-compatible with e
Repeating this for every e finds all of the largest matches.
32 New Algorithm: Method
Step 1: For each vertex $v_x$, sort its neighbours by increasing edge ratio.
33 New Algorithm: Method
Step 2: For each neighbour $v_y$ with $y > x$, set minratio to the ratio of the edge between $v_x$ and $v_y$.
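34 New Algorithm: Method
Step 3: Find the candidate vertices whose edges to both $v_x$ and $v_y$ are forward-compatible, i.e. have ratios between minratio and minratio · δ. Thanks to the sorting, neighbours of $v_x$ whose ratios are below minratio can be skipped without being examined.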
35 New Algorithm: Method
Step 4: Run a maximum clique algorithm on the subgraph induced by the candidates, ignoring edges between candidates that are not forward-compatible. Every maximum clique found, together with $v_x$ and $v_y$, is one of the largest matches for this choice of $v_x$ and $v_y$.
36 New Algorithm: Method
Repeat for every choice of $v_x$ and $v_y$; each choice creates a max clique problem. Keep a list of the largest matches and the largest match size.
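The four steps can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the authors' code:
- the graph is a dict mapping each integer vertex to `{neighbour: ratio}`;
- δ is taken to be 1/(1 − α), the upper end of the forward-compatible range relative to minratio;
- a plain branch-and-bound enumerator stands in for Östergård's algorithm.

All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def max_cliques(adj, cands, lo, hi):
    """All maximum cliques among `cands`, using only edges with ratio in [lo, hi].
    Simple branch and bound; a stand-in for Ostergard's algorithm."""
    size, found = 0, []

    def grow(clique, rest):
        nonlocal size, found
        if not rest:
            if len(clique) > size:
                size, found = len(clique), [clique]
            elif len(clique) == size:
                found.append(clique)
            return
        if len(clique) + len(rest) < size:
            return  # bound: this branch cannot reach the current maximum
        v, others = rest[0], rest[1:]
        # Include v, keeping only forward-compatible neighbours of v
        # (non-neighbours get -1.0, which is never in range).
        grow(clique + [v], [w for w in others if lo <= adj[v].get(w, -1.0) <= hi])
        grow(clique, others)  # exclude v

    grow([], list(cands))
    return size, found

def largest_matches(adj, alpha):
    delta = 1.0 / (1.0 - alpha)  # assumption: forward range is [minratio, minratio * delta]
    best_size, best = 0, set()
    for vx in sorted(adj):
        # Step 1: neighbours of vx sorted by increasing edge ratio.
        nbrs = sorted(adj[vx], key=lambda w: adj[vx][w])
        ratios = [adj[vx][w] for w in nbrs]
        for vy in nbrs:
            if vy <= vx:
                continue  # Step 2: only y > x, so each edge is used once
            minratio = adj[vx][vy]
            hi = minratio * delta
            # Step 3: the sort lets us skip neighbours of vx outside the range.
            window = nbrs[bisect_left(ratios, minratio):bisect_right(ratios, hi)]
            cands = [w for w in window
                     if w != vy and minratio <= adj[vy].get(w, -1.0) <= hi]
            # Step 4: maximum cliques on the candidates, plus vx and vy.
            _, cliques = max_cliques(adj, cands, minratio, hi)
            for clique in cliques:
                match = frozenset(clique) | {vx, vy}
                if len(match) > best_size:
                    best_size, best = len(match), {match}
                elif len(match) == best_size:
                    best.add(match)
    return best_size, best


edges = {(0, 1): 2.0, (0, 2): 2.1, (1, 2): 1.95, (0, 3): 2.0, (1, 3): 5.0, (2, 3): 2.0}
adj = {v: {} for v in range(4)}
for (u, v), r in edges.items():
    adj[u][v] = adj[v][u] = r
size, matches = largest_matches(adj, 0.1)
print(size, sorted(sorted(m) for m in matches))  # 3 [[0, 1, 2], [0, 2, 3]]
```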
37 Max Clique Algorithm
We use the Östergård (2002) algorithm, a recursive branch-and-bound algorithm. Example trace:
clique {0, 5} (smaller than the max, not reported)
clique {0, 3, 5} (new max)
clique {0, 6} (smaller than the max, not reported)
back out when the branch can't match or beat the max of 3, and so on.
38 Max Clique Algorithm
Outer loop: for each vertex v, working backwards from the last vertex, run the branch-and-bound search on the vertices from v to the end, and record the maximum clique size found in MSPV[v]. This gives a new bound condition: if current size + MSPV[v] < max size, leave early.
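This is a sketch of Östergård's outer loop with the MSPV bound, written to report all maximum cliques. Ties are kept, so the bound uses a strict `<` as on the slide. The sketch assumes the following:
- vertices are numbered 0 to n − 1 in the chosen order;
- the adjacency is given as a list of sets;
- the early exit Östergård uses when only one maximum clique is needed is omitted.

The name `max_cliques_ostergard` is hypothetical.

```python
def max_cliques_ostergard(n, adj):
    """All maximum cliques of a graph on vertices 0..n-1 (adj: list of sets).
    Returns (max size, list of cliques, MSPV array)."""
    mspv = [0] * n          # mspv[v]: max clique size among vertices v..n-1
    best_size, best = 0, []

    def expand(U, clique):
        nonlocal best_size, best
        if not U:
            if len(clique) > best_size:
                best_size, best = len(clique), [clique]
            elif len(clique) == best_size:
                best.append(clique)
            return
        while U:            # U stays sorted by vertex number
            if len(clique) + len(U) < best_size:
                return      # too few candidates left to match the max
            v = U[0]
            if len(clique) + mspv[v] < best_size:
                return      # MSPV bound: leave early
            U = U[1:]
            expand([w for w in U if w in adj[v]], clique + [v])

    for v in range(n - 1, -1, -1):  # work backwards from the last vertex
        expand([w for w in range(v + 1, n) if w in adj[v]], [v])
        mspv[v] = best_size
    return best_size, best, mspv


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4), (0, 5)]
adj = [set() for _ in range(6)]
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)
size, cliques, mspv = max_cliques_ostergard(6, adj)
print(size, sorted(cliques), mspv)  # 3 [[0, 1, 2], [2, 3, 4]] [3, 3, 3, 2, 1, 1]
```

Because MSPV is non-increasing along the vertex order, failing the bound for the next vertex means every later vertex fails it too. That is why the search can leave the whole loop rather than just skip one vertex.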
39 Max Clique Algorithm
The ordering of the vertices is important for performance. Ordering recommended by Östergård: perform a greedy vertex coloring, sort the vertices by decreasing color class, and sort by decreasing degree within each color class.
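The following is a sketch of that ordering. The slide does not say in which order the greedy coloring visits the vertices; this sketch assumes decreasing degree. `coloring_order` is a hypothetical name. It returns the vertex order described above together with the coloring.

```python
def coloring_order(adj):
    """adj: dict vertex -> set of neighbours.
    Returns (vertex order, colour of each vertex)."""
    color = {}
    # Greedy colouring; visiting vertices by decreasing degree is an assumption.
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c  # smallest colour not used by an already-coloured neighbour
    # Decreasing colour class, then decreasing degree within a class.
    order = sorted(adj, key=lambda u: (-color[u], -len(adj[u])))
    return order, color


adj = {0: {1, 2, 5}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}, 5: {0}}
order, color = coloring_order(adj)
print(order)  # [1, 4, 0, 3, 2, 5]
```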
40 New Algorithm: Results
Data set: ~3,500 matrices, ~6 million matrix pairs. The matrices represent proteins with at least 1 human variant. The tolerance α was set to 0.1. We compared the total and per-problem runtimes of the new and old algorithms.
41 New Algorithm: Results
Only 88 pairs had nonzero runtimes with the new algorithm.
Geometric mean speedup: 97x.
Total speedup: 377x (from 6 days to 21 minutes).
42 Hardware Acceleration
This is the FPGA-related part of this talk. Max clique is NP-complete, while the setup portion of the algorithm is O(n^4). For larger problems, max clique accounts for nearly 100% of the time. Setup tasks include things like sorting, which are best left on the CPU. The max clique algorithm is mostly serial, but there are LOTS of max clique problems to solve!
43 Max Clique: FPGA vs GPU
The max clique algorithm is recursive and depth-first, and not data-parallel at all. Its input data is an adjacency matrix, which can be represented as an array of bits.
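Representing each row of the adjacency matrix as a bitmask turns a core operation of the clique search into a single bitwise AND. That operation is intersecting a candidate set with a vertex's neighbourhood. Below is a small sketch using Python integers as bitsets; the helper names are hypothetical. In hardware the same AND would be applied across a wide word at once.

```python
def to_bitsets(n, edges):
    """Adjacency matrix as one integer bitmask per vertex:
    bit w of rows[v] is set when v and w are adjacent."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows

def members(mask):
    """List the vertex indices whose bits are set in mask."""
    out, i = [], 0
    while mask:
        if mask & 1:
            out.append(i)
        mask >>= 1
        i += 1
    return out


rows = to_bitsets(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4), (0, 5)])
# Candidates after choosing vertex 0 and then vertex 1: one AND per step.
cands = rows[0] & rows[1]
print(members(cands))  # [2]
```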
44 Hardware Platform
Terasic DE3: Stratix III L340, USB 2.0, 1 GB DDR2 memory. Hopefully porting to the DE4 (PCI Express)!
45 HW/SW Interface
'Ports' package over USB 2.0. To the software, it looks like open/read/write C calls. Data is sent via TCP/IP to the computer hosting the DE3. An auto-generated hardware block feeds the desired signals, plus handshaking, to the design. The interface is itself a work in progress.
46 Hardware: System Block Diagram
[Figure: host PC running the MMM executable, the algorithm library, MMM ports and a USB daemon, exchanging matrices and cliques over USB with the DE3 board (USB PHY, Stratix III FPGA containing the Max Clique Solver, DDR2 memory)]
47 Hardware: FPGA Block Diagram
[Figure: FPGA blocks: FIFOs to USB and a port mux, main ctl, fillmem and sched, a DDR2 IP core (266 MHz) with DDR2 interface and FIFO, several work units (WU) and a clique buffer; matrices flow in and cliques flow out]
48 Hardware: Work Unit
[Figure: work unit with a main state machine (main SM), an intersection state machine (isect SM) and intersection pipeline, an update clique unit, a 32 kb cache, the MSPV array, a vertex stack, a pointer stack and vstack pointers; it connects to the DDR2 interface, the clique buffer, and main control (max clique size, first vertex)]
49 Hardware: Progress and Future Work
No speed results yet.
Version 1: one WU, no DDR2, no MMM support. Used to verify WU operation.
Version 2: one WU, DDR2. Hardware ready, software support coming.
Version 3: multiple WUs, DDR2.
Problem: USB 2.0 bandwidth; we need PCIe.
50 Conclusion
Interesting biological problem. Interesting mathematical problem. Great algorithmic/software speedup achieved. Hardware work in progress.
51 Done
Questions?
6.375 Ray Tracing Hardware Accelerator Chun Fai Cheung, Sabrina Neuman, Michael Poon May 13, 2010 Abstract This report describes the design and implementation of a hardware accelerator for software ray
More informationAlgorithm Design Techniques (III)
Algorithm Design Techniques (III) Minimax. Alpha-Beta Pruning. Search Tree Strategies (backtracking revisited, branch and bound). Local Search. DSA - lecture 10 - T.U.Cluj-Napoca - M. Joldos 1 Tic-Tac-Toe
More informationKaisen Lin and Michael Conley
Kaisen Lin and Michael Conley Simultaneous Multithreading Instructions from multiple threads run simultaneously on superscalar processor More instruction fetching and register state Commercialized! DEC
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationData Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun
Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun Computer Systems Laboratory Stanford University http://www-hydra.stanford.edu A Chip Multiprocessor Implementation
More informationGlobal Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties
Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties From LCS to Alignment: Change the Scoring The Longest Common Subsequence (LCS) problem the simplest form of sequence
More informationREDUCING GRAPH COLORING TO CLIQUE SEARCH
Asia Pacific Journal of Mathematics, Vol. 3, No. 1 (2016), 64-85 ISSN 2357-2205 REDUCING GRAPH COLORING TO CLIQUE SEARCH SÁNDOR SZABÓ AND BOGDÁN ZAVÁLNIJ Institute of Mathematics and Informatics, University
More informationCS 470/570 Exam 6 Spring 2017 Solution
CS 470/570 Exam 6 Spring 2017 Solution CS 470 Score is based on your best 4 out of 6 problems. CS 570 Score is based on your best 5 out of 6 problems. Extra credit will be awarded if you can solve additional
More informationEfficient Implementation of a Generalized Pair HMM for Comparative Gene Finding. B. Majoros M. Pertea S.L. Salzberg
Efficient Implementation of a Generalized Pair HMM for Comparative Gene Finding B. Majoros M. Pertea S.L. Salzberg ab initio gene finder genome 1 MUMmer Whole-genome alignment (optional) ROSE Region-Of-Synteny
More informationDesign Space Exploration Using Parameterized Cores
RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR Design Space Exploration Using Parameterized Cores Ian D. L. Anderson M.A.Sc. Candidate March 31, 2006 Supervisor: Dr. M. Khalid 1 OUTLINE
More informationCost Optimal Parallel Algorithm for 0-1 Knapsack Problem
Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem Project Report Sandeep Kumar Ragila Rochester Institute of Technology sr5626@rit.edu Santosh Vodela Rochester Institute of Technology pv8395@rit.edu
More informationvs. GPU Performance Without the Answer University of Virginia Computer Engineering g Labs
Where is the Data? Why you Cannot Debate CPU vs. GPU Performance Without the Answer Chris Gregg and Kim Hazelwood University of Virginia Computer Engineering g Labs 1 GPUs and Data Transfer GPU computing
More informationFrom Smith-Waterman to BLAST
From Smith-Waterman to BLAST Jeremy Buhler July 23, 2015 Smith-Waterman is the fundamental tool that we use to decide how similar two sequences are. Isn t that all that BLAST does? In principle, it is
More informationFASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.
FASTA INTRODUCTION Definition (by David J. Lipman and William R. Pearson in 1985) - Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence
More informationCommunity detection algorithms survey and overlapping communities. Presented by Sai Ravi Kiran Mallampati
Community detection algorithms survey and overlapping communities Presented by Sai Ravi Kiran Mallampati (sairavi5@vt.edu) 1 Outline Various community detection algorithms: Intuition * Evaluation of the
More informationChapter 8 Main Memory
COP 4610: Introduction to Operating Systems (Spring 2014) Chapter 8 Main Memory Zhi Wang Florida State University Contents Background Swapping Contiguous memory allocation Paging Segmentation OS examples
More informationIncorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data
Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Ryan Atallah, John Ryan, David Aeschlimann December 14, 2013 Abstract In this project, we study the problem of classifying
More informationChapter 9 Memory Management
Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual
More information