Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs
N. F. Samatova (1,2,+), M. C. Schmidt (1,2), W. Hendrix (1,2), P. Breimyer (1,2), K. Thomas (3) and B.-H. Park (2)

1 Computer Science Department, North Carolina State University, Raleigh, NC 27695, USA
2 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
3 Cray, Inc., Seattle, WA 98104, USA

M. C. Schmidt and W. Hendrix contributed equally.
+ Corresponding author: samatovan@ornl.gov

Abstract. Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often treated as a combinatorial graph problem in which an enumeration of all feasible models is sought. The data-intensive and NP-hard nature of such problems, however, makes it difficult for existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. First, we designed and implemented what is, to the best of our knowledge, the first parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. Second, we proposed and developed the Graph Perturbation Theory (GPT), which establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data.
GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs and reduces the enumeration time by more than 80% compared to complete recomputation.

© 2008 Ltd

1. Introduction

Many biological problems reduce to graph problems, with the maximal clique enumeration (MCE) problem being ubiquitous. The solutions of the MCE problem are used, for example, to align 3-dimensional protein structures [1], to integrate genome mapping data [2], to identify coexpressed genes [3], to identify common secondary structure elements of proteins [4], to detect protein-protein interaction complexes [5], to cluster similar mass spectrometry spectra [6], and to find clusters of orthologous genes [7]. The MCE problem is NP-hard [8], and its run time, in practice, scales exponentially with the problem size, unless P = NP [9]. The challenge remains of how to scale an MCE algorithm to graphs with hundreds, thousands, or even millions of vertices, which are not unusual for the problems considered in these examples.
The problem is exacerbated when, in addition to the size of the graphs, the uncertainty, or noise, in the data from which these graphs are derived is taken into account. In this case, multiple solutions of the MCE problem are often sought for various perturbed graphs. Perturbations may be induced by filtering out some edges due to applied edge weight cutoffs or by adding edges based on additional orthogonal information sources. For example, two genes in a gene expression network can be viewed as coexpressed (i.e., connected by an edge) if their Pearson correlation derived from microarray data is above a certain threshold; various thresholds will correspond to different network perturbations. Likewise, two proteins in a protein-protein interaction network can be considered interacting if, in addition to genomic-context information (e.g., their neighborhood colocation on the genome), mass spectrometry pull-down experiments become available. The challenge is to support such what-if explorations of biological graphs that are not only large but also uncertain and changing.

In order to address both challenges, two major, interrelated advancements have been achieved. On the one hand, we developed a scalable, parallel MCE algorithm. It not only efficiently handles the data-intensive nature of the MCE problem but offers linear speedup, even using thousands of processors to run the algorithm on real-world biological graphs with thousands or hundreds of thousands of vertices. To the best of our knowledge, this is the first parallel MCE algorithm that scales to this number of processors on such large-scale, real-world problem sizes. On the other hand, we proposed and advanced a new theory, which we call Graph Perturbation Theory (GPT), that establishes a foundation for solving graph problems in perturbed graphs.
The intuition behind GPT is quite simple: if a solution to the reference or unperturbed graph is known, then it can be used to find the exact solution for the perturbed graph more efficiently than complete recomputation, especially when the perturbation is relatively small. Specifically, we formulated necessary and sufficient conditions for the maximal cliques that are induced or eliminated by the addition or removal of an arbitrary number of edges to a reference graph. Based on this theory, we produced a practical MCE algorithm implementation for perturbed graphs. It enumerates the maximal cliques of a perturbed graph by efficiently indexing features derived from the MCE solution for the unperturbed graph and using them to detect the changes in the composition of maximal cliques induced by the target perturbations. We demonstrated more than 80% efficiency improvement compared to the traditional enumeration of maximal cliques in protein interaction networks for multiple organisms, even when the number of added edges (i.e., perturbations) ranged between 20% and 136%.

2. Parallel MCE algorithm

Our current parallel MCE algorithm is a parallelization of the widely used method of enumerating maximal cliques developed by Bron and Kerbosch (BK) [10]. Our previously developed pclique [11], the first parallel MCE algorithm, extends the algorithm of Kose et al. [12] (dubbed KOSE). In principle, KOSE is identical in spirit to the BK algorithm; it branches using alphanumeric ordering. However, whereas BK is a recursive algorithm with depth-first search (DFS) branching, KOSE is a serialized algorithm with breadth-first search (BFS) branching (see [13] for DFS and BFS definitions), which allows cliques of size $k$ to be generated from cliques of size $k - 1$. Consequently, all maximal cliques are produced in lexicographic order, which is an invaluable asset in certain applications. Nevertheless, the BFS branching strategy inevitably makes KOSE memory-intensive.
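For orientation, the recursive, depth-first branching of BK can be sketched as follows. This is a minimal illustration of the basic Bron-Kerbosch recursion without pivoting, not the implementation described in this paper; the adjacency-set representation and the names `bron_kerbosch`, `expand`, and `example` are assumptions made for the example.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets (hypothetical representation)


def bron_kerbosch(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques with recursive depth-first branching."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique; p: candidates that can still extend r;
        # x: vertices already branched on, kept to detect non-maximal cliques.
        if not p and not x:
            cliques.append(set(r))  # nothing can extend r, so r is maximal
            return
        for v in sorted(p):  # fixed vertex ordering, chosen for reproducibility
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Small example: a triangle {0, 1, 2} with a pendant edge 2-3.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
found = sorted(sorted(c) for c in bron_kerbosch(example))
print(found)
```

Because the recursion only keeps the current path in memory, its footprint grows with the depth of the search tree rather than with the number of cliques of a given size, which is the contrast with BFS branching drawn above.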
Although pclique improves KOSE performance by using bitvector manipulation of common neighbors, the huge memory requirements remain unchanged. This limitation affects both the size of the graphs that can be handled by pclique and the speedup achieved by using more processors. As a result, on an SGI Altix 3700 machine, pclique achieved a speedup factor of just 91 on 256 processors for a graph of 2,895 vertices and 10,914 edges. This nonideal scaling motivated the development of a parallel DFS-based BK algorithm. To enable the parallelization of BK, we proposed an effective decomposition of the BK search
tree, in which leaf nodes represent maximal cliques, into subtasks of generating the child nodes of an interior node. To allow this decomposition, we introduced a candidate path data structure containing the minimal information required for a BK search subtree exploration [7, 14]. The difficulty in applying this decomposition lies in the fact that the nature of the BK search tree makes it impossible to determine a well-balanced distribution of subtasks a priori. In particular, the size and number of cliques in the subtree beneath a particular tree node are unknown until that subtree has been fully generated. Thus, certain unlucky computing elements may generate subtrees with more and larger maximal cliques than others. Without allowing these overloaded computing nodes to transfer work to underloaded nodes, the execution time for different computing elements may differ greatly (see figure 1).

Figure 1. Clarifying example of the impact of dynamic load-balancing (DLB) on the execution times of the various processes. The black bars give the finishing times of the 16 processes used to run the parallel algorithm without dynamic load balancing. The white bars represent the finishing times with dynamic load balancing. The graphs were obtained by running the parallel algorithm on the Shewanella oneidensis gene expression graph.

The load balancing scheme we proposed intelligently couples a dynamic (runtime) work stealing process [15] with a stack splitting procedure [16] in order to minimize the idle (noncomputing) time over all processors. The amount of work for a computing element can be measured by the number of candidate path structures left in that computing element's stack. However, because the number of candidate paths remaining in a stack decreases gradually and increases rapidly over the course of the algorithm's execution, predicting exactly when a computing element will become idle is virtually impossible.
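The decomposition into stack-resident subtasks can be illustrated with a small, single-process sketch. The fields of `CandidatePath`, the helper names `expand`, `split_stack`, and `run`, and the choice to donate entries from the bottom of the stack (on the assumption that shallow nodes tend to root larger subtrees) are all hypothetical; the actual candidate path structure and splitting policy are defined in [7, 14] and [16], and real workers would exchange paths via message passing rather than in one process.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set

Graph = Dict[int, Set[int]]


class CandidatePath(NamedTuple):
    """Assumed minimal state needed to explore one BK search subtree."""
    clique: FrozenSet[int]      # vertices chosen so far (always a clique)
    candidates: FrozenSet[int]  # vertices that can still extend the clique
    excluded: FrozenSet[int]    # vertices whose subtrees are handled elsewhere


def expand(graph: Graph, node: CandidatePath,
           cliques: List[Set[int]]) -> List[CandidatePath]:
    """One subtask: generate the child nodes of a search-tree node."""
    if not node.candidates and not node.excluded:
        cliques.append(set(node.clique))  # leaf: a maximal clique
        return []
    children = []
    p, x = set(node.candidates), set(node.excluded)
    for v in sorted(p):
        children.append(CandidatePath(node.clique | {v},
                                      frozenset(p & graph[v]),
                                      frozenset(x & graph[v])))
        p.discard(v)
        x.add(v)
    return children


def split_stack(stack: List[CandidatePath], share: float = 0.5) -> List[CandidatePath]:
    """Remove and return the bottom `share` of the stack for another worker."""
    k = int(len(stack) * share)
    donated = stack[:k]
    del stack[:k]
    return donated


def run(graph: Graph, stack: List[CandidatePath], cliques: List[Set[int]]) -> None:
    """Depth-first exploration driven by an explicit stack of candidate paths."""
    while stack:
        stack.extend(expand(graph, stack.pop(), cliques))


example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
root = CandidatePath(frozenset(), frozenset(example), frozenset())

stack_a, cliques_a = [root], []
stack_a.extend(expand(example, stack_a.pop(), cliques_a))  # root yields 4 children
stack_b, cliques_b = split_stack(stack_a), []               # worker B receives 2 of them
run(example, stack_a, cliques_a)
run(example, stack_b, cliques_b)
all_cliques = sorted(sorted(c) for c in cliques_a + cliques_b)
print(all_cliques)
```

Each candidate path carries its own excluded set, so the two stacks can be explored independently without reporting a clique twice or missing one; the current stack length serves as the work measure mentioned above.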
To overcome this, we implemented a receiver-initiated scheme, allowing a computing element to become almost idle (stack size below some threshold) before requesting more work. When this threshold is reached, the idle computing element requests more candidate paths from another randomly chosen computing element. (This process, called random polling, is one of the most efficient methods of requesting work when the underlying architecture of the computing system is unknown [15].) If the randomly chosen computing element has work available, it sends the candidate path structures most likely to represent large subtrees, a procedure motivated by the concept of stack splitting [16]. If the responding computing element has no work, then the requesting computing element selects another random computing element and repeats the request, with the program terminating when all computing elements are idle. Figure 2 shows that the speedup of the algorithm is linear, and thus the initialization and idle time are small relative to the total execution time.

Parallel execution of the program is achieved by generating multiple processes, each capable of spawning multiple threads [17]. Interprocess communication is performed using MPI communication, and the threaded behavior of the application is enabled using POSIX threads
(Pthreads). Each process is assumed to have its own memory, which its associated threads share. This hybrid parallelism is motivated by the fact that many modern high-performance machines consist of clusters of symmetric multiprocessing (SMP) units. Combining shared-memory and distributed-memory parallelism techniques yields better performance. In addition, the implementation is portable across different computer architectures.

Figure 2. Speedup of the parallel algorithm on the Saccharomyces cerevisiae protein interaction network with between 1 and 2,048 processes on a Cray XT4.

3. Graph perturbation theory and algorithms

The basic idea behind graph perturbation theory is to examine the differences (added or removed edges, in our case) between two graphs (an original graph, for which the maximal cliques have already been enumerated, and a perturbed graph) and to list only the maximal cliques that are introduced or destroyed by the perturbation. By leveraging the enumeration of the original graph, the maximal clique enumeration of the perturbed graph may be calculated more quickly. Intuitively, if the perturbation between the two is relatively small, the two difference sets will be smaller than the full enumeration for the perturbed graph.

The following basic definitions are necessary before setting out our theory. Let $G$ and $G_{new}$ denote the original and perturbed graphs, respectively, and let $\mathcal{C}$ and $\mathcal{C}_{new}$ be the maximal clique enumeration of each. Define the difference sets $\mathcal{C}^+ = \mathcal{C}_{new} \setminus \mathcal{C}$ and $\mathcal{C}^- = \mathcal{C} \setminus \mathcal{C}_{new}$. Theorems 3.1 and 3.2 establish simple necessary and sufficient conditions for containment in $\mathcal{C}^+$ and $\mathcal{C}^-$, respectively, when edges are added to $G$.

Theorem 3.1. $C \in \mathcal{C}^+$ if and only if $C$ is a maximal clique in $G_{new}$ that contains some edge being added to $G$.

Proof. Clearly, if a maximal clique $C \in \mathcal{C}_{new}$ contains some edge being added to $G$, then $C$ is not a clique in $G$, so $C \notin \mathcal{C}$ and hence $C \in \mathcal{C}^+$. Conversely, let $C \in \mathcal{C}_{new}$ be such that $C$ contains no edge being added to $G$. As such, $C$ is a clique of $G$. Thus, either $C \in \mathcal{C}$, or there must be some $C' \in \mathcal{C}$ that strictly contains $C$. By our definition of $\mathcal{C}^+$, $C \notin \mathcal{C}^+$ if $C \in \mathcal{C}$. In the other case, since no edges are being removed from $G$, $C'$ would also be a clique in $G_{new}$, but this contradicts our assumption that $C$ is a maximal clique in $G_{new}$.

Theorem 3.2. A clique $C \in \mathcal{C}^-$ if and only if $C$ is a maximal clique in $G$ and $C$ is a proper subset of some $C' \in \mathcal{C}^+$.

Proof. Let $C$ be an arbitrary maximal clique in $\mathcal{C}$. Since no edges are being removed from $G$ and $C$ is a clique in $G$, $C$ must be a clique in $G_{new}$. Thus, $C \notin \mathcal{C}_{new}$ if and only if $C$ is not maximal in $G_{new}$, and $C$ is not maximal if and only if there exists some clique $C' \in \mathcal{C}_{new}$ such that $C$ is a proper subset of $C'$. Such a $C'$ could not be a maximal clique of $G$, as this would contradict the maximality of $C$, so $C' \in \mathcal{C}^+$.

By Theorem 3.1, we know that all cliques of $\mathcal{C}^+$ are maximal cliques of $G_{new}$ containing some edge being added to $G$. Thus, to calculate this set, we use a modified version of the BK algorithm.

On weighted protein-protein interaction networks for nine different organisms from [18], where the weight of each edge represents the probability that two proteins interact, we generated graphs by applying thresholds at probabilities 0.75 and 0.70. The perturbations introduced by lowering the threshold to 0.70 accounted for 20-40% of the edges in the networks for 6 of the 9 organisms. The networks for S. typhimurium and M. tuberculosis saw larger perturbations of 48% and 68%, respectively, and the E. coli network underwent a full 136% change in its number of edges. After calculating the maximal clique enumeration for the network of each organism using the cutoff 0.75, we calculated the maximal clique enumeration for the graph induced by the cutoff 0.70 using both the perturbed graph algorithm and a single-threaded version of the original BK implementation. The percentage improvement of the perturbational algorithm over BK appears in figure 3.

Figure 3. Percentage runtime improvement of perturbational algorithm over BK for the induced perturbations.

As shown in the figure, all clique enumerations were produced by the perturbational algorithm in 50-85% less time than full recalculation by BK, even for the E. coli network, where more edges were added than existed in the original graph.
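The conditions of Theorems 3.1 and 3.2 for edge addition can be made concrete with a short sketch. It is an illustration under stated assumptions, not the indexed implementation evaluated above: the seeding of a BK search at each added edge is one plausible reading of "a modified version of the BK algorithm", the added edges are assumed to be absent from the original graph, and the names `maximal_cliques` and `perturb` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]
Clique = FrozenSet[int]


def maximal_cliques(graph: Graph, r: Clique = frozenset(),
                    p: Optional[Iterable[int]] = None,
                    x: Iterable[int] = ()) -> Set[Clique]:
    """Bron-Kerbosch: maximal cliques that extend r using candidates p."""
    p = set(graph) if p is None else set(p)
    x = set(x)
    if not p and not x:
        return {frozenset(r)}
    found: Set[Clique] = set()
    for v in sorted(p):
        found |= maximal_cliques(graph, r | {v}, p & graph[v], x & graph[v])
        p.discard(v)
        x.add(v)
    return found


def perturb(graph: Graph, cliques: Set[Clique],
            added_edges: List[Tuple[int, int]]):
    """Return (new_graph, plus, minus) for a set of newly added edges."""
    new_graph = {v: set(nbrs) for v, nbrs in graph.items()}
    for u, v in added_edges:
        new_graph[u].add(v)
        new_graph[v].add(u)
    # Theorem 3.1: new maximal cliques are exactly the maximal cliques of
    # the perturbed graph that contain at least one added edge.
    plus: Set[Clique] = set()
    for u, v in added_edges:
        common = new_graph[u] & new_graph[v]
        plus |= maximal_cliques(new_graph, frozenset({u, v}), common)
    # Theorem 3.2: an old maximal clique is destroyed exactly when it is a
    # proper subset of some new maximal clique.
    minus = {c for c in cliques if any(c < big for big in plus)}
    return new_graph, plus, minus


# Triangle {0, 1, 2} with pendant edge 2-3; the perturbation adds edge 1-3.
original = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
old_cliques = maximal_cliques(original)
new_graph, plus, minus = perturb(original, old_cliques, [(1, 3)])
updated = (old_cliques - minus) | plus
print(sorted(sorted(c) for c in updated))
```

In this toy case the added edge creates the clique {1, 2, 3} and absorbs {2, 3}, while {0, 1, 2} is carried over unchanged, so only the neighborhood of the added edge is searched rather than the whole perturbed graph.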
While these results favor the perturbational algorithm, the algorithm performed better on this very large perturbation than intuition would suggest. Better performance is typically observed under smaller (less than 20%) perturbations applied to the reference graph (results not reported).

4. Conclusion

We reported a novel capability for efficient enumeration of maximal cliques in biological graphs derived from large-scale, uncertain, and dynamically changing biological data. We demonstrated
the first parallel MCE algorithm that scales linearly on thousands of processors for real-world biological networks with thousands and hundreds of thousands of vertices. We proposed the Graph Perturbation Theory (GPT), which takes advantage of the solution provided by parallel MCE on the reference graph to significantly reduce the time required to solve the MCE problem on the perturbed graphs. We developed a practical implementation of the perturbed MCE algorithm that utilizes efficient database indices, constructed using GPT, to achieve improved performance. The application of the MCE algorithms to real-world biological networks across multiple organisms has been demonstrated.

Acknowledgments

The authors are thankful to Cray Inc. for access to large-scale Cray XT systems and for insights into code optimization and benchmarks. This research has been supported by the Exploratory Data Intensive Computing for Complex Biological Systems project from the U.S. Department of Energy (Office of Advanced Scientific Computing Research, Office of Science). The work of NFS was also sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no.
DEAC05-00OR

References

[1] Chen Y and Crippen G M 2005 Protein Science
[2] Harley E, Bonner A and Goodman N 2001 Bioinformatics
[3] Rokhlenko O, Wexler Y and Yakhini Z 2007 Bioinformatics 23 e184-e190
[4] Grindley H M, Artymiuk P J, Rice D W and Willett P 1993 Journal of Molecular Biology
[5] Zhang B, Park B H, Karpinets T and Samatova N F Bioinformatics (Oxford, England)
[6] Tabb D L, Thompson M R, Khalsa-Moyers G, VerBerkmoes N C and McDonald W H 2005 Journal of the American Society for Mass Spectrometry
[7] Park B H, Samatova N F, Karpinets T, Jallouk A, Molony S, Horton S and Arcangeli S 2007 SciDAC 2007 vol 78 (Boston, Massachusetts)
[8] Lawler E L, Lenstra J K and Kan A H G R 1980 SIAM Journal on Computing
[9] Garey M R and Johnson D S 1979 Computers and Intractability: A Guide to the Theory of NP-Completeness (WH Freeman & Co. New York, NY, USA)
[10] Bron C and Kerbosch J 1973 Communications of the ACM
[11] Zhang Y, Abu-Khzam F, Baldwin N, Chesler E, Langston M and Samatova N 2005 Supercomputing, Proceedings of the ACM/IEEE SC 2005 Conference p 12
[12] Kose F, Weckwerth W, Linke T and Fiehn O 2001 Bioinformatics
[13] Cormen T, Leiserson C E, Rivest R L and Stein C 2001 Introduction to Algorithms 2nd ed (McGraw-Hill)
[14] Park B H, Schmidt M, Thomas K, Karpinets T and Samatova N F 2008 Upcoming in Proceedings of IPDPS 2008
[15] Kumar V, Grama A Y and Vempaty N R 1994 Journal of Parallel and Distributed Computing
[16] Finkel R and Manber U 1987 ACM Trans. Program. Lang. Syst
[17] Thomas K, Samatova N F, Schmidt M and Park B H 2008 Upcoming in Proceedings of CUG 2008
[18] Flannick J, Novak A, Srinivasan B S, McAdams H H and Batzoglou S 2006 Genome Research
More informationON WEIGHTED RECTANGLE PACKING WITH LARGE RESOURCES*
ON WEIGHTED RECTANGLE PACKING WITH LARGE RESOURCES* Aleksei V. Fishkin, 1 Olga Gerber, 1 Klaus Jansen 1 1 University of Kiel Olshausenstr. 40, 24118 Kiel, Germany {avf,oge,kj}@informatik.uni-kiel.de Abstract
More informationA Note on Vertex Arboricity of Toroidal Graphs without 7-Cycles 1
International Mathematical Forum, Vol. 11, 016, no. 14, 679-686 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.1988/imf.016.667 A Note on Vertex Arboricity of Toroidal Graphs without 7-Cycles 1 Haihui
More informationLinear Problem Kernels for NP-Hard Problems on Planar Graphs
Linear Problem Kernels for NP-Hard Problems on Planar Graphs Jiong Guo and Rolf Niedermeier Institut für Informatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany. {guo,niedermr}@minet.uni-jena.de
More informationChordal graphs MPRI
Chordal graphs MPRI 2017 2018 Michel Habib habib@irif.fr http://www.irif.fr/~habib Sophie Germain, septembre 2017 Schedule Chordal graphs Representation of chordal graphs LBFS and chordal graphs More structural
More informationResponse Network Emerging from Simple Perturbation
Journal of the Korean Physical Society, Vol 44, No 3, March 2004, pp 628 632 Response Network Emerging from Simple Perturbation S-W Son, D-H Kim, Y-Y Ahn and H Jeong Department of Physics, Korea Advanced
More informationTowards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison
Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison Jing Jin, Biplab K. Sarker, Virendra C. Bhavsar, Harold Boley 2, Lu Yang Faculty of Computer Science, University of New
More informationOn Demand Phenotype Ranking through Subspace Clustering
On Demand Phenotype Ranking through Subspace Clustering Xiang Zhang, Wei Wang Department of Computer Science University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA {xiang, weiwang}@cs.unc.edu
More informationDIT411/TIN175, Artificial Intelligence. Peter Ljunglöf. 23 January, 2018
DIT411/TIN175, Artificial Intelligence Chapters 3 4: More search algorithms CHAPTERS 3 4: MORE SEARCH ALGORITHMS DIT411/TIN175, Artificial Intelligence Peter Ljunglöf 23 January, 2018 1 TABLE OF CONTENTS
More informationp v P r(v V opt ) = Algorithm 1 The PROMO algorithm for module identification.
BIOINFORMATICS Vol. no. 6 Pages 1 PROMO : A Method for identifying modules in protein interaction networks Omer Tamuz, Yaron Singer, Roded Sharan School of Computer Science, Tel Aviv University, Tel Aviv,
More informationSeminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm
Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of
More informationSet Cover with Almost Consecutive Ones Property
Set Cover with Almost Consecutive Ones Property 2004; Mecke, Wagner Entry author: Michael Dom INDEX TERMS: Covering Set problem, data reduction rules, enumerative algorithm. SYNONYMS: Hitting Set PROBLEM
More informationSolving the Travelling Salesman Problem in Parallel by Genetic Algorithm on Multicomputer Cluster
Solving the Travelling Salesman Problem in Parallel by Genetic Algorithm on Multicomputer Cluster Plamenka Borovska Abstract: The paper investigates the efficiency of the parallel computation of the travelling
More informationTreewidth and graph minors
Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under
More informationPrinciples of Parallel Algorithm Design: Concurrency and Mapping
Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 28 August 2018 Last Thursday Introduction
More informationConsistency and Set Intersection
Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study
More informationThe Maximum Clique Problem
November, 2012 Motivation How to put as much left-over stuff as possible in a tasty meal before everything will go off? Motivation Find the largest collection of food where everything goes together! Here,
More informationA FAST LONGEST COMMON SUBSEQUENCE ALGORITHM FOR BIOSEQUENCES ALIGNMENT
A FAST LONGEST COMMON SUBSEQUENCE ALGORITHM FOR BIOSEQUENCES ALIGNMENT Wei Liu 1,*, Lin Chen 2, 3 1 Institute of Information Science and Technology, Nanjing University of Aeronautics and Astronautics,
More informationDeciphering the Information Encoded in RNA Viral Genomes
Deciphering the Information Encoded in RNA Viral Genomes Christine E. Heitsch Genome Center of Wisconsin and Mathematics Department University of Wisconsin Madison Detecting and Processing Regularities
More informationGrouping Genetic Algorithm with Efficient Data Structures for the University Course Timetabling Problem
Grouping Genetic Algorithm with Efficient Data Structures for the University Course Timetabling Problem Felipe Arenales Santos Alexandre C. B. Delbem Keywords Grouping Genetic Algorithm Timetabling Problem
More informationCLASS-ROOM NOTES: OPTIMIZATION PROBLEM SOLVING - I
Sutra: International Journal of Mathematical Science Education, Technomathematics Research Foundation Vol. 1, No. 1, 30-35, 2008 CLASS-ROOM NOTES: OPTIMIZATION PROBLEM SOLVING - I R. Akerkar Technomathematics
More informationFaster parameterized algorithms for Minimum Fill-In
Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht
More informationLecture 10: Strongly Connected Components, Biconnected Graphs
15-750: Graduate Algorithms February 8, 2016 Lecture 10: Strongly Connected Components, Biconnected Graphs Lecturer: David Witmer Scribe: Zhong Zhou 1 DFS Continued We have introduced Depth-First Search
More informationSCHEDULING WITH RELEASE TIMES AND DEADLINES ON A MINIMUM NUMBER OF MACHINES *
SCHEDULING WITH RELEASE TIMES AND DEADLINES ON A MINIMUM NUMBER OF MACHINES * Mark Cieliebak 1, Thomas Erlebach 2, Fabian Hennecke 1, Birgitta Weber 1, and Peter Widmayer 1 1 Institute of Theoretical Computer
More informationIntroducing MESSIA: A Methodology of Developing Software Architectures Supporting Implementation Independence
Introducing MESSIA: A Methodology of Developing Software Architectures Supporting Implementation Independence Ratko Orlandic Department of Computer Science and Applied Math Illinois Institute of Technology
More informationApplied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees
Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility
More informationPerformance of a Direct Numerical Simulation Solver forf Combustion on the Cray XT3/4
Performance of a Direct Numerical Simulation Solver forf Combustion on the Cray XT3/4 Ramanan Sankaran and Mark R. Fahey National Center for Computational Sciences Oak Ridge National Laboratory Jacqueline
More informationEnumerating the decomposable neighbours of a decomposable graph under a simple perturbation scheme
Enumerating the decomposable neighbours of a decomposable graph under a simple perturbation scheme Alun Thomas Department of Biomedical Informatics University of Utah Peter J Green Department of Mathematics
More informationModeling System Calls for Intrusion Detection with Dynamic Window Sizes
Modeling System Calls for Intrusion Detection with Dynamic Window Sizes Eleazar Eskin Computer Science Department Columbia University 5 West 2th Street, New York, NY 27 eeskin@cs.columbia.edu Salvatore
More informationModified Vertex Support Algorithm: A New approach for approximation of Minimum vertex cover
Abstract Research Journal of Computer and Information Technology Sciences ISSN 2320 6527 Vol. 1(6), 7-11, November (2013) Modified Vertex Support Algorithm: A New approach for approximation of Minimum
More informationSolution of P versus NP problem
Algorithms Research 2015, 4(1): 1-7 DOI: 105923/jalgorithms2015040101 Solution of P versus NP problem Mustapha Hamidi Meknes, Morocco Abstract This paper, taking Travelling Salesman Problem as our object,
More informationHonour Thy Neighbour Clique Maintenance in Dynamic Graphs
Honour Thy Neighbour Clique Maintenance in Dynamic Graphs Thorsten J. Ottosen Department of Computer Science, Aalborg University, Denmark nesotto@cs.aau.dk Jiří Vomlel Institute of Information Theory and
More informationGenetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)
Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony
More informationComparing Implementations of Optimal Binary Search Trees
Introduction Comparing Implementations of Optimal Binary Search Trees Corianna Jacoby and Alex King Tufts University May 2017 In this paper we sought to put together a practical comparison of the optimality
More informationHardness of Subgraph and Supergraph Problems in c-tournaments
Hardness of Subgraph and Supergraph Problems in c-tournaments Kanthi K Sarpatwar 1 and N.S. Narayanaswamy 1 Department of Computer Science and Engineering, IIT madras, Chennai 600036, India kanthik@gmail.com,swamy@cse.iitm.ac.in
More informationA Visualization Program for Subset Sum Instances
A Visualization Program for Subset Sum Instances Thomas E. O Neil and Abhilasha Bhatia Computer Science Department University of North Dakota Grand Forks, ND 58202 oneil@cs.und.edu abhilasha.bhatia@my.und.edu
More informationShared-memory Parallel Programming with Cilk Plus
Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming
More informationVertical decomposition of a lattice using clique separators
Vertical decomposition of a lattice using clique separators Anne Berry, Romain Pogorelcnik, Alain Sigayret LIMOS UMR CNRS 6158 Ensemble Scientifique des Cézeaux Université Blaise Pascal, F-63 173 Aubière,
More informationAdvances in Parallel Branch and Bound
ECAI 2012 Advances in Parallel Branch and Bound Lars Otten and Rina Dechter Dept. of Computer Science University of California, Irvine Summary Parallelizing AND/OR Branch and Bound: - Advanced optimization
More informationOn the Space-Time Trade-off in Solving Constraint Satisfaction Problems*
Appeared in Proc of the 14th Int l Joint Conf on Artificial Intelligence, 558-56, 1995 On the Space-Time Trade-off in Solving Constraint Satisfaction Problems* Roberto J Bayardo Jr and Daniel P Miranker
More informationA Virtual Laboratory for Study of Algorithms
A Virtual Laboratory for Study of Algorithms Thomas E. O'Neil and Scott Kerlin Computer Science Department University of North Dakota Grand Forks, ND 58202-9015 oneil@cs.und.edu Abstract Empirical studies
More information3 No-Wait Job Shops with Variable Processing Times
3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select
More informationOn the Complexity of Various Parameterizations of Common Induced Subgraph Isomorphism
On the Complexity of Various Parameterizations of Common Induced Subgraph Isomorphism Faisal N. Abu-Khzam 1, Édouard Bonnet2 and Florian Sikora 2 1 Lebanese American University, Beirut, Lebanon faisal.abukhzam@lau.edu.lb
More informationA Hybrid Recursive Multi-Way Number Partitioning Algorithm
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence A Hybrid Recursive Multi-Way Number Partitioning Algorithm Richard E. Korf Computer Science Department University
More informationDynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers
Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers A. Srivastava E. Han V. Kumar V. Singh Information Technology Lab Dept. of Computer Science Information Technology Lab Hitachi
More informatione-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data
: Related work on biclustering algorithms for time series gene expression data Sara C. Madeira 1,2,3, Arlindo L. Oliveira 1,2 1 Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal
More informationPLANAR GRAPH BIPARTIZATION IN LINEAR TIME
PLANAR GRAPH BIPARTIZATION IN LINEAR TIME SAMUEL FIORINI, NADIA HARDY, BRUCE REED, AND ADRIAN VETTA Abstract. For each constant k, we present a linear time algorithm that, given a planar graph G, either
More information