Maximum Clique Problem. Team Bushido bit.ly/parallel-computing-fall-2014


Agenda
- Problem summary
- Research Paper 1
- Research Paper 2
- Research Paper 3
- Software Design
- Demo of Sequential Program

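Summary of the Problem
- What is a clique? A set of vertices in which every pair is joined by an edge.
- Goal: find a maximum clique, that is, a clique with the largest possible number of vertices.
- The decision version of the problem is NP-complete.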
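To make the problem statement concrete, here is a minimal brute-force sketch. The function names and the example graph are hypothetical and do not come from the slides; the exhaustive search is only practical for tiny graphs.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """Return True if every pair of the given vertices is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def maximum_clique_bruteforce(adj):
    """Try vertex subsets from largest to smallest (exponential time)."""
    nodes = sorted(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if is_clique(adj, subset):
                return list(subset)
    return []

# Hypothetical example graph: a triangle 1-2-3 plus a pendant vertex 4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(maximum_clique_bruteforce(adj))
```

The exponential number of subsets is what motivates the pruning and parallelization techniques in the three papers.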

Research Paper 1
Authors: Matjaž Depolli, Janez Konc, Kati Rozman, Roman Trobec, Dušanka Janežič
Title: Exact Parallel Maximum Clique Algorithm for General and Protein Graphs
Journal: Journal of Chemical Information and Modeling
Volume: 53, Number: 9, Pages: 2217-2228
Date: August 21, 2013
Number of pages: 12

Research Paper 1
- Protein graphs and product graphs
- Sequential approach: MaxCliqueSeq (MCS)
- Parallel approach: MaxCliquePara (MCP)

Research Paper 1
Protein Graphs
- Vertices
- Edges
- Product graphs
- Why? To detect structural similarities and to support the development of new drugs
[Figure: part of Figure 2, page 2219, of "Exact Parallel Maximum Clique Algorithm for General and Protein Graphs"]

Research Paper 1
MaxCliqueSeq: maximum clique algorithm
- Approximate graph coloring
- Initial vertex ordering
- Use of bit strings for encoding the adjacency matrix
- This combination has proven faster than other techniques on different protein graphs

Research Paper 1
Approximate coloring
- Assigns colors to vertices so that adjacent vertices receive different colors
- Finding an optimal coloring is itself NP-hard, so MCS uses an approximate algorithm based on a greedy strategy, which runs in $O(n^2)$
- The number of colors used is an upper bound on the size of the maximum clique
- MCS uses this bound to prune branches of the search tree
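The coloring bound can be illustrated with a short sketch. This is a generic greedy coloring, not the paper's exact routine; `greedy_coloring`, `can_prune`, and the example graph are hypothetical names chosen for illustration.

```python
def greedy_coloring(adj, order):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def can_prune(current_size, num_colors, best_size):
    """A branch cannot beat the best clique if even the color bound falls short."""
    return current_size + num_colors <= best_size

# Hypothetical example graph: a triangle 1-2-3 plus a pendant vertex 4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
coloring = greedy_coloring(adj, order=[3, 1, 2, 4])
num_colors = len(set(coloring.values()))
print(coloring, num_colors)
```

A clique needs a distinct color for each of its vertices, so no clique among the colored vertices can be larger than `num_colors`; that is why the count works as a pruning bound.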

Research Paper 1
Initial vertex ordering
- The order in which vertices are fed to the coloring algorithm affects overall MCS performance
- Coloring is tighter if vertices are fed in increasing order of degree
- Vertices with the same degree are ordered in increasing order of ex-degree
- Preprocessing involves: ex-degree calculation > sort by degree > sort by ex-degree > renumbering, so that the vertex with the highest degree becomes 1, the next 2, and so on
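The preprocessing steps can be sketched as follows. Two assumptions are made here that the slide does not confirm: ex-degree is taken to be the sum of the degrees of a vertex's neighbours, and the sort direction is exposed as a flag because the slide mentions both an increasing order and numbering the highest-degree vertex first. `initial_order` is a hypothetical name.

```python
def initial_order(adj, descending=True):
    """Sort vertices by degree, break ties by ex-degree, then renumber from 1."""
    degree = {v: len(adj[v]) for v in adj}
    # Assumption: ex-degree = sum of the degrees of the neighbours.
    ex_degree = {v: sum(degree[u] for u in adj[v]) for v in adj}
    ordered = sorted(adj, key=lambda v: (degree[v], ex_degree[v]),
                     reverse=descending)
    renumber = {v: i + 1 for i, v in enumerate(ordered)}
    return ordered, renumber

# Hypothetical example graph: a triangle 1-2-3 plus a pendant vertex 4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ordered, renumber = initial_order(adj)
print(ordered, renumber)
```

With `descending=True`, the highest-degree vertex (3 in this example) receives the new number 1, matching the renumbering described in the last bullet.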

Research Paper 1
Use of bit strings
- Encode the adjacency matrix
- Bit strings offer fast set operations
- They are slower at counting the number of elements or extracting the elements with the lowest or highest degree
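A minimal sketch of the bit-string idea, using Python integers as bit strings. This illustrates the general technique rather than the paper's data structure; `build_bit_rows`, `members`, and the edge list are hypothetical.

```python
def build_bit_rows(n, edges):
    """Row u of the adjacency matrix is an int whose bit v is set if u-v is an edge."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows

def members(bits):
    """Extract set bits one at a time; this is the slower kind of operation."""
    out = []
    while bits:
        low = bits & -bits              # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        bits ^= low                     # clear that bit
    return out

# Hypothetical graph on vertices 0..3: triangle 0-1-2 plus edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
rows = build_bit_rows(4, edges)
candidates = 0b1111
# Set intersection is a single AND per operand: vertices adjacent to both 0 and 1.
common = candidates & rows[0] & rows[1]
print(members(common))
```

The intersection costs one machine operation per word, while `members` has to loop over the set bits, which mirrors the trade-off named on the slide.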

Research Paper 2
Authors: Matthew C. Schmidt, Nagiza F. Samatova, Kevin Thomas, Byung-Hoon Park
Title: A Scalable, Parallel Algorithm for Maximal Clique Enumeration
Journal: J. Parallel Distrib. Comput.
Volume: 69, Number: 4, Pages: 417-428
Date: April 2009
Number of pages: 12

Problems addressed
- Finding maximal cliques in practical applications whose graphs have a huge number of vertices (thousands and more)
- Huge combinatorial search space
- Unbalanced load distribution

Solution
- Depth-first backtracking search, constraining search space and memory
- Decomposition of the search tree
- Improved load balancing by on-demand distribution of data

Sequential Approach - Backtracking algorithm
[Figure: example graph on vertices 1 to 5 with a step-by-step trace of the backtracking search, showing the current vertex, the candidate list, the not list, and the output at each step]
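A backtracking enumeration that keeps a current clique, a candidate list, and a not list can be sketched as below. It follows the classic Bron-Kerbosch scheme, which is an assumption: the slides do not name the algorithm. The function name and the five-vertex graph are hypothetical.

```python
def enumerate_maximal_cliques(adj):
    """Backtracking search with a current clique, a candidate set, and a 'not' set."""
    cliques = []

    def expand(current, candidates, excluded):
        # Nothing can extend the clique and nothing already explored contains it.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in sorted(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}   # v is done as a candidate ...
            excluded = excluded | {v}       # ... and moves to the 'not' set

    expand(set(), set(adj), set())
    return cliques

# Hypothetical five-vertex example graph.
adj = {1: {2, 3}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {2, 4}}
print(enumerate_maximal_cliques(adj))
```

Each recursive call corresponds to one node of the search tree, and the triple passed to `expand` is the kind of state that a parallel version can hand to another worker.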

Improving serial algorithm efficiency
Three possible ways to represent the graph are considered:
1. Adjacency list
2. Adjacency bit matrix
3. Hash table of edges
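As a rough illustration, the three representations can be built side by side and queried for the same edge. All names and the edge list are hypothetical; the paper's actual data structures may differ.

```python
# Hypothetical graph on vertices 0..3.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# 1. Adjacency list: one list of neighbours per vertex.
adj_list = [[] for _ in range(n)]
# 2. Adjacency bit matrix: one integer per row, bit v set if the edge exists.
bit_matrix = [0] * n
# 3. Hash table of edges: each edge stored as an unordered pair.
edge_hash = set()

for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)
    bit_matrix[u] |= 1 << v
    bit_matrix[v] |= 1 << u
    edge_hash.add(frozenset((u, v)))

def has_edge_list(u, v):
    return v in adj_list[u]                  # linear scan of u's neighbours

def has_edge_bits(u, v):
    return (bit_matrix[u] >> v) & 1 == 1     # single bit test

def has_edge_hash(u, v):
    return frozenset((u, v)) in edge_hash    # expected constant-time lookup

print(has_edge_list(2, 3), has_edge_bits(2, 3), has_edge_hash(2, 3))
```

The list is compact for sparse graphs, the bit matrix needs a number of bits quadratic in the vertex count but tests edges with one operation, and the hash table sits between the two.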

Proposed parallel approach
1. Decompose the search tree traversal into independent tasks, each traversing an unexplored search subtree.
2. The candidate path data structure represents an unexplored search subtree: it consists of the vertex being visited by the search node along with the 3 lists.
3. Subtrees may vary greatly in size.
4. Certain subtrees may also generate more maximal cliques than others.
5. Load is distributed between the computing nodes on demand.

Proposed parallel approach
1. Divide the vertices of the graph among the computing nodes.
2. Each node maintains the candidate path structures for its assigned vertices in a stack.
3. Each node then runs the sequential clique algorithm and adds new candidate paths (unexplored subtrees) to the stack in the process.
4. If a node completes its task early:
   a. It gets load (a candidate path structure) from a random thread within the same process.
   b. If all threads in that process are idle, it gets load from a random process using interprocess communication.
5. When all processes become idle, they can terminate.

Parallel approach
1. Time for the parallel algorithm with p processors: $T(p) = T_{init} + \max_{1 \le i \le p} \left( T_{enum}(i) + T_{idle}(i) \right)$
2. $T(p)$ should be minimum.
3. Ideally all computing tasks end at the same time, so $T_{enum}(i) + T_{idle}(i)$ should be the same for all processors.
4. Speedup $= T(p) / T(2p)$
5. For linear speedup, $T_{init}$ and $T_{idle}(i)$ should be minimum.
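As a worked illustration of the speedup ratio, assume (hypothetically) that the total enumeration work $W$ is split evenly, so that $T_{enum}(i) = W/p$ and $T_{idle}(i) = 0$ for every processor. The symbol $W$ is introduced here only for this example.

```latex
\[
T(p) = T_{init} + \frac{W}{p},
\qquad
\frac{T(p)}{T(2p)} = \frac{T_{init} + W/p}{T_{init} + W/(2p)} .
\]
\[
\text{If } T_{init} = 0:\quad
\frac{T(p)}{T(2p)} = \frac{W/p}{W/(2p)} = 2 .
\]
```

Under these assumptions, doubling the processors halves the runtime only when $T_{init}$ is negligible; any positive $T_{init}$ or idle time pushes the ratio below 2.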

Algorithm Design
Reading input graph
1. All processes need the graph data.
2. Parallel read is limited by file scalability.
3. One process reads the graph.
4. It then broadcasts the graph to the other processes.

Algorithm Design
Initial load distribution
1. The initial distribution divides the vertices among the computing nodes.
2. $T_{init}$ should be as small as possible.
3. More time is required if a single process distributes the initial load to the other processes.
4. Instead, each thread within a process selects its vertices on its own, using its thread number and process rank.

Algorithm Design
Stack Splitting
1. Candidate path structures (unexplored subtrees) are transferred from the bottom of the stack.
2. Candidate paths are examined from the top of the stack.
3. Bottom candidates represent unexplored subtrees at higher levels of the search tree.
4. They are likely to generate more work (candidates), which avoids too many on-demand transfer requests.
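The two ends of the stack can be sketched with a double-ended queue. This is a single-threaded illustration with hypothetical names (`WorkStack`, `pop_local`, `steal`); a real implementation would need locking or message passing, which is omitted here.

```python
from collections import deque

class WorkStack:
    """Owner works at the top; idle workers take candidates from the bottom."""

    def __init__(self, items=()):
        self._items = deque(items)

    def push(self, candidate):
        self._items.append(candidate)       # new subtrees go on top

    def pop_local(self):
        """Owner takes the most recently pushed (deepest) candidate."""
        return self._items.pop() if self._items else None

    def steal(self):
        """Another worker takes the oldest (shallowest) candidate."""
        return self._items.popleft() if self._items else None

stack = WorkStack()
for label in ["subtree at depth 1", "subtree at depth 2", "subtree at depth 3"]:
    stack.push(label)

stolen = stack.steal()      # bottom of the stack
local = stack.pop_local()   # top of the stack
print(stolen, "|", local)
```

Handing over the shallowest subtree gives the receiving worker a larger piece of the search tree, so it is less likely to come back for more work soon.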

Research Paper 3
Author: S. Szabó
Title: Parallel algorithms for finding cliques in a graph
Journal: Journal of Physics: Conference Series
Volume: 268, Number: 1
Date: 2011
Number of pages: 21

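Research Paper 3
1. Split Partitions
- Graph $G = (V, E)$
- $V$ is partitioned into 3 sets $C_1$, $C_2$, $C_3$ such that $C_1$ and $C_3$ are not connected (no edge joins a vertex of $C_1$ to a vertex of $C_3$)
- Subgraphs: $G_1$ is induced by $C_1 \cup C_2$ and $G_2$ is induced by $C_2 \cup C_3$
- Every clique of $G$ lies entirely in $G_1$ or entirely in $G_2$, so the two subgraphs can be searched independently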
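A small sketch of the split idea, with hypothetical helper names and a hypothetical example graph. The brute-force `max_clique_size` stands in for whatever clique search is actually run on each part.

```python
from itertools import combinations

def is_valid_split(adj, c1, c3):
    """True if no edge joins a vertex of C1 to a vertex of C3."""
    return all(adj[u].isdisjoint(c3) for u in c1)

def induced(adj, keep):
    """Subgraph induced by the vertex set `keep`."""
    return {v: adj[v] & keep for v in keep}

def max_clique_size(adj):
    """Brute-force placeholder for a real clique search."""
    nodes = sorted(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(b in adj[a] for a, b in combinations(subset, 2)):
                return size
    return 0

# Hypothetical graph: triangle 1-2-3, then a path 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
c1, c2, c3 = {1, 2}, {3}, {4, 5}

assert is_valid_split(adj, c1, c3)
g1 = induced(adj, c1 | c2)
g2 = induced(adj, c2 | c3)
# The two searches are independent and could run on different processors.
best = max(max_clique_size(g1), max_clique_size(g2))
print(best)
```

Since a clique cannot contain one vertex from `c1` and another from `c3`, taking the larger of the two results gives the maximum clique size of the whole graph.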

Research Paper 3
2. Coloring Technique
- Split the vertices into $t$ partitions (color classes) $C_1, C_2, C_3, \ldots, C_t$, with no edge inside any class
- The size of the maximum clique is then $\le t$
- This provides an upper bound on the size of the clique

Research Paper 3
3. Dominating Nodes
- Graph $G = (V, E)$
- $N(v)$ = the set of neighbours of $v$, for $v \in V$
- Node $a$ is dominated by node $b$ if they are not connected and $N(a) \subseteq N(b)$

[Figure: example graph on nodes a, b, d, e, f in which b dominates a]

Research Paper 3
- The dominance relation is transitive
- Algorithm to find node dominance: divide the whole vertex set $V$ into two lists, $L_1 = N(a)$ and $L_2 = V - N(a) - \{a\}$
- Used in the Carraghan-Pardalos clique search algorithm
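The dominance test with the two lists can be sketched as follows. The edges of the example graph are hypothetical (the figure's edges are not recoverable from the transcript), as are the function names.

```python
def dominates(adj, b, a):
    """True if b dominates a: distinct, not connected, and N(a) is a subset of N(b)."""
    return a != b and b not in adj[a] and adj[a] <= adj[b]

def find_dominators(adj, a):
    """Split V into L1 = N(a) and L2 = V - N(a) - {a}, then scan L2."""
    l1 = adj[a]
    l2 = set(adj) - l1 - {a}
    # A node of L2 dominates a exactly when it is adjacent to every node of L1.
    return sorted(b for b in l2 if l1 <= adj[b])

# Hypothetical graph on nodes a, b, d, e, f.
adj = {
    "a": {"d", "e"},
    "b": {"d", "e", "f"},
    "d": {"a", "b"},
    "e": {"a", "b"},
    "f": {"b"},
}
print(find_dominators(adj, "a"))
```

If b dominates a, any clique containing a stays a clique of the same size when a is swapped for b, which is why a dominated node can be skipped when only one maximum clique is needed.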

Research Paper 3
4. Dominating edges
- Graph $G = (V, E)$
- Edge $(a, u)$ is dominated by edge $(u, b)$ if $b$ is not connected to $a$ and $N(a) \cap N(u) \subseteq N(u) \cap N(b)$
- Edge dominance is transitive

Sequential program demo

Thank you!