Maximum Clique Problem. Team Bushido. bit.ly/parallel-computing-fall-2014
Agenda: Problem Summary; Research Paper 1; Research Paper 2; Research Paper 3; Software Design; Demo of Sequential Program
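Summary of the Problem: What is a clique? A clique is a subset of vertices in which every pair of vertices is connected by an edge. Goal: find a maximum clique, i.e. a clique with the largest possible number of vertices. Finding a maximum clique is NP-hard (its decision version is NP-complete).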
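To make the definitions concrete, here is a minimal brute-force sketch. The adjacency-dict representation and the function names are our own, not taken from any of the papers. It tries every vertex subset from the largest down, so its running time grows exponentially with the number of vertices; this is why the papers below rely on pruning and parallelism.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Return True if every pair of the given vertices is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def max_clique_brute_force(adj):
    """Try vertex subsets from largest to smallest; exponential time."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_clique(adj, subset):
                return set(subset)
    return set()


# Example graph: triangle 0-1-2 plus vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(max_clique_brute_force(adj)))  # [0, 1, 2]
```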
Research Paper 1: Authors: Depolli, Matjaž; Konc, Janez; Rozman, Kati; Trobec, Roman; Janežič, Dušanka. Title: Exact Parallel Maximum Clique Algorithm for General and Protein Graphs. Journal: Journal of Chemical Information and Modeling, Volume 53, Number 9, Pages 2217-2228. Date: August 21, 2013. Number of pages: 12.
Research Paper 1: Overview. Protein graphs: product graphs. Sequential approach: MaxCliqueSeq (MCS). Parallel approach: MaxCliquePara (MCP).
Research Paper 1: Protein Graphs. Protein graphs (vertices and edges) and product graphs. Why? To detect structural similarities between proteins, which supports the development of new drugs. (Figure: part of Figure 2 on page 2219 of Exact Parallel Maximum Clique Algorithm for General and Protein Graphs.)
Research Paper 1: MaxCliqueSeq (MCS). A maximum clique algorithm that combines approximate graph coloring, an initial vertex ordering, and bit strings for encoding the adjacency matrix. This combination has proven faster than other techniques on a variety of protein graphs.
Research Paper 1: Approximate Coloring. Assigns colors to vertices so that adjacent vertices always receive different colors. Finding an optimal coloring is NP-hard, but the approximate algorithm uses a greedy strategy and runs in O(n^2). Because all vertices of a clique must have different colors, the number of colors used is an upper bound on the size of the maximum clique; MCS uses this bound as its pruning condition to cut branches of the search tree.
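A minimal sketch of greedy coloring used as a clique bound, assuming adjacency as a dict of sets and a fixed vertex order (MCS's actual coloring routine works on bit strings and is more refined; the function names here are hypothetical). Each vertex inspects only its own neighbours, so the total work stays within O(n^2). The 5-cycle example shows that the bound is not always tight.

```python
def greedy_coloring(adj, order):
    """Give each vertex, in the given order, the smallest color (1, 2, ...)
    not already used by one of its colored neighbours."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color


def clique_upper_bound(adj, order):
    """Clique vertices need pairwise different colors, so #colors >= clique size."""
    color = greedy_coloring(adj, order)
    return max(color.values(), default=0)


def can_prune(current_size, best_size, adj, candidates):
    """Branch-and-bound test: even if every color class of the candidates
    added one vertex to the current clique, could it beat the best clique?"""
    sub = {v: adj[v] & candidates for v in candidates}  # induced subgraph
    return current_size + clique_upper_bound(sub, sorted(candidates)) <= best_size


# 5-cycle 0-1-2-3-4-0: greedy coloring uses 3 colors, but the max clique is 2.
cycle = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
```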
Research Paper 1: Initial Vertex Ordering. The order in which vertices are fed to the coloring algorithm affects overall MCS performance. Coloring is tighter if vertices are fed in increasing order of degree; vertices with the same degree are ordered in increasing order of ex-degree. Preprocessing: ex-degree calculation > sort by degree > sort by ex-degree > renumbering, so that the vertex with the highest degree gets number 1, the next gets number 2, and so on.
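The preprocessing steps can be sketched as below. Two points are our own assumptions, not confirmed by the slide: ex-degree is taken to be the sum of the degrees of a vertex's neighbours, and remaining ties are broken by vertex id. Following the renumbering step, the highest-degree vertex receives number 1.

```python
def initial_numbering(adj):
    """Return {vertex: new number}, numbering from 1 as the slide describes."""
    degree = {v: len(adj[v]) for v in adj}
    # Assumption: ex-degree = sum of the degrees of the vertex's neighbours.
    ex_degree = {v: sum(degree[u] for u in adj[v]) for v in adj}
    # Highest degree first; equal degrees in increasing ex-degree (as on the
    # slide); the final tie-break on the vertex id is our own assumption.
    ranked = sorted(adj, key=lambda v: (-degree[v], ex_degree[v], v))
    return {v: i for i, v in enumerate(ranked, start=1)}


# Example: path 0-1-2-3-4 with an extra leaf 5 attached to vertex 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(initial_numbering(adj))  # {1: 1, 3: 2, 2: 3, 4: 4, 0: 5, 5: 6}
```

Vertices 2 and 3 both have degree 2; vertex 3 comes first because its ex-degree (3) is smaller than that of vertex 2 (5).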
Research Paper 1: Use of Bit Strings. Bit strings encode the adjacency matrix. They offer fast set operations, but are slower at counting the number of elements or at extracting the elements with the lowest or highest degree.
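The trade-off can be illustrated with Python integers used as bit strings (a sketch; a C or C++ implementation would use arrays of machine words, and the helper names are hypothetical). Intersecting two sets is a single AND, while counting or extracting elements needs extra work over the bits.

```python
def to_bitset(vertices):
    """Encode a set of vertex numbers as an int: bit v is 1 if v is present."""
    bits = 0
    for v in vertices:
        bits |= 1 << v
    return bits


def count(bits):
    """Number of elements: needs a population count over the whole string."""
    return bin(bits).count("1")


def lowest(bits):
    """Smallest element (-1 if empty): isolate the lowest set bit."""
    return (bits & -bits).bit_length() - 1


def highest(bits):
    """Largest element (-1 if empty)."""
    return bits.bit_length() - 1


def elements(bits):
    """Yield elements in increasing order by repeatedly clearing the lowest bit."""
    while bits:
        low = bits & -bits
        yield low.bit_length() - 1
        bits ^= low


# Adjacency matrix as bit strings: row u has bit v set when u-v is an edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rows = [to_bitset(adj[u]) for u in range(4)]
candidates = to_bitset(range(4))
common = candidates & rows[2]  # set intersection in a single AND
```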
Research Paper 2: Authors: Schmidt, Matthew C.; Samatova, Nagiza F.; Thomas, Kevin; Park, Byung-Hoon. Title: A Scalable, Parallel Algorithm for Maximal Clique Enumeration. Journal: J. Parallel Distrib. Comput., Volume 69, Number 4, Pages 417-428. Date: April 2009. Number of pages: 12.
Problems Addressed: finding maximal cliques in practical applications with a huge number of vertices (thousands and more); a huge combinatorial search space; unbalanced load distribution.
Solution: a depth-first backtracking search that constrains both the search space and memory use; decomposition of the search tree into subtrees; improved load balancing through on-demand distribution of work.
Sequential Approach: Backtracking Algorithm. [Figure: backtracking search tree for a 5-vertex example graph; each search node shows the current vertex, its candidate list, its not list, and the output clique.]
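The three lists in the figure map onto the classic Bron-Kerbosch backtracking scheme. The sketch below is that textbook algorithm without pivoting, chosen by us for illustration; Schmidt et al.'s exact variant may differ. The output clique grows along a branch, the candidate list holds vertices that can still extend it, and the not list holds vertices already explored, which prevents reporting non-maximal or duplicate cliques.

```python
def enumerate_maximal_cliques(adj):
    """Backtracking enumeration of all maximal cliques (Bron-Kerbosch, no pivot)."""
    cliques = []

    def expand(clique, candidates, not_list):
        # Nothing left to add and nothing excluded: the clique is maximal.
        if not candidates and not not_list:
            cliques.append(tuple(clique))
            return
        for v in sorted(candidates):
            # Recurse with v added; keep only neighbours of v in both lists.
            expand(clique + [v], candidates & adj[v], not_list & adj[v])
            # v has now been fully explored: move it to the not list.
            candidates = candidates - {v}
            not_list = not_list | {v}

    expand([], set(adj), set())
    return cliques


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(enumerate_maximal_cliques(adj))  # [(0, 1, 2), (2, 3)]
```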
Improving Serial Algorithm Efficiency: three possible graph representations will be considered: 1. Adjacency list 2. Adjacency bit matrix 3. Hash table of edges
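The three options can be compared side by side. This sketch (Python, illustrative names) builds each representation for the same small graph and answers the core query of any clique search, "are u and v adjacent?"; the comments note the usual memory and lookup trade-offs.

```python
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# 1. Adjacency list: O(n + m) memory; an edge test scans one list.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

# 2. Adjacency bit matrix: n * n bits; row u is an int with bit v set for edge u-v.
bit_matrix = [0] * n
for u, v in edges:
    bit_matrix[u] |= 1 << v
    bit_matrix[v] |= 1 << u

# 3. Hash table of edges: O(m) memory; each undirected edge stored once.
edge_table = {frozenset(e) for e in edges}


def adjacent_via_list(u, v):
    return v in adj_list[u]  # linear scan, O(deg(u))


def adjacent_via_bits(u, v):
    return bool((bit_matrix[u] >> v) & 1)  # single bit test


def adjacent_via_hash(u, v):
    return frozenset((u, v)) in edge_table  # expected O(1) lookup
```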
Proposed Parallel Approach 1. Decompose the search tree traversal into independent tasks, each traversing an unexplored search subtree. 2. The candidate data structure is the unexplored search subtree: the vertex being visited by the search node along with its three lists. 3. Subtrees may vary greatly in size. 4. Certain subtrees may also generate more maximal cliques than others. 5. Therefore, load is distributed between the computing nodes on demand.
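A candidate structure can be sketched as an immutable record. Here we assume the three lists are the clique built so far, the candidate list, and the not list, as in the backtracking figure; the field and function names are hypothetical. Because each child subtree carries all of its own state, it can be explored by any worker, and exploring the children separately yields the same maximal cliques as exploring the whole tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidatePath:
    """One unexplored search subtree (field names are our own choice)."""
    vertex: Optional[int]    # vertex visited at this search node (None at root)
    clique: tuple            # vertices chosen so far, ending with `vertex`
    candidates: frozenset    # vertices that can still extend the clique
    not_list: frozenset      # already explored vertices, to avoid duplicates


def children(task, adj):
    """Split a subtree into its child subtrees; each one carries all the
    state needed to explore it, so it can be handed to any worker."""
    cand, excl = set(task.candidates), set(task.not_list)
    out = []
    for v in sorted(cand):
        out.append(CandidatePath(v, task.clique + (v,),
                                 frozenset(cand & adj[v]), frozenset(excl & adj[v])))
        cand.discard(v)
        excl.add(v)
    return out


def explore(task, adj):
    """Explore one subtree with an explicit stack; return its maximal cliques."""
    found, stack = [], [task]
    while stack:
        t = stack.pop()
        if not t.candidates and not t.not_list:
            found.append(t.clique)
        else:
            stack.extend(children(t, adj))
    return found


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
root = CandidatePath(None, (), frozenset(adj), frozenset())
tasks = children(root, adj)  # four independent tasks, one per vertex
results = sorted(c for t in tasks for c in explore(t, adj))
```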
Proposed Parallel Approach 1. Divide the vertices of the graph among the computing nodes. 2. Each node keeps the candidate path structures for its assigned vertices in a stack. 3. Each node then runs the sequential clique algorithm, pushing new candidate paths (unexplored subtrees) onto the stack as it goes. 4. If a node finishes its work early: a. it takes load (a candidate path structure) from a random thread within the same process; b. if all threads in that process are idle, it takes load from a random process using interprocess communication. 5. When all processes are idle, they terminate.
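To see how the five steps interact, here is a single-threaded, round-based simulation. It is a sketch under our own assumptions: "workers" are (process, thread) pairs, tasks are (clique, candidates, not list) triples, and vertices are dealt out cyclically. A real implementation runs the workers concurrently with threads and MPI messages. In each round every worker processes one task; an idle worker first asks a random thread in its own process and only then a random other process, taking work from the bottom of the donor's stack.

```python
import random
from collections import deque


def expand(task, adj):
    """Child subtrees of a (clique, candidates, not_list) task."""
    clique, cand, excl = task
    cand, excl = set(cand), set(excl)
    kids = []
    for v in sorted(cand):
        kids.append((clique + (v,), frozenset(cand & adj[v]), frozenset(excl & adj[v])))
        cand.discard(v)
        excl.add(v)
    return kids


def simulate(adj, procs=2, threads=2, seed=0):
    """Round-based, single-threaded simulation of on-demand load balancing."""
    rng = random.Random(seed)
    workers = [(p, t) for p in range(procs) for t in range(threads)]
    stacks = {w: deque() for w in workers}
    # Steps 1-2: one subtree per vertex (candidates = larger neighbours,
    # not list = smaller neighbours), dealt out cyclically over the workers.
    for i, v in enumerate(sorted(adj)):
        task = ((v,), frozenset(u for u in adj[v] if u > v),
                frozenset(u for u in adj[v] if u < v))
        stacks[workers[i % len(workers)]].append(task)
    cliques, transfers = [], 0
    while any(stacks.values()):          # step 5: stop when every stack is empty
        for w in workers:
            if not stacks[w]:            # step 4: idle worker asks for load
                same = [x for x in workers if x[0] == w[0] and stacks[x]]
                other = [x for x in workers if x[0] != w[0] and stacks[x]]
                donors = same or other   # 4a: own process first, 4b: others
                if not donors:
                    continue
                victim = rng.choice(donors)
                stacks[w].append(stacks[victim].popleft())  # take from the bottom
                transfers += 1
            task = stacks[w].pop()       # step 3: work from the top of the stack
            if not task[1] and not task[2]:
                cliques.append(task[0])  # maximal clique found
            else:
                stacks[w].extend(expand(task, adj))
    return sorted(cliques), transfers


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques, transfers = simulate(adj, procs=2, threads=2)
```

A stolen task is processed in the same turn it is received, so work cannot bounce between idle workers indefinitely.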
Parallel Approach 1. Time for the parallel algorithm with p processors: $T(p) = T_{\text{init}} + \max_{1 \le i \le p} \big( T_{\text{enum}}(i) + T_{\text{idle}}(i) \big)$ 2. T(p) should be minimized. 3. Ideally all computing tasks end at the same time, i.e. $T_{\text{enum}}(i) + T_{\text{idle}}(i)$ is the same for all processors. 4. Speedup when doubling the number of processors: $T(p) / T(2p)$ (equal to 2 for linear speedup). 5. For linear speedup, $T_{\text{init}}$ and $T_{\text{idle}}(i)$ should be minimal.
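The timing model translates directly into code. The timings below are made up purely to exercise the formula; the function names are hypothetical.

```python
def parallel_time(t_init, t_enum, t_idle):
    """T(p) = T_init + max over processors i of (T_enum(i) + T_idle(i))."""
    if len(t_enum) != len(t_idle) or not t_enum:
        raise ValueError("need one T_enum and one T_idle value per processor")
    return t_init + max(e + i for e, i in zip(t_enum, t_idle))


def doubling_speedup(t_p, t_2p):
    """Speedup from doubling the processor count; 2.0 means linear scaling."""
    return t_p / t_2p


# Made-up timings (seconds), only to exercise the formula.
t2 = parallel_time(1.0, [10.0, 8.0], [0.0, 2.0])                     # p = 2
t4 = parallel_time(1.0, [5.0, 4.0, 5.0, 4.0], [0.0, 1.0, 0.0, 1.0])  # p = 4
print(t2, t4, round(doubling_speedup(t2, t4), 2))  # 11.0 6.0 1.83
```

Even though the enumeration time halves, the fixed $T_{\text{init}}$ keeps the speedup below 2, which is why point 5 asks for it to be minimal.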
Algorithm Design: Reading the Input Graph 1. All processes need the graph data. 2. Parallel reading is limited by file-system scalability. 3. One process reads the graph. 4. That process broadcasts the graph to the other processes.
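Step 3 is ordinary parsing on a single process. Below is a sketch for an assumed plain-text edge-list format (one "u v" pair per line, "#" starting a comment); the project's actual input format is not specified on the slide. In an MPI program, rank 0 would run this parser, and every rank would then call mpi4py's `comm.bcast(adj, root=0)` to receive the same graph.

```python
def parse_edge_list(text):
    """Parse 'u v' lines (one undirected edge per line, '#' starts a comment)
    into a dict of neighbour sets. Assumed format, not the project's.
    Vertices that appear only in self-loops are dropped."""
    adj = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        u, v = map(int, line.split()[:2])
        if u == v:
            continue  # a self-loop can never help form a larger clique
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj
```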
Algorithm Design: Initial Load Distribution 1. The initial distribution divides the vertices among the computing nodes. 2. $T_{\text{init}}$ should be as small as possible. 3. Having a single process distribute the initial work to all the other processes takes more time. 4. Instead, each thread within a process selects its own vertices using its thread number and the process rank.
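Point 4 can be realised without any communication. The sketch below assumes a cyclic (round-robin) assignment over all threads of all processes; the slide does not say which scheme the paper uses, and the function name is hypothetical.

```python
def my_vertices(n, rank, tid, num_procs, num_threads):
    """Vertices 0..n-1 handled by thread `tid` of process `rank`.

    Assumption: cyclic (round-robin) assignment over all threads of all
    processes; no communication is needed, each thread computes its share."""
    worker = rank * num_threads + tid      # globally unique worker index
    total = num_procs * num_threads
    return list(range(worker, n, total))
```

For example, with 10 vertices, 2 processes and 2 threads each, thread 0 of process 1 is worker 2 and gets vertices [2, 6]. Together, the shares of all workers cover every vertex exactly once.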
Algorithm Design: Stack Splitting 1. Candidate path structures (unexplored subtrees) are transferred from the bottom of the stack. 2. Candidate paths are examined from the top of the stack. 3. Bottom candidates represent unexplored subtrees at higher levels of the search tree. 4. They are therefore likely to generate more work (candidates), which avoids too many on-demand transfer requests.
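With a double-ended queue, the owner pops from the top while a thief takes from the bottom. In this illustrative stack of (clique, candidates) tasks for a complete graph on 5 vertices (the tasks and names are our own), the stolen bottom task is the shallowest subtree with the most candidates.

```python
from collections import deque


def split_stack(stack, k):
    """Remove up to k tasks from the bottom (left end) of a DFS work stack."""
    return [stack.popleft() for _ in range(min(k, len(stack)))]


# Illustrative (clique, candidates) tasks for a complete graph on 5 vertices.
stack = deque()
stack.extend([((0,), {1, 2, 3, 4}), ((1,), {2, 3, 4}), ((2,), {3, 4})])
stack.pop()                                      # owner expands the top task (2,) ...
stack.extend([((2, 3), {4}), ((2, 4), set())])   # ... pushing deeper tasks on top
stolen = split_stack(stack, 1)
print(stolen[0][0], len(stack))  # (0,) 3
```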
Research Paper 3: Author: S. Szabó. Title: Parallel algorithms for finding cliques in a graph. Journal: Journal of Physics: Conference Series, Volume 268, Number 1. Date: 2011. Number of pages: 21.
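Research Paper 3: 1. Split Partitions. Graph $G = (V, E)$; $V$ is partitioned into three sets $C_1, C_2, C_3$ such that for all $c_1 \in C_1$ and $c_3 \in C_3$, $c_1$ and $c_3$ are not connected. Subgraphs: $G_1$ induced by $C_1 \cup C_2$ and $G_2$ induced by $C_2 \cup C_3$. Since no clique can contain vertices from both $C_1$ and $C_3$, every clique of $G$ lies in $G_1$ or in $G_2$.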
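The split reduces one search to two smaller ones, and the maximum clique of G is the larger of the two results. A brute-force sketch (hypothetical names; any exact solver could replace `max_clique_size`) follows. That the two subproblems would be sent to different workers is our reading of how the split serves parallelism.

```python
from itertools import combinations


def max_clique_size(adj, vertices):
    """Brute-force maximum clique size of the subgraph induced by `vertices`."""
    vs = sorted(vertices)
    for size in range(len(vs), 0, -1):
        for sub in combinations(vs, size):
            if all(b in adj[a] for a, b in combinations(sub, 2)):
                return size
    return 0


def split_max_clique_size(adj, c1, c2, c3):
    """Solve G1 = C1 u C2 and G2 = C2 u C3 separately and keep the larger result."""
    if any(b in adj[a] for a in c1 for b in c3):
        raise ValueError("the split requires no edge between C1 and C3")
    # The two subproblems are independent, so they could go to different workers.
    return max(max_clique_size(adj, set(c1) | set(c2)),
               max_clique_size(adj, set(c2) | set(c3)))


# Triangle {0, 1, 2} and K4 {2, 3, 4, 5} sharing vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4, 5},
       3: {2, 4, 5}, 4: {2, 3, 5}, 5: {2, 3, 4}}
print(split_max_clique_size(adj, {0, 1}, {2}, {3, 4, 5}))  # 4
```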
Research Paper 3: 2. Coloring Technique. Split the vertices into t partitions $C_1, C_2, \dots, C_t$, each of which is an independent set (a color class). Then the size of the maximum clique is $\le t$, which provides an upper bound on the size of the clique.
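The bound follows from a one-line counting argument. Any two vertices of a clique $K$ are adjacent, so they cannot lie in the same independent set $C_j$; hence $K$ meets each $C_j$ in at most one vertex:

```latex
|K| \;=\; \sum_{j=1}^{t} |K \cap C_j| \;\le\; \sum_{j=1}^{t} 1 \;=\; t
```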
Research Paper 3: 3. Dominating Nodes. Graph $G = (V, E)$; $N(v)$ = set of neighbours of $v$, for $v \in V$. Node $a$ is dominated by node $b$ if $a$ and $b$ are not connected and $N(a) \subseteq N(b)$.
[Figure: example graph on vertices a, b, d, e, f in which b dominates a.]
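Research Paper 3: The dominance relation is transitive. Algorithm to find the nodes that dominate a node $a$: divide the whole vertex set $V$ into two lists, $L_1 = N(a)$ and $L_2 = V \setminus N(a) \setminus \{a\}$; a node $b$ dominates $a$ if $b \in L_2$ and $b$ is connected to every node of $L_1$. Used in the Carraghan-Pardalos clique search algorithm.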
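The two-list test is short in code. As one possible use (our reasoning, not necessarily how the paper applies it), a dominated node can be deleted without shrinking the maximum clique: if $b$ dominates $a$, any clique containing $a$ stays a clique of the same size when $a$ is swapped for $b$. The sketch deletes one dominated node at a time and recomputes, so the dominating node is always still present. Function names are hypothetical.

```python
def dominators(adj, a):
    """Nodes b that dominate a, using the two lists from the slide."""
    l1 = adj[a]                        # L1 = N(a)
    l2 = set(adj) - adj[a] - {a}       # L2 = V - N(a) - {a}: not connected to a
    # b dominates a iff b is in L2 and N(a) is a subset of N(b).
    return {b for b in l2 if l1 <= adj[b]}


def remove_dominated(adj):
    """Delete dominated nodes one at a time, recomputing after each deletion.

    If b dominates a, any clique K containing a can swap a for b
    (K - {a} lies inside N(a), hence inside N(b), and b is not in K),
    so each deletion keeps the maximum clique size unchanged."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for a in sorted(adj):
            if dominators(adj, a):
                for u in adj.pop(a):
                    adj[u].discard(a)
                changed = True
                break
    return adj


# Vertex 1 dominates vertex 0: they are not adjacent and N(0) = {2} lies in N(1) = {2, 3}.
adj = {0: {2}, 1: {2, 3}, 2: {0, 1}, 3: {1}}
```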
Research Paper 3: 4. Dominating Edges. Graph $G = (V, E)$. Edge $(a, u)$ is dominated by edge $(u, b)$ if $b$ is not connected to $a$ and $N(a) \cap N(u) \subseteq N(u) \cap N(b)$. Edge dominance is transitive.
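The edge test mirrors the node test. This sketch uses the definition as written on the slide (the function name is hypothetical). A similar swap argument applies, by our reasoning: a clique containing $a$ and $u$ has all its other vertices in $N(a) \cap N(u) \subseteq N(u) \cap N(b)$, so replacing $a$ by $b$ gives a clique of the same size through edge $(u, b)$.

```python
def edge_dominated(adj, a, u, b):
    """True if edge (a, u) is dominated by edge (u, b) per the slide's definition."""
    return (u in adj[a] and b in adj[u] and b != a   # both edges exist, b differs from a
            and b not in adj[a]                        # b not connected to a
            and (adj[a] & adj[u]) <= (adj[u] & adj[b]))


# Triangles {0, 1, 2} and {1, 2, 3}; vertices 0 and 3 are not connected.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
```

Here edge (0, 1) is dominated by edge (1, 3): the common neighbourhood of 0 and 1 is {2}, which is also a common neighbour of 1 and 3.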
Sequential program demo
Thank you!