Foundation of Parallel Computing - Term Project Report


Shobhit Dutia (snd7555@rit.edu), Shreyas Jayanna (sj7316@rit.edu), Anirudh S N (asn5467@rit.edu)

1. Overview:
A graph is a set of connections between nodes, or items, and represents some sort of relation between them. A graph can represent anything from distances between cities to a group of people in a given space. In practice, graphs are used for finding shortest routes between cities, for networks, manufacturing scheduling and electrical circuits. In this project, we have worked on a graph coloring problem.

2. Description of computational problem:
We will be looking at a particular graph problem involving coloring its vertices. Graph coloring is a form of graph labelling. It is an NP-hard problem and involves assigning a minimum number of colors to the vertices in such a way that no two adjacent vertices share the same color. A related problem is the graph k-coloring problem, which asks whether the vertices can be colored using at most k colors. Graph coloring is used in processor scheduling, student class scheduling and radio frequency assignment.

3. Research paper 1:
The paper by Boman et al. [1] presents a parallel graph coloring algorithm for distributed-memory computers. The algorithm is organized as a series of supersteps; the superstep size is the number of vertices each processor colors before sending and receiving color information from the other processors. The algorithm has two phases: the first is called the tentative coloring phase and the second the conflict detection phase. The tentative coloring phase requires communication between processors to send and receive the colors of boundary vertices, and is therefore organized into supersteps, which reduce the frequency of communication. The conflict detection phase, on the other hand, resolves conflicts by randomly selecting a processor for color re-assignment, and hence needs no communication in this phase; it is not organized into supersteps, as it can be done independently. In the tentative coloring phase the processors color their vertices concurrently, and the color information is exchanged with the other processors after the given number of vertices has been colored in the superstep. The messages are sent so as to reduce the number of conflicts in subsequent supersteps. The second phase, conflict detection, detects such conflicts and resolves them; the processor that recolors a given vertex is chosen at random. The paper further discusses variations of this algorithm:
a. Initial partitioning: If the vertices are partitioned across processors in such a way that the number of cross edges between processors is high, performance will be poor, as there may be a large number of conflicts.
b. Synchronous vs. asynchronous supersteps: In synchronous mode there is a barrier at the end of each superstep. This is advantageous because, in the conflict detection phase, the color of a boundary vertex needs to be checked only against the neighbors colored in the same superstep. It does, however, introduce a delay, as there is a barrier at the end of every superstep. In asynchronous mode there are no barriers at the end of supersteps, and vertices are colored using whatever information happens to be available at that instant; in the conflict detection phase, a boundary vertex must then be checked against all of its off-processor neighbors. The number of conflicts may therefore be higher in asynchronous mode. A simplified simulation of one superstep appears below.
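The two-phase structure can be illustrated with a small single-process simulation (our own sketch, not the authors' implementation; the graph, the ownership map and the deterministic tie-breaking are made up for illustration):

    import java.util.*;

    public class SuperstepColoringDemo {
        // Tiny 6-vertex graph; vertices 0-2 are owned by "processor" 0 and
        // vertices 3-5 by "processor" 1 (made-up data).
        static int[][] adj = { {1,3}, {0,2,4}, {1,5}, {0,4}, {1,3,5}, {2,4} };
        static int[] owner = { 0, 0, 0, 1, 1, 1 };

        public static void main(String[] args) {
            int[] color = new int[adj.length];
            Arrays.fill(color, -1);

            // Phase 1 (tentative coloring): each processor colors its own
            // vertices greedily, but for off-processor neighbors it sees only
            // the snapshot from the start of the superstep (nothing yet).
            int[] snapshot = color.clone();
            for (int p = 0; p <= 1; p++)
                for (int v = 0; v < adj.length; v++)
                    if (owner[v] == p)
                        color[v] = smallestFree(v, color, snapshot);

            // Phase 2 (conflict detection): boundary vertices only; here the
            // higher-numbered endpoint is marked for recoloring (the paper
            // breaks such ties randomly instead).
            List<Integer> recolor = new ArrayList<>();
            for (int v = 0; v < adj.length; v++)
                for (int u : adj[v])
                    if (owner[u] != owner[v] && color[u] == color[v] && v > u) {
                        recolor.add(v);
                        break;
                    }

            System.out.println("tentative colors: " + Arrays.toString(color));
            System.out.println("to recolor next superstep: " + recolor);
        }

        // Smallest color not used by any visible colored neighbor of v.
        static int smallestFree(int v, int[] current, int[] snapshot) {
            BitSet used = new BitSet();
            for (int u : adj[v]) {
                int c = (owner[u] == owner[v]) ? current[u] : snapshot[u];
                if (c >= 0) used.set(c);
            }
            return used.nextClearBit(0);
        }
    }

Run as is, all three cross edges come out conflicted, so vertices 3, 4 and 5 would be recolored in the next round; interior vertices can never conflict.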
4. Research paper 2:
The second research paper, by Gebremedhin et al. [2], focuses on speeding up parallel graph coloring algorithms. The authors review prior work on the problem and then present experimental results based on the ordering and

partitioning of graphs. The paper discusses the Gebremedhin-Manne parallel algorithm, which assumes that the number of processors p is less than the number of vertices n in the graph; each processor is assigned n/p vertices. The authors chose this algorithm because they believed it is the only one that can be expected to run even faster as more processors are added. They also discuss the issues faced in ordering the vertices of a graph and in partitioning a graph. According to the paper, a highly clustered partitioning with a minimal number of boundary vertices is preferable, as it increases cache hits and reduces random accesses and inter-processor communication. In the experiments conducted, two graphs are chosen and the vertices are visited in three ways: their natural order, a random order, and reverse Cuthill-McKee (RCM) order. They conclude that random ordering increases the running time by a factor of three, while RCM ordering reduces the running time by almost 31%. Based on these findings, the graphs were ordered using RCM and their vertex partitionings were compared using the Metis partitioning method. The paper concludes that the speed of a graph coloring computation is determined by just two things, the partitioning method and the ordering of the graph, and that choosing them well determines how fast the problem is solved.

5. Research paper 3:
Sarıyüce, Saule and Çatalyürek [3], in the paper "Improving graph coloring on distributed-memory parallel computers", propose several methods to improve the number of colors used to color a graph. The first two methods target the coloring phase and the other three target the recoloring phase. The two methods proposed for the coloring phase concentrate on vertex-visit orderings: Largest First (LF) and Smallest Last (SL). In the LF ordering scheme, the vertex with the largest degree is selected and removed from the graph before the next round of ordering; this is repeated until all the vertices are ordered (a sketch of LF ordering appears at the end of this section). The SL ordering is the exact opposite of this scheme. Once the vertices are ordered by either scheme, the coloring of the vertices follows that same order. This approach might result in a different coloring when the graph is colored on multiple processors than on a single processor: each processor follows the same ordering scheme but uses only local knowledge, i.e., each processor orders the vertices in its set based on the local knowledge it possesses about those vertices. The methods investigated for the recoloring phase are permutations of the color classes: Reverse Order (RO) of colors, Non-Increasing (NI) number of vertices, and Non-Decreasing (ND) number of vertices. In the recoloring phase, vertices that belong to the same color class are recolored consecutively. In the NI scheme the color classes are ordered by non-increasing number of vertices; the ND scheme is the exact opposite. The authors tested combinations of these approaches on different types of graphs. They found that coloring large, conflict-inducing graphs in a synchronous manner yielded better results, and that vertex-visit orderings that take the properties of the graph and of the partition into account together gave a number of colors closer to the optimum.
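A minimal sketch of the LF idea, using static degrees for brevity (the scheme described above recomputes degrees as vertices are removed); the graph and names are illustrative:

    import java.util.*;

    public class LargestFirstDemo {
        public static void main(String[] args) {
            // Made-up graph: adjacency lists for vertices 0..4.
            int[][] adj = { {1,2,3}, {0,2}, {0,1,3}, {0,2}, {} };

            // LF: visit vertices in non-increasing order of degree.
            Integer[] order = new Integer[adj.length];
            for (int v = 0; v < adj.length; v++) order[v] = v;
            Arrays.sort(order, (a, b) -> adj[b].length - adj[a].length);

            // Greedy coloring in LF order: smallest color no neighbor uses.
            int[] color = new int[adj.length];
            Arrays.fill(color, -1);
            for (int v : order) {
                BitSet used = new BitSet();
                for (int u : adj[v]) if (color[u] >= 0) used.set(color[u]);
                color[v] = used.nextClearBit(0);
            }
            System.out.println("LF order: " + Arrays.toString(order));
            System.out.println("colors:   " + Arrays.toString(color));
        }
    }

SL would instead repeatedly remove a vertex of currently smallest degree and color in reverse removal order.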
6. Sequential program-design:
The sequential program implements the greedy algorithm presented as pseudo-code below. It first reads the input and creates an adjacency list of vertices. For each vertex we collect the colors of its adjacent vertices; for a candidate color from the color set, we check whether the adjacent vertices already use it. If not, the current vertex is assigned the candidate color; otherwise the candidate color is incremented by one and the check is repeated. The data structure representing the graph is a TreeMap, since it provides natural ordering. For the sequential version a hash-based set would also work, but we kept the TreeMap because it is used in the parallel version, where we need to maintain the ordering: since the vertices are numbered from 1 to n, thread 1 gets the first block of vertices, thread 2 the next block, and so on. The reason for this partitioning is that it is an efficient graph partitioning approach in which the number of boundary vertices between threads is small, which would not be the case if each thread were given random vertices.
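As a concrete illustration of this design, here is a minimal runnable sketch (class and variable names are our own; the actual GraphColSeq.java may differ):

    import java.util.*;

    public class GreedyColoringSketch {
        public static void main(String[] args) {
            // Vertex -> adjacency list; a TreeMap iterates vertices in their
            // natural (ascending) order. Normally built from the input file.
            TreeMap<Integer, List<Integer>> graph = new TreeMap<>();
            graph.put(1, Arrays.asList(2, 3, 4));
            graph.put(2, Arrays.asList(1, 3));
            graph.put(3, Arrays.asList(1, 2));
            graph.put(4, Arrays.asList(1));

            Map<Integer, Integer> color = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> e : graph.entrySet()) {
                // Colors already taken by adjacent vertices.
                Set<Integer> adjColors = new HashSet<>();
                for (int u : e.getValue())
                    if (color.containsKey(u)) adjColors.add(color.get(u));
                // Try colors 0, 1, 2, ... until one is free.
                int c = 0;
                while (adjColors.contains(c)) c++;
                color.put(e.getKey(), c);
            }
            System.out.println(color);  // {1=0, 2=1, 3=2, 4=1}
        }
    }

The TreeMap iteration visits vertices in ascending key order, which is exactly the 1..n ordering that the block partitioning in the parallel version relies on.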

The complexity of the algorithm is O(V*E), where V is the number of vertices in the graph and E is the number of edges. The disadvantage of the sequential approach is the obvious time taken in solving the problem.

Algorithm:
    Read input
    Let colorset = set of possible colors
    For each vertex:
        Get the adjacency list of the vertex
        For all adjacent vertices:
            Get the colors of the adjacent vertices into adjcolorlist
        For possiblecolor in colorset:
            Check whether adjcolorlist contains possiblecolor
            If not, set the current vertex's color to possiblecolor and continue to the next vertex
            If yes, increment possiblecolor and repeat

7. Parallel program-design:
The parallel algorithm we implemented is a variation of the one in research paper 1 [1]. We modified the algorithm so that the entire graph in memory can be accessed by all the threads. This has the big advantage of eliminating the latency of sending and receiving messages between processors, since the required information is accessed directly. The approach is of course limited to a single node; using a cluster would require sharing messages through tuple space, which would reintroduce that latency. In our approach, only the thread to which a vertex is currently assigned can modify the vertex's color; another thread may read that information at the same time, but will never write it. Because we introduce a barrier after the tentative coloring phase, all threads update their respective information before proceeding to the conflict detection phase. Even if a thread reads information while another thread is updating it, the information is read again in the conflict detection phase, so this approach causes no problem. Moreover, we also modified the algorithm of research paper 1 [1] with respect to load balancing: if after the conflict detection phase there are, say, 5 conflicts in thread 1 and only 1 conflict in thread 2, these are reduced to a new vertex set of size 6, and the tentative coloring phase starts again with a balanced load of 3 and 3 vertices instead of the unbalanced 5 and 1. In our opinion this, too, is a big advantage. A simplified multithreaded sketch follows the algorithm below.

Algorithm: The terms used in the algorithm are:
a. Superstep: the number of vertices a processor colors before exchanging color information with the other processors.

Step 1: Let colorset = set of possible colors
Step 2: Read the input vertex set
Step 3: While the vertex set is not empty:
  Step 3a: Distribute the vertex set across all the processors (parallel for loop)
  Step 3b: For each processor:
      Partition its vertex set into l subsets of size s (the superstep size)
      For i = 1 to l:
          For each vertex:
              Assign the vertex a permissible color (as in the sequential algorithm)
          Send and receive the colors of boundary vertices
  Step 3c: For each processor:
      Partition its vertex set into l subsets of size s
      For i = 1 to l:
          For each vertex:
              If the vertex is a boundary vertex and shares a color with an adjacent vertex,
                  add it to a set S_thread of vertices to be recolored
  Step 3d: Reduce all the S_thread sets into a single set S
  Step 3e: Let vertex set = S
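The following minimal sketch captures this scheme with plain Java threads and a CyclicBarrier instead of the PJ2 constructs the actual program uses; the graph, the thread count and all names are illustrative:

    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelColoringSketch {
        static int[][] adj = { {1,3}, {0,2,4}, {1,5}, {0,4}, {1,3,5}, {2,4} };
        static int[] color = new int[adj.length];

        public static void main(String[] args) throws Exception {
            Arrays.fill(color, -1);
            final int nThreads = 2;
            List<Integer> vertexSet = new ArrayList<>();
            for (int v = 0; v < adj.length; v++) vertexSet.add(v);

            while (!vertexSet.isEmpty()) {
                List<List<Integer>> part = partition(vertexSet, nThreads);
                CyclicBarrier barrier = new CyclicBarrier(nThreads);
                Set<Integer> conflicts = ConcurrentHashMap.newKeySet();
                Thread[] threads = new Thread[nThreads];
                for (int t = 0; t < nThreads; t++) {
                    final int rank = t;
                    threads[t] = new Thread(() -> {
                        // Tentative coloring; off-thread colors may be stale.
                        for (int v : part.get(rank)) color[v] = smallestFree(v);
                        try { barrier.await(); }
                        catch (Exception e) { throw new RuntimeException(e); }
                        // Conflict detection: higher-numbered endpoint yields
                        // (the paper breaks such ties randomly).
                        for (int v : part.get(rank))
                            for (int u : adj[v])
                                if (color[u] == color[v] && v > u) {
                                    conflicts.add(v);
                                    break;
                                }
                    });
                    threads[t].start();
                }
                for (Thread th : threads) th.join();
                vertexSet = new ArrayList<>(conflicts); // pooled across threads,
                for (int v : vertexSet) color[v] = -1;  // rebalanced next round
            }
            System.out.println(Arrays.toString(color));
        }

        // Smallest color not used by any currently colored neighbor of v.
        static int smallestFree(int v) {
            BitSet used = new BitSet();
            for (int u : adj[v]) { int c = color[u]; if (c >= 0) used.set(c); }
            return used.nextClearBit(0);
        }

        // Balanced block partition of the current vertex set.
        static List<List<Integer>> partition(List<Integer> vs, int n) {
            List<List<Integer>> parts = new ArrayList<>();
            int chunk = (vs.size() + n - 1) / n;
            for (int i = 0; i < n; i++)
                parts.add(vs.subList(Math.min(i * chunk, vs.size()),
                                     Math.min((i + 1) * chunk, vs.size())));
            return parts;
        }
    }

During tentative coloring a thread may read a stale color of an off-thread neighbor; exactly as in the algorithm above, any resulting conflict is caught after the barrier, and the conflicted vertices are pooled and re-partitioned, which is the load balancing described above.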

8. Developer manual:
GraphColSmp.java contains the parallel version of the graph coloring program; GraphColSeq.java is the sequential version. Compile either version together with Graph.java, Node.java and VertexSetVbl.java using javac *.java. To run the program, use java pj2 cores=<n> GraphColSeq <filename>, where the number of cores is optional (default: 1 core).

9. Creating input files:
The input to the code is a 22-dimensional hypercube graph. We re-used the code from the HypercubeGraph.java file. Input file 1 was generated as the output of HypercubeGraph.java, a single 22-dimensional hypercube. To create input file 2, we generated the same output with differently numbered vertices and concatenated it with input 1, resulting in two 22-dimensional hypercubes; we then manually connected some of the vertices of input 1 with input 2. Input files 3, 4 and 5 were created in the same manner, each adding one more 22-dimensional hypercube (4,194,304 vertices) to the previous input. A sketch of hypercube edge generation follows.
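The hypercube structure makes edge generation straightforward: vertex v is adjacent to v XOR 2^d for each dimension d. A small sketch in that spirit (our own reconstruction; the actual HypercubeGraph.java may differ):

    import java.io.*;

    public class HypercubeEdges {
        public static void main(String[] args) throws IOException {
            int dims = Integer.parseInt(args[0]);                  // e.g. 22
            int offset = args.length > 1 ? Integer.parseInt(args[1]) : 0;
            // Vertex v is adjacent to v ^ (1 << d) in each dimension d;
            // emitting each edge once (v < u) yields dims * 2^(dims-1) edges,
            // which for dims = 22 is 46,137,344, matching input file 1.
            try (PrintWriter out = new PrintWriter(new BufferedWriter(
                    new FileWriter("hypercube-" + dims + "-" + offset + ".txt")))) {
                for (int v = 0; v < (1 << dims); v++)
                    for (int d = 0; d < dims; d++) {
                        int u = v ^ (1 << d);
                        if (v < u) out.println((v + offset) + " " + (u + offset));
                    }
            }
        }
    }

Running it a second time with a vertex offset of 4,194,304 and concatenating the two files gives the second cube, to which the manual cross edges are then added.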

10. Strong scaling performance:
We tested strong scaling across 8 cores with five different problem sizes; we also increased the problem sizes since presentation 3 and obtained much better results. The five inputs were:

    #Vertices       #Edges
    4,194,304       46,137,344
    8,388,608       92,274,699
    12,582,912
    16,777,216
    20,971,520      257,529,856

[The two missing edge counts and the original table's running time T (msec), speedup and efficiency columns, for the sequential run and K = 1 to 8 cores, did not survive extraction.]

[Figure: running time (msec) vs. number of cores K, one curve per input size V = 4,194,304; 8,388,608; 12,582,912; 16,777,216; 20,971,520.]

[Figures: speedup vs. number of cores K and efficiency vs. K, one curve per input size V = 4,194,304; 8,388,608; 12,582,912; 16,777,216; 20,971,520.]

11. Comment on strong scaling performance:
As the charts above show, the speedup is influenced by the input size. For a sufficiently large input we get very good efficiency; for a small input graph, however, the maximum speedup on 8 cores is 6. For large input graphs the speedup is extremely good. Since all our inputs are hypercubes, all vertices have roughly the same number of edges (roughly, because some vertices were connected manually across the different input components). Suppose two processors are coloring a given input file: after the processors color their respective subsets, even if in the worst case all the boundary vertices are colored the same, the next phase of the algorithm involves only half of the vertices.

12. Weak scaling performance:
[Table: weak scaling with base problem size n(1) = 4,194,304 vertices and n scaled in proportion to the number of cores K: a sequential run at n = 4,194,304, then K = 1: n = 4,194,304; K = 2: 8,388,608; K = 3: 12,582,912; K = 4: 16,777,216; K = 5: 20,971,520; K = 6: 25,165,824; K = 7: 29,360,128; K = 8: 33,554,432. The running time T (msec), sizeup and sizeup efficiency columns did not survive extraction.]
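For reference, the scaling metrics in the tables above follow the usual definitions (our assumption; the exact conventions are those of the course's PJ2 framework):

    \mathrm{Speedup}(K) = \frac{T_{\mathrm{seq}}}{T(K)}, \qquad
    \mathrm{Eff}(K) = \frac{\mathrm{Speedup}(K)}{K}

    \mathrm{Sizeup}(K) = \frac{N(K)}{N(1)} \cdot \frac{T(1)}{T(K)}, \qquad
    \mathrm{SizeupEff}(K) = \frac{\mathrm{Sizeup}(K)}{K}

Under ideal weak scaling, N(K) = K * N(1) while T(K) = T(1), so Sizeup(K) = K and the sizeup efficiency is 1; the slight drop below 1 is what section 13 explains.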

[Figures: running time (msec), sizeup, and efficiency vs. number of cores K for the weak scaling runs with n(1) = 4,194,304.]

13. Why non-ideal weak scaling is occurring:
As we can see from the weak scaling graphs, the efficiency decreases slightly as we increase the input size and the number of cores. This is because, as the number of cores and the size of the graph grow, the number of boundary vertices also grows. This in turn increases the likelihood of conflicts and thus the number of vertices that need to be re-colored.

14. Future work:
We propose using the techniques presented in research paper 2 [2] to achieve a better speedup of the above algorithm. The main techniques are:
a. Sequential optimization: the second research paper proposes ordering the vertices in such a way that the neighbors of a vertex span as few cache lines as possible.
b. Assign vertices to processors with fewer cross edges: as is evident from the algorithm, the number of boundary vertices directly influences the likelihood of conflicts. Thus, if the graph is partitioned so that the number of cross edges is minimized, we can get a better speedup. Metis [5] is a library that can be used to create efficient partitionings of graphs.
c. Reduced conflict checking: in the current algorithm, the interior vertices are also iterated over in the conflict detection phase. There is no need to do so; only the boundary vertices need to be checked, since interior vertices can never conflict with vertices on another thread.

15. What we learned:
a. Speed of the algorithm: the speed of the algorithm is directly influenced by the way the graph is partitioned across the processors. If we partition the graph so that the number of cross edges between processors is low, better scaling is obtained.
b. We also learned that mid-sized graphs work best, as opposed to very sparse or very dense graphs.

16. Distribution of work:
Choosing the project topic, one of the most time-consuming phases, was done jointly by all team members. The team presentations likewise took equal effort, with each member contributing a different section: describing the topic area, presenting the sequential code, and presenting the parallel code. As there were three research papers, each team member analyzed one of them in depth. The majority of the code (sequential as well as parallel) was written by Shobhit Dutia and Shreyas Jayanna.

17. References:
[1] Erik G. Boman, Doruk Bozdağ, Umit V. Catalyurek, Assefaw H. Gebremedhin, and Fredrik Manne. A Scalable Parallel Graph Coloring Algorithm for Distributed Memory Computers. 11th International Euro-Par Conference, Lisbon, Portugal, Proceedings, LNCS 3648, 2005.
[2] Assefaw H. Gebremedhin, Fredrik Manne, and Tom Woods. Speeding up Parallel Graph Coloring. 7th International Workshop, PARA 2004, Lyngby, Denmark, June 20-23, 2004, Revised Selected Papers, LNCS 3732, 2006.
[3] Ahmet E. Sarıyüce, Erik Saule, and Ümit V. Çatalyürek. Improving Graph Coloring on Distributed-Memory Parallel Computers. 18th International Conference on High Performance Computing (HiPC), pp. 1-10, 2011.
