Parallel Computing in Paderborn: The SFB 376 "Massive Parallelism: Algorithms, Design Methods, Applications"


Friedhelm Meyer auf der Heide, Thomas Decker
Department of Mathematics and Computer Science and Heinz Nixdorf Institute
University of Paderborn, Germany
http://www.uni-paderborn.de/cs/{fmadh, decker}.html

1 Introduction

A major research area at the University of Paderborn is parallel computing. Besides computer scientists, researchers in mathematics, electrical and mechanical engineering, and manufacturing technology also employ the computation power of parallel and distributed systems. Furthermore, many institutions of our university focus on research related to this topic: the Paderborn Center for Parallel Computing (PC2) offers support for the efficient, comfortable use of parallel machines, not only to users of the Paderborn or other universities, but also to users in international industry. Parallel computing in the Heinz Nixdorf Institute and its DFG Graduate College ranges from basic research to applications in manufacturing technology. The C-LAB, a joint venture with Siemens Nixdorf Informationssysteme AG, contributes design methodology for complex distributed real-time systems. All these activities are supported within numerous projects by, e.g., the DFG, the BMBF, the EU, and industry.

The SFB 376 "Massive Parallelism" has become the central research organization coordinating the activities in parallel computing in Paderborn and conducting the basic research in this area. It aims to develop methods that fully exploit the computation power of large parallel systems, and to make such methods easily usable for applications in science, engineering, and manufacturing technology. The project integrates three major parts: A: Algorithms, B: Design methods, and C: Applications. All these parts are strongly related to each other. On the one hand, the new algorithmic techniques and design methods provide an important basis for the application-oriented parts of the project. On the other hand, the demands of the applications motivate many algorithmic and methodological problems. Furthermore, the applications play an important role in the evaluation of new algorithms and design methods. This project structure demands cooperation and interdisciplinary research between experts in the different application areas, methodologically oriented researchers, and algorithms researchers. We will now describe the different parts in more detail.

* This work is supported by the DFG Sonderforschungsbereich 376 "Massive Parallelität - Algorithmen, Entwurfsmethoden, Anwendungen" and by the EU ESPRIT Long Term Research Project ALCOM-IT. More information about the SFB can be found on our web pages.

A: Algorithms. Designing algorithms that fully exploit the computation power of large parallel systems is much more complicated than in the sequential case:

- The design of parallel algorithms for a large number of processors often requires new algorithmic approaches (examples are combinatorial optimization and fluid dynamics). A parallelization of sequential methods often does not lead to satisfactory results.
- Even algorithms which exploit the natural parallelism of the underlying problems are difficult to implement on massively parallel systems because of their highly dynamic behavior with respect to process generation and communication (e.g., adaptive finite element methods, event-driven simulations).
- Using networks of workstations as one single parallel machine imposes further problems. Due to the heterogeneity of the computation and communication hardware, different control and communication mechanisms coexist within one application. Sometimes it is even essential to use different algorithms on different architectures.

Within this part of the SFB, we design and analyze protocols for basic services like load balancing, routing, and data management in processor networks. Furthermore, we develop algorithms for realizing data structures, as well as graph, geometry, and computational algebra algorithms. All of this is made available to users in and beyond the SFB as easy-to-use libraries.

B: Design methods. In this part of the project we investigate techniques and tools which support the design, realization, and the comfortable and efficient use of massively parallel systems. The utilization is assisted on the hardware side as well as on the software side. The leading idea is that we can increase the effectiveness of efficient algorithmic approaches

- by supporting the design of very complex, naturally parallel, reactive technical systems with real-time constraints, and
- by a tool system automating the systematically solvable tasks occurring in the design process of parallel applications.

Thereby, developers of parallel applications who are not familiar with the design of efficient parallel algorithms get access to reusable parallelization know-how. Within this part of the SFB, we develop design methods for massively parallel real-time systems as well as tools for the development and implementation of parallel applications. These systems use (and sometimes build on) results from part A, and are strongly connected to application areas within part C.

C: Applications. In this part, we work on applications of massively parallel systems with high economic and scientific relevance. Due to their complexity, boundary conditions like timing constraints, and their dynamic behavior, these applications pose great challenges to the design methods and algorithmic methods. The following criteria are common to all applications investigated in this part:

- Every application field is highly relevant from both the economic and the scientific point of view.
- The applications are broadly scattered across different disciplines outside the area of classical scientific problems.
- The problems lead to computational demands which are beyond the capabilities of standard computer systems.
- Every application is highly dynamic with respect to load generation, communication, and its data access patterns.

Therefore, the applications represent an important tool for measuring the quality of the algorithms and design methods developed in parts A and B. In particular, we push the development of parallel self-organizing mechatronic systems and applications in the area of artificial intelligence. In addition, we work on production planning as well as on parallel extensions of the computer algebra system MuPAD.

In the following we concentrate on describing our algorithmic research on developing, analyzing, and implementing programming platforms designed for large parallel machines (including massively parallel architectures and SCI clusters). In particular, we present techniques and libraries for load balancing and virtual global variables, together with their applications in the projects integrated in our SFB. This work is mainly done within the project A2 (Meyer auf der Heide, Monien), "Universal basic services".

2 Universal basic services

In this part of the SFB, two important basic services are studied: load balancing and data management in large parallel networks. As the developed methods should be appropriate for a large spectrum of applications, they have to be able to adapt to the different application demands as well as to the capabilities of the underlying hardware. This universality can only be achieved by algorithms which take specific characteristics of the application and of the architecture into consideration. For example, a universal load balancing service should offer different methods and adapt them to the specific demands of the application. For data management systems and for load balancing algorithms, different characteristics of the application and of the architecture are relevant for selecting appropriate strategies. In the next sections we take a closer look at these services.

The data and task management system Daisy makes these services available by integrating tools for the simulation of shared memory (DIVA) and for load balancing (VDS) in a single comprehensive library. A beta release of Daisy is available for Parsytec's PARIX and PowerMPI programming models. With the improved thread support of MPI-2 and PVM 3.4, Daisy will also become available on workstation clusters.

2.1 Load balancing

The problem. Massively parallel computers have been shown to be very efficient at solving problems that can be partitioned into tasks with static computation and communication patterns. However, there exists a large class of problems that have unpredictable computational requirements and/or irregular communication patterns. To solve this kind of problem efficiently on parallel computers, it is necessary to perform load balancing operations at runtime [14]. In contrast to static load balancing problems, where a priori knowledge about the dynamic behavior of the application is available [12], dynamic algorithms have to place the load items on-line. Consequently, the application is influenced not only by the obtained load balancing quality but also by the overhead imposed by the balancing activities. Therefore we have to optimize the trade-off between the achieved load balancing quality on the one hand and the balancing effort on the other. To do this, we consider the properties of the architecture (communication bandwidth, message-offset costs, latency) and of the application (e.g., granularity). A very important parameter of the application is the demanded load balancing quality. Depending on the application, it may be necessary to demand a perfect load balance across the processors, or only a minimization of the idle times.

Dynamic strategies. We distinguish between scenarios where migration is possible and those where objects can only be placed once (dynamic mapping). Dynamic mapping algorithms are often used for process placement because in many systems the migration of processes is very costly or even impossible. In [13] we presented a universal dynamic mapping algorithm which is able to adapt the mapping overhead to the granularity of the application and to the communication cost imposed by the architecture. A parallel version of a similar mapping process was introduced in [6].

Particularly in the applications considered in the SFB project, we have scenarios where the migration of load items is possible, which allows applying completely different balancing algorithms. In these cases, load items can often be described by data packets which can be migrated simply by sending them from one processor to another. Here, the selection of the load balancing strategy depends mainly on the demanded balancing quality. If the total set of load items processed during a distributed computation does not depend on the schedule determined by the balancing layer, i.e., on the order and location in which the items are processed, the maximum speedup can be reached if the idle times of the nodes are minimized. For example, this is the case in tree-structured computations like divide-and-conquer applications, which decompose the problem to be solved into parts that depend directly on the problem instance itself. For this kind of load balancing (load sharing), randomized work stealing leads to very good results in theory [4] as well as in practice [15].

However, the load items generated by a distributed computation may also depend significantly on the order in which they are processed. We find this phenomenon in many search algorithms used in artificial intelligence and operations research. In best-first branch-and-bound, for example, the processing order is defined by the quality of the objects (partial solutions). When applications of this kind are parallelized, it is not only important to ensure that all processors are busy; some form of qualitative load balancing is also necessary to make sure that all processors are working on good partial solutions, and thus to prevent the processors from doing ineffective work (work not processed by a sequential best-first algorithm).
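To illustrate the load sharing case, the following minimal, self-contained C++ sketch simulates randomized work stealing for a divide-and-conquer computation. The processor count, the splitting rule, and the stealing policy (owners work at the bottom of their deque, thieves steal at the top) are illustrative assumptions in the spirit of [4, 15], not the exact algorithms analyzed there.

    // Simulation sketch: idle processors repeatedly pick a random victim and
    // steal a task from the top of its deque. All parameters are illustrative.
    #include <cstdio>
    #include <deque>
    #include <random>
    #include <vector>

    struct Task { int size; };                   // a subtree; size 1 is a leaf

    int main() {
        const int P = 8;                         // simulated processors (assumption)
        std::vector<std::deque<Task>> dq(P);
        dq[0].push_back({1 << 12});              // the whole problem starts on processor 0
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> victim(0, P - 1);

        long long steps = 0, steals = 0;
        for (bool busy = true; busy; ++steps) {
            busy = false;
            for (int p = 0; p < P; ++p) {
                if (dq[p].empty()) {             // idle: one random steal attempt per step
                    int v = victim(rng);
                    if (!dq[v].empty()) {
                        dq[p].push_back(dq[v].front());
                        dq[v].pop_front();
                        ++steals;
                    }
                } else {                         // busy: split or execute the local task
                    Task t = dq[p].back();
                    dq[p].pop_back();
                    if (t.size > 1) {            // divide into two subproblems
                        dq[p].push_back({t.size / 2});
                        dq[p].push_back({t.size - t.size / 2});
                    }                            // a leaf is simply consumed
                }
                busy = busy || !dq[p].empty();
            }
        }
        std::printf("parallel steps: %lld, steals: %lld\n", steps, steals);
        return 0;
    }

Running the simulation shows the characteristic behavior: after a short start-up phase in which work spreads by stealing, all processors stay busy until the task tree is exhausted.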

In load sharing algorithms, processors can only have two states: idle or not idle. Qualitative algorithms directly take the load states of the processors into consideration. Based on comparisons of these states, load is migrated from source processors with high load to sink processors with low load. The various algorithms for this setting differ in the point in time at which they become active, in the strategies used to select the processors which exchange information about their load states, and in the amount of load which is migrated.

In [25] we analytically compared two well-known qualitative local balancing techniques: the dimension exchange (DE) method and the diffusion (DF) method. The DE method balances the load with each neighbor iteratively, whereas the DF method balances with all neighbors in one step. It was shown that, depending on the capabilities of the architecture, both techniques have advantages. Assume that w_i^(t) represents the load of node i at time t, and that w̄^(t) represents the average load at time t. It was proved that the expected value of the system imbalance factor Φ^(t) = Σ_{i=1..N} (w_i^(t) − w̄^(t))² (at time t, for N nodes) is smaller for DE than for DF if it is possible to communicate with more than one node simultaneously (multi-port model), and larger otherwise (single-port model). Here it was assumed that the load is generated by identically distributed random variables. Consequently, the first method is preferable if the communication hardware is able to support multi-port communication efficiently, and the latter should be used if only one-port communication is possible. Furthermore, it was shown that for synchronous scenarios, where no load generation takes place during the balancing phase, the expected value of the system imbalance factor of the DE method is always smaller than that of the DF method, independently of the communication model. In addition to the experimental evaluation conducted in [25], we evaluated the practical relevance of the results using a branch-and-bound algorithm for the set partitioning problem. Both methods clearly outperformed simple approaches which only select one neighbor for balancing [17, 26].

Implementation and application. The virtual data space tool VDS simulates a global data space for objects stored in distributed heaps, stacks, and other abstract data types [11]. The work packets are spread over the distributed processors as evenly as possible with respect to the incurred balancing overhead. Among other algorithms, VDS integrates the methods described above for qualitative load balancing as well as for load sharing. Within the SFB, VDS is applied for the parallelization of an application from the area of artificial intelligence. Furthermore, we are using VDS inside another A-project dealing with problems of computational algebra like real root isolation.
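Before turning to data management, the two balancing iterations compared above can be made concrete. The following self-contained C++ sketch runs both schemes on a d-dimensional hypercube and prints the imbalance factor Φ after each sweep; the random initial loads and the diffusion parameter α = 1/(d+1) are illustrative assumptions, not the stochastic model analyzed in [25].

    // Sketch: dimension exchange (DE) vs. diffusion (DF) on a hypercube.
    #include <cstdio>
    #include <random>
    #include <vector>

    double phi(const std::vector<double>& w) {   // Phi = sum_i (w_i - avg)^2
        double avg = 0;
        for (double x : w) avg += x;
        avg /= w.size();
        double s = 0;
        for (double x : w) s += (x - avg) * (x - avg);
        return s;
    }

    int main() {
        const int d = 6, N = 1 << d;             // 64 nodes (assumption)
        std::mt19937 rng(1);
        std::uniform_real_distribution<double> init(0.0, 100.0);
        std::vector<double> w0(N);
        for (double& x : w0) x = init(rng);      // illustrative random initial loads

        std::vector<double> de = w0, df = w0;
        const double alpha = 1.0 / (d + 1);      // a common diffusion parameter (assumption)
        for (int sweep = 1; sweep <= 5; ++sweep) {
            for (int k = 0; k < d; ++k)          // DE: average across one dimension at a time
                for (int i = 0; i < N; ++i) {
                    int j = i ^ (1 << k);
                    if (i < j) de[i] = de[j] = (de[i] + de[j]) / 2;
                }
            std::vector<double> next = df;       // DF: exchange with all d neighbors at once
            for (int i = 0; i < N; ++i)
                for (int k = 0; k < d; ++k)
                    next[i] += alpha * (df[i ^ (1 << k)] - df[i]);
            df = next;
            std::printf("sweep %d: Phi(DE) = %10.4f  Phi(DF) = %10.4f\n",
                        sweep, phi(de), phi(df));
        }
        return 0;
    }

On the hypercube, a single complete DE sweep already balances the load exactly, while DF reduces the imbalance only geometrically, which mirrors the qualitative difference between the two iteration schemes.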

2.2 Data Management

The problem. The provision of shared memory in systems with distributed memory is an essential aid to comfortable and efficient programming. For example, it is possible to store variables just as in sequential programs while making them accessible from other processors. Other data objects are, for example, pages or cache lines in a virtual shared memory system, shared files in a distributed file system, or media information (video, audio, text, graphics) on a media server. The efficiency of such systems depends significantly on the bandwidth of the architecture. We have to distinguish between systems with high bandwidth and systems with low bandwidth. In the former case, the dominating problem is the contention at the memory modules (i.e., the number of requests at each module) [5, 7, 8, 9, 16]; in the latter case we have to consider the network congestion. Here we have to preserve data locality in order to reduce the communication load in the network. A survey of approaches for both scenarios is given in [22].

Strategies for systems with limited bandwidth. Most work concerning data management in parallel and distributed systems investigates either hashing-based or caching-based strategies. Hashing distributes the shared objects uniformly at random among the memory modules, which yields an even distribution of the data and therefore achieves a good load balance. However, uniform hashing gives up any locality in the pattern of read and write accesses. Caching exploits locality by placing or moving copies of the objects at or close to the accessing processors. The basic idea is that this minimizes the distances and therefore decreases the total communication load. The main problem is that minimizing distances can produce bottlenecks in the system, e.g., if many objects are placed on a central processor in the network.

For the simulation of shared memory on MPPs or NOWs, the routing mechanism is the bottleneck of the system, i.e., we have to focus on data management in parallel processor systems in which the processors are connected by a relatively sparse network. Each processor is assumed to have its own local memory module, so that shared objects have to be distributed among these modules. This scenario is typical for most of today's parallel computers, including the Parsytec GCel and GCpp, Intel Paragon, Fujitsu AP1000, and Cray T3D and T3E. The processors in all these systems are connected by mesh or torus networks. Clearly, the larger the number of processors in these systems, the more the communication bandwidth becomes the bottleneck, because the bisection width of these networks increases more slowly than the number of processors.

For this scenario, hashing yields an even distribution of the data among the processors and also an even distribution of the communication or routing load among the links in the network. Several hashing-based strategies have been analyzed in the context of PRAM simulation. For instance, Ranade [24] describes a hashing-based PRAM emulation for the direct butterfly network. He shows that an N-processor PRAM can be emulated by an N-processor butterfly network with slowdown O(log N). This scheme can also be adapted to other networks, which, e.g., yields an N-processor PRAM simulation for the √N × √N mesh with slowdown O(√N). This slowdown is optimal for general PRAM simulations because of the √N bisection width of the mesh. Nevertheless, it is completely unsatisfactory for applications that exhibit locality. This shows that the main drawback of uniform hashing is that it gives up any locality in the pattern of read and write accesses. In order to exploit locality, we have to minimize the communication load. This can be done by minimizing the distances from the accessing processors to the accessed objects.
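The locality cost of uniform hashing can be made concrete with a small experiment. The following self-contained C++ sketch (an illustration, not one of the strategies cited above) places objects on a 16 × 16 mesh either by hashing or in the region where they are accessed, and compares the average access distance; the mesh size, access pattern, and hash function are all illustrative assumptions.

    // Sketch: average Manhattan access distance under hashed vs. local placement.
    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    #include <functional>
    #include <random>

    int main() {
        const int side = 16, N = side * side;    // mesh of N memory modules (assumption)
        const int objects = 4096, accesses = 20; // accesses per object (assumption)
        std::mt19937 rng(7);
        std::uniform_int_distribution<int> node(0, N - 1);
        std::uniform_int_distribution<int> offset(-2, 2);  // accessors sit near a "home" region

        double distHash = 0, distLocal = 0;
        for (int o = 0; o < objects; ++o) {
            int home = node(rng);                          // where accesses to object o cluster
            int hashed = (int)(std::hash<int>{}(o) % N);   // uniform hashed placement
            for (int a = 0; a < accesses; ++a) {
                int ax = std::clamp(home % side + offset(rng), 0, side - 1);
                int ay = std::clamp(home / side + offset(rng), 0, side - 1);
                // Manhattan distance from the accessing processor to the object
                distHash  += std::abs(ax - hashed % side) + std::abs(ay - hashed / side);
                distLocal += std::abs(ax - home % side)   + std::abs(ay - home / side);
            }
        }
        std::printf("average access distance: hashed %.2f, locality-preserving %.2f\n",
                    distHash / (objects * accesses), distLocal / (objects * accesses));
        return 0;
    }

Under this toy access pattern the hashed placement pays roughly the average mesh distance per access, whereas the locality-preserving placement stays within the small neighborhood radius, which is exactly the gap the strategies below try to close without creating bottlenecks.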
This problem is widely studied in the context of file allocation and distributed paging; see, e.g., [1, 19, 2]. A survey of these topics is given in [3].

Clearly, minimizing the distances minimizes the total communication load. Unfortunately, it can also increase the congestion, i.e., the maximum number of data packets which have to cross the same link. The congestion describes the worst bottleneck of the system and therefore gives a lower bound on the execution time of a given application. Moreover, several results on store-and-forward and wormhole routing (see, e.g., [18, 21, 10, 23]) indicate that this value is also a good approximation for an upper bound on the execution time of coarse-grained applications with high communication load and low synchronization requirements. This shows the importance of considering the congestion rather than the total communication load.

In [20] we presented static and dynamic placement strategies for acyclic networks as well as for multidimensional meshes. Furthermore, we developed static strategies for indirect networks like Clos networks or fat trees. All these strategies aim to minimize the congestion. The static strategy maps the objects to the modules according to some knowledge of the access pattern of a given application. The dynamic strategy makes all placement decisions on-line, i.e., it has no knowledge about the access patterns beforehand; it is a combined hashing and caching strategy. Both strategies can work either with or without redundancy. We compare the achieved congestion with the congestion of an optimal strategy and show that it is close to optimal.

Implementation and application. The distributed variables library DIVA provides functions for simulating shared memory on distributed systems. The idea is to provide an access mechanism to distributed variables rather than to memory pages or single memory cells. The variables can be created and released at runtime. Once a global variable is created, each participating processor in the system has access to it. For latency hiding, reads and writes can be performed in two separate function calls: the first call initiates the variable access, and the second call waits for its completion. The time between initiation and completion of a variable access can be hidden by other local instructions or variable accesses. Currently, we are working on making the DIVA library usable for a parallelization of the computer algebra system MuPAD, in cooperation with one of the application projects. For this, several protocols for managing global variables, including those mentioned above, are being implemented and incorporated into DIVA.
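The split-phase access style described above can be sketched as follows. The read_begin/read_end interface is hypothetical (DIVA's actual API may differ), and std::async merely stands in for the remote fetch of a distributed variable; the point is only the overlap of communication latency with local work.

    // Sketch of split-phase (initiate/complete) variable access for latency hiding.
    #include <cstdio>
    #include <future>

    // Phase 1: initiate the access and return immediately with a handle.
    // (Hypothetical interface, not DIVA's real API.)
    std::future<int> read_begin(const int& global_var) {
        return std::async(std::launch::async, [&global_var] { return global_var; });
    }

    // Phase 2: block until the requested value has arrived.
    int read_end(std::future<int>& handle) { return handle.get(); }

    int main() {
        int global_var = 42;                         // stands in for a distributed variable
        std::future<int> h = read_begin(global_var); // initiate the read
        double local = 0;                            // latency hiding: overlap with local work
        for (int i = 0; i < 1000000; ++i) local += 0.5 * i;
        int value = read_end(h);                     // wait for completion
        std::printf("value = %d (local work result: %.0f)\n", value, local);
        return 0;
    }

The useful local computation between the two calls is what hides the access latency; if no such work is available, the second call simply blocks, and the scheme degenerates to an ordinary blocking read.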

References

1. B. Awerbuch, Y. Bartal, A. Fiat: Competitive distributed file allocation. Proc. 25th ACM Symp. on Theory of Computing (STOC).
2. B. Awerbuch, Y. Bartal, A. Fiat: Distributed paging for general networks. Proc. 7th ACM Symp. on Discrete Algorithms (SODA).
3. Y. Bartal: Survey on distributed paging. Proc. of the Dagstuhl Workshop on On-line Algorithms.
4. R. D. Blumofe, C. E. Leiserson: Scheduling Multithreaded Computations by Work Stealing. Proc. 36th IEEE Symp. on Foundations of Computer Science (FOCS '95).
5. P. Berenbrink, F. Meyer auf der Heide, V. Stemann: Fault Tolerant Shared Memory Simulations. Proc. 13th Symp. on Theoretical Aspects of Computer Science (STACS).
6. P. Berenbrink, F. Meyer auf der Heide, K. Schröder: Allocating Weighted Jobs in Parallel. Proc. 9th ACM Symp. on Parallel Algorithms and Architectures (SPAA '97), to appear.
7. A. Czumaj, F. Meyer auf der Heide, V. Stemann: Shared memory simulations with triple-logarithmic delay. Proc. 3rd European Symposium on Algorithms (ESA).
8. A. Czumaj, F. Meyer auf der Heide, V. Stemann: Simulating Shared Memory in Real Time: On the Computation Power of Reconfigurable Architectures. Technical Report SFB tr-rsfb, Paderborn University.
9. A. Czumaj, F. Meyer auf der Heide, V. Stemann: Contention Resolution in Hashing Based Shared Memory Simulations. Technical Report SFB tr-rsfb, Paderborn University, Dec. 1996; also: Information and Computation, to appear.
10. R. Cypher, F. Meyer auf der Heide, C. Scheideler, B. Vöcking: Universal algorithms for store-and-forward and wormhole routing. Proc. 26th ACM Symp. on Theory of Computing (STOC).
11. T. Decker: Virtual Data Space - A Universal Load Balancing Scheme. Proc. 4th Int. Symp. on Solving Irregularly Structured Problems in Parallel (IRREGULAR '97), 1997, to appear.
12. T. Decker, R. Diekmann: Mapping of Coarse-Grained Applications onto Workstation Clusters. Proc. 5th Euromicro Workshop on Parallel and Distributed Processing, pp. 5-12.
13. T. Decker, R. Diekmann, R. Lüling, B. Monien: Towards Developing Universal Dynamic Mapping Algorithms. Proc. 7th IEEE Symp. on Parallel and Distributed Processing, 1995.
14. R. Diekmann, B. Monien, R. Preis: Load Balancing Strategies for Distributed Memory Machines. In F. Karsch, H. Satz (eds.): Multi-Scale Phenomena and their Simulation, World Scientific, 1997, to appear.
15. R. Feldmann, P. Mysliwietz, B. Monien: A fully distributed chess program. Advances in Computer Chess VI, Ellis Horwood, pp. 1-27.
16. R. Karp, M. Luby, F. Meyer auf der Heide: Efficient PRAM simulation on a distributed memory machine. Algorithmica (16).
17. R. Lüling: Lastverteilung zur effizienten Nutzung paralleler Systeme. Ph.D. thesis, Shaker Verlag, 1996.
18. F. T. Leighton, B. M. Maggs, A. G. Ranade, S. B. Rao: Randomized routing and sorting on fixed-connection networks. Journal of Algorithms (17).
19. C. Lund, N. Reingold, J. Westbrook, D. Yan: On-line distributed data management. Proc. 2nd European Symposium on Algorithms (ESA).
20. B. Maggs, F. Meyer auf der Heide, B. Vöcking, M. Westermann: Exploiting locality for networks of limited bandwidth. Technical Report tr-rsfb, University of Paderborn.
21. F. Meyer auf der Heide, B. Vöcking: A packet routing protocol for arbitrary networks. Proc. 12th Symp. on Theoretical Aspects of Computer Science (STACS).
22. F. Meyer auf der Heide, B. Vöcking: Static and dynamic data management in networks. Proc. of Euro-Par '97, to appear.
23. R. Ostrovsky, Y. Rabani: Universal O(congestion + dilation + log^(1+ε) n) local control packet switching algorithms. Proc. 29th ACM Symp. on Theory of Computing (STOC), to appear.
24. A. G. Ranade: How to emulate shared memory. Proc. 28th IEEE Symp. on Foundations of Computer Science (FOCS).
25. C.-Z. Xu, B. Monien, R. Lüling, F. C. M. Lau: An Analytical Comparison of Nearest Neighbour Algorithms for Load Balancing in Parallel Computers. Proc. Int. Parallel Processing Symposium (IPPS '95).
26. C.-Z. Xu, S. Tschöke, B. Monien: Performance Evaluation of Load Distribution Strategies in Parallel Branch and Bound Computations. Proc. 7th IEEE Symp. on Parallel and Distributed Processing (SPDP '95), 1995.
