Optimizing Molecular Dynamics

This chapter discusses performance tuning of parallel and distributed molecular dynamics (MD) simulations, which involves both (1) intranode optimization within each node of a parallel computer and (2) internode optimization involving communications.

Intranode Optimization: Memory Access Pattern in Molecular Dynamics

An MD program that uses the linked-list cell method is characterized by a random memory access pattern caused by excessive indirection. This can in turn significantly degrade floating-point (MFlops) performance.

Fig. 1: Linked-list access to atoms in a cell.

In the program pmd.c, interatomic forces are computed in nested for loops over central and neighbor cells. Atoms in each cell are accessed by following the linked list implemented with the array lscl, and this access pattern is unchanged for the entire force-computing routine. To take advantage of this fixed memory access pattern, we can copy the entire coordinate array, r, to another array, r1, rearranging the atoms so that memory access becomes consecutive [1]. We can also prepare an array, cell_end, which holds the starting and ending atomic indices in r1 for all the cells. The atoms in cell c are then accessed as:

    for i = cell_end[c]+1 to cell_end[c+1]
        access r1[i]
    endfor

Fig. 2: Modification of array layout for atomic coordinates.
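The reordering described above can be sketched in Python as follows. This is only an illustration, not the pmd.c implementation: the array name head (first atom of each cell), the sentinel EMPTY, the 0-based indexing, and the convention cell_end[0] = -1 are all assumptions made here so that the loop bounds match the pseudocode.

```python
EMPTY = -1  # assumed sentinel marking the end of a cell's linked list


def reorder_by_cell(r, head, lscl):
    """Copy coordinates r into r1 so that atoms of each cell are contiguous.

    head[c] is the first atom in cell c (assumed name); lscl[i] is the atom
    that follows atom i in the same cell. Returns (r1, cell_end), where the
    atoms of cell c occupy r1[cell_end[c]+1 .. cell_end[c+1]] inclusive.
    """
    r1 = []
    cell_end = [-1]  # cell_end[0] = -1 so that cell 0 starts at index 0
    for c in range(len(head)):
        i = head[c]
        while i != EMPTY:  # follow the linked list once, at copy time
            r1.append(r[i])
            i = lscl[i]
        cell_end.append(len(r1) - 1)
    return r1, cell_end


# Five atoms in three cells: cell 0 holds atoms 2 -> 0, cell 1 holds atom 3,
# and cell 2 holds atoms 4 -> 1.
r = [(0.1,), (2.5,), (0.3,), (1.2,), (2.1,)]
head = [2, 3, 4]
lscl = [EMPTY, EMPTY, 0, EMPTY, 1]
r1, cell_end = reorder_by_cell(r, head, lscl)

# Consecutive access to the atoms of cell c, as in the pseudocode above.
c = 2
atoms_in_c = [r1[i] for i in range(cell_end[c] + 1, cell_end[c + 1] + 1)]
```

The indirection through lscl is paid once during the copy; the force loops then read r1 with unit stride.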

Spacefilling (Hilbert-Peano) Curve

The above data layout achieves data locality. The next step is to achieve computation locality. Note that the pmd.c program loops over the pairs of atoms residing in nearest-neighbor cells. Locality of this computation can be achieved by ordering the cells such that consecutive cells are spatially close to each other. A spacefilling curve may be used for this purpose. Below, we review one type of spacefilling curve, called the Hilbert-Peano curve.

Gray code: a sequence of binary numbers such that successive numbers have Hamming distance 1. (The Hamming distance is the number of bit positions at which two binary numbers differ.)

The k-bit Gray code G(k) is defined recursively.
(1) G(1) is the sequence: 0 1.
(2) G(k+1) is constructed from G(k) as follows.
    a. Construct a new sequence by appending a 0 to the left of all members of G(k).
    b. Construct a new sequence by reversing G(k) and then appending a 1 to the left of all members of the sequence.
    c. G(k+1) is the concatenation of the sequences defined in steps a and b.

Example:
> Two-bit Gray code: 00 01 11 10
> Three-bit Gray code: 000 001 011 010 110 111 101 100

EMBEDDING A LINE TOPOLOGY INTO A HYPERCUBE

Map processor i of the line topology (size 2^d) onto the hypercube node labeled by the i-th entry of the d-bit Gray code. Neighboring processors on the line then differ in exactly one bit and are therefore neighbors in the d-dimensional hypercube.

(Figure: 3D example.)

SPACEFILLING CURVE

Spacefilling curve: A mapping [0,1] → [0,1]^d, i.e., a one-dimensional curve that fills a d-dimensional cube. It has many applications in graph partitioning, image compression, optimization (traveling salesman problem), etc. Problems of partitioning and ordering many points in a d-dimensional space (many of which are NP-complete) reduce approximately to a one-dimensional sorting problem, whose complexity is O(N log N).

Hilbert curve: A special spacefilling curve, which is based on the Gray sequence. The Hilbert curve in d-dimensional space uses the d-dimensional Gray code. Note that in 3 dimensions there are 24 possible Gray sequences: 8 starting nodes, each having 3 possible terminating nodes. All of them are used to construct a Hilbert curve.
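The recursive Gray-code construction and the line-to-hypercube embedding described above can be sketched as follows. The function names gray_code and hamming_distance and the dictionary embedding are illustrative choices made here, not names from the notes.

```python
def gray_code(k):
    """Return the k-bit Gray code G(k) as a list of bit strings."""
    if k == 1:
        return ["0", "1"]
    prev = gray_code(k - 1)
    # Step a: prefix G(k-1) with 0. Step b: prefix the reversed G(k-1) with 1.
    # Step c: concatenate the two sequences.
    return ["0" + g for g in prev] + ["1" + g for g in reversed(prev)]


def hamming_distance(a, b):
    """Number of bit positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Embed a line of 2**d processors into a d-dimensional hypercube:
# processor i is placed on the node labeled by the i-th Gray-code entry.
d = 3
g = gray_code(d)
embedding = {i: g[i] for i in range(2 ** d)}

# Neighbors on the line land on hypercube nodes that differ in exactly one bit.
line_neighbors_adjacent = all(
    hamming_distance(embedding[i], embedding[i + 1]) == 1
    for i in range(2 ** d - 1)
)
```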

$ seed_rotate
start = 0 end = 1:
start = 0 end = 2:
start = 0 end = 4:
start = 1 end = 0:
start = 1 end = 3:
start = 1 end = 5:
start = 2 end = 0:
start = 2 end = 3:
start = 2 end = 6:
start = 3 end = 1:
start = 3 end = 2:
start = 3 end = 7:
start = 4 end = 0:
start = 4 end = 5:
start = 4 end = 6:
start = 5 end = 1:
start = 5 end = 4:
start = 5 end = 7:
start = 6 end = 2:
start = 6 end = 4:
start = 6 end = 7:
start = 7 end = 3:
start = 7 end = 5:
start = 7 end = 6:

The Hilbert curve is obtained as the limit of a recursive procedure: prepare a Gray code as a seed, and recursively replace its nodes by (rotated) Gray seeds.

EXAMPLE: 2-DIMENSIONAL HILBERT CURVES
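As a minimal sketch of the two-dimensional case, the code below computes the position of a grid cell along a Hilbert curve and sorts the cells by that position, which is how ordering points in space reduces to one-dimensional sorting. It uses the widely known iterative bit-manipulation formulation rather than the seed_rotate program above; the function name hilbert_index and the requirement that n be a power of two are assumptions of this sketch.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # The quadrant order 0, 1, 2, 3 visits (rx, ry) = (0,0), (0,1), (1,1), (1,0),
        # i.e., the two-bit Gray code.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or reflect so the sub-curve in this quadrant has the seed orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


n = 4
cells = [(x, y) for x in range(n) for y in range(n)]
# Sorting by the one-dimensional Hilbert index orders the cells along the curve.
order = sorted(cells, key=lambda c: hilbert_index(n, *c))

# Consecutive cells on the curve are always nearest neighbors on the grid.
steps = [abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(order, order[1:])]
```

Every entry of steps is 1, which is the locality property that makes this ordering attractive for traversing linked-list cells.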

APPLICATION OF HILBERT CURVE: TRAVELING SALESMAN PROBLEM

Traveling salesman problem: Given N cities on a map, find the shortest path that visits all the cities. This is known to be an NP-complete problem (in its decision form): no polynomial-time algorithm is known, and the cost of testing all possible tours grows at least exponentially with N.

A heuristic solution to the traveling salesman problem is obtained by using the Hilbert curve. First, divide a square containing all the cities into 2^m × 2^m cells so that each cell contains at most one city. Second, draw the Hilbert curve, which traverses all the cells. Finally, visit the cities according to their one-dimensional sequence on the Hilbert curve.

Internode Optimization: Metacomputerized Molecular Dynamics

Metacomputing: Using geographically distributed computing resources as a single computing platform.

Metacomputing applications
> Distributed supercomputing: Large-scale computation that is beyond the power of a single parallel supercomputer.
> Collaborative computing: Collaborative, hybrid computation that integrates distributed, multiple expertise [2].

Metacomputing tools
> MPICH-G2: Global version of MPI. It facilitates multi-protocol communication, cross-platform authentication, etc. in a heterogeneous metacomputing environment [3].
> MPICH-GQ: MPICH-G2 with quality-of-service support [4].
> Grid remote procedure call (GridRPC): Hybrid GridRPC (e.g., Ninf-G) + MPI programs run on a Grid of distributed parallel computers, in which the number of processors changes dynamically on demand, and resources are allocated and tasks are migrated adaptively in response to unexpected faults [5].

METACOMPUTERIZED MD: OVERLAPPING COMPUTATION AND COMMUNICATION

Using MPICH-G2, parallel MD codes such as pmd.c can be run in a metacomputing environment, e.g., by constructing a virtual machine consisting of hosts at USC and at the Grid Technology Research Center in Japan. We only need to prepare a processor group file that contains host names at both institutions.
The problem with such a brute-force approach is latency. Since the force computations cannot start until all the communications for caching are complete, the larger latency associated with wide area networks between the U.S. and Japan will cause processors to be idle most of the time, waiting for messages (see Fig. 3, center).

One possible solution to this latency problem is to use asynchronous messages to overlap computation and communication. To do so, we first classify the linked-list cells into inner and boundary cells (see Fig. 4). Inner cells do not have any face that coincides with a processor boundary, and therefore the forces on the atoms in an inner cell can be computed without any cached information. Inner-cell computation can thus be overlapped with communication (see Fig. 3, right). Boundary cells, on the other hand, have processor boundaries as one or more of their faces, and their force computation requires cached information. Therefore, we need to wait for the asynchronous messages to complete before we start the force computation for boundary cells.

The following is the metacomputerized MD algorithm:
1. asynchronous receive of cells to be cached
2. send atomic coordinates in the boundary cells
3. compute forces for atoms in the inner cells
4. wait for the completion of the asynchronous receive
5. compute forces for atoms in the boundary cells

The actual implementation of the above idea is slightly more complex. Since the message passing is done in a 3-step loop (x, y and z directions), we need to specify which groups of cells are allowed to compute forces after each step of message passing is completed. Specifically, let us define the following 4 groups: (1) inner cells; (2) boundary cells without any y or z processor-boundary faces; (3) boundary cells without any z processor-boundary faces; and (4) boundary cells with z processor-boundary faces.

(Question) Modify the above metacomputerized MD algorithm, taking account of the stepwise communication scheme.

Fig. 3: Gantt charts for parallel MD algorithms, where the arrows, thin lines and boxes indicate time progress, messages and computation activity, respectively. (Left) Regular spatial decomposition as in pmd.c on tightly coupled computers. (Center) The same in a metacomputing environment involving computers at USC and the Grid Technology Research Center in Japan. (Right) Metacomputerized MD at USC-Japan.

Fig. 4: Inner and boundary cells in a processor for the linked-list cell method are shown along with cached cells from other processors.

METACOMPUTERIZED MD: RENORMALIZED MESSAGES

To reduce the latency, it is desirable to minimize the number of messages. For metacomputing involving multiple processors at one geographical site, latency can be reduced significantly by composing a single large cross-site message instead of sending all processor-to-processor messages across the site boundary (see Fig. 5). Such operations are facilitated by the communicator construct in MPI.

Fig. 5: (Top) Processor-to-processor messages. (Bottom) A renormalized message.

References

1. J. Mellor-Crummey, D. Whalley, and K. Kennedy, Improving memory hierarchy performance for irregular applications using data and computation reorderings, International Journal of Parallel Programming 29, 217 (2001).
2. H. Kikuchi, R.K. Kalia, A. Nakano, P. Vashishta, H. Iyetomi, S. Ogata, T. Kouno, F. Shimojo, K. Tsuruta, and S. Saini, Collaborative simulation Grid: multiscale quantum-mechanical/classical atomistic simulations on distributed PC clusters in the US and Japan, in Proceedings of Supercomputing 2002 (IEEE Computer Society, Los Alamitos, CA, 2002).
3. I. Foster, J. Geisler, W.D. Gropp, N.T. Karonis, E. Lusk, G. Thiruvathukal, and S. Tuecke, Wide-area implementation of the Message Passing Interface, Parallel Computing 24, 1735 (1998); see also the MPICH-G2 (Grid/Globus enabled MPI) homepage.
4. A. Roy, I. Foster, W.D. Gropp, N.T. Karonis, V. Sander, and B. Toonen, MPICH-GQ: quality-of-service for message passing programs, in Proceedings of Supercomputing 2000 (IEEE Computer Society, Los Alamitos, CA, 2000).
5. H. Takemiya, Y. Tanaka, S. Sekiguchi, S. Ogata, R.K. Kalia, A. Nakano, and P. Vashishta, Sustainable adaptive Grid supercomputing: multiscale simulation of semiconductor processing across the Pacific, in Proceedings of Supercomputing 2006 (IEEE Computer Society, Los Alamitos, CA, 2006).


More information

ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET

ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET zk0 ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET COMPUTER SCIENCE DEPARTMENT AND CENTER FOR AUTOMATION RESEARCH AND INSTITUTE FOR ADVANCED COMPUTER STUDIES UNIVERSITY

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science and Center of Simulation of Advanced Rockets University of Illinois at Urbana-Champaign

More information

Picture Maze Generation by Repeated Contour Connection and Graph Structure of Maze

Picture Maze Generation by Repeated Contour Connection and Graph Structure of Maze Computer Science and Engineering 2013, 3(3): 76-83 DOI: 10.5923/j.computer.20130303.04 Picture Maze Generation by Repeated Contour Connection and Graph Structure of Maze Tomio Kurokawa Department of Information

More information

Evaluating the Performance of Skeleton-Based High Level Parallel Programs

Evaluating the Performance of Skeleton-Based High Level Parallel Programs Evaluating the Performance of Skeleton-Based High Level Parallel Programs Anne Benoit, Murray Cole, Stephen Gilmore, and Jane Hillston School of Informatics, The University of Edinburgh, James Clerk Maxwell

More information

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Introduction to Indexing R-trees. Hong Kong University of Science and Technology Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records

More information

Wide-area Cluster System

Wide-area Cluster System Performance Evaluation of a Firewall-compliant Globus-based Wide-area Cluster System Yoshio Tanaka 3, Mitsuhisa Sato Real World Computing Partnership Mitsui bldg. 14F, 1-6-1 Takezono Tsukuba Ibaraki 305-0032,

More information

Classification and Generation of 3D Discrete Curves

Classification and Generation of 3D Discrete Curves Applied Mathematical Sciences, Vol. 1, 2007, no. 57, 2805-2825 Classification and Generation of 3D Discrete Curves Ernesto Bribiesca Departamento de Ciencias de la Computación Instituto de Investigaciones

More information

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Fabiana Leibovich, Laura De Giusti, and Marcelo Naiouf Instituto de Investigación en Informática LIDI (III-LIDI),

More information

The Icosahedral Nonhydrostatic (ICON) Model

The Icosahedral Nonhydrostatic (ICON) Model The Icosahedral Nonhydrostatic (ICON) Model Scalability on Massively Parallel Computer Architectures Florian Prill, DWD + the ICON team 15th ECMWF Workshop on HPC in Meteorology October 2, 2012 ICON =

More information

A Distributed Media Service System Based on Globus Data-Management Technologies1

A Distributed Media Service System Based on Globus Data-Management Technologies1 A Distributed Media Service System Based on Globus Data-Management Technologies1 Xiang Yu, Shoubao Yang, and Yu Hong Dept. of Computer Science, University of Science and Technology of China, Hefei 230026,

More information

Ray Tracing Acceleration Data Structures

Ray Tracing Acceleration Data Structures Ray Tracing Acceleration Data Structures Sumair Ahmed October 29, 2009 Ray Tracing is very time-consuming because of the ray-object intersection calculations. With the brute force method, each ray has

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

A Graph Algorithmic Framework for the Assembly of Shredded Documents. Fabian Richter, Christian X. Ries, Rainer Lienhart. Report August 2011

A Graph Algorithmic Framework for the Assembly of Shredded Documents. Fabian Richter, Christian X. Ries, Rainer Lienhart. Report August 2011 Universität Augsburg A Graph Algorithmic Framework for the Assembly of Shredded Documents Fabian Richter, Christian X. Ries, Rainer Lienhart Report 2011-05 August 2011 Institut für Informatik D-86135 Augsburg

More information

Multidimensional Data and Modelling

Multidimensional Data and Modelling Multidimensional Data and Modelling 1 Problems of multidimensional data structures l multidimensional (md-data or spatial) data and their implementation of operations between objects (spatial data practically

More information
