Optimizing Molecular Dynamics
- Angelina Jackson
- 5 years ago
This chapter discusses performance tuning of parallel and distributed molecular dynamics (MD) simulations. Tuning involves both (1) intranode optimization within each node of a parallel computer and (2) internode optimization involving communication.

Intranode Optimization: Memory Access Pattern in Molecular Dynamics

An MD program that uses the linked-list cell method has a random memory access pattern because of excessive indirection. The resulting scattered accesses make poor use of the cache and can significantly degrade MFlops performance.

Fig. 1: Linked-list access to atoms in a cell.

In the program pmd.c, interatomic forces are computed in nested for loops over central and neighbor cells. The atoms in each cell are accessed by following the linked list implemented with the array lscl, and this access pattern does not change during the entire force-computing routine. To take advantage of this fixed access pattern, we can copy the entire coordinate array, r, to another array, r1, rearranging the atoms so that memory access becomes consecutive [1]. We also prepare an array, cell_end, that holds the starting and ending atomic indices in r1 for all the cells: the atoms of cell c occupy r1[cell_end[c]+1] through r1[cell_end[c+1]]. The atoms of cell c are then accessed as:

    for i = cell_end[c]+1 to cell_end[c+1]
        access r1[i]
    endfor

Fig. 2: Modification of array layout for atomic coordinates.
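A minimal C sketch of this reordering follows. It assumes pmd.c-style linked lists in which head[c] holds the first atom of cell c, lscl[i] holds the atom after i, and the sentinel EMPTY (= -1) ends a list. The function names reorder_by_cell and sum_x_in_cell, and the perm array that remembers each atom's original index, are illustrative assumptions rather than part of pmd.c. The perm array is what would let forces computed in r1 order be scattered back to the original atoms.

```c
#include <assert.h>

#define EMPTY (-1)   /* assumed end-of-list sentinel for head[]/lscl[] */

/* Gather coordinates r[0..n-1] into r1 so that the atoms of each cell are
   stored contiguously, in the order of the linked lists head[]/lscl[].
   On return, perm[k] is the original index of the atom stored in r1[k],
   cell_end[0] = -1, and cell_end[c+1] is the index in r1 of the last atom
   of cell c, so cell c occupies r1[cell_end[c]+1 .. cell_end[c+1]].
   cell_end needs room for ncell+1 entries. Returns the number of atoms copied. */
int reorder_by_cell(int n, int ncell, const int *head, const int *lscl,
                    double (*r)[3], double (*r1)[3], int *perm, int *cell_end)
{
    int k = 0;                                   /* next free slot in r1 */
    cell_end[0] = -1;
    for (int c = 0; c < ncell; c++) {
        for (int i = head[c]; i != EMPTY; i = lscl[i]) {
            assert(k < n);                       /* guards against a corrupt list */
            for (int a = 0; a < 3; a++)
                r1[k][a] = r[i][a];
            perm[k++] = i;
        }
        cell_end[c + 1] = k - 1;                 /* an empty cell repeats the previous end */
    }
    return k;
}

/* Example consumer: unit-stride traversal of cell c, as in the loop above. */
double sum_x_in_cell(int c, const int *cell_end, double (*r1)[3])
{
    double s = 0.0;
    for (int i = cell_end[c] + 1; i <= cell_end[c + 1]; i++)
        s += r1[i][0];
    return s;
}
```

Because coordinates change every time step, the copy is repeated before each force evaluation. It costs O(N), which is small next to the pair loops that then read r1 with unit stride.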
Spacefilling (Hilbert-Peano) Curve

The data layout above achieves data locality. The next step is to achieve computation locality. Note that pmd.c loops over pairs of atoms residing in nearest-neighbor cells. The locality of this computation can be improved by ordering the cells so that consecutive cells in the ordering are also close to each other in space. A spacefilling curve can be used for this purpose. Below, we review one type of spacefilling curve, the Hilbert-Peano curve.

Gray code: a sequence of numbers in which every pair of successive numbers has Hamming distance 1. (The Hamming distance is the number of bit positions at which two binary numbers differ.) The k-bit Gray code G(k) is defined recursively:
(1) G(1) is the sequence 0 1.
(2) G(k+1) is constructed from G(k) as follows.
  a. Construct a new sequence by prepending a 0 (on the left) to every member of G(k).
  b. Construct a new sequence by reversing G(k) and then prepending a 1 to every member of the reversed sequence.
  c. G(k+1) is the concatenation of the sequences from steps a and b.
Example:
> Two-bit Gray code: 00 01 11 10
> Three-bit Gray code: 000 001 011 010 110 111 101 100

EMBEDDING A LINE TOPOLOGY INTO A HYPERCUBE

Map processor i of the line topology (of size 2^d) onto the hypercube node given by the i-th entry of the d-bit Gray code. Because successive Gray-code entries differ in exactly one bit, neighboring processors on the line land on directly connected hypercube nodes. 3D example: processors 0, 1, ..., 7 map to nodes 000, 001, 011, 010, 110, 111, 101, 100.

SPACEFILLING CURVE

Spacefilling curve: a mapping from [0,1] onto [0,1]^d, i.e., a one-dimensional curve that fills a d-dimensional cube. It has many applications in graph partitioning, image compression, optimization (e.g., the traveling salesman problem), etc. Partitioning and ordering many points in a d-dimensional space (many such problems are NP-complete) approximately reduce to a one-dimensional sorting problem, whose complexity is O(N log N).

Hilbert curve: a special spacefilling curve based on the Gray sequence. A Hilbert curve in d-dimensional space uses the d-bit Gray code. Note that in 3 dimensions there are 24 possible (start, end) combinations for a Gray sequence: 8 starting nodes, each having 3 possible terminating nodes (the nodes that differ from it in one bit). All of them are used to construct a Hilbert curve.
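The recursive definition of G(k), and the idea of re-orienting a 3-bit seed so that it starts and ends at prescribed corners, can be sketched in C as follows. The construction in gray_seed3 (XOR-translating the standard sequence and cyclically rotating its bits) is one possible way to obtain all 24 seeds. It is assumed here for illustration and is not necessarily how the seed_rotate program works. All function names are hypothetical.

```c
#include <assert.h>

/* k-bit reflected Gray code via the recursive definition in the text:
   G(1) = 0 1;  G(m+1) = (0 prepended to G(m)) followed by
   (1 prepended to reversed G(m)). g must hold 2^k entries. */
void gray_code(int k, unsigned *g)
{
    assert(k >= 1 && k <= 16);
    g[0] = 0;
    g[1] = 1;
    for (int m = 1; m < k; m++) {
        int len = 1 << m;                      /* length of G(m) */
        /* Step a: prepending a 0 leaves g[0..len-1] unchanged.
           Step b: reversed copy with a 1 in bit position m. */
        for (int i = 0; i < len; i++)
            g[len + i] = g[len - 1 - i] | (1u << m);
    }
}

/* Number of bit positions at which a and b differ. */
int hamming(unsigned a, unsigned b)
{
    int d = 0;
    for (unsigned x = a ^ b; x != 0; x >>= 1)
        d += (int)(x & 1u);
    return d;
}

/* 3-bit Gray sequence starting at corner s and ending at corner e, where e
   differs from s in exactly one bit. G(3) = 0 1 3 2 6 7 5 4 ends at 4 (bit 2);
   a cyclic bit rotation moves that final bit to the required position, and
   XOR with s moves the start to s. Both operations preserve Hamming distance,
   so the result is still a Gray sequence. */
void gray_seed3(unsigned s, unsigned e, unsigned seq[8])
{
    unsigned g[8];
    assert(s < 8 && e < 8 && hamming(s, e) == 1);
    gray_code(3, g);
    int b = 0;
    while (((s ^ e) >> b) != 1u)               /* bit in which s and e differ */
        b++;
    int rot = (b + 1) % 3;                     /* rotate left: bit 2 -> bit b */
    for (int i = 0; i < 8; i++) {
        unsigned y = ((g[i] << rot) | (g[i] >> (3 - rot))) & 7u;
        seq[i] = s ^ y;
    }
}
```

For example, gray_seed3(0, 1, seq) produces 0 2 6 4 5 7 3 1. Looping over all start nodes s and their three one-bit neighbors e yields the 24 seeds.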
    $ seed_rotate
    start = 0 end = 1:
    start = 0 end = 2:
    start = 0 end = 4:
    start = 1 end = 0:
    start = 1 end = 3:
    start = 1 end = 5:
    start = 2 end = 0:
    start = 2 end = 3:
    start = 2 end = 6:
    start = 3 end = 1:
    start = 3 end = 2:
    start = 3 end = 7:
    start = 4 end = 0:
    start = 4 end = 5:
    start = 4 end = 6:
    start = 5 end = 1:
    start = 5 end = 4:
    start = 5 end = 7:
    start = 6 end = 2:
    start = 6 end = 4:
    start = 6 end = 7:
    start = 7 end = 3:
    start = 7 end = 5:
    start = 7 end = 6:

The Hilbert curve is obtained as the limit of a recursive procedure: prepare a Gray code as a seed, and recursively replace each of its nodes by a (rotated) Gray seed.

EXAMPLE: 2-DIMENSIONAL HILBERT CURVES

[Figure: examples of 2-dimensional Hilbert curves]
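In two dimensions, the recursive rotate-and-reflect procedure can be folded into a single loop. The loop converts cell coordinates (x, y) on an n × n grid (n a power of two) into a position d along the Hilbert curve. The sketch below uses the widely published bit-manipulation formulation of this conversion rather than the seed tables above. hilbert_sort is a hypothetical helper that shows how cells (or, in the application that follows, cities) would be ordered along the curve using the C library's qsort.

```c
#include <assert.h>
#include <stdlib.h>

/* Rotate/reflect a quadrant so that the sub-curve inside it has the base orientation. */
static void hilbert_rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {                 /* reflect */
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x;                    /* transpose (swap x and y) */
        *x = *y;
        *y = t;
    }
}

/* Position of cell (x, y) along the Hilbert curve filling an n x n grid. */
int xy2d(int n, int x, int y)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(0 <= x && x < n && 0 <= y && y < n);
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);  /* rank of this quadrant along the curve */
        hilbert_rot(n, &x, &y, rx, ry);
    }
    return d;
}

typedef struct { int x, y, key; } Cell;

static int by_key(const void *a, const void *b)
{
    int ka = ((const Cell *)a)->key, kb = ((const Cell *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Reorder cells so that consecutive entries are neighbors along the curve. */
void hilbert_sort(int n, Cell *cells, int count)
{
    for (int i = 0; i < count; i++)
        cells[i].key = xy2d(n, cells[i].x, cells[i].y);
    qsort(cells, (size_t)count, sizeof(Cell), by_key);
}
```

On a 4 × 4 grid, for instance, xy2d visits (0,0), (1,0), (1,1), (0,1) first and ends at (3,0) with d = 15. Each step along d moves to an edge-adjacent cell. An MD code would need a 3D analogue built from the 24 seeds so that consecutive cells in the force loop are spatial neighbors. That extension is not shown here.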
APPLICATION OF HILBERT CURVE: TRAVELLING SALESMAN PROBLEM

Traveling salesman problem: given N cities on a map, find the shortest path that visits all the cities. This is a well-known NP-complete problem: no polynomial-time algorithm is known, and known exact algorithms take time that grows exponentially with N. The Hilbert curve provides a heuristic solution. First, divide a square containing all the cities into 2^m × 2^m cells so that each cell contains at most one city. Second, draw the Hilbert curve that traverses all the cells. Finally, visit the cities in the one-dimensional order in which they appear along the Hilbert curve.

Internode Optimization

Metacomputerized Molecular Dynamics

Metacomputing: using geographically distributed computing resources as a single computing platform.

Metacomputing applications
> Distributed supercomputing: large-scale computation that is beyond the power of a single parallel supercomputer.
> Collaborative computing: collaborative, hybrid computation that integrates distributed, multiple expertise [2].

Metacomputing tools
> MPICH-G2: a Grid-enabled (Globus-based) implementation of MPI. It facilitates multi-protocol communication, cross-platform authentication, etc. in a heterogeneous metacomputing environment [3].
> MPICH-GQ: MPICH-G2 with quality-of-service support [4].
> Grid remote procedure call (GridRPC): hybrid GridRPC (e.g., NinfG) + MPI programs run on a Grid of distributed parallel computers. In such a Grid, the number of processors changes dynamically on demand, resources are allocated adaptively, and tasks are migrated in response to unexpected faults [5].

METACOMPUTERIZED MD OVERLAPPING COMPUTATION AND COMMUNICATION

Using MPICH-G2, parallel MD codes such as pmd.c can be run in a metacomputing environment, e.g., on a virtual machine consisting of hosts at USC and at the Grid Technology Research Center in Japan. We only need to prepare a processor group file that contains the host names at both institutions.
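The following idealized per-step cost model shows why latency hurts the brute-force approach and how much overlapping can recover. It is a hypothetical sketch, not a measurement from the source, and rests on several simplifying assumptions:
- Communication time is n_messages × latency, with bandwidth, contention and load imbalance ignored.
- The brute-force step finishes all communication before any force computation.
- The overlapped step computes inner-cell forces while messages are in flight, so only the boundary cells wait.

The struct and function names are illustrative.

```c
#include <assert.h>

/* Idealized cost of one MD step (hypothetical model; times in arbitrary units). */
typedef struct {
    double latency;     /* latency per message */
    int    n_messages;  /* messages on the critical path, e.g. 6 = 3 directions x 2 */
    double t_inner;     /* force computation for inner cells */
    double t_boundary;  /* force computation for boundary cells */
} StepCost;

/* Brute force: every cached cell must arrive before any force is computed. */
double step_time_blocking(StepCost c)
{
    assert(c.n_messages >= 0 && c.latency >= 0.0);
    return c.n_messages * c.latency + c.t_inner + c.t_boundary;
}

/* Overlapped: inner-cell forces are computed while messages are in flight;
   only the boundary cells wait for the receives to complete. */
double step_time_overlapped(StepCost c)
{
    assert(c.n_messages >= 0 && c.latency >= 0.0);
    double comm = c.n_messages * c.latency;
    double first_phase = comm > c.t_inner ? comm : c.t_inner;
    return first_phase + c.t_boundary;
}
```

As an example, take latency 0.25, four messages, t_inner = 0.5 and t_boundary = 0.25. The model gives 1.75 for the blocking step and 1.25 for the overlapped step. The inner-cell work is completely hidden, but the wide-area latency still dominates. This is why reducing the number of messages also matters.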
The problem with such a brute-force approach is latency. Since the force computations cannot start until all the communications for caching are complete, the larger latency of the wide-area networks
between the U.S. and Japan keeps processors idle most of the time, waiting for messages (see Fig. 3, center).

One possible solution to this latency problem is to use asynchronous messages that overlap computation and communication. To do so, we first classify the linked-list cells into inner and boundary cells (see Fig. 4). Inner cells have no face that coincides with a processor boundary, so the forces on the atoms in an inner cell can be computed without any cached information. Inner-cell computation can thus be overlapped with communication (see Fig. 3, right). Boundary cells, on the other hand, have one or more faces on processor boundaries, and their force computation requires cached information. We therefore need to wait for the asynchronous messages to complete before starting the force computation for boundary cells. The metacomputerized MD algorithm is as follows:

1. Post asynchronous receives for the cells to be cached.
2. Send the atomic coordinates in the boundary cells.
3. Compute forces for atoms in the inner cells.
4. Wait for the asynchronous receives to complete.
5. Compute forces for atoms in the boundary cells.

The actual implementation of this idea is slightly more complex. Since message passing is done in a 3-step loop (x, y and z directions), we need to specify which groups of cells are allowed to compute forces after each step of message passing completes. Specifically, let us define the following 4 groups:
1) inner cells;
2) boundary cells without any y or z processor-boundary faces;
3) the other boundary cells without any z processor-boundary faces; and
4) boundary cells with z processor-boundary faces.

(Question) Modify the above metacomputerized MD algorithm, taking into account the stepwise communication scheme.

Fig. 3: Gantt charts for parallel MD algorithms, where the arrows, thin lines and boxes indicate time progress, messages and computation activity, respectively. (Left) Regular spatial decomposition as in pmd.c on tightly coupled computers. (Center) The same in a metacomputing environment involving computers at USC and the Grid Technology Research Center in Japan. (Right) Metacomputerized MD across USC and Japan.

Fig. 4: Inner and boundary cells in a processor for the linked-list cell method, shown along with cached cells from other processors.

METACOMPUTERIZED MD RENORMALIZED MESSAGES

To reduce the latency, it is desirable to minimize the number of messages. For a metacomputing run involving multiple processors at one geographical site, the latency overhead can be reduced significantly by
composing a large cross-site message instead of sending all the individual processor-to-processor messages across the site boundary (see Fig. 5). Such operations are facilitated by the communicator construct in MPI.

Fig. 5: (Top) Processor-to-processor messages. (Bottom) A renormalized message.

References

1. J. Mellor-Crummey, D. Whalley, and K. Kennedy, Improving memory hierarchy performance for irregular applications using data and computation reorderings, International Journal of Parallel Programming 29, 217 (2001).
2. H. Kikuchi, R.K. Kalia, A. Nakano, P. Vashishta, H. Iyetomi, S. Ogata, T. Kouno, F. Shimojo, K. Tsuruta, and S. Saini, Collaborative simulation Grid: multiscale quantum-mechanical/classical atomistic simulations on distributed PC clusters in the US and Japan, in Proceedings of Supercomputing 2002 (IEEE Computer Society, Los Alamitos, CA, 2002).
3. I. Foster, J. Geisler, W.D. Gropp, N.T. Karonis, E. Lusk, G. Thiruvathukal, and S. Tuecke, Wide-area implementation of the Message Passing Interface, Parallel Computing 24, 1735 (1998). See also the MPICH-G2 (Grid/Globus enabled MPI) homepage.
4. A. Roy, I. Foster, W.D. Gropp, N.T. Karonis, V. Sander, and B. Toonen, MPICH-GQ: quality-of-service for message passing programs, in Proceedings of Supercomputing 2000 (IEEE Computer Society, Los Alamitos, CA, 2000).
5. H. Takemiya, Y. Tanaka, S. Sekiguchi, S. Ogata, R. K. Kalia, A. Nakano, and P. Vashishta, Sustainable adaptive Grid supercomputing: multiscale simulation of semiconductor processing across the Pacific, in Proceedings of Supercomputing 2006 (IEEE Computer Society, Los Alamitos, CA, 2006).
More informationA PARALLEL ALGORITHM FOR THE DEFORMATION AND INTERACTION OF STRUCTURES MODELED WITH LAGRANGE MESHES IN AUTODYN-3D
3 rd International Symposium on Impact Engineering 98, 7-9 December 1998, Singapore A PARALLEL ALGORITHM FOR THE DEFORMATION AND INTERACTION OF STRUCTURES MODELED WITH LAGRANGE MESHES IN AUTODYN-3D M.
More informationInterconnection Networks: Topology. Prof. Natalie Enright Jerger
Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design
More informationMassive Data Algorithmics
In the name of Allah Massive Data Algorithmics An Introduction Overview MADALGO SCALGO Basic Concepts The TerraFlow Project STREAM The TerraStream Project TPIE MADALGO- Introduction Center for MAssive
More informationPreliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-core Processor
Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-core Processor Liu Peng, Guangming Tan, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta, Dongrui Fan and Ninghui
More informationTutorial: Application MPI Task Placement
Tutorial: Application MPI Task Placement Juan Galvez Nikhil Jain Palash Sharma PPL, University of Illinois at Urbana-Champaign Tutorial Outline Why Task Mapping on Blue Waters? When to do mapping? How
More informationInterconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topics Taxonomy Metric Topologies Characteristics Cost Performance 2 Interconnection
More informationChap4: Spatial Storage and Indexing
Chap4: Spatial Storage and Indexing 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Learning Objectives Learning Objectives (LO) LO1: Understand concept of a physical data model
More informationA Lost Cycles Analysis for Performance Prediction using High-Level Synthesis
A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,
More informationL20: Putting it together: Tree Search (Ch. 6)!
Administrative L20: Putting it together: Tree Search (Ch. 6)! November 29, 2011! Next homework, CUDA, MPI (Ch. 3) and Apps (Ch. 6) - Goal is to prepare you for final - We ll discuss it in class on Thursday
More information10th August Part One: Introduction to Parallel Computing
Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer
More informationLeveraging Flash in HPC Systems
Leveraging Flash in HPC Systems IEEE MSST June 3, 2015 This work was performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344. Lawrence Livermore National Security,
More informationSquid: Enabling search in DHT-based systems
J. Parallel Distrib. Comput. 68 (2008) 962 975 Contents lists available at ScienceDirect J. Parallel Distrib. Comput. journal homepage: www.elsevier.com/locate/jpdc Squid: Enabling search in DHT-based
More informationMeshlization of Irregular Grid Resource Topologies by Heuristic Square-Packing Methods
Meshlization of Irregular Grid Resource Topologies by Heuristic Square-Packing Methods Uei-Ren Chen 1, Chin-Chi Wu 2, and Woei Lin 3 1 Department of Electronic Engineering, Hsiuping Institute of Technology
More informationAnalysis of Basic Data Reordering Techniques
Analysis of Basic Data Reordering Techniques Tan Apaydin 1, Ali Şaman Tosun 2, and Hakan Ferhatosmanoglu 1 1 The Ohio State University, Computer Science and Engineering apaydin,hakan@cse.ohio-state.edu
More informationAn Introduction to Spatial Databases
An Introduction to Spatial Databases R. H. Guting VLDB Journal v3, n4, October 1994 Speaker: Giovanni Conforti Outline: a rather old (but quite complete) survey on Spatial DBMS Introduction & definition
More informationCS 614 COMPUTER ARCHITECTURE II FALL 2005
CS 614 COMPUTER ARCHITECTURE II FALL 2005 DUE : November 23, 2005 HOMEWORK IV READ : i) Related portions of Chapters : 3, 10, 15, 17 and 18 of the Sima book and ii) Chapter 8 of the Hennessy book. ASSIGNMENT:
More information