Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications


1 Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications. L. Zhang, M. Parashar, E. Gallicchio, R. Levy; TASSL & BIOMAPS, Rutgers University; ICPP '06, Columbus, OH, Aug. 16, 2006

2 Outline: Introduction; Problem Statement / Parallel Replica Exchange; Salsa: Scalable Asynchronous Replica Exchange; Experimental Evaluation; Related Work; Conclusion; Future Work

3 Motivation: Sequencing of the human genome and advances in structural genomics are producing an explosion in the number of available high-resolution protein structures. Large-scale (parallel) molecular simulations of protein structural changes and of drug binding to proteins can therefore have significant impact. Such simulations depend on efficient algorithms for searching the rough energy landscapes that govern protein folding and drug binding.

4 The Replica Exchange Algorithm: A powerful sampling algorithm that preserves canonical distributions. It allows efficient crossing of the high energy barriers that separate thermodynamically stable states, and can significantly reduce sampling times compared with formulations based on constant temperature. Algorithm overview: several copies, or replicas, of the system are simulated in parallel at different temperatures using walkers. Walkers occasionally swap temperatures, which lets them bypass enthalpic barriers by moving to a high temperature. Exchanges are governed by a probability condition that ensures detailed balance. Replica exchange can significantly impact structural biology and drug design, for example structure-based drug design and binding affinity optimization, and the study of the molecular basis of human diseases associated with protein misfolding.
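The slide does not spell out the probability condition. In the standard formulation (assumed here, not taken from the slides), a proposed swap between walker i at temperature T_i with potential energy E_i and walker j at T_j with E_j is accepted with the Metropolis probability below, which satisfies detailed balance for the joint canonical distribution:

```latex
\beta_k = \frac{1}{k_B T_k}, \qquad
P_{\mathrm{acc}}(i \leftrightarrow j) = \min\left\{ 1,\ \exp\!\left[ (\beta_i - \beta_j)(E_i - E_j) \right] \right\}
```

For example, if T_i < T_j (so beta_i > beta_j) and E_i > E_j, the exponent is positive and the swap, which moves the lower-energy configuration to the lower temperature, is always accepted.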

5 Parallel Replica Exchange: The general formulation requires dynamic and complex coordination and communication patterns between the walkers. Interactions are pair-wise and asynchronous, and depend on the current state (temperature, energy) of the replicas, which changes dynamically. Implementing this with commonly used parallel programming frameworks is challenging: message-passing frameworks such as MPI require matching sends and receives to be explicitly defined for each interaction. As a result, implementations use restrictive formulations with synchronous exchanges, and exchanges between neighboring temperatures only.
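For contrast, the restrictive scheme described above can be sketched as a hypothetical `synchronous_sweep`: at a global barrier, swaps are attempted only between adjacent slots of a sorted temperature ladder, alternating even and odd pairs. This is a common textbook formulation, not code from the Salsa work; the function names and the kcal/(mol K) unit for the Boltzmann constant are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def accept_swap(t_i, e_i, t_j, e_j, rng):
    """Standard Metropolis test for swapping temperatures t_i and t_j."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return delta >= 0 or rng.random() < math.exp(delta)


def synchronous_sweep(temps, owner, energies, cycle, rng):
    """One globally synchronized exchange phase on a sorted temperature ladder.

    temps[k]:    k-th lowest temperature (the ladder itself never changes)
    owner[k]:    id of the replica currently simulated at temps[k]
    energies[k]: potential energy of that replica
    Even cycles try slot pairs (0,1), (2,3), ...; odd cycles try (1,2), (3,4), ...
    Every replica must reach this point before any pair is tested.
    """
    swapped = []
    for k in range(cycle % 2, len(temps) - 1, 2):
        if accept_swap(temps[k], energies[k], temps[k + 1], energies[k + 1], rng):
            # Exchanging temperatures == exchanging which replica owns each slot.
            owner[k], owner[k + 1] = owner[k + 1], owner[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            swapped.append((k, k + 1))
    return swapped
```

Every walker idles at the barrier until the slowest one arrives, and a replica can only move one rung per phase, which is exactly the scalability and mixing limitation Salsa targets.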

6 Salsa: A Framework for Scalable Asynchronous Replica Exchange. Provides the abstraction of a semantically specialized virtual shared space. Builds on a scalable communication and interaction substrate based on the tuple-space model, which supports code coupling, parallel data redistribution, multiblock coupling, and asynchronous, decoupled interactions.

7 Salsa Overview. Architecture: the Directory Layer presents the shared temperature space abstraction and provides a rendezvous substrate for exchanges; the Communication Layer supports high-throughput, low-latency peer-to-peer communication. Implementation: a C library using multithreading, with a Salsa daemon on each processor and a customized communication layer built on sockets. Salsa complements other programming systems such as MPI, PVM and OpenMP.

8 Salsa Programming Interface
Operation | Description
init(gbl-temp-range) | Initialize the Seine-Salsa shared space.
post(exch-temp-lower-bound, exch-temp-upper-bound) | Post a temperature range of exchange interest to the shared space.
get(?temp, engy) | Get the exchanged temperature from the space. This is a blocking call: the calling process blocks until a matching temperature is available. The retrieved temperature is removed from the space.
getp(?temp, engy) | Get the exchanged temperature from the space. This is a non-blocking call: the calling process continues if no matching temperature is available. The retrieved temperature is removed from the space.
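The blocking versus non-blocking, destructive-read semantics of get and getp follow the classic tuple-space model. The toy class below illustrates only those semantics in a single process; the class name, the hypothetical `put` helper that deposits a (temperature, energy) entry, and matching by a closed temperature interval are assumptions, not the Salsa C API.

```python
import threading
from typing import List, Optional, Tuple

Entry = Tuple[float, float]  # (temperature, energy)


class TemperatureSpace:
    """Toy single-process model of a shared temperature space (hypothetical)."""

    def __init__(self, t_min: float, t_max: float) -> None:
        # Analogue of init(gbl-temp-range): fix the global temperature range.
        self.t_min, self.t_max = t_min, t_max
        self._entries: List[Entry] = []
        self._cond = threading.Condition()

    def put(self, temp: float, energy: float) -> None:
        """Hypothetical helper: deposit an entry and wake blocked readers."""
        if not self.t_min <= temp <= self.t_max:
            raise ValueError("temperature outside the global range")
        with self._cond:
            self._entries.append((temp, energy))
            self._cond.notify_all()

    def _find(self, lo: float, hi: float) -> Optional[int]:
        # Index of the first entry whose temperature lies in [lo, hi].
        for idx, (temp, _) in enumerate(self._entries):
            if lo <= temp <= hi:
                return idx
        return None

    def getp(self, lo: float, hi: float) -> Optional[Entry]:
        """Non-blocking, destructive read: a matching entry or None."""
        with self._cond:
            idx = self._find(lo, hi)
            return None if idx is None else self._entries.pop(idx)

    def get(self, lo: float, hi: float, timeout: Optional[float] = None) -> Optional[Entry]:
        """Blocking, destructive read; returns None only if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(lo, hi) is not None, timeout):
                return None
            idx = self._find(lo, hi)
            assert idx is not None  # guaranteed: we still hold the lock
            return self._entries.pop(idx)
```

In the real system the space is partitioned across the per-processor Salsa daemons rather than held in one locked list.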

9 Salsa-based Replica Exchange. Integrated within the IMPACT molecular mechanics program. Applications include the binding of ligands to the cytochrome P450 class of enzymes, which are responsible for cellular detoxification and drug metabolism, and the misfolding of the naturally occurring and mutated forms of the protein synuclein associated with Parkinson's disease. Salsa scalably supports general (non-nearest-neighbor) temperature exchanges, ensuring proper mixing of temperatures across the walkers.
Pseudo-code: (1) post the temperature range of interest; (2) negotiate the exchange; (3) perform the exchange.

      if (seineinitflag .eq. 0) then
         call init_salsa(global_temperature_lowbound, global_temperature_upperbound)
         seineinitflag = 1
         timestamp = 0
      else
         timestamp = timestamp + 1
      endif
      ! post an exchange interest only every exchange_rate cycles
      if (mod(timestamp, exchange_rate) .eq. 0) then
         call post(tempt(nspec+1) - GUESSRANGE, tempt(nspec+1) + GUESSRANGE)
      endif
      call getp(newtemp, epot, accepted)

10 Salsa Operation: walkers post their temperature range of interest to the Salsa shared space using post. A request is routed to all Salsa daemons whose index ranges overlap with the posted range. On receiving a remote post request, a daemon first checks its storage for potential exchange partners. If a candidate exists (say walker2), the requesting walker (say walker1) is notified; otherwise, the incoming request is stored.
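The routing and matching steps above can be sketched as follows. The slides do not define what makes a stored post a "potential partner"; this sketch assumes mutual compatibility (each walker's current temperature lies in the other's posted range). `PostRequest`, `Daemon`, `compatible` and `route` are hypothetical names, and each daemon is assumed to own a fixed slice of the global temperature range.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostRequest:
    walker: str
    temp: float   # walker's current temperature
    lo: float     # posted range of interest
    hi: float


@dataclass
class Daemon:
    lo: float     # this daemon's slice of the global temperature range
    hi: float
    stored: List[PostRequest] = field(default_factory=list)

    def handle_post(self, req: PostRequest) -> List[str]:
        """Check stored posts for partners; store req only if none is found."""
        found = [p.walker for p in self.stored
                 if p.walker != req.walker and compatible(p, req)]
        if not found:
            self.stored.append(req)
        return found


def compatible(a: PostRequest, b: PostRequest) -> bool:
    # Assumption: each walker's temperature must lie in the other's range.
    return b.lo <= a.temp <= b.hi and a.lo <= b.temp <= a.hi


def route(daemons: List[Daemon], req: PostRequest) -> List[str]:
    """Deliver req to every daemon whose slice overlaps [req.lo, req.hi]."""
    candidates: List[str] = []
    for d in daemons:
        if d.lo <= req.hi and req.lo <= d.hi:
            for w in d.handle_post(req):
                if w not in candidates:  # same partner may be found by several daemons
                    candidates.append(w)
    return candidates
```

For example, with daemons owning 300-400 K, 400-500 K and 500-600 K, a post for 380-460 K reaches only the first two daemons; a wider posted range reaches more daemons.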

11 Salsa Operation: walkers negotiate the exchange. A walker selects an exchange partner from one or more potential candidates. Partners must mutually agree to exchange data using a two-way handshake protocol. A walker can be in one of three states: free: the walker is available for an exchange (a walker can exchange only if it is free); onhold: the walker has already agreed to exchange with another walker, but the exchange has not yet occurred; finished: the walker has already finished an exchange with another walker, and its posted interest to exchange is no longer valid.
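The three walker states can be modeled as a small state machine. The state names come from the slide; the transition table is an assumption (in particular, that an abandoned exchange returns a walker from onhold to free, and that a new exchange cycle resets finished to free), and `Walker.move_to` is a hypothetical method.

```python
from enum import Enum


class WalkerState(Enum):
    FREE = "free"
    ONHOLD = "onhold"
    FINISHED = "finished"


# Assumed legal transitions: FREE -> ONHOLD on agreement; ONHOLD -> FINISHED
# after the exchange, or back to FREE if it is abandoned; FINISHED -> FREE
# when the next exchange cycle begins.
TRANSITIONS = {
    WalkerState.FREE: {WalkerState.ONHOLD},
    WalkerState.ONHOLD: {WalkerState.FINISHED, WalkerState.FREE},
    WalkerState.FINISHED: {WalkerState.FREE},
}


class Walker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = WalkerState.FREE

    def can_exchange(self) -> bool:
        # Only a free walker may accept a new exchange partner.
        return self.state is WalkerState.FREE

    def move_to(self, new_state: WalkerState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
```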

12 Salsa Operation: walkers negotiate the exchange (handshake, contd.). Walker1 contacts walker2 with its desire to exchange. Walker2 checks its local state (free, onhold, or finished). If walker2 is free, it responds positively to walker1; the two walkers confirm their intent to exchange data with each other and change their state to onhold. If walker2 responds negatively, walker1 attempts to negotiate with the next walker in its list of candidates. If walker1 cannot find an exchange partner in its list of candidates, it gives up and continues the simulation with its current data until the next exchange cycle.
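The handshake above reduces to a loop over the candidate list. This is a sequential sketch: `respond` and `negotiate` are hypothetical names, and a real distributed implementation would need each walker's check-and-set of its state to be atomic (for example, handled by one daemon thread) because a walker can be contacted while it is itself negotiating.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    FREE = "free"
    ONHOLD = "onhold"
    FINISHED = "finished"


class Walker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.FREE
        self.partner: Optional[str] = None

    def respond(self, requester: "Walker") -> bool:
        """Contacted walker checks its local state; agrees only if free."""
        if self.state is not State.FREE:
            return False
        self.state = State.ONHOLD
        self.partner = requester.name
        return True


def negotiate(requester: Walker, candidates: List[Walker]) -> Optional[Walker]:
    """Try candidates in order; return the agreed partner, or None to give up."""
    if requester.state is not State.FREE:
        return None
    for cand in candidates:
        if cand.respond(requester):
            # Positive reply: the requester confirms and also goes on hold.
            requester.state = State.ONHOLD
            requester.partner = cand.name
            return cand
    # No partner this cycle: keep simulating with current data.
    return None
```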

13 Salsa Operation: the exchange is performed using the getp operator. Walker1 sends its current data (e.g. temperature and energy) to its potential partner, walker2. Walker2 determines whether they can exchange based on the data it receives and its own data. This step is necessary because exchanges occur asynchronously and in parallel with the computation, so a walker's data (i.e., its energy) may have changed since it posted its exchange interest. If walker2 decides to proceed with the exchange, it notifies walker1 and sends its current local data to complete the exchange. Exchanges are between a pair of walkers, and multiple exchanges between different pairs of walkers can proceed in parallel.
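The final step can be sketched as below. The slides do not give the acceptance test walker2 applies; this sketch assumes the standard Metropolis swap criterion, evaluated on the energies at exchange time rather than at post time. `Replica`, `metropolis_accept` and `perform_exchange` are hypothetical names, and the kcal/(mol K) Boltzmann constant is an assumed unit choice.

```python
import math
import random
from dataclasses import dataclass

K_B = 0.0019872041  # kcal/(mol*K); unit choice is an assumption


@dataclass
class Replica:
    temp: float
    energy: float  # current potential energy (may have changed since the post)


def metropolis_accept(t1: float, e1: float, t2: float, e2: float, rng: random.Random) -> bool:
    delta = (1.0 / (K_B * t1) - 1.0 / (K_B * t2)) * (e1 - e2)
    return delta >= 0 or rng.random() < math.exp(delta)


def perform_exchange(walker1: Replica, walker2: Replica, rng: random.Random) -> bool:
    # Step 1: walker1 sends its current temperature and energy to walker2.
    msg = (walker1.temp, walker1.energy)
    # Step 2: walker2 re-tests with the received data and its *current* data,
    # since either energy may have changed since the range was posted.
    if not metropolis_accept(msg[0], msg[1], walker2.temp, walker2.energy, rng):
        return False
    # Step 3: walker2 replies with its temperature; the two swap temperatures
    # while each keeps its own configuration (and hence its energy).
    walker1.temp, walker2.temp = walker2.temp, msg[0]
    return True
```

Only the two partners touch their own data, which is why independent pairs can run this step concurrently.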

14 Example: Salsa Operation

15 Experimental Evaluation. Platform: Linux cluster (Intel Pentium 1.7 GHz, 512 MB RAM, Linux, 100 Mbps interconnect). Simulations: alanine tripeptide molecule using Hybrid Monte Carlo (HMC); temperatures exponentially distributed within the range K; 10 ns simulation time, i.e. 250,000 HMC cycles, each consisting of 10 steps of 4 fs; exchanges attempted every 25 steps. Experiments: Salsa vs. MPI-based replica exchange, comparing the number of cross-walks and simulation time; effect of the exchange temperature range.
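"Exponentially distributed" temperatures usually means a geometric ladder with a constant ratio between neighbors. The sketch below builds one and checks the stated simulation length, reading each HMC cycle as ten 4 fs steps. The 300 K and 600 K bounds used in the example are hypothetical, since the actual range is not recoverable from the slide.

```python
from typing import List


def geometric_ladder(t_min: float, t_max: float, n: int) -> List[float]:
    """n temperatures with a constant ratio between neighbours (exponential spacing)."""
    if n < 2:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]


# Total simulated time implied by the slide: 250,000 x 10 x 4 fs.
HMC_CYCLES = 250_000
STEPS_PER_CYCLE = 10
TIMESTEP_FS = 4
TOTAL_NS = HMC_CYCLES * STEPS_PER_CYCLE * TIMESTEP_FS / 1_000_000  # 1 ns = 10^6 fs
```

Usage: `geometric_ladder(300.0, 600.0, 5)` gives five temperatures whose successive ratios all equal 2 ** 0.25, and `TOTAL_NS` evaluates to 10.0, matching the stated 10 ns.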

16 Experimental Evaluation: Number of Cross-Walks. A cross-walk is the event in which a walker that starts in the low temperature range reaches the upper temperature range (e.g. 650 K K) and then returns to the lower temperature range. The rate of temperature equilibration is measured by the number of cross-walks (at equilibrium, each walker visits each temperature with equal probability).
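Counting cross-walks from one walker's temperature history is a two-flag scan. `count_crosswalks` is a hypothetical helper, and the `low_max` / `high_min` thresholds that delimit the low and upper ranges are parameters the caller must choose; the slide gives only 650 K as an example of the upper range.

```python
from typing import Sequence


def count_crosswalks(trajectory: Sequence[float], low_max: float, high_min: float) -> int:
    """Count round trips low -> high -> low in one walker's temperature history.

    trajectory: temperatures visited by the walker, in time order.
    low_max:    temperatures <= low_max belong to the low range.
    high_min:   temperatures >= high_min belong to the upper range.
    """
    count = 0
    seen_low = False   # walker has been in the low range
    seen_high = False  # ...and has since reached the upper range
    for t in trajectory:
        if t <= low_max:
            if seen_high:
                count += 1  # completed low -> high -> low
                seen_high = False
            seen_low = True
        elif t >= high_min and seen_low:
            seen_high = True
    return count
```

Usage: `count_crosswalks([300, 400, 650, 500, 310, 650, 320], 320, 650)` counts two completed round trips; a walker that reaches 650 K but never returns contributes none.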

17 Experimental Evaluation: Simulation Time (a) Average wall-clock execution time and standard deviation with increasing number of processes (walkers); (b) Normalized execution time with increasing number of processes (walkers).

18 Experimental Evaluation: Effect of Posted Temperature Range. The temperature range posted by a walker must be chosen to optimize simulation time, the number of cross-walks, and convergence.
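One side of this trade-off is easy to quantify: the posted window [T - GUESSRANGE, T + GUESSRANGE] from the IMPACT pseudo-code determines how many other ladder temperatures are potential partners. The sketch below counts them; `window` and `candidates_in_window` are hypothetical helpers, and the evenly spaced ladder in the example is illustrative only. Wider windows offer more partners, but they also route posts to more daemons and, typically, admit swaps across larger temperature gaps, which are accepted less often.

```python
from typing import List, Tuple


def window(temp: float, guess_range: float) -> Tuple[float, float]:
    """Posted interval, mirroring tempt - GUESSRANGE .. tempt + GUESSRANGE."""
    return temp - guess_range, temp + guess_range


def candidates_in_window(ladder: List[float], i: int, guess_range: float) -> List[float]:
    """Other ladder temperatures that fall inside walker i's posted window."""
    lo, hi = window(ladder[i], guess_range)
    return [t for j, t in enumerate(ladder) if j != i and lo <= t <= hi]
```

Usage: on the ladder [300, 350, 400, 450, 500] K, the walker at 400 K finds no partner with a 40 K half-width, its two nearest neighbors with 60 K, and every other walker with 110 K.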

19 Related Work. Basic message-passing (MPI, PVM) based implementations use a simplified formulation of the algorithm: exchanges occur only between replicas with adjacent temperatures, which limits effectiveness, and exchanges occur in a centralized and totally synchronous manner, which limits scalability. Folding@HOME (Stanford U.) used a multiplexed replica exchange algorithm: it runs multiplexed replicas, with a number of independent molecular dynamics runs at each temperature, and attempts exchanges of configurations between these multiplexed replicas; efficiency improves because a larger number of potential exchange partners is available. Salsa, to the best of our knowledge, is the first to address a decentralized and asynchronous parallel implementation of replica exchange. It improves simulation efficiency and scalability by eliminating the restriction to nearest-neighbor exchanges and enabling parallel, decoupled and decentralized exchanges, and it can support large numbers of replicas on a heterogeneous, loosely coupled pool of processors.

20 Conclusion. Salsa provides a semantically specialized virtual shared space abstraction to support scalable asynchronous replica exchange for molecular dynamics applications. It enables general non-nearest-neighbor temperature exchanges; exchanges are decoupled and are determined asynchronously and dynamically; communication is decentralized and peer-to-peer, and occurs in parallel. Salsa is implemented as part of the IMPACT molecular mechanics package, and its effectiveness, performance and scalability are demonstrated experimentally.

21 Future Work. The overall goal of the project is to enable large-scale Grid-based parallel and distributed molecular simulations of protein structural changes and drug binding to proteins. Specific tasks include: implementing a prototype interaction and coordination framework, based on Salsa, for wide-area distributed replica exchange simulations; developing, deploying and evaluating the Grid-based IMPACT implementation; and using the Grid-based IMPACT implementation to provide scientific insights.


More information

Parallel Motif Search Using ParSeq

Parallel Motif Search Using ParSeq Parallel Motif Search Using ParSeq Jun Qin 1, Simon Pinkenburg 2 and Wolfgang Rosenstiel 2 1 Distributed and Parallel Systems Group University of Innsbruck Innsbruck, Austria 2 Department of Computer Engineering

More information

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Lifan Xu, Michela Taufer, Stuart Collins, Dionisios G. Vlachos Global Computing Lab University of Delaware Multiscale Modeling:

More information

Designing High Performance DSM Systems using InfiniBand Features

Designing High Performance DSM Systems using InfiniBand Features Designing High Performance DSM Systems using InfiniBand Features Ranjit Noronha and Dhabaleswar K. Panda The Ohio State University NBC Outline Introduction Motivation Design and Implementation Results

More information

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering

More information

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne Chapter 4: Threads Silberschatz, Galvin and Gagne Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Linux Threads 4.2 Silberschatz, Galvin and

More information

ARCHITECTURE SPECIFIC COMMUNICATION OPTIMIZATIONS FOR STRUCTURED ADAPTIVE MESH-REFINEMENT APPLICATIONS

ARCHITECTURE SPECIFIC COMMUNICATION OPTIMIZATIONS FOR STRUCTURED ADAPTIVE MESH-REFINEMENT APPLICATIONS ARCHITECTURE SPECIFIC COMMUNICATION OPTIMIZATIONS FOR STRUCTURED ADAPTIVE MESH-REFINEMENT APPLICATIONS BY TAHER SAIF A thesis submitted to the Graduate School New Brunswick Rutgers, The State University

More information

Application of Support Vector Machine In Bioinformatics

Application of Support Vector Machine In Bioinformatics Application of Support Vector Machine In Bioinformatics V. K. Jayaraman Scientific and Engineering Computing Group CDAC, Pune jayaramanv@cdac.in Arun Gupta Computational Biology Group AbhyudayaTech, Indore

More information

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski Distributed Systems 09. State Machine Replication & Virtual Synchrony Paul Krzyzanowski Rutgers University Fall 2016 1 State machine replication 2 State machine replication We want high scalability and

More information

Using MPI One-sided Communication to Accelerate Bioinformatics Applications

Using MPI One-sided Communication to Accelerate Bioinformatics Applications Using MPI One-sided Communication to Accelerate Bioinformatics Applications Hao Wang (hwang121@vt.edu) Department of Computer Science, Virginia Tech Next-Generation Sequencing (NGS) Data Analysis NGS Data

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

Introduction to Distributed Systems

Introduction to Distributed Systems Introduction to Distributed Systems Minsoo Ryu Department of Computer Science and Engineering 2 Definition A distributed system is a collection of independent computers that appears to its users as a single

More information

Structure of Social Networks

Structure of Social Networks Structure of Social Networks Outline Structure of social networks Applications of structural analysis Social *networks* Twitter Facebook Linked-in IMs Email Real life Address books... Who Twitter #numbers

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

A Generic Distributed Architecture for Business Computations. Application to Financial Risk Analysis.

A Generic Distributed Architecture for Business Computations. Application to Financial Risk Analysis. A Generic Distributed Architecture for Business Computations. Application to Financial Risk Analysis. Arnaud Defrance, Stéphane Vialle, Morgann Wauquier Firstname.Lastname@supelec.fr Supelec, 2 rue Edouard

More information

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Kandalla, Mark Arnold and Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory

More information

Parallel Combinatorial BLAS and Applications in Graph Computations

Parallel Combinatorial BLAS and Applications in Graph Computations Parallel Combinatorial BLAS and Applications in Graph Computations Aydın Buluç John R. Gilbert University of California, Santa Barbara SIAM ANNUAL MEETING 2009 July 8, 2009 1 Primitives for Graph Computations

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

Paolo Bellavista Veronica Conti Carlo Giannelli Jukka Honkola

Paolo Bellavista Veronica Conti Carlo Giannelli Jukka Honkola The Smart-M3 Semantic Information Broker (SIB) Plug-in Extension: Implementation and Evaluation Experiences Paolo Bellavista Veronica Conti Carlo Giannelli Jukka Honkola 20.11.2012 - SN4MS'12 DISI, Università

More information

Objective. A Finite State Machine Approach to Cluster Identification Using the Hoshen-Kopelman Algorithm. Hoshen-Kopelman Algorithm

Objective. A Finite State Machine Approach to Cluster Identification Using the Hoshen-Kopelman Algorithm. Hoshen-Kopelman Algorithm Objective A Finite State Machine Approach to Cluster Identification Using the Cluster Identification Want to find and identify homogeneous patches in a D matrix, where: Cluster membership defined by adjacency

More information

Co-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University

Co-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University Co-array Fortran Performance and Potential: an NPB Experimental Study Cristian Coarfa Jason Lee Eckhardt Yuri Dotsenko John Mellor-Crummey Department of Computer Science Rice University Parallel Programming

More information

Parallelizing a Monte Carlo simulation of the Ising model in 3D

Parallelizing a Monte Carlo simulation of the Ising model in 3D Parallelizing a Monte Carlo simulation of the Ising model in 3D Morten Diesen, Erik Waltersson 2nd November 24 Contents 1 Introduction 2 2 Description of the Physical Model 2 3 Programs 3 3.1 Outline of

More information

Design and Performance Evaluation of Networked Storage Architectures

Design and Performance Evaluation of Networked Storage Architectures Design and Performance Evaluation of Networked Storage Architectures Xubin He (Hexb@ele.uri.edu) July 25,2002 Dept. of Electrical and Computer Engineering University of Rhode Island Outline Introduction

More information

Accelerating molecular docking on multi- and manycore computer architectures

Accelerating molecular docking on multi- and manycore computer architectures Accelerating molecular docking on multi- and manycore computer architectures Simon McIntosh-Smith University of Bristol, UK simonm@cs.bris.ac.uk 1 ! Power-limited regimes Processor power consumption now

More information

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

UCLA UCLA Previously Published Works

UCLA UCLA Previously Published Works UCLA UCLA Previously Published Works Title Parallel Markov chain Monte Carlo simulations Permalink https://escholarship.org/uc/item/4vh518kv Authors Ren, Ruichao Orkoulas, G. Publication Date 2007-06-01

More information

Parallel Performance Studies for a Clustering Algorithm

Parallel Performance Studies for a Clustering Algorithm Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

Advanced Distributed Systems

Advanced Distributed Systems Course Plan and Department of Computer Science Indian Institute of Technology New Delhi, India Outline Plan 1 Plan 2 3 Message-Oriented Lectures - I Plan Lecture Topic 1 and Structure 2 Client Server,

More information

In the multi-core age, How do larger, faster and cheaper and more responsive memory sub-systems affect data management? Dhabaleswar K.

In the multi-core age, How do larger, faster and cheaper and more responsive memory sub-systems affect data management? Dhabaleswar K. In the multi-core age, How do larger, faster and cheaper and more responsive sub-systems affect data management? Panel at ADMS 211 Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory Department

More information

LS-DYNA Scalability Analysis on Cray Supercomputers

LS-DYNA Scalability Analysis on Cray Supercomputers 13 th International LS-DYNA Users Conference Session: Computing Technology LS-DYNA Scalability Analysis on Cray Supercomputers Ting-Ting Zhu Cray Inc. Jason Wang LSTC Abstract For the automotive industry,

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel

More information

Parallel Combinatorial Search on Computer Cluster: Sam Loyd s Puzzle

Parallel Combinatorial Search on Computer Cluster: Sam Loyd s Puzzle Parallel Combinatorial Search on Computer Cluster: Sam Loyd s Puzzle Plamenka Borovska Abstract: The paper investigates the efficiency of parallel branch-and-bound search on multicomputer cluster for the

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

Distributed Information Processing

Distributed Information Processing Distributed Information Processing 1 st Lecture Eom, Hyeonsang ( 엄현상 ) Department of Computer Science & Engineering Seoul National University Copyrights 2017 Eom, Hyeonsang All Rights Reserved Outline

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information