Progress Towards Petascale Virtual Machines
- Helen Briggs
Transcription
1 Progress Towards Petascale Virtual Machines
Al Geist, Oak Ridge National Laboratory
EuroPVM-MPI 2003, Venice, Italy, September 30, 2003
2 Petascale Virtual Machine: Another kind of PVM
This talk will describe:
- DOE Genomes to Life Project
- PVM use today in the Genomics Integrated Supercomputer Toolkit for fault tolerance and high availability in a dynamic environment
- Harness Project (next generation of PVM) and its features to help scale to petascale systems
  - Distributed peer-to-peer control
  - H2O, the self-adapting core of Harness
  - FT-MPI, fault tolerant MPI
- Latest superscalable algorithms with natural fault tolerance for petascale environments
3 DOE Genomes to Life Program
Understanding the Essential Processes of Living Systems
- Follow-on to the Human Genome Program, which determined the entire DNA sequence for humans
  - 24 chromosomes in 6 ft of DNA
  - 3 billion nucleotides code for 35,000 genes
  - Only % difference between people.
  - Instructions to build a human fit on a DVD (3GB)
- Genomes to Life Program goal is to read the instructions, starting with simple single-cell organisms (microbes)
  - Molecular Machines
  - Regulatory Pathways
  - Multi-cell Communities
- $100M effort
- Develop new computational methods to understand complex biological systems
[Figure label: PVM]
4 Molecular Machines Fill Cells
Many interlinked proteins form interacting machines
From The Machinery of Life, David S. Goodsell, Springer-Verlag, New York,
5 Regulatory Networks Control the Machines
- Gene regulation controls what genes are expressed
- And the proteome changes over time and due to environmental conditions
6 GTL will Require Petascale Systems
[Figure: required computing power in teraflops (1 TF, 10 TF, 100 TF, 1000 TF) plotted against biological complexity, with a "Current U.S. Computing" marker]
Applications plotted: constrained rigid docking; genome-scale protein threading; comparative genomics; constraint-based flexible docking; community metabolic, regulatory, and signaling simulations; molecular machine classical simulation; molecule-based cell simulation; cell, pathway, and network simulation; protein machine interactions; cell-based community simulation.
7 Biology for the 21st Century
- GTL is going to rely on high-performance computing and data analysis to process high-throughput experimental data
- The new computational biology environments will be conceptually integrated, knowledge-enabling environments that couple diverse sets of distributed data, advanced informatics methods, experiments, modeling, and simulation.
[Figure: cycle linking raw data, data analysis, experiment, modeling, models, and simulation, with genomes, protein structure, pathways, and regulatory elements; inset signaling-pathway diagram with labels including ERK, EGFR, Src, Eps15, AP-2, erbb-2, PLCγ, Grb-2, Shc, Cbl, eps8, Annexin II, Golgi, early endosomes, late endosomes, and lysosomes]
8 Genome Integrated Supercomputer Toolkit
GIST is a framework for large-scale biological application deployment. It:
- provides a transparent and high-performance interface to biological applications
- provides transparent access to distributed data sets
- utilizes PVM to launch and manage jobs across a wide diversity of supercomputers
- is highly fault tolerant and adapts to dynamic changes in the environment using PVM
Next step: deploy across ORNL, ANL, PNNL, and SNL as a multi-site Bio-Grid, with thousands of users for execution of genome analysis and simulation.
[Figure: XML web portal and protein analysis engine running on PVM across heterogeneous supercomputers (P4 Cluster 64 proc, Cray X1 256 proc, IBM p proc, SGI Altix 256 proc); data labels: pathways, genomes, raw data]
9 The GIST Developers really want Harness
They ask us regularly about the next generation of PVM, called Harness, because they want the increased adaptability and fault tolerance that Harness promises.
Harness is being developed by the same team that developed PVM:
- Vaidy Sunderam, Emory University
- Al Geist, Oak Ridge National Lab
- Jack Dongarra, University of Tennessee and ORNL
10 Harness II Design Goals
Harness is a distributed virtual machine environment that goes beyond the features of PVM:
- Allows users to dynamically customize, adapt, and extend a virtual machine's features
  - to more closely match the needs of their application
  - to optimize the virtual machine for the underlying computer resources
- Is being designed to scale to petascale virtual machines
  - distributed control
  - minimized global state
  - no single point of failure
- Allows multiple virtual machines to join and split in temporary micro-grids
11 HARNESS II Architecture
- Daemon built on top of the H2O kernel with the DVM pluglet loaded
- Operation within a VM uses distributed control
- A VM can merge/split with other VMs
- Component-based HARNESS daemon (labels: DVM, FT-MPI, process control, user features)
- Customization and extension by dynamically adding pluglets
[Figure: Hosts A, B, C, and D forming a virtual machine next to another VM]
12 Symmetric Peer-to-Peer Distributed Control
Characteristics:
- No single point (or set of points) of failure for Harness. It survives as long as one member still lives.
- All members know the state of the virtual machine, and their knowledge is kept consistent with respect to the order of changes of state. (Important parallel programming requirement!)
- No member is more important than any other (at any instant), i.e., there isn't a pass-around control token.
- For petascale systems, the control members can be a distributed subset of all the processors in the system.
13 Harness Distributed Control
Control is asynchronous and parallel:
- Supports fast host adding
- Fast host delete or recovery from fault
- Parallel recovery from multiple host failures
- Supports multiple simultaneous updates
[Figure label: add host]
14 HARNESS: Petascale Virtual Machine
Variable distributed control loop size
- Size of the control loop: 1 <= S <= (size of VM)
- For a small VM and ultimate fault tolerance, S = (size of VM).
- For a large VM, a random selection of a few hosts (e.g., S = 10) gives a balance between tolerance of multi-point failures and performance.
- For S = 1, distributed control becomes a simple client/server model.
[Figure: control loop inside the virtual machine]
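The control loop sizing rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `choose_control_loop` is a hypothetical helper and the host names are made up; neither is part of Harness.

```python
import random

def choose_control_loop(hosts, s, seed=None):
    """Pick S hosts at random to act as control members.

    Hypothetical helper, not a Harness API. Enforces the slide's
    constraint 1 <= S <= (size of VM).
    """
    if not 1 <= s <= len(hosts):
        raise ValueError("control loop size must satisfy 1 <= S <= size of VM")
    rng = random.Random(seed)
    return rng.sample(list(hosts), s)

# Made-up host names for illustration only.
hosts = [f"host{i}" for i in range(1000)]
small_vm = hosts[:4]

# Small VM, ultimate fault tolerance: every host is a control member.
full_loop = choose_control_loop(small_vm, len(small_vm), seed=0)

# Large VM: a few random hosts balance failure tolerance and performance.
partial_loop = choose_control_loop(hosts, 10, seed=0)

# S = 1 degenerates to a single controller (client/server model).
client_server = choose_control_loop(hosts, 1, seed=0)
```

Random selection is used here because the slide calls for it; it keeps the control members from clustering on one part of the machine.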
15 H2O kernel - Overview
- H2O is a multithreaded, lightweight kernel that is dynamically configured by loading pluglets
- Resources are provided as services through pluglets
- Services may be deployed by any authorized party: provider, client, or third-party reseller
- H2O is stateless and resource independent
- In Harness, the DVM service, which includes distributed control of services, must be installed on the host
- Pluglets can provide multiple programming models: FT-MPI, Java RMI, OGSA, PVM, active objects, P2P
- Java and C implementations are being developed
[Figure labels: functional interfaces, Suspendible, clients, pluglets, kernel]
16 H2O kernel RMIX Communication
H2O is built on top of a flexible P2P communication layer called RMIX
- Provides interoperability between kernels and other web services
- Adopts common RMI semantics
- Designed for easy porting between protocols
- Dynamic protocol negotiation
- Scalable P2P design
[Figure: Java, Web Services, RPC, and SOAP clients connecting to H2O kernels (items labeled A through F) through RMIX networking over RPC, IIOP, JRMP, and SOAP]
17 H2O can support a wide range of distributed computing models
Flexibility beyond the PVM/MPI model:
- Grid web portal, like Genome Channel, Biology Workbench
- Web service
- Internet computing, like SETI@home, Entropia, United Devices
- Cluster computing, like PVM, Harness, LAM/MPI
- Registration and discovery: UDDI, JNDI, LDAP, DNS, GIS, phone
[Figure: deployment scenarios in which providers, clients, resellers, and developers publish, find, and deploy components A, B, and C, including native code and a legacy application, through repositories]
18 Harness Fault Tolerant MPI Plug-in
- FT-MPI is built in layers with tuned collectives, tuned derived data type handling, and good point-to-point bandwidth.
- Works with MPE profiling and tools such as JUMPSHOT from ANL.
- Application performance on par with MPICH-2.
- FT-MPI available at SC2003
[Figure: two stacks of MPI application, libftmpi, startup plugin, and H2O, together with a name service and Ftmpi_notifier]
19 Harness Fault Tolerant MPI Plug-in
- FT-MPI is a system-level, fault tolerant implementation of the full MPI 1.2 standard.
- Process failures are detected and passed back to the user's application using MPI objects.
- The user's application decides how best to reconfigure the system and continue.
Recovery options for affected communicators:
- ABORT: just do as other implementations do, i.e., checkpoint/restart
- BLANK: leave a hole
- SHRINK: re-order processes to make a contiguous communicator
- REBUILD: re-spawn lost processes and add them to MPI_COMM_WORLD
[Figure: communicator options, with failed processes marked X]
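To make the four recovery options concrete, here is a toy Python model that treats a communicator as a list indexed by rank. It is a sketch, not the FT-MPI API: `recover`, the process names, and the choice to place respawned processes at their original ranks under REBUILD are all assumptions made for illustration.

```python
def recover(comm, failed, mode):
    """Toy model of the slide's recovery modes (not the FT-MPI API).

    comm   -- list where the index is the rank and the value a process name
    failed -- set of ranks whose processes have died
    mode   -- "ABORT", "BLANK", "SHRINK", or "REBUILD"
    """
    if mode == "ABORT":
        # Give up; the job would have to restart from a checkpoint.
        raise RuntimeError("process failure: aborting the job")
    if mode == "BLANK":
        # Keep size and ranks; failed slots become holes.
        return [None if r in failed else p for r, p in enumerate(comm)]
    if mode == "SHRINK":
        # Drop failed processes; survivors get contiguous ranks.
        return [p for r, p in enumerate(comm) if r not in failed]
    if mode == "REBUILD":
        # Assumption: respawned processes take over the lost ranks.
        return [f"{p}-respawned" if r in failed else p
                for r, p in enumerate(comm)]
    raise ValueError(f"unknown mode: {mode}")

comm = ["p0", "p1", "p2", "p3", "p4"]
failed = {1, 3}

blank = recover(comm, failed, "BLANK")
shrink = recover(comm, failed, "SHRINK")
rebuild = recover(comm, failed, "REBUILD")
```

In this model BLANK and REBUILD keep the communicator at its original size, while SHRINK changes both the size and the ranks of some survivors, so an application using SHRINK has to re-derive any rank-based data decomposition.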
20 Large-scale Fault Tolerance
Taking fault tolerance beyond checkpoint/restart.
- Developing fault tolerant algorithms is not trivial. Anything beyond simple checkpoint/restart is beyond most scientists.
- Many recovery issues must be addressed.
- Doing a restart of 90,000 tasks because of the failure of 1 task may be a very inefficient use of resources.
- When and what are the recovery options for large-scale simulations?
21 Fault Tolerance: a petascale perspective
- Future systems are being designed with 100,000 processors. The time before some failure will be measured in minutes.
- Checkpointing and restarting this large a system could take longer than the time to the next failure!
- Autonomic? Self-healing? What to do?
- Development of algorithms that can be naturally fault tolerant, i.e., failure anywhere can be ignored and the algorithm still gets the right answer:
  - No monitoring
  - No notification
  - No recovery
- Is this possible? YES!
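A back-of-the-envelope calculation shows why failures on a 100,000-processor system arrive minutes apart. The sketch below assumes independent failures at a constant rate, so the system-wide mean time between failures is the per-processor figure divided by the processor count. The 5-year per-processor MTBF is a hypothetical number chosen for illustration, not a value from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def system_mtbf_minutes(node_mtbf_years, num_nodes):
    """Mean time between failures for the whole system, in minutes.

    Assumes independent, constant-rate failures, so rates add up
    and the system MTBF is node MTBF / number of nodes.
    """
    return node_mtbf_years * MINUTES_PER_YEAR / num_nodes

# Hypothetical 5-year MTBF per processor, 100,000 processors as on the slide.
mtbf = system_mtbf_minutes(5, 100_000)
print(f"system MTBF: {mtbf:.2f} minutes")
```

Under these assumptions the system sees a failure roughly every 26 minutes, so a checkpoint/restart cycle that takes a comparable time leaves little room for useful work.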
22 Progress on Super-scalable algorithms
Demonstrated that scale invariance and natural fault tolerance can exist for local and global algorithms
- Finite difference (Christian Engelmann): demonstrated natural fault tolerance with chaotic relaxation, meshless, finite difference solution of Laplace and Poisson problems
- Global information (Kasidit Chanchio): demonstrated natural fault tolerance in the global max problem with random, directed graphs
- Gridless multigrid (Ryan Adams): combines the fast convergence of multigrid with the natural fault tolerance property. Hierarchical implementation of the finite difference above. Three different asynchronous updates explored.
[Figure labels: local, global]
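The global max result can be illustrated with a small gossip simulation. This is not the algorithm from the talk: it assumes each live process reads from a uniformly random peer every round, whereas the slide mentions random directed graphs. The names `gossip_max`, `fail_round`, and the parameter values are all hypothetical.

```python
import random

def gossip_max(values, failed, fail_round, rounds, seed=0):
    """Each live process repeatedly takes the max of its own value and
    that of a random peer. Dead peers are simply ignored: no monitoring,
    no notification, no recovery. Returns {rank: value} for survivors.
    """
    rng = random.Random(seed)
    n = len(values)
    state = list(values)
    alive = [True] * n
    for rnd in range(rounds):
        if rnd == fail_round:
            for i in failed:          # these processes die silently
                alive[i] = False
        for i in range(n):
            if not alive[i]:
                continue
            peer = rng.randrange(n)
            if alive[peer]:           # a dead peer contributes nothing
                state[i] = max(state[i], state[peer])
    return {i: state[i] for i in range(n) if alive[i]}

n = 64
values = list(range(n))               # the global max, 63, sits on rank 63
survivors = gossip_max(values, failed=set(range(16)),
                       fail_round=3, rounds=300)
```

With a quarter of the processes lost, the survivors still agree on the maximum because no process depends on any particular peer. One caveat of this sketch: if the process holding the maximum dies before any peer has read it, the survivors converge on the largest value still known to them.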
23 Further Information
- Genomes to Life
- Harness
- Naturally Fault Tolerant Algorithms
Questions?
More informationSlurm Roadmap. Morris Jette, Danny Auble (SchedMD) Yiannis Georgiou (Bull)
Slurm Roadmap Morris Jette, Danny Auble (SchedMD) Yiannis Georgiou (Bull) Exascale Focus Heterogeneous Environment Scalability Reliability Energy Efficiency New models (Cloud/Virtualization/Hadoop) Following
More information38. System Support for Pervasive Applications
38. System Support for Pervasive Applications Robert Grimm 1 and Brian Bershad 2 1 New York University, New York, NY rgrimm@cs.nyu.edu 2 University of Washington, Seattle, WA bershad@cs.washington.edu
More informationTHE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid
THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The
More informationDeveloping a Thin and High Performance Implementation of Message Passing Interface 1
Developing a Thin and High Performance Implementation of Message Passing Interface 1 Theewara Vorakosit and Putchong Uthayopas Parallel Research Group Computer and Network System Research Laboratory Department
More informationGrid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007
Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software
More informationGrid Middleware and Globus Toolkit Architecture
Grid Middleware and Globus Toolkit Architecture Lisa Childers Argonne National Laboratory University of Chicago 2 Overview Grid Middleware The problem: supporting Virtual Organizations equirements Capabilities
More informationAvida Checkpoint/Restart Implementation
Avida Checkpoint/Restart Implementation Nilab Mohammad Mousa: McNair Scholar Dirk Colbry, Ph.D.: Mentor Computer Science Abstract As high performance computing centers (HPCC) continue to grow in popularity,
More informationTowards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures Frédéric Suter Joint work with Gabriel Antoniu, Julien Bigot, Cristophe Blanchet, Luc
More informationREMEM: REmote MEMory as Checkpointing Storage
REMEM: REmote MEMory as Checkpointing Storage Hui Jin Illinois Institute of Technology Xian-He Sun Illinois Institute of Technology Yong Chen Oak Ridge National Laboratory Tao Ke Illinois Institute of
More informationCIFTS: A Coordinated Infrastructure for Fault Tolerant Systems : Experiences and Challenges
CIFTS: A Coordinated Infrastructure for Fault Tolerant Systems : Experiences and Challenges Rinku Gupta Mathematics and Computer Science Division Argonne National Laboratory CIFTS Project The CIFTS Project
More informationPARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM
PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM Szabolcs Pota 1, Gergely Sipos 2, Zoltan Juhasz 1,3 and Peter Kacsuk 2 1 Department of Information Systems, University of Veszprem, Hungary 2 Laboratory
More informationShared Memory Parallel Programming. Shared Memory Systems Introduction to OpenMP
Shared Memory Parallel Programming Shared Memory Systems Introduction to OpenMP Parallel Architectures Distributed Memory Machine (DMP) Shared Memory Machine (SMP) DMP Multicomputer Architecture SMP Multiprocessor
More informationA tutorial report for SENG Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far. Mobile Agents.
A tutorial report for SENG 609.22 Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far Mobile Agents Samuel Lee Department of Electrical Engineering University of Calgary Abstract With
More information(9A05803) WEB SERVICES (ELECTIVE - III)
1 UNIT III (9A05803) WEB SERVICES (ELECTIVE - III) Web services Architecture: web services architecture and its characteristics, core building blocks of web services, standards and technologies available
More informationCLC Server. End User USER MANUAL
CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark
More informationHow to Run Scientific Applications over Web Services
How to Run Scientific Applications over Web Services email: Diego Puppin Nicola Tonellotto Domenico Laforenza Institute for Information Science and Technologies ISTI - CNR, via Moruzzi, 60 Pisa, Italy
More informationVortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems
Vortex Whitepaper Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems www.adlinktech.com 2017 Table of Contents 1. Introduction........ P 3 2. Iot and
More information