Ian Foster, CS554: Data-Intensive Computing

Similar documents
Support for resource sharing Openness Concurrency Scalability Fault tolerance (reliability) Transparence. CS550: Advanced Operating Systems 2

Ian Foster, An Overview of Distributed Systems

Arguably one of the most fundamental discipline that touches all other disciplines and people

Distributed Systems. Overview. Distributed Systems September A distributed system is a piece of software that ensures that:

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

Introduction. Chapter 1

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27

Distributed Systems. Thoai Nam Faculty of Computer Science and Engineering HCMC University of Technology

Client Server & Distributed System. A Basic Introduction

TDP3471 Distributed and Parallel Computing

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architecture

Lecture 9: MIMD Architectures

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Distributed Systems LEEC (2006/07 2º Sem.)

Introduction. Distributed Systems IT332

CA464 Distributed Programming

Distributed Computing. Santa Clara University 2016

Concepts of Distributed Systems 2006/2007

Distributed Systems. Edited by. Ghada Ahmed, PhD. Fall (3rd Edition) Maarten van Steen and Tanenbaum

Introduction to Distributed Systems (DS)

Chapter 1: Introduction 1/29

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Distributed Operating Systems Fall Prashant Shenoy UMass Computer Science. CS677: Distributed OS

CS550. TA: TBA Office: xxx Office hours: TBA. Blackboard:

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

CHAPTER 1 Fundamentals of Distributed System. Issues in designing Distributed System

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Advanced Operating Systems

Distributed Operating Systems Spring Prashant Shenoy UMass Computer Science.

DISTRIBUTED COMPUTING

Distributed OS and Algorithms

Introduction to Distributed Systems (DS)

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

Distributed and Operating Systems Spring Prashant Shenoy UMass Computer Science.

Distributed Systems. 01. Introduction. Paul Krzyzanowski. Rutgers University. Fall 2013

1. Distributed Systems

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

06-Dec-17. Credits:4. Notes by Pritee Parwekar,ANITS 06-Dec-17 1

CSE 5306 Distributed Systems. Course Introduction

02 - Distributed Systems

2.1 What are distributed systems? What are systems? Different kind of systems How to distribute systems? 2.2 Communication concepts

02 - Distributed Systems

CA485 Ray Walshe Google File System

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Real Parallel Computers

Chapter 18 Parallel Processing

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?

Chapter 17: Distributed Systems (DS)

It also performs many parallelization operations like, data loading and query processing.

Outline. Distributed Computing Systems. The Rise of Distributed Systems. Depiction of a Distributed System 4/15/2014

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?

Crossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2)

The Google File System

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago

Distributed Systems. Chapter 1: Introduction

The Google File System

INTRODUCTION TO Object Oriented Systems BHUSHAN JADHAV

Computer Organization. Chapter 16

HPC Capabilities at Research Intensive Universities

High Performance Computing Course Notes HPC Fundamentals

The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012):

Google File System. Arun Sundaram Operating Systems

Regional & National HPC resources available to UCSB

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

CS4513 Distributed Computer Systems

CSE Traditional Operating Systems deal with typical system software designed to be:

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Distributed Systems INF Michael Welzl

Introduction Distributed Systems

Decentralized Distributed Storage System for Big Data

GFS: The Google File System

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Distributed Information Processing

Introduction to HPC Parallel I/O

Computer-System Organization (cont.)

Module 1 - Distributed System Architectures & Models

Distributed Systems COMP 212. Lecture 1 Othon Michail

Introduction to Grid Computing

Parallel File Systems Compared

Outline. Lecture 11: EIT090 Computer Architecture. Small-scale MIMD designs. Taxonomy. Anders Ardö. November 25, 2009

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Real Parallel Computers

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Distributed System. Gang Wu. Spring,2018

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

What is a distributed system?

GFS: The Google File System. Dr. Yingwu Zhu

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017

The Google File System

Transcription:

The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Ian Foster, 2006

What is a distributed system? A collection of independent computers that appears to its users as a single coherent system -A. Tanenbaum

A distributed system organized as middleware. The middleware layer extends over multiple machines, and offers each application the same interface.

Parallel and Distributed Systems tightly coupled loosely coupled Multiprocessors (fast hw network) Multicomputers (slow hw network) Shared-Addr.-Space Multiprocessors Dist.-Addr.-Space Multiprocessors Homogeneous Multicomputers Heterogeneous Multicomputers UMA NUMA Clusters Grids Information systems Pervasive systems Distributed Systems

[GCE08] Cloud Computing and Grid Computing 360-Degree Compared

Computer clusters using commodity processors, network interconnects, and operating systems.

Rack 32 Node Cards Supercomputing ~ HPC Cabled 8x8x16 Baseline System 32 Racks Chip 4 processors Compute Card 1 chip, 1x1x1 Node Card (32 chips 4x4x2) 32 compute, 0-4 IO cards 435 GF/s 64 GB 14 TF/s 2 TB 500TF/s 64 TB Highly-tuned computer clusters using commodity 13.6 GF/s 13.6 GF/s processors 2 GB DDR combined with custom network 8 MB EDRAM interconnects and CS554: customized Data-Intensive Computing operating system

Cray XT4 & XT5 Cray #1 Cielo #18 Hopper #19 IBM BlueGene/L/P/Q Sequia #2 Mira #4 Juqueen #5 Fermi #9 GPU based Titan #1 Tianhe-1A #8 Nebulae #12 SGI Altix ICE Plaiedas #14 SPARC64 VIIIfx K #3

Grids tend to be composed of multiple clusters, and are typically loosely coupled, heterogeneous, and geographically dispersed NCAR UC/ANL NCSA PU IU PSC Tennessee 2008 (~1PF) ORNL SDSC Tommy Minyard, TACC TACC 2007 (504TF) LONI/LSU Grids ~ Federation Computational Resources (size approximate - not to scale)

XSEDE (Formerly TeraGrid) 200K-cores across 11 institutions and 22 systems over the US Open Science Grid (OSG) 43K-cores across 80 institutions over the US Enabling Grids for E-sciencE (EGEE) LHC Computing Grid from CERN Middleware Globus Toolkit Unicore

A large-scale distributed computing paradigm driven by: 1. economies of scale 2. virtualization 3. dynamically-scalable resources 4. delivered on demand over the Internet Clouds ~ hosting

NERSC ALCF Chicago NYC Sunnyval e Nashville OLCF 100 gigabit/ sec

Industry Google App Engine Amazon Windows Azure Salesforce Academia/Government Magellan FutureGrid Opensource middleware Nimbus Eucalyptus OpenNebula OpenStack

Data sharing Multiple users can access common database, data files, Device/resource sharing Printers, servers, CPUs,. Communication Communication with other machines Flexibility Spread workload to different & most appropriate machines Extensibility Add resources and software as needed

Economics Microprocessors have better price/performance than mainframes Speed Collective power of large number of systems Geographic and responsibility distribution Reliability One machine s failure need not bring down the system Extensibility Computers and software can be added incrementally

Software Little software exists compared to PCs Networking Still slow and can cause other problems (e.g. when disconnected) Security Data may be accessed by unauthorized users

Support for resource sharing Openness Concurrency Scalability Fault tolerance (reliability) Transparence

Share hardware,software,data and information Hardware devices Printers,disks,memory, Software sharing Compilers,libraries,toolkits, Data Databases,files,

Definition? Hardware extensions Adding peripherals,memory,communication interfaces Software extensions Operating systems features Communication protocols

In a single system several processes are interleaved In distributed systems: there are many systems with one or more processors Many users simultaneously invoke commands or applications Many servers processes run concurrently, each responding to different client request

Scale of system Few PCs servers ->dept level systems- >local area networks->internetworked systems->wide are network Ideally, system and application software should not change as systems scales Scalability depends on all aspects Hardware Software networks

Definition? Two approaches: Hardware redundancy Software recovery In distributed systems: Servers can be replicated Databases may be replicated Software recovery involves the design so that state of permanent data can be recovered

Transparency Access Location Migration Relocation Replication Concurrency Failure Persistence Description Hide differences in data representation and how a resource is accessed Hide where a resource is located Hide that a resource may move to another location Hide that a resource may be moved to another location while in use Means that users do not know whether a replica or a master provides a service. Hide that a resource may be shared by several competitive users Hide the failure and recovery of a resource Hide whether a (software) resource is in memory or on disk

False assumptions made by first time developer: The network is reliable. The network is secure. The network is homogeneous. The topology does not change. Latency is zero. Bandwidth is infinite. Transport cost is zero. There is one administrator.