Distributed systems: paradigms and models
Motivations
Prof. Marco Danelutto, Dept. of Computer Science, University of Pisa
Master Degree (Laurea Magistrale) in Computer Science and Networking
Academic Year 2009-2010

Contents
- Hardware motivations: CPU evolution, HPC, clouds
- Software motivations: innovative paradigms, can be moved to different frameworks

Moore's law
Moore's original statement can be found in his publication "Cramming more components onto integrated circuits", Electronics Magazine, 19 April 1965: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer."
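
As a quick sanity check of the extrapolation, a worked example (not part of Moore's text; the ~64-component starting point for 1965 is an assumption, chosen only to make the arithmetic visible):

    # Moore's 1965 claim: component counts double every year.
    # Assumption (illustrative only): ~64 components per chip in 1965.
    components = 64
    for year in range(1966, 1976):   # ten doublings, 1966..1975
        components *= 2
    print(components)                # 65536, i.e. roughly the 65,000 quoted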

Moore's law evolution
- Transistors/gates doubling every 2 years: more and more powerful single-processor systems
- Cores doubling every two years: simpler cores, more complex (?) memory hierarchy, more complex interconnection structure

Why?
- Doubling the cores exploits existing technology (and trends) while keeping reasonable power consumption.
- Doubling the frequency of a single-core chip costs much more than putting two simpler cores on the same chip:
  Perf = Freq x IPC
  Power = DynamicCapacitance x Volt^2 x Freq
(http://download.intel.com/technology/architecture/new_architecture_06.pdf)
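
A back-of-the-envelope comparison makes the tradeoff concrete. This is a sketch, under the common assumption (not stated on the slide) that supply voltage has to scale roughly with frequency:

    # Equations from the slide:
    #   Perf  = Freq * IPC
    #   Power = DynamicCapacitance * Volt^2 * Freq
    def power(capacitance, volt, freq):
        return capacitance * volt ** 2 * freq

    base = power(1.0, 1.0, 1.0)            # normalized single-core baseline

    # Option 1: double the frequency of one core; if V scales with F
    # (assumption), power grows roughly with the cube of the frequency.
    one_fast_core = power(1.0, 2.0, 2.0)   # 8x baseline power for 2x Perf

    # Option 2: two simpler cores at the original frequency; switched
    # capacitance doubles, voltage and frequency stay put.
    two_cores = power(2.0, 1.0, 1.0)       # 2x baseline power for ~2x Perf

    print(one_fast_core / base, two_cores / base)   # 8.0 vs. 2.0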

Commodity processors
http://www.edumax.com/assets/images/hardware_files/image010.jpg

Intel perspective

Intel perspective (2)

Commodity processors: non-Intel
http://www.sun.com/processors/ultrasparc-t2

Commodity processors: niche products
http://www.tilera.com/products/tile64.php

Research processors: Intel 80 cores
http://techresearch.intel.com/articles/tera-scale/1449.htm

More in detail...
- 4 GHz chip with a (logical and physical) 10x8 mesh of FP cores, 1.28 TFLOPS
- Tile: a router (addresses each core on chip, implements the mesh) plus a VLIW processor (96-bit instructions, up to 8 ops per cycle), in-order execution, 32 registers (6 read / 4 write ports), 2 KB data cache, 3 KB instruction cache, 2 FPUs (9 stages, 2 FLOPs/cycle sustained)
- Latencies in cycles: FPU: 9, Ld/St: 2, Snd/Rcv: 2, Jmp/Br: 1

GPUs / FPGAs

Intel Larrabee
http://download.intel.com/technology/architecture-silicon/siggraph_larrabee_paper.pdf

Not only processors: FPGAs
http://www.fpgajournal.com/whitepapers_2008/q1_embedded_xilinx.htm

Consequence: programming model
- Heterogeneous computing is coming to the scene
- More and more adaptivity required in the code
- More and more special-purpose solutions needed (transparent to the user)

Energy concerns/tradeoffs
http://img.tomshardware.com/us/2007/05/29/chart_energy_cost_full_load.png
http://nicolask.files.wordpress.com/2009/05/intel-processors.jpg

Energy concerns/tradeoffs (2)

Consequence: programming model
- Faster single-core systems run "dusty deck" code faster
- Multi-/many-core systems require parallel / distributed code (UMA, NUMA)

But... Amdahl's law is still there:
- serial fraction f (% of code not parallelizable)
- p processors available to parallelize the non-serial fraction (1-f)
Speedup(p) = Ts / (f Ts + (1-f)(Ts / p)) = 1 / (f + (1-f)/p)
Asymptotically (as p increases): Speedup(p) -> 1 / f
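
The limit is easy to check numerically; a minimal sketch, with an illustrative serial fraction of 5%:

    # Amdahl's law: Speedup(p) = 1 / (f + (1 - f) / p)
    def speedup(f, p):
        return 1.0 / (f + (1.0 - f) / p)

    f = 0.05                         # illustrative: 5% of the code is serial
    for p in (2, 4, 16, 256, 65536):
        print(p, round(speedup(f, p), 2))
    # Output approaches 1 / f = 20: beyond a point, extra processors buy
    # almost nothing.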

HPC evolution: www.top500.org
- Twice per year, the top 500 installations are measured on standard benchmarks
- Mostly installations from government, military, education, companies
- Significantly reflects tendencies: a kind of Formula 1 of the parallel computing scenario (e.g. interconnection networks scaled down to small COWs/NOWs)

Top 500: processor family

Top 500: operating system

Top 500: interconnection network

Top 500: number of processors

Moore's law in HPC
The Sourcebook of Parallel Computing, Dongarra, Foster, Fox, Gropp, Kennedy, Torczon, White (editors), 2003

Consequence: programming model
Top-end parallel computing is moving towards COWs/NOWs with smaller and smaller latencies and larger and larger bandwidths.

Evolution in the user model
- Single processor: standard, superpipelined, superscalar
- Multiprocessor ('70-'80); multi-/many-core ('00)
- NOW / COW ('80-'90): distributed architecture, SSI
- GRID (late '90-'00): meta computing, grid (middleware)

Cloud

Amazon cloud

Consequences
- More and more general architecture virtualization (host, network, operating system, ...)
- Need to adapt to unknown heterogeneity in hardware resources (computing, networking)

Software evolution
Innovative concepts:
- Algorithmic skeletons, design patterns, coordination/orchestration patterns/constructs: all introduce efficiency/programmability/... at the price of limitations to programmer freedom
- Software components: extreme modular programming (interoperability, commodity and legacy code, portability w.r.t. framework)
- Services: full decoupling of usage and implementation

Software evolution: structured programming
- Skeletons: mostly from the HPC community
- Design patterns: mostly from the software engineering community
- Different approaches: language/library vs. programming methodology
- Different impact: successfully being moved to grids (clouds?) and distributed architectures in general
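
The slides show no code; purely as an illustration, a task-farm skeleton (one of the classic skeletons) can be sketched with Python's standard multiprocessing module. The farm function and the square worker below are hypothetical names, not taken from any skeleton library:

    # A minimal "farm" skeleton: a fixed parallel structure (emitter, a pool
    # of identical workers, collector); only the worker function varies.
    from multiprocessing import Pool

    def farm(worker, tasks, nw=4):
        # The skeleton hides worker scheduling and communication entirely:
        # this is the efficiency/programmability tradeoff mentioned above.
        with Pool(processes=nw) as pool:
            return pool.map(worker, tasks)

    def square(x):                   # hypothetical worker: any pure function
        return x * x

    if __name__ == "__main__":
        print(farm(square, range(10)))   # [0, 1, 4, 9, ..., 81]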

Software evolution: components and services
- Components: mainly from the software engineering community (with HPC influences)
- Services: mainly from the business/end-user community
- Different approaches: recently merged into a common framework (SCA, by IBM et al.)
- Different impact: SOA is everywhere (SaaS/SOA, IaaS clouds, ...)

Parallel vs. distributed computing
- McDaniel, George, ed., IBM Dictionary of Computing, New York, NY: McGraw-Hill, Inc., 1994. Parallel computing: "a computer system in which interconnected processors perform concurrent or simultaneous execution of two or more processes".
- Institute of Electrical and Electronics Engineers, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, New York, NY: 1990. Distributed computing: "a computer system in which several interconnected computers share the computing tasks assigned to the system".
- Tanenbaum, Distributed systems: principles and paradigms, 2nd edition, 2006. Distributed system: "a collection of independent computers presenting to the user a single, coherent system image".

Distributed vs. parallel computing
(figures: distributed computing vs. parallel computing)

Have a look at standard books' indexes... (distributed computing)
- Tanenbaum, Van Steen, Distributed systems: principles and paradigms, 2nd edition, 2006: Introduction, Architectures, Processes, Communications, Naming, Synchronization, Consistency & replicas, Fault tolerance, Security, OO distributed systems, Distributed file system, Web distributed systems, Coordination based systems
- Kshemkalyani, Singhal, Distributed computing: Principles, algorithms and systems, 2008: Introduction, A model of distributed computations, Logical time, Global state and snapshot recording algorithms, Terminology and basic algorithms, Message ordering and group communication, Termination detection, Reasoning with knowledge, Distributed mutual exclusion algorithms, Deadlock detection in distributed systems, Global predicate detection, Distributed shared memory, Checkpointing and rollback recovery, Consensus and agreement algorithms, Failure detectors, Authentication in distributed systems, Self-stabilization, Peer-to-peer computing and overlay graphs

Have a look at standard books' indexes... (parallel computing)
- Grama, Gupta, Karypis, Kumar, Introduction to parallel computing, 2nd edition, 2003: Introduction, Parallel programming platforms, Principles of parallel algorithmic design, Basic communication operations, Analytical models of parallel programs, Programming using the message passing paradigm, Programming shared address space platforms, Dense matrix algorithms, Sorting, Graph algorithms, Search algorithms for discrete optimization problems, Dynamic programming, Fast Fourier Transform; Appendix: Complexity functions and order analysis
- Wilkinson, Allen, Parallel programming: techniques and applications using networked workstations and parallel computers, 2nd edition, 2005: Part I, Basic techniques: Parallel computers, Message passing computing, Embarrassingly parallel computations, Partitioning and divide-and-conquer strategies, Pipelined computations, Synchronous computations, Load balancing and termination detection, Programming with shared memory, Distributed shared memory systems and programming; Part II, Algorithms and applications: Sorting algorithms, Numerical algorithms, Image processing, Searching and optimization; Appendixes: Basic MPI routines, Basic Pthreads routines, OpenMP directives, library functions and environment variables

Distributed systems: paradigms and models
- Distributed: a kind of summary word for distributed & parallel
- Systems: systems as a whole, hardware + software
- Paradigms: sample paradigms proven successful to exploit parallel & distributed systems
- Models: programming models to exploit parallel & distributed systems

Methodology
- Analysis: look for possibilities to apply known techniques/patterns, figure out performances
- Implementation: pick proper tools/mechanisms/models; if needed, build your own ad-hoc tools
- Debugging/Tuning: rely on application structure
- Porting: rely on tools