Distributed systems: paradigms and models. Motivations. Prof. Marco Danelutto, Dept. of Computer Science, University of Pisa. Master's Degree (Laurea Magistrale) in Computer Science and Networking, Academic Year 2009-2010
Contents Hardware motivations: CPU evolution, HPC, clouds. Software motivations: innovative paradigms that can be moved to different frameworks. 2
Moore's law Moore's original statement can be found in his publication "Cramming more components onto integrated circuits", Electronics Magazine, 19 April 1965: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer." 3
Moore's law evolution Transistors/gates doubling every two years: more and more powerful single-processor systems. Cores doubling every two years: simpler cores, more complex (?) memory hierarchy, more complex interconnection structure. 4
Why? Doubling the cores exploits existing technology (and trends) while keeping power consumption reasonable: doubling the frequency of a single-core chip costs much more than putting two simpler cores on the same chip. Perf = Freq × IPC; Power = DynamicCapacitance × Volt² × Freq (http://download.intel.com/technology/architecture/new_architecture_06.pdf) 5
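A back-of-the-envelope check of this tradeoff (not on the original slide; it assumes, as a common first-order approximation, that supply voltage must scale roughly linearly with frequency):

    Power = C × V² × f,  with V ∝ f  ⇒  Power ∝ f³
    one core at 2f:  Perf × 2,  Power × 8
    two cores at f:  Perf ≈ × 2 (on parallelizable code),  Power ≈ × 2

Under this assumption, doubling the cores buys roughly the same nominal performance as doubling the frequency, at about a quarter of the power.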
Commodity processors http://www.edumax.com/assets/images/hardware_files/image010.jpg 6
Intel perspective 7
Intel perspective (2) 8
Commodity processors: non Intel http://www.sun.com/processors/ultrasparc-t2 9
Commodity processors: niche products http://www.tilera.com/products/tile64.php 10
Research processors: Intel 80 cores http://techresearch.intel.com/articles/tera-scale/1449.htm 11
More in detail... 4 GHz chip with a (logical and physical) 10×8 mesh of cores; FP peak: 1.28 TFLOPS. Each tile: a router (addresses each core on the chip, implements the mesh) plus a VLIW processor (96 bits per instruction, up to 8 ops per cycle, in-order execution, 32 registers (6 read / 4 write), 2 KB data cache, 3 KB instruction cache, 2 FPUs (9 stages, 2 FLOPs/cycle sustained)). Latencies in cycles: FPU: 9, Ld/St: 2, Snd/Rcv: 2, Jmp/Br: 1. 12
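A quick sanity check of the quoted peak figure, using only the numbers on the slide:

    80 tiles × 2 FPUs/tile × 2 FLOPs/cycle × 4 GHz
      = 80 × 2 × 2 × 4 × 10⁹ FLOP/s = 1.28 × 10¹² FLOP/s = 1.28 TFLOPS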
GPUs / FPGAs 17
Intel Larrabee http://download.intel.com/technology/architecture-silicon/siggraph_larrabee_paper.pdf 18
Not only processors: FPGAs http://www.fpgajournal.com/whitepapers_2008/q1_embedded_xilinx.htm 19
Consequence: programming model Heterogeneous computing is coming to the scene: more and more adaptivity required in the code, more and more special-purpose solutions needed (transparent to the user). 20
Energy concerns/tradeoffs http://img.tomshardware.com/us/2007/05/29/chart_energy_cost_full_load.png http://nicolask.files.wordpress.com/2009/05/intel-processors.jpg 21
Energy concerns/tradeoffs 22
Consequence: programming model Faster single-core systems: faster "dusty deck" code. Multi-/many-core systems require parallel / distributed code (UMA, NUMA). 23
But... Amdahl's law is still there: serial fraction f (% of code not parallelizable); p processors available to parallelize the non-serial fraction (1-f). Speedup(p) = Ts / (f·Ts + (1-f)·Ts/p) = 1 / (f + (1-f)/p); asymptotically (as p increases): Speedup(p) → 1/f. 24
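A quick numerical illustration (a sketch added here, not on the original slide): with f = 0.1, no number of processors pushes the speedup past 1/f = 10.

    # Amdahl's law: serial fraction f runs sequentially,
    # the parallel fraction (1 - f) is spread over p processors
    def speedup(f, p):
        return 1.0 / (f + (1.0 - f) / p)

    for p in (1, 2, 4, 16, 256, 65536):
        print(p, round(speedup(0.1, p), 3))
    # prints roughly: 1 1.0, 2 1.818, 4 3.077, 16 6.4, 256 9.66, 65536 9.999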
HPC evolution: www.top500.org Twice per year the top 500 installations are ranked on standard benchmarks; mostly installations from government, military, education, companies. Significantly reflects tendencies: a kind of Formula 1 of the parallel computing scenario, e.g. interconnection networks later scaled down to small COWs/NOWs. 26
Top 500: processor family 27
Top 500: operating system 28
Top 500: Interconnection network 29
Top 500: number of processors 30
Moore's law in HPC The Sourcebook of Parallel Computing, Dongarra, Foster, Fox, Gropp, Kennedy, Torczon, White (editors), 2003 31
Consequence: programming model Top-end parallel computing is moving towards COWs/NOWs with smaller and smaller latencies and larger and larger bandwidth. 32
Evolution in the user model Single processor: standard, superpipelined, superscalar. Multiprocessor ('70s-'80s); multi-/many-core ('00s). NOW/COW ('80s-'90s): distributed architecture, SSI. GRID (late '90s-'00s): meta-computing, grid (middleware). 33
Cloud 34
Cloud 35
Amazon cloud 36
Amazon cloud 37
Amazon cloud 38
Consequences More and more general virtualization of the architecture (host, network, operating system, ...). Need to adapt to the unknown: heterogeneity in hw resources (computing, networking). 39
Software evolution Innovative concepts: algorithmic skeletons, design patterns, coordination/orchestration patterns/constructs; all introduce efficiency/programmability/... at the price of limitations to programmer freedom. Software components: extreme modular programming (interoperability, commodity and legacy code, portability w.r.t. the framework). Services: full decoupling of usage and implementation. 40
Software evolution: structured programming Skeletons: mostly from the HPC community. Design patterns: mostly from the sw engineering community. Different approaches: language/library vs. programming methodology. Different impact: successfully being moved to grids (clouds?) and distributed architectures in general. 41
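To make the skeleton idea concrete, here is a minimal sketch of a task-farm skeleton (illustrative code, not from the course material; it assumes Python's standard multiprocessing module). The programmer supplies only the sequential worker function; the skeleton fixes the parallel structure:

    from multiprocessing import Pool

    def farm(worker, tasks, nw=4):
        # task-farm skeleton: a pool of nw identical workers
        # consumes the task stream; the structure is fixed,
        # only the 'worker' parameter varies
        with Pool(nw) as pool:
            return pool.map(worker, tasks)

    def square(x):
        # the only code the application programmer writes
        return x * x

    if __name__ == "__main__":
        print(farm(square, range(10)))  # [0, 1, 4, ..., 81]

The same farm could be re-implemented on top of a cluster or grid runtime without touching square, which is exactly the portability argument made above.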
Software evolution: components and services Components: mainly from the sw engineering community (with HPC influences). Services: mainly from the business/end user community. Different approaches: recently merged into a common framework (SCA, by IBM et al.). Different impact: SOA is everywhere (SaaS/SOA, IaaS clouds, ...). 42
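Similarly, the "full decoupling of usage and implementation" claim can be sketched as programming against a contract rather than an implementation (illustrative Python, not from the course material; Sorter, LocalSorter and client are hypothetical names):

    from abc import ABC, abstractmethod

    class Sorter(ABC):
        # the service contract: clients depend on this interface only
        @abstractmethod
        def sort(self, xs: list) -> list: ...

    class LocalSorter(Sorter):
        # one possible implementation; a RemoteSorter could instead
        # forward the call to a SOAP/REST endpoint, with no client change
        def sort(self, xs: list) -> list:
            return sorted(xs)

    def client(service: Sorter) -> list:
        return service.sort([3, 1, 2])

    print(client(LocalSorter()))  # [1, 2, 3]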
Parallel vs. distributed computing McDaniel, George, ed., IBM Dictionary of Computing, New York, NY: McGraw-Hill, Inc., 1994. Parallel computing: "a computer system in which interconnected processors perform concurrent or simultaneous execution of two or more processes". Institute of Electrical and Electronics Engineers, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, New York, NY, 1990. Distributed computing: "a computer system in which several interconnected computers share the computing tasks assigned to the system". Tanenbaum, Distributed Systems: Principles and Paradigms, 2nd edition, 2006. Distributed system: "a collection of independent computers presenting to the user a single, coherent system image". 43
Distributed vs. parallel computing 44
Distributed vs. parallel computing Distributed computing Parallel computing 45
Have a look at standard books' indexes... Tanenbaum, Van Steen, Distributed Systems: Principles and Paradigms, 2nd edition, 2006: Introduction, Architectures, Processes, Communication, Naming, Synchronization, Consistency & replication, Fault tolerance, Security, OO distributed systems, Distributed file systems, Web distributed systems, Coordination-based systems. Kshemkalyani, Singhal, Distributed Computing: Principles, Algorithms, and Systems, 2008: Introduction, A model of distributed computations, Logical time, Global state and snapshot recording algorithms, Terminology and basic algorithms, Message ordering and group communication, Termination detection, Reasoning with knowledge, Distributed mutual exclusion algorithms, Deadlock detection in distributed systems, Global predicate detection, Distributed shared memory, Checkpointing and rollback recovery, Consensus and agreement algorithms, Failure detectors, Authentication in distributed systems, Self-stabilization, Peer-to-peer computing and overlay graphs. Distributed computing 46
books... Grama, Gupta, Karypis, Kumar, Introduction to Parallel Computing, 2nd edition, 2003: Introduction, Parallel programming platforms, Principles of parallel algorithm design, Basic communication operations, Analytical models of parallel programs, Programming using the message passing paradigm, Programming shared address space platforms, Dense matrix algorithms, Sorting, Graph algorithms, Search algorithms for discrete optimization problems, Dynamic programming, Fast Fourier Transform, Appendix: Complexity functions and order analysis. Wilkinson, Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, 2nd edition, 2005: PART I: BASIC TECHNIQUES: Parallel computers, Message-passing computing, Embarrassingly parallel computations, Partitioning and divide-and-conquer strategies, Pipelined computations, Synchronous computations, Load balancing and termination detection, Programming with shared memory, Distributed shared memory systems and programming. PART II: ALGORITHMS AND APPLICATIONS: Sorting algorithms, Numerical algorithms, Image processing, Searching and optimization. APPENDIXES: Basic MPI routines, Basic Pthread routines, OpenMP directives, library functions and environment variables. Parallel computing 47
Distributed systems: paradigms and models Distributed: a kind of summary word for distributed & parallel. Systems: systems as a whole: hardware + software. Paradigms: sample paradigms proven successful in exploiting parallel & distributed systems. Models: programming models to exploit parallel & distributed systems. 48
Methodology Analysis: look for possibilities to apply known techniques/patterns; figure out performance. Implementation: pick the proper tools/mechanisms/models; if needed, build your own ad-hoc tools. Debugging/Tuning: rely on the application structure. Porting: rely on tools. 49