
Speedup. Thoai Nam. Khoa Khoa học và Kỹ thuật Máy tính - ĐHBK TP.HCM (Faculty of Computer Science and Engineering, HCMUT)

Outline: Speedup & Efficiency; Amdahl's Law; Gustafson's Law; Sun & Ni's Law

Speedup & Efficiency
Speedup: S = Time(the most efficient sequential algorithm) / Time(parallel algorithm)
Efficiency: E = S / N, where N is the number of processors
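A minimal sketch of these two definitions in Python; the timings and processor count below are hypothetical, purely for illustration:

```python
def speedup(t_seq_best, t_par):
    """S = Time(best sequential algorithm) / Time(parallel algorithm)."""
    return t_seq_best / t_par

def efficiency(s, n):
    """E = S / N, where N is the number of processors."""
    return s / n

# Hypothetical measurements: the best sequential run takes 100 s,
# the parallel version on N = 4 processors takes 30 s.
S = speedup(100.0, 30.0)   # about 3.33
E = efficiency(S, 4)       # about 0.83
print(f"S = {S:.2f}, E = {E:.2f}")
```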

Amdahl's Law: Fixed Problem Size (1)
The main objective is to produce the results as soon as possible, e.g. video compression, computer graphics, VLSI routing, etc.
Implications: the upper bound on speedup is 1/α (where α is the sequential fraction), so make the sequential bottleneck as small as possible and optimize the common case.
A modified Amdahl's law for a fixed problem size also includes the overhead.

Amdahl's Law: Fixed Problem Size (2)
[Figure: a fixed-size workload split into a sequential part T_s and a parallel part T_p; on N processors the parallel part is divided among P0 ... P9 while the sequential part is unchanged.]
T_s = αT(1), T_p = (1-α)T(1)
T(N) = αT(1) + (1-α)T(1)/N
N = number of processors

Amdahl's Law: Fixed Problem Size (3)
Speedup = Time(1) / Time(N) = T(1) / [αT(1) + (1-α)T(1)/N] = 1 / [α + (1-α)/N] → 1/α as N → ∞
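As a quick illustration of this formula, here is a small Python sketch; the sequential fraction α = 0.1 is a hypothetical value:

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law for a fixed problem size:
    S(N) = 1 / (alpha + (1 - alpha) / N), bounded above by 1 / alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

alpha = 0.1  # hypothetical sequential fraction
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(alpha, n), 2))
# The speedup approaches the upper bound 1 / alpha = 10 as N grows.
```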

Enhanced Amdahl's Law
The overhead includes parallelism and interaction overheads.
Speedup = T(1) / [αT(1) + (1-α)T(1)/N + T_overhead] → 1 / [α + T_overhead/T(1)] as N → ∞
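The overhead term can be added to the same sketch; T(1) and T_overhead below are hypothetical values:

```python
def amdahl_speedup_with_overhead(alpha, n, t1, t_overhead):
    """Enhanced Amdahl's law:
    S(N) = T(1) / (alpha*T(1) + (1 - alpha)*T(1)/N + T_overhead)."""
    return t1 / (alpha * t1 + (1.0 - alpha) * t1 / n + t_overhead)

t1 = 100.0        # hypothetical single-processor time
t_overhead = 5.0  # hypothetical parallelism + interaction overhead
print(round(amdahl_speedup_with_overhead(0.1, 1024, t1, t_overhead), 2))
# The limit is now 1 / (alpha + T_overhead / T(1)) = 1 / 0.15, roughly 6.7,
# which is below the overhead-free bound 1 / alpha = 10.
```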

Gustafson's Law: Fixed Time (1)
The user wants more accurate results within a time limit: execution time is fixed as the system scales, e.g. FEM (finite element method) for structural analysis, FDM (finite difference method) for fluid dynamics.
Properties of a good work metric: easy to measure, architecture independent, easy to model with an analytical expression, no additional experiment needed to measure the work, and the measure of work should scale linearly with the sequential time complexity of the algorithm.
The time-constrained model seems to be the most generally viable model!

Gustafson's Law: Fixed Time (2)
[Figure: in the fixed-time model the wall-clock time is held constant as processors P0 ... P9 are added; the sequential part W_s stays the same while the parallel part grows.]
α = W_s / W(N)
W(N) = αW(N) + (1-α)W(N) (the scaled workload executed in the fixed time on N processors)
W(1) = αW(N) + (1-α)N·W(N) (the equivalent workload a single processor must execute to solve the same scaled problem)

Gustafson's Law: Fixed Time, without overhead
Time = Work · k (k is the execution time per unit of work), so
Speedup = T(1) / T(N) = [αW(N)·k + (1-α)N·W(N)·k] / [αW(N)·k + (1-α)W(N)·k] = α + (1-α)N

Gustafson's Law: Fixed Time, with overhead
With an overhead workload W_0, W(N) = W + W_0, and
Speedup = T(1) / T(N) = [αW·k + (1-α)N·W·k] / [αW·k + (1-α)W·k + W_0·k] = [α + (1-α)N] / (1 + W_0/W)
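A short sketch of the fixed-time speedup with and without the overhead term; the values of α and W_0/W are hypothetical:

```python
def gustafson_speedup(alpha, n):
    """Fixed-time (Gustafson) speedup without overhead: S(N) = alpha + (1 - alpha) * N."""
    return alpha + (1.0 - alpha) * n

def gustafson_speedup_with_overhead(alpha, n, w0_over_w):
    """Fixed-time speedup with overhead workload W0:
    S(N) = (alpha + (1 - alpha) * N) / (1 + W0 / W)."""
    return gustafson_speedup(alpha, n) / (1.0 + w0_over_w)

alpha = 0.1  # hypothetical sequential fraction
for n in (2, 8, 64):
    print(n,
          round(gustafson_speedup(alpha, n), 2),
          round(gustafson_speedup_with_overhead(alpha, n, 0.05), 2))
# Unlike the fixed-size (Amdahl) case, the speedup keeps growing roughly
# linearly in N for a fixed alpha.
```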

Sun and Ni's Law: Fixed Memory (1)
Scale to the largest possible solution limited by the memory space; or, equivalently, fix the memory usage per processor.
Speedup defined as Time(1)/Time(N) for the scaled-up problem is not appropriate.
For a simple profile, G(N) is the increase of the parallel workload as the memory capacity increases N times.

Sun and Ni's Law: Fixed Memory (2)
W = αW + (1-α)W
Let M be the memory capacity of a single node; with N nodes, the memory increases to N·M.
The scaled work is W* = αW + (1-α)G(N)W, so
Speedup_MC(N) = [αW + (1-α)G(N)W] / [αW + (1-α)G(N)W/N] = [α + (1-α)G(N)] / [α + (1-α)G(N)/N]

Sun and Ni's Law: Fixed Memory (3)
Definition: a function g is a homomorphism if there exists a function ḡ such that, for any real number c and variable x, g(c·x) = ḡ(c)·g(x).
Theorem: if W = g(M) for some homomorphism function g, g(c·x) = ḡ(c)·g(x), then, with all data being shared by all available processors, the simplified memory-bounded speedup is
S*_N = [αW + (1-α)ḡ(N)W] / [αW + (1-α)ḡ(N)W/N] = [α + (1-α)G(N)] / [α + (1-α)G(N)/N], where G(N) = ḡ(N).

Sun and Ni's Law: Fixed Memory (4)
Proof: let the memory requirement of the parallel portion W_N be M, i.e. W_N = g(M), where M is the memory requirement when 1 node is available. With N nodes available, the memory capacity increases to N·M. Using all of the available memory, the scaled parallel portion becomes W*_N = g(N·M) = ḡ(N)·g(M) = ḡ(N)·W_N, and therefore
S*_N = [W_1 + ḡ(N)·W_N] / [W_1 + ḡ(N)·W_N/N], where W_1 is the sequential portion.

Speedup: S*_N = [α + (1-α)G(N)] / [α + (1-α)G(N)/N]
When the problem size is independent of the system (the problem size is fixed), G(N) = 1 and the formula reduces to Amdahl's law. When the memory is increased N times and the workload also increases N times, G(N) = N and it reduces to Gustafson's law. For most scientific and engineering applications the computation requirement increases faster than the memory requirement, so G(N) > N.
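A small sketch of the memory-bounded speedup and its two special cases; α is a hypothetical sequential fraction and G is passed in as a function:

```python
def memory_bounded_speedup(alpha, n, G):
    """Sun and Ni's memory-bounded speedup:
    S*(N) = (alpha + (1 - alpha) * G(N)) / (alpha + (1 - alpha) * G(N) / N)."""
    g = G(n)
    return (alpha + (1.0 - alpha) * g) / (alpha + (1.0 - alpha) * g / n)

alpha, n = 0.1, 16
print(memory_bounded_speedup(alpha, n, lambda n: 1))         # G(N) = 1  -> reduces to Amdahl's law
print(memory_bounded_speedup(alpha, n, lambda n: n))         # G(N) = N  -> reduces to Gustafson's law
print(memory_bounded_speedup(alpha, n, lambda n: n ** 1.5))  # G(N) > N  -> hypothetical faster-growing workload
```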

Examples
[Figure: speedup versus number of processors (0 to 10), comparing linear speedup S(Linear) with a typical curve S(Normal).]

Scalability
Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down! This can be due to a poor choice of algorithm or to poor coding. The best possible speedup is linear, i.e. it is proportional to the number of processors: T(N) = T(1)/N, where N = number of processors and T(1) = time for a serial run. A code that continues to speed up reasonably close to linearly as the number of processors increases is said to be scalable. Many codes scale up to some number of processors, but adding more processors then brings no improvement. Very few, if any, codes are indefinitely scalable.
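A minimal sketch of how one might check scalability from measurements; the run times below are hypothetical:

```python
# Hypothetical measured wall-clock times, indexed by processor count.
measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 15.5}

t1 = measured[1]
for n, tn in sorted(measured.items()):
    s = t1 / tn  # speedup relative to the serial run
    e = s / n    # efficiency; 1.0 would be perfectly linear speedup
    print(f"N={n:2d}  S={s:5.2f}  (linear would be {n})  E={e:4.2f}")
# A sharp drop in efficiency (here at N = 16) marks the point where adding
# more processors no longer brings a worthwhile improvement.
```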

Factors That Limit Speedup
Software overhead: even with a completely equivalent algorithm, software overhead arises in the concurrent implementation (e.g. there may be additional index calculations necessitated by the manner in which data are "split up" among processors); i.e., there are generally more lines of code to be executed in the parallel program than in the sequential program.
Load balancing
Communication overhead