16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model
|
|
- Briana Park
- 5 years ago
- Views:
Transcription
1 Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel sum and granularity control,» Prefix computation and Divide & Conquer,» Addition of two n-bits integers and Redundancy Parallele Programmierung 1 Parallele Programmierung 2 What do you program? What do you expect from a model? Complexity = metrics to evaluate the quality of your program. It depends on: The model of the algorithm and of the program:» Data, computation, memory usage, for instance. The model of the host machine:» Processor(s), memory hierarchy, network...? In the sequential case, everything s fine! Von Neumann model fetch & run cycles. Turing machine... A very SIMPLE model enables the correct prediction and categorization of any algorithm. Parallele Programmierung 3 A model has to be: extensive:» Many parameters in general, it ends up in a complx system.» These parameters reflect the program/machine. Abstract» i.e. generic» You do not want to change your model each 18 months (see Morore s law)» You want the general trend, not the details. Predictive» So it must lead to something you can calculate on.» (It does not mean that it must be analytical) Parallele Programmierung 4 In the parallel world... There is no universal model. There are many models For each type of machine, and many types of programs. You have no simple reduction from a model for distributed memory machine to a model for shared memory machine. Because of the network model... Most theoretical models have been obtained with shared memory models. Much simples, less parameters. Scalability limited! Parallele Programmierung 5 Basic ideas for the machine model Disconsider the communication (PRAM) Adapted for shared memory machines, multicore chips... Consider a machine as being homegeneous, static, perfectly interconnected, with zero latency, and a fixed time for message passing (delay model). Consider a homogeneous, static machine, with a network that has latency and/or a given bandwith (LogP) Okay for a cluster Considerar a dynamic, heterogeneous machine (Grid)... No one knows hot to model this. Parallele Programmierung 6 1
2 Parallel program model Parallel program model How do you describe a parallel program? Task parallelism: The program is made of tasks (sequential unit); The tasks are (partially) ordered by dependencies A correct execution if a (possibly parallel) schedule which respects the dependencies. Very close to functionnal programming. The dependencies can be explicit (e.g.: depend on messages) or implicit (e.g.: arguments). Trivial case: no dependencies ( embarassingly parallel (EP), parameter sweeping, task farm, master/slave, client/server, bag of tasks,... More complex case: Divide & Conquer. The dependencies order the tasks in a tree-like way. Data Parallelism Distribute the data, and each process/thread computes on its local data.» Owner Compute Rule Single Program Multiple Data Loop parallelism Comes from the Compiler world Just tell which iterations of a loop can be performed in parallel. Templates for parallel programming Provides skeletons (frameworks). Parallele Programmierung 7 Parallele Programmierung 8 Programming Model vs. Machine Model Performance evaluation RMI/RPC Communicating processes Threads Programing model OpenMP Posix Threads Cilk Java PVM/MPI Satin A parallel program is intrinsically non-deterministic The order of execution of the tass may change from execution to execution The network (if any) adds its part of random. You are interested in runtime. The usual argument I compiled it, therefore the program is okay does not serve at all! It is mandatory to use statistical measurements: Dist. memory Machine model shared memory At least: x runs (x=10,20, 30...), and mean, min. and max. Runtime (or speedup, or efficiency) indicated. Better: x runs, mean and standard deviation» If the standard dev. is high, run it more or ask yourself if there is something wrong... Event better: x runs, confidence interval about the mean. Parallele Programmierung 9 Parallele Programmierung 10 The PRAM model Considerations: time, space and work A PRAM machine is a set of processors, All are equal and only distinguished by an id. All can access in constant time whatever address of a global, shared memory. The processors execute their instructions synchronously. You can use as many processor as you want. Metrics: executing a parallel program with entry of size n, on a PRAM machine, is characterized by two quantities: The parallel runtime T par (n) The number of processors required to this execution P(n) The quality of the PRAM execution is also measured by W p (n) (Work), the total number of instructions. W p (n) T par (n) * P(n) Parallele Programmierung 11 T par (n) is the runtime. Proportional to the runtime of a single instruction. What is important is the order of magnitude:» O(n), O(log n), O( nlog n), θ(n)... P(n) is the number of processors. In the PRAM model, 1 processor = 1 process. P(n) can also be considered as a measure of the (memory) space that is required. C(n) = T par (n) x P(n) is the (parallel) cost. Look at it as a rectangular area. W p (n) is the Work A sub-area of the rectangle. As close as possible as the sequential program. T par (n) Parallele Programmierung 12 P(n) 2
3 Optimal PRAM algorithm Brent s Principle 1. T par (n) as small as possible Maybe you will have to use many processors! What is small?» T par (n) = θ(log n) 2. P(n) not too big. Polynomial in n. 3. Do not perform (many) more instructions in parallel than in sequential. i.e. C(n) = θ(w p (n)) = θ(w 1 (n)) = θ(t 1 (n)) Na optimal PRAM algorithm will usually require P(n) processors, much more than a fixed, constant number p that is physically available ( p vs. n ). So how do you map a PRAM algorithm to a small number of processors? Easy: just adjust the rectangle Each physical processor i=1...p sequentially runs more than one instructions of each PRAM proc. Of course, the runtime increases. How much? Brent s principle (emulation): T par (n) T p (n) W p (n) p + T par (n) P(n) Parallele Programmierung 13 Parallele Programmierung 14 Great, but what does it mean? 1st PRAM algorithm: sum of n elements Take an optimal PRAM algorithm Very parallel, runtime much smaller than seq. and almost as few instr. as in the sequential case. Formally, T par (n) = θ(log n) and W p (n) = θ(t 1 (n)) Input: an array of n = 2 k elements and an associative, commutative operator +. Output: the sum of the n elements. T 1 (n) = n-1 /* I do hope this is obvious for everyone... */ Parallel algorithm: binary tree. Therefore, you can always run it on a fixed, small number of processors p, with runtime: T p (n) T 1 (n) p + log(n) So, when T 1 (n) >> log(n) (which is always the case...), you end up with an almost linear speedup. Nice! Parallele Programmierung 15 Parallele Programmierung 16 PRAM complexity of the parallel sum The optimal parallel sum algorithm 1. T par (n) = θ(log n) 2. P(n) = n/2 3. W(n) = n/2 + n/4 + n/ = n-1 = T 1 (n), great. 4. So what s wrong? C(n) = P(n) * T par (n) = θ(n.log n) >> θ(t 1 (n)) Parallele Programmierung 17 The problem comes from a very fine-grained algorithm. The basic operation is a single + operation. Other version of the (same) problem: we use a little bit more processors than what we want. If we could use P(n) = n/log(n), with T par (n) = θ(log n), then the algorithm would be optimal. Solution: increase the granularity, or (same thing) use Brent s principle. Take p = n/log(n) processors, each one running more than one of the basic + instructions of previous algorithm. Each processor will run log(n) + instructions. This idea can also be seen as a sequential degeneration of the parallel algorithm. To be efficient in parallel, run in sequential! A similar technique is used in sequential algorithmics (see quicksort) Parallele Programmierung 18 3
4 The optimal parallel sum algorithm Parallel Prefix Sequential phase Parallel phase Input: an array of n = 2 k elements and an associative, commutative operator +. Output: the n partial sums of the i first elements, i=1..n. T 1 (n) = n-1 /* I do hope this is obvious for everyone... */ result[1] = a[1] /* the 1st element of the input array */ for (i=2 ; i <= n ; i++) result[i] = result[i-1] + a[i] This seems highly sequential! Let us revisit the computation, using Divide & Conquer This is an IMPORTANT (although simple) concept. Parallele Programmierung 19 Parallele Programmierung 20 Prefix the D&C version Prefix the D&C version a 1 a n/2 a n/2+1 a n a 1 a n/2 a n/2+1 a n a n/2+1 a n/2+1 a n/2+1 +a n/2+2 a n/2+1+ +a n/ a n a n/2+1 +a n/2+2 a n/2+1+ +a n/ a n Divide phase Divide phase Conquer phase Parallele Programmierung 21 Parallele Programmierung 22 PRAM complexity of the D&C parallel Sum of two n-bits numbers. 1. T par (n) = T par (n/2) + 1 =... = θ(log n) 2. P(n) = Max{ 2P(n/2) ; n/2 } =... = n 3. C(n) = P(n) * T par (n) = θ(n.log n) >> θ(t 1 (n)) 4. Apply Brent or increase the granularity and you will get an optimal version. T par (n) = θ(log n), P(n) = n/log(n). Parallele Programmierung 23 Input: 2 binary numbers a and b of n = 2 k bits. Output: the n+1 bits number equal to a+b. T 1 (n) = n, with the algorithm that is learned at elementary school (sum the digits column by column, from right to left, with the carry). r i-1 =b i-1 +b i-1 a 0 b 0 = r n r 0 Simple and nice, but highly sequential again! You have to propagate the carry from right to left, and can not compute the i-th bit without the carry. Parallele Programmierung 24 Carry c i = 0 or 1, depending on r i-1 4
5 Time D&C sum of two binary numbers n/2 heavy weight bits n/2 lightweight bits a 0 b 0 D&C sum with redundancy Time n/2 heavy weight bits, 2 parallel computations n/2 lightweight bits +c n/2 = 0 +c n/2 = 1 a 0 b 0 r n/2-1 r 0 = r n r n/2 = r n r n/2 r n/2-1 r 0 Carry c n/2 = 0 or 1, depending on r n/2-1 Ok... You divide the computation in 2 halves... The conquer phase is trivial... Heavy bits of r are here YES c n/2 == 0? Carry c n/2 = 0 or 1, depending on r n/2-1 = r n r n/2 BUT YOU STILL HAVE SEQUENTIAL DEPENDENCY! NO Heavy bits of r are here Parallele Programmierung 25 Parallele Programmierung 26 PRAM complexity of D&C sum with redundancy Conclusion about PRAM 1. T par (n) = T par (n/2) + 1 =... = θ(log n) 2. P(n) = Max{ 3P(n/2) ; 1 } =... = θ(n log 2 (3) ) = θ(n 1.58 ) 3. C(n) = P(n) * T par (n) = θ(n 1.58.log n) >> θ(t 1 (n)) 4. The optimal algorithm is not obvious to obtain! A simple, but powerful model Quantifies the runtime and the processor number. Evaluates the parallel number of operations. Provides complexity classes (NC). An unrealistic model? Use as many processors as you want» This is an aproximation Brent resolves the problem. Homogeneous architecture» Ok... Uniform time for all the memory accesses» This is a real problem! What if there is some network activity in some place? Some say that the new area of shamred-memory systems (multicore) give a new force to PRAM. Parallele Programmierung 27 Parallele Programmierung 28 5
EE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline
More informationLesson 1 1 Introduction
Lesson 1 1 Introduction The Multithreaded DAG Model DAG = Directed Acyclic Graph : a collection of vertices and directed edges (lines with arrows). Each edge connects two vertices. The final result of
More informationThe PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.
The PRAM (Parallel Random Access Memory) model All processors operate synchronously under the control of a common CPU. The PRAM (Parallel Random Access Memory) model All processors operate synchronously
More informationAnalytical Modeling of Parallel Programs
Analytical Modeling of Parallel Programs Alexandre David Introduction to Parallel Computing 1 Topic overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of
More informationScan Primitives for GPU Computing
Scan Primitives for GPU Computing Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 2 Prefix
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationWorkloads Programmierung Paralleler und Verteilter Systeme (PPV)
Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationKismet: Parallel Speedup Estimates for Serial Programs
Kismet: Parallel Speedup Estimates for Serial Programs Donghwan Jeon, Saturnino Garcia, Chris Louie, and Michael Bedford Taylor Computer Science and Engineering University of California, San Diego 1 Questions
More informationFundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.
Fundamentals of Parallel Computing Sanjay Razdan Alpha Science International Ltd. Oxford, U.K. CONTENTS Preface Acknowledgements vii ix 1. Introduction to Parallel Computing 1.1-1.37 1.1 Parallel Computing
More informationThe PRAM Model. Alexandre David
The PRAM Model Alexandre David 1.2.05 1 Outline Introduction to Parallel Algorithms (Sven Skyum) PRAM model Optimality Examples 11-02-2008 Alexandre David, MVP'08 2 2 Standard RAM Model Standard Random
More informationSorting (Chapter 9) Alexandre David B2-206
Sorting (Chapter 9) Alexandre David B2-206 1 Sorting Problem Arrange an unordered collection of elements into monotonically increasing (or decreasing) order. Let S = . Sort S into S =
More informationTHE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS
Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it! Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Memory Computer Technology
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationSorting (Chapter 9) Alexandre David B2-206
Sorting (Chapter 9) Alexandre David B2-206 Sorting Problem Arrange an unordered collection of elements into monotonically increasing (or decreasing) order. Let S = . Sort S into S =
More informationMoore s Law. Computer architect goal Software developer assumption
Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer
More informationCSC630/CSC730 Parallel & Distributed Computing
CSC630/CSC730 Parallel & Distributed Computing Analytical Modeling of Parallel Programs Chapter 5 1 Contents Sources of Parallel Overhead Performance Metrics Granularity and Data Mapping Scalability 2
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationParallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting
Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. November 2014 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks
More informationA Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 2 Analysis of Fork-Join Parallel Programs
A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 2 Analysis of Fork-Join Parallel Programs Steve Wolfman, based on work by Dan Grossman (with small tweaks by Alan Hu) Learning
More informationParallel Computing Introduction
Parallel Computing Introduction Bedřich Beneš, Ph.D. Associate Professor Department of Computer Graphics Purdue University von Neumann computer architecture CPU Hard disk Network Bus Memory GPU I/O devices
More informationParallelism. CS6787 Lecture 8 Fall 2017
Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does
More informationParallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting
Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. Fall 2017 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks
More informationLecture 3: Sorting 1
Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:
More informationParallel Functional Programming Lecture 1. John Hughes
Parallel Functional Programming Lecture 1 John Hughes Moore s Law (1965) The number of transistors per chip increases by a factor of two every year two years (1975) Number of transistors What shall we
More informationCSCI 104 Log Structured Merge Trees. Mark Redekopp
1 CSCI 10 Log Structured Merge Trees Mark Redekopp Series Summation Review Let n = 1 + + + + k = σk i=0 n = k+1-1 i. What is n? What is log (1) + log () + log () + log (8)++ log ( k ) = 0 + 1 + + 3+ +
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Linda Woodard CAC 19 May 2010 Introduction to Parallel Computing on Ranger 5/18/2010 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor
More informationShared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation
Shared Memory and Distributed Multiprocessing Bhanu Kapoor, Ph.D. The Saylor Foundation 1 Issue with Parallelism Parallel software is the problem Need to get significant performance improvement Otherwise,
More informationParallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?
Parallel Programming Concepts Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04 Parallel Background Why Bother? 1 What is Parallel Programming? Simultaneous use of multiple
More informationPROGRAM EFFICIENCY & COMPLEXITY ANALYSIS
Lecture 03-04 PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS By: Dr. Zahoor Jan 1 ALGORITHM DEFINITION A finite set of statements that guarantees an optimal solution in finite interval of time 2 GOOD ALGORITHMS?
More informationDesign and Analysis of Parallel Programs. Parallel Cost Models. Outline. Foster s Method for Design of Parallel Programs ( PCAM )
Outline Design and Analysis of Parallel Programs TDDD93 Lecture 3-4 Christoph Kessler PELAB / IDA Linköping university Sweden 2017 Design and analysis of parallel algorithms Foster s PCAM method for the
More informationLecture 8 Parallel Algorithms II
Lecture 8 Parallel Algorithms II Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Original slides from Introduction to Parallel
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationExample of a Parallel Algorithm
-1- Part II Example of a Parallel Algorithm Sieve of Eratosthenes -2- -3- -4- -5- -6- -7- MIMD Advantages Suitable for general-purpose application. Higher flexibility. With the correct hardware and software
More informationSCALABILITY ANALYSIS
SCALABILITY ANALYSIS PERFORMANCE AND SCALABILITY OF PARALLEL SYSTEMS Evaluation Sequential: runtime (execution time) Ts =T (InputSize) Parallel: runtime (start-->last PE ends) Tp =T (InputSize,p,architecture)
More informationClaude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique
Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES
More informationCSE Introduction to Parallel Processing. Chapter 5. PRAM and Basic Algorithms
Dr Izadi CSE-40533 Introduction to Parallel Processing Chapter 5 PRAM and Basic Algorithms Define PRAM and its various submodels Show PRAM to be a natural extension of the sequential computer (RAM) Develop
More informationCS575: Parallel Processing Sanjay Rajopadhye CSU. Course Topics
CS575: Parallel Processing Sanjay Rajopadhye CSU Lecture 2: Parallel Computer Models CS575 lecture 1 Course Topics Introduction, background Complexity, orders of magnitude, recurrences Models of parallel
More informationMergesort again. 1. Split the list into two equal parts
Quicksort Mergesort again 1. Split the list into two equal parts 5 3 9 2 8 7 3 2 1 4 5 3 9 2 8 7 3 2 1 4 Mergesort again 2. Recursively mergesort the two parts 5 3 9 2 8 7 3 2 1 4 2 3 5 8 9 1 2 3 4 7 Mergesort
More informationParallel Random-Access Machines
Parallel Random-Access Machines Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS3101 (Moreno Maza) Parallel Random-Access Machines CS3101 1 / 69 Plan 1 The PRAM Model 2 Performance
More informationPrinciples of Algorithm Design
Principles of Algorithm Design When you are trying to design an algorithm or a data structure, it s often hard to see how to accomplish the task. The following techniques can often be useful: 1. Experiment
More informationIntroduction to High-Performance Computing
Introduction to High-Performance Computing Simon D. Levy BIOL 274 17 November 2010 Chapter 12 12.1: Concurrent Processing High-Performance Computing A fancy term for computers significantly faster than
More information«Computer Science» Requirements for applicants by Innopolis University
«Computer Science» Requirements for applicants by Innopolis University Contents Architecture and Organization... 2 Digital Logic and Digital Systems... 2 Machine Level Representation of Data... 2 Assembly
More informationCOMP 633 Parallel Computing.
COMP 633 Parallel Computing http://www.cs.unc.edu/~prins/classes/633/ Parallel computing What is it? multiple processors cooperating to solve a single problem hopefully faster than a single processor!
More informationAlgorithm Analysis. College of Computing & Information Technology King Abdulaziz University. CPCS-204 Data Structures I
Algorithm Analysis College of Computing & Information Technology King Abdulaziz University CPCS-204 Data Structures I Order Analysis Judging the Efficiency/Speed of an Algorithm Thus far, we ve looked
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationChapter 13 Strong Scaling
Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables
More informationCray XE6 Performance Workshop
Cray XE6 erformance Workshop odern HC Architectures David Henty d.henty@epcc.ed.ac.uk ECC, University of Edinburgh Overview Components History Flynn s Taxonomy SID ID Classification via emory Distributed
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationParallel Programming Concepts. Parallel Algorithms. Peter Tröger
Parallel Programming Concepts Parallel Algorithms Peter Tröger Sources: Ian Foster. Designing and Building Parallel Programs. Addison-Wesley. 1995. Mattson, Timothy G.; S, Beverly A.; ers,; Massingill,
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More information10th August Part One: Introduction to Parallel Computing
Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer
More informationAn Introduction to Parallel Programming
An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe
More informationIntroduction. CSCI 4850/5850 High-Performance Computing Spring 2018
Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationDESIGN AND OVERHEAD ANALYSIS OF WORKFLOWS IN GRID
I J D M S C L Volume 6, o. 1, January-June 2015 DESIG AD OVERHEAD AALYSIS OF WORKFLOWS I GRID S. JAMUA 1, K. REKHA 2, AD R. KAHAVEL 3 ABSRAC Grid workflow execution is approached as a pure best effort
More informationMultithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa
CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part I Lecture 4, Jan 25, 2012 Majd F. Sakr and Mohammad Hammoud Today Last 3 sessions Administrivia and Introduction to Cloud Computing Introduction to Cloud
More informationCSCE 626 Experimental Evaluation.
CSCE 626 Experimental Evaluation http://parasol.tamu.edu Introduction This lecture discusses how to properly design an experimental setup, measure and analyze the performance of parallel algorithms you
More informationIntroduction II. Overview
Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and
More informationPrinciples of Parallel Algorithm Design: Concurrency and Decomposition
Principles of Parallel Algorithm Design: Concurrency and Decomposition John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 2 12 January 2017 Parallel
More informationAnalytical Modeling of Parallel Systems. To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003.
Analytical Modeling of Parallel Systems To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview Sources of Overhead in Parallel Programs Performance Metrics for
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationLet s say I give you a homework assignment today with 100 problems. Each problem takes 2 hours to solve. The homework is due tomorrow.
Let s say I give you a homework assignment today with 100 problems. Each problem takes 2 hours to solve. The homework is due tomorrow. Big problems and Very Big problems in Science How do we live Protein
More informationSHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008
SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem
More informationParallel Partition Revisited
Parallel Partition Revisited Leonor Frias and Jordi Petit Dep. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya WEA 2008 Overview Partitioning an array with respect to a pivot
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally
More informationCS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and
CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims Lecture 10: Asymptotic Complexity and What Makes a Good Algorithm? Suppose you have two possible algorithms or
More informationComplexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2
Complexity and Advanced Algorithms Monsoon 2011 Parallel Algorithms Lecture 2 Trivia ISRO has a new supercomputer rated at 220 Tflops Can be extended to Pflops. Consumes only 150 KW of power. LINPACK is
More informationThe typical speedup curve - fixed problem size
Performance analysis Goals are 1. to be able to understand better why your program has the performance it has, and 2. what could be preventing its performance from being better. The typical speedup curve
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationUNIT 1 ANALYSIS OF ALGORITHMS
UNIT 1 ANALYSIS OF ALGORITHMS Analysis of Algorithms Structure Page Nos. 1.0 Introduction 7 1.1 Objectives 7 1.2 Mathematical Background 8 1.3 Process of Analysis 12 1.4 Calculation of Storage Complexity
More informationOptimize HPC - Application Efficiency on Many Core Systems
Meet the experts Optimize HPC - Application Efficiency on Many Core Systems 2018 Arm Limited Florent Lebeau 27 March 2018 2 2018 Arm Limited Speedup Multithreading and scalability I wrote my program to
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationTheory and Frontiers of Computer Science. Fall 2013 Carola Wenk
Theory and Frontiers of Computer Science Fall 2013 Carola Wenk We have seen so far Computer Architecture and Digital Logic (Von Neumann Architecture, binary numbers, circuits) Introduction to Python (if,
More informationDistributed Systems CS /640
Distributed Systems CS 15-440/640 Programming Models Borrowed and adapted from our good friends at CMU-Doha, Qatar Majd F. Sakr, Mohammad Hammoud andvinay Kolar 1 Objectives Discussion on Programming Models
More informationCSE 4500 (228) Fall 2010 Selected Notes Set 2
CSE 4500 (228) Fall 2010 Selected Notes Set 2 Alexander A. Shvartsman Computer Science and Engineering University of Connecticut Copyright c 2002-2010 by Alexander A. Shvartsman. All rights reserved. 2
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance
More informationCSE373: Data Structures & Algorithms Lecture 22: Parallel Reductions, Maps, and Algorithm Analysis. Kevin Quinn Fall 2015
CSE373: Data Structures & Algorithms Lecture 22: Parallel Reductions, Maps, and Algorithm Analysis Kevin Quinn Fall 2015 Outline Done: How to write a parallel algorithm with fork and join Why using divide-and-conquer
More informationPARALLEL MEMORY ARCHITECTURE
PARALLEL MEMORY ARCHITECTURE Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 6 is due tonight n The last
More informationCS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism
CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationCSC 447: Parallel Programming for Multi- Core and Cluster Systems
CSC 447: Parallel Programming for Multi- Core and Cluster Systems Parallel Sorting Algorithms Instructor: Haidar M. Harmanani Spring 2016 Topic Overview Issues in Sorting on Parallel Computers Sorting
More informationProgramming at Scale: Concurrency
Programming at Scale: Concurrency 1 Goal: Building Fast, Scalable Software How do we speed up software? 2 What is scalability? A system is scalable if it can easily adapt to increased (or reduced) demand
More informationCOMP Parallel Computing. PRAM (4) PRAM models and complexity
COMP 633 - Parallel Computing Lecture 5 September 4, 2018 PRAM models and complexity Reading for Thursday Memory hierarchy and cache-based systems Topics Comparison of PRAM models relative performance
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,
More informationHigh Performance Computing. Introduction to Parallel Computing
High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials
More informationCS61C : Machine Structures
CS C L.. Cache I () Design Principles for Hardware CSC : Machine Structures Lecture.. Cache I -- Kurt Meinz inst.eecs.berkeley.edu/~csc. Simplicity favors regularity Every instruction has operands, opcode
More informationIE 495 Lecture 3. Septermber 5, 2000
IE 495 Lecture 3 Septermber 5, 2000 Reading for this lecture Primary Miller and Boxer, Chapter 1 Aho, Hopcroft, and Ullman, Chapter 1 Secondary Parberry, Chapters 3 and 4 Cosnard and Trystram, Chapter
More informationChapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin
More information15-750: Parallel Algorithms
5-750: Parallel Algorithms Scribe: Ilari Shafer March {8,2} 20 Introduction A Few Machine Models of Parallel Computation SIMD Single instruction, multiple data: one instruction operates on multiple data
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More information1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors
1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors on an EREW PRAM: See solution for the next problem. Omit the step where each processor sequentially computes the AND of
More informationCenter for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop
Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion
More informationMultiprocessor scheduling
Chapter 10 Multiprocessor scheduling When a computer system contains multiple processors, a few new issues arise. Multiprocessor systems can be categorized into the following: Loosely coupled or distributed.
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More information