Outline: Embarrassingly Parallel Problems
- what they are
- Mandelbrot Set computation
- cost considerations
- static parallelisation
- dynamic parallelisation and its analysis
- Monte Carlo Methods
- parallel random number generation
Ref: Lin and Snyder Ch 5, Wilkinson and Allen Ch 3
Admin: reminder - pracs this week, get your NCI accounts!; parallel programming poll
News: Improving Energy Efficiency and Exploiting Parallelism with Processing in Memory and Near-Data Processing

Embarrassingly Parallel Problems
- the computation can be divided into completely independent parts for execution by separate processors (these correspond to totally disconnected computational graphs)
- infrastructure: the Berkeley Open Infrastructure for Network Computing (BOINC); SETI@home and Folding@Home are projects solving very large problems of this kind
- part of an application may be embarrassingly parallel
- distribution and collection of data are the key issues (they can be non-trivial and/or costly)
- frequently uses the master/slave approach (at most p - 1 speedup)

Example #1: Computation of the Mandelbrot Set

The Mandelbrot Set
- the set of points in the complex plane that are quasi-stable
- computed by iterating the function z_{k+1} = z_k^2 + c, where z and c are complex numbers (z = a + bi), z is initially zero, and c gives the position of the point in the complex plane
- iterations continue until |z| > 2, where |z| = sqrt(a^2 + b^2), or until some arbitrary iteration limit is reached
- the set is enclosed by a circle centred at (0, 0) of radius 2
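Writing z_k = a + bi and c = c_re + c_im i, one iteration step expands into real arithmetic as follows (this expansion is not spelled out on the slide, but it is what the per-point code on the next slide performs):

    z_{k+1} = z_k^2 + c = (a^2 - b^2 + c_re) + (2ab + c_im) i
    |z_{k+1}|^2 = (a^2 - b^2 + c_re)^2 + (2ab + c_im)^2

In practice the escape test |z| > 2 is applied as |z|^2 > 4, which avoids the square root; counting the multiplications and additions above gives roughly the 10 flops per iteration assumed in the cost estimate on the next slide.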

Evaluating 1 Point
- iterate z = z^2 + c for the point c, starting from z = 0, until |z|^2 >= 4 or the iteration limit is reached; the iteration count gives the pixel value (the C listing on this slide was lost in the transcription; a reconstruction sketch follows below)

Cost Considerations on NCI's Raijin
- roughly 10 flops per iteration, with a maximum of 256 iterations per point
- approximate time to evaluate one point on one Raijin core: 256 x 10 flops at peak rate, i.e. about 0.12 usec
- between two nodes, the time to communicate a single point to a slave and receive the result back is about 2 x 2 usec (latency limited)
- conclusion: we cannot profitably parallelise over individual points
- we must also allow time for the master to send to all slaves before it can return to any given process

Building the Full Image
- define the minimum and maximum values of the real and imaginary parts (usually -2 to 2) and the number of horizontal and vertical pixels
- each pixel (x, y) is scaled onto a point c in that region and coloured by the iteration count returned for c (the scaling loop on this slide was also lost; it appears in the sketch below)

Parallelisation: Static
- Summary: totally independent tasks; each task can be of a different length
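The per-point routine and the full-image loop referred to above did not survive the transcription. The following is a minimal serial sketch in the style of the Wilkinson and Allen code the slides are based on; the names (complex_t, cal_pixel) and the 640 x 480 image size are illustrative choices, not the slides' own.

    #include <stdio.h>

    #define MAX_ITER 256

    typedef struct { float real, imag; } complex_t;

    /* Iterate z = z*z + c from z = 0 until |z| > 2 or MAX_ITER is reached;
       the returned count determines the pixel colour. */
    static int cal_pixel(complex_t c)
    {
        float z_re = 0.0f, z_im = 0.0f, lengthsq, temp;
        int count = 0;
        do {
            temp     = z_re * z_re - z_im * z_im + c.real;  /* Re(z^2 + c) */
            z_im     = 2.0f * z_re * z_im + c.imag;         /* Im(z^2 + c) */
            z_re     = temp;
            lengthsq = z_re * z_re + z_im * z_im;  /* compare |z|^2 with 4, no sqrt */
            count++;
        } while (lengthsq < 4.0f && count < MAX_ITER);
        return count;
    }

    int main(void)
    {
        const int   disp_width = 640, disp_height = 480;
        const float real_min = -2.0f, real_max = 2.0f;
        const float imag_min = -2.0f, imag_max = 2.0f;
        /* scale factors map pixel coordinates onto the chosen region */
        const float scale_real = (real_max - real_min) / disp_width;
        const float scale_imag = (imag_max - imag_min) / disp_height;
        long total = 0;

        for (int y = 0; y < disp_height; y++)
            for (int x = 0; x < disp_width; x++) {
                complex_t c = { real_min + x * scale_real,
                                imag_min + y * scale_imag };
                total += cal_pixel(c);   /* a real code would store/display this */
            }
        printf("total iterations over the image: %ld\n", total);
        return 0;
    }

A real implementation would keep the per-pixel counts as the image; the running total above is only there to keep the sketch self-contained and checkable.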

Master: Static Implementation
- the master sends each slave the coordinates of its fixed portion of the image, then receives the computed pixel values back and stores/displays them (the MPI listing was lost in the transcription)

Slave: Static Implementation
- each slave computes the pixels of its pre-assigned portion and returns the results to the master (listing likewise lost in the transcription)

Dynamic Task Assignment
- discussion point: why would we expect static assignment to be sub-optimal for the Mandelbrot set calculation? Would any regular static decomposition be significantly better (or worse)?
- a pool of (over-decomposed) tasks is dynamically assigned to the next requesting process

Processor Farm: Master
- hands out one task per slave to start, then repeatedly receives a result from whichever slave finishes and sends that slave the next task; once the pool is exhausted, each slave is sent a termination message instead (listing lost in the transcription)

Processor Farm: Slave
- loops receiving a task, computing it and returning the result, until told to terminate (listing lost; a reconstruction sketch of the complete farm follows below)
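None of the four MPI listings survived the transcription, so the sketch below reconstructs only the processor-farm (dynamic assignment) version, with one image row as the unit of work. It reuses complex_t and cal_pixel() from the previous sketch; the message tags, image size and region bounds are illustrative, error handling is omitted, and it assumes there are at least as many rows as slaves.

    #include <mpi.h>

    #define WIDTH      640
    #define HEIGHT     480
    #define REAL_MIN  (-2.0f)
    #define REAL_MAX   (2.0f)
    #define IMAG_MIN  (-2.0f)
    #define IMAG_MAX   (2.0f)
    #define WORK_TAG    1
    #define RESULT_TAG  2
    #define STOP_TAG    3

    /* complex_t and cal_pixel() as in the earlier per-point sketch */

    static int image[HEIGHT][WIDTH];      /* assembled on the master */

    static void master(int nprocs)
    {
        int row = 0, running = nprocs - 1;
        int result[WIDTH + 1];            /* [0] = row number, rest = counts */
        MPI_Status status;

        for (int slave = 1; slave < nprocs; slave++) {  /* one initial row each */
            MPI_Send(&row, 1, MPI_INT, slave, WORK_TAG, MPI_COMM_WORLD);
            row++;
        }
        while (running > 0) {
            MPI_Recv(result, WIDTH + 1, MPI_INT, MPI_ANY_SOURCE, RESULT_TAG,
                     MPI_COMM_WORLD, &status);
            for (int x = 0; x < WIDTH; x++)             /* store returned row */
                image[result[0]][x] = result[x + 1];
            if (row < HEIGHT) {                         /* more work: next row */
                MPI_Send(&row, 1, MPI_INT, status.MPI_SOURCE, WORK_TAG,
                         MPI_COMM_WORLD);
                row++;
            } else {                                    /* pool empty: stop slave */
                MPI_Send(&row, 1, MPI_INT, status.MPI_SOURCE, STOP_TAG,
                         MPI_COMM_WORLD);
                running--;
            }
        }
    }

    static void slave(void)
    {
        const float scale_real = (REAL_MAX - REAL_MIN) / WIDTH;
        const float scale_imag = (IMAG_MAX - IMAG_MIN) / HEIGHT;
        int row, result[WIDTH + 1];
        MPI_Status status;

        for (;;) {
            MPI_Recv(&row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            if (status.MPI_TAG == STOP_TAG)
                break;                                  /* no more work */
            result[0] = row;
            for (int x = 0; x < WIDTH; x++) {
                complex_t c = { REAL_MIN + x * scale_real,
                                IMAG_MIN + row * scale_imag };
                result[x + 1] = cal_pixel(c);
            }
            MPI_Send(result, WIDTH + 1, MPI_INT, 0, RESULT_TAG, MPI_COMM_WORLD);
        }
    }

    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (rank == 0) master(nprocs); else slave();
        MPI_Finalize();
        return 0;
    }

The static version differs only in that each slave's rows are fixed in advance and returned in one message; the farm's per-row request/response is what buys the better load balance discussed above.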

Analysis
Let p denote the number of processes, m x n the number of pixels, and I the maximum number of iterations per point; t_f is the time per floating point operation, t_s the message startup time and t_w the per-word transfer time.
- sequential time: t_seq ≈ I m n t_f = O(mn)
- parallel communication 1 (neglecting the per-hop t_h term and assuming a message length of 1 word): t_com1 = 2(p-1)(t_s + t_w)
- parallel computation: t_comp ≈ I m n t_f / (p-1)
- parallel communication 2: t_com2 = m (t_s + t_w) / (p-1)
- overall: t_par ≈ I m n t_f / (p-1) + (p-1 + m/(p-1))(t_s + t_w)
- Discussion point: what assumptions have we been making here? Are there any situations where we might still have poor performance, and how could we mitigate this?

Example #2: Monte Carlo Methods
- use random numbers to solve numerical/physical problems
- e.g. evaluation of pi by determining whether random points in a square fall inside the inscribed circle: area of circle / area of square = pi(1)^2 / 2^2 = pi/4

Monte Carlo Integration
- evaluation of an integral by sampling random points x_i with x_1 <= x_i <= x_2:
  area = integral_{x_1}^{x_2} f(x) dx = lim_{N->infinity} (1/N) sum_{i=1}^{N} f(x_i) (x_2 - x_1)
- example: I = integral_{x_1}^{x_2} (x^2 - 3x) dx, evaluated by a loop that sums f(x_i) for pseudo-random x_i between x_1 and x_2 (the code listing was lost in the transcription; a sketch follows below)

Parallelisation
- the only problem is ensuring that each process uses different random numbers and that there is no correlation between them
- one solution is to have a unique process (maybe the master) issue the random numbers to the slaves
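As a concrete serial illustration of the sampling formula above, the sketch below estimates the slide's example integral. The limits x1 = 1 and x2 = 4, the sample count and the use of the C library rand() are illustrative choices only.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const double x1 = 1.0, x2 = 4.0;   /* illustrative integration limits */
        const long   N  = 1000000;
        double sum = 0.0;

        srand(12345);
        for (long i = 0; i < N; i++) {
            /* pseudo-random x uniformly distributed in [x1, x2] */
            double x = x1 + (x2 - x1) * ((double)rand() / RAND_MAX);
            sum += x * x - 3.0 * x;        /* f(x) = x^2 - 3x */
        }
        /* area ~ (1/N) * sum of f(x_i) * (x2 - x1) */
        printf("estimate = %f\n", sum * (x2 - x1) / N);
        /* exact value: [x^3/3 - 3x^2/2] from 1 to 4 = -1.5 */
        return 0;
    }

As N grows the estimate converges (slowly, as 1/sqrt(N)) towards the exact value of -1.5 for these limits.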

Parallel Code: Integration
- Master (process 0): generates the pseudo-random values and sends them to the slaves in batches; each slave accumulates a partial sum of f(x_i), and the partial sums are combined at the end (the MPI listings for master and slave were lost in the transcription)
- Question: what performance problems does this code have?

Parallel Random Numbers
- linear congruential generators: x_{i+1} = (a x_i + c) mod m, where a, c and m are constants
- using the property x_{i+p} = (A(a, p, m) x_i + C(c, a, p, m)) mod m, we can generate the first p random numbers sequentially and then repeatedly calculate the next p numbers in parallel (a sketch follows below)

Summary: Embarrassingly Parallel Problems
- defining characteristic: the tasks do not need to communicate
- non-trivial issues remain, however: providing input data to the tasks, assembling the results, load balancing, scheduling, heterogeneous compute resources, costing
- static task assignment (lower communication costs) vs. dynamic task assignment plus over-decomposition (better load balance)
- Monte Carlo and ensemble simulations are a big use of computational power; the field of grid computing arose to serve exactly this kind of workload
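A minimal sketch of the leap-frog idea follows, simulated within a single program rather than across MPI processes; the LCG constants and the number of streams are illustrative, and a production code would use a tested parallel generator library rather than this toy LCG.

    #include <stdio.h>
    #include <stdint.h>

    /* P streams of one global LCG sequence x_{i+1} = (a*x_i + c) mod m,
       each stream taking every P-th value via x_{i+P} = (A*x_i + C) mod m,
       where A = a^P mod m and C = c*(a^{P-1} + ... + a + 1) mod m. */
    int main(void)
    {
        const uint64_t a = 1103515245u, c = 12345u, m = 1ull << 31;
        enum { P = 4 };                        /* number of parallel streams */
        uint64_t x[P], A = 1, C = 0, x0 = 1;   /* x0 is the global seed */

        /* generate the first P numbers sequentially; stream i starts at x_{i+1} */
        for (int i = 0; i < P; i++) {
            x0 = (a * x0 + c) % m;
            x[i] = x0;
        }
        /* build A = a^P mod m and C = c*(a^{P-1} + ... + 1) mod m incrementally */
        for (int i = 0; i < P; i++) {
            C = (a * C + c) % m;
            A = (A * a) % m;
        }
        /* each stream now advances independently, P positions at a time */
        for (int step = 0; step < 3; step++)
            for (int i = 0; i < P; i++) {
                x[i] = (A * x[i] + C) % m;
                printf("stream %d: %llu\n", i, (unsigned long long)x[i]);
            }
        return 0;
    }

In an MPI code each process would run one stream, seeded with its own starting value, so that all processes together reproduce exactly the single sequential sequence with no overlap and no need for a central process to issue numbers.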
