Structured Parallel Programming Patterns for Efficient Computation

Size: px
Start display at page:

Download "Structured Parallel Programming Patterns for Efficient Computation"

Transcription

1 Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Morgan Kaufmann Publishers is an imprint of Elsevier M<

2 Contents Listings Preface Preliminaries xv xix xxiii CHAPTER 1 Introduction Think Parallel Performance Motivation: Pervasive Parallelism Hardware Trends Encouraging Parallelism Observed Historical Trends in Parallelism Need for Explicit Parallel Programming Structured Pattern-Based Programming Parallel Programming Models Desired Properties Abstractions Instead of Mechanisms Expression of Regular Data Parallelism Composability Portability of Functionality Performance Portability Safety, Determinism, and Maintainability Overview of Programming Models Used When to Use Which Model? Organization of this Book Summary 38 CHAPTER 2 Background Vocabulary and Notation Strategies Mechanisms Machine Models Machine Model Key Features for Performance Flynn's Characterization Evolution Performance Theory Latency and Throughput Speedup, Efficiency, and Scalability 56 V

3 vi Contents Power Amdahl's Law Gustafson-Barsis' Law Work-Span Model Asymptotic Complexity Asymptotic Speedup and Efficiency Little's Formula Pitfalls Race Conditions Mutual Exclusion and Locks Deadlock Strangled Scaling Lack of Locality Load Imbalance Overhead Summary 75 PART I PATTERNS CHAPTER 3 Patterns Nesting Pattern Structured Serial Control Flow Patterns Sequence Selection Iteration Recursion Parallel Control Patterns Fork-Join Map Stencil Reduction Scan Recurrence Serial Data Management Patterns Random Read and Write Stack Allocation Heap Allocation Closures Objects 97

4 Contents vii 3.5 Parallel Data Management Patterns Pack Pipeline Geometrie Decomposition Gather Scatter Other Parallel Patterns Superscalar Sequences Futures Speculative Selection Workpile Search Segmentation Expand Category Reduction Term Graph Rewriting Non-Deterministic Patterns Branch and Bound Transactions Programming Model Support for Patterns CilkPlus Threading Building Blocks OpenMP Array Building Blocks OpenCL Summary 118 CHAPTER 4 Map Map Scaled Vector Addition (SAXPY) Description of the Problem Serial Implementation TBB CilkPlus Cilk Plus with Array Notation OpenMP ArBB Using Vector Operations ArBB Using Elemental Functions OpenCL 130

5 viii Contents 4.3 Mandelbrot Description of the Problem Serial Implementation TBB Cilk Plus Cilk Plus with Array Notations OpenMP ArBB OpenCL Sequence of Maps versus Map of Sequence Comparison of Parallel Models Related Patterns Stencil Workpile Divide-and-conquer Summary 143 CHAPTERS Collectives Reduce Reordering Computations Vectorization Tiling Precision Implementation Fusing Map and Reduce Explicit Fusion in TBB Explicit Fusion in Cilk Plus Automatic Fusion in ArBB DotProduct Description of the Problem Serial Implementation SSE Intrinsics TBB Cilk Plus OpenMP ArBB Scan Cilk Plus TBB ArBB OpenMP Fusing Map and Scan 166

6 Contents ix 5.6 Integration Description of the Problem Serial Implementation CilkPlus OpenMP TBB ArBB Summary 177 CHAPTER 6 Data Reorganization Gather General Gather Shift Zip Scatter Atomic Scatter Permutation Scatter Merge Scatter Priority Scatter Converting Scatter to Gather Pack Fusing Map and Pack Geometric Decomposition and Partition Array of Structures vs. Structures of Arrays Summary 197 CHAPTER 7 Stencil and Recurrence Stencil Implementing Stencil with Shift Tiling Stencils for Cache Optimizing Stencils for Communication Recurrence Summary 207 CHAPTER 8 Fork-Join Definition Programming Model Support for Fork-Join Cilk Plus Support for Fork-Join TBB Support for Fork-Join OpenMP Support for Fork-Join Recursive Implementation of Map Choosing Base Cases 217

7 x Contents 8.5 Load Balancing Complexity of Parallel Divide-and-Conquer Karatsuba Multiplication of Polynomials Note on Allocating Scratch Space Cache Locality and Cache-Oblivious Algorithms Quicksort Cilk Quicksort TBB Quicksort Work and Span for Quicksort Reductions and Hyperobjects Implementing Scan with Fork-Join Applying Fork-Join to Recurrences Analysis Flat Fork-Join Summary 251 CHAPTER 9 Pipeline Basic Pipeline Pipeline with Parallel Stages Implementation of a Pipeline Programming Model Support for Pipelines Pipeline in TBB Pipeline in Cilk Plus More General Topologies Mandatory versus Optional Parallelism Summary 262 PART II EXAMPLES CHAPTER 10 Forward Seismic Simulation Background Stencil Computation Impact of Caches on Arithmetic Intensity Raising Arithmetic Intensity with Space-Time Tiling Cilk Plus Code ArBB Implementation Summary 277

8 Contents CHAPTER 11 K-Means Clustering Algorithm K-Means with Cilk Plus Hyperobjects K-Means with TBB Summary 289 CHAPTER 12 Bzip2 Data Compression The Bzip2 Algorithm Three-Stage Pipeline Using TBB Four-Stage Pipeline Using TBB Three-Stage Pipeline Using Cilk Plus Summary 297 CHAPTER 13 Merge Sort Parallel Merge TBB Parallel Merge Work and Span of Parallel Merge Parallel Merge Sort Work and Span of Merge Sort Summary 305 CHAPTER 14 Sample Sort Overall Structure Choosing the Number of Bins Binning TBB Implementation Repacking and Subsorting Performance Analysis of Sample Sort For C++ Experts Summary 313 CHAPTER 15 Cholesky Factorization Fortran Rules! Recursive Cholesky Decomposition Triangular Solve Symmetric Rank Update Where Is the Time Spent? Summary 322

9 xii Contents APPENDICES APPENDIX A Further Reading 325 A.1 Parallel Algorithms and Patterns 325 A.2 Computer Architecture Including Parallel Systems 325 A.3 Parallel Programming Models 326 APPENDIX В Cilk Plus 329 B.1 Shared Design Principles with TBB 329 B.2 Unique Characteristics 329 B.3 Borrowing Components from TBB 331 B.4 Keyword Spelling 332 B.5 cilk_for 332 B.6 ci 1 k.spawn and ci 1 k_sync 333 B.7 Reducers (Hyperobjects) 334 B.7.1 C++ Syntax 335 B.7.2 С Syntax 337 B.8 Array Notation 338 B.8.1 Specifying Array Sections 339 B.8.2 Operations on Array Sections 340 B.8.3 Reductions on Array Sections 341 B.8.4 Implicit Index 342 B.8.5 Avoid Partial Overlap of Array Sections 342 B.9 //pragma simd 343 B. 10 Elemental Functions 344 B.10.1 Attribute Syntax 345 B.11 NoteonC++ll 345 B. 12 Notes on Setup 346 B.13 History 346 B. 14 Summary 347 APPENDIX С TBB 349 С1 Unique Characteristics 349 C.2 Using TBB 350 C.3 parallel-for 351 C.3.1 Ы ocked_range 351 C.3.2 Partitioners 352 C.4 paral 1 el.reduce 353 C.5 paral1el.deterministic.reduce 354 C.6 paral 1 el_pi pel i ne 354 C.7 paral 1 el_i nvoke 354

10 Contents xiii C.8 task_group 355 C.9 task 355 C.9.1 empty_ta s к 356 C.10 atomic 356 C.11 enumerabl e.thread.speci fic 358 C.12 Notes onc++ll 358 C.13 History 359 C.14 Summary 360 APPENDIX D C D.1 Declaring with auto 361 D.2 Lambda Expressions 361 D.3 std : -.move 365 APPENDIXE Glossary 367 Bibliography 391 Index 397

Structured Parallel Programming

Structured Parallel Programming Structured Parallel Programming Patterns for Efficient Computation Michael McCool Arch D. Robison James Reinders ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming F 'C 3 R'"'C,_,. HO!.-IJJ () An Introduction to Parallel Programming Peter S. Pacheco University of San Francisco ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Application Programming

Application Programming Multicore Application Programming For Windows, Linux, and Oracle Solaris Darryl Gove AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris

More information

Algorithmic Graph Theory and Perfect Graphs

Algorithmic Graph Theory and Perfect Graphs Algorithmic Graph Theory and Perfect Graphs Second Edition Martin Charles Golumbic Caesarea Rothschild Institute University of Haifa Haifa, Israel 2004 ELSEVIER.. Amsterdam - Boston - Heidelberg - London

More information

Computers as Components Principles of Embedded Computing System Design

Computers as Components Principles of Embedded Computing System Design Computers as Components Principles of Embedded Computing System Design Third Edition Marilyn Wolf ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY

More information

Computer Architecture A Quantitative Approach

Computer Architecture A Quantitative Approach Computer Architecture A Quantitative Approach Third Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by David Goldberg Xerox Palo

More information

Information Modeling and Relational Databases

Information Modeling and Relational Databases Information Modeling and Relational Databases Second Edition Terry Halpin Neumont University Tony Morgan Neumont University AMSTERDAM» BOSTON. HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

CLASSIC DATA STRUCTURES IN JAVA

CLASSIC DATA STRUCTURES IN JAVA CLASSIC DATA STRUCTURES IN JAVA Timothy Budd Oregon State University Boston San Francisco New York London Toronto Sydney Tokyo Singapore Madrid Mexico City Munich Paris Cape Town Hong Kong Montreal CONTENTS

More information

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information

More information

"Charting the Course to Your Success!" MOC A Developing High-performance Applications using Microsoft Windows HPC Server 2008

Charting the Course to Your Success! MOC A Developing High-performance Applications using Microsoft Windows HPC Server 2008 Description Course Summary This course provides students with the knowledge and skills to develop high-performance computing (HPC) applications for Microsoft. Students learn about the product Microsoft,

More information

Programming. In Ada JOHN BARNES TT ADDISON-WESLEY

Programming. In Ada JOHN BARNES TT ADDISON-WESLEY Programming In Ada 2005 JOHN BARNES... TT ADDISON-WESLEY An imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong Kong Seoul Taipei New Delhi

More information

Heuristic Search. Theory and Applications. Stefan Edelkamp. Stefan Schrodl ELSEVIER. Morgan Kaufmann is an imprint of Elsevier HEIDELBERG LONDON

Heuristic Search. Theory and Applications. Stefan Edelkamp. Stefan Schrodl ELSEVIER. Morgan Kaufmann is an imprint of Elsevier HEIDELBERG LONDON Heuristic Search Theory and Applications Stefan Edelkamp Stefan Schrodl AMSTERDAM BOSTON HEIDELBERG LONDON ELSEVIER NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY» TOKYO Morgan Kaufmann

More information

Cilk Plus GETTING STARTED

Cilk Plus GETTING STARTED Cilk Plus GETTING STARTED Overview Fundamentals of Cilk Plus Hyperobjects Compiler Support Case Study 3/17/2015 CHRIS SZALWINSKI 2 Fundamentals of Cilk Plus Terminology Execution Model Language Extensions

More information

Introduction to Algorithms Third Edition

Introduction to Algorithms Third Edition Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Clifford Stein Introduction to Algorithms Third Edition The MIT Press Cambridge, Massachusetts London, England Preface xiü I Foundations Introduction

More information

Computer Architecture

Computer Architecture Computer Architecture Pipelined and Parallel Processor Design Michael J. Flynn Stanford University Technische Universrtat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK lnventar-nr.: Sachgebiete: Standort:

More information

Engineering Real- Time Applications with Wild Magic

Engineering Real- Time Applications with Wild Magic 3D GAME ENGINE ARCHITECTURE Engineering Real- Time Applications with Wild Magic DAVID H. EBERLY Geometric Tools, Inc. AMSTERDAM BOSTON HEIDELRERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

Embedded Systems Architecture

Embedded Systems Architecture Embedded Systems Architecture A Comprehensive Guide for Engineers and Programmers By Tammy Noergaard ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms :Algorithms in the Real World Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms Page1 Andrew Chien, 2008 2 Outline Concurrency vs. Parallelism Quicksort example Nested

More information

The Definitive Guide to the ARM Cortex-M3

The Definitive Guide to the ARM Cortex-M3 The Definitive Guide to the ARM Cortex-M3 Joseph Yiu AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Newnes is an imprint of Elsevier Newnes Forewopd

More information

The Essential Guide to Video Processing

The Essential Guide to Video Processing The Essential Guide to Video Processing Second Edition EDITOR Al Bovik Department of Electrical and Computer Engineering The University of Texas at Austin Austin, Texas AMSTERDAM BOSTON HEIDELBERG LONDON

More information

Programming 8-bit PIC Microcontrollers in С

Programming 8-bit PIC Microcontrollers in С Programming 8-bit PIC Microcontrollers in С with Interactive Hardware Simulation Martin P. Bates älllllltlilisft &Щ*лЛ AMSTERDAM BOSTON HEIDELBERG LONDON ^^Ш NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

An Introduction to Programming with IDL

An Introduction to Programming with IDL An Introduction to Programming with IDL Interactive Data Language Kenneth P. Bowman Department of Atmospheric Sciences Texas A&M University AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN

More information

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez SQL Queries for Mere Mortals Third Edition A Hands-On Guide to Data Manipulation in SQL John L. Viescas Michael J. Hernandez r A TT TAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco

More information

The Unified Modeling Language User Guide

The Unified Modeling Language User Guide The Unified Modeling Language User Guide Grady Booch James Rumbaugh Ivar Jacobson Rational Software Corporation TT ADDISON-WESLEY Boston San Francisco New York Toronto Montreal London Munich Paris Madrid

More information

Parallel Computing. November 20, W.Homberg

Parallel Computing. November 20, W.Homberg Mitglied der Helmholtz-Gemeinschaft Parallel Computing November 20, 2017 W.Homberg Why go parallel? Problem too large for single node Job requires more memory Shorter time to solution essential Better

More information

DB2 SQL Tuning Tips for z/os Developers

DB2 SQL Tuning Tips for z/os Developers DB2 SQL Tuning Tips for z/os Developers Tony Andrews IBM Press, Pearson pic Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Cape Town Sydney

More information

Foundations of Multidimensional and Metric Data Structures

Foundations of Multidimensional and Metric Data Structures Foundations of Multidimensional and Metric Data Structures Hanan Samet University of Maryland, College Park ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

M (~ Computer Organization and Design ELSEVIER. David A. Patterson. John L. Hennessy. University of California, Berkeley. Stanford University

M (~ Computer Organization and Design ELSEVIER. David A. Patterson. John L. Hennessy. University of California, Berkeley. Stanford University T H I R D EDITION REVISED Computer Organization and Design THE HARDWARE/SOFTWARE INTERFACE David A. Patterson University of California, Berkeley John L. Hennessy Stanford University With contributions

More information

PROBLEM SOLVING WITH FORTRAN 90

PROBLEM SOLVING WITH FORTRAN 90 David R. Brooks PROBLEM SOLVING WITH FORTRAN 90 FOR SCIENTISTS AND ENGINEERS Springer Contents Preface v 1.1 Overview for Instructors v 1.1.1 The Case for Fortran 90 vi 1.1.2 Structure of the Text vii

More information

Real World Multicore Embedded Systems

Real World Multicore Embedded Systems Real World Multicore Embedded Systems A Practical Approach Expert Guide Bryon Moyer AMSTERDAM BOSTON HEIDELBERG LONDON I J^# J NEW YORK OXFORD PARIS SAN DIEGO S V J SAN FRANCISCO SINGAPORE SYDNEY TOKYO

More information

Computer Organization and Design

Computer Organization and Design Computer Organization and Design THE H A R D W A R E / S O F T W A R E I N T E R F A C E John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With a contribution

More information

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the

More information

CSE 613: Parallel Programming

CSE 613: Parallel Programming CSE 613: Parallel Programming Lecture 3 ( The Cilk++ Concurrency Platform ) ( inspiration for many slides comes from talks given by Charles Leiserson and Matteo Frigo ) Rezaul A. Chowdhury Department of

More information

Moving to the Cloud. Developing Apps in. the New World of Cloud Computing. Dinkar Sitaram. Geetha Manjunath. David R. Deily ELSEVIER.

Moving to the Cloud. Developing Apps in. the New World of Cloud Computing. Dinkar Sitaram. Geetha Manjunath. David R. Deily ELSEVIER. Moving to the Cloud Developing Apps in the New World of Cloud Computing Dinkar Sitaram Geetha Manjunath Technical Editor David R. Deily AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO

More information

Contents. Preface. About the Authors BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS. l. 1 The Demand for Computational Speed 3

Contents. Preface. About the Authors BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS. l. 1 The Demand for Computational Speed 3 Preface About the Authors PARTI BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS l. 1 The Demand for Computational Speed 3 1.2 Potential for Increased Computational Speed 6 Speedup Factor 6 What Is the Maximum

More information

MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0

More information

Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems

Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems Peter Barry Patrick Crowley ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K. Fundamentals of Parallel Computing Sanjay Razdan Alpha Science International Ltd. Oxford, U.K. CONTENTS Preface Acknowledgements vii ix 1. Introduction to Parallel Computing 1.1-1.37 1.1 Parallel Computing

More information

Maya Python. for Games and Film. and the Maya Python API. A Complete Reference for Maya Python. Ryan Trowbridge. Adam Mechtley ELSEVIER

Maya Python. for Games and Film. and the Maya Python API. A Complete Reference for Maya Python. Ryan Trowbridge. Adam Mechtley ELSEVIER Maya Python for Games and Film A Complete Reference for Maya Python and the Maya Python API Adam Mechtley Ryan Trowbridge AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Coding for Penetration

Coding for Penetration Coding for Penetration Testers Building Better Tools Jason Andress Ryan Linn ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Syngress is

More information

Curriculum 2013 Knowledge Units Pertaining to PDC

Curriculum 2013 Knowledge Units Pertaining to PDC Curriculum 2013 Knowledge Units Pertaining to C KA KU Tier Level NumC Learning Outcome Assembly level machine Describe how an instruction is executed in a classical von Neumann machine, with organization

More information

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading

More information

Computer Architecture and Structured Parallel Programming James Reinders, Intel

Computer Architecture and Structured Parallel Programming James Reinders, Intel Computer Architecture and Structured Parallel Programming James Reinders, Intel Parallel Computing CIS 410/510 Department of Computer and Information Science Lecture 17 Manycore Computing and GPUs Computer

More information

Algorithms and Parallel Computing

Algorithms and Parallel Computing Algorithms and Parallel Computing Algorithms and Parallel Computing Fayez Gebali University of Victoria, Victoria, BC A John Wiley & Sons, Inc., Publication Copyright 2011 by John Wiley & Sons, Inc. All

More information

FPGAs: Instant Access

FPGAs: Instant Access FPGAs: Instant Access Clive"Max"Maxfield AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO % ELSEVIER Newnes is an imprint of Elsevier Newnes Contents

More information

Programming with POSIX Threads

Programming with POSIX Threads Programming with POSIX Threads David R. Butenhof :vaddison-wesley Boston San Francisco New York Toronto Montreal London Munich Paris Madrid Capetown Sidney Tokyo Singapore Mexico City Contents List of

More information

LOGIC AND DISCRETE MATHEMATICS

LOGIC AND DISCRETE MATHEMATICS LOGIC AND DISCRETE MATHEMATICS A Computer Science Perspective WINFRIED KARL GRASSMANN Department of Computer Science University of Saskatchewan JEAN-PAUL TREMBLAY Department of Computer Science University

More information

Multi-Core Programming

Multi-Core Programming Multi-Core Programming Increasing Performance through Software Multi-threading Shameem Akhter Jason Roberts Intel PRESS Copyright 2006 Intel Corporation. All rights reserved. ISBN 0-9764832-4-6 No part

More information

Coding for Penetration Testers Building Better Tools

Coding for Penetration Testers Building Better Tools Coding for Penetration Testers Building Better Tools Second Edition Jason Andress Ryan Linn Clara Hartwell, Technical Editor ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO

More information

VISUALIZING QUATERNIONS

VISUALIZING QUATERNIONS THE MORGAN KAUFMANN SERIES IN INTERACTIVE 3D TECHNOLOGY VISUALIZING QUATERNIONS ANDREW J. HANSON «WW m.-:ki -. " ;. *' AMSTERDAM BOSTON HEIDELBERG ^ M Ä V l LONDON NEW YORK OXFORD

More information

F. THOMSON LEIGHTON INTRODUCTION TO PARALLEL ALGORITHMS AND ARCHITECTURES: ARRAYS TREES HYPERCUBES

F. THOMSON LEIGHTON INTRODUCTION TO PARALLEL ALGORITHMS AND ARCHITECTURES: ARRAYS TREES HYPERCUBES F. THOMSON LEIGHTON INTRODUCTION TO PARALLEL ALGORITHMS AND ARCHITECTURES: ARRAYS TREES HYPERCUBES MORGAN KAUFMANN PUBLISHERS SAN MATEO, CALIFORNIA Contents Preface Organization of the Material Teaching

More information

PTC Mathcad Prime 3.0

PTC Mathcad Prime 3.0 Essential PTC Mathcad Prime 3.0 A Guide for New and Current Users Brent Maxfield, P.E. AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO @ Academic

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New

More information

Anany Levitin 3RD EDITION. Arup Kumar Bhattacharjee. mmmmm Analysis of Algorithms. Soumen Mukherjee. Introduction to TllG DCSISFI &

Anany Levitin 3RD EDITION. Arup Kumar Bhattacharjee. mmmmm Analysis of Algorithms. Soumen Mukherjee. Introduction to TllG DCSISFI & Introduction to TllG DCSISFI & mmmmm Analysis of Algorithms 3RD EDITION Anany Levitin Villa nova University International Edition contributions by Soumen Mukherjee RCC Institute of Information Technology

More information

MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING

MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING JOHN W. WOODS Rensselaer Polytechnic Institute Troy, New York»iBllfllfiii.. i. ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD

More information

Intel Array Building Blocks

Intel Array Building Blocks Intel Array Building Blocks Productivity, Performance, and Portability with Intel Parallel Building Blocks Intel SW Products Workshop 2010 CERN openlab 11/29/2010 1 Agenda Legal Information Vision Call

More information

Digital Signal Processing System Design: LabVIEW-Based Hybrid Programming Nasser Kehtarnavaz

Digital Signal Processing System Design: LabVIEW-Based Hybrid Programming Nasser Kehtarnavaz Digital Signal Processing System Design: LabVIEW-Based Hybrid Programming Nasser Kehtarnavaz Digital Signal Processing System Design: LabVIEW-Based Hybrid Programming by Nasser Kehtarnavaz University

More information

ARCHITECTURE DESIGN FOR SOFT ERRORS

ARCHITECTURE DESIGN FOR SOFT ERRORS ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan

More information

DATA ABSTRACTION AND PROBLEM SOLVING WITH JAVA

DATA ABSTRACTION AND PROBLEM SOLVING WITH JAVA DATA ABSTRACTION AND PROBLEM SOLVING WITH JAVA WALLS AND MIRRORS First Edition Frank M. Carrano University of Rhode Island Janet J. Prichard Bryant College Boston San Francisco New York London Toronto

More information

Parallelization on Multi-Core CPUs

Parallelization on Multi-Core CPUs 1 / 30 Amdahl s Law suppose we parallelize an algorithm using n cores and p is the proportion of the task that can be parallelized (1 p cannot be parallelized) the speedup of the algorithm is assuming

More information

The Art of Parallel Processing

The Art of Parallel Processing The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a

More information

Compsci 590.3: Introduction to Parallel Computing

Compsci 590.3: Introduction to Parallel Computing Compsci 590.3: Introduction to Parallel Computing Alvin R. Lebeck Slides based on this from the University of Oregon Admin Logistics Homework #3 Use script Project Proposals Document: see web site» Due

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 9 ] Shared Memory Performance Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture

More information

A Wavelet Tour of Signal Processing The Sparse Way

A Wavelet Tour of Signal Processing The Sparse Way A Wavelet Tour of Signal Processing The Sparse Way Stephane Mallat with contributions from Gabriel Peyre AMSTERDAM BOSTON HEIDELBERG LONDON NEWYORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY»TOKYO

More information

Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB

Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB Jim Cownie Intel SSG/DPD/TCAR 1 Optimization Notice Optimization Notice Intel s compilers may or

More information

System Assurance. Beyond Detecting. Vulnerabilities. Djenana Campara. Nikolai Mansourov

System Assurance. Beyond Detecting. Vulnerabilities. Djenana Campara. Nikolai Mansourov System Assurance Beyond Detecting Vulnerabilities Nikolai Mansourov Djenana Campara ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SYDNEY TOKYO Morgan Kaufmann

More information

The Designer's Guide to VHDL Second Edition

The Designer's Guide to VHDL Second Edition The Designer's Guide to VHDL Second Edition Peter J. Ashenden EDA CONSULTANT, ASHENDEN DESIGNS PTY. VISITING RESEARCH FELLOW, ADELAIDE UNIVERSITY Cl MORGAN KAUFMANN PUBLISHERS An Imprint of Elsevier SAN

More information

A Primer on Scheduling Fork-Join Parallelism with Work Stealing

A Primer on Scheduling Fork-Join Parallelism with Work Stealing Doc. No.: N3872 Date: 2014-01-15 Reply to: Arch Robison A Primer on Scheduling Fork-Join Parallelism with Work Stealing This paper is a primer, not a proposal, on some issues related to implementing fork-join

More information

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Introduction to Algorithms Preface xiii 1 Introduction 1 1.1 Algorithms 1 1.2 Analyzing algorithms 6 1.3 Designing algorithms 1 1 1.4 Summary 1 6

More information

JAVASCRIPT FOR PROGRAMMERS

JAVASCRIPT FOR PROGRAMMERS JAVASCRIPT FOR PROGRAMMERS DEITEL DEVELOPER SERIES Paul J. Deitel Deitel & Associates, Inc. Harvey M. Deitel Deitel & Associates, Inc. PRENTICE HALL Upper Saddle River, NJ Boston Indianapolis San Francisco

More information

Real-Time Systems and Programming Languages

Real-Time Systems and Programming Languages Real-Time Systems and Programming Languages Ada, Real-Time Java and C/Real-Time POSIX Fourth Edition Alan Burns and Andy Wellings University of York * ADDISON-WESLEY An imprint of Pearson Education Harlow,

More information

An Introduction to Object-Oriented Programming

An Introduction to Object-Oriented Programming An Introduction to Object-Oriented Programming Timothy Budd Oregon State University TT Addison-Wesley Publishing Company Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario Wokingham,

More information

Computer Animation. Algorithms and Techniques. z< MORGAN KAUFMANN PUBLISHERS. Rick Parent Ohio State University AN IMPRINT OF ELSEVIER SCIENCE

Computer Animation. Algorithms and Techniques. z< MORGAN KAUFMANN PUBLISHERS. Rick Parent Ohio State University AN IMPRINT OF ELSEVIER SCIENCE Computer Animation Algorithms and Techniques Rick Parent Ohio State University z< MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF ELSEVIER SCIENCE AMSTERDAM BOSTON LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

DATA STRUCTURES AND PROBLEM SOLVING USING JAVA

DATA STRUCTURES AND PROBLEM SOLVING USING JAVA DATA STRUCTURES AND PROBLEM SOLVING USING JAVA Second Edition MARK ALLEN WEISS Florida International University Addison Wesley Boston San Francisco New York London Toronto Sydney Tokyo Singapore Madrid

More information

Summary of Contents LIST OF FIGURES LIST OF TABLES

Summary of Contents LIST OF FIGURES LIST OF TABLES Summary of Contents LIST OF FIGURES LIST OF TABLES PREFACE xvii xix xxi PART 1 BACKGROUND Chapter 1. Introduction 3 Chapter 2. Standards-Makers 21 Chapter 3. Principles of the S2ESC Collection 45 Chapter

More information

Intel Thread Building Blocks, Part II

Intel Thread Building Blocks, Part II Intel Thread Building Blocks, Part II SPD course 2013-14 Massimo Coppola 25/03, 16/05/2014 1 TBB Recap Portable environment Based on C++11 standard compilers Extensive use of templates No vectorization

More information

Structured Parallel Programming with Deterministic Patterns

Structured Parallel Programming with Deterministic Patterns Structured Parallel Programming with Deterministic Patterns May 14, 2010 USENIX HotPar 2010, Berkeley, Caliornia Michael McCool, Sotware Architect, Ct Technology Sotware and Services Group, Intel Corporation

More information

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores

MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores MetaFork: A Compilation Framework for Concurrency Platforms Targeting Multicores Presented by Xiaohui Chen Joint work with Marc Moreno Maza, Sushek Shekar & Priya Unnikrishnan University of Western Ontario,

More information

Essential MATLAB for Engineers and Scientists

Essential MATLAB for Engineers and Scientists Essential MATLAB for Engineers and Scientists Third edition Brian D. Hahn and Daniel T. Valentine ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC.

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC. Vectorisation James Briggs 1 COSMOS DiRAC April 28, 2015 Session Plan 1 Overview 2 Implicit Vectorisation 3 Explicit Vectorisation 4 Data Alignment 5 Summary Section 1 Overview What is SIMD? Scalar Processing:

More information

Chapter 1 Introduction

Chapter 1 Introduction Preface xv Chapter 1 Introduction 1.1 What's the Book About? 1 1.2 Mathematics Review 2 1.2.1 Exponents 3 1.2.2 Logarithms 3 1.2.3 Series 4 1.2.4 Modular Arithmetic 5 1.2.5 The P Word 6 1.3 A Brief Introduction

More information

Introductory Combinatorics

Introductory Combinatorics Introductory Combinatorics Third Edition KENNETH P. BOGART Dartmouth College,. " A Harcourt Science and Technology Company San Diego San Francisco New York Boston London Toronto Sydney Tokyo xm CONTENTS

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

MSP430 Microcontroller Basics

MSP430 Microcontroller Basics MSP430 Microcontroller Basics John H. Davies AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Newnes is an imprint of Elsevier N WPIGS Contents Preface

More information

Data Structures and Algorithm Analysis in C++

Data Structures and Algorithm Analysis in C++ INTERNATIONAL EDITION Data Structures and Algorithm Analysis in C++ FOURTH EDITION Mark A. Weiss Data Structures and Algorithm Analysis in C++, International Edition Table of Contents Cover Title Contents

More information

Parallel Programming. Presentation to Linux Users of Victoria, Inc. November 4th, 2015

Parallel Programming. Presentation to Linux Users of Victoria, Inc. November 4th, 2015 Parallel Programming Presentation to Linux Users of Victoria, Inc. November 4th, 2015 http://levlafayette.com 1.0 What Is Parallel Programming? 1.1 Historically, software has been written for serial computation

More information

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples Multicore Jukka Julku 19.2.2009 1 2 3 4 5 6 Disclaimer There are several low-level, languages and directive based approaches But no silver bullets This presentation only covers some examples of them is

More information

4.1.2 Merge Sort Sorting Lower Bound Counting Sort Sorting in Practice Solving Problems by Sorting...

4.1.2 Merge Sort Sorting Lower Bound Counting Sort Sorting in Practice Solving Problems by Sorting... Contents 1 Introduction... 1 1.1 What is Competitive Programming?... 1 1.1.1 Programming Contests.... 2 1.1.2 Tips for Practicing.... 3 1.2 About This Book... 3 1.3 CSES Problem Set... 5 1.4 Other Resources...

More information

DATABASE SYSTEM CONCEPTS

DATABASE SYSTEM CONCEPTS DATABASE SYSTEM CONCEPTS HENRY F. KORTH ABRAHAM SILBERSCHATZ University of Texas at Austin McGraw-Hill, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico Milan Montreal

More information

Contents. 1 Introduction. 2 Searching and Traversal Techniques. Preface... (vii) Acknowledgements... (ix)

Contents. 1 Introduction. 2 Searching and Traversal Techniques. Preface... (vii) Acknowledgements... (ix) Contents Preface... (vii) Acknowledgements... (ix) 1 Introduction 1.1 Algorithm 1 1.2 Life Cycle of Design and Analysis of Algorithm 2 1.3 Pseudo-Code for Expressing Algorithms 5 1.4 Recursive Algorithms

More information

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory

More information

Parallel and Distributed Computing (PD)

Parallel and Distributed Computing (PD) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Parallel and Distributed Computing (PD) The past decade has brought explosive growth in multiprocessor computing, including multi-core

More information

CS 445: Data Structures Final Examination: Study Guide

CS 445: Data Structures Final Examination: Study Guide CS 445: Data Structures Final Examination: Study Guide Java prerequisites Classes, objects, and references Access modifiers Arguments and parameters Garbage collection Self-test questions: Appendix C Designing

More information

FISMAand the Risk Management Framework

FISMAand the Risk Management Framework FISMAand the Risk Management Framework The New Practice of Federal Cyber Security Stephen D. Gantz Daniel R. Phi I pott Darren Windham, Technical Editor ^jm* ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON

More information

Networked Graphics 01_P374423_PRELIMS.indd i 10/27/2009 6:57:42 AM

Networked Graphics 01_P374423_PRELIMS.indd i 10/27/2009 6:57:42 AM Networked Graphics Networked Graphics Building Networked Games and Virtual Environments Anthony Steed Manuel Fradinho Oliveira AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Managed. Code Rootkits. Hooking. into Runtime. Environments. Erez Metula ELSEVIER. Syngress is an imprint of Elsevier SYNGRESS

Managed. Code Rootkits. Hooking. into Runtime. Environments. Erez Metula ELSEVIER. Syngress is an imprint of Elsevier SYNGRESS Managed Code Rootkits Hooking into Runtime Environments Erez Metula ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEWYORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Syngress is an imprint

More information