Structured Parallel Programming with Deterministic Patterns


1 Structured Parallel Programming with Deterministic Patterns
May 14, 2010. USENIX HotPar 2010, Berkeley, California.
Michael McCool, Software Architect, Ct Technology, Software and Services Group, Intel Corporation.
Software & Services Group, Developer Products Division. *Other brands and names are the property of their respective owners.

2 Patterns
A parallel pattern is a commonly occurring combination of task distribution and data access.
Many common programming models support either only a small number of patterns, or only low-level hardware mechanisms.
So often, common patterns are implemented only as conventions.
Observation: a small number of patterns, most of them deterministic, can support a wide range of applications.
Thesis: a system that directly supports these deterministic patterns and allows their composition can generate efficient implementations on a variety of hardware architectures.

3 Motivation for Pattern-based Design
Deterministic patterns: higher maintainability
  No need to debug race conditions if it is not possible to create them.
  Allow introduction of races only where necessary, and limit their scope.
  Determinism and consistency with a single serial execution order simplify user understanding, debugging and testing.
Application-oriented patterns: higher productivity
  Patterns are derived from common use cases in applications.
  A subset of patterns is universal: gives wide applicability.
  Patterns can also target specific domains: makes simple things simple.
Patterns encourage high-level reasoning
  Focus users on what really matters: parallelism and data locality.
  Simplifies learning how to write efficient programs.

4 Serial Patterns
The following patterns are the basis of structured programming for serial computation:
  Sequence
  Selection
  Iteration
  Recursion
  Random read
  Random write
  Stack allocation
  Heap allocation
  Objects/closures
Compositions of control flow patterns can be used in place of unstructured mechanisms such as goto.

5 Parallel Patterns
The following additional parallel patterns can be used for structured parallel programming:
  Superscalar sequence
  Speculative selection
  Map
  Recurrence/scan
  Reduce
  Pack/expand
  Nest
  Pipeline
  Partition
  Stencil
  Search/match
  Gather
  *Permutation scatter
  *Merge scatter
  !Atomic scatter
  Priority scatter

6 Sequence
A serial sequence is executed in the exact order given:
    B = f(A); C = g(B); E = p(C); F = q(A);
[Diagram: tasks f, g, p, q.]

7 Superscalar Sequence
Developer writes serial code:
    B = f(A); C = g(B); E = f(C); F = h(C); G = g(E,F); P = p(B); Q = q(B); R = r(G,P,Q);
However, tasks only need to be ordered by data dependencies.
Depends on limiting the scope of data dependencies.
[Diagram: data-dependency graph of the tasks above.]
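One way to picture the reordering is with futures: each task is launched as soon as its inputs exist, rather than in program order. The sketch below is only illustrative. The bodies of `f`, `g`, `h`, `p`, `q` and `r` are hypothetical stand-ins (the slide does not define them), and a thread pool is an assumed execution mechanism, not the one the talk describes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the slide's tasks; any pure functions would do.
def f(x): return x + 1
def g(*xs): return sum(xs) * 2
def h(x): return x - 3
def p(x): return x * x
def q(x): return -x
def r(a, b, c): return a + b + c

def serial(A):
    """The code exactly as the developer wrote it."""
    B = f(A); C = g(B); E = f(C); F = h(C)
    G = g(E, F); P = p(B); Q = q(B)
    return r(G, P, Q)

def superscalar(A):
    """Same tasks, ordered only by their data dependencies."""
    with ThreadPoolExecutor() as pool:
        B = f(A)
        # P and Q need only B, so they can overlap with the C/E/F/G chain.
        P = pool.submit(p, B)
        Q = pool.submit(q, B)
        C = g(B)
        # E and F need only C and are independent of each other.
        E = pool.submit(f, C)
        F = pool.submit(h, C)
        G = g(E.result(), F.result())
        return r(G, P.result(), Q.result())
```

Because every task is a pure function of its inputs, both versions return the same value for any input.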

8 Selection
The condition is evaluated first, then one of two tasks is executed based on the result.
    IF (c) { f } ELSE { g }
[Diagram: condition c selecting between tasks f and g.]

9 Speculative Selection
Both sides of a conditional and the condition are evaluated in parallel, then the unused branch is cancelled.
    SELECT (c) { f } ELSE { g }
Effort in the cancelled task is wasted.
Use only when a computational resource would otherwise be idle, or tasks are on the critical path.
Examples: collision culling; ray tracing; clipping; discrete event simulation; search.
[Diagram: condition c evaluated alongside tasks f and g.]

10 Map
Map replicates a function over every element of an index set (which may be abstract or associated with the elements of an array).
    A = map(f,B);
This replaces one specific usage of iteration in serial programs: processing every element of a collection with an independent operation.
Examples: gamma correction and thresholding in images; color space conversions; Monte Carlo sampling; ray tracing.
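As a minimal sketch of the map pattern, the snippet below applies an elemental function independently to every pixel value. The helper names `gamma_correct` and `parallel_map` are made up for this illustration, and the thread pool is just one assumed way to run independent invocations.

```python
from concurrent.futures import ThreadPoolExecutor

def gamma_correct(v, gamma=2.0):
    """Elemental function: gamma-correct one 8-bit pixel value."""
    return round(255 * (v / 255) ** (1 / gamma))

def parallel_map(fn, items):
    """Apply fn to every element; invocations share no state, so order of execution is irrelevant."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, items))  # results come back in input order
```

For example, `parallel_map(gamma_correct, pixels)` yields the same list as the serial loop `[gamma_correct(v) for v in pixels]`.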

11 Reduction
Reduce combines every element in a collection into one element using an associative operator.
    b = reduce(f,B);
For example, reduce can be used to find the sum or maximum of an array.
There are some variants that arise from combination with partition and search.
Examples: averaging of Monte Carlo samples; convergence testing; image comparison metrics; sub-task in matrix operations.
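Associativity is what allows the combination to be regrouped into a tree, where all pairs at one level can be combined at the same time. The following sketch (the name `tree_reduce` is an assumption of this example) performs the pairwise regrouping serially to show the structure.

```python
def tree_reduce(op, items):
    """Combine items pairwise, level by level, preserving left-to-right order."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one element")
    while len(level) > 1:
        # Every pair on this level is independent and could run in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]
```

Because operand order is preserved, the operator only needs to be associative, not commutative: string concatenation works as well as addition or `max`.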

12 Scan
Scan computes all partial reductions.
Allows parallelization of many 1D recurrences.
Requires an associative operator.
Requires about 2n work, compared with n for serial execution, but only on the order of lg n steps.
Examples: integration; sequential decision simulations in financial engineering; can also be used to implement pack.
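One well-known scheme with this work and step profile is the up-sweep/down-sweep (Blelloch) exclusive scan. The slide does not say which algorithm it has in mind, so treat the following as an assumed illustration; the function name and the padding to a power of two are choices made for this sketch.

```python
def exclusive_scan(op, identity, items):
    """Up-sweep/down-sweep exclusive scan; each inner loop is a parallel step."""
    n = len(items)
    size = 1
    while size < n:
        size *= 2
    a = list(items) + [identity] * (size - n)  # pad to a power of two

    # Up-sweep: build partial reductions in a tree (lg size steps).
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):  # independent iterations
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Down-sweep: push prefixes back down the tree (lg size steps).
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):  # independent iterations
            left_sum = a[i - d]
            a[i - d] = a[i]              # left child inherits the parent's prefix
            a[i] = op(a[i], left_sum)    # right child: prefix followed by left subtree
        d //= 2
    return a[:n]
```

With addition, `exclusive_scan(operator.add, 0, [1, 2, 3, 4, 5])` gives the running totals shifted by one position, `[0, 1, 3, 6, 10]`.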

13 Recurrences
Recurrences arise from the data dependency pattern given by nested loop-carried dependencies.
nD recurrences can always be parallelized over n-1 dimensions by Lamport's hyperplane theorem.
Execution of parallel slices can be performed either via iterative map or via wavefront parallelism.
Examples: infinite impulse response filters; sequence alignment (Smith-Waterman dynamic programming); matrix factorization.
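For a 2D recurrence in which each cell depends on its upper and left neighbours, the hyperplanes are the anti-diagonals i + j = k: all cells on one anti-diagonal are independent and form a 1D parallel slice. The sketch below uses a toy recurrence (lattice path counts) chosen for this example, not taken from the slides.

```python
def paths_serial(rows, cols):
    """Reference: T[i][j] = T[i-1][j] + T[i][j-1], computed by nested loops."""
    T = [[1] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            T[i][j] = T[i - 1][j] + T[i][j - 1]
    return T

def paths_wavefront(rows, cols):
    """Same recurrence swept by anti-diagonals (hyperplanes i + j = k)."""
    T = [[1] * cols for _ in range(rows)]
    for k in range(2, rows + cols - 1):  # serial sweep over hyperplanes
        # Cells on one anti-diagonal depend only on the previous one,
        # so this inner loop is an iterative map over independent cells.
        for i in range(max(1, k - cols + 1), min(rows, k)):
            j = k - i
            T[i][j] = T[i - 1][j] + T[i][j - 1]
    return T
```

Both functions fill the table with identical values; only the traversal order differs.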

14 Recurrences: Implementation Note
Implementation can use blocking for higher performance.
When combined with the pipeline pattern, recurrences implement wavefront computation.
Can also be combined with superscalar execution (see recent ICS paper...).

15 Partition
Partition breaks an input collection into a collection of collections.
Useful for divide-and-conquer algorithms.
Variants:
  Uniform: dice
  Non-uniform: segment
  Overlapping: tile
Issues: How to deal with boundary conditions?
Partitions don't move data, they just provide an alternative view of its organization.
Examples: JPG and other macroblock compression; divide-and-conquer matrix multiplication; coherency optimization for cone-beam reconstruction.
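The "view, not copy" property can be illustrated with Python's `memoryview`, whose slices alias the underlying buffer. This is only an analogy in a different language from the talk's setting, and the function name `dice` simply borrows the slide's term for the uniform variant.

```python
def dice(buf, size):
    """Uniform partition: split a buffer into fixed-size views without copying."""
    view = memoryview(buf)
    return [view[i:i + size] for i in range(0, len(view), size)]

data = bytearray(range(8))
parts = dice(data, 3)                # views over [0,1,2], [3,4,5], [6,7]
parts[1][0] = 99                     # writing through a view changes the original
block_sums = [sum(p) for p in parts] # partition + reduce: one reduction per block
```

The last block is shorter than the others, which is one simple instance of the boundary-condition question raised above.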

16 Stencil
Apply a function to all neighbourhoods of an array.
Neighbourhoods are given by a set of relative offsets.
Optimized implementation requires blocking and sliding windows.
Boundary modes on array accesses are useful.
Examples: image filtering including convolution, median, anisotropic diffusion; simulation including fluid flow, electromagnetic, and financial PDE solvers, lattice QCD.
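A one-dimensional sketch makes the ingredients concrete: a set of relative offsets, a function applied to each neighbourhood, and a boundary mode for out-of-range reads. The function name and the two boundary modes (`"clamp"` and `"wrap"`) are assumptions of this example, and no blocking or sliding-window optimization is attempted.

```python
def stencil(fn, data, offsets, boundary="clamp"):
    """Apply fn to the neighbourhood of every element of a 1D array."""
    n = len(data)

    def at(i):
        if 0 <= i < n:
            return data[i]
        if boundary == "clamp":   # repeat the edge value
            return data[min(max(i, 0), n - 1)]
        if boundary == "wrap":    # periodic boundary
            return data[i % n]
        raise ValueError(f"unknown boundary mode: {boundary}")

    # Each output element reads only the input, so this is a map over indices.
    return [fn([at(i + o) for o in offsets]) for i in range(n)]

def box_mean(window):
    return sum(window) / len(window)

def median3(window):
    return sorted(window)[1]
```

For instance, `stencil(median3, [1, 100, 2, 3], (-1, 0, 1))` removes the outlier and returns `[1, 2, 3, 3]`.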

17 Pipeline
Tasks can be organized in a chain with local state.
Useful for serially dependent tasks like codecs.
Whole chain is applied like map to a collection or stream.
Implementation of many sub-patterns may be optimized for pipeline execution when inside this pattern.
Examples: codecs with variable-rate compression; video processing; spam filtering.

18 Pack
Pack allows deletion of elements from a collection and elimination of unused space.
Useful when used with map and other patterns to avoid unnecessary output.
Examples: narrow-phase collision detection pair testing (only want to report valid collisions), peak detection for template matching.
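Pack is commonly built from a map that produces keep/discard flags, a scan over the flags to compute output positions, and a collision-free scatter. The sketch below follows that recipe; the function name and the use of `itertools.accumulate` as the scan are choices made for this illustration.

```python
from itertools import accumulate

def pack(items, keep):
    """Remove elements for which keep(x) is false and close up the gaps."""
    flags = [1 if keep(x) else 0 for x in items]   # map: mark survivors
    positions = list(accumulate(flags))            # inclusive scan: 1-based output slots
    out = [None] * (positions[-1] if positions else 0)
    for x, flag, pos in zip(items, flags, positions):
        if flag:
            out[pos - 1] = x   # every survivor has a unique slot, so no collisions
    return out
```

Input order is preserved: `pack([5, -1, 7, -3, 2], lambda x: x > 0)` returns `[5, 7, 2]`.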

19 Expand
Expand allows each element of a map operation to insert any number of elements (including none) into its output stream.
Useful when used with map and other patterns to support variable-rate output.
Examples: broad-phase collision detection pair testing (want to report potentially colliding pairs); compression and decompression.
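Expand can be sketched the same way as pack, with per-element output counts in place of 0/1 flags: a scan over the counts gives each element the offset where its outputs start. The function name and the run-length-decoding usage are assumptions of this example.

```python
from itertools import accumulate

def expand(items, emit):
    """Each element emits zero or more outputs; results are concatenated in order."""
    chunks = [list(emit(x)) for x in items]        # map with variable-rate output
    counts = [len(c) for c in chunks]
    offsets = [0] + list(accumulate(counts))       # exclusive scan, with the total appended
    out = [None] * offsets[-1]
    for chunk, start in zip(chunks, offsets):
        out[start:start + len(chunk)] = chunk      # disjoint ranges, so writes never collide
    return out

def run_length_decode(pairs):
    """Decompression as an expand: (symbol, count) emits the symbol count times."""
    return expand(pairs, lambda pair: [pair[0]] * pair[1])
```

For example, `run_length_decode([("a", 2), ("b", 0), ("c", 3)])` returns `['a', 'a', 'c', 'c', 'c']`.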

20 Fused Patterns
Programs are built from combinations of patterns.
Should be able to fuse patterns for performance.
May be useful to explicitly support specific combinations.
Examples:
  Gather = map + random read
  Scatter = map + random write
  Map + reduce for preprocessing before reduction
  Map + pack/expand for culling operations
  Partition + reduce for multidimensional reduction

21 Search/Match
Searching and matching are fundamental capabilities.
Use to select data for another operation, by creating a (virtual) collection or partitioned collection.
Example: category reduction reduces all elements in an array with the same label, and is the form used in Google's map-reduce.
Examples: computation of metrics on segmented regions in vision; computation of web analytics.
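Category reduction can be read as "group by label, then reduce each group". A minimal serial sketch, with a hypothetical function name and a dictionary standing in for the partitioned collection:

```python
import operator

def category_reduce(op, labels, values):
    """Reduce together all values that share a label; returns {label: result}."""
    out = {}
    for label, value in zip(labels, values):
        # The first value for a label seeds the result; later ones are combined in order.
        out[label] = op(out[label], value) if label in out else value
    return out

totals = category_reduce(operator.add,
                         ["a", "b", "a", "c", "b"],
                         [1, 2, 3, 4, 5])
```

Here `totals` is `{"a": 4, "b": 7, "c": 4}`. Each label's reduction is independent of the others, which is where the parallelism comes from.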

22 Gather
Map + Random Read
Read from a random (computed) location in an array.
When used inside a map or as a collective, becomes a parallel operation.
Views into arrays, but no global pointers.
Write-after-read semantics for kernels to avoid races.
[Diagram: input A B C D E F G; gathered output B F A C C E.]
Examples: sparse matrix operations; ray tracing; proximity queries; collision detection.
August 18, 2008
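As a collective, gather takes an array and a list of computed indices and reads them all at once. The index list below is inferred from the letters in the slide's diagram and should be taken as an assumption.

```python
def gather(src, indices):
    """out[i] = src[indices[i]]; reads never conflict, so duplicates are harmless."""
    return [src[i] for i in indices]

src = list("ABCDEFG")
out = gather(src, [1, 5, 0, 2, 2, 4])   # index 2 is read twice
```

The result is `['B', 'F', 'A', 'C', 'C', 'E']`. Unlike scatter, repeated indices cause no race because the input is only read.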

23 *!Scatter
Map + Random Write
Write into a random (computed) location in an array.
When used inside a map, becomes a parallel operation.
Race conditions are possible when there are duplicate write addresses ("collisions").
To obtain deterministic scatter, need a deterministic rule to resolve collisions.
[Diagram: input A B C D E F; scattered output C A ? F B.]
Examples: marking pairs in collision detection; handling database update transactions.

24 *Permutation Scatter
Make collisions illegal.
Only guaranteed to work if there are no duplicate addresses.
Danger is that the programmer will use it when addresses do in fact have collisions, and then will depend on undefined behaviour.
Similar safety issue as with out-of-bounds array accesses.
Can test for collisions in debug mode.
[Diagram: input A B C D E F; scattered output C A E D F B.]
Examples: FFT scrambling; matrix/image transpose; unpacking.
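The debug-mode test can be as simple as checking the address list for duplicates before writing. In this sketch the function name and the `check` flag are hypothetical, and the index list is inferred from the letters in the slide's diagram.

```python
def permutation_scatter(values, indices, size, check=True):
    """out[indices[i]] = values[i]; legal only when no address repeats."""
    if check and len(set(indices)) != len(indices):
        # Debug-mode guard: fail loudly instead of relying on undefined behaviour.
        raise ValueError("permutation scatter requires unique write addresses")
    out = [None] * size
    for value, i in zip(values, indices):
        out[i] = value   # with unique addresses the write order cannot matter
    return out

result = permutation_scatter(list("ABCDEF"), [1, 5, 0, 3, 2, 4], 6)
```

Here `result` is `['C', 'A', 'E', 'D', 'F', 'B']`. Setting `check=False` corresponds to a release build that trusts the caller.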

25 *Merge Scatter
Use an associative operator to combine values upon collision.
Problem: as with reduce, depends on the programmer to define an associative operator.
Gives non-deterministic read-modify-write when used with non-associative operators.
Due to the structured nature of other patterns, can still provide a tool to check for race conditions.
Examples: histogram; mutual information and entropy; database updates.
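A histogram is the simplest instance: every element scatters the value 1 to its bin, and colliding writes are merged with addition. The function name and signature below are assumptions of this sketch, and the loop is serial; addition is used because its result does not depend on the order in which collisions are merged.

```python
import operator

def merge_scatter(op, values, indices, size, identity):
    """Scatter values to indices, combining colliding writes with op."""
    out = [identity] * size
    for value, i in zip(values, indices):
        out[i] = op(out[i], value)   # read-modify-write at the target address
    return out

bins = [0, 2, 2, 1, 2, 0]            # bin index computed for each sample
histogram = merge_scatter(operator.add, [1] * len(bins), bins, 3, 0)
```

Here `histogram` is `[2, 1, 3]`, whatever order the six updates are applied in.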

26 !Atomic Scatter
Resolve collisions atomically but non-deterministically.
Use of this pattern will result in non-deterministic programs.
Structured nature of the rest of the patterns makes it possible to test for race conditions.
[Diagram: input A B C D E F; scattered output C A D F (B or E).]
Examples: marking pairs in collision detection; computing set intersection or union (used in text databases).

27 Priority Scatter
Assign every parallel element a priority.
NOTE: Need hierarchical structure of other patterns to do this.
Deterministically determine the winner based on priority.
When converting from serial code, priority can be based on original ordering, giving results consistent with the serial program.
Efficient implementation is similar to a hierarchical z-buffer.
[Diagram: input A B C D E F; scattered output C A E F B.]
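The sketch below uses each element's position in the original serial order as its priority, so the later write wins a collision exactly as it would in a serial loop. The function name, the `order` parameter (which simulates an arbitrary execution order), and the index list are all hypothetical.

```python
def priority_scatter(values, indices, size, order=None):
    """Scatter where the element latest in serial order wins each collision."""
    order = range(len(values)) if order is None else order
    best = [-1] * size       # highest priority written to each address so far
    out = [None] * size
    for k in order:          # elements may be processed in any order
        i = indices[k]
        if k > best[i]:      # compare priorities, like a z-buffer depth test
            best[i] = k
            out[i] = values[k]
    return out

values = list("ABCDEF")
indices = [1, 4, 0, 2, 4, 3]   # B and E collide at address 4
in_order = priority_scatter(values, indices, 5)
shuffled = priority_scatter(values, indices, 5, order=[4, 1, 0, 5, 3, 2])
```

Both calls return `['C', 'A', 'D', 'F', 'E']`, which is also what a serial loop of plain assignments would produce.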

28 Nesting

29 Conclusion
Patterns can be used to reason about and organize development of parallel algorithms and programming models.
Integrating these patterns into Ct for heterogeneous computing.
Many useful patterns are deterministic.
Compositions of deterministic patterns lead to deterministic programs.
Discussion:
  Are there a smaller number of primitive patterns?
  Are any important patterns missing?
  Can "structured" be well-defined?
  How important are non-deterministic patterns? Can any of these be considered "structured"?

30 BACKUP

31 Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference
Intel, Intel Core and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright Intel Corporation.

32 Challenge: Multiple Parallelism Mechanisms
Modern processors have many kinds of parallelism:
  Pipelining
  SIMD within a register (SWAR) vectorization
  Superscalar instruction issue or VLIW
  Overlapping memory access with computation (prefetch)
  Simultaneous multithreading (hyperthreading) per core
  Multiple cores
  Multiple processors
  Asynchronous host and accelerator execution
HPC adds: clusters, distributed memory, grid.

33 General Factors Affecting Algorithm Performance
1. Parallelism
  Choose or design a good parallel algorithm
  Large amount of latent parallelism, low serial overhead
  Asymptotically efficient
  Should scale to large number of processing elements
2. Locality
  Efficient use of the memory hierarchy
  More frequent use of faster local memory
  Coherent use of memory and data transfer
  Good alignment, predictable memory access; blocking
  High arithmetic intensity


Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability User s Guide Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 327283-001US Revision: 1.0 World

More information

Intel and Badaboom Video File Transcoding

Intel and Badaboom Video File Transcoding Solutions Intel and Badaboom Video File Transcoding Introduction Intel Quick Sync Video, built right into 2 nd generation Intel Core processors, is breakthrough hardware acceleration that lets the user

More information

13. Power Management in Stratix IV Devices

13. Power Management in Stratix IV Devices February 2011 SIV51013-3.2 13. Power Management in Stratix IV Devices SIV51013-3.2 This chapter describes power management in Stratix IV devices. Stratix IV devices oer programmable power technology options

More information

CS 416: Operating Systems Design March 9, 2015

CS 416: Operating Systems Design March 9, 2015 Page translation Operating Systems 10. Memory Management Part 2 Paging Page number, p Displacement (oset), d = page_table[p] Page Paul Krzyzanowski Rutgers University Spring 2015 CPU Logical address p

More information

C Language Constructs for Parallel Programming

C Language Constructs for Parallel Programming C Language Constructs for Parallel Programming Robert Geva 5/17/13 1 Cilk Plus Parallel tasks Easy to learn: 3 keywords Tasks, not threads Load balancing Hyper Objects Array notations Elemental Functions

More information

Classifier Evasion: Models and Open Problems

Classifier Evasion: Models and Open Problems Classiier Evasion: Models and Open Problems Blaine Nelson 1, Benjamin I. P. Rubinstein 2, Ling Huang 3, Anthony D. Joseph 1,3, and J. D. Tygar 1 1 UC Berkeley 2 Microsot Research 3 Intel Labs Berkeley

More information

Warps and Reduction Algorithms

Warps and Reduction Algorithms Warps and Reduction Algorithms 1 more on Thread Execution block partitioning into warps single-instruction, multiple-thread, and divergence 2 Parallel Reduction Algorithms computing the sum or the maximum

More information

Method estimating reflection coefficients of adaptive lattice filter and its application to system identification

Method estimating reflection coefficients of adaptive lattice filter and its application to system identification Acoust. Sci. & Tech. 28, 2 (27) PAPER #27 The Acoustical Society o Japan Method estimating relection coeicients o adaptive lattice ilter and its application to system identiication Kensaku Fujii 1;, Masaaki

More information

Eliminate Memory Errors to Improve Program Stability

Eliminate Memory Errors to Improve Program Stability Introduction INTEL PARALLEL STUDIO XE EVALUATION GUIDE This guide will illustrate how Intel Parallel Studio XE memory checking capabilities can find crucial memory defects early in the development cycle.

More information

A Novel Accurate Genetic Algorithm for Multivariable Systems

A Novel Accurate Genetic Algorithm for Multivariable Systems World Applied Sciences Journal 5 (): 137-14, 008 ISSN 1818-495 IDOSI Publications, 008 A Novel Accurate Genetic Algorithm or Multivariable Systems Abdorreza Alavi Gharahbagh and Vahid Abolghasemi Department

More information

Teaching Think Parallel

Teaching Think Parallel Teaching Think Parallel Four positive trends toward Parallel Programming, including advances in teaching/learning James Reinders, Intel April 2013 1 Tools for Parallel Programming Parallel Models Wildly

More information

More performance options

More performance options More performance options OpenCL, streaming media, and native coding options with INDE April 8, 2014 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, Intel Xeon, and Intel

More information

Installation Guide and Release Notes

Installation Guide and Release Notes Installation Guide and Release Notes Document number: 321604-002US 9 July 2010 Table of Contents 1 Introduction... 1 1.1 Product Contents... 2 1.2 What s New... 2 1.3 System Requirements... 2 1.4 Documentation...

More information

A Classification System and Analysis for Aspect-Oriented Programs

A Classification System and Analysis for Aspect-Oriented Programs A Classiication System and Analysis or Aspect-Oriented Programs Martin Rinard, Alexandru Sălcianu, and Suhabe Bugrara Massachusetts Institute o Technology Cambridge, MA 02139 ABSTRACT We present a new

More information

CS485/685 Computer Vision Spring 2012 Dr. George Bebis Programming Assignment 2 Due Date: 3/27/2012

CS485/685 Computer Vision Spring 2012 Dr. George Bebis Programming Assignment 2 Due Date: 3/27/2012 CS8/68 Computer Vision Spring 0 Dr. George Bebis Programming Assignment Due Date: /7/0 In this assignment, you will implement an algorithm or normalizing ace image using SVD. Face normalization is a required

More information

Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur

Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur 1 Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur Sven Woop, Carsten Benthin, Ingo Wald, Gregory S. Johnson Intel Corporation Eric Tabellion DreamWorks Animation 2 Legal

More information

Total No. of Questions : 18] [Total No. of Pages : 02. M.Sc. DEGREE EXAMINATION, DEC First Year COMPUTER SCIENCE.

Total No. of Questions : 18] [Total No. of Pages : 02. M.Sc. DEGREE EXAMINATION, DEC First Year COMPUTER SCIENCE. (DMCS01) Total No. of Questions : 18] [Total No. of Pages : 02 M.Sc. DEGREE EXAMINATION, DEC. 2016 First Year COMPUTER SCIENCE Data Structures Time : 3 Hours Maximum Marks : 70 Section - A (3 x 15 = 45)

More information

Intel Array Building Blocks Technical Presentation: Code Tips

Intel Array Building Blocks Technical Presentation: Code Tips Intel Array Building Blocks Technical Presentation: Code Tips Zhang Zhang Noah Clemons {zhang.zhang, noah.clemons}@intel.com 1 Intel compilers, associated libraries and associated development tools may

More information

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor * Some names and brands may be claimed as the property of others. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor E.J. Bylaska 1, M. Jacquelin

More information

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Motivation C AVX2 AVX512 New instructions utilized! Scalar performance

More information

Intel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version )

Intel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version ) Intel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version 1.0.10) Overview New Features System Requirements Installation Installation Folders How To Use Supported Formats Known

More information

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS

More information

Getting Started with Intel SDK for OpenCL Applications

Getting Started with Intel SDK for OpenCL Applications Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel

More information

Intel Server Board S2600CW2S

Intel Server Board S2600CW2S Redhat* Testing Services Enterprise Platforms and Services Division Intel Server Board S2600CW2S Server Test Submission (STS) Report For Redhat* Certification Rev 1.0 This report describes the Intel Server

More information

Intel MKL Data Fitting component. Overview

Intel MKL Data Fitting component. Overview Intel MKL Data Fitting component. Overview Intel Corporation 1 Agenda 1D interpolation problem statement Functional decomposition of the problem Application areas Data Fitting in Intel MKL Data Fitting

More information

H.J. Lu, Sunil K Pandey. Intel. November, 2018

H.J. Lu, Sunil K Pandey. Intel. November, 2018 H.J. Lu, Sunil K Pandey Intel November, 2018 Issues with Run-time Library on IA Memory, string and math functions in today s glibc are optimized for today s Intel processors: AVX/AVX2/AVX512 FMA It takes

More information

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters TIPL 4703 Presented by Ken Chan Prepared by Ken Chan 1 Table o Contents What is SNR Deinition o SNR Components

More information

Intel Desktop Board DZ68DB

Intel Desktop Board DZ68DB Intel Desktop Board DZ68DB Specification Update April 2011 Part Number: G31558-001 The Intel Desktop Board DZ68DB may contain design defects or errors known as errata, which may cause the product to deviate

More information

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Intel Cluster Ready Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Legal Disclaimer Intel may make changes to specifications and product descriptions at any time, without notice.

More information

Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation

Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided

More information

What s New August 2015

What s New August 2015 What s New August 2015 Significant New Features New Directory Structure OpenMP* 4.1 Extensions C11 Standard Support More C++14 Standard Support Fortran 2008 Submodules and IMPURE ELEMENTAL Further C Interoperability

More information

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017 Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application

More information

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3

More information

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones HOT CHIPS 2009 August 24 2009 Rajesh Patel Lead Architect, Lincroft SoC Intel Corporation Legal Disclaimer INFORMATION

More information

High Performance Multiprocessor System

High Performance Multiprocessor System High Performance Multiprocessor System Requirements : - Large Number of Processors ( 4) - Large WriteBack Caches for Each Processor. Less Bus Traffic => Higher Performance - Large Shared Main Memories

More information
