Structured Parallel Programming with Deterministic Patterns
1 Structured Parallel Programming with Deterministic Patterns. May 14, 2010, USENIX HotPar 2010, Berkeley, California. Michael McCool, Software Architect, Ct Technology, Software and Services Group, Intel Corporation. Software & Services Group, Developer Products Division. *Other brands and names are the property of their respective owners.
2 Patterns. A parallel pattern is a commonly occurring combination of task distribution and data access. Many common programming models support either only a small number of patterns, or only low-level hardware mechanisms, so common patterns are often implemented only as conventions. Observation: a small number of patterns, most of them deterministic, can support a wide range of applications. Thesis: a system that directly supports these deterministic patterns and allows their composition can generate efficient implementations on a variety of hardware architectures.
3 Motivation for Pattern-based Design. Deterministic patterns give higher maintainability: there is no need to debug race conditions if it is not possible to create them; races are introduced only where necessary, with limited scope; determinism and consistency with a single serial execution order simplify user understanding, debugging and testing. Application-oriented patterns give higher productivity: patterns are derived from common use cases in applications; a subset of patterns is universal, which gives wide applicability; patterns can also target specific domains, which makes simple things simple. Patterns encourage high-level reasoning: they focus users on what really matters, parallelism and data locality, and simplify learning how to write efficient programs.
4 Serial Patterns. The following patterns are the basis of structured programming for serial computation: sequence, selection, iteration, recursion, random read, random write, stack allocation, heap allocation, objects/closures. Compositions of control flow patterns can be used in place of unstructured mechanisms such as goto.
5 Parallel Patterns. The following additional parallel patterns can be used for structured parallel programming: superscalar sequence, speculative selection, map, recurrence/scan, reduce, pack/expand, nest, pipeline, partition, stencil, search/match, gather, *permutation scatter, *merge scatter, !atomic scatter, priority scatter.
6 Sequence. A serial sequence is executed in the exact order given: B = f(A); C = g(B); E = p(C); F = q(A);
7 Superscalar Sequence. Developer writes serial code: B = f(A); C = g(B); E = f(C); F = h(C); G = g(E,F); P = p(B); Q = q(B); R = r(G,P,Q); However, tasks only need to be ordered by data dependencies. Depends on limiting the scope of data dependencies.
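The dependency-ordered execution described on this slide can be sketched in Python with the standard-library thread pool. This is only an illustration under stated assumptions: the bodies of f, g, h, p, q and r are placeholders (the slide does not define them), and `run_superscalar` and `task` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder task bodies (assumptions; the slide does not define them).
def f(x): return x + 1
def g(x, y=0): return x * 2 + y
def h(x): return x - 3
def p(x): return x * x
def q(x): return -x
def r(a, b, c): return a + b + c

def run_superscalar(A):
    # One worker per task, so a task that waits on its inputs never
    # starves the tasks that produce those inputs.
    with ThreadPoolExecutor(max_workers=8) as pool:
        def task(fn, *deps):
            # Each task blocks only on the futures it actually reads.
            return pool.submit(lambda: fn(*(d.result() for d in deps)))

        B = pool.submit(f, A)
        C = task(g, B)
        E = task(f, C)
        F = task(h, C)      # E and F both depend only on C
        G = task(g, E, F)
        P = task(p, B)      # P and Q depend only on B
        Q = task(q, B)
        R = task(r, G, P, Q)
        return R.result()
```

The code is written in the same order as the serial program; only the data dependencies constrain when each task can actually run.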
8 Selection. The condition is evaluated first, then one of two tasks is executed based on the result. IF (c) { f } ELSE { g }
9 Speculative Selection. Both sides of a conditional and the condition are evaluated in parallel, then the unused branch is cancelled. SELECT (c) { f } ELSE { g } Effort in the cancelled task is wasted. Use only when a computational resource would otherwise be idle, or when tasks are on the critical path. Examples: collision culling; ray tracing; clipping; discrete event simulation; search.
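A rough Python sketch of speculative selection follows, assuming the condition and both branches are side-effect-free callables. The name `speculative_select` is hypothetical. Note that Python threads cannot be interrupted, so cancellation here is best effort only.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_select(cond, then_task, else_task):
    """Evaluate the condition and both branches concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        c = pool.submit(cond)
        t = pool.submit(then_task)
        e = pool.submit(else_task)
        chosen, unused = (t, e) if c.result() else (e, t)
        # cancel() only succeeds if the task has not started yet;
        # a branch that is already running finishes and its work is wasted.
        unused.cancel()
        return chosen.result()
```

For example, `speculative_select(lambda: x > 0, compute_f, compute_g)` would return the result of `compute_f` when `x > 0`, where `compute_f` and `compute_g` stand in for the two branches.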
10 Map. Map replicates a function over every element of an index set (which may be abstract or associated with the elements of an array). A = map(f, B); This replaces one specific usage of iteration in serial programs: processing every element of a collection with an independent operation. Examples: gamma correction and thresholding in images; color space conversions; Monte Carlo sampling; ray tracing.
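As a small illustration of map, the sketch below applies an image-thresholding function independently to every pixel using the standard-library executor. The names `parallel_map` and `threshold`, and the cutoff of 128, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(f, B):
    """Apply f to every element of B; elements are independent."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(f, B))   # results keep the input order

def threshold(pixel, cutoff=128):
    # Hypothetical elemental function: binarize one grey-level pixel.
    return 255 if pixel >= cutoff else 0
```

Because no element depends on another, the result is the same regardless of how the work is scheduled.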
11 Reduction. Reduce combines every element in a collection into one element using an associative operator. b = reduce(f, B); For example, reduce can be used to find the sum or maximum of an array. There are some variants that arise from combination with partition and search. Examples: averaging of Monte Carlo samples; convergence testing; image comparison metrics; sub-task in matrix operations.
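Associativity is what allows a reduction to be regrouped into a tree. The following sketch (the name `tree_reduce` is hypothetical) runs the levels serially, but every pair within a level is independent and could be combined in parallel.

```python
import operator

def tree_reduce(op, values):
    """Combine values pairwise, level by level; op must be associative."""
    if not values:
        raise ValueError("tree_reduce needs at least one element")
    level = list(values)
    while len(level) > 1:
        # All pairs in this level are independent of each other.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element is carried to the next level
        level = nxt
    return level[0]
```

The operator does not need to be commutative here, since neighbours are always combined in their original left-to-right order.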
12 Scan. Scan computes all partial reductions. Allows parallelization of many 1D recurrences. Requires an associative operator. Requires 2n work over serial execution, but lg n steps. Examples: integration; sequential decision simulations in financial engineering. Can also be used to implement pack.
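One well-known way to get a scan with roughly 2n operator applications and a logarithmic number of steps is the up-sweep/down-sweep scheme usually attributed to Blelloch. The sketch below is an assumption about which algorithm fits the slide, not a statement of what the talk used. It computes the exclusive form (element i receives the reduction of everything before it), and the name `blelloch_scan` and the explicit `identity` argument are choices made for this example.

```python
def blelloch_scan(op, identity, values):
    """Exclusive scan via up-sweep and down-sweep over a padded array."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [identity] * (size - n)   # pad to a power of two

    # Up-sweep: build a reduction tree in place.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):  # independent iterations
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Down-sweep: push prefixes back down the tree.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):  # independent iterations
            left = a[i - d]
            a[i - d] = a[i]
            a[i] = op(a[i], left)
        d //= 2
    return a[:n]
```

Each inner loop is a map over independent indices, so only the two outer loops, with about lg n iterations each, are sequential.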
13 Recurrences. Recurrences arise from the data dependency pattern given by nested loop-carried dependencies. n-D recurrences can always be parallelized over n-1 dimensions by Lamport's hyperplane theorem. Execution of parallel slices can be performed either via iterative map or via wavefront parallelism. Examples: infinite impulse response filters; sequence alignment (Smith-Waterman dynamic programming); matrix factorization.
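To make the hyperplane idea concrete, the sketch below sweeps a 2D recurrence along anti-diagonals. It uses longest-common-subsequence length as a deliberately simpler stand-in for the Smith-Waterman alignment named on the slide; the dependency pattern (each cell reads its left, upper and upper-left neighbours) is the same. The function name `lcs_wavefront` is hypothetical.

```python
def lcs_wavefront(s, t):
    """Length of the longest common subsequence, computed by wavefronts."""
    n, m = len(s), len(t)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    # Sweep the hyperplanes i + j = k. Cells on one hyperplane read only
    # cells from hyperplanes k-1 and k-2, so they are mutually independent
    # and the inner loop could be executed as a parallel map.
    for k in range(2, n + m + 1):
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            if s[i - 1] == t[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[n][m]
```

A 2D recurrence is thus parallelized over one dimension: the outer loop over k stays serial, while each diagonal is processed as a whole.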
14 Recurrences: Implementation Note. Implementation can use blocking for higher performance. When combined with the pipeline pattern, recurrences implement wavefront computation. Can also be combined with superscalar execution (see recent ICS paper...).
15 Partition. Partition breaks an input collection into a collection of collections. Useful for divide-and-conquer algorithms. Variants: uniform (dice), non-uniform (segment), overlapping (tile). Issues: how to deal with boundary conditions? Partitions don't move data, they just provide an alternative view of its organization. Examples: JPG and other macroblock compression; divide-and-conquer matrix multiplication; coherency optimization for cone-beam reconstruction.
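The three partition variants can be sketched as index views that never copy the underlying data. In this illustration Python `range` objects play the role of views; the function signatures, the `halo` parameter, and the choice to clamp overlapping tiles at the array ends are all assumptions.

```python
def dice(n, size):
    """Uniform partition of indices 0..n-1 into blocks of `size`."""
    return [range(lo, min(lo + size, n)) for lo in range(0, n, size)]

def segment(boundaries):
    """Non-uniform partition given ascending boundary indices."""
    return [range(lo, hi) for lo, hi in zip(boundaries, boundaries[1:])]

def tile(n, size, halo):
    """Overlapping partition: each block is widened by `halo` on both
    sides and clamped to the valid index range at the boundaries."""
    return [range(max(lo - halo, 0), min(lo + size + halo, n))
            for lo in range(0, n, size)]
```

Clamping is only one possible answer to the boundary question raised on the slide; other choices, such as padding, would change `tile` but not the idea of a view.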
16 Stencil. Apply a function to all neighbourhoods of an array. Neighbourhoods are given by a set of relative offsets. Optimized implementation requires blocking and sliding windows. Boundary modes on array accesses are useful. Examples: image filtering including convolution, median, anisotropic diffusion; simulation including fluid flow, electromagnetic, and financial PDE solvers, lattice QCD.
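A 1D stencil with relative offsets and a boundary mode might look like the sketch below. It is unoptimized (no blocking or sliding windows), and the mode names "clamp" and "wrap" as well as the helper names are assumptions made for the example.

```python
def stencil(func, data, offsets, boundary="clamp"):
    """Apply func to the neighbourhood of every element of data."""
    n = len(data)

    def read(i):
        # Boundary mode decides what an out-of-range access returns.
        if 0 <= i < n:
            return data[i]
        if boundary == "clamp":
            return data[min(max(i, 0), n - 1)]
        if boundary == "wrap":
            return data[i % n]
        raise ValueError("unknown boundary mode: " + boundary)

    # Every output element is independent, so this is a map over indices.
    return [func([read(i + o) for o in offsets]) for i in range(n)]

def mean(window):
    return sum(window) / len(window)
```

For instance, `stencil(mean, signal, (-1, 0, 1))` is a three-point box filter, and passing a median function instead gives a median filter with the same neighbourhood.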
17 Pipeline. Tasks can be organized in a chain with local state. Useful for serially dependent tasks like codecs. The whole chain is applied like map to a collection or stream. Implementation of many sub-patterns may be optimized for pipeline execution when inside this pattern. Examples: codecs with variable-rate compression; video processing; spam filtering.
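The structure of a pipeline, a chain of stages that each keep local state, can be sketched with Python generators. The delta codec stages are hypothetical examples. Generators interleave the stages on a single thread, so this shows the composition only; a parallel implementation would run each stage on its own worker.

```python
def delta_encode(stream):
    prev = 0                      # local state carried between items
    for x in stream:
        yield x - prev
        prev = x

def delta_decode(stream):
    total = 0                     # local state carried between items
    for d in stream:
        total += d
        yield total

def pipeline(source, *stages):
    """Chain stages so each item flows through all of them in order."""
    for stage in stages:
        source = stage(source)
    return source
```

Because each stage depends on its own previous item, a single stage cannot be turned into a map, but different stages can work on different items at the same time.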
18 Pack. Pack allows deletion of elements from a collection and elimination of unused space. Useful when used with map and other patterns to avoid unnecessary output. Examples: narrow-phase collision detection pair testing (only want to report valid collisions); peak detection for template matching.
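Pack is commonly built from a map, an exclusive scan and a collision-free scatter. The sketch below follows that decomposition using `itertools.accumulate` for the scan; the function name `pack` and the predicate argument `keep` are assumptions.

```python
from itertools import accumulate

def pack(values, keep):
    """Delete elements for which keep(v) is false, closing the gaps."""
    flags = [1 if keep(v) else 0 for v in values]            # map
    # Exclusive scan of the flags gives each kept element its output slot.
    slots = [0] + list(accumulate(flags))[:-1] if flags else []
    out = [None] * sum(flags)
    for v, flag, slot in zip(values, flags, slots):          # scatter
        if flag:
            out[slot] = v        # slots are unique, so there are no collisions
    return out
```

The output keeps the relative order of the surviving elements, which makes the result independent of how the scatter is scheduled.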
19 Expand. Expand allows each element of a map operation to insert any number of elements (including none) into its output stream. Useful when used with map and other patterns to support variable-rate output. Examples: broad-phase collision detection pair testing (want to report potentially colliding pairs); compression and decompression.
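Expand can be sketched in the same style as pack: each element first produces its own variable-length output, then a scan over the output counts assigns every element a starting offset. The name `expand` and the run-length decoding used in the note below are assumptions chosen to match the decompression example on the slide.

```python
from itertools import accumulate

def expand(values, emit):
    """Let every element emit zero or more outputs, kept in input order."""
    pieces = [list(emit(v)) for v in values]                 # map
    counts = [len(piece) for piece in pieces]
    # Exclusive scan of the counts gives each element its start offset.
    starts = [0] + list(accumulate(counts))[:-1] if counts else []
    out = [None] * sum(counts)
    for piece, start in zip(pieces, starts):
        out[start:start + len(piece)] = piece                # disjoint ranges
    return out
```

For example, `expand([("a", 2), ("b", 0), ("c", 3)], lambda rc: rc[0] * rc[1])` decodes run-length pairs into `['a', 'a', 'c', 'c', 'c']`.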
20 Fused Patterns. Programs are built from combinations of patterns. Should be able to fuse patterns for performance. May be useful to explicitly support specific combinations. Examples: gather = map + random read; scatter = map + random write; map + reduce for preprocessing before reduction; map + pack/expand for culling operations; partition + reduce for multidimensional reduction.
21 Search/Match. Searching and matching are fundamental capabilities. Use to select data for another operation, by creating a (virtual) collection or partitioned collection. Example: category reduction reduces all elements in an array with the same label, and is the form used in Google's map-reduce. Examples: computation of metrics on segmented regions in vision; computation of web analytics.
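Category reduction can be illustrated with a short serial sketch: all values that share a label are combined with one associative operator. The name `category_reduce` and the dictionary result are assumptions; a parallel version would first group elements by label and then reduce each group independently.

```python
import operator

def category_reduce(op, labels, values):
    """Reduce all values that carry the same label into one result."""
    result = {}
    for label, value in zip(labels, values):
        if label in result:
            result[label] = op(result[label], value)
        else:
            result[label] = value
    return result
```

With `operator.add` this computes per-label sums, for example the total area of each segmented region given a label per pixel.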
22 Gather. Map + random read. Read from a random (computed) location in an array. When used inside a map or as a collective, becomes a parallel operation. Views into arrays, but no global pointers. Write-after-read semantics for kernels to avoid races. Diagram: gathering from A B C D E F G yields B F A C C E. Examples: sparse matrix operations; ray tracing; proximity queries; collision detection.
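Gather is simple enough to state in one line of Python. The index list below is an assumption chosen so that the sketch reproduces the letters in the slide's diagram.

```python
def gather(source, indices):
    """Read source at every computed index; reads never conflict."""
    return [source[i] for i in indices]

# Reproduces the slide's diagram: A..G gathered into B F A C C E.
letters = gather(list("ABCDEFG"), [1, 5, 0, 2, 2, 4])
```

Duplicate indices are harmless for gather because the same location is only read, never written.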
23 *!Scatter. Map + random write. Write into a random (computed) location in an array. When used inside a map, becomes a parallel operation. Race conditions are possible when there are duplicate write addresses ("collisions"). To obtain deterministic scatter, need a deterministic rule to resolve collisions. Diagram: scattering A B C D E F yields C A ? F B, where ? marks a collision. Examples: marking pairs in collision detection; handling database update transactions.
24 *Permutation Scatter. Make collisions illegal. Only guaranteed to work if there are no duplicate addresses. The danger is that the programmer will use it when addresses do in fact have collisions, and will then depend on undefined behaviour. Similar safety issue as with out-of-bounds array accesses. Can test for collisions in debug mode. Diagram: scattering A B C D E F yields C A E D F B. Examples: FFT scrambling; matrix/image transpose; unpacking.
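The debug-mode collision test mentioned on this slide could look like the sketch below. The `debug` flag and the function name are assumptions, and the index list is chosen to reproduce the slide's diagram.

```python
def permutation_scatter(values, indices, size, debug=True):
    """Write values[k] to out[indices[k]]; duplicate indices are illegal."""
    if debug and len(set(indices)) != len(indices):
        raise ValueError("permutation scatter: duplicate write addresses")
    out = [None] * size
    for v, i in zip(values, indices):
        out[i] = v               # no two writes target the same slot
    return out

# Reproduces the slide's diagram: A..F scattered into C A E D F B.
scrambled = permutation_scatter(list("ABCDEF"), [1, 5, 0, 3, 2, 4], 6)
```

With the check disabled, a colliding index list would silently produce an order-dependent result, which is the hazard the slide warns about.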
25 *Merge Scatter. Use an associative operator to combine values upon collision. Problem: as with reduce, depends on the programmer to define an associative operator. Gives non-deterministic read-modify-write when used with non-associative operators. Due to the structured nature of the other patterns, can still provide a tool to check for race conditions. Examples: histogram; mutual information and entropy; database updates.
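A histogram is the standard illustration of merge scatter: every element writes a 1 to its bin, and colliding writes are combined with addition. The sketch below is serial, so collisions are merged in input order; the names `merge_scatter` and `histogram` and the explicit `identity` argument are assumptions.

```python
import operator

def merge_scatter(op, identity, values, indices, size):
    """Scatter values, combining colliding writes with op."""
    out = [identity] * size
    for v, i in zip(values, indices):
        out[i] = op(out[i], v)   # read-modify-write on collision
    return out

def histogram(bins, size):
    # Every element contributes 1 to the slot named by its bin index.
    return merge_scatter(operator.add, 0, [1] * len(bins), bins, size)
```

A parallel implementation would apply the same operator to writes that arrive in an unspecified grouping, which is why the slide requires it to be associative.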
26 !Atomic Scatter. Resolve collisions atomically but non-deterministically. Use of this pattern will result in non-deterministic programs. The structured nature of the rest of the patterns makes it possible to test for race conditions. Diagram: scattering A B C D E F yields C A (D or E) F B. Examples: marking pairs in collision detection; computing set intersection or union (used in text databases).
27 Priority Scatter. Assign every parallel element a priority. NOTE: need the hierarchical structure of the other patterns to do this. Deterministically determine the winner based on priority. When converting from serial code, priority can be based on the original ordering, giving results consistent with the serial program. Efficient implementation is similar to a hierarchical z-buffer. Diagram: scattering A B C D E F yields C A E F B.
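The sketch below shows one way to resolve collisions by priority. By default the priority is the element's position in the serial order, so the later write wins, as it would in a serial loop. The function name, the `priorities` parameter and the index list (chosen to match the slide's diagram, where D and E collide) are assumptions.

```python
def priority_scatter(values, indices, size, priorities=None):
    """Scatter values; on collision the highest priority wins."""
    if priorities is None:
        # Serial order as priority: a later write overwrites an earlier one.
        priorities = range(len(values))
    out = [None] * size
    best = [None] * size         # priority of the current winner per slot
    for v, i, pr in zip(values, indices, priorities):
        if best[i] is None or pr > best[i]:
            best[i], out[i] = pr, v
    return out

# D and E both target slot 2; E comes later in serial order and wins.
result = priority_scatter(list("ABCDEF"), [1, 4, 0, 2, 2, 3], 5)
```

Because the winner depends only on the priorities, processing the elements in any other order gives the same output.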
28 Nesting
29 Conclusion. Patterns can be used to reason about and organize development of parallel algorithms and programming models. Integrating these patterns into Ct for heterogeneous computing. Many useful patterns are deterministic. Compositions of deterministic patterns lead to deterministic programs. Discussion: Are there a smaller number of primitive patterns? Are any important patterns missing? Can "structured" be well-defined? How important are non-deterministic patterns? Can any of these be considered "structured"?
30 BACKUP
31 Legal Disclaimer. INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference [...]. Intel, Intel Core and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright Intel Corporation.
32 Challenge: Multiple Parallelism Mechanisms. Modern processors have many kinds of parallelism: pipelining; SIMD within a register (SWAR) vectorization; superscalar instruction issue or VLIW; overlapping memory access with computation (prefetch); simultaneous multithreading (hyperthreading) per core; multiple cores; multiple processors; asynchronous host and accelerator execution. HPC adds: clusters, distributed memory, grid.
33 General Factors Affecting Algorithm Performance. 1. Parallelism: choose or design a good parallel algorithm; large amount of latent parallelism, low serial overhead; asymptotically efficient; should scale to a large number of processing elements. 2. Locality: efficient use of the memory hierarchy; more frequent use of faster local memory; coherent use of memory and data transfer; good alignment, predictable memory access; blocking; high arithmetic intensity.
CS8/68 Computer Vision Spring 0 Dr. George Bebis Programming Assignment Due Date: /7/0 In this assignment, you will implement an algorithm or normalizing ace image using SVD. Face normalization is a required
More informationExploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur
1 Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur Sven Woop, Carsten Benthin, Ingo Wald, Gregory S. Johnson Intel Corporation Eric Tabellion DreamWorks Animation 2 Legal
More informationTotal No. of Questions : 18] [Total No. of Pages : 02. M.Sc. DEGREE EXAMINATION, DEC First Year COMPUTER SCIENCE.
(DMCS01) Total No. of Questions : 18] [Total No. of Pages : 02 M.Sc. DEGREE EXAMINATION, DEC. 2016 First Year COMPUTER SCIENCE Data Structures Time : 3 Hours Maximum Marks : 70 Section - A (3 x 15 = 45)
More informationIntel Array Building Blocks Technical Presentation: Code Tips
Intel Array Building Blocks Technical Presentation: Code Tips Zhang Zhang Noah Clemons {zhang.zhang, noah.clemons}@intel.com 1 Intel compilers, associated libraries and associated development tools may
More informationPerformance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor
* Some names and brands may be claimed as the property of others. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor E.J. Bylaska 1, M. Jacquelin
More informationGuy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany
Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Motivation C AVX2 AVX512 New instructions utilized! Scalar performance
More informationIntel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version )
Intel Media Server Studio 2018 R1 - HEVC Decoder and Encoder Release Notes (Version 1.0.10) Overview New Features System Requirements Installation Installation Folders How To Use Supported Formats Known
More informationIntel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor
Technical Resources Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPETY RIGHTS
More informationGetting Started with Intel SDK for OpenCL Applications
Getting Started with Intel SDK for OpenCL Applications Webinar #1 in the Three-part OpenCL Webinar Series July 11, 2012 Register Now for All Webinars in the Series Welcome to Getting Started with Intel
More informationIntel Server Board S2600CW2S
Redhat* Testing Services Enterprise Platforms and Services Division Intel Server Board S2600CW2S Server Test Submission (STS) Report For Redhat* Certification Rev 1.0 This report describes the Intel Server
More informationIntel MKL Data Fitting component. Overview
Intel MKL Data Fitting component. Overview Intel Corporation 1 Agenda 1D interpolation problem statement Functional decomposition of the problem Application areas Data Fitting in Intel MKL Data Fitting
More informationH.J. Lu, Sunil K Pandey. Intel. November, 2018
H.J. Lu, Sunil K Pandey Intel November, 2018 Issues with Run-time Library on IA Memory, string and math functions in today s glibc are optimized for today s Intel processors: AVX/AVX2/AVX512 FMA It takes
More informationUnderstanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters
Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters TIPL 4703 Presented by Ken Chan Prepared by Ken Chan 1 Table o Contents What is SNR Deinition o SNR Components
More informationIntel Desktop Board DZ68DB
Intel Desktop Board DZ68DB Specification Update April 2011 Part Number: G31558-001 The Intel Desktop Board DZ68DB may contain design defects or errors known as errata, which may cause the product to deviate
More informationExpand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready
Intel Cluster Ready Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready Legal Disclaimer Intel may make changes to specifications and product descriptions at any time, without notice.
More informationUsing MMX Instructions to Compute the AbsoluteDifference in Motion Estimation
Using MMX Instructions to Compute the AbsoluteDifference in Motion Estimation Information for Developers and ISVs From Intel Developer Services www.intel.com/ids Information in this document is provided
More informationWhat s New August 2015
What s New August 2015 Significant New Features New Directory Structure OpenMP* 4.1 Extensions C11 Standard Support More C++14 Standard Support Fortran 2008 Submodules and IMPURE ELEMENTAL Further C Interoperability
More informationBecca Paren Cluster Systems Engineer Software and Services Group. May 2017
Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application
More informationIntel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth
Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3
More informationMoorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones
Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones HOT CHIPS 2009 August 24 2009 Rajesh Patel Lead Architect, Lincroft SoC Intel Corporation Legal Disclaimer INFORMATION
More informationHigh Performance Multiprocessor System
High Performance Multiprocessor System Requirements : - Large Number of Processors ( 4) - Large WriteBack Caches for Each Processor. Less Bus Traffic => Higher Performance - Large Shared Main Memories
More informationStructured Parallel Programming with Deterministic Patterns
Structured Parallel Programming with Deterministic Patterns Michael D. McCool, Intel, michael.mccool@intel.com Many-core processors target improved computational performance by making available various
More informationSection II. Nios II Software Development
Section II. Nios II Sotware Development This section o the Embedded Design Handbook describes how to most eectively use the Altera tools or embedded system sotware development, and recommends design styles
More informationMemory & Thread Debugger
Memory & Thread Debugger Here is What Will Be Covered Overview Memory/Thread analysis New Features Deep dive into debugger integrations Demo Call to action Intel Confidential 2 Analysis Tools for Diagnosis
More informationLNet Roadmap & Development. Amir Shehata Lustre * Network Engineer Intel High Performance Data Division
LNet Roadmap & Development Amir Shehata Lustre * Network Engineer Intel High Performance Data Division Outline LNet Roadmap Non-contiguous buffer support Map-on-Demand re-work 2 LNet Roadmap (2.12) LNet
More informationIntel Desktop Board D2700DC. PMLP Report. Previously Logo d Motherboard Logo Program (PMLP)
Previously Logo d Motherboard Logo Program (PMLP) Intel Desktop Board D2700DC PMLP Report 1/12/2012 Purpose: This report describes the Board D2700DC Previously Logo d Motherboard Logo Program testing run
More informationIntel C++ Compiler Professional Edition 11.0 for Linux* In-Depth
Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition for Linux*...3 Intel C++ Compiler Professional Edition Components:...3 Features...3 New
More informationSIMULATION OPTIMIZER AND OPTIMIZATION METHODS TESTING ON DISCRETE EVENT SIMULATIONS MODELS AND TESTING FUNCTIONS
SIMULATION OPTIMIZER AND OPTIMIZATION METHODS TESTING ON DISCRETE EVENT SIMULATIONS MODELS AND TESTING UNCTIONS Pavel Raska (a), Zdenek Ulrych (b), Petr Horesi (c) (a) Department o Industrial Engineering
More informationAgenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP
More informationSoftware Evaluation Guide for CyberLink MediaEspresso *
Software Evaluation Guide for CyberLink MediaEspresso 6.7.3521* Version 2013-04 Rev. 1.3 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel
More informationParallelism in Software
Parallelism in Software Minsoo Ryu Department of Computer Science and Engineering 2 1 Parallelism in Software 2 Creating a Multicore Program 3 Multicore Design Patterns 4 Q & A 2 3 Types of Parallelism
More informationIntel Open Source HD Graphics, Intel Iris Graphics, and Intel Iris Pro Graphics
Intel Open Source HD Graphics, Intel Iris Graphics, and Intel Iris Pro Graphics Programmer's Reference Manual For the 2015-2016 Intel Core Processors, Celeron Processors, and Pentium Processors based on
More informationCollecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers
Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers Collecting Important OpenCL*-related Metrics with Intel GPA System Analyzer Introduction Intel SDK for OpenCL* Applications
More informationUsing Intel VTune Amplifier XE and Inspector XE in.net environment
Using Intel VTune Amplifier XE and Inspector XE in.net environment Levent Akyil Technical Computing, Analyzers and Runtime Software and Services group 1 Refresher - Intel VTune Amplifier XE Intel Inspector
More informationBinary recursion. Unate functions. If a cover C(f) is unate in xj, x, then f is unate in xj. x
Binary recursion Unate unctions! Theorem I a cover C() is unate in,, then is unate in.! Theorem I is unate in,, then every prime implicant o is unate in. Why are unate unctions so special?! Special Boolean
More informationWhat s P. Thierry
What s new@intel P. Thierry Principal Engineer, Intel Corp philippe.thierry@intel.com CPU trend Memory update Software Characterization in 30 mn 10 000 feet view CPU : Range of few TF/s and
More informationCPSC / Sonny Chan - University of Calgary. Collision Detection II
CPSC 599.86 / 601.86 Sonny Chan - University of Calgary Collision Detection II Outline Broad phase collision detection: - Problem definition and motivation - Bounding volume hierarchies - Spatial partitioning
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More information