Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB

1 Beyond Threads: Scalable, Composable, Parallelism with Intel Cilk Plus and TBB
Jim Cownie
Intel SSG/DPD/TCAR

2 Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #

3 Performance Trends
After ~2004 only the number of transistors continues to increase
We have hit limits in
  Power
  Instruction level parallelism
  Clock speed
Single core scalar performance is now only growing slowly

4 But Moore's law is alive and well
We will have lots of transistors!
New Intel technology generation every 2 years
Intel R&D technologies drive this pace well into the decade
[Figure: process technology roadmap timeline; recoverable labels: 90nm, 2007, 32nm, Hi-K metal-gate, 22nm, Trigate, 14nm, 2015, Shrink]

5 How do we use all the transistors?
Eat other system components
  Graphics
  Memory i/f
  PCI i/f
Add cache
Replicate cores
This is a desktop part, but it has four cores, each with two HW threads and 256-bit (8 single or 4 double) SIMD FP units
Number of cores will continue to increase in the future
Data and thread parallelism are mandatory to achieve highest performance

6 How do we use all the transistors for HPC?
Many Integrated Core (MIC, aka Knights)
  >50 cache coherent cores, 4 HW threads/core
  512-bit vector FPU/core
  22nm process
  Extended x86 ISA
  Linux kernel
  Fortran, C, C++, Cilk, OpenMP, MPI, ...
Demonstrated >1 TFlop sustained on DGEMM
Data and thread parallelism are even more important here!

7 Exascale trends
US government wants 1 ExaFlop in 20MW in 2018
Critical issues
  Power (requires 300x improvement in energy efficiency!)
  Reliability
  Programmability (MPI + what?)
  Did I mention Power?
Architecture: Cluster of SMP nodes
  Each node will have lots (100s..1000s?) of cores
  Each core will have wide vector units
Data and thread parallelism become more important
Homogeneous MPI parallelism won't cut it

8 But, didn't we solve threading in the 1990s?
Pthreads standard: IEEE c-1995
OpenMP standard: 1997
Yes, but
  How do I choose how many threads to use?
  How do I split up my work, should I have a function/thread?
  How do I debug with non-determinism?
  How do I balance load between threads?
  What happens if I call a library that also wants to use threads?
  What happens on a new machine with more cores?
Programming with threads is HARD

9 The answer was in Seattle

10 Scalable, Composable Parallelism
Scalable: a single binary can exploit all the cores in the HW it happens to be running on
  Efficiently
  Without requiring user control
Scalable software benefits from future HW
Composable: Parallelism can be used at all levels of SW stack (user code, library, nested library, ...)
  Without over-subscription
  With parallelism exploited at each level
Composable software allows use of parallel libraries

11 What's wrong with OpenMP?
Parallelism is compulsory
  You know which thread you are: omp_get_thread_num()
  You know how many threads exist: omp_get_num_threads()
  You control how work is assigned to threads: schedule(...)
OpenMP gives you lots of control but you end up tuning for the current machine
OpenMP gives you too many knobs to play with!

12 What's wrong with OpenMP?
Static scheduling can't handle jitter
  If one thread runs slowly (OS interrupt, more cache/TLB misses) all threads have to wait
  With more cores jitter is more likely
Nested parallelism is dangerous
  If OMP_NESTED=false, inner parallelism is not exploited
  If OMP_NESTED=true, it's easy to get exponential over-subscription
OpenMP is not composable

13 OK, but how can I have parallelism without threads?
Think about the parallelism in your problem
  Describe the way your problem can be broken down into independent computations (tasks)
Let the runtime do the hard work
  handle allocation of tasks to threads to ensure efficient execution
  choose the number of threads to use depending on available hardware
You don't normally worry about register allocation; similarly, you shouldn't worry about threads

14 Key Features of Cilk Plus
Small extensions to C and C++
  Express the independent tasks in your code
  Express the vector operations in your code
Results are deterministic
  There is a serial elision of the parallel code
Formal properties
  Guaranteed memory limits: executing on n threads uses <= n times the memory of the serial code
  Provably efficient work-stealing scheduler
Tools support: Cilk screen, Cilk view
Public specification with an open-source implementation in a GCC branch
Cilk lets programmers think about their problem, not the runtime implementation

15 Example: Fibonacci Numbers
The Fibonacci numbers are the sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ..., where each number is the sum of the previous two.
Recurrence: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$
The sequence is named after Leonardo di Pisa ( CE), known as Fibonacci. Fibonacci's 1202 book Liber Abaci introduced the sequence to Western mathematics, though it had previously been discovered in India.

16 Fibonacci Execution
    int fib(int n) {
        if (n < 2) return n;
        int x = fib(n-1);
        int y = fib(n-2);
        return x + y;
    }
Key idea for parallelization: fib(n-1) and fib(n-2) can be calculated simultaneously
[Figure: call tree for fib(4): fib(4) calls fib(3) and fib(2); fib(3) calls fib(2) and fib(1); each fib(2) calls fib(1) and fib(0)]

17 Nested Parallelism in Cilk Plus
    int fib(int n) {
        if (n < 2) return n;
        int x = cilk_spawn fib(n-1);
        int y = fib(n-2);
        cilk_sync;
        return x+y;
    }
Parallelism is introduced recursively so composability happens trivially.
cilk_spawn: The named child function may execute in parallel with the caller
cilk_sync: Control cannot pass here until all spawned children have returned
Cilk keywords grant permission for parallel execution. They do not force it.
Code with the Cilk keywords macro-ed out is a correct serial version.
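The serial-elision property stated on this slide can be tried without a Cilk Plus compiler. The sketch below is an illustration rather than part of the original slides: it assumes a plain C++ compiler and defines the two keywords away with empty macros, leaving the fib body from the slide otherwise unchanged.

```cpp
#include <cassert>

// Assumption for this sketch: no Cilk Plus compiler is available, so the
// keywords are "macro-ed out" and the code becomes its own serial elision.
#define cilk_spawn
#define cilk_sync

int fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    int x = cilk_spawn fib(n - 1);  // with Cilk Plus: child may run in parallel
    int y = fib(n - 2);             // the caller continues with the other half
    cilk_sync;                      // with Cilk Plus: wait for spawned children
    return x + y;
}
```

Compiled this way, fib(10) returns 55, the same value the parallel version is required to produce, which is the sense in which the keywords grant permission for parallelism without changing the meaning of the program.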

18 So, you can handle functional code, but what about real code?
Recursion is hard, we're not all Lisp programmers!
  cilk_for compiles a loop into a recursive parallel task decomposition of the iteration space
Real code has global variables whose update from parallel tasks would be racy
Races are
  Hard to detect (non-deterministic values)
  Hard to fix (need to modify every access and add locking)
Solution
  Cilk screen for detecting problems
  Reducers for removing them
Cilk Plus is more than just the language extensions
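To make "a recursive parallel task decomposition of the iteration space" concrete, here is a minimal sketch in standard C++. It is not how the Cilk Plus runtime is implemented: recursive_for and its grain parameter are hypothetical names, and std::async stands in for cilk_spawn, so every split creates an OS thread instead of a stealable task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>

// Hypothetical helper (not a Cilk Plus API): run body(i) for every i in
// [lo, hi) by recursively halving the range until it is at most `grain` long.
void recursive_for(std::size_t lo, std::size_t hi, std::size_t grain,
                   const std::function<void(std::size_t)>& body) {
    assert(grain > 0);
    if (hi <= lo) return;                       // empty range
    if (hi - lo <= grain) {                     // small enough: run serially
        for (std::size_t i = lo; i < hi; ++i) body(i);
        return;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    // "Spawn" the left half; it may run in parallel with the right half.
    auto left = std::async(std::launch::async, [=, &body] {
        recursive_for(lo, mid, grain, body);
    });
    recursive_for(mid, hi, grain, body);        // caller handles the right half
    left.get();                                 // "sync": wait for the left half
}
```

A call such as recursive_for(0, n, 100, body) is safe only when the iterations are independent, for example when body(i) writes only element i of an array. Because this sketch uses one thread per split, the grain should be kept coarse; a work-stealing scheduler removes that concern.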

19 Cilk screen
Cilk screen runs on the executable image using metadata embedded by the compiler
  No need for a special build
For a given input, and lock-free code, Cilk screen guarantees to localize a race if there exists a parallel execution that could produce results different from the serial execution
It runs about 20 times slower than real-time
[Figure: sample Cilk screen race report, annotated with: address of data, location of 1st access, location of 2nd access, backtrace of 2nd access]

20 Reducer Hyperobjects
A variable can be declared as a reducer over an associative operation, e.g. multiplication, logical AND, list concatenation, ...
Strands can update the variable as if it were an ordinary variable, but it is maintained as a collection of different views
The runtime system coordinates the views and combines them when appropriate
When only one view remains, the underlying value is stable and can be extracted
[Figure: example of a summing reducer with separate views of x, e.g. x: 42 and x: 14]

21 Reducers in Cilk Plus
You can write your own reducers with any reduction operation
Reducers can be used (though less elegantly) in C
    cilk::reducer_opadd<float> sum = 0;
    ...
    cilk_for( size_t i=1; i<n; ++i )
        sum += f(i);          // Updates local view of sum
    ...
    ... = sum.get_value();    // Read final value of sum
Not lexically bound to a particular loop
Reducers simplify race removal
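The idea of per-strand views that are combined at the end can be sketched in standard C++. This is a hypothetical illustration, not the Cilk Plus reducer implementation: sum_with_views and its workers parameter are invented names, the work is split statically across std::thread workers, and each worker owns exactly one view.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch of a summing reducer: each worker updates only its own
// private view, so no locking is needed; views are combined after the join.
double sum_with_views(const std::vector<double>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<double> views(workers, 0.0);    // 0 is the identity for +
    std::vector<std::thread> pool;
    const std::size_t n = data.size();
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t lo = n * w / workers;
            const std::size_t hi = n * (w + 1) / workers;
            double local = 0.0;                 // this worker's view
            for (std::size_t i = lo; i < hi; ++i) local += data[i];
            views[w] = local;                   // publish the view once
        });
    }
    for (auto& t : pool) t.join();              // all views are now stable
    double total = 0.0;
    for (double v : views) total += v;          // combine views in index order
    return total;
}
```

Combining the views in a fixed order relies on the operation being associative, which is why the previous slide restricts reducers to associative operations. Floating-point addition is only approximately associative, so a different split can change the last bits of the result.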

22 Vector language features, the Plus
Similar to Fortran 90 vector language but in C/C++
  no vector temporaries introduced by the compiler
Explicit vector expressions
    x[:] = a*x[:] + y[:];                      // Known lengths
    x[0:count] = a*x[0:count] + y[0:count];
    x[0:n:2] = a*x[0:n:2] + y[0:n:2];          // Strided
    x[i1[:]] = y[i2[:]];                       // scatter, gather
Elemental functions
#pragma simd to force vectorization
Generated code is comparable with hand-coded intrinsics
Explicit vector language makes it easier to exploit SIMD instructions efficiently
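For readers unfamiliar with array sections, the loops below sketch what three of the expressions on this slide mean element by element. They are plain C++ written for illustration, with hypothetical function names, and they assume the Cilk Plus section form is start:length:stride, so x[0:n:2] touches n elements at indices 0, 2, 4, and so on.

```cpp
#include <cstddef>

// x[0:count] = a*x[0:count] + y[0:count];
void axpy_section(float a, float* x, const float* y, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) x[i] = a * x[i] + y[i];
}

// x[0:n:2] = a*x[0:n:2] + y[0:n:2];  (n elements, stride 2)
void axpy_strided(float a, float* x, const float* y, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) x[2 * k] = a * x[2 * k] + y[2 * k];
}

// x[i1[:]] = y[i2[:]];  gather from y through i2, scatter into x through i1.
// `len` is the common length of the two index arrays.
void scatter_gather(float* x, const std::size_t* i1,
                    const float* y, const std::size_t* i2, std::size_t len) {
    for (std::size_t k = 0; k < len; ++k) x[i1[k]] = y[i2[k]];
}
```

The array-notation form tells the compiler that the element operations carry no ordering dependence, which a scalar loop like these leaves it to prove for itself before it can vectorize.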

23 Performance Tuning: Cilk view output for Cache-Oblivious Stencil
A cache-oblivious stencil algorithm gives good parallelism and minimizes cache misses by using divide-and-conquer in all dimensions including time.
Algorithm designed by Frigo and Strumpen in "Cache-oblivious stencil computations" (ICS '05)
[Figure: Cilk view speedup plot; legend: Measured Speedup, Available Parallelism, Linear Speedup, Burdened Speedup; plot annotation: 47.93]

24 What if I don't want a new language?
Use Threading Building Blocks (TBB)
  Open Source (GPL) C++ template library
  Ported to many machines and OSes (not just Intel architectures)
  Cilk-like concepts of tasks, a work-stealing runtime
  Scalable, composable (and composes with Cilk)
Additional features beyond Cilk's recursive parallelism
  Pipelines of tasks (tbb::pipeline)
  General dependency DAGs of tasks (tbb::flow::graph)
  Task priorities
  Parallel containers
  Memory allocator optimized for parallel use
TBB provides portable task-based parallelism in C++ without requiring new language features.

25 Summary
On modern and foreseeable processors
  Data parallelism (vectorization) matters
  Task parallelism matters
Dealing with threads directly is too hard to be a feasible solution
  Restricts parallelism to one level of the software stack
  Hard to scale forwards as new hardware appears
OpenMP has trouble scaling and composing
Tasking systems like Cilk Plus and TBB are available now and are better alternatives
Check out

27 Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright Intel Corporation.

28 Notes and References
Slide 3: Graph by Herb Sutter from
Slide 5: More Ivy Bridge info is available from
Slide 9: Photos taken by Jim Cownie, used with permission
Slide 14: Information about Intel Cilk Plus can be found at along with pointers to the open source implementation. "The implementation of the Cilk-5 multithreaded language" ( )
Slide 23: Cilk view is described in "The Cilkview scalability analyzer" ( ). "Cache oblivious stencil computations" ( )
Slide 24: TBB is described at

C Language Constructs for Parallel Programming

C Language Constructs for Parallel Programming C Language Constructs for Parallel Programming Robert Geva 5/17/13 1 Cilk Plus Parallel tasks Easy to learn: 3 keywords Tasks, not threads Load balancing Hyper Objects Array notations Elemental Functions

More information

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012

Parallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012 Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2

More information

GAP Guided Auto Parallelism A Tool Providing Vectorization Guidance

GAP Guided Auto Parallelism A Tool Providing Vectorization Guidance GAP Guided Auto Parallelism A Tool Providing Vectorization Guidance 7/27/12 1 GAP Guided Automatic Parallelism Key design ideas: Use compiler to help detect what is blocking optimizations in particular

More information

Software Tools for Software Developers and Programming Models

Software Tools for Software Developers and Programming Models Software Tools for Software Developers and Programming Models James Reinders Director, Evangelist, Intel Software james.r.reinders@intel.com 1 Our Goals for Software Tools and Models 2 Our Goals for Software

More information

Cilk Plus in GCC. GNU Tools Cauldron Balaji V. Iyer Robert Geva and Pablo Halpern Intel Corporation

Cilk Plus in GCC. GNU Tools Cauldron Balaji V. Iyer Robert Geva and Pablo Halpern Intel Corporation Cilk Plus in GCC GNU Tools Cauldron 2012 Balaji V. Iyer Robert Geva and Pablo Halpern Intel Corporation July 10, 2012 Presentation Outline Introduction Cilk Plus components Implementation GCC Project Status

More information

Intel MKL Data Fitting component. Overview

Intel MKL Data Fitting component. Overview Intel MKL Data Fitting component. Overview Intel Corporation 1 Agenda 1D interpolation problem statement Functional decomposition of the problem Application areas Data Fitting in Intel MKL Data Fitting

More information

Intel Parallel Amplifier Sample Code Guide

Intel Parallel Amplifier Sample Code Guide The analyzes the performance of your application and provides information on the performance bottlenecks in your code. It enables you to focus your tuning efforts on the most critical sections of your

More information

Intel(R) Threading Building Blocks

Intel(R) Threading Building Blocks Getting Started Guide Intel Threading Building Blocks is a runtime-based parallel programming model for C++ code that uses threads. It consists of a template-based runtime library to help you harness the

More information

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors Application Note May 2008 Order Number: 319801; Revision: 001US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH

More information

Getting Compiler Advice from the Optimization Reports

Getting Compiler Advice from the Optimization Reports Getting Compiler Advice from the Optimization Reports Getting Started Guide An optimizing compiler can do a lot better with just a few tips from you. We've integrated the Intel compilers with Intel VTune

More information

Using Intel Inspector XE 2011 with Fortran Applications

Using Intel Inspector XE 2011 with Fortran Applications Using Intel Inspector XE 2011 with Fortran Applications Jackson Marusarz Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Parallel Programming Models

Parallel Programming Models Parallel Programming Models Intel Cilk Plus Tasking Intel Threading Building Blocks, Copyright 2009, Intel Corporation. All rights reserved. Copyright 2015, 2011, Intel Corporation. All rights reserved.

More information

Overview of Intel Parallel Studio XE

Overview of Intel Parallel Studio XE Overview of Intel Parallel Studio XE Stephen Blair-Chappell 1 30-second pitch Intel Parallel Studio XE 2011 Advanced Application Performance What Is It? Suite of tools to develop high performing, robust

More information

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation Alexander Kalinkin Anton Anders Roman Anders 1 Legal Disclaimer INFORMATION IN

More information

What's new in VTune Amplifier XE

What's new in VTune Amplifier XE What's new in VTune Amplifier XE Naftaly Shalev Software and Services Group Developer Products Division 1 Agenda What s New? Using VTune Amplifier XE 2013 on Xeon Phi coprocessors New and Experimental

More information

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин

Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин Повышение энергоэффективности мобильных приложений путем их распараллеливания. Примеры. Владимир Полин Legal Notices This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS

More information

Techniques for Lowering Power Consumption in Design Utilizing the Intel EP80579 Integrated Processor Product Line

Techniques for Lowering Power Consumption in Design Utilizing the Intel EP80579 Integrated Processor Product Line Techniques for Lowering Power Consumption in Design Utilizing the Intel Integrated Processor Product Line Order Number: 320180-003US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Intel MKL Sparse Solvers. Software Solutions Group - Developer Products Division

Intel MKL Sparse Solvers. Software Solutions Group - Developer Products Division Intel MKL Sparse Solvers - Agenda Overview Direct Solvers Introduction PARDISO: main features PARDISO: advanced functionality DSS Performance data Iterative Solvers Performance Data Reference Copyright

More information

Intel IT Director 1.7 Release Notes

Intel IT Director 1.7 Release Notes Intel IT Director 1.7 Release Notes Document Number: 320156-005US Contents What s New Overview System Requirements Installation Notes Documentation Known Limitations Technical Support Disclaimer and Legal

More information

Intel(R) Threading Building Blocks

Intel(R) Threading Building Blocks Getting Started Guide Intel Threading Building Blocks is a runtime-based parallel programming model for C++ code that uses threads. It consists of a template-based runtime library to help you harness the

More information

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on SuSE*Enterprise Linux Server* using Xen*

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on SuSE*Enterprise Linux Server* using Xen* How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on SuSE*Enterprise Linux Server* using Xen* Technical Brief v1.0 September 2011 Legal Lines and Disclaimers INFORMATION IN THIS

More information

Open FCoE for ESX*-based Intel Ethernet Server X520 Family Adapters

Open FCoE for ESX*-based Intel Ethernet Server X520 Family Adapters Open FCoE for ESX*-based Intel Ethernet Server X520 Family Adapters Technical Brief v1.0 August 2011 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Intel C++ Compiler Documentation

Intel C++ Compiler Documentation Document number: 304967-001US Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY

More information

Intel MPI Library for Windows* OS

Intel MPI Library for Windows* OS Intel MPI Library for Windows* OS Getting Started Guide The Intel MPI Library is a multi-fabric message passing library that implements the Message Passing Interface, v2 (MPI-2) specification. Use it to

More information

Multicore programming in CilkPlus

Multicore programming in CilkPlus Multicore programming in CilkPlus Marc Moreno Maza University of Western Ontario, Canada CS3350 March 16, 2015 CilkPlus From Cilk to Cilk++ and Cilk Plus Cilk has been developed since 1994 at the MIT Laboratory

More information

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.

The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism. Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,

More information

Product Change Notification

Product Change Notification Product Change Notification 111213-02 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property

More information

A Simple Path to Parallelism with Intel Cilk Plus

A Simple Path to Parallelism with Intel Cilk Plus Introduction This introductory tutorial describes how to use Intel Cilk Plus to simplify making taking advantage of vectorization and threading parallelism in your code. It provides a brief description

More information

Cilk Plus GETTING STARTED

Cilk Plus GETTING STARTED Cilk Plus GETTING STARTED Overview Fundamentals of Cilk Plus Hyperobjects Compiler Support Case Study 3/17/2015 CHRIS SZALWINSKI 2 Fundamentals of Cilk Plus Terminology Execution Model Language Extensions

More information

C Language Constructs for Parallel Programming

C Language Constructs for Parallel Programming C Language Constructs for Parallel Programming Robert Geva 10/23/12 1 Today s objective Present a proposal for addition of language constructs for parallel programming to C Get feedback: Is there an interest

More information

VTune(TM) Performance Analyzer for Linux

VTune(TM) Performance Analyzer for Linux VTune(TM) Performance Analyzer for Linux Getting Started Guide The VTune Performance Analyzer provides information on the performance of your code. The VTune analyzer shows you the performance issues,

More information

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Boot-Up Options

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Boot-Up Options Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Boot-Up Options Application Note September 2004 Document Number: 254067-002 Contents INFORMATION IN THIS DOCUMENT IS

More information

Intel Software Development Products Licensing & Programs Channel EMEA

Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Licensing & Programs Channel EMEA Intel Software Development Products Advanced Performance Distributed Performance Intel Software Development Products Foundation of

More information

Overview of Intel MKL Sparse BLAS. Software and Services Group Intel Corporation

Overview of Intel MKL Sparse BLAS. Software and Services Group Intel Corporation Overview of Intel MKL Sparse BLAS Software and Services Group Intel Corporation Agenda Why and when sparse routines should be used instead of dense ones? Intel MKL Sparse BLAS functionality Sparse Matrix

More information

Introduction to Intel Fortran Compiler Documentation. Document Number: US

Introduction to Intel Fortran Compiler Documentation. Document Number: US Introduction to Intel Fortran Compiler Documentation Document Number: 307778-003US Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Using the Intel VTune Amplifier 2013 on Embedded Platforms

Using the Intel VTune Amplifier 2013 on Embedded Platforms Using the Intel VTune Amplifier 2013 on Embedded Platforms Introduction This guide explains the usage of the Intel VTune Amplifier for performance and power analysis on embedded devices. Overview VTune

More information

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Achieving High Performance Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013 Does Instruction Set Matter? We find that ARM and x86 processors are simply engineering design points optimized

More information

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Agenda. Optimization Notice Copyright 2017, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda VTune Amplifier XE OpenMP* Analysis: answering on customers questions about performance in the same language a program was written in Concepts, metrics and technology inside VTune Amplifier XE OpenMP

More information

ECC Handling Issues on Intel XScale I/O Processors

ECC Handling Issues on Intel XScale I/O Processors ECC Handling Issues on Intel XScale I/O Processors Technical Note December 2003 Order Number: 300311-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Third Party Hardware TDM Bus Administration

Third Party Hardware TDM Bus Administration Third Party Hardware TDM Bus Administration for Windows Copyright 2003 Intel Corporation 05-1509-004 COPYRIGHT NOTICE INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Continuous Speech Processing API for Host Media Processing

Continuous Speech Processing API for Host Media Processing Continuous Speech Processing API for Host Media Processing Demo Guide April 2005 05-2084-003 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Intel Platform Controller Hub EG20T

Intel Platform Controller Hub EG20T Intel Platform Controller Hub EG20T UART Controller Driver for Windows* Programmer s Guide Order Number: 324261-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Efficiently Introduce Threading using Intel TBB

Efficiently Introduce Threading using Intel TBB Introduction This guide will illustrate how to efficiently introduce threading using Intel Threading Building Blocks (Intel TBB), part of Intel Parallel Studio XE. It is a widely used, award-winning C++

More information

Product Change Notification

Product Change Notification Product Change Notification 110606-00 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property

More information

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth Presenter: Surabhi Jain Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth May 25, 2018 ROME workshop (in conjunction with IPDPS 2018), Vancouver,

More information

Product Change Notification

Product Change Notification Product Change Notification Change Notification #: 114547-01 Change Title: Intel Dual Band Wireless-AC 3165 SKUs: 3165.NGWG.I; 3165.NGWGA.I; 3165.NGWG.S; 3165.NGWG; 3165.NGWGA.S; 3165.NGWGA, PCN 114547-01,

More information

Product Change Notification

Product Change Notification Product Change Notification 110813-00 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property

More information

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor PCI 16-Bit Read Implementation Application Note September 2004 Document Number: 300375-002 INFORMATION IN THIS DOCUMENT

More information

Product Change Notification

Product Change Notification Product Change Notification 110988-01 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property

More information

Product Change Notification

Product Change Notification Product Change Notification Change Notification #: 114137-00 Change Title: Intel Dual Band Wireless-AC 8260, Intel Dual Band Wireless-N 8260, SKUs: 8260.NGWMG.NVS, 8260.NGWMG.S, 8260.NGWMG, 8260.NGWMG.NV

More information

Product Change Notification
