Agus Kurniawan, Microsoft MVP
Blog: http://geeks.netindonesia.net/blogs/agus
      http://blog.aguskurniawan.net
Agenda
- Reasons
  - The Paradigm
  - The Technology
  - The People
- Code
  - Parallel Processing
  - PLINQ
  - Data and Synchronization
- Tools
  - Debugging
  - Profiling
Why? Because Single-Core Is Slow (Single-Threaded)
The Paradigm
- Hardware limitations due to increasing heat
- Moore's Law implies more cores, not higher speeds
[Chart: power density (W/cm^2, axis ticks from 1 to 10,000) by decade ('70 to '10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium processors, with reference levels labelled Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface]
The Shift
- More cores
- More parallel computing
[Chart: GOPS by year (2004, 2006, 2008, 2010, 2012, 2015), axis ticks 16, 128, 2,048 and 32,768, annotated "Parallelism Opportunity 80X"]
The Technology Intel (C) (software.intel.com)
The People "Everyone does it"
Writing Parallel Code is Hard (Multi-Threaded & Multi-Core)
Multithreaded Programming
- It's not easy
- Error-prone: from race conditions to data sharing
- Hard to debug: problems are very hard to reproduce
- Easy to break (on newer hardware)
- Takes dedication and perseverance
- Businesses want business value, not hardcore developers writing threads
Simple Maths Demo: 2, 3, 5, 7, 11, 13, 17, 19, 23...
Calculating Prime Numbers

    // Sequential baseline: test every candidate below upper.
    List<int> primes = new List<int>();
    for (int i = 2; i < upper; i++)
    {
        if (SieveOfEratosthenes.IsPrime(i))
            primes.Add(i);
    }
"My" Parallel int corecount = Environment.ProcessorCount; int partitionsize = upper / corecount; ManualResetEvent endevent = new ManualResetEvent(false); List<int> result = new List<int>(); object lockobject = new object(); int runninginstances = corecount; for (int i = 0; i < corecount; i++) { // partition per core ThreadPool.QueueUserWorkItem(c => { int core = (int)c; int partitionstart = core * partitionsize; int partitionend = ((core + 1) * partitionsize) - 1; for (int j = partitionstart; j < partitionend; j++) { if (SieveOfEratosthenes.IsPrime(j)) { lock (lockobject) { result.add(j); } } } if (Interlocked.Decrement(ref runninginstances) == 0) { endevent.set(); } }, i); } endevent.waitone(); Static Work Distribution Potential scalability bottleneck Error Prone Error Prone Manual locking Manual Synchronization
Parallel Prime Numbers

Parallel:

    var result = new ConcurrentBag<int>();
    Parallel.For(2, upper, (x) =>
    {
        if (SieveOfEratosthenes.IsPrime(x))
        {
            result.Add(x);
        }
    });

Sequential, for comparison:

    var result = new List<int>();
    for (int i = 2; i < upper; i++)
    {
        if (SieveOfEratosthenes.IsPrime(i))
            result.Add(i);
    }
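The prime-number slides call SieveOfEratosthenes.IsPrime, which the deck does not show. The sketch below makes the comparison self-contained by assuming a hypothetical trial-division IsPrime in its place (it is not a real sieve, and not the presenter's code). The class and method names are invented for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PrimeDemo
{
    // Hypothetical stand-in for SieveOfEratosthenes.IsPrime:
    // plain trial division by odd numbers up to sqrt(n).
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Sequential baseline, as on the slide.
    public static List<int> PrimesSequential(int upper)
    {
        var result = new List<int>();
        for (int i = 2; i < upper; i++)
        {
            if (IsPrime(i)) result.Add(i);
        }
        return result;
    }

    // Parallel version: Parallel.For partitions the range across cores,
    // and ConcurrentBag<int> makes the Add calls thread-safe.
    public static List<int> PrimesParallel(int upper)
    {
        var bag = new ConcurrentBag<int>();
        Parallel.For(2, upper, x =>
        {
            if (IsPrime(x)) bag.Add(x);
        });
        // ConcurrentBag is unordered, so sort before returning.
        return bag.OrderBy(p => p).ToList();
    }
}
```

Calling PrimeDemo.PrimesParallel(upper) should return the same primes as PrimeDemo.PrimesSequential(upper). The explicit sort is needed only because the bag does not preserve insertion order.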
Ray tracer Boids
Architecture
[Diagram, listed in slide order]
- Application code: public static void Main(string[] args) {... }
- Declarative parallel queries and imperative parallel algorithms
- Compiler (F#, C#, VB...)
- PLINQ Execution Engine
- Task Parallel Library: parallel constructs, task scheduling
- Data Structures and Coordination: concurrent collections, synchronization & coordination
- Threads (T1, T2) running on each CPU
Library - Parallel
- Parallel.For & Parallel.ForEach
  - Drop-in replacement for for/foreach loops
  - Core partitioning
  - Scalable
  - Cancellable
- Parallel.Invoke
  - Action invoker
- Cancellation API
  - Cooperative cancellation
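As a rough illustration of the Parallel class, the sketch below cancels a Parallel.ForEach loop cooperatively through ParallelOptions.CancellationToken and runs two independent actions with Parallel.Invoke. The class name, method names and workloads are made up for the example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopDemo
{
    // Returns true if the loop was cancelled, false if it ran to completion.
    // Pass a cancelAt value outside the range (e.g. -1) to never cancel.
    public static bool RunWithCancellation(int items, int cancelAt)
    {
        using (var cts = new CancellationTokenSource())
        {
            var options = new ParallelOptions { CancellationToken = cts.Token };
            try
            {
                Parallel.ForEach(Enumerable.Range(0, items), options, i =>
                {
                    // Cooperative cancellation: the body requests it,
                    // and the loop stops scheduling further iterations.
                    if (i == cancelAt) cts.Cancel();
                });
                return false;
            }
            catch (OperationCanceledException)
            {
                return true;
            }
        }
    }

    // Parallel.Invoke runs independent actions, possibly concurrently,
    // and returns once all of them have finished.
    public static int[] InvokeBoth()
    {
        var results = new int[2];
        Parallel.Invoke(
            () => results[0] = Enumerable.Range(1, 100).Sum(),
            () => results[1] = Enumerable.Range(1, 10).Aggregate(1, (a, b) => a * b));
        return results;
    }
}
```

Each action in InvokeBoth writes to its own array slot, so no locking is needed there.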
Coordination Structures
- Collections: ConcurrentBag<T>, ConcurrentStack<T>, BlockingCollection<T>
- Locks & Signals: SpinLock & SpinWait, SemaphoreSlim, Barrier, CountdownEvent
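Two of these structures can be sketched briefly: a bounded BlockingCollection<T> used as a producer/consumer queue, and a CountdownEvent used to wait for a fixed number of workers. The class and method names are hypothetical, chosen only for this example.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CoordinationDemo
{
    // Producer/consumer: the producer adds 1..count to a bounded queue,
    // the calling thread consumes and sums the items.
    public static long ProduceAndConsume(int count)
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 8))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    // Add blocks while the queue is full.
                    for (int i = 1; i <= count; i++) queue.Add(i);
                }
                finally
                {
                    // Lets the consumer's enumeration end.
                    queue.CompleteAdding();
                }
            });

            long sum = 0;
            foreach (int item in queue.GetConsumingEnumerable()) sum += item;
            producer.Wait();
            return sum;
        }
    }

    // CountdownEvent: wait until every worker has signalled once.
    public static int WaitForWorkers(int workers)
    {
        int finished = 0;
        using (var done = new CountdownEvent(workers))
        {
            for (int i = 0; i < workers; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref finished);
                    done.Signal();
                });
            }
            done.Wait();
        }
        return finished;
    }
}
```

Compared with the ManualResetEvent plus Interlocked.Decrement pattern, CountdownEvent keeps the counter and the signal in one object.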
Blend Images
Library - Tasks
- Task & Task<T>
  - Asynchronous execution
  - (Blocking) value on completion
  - Parent & child relationships
  - Continuations on success, failure or cancellation
- Task Scheduler
  - Scheduler on top of the ThreadPool
  - Abstract implementation
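The Task features listed here can be sketched as follows: a Task<int> whose Result blocks until the value is ready, continuations that run only on success or only on failure, and child tasks attached to a parent. All names are hypothetical and the workloads are trivial placeholders.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskDemo
{
    // Success continuation: runs only if the antecedent completed normally.
    public static int SumThenDouble(int n)
    {
        Task<int> sum = Task.Factory.StartNew(() => Enumerable.Range(1, n).Sum());
        Task<int> doubled = sum.ContinueWith(
            t => t.Result * 2,
            TaskContinuationOptions.OnlyOnRanToCompletion);
        return doubled.Result; // blocks until the value is available
    }

    // Failure continuation: runs only if the antecedent threw.
    public static string DescribeFailure()
    {
        Task<int> failing = Task.Factory.StartNew<int>(
            () => { throw new InvalidOperationException("boom"); });
        Task<string> onFault = failing.ContinueWith(
            t => t.Exception.InnerException.Message,
            TaskContinuationOptions.OnlyOnFaulted);
        return onFault.Result;
    }

    // Parent/child: the parent does not complete until its attached
    // children have completed.
    public static int ParentWaitsForChildren()
    {
        int completed = 0;
        Task parent = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 3; i++)
            {
                Task.Factory.StartNew(
                    () => Interlocked.Increment(ref completed),
                    TaskCreationOptions.AttachedToParent);
            }
        });
        parent.Wait();
        return completed;
    }
}
```

The parent is started with Task.Factory.StartNew rather than Task.Run because Task.Run creates tasks that deny child attachment.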
Tasks: Custom Scheduling
- Work stealing
- Priority scheduling
- Round robin
- I/O scheduler
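Custom scheduling is done by deriving from the abstract TaskScheduler class. The sketch below is a deliberately simple, hypothetical scheduler that runs every task in FIFO order on one dedicated thread. It is not one of the schedulers named on the slide, only an illustration of the three members a scheduler must override.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: all queued tasks run on a single worker thread.
public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread _worker;

    public SingleThreadTaskScheduler()
    {
        _worker = new Thread(() =>
        {
            foreach (Task task in _queue.GetConsumingEnumerable())
            {
                TryExecuteTask(task);
            }
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    public override int MaximumConcurrencyLevel { get { return 1; } }

    protected override void QueueTask(Task task)
    {
        _queue.Add(task);
    }

    // Allow inlining only on the worker thread, so tasks never run elsewhere.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return Thread.CurrentThread == _worker && TryExecuteTask(task);
    }

    // Used by debuggers to list the tasks that are still waiting.
    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _queue.ToArray();
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}

public static class SchedulerDemo
{
    // Starts taskCount tasks on the custom scheduler and reports whether
    // they all observed the same managed thread id.
    public static bool AllRanOnOneThread(int taskCount)
    {
        using (var scheduler = new SingleThreadTaskScheduler())
        {
            var factory = new TaskFactory(scheduler);
            var threadIds = new ConcurrentBag<int>();
            var tasks = new Task[taskCount];
            for (int i = 0; i < taskCount; i++)
            {
                tasks[i] = factory.StartNew(
                    () => threadIds.Add(Thread.CurrentThread.ManagedThreadId));
            }
            Task.WaitAll(tasks);
            return new HashSet<int>(threadIds).Count == 1;
        }
    }
}
```

A priority or round-robin scheduler would follow the same shape, replacing the FIFO queue with a different data structure.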
Fractals: putting Tasks to work
Library - PLINQ
- LINQ, but parallel
- IEnumerable<T>.AsParallel()
- Standard operators
- Parallel extensions
- Partitioning and merging
- API
  - Extensions: AsParallel()
  - Direct use: ParallelEnumerable.Select/Where...
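Both ways of using the API can be sketched on a toy query: the AsParallel() extension on an ordinary sequence, and direct calls to the ParallelEnumerable static methods. The class and method names are invented for the example, and the query itself is arbitrary.

```csharp
using System.Linq;

public static class PlinqDemo
{
    // Extension style: opt in with AsParallel(), then use the standard
    // operators. AsOrdered() asks PLINQ to preserve the source order.
    public static int[] EvenSquares(int count)
    {
        return Enumerable.Range(0, count)
            .AsParallel()
            .AsOrdered()
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();
    }

    // Direct style: the same query through ParallelEnumerable's static
    // methods. No ordering is requested here, so the result is sorted
    // explicitly afterwards.
    public static int[] EvenSquaresDirect(int count)
    {
        ParallelQuery<int> source = ParallelEnumerable.Range(0, count);
        ParallelQuery<int> evens = ParallelEnumerable.Where(source, n => n % 2 == 0);
        ParallelQuery<int> squares = ParallelEnumerable.Select(evens, n => n * n);
        return squares.ToArray().OrderBy(n => n).ToArray();
    }
}
```

For a query this small the parallel overhead will likely outweigh any gain. The point is only the shape of the API.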
PLINQ Partitioning: Range & Stride
- Applies to IList<T> sources
- Range: contiguous ranges
- Striped: round robin, designed for Skip & Take
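The difference between the two schemes is easiest to see as an index assignment. The sketch below is a simplified picture, not PLINQ's internal implementation: it only computes which source indexes each partition would receive under contiguous ranges versus one-element round-robin stripes. All names are hypothetical.

```csharp
using System.Collections.Generic;

public static class PartitionSketch
{
    // Range partitioning: each partition gets one contiguous block of indexes.
    public static List<int>[] Range(int length, int partitions)
    {
        List<int>[] result = NewLists(partitions);
        int size = (length + partitions - 1) / partitions; // ceiling division
        for (int i = 0; i < length; i++)
        {
            result[i / size].Add(i);
        }
        return result;
    }

    // Striped partitioning: indexes are dealt out round robin.
    public static List<int>[] Striped(int length, int partitions)
    {
        List<int>[] result = NewLists(partitions);
        for (int i = 0; i < length; i++)
        {
            result[i % partitions].Add(i);
        }
        return result;
    }

    private static List<int>[] NewLists(int count)
    {
        var lists = new List<int>[count];
        for (int p = 0; p < count; p++) lists[p] = new List<int>();
        return lists;
    }
}
```

With 10 elements and 3 partitions, Range yields {0,1,2,3}, {4,5,6,7}, {8,9}, while Striped yields {0,3,6,9}, {1,4,7}, {2,5,8}. Striping keeps all partitions working near the front of the source, which is why it suits operators that may stop early.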
PLINQ Partitioning: Chunk & Hash
- Applies to IEnumerable<T> sources
- Chunk: single enumerator (locked), handing out chunks of increasing size
- Hash: used for Join, GroupJoin, GroupBy, Distinct, Except, Union and Intersect
PLINQ Ray Tracer
Task Debugging
- Parallel Tasks: all scheduled tasks, all running tasks
- Parallel Stacks: call graphs, call stacks
Parallel + MS Excel 2010
Resources
- www.microsoft.com/teched: Sessions On-Demand & Community
- www.microsoft.com/learning: Microsoft Certification & Training Resources
- http://microsoft.com/technet: Resources for IT Professionals
- http://microsoft.com/msdn: Resources for Developers
© 2009 Microsoft Corporation. All rights reserved.