Description

Course Summary
This course provides students with the knowledge and skills to develop high-performance computing (HPC) applications for Microsoft Windows HPC Server 2008. Students learn about the product, and how to design, debug, tune, and run high-performance computing applications under HPC Server 2008. Students also learn the most compelling technologies for building HPC applications, including parametric sweep, multi-threading, OpenMP, the .NET Task Parallel Library, MPI, MPI.NET, and HPC Server's SOA-based approach. Students program in Visual C++ as well as C#, and work with both managed and unmanaged code.

Objectives
At the end of this course, students will be able to:
Understand the goals of the high-performance computing (HPC) field.
Measure and evaluate the performance of HPC apps.
Design HPC apps using a variety of technologies: parametric sweep, threads, OpenMP, MPI, and SOA.
Design HPC apps targeting a variety of hardware: from single-core to multi-core to cluster-based.
Implement HPC apps using C++ or C#.
Integrate HPC apps with Windows HPC Server 2008, including a client-friendly front-end.
Performance tune HPC applications under Windows HPC Server 2008.
Set up and configure a standard cluster running Windows HPC Server 2008.
Topics
Introduction to High-Performance Computing and HPC Server 2008
Multi-threading for Performance
The Dangers of Multithreading
The HPCS Job Scheduler
Parallel Application Design
Introduction to OpenMP
Introduction to the .NET Task Parallel Library
Interfacing with HPCS-based Clusters
Intro to SOA with HPC Server 2008
Create SOA-based Apps with HPC Server 2008
General Performance Tuning of Parallel Applications
Introduction to MPI
Data Parallelism and MPI's Collective Operations
MPI.NET
Using MPI: Debugging, Tracing, and Other Tools
Designing MPI Applications
MPI-2
Excel-based HPC Apps
Porting UNIX Apps to Windows HPC Server 2008
Open Grid Forum HPC Basic Profile
Setup and Administration of Windows HPC Server 2008

Audience
This course is intended for software developers who need to develop long-running, compute-intensive, or data-intensive apps targeting multi-core and cluster-based hardware. No experience in the field of high-performance computing is required.

Prerequisites
Before attending this course, students must have:
Basic experience using the Windows platform.
Basic programming experience on Windows using Visual Studio.
2 or more years of programming experience in C++ or C#.

Duration
Five days
Course Outline

I. Introduction to High-Performance Computing and HPC Server 2008
This module introduces the field of high-performance computing, the product Microsoft Windows HPC Server 2008, and developing software for HPCS-based clusters.
A. Motivation for HPC
B. Brief product history of CCS and HPCS
C. Brief overview of HPC Server 2008: components, job submission, scheduler
D. Product differentiators
E. Software development technologies: parametric sweep, threads, OpenMP, MPI, SOA, etc.
F. Measuring performance: linear speedup
G. Predicting performance: Amdahl's law
Lab: Introduction to HPC and Windows HPC Server 2008
Submitting and monitoring jobs
Running an HPC app
Measuring performance
Measuring the importance of data locality

II. Multi-threading for Performance
This module introduces explicit, do-it-yourself multi-threading in VC++ and C#.
A. Multi-threading for responsiveness and performance
B. The costs of multi-threading
C. Structured, fork-join parallelism
D. Multi-threading in C# using the .NET Thread class
E. Multi-threading in VC++ using the Windows API
F. Load balancing
G. Scheduling multi-threaded apps on Windows HPC Server
Lab: Multi-threading in VC++ and C#
Creating a multi-threaded app

III. The Dangers of Multithreading
This module discusses the risks of multi-threaded programming (and concurrent programming in general), then presents strategies for avoiding the most common pitfalls.
A. Race conditions
B. Critical sections
C. Starvation
D. Livelock
E. Deadlock
F. Compiler and language implications
G. Memory models
H. Locks
I. Interlocking
J. Lock-free designs

IV. The HPCS Job Scheduler
This module introduces the heart of HPCS-based clusters: the Job Scheduler.
A. Throughput vs. performance
B. Nodes vs. sockets vs. cores
C. Jobs vs. tasks
D. Job and task states
E. Default scheduling policies
F. The impact of job priorities and job preemption
G. Job resources and dynamic growing / shrinking
H. Submission and activation filters
Lab: Working with the Job Scheduler
Environment variables in HPC Server 2008
Exit codes and denoting success / failure
Checkpointing in case of failure
Multi-task jobs and task dependences

V. Parallel Application Design
This module discusses common design patterns for parallel apps, along with HPCS-specific design issues.
A. Two sample design problems
B. Foster's method
C. Common problem decompositions
D. Common communication patterns
E. Computation vs. communication
F. Design patterns: master-worker, pipeline, map-reduce, SOA, parametric sweep, and more

VI. Introduction to OpenMP
This module introduces OpenMP (Open Multi-Processing) for shared-memory, multi-threaded programming in VC++.
A. What is OpenMP?
B. Shared-memory programming
C. Using OpenMP in Visual Studio with VC++
D. Parallel regions
E. Execution model
F. Data parallelism
G. Load balancing, static vs. dynamic scheduling
H. Scheduling OpenMP apps on Windows HPC Server
Lab: Intro to OpenMP
Creating a simple OpenMP app from scratch
Using OpenMP to parallelize an existing application

VII. More OpenMP
This module continues the coverage of OpenMP.
A. Running and measuring performance on the cluster
B. Barriers
C. Critical sections
D. Synchronization approaches
E. Implementing common design patterns: conditional, task, master-worker, nested
F. Data coherence and flushing
G. Environment variables
H. Common pitfalls

VIII. Introduction to the .NET Task Parallel Library
This module introduces the Task Parallel Library (TPL) for shared-memory, multi-threaded programming in .NET 4.0.
A. What is the TPL?
B. Moving from threads to tasks
C. Using the TPL in Visual Studio with C#
D. Execution model
E. Parallel.For
F. Data and task parallelism
G. Synchronization approaches
H. Concurrent data structures
I. Scheduling TPL-based apps on Windows HPC Server
Lab: Intro to the TPL
Creating a simple TPL-based app from scratch
Using the TPL to parallelize an existing application

IX. Interfacing with HPCS-based Clusters
This module demonstrates the various ways you can interface with a cluster, in particular using the HPC Server 2008 API.
A. Cluster Manager
B. Job Manager
C. Job description files
D. clusrun
E. Console window
F. PowerShell
G. Scripts
H. Programmatic access via the HPCS API v2.0
Lab: Interfacing with the Cluster
Clusrun is your friend
Scripting
Using the HPCS API to submit and monitor a job
X. Intro to SOA with HPC Server 2008
This module presents one of the most interesting and unique features of HPC Server 2008: service-oriented HPC.
A. Service-oriented architectures
B. SOA and WCF
C. Mapping SOA onto jobs and the Job Scheduler
D. Private vs. shared sessions
E. Secure vs. insecure sessions

XI. Create SOA-based Apps with HPC Server 2008
This module presents the details of building a SOA-based HPC app, from start to finish.
A. Service-side programming
B. Service configuration
C. Client-side programming
D. WCF configuration and tracing
Lab: SOA-based HPC with HPCS and WCF
Creating an SOA-based HPC app from start to finish
Service-side
Client-side

XII. General Performance Tuning of Parallel Applications
This module discusses various performance-tuning strategies on Windows for parallel apps.
A. Performance counters
B. Heat map in Cluster Manager
C. Customizing the heat map
D. perfmon
E. xperf (aka the Windows Performance Toolkit)
F. SOA tuning
G. What to look for
H. Other tools

XIII. Introduction to MPI
This module introduces *the* most common approach to developing cluster-wide, high-performance applications: the Message-Passing Interface.
A. Shared-memory vs. distributed-memory
B. The essence of MPI programming: message-passing, SPMD
C. Microsoft MPI
D. Using MSMPI in Visual Studio with VC++
E. Execution model
F. MPI Send and Receive
G. mpiexec
H. Scheduling MPI apps on Windows HPC Server
Lab: Introduction to MPI
Creating a simple MPI app using Send and Receive

XIV. Data Parallelism and MPI's Collective Operations
This module discusses data parallelism in MPI, and how best to build data-parallel MPI apps using its collective operations.
A. Data parallelism in MPI
B. A real-world example
C. Broadcast
D. Scatter
E. Gather
F. Barriers
G. Reductions
H. Defining your own reduction operator
I. Common pitfalls
Lab: Data Parallelism and MPI's Collective Operations
Parallelizing an existing MPI application
Mapping Sends and Receives to Broadcast, Scatter, Gather, and All_reduce
XV. MPI.NET
This module overviews MPI.NET, a .NET wrapper around MSMPI.
A. Why MPI.NET?
B. Using MPI.NET in Visual Studio with C#
C. Type-safe Send and Receive
D. Collective operations in MPI.NET
E. Execution model
F. Scheduling MPI.NET apps on Windows HPC Server

XVI. Using MPI: Debugging, Tracing, and Other Tools
This module dives into the practical realities of using MPI and MPI.NET: debugging, tracing options, and other tools of interest.
A. Local debugging with Visual Studio
B. Remote debugging with Visual Studio
C. General MPI tracing
D. Tracing with ETW (Event Tracing for Windows)
E. Trace visualization
F. Other tools for MPI developers
Lab: MPI Debugging and Tracing
Debugging with Visual Studio
Tracing with ETW
Viewing traces with Jumpshot and Vampir

XVII. Designing MPI Applications
This module presents the most common design issues facing MPI developers.
A. Hiding latency by overlapping computation and communication
B. Avoiding deadlock
C. Hybrid designs involving both MPI and OpenMP
D. Buffering
E. Error handling
F. I/O and large datasets

XVIII. MPI-2
This module summarizes the advanced features of MPI-2 and MSMPI.
A. Groups
B. Communicators
C. Topologies
D. Non-scalar data: packing/unpacking, non-contiguous arrays, and user-defined datatypes
E. MPI I/O
F. Remote memory access
G. [Dynamic process creation is not supported in MSMPI]
Lab: Working with Advanced Features in MPI-2
MPI topologies
MPI datatypes

XIX. Excel-based HPC Apps
This module presents techniques for bringing the potential of high-performance computing to the world of spreadsheets.
A. Excel as a computation engine
B. Performing Excel computations on Windows HPC Server 2008
C. Using Excel Services
D. Using Excel UDFs
E. Future versions of Excel and HPC Server

XX. Porting UNIX Apps to Windows HPC Server 2008
This module discusses strategies for porting UNIX applications to Windows HPC Server 2008.
A. The most common porting issues
B. 32-bit to 64-bit
C. UNIX calls
D. Manual porting of UNIX code
E. Cygwin
F. MinGW
G. Microsoft SUA (Subsystem for UNIX-based Applications)
XXI. Open Grid Forum HPC Basic Profile
This module introduces the OGF's HPC Basic Profile, and how to enable support in Windows HPC Server 2008.
A. What is the OGF HPC Basic Profile?
B. Platform-neutral job submission
C. JSDL (Job Submission Description Language)
D. Enabling the HPC Basic Profile in Windows HPC Server 2008

XXII. Setup and Administration of Windows HPC Server 2008
This module overviews the basic setup and administration of an HPCS-based cluster.
A. Hardware requirements
B. Software requirements
C. Initial decisions
D. Headnode setup
E. Compute node setup
F. Broker node setup
G. Developer machine setup
H. Diagnostics
I. Maintenance, including performance
J. Troubleshooting