Path to Exascale? Intel in Research and HPC 2012
Intel's Investment in Manufacturing
New capacity for 14nm and beyond: D1X Oregon (development fab), Fab 42 Arizona (high-volume fab)
22nm fab upgrades: D1D Oregon, D1C Oregon, Fab 32 Arizona, Fab 28 Israel, Fab 12 Arizona
Intel Labs: Delivering Breakthrough Technologies to Fuel Intel's Growth
Strong research partnerships: universities, government, industry
World-class research: processing and programming, energy and sustainability, security and virtualization, Si photonics and wireless, user experience and interaction, and much more!
Intel European Exascale Labs: At the Core of Exascale
Strong commitment to advancing the leading edge of computing: Intel collaborating with the HPC community and European researchers
Three labs in Europe, with Exascale computing as the central topic: ExaScale Computing Research Center (Paris), Exascale Cluster Lab (Jülich), Exascience Lab (Leuven)
Research topics: performance and scalability of Exascale applications, Exascale cluster scalability and reliability, space weather prediction, architectural simulation and visualization, numerical kernels
What's Intel Doing in HPC in 2012?
Intel Xeon Processor: E5-2600/4600 product families
Fabric technology: Cray's Aries interconnect, QLogic's TrueScale product family
Intel Many Integrated Core Architecture
Next Front of System Innovation: Fabrics
Intel's comprehensive connectivity and fabric portfolio: HPC expertise and intellectual property, world-class interconnects, fabric management and software, highest-performance scalable InfiniBand products, low-latency Ethernet switching, data center Ethernet expertise, high-radix and low-radix switch products, market-leading compute and Ethernet products, platform expertise
Unprecedented rate of innovation in HPC fabric
Other brands and names are the property of their respective owners.
Many Core and Multi-Core
Many Integrated Cores at 1-1.5 GHz* versus a multi-core Intel Xeon processor at 2.0-3.5 GHz (die sizes not to scale)
Many core relies on a high degree of parallelism to compensate for the lower speed of each individual core.
Relatively few specialized applications today are highly parallel, but those applications will benefit from Intel MIC; see the sketch below.
*This is an estimated frequency range for purposes of comparison to multi-core; it is not tied to a particular product.
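As a rough illustration of that point (not from the original slides), the OpenMP loop below is trivially data-parallel: its runtime is governed by how many cores (and how much bandwidth) are available rather than by the clock speed of any single core. The SAXPY kernel and the array size are illustrative assumptions.

/* Minimal sketch: a data-parallel kernel that scales across many slower cores. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)   /* illustrative problem size */

int main(void)
{
    float *x = malloc(N * sizeof *x);
    float *y = malloc(N * sizeof *y);
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    double t0 = omp_get_wtime();
    /* Every iteration is independent, so the loop can be spread over
       dozens of cores; total core count and vector width matter more
       than the per-core clock. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = 2.5f * x[i] + y[i];
    double t1 = omp_get_wtime();

    printf("saxpy on %d threads: %.3f s\n", omp_get_max_threads(), t1 - t0);
    free(x);
    free(y);
    return 0;
}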
Intel Xeon Phi Product Family, based on Intel Many Integrated Core Architecture
Knights Ferry: 1st Intel MIC product, software development platform
Knights Corner: 22 nm process, >50 Intel Architecture cores, in production in 2012
Future Knights products to follow
Spectrum of Execution Models
From CPU-centric (multi-core Intel Xeon processor) to Intel MIC-centric (Intel Many Integrated Core):
Multi-core Hosted: general-purpose serial and parallel computing
Offload: codes with highly parallel phases
Symmetric: codes with balanced needs
Reverse Offload: codes with serial phases (not currently supported by the Language Extensions for Offload in Intel tools)
Many-Core Hosted: highly parallel codes
In each model the application's main() and MPI_*() calls run on the host, on the coprocessor, or on both, communicating over PCIe. Productive programming models across the spectrum, supported with Intel tools; a sketch of the symmetric model follows below.
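As a rough sketch of the symmetric model (not taken from the slides), the ordinary MPI program below could be built once for the Xeon host and once for the coprocessor, with ranks from both binaries joining a single MPI job; the build and launch details are assumptions and vary by MPI implementation.

/* Symmetric-model sketch: every rank, wherever it runs, executes the same code. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* Each rank reports where it is running: some ranks sit on the
       multi-core host, others on the many-core coprocessor. */
    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}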
Intel MIC Architecture Code Examples

1. Offloading a function call
   #pragma offload target(mic)
   foo();

   foo() { ... }   // Compiled for mic

2. Calculating Pi with an offloaded OpenMP loop
   #pragma offload target(mic)
   #pragma omp parallel for reduction(+:pi)
   for (i = 0; i < count; i++) {
       float t = (float)((i + 0.5) / count);
       pi += 4.0 / (1.0 + t*t);
   }
   pi /= count;

3. Using MKL with offload
   void your_hook() {
       float *A, *B, *C;   /* Matrices */
       #pragma offload target(mic) in(transa, transb, N, alpha, beta) \
           in(A:length(matrix_elements)) \
           in(B:length(matrix_elements)) \
           in(C:length(matrix_elements)) \
           out(C:length(matrix_elements) alloc_if(0))
       sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
   }
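A note on the MKL example: the length() clauses count array elements rather than bytes, so matrix_elements here would presumably be N*N for these square matrices; the sgemm call itself then executes on the coprocessor while the data clauses handle the transfers over PCIe.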
ICRAR: a joint venture between CUT and UWA. Projects: MWA, ASKAP, SKA.
SKA ICT Challenges
"SKA will generate more data in one day than the whole Internet produces in a year" (~1.5 EB per day). Tim Cornwell, CSIRO
Xeon Phi (Intel MIC) Early Adoption at ICRAR
Started in 2011: KNF A0 -> KNC A0 -> KNC B0
Compression of extremely large spectral-imaging data cubes (400 TB, SkuareView, TBB, threads)
Interferometry radio telescope data processing (software cross-correlation, DiFX, MPI, Pthreads)
N-body simulations for astrophysics (massive star-forming regions, Gadget2, MPI)
This technology leads us to believe that we are indeed on the path to the Exascale computing required for SKA.