HPC Technology Trends
High Performance Embedded Computing Conference, September 18, 2007
David S. Scott, Ph.D., Petascale Product Line Architect, Digital Enterprise Group
Risk Factors
Today's presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing available on our website for more information on the risk factors that could cause actual results to differ. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations (http://www.intel.com/performance/resources/limits.htm). Rev. 7/19/06
Real World Problems Driving Petascale & Beyond
[Chart: performance from 100 MFlops (1993) to 1 ZFlops (2029); series: sum of the Top500 list and the #1 system; what we can model today takes <100 TFlops]
Example real-world challenges:
- Aerodynamic Analysis: 1 Petaflops
- Laser Optics: 10 Petaflops
- Molecular Dynamics in Biology: 20 Petaflops
- Aerodynamic Design: 1 Exaflops
- Computational Cosmology: 10 Exaflops
- Turbulence in Physics: 100 Exaflops
- Computational Chemistry: 1 Zettaflops
Real-world Exascale problems: full modeling of an aircraft in all conditions, green airplanes, genetically tailored medicine, understanding the origin of the universe, synthetic fuels everywhere, accurate extreme weather prediction.
Source: Dr. Steve Chen, "The Growing HPC Momentum in China", June 30th, 2006, Dresden, Germany
Silicon Future
Roadmap: 90nm (2003), 65nm (2005), 45nm (2007), 32nm (2009), 22nm (2011)
Research: 16nm (2013), 11nm (2015), 8nm (2017)
A new Intel technology generation every 2 years; Intel R&D technologies drive this pace well into the next decade.
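The roadmap's node sequence follows classic process scaling: linear feature size shrinks by roughly 0.7x per 2-year generation, which halves transistor area each step. A minimal sketch (illustrative arithmetic only, not Intel data) reproduces the 90 → 65 → 45 → 32 → 22 nm progression:

```python
# Illustrative sketch: ~0.7x linear shrink (i.e. ~1/sqrt(2), so transistor
# area halves) per 2-year process generation, starting from 90nm in 2003.
node_nm = 90.0
year = 2003
for _ in range(5):
    print(f"{year}: ~{node_nm:.0f} nm")
    node_nm *= 0.7  # linear dimension shrinks ~0.7x each generation
    year += 2
```

The computed values land close to the named nodes on the slide (90, 63, 44, 31, 22), which is why a fixed 2-year cadence sustains Moore's law density doubling.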
Yesterday, Today and Tomorrow in HPC
- ENIAC (1946): 20 numbers in main memory
- CDC 6600 (1965-1977): first successful supercomputer, 9 MFlops
- ASCI Red (1997-2006): first Teraflop computer, 9,298 Intel Pentium II Xeon processors; world's fastest on the Top500 until 2000
- Intel ENDEAVOR (2006): 464 Intel Xeon 5100-series processors, 6.85 Teraflops MP Linpack, #68 on the Top500
- PetaScale platforms (~2008 and beyond): climate, astrophysics, cell-based community simulation
Yesterday's supercomputing is today's personal computing.
Intel Design & Process Cadence (a new step every 2 years)
- 65nm: Intel Core Microarchitecture (Xeon 5100 & 5300); shrink/derivative: Xeon 5000. Five microprocessors in one platform.
- 45nm: new microarchitecture: NEHALEM; shrink/derivative: PENRYN
- 32nm: new microarchitecture: SANDY BRIDGE; shrink/derivative: WESTMERE
All dates, product descriptions, availability and plans are forecasts and subject to change without notice.
Multi and Many Core
Multi-core is the current progression: multiple cores on a processor die. Many-core is a discontinuity: putting many simpler cores on a die. Jim Held will be talking about Intel's research in this area tomorrow.
Increasing I/O Signaling Rate to Fill the Gap
[Chart: frequency (MHz), 0 to 5,000, 1985-2010; core frequency has climbed far above bus frequency, opening a widening gap]
Silicon Photonics
Source: Intel
Increasing Memory Bandwidth
[Chart: memory bandwidth (GB/s), 0.1 to 100, 1980-2020; memory bandwidth is constrained by power; 3D memory promises higher bandwidth within the power envelope (100 GB/s under 2W) to keep pace]
3D memory stacking: a thin DRAM die is stacked with the CPU in one package under the heat sink, and power and I/O signals go through the DRAM to the CPU via through-DRAM vias.
Source: Intel
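The "higher bandwidth within the power envelope" claim can be sketched as a simple energy budget: achievable bandwidth is the power budget divided by the energy cost per bit moved. The picojoule-per-bit figures below are hypothetical illustrations (the slide gives only the 2W envelope), chosen to show why short through-DRAM vias beat long board traces:

```python
def bandwidth_gbps(power_w, pj_per_bit):
    """Bandwidth (GB/s) achievable for a given power budget and I/O energy cost."""
    bits_per_s = power_w / (pj_per_bit * 1e-12)
    return bits_per_s / 8 / 1e9

# Hypothetical energy costs, for illustration only:
# a conventional board-level DRAM interface vs. short 3D-stacked vias.
print(bandwidth_gbps(2.0, 20.0))  # long board traces: 12.5 GB/s in 2W
print(bandwidth_gbps(2.0, 2.0))   # through-DRAM vias: 125 GB/s in 2W
```

Holding the power envelope fixed, a 10x reduction in energy per bit buys a 10x bandwidth increase, which is the core argument for stacking the DRAM on the CPU.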
Power and Cooling Cost Today
9 Megawatts of system power + power delivery + datacenter cooling = $14.6M electricity costs/year, at ~10¢ per kilowatt-hour.
Assume: 9MW system power, 90% power delivery efficiency, cooling coefficient of performance (COP) = 1.5.
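The $14.6M/year figure follows directly from the stated assumptions. A minimal sketch of the arithmetic (the ~10¢/kWh electricity price is an assumption consistent with the slide's result, not a stated input):

```python
HOURS_PER_YEAR = 8760

system_mw = 9.0        # system power (slide assumption)
delivery_eff = 0.90    # power delivery efficiency (slide assumption)
cooling_cop = 1.5      # cooling coefficient of performance (slide assumption)
price_per_kwh = 0.10   # assumed electricity price, $/kWh

wall_mw = system_mw / delivery_eff    # power drawn from the grid: 10 MW
cooling_mw = wall_mw / cooling_cop    # cooling power to remove that heat
total_mw = wall_mw + cooling_mw       # ~16.7 MW total facility draw
cost = total_mw * 1000 * HOURS_PER_YEAR * price_per_kwh
print(f"${cost/1e6:.1f}M per year")   # -> $14.6M per year
```

Note that delivery losses and cooling together add roughly 85% on top of the 9MW the system itself consumes, which is why the later slides attack power delivery efficiency directly.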
Managing Power and Cooling Efficiency
Power management spans every level, from transistors to facilities:
- Silicon: Moore's law, strained silicon, transistor leakage control techniques, clock gating
- Processor: policy-based power allocation; multi-threaded cores
- System: fine-grain and ultra-fine-grain power management in power delivery
- Facilities: air-cooling and liquid-cooling options; vertical integration of cooling solutions
DELIVERY INEFFICIENCY
[Chart: cumulative datacenter power delivery efficiency falls from 100% toward ~50% as power passes through each stage — uninterruptible power supply, power distribution unit, power supply unit, voltage regulators — with high-efficiency AC losing less at each stage than baseline AC]
Source: Intel
CONVERSION OVERKILL
Conventional datacenter AC power distribution: 480V AC → uninterruptible power supply (AC→DC→AC) → 480V AC → power distribution unit → 208V AC → power supply unit (AC→DC) → 12V DC
SIMPLIFIED DISTRIBUTION
High-voltage DC power distribution: 480V AC → uninterruptible power supply (AC→DC) → 360V DC → power distribution unit → 360V DC → power supply unit (DC→DC) → 12V DC
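Cumulative delivery efficiency is just the product of the per-stage efficiencies, so eliminating conversion stages (or raising their efficiency) compounds. The per-stage numbers below are hypothetical illustrations, not Intel measurements; they only show the shape of the argument:

```python
def chain_efficiency(stages):
    """Cumulative efficiency of a power delivery chain: product of stage efficiencies."""
    eff = 1.0
    for _, stage_eff in stages:
        eff *= stage_eff
    return eff

# Hypothetical per-stage efficiencies, for illustration only.
baseline_ac = [("UPS (AC-DC-AC)", 0.92), ("PDU", 0.98), ("PSU (AC-DC)", 0.75), ("VRs", 0.85)]
hv_dc = [("UPS (AC-DC)", 0.96), ("DC PDU", 0.99), ("PSU (DC-DC)", 0.92), ("VRs", 0.85)]

print(f"baseline AC:     {chain_efficiency(baseline_ac):.0%}")
print(f"high-voltage DC: {chain_efficiency(hv_dc):.0%}")
```

With these illustrative numbers the AC chain delivers ~57% of grid power to the load while the DC chain delivers ~74%, mirroring the qualitative gap shown on the efficiency chart.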
HIGH VOLTAGE DC DATACENTER PROTOTYPE
EFFICIENCY REALIZED
[Bar chart, scale 0-2000: measured comparison of baseline AC vs. high-voltage DC distribution]
Source: Intel
Reliability Challenge: Billions of Transistors
[Chart: soft-error FIT/chip (logic & memory), ~1 to ~1000, rising across process nodes 180, 130, 90, 65, 45, 32, 22, 16 nm]
- Relative FIT/bit (memory cell) is expected to stay roughly constant
- Moore's law increases the bit count exponentially: 2x every 2 years
- The result: exponential growth in FIT/chip
Assume: chip size stays roughly the same with each generation.
Soft Errors, or Single Event Upsets (SEU), are caused by charge collection (drift and diffusion of carriers along the ion path through the depletion region) following an energetic particle strike.
Soft error is one of many challenges.
Source: Intel
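The exponential FIT/chip trend is the direct product of a constant per-bit rate and an exponentially growing bit count. A minimal sketch, with a hypothetical starting FIT/bit and bit count chosen only to illustrate the doubling:

```python
# Hypothetical values for illustration: the slide states only that FIT/bit
# is roughly constant while bit count doubles every 2 years.
fit_per_bit = 1e-6  # failures in time per bit, held constant
bits = 1e9          # starting bit count on the chip

for year in range(2007, 2017, 2):
    print(f"{year}: ~{fit_per_bit * bits:.0f} FIT/chip")
    bits *= 2       # Moore's law: 2x bits per 2-year generation
```

Because chip area stays roughly constant, nothing offsets the doubling: FIT/chip doubles every generation, which is why soft-error mitigation becomes a first-order design constraint at petascale part counts.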
SIO Innovation with Acceleration
Intel Architecture with multi-core: general purpose, scalability, economies of scale.
Intel Architecture with accelerators: special-purpose performance. Accelerators attach to the platform through the memory hub and I/O hub, alongside PCI, PCIe add-in cards, Gb Ethernet*, and LPC.
- Geneseo: PCIe extensions
- QuickAssist: software & tools
Energy-efficient performance with multi-threaded cores & accelerators.
System Bottlenecks in Supporting Accelerators
[Diagram: accelerators and add-in cards (Gb Ethernet*) attached to the CPUs through the memory hub, I/O hub, SIO, LPC, and PCI]
- Interface performance: reduce hardware overhead; lower latency for small packets; higher bandwidth for large packets
- Programming model and tools: ease of programming; reduce software overhead in commands and data movement, status and synchronization, configuration and error handling, virtualization and power management
- Scheduling & memory management: buffer allocation policy; linear addressing
Software & platform latencies are 100X the physical I/O latency for most accelerators.
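The 100X software-overhead observation implies a break-even size below which offloading to an accelerator loses to just running on the host. A minimal sketch of that trade-off (all numbers hypothetical; the slide asserts only the ~100X overhead ratio):

```python
def offload_worthwhile(work_us, speedup, io_us, sw_overhead_factor=100):
    """Offload wins only when accelerated compute time plus total offload
    latency beats host execution time. Per the slide, software/platform
    overhead is ~100x the physical I/O latency."""
    offload_cost_us = io_us * sw_overhead_factor  # software dominates the cost
    return work_us / speedup + offload_cost_us < work_us

# Hypothetical numbers: 1 us physical I/O latency, 10x accelerator speedup.
print(offload_worthwhile(50, 10, 1.0))      # small packet: overhead swamps the gain
print(offload_worthwhile(10_000, 10, 1.0))  # large batch: offload wins
```

This is why the slide's fix list targets the software path (commands, synchronization, buffer management) rather than raw link bandwidth: cutting the overhead factor moves the break-even point down and makes small-packet offload viable.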
Questions?