Blue Gene: A Next Generation Supercomputer (BlueGene/P)
1 Blue Gene: A Next Generation Supercomputer (BlueGene/P) Presented by Alan Gara (chief architect) representing the Blue Gene team IBM Corporation
2 Outline of Talk
- A brief sampling of applications on BlueGene/L.
- A detailed look at the next-generation BlueGene (BlueGene/P).
- Future challenges, motivated by a look at computing ~10 to 15 years out.
- Insight into the future-generation BlueGene/Q machine.
3 Blue Gene Roadmap
Performance: providing unmatched sustained $/perf and Watts/perf for scalable applications. The primary IBM system research vehicle, which influences our more traditional PowerPC product line. SoC design, Cu-08 (9SF) technology.
- Blue Gene/L (PPC 440, 0.7 GHz): scalable to 360 TFlops. LA: 12/04, GA 6/. 1GB version available 1Q06.
- Blue Gene/P (PPC 450, 0.85 GHz): scalable to 1 PFlops.
- Blue Gene/Q (Power architecture): scales to 10s of PFlops.
Top500 list (June 2007), vendor and installation:
1. IBM: BlueGene/L, DOE/NNSA/LLNL
2. Cray: ORNL
3. Cray/Sandia: Sandia (Red Storm)
4. IBM: BlueGene/L at Watson
5. IBM: BlueGene/L at Stony Brook/BNL
6. IBM: ASC Purple, LLNL
7. IBM: BlueGene/L at RPI
8. Dell: NCSA
9. IBM: Barcelona PowerPC blades
10. SGI: Leibniz Rechenzentrum
4 Car-Parrinello Molecular Dynamics (CPMD): studying the effect of dopants on SiO2/Si boundaries
Simulations from first principles to understand the physics and chemistry of current technology and to guide the design of next-generation materials.
- Characterization of materials currently under experimental test.
- Formation of a non-abrupt SiO2/Si interface correctly predicted from scratch.
- When nitrogen and hafnium are introduced during the simulation process, detrimental defects are revealed.
5 Blue Brain
EPFL to simulate the neocortical column. Our understanding of the brain is limited by insufficient information and complexity.
- Overcome limitations of neuroscientific experimentation.
- Inform experimental design and theory.
- Enable scientific discovery for understanding brain function and diseases.
Finally feasible (although by no means finished): 8096 processors (BG/L), 100,000 morphologically complex neurons in real time.
[Figure: unfolded human neocortex, ~35 cm x 50 cm, total area ~1570 cm2, thickness ~3 mm, ~1 million columns; each column ~300 µm across with ~10,000 neurons. Courtesy of Henry Markram, EPFL]
6 POP2 0.1-degree benchmark
Comparison point for the same system (node) size: 71% of time in solver. Projected BlueGene/P: 20% of time in solver. Courtesy of M. Taylor, John Dennis.
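The solver fraction bounds how much speeding up the rest of the application can help. An Amdahl-style ceiling makes the contrast concrete (an illustrative sketch; the slide itself gives only the two fractions, and treating the solver as the non-scaling portion is our assumption):

```python
def amdahl_ceiling(non_scaling_fraction, speedup_of_rest=float("inf")):
    """Upper bound on overall speedup when a fraction of runtime does not scale.

    Total time goes from 1 to
    non_scaling_fraction + (1 - non_scaling_fraction) / speedup_of_rest.
    """
    rest = (1.0 - non_scaling_fraction) / speedup_of_rest
    return 1.0 / (non_scaling_fraction + rest)

# 71% of time in the solver caps everything-else optimizations at ~1.4x overall;
# with only 20% in the solver the ceiling rises to 5x.
print(amdahl_ceiling(0.71))
print(amdahl_ceiling(0.20))
```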
7 Carbon footprint for Courtesy of John Dennis
8 BlueGene/P in Focus
9 BlueGene/P Architectural Highlights
Scaled performance through density and frequency bump:
- 2x performance through doubling the processors/node.
- 1.2x from frequency bump due to technology.
Enhanced function:
- 4-way SMP.
- DMA, remote put/get, user-programmable memory prefetch.
- Greatly enhanced 64-bit performance counters (including the 450 core).
Hold BlueGene/L packaging as much as possible:
- Improve networks through higher-speed signaling on the same wires.
- Improve power efficiency through aggressive power management.
Higher signaling rate:
- 2.4x higher bandwidth, improved latency for the Torus and Tree networks.
- 10x higher bandwidth for Ethernet I/O.
72Ki nodes in 72 racks should hit 1.00 PF peak.
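The 1.00 PF figure follows directly from the per-node arithmetic. A quick sketch (the 4 flops/cycle/core figure, from a dual-pipeline FPU issuing fused multiply-adds, is our assumption and is not spelled out on this slide):

```python
def peak_gflops_per_node(cores, ghz, flops_per_cycle_per_core):
    # peak = cores x clock (GHz) x flops issued per cycle per core
    return cores * ghz * flops_per_cycle_per_core

node_gf = peak_gflops_per_node(cores=4, ghz=0.85, flops_per_cycle_per_core=4)
system_pf = node_gf * 72 * 1024 / 1e6  # 72 racks x 1024 nodes/rack

print(node_gf)    # 13.6 GF/node
print(system_pf)  # just over 1.00 PF
```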
10 BGP comparison with BGL
Node properties (BG/L | BG/P):
- Node processors / frequency: 2x 440 PowerPC, 0.7GHz | 4x 450 PowerPC, 0.85GHz (target)
- Coherency: software managed | SMP
- L1 cache (private): 32KB/processor | 32KB/processor
- L2 cache (private): 14-stream prefetching | 14-stream prefetching
- L3 cache size (shared): 4MB | 8MB
- Main store/node: 512MB/1GB | 2GB
- Main store bandwidth: 5.6GB/s (16B wide) | 13.6GB/s (2x16B wide)
- Peak performance: 5.6GF/node | 13.6GF/node
Torus network (BG/L | BG/P):
- Bandwidth: 6*2*175MB/s = 2.1GB/s | 6*2*425MB/s = 5.1GB/s
- Hardware latency (nearest neighbor): 200ns (32B packet), 1.6us (256B packet) | 160ns (32B packet), 500ns (256B packet)
- Hardware latency (worst case): 6.4us (64 hops) | 5us (64 hops)
Collective network (BG/L | BG/P):
- Bandwidth: 2*350MB/s = 700MB/s | 2*0.85GB/s = 1.7GB/s
- Hardware latency (round trip, worst case): 5.0us | 4us
System properties, 72k nodes (BG/L | BG/P):
- Peak performance: 410TF | 1PF
- Total power: 1.7MW | 2.7MW
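One way to read the table is that BG/P roughly doubles resources while holding BG/L's machine balance. A small sketch computing bytes-per-flop from the table's own numbers (the dictionary layout is ours, purely for illustration):

```python
bgl = {"peak_gf": 5.6, "mem_bw_gbs": 5.6, "torus_bw_gbs": 2.1}
bgp = {"peak_gf": 13.6, "mem_bw_gbs": 13.6, "torus_bw_gbs": 5.1}

def balance(node):
    # bytes moved per peak flop, for main memory and for the torus
    return (node["mem_bw_gbs"] / node["peak_gf"],
            node["torus_bw_gbs"] / node["peak_gf"])

print(balance(bgl))  # ~ (1.0, 0.375) bytes/flop
print(balance(bgp))  # ~ (1.0, 0.375) bytes/flop -- balance preserved
```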
11 BlueGene/P node
[Block diagram: four PPC 450 cores, each with an FPU, private L1, and a prefetching L2 (7GB/s data paths into L2; 14GB/s read and 14GB/s write from each L2), feeding through two multiplexing switches into two 4MB eDRAM L3 slices and two DDR-2 controllers (13.6GB/s external DDR2 DRAM bus, 2*16B at 425MHz). A DMA module allows remote direct put/get, with 4 symmetric ports for the collective, torus and global barrier networks: torus 6*3.4Gb/s bidirectional, collective 3*6.8Gb/s bidirectional, plus an arbiter, a JTAG control network, and 10Gb Ethernet to a 10Gb physical layer (shares I/O with the torus).]
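The 13.6GB/s external memory figure is consistent with the bus description: two 16-byte-wide DDR-2 interfaces at a 425MHz data rate. A sketch of the arithmetic (reading the 425 figure as the per-pin data rate in MHz is our interpretation):

```python
def ddr_bandwidth_gbs(controllers, bytes_wide, mhz):
    # bandwidth = interfaces x width (bytes) x data rate (MHz) / 1000
    return controllers * bytes_wide * mhz / 1000.0

print(ddr_bandwidth_gbs(controllers=2, bytes_wide=16, mhz=425))  # 13.6 GB/s
```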
12 IBM System Blue Gene/P Solution: Expanding the Limits of Breakthrough Science
Blue Gene/P continues Blue Gene's leadership performance in a space-saving, power-efficient package for the most demanding and scalable high-performance computing applications.
Packaging hierarchy:
- Chip: 4 processors, 13.6 GF/s, 8 MB eDRAM.
- Compute card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR; supports 4-way SMP.
- Node card: 32 chips (4x4x2), 32 compute and 0-2 I/O cards; 435 GF/s, 64 GB.
- Rack: 32 node cards, 1024 chips, 4096 procs, cabled 8x8x16; 14 TF/s, 2 TB.
- System: 1 to 72 or more racks; 1 PF/s, 144 TB+.
Front-end node / service node: System p servers, Linux SLES10. HPC software: compilers, GPFS, ESSL, LoadLeveler.
13 Blue Gene/P Interconnection Networks
3-dimensional torus:
- Interconnects all compute nodes; the communications backbone for computations.
- Adaptive cut-through hardware routing.
- 3.4 Gb/s on all 12 node links (5.1 GB/s per node).
- 0.5 µs latency between nearest neighbors, 5 µs to the farthest. MPI: 3 µs latency for one hop, 10 µs to the farthest.
- 1.7/2.6 TB/s bisection bandwidth, 188 TB/s total bandwidth (72k machine).
Collective network:
- Interconnects all compute and I/O nodes (1152).
- One-to-all broadcast functionality; reduction-operations functionality.
- 6.8 Gb/s of bandwidth per link.
- Latency of one-way tree traversal 2 µs (MPI 5 µs).
- ~62 TB/s total binary-tree bandwidth (72k machine).
Low-latency global barrier and interrupt:
- Latency of one way to reach all 72K nodes 0.65 µs (MPI 1.6 µs).
Other networks:
- 10Gb functional Ethernet: I/O nodes only.
- 1Gb private control Ethernet: provides JTAG access to hardware; accessible only from the Service Node system.
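On a wrap-around torus the farthest node is half of each dimension away, which is why the slide quotes a per-hop latency plus a farthest-node figure. A sketch (the single-rack 8x8x16 shape comes from the packaging slide; treating one rack as a standalone torus is an illustrative assumption):

```python
def max_torus_hops(dims):
    # farthest node on a wrap-around torus: half of each dimension, rounded down
    return sum(d // 2 for d in dims)

def node_torus_bw_gbs(links=6, directions=2, mbs_per_link=425):
    # each node drives 6 links, bidirectional, 425 MB/s (3.4 Gb/s) per direction
    return links * directions * mbs_per_link / 1000.0

print(max_torus_hops((8, 8, 16)))  # 16 hops across one rack
print(node_torus_bw_gbs())         # 5.1 GB/s per node
```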
14 BlueGene/P Software
BG/L applications easily port to 4-way virtual-node BG/P, and may gain performance through new BG/P features:
- Programming model changes: mixed OpenMP + MPI supported (OpenMP across the 4-way node). Virtual node mode supported as in BG/L; in BG/P, 4 MPI tasks/node. pthreads supported. Number of threads limited to the number of cores (4).
- DMA engine enables effective offloading of messaging and increases the value of overlapping computation with communication. The messaging library utilizes the DMA and is built around put/get functionality.
- HPC toolkit will enable access to performance counters (BG/P has processor counters).
- BG/L model of a high-performance kernel on compute nodes and Linux on I/O nodes. Working on supporting dynamic linking on the high-performance kernel.
The above also enables new applications for BG/P.
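The programming-model choices above amount to deciding how a node's four cores are split between MPI tasks and threads. A sketch of the trade-off (the three mode names and their splits are our labeling of the options the slide describes, not launcher syntax from the slide):

```python
# (MPI tasks per node, threads per task): every mode uses all 4 cores
modes = {
    "smp":  (1, 4),  # one MPI task, OpenMP/pthreads across the node
    "dual": (2, 2),  # hybrid split
    "vn":   (4, 1),  # virtual node mode: 4 MPI tasks/node, as ported from BG/L
}

def cores_used(mode):
    tasks, threads = modes[mode]
    return tasks * threads

for m in modes:
    print(m, modes[m], cores_used(m))
```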
15 Future Challenges: Insights for BlueGene/Q
16 Challenges for the future (June 15, 2005)
We can get an understanding of the challenges by projecting to issues in 2023 (the Exaflop era):
- Power is a fundamental problem that is pervasive at many system levels (compute, memory, disk).
- Memory cost and performance are not keeping pace with compute potential.
- Network performance will be both costly (bandwidth) and will not scale well to Exaflops (latency).
- Ease of use, to extract the promised performance from compute, will be a main focus. Big peak Flops is mainly a power problem.
- Reliability at the Exaflop scale will require a holistic approach at the architecture level. This results both from a lessening of the underlying silicon technology and from the sheer number of logic elements.
[Chart: Supercomputer peak performance (flops) vs. year introduced, 1E+5 to 1E+17, from ENIAC and UNIVAC (vacuum tubes), through IBM 701/704/7090/Stretch (transistors), CDC 6600/7600 and ILLIAC IV (ICs), CDC STAR-100 and CRAY-1 (vectors), X-MP/Y-MP/Cyber 205/SX-2/SX-3 (parallel vectors), and the MPPs (i860, Delta, T3D, CM-5, Paragon, CP-PACS, NWT, T3E, ASCI Red, Blue Pacific, ASCI White, SX-5, Earth, ASCI Purple, Red Storm) up to Blue Gene/L; doubling time = 1.5 yr. Eras: current/past (performance growth through exponential processor performance growth), near term (performance through exponential growth in parallelism), long term (power cost = system cost).]
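The chart's 1.5-year doubling time is what places the Exaflop around 2023: growing from ~1 PF to 1 EF is a factor of 1000, i.e. about ten doublings. A sketch of the extrapolation (taking ~2008 as the 1 PF baseline is our reading of the roadmap, not a figure from this slide):

```python
import math

def years_to_grow(factor, doubling_years=1.5):
    # time for peak performance to grow by `factor` at a fixed doubling time
    return doubling_years * math.log2(factor)

# 1 PF -> 1 EF is a factor of 1000: roughly 15 years, landing near 2023
print(years_to_grow(1000))
```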
17 Extrapolating an Exaflop in 2023
Columns: BlueGene/L (2005) | Exaflop, directly scaled | Exaflop, educated guess | assumption for educated guess.
- Node peak perf: 5.6GF | 20TF | 20TF | same node count (64k)
- Number of hardware threads/node: … | … | … | assume 3.5GHz, 3-D packaging
- System power in compute chip: 1 MW | 4 GW | 50 MW | 80x improvement (very optimistic)
- Link bandwidth (each unidirectional 3-D link): 1.4Gbps | 5Tbps | 1Tbps | not possible to maintain bandwidth ratio
- Wires per unidirectional 3-D link: … wires | … wires | 100 wires | a large wire count will eliminate high density and drive links onto cables, where they are 100x more expensive
- Pins in network on node: 24 pins | 6,000 pins | 1,200 pins | 20Gbps differential assumed
- Power in network: 100 KW | 38 MW | 8 MW | 10mW/Gbps assumed
- Memory bandwidth/node: 5.6GB/s | 20TB/s | 2TB/s | not possible to maintain external bandwidth/flop
- L2 cache/node: 4 MB | 16 GB | 500 MB | about 6-7 technology generations
- Data pins associated with memory/node: 128 pins | 32,000 pins | 4,000 pins | 5Gbps per pin
- Power in memory I/O (not DRAM): 12.8 KW | 50 MW | 6 MW | 5mW/Gbps assumed
- Total problem size (QCD example): 64^3x… | …^3x… | …^3x256 | approximately equal time to science
- QCD CG single-iteration time: 2.3 msec | 9.4 usec | 15 usec | requires (1) fast global sum, (2) hardware offload for messaging (driverless messaging)
- Memory footprint/node: 2.7 MB | 42 MB | 42 MB | memory footprint is no problem
Power associated with external memory will force high-efficiency computing to reside inside the chip (or chip stack). Network scaling will be both a latency and a bandwidth problem: bandwidth is a cost problem, and latency will require hardware offload to avoid nearly all software layers. Processing in a node will be done via thousand(s) of hardware units, each only somewhat faster than today's.
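The network-power rows follow from the stated 10mW/Gbps assumption. A sketch reproducing the educated-guess figure (the per-node rate of 6 links x 2 directions x 1 Tbps combines the table's link-bandwidth row with the usual 3-D torus link count; that combination is our reconstruction):

```python
def network_power_mw(nodes, gbps_per_node, mw_per_gbps=10.0):
    # total link power: nodes x per-node signaling rate x energy cost per Gbps
    watts = nodes * gbps_per_node * mw_per_gbps / 1000.0
    return watts / 1e6

# 6 links x 2 directions x 1 Tbps = 12,000 Gbps per node, over 64k nodes
print(network_power_mw(nodes=64 * 1024, gbps_per_node=12_000))  # ~8 MW
```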
18 System Power Efficiency
[Chart: GFLOP/s per Watt vs. year, contrasting power-efficient design focus (QCDSP Columbia, QCDOC Columbia/IBM, Blue Gene/L, BG/P) with single-thread-focus and commodity-driven designs (NASA SGI, Cray XT3, ASCI White, NCSA Xeon, LLNL Itanium 2, Power 3, SX-8, ASCI Q, ECMWF p690 Power 4+, Earth Simulator, Red Storm, Thunderbird, Purple, Fujitsu Bioserver).]
Blue Gene/P: a large peak power-efficiency advantage, but we still need dramatic improvement to enable computing in the future.
19 The Power Problem
[Figure: field-effect transistor with thick gate oxide vs. scaled gate oxide; 1.2 nm oxynitride.]
Oxide thickness is near the limit. Traditional CMOS scaling has ended: density improvements will continue, but power efficiency from technology will only improve very slowly. CMOS alone will no longer enable faster computers at similar power. The solution is not known! Architecture can help to some extent (witness the better power efficiency of commodity processors from simplification); new circuits can also help. This problem needs to be addressed now.
If power efficiency does not improve (system | 250 TF | 1 PF | 10 PF | 100 PF | 1000 PF):
- BlueGene/L: 1.0 MWatt | 2.5 MWatt | 25 MWatt | 250 MWatt | 2.5 GWatt
- Earth Simulator: 100 MWatt | 200 MWatt | 2 GWatt | 20 GWatt | 200 GWatt
- MareNostrum: 5 MWatt | 15 MWatt | 150 MWatt | 1.5 GWatt | 15 GWatt
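Beyond the first column, each row is just linear scaling of power with peak performance at a fixed GFLOP/s-per-Watt. A sketch using the BlueGene/L row's 1 PF point as the baseline (the choice of baseline is ours, for illustration):

```python
def projected_power_mw(target_pf, base_pf, base_mw):
    # if GFLOP/s per Watt stays flat, power scales linearly with peak performance
    return base_mw * (target_pf / base_pf)

# BlueGene/L row: 2.5 MW at 1 PF scales to 25 MW at 10 PF and 2.5 GW at 1000 PF
print(projected_power_mw(10, base_pf=1.0, base_mw=2.5))
print(projected_power_mw(1000, base_pf=1.0, base_mw=2.5))
```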
20 Summary/Conclusion
- BlueGene/L has achieved an application reach far broader than expected (or targeted in the design).
- Partnership and collaboration have been critical to exploiting BlueGene/L.
- BlueGene/P is an architectural evolution from BlueGene/L. Enhancements such as a hardware DMA engine promise the same or better per-node scaling on BlueGene/P. BlueGene/P offers a fully coherent 4-way node with a software stack designed to exploit parallelism, and approximately 2-3x speedup with respect to BlueGene/L for the same node count.
Future trends:
- Power will be a severe constraint in the future (and now).
- Large systems will have millions of threads, each similar in performance to today's.
- The challenges of power will apply to all systems (commercial and HPC). Market forces in the commodity commercial world could push in a different direction, potentially not well aligned with HPC.
- Reliability of future systems will require a holistic approach to reach extreme levels of scalability.
- Latency in networks will become a pinch point for capability computing.
More informationHigh Performance MPI on IBM 12x InfiniBand Architecture
High Performance MPI on IBM 12x InfiniBand Architecture Abhinav Vishnu, Brad Benton 1 and Dhabaleswar K. Panda {vishnu, panda} @ cse.ohio-state.edu {brad.benton}@us.ibm.com 1 1 Presentation Road-Map Introduction
More informationHigh Performance Computing Course Notes HPC Fundamentals
High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationHow то Use HPC Resources Efficiently by a Message Oriented Framework.
How то Use HPC Resources Efficiently by a Message Oriented Framework www.hp-see.eu E. Atanassov, T. Gurov, A. Karaivanova Institute of Information and Communication Technologies Bulgarian Academy of Science
More informationCray XC Scalability and the Aries Network Tony Ford
Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?
More informationPower Challenges in Extreme Scale Computing
Power Challenges in Extreme Scale Computing Hans Jacobson IBM T. J. Watson Research Center hansj@us.ibm.com ECTC Plenary Session - June, 20 Extreme Scale Computing What is extreme scale computing? Exascale
More informationIntel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins
Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications
More informationPerformance and Power Co-Design of Exascale Systems and Applications
Performance and Power Co-Design of Exascale Systems and Applications Adolfy Hoisie Work with Kevin Barker, Darren Kerbyson, Abhinav Vishnu Performance and Architecture Lab (PAL) Pacific Northwest National
More informationTowards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems
Towards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems Robert Birke, German Rodriguez, Cyriel Minkenberg IBM Research Zurich Outline High-performance computing:
More informationMulti-Core Microprocessor Chips: Motivation & Challenges
Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationThe Cray Rainier System: Integrated Scalar/Vector Computing
THE SUPERCOMPUTER COMPANY The Cray Rainier System: Integrated Scalar/Vector Computing Per Nyberg 11 th ECMWF Workshop on HPC in Meteorology Topics Current Product Overview Cray Technology Strengths Rainier
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationSystem Packaging Solution for Future High Performance Computing May 31, 2018 Shunichi Kikuchi Fujitsu Limited
System Packaging Solution for Future High Performance Computing May 31, 2018 Shunichi Kikuchi Fujitsu Limited 2018 IEEE 68th Electronic Components and Technology Conference San Diego, California May 29
More informationIntroduction of Fujitsu s next-generation supercomputer
Introduction of Fujitsu s next-generation supercomputer MATSUMOTO Takayuki July 16, 2014 HPC Platform Solutions Fujitsu has a long history of supercomputing over 30 years Technologies and experience of
More informationIntroduction. Summary. Why computer architecture? Technology trends Cost issues
Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have
More informationCS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING Top 10 Supercomputers in the World as of November 2013*
CS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING 2014 COMPUTERS : PRESENT, PAST & FUTURE Top 10 Supercomputers in the World as of November 2013* No Site Computer Cores Rmax + (TFLOPS) Rpeak (TFLOPS)
More informationThe TOP500 Project of the Universities Mannheim and Tennessee
The TOP500 Project of the Universities Mannheim and Tennessee Hans Werner Meuer University of Mannheim EURO-PAR 2000 29. August - 01. September 2000 Munich/Germany Outline TOP500 Approach HPC-Market as
More informationTOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology
TOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology BY ERICH STROHMAIER COMPUTER SCIENTIST, FUTURE TECHNOLOGIES GROUP, LAWRENCE BERKELEY
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More informationIBM HPC DIRECTIONS. Dr Don Grice. ECMWF Workshop November, IBM Corporation
IBM HPC DIRECTIONS Dr Don Grice ECMWF Workshop November, 2008 IBM HPC Directions Agenda What Technology Trends Mean to Applications Critical Issues for getting beyond a PF Overview of the Roadrunner Project
More informationThe Center for Computational Research & Grid Computing
The Center for Computational Research & Grid Computing Russ Miller Center for Computational Research Computer Science & Engineering SUNY-Buffalo Hauptman-Woodward Medical Inst NSF, NIH, DOE NIMA, NYS,
More informationInfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014
InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment TOP500 Supercomputers, June 2014 TOP500 Performance Trends 38% CAGR 78% CAGR Explosive high-performance
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More information