Top500 www.top500.org Salvatore Orlando (from a presentation by J. Dongarra and the top500 website)

MPPs
Performance on massively parallel machines:
- Larger problem sizes, i.e. sizes that make sense
- Performance numbers reflect the largest problem run on a given machine
- R_max: the performance in Gflops for the largest problem run on a machine
- N_max: the size of the largest problem run on a machine
- N_1/2: the size at which half of the R_max execution rate is achieved
- R_peak: the theoretical peak performance in Gflops for the machine
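
To make the relation between these quantities concrete, here is a minimal Python sketch (the machine parameters are illustrative assumptions, loosely modeled on a quad-core Cray XT5 entry from the June 2009 table later in these slides): R_peak follows from core count, clock rate and flops per cycle, while R_max / R_peak gives the fraction of peak actually delivered on the largest Linpack run.

# Illustrative sketch: how R_peak and Linpack efficiency are derived.
# All machine parameters below are assumptions for demonstration only.

def r_peak_gflops(cores, clock_ghz, flops_per_cycle):
    # Theoretical peak: every core retires flops_per_cycle FLOPs per clock tick.
    return cores * clock_ghz * flops_per_cycle

def linpack_efficiency(r_max, r_peak):
    # Fraction of the theoretical peak delivered on the problem of size N_max.
    return r_max / r_peak

peak = r_peak_gflops(cores=150_000, clock_ghz=2.3, flops_per_cycle=4)  # ~1.38 Pflops
r_max = 1_059_000  # assumed measured Linpack rate, in Gflops
print(f"R_peak = {peak:,.0f} Gflops, efficiency = {linpack_efficiency(r_max, peak):.1%}")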

Linpack benchmark
Pros:
- One number: R_max
- Simple to define and to use for ranking
- Allows the problem size to change with the machine and over time
Cons:
- Emphasizes only peak speed and number of CPUs
- Does not stress the interconnection network
- Ignores Amdahl's Law (the problem size changes as more CPUs are exploited)
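
As a reference point for how the single R_max number is obtained: HPL charges the solver with the nominal operation count of a dense LU factorization plus triangular solve, about 2/3·N^3 + 2·N^2 floating-point operations, and divides by the measured wall-clock time. A small sketch (the N and time values below are made up):

# How the Linpack (HPL) rate is computed from problem size and wall time.
def hpl_gflops(n, seconds):
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # nominal operation count for an N x N solve
    return flops / seconds / 1e9

# Assumed example: a problem of size N = 100,000 solved in 90 seconds.
print(f"{hpl_gflops(100_000, 90.0):,.1f} Gflop/s")   # ~7,407 Gflop/s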

ASCI: Advanced Simulation and Computing Program
Research funded by the U.S. government to simulate, above all, weapons systems

Top 10 remarks (June 2007)
A lot of shuffling among the top-ranked systems.
- No. 1: The BlueGene/L (developed by IBM and DOE's National Nuclear Security Administration (NNSA)) reached a Linpack benchmark performance of 280.6 TFlop/s (teraflops = 10^12 flop/s).
- Two other systems exceeded the level of 100 TFlop/s: the upgraded Cray XT4/XT3 at DOE's Oak Ridge National Laboratory, ranked No. 2 with a benchmark performance of 101.7 TFlop/s, and Sandia National Laboratory's Cray Red Storm system, which ranked third at 101.4 TFlop/s.
- Two new IBM BlueGene/L systems entered the Top 10 (in New York and Troy: the largest supercomputing installations in an academic setting).
- The fastest supercomputer in Europe is an IBM JS21 cluster at the Barcelona Supercomputing Center in Spain, which ranked No. 9 at 62.63 TFlop/s.
- The highest-ranked Japanese system is located at the Tokyo Institute of Technology and ranks No. 14 on the list. This system is a cluster integrated by NEC based on Sun Fire x4600 with Opteron processors, ClearSpeed accelerators and an InfiniBand interconnect.

Top 10 of Top 500 (June 2009)
Rmax and Rpeak values are in Gflops. Power data in kW for the entire system.

Rank  Computer                                         Cores    Rmax     Rpeak    Nmax     Power    Processor
1     BladeCenter QS22/LS21 Cluster                    129600   1105000  1456700  2329599  2483.47  PowerXCell 8i
2     Cray XT5 QC 2.3 GHz                              150152   1059000  1381400  4712799  6950.6   AMD x86_64 Opteron Quad Core
3     Blue Gene/P Solution                             294912   825500   1002700  0        2268     PowerPC 450
4     SGI Altix ICE 8200EX, Xeon QC 3.0/2.66 GHz       51200    487005   608829   2300760  2090     Intel EM64T Xeon E54xx (Harpertown)
5     eServer Blue Gene Solution                       212992   478200   596378   2456063  2329.6   PowerPC 440
6     Cray XT5 QC 2.3 GHz                              66000    463300   607200   0        0        AMD x86_64 Opteron Quad Core
7     Blue Gene/P Solution                             163840   458611   557056   0        1260     PowerPC 450
8     SunBlade x6420, Opteron QC 2.3 GHz, Infiniband   62976    433200   579379   0        2000     AMD x86_64 Opteron Quad Core
9     Blue Gene/P Solution                             147456   415700   501350   2958335  1134     PowerPC 450
10    Sun Constellation, NovaScale R422-E2             26304    274800   308283   0        1549     Intel EM64T Xeon X55xx (Nehalem-EP)

No. 46: CINECA-IT, IBM Power 575, p6 4.7 GHz, Infiniband, year 2009. Cores=5376, Rmax=78680, Rpeak=101069, Nmax=907199, Power=859.
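
Since the table reports both R_max (in Gflops) and power (in kW), an energy-efficiency figure in Mflops/W follows directly and conveniently equals R_max divided by the power value. A short sketch using three rows copied from the table above:

# Energy efficiency from the June 2009 table: Gflops / kW == Mflops / W.
systems = [
    ("1  BladeCenter QS22/LS21 Cluster", 1_105_000, 2483.47),
    ("2  Cray XT5 QC 2.3 GHz",           1_059_000, 6950.6),
    ("3  Blue Gene/P Solution",            825_500, 2268.0),
]
for name, rmax_gflops, power_kw in systems:
    print(f"{name:35s} {rmax_gflops / power_kw:7.1f} Mflops/W")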

Top 10 remarks (June 2009)
HPC entered a new realm: 1 petaflop/s, i.e. one quadrillion (10^15) floating-point operations per second.
- The new No. 1 system, built by IBM for the U.S. Department of Energy's Los Alamos National Laboratory and called Roadrunner, reached 1.105 petaflop/s and is one of the most energy-efficient systems on the TOP500.
- The Roadrunner system is based on the IBM QS22 blades, built with advanced versions of the processor in the Sony PlayStation 3.
- Blue Gene/P, with a performance of 825.5 teraflop/s, is now ranked No. 3 and is located in Germany.
- All the top-10 positions are held by the U.S. except the 3rd and the 10th.
- Intel powers an increasing number of Top500 supercomputers: 75%.
- TOP500 now also provides energy-efficiency calculations.
- Positions 14, 15, 16, and 18 correspond to machines located in Saudi Arabia, China, Canada, and India, respectively.

Roadrunner Custom Configuration
Specialized tri-blade combined configuration:
- Two IBM QS22 blade servers (Cell)
- One IBM LS21 blade server (AMD Opteron)
A total of 3,060 tri-blades built in IBM's Rochester, Minn. plant.
- Each tri-blade unit can run at 400 billion operations per second (400 Gigaflops).
- Standard processing (e.g., file system I/O) is handled by the Opteron processors.
- Mathematically and CPU-intensive elements are directed to the Cell processors.
- Open-source Linux software from Red Hat.
- IBM is developing new software (targeting commercial applications) to make Cell-powered hybrid computing broadly accessible: financial services (cause and effect in capital markets in real time), energy exploration and medical imaging (real-time 3D rendering of tissue and bones) industries, among others.
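
A quick back-of-the-envelope check of the figures above: 3,060 tri-blades at roughly 400 Gflops each give an aggregate on the order of 1.2 Pflops, consistent with the 1.105 petaflop/s Linpack result mentioned on the previous slide.

# Aggregate estimate from the quoted per-tri-blade figure (rough, peak-style number).
tri_blades = 3_060
gflops_each = 400
print(f"~{tri_blades * gflops_each / 1e6:.2f} Pflops aggregate")   # ~1.22 Pflops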

Roadrunner
- DOE's (Department of Energy) National Nuclear Security Administration selected Los Alamos National Laboratory as the development site for Roadrunner and IBM as the computer's designer and builder.
- Roadrunner will primarily be used to ensure the safety and reliability of the nation's nuclear weapons stockpile. It will also be used for research into astronomy, energy, human genome science and climate change.
- Roadrunner is the world's first hybrid supercomputer. In a first-of-a-kind design, the Cell Broadband Engine -- originally designed for video game platforms such as the Sony PlayStation 3 -- works in conjunction with x86 processors from AMD.
- Roadrunner connects 6,562 dual-core AMD Opteron chips and 12,240 Cell chips (on IBM Model QS22 blade servers).
- Roadrunner has 98 terabytes of memory and is housed in 278 refrigerator-sized IBM BladeCenter racks occupying 5,200 square feet (440 m^2).
- InfiniBand and Gigabit Ethernet interconnects, with 55 miles of fiber optic cable.

Top 10 of Top 500 (June 2010)
Rmax and Rpeak values are in TFlops. Power data in kW for the entire system.

Top 10 remarks (June 2010)
- The Chinese system with Intel Xeon X5650 processors (6 cores) and NVIDIA Tesla C2050 GPUs is now the fastest in theoretical peak performance at 2.98 PFlop/s and is No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a Chinese system has ever achieved. There are now 2 Chinese systems in the TOP10 and 24 in the TOP500 overall.
- The Jaguar system at Oak Ridge National Laboratory managed to hold the No. 1 spot with 1.75 PFlop/s Linpack performance even though its peak performance is lower than that of the Chinese Nebulae system.
- The most powerful system in Europe is an IBM BlueGene/P system at the German Forschungszentrum Juelich (FZJ), which dropped to No. 5.
- Intel dominates the high-end processor market: 81.6 percent of all systems and over 90 percent of quad-core based systems. The Intel Core i7 (Nehalem-EP) processors increased their presence in the list with 186 systems, compared with 95 in the last list.
- Another notable system is Tianhe-1 at No. 7, a hybrid design with Intel Xeon processors and AMD GPUs used as accelerators: each node consists of two AMD GPUs attached to two Intel Xeon processors.

Top 10 of Top 500 (June 2011)
Rmax and Rpeak values are in TFlops. Power data in kW for the entire system.

Top 10 remarks (June 2011)
- A Japanese supercomputer capable of performing more than 8 petaflop/s is the new No. 1 system in the world, putting Japan back in the top spot for the first time since the Earth Simulator was dethroned in November 2004, according to the latest edition of the TOP500 List of the world's top supercomputers. The system, called the K Computer, is at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe.
- For the first time, all of the top 10 systems achieved petaflop/s performance.
- The K Computer, built by Fujitsu, currently combines 68,544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores, almost twice as many as any other system in the TOP500. The K Computer is also more powerful than the next five systems on the list combined.
- The K Computer's name draws upon the Japanese word "Kei" for 10^16 (ten quadrillion), representing the system's performance goal of 10 petaflops. RIKEN is the Institute for Physical and Chemical Research.
- Unlike the Chinese system it displaced from the No. 1 slot and other recent very large systems, the K Computer does not use graphics processors or other accelerators. The K Computer is also one of the most energy-efficient systems on the list.
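
The quoted figures can be cross-checked with a couple of lines of Python. The core count follows from 68,544 CPUs with eight cores each; the peak estimate additionally assumes 8 double-precision flops per cycle per core at 2.0 GHz (i.e. 128 Gflops per SPARC64 VIIIfx chip), an assumption not stated in the slide itself:

# Cross-check of the K computer figures (June 2011 configuration).
cpus, cores_per_cpu, clock_ghz = 68_544, 8, 2.0
flops_per_cycle_per_core = 8            # assumption: 16 Gflops/core at 2 GHz
print(f"cores: {cpus * cores_per_cpu:,}")                       # 548,352
peak_pflops = cpus * cores_per_cpu * clock_ghz * flops_per_cycle_per_core / 1e6
print(f"estimated R_peak: ~{peak_pflops:.2f} Pflops")           # ~8.77 Pflops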

China builds petaflop supercomputer without AMD, Intel or Nvidia (Oct 31, 2011)
COMMUNIST China has built its first supercomputer using chips designed and manufactured in China instead of relying on AMD, Intel or Nvidia. China's new Sunway Bluelight MPP was installed in the country's National Supercomputer Center in Jinan in September, with estimates pegging the cluster somewhere around the petaflop mark. The cluster is made up of 8,700 Shenwei SW1600 processors, which are completely designed and manufactured in China.
China's past success in the HPC arena has been down to good old-fashioned American technology: the chips might have been baked in China, but the design was done by US-based firms. The 8,700 Shenwei SW1600 processors represent the country's first comprehensive design and construction effort to build a large-scale HPC cluster.
Since the TOP500 list and supercomputers in general are often viewed as objects of national pride, it's no surprise that China wants to produce its own supercomputer. While the Shenwei cluster isn't quite ready to usurp Japan's K-Computer, few will bet against China heading the TOP500 list with its own chips before too long.

Top 10 of Top 500 (June 2012)
Rmax and Rpeak values are in TFlop/s. Power data in kW for the entire system.
1. Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM, 2011) - DOE/NNSA/LLNL, United States. Cores: 1,572,864; Rmax: 16,324.75; Rpeak: 20,132.66; Power: 7,890.0
2. K computer - SPARC64 VIIIfx 2.0 GHz, Tofu interconnect (Fujitsu, 2011) - RIKEN Advanced Institute for Computational Science (AICS), Japan. Cores: 705,024; Rmax: 10,510.00; Rpeak: 11,280.38; Power: 12,659.9
3. Mira - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM, 2012) - DOE/SC/Argonne National Laboratory, United States. Cores: 786,432; Rmax: 8,162.38; Rpeak: 10,066.33; Power: 3,945.0
4. SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70 GHz, Infiniband FDR (IBM, 2012) - Leibniz Rechenzentrum, Germany. Cores: 147,456; Rmax: 2,897.00; Rpeak: 3,185.05; Power: 3,422.7
5. Tianhe-1A - NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA 2050 (NUDT, 2010) - National Supercomputing Center in Tianjin, China. Cores: 186,368; Rmax: 2,566.00; Rpeak: 4,701.00; Power: 4,040.0
6. Jaguar - Cray XK6, Opteron 6274 16C 2.200 GHz, Cray Gemini interconnect, NVIDIA 2090 (Cray Inc., 2009) - DOE/SC/Oak Ridge National Laboratory, United States. Cores: 298,592; Rmax: 1,941.00; Rpeak: 2,627.61; Power: 5,142.0
7. Fermi - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM, 2012) - CINECA, Italy. Cores: 163,840; Rmax: 1,725.49; Rpeak: 2,097.15; Power: 821.9
8. JuQUEEN - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM, 2012) - Forschungszentrum Juelich (FZJ), Germany. Cores: 131,072; Rmax: 1,380.39; Rpeak: 1,677.72; Power: 657.5
9. Curie thin nodes - Bullx B510, Xeon E5-2680 8C 2.700 GHz, Infiniband QDR (Bull, 2012) - CEA/TGCC-GENCI, France. Cores: 77,184; Rmax: 1,359.00; Rpeak: 1,667.17; Power: 2,251.0
10. Nebulae - Dawning TC3600 Blade System, Xeon X5650 6C 2.66 GHz, Infiniband QDR, NVIDIA 2050 (Dawning, 2010) - National Supercomputing Centre in Shenzhen (NSCS), China. Cores: 120,640; Rmax: 1,271.00; Rpeak: 2,984.30; Power: 2,580.0

Top 10 of Top 500 (June 2012)
MANNHEIM, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn. - For the first time since November 2009, a United States supercomputer sits atop the TOP500 list of the world's top supercomputers. Named Sequoia, the IBM BlueGene/Q system installed at the Department of Energy's Lawrence Livermore National Laboratory achieved an impressive 16.32 petaflop/s on the Linpack benchmark using 1,572,864 cores.
Sequoia is also one of the most energy-efficient systems on the list, which will be released Monday, June 18, at the 2012 International Supercomputing Conference in Hamburg, Germany. This will mark the 39th edition of the list, which is compiled twice each year.
On the latest list, Fujitsu's K Computer installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan, is now the No. 2 system with 10.51 Pflop/s on the Linpack benchmark using 705,024 SPARC64 processing cores. The K Computer held the No. 1 spot on the previous two lists.
Italy makes its debut in the Top 10 with an IBM BlueGene/Q system installed at CINECA. The system is at No. 7 on the list with 1.72 Pflop/s performance. In all, four of the top 10 supercomputers are IBM BlueGene/Q systems. France occupies the No. 9 spot with a homegrown Bull supercomputer.

Sequoia BlueGene/Q
Sequoia BlueGene/Q system performance by the numbers:
- 16.32 petaflops of sustained performance and a theoretical peak performance of 20.1 petaflops
- 98,304 compute nodes featuring 1.6 million cores with 1 GB of RAM per core
- 1.6 petabytes of RAM total
- 1.57 million PowerPC cores
- Parallel design based on IBM's 18-core PowerPC A2 processor
- Interconnect speeds clock in at 40 Gb/sec, with a node-to-node latency hop of 2.5 microseconds
- Sequoia uses 7.89 megawatts of power (in comparison, the No. 2 supercomputer, Japan's K Computer, uses about 12.7 megawatts of power)
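
The memory and core figures above are mutually consistent, as a quick check shows (assuming 16 active compute cores per node, which the quoted node and core totals imply):

# Consistency check of the Sequoia numbers quoted above.
nodes, cores_per_node, gb_per_core = 98_304, 16, 1
cores = nodes * cores_per_node
print(f"cores: {cores:,}")                             # 1,572,864 (the "1.57 million" cores)
print(f"RAM:   ~{cores * gb_per_core / 1e6:.2f} PB")   # ~1.57 PB, i.e. roughly 1.6 petabytes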

Top 10 of Top 500 (June 2013)
- Tianhe-2, a supercomputer developed by China's National University of Defense Technology, is the world's new No. 1 system with a performance of 33.86 petaflop/s on the Linpack benchmark, according to the 41st edition of the twice-yearly TOP500 list of the world's most powerful supercomputers. The list was announced June 17 during the opening session of the 2013 International Supercomputing Conference in Leipzig, Germany.
- Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores. Intel Xeon Phi coprocessors provide up to 61 cores, 244 threads, and 1.2 teraflops of performance, and they come in a variety of configurations to address diverse hardware, software, workload, performance, and efficiency requirements.
- Titan, a Cray XK7 system installed at the U.S. Department of Energy's (DOE) Oak Ridge National Laboratory and previously the No. 1 system, is now ranked No. 2. Titan achieved 17.59 petaflop/s on the Linpack benchmark using 261,632 of its NVIDIA K20x accelerator cores. Titan is one of the most energy-efficient systems on the list, consuming a total of 8.21 MW and delivering 2,143 Mflops/W.
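
Two of the quoted numbers can be verified directly. Titan's 2,143 Mflops/W follows from its 17.59 Pflop/s Rmax and 8.21 MW power draw; Tianhe-2's 3,120,000 cores follow from the 16,000 nodes if one assumes 12-core Ivy Bridge Xeons and 57-core Xeon Phi parts (per-chip core counts not stated in the slide itself):

# Cross-checks for the June 2013 remarks above.
titan_mflops_per_w = (17.59e15 / 1e6) / 8.21e6        # flops/s -> Mflops/s, divided by watts
print(f"Titan: ~{titan_mflops_per_w:,.0f} Mflops/W")  # ~2,143 Mflops/W, as quoted

nodes = 16_000
cores = nodes * (2 * 12 + 3 * 57)                     # assumed 12-core Xeons, 57-core Phis
print(f"Tianhe-2 cores: {cores:,}")                   # 3,120,000, as quoted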

Top 10 of Top 500 (June 2015)
- For the fifth consecutive time, Tianhe-2, a supercomputer developed by China's National University of Defense Technology, has retained its position as the world's No. 1 system, according to the 45th edition of the twice-yearly TOP500 list of the world's most powerful supercomputers. Tianhe-2, which means Milky Way-2, led the list with a performance of 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark.
- At No. 2 was Titan, a Cray XK7 system installed at the Department of Energy's (DOE) Oak Ridge National Laboratory. Titan, the top system in the United States and one of the most energy-efficient systems on the list, achieved 17.59 petaflop/s on the Linpack benchmark.
- The only new entry in the Top 10 supercomputers on the latest list is at No. 7: Shaheen II, a Cray XC40 system installed at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. Shaheen II achieved 5.536 petaflop/s on the Linpack benchmark, making it the highest-ranked Middle East system in the 22-year history of the list and the first to crack the Top 10.

Performance Development (TOP500 chart: performance of the lists from 1995 to 2015, on a logarithmic scale from 100 MF/s to 10 EF/s, for the Sum over all 500 systems, the #1 system, and the #500 system)

Vendors System Share (pie chart): HP, IBM, Cray Inc., SGI, Bull, IBM/Lenovo, Fujitsu, Dell, NUDT, MEGWARE, and others; the largest shares are 35.6%, 18.2%, 14.2%, and 12%.

Country System Share (pie chart): United States 46.6%; Japan, China, Germany, France, India, South Korea, Russia, Poland, and others.