Top500 (www.top500.org)
Salvatore Orlando (from a presentation by J. Dongarra and the TOP500 website)
MPPs
Performance on massively parallel machines:
- Larger problem sizes, i.e. sizes that make sense for such machines
- Performance numbers reflect the largest problem run on a given machine
Definitions:
- Rmax: the performance in Gflop/s for the largest problem run on a machine
- Nmax: the size of the largest problem run on a machine
- N1/2: the problem size at which half the Rmax execution rate is achieved
- Rpeak: the theoretical peak performance in Gflop/s for the machine
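The definitions above tie Rmax to one concrete run of size Nmax. The following minimal Python sketch assumes the operation count HPL uses for its reported rate (2/3 N^3 + 3/2 N^2). It shows how a rate follows from N and the measured time, and how Rmax/Rpeak measures efficiency. The Roadrunner values are taken from the June 2009 table of this deck; the run time is derived from them, not reported.

```python
# Minimal sketch: Linpack (HPL) rate from problem size and time, plus Rmax/Rpeak.
# Assumption: the operation count 2/3*N^3 + 3/2*N^2 that HPL uses when it
# reports Gflop/s; the run time below is derived, not measured.


def hpl_flops(n: int) -> float:
    """Floating-point operations credited for solving a dense n x n system."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Achieved rate in Gflop/s for a size-n run that took `seconds`."""
    return hpl_flops(n) / seconds / 1e9


def runtime_seconds(n: int, rate_gflops: float) -> float:
    """Wall-clock time implied by running size n at rate_gflops."""
    return hpl_flops(n) / (rate_gflops * 1e9)


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of the theoretical peak delivered (same units for both)."""
    return rmax / rpeak


if __name__ == "__main__":
    # Roadrunner figures from the June 2009 list (Gflop/s and Nmax).
    rmax, rpeak, nmax = 1_105_000.0, 1_456_700.0, 2_329_599
    print(f"Rmax/Rpeak = {efficiency(rmax, rpeak):.1%}")
    hours = runtime_seconds(nmax, rmax) / 3600
    print(f"Implied HPL run time at Nmax: {hours:.1f} h")
```

With these figures the sketch gives an efficiency of roughly 76% and an implied run of about two hours. Estimating N1/2 would require rates measured at several problem sizes, which the deck does not provide.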
Linpack benchmark
Pros:
- One number: Rmax
- Simple to define and easy to use for ranking
- Allows the problem size to change with the machine and over time
Cons:
- Emphasizes only peak speed and number of CPUs
- Does not stress the networks
- Ignores Amdahl's Law (the problem size grows as more CPUs are used, so the serial fraction shrinks)
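The last Cons item hinges on the difference between fixed-size and growing-size problems. The following Python sketch contrasts Amdahl's fixed-size speedup with the scaled speedup usually attributed to Gustafson, which this deck does not name. It uses a hypothetical parallel fraction p = 0.99.

```python
# Sketch: fixed-size (Amdahl) vs. scaled-size speedup on n processors.
# p = parallel fraction of the work. For the fixed-size case p is measured on
# one processor; for the scaled case (Gustafson's law) p is the parallel
# fraction of the time on the n-processor run. p = 0.99 is a hypothetical value.


def amdahl_speedup(p: float, n: int) -> float:
    """Problem size fixed: speedup is bounded by 1 / (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)


def scaled_speedup(p: float, n: int) -> float:
    """Problem size grows with n: speedup grows almost linearly."""
    return (1.0 - p) + p * n


if __name__ == "__main__":
    p = 0.99
    for n in (10, 100, 1_000, 10_000):
        print(f"n={n:>6}: fixed size {amdahl_speedup(p, n):7.1f}x, "
              f"scaled size {scaled_speedup(p, n):8.1f}x")
```

With p = 0.99 the fixed-size speedup can never exceed 100, while the scaled speedup keeps growing. Linpack, whose N grows with the machine, behaves like the second case.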
ASCI: Advanced Simulation and Computing Program
Research funded by the US government, mainly to simulate weapons systems
Top 10 remarks (June 2007)
- A lot of shuffling among the top-ranked systems.
- No. 1: BlueGene/L (developed by IBM and DOE's National Nuclear Security Administration (NNSA)) reached a Linpack performance of 280.6 TFlop/s (1 TFlop/s = 10^12 flop/s).
- Two other systems exceeded 100 TFlop/s: the upgraded Cray XT4/XT3 at DOE's Oak Ridge National Laboratory, ranked No. 2 at 101.7 TFlop/s, and Sandia National Laboratory's Cray Red Storm system, ranked No. 3 at 101.4 TFlop/s.
- Two new IBM BlueGene/L systems entered the Top 10 (in New York and in Troy); they are the largest supercomputing installations in an academic setting.
- The fastest supercomputer in Europe is an IBM JS21 cluster at the Barcelona Supercomputing Center in Spain, ranked No. 9 at 62.63 TFlop/s.
- The highest-ranked Japanese system, at No. 14, is located at the Tokyo Institute of Technology. It is a cluster integrated by NEC, based on Sun Fire x4600 nodes with Opteron processors, ClearSpeed accelerators and an InfiniBand interconnect.
Top 10 of Top 500 (June 2009)
Rmax and Rpeak values are in Gflop/s; Nmax is the problem size; power is in kW for the entire system.

Rank | Computer | Cores | Rmax | Rpeak | Nmax | Power (kW) | Processor
1 | BladeCenter QS22/LS21 Cluster | 129600 | 1105000 | 1456700 | 2329599 | 2483.47 | PowerXCell 8i
2 | Cray XT5 QC 2.3 GHz | 150152 | 1059000 | 1381400 | 4712799 | 6950.6 | AMD x86_64 Opteron Quad Core
3 | Blue Gene/P Solution | 294912 | 825500 | 1002700 | 0 | 2268 | PowerPC 450
4 | SGI Altix ICE 8200EX, Xeon QC 3.0/2.66 GHz | 51200 | 487005 | 608829 | 2300760 | 2090 | Intel EM64T Xeon E54xx (Harpertown)
5 | eserver Blue Gene Solution | 212992 | 478200 | 596378 | 2456063 | 2329.6 | PowerPC 440
6 | Cray XT5 QC 2.3 GHz | 66000 | 463300 | 607200 | 0 | 0 | AMD x86_64 Opteron Quad Core
7 | Blue Gene/P Solution | 163840 | 458611 | 557056 | 0 | 1260 | PowerPC 450
8 | SunBlade x6420, Opteron QC 2.3 GHz, Infiniband | 62976 | 433200 | 579379 | 0 | 2000 | AMD x86_64 Opteron Quad Core
9 | Blue Gene/P Solution | 147456 | 415700 | 501350 | 2958335 | 1134 | PowerPC 450
10 | Sun Constellation, NovaScale R422-E2 | 26304 | 274800 | 308283 | 0 | 1549 | Intel EM64T Xeon X55xx (Nehalem-EP)

No. 46: CINECA (Italy), IBM Power 575, p6 4.7 GHz, Infiniband, year 2009: Cores = 5376, Rmax = 78680, Rpeak = 101069, Nmax = 907199, Power = 859 kW
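The table makes the remark on Roadrunner's energy efficiency checkable. The following minimal Python sketch computes Rmax/Rpeak and Mflops/W for the ten rows. Gflop/s per kW equals Mflop/s per W, since both conversions are factors of 1000. Treating a power value of 0 as "not reported" is an assumption, and the system names are shortened.

```python
from typing import List, Optional, Tuple

# (rank, computer, Rmax [Gflop/s], Rpeak [Gflop/s], power [kW]) from the table.
TOP10_JUNE_2009 = [
    (1, "BladeCenter QS22/LS21 Cluster", 1105000, 1456700, 2483.47),
    (2, "Cray XT5 QC 2.3 GHz", 1059000, 1381400, 6950.6),
    (3, "Blue Gene/P Solution", 825500, 1002700, 2268),
    (4, "SGI Altix ICE 8200EX", 487005, 608829, 2090),
    (5, "eserver Blue Gene Solution", 478200, 596378, 2329.6),
    (6, "Cray XT5 QC 2.3 GHz", 463300, 607200, 0),
    (7, "Blue Gene/P Solution", 458611, 557056, 1260),
    (8, "SunBlade x6420", 433200, 579379, 2000),
    (9, "Blue Gene/P Solution", 415700, 501350, 1134),
    (10, "Sun Constellation, NovaScale R422-E2", 274800, 308283, 1549),
]


def mflops_per_watt(rmax_gflops: float, power_kw: float) -> Optional[float]:
    """Energy efficiency; Gflop/s per kW equals Mflop/s per W."""
    if power_kw <= 0:  # assumption: 0 means "power not reported"
        return None
    return rmax_gflops / power_kw


def report(rows) -> List[Tuple[int, str, float, Optional[float]]]:
    """(rank, name, Rmax/Rpeak, Mflops/W or None) for each row."""
    return [(rank, name, rmax / rpeak, mflops_per_watt(rmax, power))
            for rank, name, rmax, rpeak, power in rows]


if __name__ == "__main__":
    rows = report(TOP10_JUNE_2009)
    for rank, name, eff, mfw in rows:
        mfw_text = "n/a" if mfw is None else f"{mfw:6.1f}"
        print(f"{rank:>2} {name:<38} Rmax/Rpeak={eff:5.1%}  Mflops/W={mfw_text}")
    best = max((r for r in rows if r[3] is not None), key=lambda r: r[3])
    print("Most energy-efficient in this Top 10:", best[1])
```

Among the rows with a reported power figure, the QS22/LS21 cluster (Roadrunner) comes out on top at about 445 Mflops/W.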
Top 10 remarks (June 2009)
- HPC entered a new realm: 1 petaflop/s, i.e. one quadrillion (10^15) floating-point operations per second.
- The new No. 1 system, Roadrunner, built by IBM for the U.S. Department of Energy's Los Alamos National Laboratory, reached 1.105 petaflop/s and is one of the most energy-efficient systems on the TOP500.
- Roadrunner is based on IBM QS22 blades, built with advanced versions of the processor in the Sony PlayStation 3.
- Blue Gene/P, with a performance of 825.5 teraflop/s, is now ranked No. 3 and is located in Germany.
- All the top-10 positions are held by U.S. systems, except the 2nd and the 10th.
- Intel processors power an increasing share of TOP500 supercomputers: 75% of the systems.
- TOP500 now also provides energy-efficiency calculations.
- Positions 14, 15, 16, and 18 are held by machines in Saudi Arabia, China, Canada, and India, respectively.
Roadrunner
Custom configuration: a specialized tri-blade combined configuration, made of
- two IBM QS22 blade servers (Cell)
- one IBM LS21 blade server (AMD Opteron)
A total of 3,060 tri-blades were built in IBM's Rochester, Minn. plant. Each tri-blade unit can run at 400 billion operations per second (400 Gflop/s).
Standard processing (e.g., file system I/O) is handled by the Opteron processors; mathematically and CPU-intensive elements are directed to the Cell processors.
The system runs open-source Linux software from Red Hat.
IBM is developing new software (targeting commercial applications) to make Cell-powered hybrid computing broadly accessible to industries such as financial services (cause and effect in capital markets in real time), energy exploration and medical imaging (real-time 3D rendering of tissue and bones), among others.
Roadrunner
DOE's (Department of Energy) National Nuclear Security Administration selected Los Alamos National Laboratory as the development site for Roadrunner, and IBM as the computer's designer and builder.
- Roadrunner will primarily be used to ensure the safety and reliability of the nation's nuclear weapons stockpile. It will also be used for research into astronomy, energy, human genome science and climate change.
- Roadrunner is the world's first hybrid supercomputer. In a first-of-a-kind design, the Cell Broadband Engine, originally designed for video game platforms such as the Sony PlayStation 3, will work in conjunction with x86 processors from AMD.
- Roadrunner connects 6,562 dual-core AMD Opteron chips and 12,240 Cell chips (on IBM Model QS22 blade servers).
- Roadrunner has 98 terabytes of memory and is housed in 278 refrigerator-sized IBM BladeCenter racks occupying 5,200 square feet (440 m²).
- Interconnect: InfiniBand and Gigabit Ethernet, with 55 miles of fiber-optic cable.
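The two Roadrunner slides give the tri-blade count and the total chip counts separately. The following Python sketch cross-checks them. It assumes a QS22 blade carries two PowerXCell 8i chips and an LS21 blade carries two dual-core Opteron chips; the slides do not state either figure.

```python
# Sketch: cross-check the Roadrunner counts quoted on the two Roadrunner slides.
# Assumptions (not on the slides): a QS22 blade carries two PowerXCell 8i chips,
# and an LS21 blade carries two dual-core Opteron chips.

TRI_BLADES = 3_060
QS22_PER_TRI_BLADE = 2
LS21_PER_TRI_BLADE = 1
CELL_CHIPS_PER_QS22 = 2       # assumption
OPTERON_CHIPS_PER_LS21 = 2    # assumption
GFLOPS_PER_TRI_BLADE = 400
OPTERON_CHIPS_TOTAL = 6_562   # figure quoted on the slide

cell_chips = TRI_BLADES * QS22_PER_TRI_BLADE * CELL_CHIPS_PER_QS22
compute_opteron_chips = TRI_BLADES * LS21_PER_TRI_BLADE * OPTERON_CHIPS_PER_LS21
aggregate_pflops = TRI_BLADES * GFLOPS_PER_TRI_BLADE / 1e6  # Gflop/s -> Pflop/s

if __name__ == "__main__":
    print("Cell chips:", cell_chips)
    print("Opteron chips in tri-blades:", compute_opteron_chips)
    print("Opteron chips outside tri-blades:", OPTERON_CHIPS_TOTAL - compute_opteron_chips)
    print(f"Tri-blade aggregate: {aggregate_pflops:.3f} Pflop/s")
```

Under these assumptions, the tri-blades account exactly for the 12,240 Cell chips. They account for 6,120 of the 6,562 Opteron chips; the slides do not say where the remaining ones sit.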
Top 10 of Top 500 (June 2010)
Rmax and Rpeak values are in TFlop/s. Power data in kW for the entire system.
Top 10 remarks (June 2010)
- The Chinese Nebulae system, with Intel Xeon X5650 (6-core) processors and NVIDIA Tesla C2050 GPUs, is now the fastest in theoretical peak performance at 2.98 PFlop/s, and No. 2 with a Linpack performance of 1.271 PFlop/s. This is the highest rank a Chinese system has ever achieved. There are now 2 Chinese systems in the TOP10 and 24 in the TOP500 overall.
- The Jaguar system at Oak Ridge National Laboratory held on to the No. 1 spot with 1.75 PFlop/s Linpack performance, even though its peak performance is lower than that of Nebulae.
- The most powerful system in Europe is an IBM BlueGene/P system at the German Forschungszentrum Juelich (FZJ), which dropped to No. 5.
- Intel dominates the high-end processor market, with 81.6 percent of all systems and over 90 percent of quad-core based systems.
- The Intel Core i7 (Nehalem-EP) processors increased their presence in the list, with 186 systems compared with 95 in the previous list.
- Other notable systems: Tianhe-1 (TH-1) at No. 7, a hybrid design that uses AMD GPUs as accelerators for Intel Xeon processors. Each node consists of two AMD GPUs attached to two Intel Xeon processors.
Top 10 of Top 500 (June 2011)
Rmax and Rpeak values are in TFlop/s. Power data in kW for the entire system.
Top 10 remarks (June 2011)
- A Japanese supercomputer capable of more than 8 petaflop/s is the new No. 1 system in the world, according to the latest edition of the TOP500 list. This puts Japan back in the top spot for the first time since the Earth Simulator was dethroned in November 2004.
- The system, called the K Computer, is at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe. RIKEN is the Institute for Physical and Chemical Research.
- For the first time, all of the top 10 systems achieved petaflop/s performance.
- The K Computer, built by Fujitsu, currently combines 68,544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores, almost twice as many as any other system in the TOP500. It is also more powerful than the next five systems on the list combined.
- The K Computer's name draws upon the Japanese word "Kei" for 10^16 (ten quadrillion), representing the system's performance goal of 10 petaflop/s.
- Unlike the Chinese system it displaced from the No. 1 slot and other recent very large systems, the K Computer does not use graphics processors or other accelerators. It is also one of the most energy-efficient systems on the list.
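Two figures in the June 2011 remarks are simple products. The following Python sketch checks them using only numbers stated in the remarks: 68,544 eight-core CPUs, and "Kei" = 10^16 flop/s as the performance goal.

```python
# Sketch: the K Computer core count and the meaning of "Kei", from the remarks.
K_CPUS = 68_544
CORES_PER_SPARC64_VIIIFX = 8
KEI = 10**16                      # flop/s target encoded in the name

total_cores = K_CPUS * CORES_PER_SPARC64_VIIIFX
goal_pflops = KEI // 10**15       # 1 Pflop/s = 10**15 flop/s

if __name__ == "__main__":
    print(f"{total_cores:,} cores")
    print(goal_pflops, "Pflop/s goal")
```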
China builds petaflop supercomputer without AMD, Intel or Nvidia (Oct 31, 2011)
China has built its first supercomputer using chips designed and manufactured in China instead of relying on AMD, Intel or Nvidia. China's new Sunway Bluelight MPP was installed in the country's National Supercomputer Center in Jinan in September, with estimates putting the cluster somewhere around the petaflop mark. The cluster is made up of 8,700 Shenwei SW1600 processors, which are completely designed and manufactured in China.
China's past success in the HPC arena has been down to good old-fashioned American technology: the chips might have been baked in China, but the design was done by US-based firms. The 8,700 Shenwei SW1600 processors represent the country's first comprehensive effort to design and build a large-scale HPC cluster.
Since the TOP500 list and supercomputers in general are often viewed as objects of national pride, it's no surprise that China wants to produce its own supercomputer. While the Shenwei cluster isn't quite ready to usurp Japan's K Computer, few would bet against China heading the TOP500 list with its own chips before too long.
Top 10 of Top 500 (June 2012)
Rmax and Rpeak values are in TFlop/s; power is in kW for the entire system.

Rank | Site | Computer / Year | Vendor | Cores | Rmax | Rpeak | Power (kW)
1 | DOE/NNSA/LLNL, United States | Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom / 2011 | IBM | 1572864 | 16324.75 | 20132.66 | 7890.0
2 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect / 2011 | Fujitsu | 705024 | 10510.00 | 11280.38 | 12659.9
3 | DOE/SC/Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom / 2012 | IBM | 786432 | 8162.38 | 10066.33 | 3945.0
4 | Leibniz Rechenzentrum, Germany | SuperMUC - idataplex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR / 2012 | IBM | 147456 | 2897.00 | 3185.05 | 3422.7
5 | National Supercomputing Center in Tianjin, China | Tianhe-1A - NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA 2050 / 2010 | NUDT | 186368 | 2566.00 | 4701.00 | 4040.0
6 | DOE/SC/Oak Ridge National Laboratory, United States | Jaguar - Cray XK6, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA 2090 / 2009 | Cray Inc. | 298592 | 1941.00 | 2627.61 | 5142.0
7 | CINECA, Italy | Fermi - BlueGene/Q, Power BQC 16C 1.60GHz, Custom / 2012 | IBM | 163840 | 1725.49 | 2097.15 | 821.9
8 | Forschungszentrum Juelich (FZJ), Germany | JuQUEEN - BlueGene/Q, Power BQC 16C 1.60GHz, Custom / 2012 | IBM | 131072 | 1380.39 | 1677.72 | 657.5
9 | CEA/TGCC-GENCI, France | Curie thin nodes - Bullx B510, Xeon E5-2680 8C 2.700GHz, Infiniband QDR / 2012 | Bull | 77184 | 1359.00 | 1667.17 | 2251.0
10 | National Supercomputing Centre in Shenzhen (NSCS), China | Nebulae - Dawning TC3600 Blade System, Xeon X5650 6C 2.66GHz, Infiniband QDR, NVIDIA 2050 / 2010 | Dawning | 120640 | 1271.00 | 2984.30 | 2580.0
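The June 2012 table lends itself to the same two derived metrics. The following Python sketch computes Linpack efficiency (Rmax/Rpeak) and Gflops/W from the table values. The flag marking systems whose description lists an NVIDIA accelerator is read off the Computer column; the short system names are abbreviations.

```python
# Sketch: Linpack efficiency and energy efficiency for the June 2012 Top 10.
# Units as in the table: Rmax/Rpeak in TFlop/s, power in kW. The last field
# marks rows whose computer description lists an NVIDIA accelerator.

TOP10_JUNE_2012 = [
    ("Sequoia", 16324.75, 20132.66, 7890.0, False),
    ("K computer", 10510.00, 11280.38, 12659.9, False),
    ("Mira", 8162.38, 10066.33, 3945.0, False),
    ("SuperMUC", 2897.00, 3185.05, 3422.7, False),
    ("Tianhe-1A", 2566.00, 4701.00, 4040.0, True),
    ("Jaguar", 1941.00, 2627.61, 5142.0, True),
    ("Fermi", 1725.49, 2097.15, 821.9, False),
    ("JuQUEEN", 1380.39, 1677.72, 657.5, False),
    ("Curie thin nodes", 1359.00, 1667.17, 2251.0, False),
    ("Nebulae", 1271.00, 2984.30, 2580.0, True),
]


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """TFlop/s per kW equals Gflop/s per W (both conversions are factors of 1000)."""
    return rmax_tflops / power_kw


def linpack_efficiency() -> dict:
    """Rmax / Rpeak for each system, keyed by name."""
    return {name: rmax / rpeak for name, rmax, rpeak, _, _ in TOP10_JUNE_2012}


if __name__ == "__main__":
    eff = linpack_efficiency()
    for name, rmax, _, power, gpu in TOP10_JUNE_2012:
        tag = " (NVIDIA)" if gpu else ""
        print(f"{name + tag:<24} Rmax/Rpeak={eff[name]:5.1%}  "
              f"Gflops/W={gflops_per_watt(rmax, power):5.2f}")
```

In this table the K computer has the highest Rmax/Rpeak (about 93%), while the three systems listing NVIDIA accelerators have the three lowest ratios. Sequoia delivers about 2.07 Gflops/W, against about 0.83 for the K computer.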
Top 10 of Top 500 (June 2012)
MANNHEIM, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn. For the first time since November 2009, a United States supercomputer sits atop the TOP500 list of the world's top supercomputers. Named Sequoia, the IBM BlueGene/Q system installed at the Department of Energy's Lawrence Livermore National Laboratory achieved an impressive 16.32 petaflop/s on the Linpack benchmark using 1,572,864 cores. Sequoia is also one of the most energy-efficient systems on the list, which will be released Monday, June 18, at the 2012 International Supercomputing Conference in Hamburg, Germany. This will mark the 39th edition of the list, which is compiled twice each year.
On the latest list, Fujitsu's K Computer installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan, is now the No. 2 system, with 10.51 Pflop/s on the Linpack benchmark using 705,024 SPARC64 processing cores. The K Computer held the No. 1 spot on the previous two lists.
Italy makes its debut in the Top 10 with an IBM BlueGene/Q system installed at CINECA, at No. 7 on the list with 1.72 Pflop/s. In all, four of the top 10 supercomputers are IBM BlueGene/Q systems. France occupies the No. 9 spot with a homegrown Bull supercomputer.
Sequoia BlueGene/Q
System performance by the numbers:
- 16.32 petaflop/s of sustained performance and a theoretical peak performance of 20.1 petaflop/s
- 98,304 compute nodes with 1.6 million cores (1.57 million PowerPC cores) and 1 GB of RAM per core
- 1.6 petabytes of RAM in total
- Parallel design based on IBM's 18-core PowerPC A2 processor
- Interconnect speeds of 40 Gb/s, with a node-to-node hop latency of 2.5 microseconds
- Sequoia uses 7.89 megawatts of power (in comparison, the No. 2 supercomputer, Japan's K Computer, uses 12.66 megawatts according to the June 2012 list)
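The Sequoia figures are related by simple products. The following Python sketch checks them against each other. It assumes that 16 of the 18 A2 cores on each node run applications, with the other two serving the operating system and acting as a spare. The slide gives only the 18-core total.

```python
# Sketch: check the Sequoia numbers against each other.
# Assumption: 16 of the 18 A2 cores per node run applications (the other two
# serve the operating system and act as a spare); 1 GB of RAM per compute core.
NODES = 98_304
COMPUTE_CORES_PER_NODE = 16
GB_PER_CORE = 1
RMAX_PFLOPS, RPEAK_PFLOPS = 16.32, 20.1

cores = NODES * COMPUTE_CORES_PER_NODE
memory_pb = cores * GB_PER_CORE / 1e6      # decimal units: 1 PB = 10**6 GB
efficiency = RMAX_PFLOPS / RPEAK_PFLOPS

if __name__ == "__main__":
    print(f"{cores:,} cores")
    print(f"{memory_pb:.2f} PB of RAM")
    print(f"Rmax/Rpeak = {efficiency:.1%}")
```

Under this assumption, the node count reproduces the 1,572,864 cores quoted for the Linpack run. The memory total comes to about 1.57 PB, which the slide rounds to 1.6 PB.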
Top 10 of Top 500 (June 2013)
Top 10 of Top 500 (June 2013)
Tianhe-2, a supercomputer developed by China's National University of Defense Technology, is the world's new No. 1 system, with a performance of 33.86 petaflop/s on the Linpack benchmark, according to the 41st edition of the twice-yearly TOP500 list of the world's most powerful supercomputers. The list was announced June 17 during the opening session of the 2013 International Supercomputing Conference in Leipzig, Germany.
Tianhe-2 has 16,000 nodes, each with two Intel Xeon Ivy Bridge processors and three Xeon Phi coprocessors, for a combined total of 3,120,000 computing cores. Intel Xeon Phi coprocessors provide up to 61 cores, 244 threads, and 1.2 teraflop/s of performance, and they come in a variety of configurations to address diverse hardware, software, workload, performance, and efficiency requirements.
Titan, a Cray XK7 system installed at the U.S. Department of Energy's (DOE) Oak Ridge National Laboratory and previously the No. 1 system, is now ranked No. 2. Titan achieved 17.59 petaflop/s on the Linpack benchmark using 261,632 of its NVIDIA K20x accelerator cores. Titan is one of the most energy-efficient systems on the list, consuming a total of 8.21 MW and delivering 2,143 Mflops/W.
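Two derived figures in the June 2013 paragraph can be reproduced. The following Python sketch does so. It assumes 12-core Ivy Bridge Xeons and the 57-core Xeon Phi variant in each Tianhe-2 node; the text gives only the 61-core maximum for Xeon Phi.

```python
# Sketch: reproduce the Tianhe-2 core total and Titan's Mflops/W.
# Assumptions: 12-core Ivy Bridge Xeons and the 57-core Xeon Phi variant in each
# Tianhe-2 node (the text only gives the 61-core maximum for Xeon Phi).
NODES = 16_000
XEONS_PER_NODE, CORES_PER_XEON = 2, 12
PHIS_PER_NODE, CORES_PER_PHI = 3, 57

tianhe2_cores = NODES * (XEONS_PER_NODE * CORES_PER_XEON + PHIS_PER_NODE * CORES_PER_PHI)

# Titan energy efficiency: 17.59 Pflop/s over 8.21 MW, expressed in Mflops/W.
TITAN_RMAX_FLOPS = 17.59e15
TITAN_POWER_W = 8.21e6
titan_mflops_per_watt = TITAN_RMAX_FLOPS / 1e6 / TITAN_POWER_W

if __name__ == "__main__":
    print(f"Tianhe-2 cores: {tianhe2_cores:,}")
    print(f"Titan: {titan_mflops_per_watt:,.0f} Mflops/W")
```

With 61-core parts, the same formula would give 3,312,000 cores, which does not match the stated total. The Titan ratio rounds to the 2,143 Mflops/W quoted in the text.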
Top 10 of Top 500 (June 2015)
Top 10 of Top 500 (June 2015)
For the fifth consecutive time, Tianhe-2, a supercomputer developed by China's National University of Defense Technology, has retained its position as the world's No. 1 system, according to the 45th edition of the twice-yearly TOP500 list of the world's most powerful supercomputers. Tianhe-2, which means Milky Way-2, led the list with a performance of 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark.
At No. 2 was Titan, a Cray XK7 system installed at the Department of Energy's (DOE) Oak Ridge National Laboratory. Titan, the top system in the United States and one of the most energy-efficient systems on the list, achieved 17.59 petaflop/s on the Linpack benchmark.
The only new entry in the Top 10 on the latest list is at No. 7: Shaheen II, a Cray XC40 system installed at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. Shaheen II achieved 5.536 petaflop/s on the Linpack benchmark, making it the highest-ranked Middle East system in the 22-year history of the list and the first to crack the Top 10.
[Figure: Performance Development. TOP500 performance on a log scale (100 MF/s to 10 EF/s) versus list date (1995 to 2015), with three series: Sum, #1, #500.]
[Figure: Vendors, system share (pie chart). Vendors shown: HP, IBM, Cray Inc., SGI, Bull, IBM/Lenovo, Fujitsu, Dell, NUDT, MEGWARE, Others.]
[Figure: Countries, system share (pie chart). Countries shown: United States, Japan, China, Germany, France, India, United Kingdom, Korea (South), Russia, Poland, Others.]
High Performance Computing - S. Orlando