ADVANCED COMPUTER ARCHITECTURES


AA 2016/2017
Website:
Prof. Cristina Silvano
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano

Goals of the ACA course
- Provide an overview of the most recent and advanced computer architectures
- Introduce the basic micro-architectural mechanisms found in modern microprocessor architectures
- Explain the reasoning behind the adoption of advanced computer architectures
Cristina Silvano, Politecnico di Milano

ADVANCED COMPUTER ARCHITECTURES: AN OVERVIEW

Advanced Computer Architectures: Supercomputers
The first supercomputer to reach Petascale peak performance (10^15 Flops) was IBM Roadrunner, installed in 2008 at Los Alamos National Laboratory (New Mexico).
Research on supercomputing is pushing towards the Exascale (10^18 Flops, i.e., a billion billion floating-point operations per second), to be reached in …

How to measure performance: FLOPS (Floating Point Operations per Second)
Name          FLOPS
zettaflops    10^21
exaflops      10^18
petaflops     10^15
teraflops     10^12
gigaflops     10^9
megaflops     10^6
kiloflops     10^3
FLOPS         1
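As a small illustration of the FLOPS name table, the sketch below converts a raw operations-per-second figure into the largest decimal prefix that keeps the value at or above 1. The function name format_flops is hypothetical, chosen only for this example.

```python
# Decimal (SI) prefixes from the FLOPS table, largest first.
FLOPS_PREFIXES = [
    ("zettaflops", 1e21),
    ("exaflops", 1e18),
    ("petaflops", 1e15),
    ("teraflops", 1e12),
    ("gigaflops", 1e9),
    ("megaflops", 1e6),
    ("kiloflops", 1e3),
]


def format_flops(flops: float) -> str:
    """Hypothetical helper: express a FLOPS value with the largest prefix that keeps it >= 1."""
    for name, scale in FLOPS_PREFIXES:
        if flops >= scale:
            return f"{flops / scale:.2f} {name}"
    return f"{flops:.2f} FLOPS"


print(format_flops(93.01e15))  # Sunway TaihuLight Linpack result
print(format_flops(1e18))      # the Exascale threshold
```

For example, 93.01e15 is reported as "93.01 petaflops" and 1e18 as "1.00 exaflops".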

Top500 ranking of the world's most powerful supercomputers (Nov. 2016)
- No. 1 Sunway TaihuLight reaches 93.01 PetaFlops (Linpack performance) and 125.4 PetaFlops peak performance with 15.37 MW power dissipation. Site: National Supercomputing Center in Wuxi (China)
- No. 2 Tianhe-2 (Milky-Way-2) reaches 33.86 PetaFlops (Linpack performance) and 54.9 PetaFlops peak performance with 17.8 MW power dissipation. Site: National Super Computer Center in Guangzhou (China)
- No. 3 Titan reaches 17.59 PetaFlops (Linpack performance) and 27.1 PetaFlops peak performance with 8.2 MW power dissipation. Site: Oak Ridge National Laboratory (USA)

Top500 ranking: the most powerful Italian supercomputer (Nov. 2016)
No. 12 in the Top500 and No. 3 in Europe: Marconi Intel Xeon Phi reaches 6.22 PetaFlops (Linpack performance) and 10.8 PetaFlops (peak performance) with 241,808 cores. Site: Casalecchio di Reno, Bologna (Italy)
Marconi is Cineca's Tier-0 system, co-designed by Cineca and Lenovo and based on the Lenovo NeXtScale platform, the Intel Xeon Phi product family and the Intel Xeon processor E5 v4 product family. By July 2017, the system is planned to reach a total computational power of about 20 PFlop/s using next-generation Intel Xeon processors (Skylake).

TITAN (previously No. 2 in the Top500; No. 3 in Nov. 2016): Cray XK7, Opteron 2.2 GHz, NVIDIA K20X

Exascale Supercomputers
To build the 20 MW Exascale supercomputers projected for 2023, current supercomputers must improve their energy efficiency towards a goal of 50 GigaFlops/W.
- The No. 1 system, Sunway TaihuLight, delivers 6 GigaFlops/W, ranking only 4th in the Green500 list, which ranks supercomputers by their energy efficiency.
- Today the greenest supercomputer in the Green500 achieves 9.4 GigaFlops/W: NVIDIA DGX-1, with Xeon E5-2698 v4 and NVIDIA Tesla P100.
- The top positions of the Green500 are currently occupied by heterogeneous computing systems. This dominance is expected to continue over the coming years, in pursuit of the 20 MW Exascale target.
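The 50 GigaFlops/W goal follows directly from dividing the Exascale performance target by the 20 MW power envelope. The sketch below reproduces that arithmetic and estimates how large an efficiency improvement the two systems quoted on the slide would still need; the helper name gflops_per_watt is hypothetical.

```python
def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in GigaFlops per watt (hypothetical helper)."""
    return flops / watts / 1e9


# Exascale target from the slide: 10^18 Flops within a 20 MW power envelope.
exascale_target = gflops_per_watt(1e18, 20e6)
print(f"Exascale target: {exascale_target:.1f} GFlops/W")

# Efficiencies quoted on the slide (GFlops/W) and the improvement still needed.
for name, eff in [("Sunway TaihuLight", 6.0), ("NVIDIA DGX-1 (Green500 No. 1)", 9.4)]:
    print(f"{name}: {exascale_target / eff:.1f}x improvement needed")
```

The target works out to 50 GFlops/W, roughly 8.3x beyond Sunway TaihuLight and 5.3x beyond the greenest system listed.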

US Dept. of Energy announced the Summit and Sierra supercomputers

Applications driving the demand for more computing performance
- Astrophysics
- Climate
- Biology
- Business Analytics

Advanced Computer Architectures: Intel Core i7-3770T Processor
# of Cores: 4
# of Threads: 8
Clock Speed: 2.5 GHz
Max Turbo Frequency: 3.7 GHz
Intel Smart Cache: 8 MB
Instruction Set: 64-bit
Instruction Set Extensions: SSE4.1/4.2, AVX
Embedded Options Available: No
Die Size: 160 mm²
Lithography: 22 nm
Transistors: 1.40 billion
Max TDP: 45 W
Recommended Customer Price (TRAY): $
Max Memory Size: 32 GB
Memory Types: DDR3-1333/1600
# of Memory Channels: 2
Max Memory Bandwidth: 25.6 GB/s
Next generations: Broadwell, Skylake and Kaby Lake at 14 nm (from 2014); Cannonlake at 10 nm (2H 2017); Ice Lake at 10 nm (2018)
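The 25.6 GB/s maximum memory bandwidth can be checked from the other memory entries: peak bandwidth = transfer rate × bytes per transfer × number of channels. This sketch assumes the standard 64-bit (8-byte) DDR3 channel width, which is not stated in the specification list; the function name is hypothetical.

```python
def peak_dram_bandwidth_gbs(transfers_per_s: float, bus_width_bits: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes); hypothetical helper."""
    bytes_per_transfer = bus_width_bits // 8  # assumption: 64-bit channel -> 8 bytes/transfer
    return transfers_per_s * bytes_per_transfer * channels / 1e9


# DDR3-1600: 1600 million transfers/s per channel, two channels.
print(peak_dram_bandwidth_gbs(1600e6, 64, 2))
# DDR3-1333 on the same two channels, for comparison.
print(peak_dram_bandwidth_gbs(1333e6, 64, 2))
```

With DDR3-1600 this gives 25.6 GB/s, matching the listed maximum; DDR3-1333 would give about 21.3 GB/s.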

NVIDIA Fermi GPU

NVIDIA Kepler GPU: Kepler GK110 Architecture
- 7.1B transistors
- 15 SMX units (2880 cores)
- >1 TFLOP FP64
- 1.5 MB L2 cache
- 384-bit GDDR5
- PCI Express Gen3
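A figure such as ">1 TFLOP FP64" is a theoretical peak: number of functional units × FLOPs each unit completes per cycle × clock frequency. The sketch below applies that formula to GK110. The per-SMX counts (192 single-precision cores, consistent with 15 × 192 = 2880, and 64 double-precision units) are GK110 figures not listed on the slide, the 745 MHz clock is an assumption (the Tesla K40 base clock), and a fused multiply-add is counted as 2 FLOPs.

```python
def peak_flops(units: int, flops_per_unit_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak = functional units x FLOPs per unit per cycle x clock rate."""
    return units * flops_per_unit_per_cycle * clock_hz


SMX_UNITS = 15
CORES_PER_SMX = 192       # single-precision CUDA cores per SMX (GK110)
FP64_UNITS_PER_SMX = 64   # double-precision units per SMX (GK110)
CLOCK_HZ = 745e6          # assumption: Tesla K40 base clock

print(SMX_UNITS * CORES_PER_SMX, "CUDA cores")
fp64 = peak_flops(SMX_UNITS * FP64_UNITS_PER_SMX, 2, CLOCK_HZ)  # FMA counted as 2 FLOPs
print(f"{fp64 / 1e12:.2f} TFLOPS FP64 peak")
```

Under these assumptions the peak comes out at about 1.43 TFLOPS in double precision, consistent with the ">1 TFLOP FP64" figure; sustained performance on real code is lower.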

NVIDIA Tesla P100 with Pascal GP100 GPU

NVIDIA Tesla P100 compared to prior generations

Advanced Computer Architectures: Smart Phones
- iPhone 6s (4.7-inch) and iPhone 6s Plus: 12MP camera, 5MP video camera, Retina HD display with 3D Touch, A9 chip, 64-bit M9 coprocessor, iOS 10, 32GB/128GB
- iPhone 7 (4.7-inch): new 12MP camera, 7MP video camera, Retina HD display with 3D Touch, waterproof, stereo audio, A10 Fusion chip, 64-bit M10 coprocessor, iOS 10, 32GB/128GB/256GB
- iPhone 7 Plus (5.5-inch display): new dual 12MP camera, 7MP video camera, Retina HD display with 3D Touch, waterproof, stereo audio, A10 Fusion chip, 64-bit M10 coprocessor, iOS 10, 32GB/128GB/256GB

Apple A8 System-on-Chip
Apple A8 is a 64-bit ARM-based SoC introduced in September 2014 for the iPhone 6 and iPhone 6 Plus. Apple states that it delivers 25% more CPU performance and 50% more graphics performance, at 50% of the power, compared to its predecessor, the A7. The A8 features the second generation of the Apple-designed 64-bit 1.4 GHz ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated PowerVR Series 6XT GX6450 quad-core GPU. The A8 is manufactured on a 20 nm process by TSMC, which replaced Samsung as the manufacturer of Apple's mobile device processors. It contains 2 billion transistors and includes 1 GB of LPDDR3 RAM in the package. On October 16, 2014, Apple introduced a variant of the A8, the A8X, in the iPad Air 2, with improved graphics and CPU performance thanks to one extra CPU core and a higher clock frequency.

Apple A9 System-on-Chip
Apple A9 is a 64-bit ARM-based SoC introduced in September 2015 for the iPhone 6s and iPhone 6s Plus. Apple states that it delivers 70% more CPU performance and 90% more graphics performance compared to its predecessor, the A8. It is one of the most powerful mobile chips on the market today, along with the Samsung Exynos 8890 and the Qualcomm Snapdragon 820. The A9 features the Apple-designed 64-bit 1.85 GHz ARMv8-A dual-core CPU, called Twister, and an integrated PowerVR Series 7XT GT7600 six-core GPU. The A9 is manufactured by two companies: Samsung, in a 14 nm FinFET process, and TSMC, in a 16 nm FinFET process. The A9 includes 2 GB of LPDDR4 RAM in the package. Apple introduced a variant of the A9, the A9X, in the iPad Pro, with the M9 motion coprocessor embedded in it.

Apple A10 Fusion
Apple A10 Fusion is a 64-bit ARM-based SoC designed by Apple and introduced in September 2016 for the iPhone 7 and iPhone 7 Plus. Apple states that it delivers 40% more CPU performance and 50% more graphics performance compared to its predecessor, the A9. The A10, with a die area of 125 mm² and 3.3 billion transistors (including GPU and cache), features two Apple-designed 64-bit 2.34 GHz ARMv8-A high-performance cores, called Hurricane, and two energy-efficient 64-bit cores, codenamed Zephyr (similar to the ARM big.LITTLE technology). The A10 integrates an updated PowerVR Series 7XT GT7600 six-core GPU. The A10 is manufactured by TSMC in a 16 nm FinFET process.

Energy efficiency underlies all markets
Energy efficiency is of paramount importance for all application markets (automotive, consumer, mobile, healthcare and beyond) and for target systems ranging from sensors, cyber-physical systems and embedded systems up to servers and HPC systems.

Squeezing of computing cores: entering the multi/many-core era
[Figure: area of an ARM9 core shrinking across successive technology nodes, e.g., 1.4 mm² at 65 nm in 2005. Source: STMicroelectronics]

What are the barriers to further scaling?
Transistor density still increases ~2x every 2 years, but designs run into:
- the frequency wall
- the power wall
- the utilisation wall
This marks the end of Dennard scaling: we are entering the dark silicon era.

The dark silicon problem
The power wall and the utilisation wall are the main barriers to efficient scaling in the multi/many-core era.
Dark silicon: the fraction of the die that cannot be powered on at the same time because of the power budget.
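The link between the end of Dennard scaling and dark silicon can be sketched with the classic per-generation scaling argument, using P ~ N · C · Vdd² · f. This is an idealized model under stated assumptions: each generation shrinks features by S ≈ 1.4, so transistor count grows by S² ≈ 2x, capacitance per transistor falls by 1/S, and the clock keeps scaling by S. Under Dennard scaling Vdd² also falls by 1/S², so chip power stays constant; once Vdd stops scaling, power grows by S² per generation, and under a fixed power budget only a shrinking fraction of the die can switch at once. (If the clock is instead frozen by the frequency wall, the same model gives a gentler ~1/S decline.)

```python
S = 1.4  # assumption: linear feature-size scaling factor per generation (~sqrt(2))


def relative_power(dennard: bool) -> float:
    """Full-chip power after one generation, relative to the previous one (P ~ N*C*Vdd^2*f)."""
    transistors = S ** 2                      # ~2x more transistors in the same area
    capacitance = 1 / S                       # per-transistor capacitance shrinks
    frequency = S                             # assumption: clock still scales by S
    vdd_sq = 1 / S ** 2 if dennard else 1.0   # post-Dennard: supply voltage stops scaling
    return transistors * capacitance * vdd_sq * frequency


def active_fraction(generations: int) -> float:
    """Fraction of the die that can switch at full speed under a fixed power budget."""
    return (1 / relative_power(dennard=False)) ** generations


print(f"Dennard: chip power x{relative_power(True):.2f} per generation")
print(f"Post-Dennard: chip power x{relative_power(False):.2f} per generation")
for g in range(5):
    print(f"after {g} generations: {active_fraction(g):.0%} usable, the rest is dark silicon")
```

In this model the usable fraction roughly halves each generation (100%, 51%, 26%, 13%, 7%), which is the utilisation wall in its simplest form.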

ACA COURSE INFORMATION

Contact Information
Office hours for students: Tuesday at Polo di Como, Via Anzani 42, 2nd floor (please send an email to get an appointment).
Main contact: students can contact Prof. Cristina Silvano by email, indicating in the subject: ACA COMO, Your_Surname, Your_Name, Your_POLIMI_ID_NUMBER

ACA Teaching Assistant: Ing. Ahmet Erdem

ACA Course Info
Teaching activity: the course is worth 5 CFU and is organized into 30 hours of lectures and 20 hours of written/tool-based exercises that put into practice the concepts presented during the lectures.
Prerequisites: basic concepts of logic design and computer architecture.

ACA Final Exam
The final examination consists of a WRITTEN EXAM and an OPTIONAL part: either an oral presentation OR the discussion of a project prepared during the course. The topic of the presentation or project will be assigned by the professor, will cover specific techniques and methodologies, and will be presented by the student at the end of the course.
Each written exam is scored out of a maximum of 33 points: up to 15 points for the exercise part and up to 18 points for the theory part.
The OPTIONAL part can provide EXTRA points: 1 to 2 extra points for the oral presentation, or 1 to 4 extra points for the project. The extra points from the project are added to the written exam score only if the written exam alone is passed (score >= 18).
The project/presentation will be assigned at the midterm of the semester and must be completed and presented by June 2017 (firm deadline).

ACA Teaching Material
Additional information, slides and papers are available through the course webpage. If you are using MOZILLA FIREFOX as your web browser, please use the SAVE AS option to save the PDF slides on your laptop before viewing or printing them, so that they are displayed and printed correctly.
Reference book: "Computer Architecture: A Quantitative Approach", John Hennessy and David Patterson, Morgan Kaufmann, Fifth Edition.

Support for international students
- The ACA course is offered in English
- Teaching materials (slides/papers/textbook) are available in English
- The final exam can be taken in English
- Teaching support is available in English
