Fra superdatamaskiner til grafikkprosessorer og

Size: px

Start display at page:

Download "Fra superdatamaskiner til grafikkprosessorer og"

Shannon Norman
5 years ago
Views:

1 Fra superdatamaskiner til grafikkprosessorer og Brødtekst maskinlæring Prof. Anne C. Elster IDI HPC/Lab

Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc Hypercube CMI (Bergen) and Cornell (Cray arrived at NTNU) 1987: Cluster of 4 IBM 3090s 1988-91: Intel

2 Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc Hypercube CMI (Bergen) and Cornell (Cray arrived at NTNU) 1987: Cluster of 4 IBM 3090s : Intel hypercubes Some on BBN : KSR (MPI1 & 2) TIF(Uncompresed)decompresor arenededtosethispicture. QuickTime anda Kendall Square Research (KSR) KSR-1 at Cornell University: processors Total RAM: 1GB!! - Scalable shared memory multiprocessors (SSMMs) - Proprietary 64-bit processors Intel ipsc Notable Attributes: Network latency across the bridge prevented viable scalability beyond 128 processors. 2

3 The World is Parallel!! All major processor are now multicore chips! --> All computer devices and systems are parallel even your Smartphone! WHY IS THIS? 3

4 Why is computing so exciting today? Look at the tech. trends! Microprocessors have become smaller, denser, and more powerful. As of 2016, the commercially available processor with the highest number of transistors is the 24-core Xeon Haswell-EX with > 5.7 billion transistors. (source: WikiPedia) NVIDIA

5 Tech. Trend: Moore s Law Named after Gordon Moore (co-founder of Intel) Moore predicted in 1965 transistor density of semiconductor chips would double roughly every year, revised in 1975 to every 2 years by 1980 Some think is says that it actually doubles every 18 months since use more transistors and each transistor is faster [due to quote by David House (Intel Exec)] "Moore's law" (popularized by Carver Mead, CalTech) is known as the observation and prediction that the number of transistors on a chip has and will be doubled approximately every 2 years. But in 2015: Intel stated that this has slowed starting in 2012 (22nm), so now every 2.5 yrs (14nm (2014), 10nm scheduled in late 2017) 01/17/2007 from CS267-Lecture 1 5

Tech. Trends: Microprocessor Moore s Law 2X transistors/chip Every 1.5 years Called Moore s Law Microprocessors have become smaller, denser, and more powerful.

6 Tech. Trends: Microprocessor Moore s Law 2X transistors/chip Every 1.5 years Called Moore s Law Microprocessors have become smaller, denser, and more powerful. 01/17/2007 from CS267-Lecture 1 Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. Slide source: Jack Dongarra 6

Revolution is Happening Now Chip density is continuing increase ~2x every 2 years Clock speed is not Number of processor cores may double instead There is little or no hidden

7 Revolution is Happening Now Chip density is continuing increase ~2x every 2 years Clock speed is not Number of processor cores may double instead There is little or no hidden parallelism (ILP) to be found Parallelism must be exposed to and managed by software Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond) 01/17/2007 CS267-Lecture 1 7

8 Power Density Limits Serial Performance 01/17/2007 from CS267-Lecture 1 8

9 What to do? To increase processor performance one can: 1. Increase the system clock speed -> Power Wall(*) 2. Increase memory bandwidth-> more complex 3. Parallelize -> more complex (*) The Power Wall: Too much heat and transistor performance degrades (more power leakage as power increases)! Now maxing out clock at 3-4GHz for general processors

10 Supercomputer & HPC Trends: Clusters and Accelerators! How did we get here?

11 Market forces!! Rapid architecture development driven by gaming (graphics cards) and embedded systems architectures (e.g. ARM) 387 CUDA Teaching & Research Centers as of Aug 27, 2015! 11

12 Motivation GPU Computing: Many advances in processor designs are driven by Billion $$ gaming market! Modern GPUs (Graphic Processing Unit) offer lots of FLOPS per watt! NVIDA GTX and lots of parallelism! (Pascal): 3640 CUDA cores! -Kepler: -GTX 690 and Tesla K10 cards -have 3072 (2x1536) cores!

13 TK1/Kepler TX1/Maxwell - GPU: SMX Kepler: 192 core - CPU: ARM Cortex A15-32-bit, 2instr/cycle, in-order - 15GBs, LPDDR3, 28nm process - GTX 690 and Tesla K10 cards have 3072 (2x1536) cores! - Tesla K80 is 2,5x faster than K TF TFLOPs single prec TFLOPS Double prec. - Nested kernel calls - Hyper Q allowing up to 32 simultaneous MPI tasks - GPU: SMX Maxwell: 256 cores - 1 TFLOPs/s - CPU: ARM Cortex-A57-64-bit, 3 instr/cycle, out-of-order GBs, LPDDR4, 20nm process - Maxwell Titan with 3072 cores - API and Libraries: - Open GL CUDA cudnn 4.0

NTNU IDI HPC-Lab (last 10 yrs) Fall 2006: First 2 student projects

2006): Utilizing GPUs on Cluster Computers (joint with Schlumberger)

Elster as head of Computational Science & Visualization program helped

14 NTNU IDI HPC-Lab (last 10 yrs) Fall 2006: First 2 student projects with GPU programming (Cg) Christian Larsen (MS Fall Project, December 2006): Utilizing GPUs on Cluster Computers (joint with Schlumberger) Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare Elster as head of Computational Science & Visualization program helped NTNU acquire new IBM Supercomputer (Njord, 7+ TFLOPS, proprietary switch) 14

15 The NVIDIA DGX-1 Server

16 NVIDIA DGX-1 Server -- Details CPUs : 2 x Intel Xeon E v3 (16-core Haswell) GPUs: 8 x NVIDIA Tesla P100 (3584 CUDA cores) System Memory: 512 GB DDR GPU Memory 128GB (8 x 16GB) Storage: 4 x Samsung PM TB SSD Network: 4 x Infiniband EDR, 2x 10 GigE Power : 3200W Size 3U Blade GPU Throughput: FP16: 170TFLOPs, FP32: 85TFLOPs, FP 64: 42.5 TFLOPs

17 Supercomputing / HPC units are: Flop: floating point operation Flops/s: floating point operations per second Bytes: size of data (a double precision floating point number is 8) Typical sizes are millions, billions, trillions Mega Mflop/s = 10 6 flop/sec Mbyte = 2 20 = ~ 10 6 bytes Giga Gflop/s = 10 9 flop/sec Gbyte = 2 30 ~ 10 9 bytes TeraTflop/s = flop/sec Tbyte = 2 40 ~ bytes PetaPflop/s = flop/sec Pbyte = 2 50 ~ bytes Exa Eflop/s = flop/sec Ebyte = 2 60 ~ bytes Zetta Zflop/s = flop/sec Zbyte = 2 70 ~ bytes Yotta Yflop/s = flop/sec Ybyte = 2 80 ~ bytes See for current list of the world s fastest supercomputers 01/17/2007 from CS267-Lecture 1 17

Introduction to Computational Science (aka Scientific Computing)

Introduction to Computational Science (aka Scientific Computing) (aka Scientific Computing) Xianyi Zeng xzeng@utep.edu Department of Mathematical Sciences The University of Texas at El Paso. August 23, 2016. Acknowledgement Dr. Shirley Moore for setting up a high standard