Enabling the A.I. Era: From Materials to Systems


Sundeep Bajikar, Head of Market Intelligence, Applied Materials
New Street Research Conference, May 30, 2018
External Use

Key Message
- Part 1: A.I.* requires new computing architectures
- Part 2: New A.I. architectures require materials breakthroughs
* A.I. refers to Machine Learning and Deep Learning approaches.

Key Message, Part 1: A.I. Requires New Computing Architectures
- What is different about A.I. computing?
- Empirical analysis: A.I. / IoT is unaffordable with the current computing model

What's different about A.I. computing?
- Memory bound: performance is limited by memory latency and memory bandwidth
- High data parallelism: identical operations can be done on multiple chunks of data in parallel
- Low precision: training and inference require much lower floating-point (16-bit) and integer (8-bit) precision
Relative to traditional CPU architectures:
- The traditional multi-level on-die cache architecture is unnecessary
- The traditional deep instruction pipeline and out-of-order execution are unnecessary
- Traditional 64-bit or higher precision data paths are unnecessary
Deep Learning Enabled by Processing Abundant Data
Source: Computer Architecture: A Quantitative Approach, Sixth Edition, John Hennessy and David Patterson, December 2017; Applied Materials internal analysis.
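The "memory bound" point can be made concrete with a roofline-style estimate. The slide does not name this model; it is used here only as an illustration. The sketch below is minimal, and the machine figures (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) and the function names are hypothetical values chosen for the example, not data from the presentation.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


def is_memory_bound(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """True when memory traffic, not arithmetic, caps performance."""
    return mem_bandwidth_gbs * flops_per_byte < peak_gflops


# Hypothetical accelerator, assumed for illustration only.
PEAK_GFLOPS = 10_000.0   # peak arithmetic throughput
BANDWIDTH_GBS = 500.0    # sustained memory bandwidth in GB/s

# Arithmetic intensity = operations performed per byte moved from memory.
for intensity in (0.25, 2.0, 20.0, 200.0):
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    label = "memory bound" if is_memory_bound(
        PEAK_GFLOPS, BANDWIDTH_GBS, intensity) else "compute bound"
    print(f"{intensity:7.2f} flops/byte -> {bound:9.1f} GFLOP/s ({label})")
```

Under these assumed numbers, any workload doing fewer than 20 operations per byte is capped by memory bandwidth. Raising bandwidth or shrinking each operand (lower precision) moves that threshold.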

ML / DL: Data and Algorithms
Relationship between complexity of algorithms versus abundance of (training) data + affordable high-performance computing
- AI is about relentless classification
- Data quality and size are making AI a reality
Source: Applied Materials internal analysis.

ML / DL Case Study: Google Translate, Circa 2010
- Complex algorithms based on grammar and semantic rules
- Developed by a team of linguists
- 70% accuracy
Source: Applied Materials internal analysis.

ML / DL Case Study: Google Translate, Today
- Simple algorithm
- Trained using every single web page available in English and Chinese
- No Chinese speakers on team
- 98% accuracy
Source: Applied Materials internal analysis.

From Homogeneous to Heterogeneous Computing
[Diagram: Homogeneous / Discrete Compute Era (separate CPU, GPU, and ASIC / FPGA devices) versus Heterogeneous Compute Era (CPU cores 0 to 4, GPU, media, ISP, audio, and ASIC / FPGA engines combined)]
Workloads shown across the CPU, GPU, and ASIC engines:
- Normal compiled / managed code (Office, OS, Enterprise)
- Imaging, video playback
- AI workloads: training, inference, analytics, etc.
- Client CC, search, audio, VOIP
- Gaming, HPC, highly parallel workloads
Heterogeneous computing:
- Exploits data-level parallelism
- Uses the most power-efficient compute engine for a given job
Source: Applied Materials internal analysis.

In Current Compute Model
[Charts: DRAM shipments (EB) and NAND shipments (EB), each plotted against incremental data generation (EB)]
- DRAM fit: y = 0.0081 x^{1.3065} (R² = 0.9563)
- NAND fit: y = 0.0022 x^{2.0399} (R² = 0.9569)
Source: Cisco VNI, Cisco, Gartner, Factset, Applied Materials internal analysis. 2006 to 2016 data from Cisco and Gartner. 2017 to 2020 projections from Cisco for data generation (VNI IP traffic), and industry average estimates for DRAM and NAND content shipments.

A.I. / Big Data Era Needs New Compute Models
[Charts: DRAM shipments (EB) and NAND shipments (EB), each plotted against incremental data generation (EB), extrapolated to 1% adoption of L4/L5 smart cars]
- DRAM fit: y = 0.0081 x^{1.3065} (R² = 0.9563). At 1% smart car adoption: 5X more data, 8X more DRAM in 2020
- NAND fit: y = 0.0022 x^{2.0399} (R² = 0.9569). At 1% smart car adoption: 5X more data, 25X more NAND in 2020
Source: Cisco VNI, Cisco, Gartner, Factset, Applied Materials internal analysis. 2006 to 2016 data from Cisco and Gartner. 2017 to 2020 projections from Cisco for data generation (VNI IP traffic), and industry average estimates for DRAM and NAND content shipments.
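The "5X more data" multiples follow from the power-law fits on the charts. For a fit of the form y = a x^b, scaling x by a factor k scales y by k^b, whatever the starting point. The sketch below checks this arithmetic using the coefficients printed on the slide; the function and constant names are made up for the example.

```python
# (a, b) coefficients of y = a * x**b, copied from the chart fits.
DRAM_FIT = (0.0081, 1.3065)
NAND_FIT = (0.0022, 2.0399)


def shipments_eb(fit, data_eb):
    """Memory shipments (EB) predicted by the fit for a given data volume (EB)."""
    a, b = fit
    return a * data_eb ** b


def growth_multiple(fit, data_multiple):
    """Factor by which shipments grow when data generation grows by data_multiple."""
    _, b = fit
    return data_multiple ** b


if __name__ == "__main__":
    for name, fit in (("DRAM", DRAM_FIT), ("NAND", NAND_FIT)):
        print(f"{name}: 5x data -> {growth_multiple(fit, 5):.1f}x shipments")
```

With the slide's exponents this gives about 8.2x for DRAM and about 26.7x for NAND. That matches the "8X" label and is close to the "25X" label. The exact inputs behind the slide's rounded figures are not given.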

Key Message, Part 2: New A.I. Architectures Require Materials Breakthroughs
- From classic 2D scaling to architecture innovation
- From traditional materials to materials breakthroughs

Classic 2D Scaling to Architecture Innovation
ERA          ORIGINAL PERFORMANCE   PERFORMANCE GAIN   HOW
PC, Mobile   1x                     ~2x                Classic 2D scaling (when Moore's Law scaling was at its peak; assumes no major architecture changes)
A.I. / IoT   1x                     ~2x to 2,500x      Architecture innovation (small benefit from Moore's Law scaling)
Source: Applied Materials internal analysis. Performance broadly refers to speed or power efficiency. ~2x refers to doubling of performance every two years, consistent with Moore's Law.
ISS Blog post #1: Nodes, Inter-nodes, Leading-Nodes and Trailing-Nodes

Processor Performance Improvements Have Slowed
[Chart: performance (vs. VAX-11/780) on a log scale from 1 to 100,000, plotted by year from 1978 to 2018]
- Growth rates marked along the curve, in order: 25% per year, 52% per year, 23% per year, 12% per year, 3.5% per year
- Annotations: End of Dennard Scaling; Limits of parallelism of Amdahl's Law; End of Moore's Law
Chart plots performance relative to the VAX-11/780 as measured by the SPEC integer benchmarks. Chart methodology and annotations are based on the Dec 2017 edition of the Computer Architecture textbook.
Source: Computer Architecture: A Quantitative Approach, Sixth Edition, John Hennessy and David Patterson, December 2017; Applied Materials internal analysis. Chart is for illustrative purposes only, not to scale.
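The annual growth rates on this chart are easier to compare as doubling times. At a compound rate r per year, performance doubles every ln(2) / ln(1 + r) years. The sketch below applies that formula to the five rates shown; the function names are illustrative.

```python
import math

# Annual improvement rates as marked on the chart, in order.
RATES = (0.25, 0.52, 0.23, 0.12, 0.035)


def doubling_time_years(annual_growth):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def gain_over(annual_growth, years):
    """Cumulative performance multiple after the given number of years."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    for r in RATES:
        print(f"{r * 100:5.1f}% per year -> doubles every "
              f"{doubling_time_years(r):.1f} years")
```

At 52% per year performance doubles in well under two years; at 3.5% per year it takes roughly twenty. For reference, doubling every two years corresponds to about 41% per year.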

Guidelines for A.I. Architecture Innovation
Increase Memory Bandwidth
- Use dedicated memories to minimize the distance over which data is moved
- Invest resources saved from dropping advanced microarchitectural optimizations into more arithmetic units or bigger memories
- Reduce data size and type to the simplest needed for the domain
Increase Data Parallelism
- Use the easiest form of parallelism that matches the domain
- Use a domain-specific programming language to port code to the domain-specific architecture (DSA)
Reduce Precision
- Training and inference require much lower floating-point (16-bit) and integer (8-bit) precision
Software Workloads will Redefine Hardware Architecture
Source: Computer Architecture: A Quantitative Approach, Sixth Edition, John Hennessy and David Patterson, December 2017; Applied Materials internal analysis.
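As an illustration of the "Reduce Precision" guideline, the sketch below maps floating-point values to 8-bit integers with a single shared scale factor (symmetric linear quantization). The slide does not specify a quantization scheme. This is one common approach, assumed here for the example, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp after rounding so every result fits in a signed 8-bit integer.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.031]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print("int8 values:", q)
    print(f"worst-case error: {worst:.5f} (at most half a step of {scale:.5f})")
```

Each value then occupies 1 byte instead of the 8 bytes of a 64-bit float, so the same memory bandwidth moves eight times as many operands. The cost is a rounding error of at most half a quantization step.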

A.I. Enabled by Materials Breakthroughs
ERA          ORIGINAL PERFORMANCE   PERFORMANCE GAIN   MATERIALS ENGINEERING
PC, Mobile   1x                     ~2x                Unit materials
A.I. / IoT   1x                     ~2x to 2,500x      Materials breakthroughs (new devices, 3D techniques, advanced packaging)
Source: Applied Materials internal analysis. Performance broadly refers to speed or power efficiency. ~2x refers to doubling of performance every two years, consistent with Moore's Law.
ISS Blog post #2: Nodes, Inter-nodes, Leading-Nodes and Trailing-Nodes.

Architecture Innovation
- High memory bandwidth: performance is limited by memory latency and memory bandwidth
- High data parallelism: identical operations can be done on multiple chunks of data in parallel
- Low precision: training and inference require much lower floating-point (16-bit) and integer (8-bit) precision
enabled by Materials Engineering
- New devices: on-die memory, e.g. MRAM; new memory, e.g. Intel 3D XPoint technology
- 3D techniques: self-aligned structures, e.g. vias, multi-color patterning; side-to-side to top-to-bottom, e.g. COAG
- Advanced packaging: stacked DRAM, e.g. HBM / HBM2; Logic-DRAM; Sensor-Logic
Intel and 3D XPoint are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

Example: STT-MRAM
[Diagram: MTJ stack with layers labeled capping, MgO cap, free layer, tunnel barrier, reference layer, SAF, pinned layer, seed layer]
- More than 15 individual layers of film in the MTJ stack
- Each layer 0.2 to 2 nm thin
- Requires new etching techniques
- Requires highly complex process equipment
- Requires intimate collaboration with customers
New Memory Device Enabled By Materials Breakthrough
Source: Applied Materials internal analysis.

Investment in Innovation
- R&D investments: $1.2B (FY2012) to $1.8B (FY2017)
- Maydan Technology Center: world's most advanced 300mm R&D lab; $300M invested in past 3 years
- Applied Ventures: >$250M assets under management; >30 active companies

Semiconductor Market Evolution
[Timeline with marks at 2000, 2010, 2017, and 2020: PC + Internet Era, then Mobile + Social Media Era, then A.I. + Big Data Era]