Scaling Datacenter Accelerators With Compute-Reuse Architectures
1 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Adi Fuchs and David Wentzlaff. ISCA 2018, Session 5A, June 5, 2018, Los Angeles, CA
2 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Sources: "Cramming More Components onto Integrated Circuits", G. E. Moore, Electronics 1965; "Next-Gen Power Solutions for Hyperscale Data Centers", Data Center Knowledge 2016
5 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Sources: "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective", Hazelwood et al., HPCA 2018; "Cloud TPU", Google; "FPGA Accelerated Computing Using AWS F1 Instances", David Pellerin, AWS Summit 2017; "Microsoft Unveils Project Brainwave for Real-Time AI", Doug Burger; "NVIDIA Tesla V100", NVIDIA
7 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Transistor scaling stops. Chip specialization runs out of steam. What's next?
8 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Observation I: The Density of Emerging Memories Is Projected to Increase (ITRS logic roadmap)
9 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Observation II: Datacenter Accelerators Perform Redundant Computations
Temporal locality introduces redundancy in video encoders (recurrent blocks in white):
t=0 sec: 0% recurrence; t=2 sec: 38% recurrence; t=4 sec: 61% recurrence
Source: "Face Recognition in Unconstrained Videos with Matched Background Similarity", Wolf et al., CVPR 2011
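The recurrence numbers above can be reproduced in spirit by hashing fixed-size pixel blocks and checking them against the blocks of an earlier frame. This is an illustrative sketch only: the frame contents, the 16-pixel block size, and the `block_hashes`/`recurrence` helpers are assumptions, not the talk's actual measurement code.

```python
import hashlib

BLOCK = 16  # macroblock side length; an assumed value

def block_hashes(frame, block=BLOCK):
    """Hash each block x block tile of a 2-D list of pixel values."""
    h, w = len(frame), len(frame[0])
    hashes = set()
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = bytes(frame[yy][xx] % 256
                         for yy in range(y, min(y + block, h))
                         for xx in range(x, min(x + block, w)))
            hashes.add(hashlib.sha1(tile).hexdigest())
    return hashes

def recurrence(prev_frame, cur_frame):
    """Fraction of distinct blocks in cur_frame already seen in prev_frame."""
    prev, cur = block_hashes(prev_frame), block_hashes(cur_frame)
    return len(cur & prev) / len(cur)

# A static background (all zeros) recurs fully; a fully changed frame does not.
static = [[0] * 64 for _ in range(64)]
moved = [[(x + y) % 251 for x in range(64)] for y in range(64)]
```

A static scene yields 100% block recurrence between consecutive frames, which is exactly the reuse a memoizing encoder accelerator can exploit.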
11 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Observation II: Datacenter Accelerators Perform Redundant Computations
Commonality in search terms retrieves similar content: "intercontinental downtown los angeles" and "hotel in downtown los angeles near intercontinental" surface the same results.
Source: Google
14 Scaling Datacenter Accelerators With Compute-Reuse Architectures
Observation II: Datacenter Accelerators Perform Redundant Computations
Power laws suggest highly recurrent processing of popular content.
Source: Twitter
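A quick way to see why power-law popularity implies high reuse: draw requests from a Zipf-like distribution and count how many repeat an item already seen. This is a simulation for intuition only; the catalog size, request count, and exponent are assumptions, not the talk's data.

```python
import random

def zipf_recurrence(n_items=1000, n_requests=100_000, s=1.1, seed=0):
    """Fraction of Zipf-distributed requests that repeat an earlier item."""
    rng = random.Random(seed)
    # Zipf weights: popularity of item k is proportional to 1 / k^s.
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    seen, hits = set(), 0
    for item in rng.choices(range(n_items), weights=weights, k=n_requests):
        if item in seen:
            hits += 1
        else:
            seen.add(item)
    return hits / n_requests
```

With far more requests than distinct items, almost every request is a repeat, so outputs computed once can serve the long tail of traffic.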
16 Scaling Datacenter Accelerators With Compute-Reuse Architectures
COREx: Compute-Reuse Architecture for Accelerators
Memoization: tables store past computation outputs; reuse the outputs of recurring inputs instead of recomputing.
(Diagram: host processors and the shared LLC / NoC feed the acceleration fabric. The DMA engine brings inputs into the accelerator's scratchpad memory; an input lookup probes the compute-reuse storage, and on a hit the fetched result replaces the core's result as the output.)
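The memoization flow on this slide can be sketched in software terms: hash the input, probe a table, and fall back to the accelerator core only on a miss. The names (`accelerator_core`, `memo_call`) and the toy kernel are illustrative assumptions; in COREx the table lives in dedicated hardware storage, not a Python dict.

```python
import hashlib

memo_table = {}  # stands in for the compute-reuse storage
stats = {"hits": 0, "misses": 0}

def accelerator_core(data: bytes) -> bytes:
    """Placeholder for the accelerated kernel (here: a toy byte transform)."""
    return bytes((b * 7 + 3) % 256 for b in data)

def memo_call(data: bytes) -> bytes:
    """Reuse a stored output when the input recurs; compute otherwise."""
    key = hashlib.sha256(data).digest()  # hash the input (the IHU's role)
    if key in memo_table:                # lookup hit -> use the fetched result
        stats["hits"] += 1
        return memo_table[key]
    stats["misses"] += 1
    out = accelerator_core(data)         # miss -> run the accelerator core
    memo_table[key] = out                # store the output for future reuse
    return out
```

The second call with the same input skips the core entirely, which is where the runtime and energy savings come from.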
19 Architectural Guidelines
Accelerator memoization is natural:
o Little or no additional programming effort
o Built-in input-compute-output flow
But not straightforward:
o High lookup costs
o Unnecessary accesses
o High access costs
COREx key ideas:
o Hashing (reduce lookup costs)
o Lookup filtering (fewer accesses)
o Banking (reduce access costs)
Goal: extend specialization with workload-specific memoization
(Diagram: accelerator core with DMA engine, scratchpad, and specialized compute lanes; general-purpose CMP with shared LLC.)
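Of the three key ideas, banking is the simplest to sketch: the history table is split into banks so each lookup activates only one small array, which is cheaper to access than one monolithic table. The address split below is an illustrative scheme under that assumption, not the paper's exact mapping.

```python
def banked_index(input_hash: int, n_banks: int, entries_per_bank: int):
    """Map an input hash to (bank, index) so each access touches only
    one small bank instead of one monolithic table."""
    bank = input_hash % n_banks            # low bits select the bank
    index = (input_hash // n_banks) % entries_per_bank  # remaining bits index it
    return bank, index
```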
23 Top Level Architecture
Baseline SoC: memory chip, functional block (control + datapath), SoC interconnect; accelerator core (DMA engine, scratchpad, specialized compute lanes); general-purpose CMP with shared LLC.
New modules, attached via the COREx interconnect:
o Input Hashing Unit (IHU): hashes the accelerator's inputs
o Input Lookup Unit (ILU): associative cache + cache controller that matches input hashes
o Computation History Table (CHT): RAM-array table + controller; on a match, the stored output is fetched and used
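One way to picture the ILU/CHT split: a small associative structure holds recently seen input hashes and filters lookups, so the large computation history table is only accessed when a hit is likely. This sketch models both with Python containers; the entry count, LRU policy, and filtering behavior are assumptions for illustration, not the hardware's exact policy.

```python
from collections import OrderedDict

class TwoLevelLookup:
    """Small hash filter (ILU stand-in) in front of a large table (CHT stand-in)."""

    def __init__(self, ilu_entries=4):
        self.ilu = OrderedDict()   # recently seen input hashes (associative cache)
        self.ilu_entries = ilu_entries
        self.cht = {}              # input hash -> stored output
        self.cht_accesses = 0      # count of expensive large-table accesses

    def lookup(self, input_hash):
        if input_hash in self.ilu:           # filter hit: worth probing the CHT
            self.ilu.move_to_end(input_hash)
            self.cht_accesses += 1
            return self.cht.get(input_hash)
        # Filter miss: skip the expensive CHT access; remember the hash.
        self.ilu[input_hash] = True
        if len(self.ilu) > self.ilu_entries:
            self.ilu.popitem(last=False)     # evict the least recently used hash
        return None

    def store(self, input_hash, output):
        self.cht[input_hash] = output
```

A first-time input never touches the big table, which is the point of lookup filtering: fewer accesses to the large, slow structure.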
29 Building COREx (IHU = Input Hashing Unit. ILU = Input Lookup Unit. CHT = Computation History Table.)
Case study: acceleration of video motion estimation.
Optimization goals: runtime, energy, and energy-delay product (EDP).
Baseline: highly tuned accelerators
o Sweep the space of design alternatives (Aladdin)
o Find the optimal accelerator design for each goal
Baseline optima: Runtime OPT: 5.8 [us]; EDP OPT: 148.7 [pJs]; Energy OPT: 6.2 [uJ]
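Energy-delay product is simply energy multiplied by runtime, so the three goals generally pick different designs from the same sweep. A minimal sketch of that selection follows; the candidate design points are made up for illustration (the slide's 5.8 us / 148.7 pJ·s / 6.2 uJ optima come from the real Aladdin sweep, not these numbers).

```python
# Each candidate design: (runtime in seconds, energy in joules).
# These three points are invented for illustration.
designs = {
    "A": (5.8e-6, 30.0e-6),   # fastest, but energy-hungry
    "B": (12.0e-6, 6.2e-6),   # most frugal, but slow
    "C": (9.0e-6, 7.0e-6),    # balanced: best energy * delay
}

def edp(runtime_s, energy_j):
    """Energy-delay product in joule-seconds (1e-12 J*s = 1 pJ*s)."""
    return runtime_s * energy_j

runtime_opt = min(designs, key=lambda d: designs[d][0])
energy_opt = min(designs, key=lambda d: designs[d][1])
edp_opt = min(designs, key=lambda d: edp(*designs[d]))
```

Note the units line up with the slide: microjoules times microseconds give picojoule-seconds.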
32 Building COREx: Memoization-Layer Specialization
o Extract input traces; examine hit and miss rates of different ILU/CHT sizes
o Integrate accelerators with emerging-memory-based ILU+CHT, and sweep the gains space
Example: resistive-RAM-based COREx
o Runtime optimization: 2.7x speedup (512 KB ILU, 32 GB CHT)
o EDP optimization: 63.5% EDP saved (512 KB ILU, 2 GB CHT)
o Energy optimization: 56.6% energy saved (64 KB ILU, 8 MB CHT)
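The gains above trade lookup overhead against hit rate. A simple analytical model (my simplification, not the paper's full model): with hit rate h, lookup time t_l, and core compute time t_c, every invocation pays the lookup and misses also pay the full compute, so memoized time per invocation is t_l + (1 - h) * t_c.

```python
def memo_speedup(hit_rate, t_lookup, t_compute):
    """Speedup of memoized execution over always computing.

    Assumes every invocation pays the lookup and every miss pays the
    full compute time -- a simplifying model, not COREx's exact one.
    """
    memoized = t_lookup + (1.0 - hit_rate) * t_compute
    return t_compute / memoized
```

With 90% hits and a lookup 10x cheaper than compute, the model gives a 5x speedup; with 0% hits the lookup is pure overhead and "speedup" drops below 1, which is why hit rates and lookup costs must be tuned per workload.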
34 Experimental Setup (IHU = Input Hashing Unit. ILU = Input Lookup Unit. CHT = Computation History Table.)
Workloads:
Kernel | Domain | Use-Case | App Source | Input Source and Description
DCT | Video Encoding | Video Server | x264 | YouTube Faces: 10 videos, 10 seconds, 24 FPS (temporal redundancy)
SAD | Video Encoding | Video Server | PARBOIL | YouTube Faces: 10 videos, 10 seconds, 24 FPS (temporal redundancy)
SNAPPY ("SNP") | Compression | Web-Server Traffic Compression | TailBench / Snappy-C | Wikipedia abstracts: 13 million search queries (search commonality)
SSSP ("SSP") | Graph Processing | Maps Service: Shortest Walking Route | Internal / DIMACS | NYC streets: 10 million Zipfian transactions (content popularity)
BFS | Graph Processing | Online Retail | MachSuite | Amazon co-purchasing: 10 million Zipfian transactions (content popularity)
RBM | Machine Learning | Collaborative Filtering | CortexSuite | Netflix Prize: 10 million Zipfian transactions (content popularity)
Zipfian workloads are evaluated at 75%, 90%, and 95% recurrence.
Methodology:
o Evaluate the ILU/CHT as ReRAM, STT-RAM, PCM, or Racetrack memory (Destiny)
o Integrate with highly tuned accelerators (Aladdin)
39 Results (IHU = Input Hashing Unit. ILU = Input Lookup Unit. CHT = Computation History Table.)
Runtime-OPT: avg. speedup
o Negligible differences between memories
EDP-OPT: avg. 50%-68% savings
o PCM/Racetrack have high write energy
o Gains are lower for low-bias apps (frequent updates)
Energy-OPT: avg. 22%-50% savings
o PCM is unbeneficial for 75%-bias SSSP/RBM
General trends:
o Large CHTs (MBs-TBs) for speedup; smaller (KBs-GBs) for EDP; smallest (KBs-MBs) for energy
42 Conclusions
Memoization is fit for accelerators:
o Memoization-ready programming environment + interface
Memoization is fit for datacenters:
o Temporal redundancy, search commonality, and content popularity
COREx extends hardware specialization:
o Memoization-layer specialization tailored to the workload
COREx opens new opportunities for future architectures:
o Shift compute from non-scaling CMOS to still-scaling memories
46 Scaling Datacenter Accelerators With Compute-Reuse Architectures Adi Fuchs David Wentzlaff
Nonblocking Memory Refresh Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian Latency (ns) History of DRAM 2 Refresh Latency Bus Cycle Time Min. Read Latency 512 550 16 13.5 0.5 0.75 1968 DRAM
More informationDeep Learning Processing Technologies for Embedded Systems. October 2018
Deep Learning Processing Technologies for Embedded Systems October 2018 1 Neural Networks Architecture Single Neuron DNN Multi Task NN Multi-Task Vehicle Detection With Region-of-Interest Voting Popular
More informationBuilding the Most Efficient Machine Learning System
Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide
More informationJim Keller. Digital Equipment Corp. Hudson MA
Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership
More informationCS 537 Lecture 6 Fast Translation - TLBs
CS 537 Lecture 6 Fast Translation - TLBs Michael Swift 9/26/7 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift Faster with TLBS Questions answered in this lecture: Review
More informationHow Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC
How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 25: Parallel Databases CSE 344 - Winter 2013 1 Announcements Webquiz due tonight last WQ! J HW7 due on Wednesday HW8 will be posted soon Will take more hours
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationIntroduction to Database Services
Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationTiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation
Tiny GPU Cluster for Big Spatial Data: A Preliminary Performance Evaluation Jianting Zhang 1,2 Simin You 2, Le Gruenwald 3 1 Depart of Computer Science, CUNY City College (CCNY) 2 Department of Computer
More informationAMD Opteron Processors In the Cloud
AMD Opteron Processors In the Cloud Pat Patla Vice President Product Marketing AMD DID YOU KNOW? By 2020, every byte of data will pass through the cloud *Source IDC 2 AMD Opteron In The Cloud October,
More informationDesigning High-Performance and Fair Shared Multi-Core Memory Systems: Two Approaches. Onur Mutlu March 23, 2010 GSRC
Designing High-Performance and Fair Shared Multi-Core Memory Systems: Two Approaches Onur Mutlu onur@cmu.edu March 23, 2010 GSRC Modern Memory Systems (Multi-Core) 2 The Memory System The memory system
More informationArchitectures for Scalable Media Object Search
Architectures for Scalable Media Object Search Dennis Sng Deputy Director & Principal Scientist NVIDIA GPU Technology Workshop 10 July 2014 ROSE LAB OVERVIEW 2 Large Database of Media Objects Next- Generation
More informationDynamic Vertical Memory Scalability for OpenJDK Cloud Applications
Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Rodrigo Bruno, Paulo Ferreira: INESC-ID / Instituto Superior Técnico, University of Lisbon Ruslan Synytsky, Tetiana Fydorenchyk: Jelastic
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationFinding a needle in Haystack: Facebook's photo storage
Finding a needle in Haystack: Facebook's photo storage The paper is written at facebook and describes a object storage system called Haystack. Since facebook processes a lot of photos (20 petabytes total,
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Table of Contents: The Accelerated Data Center Optimizing Data Center Productivity Same Throughput with Fewer Server Nodes
More informationDatabase Systems II. Secondary Storage
Database Systems II Secondary Storage CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 29 The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM
More informationThe impact of 3D storage solutions on the next generation of memory systems
The impact of 3D storage solutions on the next generation of memory systems DevelopEX 2017 Airport City Israel Avi Klein Engineering Fellow, Memory Technology Group Western Digital Corp October 31, 2017
More informationA Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationDistributed systems: paradigms and models Motivations
Distributed systems: paradigms and models Motivations Prof. Marco Danelutto Dept. Computer Science University of Pisa Master Degree (Laurea Magistrale) in Computer Science and Networking Academic Year
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationComputer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationEnabling Technology for the Cloud and AI One Size Fits All?
Enabling Technology for the Cloud and AI One Size Fits All? Tim Horel Collaborate. Differentiate. Win. DIRECTOR, FIELD APPLICATIONS The Growing Cloud Global IP Traffic Growth 40B+ devices with intelligence
More informationTUNING CUDA APPLICATIONS FOR MAXWELL
TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v6.5 August 2014 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2
More informationRow Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Executive Summary Different memory technologies have different
More informationIBM Education Assistance for z/os V2R2
IBM Education Assistance for z/os V2R2 Item: RSM Scalability Element/Component: Real Storage Manager Material current as of May 2015 IBM Presentation Template Full Version Agenda Trademarks Presentation
More informationGRVI Phalanx Update: A Massively Parallel RISC-V FPGA Accelerator Framework. Jan Gray CARRV2017: 2017/10/14
GRVI halanx Update: A Massively arallel RISC-V FGA Accelerator Framework Jan Gray jan@fpga.org http://fpga.org CARRV2017: 2017/10/14 FGA Datacenter Accelerators Are Almost Mainstream Catapult v2. Intel
More informationBoosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search
Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid
More informationInterconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp
Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary
More informationIntegrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim
Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim Farzad Farshchi, Qijing Huang, Heechul Yun University of Kansas, University of California, Berkeley SiFive Internship Rocket
More informationComputer Architecture. R. Poss
Computer Architecture R. Poss 1 ca01-10 september 2015 Course & organization 2 ca01-10 september 2015 Aims of this course The aims of this course are: to highlight current trends to introduce the notion
More informationPhase Change Memory An Architecture and Systems Perspective
Phase Change Memory An Architecture and Systems Perspective Benjamin C. Lee Stanford University bcclee@stanford.edu Fall 2010, Assistant Professor @ Duke University Benjamin C. Lee 1 Memory Scaling density,
More informationEmerging NV Storage and Memory Technologies --Development, Manufacturing and
Emerging NV Storage and Memory Technologies --Development, Manufacturing and Applications-- Tom Coughlin, Coughlin Associates Ed Grochowski, Computer Storage Consultant 2014 Coughlin Associates 1 Outline
More informationScaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic
More informationCPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
More informationInfrastructure Innovation Opportunities Y Combinator 2013
Infrastructure Innovation Opportunities Y Combinator 2013 James Hamilton, 2013/1/22 VP & Distinguished Engineer, Amazon Web Services email: James@amazon.com web: mvdirona.com/jrh/work blog: perspectives.mvdirona.com
More informationRethinking DRAM Power Modes for Energy Proportionality
Rethinking DRAM Power Modes for Energy Proportionality Krishna Malladi 1, Ian Shaeffer 2, Liji Gopalakrishnan 2, David Lo 1, Benjamin Lee 3, Mark Horowitz 1 Stanford University 1, Rambus Inc 2, Duke University
More information