Near-Data Processing for Differentiable Machine Learning Models

Size: px
Start display at page:

Download "Near-Data Processing for Differentiable Machine Learning Models"

Transcription

1 Near-Data Processing for Differentiable Machine Learning Models Hyeokjun Choe 1, Seil Lee 1, Hyunha Nam 1, Seongsik Park 1, Seijoon Kim 1, Eui-Young Chung 2 and Sungroh Yoon 1,3 1 Electrical and Computer Engineering, Seoul National University 2 Electrical and Electronic Engineering, Yonsei University 3 Neurology and Neurological Sciences, Stanford University Correspondence: sryoon@snu.ac.kr Homepage: May 19th, 2017 Hyeokjun Choe et al. MSST 2017 May 19th, / 33

2 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

3 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

4 Machine Learning s Success Big data Powerful parallel processors Sophisticated models Hyeokjun Choe et al. MSST 2017 May 19th, / 33

5 Issues on Conventional Memory Hierachy Data movement in memory hierarchy Computational efficiency Power consumption Hyeokjun Choe et al. MSST 2017 May 19th, / 33

6 Near-data Processing (NDP) Memory or storage with intelligence (i.e., computing power) Process the data stored in memory or storage Reduce the data movements, CPU offloading Hyeokjun Choe et al. MSST 2017 May 19th, / 33

7 ISP-ML ISP-ML: a full-fledged ISP-supporting SSD platform Easy to implement machine learning algorithm in C/C++ For validation, three SGD algorithms were implemented and experimented with ISP-ML User Application SSD controller SSD OS CPU Embedded Processor(CPU) SRAM ISP SW Host I/F Main Memory DRAM SSD controller SRAM ARM Processor ISP SW SSD Channel Controller ISP HW Cache Controller Channel ISP HW Controller ISP HW NAND Flash NAND Flash NAND Flash NAND Flash Host I/F DRAM Cache Controller ISP HW Channel Controller ISP HW Channel Controller ISP HW NAND Flash NAND Flash NAND Flash NAND Flash Hyeokjun Choe et al. MSST 2017 May 19th, / 33

8 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

9 Machine Learning as an Optimization Problem Machine learning categories Supervised learning, unsupervised learning, reinforcement learning The main purpose of supervised machine learning Find the optimal θ that minimizes F (D; θ) F (D, θ) = L(D, θ) + r(θ) (1) D : input data θ : model parameters L : loss function r : regularization term F : objective function Input layer Output layer Hyeokjun Choe et al. MSST 2017 May 19th, / 33

10 Gradient Descent θ t+1 = θ t η F (D, θ t ) (2) = θ t η i F (D i, θ t ) (3) η : learning rate t : iteration index i : data sample index 1st-order iterative optimization algorithm Use all samples per iteration Stochastic gradient descent (SGD) Use only one sample per iteration. Minibatch stochastic gradient descent Between gradient descent and SGD Use multiple samples per iteration Hyeokjun Choe et al. MSST 2017 May 19th, / 33

11 Parallel and Distributed SGD Synchornous SGD Parameter server aggregates θ slave synchronously. Downpour SGD Workers communicate with parameter server asynchronously. Elastic Average SGD (EASGD) Each worker has own parameters Workers transfer (θ slave θ master ), not θ slave Hyeokjun Choe et al. MSST 2017 May 19th, / 33

12 Fundamentals of Solid-State Drives (SSDs) SSD Controller Embedded processor for FTL HDD emulation Wear Leveling, Garbage collection, etc. Cache controller Channel controller DRAM Cache and Buffer 512MB - 2GB NAND flash arrays Simultaneously accessible Host interface logic SATA, PCIe Hyeokjun Choe et al. MSST 2017 May 19th, / 33

13 Previous Work on Near-Data Processing:PIM Perform computation inside the main memory 3D stacked memory (e.g. HMC) is used for PIM recently Implement processing unit in Logic Layer Applications: sorting, string matching, CNN, matrix multiplication etc. Hyeokjun Choe et al. MSST 2017 May 19th, / 33

14 Previous Work on Near-Data Processing:ISP Perform computation inside the storage ISP with embedded processor Pros: easy to implement, flexible Cons: no parallelism ISP with dedicated hardware logic Pros: channel parallelism, hardware acceleration Cons: hard to implement and change Applications: DB query (scan, join), linear regression, k-means, string match etc. Hyeokjun Choe et al. MSST 2017 May 19th, / 33

15 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

16 Host I/F SSD controller ARM Processor Cache Controller ISP HW DRAM SRAM ISP SW Channel Controller ISP HW Channel Controller ISP HW NAND Flash NAND Flash NAND Flash NAND Flash ISP-ML: ISP Platform for Machine Learning on SSDs ISP-supporting SSD simulator Implemented in SystemC on the Synopsys Platform Architect Software/Hardware co-simulation Easily executes various machine learning algorithms in C/C++ Transaction level simulator For reasonable simulation speed ISP components SRAM ISP SW, ISP HW User Application OS CPU Main Memory SSD Host I/F DRAM SSD controller Embedded Processor(CPU) Cache Controller ISP HW SRAM ISP SW Channel Controller ISP HW Channel Controller ISP HW NAND Flash NAND Flash SSD NAND Flash NAND Flash Embedded Processor clk/rst Host I/F Cache Controller DRAM Channel Controller NAND Flash Hyeokjun Choe et al. MSST 2017 May 19th, / 33

17 Host I/F ARM Processor Cache Controller ISP HW DRAM SRAM ISP SW Channel Controller ISP HW Channel Controller ISP HW NAND Flash NAND Flash NAND Flash NAND Flash ISP-ML: ISP Platform for Machine Learning on SSDs We implemented two types of ISP hardware components. Channel controller: perform primitive operations on the stored data. Cache controller: collect the results from each of the channel controller. Master-slave architecture They communicate with each other. User Application OS CPU Main Memory SSD controller SSD Host I/F DRAM SSD controller Embedded Processor(CPU) Cache Controller ISP HW SRAM ISP SW Channel Controller ISP HW Channel Controller ISP HW NAND Flash NAND Flash SSD NAND Flash NAND Flash Hyeokjun Choe et al. MSST 2017 May 19th, / 33

18 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

19 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

20 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

21 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

22 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

23 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

24 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

25 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

26 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

27 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

28 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

29 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

30 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

31 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

32 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

33 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

34 Parallel SGD Implementation on ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

35 Methodology for IHP-ISP Performance Comparison Ideal Ways to Fairly Compare ISP and IHP 1 Implementing ISP-ML in a real semiconductor chip High chip manufacturing costs 2 Simulating IHP in the ISP-ML framework. High simulation time to simulate IHP 3 Implementing both ISP and IHP using FPGAs. Require another significant development efforts. Hard to fairly compare the performances of ISP and IHP We propose a practical comparison methodology Hyeokjun Choe et al. MSST 2017 May 19th, / 33

36 Methodology for IHP-ISP Performance Comparison (a) Host Real System Simulator ISP Cmd (b) In Host Measure observed IHP execution time(t total ) Measure data IO time(t IO ) IO Trace Extract IO trace while executing application Storage ISP-ML (baseline) ISP-ML (ISP implemented) In SSD (Sim) Measure baseline simulation time with IO trace(t IOsim ) Observed IHP execution time = T total = T nonio + T IO. Expected IHP simulation time = T nonio + T IOsim = T total - T IO + T IOsim. T IO T nonio T IOsim : Data IO latency time of the storage : Non-data IO time : Data IO time of the baseline SSD in ISP-ML Hyeokjun Choe et al. MSST 2017 May 19th, / 33

37 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

38 Setup and Implementation Host specifications CPU 8-core Intel(R) Core i7-3770k (3.50GHz) Main memory DDR3 32GB RAM Storage Samsung SSD 840 Pro OS Ubuntu LTS ISP-ML specifications Embedded processor ARM 926EJ-S (400MHz) FTL DFTL Page size 8KB t prog / t read / t blockerase 300us / 75us / 5ms FPU 0.5 instruction/cycle(pipelined) Dataset x10 amplified MNIST(handwritten digits) Hyeokjun Choe et al. MSST 2017 May 19th, / 33

39 Performance Comparison:ISP-Based Optimization EASGD showed best performance in this experiment. x2.96 against synchornous SGD on average. x1.41 against Downpour SGD on average. For 4,8 Ch, synchronous SGD was slower than Downpour SGD For 16 Ch, synchronous SGD was faster than Downpour SGD 0.94 (a) 4-Channel 0.94 (b) 8-Channel 0.94 (c) 16-Channel Test accuracy Synchronous SGD Downpour SGD EASGD Test accuracy Time(sec) Synchronous SGD Downpour SGD EASGD Test accuracy Time(sec) Synchronous SGD Downpour SGD EASGD Time(sec) Hyeokjun Choe et al. MSST 2017 May 19th, / 33

40 Performance Comparison:IHP versus ISP Compared IHP in memory shortage situation with ISP In large-scale machine learning, the computing systems used may suffer from memory shortages. Assumption: The host had already loaded all the data to main memory for IHP. ISP-based EASGD with 16 channels obtained the best performance in our experiments Test accuracy ISP(EASGD, 4CH) IHP(2GB-memory) IHP(16GB-memory) ISP(EASGD, 8CH) IHP(4GB-memory) IHP(32GB-memory) ISP(EASGD, 16CH) IHP(8GB-memory) Time(sec) Hyeokjun Choe et al. MSST 2017 May 19th, / 33

41 Channel Parallelism The speed-up tends to be proportional to the number of channels. Because the communication overhead in ISP is negligible. In distributed computing systems, communication bottleneck commonly occurs (a) Synchronous SGD 0.94 (b) Downpour SGD Test accuracy Test accuracy (c) EASGD 4-Channel 8-Channel 16-Channel Time(sec) 4-Channel 8-Channel 16-Channel Time(sec) Test accuracy Speed up Channel 8-Channel 16-Channel Time(sec) Synchronous SGD Downpour SGD EASGD (d) Speed up Channel Hyeokjun Choe et al. MSST 2017 May 19th, / 33

42 Effects of Communication Period in Async. SGD Downpour SGD High speed for a low communication period [τ=1; 4] Unstable for a high communication period [τ =16; 64] EASGD Communication period, convergence speed In contrast to the distributed computing system Because of the low communication overhead 0.90 (a) Downpour SGD 0.92 (b) EASGD Test accuracy τ = 1 τ = 4 τ = 16 τ = Time(sec) Test accuracy τ = 1 τ = 4 τ = 16 τ = Time(sec) Hyeokjun Choe et al. MSST 2017 May 19th, / 33

43 Experimental Results Summary 1 EASGD shows the best performance in our ISP-ML environment. 2 ISP is more efficient than IHP while host suffers from insufficient main memory. ISP may be useful in large scale machine learning. 3 The speed-up by parallelizing is proportional to the number of channels. Because of the ultra fast on-chip communication. 4 The performance of EASGD decreases while the communication period increases unlike conventional distributed system. Hyeokjun Choe et al. MSST 2017 May 19th, / 33

44 Outline 1 Introduction 2 Background 3 Proposed Methodology 4 Experimental Results 5 Discussion and Conclusion Hyeokjun Choe et al. MSST 2017 May 19th, / 33

45 Parallelism in ISP ISP can provide various advantages for data processing involved in machine learning. E.g. ultra-fast on-chip communication Increase energy efficiency, security, and reliability High degree of parallelism could be achieved. By increasing the number of channels inside an SSD. Exploiting a hierarchy of parallelism Distributed systems + ISP-based SSDs Hyeokjun Choe et al. MSST 2017 May 19th, / 33

46 Opportunities for Future Research 1 Implementing deep neural networks in ISP-ML framework 2 Implementing adaptive optimization algorithms E.g. Adagrad and Adadelta 3 Pre-computing metadata during data writes 4 Implementing data shuffling functionality 5 Investigate the effect of NAND flash design on performance Hyeokjun Choe et al. MSST 2017 May 19th, / 33

47 Conclusion Create full-fledged ISP-supporting SSD simulator supporting ML Implement and compare multiple versions of parallel SGD Propose fair comparison methodology between IHP and ISP Intrigue future research opportunities in terms of exploiting the channel parallelism Hyeokjun Choe et al. MSST 2017 May 19th, / 33

48 Acknowledgments Hyeokjun Choe et al. MSST 2017 May 19th, / 33

49 Q/A Hyeokjun Choe et al. MSST 2017 May 19th, / 33

Near-Data Processing for Differentiable Machine Learning Models

Near-Data Processing for Differentiable Machine Learning Models Near-Data Processing for Differentiable Machine Learning Models Hyeokjun Choe 1, Seil Lee 1, Hyunha Nam 1, Seongsik Park 1, Seijoon Kim 1, Eui-Young Chung 2, Sungroh Yoon 1,3 1 Electrical and Computer

More information

VSSIM: Virtual Machine based SSD Simulator

VSSIM: Virtual Machine based SSD Simulator 29 th IEEE Conference on Mass Storage Systems and Technologies (MSST) Long Beach, California, USA, May 6~10, 2013 VSSIM: Virtual Machine based SSD Simulator Jinsoo Yoo, Youjip Won, Joongwoo Hwang, Sooyong

More information

Architecture Exploration of High-Performance PCs with a Solid-State Disk

Architecture Exploration of High-Performance PCs with a Solid-State Disk Architecture Exploration of High-Performance PCs with a Solid-State Disk D. Kim, K. Bang, E.-Y. Chung School of EE, Yonsei University S. Yoon School of EE, Korea University April 21, 2010 1/53 Outline

More information

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs

LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs IEICE TRANS. INF. & SYST., VOL.E92 D, NO.4 APRIL 2009 727 LETTER Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs Dong KIM, Kwanhu BANG, Seung-Hwan HA, Chanik PARK, Sung Woo

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

OSSD: A Case for Object-based Solid State Drives

OSSD: A Case for Object-based Solid State Drives MSST 2013 2013/5/10 OSSD: A Case for Object-based Solid State Drives Young-Sik Lee Sang-Hoon Kim, Seungryoul Maeng, KAIST Jaesoo Lee, Chanik Park, Samsung Jin-Soo Kim, Sungkyunkwan Univ. SSD Desktop Laptop

More information

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010 Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed

More information

Unblinding the OS to Optimize User-Perceived Flash SSD Latency

Unblinding the OS to Optimize User-Perceived Flash SSD Latency Unblinding the OS to Optimize User-Perceived Flash SSD Latency Woong Shin *, Jaehyun Park **, Heon Y. Yeom * * Seoul National University ** Arizona State University USENIX HotStorage 2016 Jun. 21, 2016

More information

Summarizer: Trading Communication with Computing Near Storage

Summarizer: Trading Communication with Computing Near Storage Summarizer: Trading Communication with Computing Near Storage Gunjae Koo*, Kiran Kumar Matam*, Te I, H.V. Krishina Giri Nara*, Jing Li, Hung-Wei Tseng, Steven Swanson, Murali Annavaram* *University of

More information

Enabling Cost-effective Data Processing with Smart SSD

Enabling Cost-effective Data Processing with Smart SSD Enabling Cost-effective Data Processing with Smart SSD Yangwook Kang, UC Santa Cruz Yang-suk Kee, Samsung Semiconductor Ethan L. Miller, UC Santa Cruz Chanik Park, Samsung Electronics Efficient Use of

More information

Hybrid Memory Platform

Hybrid Memory Platform Hybrid Memory Platform Kenneth Wright, Sr. Driector Rambus / Emerging Solutions Division Join the Conversation #OpenPOWERSummit 1 Outline The problem / The opportunity Project goals Roadmap - Sub-projects/Tracks

More information

Presented by: Nafiseh Mahmoudi Spring 2017

Presented by: Nafiseh Mahmoudi Spring 2017 Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory

More information

Optimizing Translation Information Management in NAND Flash Memory Storage Systems

Optimizing Translation Information Management in NAND Flash Memory Storage Systems Optimizing Translation Information Management in NAND Flash Memory Storage Systems Qi Zhang 1, Xuandong Li 1, Linzhang Wang 1, Tian Zhang 1 Yi Wang 2 and Zili Shao 2 1 State Key Laboratory for Novel Software

More information

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage CSCI-GA.2433-001 Database Systems Lecture 8: Physical Schema: Storage Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com View 1 View 2 View 3 Conceptual Schema Physical Schema 1. Create a

More information

Performance Benefits of Running RocksDB on Samsung NVMe SSDs

Performance Benefits of Running RocksDB on Samsung NVMe SSDs Performance Benefits of Running RocksDB on Samsung NVMe SSDs A Detailed Analysis 25 Samsung Semiconductor Inc. Executive Summary The industry has been experiencing an exponential data explosion over the

More information

Dongjun Shin Samsung Electronics

Dongjun Shin Samsung Electronics 2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer

More information

Decentralized and Distributed Machine Learning Model Training with Actors

Decentralized and Distributed Machine Learning Model Training with Actors Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of

More information

Parallel Stochastic Gradient Descent: The case for native GPU-side GPI

Parallel Stochastic Gradient Descent: The case for native GPU-side GPI Parallel Stochastic Gradient Descent: The case for native GPU-side GPI J. Keuper Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Mark Silberstein Accelerated Computer

More information

Today s Presentation

Today s Presentation Banish the I/O: Together, SSD and Main Memory Storage Accelerate Database Performance Today s Presentation Conventional Database Performance Optimization Goal: Minimize I/O Legacy Approach: Cache 21 st

More information

Beyond Block I/O: Rethinking

Beyond Block I/O: Rethinking Beyond Block I/O: Rethinking Traditional Storage Primitives Xiangyong Ouyang *, David Nellans, Robert Wipfel, David idflynn, D. K. Panda * * The Ohio State University Fusion io Agenda Introduction and

More information

Baoping Wang School of software, Nanyang Normal University, Nanyang , Henan, China

Baoping Wang School of software, Nanyang Normal University, Nanyang , Henan, China doi:10.21311/001.39.7.41 Implementation of Cache Schedule Strategy in Solid-state Disk Baoping Wang School of software, Nanyang Normal University, Nanyang 473061, Henan, China Chao Yin* School of Information

More information

Solid State Storage Technologies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Solid State Storage Technologies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Solid State Storage Technologies Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu NVMe (1) The industry standard interface for high-performance NVM

More information

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo 1 June 4, 2011 2 Outline Introduction System Architecture A Multi-Chipped

More information

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems

More information

TRADITIONAL search engines utilize hard disk drives

TRADITIONAL search engines utilize hard disk drives This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TC.216.268818,

More information

NEXT-GENERATION SSD VERIFICATION PLATFORM WITH TERA-SCALE NAND CAPACITY AND STORAGE SIGNAL PROCESSING SUPPORT

NEXT-GENERATION SSD VERIFICATION PLATFORM WITH TERA-SCALE NAND CAPACITY AND STORAGE SIGNAL PROCESSING SUPPORT NEXT-GENERATION SSD VERIFICATION PLATFORM WITH TERA-SCALE CAPACITY AND STORAGE SIGNAL PROCESSING SUPPORT NVRAMOS 2012-10-16 Seil Lee, Myunghyun Rhee, and Sungroh Yoon Advanced Computing Laboratory, Seoul

More information

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research 1 The world s most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for

More information

Linux Storage System Bottleneck Exploration

Linux Storage System Bottleneck Exploration Linux Storage System Bottleneck Exploration Bean Huo / Zoltan Szubbocsev Beanhuo@micron.com / zszubbocsev@micron.com 215 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications

More information

Toward Scalable Deep Learning

Toward Scalable Deep Learning 한국정보과학회 인공지능소사이어티 머신러닝연구회 두번째딥러닝워크샵 2015.10.16 Toward Scalable Deep Learning 윤성로 Electrical and Computer Engineering Seoul National University http://data.snu.ac.kr Breakthrough: Big Data + Machine Learning

More information

HARD Disk Drives (HDDs) have been widely used as mass

HARD Disk Drives (HDDs) have been widely used as mass 878 IEEE TRANSACTIONS ON COMPUTERS, VOL. 59, NO. 7, JULY 2010 Architecture Exploration of High-Performance PCs with a Solid-State Disk Dong Kim, Kwanhu Bang, Student Member, IEEE, Seung-Hwan Ha, Sungroh

More information

Sort vs. Hash Join Revisited for Near-Memory Execution. Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot

Sort vs. Hash Join Revisited for Near-Memory Execution. Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot Sort vs. Hash Join Revisited for Near-Memory Execution Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot 1 Near-Memory Processing (NMP) Emerging technology Stacked memory: A logic die w/ a stack

More information

Chapter 14 HARD: Host-Level Address Remapping Driver for Solid-State Disk

Chapter 14 HARD: Host-Level Address Remapping Driver for Solid-State Disk Chapter 14 HARD: Host-Level Address Remapping Driver for Solid-State Disk Young-Joon Jang and Dongkun Shin Abstract Recent SSDs use parallel architectures with multi-channel and multiway, and manages multiple

More information

CafeGPI. Single-Sided Communication for Scalable Deep Learning

CafeGPI. Single-Sided Communication for Scalable Deep Learning CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks

More information

Toward SLO Complying SSDs Through OPS Isolation

Toward SLO Complying SSDs Through OPS Isolation Toward SLO Complying SSDs Through OPS Isolation October 23, 2015 Hongik University UNIST (Ulsan National Institute of Science & Technology) Sam H. Noh 1 Outline Part 1: FAST 2015 Part 2: Beyond FAST 2

More information

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices Arash Tavakkol, Juan Gómez-Luna, Mohammad Sadrosadati, Saugata Ghose, Onur Mutlu February 13, 2018 Executive Summary

More information

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare COMP6237 Data Mining Data Mining & Machine Learning with Big Data Jonathon Hare jsh2@ecs.soton.ac.uk Contents Going to look at two case-studies looking at how we can make machine-learning algorithms work

More information

BlueDBM: An Appliance for Big Data Analytics*

BlueDBM: An Appliance for Big Data Analytics* BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting

More information

CSEE W4824 Computer Architecture Fall 2012

CSEE W4824 Computer Architecture Fall 2012 CSEE W4824 Computer Architecture Fall 2012 Lecture 8 Memory Hierarchy Design: Memory Technologies and the Basics of Caches Luca Carloni Department of Computer Science Columbia University in the City of

More information

High-Speed NAND Flash

High-Speed NAND Flash High-Speed NAND Flash Design Considerations to Maximize Performance Presented by: Robert Pierce Sr. Director, NAND Flash Denali Software, Inc. History of NAND Bandwidth Trend MB/s 20 60 80 100 200 The

More information

Performance Modeling and Analysis of Flash based Storage Devices

Performance Modeling and Analysis of Flash based Storage Devices Performance Modeling and Analysis of Flash based Storage Devices H. Howie Huang, Shan Li George Washington University Alex Szalay, Andreas Terzis Johns Hopkins University MSST 11 May 26, 2011 NAND Flash

More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Caches Martha A. Kim Columbia University Fall 215 Illustrations Copyright 27 Elsevier 1 / 23 Computer Systems Performance depends on which is slowest: the processor or

More information

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture

Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture The 51st Annual IEEE/ACM International Symposium on Microarchitecture Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture Byungchul Hong Yeonju Ro John Kim FuriosaAI Samsung

More information

SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto

SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto So slow So cheap So heavy So fast So expensive So efficient NAND based flash memory Retains memory without power It works by trapping a small

More information

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Doug Burger Director, Hardware, Devices, & Experiences MSR NExT November 15, 2015 The Cloud is a Growing Disruptor for HPC Moore s

More information

Join Processing for Flash SSDs: Remembering Past Lessons

Join Processing for Flash SSDs: Remembering Past Lessons Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of

More information

The Challenges of System Design. Raising Performance and Reducing Power Consumption

The Challenges of System Design. Raising Performance and Reducing Power Consumption The Challenges of System Design Raising Performance and Reducing Power Consumption 1 Agenda The key challenges Visibility for software optimisation Efficiency for improved PPA 2 Product Challenge - Software

More information

Benchmark: In-Memory Database System (IMDS) Deployed on NVDIMM

Benchmark: In-Memory Database System (IMDS) Deployed on NVDIMM Benchmark: In-Memory Database System (IMDS) Deployed on NVDIMM Presented by Steve Graves, McObject and Jeff Chang, AgigA Tech Santa Clara, CA 1 The Problem: Memory Latency NON-VOLATILE MEMORY HIERARCHY

More information

Bring x3 Spark Performance Improvement with PCIe SSD. Yucai, Yu BDT/STO/SSG January, 2016

Bring x3 Spark Performance Improvement with PCIe SSD. Yucai, Yu BDT/STO/SSG January, 2016 Bring x3 Spark Performance Improvement with PCIe SSD Yucai, Yu (yucai.yu@intel.com) BDT/STO/SSG January, 2016 About me/us Me: Spark contributor, previous on virtualization, storage, mobile/iot OS. Intel

More information

Flash Memory Based Storage System

Flash Memory Based Storage System Flash Memory Based Storage System References SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers, ISLPED 06 Energy-Aware Flash Memory Management in Virtual Memory System, islped

More information

OpenSSD Platform Simulator to Reduce SSD Firmware Test Time. Taedong Jung, Yongmyoung Lee, Ilhoon Shin

OpenSSD Platform Simulator to Reduce SSD Firmware Test Time. Taedong Jung, Yongmyoung Lee, Ilhoon Shin OpenSSD Platform Simulator to Reduce SSD Firmware Test Time Taedong Jung, Yongmyoung Lee, Ilhoon Shin Department of Electronic Engineering, Seoul National University of Science and Technology, South Korea

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near

More information

Large and Fast: Exploiting Memory Hierarchy

Large and Fast: Exploiting Memory Hierarchy CSE 431: Introduction to Operating Systems Large and Fast: Exploiting Memory Hierarchy Gojko Babić 10/5/018 Memory Hierarchy A computer system contains a hierarchy of storage devices with different costs,

More information

PCIe Storage Beyond SSDs

PCIe Storage Beyond SSDs PCIe Storage Beyond SSDs Fabian Trumper NVM Solutions Group PMC-Sierra Santa Clara, CA 1 Classic Memory / Storage Hierarchy FAST, VOLATILE CPU Cache DRAM Performance Gap Performance Tier (SSDs) SLOW, NON-VOLATILE

More information

Developing Low Latency NVMe Systems for HyperscaleData Centers. Prepared by Engling Yeo Santa Clara, CA Date: 08/04/2017

Developing Low Latency NVMe Systems for HyperscaleData Centers. Prepared by Engling Yeo Santa Clara, CA Date: 08/04/2017 Developing Low Latency NVMe Systems for HyperscaleData Centers Prepared by Engling Yeo Santa Clara, CA 95054 Date: 08/04/2017 Quality of Service IOPS, Throughput, Latency Short predictable read latencies

More information

Designing a True Direct-Access File System with DevFS

Designing a True Direct-Access File System with DevFS Designing a True Direct-Access File System with DevFS Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin-Madison Yuangang Wang, Jun Xu, Gopinath Palani Huawei Technologies

More information

arxiv: v1 [cs.ar] 11 Apr 2017

arxiv: v1 [cs.ar] 11 Apr 2017 FMMU: A Hardware-Automated Flash Map Management Unit for Scalable Performance of NAND Flash-Based SSDs Yeong-Jae Woo Sang Lyul Min Department of Computer Science and Engineering, Seoul National University

More information

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Transparent Offloading and Mapping () Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O Connor, Nandita Vijaykumar,

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

The Long-Term Future of Solid State Storage Jim Handy Objective Analysis

The Long-Term Future of Solid State Storage Jim Handy Objective Analysis The Long-Term Future of Solid State Storage Jim Handy Objective Analysis Agenda How did we get here? Why it s suboptimal How we move ahead Why now? DRAM speed scaling Changing role of NVM in computing

More information

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives

Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Middleware and Flash Translation Layer Co-Design for the Performance Boost of Solid-State Drives Chao Sun 1, Asuka Arakawa 1, Ayumi Soga 1, Chihiro Matsui 1 and Ken Takeuchi 1 1 Chuo University Santa Clara,

More information

HADP Talk BlueDBM: An appliance for Big Data Analytics

HADP Talk BlueDBM: An appliance for Big Data Analytics HADP Talk BlueDBM: An appliance for Big Data Analytics Sang-Woo Jun* Ming Liu* Sungjin Lee* Jamey Hicks+ John Ankcorn+ Myron King+ Shuotao Xu* Arvind* *MIT Computer Science and Artificial Intelligence

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

Don t Forget the Memory. Dean Klein, VP Memory System Development Micron Technology, Inc.

Don t Forget the Memory. Dean Klein, VP Memory System Development Micron Technology, Inc. Don t Forget the Memory Dean Klein, VP Memory System Development Micron Technology, Inc. Memory is Everywhere 2 One size DOES NOT fit all 3 Question: How many different memories does your computer use?

More information

Handling Memory Accesses in Big Data Environment Chipex 2016

Handling Memory Accesses in Big Data Environment Chipex 2016 Professor Uri Weiser Technion Haifa, Israel Handling Memory Accesses in Big Data Environment Chipex 2016 The talk covers research done by: T. Horowitz, Prof. A. Kolodny, T. Morad,, Prof. A. Mendelson,

More information

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Kevin Hsieh Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, Onur Mutlu Machine Learning and Big

More information

Large Scale Distributed Deep Networks

Large Scale Distributed Deep Networks Large Scale Distributed Deep Networks Yifu Huang School of Computer Science, Fudan University huangyifu@fudan.edu.cn COMP630030 Data Intensive Computing Report, 2013 Yifu Huang (FDU CS) COMP630030 Report

More information

SOS : Software-based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs

SOS : Software-based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs SOS : Software-based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs Sangwook Shane Hahn, Sungjin Lee and Jihong Kim Computer Architecture & Embedded Systems Laboratory School of Computer

More information

NOHOST: A New Storage Architecture for Distributed Storage Systems. Chanwoo Chung

NOHOST: A New Storage Architecture for Distributed Storage Systems. Chanwoo Chung NOHOST: A New Storage Architecture for Distributed Storage Systems by Chanwoo Chung B.S., Seoul National University (2014) Submitted to the Department of Electrical Engineering and Computer Science in

More information

D E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager

D E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager 1 T HE D E N A L I N E X T - G E N E R A T I O N H I G H - D E N S I T Y S T O R A G E I N T E R F A C E Laura Caulfield Senior Software Engineer Arie van der Hoeven Principal Program Manager Outline Technology

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Persistent Memory Productization driven by AI & ML. Danny Sabour VP Marketing, Avalanche Technology

Persistent Memory Productization driven by AI & ML. Danny Sabour VP Marketing, Avalanche Technology Persistent Memory Productization driven by AI & ML Danny Sabour VP Marketing, Avalanche Technology Persistent Memory Usage from Cloud to Node CLOUD Compute Storage Deep Learning Training Big data processing

More information

COMP283-Lecture 3 Applied Database Management

COMP283-Lecture 3 Applied Database Management COMP283-Lecture 3 Applied Database Management Introduction DB Design Continued Disk Sizing Disk Types & Controllers DB Capacity 1 COMP283-Lecture 3 DB Storage: Linear Growth Disk space requirements increases

More information

Linear Regression Optimization

Linear Regression Optimization Gradient Descent Linear Regression Optimization Goal: Find w that minimizes f(w) f(w) = Xw y 2 2 Closed form solution exists Gradient Descent is iterative (Intuition: go downhill!) n w * w Scalar objective:

More information

CSCI-UA.0201 Computer Systems Organization Memory Hierarchy

CSCI-UA.0201 Computer Systems Organization Memory Hierarchy CSCI-UA.0201 Computer Systems Organization Memory Hierarchy Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Programmer s Wish List Memory Private Infinitely large Infinitely fast Non-volatile

More information

Memory Expansion Technology Using Software-Controlled SSD

Memory Expansion Technology Using Software-Controlled SSD Memory Expansion Technology Using Software-Controlled SSD S. Kazama*, S. Gokita*, S. Kuwamura*, E. Yoshida*, J. Ogawa*, Y. Honda** *Fujitsu Laboratories Ltd. **Fujitsu Ltd. Contact: sc-ssd-fms2017@ml.labs.fujitsu.com

More information

Flash-Conscious Cache Population for Enterprise Database Workloads

Flash-Conscious Cache Population for Enterprise Database Workloads IBM Research ADMS 214 1 st September 214 Flash-Conscious Cache Population for Enterprise Database Workloads Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clem Dickey,

More information

Next Generation Architecture for NVM Express SSD

Next Generation Architecture for NVM Express SSD Next Generation Architecture for NVM Express SSD Dan Mahoney CEO Fastor Systems Copyright 2014, PCI-SIG, All Rights Reserved 1 NVMExpress Key Characteristics Highest performance, lowest latency SSD interface

More information

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University NAND Flash-based Storage Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics NAND flash memory Flash Translation Layer (FTL) OS implications

More information

SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device

SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device Hyukjoong Kim 1, Dongkun Shin 1, Yun Ho Jeong 2 and Kyung Ho Kim 2 1 Samsung Electronics

More information

Learning with Purpose

Learning with Purpose Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts

More information

stec Host Cache Solution

stec Host Cache Solution White Paper stec Host Cache Solution EnhanceIO SSD Cache Software and the stec s1120 PCIe Accelerator speed decision support system (DSS) workloads and free up disk I/O resources for other applications.

More information

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture DCS-ctrl: A Fast and Flexible ice-control Mechanism for ice-centric Server Architecture Dongup Kwon 1, Jaehyung Ahn 2, Dongju Chae 2, Mohammadamin Ajdari 2, Jaewon Lee 1, Suheon Bae 1, Youngsok Kim 1,

More information

HEAD HardwarE Accelerated Deduplication

HEAD HardwarE Accelerated Deduplication HEAD HardwarE Accelerated Deduplication Final Report CS710 Computing Acceleration with FPGA December 9, 2016 Insu Jang Seikwon Kim Seonyoung Lee Executive Summary A-Z development of deduplication SW version

More information

Lightweight KV-based Distributed Store for Datacenters

Lightweight KV-based Distributed Store for Datacenters Lightweight KV-based Distributed Store for Datacenters Chanwoo Chung, Jinhyung Koo*, Arvind, and Sungjin Lee Massachusetts Institute of Technology (MIT) Daegu Gyeongbuk Institute of Science & Technology

More information

Flash Trends: Challenges and Future

Flash Trends: Challenges and Future Flash Trends: Challenges and Future John D. Davis work done at Microsoft Researcher- Silicon Valley in collaboration with Laura Caulfield*, Steve Swanson*, UCSD* 1 My Research Areas of Interest Flash characteristics

More information

High Performance SSD & Benefit for Server Application

High Performance SSD & Benefit for Server Application High Performance SSD & Benefit for Server Application AUG 12 th, 2008 Tony Park Marketing INDILINX Co., Ltd. 2008-08-20 1 HDD SATA 3Gbps Memory PCI-e 10G Eth 120MB/s 300MB/s 8GB/s 2GB/s 1GB/s SSD SATA

More information

IEEE TRANSACTIONS ON COMPUTERS, VOL. 66, NO. X, XXXXX SSD In-Storage Computing for Search Engines

IEEE TRANSACTIONS ON COMPUTERS, VOL. 66, NO. X, XXXXX SSD In-Storage Computing for Search Engines IEEE TRANSACTIONS ON COMPUTERS, VOL. 66, NO. X, XXXXX 2017 1 SSD In-Storage Computing for Search Engines Jianguo Wang, Dongchul Park, Yannis Papakonstantinou, and Steven Swanson Abstract SSD in-storage

More information

Workload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013

Workload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Workload Optimized Systems: The Wheel of Reincarnation Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Outline Definition Technology Minicomputers Prime Workstations Apollo Graphics

More information

CS 261 Fall Mike Lam, Professor. Memory

CS 261 Fall Mike Lam, Professor. Memory CS 261 Fall 2016 Mike Lam, Professor Memory Topics Memory hierarchy overview Storage technologies SRAM DRAM PROM / flash Disk storage Tape and network storage I/O architecture Storage trends Latency comparisons

More information

Implica(ons of Non Vola(le Memory on So5ware Architectures. Nisha Talagala Lead Architect, Fusion- io

Implica(ons of Non Vola(le Memory on So5ware Architectures. Nisha Talagala Lead Architect, Fusion- io Implica(ons of Non Vola(le Memory on So5ware Architectures Nisha Talagala Lead Architect, Fusion- io Overview Non Vola;le Memory Technology NVM in the Datacenter Op;mizing sobware for the iomemory Tier

More information

Bringing Intelligence to Enterprise Storage Drives

Bringing Intelligence to Enterprise Storage Drives Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely

More information

Linux Storage System Analysis for e.mmc With Command Queuing

Linux Storage System Analysis for e.mmc With Command Queuing Linux Storage System Analysis for e.mmc With Command Queuing Linux is a widely used embedded OS that also manages block devices such as e.mmc, UFS and SSD. Traditionally, advanced embedded systems have

More information

Key Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs

Key Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs IO 1 Today IO 2 Key Points CPU interface and interaction with IO IO devices The basic structure of the IO system (north bridge, south bridge, etc.) The key advantages of high speed serial lines. The benefits

More information

NVMe : Redefining the Hardware/Software Architecture

NVMe : Redefining the Hardware/Software Architecture NVMe : Redefining the Hardware/Software Architecture Jérôme Gaysse, IP-Maker Santa Clara, CA 1 NVMe Protocol How to implement the NVMe protocol? SW, HW/SW or HW? 2- NVMe command ready CPU 1-Host driver

More information

I/O. Disclaimer: some slides are adopted from book authors slides with permission 1

I/O. Disclaimer: some slides are adopted from book authors slides with permission 1 I/O Disclaimer: some slides are adopted from book authors slides with permission 1 Thrashing Recap A processes is busy swapping pages in and out Memory-mapped I/O map a file on disk onto the memory space

More information

Exploring System Challenges of Ultra-Low Latency Solid State Drives

Exploring System Challenges of Ultra-Low Latency Solid State Drives Exploring System Challenges of Ultra-Low Latency Solid State Drives Sungjoon Koh Changrim Lee, Miryeong Kwon, and Myoungsoo Jung Computer Architecture and Memory systems Lab Executive Summary Motivation.

More information

SSD Applications in the Enterprise Area

SSD Applications in the Enterprise Area SSD Applications in the Enterprise Area Tony Kim Samsung Semiconductor January 8, 2010 Outline Part I: SSD Market Outlook Application Trends Part II: Challenge of Enterprise MLC SSD Understanding SSD Lifetime

More information

It Takes Guts to be Great

It Takes Guts to be Great It Takes Guts to be Great Sean Stead, STEC Tutorial C-11: Enterprise SSDs Tues Aug 21, 2012 8:30 to 11:20AM 1 Who s Inside Your SSD? Full Data Path Protection Host Interface It s What s On The Inside That

More information