Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories
|
|
- Emory Holt
- 6 years ago
- Views:
Transcription
1 Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems Laboratory, Department of Computer Science and Engineering University of California, San Diego
2 The Future of Storage Hard Drives PCIe-Flash 2007 PCIe-NVM 2013? NVM Lat.: 7.1ms BW: 2.6MB/s 1x 1x 68us 250MB/s 104x 96x 12us 1.7GB/s 591x 669x = 2.89x/yr = 2.95x/yr *Random 4KB Reads from user space 2
3 Software Latency Costs Disk Flash PCM 1E+0 1E+1 1E+2 1E+3 1E+4 1E+5 1E+6 1E+7 Log Request Latency (us) File System Operating System Hardware 3
4 Architecting a High Performance SSD Hardware architecture and software layers limit performance HW/SW interface critical to good performance Careful co-design provides significant benefits Increased bandwidth Decreased latencies Increased concurrency 4
5 Overview Motivation Moneta Architecture Optimizing Moneta Performance Analysis Conclusion 5
6 Ring (4 GB/s) Moneta Architecture Host via PIO Request Queue Scoreboard Ring Control Transfer Buffers DMA Control 16GB PCM 16GB PCM 16GB PCM Tag Status Registers 16GB PCM Host via DMA 6
7 Moneta Architecture Overview Application File System OS IO Stack Moneta Driver PCIe 1.1 x8 (2GB/s Full Duplex) 7
8 Advanced Memory Technology Characteristics Work with any fast NVM: DRAM-like speed DRAM-like interface Phase Change Memory Coming soon Simple wear leveling: Start Gap [Micro 2009] 8
9 Moneta: Modeling Advanced NVMs Built on RAMP s BEE3 board PCIe 1.1 x8 host connection 250MHz design DDR2 DRAM emulates NVMs Adjust timings to match PCM RAS-CAS Delay for reads Precharge latency for writes Read Write Projected Latency 48 ns 150 ns Latency projections from [B.C. Lee, ISCA 09] 9
10 Overview Motivation Moneta Architecture Optimizing Moneta Interface & Software Microarchitecture Performance Analysis Conclusion 10
11 Latency (us) Optimized Software is Critical Baseline Latency (4KB) Hardware: 8.2 us Software: 13.4 us Optimize hardware and software Hardware costs PCM Ring DMA Wait Interrupt Issue Copy 5 Schedule OS/User 0 Base 11
12 Latency (us) Removing the IO Scheduler Reduces executed code IO Sched code Kernel request queuing Improves concurrency Requests not serialized through IO scheduler Multiple threads in driver 10% SW latency savings PCM Ring DMA Wait Interrupt Issue Copy Schedule OS/User 12
13 Lock Free Tag Pool Compare-and-swap operations Tag indexed data structures Use processor ID as hint for where to search in tag structure Reduces concurrency conflicts Reduces cache-line misses 13
14 Co-design HW/SW Interface Baseline uses multiple PIO writes to send command Reduce command word to 64 bits Allows a single PIO write to issue a request No need for locks to protect request issue Remove DMA address from command Pre-allocate buffers during initialization Static buffer address for each tag 14
15 Latency (us) Atomic Operations Lock-Free Data Structures Atomic HW/SW Interface Increased concurrency 10% less latency vs. NoSched PCM Ring DMA Wait Interrupt Issue Copy Schedule OS/User 15
16 Latency (us) Add Spin-Waits Spin-waits vs. sleeping Spin for < 4KB requests Sleep for larger requests 5us of software latency 54% less SW vs. Atomic 62% vs. Base PCM Ring DMA Wait Interrupt Issue Copy Schedule OS/User 16
17 Bandwidth (GB/s) Moneta Bandwidth Random Write Accesses 1M IOPS Access Size (KB) Base Base NoSched Base Base NoSched Atomic NoSched Ideal Atomic SpinWait Ideal Ideal Ideal 17
18 CPUs / GB/s CPU Utilization Random 4KB Read/Write Accesses Baseline NoSched Atomic Spin 18
19 Zero-Copy Trade-off multiple PIO writes and zero-copy IO Currently faster for us to copy in the driver than it is to write multiple words to the hardware to issue a request 128-bit or cache-line sized PIO writes needed 19
20 Overview Motivation Moneta Architecture Optimizing Moneta Interface & Software Microarchitecture Performance Analysis Conclusion 20
21 Bandwidth (GB/s) Balancing Bandwidth Usage Full duplex PCIe should see better R/W BW Smarter HW request scheduling = more BW Two request queues: one for reads, one for writes Alternate between Qs on each buffer allocation 4.0 Random R/W Accesses Access Size (KB) Single Queue Ideal 21
22 Ring (4 GB/s) Moneta Architecture Changes Host via PIO Read Request Queue Write Request Queue Scoreboard Ring Control Transfer Buffers DMA Control 16GB PCM 16GB PCM 16GB PCM Tag Status Registers 16GB PCM Host via DMA 22
23 Bandwidth (GB/s) Balancing Bandwidth Usage Full duplex PCIe should see better R/W BW Smarter HW request scheduling = more BW Two request queues: one for reads, one for writes Alternate between Qs on 0.5 each buffer allocation 0.0 Random R/W Accesses Access Size (KB) Single Queue RW Queues Ideal 23
24 Round-Robin Scheduling Prevent requests from starving other requests Allocate a buffer and then put request at back of queue Attains much better small request throughput in the presence of large requests 12x improvement in small request BW 24
25 Ring (4 GB/s) Moneta Architecture Changes Host via PIO Read Request Queue Write Request Queue Scoreboard Ring Control Transfer Buffers DMA Control 16GB PCM 16GB PCM 16GB PCM Tag Status Registers 16GB PCM Host via DMA 25
26 4ns 64ns 256ns 1us 4us 16us 64us 128us Bandwidth (GB/s) Effects of Memory Latency Moneta tolerates memory latencies up to 1us without BW loss Increased parallelism hides extra memory latency Memory Latency 8 Controllers 4 Controllers 2 Controllers 1 Controller 26
27 NVMs for Storage vs DRAM Replacement Write coalescing Storage must guarantee durability by closing row DRAM leaves row open to enable coalescing Row buffer size should match access size Large accesses for storage Cache-line sized for memory Peak memory activity limited by PCIe BW Storage and DRAM replacement are different and should be optimized differently 27
28 Overview Motivation Moneta Architecture Optimizing Moneta Performance Analysis Conclusion 28
29 System Overview Moneta Memory and Device Interconnect Capacity Fusion-IO IODrive PCIe 4x 80GB SLC NAND Flash SW RAID-0 PCIe 4x SATA 2 Controller 128GB Disk HW RAID-0 PCIe 4x RAID Controller 4TB Moneta PCIe 8x 64GB Moneta-4x PCIe 4x 64GB 29
30 Workloads Name Footprint Description Basic IO Benchmarks XDD NoFS 64 GB Low-level IO performance without file system XDD XFS 64 GB XFS file system performance Database Applications Berkeley-DB Btree 16 GB Transactional updates to btree key/value store Berkeley-DB HashTable 16 GB Transactional updates to hash table key/value store BiologicalNetworks 35 GB Biological database queried for properties of genes and biological-networks PTF 50 GB Palomar Transient Factory sky survey queries Memory-hungry Applications DGEMM 21 GB Matrix multiply with 30,000 x 30,000 matrices NAS Parallel Benchmarks 8-35 GB 7 apps from NPB suite modeling scientific workloads 30
31 Bandwidth (GB/s) XDD Bandwidth Comparison 50/50 Read/Write Random Accesses Disk SSD FusionIO Moneta Moneta-4x Access Size (KB) 31
32 Bandwidth (GB/s) Bandwidth w/o File System Read Write Read Write Large Accesses (4MB) Small Accesses (4KB) Moneta FusionIO SSD-RAID DISK-RAID 32
33 Bandwidth (GB/s) Bandwidth with XFS Read Write Read Write Raw IO Large 4MB Accesses (4MB) Small 4KB Accesses (4KB) Moneta FusionIO SSD-RAID DISK-RAID 33
34 Log Speedup vs DISK-RAID Database & App Performance An Opportunity for Leveraging Hardware Optimizations XDD 4KB RW HashTable PTF Btree Bio XDD 4KB RW DGEMM lu bt sp is 34
35 Conclusion Fast advanced NVMs are coming soon We built Moneta to understand the impact of NVMs Found that the interface and microarchitecture are both critical to getting excellent performance Many opportunities to move software optimizations into hardware Many open questions exist about the architecture of fast SSDs and the systems they interface with 35
36 Thank You! Any Questions? 36
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More informationOnyx: A Prototype Phase-Change Memory Storage Array
Onyx: A Prototype Phase-Change Memory Storage Array Ameen Akel * Adrian Caulfield, Todor Mollov, Rajesh Gupta, Steven Swanson Non-Volatile Systems Laboratory, Department of Computer Science and Engineering
More informationRedrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories
Redrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories Steven Swanson Director, Non- Vola;le System Laboratory Computer Science and Engineering University of California, San
More informationOnyx: A Protoype Phase Change Memory Storage Array
Onyx: A Protoype Phase Change ory Storage Array Ameen Akel Adrian M. Caulfield Todor I. Mollov Rajesh K. Gupta Steven Swanson Computer Science and Engineering University of California, San Diego Abstract
More informationNear- Data Computa.on: It s Not (Just) About Performance
Near- Data Computa.on: It s Not (Just) About Performance Steven Swanson Non- Vola0le Systems Laboratory Computer Science and Engineering University of California, San Diego 1 Solid State Memories NAND
More informationProviding Safe, User Space Access to Fast, Solid State Disks
Providing Safe, User Space Access to Fast, Solid State Disks Adrian M. Caulfield Todor I. Mollov Louis Alex Eisner Arup De Joel Coburn Steven Swanson Computer Science and Engineering Department University
More informationNew Abstractions for Fast Non-Volatile Storage
New Abstractions for Fast Non-Volatile Storage Joel Coburn, Adrian Caulfield, Laura Grupp, Ameen Akel, Steven Swanson Non-volatile Systems Laboratory Department of Computer Science and Engineering University
More informationNVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory
NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)
More informationWillow: A User- Programmable SSD
Willow: A User- Programmable SSD Sudharsan Seshadri, Mark Gahagan, Sundaram Bhaskaran, Trevor Bunker, Arup De, Yanqin Jin, Yang Liu, and Steven Swanson Non- VolaDle Systems Laboratory Computer Science
More informationComparing Performance of Solid State Devices and Mechanical Disks
Comparing Performance of Solid State Devices and Mechanical Disks Jiri Simsa Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University Motivation Performance gap [Pugh71] technology
More informationVirtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili
Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed
More informationBeyond Block I/O: Rethinking
Beyond Block I/O: Rethinking Traditional Storage Primitives Xiangyong Ouyang *, David Nellans, Robert Wipfel, David idflynn, D. K. Panda * * The Ohio State University Fusion io Agenda Introduction and
More informationAccelerate Applications Using EqualLogic Arrays with directcache
Accelerate Applications Using EqualLogic Arrays with directcache Abstract This paper demonstrates how combining Fusion iomemory products with directcache software in host servers significantly improves
More informationPCIe Storage Beyond SSDs
PCIe Storage Beyond SSDs Fabian Trumper NVM Solutions Group PMC-Sierra Santa Clara, CA 1 Classic Memory / Storage Hierarchy FAST, VOLATILE CPU Cache DRAM Performance Gap Performance Tier (SSDs) SLOW, NON-VOLATILE
More informationFlash Trends: Challenges and Future
Flash Trends: Challenges and Future John D. Davis work done at Microsoft Researcher- Silicon Valley in collaboration with Laura Caulfield*, Steve Swanson*, UCSD* 1 My Research Areas of Interest Flash characteristics
More informationOperating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)
2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:
More informationChapter 6. Storage and Other I/O Topics
Chapter 6 Storage and Other I/O Topics Introduction I/O devices can be characterized by Behaviour: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections
More informationUsing Transparent Compression to Improve SSD-based I/O Caches
Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationStorage. Hwansoo Han
Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics
More informationRobert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann
Using Flash SSDs as Pi Primary Database Storage Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann {lastname}@dvs.tu-darmstadt.de Fachgebiet DVS Ilia Petrov 1 Flash SSDs,
More informationDynamic Interval Polling and Pipelined Post I/O Processing for Low-Latency Storage Class Memory
Dynamic Interval Polling and Pipelined Post I/O Processing for Low-Latency Storage Class Memory Dong In Shin, Young Jin Yu, Hyeong S. Kim, Jae Woo Choi, Do Yung Jung, Heon Y. Yeom Taejin Infotech, Seoul
More informationComputer Organization and Structure. Bing-Yu Chen National Taiwan University
Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices
More informationJanuary 28-29, 2014 San Jose
January 28-29, 2014 San Jose Flash for the Future Software Optimizations for Non Volatile Memory Nisha Talagala, Lead Architect, Fusion-io Gary Orenstein, Chief Marketing Officer, Fusion-io @garyorenstein
More informationAzor: Using Two-level Block Selection to Improve SSD-based I/O caches
Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas {klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr
More informationCSCI-GA Database Systems Lecture 8: Physical Schema: Storage
CSCI-GA.2433-001 Database Systems Lecture 8: Physical Schema: Storage Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com View 1 View 2 View 3 Conceptual Schema Physical Schema 1. Create a
More informationExtending the NVMHCI Standard to Enterprise
Extending the NVMHCI Standard to Enterprise Amber Huffman Principal Engineer Intel Corporation August 2009 1 Outline Remember: What is NVMHCI PCIe SSDs Coming with Challenges Enterprise Extensions to NVMHCI
More informationEnabling NVMe I/O Scale
Enabling NVMe I/O Determinism @ Scale Chris Petersen, Hardware System Technologist Wei Zhang, Software Engineer Alexei Naberezhnov, Software Engineer Facebook Facebook @ Scale 800 Million 1.3 Billion 2.2
More informationInterface Trends for the Enterprise I/O Highway
Interface Trends for the Enterprise I/O Highway Mitchell Abbey Product Line Manager Enterprise SSD August 2012 1 Enterprise SSD Market Update One Size Does Not Fit All : Storage solutions will be tiered
More informationExtremely Fast Distributed Storage for Cloud Service Providers
Solution brief Intel Storage Builders StorPool Storage Intel SSD DC S3510 Series Intel Xeon Processor E3 and E5 Families Intel Ethernet Converged Network Adapter X710 Family Extremely Fast Distributed
More informationKey Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs
IO 1 Today IO 2 Key Points CPU interface and interaction with IO IO devices The basic structure of the IO system (north bridge, south bridge, etc.) The key advantages of high speed serial lines. The benefits
More informationThe Non-Volatile Memory Verbs Provider (NVP): Using the OFED Framework to access solid state storage
The Non-Volatile Memory Verbs Provider (NVP): Using the OFED Framework to access solid state storage Bernard Metzler 1, Animesh Trivedi 1, Lars Schneidenbach 2, Michele Franceschini 2, Patrick Stuedi 1,
More informationExploring System Challenges of Ultra-Low Latency Solid State Drives
Exploring System Challenges of Ultra-Low Latency Solid State Drives Sungjoon Koh Changrim Lee, Miryeong Kwon, and Myoungsoo Jung Computer Architecture and Memory systems Lab Executive Summary Motivation.
More informationSummarizer: Trading Communication with Computing Near Storage
Summarizer: Trading Communication with Computing Near Storage Gunjae Koo*, Kiran Kumar Matam*, Te I, H.V. Krishina Giri Nara*, Jing Li, Hung-Wei Tseng, Steven Swanson, Murali Annavaram* *University of
More informationImplica(ons of Non Vola(le Memory on So5ware Architectures. Nisha Talagala Lead Architect, Fusion- io
Implica(ons of Non Vola(le Memory on So5ware Architectures Nisha Talagala Lead Architect, Fusion- io Overview Non Vola;le Memory Technology NVM in the Datacenter Op;mizing sobware for the iomemory Tier
More informationBzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory
BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory JOY ARULRAJ JUSTIN LEVANDOSKI UMAR FAROOQ MINHAS PER-AKE LARSON Microsoft Research NON-VOLATILE MEMORY [NVM] PERFORMANCE DRAM VOLATILE
More informationComputer Architecture Computer Science & Engineering. Chapter 6. Storage and Other I/O Topics BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 6 Storage and Other I/O Topics Introduction I/O devices can be characterized by Behaviour: input, output, storage Partner: human or machine
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationMANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16
MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16 THE DATA CHALLENGE Performance Improvement (RelaLve) 4.4 ZB Total data created, replicated, and consumed in a single year
More informationUCS Invicta: A New Generation of Storage Performance. Mazen Abou Najm DC Consulting Systems Engineer
UCS Invicta: A New Generation of Storage Performance Mazen Abou Najm DC Consulting Systems Engineer HDDs Aren t Designed For High Performance Disk 101 Can t spin faster (200 IOPS/Drive) Can t seek faster
More informationLinux Storage System Bottleneck Exploration
Linux Storage System Bottleneck Exploration Bean Huo / Zoltan Szubbocsev Beanhuo@micron.com / zszubbocsev@micron.com 215 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationAccelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740
Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 A performance study with NVDIMM-N Dell EMC Engineering September 2017 A Dell EMC document category Revisions Date
More informationReadings. Storage Hierarchy III: I/O System. I/O (Disk) Performance. I/O Device Characteristics. often boring, but still quite important
Storage Hierarchy III: I/O System Readings reg I$ D$ L2 L3 memory disk (swap) often boring, but still quite important ostensibly about general I/O, mainly about disks performance: latency & throughput
More informationNext Generation Architecture for NVM Express SSD
Next Generation Architecture for NVM Express SSD Dan Mahoney CEO Fastor Systems Copyright 2014, PCI-SIG, All Rights Reserved 1 NVMExpress Key Characteristics Highest performance, lowest latency SSD interface
More informationDongjun Shin Samsung Electronics
2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer
More informationExploiting the benefits of native programming access to NVM devices
Exploiting the benefits of native programming access to NVM devices Ashish Batwara Principal Storage Architect Fusion-io Traditional Storage Stack User space Application Kernel space Filesystem LBA Block
More informationZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency
ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationReliability, Availability, Serviceability (RAS) and Management for Non-Volatile Memory Storage
Reliability, Availability, Serviceability (RAS) and Management for Non-Volatile Memory Storage Mohan J. Kumar, Intel Corp Sammy Nachimuthu, Intel Corp Dimitris Ziakas, Intel Corp August 2015 1 Agenda NVDIMM
More informationDesigning a True Direct-Access File System with DevFS
Designing a True Direct-Access File System with DevFS Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin-Madison Yuangang Wang, Jun Xu, Gopinath Palani Huawei Technologies
More informationBlurred Persistence in Transactional Persistent Memory
Blurred Persistence in Transactional Persistent Memory Youyou Lu, Jiwu Shu, Long Sun Tsinghua University Overview Problem: high performance overhead in ensuring storage consistency of persistent memory
More informationDeveloping Low Latency NVMe Systems for HyperscaleData Centers. Prepared by Engling Yeo Santa Clara, CA Date: 08/04/2017
Developing Low Latency NVMe Systems for HyperscaleData Centers Prepared by Engling Yeo Santa Clara, CA 95054 Date: 08/04/2017 Quality of Service IOPS, Throughput, Latency Short predictable read latencies
More informationHow Next Generation NV Technology Affects Storage Stacks and Architectures
How Next Generation NV Technology Affects Storage Stacks and Architectures Marty Czekalski, Interface and Emerging Architecture Program Manager, Seagate Technology Flash Memory Summit 2013 Santa Clara,
More informationSolid State Storage Technologies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Solid State Storage Technologies Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu NVMe (1) The industry standard interface for high-performance NVM
More informationEnd-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet
Hot Interconnects 2014 End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet Green Platform Research Laboratories, NEC, Japan J. Suzuki, Y. Hayashi, M. Kan, S. Miyakawa,
More informationToward a Memory-centric Architecture
Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains
More informationHP visoko-performantna OLTP rješenja
HP visoko-performantna OLTP rješenja Tomislav Alpeza Presales Consultant, BCS/SD 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Performance
More informationMarch NVM Solutions Group
March 2017 NVM Solutions Group Ideally one would desire an indefinitely large memory capacity such that any particular word would be immediately available. It does not seem possible physically to achieve
More informationImpact of Cache Coherence Protocols on the Processing of Network Traffic
Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background
More informationComputer Systems Laboratory Sungkyunkwan University
I/O System Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Introduction (1) I/O devices can be characterized by Behavior: input, output, storage
More informationEECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun
EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,
More informationBuilding an All Flash Server What s the big deal? Isn t it all just plug and play?
Building an All Flash Server What s the big deal? Isn t it all just plug and play? Doug Rollins Micron Technology Santa Clara, CA 1 What we ll cover Industry Secrets (shhhhh. ) Example Platform Key features
More informationAerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song
Aerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song Outline 1. Storage-Class Memory (SCM) 2. Motivation 3. Design of Aerie 4. File System Features
More informationFalcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University
Falcon: Scaling IO Performance in Multi-SSD Volumes Pradeep Kumar H Howie Huang The George Washington University SSDs in Big Data Applications Recent trends advocate using many SSDs for higher throughput
More informationNVM PCIe Networked Flash Storage
NVM PCIe Networked Flash Storage Peter Onufryk Microsemi Corporation Santa Clara, CA 1 PCI Express (PCIe) Mid-range/High-end Specification defined by PCI-SIG Software compatible with PCI and PCI-X Reliable,
More informationIBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?
FlashSystem Family 2015 IBM FlashSystem IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein? PiRT - Power i Round Table 17 Sep. 2015 Daniel Gysin IBM
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationHard Disk Drives. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)
Hard Disk Drives Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Storage Stack in the OS Application Virtual file system Concrete file system Generic block layer Driver Disk drive Build
More informationSystem and Algorithmic Adaptation for Flash
System and Algorithmic Adaptation for Flash The FAWN Perspective David G. Andersen, Vijay Vasudevan, Michael Kaminsky* Amar Phanishayee, Jason Franklin, Iulian Moraru, Lawrence Tan Carnegie Mellon University
More informationStorage. CS 3410 Computer System Organization & Programming
Storage CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Deniz Altinbuke, Kevin Walsh, and Professors Weatherspoon, Bala, Bracy, and
More informationEvolving Solid State Storage. Bob Beauchamp EMC Distinguished Engineer
Evolving Solid State Storage Bob Beauchamp EMC Distinguished Engineer Macro trends Scale-out and multi-core processors Flash hitting its stride but at a cliff? Looming DRAM challenge The rise of low-latency
More informationCapriccio : Scalable Threads for Internet Services
Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate
More informationBig and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationCold Storage: The Road to Enterprise Ilya Kuznetsov YADRO
Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Agenda Technical challenge Custom product Growth of aspirations Enterprise requirements Making an enterprise cold storage product 2 Technical Challenge
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationEfficient Memory Mapped File I/O for In-Memory File Systems. Jungsik Choi, Jiwon Kim, Hwansoo Han
Efficient Memory Mapped File I/O for In-Memory File Systems Jungsik Choi, Jiwon Kim, Hwansoo Han Operations Per Second Storage Latency Close to DRAM SATA/SAS Flash SSD (~00μs) PCIe Flash SSD (~60 μs) D-XPoint
More informationDesign and Architecture of Dell Acceleration Appliances for Database (DAAD): A Practical Approach with High Availability Guaranteed
Design and Architecture of Dell Acceleration Appliances for Database (DAAD): A Practical Approach with High Availability Guaranteed Kai Yu, Yuxiang Gao Dell Global Solutions Engineering Group Peng Zhang,
More informationI/O. Disclaimer: some slides are adopted from book authors slides with permission 1
I/O Disclaimer: some slides are adopted from book authors slides with permission 1 Thrashing Recap A processes is busy swapping pages in and out Memory-mapped I/O map a file on disk onto the memory space
More informationSUPERTALENT ULTRADRIVE GX(SLC)/GX(MLC) PERFORMANCE ANALYSIS WHITEPAPER
SUPERTALENT ULTRADRIVE GX(SLC)/GX(MLC) PERFORMANCE ANALYSIS WHITEPAPER 2.5 SATA-II SOLID STATE DRIVE Copyright Informa on: Copyright, Super Talent Technology. All rights reserved. Rev 1.3 March 2009 Page
More informationData Storage and Query Answering. Data Storage and Disk Structure (2)
Data Storage and Query Answering Data Storage and Disk Structure (2) Review: The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM @200MHz) 6,400
More informationToward SLO Complying SSDs Through OPS Isolation
Toward SLO Complying SSDs Through OPS Isolation October 23, 2015 Hongik University UNIST (Ulsan National Institute of Science & Technology) Sam H. Noh 1 Outline Part 1: FAST 2015 Part 2: Beyond FAST 2
More informationCSCI-UA.0201 Computer Systems Organization Memory Hierarchy
CSCI-UA.0201 Computer Systems Organization Memory Hierarchy Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Programmer s Wish List Memory Private Infinitely large Infinitely fast Non-volatile
More informationOptimizing Quality of Service with SAP HANA on Power Rapid Cold Start
Optimizing Quality of Service with SAP HANA on Power Rapid Cold Start How SAP HANA on Power with Rapid Cold Start helps clients quickly restore business-critical operations Contents 1 About this document
More informationEnhancing SSD Control of NVMe Devices for Hyperscale Applications. Luca Bert - Seagate Chris Petersen - Facebook
Enhancing SSD Control of NVMe Devices for Hyperscale Applications Luca Bert - Seagate Chris Petersen - Facebook Agenda Introduction & overview (Luca) Problem statement & proposed solution (Chris) SSD implication
More informationField Testing Buffer Pool Extension and In-Memory OLTP Features in SQL Server 2014
Field Testing Buffer Pool Extension and In-Memory OLTP Features in SQL Server 2014 Rick Heiges, SQL MVP Sr Solutions Architect Scalability Experts Ross LoForte - SQL Technology Architect - Microsoft Changing
More informationBuilding a High IOPS Flash Array: A Software-Defined Approach
Building a High IOPS Flash Array: A Software-Defined Approach Weafon Tsao Ph.D. VP of R&D Division, AccelStor, Inc. Santa Clara, CA Clarification Myth 1: S High-IOPS SSDs = High-IOPS All-Flash Array SSDs
More informationA Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs?
1 A Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs? Bradley C. Kuszmaul MIT CSAIL, & Tokutek 3 iibench - SSD Insert Test 25 2 Rows/Second 15 1 5 2,, 4,,
More informationBlock Device Scheduling. Don Porter CSE 506
Block Device Scheduling Don Porter CSE 506 Logical Diagram Binary Formats Memory Allocators System Calls Threads User Kernel RCU File System Networking Sync Memory Management Device Drivers CPU Scheduler
More informationBlock Device Scheduling
Logical Diagram Block Device Scheduling Don Porter CSE 506 Binary Formats RCU Memory Management File System Memory Allocators System Calls Device Drivers Interrupts Net Networking Threads Sync User Kernel
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationLinux Software RAID Level 0 Technique for High Performance Computing by using PCI-Express based SSD
Linux Software RAID Level Technique for High Performance Computing by using PCI-Express based SSD Jae Gi Son, Taegyeong Kim, Kuk Jin Jang, *Hyedong Jung Department of Industrial Convergence, Korea Electronics
More informationThe Datacentered Future Greg Huff CTO, LSI Corporation
The Datacentered Future Greg Huff CTO, LSI Corporation 1 Tremendous Growth in Connected Data Sources, Consumption Devices, and Services 2 Nearly limitless data depth and breadth needed Execution of millions
More informationFlash Storage Trends & Ecosystem
Flash Storage Trends & Ecosystem Hung Vuong Qualcomm Inc. Introduction Trends Agenda Wireless Industry Trends Memory & Storage Trends Opportunities Summary Cellular Products Group (CPG) Wireless Handsets
More informationEvaluation of Intel Memory Drive Technology Performance for Scientific Applications
Evaluation of Intel Memory Drive Technology Performance for Scientific Applications Vladimir Mironov, Andrey Kudryavtsev, Yuri Alexeev, Alexander Moskovsky, Igor Kulikov, and Igor Chernykh Introducing
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More information88X + PERFORMANCE GAINS USING IBM DB2 WITH BLU ACCELERATION ON INTEL TECHNOLOGY
05.11.2013 Thomas Kalb 88X + PERFORMANCE GAINS USING IBM DB2 WITH BLU ACCELERATION ON INTEL TECHNOLOGY Copyright 2013 ITGAIN GmbH 1 About ITGAIN Founded as a DB2 Consulting Company into 2001 DB2 Monitor
More informationFlash Controller Solutions in Programmable Technology
Flash Controller Solutions in Programmable Technology David McIntyre Senior Business Unit Manager Computer and Storage Business Unit Altera Corp. dmcintyr@altera.com Flash Memory Summit 2012 Santa Clara,
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationHigh-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han
High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background
More informationEFFICIENTLY ENABLING CONVENTIONAL BLOCK SIZES FOR VERY LARGE DIE- STACKED DRAM CACHES
EFFICIENTLY ENABLING CONVENTIONAL BLOCK SIZES FOR VERY LARGE DIE- STACKED DRAM CACHES MICRO 2011 @ Porte Alegre, Brazil Gabriel H. Loh [1] and Mark D. Hill [2][1] December 2011 [1] AMD Research [2] University
More information