Moneta: A High-performance Storage Array Architecture for Next-generation, Non-volatile Memories, Micro 2010

Transcription:

Moneta: A High-performance Storage Array Architecture for Next-generation, Non-volatile Memories (Micro 2010)

NVM-based SSDs
- NVMs are replacing spinning disks
  - The performance of disks has lagged
  - NAND flash has shown increased performance
  - Emerging NVMs (e.g., Phase Change Memory) have even more potential
- The legacy of disk-based storage systems inhibits that potential
  - This legacy assumes that storage is slow
  - H/W interfaces are sluggish: SATA and SAS are too slow for faster NVMs
  - S/W limits performance: the I/O stack overhead is large; under Linux, issuing and completing a 4KB I/O takes about 20,000 instructions

Architecting a High-Performance SSD
- This slide is only for showing the importance of S/W optimization for NVM-based SSDs; the actual latency breakdown should be shown and explained in more detail
- This paper is about architecting a high-performance SSD, not about NVM optimization
  - (1) The S/W layers that limit SSD performance should be optimized
  - (2) The HW/SW interface is critical to good performance
- Careful co-design provides significant benefits
- The Moneta prototype allows us to explore the design space

Moneta Prototype
- Four memory controllers
  - PCMs are spread across the four controllers
  - Connected by a ring network
- A scheduler
  - Coordinates data transfers
  - Interfaces to the host
  - Buffers for incoming/outgoing data
  - Several state machines
  - Interface to the ring network
- Communication with the host
  - x8 PCIe 1.1 (2GB/s per direction)
  - Programmed I/O (PIO) for I/O requests
  - A DMA controller for data

Moneta Implementation
- Four Xilinx Virtex 5 FPGAs
  - Each implements a memory controller
  - The ring network runs over FPGA-to-FPGA links
  - One FPGA also implements the Moneta scheduler and the PCIe interface
- DDR2 DRAM emulates the PCMs (modeled)
  - Projected latency: read (48ns) and write (150ns)
  - Same 64-bit interface

I/O Request through Moneta (see the sketch below)
(1) From the host, a request arrives at the request queue as a PIO write
(2) A request = three 64-bit words (sector address, data memory address, transfer length, tag, op-code, ...)
(3) The scheduler allocates a buffer for the request's DMA data transfer
(4) (Write) The scheduler triggers a DMA transfer from the host's memory to the buffer
(5) The scheduler checks the scoreboard to determine which memory controller has space to receive data
(6) Once the transfer completes, the device raises an interrupt and sets a tag status bit
(7) The OS processes the interrupt and completes the request by reading and clearing the status bit using PIO operations
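The host-side half of this flow can be pictured with a short C sketch. The register offsets, field layout, and helper names below are illustrative assumptions, not Moneta's actual programming interface; only the overall sequence (three PIO writes to issue, an interrupt plus a tag status bit to complete) comes from the slide.

```c
/* Hypothetical sketch of the host-side issue/complete path described above.
 * Register offsets, names, and the exact request encoding are illustrative. */
#include <stdint.h>

#define REQ_QUEUE_OFF   0x000   /* PIO request queue (assumed offset) */
#define TAG_STATUS_OFF  0x100   /* per-tag completion status bits (assumed offset) */

struct moneta_req {             /* three 64-bit words, as on the slide */
    uint64_t sector_addr;       /* storage (sector) address */
    uint64_t host_dma_addr;     /* host memory address for DMA */
    uint64_t len_tag_op;        /* transfer length, tag, op-code packed together */
};

static void issue_request(volatile uint64_t *bar, const struct moneta_req *r)
{
    /* (1)-(2) Three PIO writes deliver the request to the device queue. */
    bar[REQ_QUEUE_OFF / 8 + 0] = r->sector_addr;
    bar[REQ_QUEUE_OFF / 8 + 1] = r->host_dma_addr;
    bar[REQ_QUEUE_OFF / 8 + 2] = r->len_tag_op;
    /* (3)-(6) happen on the device: buffer allocation, DMA, scoreboard
     * lookup, then an interrupt plus a tag status bit on completion. */
}

static void complete_request(volatile uint64_t *bar, unsigned tag)
{
    /* (7) The interrupt handler reads and clears the tag's status bit via PIO
     * (clear-by-write semantics are an assumption). */
    uint64_t status = bar[TAG_STATUS_OFF / 8];
    bar[TAG_STATUS_OFF / 8] = status & ~(1ULL << tag);
}
```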

Baseline Performance
- 4KB I/O request latency: 21us = HW 8us + SW 13us
- Bandwidth of random reads under varying access sizes
- The Moneta baseline shows good performance, but there is room for improvement on smaller requests (more optimization is needed, particularly in S/W)

Removing the I/O Scheduler
- The Linux I/O scheduler tries to sort and merge requests
  - This reduces latency for disks with non-uniform access latencies
  - For Moneta, it just adds software overhead
- Moneta+NoSched bypasses the scheduler completely (see the sketch below)
  - For 4KB requests: -2us latency and 4x bandwidth
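As a rough illustration of what "bypassing the scheduler completely" can mean on the host side, a driver of that era could install its own make_request function so that bios never pass through the elevator. This is a hedged sketch against the ~2.6.x block-layer API of the time; the moneta_* names are hypothetical and this is not the paper's actual driver code.

```c
/* Hedged sketch: route bios straight to the device, skipping the I/O
 * scheduler, by supplying a custom make_request function (~2.6.x API). */
#include <linux/blkdev.h>
#include <linux/bio.h>

extern void moneta_issue_bio(struct bio *bio);  /* hypothetical hardware issue path */

static int moneta_make_request(struct request_queue *q, struct bio *bio)
{
    moneta_issue_bio(bio);      /* hand the bio to the device immediately */
    return 0;                   /* no queueing, no sorting, no merging */
}

static struct request_queue *moneta_init_queue(void)
{
    struct request_queue *q = blk_alloc_queue(GFP_KERNEL);
    if (!q)
        return NULL;
    /* Install our make_request hook; the elevator is never involved. */
    blk_queue_make_request(q, moneta_make_request);
    return q;
}
```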

Atomic Operations
- The baseline Moneta interface requires several PIO writes to issue an I/O request
  - Three 64-bit writes carry all the information
- Moneta+Atomic reduces this to a single write
  - The request fits into 64 bits: 8 for tag, 8 for command, 16 for length, 32 for address (see the sketch below)
  - The DMA address field is removed entirely; a static DMA address is pre-allocated for each tag
- Additionally, concurrency increases because no lock is needed to protect request issue
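A minimal sketch of this 64-bit encoding follows. The field widths (8-bit tag, 8-bit command, 16-bit length, 32-bit address) come from the slide; the field order and shift positions are assumptions.

```c
/* Pack one request into a single 64-bit word so it can be issued with one
 * atomic PIO write (field ordering is illustrative). */
#include <stdint.h>

static inline uint64_t moneta_pack_request(uint8_t tag, uint8_t cmd,
                                           uint16_t len, uint32_t addr)
{
    return ((uint64_t)tag  << 56) |   /* 8-bit tag    */
           ((uint64_t)cmd  << 48) |   /* 8-bit command */
           ((uint64_t)len  << 32) |   /* 16-bit length */
            (uint64_t)addr;           /* 32-bit address */
}

/* Issue becomes one 64-bit store; no lock is needed because the store cannot
 * interleave with another thread's request. The DMA address is implied by the
 * tag, which indexes a pre-allocated buffer. */
static inline void moneta_issue(volatile uint64_t *req_reg, uint64_t req)
{
    *req_reg = req;
}
```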

Atomic Operations
- Moneta+Atomic over Moneta+NoSched
  - Latency reduction (-1us)
  - Bandwidth increase (+460MB/s)
- The big gain in bandwidth comes from the increased concurrency

Adding Spin-waits (-6us, +250MB/s)
- Interrupt-driven request completion requires a context switch to wake up the thread that issued the request
  - This adds latency for small requests
- Moneta+Spin lets the thread spin in a loop rather than sleep
  - Spin-waits for small requests, interrupts for large requests (see the sketch below)
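A minimal sketch of this completion policy, assuming a hypothetical size cutoff and helper functions:

```c
/* Spin on the tag status bit for small requests; sleep and rely on the
 * interrupt for large ones. The cutoff and helper names are illustrative. */
#include <stdbool.h>
#include <stddef.h>

#define SPIN_THRESHOLD (8 * 1024)            /* assumed size cutoff */

extern bool tag_done(unsigned tag);              /* reads the tag status bit (PIO) */
extern void sleep_until_interrupt(unsigned tag); /* blocks; the ISR wakes the thread */

static void wait_for_completion_policy(unsigned tag, size_t req_bytes)
{
    if (req_bytes <= SPIN_THRESHOLD) {
        /* Small request: spinning avoids the context switch the interrupt
         * path would cost. */
        while (!tag_done(tag))
            ;  /* busy-wait */
    } else {
        /* Large request: transfer time dwarfs the wake-up cost, so sleep. */
        sleep_until_interrupt(tag);
    }
}
```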

Balancing BW Usage
- Moneta+Spin nearly saturates the link for read-only and write-only workloads
- With a mix of 50% reads and 50% writes, the ideal bandwidth doubles (2GB/s → 4GB/s)
  - The PCIe link is full duplex (2GB/s in each direction)
- Moneta+Spin is still far from this ideal performance

Balancing BW Usage
- With a single queue, reads and writes block each other
- Moneta+2Q maintains one queue for reads and another for writes
- The scheduler processes the two queues in round-robin order, so DMA transfers can proceed in both directions simultaneously (see the sketch below)
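A sketch of the two-queue, round-robin scheduling idea follows; the queue structure and helper names are illustrative, not Moneta's implementation.

```c
/* Alternate between a read queue and a write queue so DMA traffic flows in
 * both PCIe directions at once. */
#include <stddef.h>

struct moneta_req;                              /* opaque request */
struct req_queue { struct moneta_req *head; };  /* minimal FIFO of pending requests */

extern struct moneta_req *dequeue(struct req_queue *q);  /* NULL if empty */
extern void start_dma(struct moneta_req *r);             /* kicks off one transfer */

static void schedule_round_robin(struct req_queue *reads, struct req_queue *writes)
{
    int pick_read = 1;
    for (;;) {                  /* a real scheduler would block when both queues are empty */
        struct req_queue *q = pick_read ? reads : writes;
        struct moneta_req *r = dequeue(q);
        if (r)
            start_dma(r);       /* reads DMA device->host, writes host->device */
        pick_read = !pick_read; /* alternate so both directions stay busy */
    }
}
```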

Evaluations
- FusionIO: FusionIO SSD, 80GB, x4 PCIe
- SSD: Intel X-25E SSD, 128GB, SATA2, SW RAID-0
- Disk: disk array, 4TB, x4 PCIe, HW RAID-0
- Moneta: +Spin, x8 PCIe (2GB/s)
- Moneta-4x: +Spin, x4 PCIe (1GB/s)
- Moneta outperforms the others for larger requests due to its faster x8 PCIe link

Conclusions
- Faster NVMs are coming
- The Moneta prototype was implemented to understand NVM-based storage systems
- Both S/W and H/W optimization opportunities were investigated