FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan

FAWN: A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan. Carnegie Mellon University; *Intel Labs Pittsburgh.

Energy in computing. Power is a significant burden on computing: the 3-year TCO will soon be dominated by power. [Image: hydroelectric dam]

Google's power consumption... would incur an annual electricity bill of nearly $38 million [Qureshi, SIGCOMM 2009]. Energy consumption by data centers could nearly double... (by 2011) to more than 100 billion kWh, representing a $7.4 billion annual electricity cost [EPA Report 2007]. Annual cost of energy for Google, Amazon, Microsoft = annual cost of all first-year CS PhD students.

Can we reduce energy use by a factor of ten, while still serving the same workloads and without increasing capital cost?

FAWN: Fast Array of Wimpy Nodes. Improve the computational efficiency of data-intensive computing using an array of well-balanced, low-power systems. [Figure: a traditional server versus a FAWN array of many small nodes, each with an AMD Geode CPU, 256MB DRAM, and 4GB CompactFlash.]

Goal: reduce peak power. [Figure: a traditional datacenter draws about 1000W per server versus about 100W for FAWN; after roughly 20% losses to cooling and power distribution, about 750W versus under 100W reaches the servers doing useful work.]

Overview: Background, FAWN Principles, FAWN-KV Design, Evaluation, Conclusion.

Towards balanced systems. [Figure: access time in nanoseconds (log scale), 1980-2005, for a disk seek, a DRAM access, and a CPU cycle; the widening gap between CPU and storage speeds means wasted resources. Rebalancing options: today's CPUs with an array of the fastest disks, slower CPUs with fast storage, or slow CPUs with today's disks.]

Targeting the sweet spot in efficiency. The fastest processors exhibit superlinear power usage, while fixed power costs can dominate efficiency for slow processors; FAWN targets the sweet spot in whole-system efficiency once fixed costs are included. [Figure: speed vs. efficiency: instructions/sec/W (millions) against instructions/sec (millions, log scale), including a 0.1W fixed power overhead, for a custom ARM mote, an XScale 800MHz, an Atom Z500, and a Xeon 7350.]
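
As a rough numerical sketch of this trade-off (the power model and constants below are assumptions for illustration, not measurements from the paper), efficiency peaks at a mid-range speed when a fixed overhead is added to superlinearly growing dynamic power:

    # Illustrative model: efficiency = speed / (fixed + dynamic power),
    # with dynamic power assumed to grow superlinearly with speed.
    FIXED_W = 0.1  # fixed power overhead in watts, as in the slide

    def efficiency(speed_mips, k=1e-5, exponent=1.5):
        # Instructions/sec/W for a hypothetical CPU at speed_mips.
        dynamic_w = k * speed_mips ** exponent
        return speed_mips / (FIXED_W + dynamic_w)

    for speed in [1, 10, 100, 1000, 10000, 100000]:
        print(f"{speed:>7} MIPS -> {efficiency(speed):8.1f} MIPS/W")
    # Efficiency climbs while fixed power dominates, peaks at a mid-range
    # "wimpy" speed, then falls as superlinear dynamic power takes over.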

Targeting the sweet spot in efficiency (continued). [Figure: the same efficiency curve annotated with the rebalancing options: today's CPU with an array of the fastest disks, a slower CPU with fast storage, and a slow CPU with today's disk; FAWN sits at the most efficient region of the curve.]

Overview: Background, FAWN Principles, FAWN-KV Design (Architecture, Constraints), Evaluation, Conclusion.

Data-intensive key-value storage is a critical infrastructure service with service-level agreements for performance and latency; its workload is random-access, read-mostly, and hard to cache.

FAWN-KV: Our Key Value Proposition. An energy-efficient cluster key-value store; the goal is to improve queries per Joule. Prototype: Alix3c2 nodes with flash storage (500MHz CPU, 256MB DRAM, 4GB CompactFlash). Unique challenges: efficient and fast failover, wimpy CPUs with limited DRAM, and flash that is poor at small random writes.

FAWN-KV Architecture. [Figure: front-end nodes manage the back-ends, act as the gateway, and route requests; back-end nodes, each running the FAWN-DS datastore, are arranged on a key-value ring using consistent hashing.]
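
A minimal sketch of how consistent hashing can map keys onto such a ring of back-ends (node names and the use of SHA-1 here are illustrative, not the paper's exact implementation):

    import hashlib
    from bisect import bisect

    def h(s: str) -> int:
        # 160-bit position on the ring (FAWN-KV keys are 160-bit).
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            # Each node owns the key range ending at its ring position.
            self.points = sorted((h(n), n) for n in nodes)

        def owner(self, key: str) -> str:
            positions = [p for p, _ in self.points]
            i = bisect(positions, h(key)) % len(self.points)
            return self.points[i][1]

    ring = Ring(["backend-A", "backend-B", "backend-C"])
    print(ring.owner("user:42"))
    # Adding a node only moves the keys in the range it takes over from
    # its successor, which is why joins and failures transfer key-ranges.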

FAWN-KV Architecture (continued). [Figure: the same ring, highlighting the FAWN-DS datastore on each back-end.] Design constraints on FAWN-DS and FAWN-KV: limited resources and efficient failover both require avoiding random writes.

From key to value. [Figure: the FAWN-DS lookup path: an in-memory hash index of 12-byte entries, each holding a key fragment (KeyFrag), a valid bit, and an offset into the data region on flash; the log entry at that offset stores the full 160-bit key, the data length, and the data.] Because the KeyFrag is not the full key, collisions are possible, but there is a low probability of needing multiple flash reads per lookup.
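
A minimal sketch of that lookup path (the fragment width and helper names below are illustrative assumptions; the exact 12-byte entry layout is in the paper):

    # Hypothetical FAWN-DS-style lookup: compact in-memory entries hold only
    # a fragment of the key plus an offset; the full 160-bit key stored in
    # the on-flash log entry resolves any KeyFrag collisions.
    from dataclasses import dataclass

    @dataclass
    class IndexEntry:
        keyfrag: int      # small fragment of the key, not the full key
        valid: bool
        offset: int       # position of the log entry in the data region

    def keyfrag(key: bytes) -> int:
        return int.from_bytes(key[:2], "big")    # illustrative 16-bit fragment

    def lookup(index: dict, flash_log: list, key: bytes):
        # index maps keyfrag -> candidate entries; there is usually one
        # candidate, so a lookup usually costs a single flash read.
        for entry in index.get(keyfrag(key), []):
            if not entry.valid:
                continue
            stored_key, data = flash_log[entry.offset]   # one flash read
            if stored_key == key:                        # full-key check
                return data
        return None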

Log-structured Datastore. Log-structuring avoids small random writes: Get performs a single random read, while Put and Delete are appends to the log.
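
A minimal sketch of how those operations can map onto an append-only log (the tombstone handling and compaction below are simplified assumptions, not the paper's exact scheme):

    # Puts and deletes only append; gets do one random read; compaction
    # rewrites the live entries sequentially into a fresh log.
    TOMBSTONE = object()

    class AppendOnlyLog:
        def __init__(self):
            self.entries = []      # (key, value) records in write order
            self.index = {}        # key -> offset of the newest record

        def put(self, key, value):
            self.index[key] = len(self.entries)
            self.entries.append((key, value))       # append, never overwrite

        def delete(self, key):
            self.put(key, TOMBSTONE)                # delete is also an append

        def get(self, key):
            off = self.index.get(key)
            if off is None:
                return None
            _, value = self.entries[off]            # one random read
            return None if value is TOMBSTONE else value

        def compact(self):
            live = [(k, self.entries[o][1]) for k, o in self.index.items()
                    if self.entries[o][1] is not TOMBSTONE]
            self.entries, self.index = [], {}
            for k, v in live:                       # sequential rewrite
                self.put(k, v)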

On a node addition. [Figure: a consistent-hashing ring; a newly joined node A takes over part of the key range (H, B] previously owned by its successor B.] Node additions and failures require the transfer of key-ranges.

Nodes stream the data range. [Figure: node B streams the data in the transferred range to the new node A while continuing to serve concurrent inserts, then compacts its datastore.] Streaming minimizes locking, background operations stay sequential, and the nodes continue to meet their SLA.

FAWN-KV take-aways. The log-structured datastore avoids random writes at all levels and minimizes locking during failover; careful resource use still yields high performance. Replication and strong consistency are provided by a variant of chain replication (see paper).
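
The talk defers the protocol details to the paper; as background only, here is a minimal sketch of plain chain replication (not FAWN-KV's exact variant): writes enter at the head, propagate down the chain, and are acknowledged and read at the tail, so reads only see fully replicated data.

    # Minimal chain replication sketch: writes flow head -> ... -> tail;
    # the tail serves reads, so a read never returns an under-replicated value.
    class ChainNode:
        def __init__(self, name, successor=None):
            self.name, self.successor, self.store = name, successor, {}

        def write(self, key, value):
            self.store[key] = value
            if self.successor:                 # forward down the chain
                return self.successor.write(key, value)
            return "ack"                       # the tail acknowledges the write

        def read(self, key):
            return self.store.get(key)         # meaningful at the tail

    tail = ChainNode("replica-3")
    middle = ChainNode("replica-2", tail)
    head = ChainNode("replica-1", middle)
    head.write("k", "v")
    print(tail.read("k"))   # "v", visible only after full propagation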

Overview: Background, FAWN Principles, FAWN-KV Design, Evaluation, Conclusion.

Evaluation roadmap: key-value lookup efficiency comparison, impact of background operations, and TCO analysis for random-read workloads.

FAWN-DS Lookups.

    System                    QPS     Watts   QPS/Watt
    Alix3c2 / Sandisk (CF)    1298    3.75    346
    Desktop / Mobi (SSD)      4289    83      51.7
    MacBook Pro / HD          66      29      2.3
    Desktop / HD              171     87      1.96

Our FAWN-based system is over 6x more efficient than 2008-era traditional systems.

Impact of background operations. [Figure: queries per second (0-1600) under Peak, Compact, Split, and Merge, shown once at peak query load and once at 30% of peak.] Background operations have a moderate impact at peak load and a negligible impact at 30% load.

When to use FAWN for random-access workloads? TCO = capital cost + power cost (at $0.10/kWh).

    Traditional (200W per node)      FAWN (10W per node)
    Five 2TB disks                   2TB disk
    160GB PCI-e Flash SSD            64GB SATA Flash SSD
    64GB FBDIMM                      2GB DRAM
    ~$2000-8000 per node             ~$250-500 per node
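
As a rough illustration of that formula (the node prices and the assumption of continuous operation below are illustrative, not figures from the paper), a 3-year TCO per node can be computed as follows:

    # Rough 3-year TCO sketch: capital cost plus energy at $0.10/kWh,
    # assuming the node runs continuously at its rated power.
    HOURS_3Y = 3 * 365 * 24
    PRICE_PER_KWH = 0.10

    def tco(capital_usd, watts):
        energy_kwh = watts * HOURS_3Y / 1000.0
        return capital_usd + energy_kwh * PRICE_PER_KWH

    # Assumed mid-range capital costs from the slide's price ranges.
    print(f"Traditional (200W): ${tco(5000, 200):,.0f}")   # about $5,526
    print(f"FAWN node   (10W):  ${tco(375, 10):,.0f}")     # about $401

Whether an array of FAWN nodes beats a traditional node overall then depends on how many wimpy nodes the workload needs, which is governed by the ratio of query rate to dataset size shown on the next slide.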

Architecture with the lowest TCO for random-access workloads. [Figure: dataset size (GB) versus query rate (millions of queries/sec), divided into regions showing which architecture, FAWN with various storage types or a traditional system, has the lowest TCO.] The ratio of query rate to dataset size determines the storage technology. The graph ignores management, cooling, networking... FAWN-based systems can provide lower cost per GB and per unit of query rate.

Conclusion. The FAWN architecture reduces the energy consumption of cluster computing. FAWN-KV addresses the challenges of wimpy nodes for key-value storage with a log-structured, memory-efficient datastore and efficient replication and failover, and it meets its energy-efficiency and performance goals. "Each decimal order of magnitude increase in parallelism requires a major redesign and rewrite of parallel code" - Kathy Yelick.