Efficiency at Scale. Sanjeev Kumar Director of Engineering, Facebook

Similar documents
CSE 124: Networked Services Lecture-17

Big Data and Object Storage

Finding a Needle in a Haystack. Facebook s Photo Storage Jack Hartner

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Today s Papers. Array Reliability. RAID Basics (Two optional papers) EECS 262a Advanced Topics in Computer Systems Lecture 3

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Data Storage Infrastructure at Facebook

New Approach to Unstructured Data

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Finding a needle in Haystack: Facebook's photo storage

CS3600 SYSTEMS AND NETWORKS

Top Trends in DBMS & DW

VMware Virtual SAN Technology

ISSUES IN STORAGE OF PHOTOS IN FACEBOOK: REVIEW OF VARIOUS STORAGE TECHNIQUES

HBase Solutions at Facebook

Hyper-converged storage for Oracle RAC based on NVMe SSDs and standard x86 servers

Automating Information Lifecycle Management with

Warehouse- Scale Computing and the BDAS Stack

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

CISC 7610 Lecture 2b The beginnings of NoSQL

ECE 598 Advanced Operating Systems Lecture 18

5 Fundamental Strategies for Building a Data-centered Data Center

Copyright 2012 EMC Corporation. All rights reserved.

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Effizientes Speichern von Cold-Data

Andrzej Jakowski, Armoun Forghan. Apr 2017 Santa Clara, CA

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Open storage architecture for private Oracle database clouds

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

EsgynDB Enterprise 2.0 Platform Reference Architecture

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Webinar Series: Triangulate your Storage Architecture with SvSAN Caching. Luke Pruen Technical Services Director

AMD Opteron Processors In the Cloud

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Hadoop and HDFS Overview. Madhu Ankam

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

CS /30/17. Paul Krzyzanowski 1. Google Chubby ( Apache Zookeeper) Distributed Systems. Chubby. Chubby Deployment.

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

Solid State Performance Comparisons: SSD Cache Performance

Storage for HPC, HPDA and Machine Learning (ML)

CS370 Operating Systems

Best Practices for Deploying Hadoop Workloads on HCI Powered by vsan

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

SSDs & Memory Why it matters what you choose?

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Advanced Database Technologies NoSQL: Not only SQL

Chapter 11: Implementing File Systems

The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler

Flash Storage Complementing a Data Lake for Real-Time Insight

Microsoft SQL Server 2012 Fast Track Reference Architecture Using PowerEdge R720 and Compellent SC8000

The next step in Software-Defined Storage with Virtual SAN

Chapter 12: File System Implementation

LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data

Automatic-Hot HA for HDFS NameNode Konstantin V Shvachko Ari Flink Timothy Coulter EBay Cisco Aisle Five. November 11, 2011

Introduction to Database Services

IBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?

Driving Data Warehousing with iomemory

File Systems: Fundamentals

Evolving To The Big Data Warehouse

NST6000 UNIFIED HYBRID STORAGE. Performance, Availability and Scale for Any SAN and NAS Workload in Your Environment

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

EMC VSPEX SERVER VIRTUALIZATION SOLUTION

Chapter 10: File System Implementation

File Systems: Fundamentals

StarWind Hyper-Converged Platform: Data Sheet

<Partner Name> <Partner Product> RSA Ready Implementation Guide for. MapR Converged Data Platform 3.1

Digital Preservation in eresearch. Sam Houston Storage Product Manager A/NZ

Next Generation Computing Architectures for Cloud Scale Applications

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.

A product by CloudFounders. Wim Provoost Open vstorage

Chapter 12: File System Implementation. Operating System Concepts 9 th Edition

Outlook. File-System Interface Allocation-Methods Free Space Management

File system internals Tanenbaum, Chapter 4. COMP3231 Operating Systems

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

10/29/2013. Program Agenda. The Database Trifecta: Simplified Management, Less Capacity, Better Performance

Chapter 12: File System Implementation

OPERATING SYSTEM. Chapter 12: File System Implementation

Distributed Filesystem

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

Chapter 12: File System Implementation

Quobyte The Data Center File System QUOBYTE INC.

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

UNIX File Systems. How UNIX Organizes and Accesses Files on Disk

Hadoop/MapReduce Computing Paradigm

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC

Apache Spark Graph Performance with Memory1. February Page 1 of 13

Planning for the HDP Cluster

At the heart of a new generation of data center infrastructures and appliances. Sept 2017

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

File System Internals. Jo, Heeseung

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

HDFS Design Principles

Week 12: File System Implementation

Data Protection for Cisco HyperFlex with Veeam Availability Suite. Solution Overview Cisco Public

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

ASN Configuration Best Practices

Transcription:

Efficiency at Scale Sanjeev Kumar Director of Engineering, Facebook International Workshop on Rack-scale Computing, April 2014

Agenda 1 Overview 2 Datacenter Architecture 3 Case Study: Optimizing BLOB Storage system 4 Questions

Facebook Stats 1.15 billion users [ 6/2013 ] ~700 million people use facebook daily 350+ million photos added per day [ 1/2013 ] 240+ billion photos 4.5 billion likes, posts and comments per day [ 5/2013 ]

Facebook Datacenter Regions A large and growing server footprint

Services Provided by Facebook

Salient Points Efficiency matters Complex Software Stack 1000+ specialized services to run A few large services Long tail Custom hardware: cost of designing, validating, fixing Number of machines a service needs can change quickly Many sources of complexity Simplify as much as possible

Agenda 1 Overview 2 Datacenter Architecture 3 Case Study: Optimizing BLOB Storage system 4 Questions

Web 250 racks Cache (~144TB) Front-End Cluster Ads 30 racks Multifeed 9 racks Other small services Service Cluster Back-End Cluster Search Photos Msg Others UDB ADS-DB Tao Leader

Infrastructure Redundancy Regional Datacenter 1 Regional Datacenter 2 FE FE FE FE FE FE SVC SVC SVC SVC BE BE BE BE SVC SVC SVC SVC FE FE FE Regional Datacenter 3 FE FE FE Regional Datacenter 4

Standard Systems Type I Type II Type III Type IV Type V Type VI CPU High 2 x EN2670 Low 1 x 6128HE (AMD) Medium 2 x X5650 Medium 2 x X5650 Low 1 x L5630 High 2 x EN2660 Memory Low 16GB High 144GB High 144GB Medium 48GB Low 18GB High 144GB Disk Low 250GB Low 250GB High IOP 6 x 600GB SAS High +2x1.3TB 12 x 3TB SATA Flash High 12 x 3TB SATA Medium 1TB SATA Services Web, Chat, Ads Memcache, Ads Database Hadoop Photos, Video Multifeed, Search

Standard Systems Type I Type II Type III Type IV Type V Type VI CPU High Low Medium Medium Low High Memory Low High High Medium Low High Disk Low Low High IOPs High High Medium Services Web, Chat, Ads Memcache, Ads Database Hadoop Photos, Video Multifeed, Search

Server Generations Web Servers 2008 2009 2010 2011 2012 Rack Composition L5420 (SC) L5520 (NHM) L5639 (WSM) X5650 (XWSM) EN2670 (SND) Cores / Speed 8 real cores 2.50 GHz 16 logical CPUs (HT) 2.27 GHz 24 logical CPUs (HT) 2.13 GHz 24 logical CPUs (HT) 2.67 GHz 32 Logical CPUs (HT) 2.33GHz RCUs 0.6 1 1.4 1.75 2.41

Agenda 1 Overview 2 Datacenter Architecture 3 Case Study: Optimizing BLOB Storage system 4 Questions

Storage Systems Total Size Storage Technology Bottlenecks Social Graph Single-digit petabytes MySQL & Alternatives Random read IOPS Messages & Time Series Data 10s of petabytes HBase and HDFS Write IOPS & Storage capacity Photos/Videos/BLOBs 100s of petabytes Haystack Storage capacity Data Warehouse 100s of petabytes Hive, HDFS, and Hadoop Storage capacity Cold Storage Exabytes** Custom Storage Capacity

BLOB Storage Storage for Photos, Videos, Attachments, etc. Evolved over many generations Constrained resource shifts and needs to be optimized for Generation 1: Time to Market Generation 2 & 3: Optimize the I/O request rate (Cost) Generation 4: Optimize for Storage Efficiency (Cost)

Generation 1: Commercial Filers New Photos Product NFS Storage First build it the easy way Commercial Storage Tier + HTTP server Each Photo is stored as a separate file Quickly up and running Reliably Store and Serve Photos But: Inefficient Limited by IO rate and not storage density Average 10 IOs to serve each photo Wasted IO to traverse the directory structure

Effective but inefficient Disks are slow: 100 reads per disk per second 1 photo read è 10 disk reads Each disk can serve 10 photos per second A copy of each photo A copy of each photo A copy of each photo A copy of each photo

Generation 2: Gen 1 Optimized Optimization Example: Cache NFS handles to reduce wasted IO operations Reduce the number of IO operations per photo by 3X But: Still expensive: High end storage boxes Still inefficient: Still IO bound and wasting IOs NFS Storage Optimized directory inode owner info size timestamps blocks directory data inode # filename file inode owner info size timestamps blocks data

Generation 3: Haystack [OSDI 10] Custom Solution Commodity Storage Hardware Optimized for 1 IO operation per request File system on top of a file system Compact Index in memory Metadata and data laid out contiguously Efficient from IO perspective But: Problem has changed now Superblock Needle 1 Needle 2 Needle 3 Magic No Key Flags Photo Checksum Single Disk IO to read/write a photo

Photo lifecycle Hot Relevance Time

Photo lifecycle Relevance Warm Time

Photo lifecycle Relevance Cold Time

Access patterns 100% 90% 95% 98% Photo Traffic % 80% 70% 60% 50% 40% 30% 20% 20% 82% 82% traffic 8% photo takes 50 % 85% Photo Volume % 10% 0% 0 4% 8% 4 8 12 16 20 24 Photo Age (Months)

Generation 4: Tiered Storage Different storage solutions for hot, warm, cold photos Hot è Haystack Warm è F4 Storage System (+ Cold Storage) Relevance Time

Agenda 1 Overview 2 Datacenter Architecture 3 Case Study: Optimizing BLOB Storage system 4 Questions

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0