Finding a needle in Haystack: Facebook's photo storage

The paper, written at Facebook, describes an object storage system called Haystack. Because Facebook handles an enormous volume of photos (20 petabytes in total, growing by 60 terabytes per week), it needs an efficient, high-performing system. The main point of the system is to provide an alternative to a conventional file system that performs better under Facebook's huge workload. The results are presented as an evaluation of the system; the effectiveness is reported on its own rather than compared against alternatives, which gives less insight if you are not already familiar with the context. The evaluation covers the efficiency of the Haystack Directory and Cache, as well as the performance of the Store under synthetic and production workloads. The motivation of the paper is to present the system Facebook uses for photo storage. The tool is not open sourced, so the paper is not offering the tool per se, but rather giving insight into how Facebook handles a huge number of photos efficiently. Open tools based on the work described in this paper are available.

The second paper focuses on Facebook's photo-serving stack and the effectiveness of its caching. The study tracked over 77M requests for more than 1 million unique photos over a month-long period. The elements studied include traffic patterns, cache access patterns, the geolocation of clients and servers, and the correlation between content properties and traffic. One of the most important results is that the paper identifies directions for future investigation: caching options such as geographic collaborative caching, adopting S4LRU eviction at the Edge and Origin layers, increasing browser cache sizes for very active clients to improve client performance, and enabling local photo resizing for less active clients. It also points to future work on the placement of resizing functions in the stack and the design of better caching algorithms. The motivation is to show how Facebook handles caching under the huge workload its service generates. Because Facebook is such a large service, an analysis from this company can be of great use to others, given how many users are active and how much data is in use at any time.

Review of: An Analysis of Facebook Photo Caching

Photo caching on Facebook happens on several layers: the browser cache, the edge caches, and the origin cache. Images can be served from two stacks, Akamai and Facebook's own; the paper focuses on the latter. The Facebook photo-serving infrastructure was instrumented and data was gathered over one month; by collecting data from the web browser and from the edge and origin caches, the researchers were able to follow requests through the entire stack. The analysis shows that 65.5% of all traffic was served from the browser cache; since the browser cache is the layer closest to the client, its hit rate was also 65.5%. The edge caches handled 20% of the traffic with a 58% hit rate, the origin cache 4.6% of the traffic with a 31.8% hit rate, and the backend servers the remaining 9.9%. Through simulation, the paper identifies S4LRU as a better alternative to the FIFO eviction used on the edge and origin caches, giving for instance an 8.5% improvement at the edge cache and 13.9% at the origin cache.
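These per-layer shares compose in a simple way: each layer only sees the requests that missed every layer in front of it, so its local hit rate is its share of all traffic divided by the traffic that actually reaches it. A minimal sketch of that arithmetic, using the approximate percentages quoted in the review above (illustrative numbers, not authoritative figures from the paper):

```python
# Illustrative arithmetic only: the shares are the approximate numbers quoted
# in the review above, not authoritative figures from the paper.
layers = [
    ("browser cache", 0.655),   # fraction of *all* requests absorbed at this layer
    ("edge caches",   0.200),
    ("origin cache",  0.046),
    ("backend store", 0.099),
]

remaining = 1.0
for name, share_of_total in layers:
    # Local hit rate = traffic absorbed here / traffic that actually reached here.
    hit_rate = share_of_total / remaining
    print(f"{name:14s} absorbs {share_of_total:6.1%} of all requests, "
          f"local hit rate ~ {hit_rate:6.1%}")
    remaining -= share_of_total
```

Running this reproduces the figures above: roughly a 58% local hit rate at the edge and about 32% at the origin, with the backend serving whatever remains.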

Further improvements to the photo cache involve research into better cache eviction algorithms and into where along the stack to place the photo-resize functionality.

Review of: Finding a needle in Haystack: Facebook's photo storage

With over 60 terabytes of photos uploaded per week, and a million photos served each second at peak times, Facebook saw the need for a better approach. The solution was Haystack, a storage system designed to do metadata lookups in memory so that disk operations are spent only on reading the actual data. The paper defines some key requirements for Haystack: high throughput and low latency, fault tolerance, cost effectiveness, and simplicity. The Haystack system consists of three sub-components: the Directory, the Cache, and the Store. The Directory is responsible for the mapping of volumes, the Cache holds recently written photos that are not requested through a CDN, and the Store is responsible for reading, writing, and deleting files in Haystack. Photos written to Haystack are appended to logical volumes and addressed by offset, which keeps the per-photo metadata held in memory small and speeds up reads. The paper claims that Haystack, compared to the earlier NFS-based system, reduced the cost per usable terabyte by roughly 28% and handled roughly 4x more reads per second.

This paper presents the design and implementation of Haystack, an object storage system for storing Facebook's photos. Facebook currently stores over 260 billion images, which makes it the biggest photo-sharing website in the world. Haystack was designed to serve the long tail of requests generated by photo sharing in a large social network, with a less expensive and higher-performing solution than the previous approach. The authors argue that Haystack provides a fault-tolerant and simple solution to photo storage that is also incrementally scalable, a necessary quality now that users upload hundreds of millions of photos each week.

The caching paper explores the dynamics of the full Facebook photo-serving stack, from the client browser to Facebook's Haystack storage servers, looking at the effectiveness of the many layers of caching it employs. The authors present an overview of the photo-serving stack, highlighting their instrumentation points. They instrumented Facebook's photo-serving infrastructure, gathered a month-long trace, and analyzed it using batch processing. The analysis examines more than 70 TB of data, all corresponding to client-initiated requests. They also explore traffic between clients and Edge Caches, how traffic is routed between the Edge Caches and the Origin Cache, and how backend requests are routed. Using simulation, they evaluate the effect of different cache sizes, algorithms, and strategies. Two properties, the age of a photo and the number of Facebook followers of its owner, are found to be strongly associated with photo traffic. This is the first paper to systematically instrument and analyze a real-world workload at the scale of Facebook, and to successfully trace such a high volume of events throughout a massively distributed stack.

This paper describes Haystack, an object storage system designed for Facebook's Photos application. Haystack was designed to serve the long tail of requests seen when photos are shared in a large social network. The key insight is to avoid disk operations when accessing metadata. Haystack provides a fault-tolerant and simple solution to photo storage at dramatically lower cost and higher throughput than a traditional approach using NAS appliances. Furthermore, Haystack is incrementally scalable, a necessary quality as users upload hundreds of millions of photos each week.
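The central mechanism the reviews describe, keeping all photo metadata in main memory so that serving a read costs a single disk operation for the data itself, can be sketched in a few lines. This is a minimal illustration under simplified assumptions (a plain dict as the index, no needle headers, checksums, or flags), not Haystack's actual format:

```python
# Minimal sketch, not Haystack's real on-disk format. The in-memory index maps
# photo_key -> (volume_path, offset, size); serving a read therefore needs no
# filesystem metadata lookups, only one seek + read for the photo bytes.
def append_photo(photo_index: dict, volume_path: str, key: str, data: bytes) -> None:
    with open(volume_path, "ab") as volume:
        offset = volume.tell()      # end of the append-only volume file
        volume.write(data)          # photos are written once, never modified in place
    photo_index[key] = (volume_path, offset, len(data))

def read_photo(photo_index: dict, key: str) -> bytes:
    volume_path, offset, size = photo_index[key]   # metadata lookup stays in memory
    with open(volume_path, "rb") as volume:
        volume.seek(offset)
        return volume.read(size)

# Usage: one logical volume file holding many photos back to back.
index: dict[str, tuple[str, int, int]] = {}
append_photo(index, "/tmp/volume_001.dat", "photo:42", b"...jpeg bytes...")
assert read_photo(index, "photo:42") == b"...jpeg bytes..."
```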

In the caching paper, the authors instrumented the entire Facebook photo-serving stack, obtaining traces representative of Facebook's full workload. There are several valuable findings, including the workload pattern, the traffic distribution, and the geographic system dynamics, yielding insights that should be helpful to future system designers. They also identified an opportunity to improve client performance by increasing browser cache sizes for very active clients and by enabling local photo resizing for less active clients.

Finding a needle in Haystack: Facebook's photo storage

This paper describes Facebook's photo storage system, called Haystack. It first describes the goals and motivations behind the creation of such a system. Among the main goals was reducing latency by reducing disk operations, and one of the challenges was dealing with the long tail of requests for older photos. Haystack has three main components: the Store, the Directory, and the Cache. The Store keeps all the actual data in the form of huge files, the Directory keeps metadata about the images, and the Cache functions as an internal cache that shelters the Store from the most frequent requests. Next the authors describe individual details such as insertions, deletions, the mapping of image locations, and recovery from failures. Finally, they evaluate the whole system and show some trends in how users request photos (the most frequently requested image size is small, and images are most active, i.e., most likely to be modified or deleted, shortly after insertion). They close with a list of related work and compare it to Haystack.

Analysis of Facebook Photo Caching

The second paper examines the workload of the aforementioned Haystack. It focuses on the different layers of caching Facebook uses (browser cache, Edge cache, Origin cache, Haystack) and how they are utilized. The main goal of caching is traffic sheltering. The authors also briefly describe the process of gathering data. Next, it is shown how the popularity of a photo affects the hit ratio at each layer: less popular photos have a higher local cache hit rate, while more popular ones have a higher shared cache hit rate. It is also described how the traffic is distributed geographically. At the end, the authors present a few improvements for the system (better geographic caching, local photo resizing).

The ability to share photos is essential for a social website; well-known social sites such as Facebook, LinkedIn, and Instagram all provide this feature. According to data from Facebook, users have uploaded more than 65 billion photos. Due to this huge volume, traditional photo storage built directly on a general-purpose filesystem cannot support the service very well. This paper tries to give a new vision based on traces from Facebook. It first introduces Haystack, the photo storage system used at Facebook, and then gives a performance analysis. Haystack has some interesting features. To reduce disk operations, it stores multiple photos in a single file, which shrinks the memory needed for filesystem metadata. The Haystack Directory and Cache are also specially designed to make full use of caching. A structure called a needle maps a photo's logical location, held in memory, to its position in a physical volume; it supports photo reads, writes, and deletes, and it also aids recovery after a failure or reboot.
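Deletion and failure recovery fit the same append-only layout: a delete only marks the photo as dead, and the space is reclaimed later by copying the surviving needles into a fresh volume. The sketch below illustrates that idea on top of the simplified (volume_path, offset, size) index used earlier; the deleted set here stands in for the deleted flag the real system keeps, so this is an assumption-laden illustration rather than Haystack's actual procedure:

```python
# Deletes never touch the photo bytes on disk; they only mark the key as dead.
def delete_photo(photo_index: dict, deleted: set, key: str) -> None:
    if key in photo_index:
        deleted.add(key)

def compact_volume(photo_index: dict, deleted: set, old_path: str, new_path: str) -> None:
    """Copy only the live needles into a fresh volume file, reclaiming the space
    of deleted photos, and point the in-memory index at the new offsets."""
    with open(old_path, "rb") as old, open(new_path, "wb") as new:
        for key, (path, offset, size) in list(photo_index.items()):
            if path != old_path:
                continue                  # entry lives in another volume
            if key in deleted:
                del photo_index[key]      # dead entry disappears after compaction
                deleted.discard(key)
                continue
            old.seek(offset)
            new_offset = new.tell()
            new.write(old.read(size))
            photo_index[key] = (new_path, new_offset, size)
```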
Before the performance analysis of Haystack, the paper describes the characteristics of the photo requests: new photos generate more requests than aged photos, and requests for small images account for 84.4% of requests.

The experimental results of evaluating Haystack show that the system performs well in storing photos. For example, read-only operation delivers 85% of the raw throughput of the device at 17% higher latency. It also shows that Haystack suits Facebook's usage, where write operations are always multi-writes. The paper provides a detailed description of Haystack and characterizes the data in a way that can guide other researchers in further work. The paper is well structured and clearly states its challenges and motivation, which is helpful for readers.

An analysis of Facebook photo caching

The most common way to improve request speed is to add a cache; however, no single caching method works for all applications. Facebook, the biggest photo-sharing website, has to deal with millions of photos every day and needs its own caching system. This paper tries to demystify Facebook's photo caching. It introduces Facebook's entire Internet image-serving infrastructure and considers many aspects, including the relationship between browser caches, edge caches, the origin cache, and backend servers; the popularity distribution; the geographic traffic distribution; and possible improvements. Based on the analysis, the paper offers several interesting findings. First, browser caches, edge caches, and the origin cache handle an aggregate 90% of requests. Second, the popularity distribution follows a Zipfian distribution, although traffic at Haystack has a comparatively smaller Zipf coefficient. Third, content is often served across a large distance rather than locally; for example, the traffic from Miami was distributed among several edge caches, with 50% handled in San Jose, Palo Alto, and LA, and only 24% in Miami. Fourth, geographic-scale collaborative caching at the edge servers and more advanced eviction algorithms are two possible ways to improve the cache hit ratio; the former can improve it by 17% and the latter by 21.9%. Fifth, content popularity drops rapidly with age, following a Pareto distribution, and is conditionally dependent on the owner's social connectivity. Two problems are left as future research areas: the placement of resizing functionality along the stack, and the design of better caching algorithms. It is good that the paper gives a figure describing the structure of the whole caching system and the methods used to collect and sample data. However, some figures could be explained further; for example, the introduction states that for the most popular 0.03% of content, cache hit rates neared 100%, and this claim should be explained or revisited in the corresponding section with a proper figure.

This paper studies the workload and effectiveness of Facebook's multi-layer photo caching stack. The hierarchical storage system consists of the browser cache, edge caches, the origin cache, and backend storage; Facebook also uses Akamai for additional caching. The whole storage system spreads across several data centers. The authors captured over 77M photo requests from 13.2M user browsers for more than 1.3M unique photos. Of the 77M requests, 65.5% are satisfied by browser caches, 20% by edge caches, 4.6% by the origin cache, and 9.9% by the backend storage. Photo popularity distributions are approximately Zipfian, and the plots show that more than 89% of the requests for the hundred thousand most popular images can be served by the browser and edge caches.
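The Zipfian-popularity observation is what makes the caches so effective: under a Zipf law, a small head of the photo population attracts most of the requests. A back-of-the-envelope model of that head, where the exponent values and photo count are illustrative choices rather than measurements from the paper:

```python
# Rough Zipf model: request probability of the i-th most popular photo ~ 1 / i**alpha.
# The alpha values and the photo count are illustrative, not measured in the paper.
def top_k_request_share(n_photos: int, k: int, alpha: float) -> float:
    weights = [1.0 / (i ** alpha) for i in range(1, n_photos + 1)]
    return sum(weights[:k]) / sum(weights)

for alpha in (0.9, 1.0, 1.1):
    share = top_k_request_share(n_photos=1_300_000, k=100_000, alpha=alpha)
    print(f"alpha={alpha}: the 100k most popular photos attract ~{share:.0%} of requests")
```

The point is only qualitative: even with modest exponents, a relatively small head of the popularity distribution accounts for most of the traffic, which is why the browser and edge layers can absorb so much of it.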
Although every edge cache receives a majority of its requests from nearby cities, the largest share does not necessarily go to the nearest neighbor, because Facebook's routing policy is based on a combination of latency and peering cost. If a photo is not found in an edge cache, consistent hashing is used to locate an origin cache. Most of the time, an origin cache will retrieve the photo from the backend storage within the same data center, but it sometimes retrieves photos from other data centers because of misdirected resizing traffic or failed local fetches.
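The edge-to-origin routing step mentioned above, consistent hashing on the photo identifier, can be sketched with the textbook ring construction. The host names and virtual-node count below are made-up illustrations, not Facebook's configuration:

```python
import bisect
import hashlib

# Textbook consistent-hash ring; hosts and virtual-node count are illustrative.
class HashRing:
    def __init__(self, hosts, vnodes: int = 64):
        points = []
        for host in hosts:
            for v in range(vnodes):
                points.append((self._hash(f"{host}#{v}"), host))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._hosts = [host for _, host in points]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def origin_for(self, photo_id: str) -> str:
        """Walk clockwise from the photo's hash point to the next host marker."""
        idx = bisect.bisect(self._hashes, self._hash(photo_id)) % len(self._hashes)
        return self._hosts[idx]

ring = HashRing(["origin-cache-a", "origin-cache-b", "origin-cache-c"])
print(ring.origin_for("photo:123456789"))   # deterministic photo -> origin host mapping
```

Because each photo hashes to a fixed point on the ring, adding or removing an origin host only remaps the photos between neighboring points rather than reshuffling everything.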

The authors also examined the potential performance improvements from other caching algorithms and larger cache sizes. The results show that the Clairvoyant and S4LRU algorithms are the most effective: they can substantially improve the hit ratio, or reduce the cache size required while maintaining the same hit ratio. The analysis also shows that new content draws the most attention and accounts for the majority of traffic.

This paper presents the design, implementation, and evaluation of Facebook's Haystack, a new backend storage system. Facebook is the biggest photo-sharing website in the world; it hosts 65 billion photos, which translates to over 20 PB of data. Haystack is tailored to Facebook photos, which are written once, read often, never modified, and rarely deleted. The previous NFS-based storage system required roughly 3 to 10 disk operations per photo read, which caused significant delays; Haystack reduces this to a single disk operation. Facebook uses a CDN to serve popular images and uses Haystack for unpopular ones. Haystack consists of three components: the Haystack Directory, the Haystack Cache, and the Haystack Store. The Directory provides the mapping from logical to physical volumes, load-balances writes among logical volumes, decides whether a request should be handled by the CDN or the Cache, and identifies read-only volumes. The Cache is organized as a DHT and caches images that cannot be found in CDNs. The Store maintains the physical volumes; each volume contains a superblock and a number of needles (images). An index file, which stores the same information as the physical volume except the image data itself, is kept in memory to speed up image retrieval. The underlying file system is XFS. The evaluation shows that the Directory's hashing policy distributes reads and writes well, the Cache achieves around an 80% hit rate, and reads are far more frequent than writes and deletes. The latency of multi-writes is fairly low and stable, and the read latency on read-only machines is stable and lower than that of write-enabled machines.
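S4LRU, which the simulations discussed above favor over FIFO at the edge and origin layers, splits the cache into four LRU segments: new items enter the lowest segment, a hit promotes an item one segment up, and overflow in a segment demotes its least-recently-used item to the segment below, so items must keep earning hits to stay near the top. A compact sketch of that policy follows; the segment sizing and API are simplified assumptions, not the exact implementation evaluated in the paper:

```python
from collections import OrderedDict

class S4LRU:
    """Segmented LRU with four levels: new items enter level 0, a hit promotes
    an item one level, and overflow at level i demotes its least-recent item to
    level i-1 (items pushed out of level 0 leave the cache). Simplified sketch."""

    def __init__(self, capacity_per_level: int):
        self.cap = capacity_per_level
        self.levels = [OrderedDict() for _ in range(4)]

    def _level_of(self, key):
        for i, level in enumerate(self.levels):
            if key in level:
                return i
        return None

    def _insert(self, level: int, key, value) -> None:
        self.levels[level][key] = value              # placed at MRU position
        while len(self.levels[level]) > self.cap:
            old_key, old_val = self.levels[level].popitem(last=False)
            if level == 0:
                break                                # evicted from the cache entirely
            level -= 1
            self.levels[level][old_key] = old_val    # demote one level down

    def get(self, key):
        i = self._level_of(key)
        if i is None:
            return None                              # miss
        value = self.levels[i].pop(key)
        self._insert(min(i + 1, 3), key, value)      # hit: promote one level
        return value

    def put(self, key, value) -> None:
        i = self._level_of(key)
        if i is not None:
            self.levels[i].pop(key)
            self._insert(min(i + 1, 3), key, value)
        else:
            self._insert(0, key, value)              # miss: insert at the lowest level

# Usage: 4 segments of 2 entries each, i.e. 8 cached photos in total.
cache = S4LRU(capacity_per_level=2)
for photo_id in ["a", "b", "a", "c", "a", "d", "e"]:
    if cache.get(photo_id) is None:
        cache.put(photo_id, f"bytes-of-{photo_id}")
```

In the paper's simulations each segment holds a quarter of the total cache, which is the role capacity_per_level plays here.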