Finding a Needle in a Haystack: Facebook's Photo Storage
Jack Hartner
Paper Outline
Introduction
Background & Previous Design
Design & Implementation
Evaluation
Related Work
Conclusion
Facebook Photo Storage Needs
260 billion images stored
20 petabytes of data stored
1 billion photos / 60 terabytes added per week
1 million images served per second at peak times
A Look at Use Cases
Metric            | Typical Distributed File System | Facebook Photo Storage
File size         | Varied                          | Small, constant
Data is written   | Once                            | Once
Data is read      | Often                           | Often
Data is modified  | Often                           | Never
Data is deleted   | Sometimes                       | Rarely
The Old Way: A Traditional POSIX Filesystem
High cost of directory navigation on disk
High cost of accessing per-file metadata on disk
Large metadata and directory navigation require disk operations, so they become the throughput bottleneck
[Diagram: directory tree, root → dir1/dir2/dir3 → usr1/usr2/usr3, each level a separate on-disk lookup]
Q: "In our experience, we find that the disadvantages of a traditional POSIX based filesystem are directories and per file metadata." Explain how this disadvantage becomes the limiting factor for read throughput.
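Not from the slides, just a minimal sketch of why the directory layout matters: assuming a hypothetical path like /photos/usr1/album3/img.jpg and a cold cache, each path component needs its own directory and inode read before the photo bytes can be fetched, so most disk time goes to metadata rather than data.

```python
# Hypothetical illustration (not Haystack code): count the disk operations a
# traditional POSIX filesystem may need to serve one photo when nothing is
# cached in memory.

def cold_read_disk_ops(path: str) -> int:
    """Rough count of disk operations for a cold read of `path`."""
    components = [p for p in path.strip("/").split("/") if p]
    ops = 0
    for _ in components[:-1]:
        ops += 1  # read the directory block to find the next entry
        ops += 1  # read that entry's inode (per-file/per-directory metadata)
    ops += 1      # read the photo's own inode
    ops += 1      # finally, read the photo data itself
    return ops

# A layout such as /photos/usr1/album3/img.jpg costs several metadata reads
# before the single data read, which is what throttles read throughput.
print(cold_read_disk_ops("/photos/usr1/album3/img.jpg"))  # -> 8
```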
The New Way: Shrink Metadata & Eliminate Directories
Objective: one disk operation per read
Minimal navigation cost
Minimal metadata cost
All metadata can be cached in RAM
[Diagram: metadata kept in RAM points directly at photo data stored on disk]
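A minimal sketch of the idea (not the actual Haystack implementation; the field names are illustrative assumptions): if per-photo metadata is small enough to sit entirely in a RAM map, a read is one dictionary lookup plus one positioned read against the large volume file.

```python
# Illustrative sketch: all per-photo metadata lives in an in-memory map,
# so serving a photo costs exactly one disk operation on the volume file.
import os
from dataclasses import dataclass

@dataclass
class PhotoMeta:          # assumed fields, kept deliberately tiny
    offset: int           # byte offset inside the large volume file
    size: int             # size of the photo in bytes

index: dict[int, PhotoMeta] = {}   # photo id -> metadata, held entirely in RAM

def read_photo(volume_fd: int, photo_id: int) -> bytes | None:
    meta = index.get(photo_id)     # memory lookup, no disk I/O
    if meta is None:
        return None
    # Exactly one disk operation: a positioned read of the photo bytes.
    return os.pread(volume_fd, meta.size, meta.offset)
```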
Keeping All Metadata in Main Memory
Yes, it is possible to cache metadata for only the most popular files, and this works fine for most systems
BUT Facebook sees a large number of requests for less popular or older content, known as the long tail
SO long-tail requests will probably miss both the CDN and this metadata cache, and there is no real gain in performance
Q: "We accomplish this by keeping all metadata in main memory." Why did keeping metadata in memory become a challenge in Facebook's system? Is it possible to keep only the metadata of the most popular files in memory and still achieve the objective (at most one disk operation per read) by exploiting access locality?
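A hedged back-of-envelope (all numbers below are hypothetical, not from the paper): because the CDN already absorbs most requests for popular photos, the traffic that reaches the Store is dominated by the long tail, so a metadata cache limited to popular photos would rarely hit and most reads would still pay extra metadata disk operations.

```python
# Hypothetical arithmetic, not figures from the paper: estimate disk operations
# per read at the Store when only "popular" metadata is cached in RAM.

tail_fraction = 0.8   # assumed share of Store traffic that is long-tail
ops_on_hit    = 1     # metadata already in RAM -> one data read
ops_on_miss   = 3     # assumed extra metadata reads before the data read

partial_cache = tail_fraction * ops_on_miss + (1 - tail_fraction) * ops_on_hit
full_in_ram   = 1.0   # all metadata in RAM: always one disk op per read

print(f"partial metadata cache: ~{partial_cache:.1f} disk ops per read")
print(f"all metadata in RAM:     {full_in_ram:.1f} disk op per read")
```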
Implementation of Haystack
[Diagram, Old Way: Browser → CDN → Photo Store Server → NAS, with the Web Server handing out photo URLs]
[Diagram, New Way: Browser → CDN → Haystack Cache → Haystack Store, with the Web Server consulting the Haystack Directory]
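A small sketch of the new read path (an assumption-laden illustration, not production code; host names and the mapping structure are invented): the Haystack Directory tells the web server which logical volume and Store machine hold a photo and builds a URL of the form http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩, which each component strips as the request passes through.

```python
# Illustrative sketch of the Directory's role in the new read path.
from dataclasses import dataclass

@dataclass
class PhotoLocation:
    machine_id: int        # Store machine holding a replica
    logical_volume: int    # logical volume the photo was written to

# The Directory's mapping from photo id to location (assumed structure).
directory: dict[int, PhotoLocation] = {
    1234: PhotoLocation(machine_id=7, logical_volume=42),
}

def build_photo_url(photo_id: int, cdn: str, cache: str) -> str:
    """Build a URL like http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>."""
    loc = directory[photo_id]
    return f"http://{cdn}/{cache}/{loc.machine_id}/{loc.logical_volume}/{photo_id}"

# The browser fetches this URL; the CDN strips its part, the Cache strips its
# part, and the Store machine finally reads the photo from the logical volume.
print(build_photo_url(1234, cdn="cdn.example.com", cache="cache.example.com"))
```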
Goal #1: High Throughput and Low Latency
Photos must be served quickly for a good user experience
Users should not be able to sense different performance for old and new photos
Goal of one disk operation per read
Possible because metadata is drastically reduced and kept entirely in memory
Requests that exceed capacity can't simply be ignored; they must be handled by a CDN, which is expensive and limited by diminishing returns
Goal #2: Fault Tolerance
Large-scale systems have failures every day
As in GFS, failures are the norm rather than the exception
Users must have 24/7 availability of photos regardless of failures
Made possible through replication of data in geographically distinct locations
Goal #3: Cost Effectiveness
Higher performance and lower cost than the NFS/POSIX-based approach
Quantified by two metrics:
Cost per terabyte of usable storage (~28% less)
Normalized read rate per terabyte of usable storage (~4x the read rate)
Goal #4: Simplicity
Easy to implement (deployable in months instead of years)
Easy to maintain
Deploying Haystack Quickly
A Haystack Store volume is one large file that occupies the entire physical volume
This structure can be implemented on top of an already existing file system to speed development
Haystack Stores use XFS, a robust file system commonly used on Linux and UNIX systems
[Diagram: one large append-only file holding all photos on a physical volume]
Q: "That simplicity let us build and deploy a working system in a few months instead of a few years." Comment on this statement (why can Haystack be considered a simple adaptation of Unix file systems?)
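Not from the slides, just a sketch of what "one large file" buys (the record layout below is an assumption, not Haystack's exact needle format): new photos are appended to the volume file, and because each record is self-describing, the in-memory index can be rebuilt by a sequential scan after a restart.

```python
# Illustrative append-only volume sketch built on an ordinary file.
import os
import struct

HEADER = struct.Struct("<QI")   # photo id (8 bytes) + data size (4 bytes)

def append_photo(volume_fd: int, index: dict[int, tuple[int, int]],
                 photo_id: int, data: bytes) -> None:
    """Append one photo record to the single large volume file and index it."""
    end = os.lseek(volume_fd, 0, os.SEEK_END)
    os.write(volume_fd, HEADER.pack(photo_id, len(data)) + data)
    index[photo_id] = (end + HEADER.size, len(data))   # offset, size for reads

def rebuild_index(volume_path: str) -> dict[int, tuple[int, int]]:
    """Recover the in-memory index by scanning the volume file sequentially."""
    index: dict[int, tuple[int, int]] = {}
    with open(volume_path, "rb") as f:
        while len(header := f.read(HEADER.size)) == HEADER.size:
            photo_id, size = HEADER.unpack(header)
            index[photo_id] = (f.tell(), size)
            f.seek(size, os.SEEK_CUR)   # skip over the photo bytes
    return index
```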
Why Not GFS?
Main use cases for Facebook storage: development data, log files, photos; a GFS-style system fits the first two, but:
GFS did not have the correct RAM-to-disk ratio to store all photo metadata in memory
Long-tail access becomes a problem when only partial metadata is cached
GFS is best suited for a small number of large files, not a large number of very small files!
Q: "...we explored whether it would be useful to build a system similar to GFS." Comment on this statement. Why does serving photo requests in the long tail represent a problem on GFS?
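A hedged back-of-envelope on the RAM-to-disk ratio argument (the per-object byte counts below are rough assumptions for illustration, not figures quoted from the paper): with hundreds of billions of stored images, filesystem-style per-file metadata multiplies into a RAM budget that is hard to provision, while a stripped-down record of a few bytes per photo stays affordable.

```python
# Rough, hypothetical arithmetic: how much RAM does per-photo metadata need
# at Facebook's scale?

photos = 260e9                   # stored images, from the scale slide
fs_metadata_bytes = 500          # assumed: inode-sized per-file metadata
haystack_metadata_bytes = 10     # assumed: stripped-down per-photo record

for label, per_photo in [("filesystem-style", fs_metadata_bytes),
                         ("Haystack-style", haystack_metadata_bytes)]:
    total_tb = photos * per_photo / 1e12
    print(f"{label:17s}: ~{total_tb:,.0f} TB of RAM across the fleet")
```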
Questions? Discussion?