Tools for Social Networking Infrastructures

Cassandra - a decentralised structured storage system

Problem: Facebook Inbox Search
- hundreds of millions of users
- distributed infrastructure
- inbox contents change constantly
- must scale easily
- must deal with failures
- must keep the cost low

Cassandra - a decentralised structured storage system

Existing Solutions
- Coda, Ficus
- Google File System
- Bayou
- Dynamo
- Bigtable

Cassandra - a decentralised structured storage system

Data Model
- Table: a multidimensional map indexed by a key
- Row key: a string with no restrictions, usually 16-36 bytes long
- Column family: a group of columns
- Super column family: a column family within a column family
- Sorting: columns can be sorted by name or by time
- Multiple tables can coexist in one cluster
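
To make the map-of-maps model concrete, here is a minimal Java sketch of a table as nested sorted maps. The names and layout are invented for illustration; this is not Cassandra's API or on-disk format:

    import java.util.SortedMap;
    import java.util.TreeMap;

    // table: row key -> column family -> column name -> value
    public class DataModelSketch {
        public static void main(String[] args) {
            SortedMap<String, SortedMap<String, SortedMap<String, byte[]>>> table =
                    new TreeMap<>();

            // Insert one column into the "inbox" family of row "user42".
            table.computeIfAbsent("user42", k -> new TreeMap<>())
                 .computeIfAbsent("inbox", k -> new TreeMap<>())
                 .put("msg-0001", "hello".getBytes());

            // Columns within a family come back sorted by name.
            System.out.println(table.get("user42").get("inbox").keySet());
        }
    }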

Cassandra - a decentralised structured storage system

System Architecture
- a read/write request can go to any node in the cluster; that node determines the replicas for the key
- write: the request is routed to the replicas, and the system waits until a quorum has acknowledged it
- read:
  - weak: the first replica response is sent back to the client
  - strong: the system waits for a quorum of responses
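
A sketch of the weak/strong read distinction, assuming each replica read is an asynchronous future (illustrative only, not Cassandra's code): a weak read returns the first response, a strong read blocks until a quorum has answered.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class ReadConsistencySketch {
        // Weak read: first replica response wins.
        static String weakRead(List<CompletableFuture<String>> replicaReads) {
            return (String) CompletableFuture
                    .anyOf(replicaReads.toArray(new CompletableFuture[0]))
                    .join();
        }

        // Strong read: wait until a majority of replicas has answered.
        static String strongRead(List<CompletableFuture<String>> replicaReads) {
            int quorum = replicaReads.size() / 2 + 1;
            String answer = null;
            int acked = 0;
            for (CompletableFuture<String> read : replicaReads) {
                answer = read.join();   // a real system would pick the newest version
                if (++acked >= quorum) break;
            }
            return answer;
        }
    }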

Cassandra - a decentralised structured storage system

System Architecture - Partitioning
- consistent hashing: each node is assigned a random position on the ring
- arrival or departure of a node only affects its immediate neighbours on the ring
- load balancing: lightly loaded nodes can move on the ring to alleviate heavily loaded ones
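
A minimal consistent-hashing ring in Java (a sketch, not Cassandra's implementation; a real system would use a stronger hash than hashCode): a key is served by the first node at or after its hash, wrapping around at the end of the ring.

    import java.util.SortedMap;
    import java.util.TreeMap;

    public class Ring {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addNode(String node) {
            ring.put(node.hashCode(), node);   // pseudo-random ring position
        }

        String nodeFor(String key) {
            SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
            return tail.isEmpty() ? ring.firstEntry().getValue()   // wrap around
                                  : tail.get(tail.firstKey());
        }
    }

The sorted map makes the "first node at or after the hash" lookup a single tailMap call, which is why tree-based maps are a common choice for ring implementations.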

Cassandra - a decentralised structured storage system

System Architecture - Replication
- replication provides high availability and durability
- the coordinator node for a key is in charge of its replicas
- various replication policies, e.g. Rack Unaware: use the N-1 consecutive nodes on the ring after the coordinator
- replication metadata is kept in Zookeeper
- replication across multiple data centers allows operation to continue without downtime when an entire data center fails
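
Building on the ring sketch above, a hedged sketch of the Rack Unaware policy (illustrative, not Cassandra's code): the replicas for a key are the coordinator plus the N-1 distinct nodes that follow it clockwise on the ring.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    public class RackUnawareSketch {
        static List<String> replicasFor(TreeMap<Integer, String> ring,
                                        int keyHash, int n) {
            List<String> replicas = new ArrayList<>();
            Integer pos = ring.ceilingKey(keyHash);        // coordinator position
            if (pos == null) pos = ring.firstKey();        // wrap around
            while (replicas.size() < Math.min(n, ring.size())) {
                replicas.add(ring.get(pos));
                pos = ring.higherKey(pos);                 // next node clockwise
                if (pos == null) pos = ring.firstKey();    // wrap around
            }
            return replicas;
        }
    }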

Cassandra - a decentralised structured storage system

System Architecture - Membership
- Scuttlebutt: an efficient Gossip-based mechanism
- Φ Accrual Failure Detector: emits a value (Φ) that represents the suspicion level for each node
- Φ is calculated from the arrival times of gossip messages, modelled with an exponential distribution
- a threshold on Φ decides when a node is suspected to be down
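
The suspicion value can be sketched directly from the exponential model (illustrative; the paper's detector maintains a sliding window of inter-arrival times rather than one fixed mean): Φ = -log10 P(no gossip for this long), so Φ grows linearly with silence.

    public class PhiSketch {
        // Phi = -log10 P(silence of this length), with P(t) = exp(-t / mean).
        static double phi(double millisSinceLastGossip, double meanIntervalMillis) {
            double pLater = Math.exp(-millisSinceLastGossip / meanIntervalMillis);
            return -Math.log10(pLater);
        }

        public static void main(String[] args) {
            // With a 1 s mean interval, ~11.5 s of silence yields Phi ~= 5;
            // each unit of Phi makes a false suspicion ten times less likely.
            System.out.println(phi(11_500, 1_000));
        }
    }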

Cassandra - a decentralised structured storage system

System Architecture - Persistence
- data is stored both in memory and on the file system
- a write consists of: 1. a commit log update on the file system, 2. an update to an in-memory data structure
- when the in-memory data structure crosses a size threshold, it is saved to a data file on disk
- all writes are sequential and generate an index for lookup
- a merge process runs in the background to collate the data files
- lookup: first check memory, then check the files on disk from newest to oldest
- a bloom filter is used to check whether a file might contain a key
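
A minimal Bloom filter sketch in Java (illustrative, not Cassandra's implementation), showing why a negative answer lets a lookup skip a data file without touching disk:

    import java.util.BitSet;

    public class BloomSketch {
        private static final int SIZE = 1 << 20;   // number of bits
        private static final int NUM_HASHES = 3;
        private final BitSet bits = new BitSet(SIZE);

        // Double hashing derives NUM_HASHES indexes from two base hashes.
        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
            return Math.floorMod(h1 + i * h2, SIZE);
        }

        void add(String key) {
            for (int i = 0; i < NUM_HASHES; i++) bits.set(index(key, i));
        }

        boolean mightContain(String key) {
            for (int i = 0; i < NUM_HASHES; i++)
                if (!bits.get(index(key, i))) return false; // definitely absent
            return true;                                    // possibly present
        }
    }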

Cassandra - a decentralised structured storage system

Practical experiences
- data was moved from MySQL to Cassandra using Map/Reduce: 7 TB of inbox data for over 100M users
- different failure detectors produce very different detection times: the Φ accrual detector detects failures in about 15 s, versus over 120 s for other detectors
- Cassandra is decentralised, but uses Zookeeper for some coordination
- Inbox Search: 50+ TB of data stored on a 150-node cluster

Kafka: a Distributed Messaging System for Log Processing

Problem: managing large amounts of log data
- log processing has become a critical component of the data pipeline for consumer internet companies
- activity data is part of the production data pipeline and is used directly in site features
- every day, China Mobile collects 5-8 TB of phone call records and Facebook gathers almost 6 TB of various user activity events
- the system should be distributed, scalable, and offer high throughput
- log consumption should be possible in real time

Kafka: a Distributed Messaging System for Log Processing

Existing solutions
- early systems for processing this kind of data relied on physically scraping log files off production servers
- most systems are designed to collect and load log data into a data warehouse for offline consumption
- systems that allow online consumption are usually overcomplicated, which results in lower performance
- almost no existing systems support a pull model

Kafka: a Distributed Messaging System for Log Processing

Architecture
- a publish/subscribe system
- producers send messages to topics
- consumers consume messages from topics
- messages are transferred via brokers
- to balance load, a topic is divided into partitions, and each broker stores one or more of those partitions

Kafka: a Distributed Messaging System for Log Processing

API
- messages can be batched together to reduce overhead
- multiple producers and consumers can publish and retrieve messages at the same time
- messages are evenly distributed among consumer streams

Sample producer code:

    producer = new Producer(...);
    message = new Message("test message str".getBytes());
    set = new MessageSet(message);
    producer.send("topic1", set);

Sample consumer code:

    streams[] = Consumer.createMessageStreams("topic1", 1);
    for (message : streams[0]) {
        bytes = message.payload();
        // do something with the bytes
    }

Kafka: a Distributed Messaging System for Log Processing

Architecture - Simple Storage
- each partition corresponds to a logical log
- messages have no id other than their file offset
- messages are consumed in order; the consumer keeps its own state
- messages are deleted after a retention period

Efficient transfer
- one pull request retrieves multiple messages
- the Linux sendfile API is used to avoid extra copies
- no application-level cache: Kafka relies on the file system page cache
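
In Java, the sendfile path corresponds to FileChannel.transferTo, which moves bytes from the page cache straight to the socket without passing them through application buffers. A hedged sketch (not Kafka's broker code; the method and path names are invented):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        // Sends `count` bytes of a log segment starting at `offset`.
        static long sendLogSegment(Path segment, long offset, long count,
                                   SocketChannel socket) throws IOException {
            try (FileChannel log = FileChannel.open(segment, StandardOpenOption.READ)) {
                return log.transferTo(offset, count, socket); // sendfile under the hood
            }
        }
    }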

Kafka: a Distributed Messaging System for Log Processing

Distributed Coordination
- all messages from one partition are consumed by a single consumer within a group
- consumers coordinate using Zookeeper:
  - detecting consumer/broker changes
  - triggering the rebalance process
  - keeping track of the consumed offset
- rebalancing happens on every broker/consumer change

Delivery
- at-least-once delivery guarantee
- in-order delivery within one partition
- no broker redundancy
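
A sketch of the ephemeral-node pattern behind this coordination, using ZooKeeper's Java client API but invented paths and names: the registration znode is deleted by ZooKeeper itself when the consumer's session expires, which is what makes consumer crashes observable and lets the group trigger a rebalance.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ConsumerRegistration {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15_000, event -> {});
            // Ephemeral node: removed automatically when this session dies.
            // (Parent paths are assumed to already exist.)
            zk.create("/consumers/group1/ids/consumer-1",
                      "topic1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL);
        }
    }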

Kafka: a Distributed Messaging System for Log Processing

LinkedIn usage
- a Kafka cluster is co-located with each datacenter
- services publish to the local Kafka brokers
- a hardware load balancer distributes publish requests evenly
- online consumers run within the same datacenter
- a separate datacenter is used for offline analysis

Statistics:
- end-to-end latency: ~10 seconds
- hundreds of gigabytes of data, around a billion messages a day

Kafka: a Distributed Messaging System for Log Processing

Experimental results

Workload Analysis of a Large-Scale Key-Value Store

Memcached - a distributed hash table
- typically used as a caching layer in the data-retrieval hierarchy
- exposes data in RAM to clients over the network
- capacity is expanded by adding RAM or more servers
- consistent hashing determines which server holds each key
- stored items can have different sizes: memory is divided into slab classes, and each object is stored in the matching class
- LRU is used for cache eviction
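
A sketch of the slab idea (illustrative; real Memcached uses a configurable growth factor, 1.25 by default, with per-class free lists): chunk sizes grow geometrically, and an object goes into the smallest class whose chunks fit it.

    import java.util.ArrayList;
    import java.util.List;

    public class SlabSketch {
        // Chunk sizes grow geometrically from minChunk up to maxChunk.
        static int[] buildClasses(int minChunk, int maxChunk, double growth) {
            List<Integer> sizes = new ArrayList<>();
            for (double s = minChunk; s <= maxChunk; s *= growth)
                sizes.add((int) s);
            return sizes.stream().mapToInt(Integer::intValue).toArray();
        }

        // An object is stored in the smallest class whose chunks fit it.
        static int classFor(int[] classes, int objectSize) {
            for (int i = 0; i < classes.length; i++)
                if (objectSize <= classes[i]) return i;
            return -1; // larger than any chunk: not cacheable
        }

        public static void main(String[] args) {
            int[] classes = buildClasses(96, 1 << 20, 1.25);
            System.out.println(classFor(classes, 500)); // smallest class fitting 500 B
        }
    }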

Workload Analysis of a Large-Scale Key-Value Store

Methodology
- a kernel module was used for sniffing the traffic
- the captured traces are 3-7 TB in size
- Apache Hive was used for the analysis
- the traces were compared with application logs for verification

Workload Analysis of a Large-Scale Key-Value Store

Pools used in the study

Workload Analysis of a Large-Scale Key-Value Store

Key and value size distributions for all traces
- the sizes of keys, up to Memcached's limit of 250 B (not shown)
- the sizes of values
- value sizes aggregated by the total amount of data they use in the cache

Workload Analysis of a Large-Scale Key-Value Store

Temporal Patterns
Figure 3: Request rates at different dates and times of day, Coordinated Universal Time (UTC).

Workload Analysis of a Large-Scale Key-Value Store

Cache behaviour
- hit rates and reasons for misses
- locality: how often keys repeat
- locality over time: how many keys do not repeat within temporal proximity
- reuse period: the time between consecutive accesses to a key
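
Reuse periods are straightforward to compute from an access trace; a minimal sketch (not the paper's analysis code):

    import java.util.HashMap;
    import java.util.Map;

    public class ReusePeriodSketch {
        private final Map<String, Long> lastSeen = new HashMap<>();

        // Returns the reuse period for this access, or -1 on a first access.
        long recordAccess(String key, long timestampMillis) {
            Long prev = lastSeen.put(key, timestampMillis);
            return prev == null ? -1 : timestampMillis - prev;
        }
    }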

Workload Analysis of a Large-Scale Key-Value Store

Hit rates & miss categories
Table 3: Miss categories in the last 24 hours of the ETC trace.

Workload Analysis of a Large-Scale Key-Value Store

Statistical modelling

Workload Analysis of a Large-Scale Key-Value Store

Discussion
- hit rates are inversely correlated with pool size
- hit rates are not correlated with locality
- hit rates can be improved by increasing RAM or by using a different cache eviction policy

Sidenote: making clear graphs

Questions
1. Cassandra's main concern is write throughput; what are the trade-offs?
2. Why is Kafka faster than the other services to which it was compared?
3. What are the possible causes of data loss in Kafka?
4. What are the advantages of using a consensus service (Zookeeper) vs. a replicated master node?
5. How is churn handled in the different systems?
6. What is the problem with using Memcached as persistent storage (e.g. the USR pool)?