Building Interactive Distributed Processing Applications at a Global Scale


Building Interactive Distributed Processing Applications at a Global Scale Shadi A. Noghabi Prelim Committee: Prof. Roy Campbell (advisor), Prof. Indy Gupta (advisor), Prof. Klara Nahrstedt, Dr. Victor Bahl 1

Many Interactive Applications: 100s of PBs of data with responses in < a few seconds; trillions of events in < a few minutes; millions to billions of devices served in < a few milliseconds. 2

Low Latency at Large Scale. Latency-sensitive: reaching low latency is important for interactivity; 100 ms of latency (Amazon) ~ 1% loss in revenue, about $4M per millisecond!* Large scale: processing at large and global scales. My focus: building interactive systems at production scale. * https://blog.gigaspaces.com/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/ 3

At Scale, Low Latency is Challenging: heterogeneity (data, machines, operations); dynamism (workload, environment); distribution (state & replication); imbalance. 4

Thesis Statement: Latency-driven designs can be used to build production infrastructures for interactive applications at a global scale, while addressing myriad challenges of heterogeneity, dynamism, state, and imbalance. 5

Latency-driven Design: achieving low latency has the highest priority (cf. PACELC [1,2]); latency is prioritized over consistency, availability, throughput, and isolation. 1. Abadi, "Consistency Tradeoffs in Modern Distributed Database System Design," IEEE Computer 2012. 2. Gilbert and Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," SIGACT News 2002. 6

Related Work. Latency-driven in other frameworks: the trend of NoSQL systems such as Cassandra, Amazon's DynamoDB, and Pileus. Latency-driven does not fit all. Consistency-driven: consistent storage (BigTable, Spanner, Yahoo's PNUTS, Espresso); exactly-once stream processing (Flink, MillWheel, Spark Streaming, Trident). Throughput-driven: batch processing (Hadoop, Spark, Tez, Pig, and Hive); micro-batch streaming (Spark Streaming, media processing); high-throughput storage (HDFS and GFS). 7

Latency-driven Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, and opportunistic processing. 8

Latency-driven at All Layers: across the edge and datacenters (Edge, DC 1, DC 2, ..., DC n), the thesis covers Storage (Ambry [SIGMOD 16]: main production system at LinkedIn, >4 years in production, 500M users), Processing (Samza [VLDB 17]: used in production with 100s of applications at LinkedIn and at ~15 companies), Edge (Steel [HotCloud 18]: integrated with Azure & Windows IoT), and Networking (FreeFlow [HotNets 17]: general container networking applicable to Docker, Kubernetes, etc.). 9

Latency-driven Techniques, by layer: Processing (Samza [VLDB 17]), Storage (Ambry [SIGMOD 16]), edge-cloud (Steel [HotCloud 18]), Networking (FreeFlow [HotNets 17]). Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 10-11

Samza: Stateful Scalable Stream Processing at LinkedIn VLDB 17 Shadi A. Noghabi*, Kartik Paramasivam^, Yi Pan^, Navina Ramesh^, Jon Bringhurst^, Indranil Gupta*, Roy Campbell* * University of Illinois at Urbana-Champaign ^ LinkedIn Corp.

Stream Processing* Interactive feeds or ads Security & Monitoring Internet of Things *Stream processing is the processing of data in motion, i.e., computing on data immediately as it is produced 13

Stream Applications need State. Example: a DoS identifier consumes an access stream of (IP, access time) events, maintains a per-IP count table (e.g., 1.1.1.1 -> 2, 2.3.2.1 -> 30), and emits a DoS stream of attacker IPs. State: durable data read/written along with the processing. Many cases need large state: aggregation (aggregating counts over a window), join (two or more streams over a window), other (e.g., a machine learning model). 14
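To make the state concrete, here is a minimal Java sketch (illustrative only, not Samza's API) of the DoS identifier's per-IP count state, read and written along with each event; the attack threshold is an assumed value.

import java.util.HashMap;
import java.util.Map;

// Sketch of stateful stream processing: per-IP access counts kept as task state.
// The threshold is a hypothetical value chosen for illustration.
class DosIdentifier {
    private final Map<String, Long> ipCounts = new HashMap<>(); // state: IP -> access count
    private static final long THRESHOLD = 1000;

    // Called for every (ip, accessTime) event on the access stream.
    void onAccess(String ip, long accessTimeMs) {
        long count = ipCounts.merge(ip, 1L, Long::sum);          // read-modify-write of state
        if (count == THRESHOLD) {
            emit(ip);                                            // emit to the DoS (attacker IP) stream
        }
    }

    private void emit(String attackerIp) {
        System.out.println("possible attacker: " + attackerIp);
    }
}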

Scale of Processing. Large scale of input, in Kafka alone: 2.1 trillion msg/day, 16 million msg/sec at peak, 0.5 PB in and 2 PB out per day. Many applications: 100s of production applications, over 10,000 containers. Large scale of state: several 100s of TBs for a single application. 15

Stateful Stream Processing. Need to handle state with low latency, at large scale, and with correct (consistent) results. http://samza.apache.org/ Open-source Apache project; powers hundreds of apps in LinkedIn's production. In use at LinkedIn, Uber, Metamarkets, Netflix, Intuit, TripAdvisor, VMware, Optimizely, Redfin, etc. 16

Stateful Stream Processing. Processing model: input partitioning; parallel and independent tasks; locality-aware data passing. Stateful: local state; incremental checkpointing; 3-tier caching; parallel recovery; host stickiness; compaction. 17-18

Processing from a Partitioned Source. A stream of events (e.g., page accesses, ad views) is split into Partition 1 through Partition k; each partition is consumed by its own task (Task 1 through Task k). Partitioning & parallelism. 19-22

Multi-Stage Dataflow, Locality. Application logic: count the number of page accesses for each IP domain in a 5-minute window. The in-stream (Page Access) flows through the application DAG in tasks (Window count -> Filter -> SendTo) to the out-stream (DoS Stream). Data parallelism: each task performs the entire pipeline on its own chunk of input data. 23
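A plain-Java sketch of the per-task pipeline described above (not the actual Samza DAG API): each task windows page accesses by IP domain over 5 minutes, filters the counts, and sends suspicious domains downstream; the filter threshold and method names are assumptions.

import java.util.HashMap;
import java.util.Map;

// One task runs the whole pipeline (window count -> filter -> sendTo) on its own
// partition of the input, i.e., data parallelism across tasks.
class PageAccessTask {
    private static final long WINDOW_MS = 5 * 60 * 1000;   // 5-minute window (from the slide)
    private static final long MIN_COUNT = 100;             // hypothetical filter threshold

    private final Map<String, Long> countsPerDomain = new HashMap<>();
    private long windowStartMs = 0;

    // Window count: aggregate accesses per IP domain within the current window.
    void onPageAccess(String ipDomain, long eventTimeMs) {
        if (windowStartMs == 0) windowStartMs = eventTimeMs;
        if (eventTimeMs - windowStartMs >= WINDOW_MS) {
            flushWindow();
            windowStartMs = eventTimeMs;
        }
        countsPerDomain.merge(ipDomain, 1L, Long::sum);
    }

    // Filter + SendTo: emit only domains whose count crosses the threshold.
    private void flushWindow() {
        countsPerDomain.forEach((domain, count) -> {
            if (count >= MIN_COUNT) {
                sendTo("dos-stream", domain + "=" + count);  // out-stream
            }
        });
        countsPerDomain.clear();
    }

    private void sendTo(String stream, String msg) {
        System.out.println(stream + " <- " + msg);
    }
}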

Stateful Stream Processing. Processing model: input partitioning; parallel and independent tasks; locality-aware data passing. Stateful: local state; incremental checkpointing; 3-tier caching; parallel recovery; host stickiness; compaction. 24-25

Stateful Processing. A Samza application with Task 1 through Task k stores the count of accesses per IP domain (state: IP access counts). But using a remote store is slow. 26-27

Stateful Processing. Local state: use the local storage of tasks; each task holds the state corresponding to its own partition. Ex: Task 1, processing input [1-100], holds state for [1-100]. Locality. 28

Stateful Processing. Local storage options: in-memory is fast but low capacity (GBs); on-disk is high capacity (TBs) and persistent, but slow; disk-mem (a persistent on-disk store plus an in-memory cache) is fast, high capacity, and persistent. Tiered data access. 29
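A toy Java sketch of the disk-mem tier (not Samza's actual on-disk store, which is RocksDB-based): an in-memory LRU cache in front of a file-backed map, so hot keys are read at memory speed while every write stays persistent; the cache size and file format are arbitrary choices.

import java.io.*;
import java.util.*;

// Toy "disk-mem" store: LRU cache over a file-backed Properties map.
// Illustrative only; a real store batches writes and indexes far more efficiently.
class TieredStore {
    private final File file;
    private final Properties disk = new Properties();                  // persistent tier
    private final LinkedHashMap<String, String> cache =                // memory tier (LRU, 1000 entries)
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, String> e) { return size() > 1000; }
        };

    TieredStore(File file) throws IOException {
        this.file = file;
        if (file.exists()) try (InputStream in = new FileInputStream(file)) { disk.load(in); }
    }

    String get(String key) {
        String v = cache.get(key);                                      // fast path: memory
        if (v == null) {
            v = disk.getProperty(key);                                  // slow path: disk
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    void put(String key, String value) throws IOException {
        cache.put(key, value);
        disk.setProperty(key, value);
        try (OutputStream out = new FileOutputStream(file)) { disk.store(out, null); } // persist
    }
}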

Stateful Processing. What about failures? How do we avoid losing state? 30

Fault-Tolerance (Background processing). State changes are saved to a durable changelog (e.g., a Kafka log-compacted topic, with one partition per task: partition 1, partition 2, ..., partition k); changes are periodically batched and flushed in the background. On a task failure, the task recovers its state (e.g., X = 10) by replaying its changelog partition.
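A simplified Java sketch of this changelog scheme (a List stands in for the Kafka log-compacted partition): every state update is applied to the local store and appended to the changelog, and recovery rebuilds local state by replaying it; batching, flushing, and compaction are elided.

import java.util.*;

// Sketch of incremental checkpointing: each put is mirrored to a per-task changelog.
// Here the changelog is an in-memory List; in Samza it is a log-compacted Kafka partition.
class ChangelogBackedState {
    private final Map<String, Long> localStore = new HashMap<>();      // local state of the task
    private final List<String[]> changelog = new ArrayList<>();        // stand-in for the Kafka partition

    void put(String key, long value) {
        localStore.put(key, value);                                     // 1. apply locally
        changelog.add(new String[] {key, Long.toString(value)});        // 2. append change (flushed in background)
    }

    Long get(String key) { return localStore.get(key); }

    // Recovery after a failure: replay the changelog to rebuild local state.
    void restore() {
        localStore.clear();
        for (String[] change : changelog) {
            localStore.put(change[0], Long.parseLong(change[1]));       // last write wins (log compaction)
        }
    }
}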

But, what is the cost? Latency, consistency, availability: consistency is preserved via replayable input + consistent snapshots, while availability suffers from longer failure recovery. 35-36

Consistency Guarantees. Each task checkpoints its input offset (e.g., Offset 1005) to an offsets topic (e.g., a Kafka topic) together with flushing its state changes (e.g., X = 10) to its changelog partition, so recovery restores a consistent snapshot of state and input position.
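Building on the previous sketch, a hedged illustration of this guarantee: flush state changes to the changelog before recording the input offset, so recovered state is never ahead of the recorded offset; the method names are hypothetical.

// Illustrative commit protocol (not the literal Samza code path):
// flushing state changes before recording the offset keeps state and offset consistent.
class TaskCheckpointer {
    private long committedOffset = -1;          // stand-in for the Kafka offsets topic
    private final ChangelogBackedState state;   // from the previous sketch

    TaskCheckpointer(ChangelogBackedState state) { this.state = state; }

    void commit(long currentInputOffset) {
        flushChangelog();                        // 1. make state changes durable
        committedOffset = currentInputOffset;    // 2. then record how far the input was consumed
    }

    void recover() {
        state.restore();                         // rebuild state from the changelog
        resumeFrom(committedOffset + 1);         // re-read any input after the checkpoint
    }

    private void flushChangelog() { /* batch & flush pending changes */ }
    private void resumeFrom(long offset) { /* seek the input consumer */ }
}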

Availability Optimizations. Availability is compromised; to improve it we use parallel recovery (partitioning & parallelism), background compaction (background processing), and host stickiness (locality). 38

Fast Restarts with Host Stickiness. A durable task-container-host mapping is maintained (e.g., Task 1, Task 4 -> Host-A; Task 2 -> Host-B; Task 3 -> Host-C). Host stickiness in YARN: try to place a task on the same host after a restart, minimizing the state-rebuilding overhead.
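A small sketch (hypothetical names, not the YARN allocation code) of how a durable task-to-host mapping drives sticky placement: a restarted task is sent back to the host that already holds its on-disk state, so only the tail of its changelog needs replaying.

import java.util.*;

// Sketch of sticky task placement: prefer the host recorded for the task in a durable mapping.
class StickyAllocator {
    private final Map<String, String> taskToHost;   // durable task -> host mapping (e.g., Task-1 -> Host-A)
    private final Set<String> aliveHosts;

    StickyAllocator(Map<String, String> taskToHost, Set<String> aliveHosts) {
        this.taskToHost = taskToHost;
        this.aliveHosts = aliveHosts;
    }

    // Returns the preferred host for a task, or null meaning "any host" (state must be rebuilt).
    String preferredHost(String task) {
        String last = taskToHost.get(task);
        return (last != null && aliveHosts.contains(last)) ? last : null;
    }
}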

Related Work. Stateless (do not support state): Apache Storm [SIGMOD 14], Heron [SIGMOD 15], S4 [ICDM 10]. Remote store (relying on external storage): Trident, MillWheel [VLDB 13], Dataflow [VLDB 15]; high latency overhead per input message, but fast recovery (better availability). Local state with full-state checkpointing: Flink [TCDE 15], Spark Streaming [HotCloud 12], IBM Streams [VLDB 16], StreamScope [NSDI 16], System S [DEBS 11]; does not scale for large application state, but gives easier repeatable results on failure. 40

Evaluation 41

Evaluation Setup. Production cluster: 500-node YARN cluster running real-world applications. Small cluster: 8 nodes (64 GB RAM, 24-core CPUs, a 1.6 TB SSD each) running micro-benchmarks: a read-only workload (~ join with a table) and a read-write workload (~ aggregation over time). 42

Local State -- Latency. Remote state is > 2 orders of magnitude slower compared to local state; the changelog adds minimal overhead; on-disk with caching is comparable with in-memory. (Stores compared: in-mem, on-disk, disk-mem, remote; fault-tolerance: none, changelog (Clog), remote.) 43

Failure Recovery. Recovery time grows linearly with the size of state; with host stickiness the overhead is almost constant; the overhead is independent of the % of failures (parallel recovery). 44

Summary of State in Samza. Pros: 100x better latency; better resource utilization (no need for remote DB resources); parallel and independent processing. Cons: dependent on input partitioning (does NOT work when state is large and not co-partitionable with the input stream); auto-scaling becomes harder; recovery is slower (vs. remote).

Conclusion. Need to handle state with low latency, at large scale, and with correct (consistent) results. Processing model: input partitioning; parallel and independent tasks; locality-aware data passing. Stateful: local state; incremental checkpointing; 3-tier caching; parallel recovery; host stickiness; compaction. Techniques: locality, background processing, partitioning & parallelism. 46

Latency-driven Techniques, by layer: Processing (Samza [VLDB 17]), Storage (Ambry [SIGMOD 16]), edge-cloud (Steel [HotCloud 18]), Networking (FreeFlow [HotNets 17]). Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 47-48

Steel: Unified and Optimized Edge-Cloud Environment HotCloud 18 Shadi A. Noghabi*, John Kolb^, Peter Bodik**, Eduardo Cuervo**, Indranil Gupta*, Roy Campbell* * University of Illinois at Urbana-Champaign ^ University of California, Berkeley ** Microsoft Research 49

Cloud is Not a One-Size-Fits-All. Latency: applications need responses within < 80 ms, < 20 ms, or even < 10 ms, but the cloud is simply too far (> 70 ms round-trip time). Resources: bandwidth (10s-100s GB/s), battery, energy, and heating constraints mean there are not enough resources to use the cloud. Privacy & security are further concerns. 50-52

Edge Computing: small edge sites (Edge 1, Edge 2, Edge 3) sit between devices and the cloud. (Satyanarayanan, M., Bahl, P., et al., "The Case for VM-based Cloudlets in Mobile Computing," Pervasive Computing 2009.) 53

However, industry is in its infancy in building edge-cloud applications: there is no proper unification of edge & cloud; applications are hard to configure, deploy, and monitor; and optimizations are manual and redundant. Steel: a unified edge-cloud framework with modular and automated optimizations, integrated with production Azure services. 54

Steel overview. Applications are written against the Steel abstraction (a logical spec, e.g., a DAG of components A, B, C, D); the fabric compiles, deploys, and monitors & analyzes them over the edge-cloud ecosystem; optimization modules (placement, communication, load balancing) adjust the deployment. 55

Optimizer Modules. Placement: decides where (edge/cloud) to place each component; adapts to long-term changes; location and vicinity aware (locality); background profiling & what-if analysis (background processing); partitioned and parallel optimizations (partitioning & parallelism). Communication: configures edge-cloud links; adapts to short-term spikes; background batching & compression (background processing); changes strategy opportunistically based on available resources (opportunistic processing). 56
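As a hypothetical illustration of the placement decision (not Steel's actual optimizer), a component goes to the edge only when the edge has capacity and the profiled edge latency beats the cloud round trip; all inputs here would come from background profiling and what-if analysis.

// Hypothetical placement rule used for illustration only.
class PlacementModule {
    enum Location { EDGE, CLOUD }

    Location place(double edgeLatencyMs, double cloudLatencyMs,
                   double requiredCores, double freeEdgeCores) {
        boolean fitsOnEdge = requiredCores <= freeEdgeCores;   // resource constraint at the edge
        boolean edgeFaster = edgeLatencyMs < cloudLatencyMs;   // profiled latency comparison
        return (fitsOnEdge && edgeFaster) ? Location.EDGE : Location.CLOUD;
    }
}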

Latency-driven Techniques, by layer: Processing (Samza [VLDB 17]), Storage (Ambry [SIGMOD 16]), edge-cloud (Steel [HotCloud 18]), Networking (FreeFlow [HotNets 17]). Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 57-58

Ambry: LinkedIn's Scalable Geo-Distributed Object Store SIGMOD 16 Shadi A. Noghabi*, Sriram Subramanian+, Priyesh Narayanan+, Sivabalan Narayanan+, Gopalakrishna Holla+, Mammad Zadeh+, Tianwei Li+, Indranil Gupta*, Roy H. Campbell* * University of Illinois at Urbana-Champaign SRG: http://srg.cs.illinois.edu and DPRG: http://dprg.cs.uiuc.edu + LinkedIn Corporation Data Infrastructure team 59

Massive Media Objects are Everywhere 100s of Millions of users 100s of TBs to PBs From all around the world 60

How to handle all these objects? We need a geographically distributed system that stores and retrieves objects in a low-latency and scalable manner. Ambry: in production for >4 years and >500M users! Open source: https://github.com/linkedin/ambry 61

Geo-Distribution (Locality, Background processing). Asynchronous writes: Alice's write is performed synchronously only in her local datacenter (Datacenter 1), where success is counted, and the object (item_i) is asynchronously replicated to the other datacenters (Datacenter 2, Datacenter 3) by background replication. This reduces user-perceived latency and minimizes cross-DC traffic. 62-63
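A hedged Java sketch of this write path (not Ambry's actual code): the put is acknowledged once the local datacenter succeeds, and replication to the remote datacenters is handed to a background thread; the datacenter names and the executor are assumptions.

import java.util.*;
import java.util.concurrent.*;

// Sketch of asynchronous geo-replicated writes: success is counted in the local DC only,
// remote DCs are replicated in the background.
class GeoWriter {
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final String localDc = "DC1";                                // assumed local datacenter
    private final List<String> remoteDcs = Arrays.asList("DC2", "DC3");  // assumed remote datacenters

    boolean put(String blobId, byte[] data) {
        boolean localOk = writeToDatacenter(localDc, blobId, data);      // synchronous, user-perceived latency
        if (localOk) {
            background.submit(() ->                                       // asynchronous cross-DC replication
                remoteDcs.forEach(dc -> writeToDatacenter(dc, blobId, data)));
        }
        return localOk;                                                   // ack as soon as the local DC succeeds
    }

    private boolean writeToDatacenter(String dc, String blobId, byte[] data) {
        // placeholder for writing to the replicas of the blob's partition in that datacenter
        return true;
    }
}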

Other Techniques. Ambry is a large production system with many other techniques: logical data partitioning (partitioning & parallelism); caching, indexing, and bloom filters (tiered data access); background load balancing (load balancing). 64

Latency-driven Techniques, by layer: Processing (Samza [VLDB 17]), Storage (Ambry [SIGMOD 16]), edge-cloud (Steel [HotCloud 18]), Networking (FreeFlow [HotNets 17]). Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 65-66

FreeFlow: High Performance Container Networking HotNets 16 Tianlong Yu^, Shadi A. Noghabi*, Shachar Raindel, Hongqiang Liu, Jitu Padhye, and Vyas Sekar^ * University of Illinois at Urbana-Champaign, Microsoft Research, ^ Carnegie Mellon University 67

Containers are popular Containers provide good portability and isolation https://clusterhq.com/assets/pdfs/state-of-container-usage-june-2016.pdf 68

FreeFlow (Opportunistic processing, Locality). Containers on the same node (e.g., Container 1 and Container 2 on Node 1) communicate through shared memory; containers on different nodes (e.g., Container 3 on Node 2) communicate over RDMA across the network, with FreeFlow picking the path. 69
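A simplified sketch (not FreeFlow's actual mechanism) of the opportunistic choice it makes: containers that happen to share a node get a shared-memory path, while cross-node pairs go over the network (e.g., RDMA).

// Illustrative transport selection: co-located containers get the cheap local path.
class PathSelector {
    enum Path { SHARED_MEMORY, NETWORK }

    Path select(String srcContainerHost, String dstContainerHost) {
        // Locality: same physical node -> shared memory; otherwise cross-node network (e.g., RDMA).
        return srcContainerHost.equals(dstContainerHost) ? Path.SHARED_MEMORY : Path.NETWORK;
    }
}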

Latency-driven Techniques, by layer: Processing (Samza [VLDB 17]), Storage (Ambry [SIGMOD 16]), edge-cloud (Steel [HotCloud 18]), Networking (FreeFlow [HotNets 17]). Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 70

Future Directions One global system seamlessly across the globe Holistic and cross-layer approach Latency sensitivity-aware approaches (not treating all objects/apps equally) E.g., dynamic use of compression, erasure coding, compaction in storage for old objects E.g., multi-tenant stream processing environments Auto-pilot systems coping with changes automatically and dynamically 71

Thanks to Many Collaborators Advisors: Indranil Gupta and Roy Campbell LinkedIn: Sriram Subramanian, Priyesh Narayanan, Kartik Paramasivam, Yi Pan, Navina Ramesh, et al. Microsoft Research: Victor Bahl, Peter Bodik, Eduardo Cuervo, Hongqiang Liu, Jitu Padhye, et. al. 72

Publications
1. Steel: Unified Management and Optimization of Edge-Cloud Applications, SA Noghabi, J Kolb, P Bodik, E Cuervo, I Gupta, RH Campbell (in progress)
2. Steel: Simplified Development and Deployment of Edge-Cloud Applications, SA Noghabi, J Kolb, P Bodik, E Cuervo, HotCloud 2018
3. Samza: Stateful Scalable Stream Processing at LinkedIn, SA Noghabi, K Paramasivam, Y Pan, N Ramesh, J Bringhurst, I Gupta, RH Campbell, VLDB 2017
4. Ambry: LinkedIn's Scalable Geo-Distributed Object Store, SA Noghabi, S Subramanian, P Narayanan, S Narayanan, G Holla, M Zadeh, T Li, I Gupta, RH Campbell, SIGMOD 2016
5. FreeFlow: High Performance Container Networking, T Yu, SA Noghabi, S Raindel, H Liu, J Padhye, V Sekar, HotNets 2016
6. Steel: Unified Management and Optimization of Edge-Cloud IoT Applications, NSDI poster 2018
7. To Edge or Not to Edge?, F Kalim, SA Noghabi, S Verma, SoCC poster 2017
8. Building a Scalable Distributed Online Media Processing Environment, SA Noghabi, PhD workshop at VLDB 2016
9. Toward Fabric: A Middleware Implementing High-level Description Languages on a Fabric-like Network, SH Hashemi, SA Noghabi, J Bellessa, RH Campbell, ANCS 2016
10. Towards Enabling Cooperation Between Scheduler and Storage Layer to Improve Job Performance, M Pundir, J Bellessa, SA Noghabi, CL Abad, RH Campbell, PDSW 2015
73

Summary of Thesis. Latency-driven designs can be used to build production infrastructures for interactive applications at a global scale. Processing: Samza [VLDB 17], Steel [HotCloud 18]; Storage: Ambry [SIGMOD 16]; Networking: FreeFlow [HotNets 17]. Techniques: background processing, locality, partitioning & parallelism, tiered data access, load balancing, adaptive scaling, opportunistic processing. 74

Backup Slides 75

Theoretical Background. Latency-driven design can be modeled theoretically with queueing theory*. System: a hierarchical queueing system. Latency: response time (response_processed - req_enter_time). Techniques map to queueing-theory models/techniques: parallelism & partitioning -> multiple queues, parallel servers; adaptive scaling -> adding more servers. * Gross, Donald. Fundamentals of Queueing Theory. John Wiley & Sons, 2008. 76
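As a hedged worked example using standard M/M/1 results (not an analysis from the talk), the queueing model makes the effect of partitioning on latency explicit:

% Mean response time of an M/M/1 queue with arrival rate \lambda and service rate \mu (\lambda < \mu):
E[T] = \frac{1}{\mu - \lambda}
% Partitioning the input over k parallel tasks sends \lambda / k to each queue:
E[T_k] = \frac{1}{\mu - \lambda / k}
% Increasing k (partitioning & parallelism, adaptive scaling) drives E[T_k] toward 1 / \mu.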

Other Techniques. Batch + stream in one system: reprocess parts or all of a stream or database (for bugs, upgrades, logic changes); batch is treated as a finite stream, plus conflict resolution, scaling & throttling. Adaptive scaling. 77

Local State -- Throughput. Remote state is 30-150x worse than local state; on-disk with caching is comparable with in-memory; the changelog adds minimal overhead. 78

Snapshot vs Changelog 79