C3: INTERNET-SCALE CONTROL PLANE FOR VIDEO QUALITY OPTIMIZATION

Similar documents
Pytheas: Enabling Data-Driven Quality of Experience Optimization Using Group-Based Exploration-Exploitation

It s Not the Cost, It s the Quality! Ion Stoica Conviva Networks and UC Berkeley

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

Video AI Alerts An Artificial Intelligence-Based Approach to Anomaly Detection and Root Cause Analysis for OTT Video Publishers

A Routing Infrastructure for XIA

CFA: A Practical Prediction System for Video QoE Optimization

Q Conviva s State of the Streaming TV Industry

Upgrade Your MuleESB with Solace s Messaging Infrastructure

Enabling Data-Driven Optimization of Quality of Experience in Internet Applications. Junchen Jiang

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

Cache Management for TelcoCDNs. Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK)

Junchen Jiang. 513 N. Neville St. Apt C3, Pittsburgh, PA, URL: junchenj/

Sparrow. Distributed Low-Latency Spark Scheduling. Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

Junchen Jiang. 200 West 16th Street, New York, NY, URL:

The Edge: Delivering the Quality of Experience of Digital Content

QuartzV: Bringing Quality of Time to Virtual Machines

The Guide to Best Practices in PREMIUM ONLINE VIDEO STREAMING

Datacenter replication solution with quasardb

Testing & Assuring Mobile End User Experience Before Production Neotys

Bayeux: An Architecture for Scalable and Fault Tolerant Wide area Data Dissemination

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa

Internet Services and Search Engines. Amin Vahdat CSE 123b May 2, 2006

Oboe: Auto-tuning Video ABR Algorithms to Network Conditions

A Cloud Gateway - A Large Scale Company s First Line of Defense. Mikey Cohen Manager - Edge Gateway Netflix

Mobile Middleware Course. Mobile Platforms and Middleware. Sasu Tarkoma

HPC in Cloud. Presenter: Naresh K. Sehgal Contributors: Billy Cox, John M. Acken, Sohum Sohoni

TCP WISE: One Initial Congestion Window Is Not Enough

Data Center Performance

Design Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68

ebay Marketplace Architecture

Cloud Transcoder: Bridging the Format and Resolution Gap between Internet Videos and Mobile Devices

The Guide to Best Practices in PREMIUM ONLINE VIDEO STREAMING

Software-Defined Networking (SDN) Overview

Distributed Computation Models

November 2017 WebRTC for Live Media and Broadcast Second screen and CDN traffic optimization. Author: Jesús Oliva Founder & Media Lead Architect

expressive Internet Architecture:

Open Connect Overview

Managing the Subscriber Experience

Green networking: lessons learned and challenges Prof. Raffaele Bolla CNIT/University of Genoa

Multimedia Streaming. Mike Zink

Architectures for Carrier Network Evolution

BUILDING LARGE VOD LIBRARIES WITH NEXT GENERATION ON DEMAND ARCHITECTURE. Weidong Mao Comcast Fellow Office of the CTO Comcast Cable

Overlay networks. Today. l Overlays networks l P2P evolution l Pastry as a routing overlay example

Carrier SDN for Multilayer Control

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

Complex Interactions in Content Distribution Ecosystem and QoE

Copyright 2016 Pivotal. All rights reserved. Cloud Native Design. Includes 12 Factor Apps

VOLTDB + HP VERTICA. page

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Improving Video Quality by Information Sharing!! Ion Stoica, CTO! (also at UC Berkeley and Databricks)!!

Cloud Native Architecture 300. Copyright 2014 Pivotal. All rights reserved.

irtc: Live Broadcasting

ArcGIS Enterprise: An Introduction. Philip Heede

Data Driven Networks. Sachin Katti

MapReduce: Simplified Data Processing on Large Clusters 유연일민철기

Introduction to Big-Data

Self Programming Networks

Problem Statement of Edge Computing beyond Access Network for Industrial IoT

A Platform for Fine-Grained Resource Sharing in the Data Center

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS

Data Driven Networks

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste

ACANO SOLUTION RESILIENT ARCHITECTURE. White Paper. Mark Blake, Acano CTO

Raj Jain (Washington University in Saint Louis) Mohammed Samaka (Qatar University)

Better End-to-End Adaptation Using Centralized Predictive Control

Joint Server Selection and Routing for Geo-Replicated Services

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Mikko Ohvo Business Development Manager Nokia

Huawei CloudFabric Solution Optimized for High-Availability/Hyperscale/HPC Environments

Octoshape. Commercial hosting not cable to home, founded 2003

Record Placement Based on Data Skew Using Solid State Drives

Addressing the Challenges of Web Data Transport

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

FOUR WAYS TO LOWER THE COST OF REPLICATION

More on Testing and Large Scale Web Apps

Embracing Failure. Fault Injec,on and Service Resilience at Ne6lix. Josh Evans Director of Opera,ons Engineering, Ne6lix

Evolving Prometheus for the Cloud Native World. Brian Brazil Founder

IPv6. Akamai. Faster Forward with IPv6. Eric Lei Cao Head, Network Business Development Greater China Akamai Technologies

Current Trends in IP/Optical Transport Integration

Neural Adaptive Content-aware Internet Video Delivery. Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, Dongsu Han

ebay s Architectural Principles

Take Risks But Don t Be Stupid! Patrick Eaton, PhD

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Mobile Edge Computing for 5G: The Communication Perspective

ESM Live Broadcast System. Hui Zhang Carnegie Mellon University

Varys. Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley

Optimizing Apache Spark with Memory1. July Page 1 of 14

Apache Ignite TM - In- Memory Data Fabric Fast Data Meets Open Source

Swarm at the Edge of the Cloud. John Kubiatowicz UC Berkeley Swarm Lab September 29 th, 2013

AWS Lambda: Event-driven Code in the Cloud

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

SMORE: Semi-Oblivious Traffic Engineering

SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES

Cloud-Native Applications. Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0

Interactive Branched Video Streaming and Cloud Assisted Content Delivery

SOLUTION BRIEF Enterprise WAN Agility, Simplicity and Performance with Software-Defined WAN

Page 1. Outline / Computer Networking : 1 st Generation Commercial PC/Packet Video Technologies

Transcription:

C3: INTERNET-SCALE CONTROL PLANE FOR VIDEO QUALITY OPTIMIZATION Aditya Ganjam, Jibin Zhan, Xi Liu, Faisal Siddiqi, Conviva Junchen Jiang, Vyas Sekar, Carnegie Mellon University Ion Stoica, University of California, Berkeley, Conviva Hui Zhang, Carnegie Mellon University, Conviva

HIGH EXPECTATIONS ON VIDEO QUALITY Television High bitrate/resolution Interruption-free Fast start Need to optimize video quality through the lifetime of a session 2

WHERE TO IMPLEMENT OPTIMIZATION? Multiple choices for actuation Client-side actuation most favorable to incremental deployment CDN 1" Video Origin " ISP 1" Client CDN 2" Backbone Network" 3

MANY CHOICES & NEED QUICK DECISION Optimization parameters Ø Bitrate x CDN (over time) Existing approaches are reactive, apply a probe and update method Ø Too slow to find optimal choice Time CDN Resource Bitrate Bitrate 4

CENTRALIZED PREDICTIVE CONTROL Ideal solution Predict outcome of each parameter choice in real-time Continuously select optimal choice Logically Centralized Controller Prediction model based on global view Per-client decisions in real-time Achieving ideal solution requires Collect global quality information Quality Information Sub-second response Decisions Build a prediction model Make per-client predictions and Client parameter decisions at Internet-scale 5

CHALLENGE: CLIENT HETEROGENEITY Strawman: implement 100 times Heterogeneous software environments and interfaces Slow software update cycle Prediction model based on global view Quality Information Logically Centralized Controller Sub-second response Per-client decisions in real-time Decisions >100 unique client pla/orms Client 6

DESIGN CHOICE: THIN SENSING/ACTUATION LAYER Logically Centralized Controller Minimal client footprint Ø Move computation to controller Define a narrow waist interface Prediction model based on global view Quality Information Sub-second response Per-client decisions in real-time Decisions Client 7

CHALLENGE: REAL-TIME PREDICTIONS AT INTERNET SCALE Internet scale? Geographic scale - all continents Network scale - all network types Client scale 10s to 100s of millions concurrent Prediction model based on global view Quality Information Logically Centralized Controller Sub-second response Client Per-client decisions in real-time Decisions 8

CHALLENGE: REAL-TIME PREDICTIONS AT INTERNET SCALE Achieving real-time response, with the most recent global view at Internet-scale is not feasible based on today s technologies Strawman solutions Centralized global controller not real-time Partitioned controller no global view Prediction model based on global view Quality Information Logically Centralized Controller Sub-second response Client Per-client decisions in real-time Decisions 9

DESIGN CHOICE: SPLIT CONTROL PLANE Create a global prediction model in near real-time (minutes) Make per-client decisions in real-time (sub-second) based on global model and most recent per-client local information Prediction model based on global view Quality Information Logically Centralized Controller Sub-second response Client Per-client decisions in real-time Decisions 10

C3 ARCHITECTURE C3 Controller Modeling Layer near real-time global prediction modeling Decision Layer real-time per-client decisions Narrow waist protocol Session Data Model (SDM) Control Decisions Diverse Client Platforms Adaptor C3 Sensing/Actuation Layer Narrow waist device API ConvivaStreamerProxy (CSP) 11

C3 ARCHITECTURE C3 Controller Modeling Layer near real-time global prediction modeling Decision Layer real-time per-client decisions Narrow waist protocol Session Data Model (SDM) Control Decisions Diverse Client Platforms Adaptor C3 Sensing/Actuation Layer Narrow waist device API ConvivaStreamerProxy (CSP) 12

MOTIVATING EXAMPLE Compute Video Startup Time metric Default Op<on: Compute Video Startup Time in the client, send to controller Session start Ad start Ad end Play start Startup Time 1 = T(play start) T(session start) Startup Time 2 = T(play start) T(session start) (T(Ad end) T(Ad start)) How do we adapt to changes? 13

IDEA: EXPOSE LOW-LEVEL ACTIONS Metric calculation pushed to the controller Session Data Model (SDM) Events, States, Measurements States robustness to message loss Player States Joining Playing Buffering Playing Events Player Monitoring time Network/ stream connection established Video buffer filled up Video buffer empty Video download rate, Available bandwidth, Dropped frames, Frame rendering rate, etc. Buffer replenished sufficiently Stopped/ Exit User action 14

C3 ARCHITECTURE C3 Controller Modeling Layer near real-time global prediction modeling Decision Layer real-time per-client decisions Narrow waist protocol Session Data Model (SDM) Control Decisions Diverse Client Platforms Adaptor C3 Sensing/Actuation Layer Narrow waist device API ConvivaStreamerProxy (CSP) 15

SPLIT CONTROL-PLANE Dual-loop control Client-driven real-time loop (red) Periodic global model learning and dissemination loop (blue) Intelligent Decision Layer Per-client decisions based on global model and client quality info Aggregation across Clients SDM SDM Prediction Model Metric Computation & Per-client Decisions Control Decisions Prediction Model Training Modeling Layer Decision Layer Sensing/ Actuation Layer 16

WHY SPLIT CONTROL CAN ACHIEVE NEAR OPTIMAL PERFORMANCE Domain specific insight globally optimal decisions tend to be persistent on minute-level timescales Persistence results do not hold for individual clients - still need per-client up-todate state CDF 1 0.8 0.6 0.4 0.2 0 CP A CP B CP C 1 10 100 1000 Persistence(min) Figure shows the distribu<on of persistence of the best CDN for three content providers 17

RESPONSIVENESS & SCALE Sub-second response time Geographically distributed decision layer Geo-distribution using cloud Controller scale Horizontally scaled decision layer Burst scaling using cloud Big-data technologies including Spark for modeling layer Physical View of C3 Controller Aggregation/ Compute Cluster Frontend Data Centers Clients 18

FAULT TOLERANCE Physical View of C3 Controller Decision layer gracefully degrades due to stale or missing global model Client-side failover and decision layer state reconstruction to handle instance failure Aggregation/ Compute Cluster Frontend Data Centers Clients 19

REAL-WORLD DEPLOYMENT & EVOLUTION Earlier phases explored alternate architectures Partitioning Heavy client (client offload of computation and decision logic) C3 platform used by premium video publishers over 8 years Over 1 billion unique devices Over 3 million concurrent devices during major events Ø E.g., World Cup, Super Bowl Significant quality improvement results 20

QUALITY IMPROVEMENT: REBUFFERING Premium content publisher One month A/B test Result: Greater then 50% reduction in Buffering Ratio BufRatio (%) 2.5 2 Native control 1.5 C3 control 1 0.5 0 5/1/14 5/8/14 5/15/14 5/22/14 5/29/14 Time (mm/dd/yy) 21

QUALITY IMPROVEMENT: START FAILURES Premium content publisher One month A/B test Result: Greater then 60% reduction in Video Start Failures VSF (%) 4 3 2 Native control 1 C3 control 0 5/1/14 5/8/14 5/15/14 5/22/14 5/29/14 TIme (mm/dd/yy) 22

QUALITY IMPROVEMENT ACROSS PROVIDERS Rebuffering Ratio improvement consistent results across 5 different content providers BufRatio (%) 3 2.5 Native control 2 C3 control 1.5 1 0.5 0 1 2 3 4 5 Content Providers 23

LESSONS Validates centralized control at an unprecedented scale Reinforces the case for centralized control Drivers: client heterogeneity, global policies, real-time monitoring Enablers: big-data, cloud, application-level resilience Unique challenges driving new ideas Split control balances global view and real-time Minimal client critical to handle heterogeneity Broader applicability of C3 architecture Other applications (e.g., gaming, voice, conferencing) Network layer control (e.g., SDN, CDN) 24

CONCLUSIONS Television is coming to the Internet, high expectations on quality Large optimization space for video quality Ø Drives the need for a proactive approach C3 is a centralized predictive control approach Solves key challenges: (a) client heterogeneity, (b) Internet-scale Minimal sensing/actuation layer & move all computation to the controller Split control-plane scale-out solution exploiting application-level resilience C3 used by many content providers with quality improvement results 25