Data Driven Networks. Sachin Katti

Similar documents
Data Driven Networks

Self Programming Networks

Multi-Domain Service Optimization

Lecture 18: Video Streaming

Cellular Network Traffic Scheduling using Deep Reinforcement Learning

Mobile Network Congestion Management

Neural Adaptive Content-aware Internet Video Delivery. Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, Dongsu Han

Slicing and Orchestration in Service-Oriented 5G Networks

THE MOBILE ACCESS NETWORK, BEYOND CONNECTIVITY. xran.org White Paper October 2016

Wireless SDN 기술. Seungwon Shin KAIST

INTELLIGENT CONTENT PRE-POSITIONING USING LTE- BROADCAST FOR A MEDIA OPTIMIZED NETWORK

A Hierarquical MEC Architecture: Experimenting the RAVEN Use-Case

Correlative Analytic Methods in Large Scale Network Infrastructure Hariharan Krishnaswamy Senior Principal Engineer Dell EMC

The Edge of LTE. Deploying a MEC platform in today s networks. MEC Congress Munich, DE September SpiderCloud Wireless, Inc.

LTE Access Controller (LAC) for Small Cell Tool Design and Technology

Cisco Tetration Analytics

Video AI Alerts An Artificial Intelligence-Based Approach to Anomaly Detection and Root Cause Analysis for OTT Video Publishers

Video-Aware Networking: Automating Networks and Applications to Simplify the Future of Video

APPLICATION NOTE. XCellAir s Wi-Fi Radio Resource Optimization Solution. Features, Test Results & Methodology

Technologies for the future of Network Insight and Automation

HOW SDN AND NFV EXTEND THE ADOPTION AND CAPABILITIES OF SELF-ORGANIZING NETWORKS (SON)

QoE Congestion Management With Allot QualityProtector

Video Quality Management Guidebook

Business Drivers for Selecting LTE Technology. HSPA+ & LTE Executive Briefing, Jan 27, 2009 Hank Kafka, Vice President, Network Architecture, AT&T

The path toward C-RAN and V-RAN: benefits and challenges from operator perspective

Deploying IPTV and OTT

EFFECTIVE UTILIZATION OF M- ABR (MULTICAST- ASSISTED ABR) USING BIG DATA AND REAL- TIME ANALYTICS

How YouTube Performance is Improved in the T-Mobile Network. Jie Hui, Kevin Lau, Ankur Jain, Andreas Terzis, Jeff Smith T Mobile, Google

Adaptive Bit Rate (ABR) Video Detection and Control

Self-driving Datacenter: Analytics

Dr. Evaldas Stankevičius, Regulatory and Security Expert.

Exploring Cloud Security, Operational Visibility & Elastic Datacenters. Kiran Mohandas Consulting Engineer

SDN Use-Cases. internet exchange, home networks. TELE4642: Week8. Materials from Prof. Nick Feamster is gratefully acknowledged

Ensuring a good QoE for mobile broadband customers on LTE networks. Niket Srivastava Sales Director Viavi Solutions (formerly JDSU)

Nokia AirGile cloud-native core: shaping networks to every demand

Making Sense of Artificial Intelligence: A Practical Guide

Mobile Networks: Automation for Optimized Performance

5G Network Architecture

Cisco Tetration Analytics

Architectural overview Turbonomic accesses Cisco Tetration Analytics data through Representational State Transfer (REST) APIs. It uses telemetry data

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Opportunities for ML Analytics at the Sensor Endpoint

Cisco Ultra Traffic Optimization

SaaS Providers. ThousandEyes for. Summary

Mobile Edge Computing

Big Data Analytics for Intelligent Backhaul Networks

Get Your Datacenter SDN Ready. Ahmad Chehime Cisco ACI Strategic Product Sales Specialist SPSS Emerging Region

Cisco Crosswork Network Automation

SD-WAN / Hybrid WAN : Leveraging SDN-NFV for Networks Agility

SSD Telco/MSO Case Studies

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Myths and Realities of Wi- Fi. ACTEM Conference, April 2017 Michael McKerley, CTO

Transformation through Innovation

Citrix CloudBridge Product Overview

VMware vcloud Architecture Toolkit Cloud Bursting

Adaptive Video Multicasting

IQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack

WIRELESS BROADBAND Supplemental Broadband Solution

Cloudamize Agents FAQ

Module Day Topic. 1 Definition of Cloud Computing and its Basics

WAN and Cloud Link Analytics for Enterprises

Analyzing Throughput and Stability in Cellular Networks

Intelligent WAN: Leveraging the Internet Secure WAN Transport and Internet Access

November 2017 WebRTC for Live Media and Broadcast Second screen and CDN traffic optimization. Author: Jesús Oliva Founder & Media Lead Architect

CloudCheck TruSpeed. Reliably Fast Broadband & Wi-Fi for the Home. June,

The Guide to Best Practices in PREMIUM ONLINE VIDEO STREAMING

Reduce Subscriber Churn Through Network Experience Optimization (NEO)

Transforming Transport Infrastructure with GPU- Accelerated Machine Learning Yang Lu and Shaun Howell

Revolutionising mobile networks with SDN and NFV

Pocket: Elastic Ephemeral Storage for Serverless Analytics

QOE ISSUES RELEVANT TO VIDEO STREAMING IN CABLE NETWORKS Jeremy Bennington Praveen Mohandas. Cheetah Technologies, Sunnyvale, CA

Sky Italia - Operation Evolution. London March 20th, 2018

The Case for Content-Adaptive Optimization

5G networks use-cases in 4G networks

Cognitive Radio Networks

JStorm Based Network Analytics Platform. Alibaba Cloud Senior Technical Manager, Biao Lyu

Adaptive Resync in vsan 6.7 First Published On: Last Updated On:

Intent Driven Network Operations with AppFormix Advanced Analytics Platform. Joseph Li

Transformation Through Innovation

Connecting to the Cloud

Guaranteeing Video Quality

INTELLIGENT DNS AS A CORNERSTONE OF DIGITAL TRANSFORMATION. CLOUDEXPO 2017

Why Performance Matters When Building Your New SD-WAN

Data Path acceleration techniques in a NFV world

Simplifying WAN Architecture

The Edge: Delivering the Quality of Experience of Digital Content

The information presented is subject to change without notice. Alcatel-Lucent assumes no responsibility for inaccuracies contained herein.

Mobile Edge Computing:

Computer with a Lens.

Knowledge-Defined Networking: Towards Self-Driving Networks

Transitioning Video to All-IP IP Video Better Than Broadcast. Fabio Souza Cisco Video Group May/2017

SDN AT THE SPEED OF BUSINESS THE NEW AUTONOMOUS PARADIGM FOR SERVICE PROVIDERS FAST-PATH TO INNOVATIVE, PROFITABLE SERVICES

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

VMWARE AND NETROUNDS ACTIVE ASSURANCE SOLUTION FOR COMMUNICATIONS SERVICE PROVIDERS

Demystifying Machine Learning

New Testing Paradigm for 5G IoT Ecosystem

Managing the Subscriber Experience

Axyom Small Cell Manager

NFV Infrastructure for Media Data Center Applications

Transcription:

Data Driven Networks Sachin Katti

Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver

CAN WE LEARN THE CONTROL PLANE OF THE NETWORK? DRONE ATC 4K VIDEO VIRTUAL REALITY Applications with Access to Network Control Northbound Application Interfaces Operator Policies Operator Operator & Policies, Intents Intent Policies, & SLA Intent & SLA Real-time Network Telemetry Control & Orchestration Plane automatically synthesized by learning from telemetry data Southbound Infrastructure Interfaces Edge Cloud RAN Infrastructure Control Points You specify what you want, and the network figures out how to deliver it! Result: Dynamic per user/flow control implemented to meet the policy and deliver on the SLA

DATA DRIVEN CONTROL LOOP Operator Policies Intent & SLAs Service KPI Apply Controls Realtime Data Prediction Network Telemetry Will Current Controls Meet KPI? Y Recommend Current Control Settings N What-If Adjust Control Knobs Will Adjusted Controls Meet KPI? Y Meets Operator Policies? Y Recommend Adjusted Control Settings N N

TWO LEARNING APPROACHES: Classical Machine Learning Deep Reinforcement Learning Reward Function Neural Network Reinforcement Learning (Prediction & What-if) Requires extensive domain knowledge Largely blind, truly approximates intent based system design

Classical Machine Learning Deep Reinforcement Learning Build Rules-based Algorithm / Model Learning Agent Deploy State Reward Actions Evaluate Results Application and Network state variables Reward values to achieve desired outcomes Controllable actions Tune / Update Algorithm / Model Environment Millions of Training Cycles per hour

DEEP RL IS A GREAT FIT FOR NETWORK CONTROL LEARN DIRECTLY FROM EXPERIENCE NO MODELS, BRITTLE ASSUMPTIONS POLICIES ADAPT TO ACTUAL ENVIRONMENT & WORKLOAD OPTIMIZE A VARIETY OF OBJECTIVES END-TO-END PROVIDE RAW OBSERVATIONS OF NETWORK CONDITIONS EXPRESS GOALS THROUGH REWARD SYSTEM DOES THE REST NETWORK CONTROL DECISIONS ARE OFTEN HIGHLY REPETITIVE LOTS OF TRAINING DATA SAMPLE COMPLEXITY LESS OF A CONCERN

LEARNING & CONTROL PIPELINE Inference Prediction Control Machine Learning Deep Learning Feature Extraction Neural Networks

INFERENCE Inference Understand how known network input variables affect network KPI (e.g throughput) Leverage traditional supervised machine learning and statistical analysis (random forest, etc.) to develop function f(x t+1 ) to be used as input for deep learning agent X = Average Throughput per user, per cell Current and past collisions per cell (a) Current and past number of users per cell (b) Current and past signal strength per user, per cell (c) X t = f(a, b, c)

PREDICTION Prediction Extracting features from Training Data (offline) Network State Variables a, b, c Input Features from realtime data feed Prediction Neural Network (real-time) X t+1 Throughput KPI Prediction

CONTROL Control Network State Variables Prediction Neural Network (real-time) Deep Learning (unsupervised, reinforced learning) Operator Policies Client Conditions App specific inputs Control Decision

USE CASE: VIDEO STREAMING Network Assisted Mobile Video Streaming CDN STREAM Optimization Objective: Maximize video quality and Minimize Stall Time in challenging network scenarios Next Chunk ABR Result: 50-75% Higher Video Quality 75%-100% Lower Stall Time Policy: Maintain minimum average throughput of 500Kbps per user for non-video traffic Data: Real-time Network State Data Feed Real-time Mobile Application State Data Control: Next Chunk Adaptive Bit Rate (ABR)

API SYSTEM ARCHITECTURE CDN Video Stream ABR Request LTE Video Encoded for ABR Streaming API Real-time Network Conditions ABR guidance request Network Assisted ABR Video ABR recommendation for next chunk 13

CELL DATA INGESTION IN TWO DEPLOYMENTS Melbourne Business District ~90 enodebs, ~270 cells Silicon Valley Corridor ~250 enodebs, ~1300 cells

-6-4 -2 0 2 4 6-6 -4-2 0 2 4 6-6 -4-2 0 2 4 6 INFER Video QoE depends on user throughput. What does user throughput depend on? 58% error 0 20 40 60 80 100 USER SESSION # 40% error 0 20 40 60 80 100 USER SESSION # Throughput bits per time 12% error 0 20 40 60 80 100 USER SESSION # Spectral efficiency bits per resource element Resource allocation rate resource elements per time CQI Rank MCS X Σ Active users Σ Pkt rate, size Cell bandwidth Cell contention User&total demand? User performance can be modeled fairly accurately using features extracted from real time network data feeds

remainder trend -20 20 60 seasonal data -20 20 60-20 20 60-20 20 60 15 20 25 30 35 40 PREDICT Predicting the network state variables and then throughput Actual Foreca st 4% error 00:00 01:00 02:00 03:00 04:00 05:00 0 20 40 60 80 100 time Number of users on a cell Test for seasonality and trend, learning ARIMA using a neural network

Predict + Control inputs Hidden layers past b_app ts conv1d relu past b_seg ts RSRP/RSRQ ts CELT ts relu relu linear thpt prediction video buffer states B R C A relu softmax ABR Policy LSTM thpt forecaster A3C ABR

BUFFER (seconds) BUFFER (seconds) BITRATE (kbps) BITRATE (kbps) WHAT BEHAVIOR DOES THE RL AGENT FOR CONTROLLING VIDEO ABR LEARN? VIDEO ON DEMAND LIVE VIDEO 2000 2000 ON DEMAND VIDEO (buffer limit up to 4 minutes) 1500 1000 1500 1000 LIVE VIDEO (buffer limit less than 30s) Build a safe buffer, then safely play the highest quality without stalling 60 500 0 13:52 13:53 13:54 13:55 13:56 LOCAL TIME 500 0 60 11:14 11:15 11:16 11:17 11:18 LOCAL TIME Track throughput predictions to play as high quality as possible without stalling 40 40 20 20 0 0 13:52 13:53 13:54 13:55 13:56 LOCAL TIME 11:14 11:15 11:16 11:17 11:18 LOCAL TIME Network predictions become more valuable as buffer limit grows smaller

DEPLOYMENT RESULTS FROM A US MOBILE NETWORK 83% 65% 58% 60% 59% 42% 33% 27% 22% BIT RATE 0% 4% -2% STALLS STARTUP BIT RATE STALLS STARTUP BIT RATE STALLS STARTUP BIT RATE STALLS STARTUP PERFECT CONDITIONS CELL EDGE (LOW SIGNAL QUALITY) HIGH CELL CONGESTION HIGH USER MOBILITY COMPARED TO CURRENT ABR SOLUTION 19 Operator Capped Max Bit Rates (1.8 Mbps)

OPEN DRBS CUMULATIVE STALL (seconds) (Mbps) WHY FINE-GRAINED NETWORK PREDICTION MATTERS FOR QOE 1.5 1 No context awareness at the start. Huge 1.0 0.5 0.0 20 10 0 17:39 17:40 17:41 17:42 17:43 17:44 17:45 2 30 startup time! 150 17:39 17:40 17:41 17:42 17:43 17:44 17:45 colour colour BITRATE THP CUM STALL Cell congestion can spike transiently (~10s of seconds). Huge stalls! 100 50 0 TYPE ERAB HANDOVER NORMAL -50 17:39 17:40 17:41 17:42 17:43 17:44 17:45

LINK QUALITY (db) CUMULATIVE STALL (seconds) (Mbps) WHY FINE-GRAINED NETWORK PREDICTION MATTERS FOR QOE 2.0 1.5 colour 1.0 BITRATE 0.5 0.0 20 10 13:11 13:12 13:13 13:14 13:15 13:16 THP colour CUM STALL 3 Downlink interference can spike transiently (~10s of seconds). Huge stalls! 0 13:11 13:12 13:13 13:14 13:15 13:16-12 -14 colour -16 NEIGHBOR SERVING -18-20 13:11 13:12 13:13 13:14 13:15 13:16

WHY FINE-GRAINED NETWORK PREDICTION MATTERS FOR QOE Peaks correspond directly with red lights on Univ Ave, as users get handed off

USE CASE: CONTENT PREPOSITIONING Content Pre-Positioning CDN CONTENT Optimization Objective: Maximize content downloads, while maintaining operator defined minimum throughput requirements When and how much data to download Result: More than double data downloaded on existing infrastructure with deterministic impact on other subscribers Policy: Maintain minimum average throughput of 500Kbps per user Data: Real-time Network State Data Feed Control: When and how much data to download

USE CASE: MOBILE NETWORK LOAD BALANCING RAN Load Prediction RAN Controller A SIGNAL LOAD SIGNAL 1 RAN Handoff Control B LOAD RAN Load Balancing Optimization Objective: In cases where signal strength is sufficient on two or more cells, manage connectivity to balance load across available cells. Policy: Maintain minimum average throughput of 500Kbps per user Data: Real-time Network State Data Feed 2 Control: When and which user to handoff to neighboring cell?

API API API PLATFORM ARCHITECTURE DEEP LEARNING ENGINE Dynamic Network Conditions Uhana Managed Service (Cloud Infrastructure) Application Control Points App Specific Neural Networks App Control CTR/CTUM Feed Over-the-Air Real-time Per-User Fine-Grained Monitoring Deep Learning Engine App Control Fine Grained Mobile Network Visibility Operator App Specific Intents (Desired Outcomes) Operator Policies (Constraints) Varied based on: User, Subscription Level, Device Type, Content Type, etc. 25

THE INFER-PREDICT-CONTROL PRIMITIVES OFFLINE INFER e.g., link throughput What network metrics affect app QoE metrics? e.g., avg bitrate, stalls, switches REAL TIME PREDICT CONTROL e.g., link quality, cell congestion etc. How will network metrics look in the future? e.g., link throughput in the next 5s What should be the app control for the best app QoE? e.g., ABR for the next video chunk e.g., buffer, app throughput

PLATFORM FEEDBACK: PROCESSING LATENCY DICTATES PREDICTION REQUIREMENTS INFER PREDICT CONTROL INGEST PROCESS FORECAST Real-time consumption of network data Real-time computation of network metrics Real-time prediction of network metrics Lower the latency of processing, lower the requirement on prediction horizon (easier the prediction)

CONCLUSION & LESSONS LEARNED Network control is a Big Control problem Learning techniques can lead to automated control and intent based system design Infer + Predict + Control is a general framework Applies to systems beyond mobile networks Streaming analytics pipeline latencies are hard to predict right now Direct implication on hardness of prediction Analytics pipeline system performance requires careful manual tuning Hard to iterate and keep performance predictable Online learning is still an unsolved problem, both in learning techniques as well as system design