The Power of Prediction: Cloud Bandwidth and Cost Reduction

Similar documents
Cloud computing is an emerging IT paradigm that provides

This report is based on sampled data. Jun 1 Jul 6 Aug 10 Sep 14 Oct 19 Nov 23 Dec 28 Feb 1 Mar 8 Apr 12 May 17 Ju

Data Transfers in the Grid: Workload Analysis of Globus GridFTP

PACK: Prediction-Based Cloud Bandwidth and Cost Reduction System

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

Broadband Rate Design for Public Benefit

Efficient Resource Allocation And Avoid Traffic Redundancy Using Chunking And Indexing

software.sci.utah.edu (Select Visitors)

2

2

2

ICT PROFESSIONAL MICROSOFT OFFICE SCHEDULE MIDRAND

FREQUENTLY ASKED QUESTIONS

All King County Summary Report

App Economy Market analysis for Economic Development

Polycom Advantage Service Endpoint Utilization Report

Monthly SEO Report. Example Client 16 November 2012 Scott Lawson. Date. Prepared by

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

PRESENTATION TITLE GOES HERE

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Section 1.2: What is a Function? y = 4x

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Blockstack, a New Internet for Decentralized Apps. Muneeb Ali

HPE Security Data Security. HPE SecureData. Product Lifecycle Status. End of Support Dates. Date: April 20, 2017 Version:

3. EXCEL FORMULAS & TABLES

Excel Functions & Tables

Traffic Types and Growth in Backbone Networks

University of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Q3 FY18 Connections Update 13 April 2018

Security of End User based Cloud Services Sang Young

Opera Web Browser Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Q1 FY18 Connections Update 13 October 2017

Yield Reduction Due to Shading:

Polycom Advantage Service Endpoint Utilization Report

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 1, Jan-Feb 2015

HIGH RISK REPORT J.CREW GROUP, INC. September 14, 2017

Docker and HPE Accelerate Digital Transformation to Enable Hybrid IT. Steven Follis Solutions Engineer Docker Inc.

Eindhoven University of Technology - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

2Q Main Highlights: Accelerated Recovering Path

SCI - software.sci.utah.edu (Select Visitors)

Network. Gediz SEZGİN CAPITAL MARKETS DAY 2018

IKS Service GmbH - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Asks for clarification of whether a GOP must communicate to a TOP that a generator is in manual mode (no AVR) during start up or shut down.

POSTAL AND TELECOMMUNICATIONS REGULATORY AUTHORITY OF ZIMBABWE (POTRAZ)

San Francisco Housing Authority (SFHA) Leased Housing Programs October 2015

Birthdates by Species 12,064. Beef Bison Dairy Sheep. Birthdates by Year

METERS AND MORE: Beyond the Meter

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

COURSE LISTING. Courses Listed. with SAP Hybris Marketing Cloud. 24 January 2018 (23:53 GMT) HY760 - SAP Hybris Marketing Cloud

COURSE LISTING. Courses Listed. with SAP HANA. 15 February 2018 (05:18 GMT) HA100 - SAP HANA. HA250 - Migration to SAP HANA using DMO

Online Version Only. Book made by this file is ILLEGAL. Design and Implementation of Binary File Similarity Evaluation System. 1.

September Real Sector Statistics Division. Methodology

Multimedia Streaming. Mike Zink

For personal use only

UNITE Managing Metering: How to Plan for, Tune and Monitor Your Metered Platform. Session MCP3040/OS3040. Guy Bonney & Bob Morrow, MGS, Inc.

South Platte Summary January Compiled by Lee Cunning, P.E.

Mpoli Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Web Gateway Security Appliances for the Enterprise: Comparison of Malware Blocking Rates

COURSE LISTING. Courses Listed. Training for Database & Technology with Modeling in SAP HANA. 20 November 2017 (12:10 GMT) Beginner.

APPENDIX E2 ADMINISTRATIVE DATA RECORD #2

University of California - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Cisco Visual Networking Index (VNI) Mobile Data Forecast

Annex A to the DVD-R Disc and DVD-RW Disc Patent License Agreement Essential Sony Patents relevant to DVD-RW Disc

ACTIAN PRODUCTS by Platform - Vector, Vector in Hadoop as of October 18, 2017

XS4ALL Networks - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

IDG Services Metrics. Kerberos Usage. Weblogin Usage. Kerberos & Webauth Services. Kerberos & Webauth Users. Authentication Metrics

GWDG Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Heilbronn University - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Quatius Corporation - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Spiegel Research 3.0 The Mobile App Story

For personal use only. Q1 FY17 Connections update 21 October 2016

October Real Sector Statistics Division. Methodology

AVM Networks - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

POSTAL AND TELECOMMUNICATIONS REGULATORY AUTHORITY OF ZIMBABWE (POTRAZ)

S K Gupta Advisor, TRAI

Measurement and Verification: Savings? Mining and Industrial Energy Optimisation 2010

On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on

EVM & Project Controls Applied to Manufacturing

95 th Percentile Billing

SCI - NIH/NCRR Site. Web Log Analysis Yearly Report Report Range: 01/01/ :00:00-12/31/ :59:59.

Control Center Release Notes

Nigerian Telecommunications Sector

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

NCC Cable System Order

Cloud Transcoder: Bridging the Format and Resolution Gap between Internet Videos and Mobile Devices

Omega Engineering Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

The CPA Exam and Requirements. Adapted and modified from material originally created by David Reinus.

XEmacs Project Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Rzeszow University Of Technology - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

ADVALO TRAINING SCHEDULE FOR THE YEAR Exadata Database Machine: 12c Administration Workshop Ed 1

National Aeronautics and Space Admin. - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

2016 MEDIA INFORMATION. energyegypt.net

CMIS 102 Hands-On Lab

1 of 10 8/10/2009 4:51 PM

Asia Key Economic and Financial Indicators

Safaricom Ltd FY 2011 Results Announcement 18 th May 2011

The CPA Exam and Requirements. Adapted and modified from material originally created by David Reinus.

University of Valencia - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Asia Key Economic and Financial Indicators

Transcription:

The Power of Prediction: Cloud Bandwidth and Cost Reduction Eyal Zohar Israel Cidon Technion Osnat(Ossi) Mokryn Tel-Aviv College

Traffic Redundancy Elimination (TRE) Traffic redundancy stems from downloading same or similar information items. We found around 70% redundancyin end-clients traffic, compared with past traffic and local files. 2

TRE Importance Moving to the cloud => higher e2e traffic. Cloud users pay for traffic used in practice => incentive to use TRE. Application Cloud User Pay for Use Cloud Traffic Cloud Provider End-user TRE 3

How TRE Works Server parses the outgoing stream to contentbased chunks and signs with SHA-1 Rolling hash Byte stream Anchor 1 Anchor 2 Anchor 3 Anchor 4 SHA-1 signature Chunk 1 Chunk 2 Chunk 3 Sign. 1 Sign. 2 Sign 3 New bytes Insertion example Chunk 1 Chunk 2 Chunk 3 4

Problems in Existing Solutions In the cloud environment: 1. High processing costs in the cloud. 2. Scalability remember each client. 3. Elasticity - unaware of data from other sources. 4. Do not handle long-term repeats (days/weeks). Server 2 Receiver Server 1 5

Our Solution: PACK (Predictive ACK) Redundancy detection by the client. Repeats appear in chains. Tries to match incoming chunks with a previously received chain or local file. Sends to the server predictionsof the future data. 6

PACK: The Client Prediction Stream chunks Chunk 1 Chunk 2 Chunk 3 SHA-1 signature Chain of chunks Sign. 1 Sign. 2 Sign 3 Received Prediction Each prediction: 1.TCP seq. no server parsing 2.Hint spare unnecessary SHA-1 3.SHA-1 signature TCP seq. Chunk SHA-1 Last-byte hint 7

PACK: Server Operation The server compares the hint with the last-byte to sign. Upon a hint match it performs the expensive SHA-1. PACK saves cloud s computational effort in the absence of redundancy. First receiver-basedtre: the server does not parse. It signs with >99% confidence. 3 1 2 Local storage Client Chain 2,3? 2,3V Server 1 2 3 8

PACK Benefits Minimizes processing costs induced by TRE. Signs with SHA-1 in the presence of redundancy. Receiver-based end-to-end TRE => suitable for cloud server elasticity and client mobility. Does not require the server to continuously maintain clients status. 9

Server Effort Experiment Several data-sets in 3 modes: baseline no-tre, PACK and a sender-based TRE. 140% 25%-30% redundancy: common to many data-sets Single Server Cloud Operational Cost (100%=without TRE system) 120% 100% 80% 60% 40% 20% Sender-based EndRE-like PACK 0% 0% 10% 20% 30% 40% 50% Redundancy Elimination Ratio 10

YouTube Redundancy Traces of 40k clients, captured at an ISP. Found 30% end-to-end (personal) redundancy. 3.0 35% All YouTube Traffic (Gbps) 2.5 2.0 1.5 1.0 0.5 YouTube Traffic PACK TRE 30% 25% 20% 15% 10% 5% PACK TRE (Removed Redundancy) 0.0 0% Time (24 hours) 11

80% 70% Long-Term TRE Social network: eliminated 30%with one hour cache and 75% with a long-term cache. Average Redundancy of Daily Traffic 60% 50% 40% 30% 20% 10% 0% Unlimited 1 Hour 24 Hours 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 Days Since Start 12

Cloud Email Redundancy Gmail account with 1,000 Inbox messages. Found 32%static redundancy (higher when messages are read multiple times). 300 Traffic Volume Per Month (MB) 250 200 150 100 50 0 Redundant Non-redundant Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Month 13

Implementation Linux with NetfilterQueue, 25k lines of C and Java, available for download. Receiver-sender protocol is embedded in the TCP Options field. Transparent use at both sides. 14

Processing Effort in the Client Laptop experiment: PACK-related CPU consumption is ~4% when playing HD video (9 Mbps with 30% redundancy). Smartphone experiment: PACK consumes ~3% of the battery power when processing 1 GB video (avg. monthly data plan). Virtual traffic saves the client the need to chunk or sign. 15

New Chunking Algorithm Most existing solutions use Rabin fingerprint. 16

New Chunking Algorithm 64 bits Mask=00 00 8A 31 10 58 30 80 n n-1 n-2 n-3 n-4 n-5 n-6 n-7 n-8 n-40 n-41 n-42 n-43 n-44 n-45 n-46 n-47 17

Summary Current TRE solutions may not reduce cloud cost. PACK is the first receiver-based TRE leverages the power of prediction. Minimizes processing costs induced by TRE. Suitable for cloud server migration and client mobility. Implementation is available for download. 18