The Power of Prediction: Cloud Bandwidth and Cost Reduction
Eyal Zohar and Israel Cidon (Technion), Osnat (Ossi) Mokryn (Tel-Aviv College)
Traffic Redundancy Elimination (TRE)
Traffic redundancy stems from downloading the same or similar information items. We found around 70% redundancy in end-client traffic, compared with past traffic and local files.
TRE Importance
Moving to the cloud leads to higher end-to-end traffic. Cloud users pay for the traffic they actually use, which gives them an incentive to use TRE.
[Figure: cloud provider, cloud user (application, pay for use), and end-user, with TRE applied to the cloud traffic.]
How TRE Works
The server parses the outgoing stream into content-based chunks and signs each chunk with SHA-1.
[Figure: a rolling hash over the byte stream selects anchors 1 to 4; the chunks between anchors (chunks 1 to 3) receive SHA-1 signatures 1 to 3. Insertion example: new bytes inserted into the stream, shown against chunks 1 to 3.]
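The chunk-and-sign step can be illustrated with a minimal sketch. It is not the code of any particular TRE system: the polynomial rolling hash, the 16-byte window, the anchor mask, and the function names `chunk_boundaries` and `chunk_and_sign` are all assumptions chosen to keep the example short.

```python
import hashlib

# Illustrative parameters (assumptions, not taken from the slides).
WINDOW = 16            # bytes covered by the rolling hash
BASE = 257             # polynomial base
MOD = (1 << 31) - 1    # modulus for the rolling hash
ANCHOR_MASK = 0x3F     # an anchor roughly every 64 bytes on random data


def chunk_boundaries(data: bytes):
    """Yield the end offset of every content-defined chunk."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest byte in the window
    h = 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Slide the window: drop the byte that just left it.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        # An anchor is declared when the low bits of the hash are all ones.
        if i + 1 >= WINDOW and (h & ANCHOR_MASK) == ANCHOR_MASK:
            yield i + 1


def chunk_and_sign(data: bytes):
    """Split data at the anchors and sign each chunk with SHA-1."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    if start < len(data):
        chunks.append(data[start:])    # trailing bytes after the last anchor
    return [(c, hashlib.sha1(c).hexdigest()) for c in chunks]
```

Because each anchor depends only on the bytes inside the window, inserting new bytes shifts the later anchors but leaves the content, and therefore the signatures, of the later chunks unchanged.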
Problems in Existing Solutions
In the cloud environment:
1. High processing costs in the cloud.
2. Scalability: the server must remember each client.
3. Elasticity: the server is unaware of data from other sources.
4. Long-term repeats (days/weeks) are not handled.
[Figure: a receiver served by Server 1 and Server 2.]
Our Solution: PACK (Predictive ACK)
Redundancy detection is done by the client. Repeats appear in chains. The client tries to match incoming chunks with a previously received chain or a local file, and sends the server predictions of the future data.
PACK: The Client Prediction
Each prediction contains:
1. TCP sequence number: no server parsing is needed.
2. Hint: spares unnecessary SHA-1 computations.
3. SHA-1 signature.
[Figure: received stream chunks 1 to 3 are signed with SHA-1 and matched against a stored chain of chunks (signatures 1 to 3); the resulting prediction carries the TCP sequence number, the chunk's SHA-1, and a last-byte hint.]
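A rough sketch of how a client could turn a stored chain into a prediction follows. The class `ChunkStore`, its methods, and the `length` field are hypothetical names introduced for illustration; the slides list only the TCP sequence number, the hint, and the SHA-1 signature, so the explicit length is an assumption added to make the predicted range unambiguous. Chunks are assumed to be non-empty.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Prediction:
    tcp_seq: int   # sequence number at which the predicted chunk should start
    length: int    # predicted chunk length (assumed field, see lead-in)
    hint: int      # last byte of the predicted chunk
    sha1: bytes    # SHA-1 signature of the predicted chunk


class ChunkStore:
    """Hypothetical client-side store that links chunks into a chain."""

    def __init__(self) -> None:
        self._chunks: Dict[bytes, bytes] = {}   # signature -> chunk bytes
        self._next: Dict[bytes, bytes] = {}     # signature -> next signature
        self._prev_sig: Optional[bytes] = None

    def record(self, chunk: bytes) -> bytes:
        """Store a received chunk and link it after the previous one."""
        sig = hashlib.sha1(chunk).digest()
        self._chunks[sig] = chunk
        if self._prev_sig is not None:
            self._next[self._prev_sig] = sig
        self._prev_sig = sig
        return sig

    def predict_after(self, chunk: bytes, tcp_seq: int) -> Optional[Prediction]:
        """If `chunk` (received at `tcp_seq`) is on a known chain, predict its successor."""
        nxt = self._next.get(hashlib.sha1(chunk).digest())
        if nxt is None:
            return None                          # unknown chunk or end of chain
        body = self._chunks[nxt]
        return Prediction(tcp_seq + len(chunk), len(body), body[-1], nxt)
```

For example, after recording the chunks `b"alpha"`, `b"bravo"`, `b"charlie"`, receiving `b"alpha"` again at sequence number 1000 would produce a prediction for `b"bravo"` starting at 1005.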
PACK: Server Operation
The server compares the hint with the last byte of the data to be signed. Only upon a hint match does it perform the expensive SHA-1. PACK therefore saves the cloud's computational effort in the absence of redundancy. It is the first receiver-based TRE: the server does not parse. It signs with >99% confidence.
[Figure: the client, holding a chain in local storage, receives chunk 1 and sends the prediction "2,3?"; the server, about to send chunks 1, 2, 3, confirms "2,3".]
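The hint-then-sign check on the server can be sketched as below. The function name `matches_prediction` and its parameters (including the explicit `length`) are assumptions made for illustration, not the authors' interface; `buffer` stands for the outgoing bytes whose first byte has sequence number `buffer_seq`.

```python
import hashlib


def matches_prediction(buffer: bytes, buffer_seq: int,
                       tcp_seq: int, length: int,
                       hint: int, sha1: bytes) -> bool:
    """Return True if the outgoing bytes match the client's prediction."""
    start = tcp_seq - buffer_seq
    end = start + length
    if start < 0 or length <= 0 or end > len(buffer):
        return False                 # prediction falls outside the buffered data
    if buffer[end - 1] != hint:
        return False                 # cheap rejection: no SHA-1 computed
    # Hint matched: only now pay for the expensive signature.
    return hashlib.sha1(buffer[start:end]).digest() == sha1
```

The one-byte comparison comes first so that a wrong prediction is usually rejected without hashing; the SHA-1 is computed only when the hint already agrees.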
PACK Benefits
Minimizes processing costs induced by TRE. Signs with SHA-1 only in the presence of redundancy. Receiver-based end-to-end TRE, which makes it suitable for cloud server elasticity and client mobility. Does not require the server to continuously maintain clients' status.
Server Effort Experiment
Several data sets in 3 modes: baseline without TRE, PACK, and a sender-based TRE. 25%-30% redundancy is common to many data sets.
[Figure: Single Server Cloud Operational Cost (100% = without TRE system; axis 0% to 140%) versus Redundancy Elimination Ratio (0% to 50%), for sender-based EndRE-like TRE and PACK.]
YouTube Redundancy
Traces of 40k clients, captured at an ISP. Found 30% end-to-end (personal) redundancy.
[Figure: All YouTube Traffic (Gbps, 0.0 to 3.0) and PACK TRE removed redundancy (0% to 35%) over Time (24 hours).]
Long-Term TRE
Social network: eliminated 30% with a one-hour cache and 75% with a long-term cache.
[Figure: Average Redundancy of Daily Traffic (0% to 80%) versus Days Since Start (1 to 33), for unlimited, 24-hour, and 1-hour caches.]
Cloud Email Redundancy
Gmail account with 1,000 Inbox messages. Found 32% static redundancy (higher when messages are read multiple times).
[Figure: Traffic Volume Per Month (MB, 0 to 300), redundant versus non-redundant, January through December.]
Implementation
Linux with NetfilterQueue, 25k lines of C and Java, available for download. The receiver-sender protocol is embedded in the TCP Options field. Transparent use at both sides.
Processing Effort in the Client
Laptop experiment: PACK-related CPU consumption is ~4% when playing HD video (9 Mbps with 30% redundancy).
Smartphone experiment: PACK consumes ~3% of the battery power when processing 1 GB of video (avg. monthly data plan).
Virtual traffic saves the client the need to chunk or sign.
New Chunking Algorithm
Most existing solutions use Rabin fingerprints.
64-bit mask = 00 00 8A 31 10 58 30 80
[Figure: the 64-bit mask shown against the most recent stream bytes n, n-1, ..., n-8 and n-40, ..., n-47.]
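The slide gives only the mask and the byte window, so the sketch below fills in the update rule as an assumption: a 64-bit value is shifted left by one bit and XORed with each incoming byte, and an anchor is declared when every mask bit is set. The names `find_anchors`, `WINDOW`, and `U64` are illustrative. Under this rule the highest mask bit (bit 47) is influenced by bytes n-40 through n-47, which agrees with the 48-byte window in the figure, and the 13 set bits would give roughly one anchor per 2^13 = 8192 bytes on random-looking data.

```python
MASK = 0x00008A3110583080   # the mask from the slide: 13 bits set, highest is bit 47
WINDOW = 48                 # bytes that can influence the masked bits
U64 = (1 << 64) - 1         # keep the running value within 64 bits


def find_anchors(data: bytes):
    """Return the offsets just past each anchor byte (assumed update rule)."""
    anchors = []
    longval = 0
    for i, byte in enumerate(data):
        # Shift left by one bit (dropping the top bit) and mix in the new byte.
        longval = ((longval << 1) & U64) ^ byte
        # Only test once a full window has been processed.
        if i + 1 >= WINDOW and (longval & MASK) == MASK:
            anchors.append(i + 1)
    return anchors
```

Compared with a Rabin fingerprint, this update needs only a shift and an XOR per byte, with no table lookups or modular arithmetic.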
Summary
Current TRE solutions may not reduce cloud cost. PACK is the first receiver-based TRE that leverages the power of prediction. It minimizes the processing costs induced by TRE and is suitable for cloud server migration and client mobility. The implementation is available for download.