Efficient Content Delivery and Low Complexity Codes. Amin Shokrollahi

Similar documents
Summary of Raptor Codes

Online codes. (Extended Abstract) Petar Maymounkov TR November, Abstract

CSCI-1680 Link Layer I Rodrigo Fonseca

CSCI-1680 Link Layer Reliability Rodrigo Fonseca

Networked Systems and Services, Fall 2018 Chapter 2. Jussi Kangasharju Markku Kojo Lea Kutvonen

Lec 19 - Error and Loss Control

Lecture 4: CRC & Reliable Transmission. Lecture 4 Overview. Checksum review. CRC toward a better EDC. Reliable Transmission

CSE 123: Computer Networks Alex C. Snoeren. HW 1 due NOW!

Efficient Erasure Correcting Codes

Packet-Level Forward Error Correction in Video Transmission

CSCI-1680 Link Layer Reliability John Jannotti

ELEC 691X/498X Broadcast Signal Transmission Winter 2018

CSEP 561 Error detection & correction. David Wetherall

Codes, Bloom Filters, and Overlay Networks. Michael Mitzenmacher

Lowering the Error Floors of Irregular High-Rate LDPC Codes by Graph Conditioning

PROACTIVE RELIABLE BULK DATA DISSEMINATION IN SENSOR NETWORKS 1

Coding theory for scalable media delivery

A Digital Fountain Approach to Asynchronous Reliable Multicast

CMPE 257: Wireless and Mobile Networking

Lecture 4 September 17

Lecture 7: Sliding Windows. CSE 123: Computer Networks Geoff Voelker (guest lecture)

Long Erasure Correcting Codes: the New Frontier for Zero Loss in Space Applications? Enrico Paolini and Marco Chiani

Programmation système

Takes a file of k packets, encodes it into n

Finding Small Stopping Sets in the Tanner Graphs of LDPC Codes

Analysis of TCP Latency over Wireless Links Supporting FEC/ARQ-SR for Error Recovery

CS244a: An Introduction to Computer Networks

UNIT IV -- TRANSPORT LAYER

Raptor Codes for P2P Streaming

BATS: Achieving the Capacity of Networks with Packet Loss

Ad hoc and Sensor Networks Chapter 6: Link layer protocols. Holger Karl

Graph based codes for distributed storage systems

Error correction guarantees

Lecture 6: Multicast

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

Data Link Control Protocols

A Digital Fountain Approach to Reliable Distribution of Bulk Data

On the construction of Tanner graphs

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

ELEN Network Fundamentals Lecture 15

Communications Software. CSE 123b. CSE 123b. Spring Lecture 3: Reliable Communications. Stefan Savage. Some slides couresty David Wetherall

Implementation and Evaluation of Transport Layer Protocol Executing Error Correction (ECP)

OSI Layer OSI Name Units Implementation Description 7 Application Data PCs Network services such as file, print,

CSCI-1680 Transport Layer III Congestion Control Strikes Back Rodrigo Fonseca

The Importance of Being Opportunistic

CS 641 Project Report Error resilient video transmission over wireless networks. December Gang Ding

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2015

An Efficient Link Bundling Transport Layer Protocol for Achieving Higher Data Rate and Availability

Gain in prob. of success using FEC over non FEC Fraction of FEC (n k)/k

UDP, TCP, IP multicast

Homework 1. Question 1 - Layering. CSCI 1680 Computer Networks Fonseca

Optimizing Joint Erasure- and Error-Correction Coding for Wireless Packet Transmissions

FORWARD ERROR CORRECTION CODING TECHNIQUES FOR RELIABLE COMMUNICATION SYSTEMS

Reliable Multipath Routing with Fixed Delays in MANET Using Regenerating Nodes

PRESENTED BY SARAH KWAN NETWORK CODING

A Digital Fountain Approach to Reliable Distribution of Bulk Data

Multiple unconnected networks

TCP over wireless links

4. Error correction and link control. Contents

Impact of transmission errors on TCP performance. Outline. Random Errors

Transport Layer Protocols for the Land Mobile Satellite Broadcast Channel

All About Erasure Codes: - Reed-Solomon Coding - LDPC Coding. James S. Plank. ICL - August 20, 2004

EE 6900: FAULT-TOLERANT COMPUTING SYSTEMS

DIGITAL fountain codes are a class of code that

Comparison of ISO-OSI and TCP/IP Suit. Functions of Data Link Layer:

1 Reception efficiency

II. Principles of Computer Communications Network and Transport Layer

Data and Computer Communications. Protocols and Architecture

CSC310 Information Theory. Lecture 21: Erasure (Deletion) Channels & Digital Fountain Codes. November 22, 2006 see

Lecture 7: Flow & Media Access Control"

2.1 CHANNEL ALLOCATION 2.2 MULTIPLE ACCESS PROTOCOLS Collision Free Protocols 2.3 FDDI 2.4 DATA LINK LAYER DESIGN ISSUES 2.5 FRAMING & STUFFING

Lecture 2: Links and Signaling

Lecture 6: Reliable Transmission. CSE 123: Computer Networks Alex Snoeren (guest lecture) Alex Sn

TCP so far Computer Networking Outline. How Was TCP Able to Evolve

Guide To TCP/IP, Second Edition UDP Header Source Port Number (16 bits) IP HEADER Protocol Field = 17 Destination Port Number (16 bit) 15 16

CSC310 Information Theory. Lecture 22: Erasure (Deletion) Channels & Digital Fountain Codes. November 30, 2005 see

CS 268: Computer Networking. Taking Advantage of Broadcast

ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS

Exercises TCP/IP Networking With Solutions

AN ABSTRACT OF THE THESIS OF. Arul Nambi Dhamodaran for the degree of Master of Science in

RAJIV GANDHI COLLEGE OF ENGINEERING AND TECHNOLOGY

Wireless Sensornetworks Concepts, Protocols and Applications. Chapter 5b. Link Layer Control

CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007

CHAPTER 5 PROPAGATION DELAY

Transport Layer Congestion Control

AL-FEC for Streaming Services over LTE Systems

Fountain Codes Based on Zigzag Decodable Coding

Transport layer issues

Error Detection Codes. Error Detection. Two Dimensional Parity. Internet Checksum Algorithm. Cyclic Redundancy Check.

CS/ECE 438: Communication Networks for Computers Spring 2018 Midterm Examination Online

RTP/RTCP protocols. Introduction: What are RTP and RTCP?

Chapter 3: Transport Layer Part A

Wireless Internet Routing. Learning from Deployments Link Metrics

Networking Link Layer

International Journal of Innovations in Engineering and Technology (IJIET)

UDP: Datagram Transport Service

Managing Caching Performance and Differentiated Services

CSCI 1680 Computer Networks Fonseca. Exam - Midterm. Due: 11:50am, 15 Mar Closed Book. Maximum points: 100

COMP750. Distributed Systems. Network Overview

Networking interview questions

Transcription:

Efficient Content Delivery and Low Complexity Codes Amin Shokrollahi

Content Goals and Problems TCP/IP, Unicast, and Multicast Solutions based on codes Applications

Goal Want to transport data from a transmitter to one or more receivers over IP reliably. Current solutions have limitations when amount of data is large and Number of receivers is large (point-multipoint transmission) Video on Demand, new versions of computer games Or, network suffers from unpredictable and transient losses Satellite, wireless Or, connection from transmitter to receiver goes over many hops Software company with development sites around the globe

Cost Measures and Scalability Cost measures: Number of servers Outgoing server bandwidth Bandwidth utilization A solution is called scalable if its cost does not increase with the number of recipients. (Server-scalable, bandwidth-scalable.) Are interested in scalable and reliable solutions which maximize bandwidth utilization.

Implicit Assumptions The solution has to be end-to-end, i.e., should run on any network with IP infrastructure without the need to change intermediate protocols. The solution should be computationally efficient.

TCP over IP TCP ensures reliability using acknowledgements. To avoid acknowledging all packets, a small window is used which constitutes packets in transit.

TCP/IP

TCP/IP TCP/IP is Reliable Not server scalable Processing of ack s increases with number of receivers. Not bandwidth scalable Each recipient requires a different connection. Not throughput maximizing Loss rate of 2% reduces bandwidth utilization by factor of 5, instead of 0.98.

UDP Unicast No back-channel to server. Not reliable Not server scalable Not bandwidth scalable Maximizes throughput

UDP Multicast Is not reliable, Is server scalable, Is bandwidth scalable, Maximizes throughput.

Scalability The only known end-to-end bandwidth scalable protocol is Multicast. It is not widely deployed because of fear of network breakdown (little, or no flow control). In the following, we assume that the network is multicast enabled. For more realisitic scenarios, one should replace Multicast with Unicast, and lose the bandwidth scalability.

Channel Model On a computer network data is sent as packets. Each packet has an identifier which identifies the entity it is coming from and its position within that entity. Each packet has a CRC checksum to check its integrity. Corrupted packets can be regarded as lost. Can concentrate on the erasure channel as a model for transmitting packets.

Solution: Codes Want to have the advantages of Multicast and TCP/IP, but not their disadvantages. Encode the original data and send encoded version across the network. Reconstruction is possible if not too many packets were lost. Reliability Coding. Scalability Multicast (or unicast).

Reed-Solomon Codes Choose pairwise different values x 1,x 2,...,x n in F q. RS-Code of dimension k is the image of the map F q [x] <k F n q, f (f(x 1 ),f(x 2 ),...,f(x n )).

Reed-Solomon Codes: Efficiency Issues Encoding: multiplication of vector of length k with k (n k)-matrix; needs k (n k) additions/multiplications in F q. If q = 2 l, then operations in F q need l 2 bit-operations. Number of operations per bit is n l R (1 R), where R is rate of the code. Decoding: More than k e operations over F q where e is number of erasures within the information positions. Number of operations per bit is l e. Depending on l, number of operations is moderate if n k is small, i.e., only few erasures can be corrected.

Effect of Packet Size and Interleaving Internet packets are typically of size 1024 bytes, but alphabet of 1024 bytes is difficult to handle in software. Good alphabet size in practice: 1 byte. Use interleaving: Can deal with files of size 256 KB at most. What about larger files?

Packet interleaving Good if burst losses, extremely bad if random losses: Will receive lots of duplicate packets. (Occupancy problem.)

Reed-Solomon Codes: Summary Encoding and decoding complexity per bit is l k (1 k/n). Reasonable encoding, decoding, and reception speeds if file is small, few redundant symbols, and worst possible loss rate of network known in advance. But: Need lots of redundant symbols in many applications. Need codes for which encoding and decoding complexity per bit is constant, and which do not need packet interleaving.

How to Make Reed-Solomon Codes more Efficient? Use small blocks to compute RS redundant symbols from. Choose the blocks randomly.

How to Make Reed-Solomon Codes more Efficient? Use small blocks to compute RS redundant symbols from. Choose the blocks randomly.

LDPC Codes Constructed from sparse bipartite graphs. Left nodes are called message nodes, right nodes are called check nodes.

Construction a b a c f! = 0 c b c! d e= 0 d e! a c e= 0 f Every binary linear code has such a representation, but not every code can be represented by a sparse graph. Encoding/decoding times?

Decoding Luby-Mitzenmacher-Shokrollahi-Spielman-Stemann, 1997: Phase 1: Direct recovery a? c?? f g h b b e b d d e

Decoding Phase 2: Substitution a b c?? f g h b d e d e

Decoding Time If each symbol is interpreted as a packet with b bits, then we need #E b operations to produce n b bits, where #E is the number of edges in the graph, and n is the block-length of the code. Number of operations per bit is which is constant for sparse graphs. #E/n

The Inverse Problem We have a fast decoding algorithm. Want the design the graph in such a way that many erasures are correctable with this algorithm.

Experiments Choose random bi-regular graph. p 0 := Maximal fraction of correctable erasures. d k d/k p 0 3 6 0.5 0.429 4 8 0.5 0.383 5 10 0.5 0.341 3 9 0.33 0.282 4 12 0.33 0.2572 Where do these numbers come from?

Theorem (Luby, Mitzenmacher, Shokrollahi, Spielman, Stemann 1997) A random (d,k)-graph allows correction of a p 0 -fraction of erasures (with high probability) if and only if p 0 (1 (1 x) k 1 ) d 1 < x für x (0,p 0 ). d = 3,k = 6: f(x) = p 0 (1 (1 x) 5 ) 2 x p 0 = 0.435 p 0 = 0.429 p 0 = 0.41

Analysis: (3, 6)-Graph Expand neighborhood of message node

Analysis: (3, 6)-Graph p i = Probability that a message node is not corrected after the i-th iteration. Message p i+1 Check Message p i Check p i+1 = p 0 (1 (1 p i ) 5 ) 2 <p i.

Analysis: (3, 6)-Graph Rigorous argument: Neighborhood is a tree with high probability. Above argument works fine for the expected fraction of erasures after the i-th iteration. Actual value is concentrated around the expectation p l : Edge exposure martingale, Azuma s Inequality.

The General Case λ i and ρ i fraction of edges of degree i on the left and the right hand side of the graph. λ(x) := iλ i x i 1, ρ(x) := iρ i x i 1. Condition for successful decoding given a loss fraction p 0 : for all x (0,p 0 ). p 0 λ(1 ρ(1 x)) < x

Encoding Trick: Use erasure correction Richardson-Urbanke, 1999: linear time encoding possible under mild conditions. Enough message nodes of degree 2: λ 2 ρ (1) > 1. ρ(1 λ(1 x)) < x for x (0,1).

Tornado Codes Want to design codes which can asymptotically correct an optimal fraction of erasures. Design λ and ρ such that p 0 λ(1 ρ(1 x)) < x for all x (0,p 0 ), and p 0 arbitrarily close to 1 R = 1 ρ(x)dx 0 1. λ(x)dx 0

Tornado Codes Choose design parameter D: 1 λ(x) := H(D) ( x+ x2 2 + + xd D ) ρ(x) := exp(µ(x 1)), H(D) = 1+1/2+ +1/D, µ = H(D)(1 R)/(1 1/(D +1)). p 0 λ(1 ρ(1 x)) = p 0 λ(1 exp( µx)) < p 0 H(D) ln(exp( µx)) = µ p 0 H(D) x < x. This is true for p 0 < H(D)/µ = (1 R)(1 1/(D +1)).

Tornado Codes: Efficiency Need k (1+ε) entries of a codeword to recover the codeword. Per-bit running time of encoding is O(log(1/ǫ)). Per-bit running time of the decoder is O(log(1/ǫ)/R). Tornado codes scale to arbitary lengths and do not requrie packet interleaving. However, need a good guess of the loss rate of the channel.

Theoretical Applications Capacity achieving sequences on the erasure channel Relationship to random graphs Codes on other channels Algebraic constructions Analysis of finite length codes Cryptography...

Practical Applications Receiver driven rate control Multipath delivery Download from multiple servers Peer-to-peer networks Video-on-Demand Fault tolerant storage...

Receiver Driven Rate Control Network traffic generated by UDP should be fair to other network flows like TCP. Content Different parts of encoding offered at different rates Clients pull content at their own rate A lot of loss, by design. Avoiding duplicate packets is difficult.

Path Diversity Send data across different paths to receiver to bypass links that are down. Avoiding duplicate packets is difficult.

Shortcomings of Traditional Codes Protocols involving erasure correcting codes are excellent candidates for replacing TCP/IP in certain applications. However, traditional codes have many disadvantages: One has to have a good guess on the loss rate of the clients; this is very difficult in scenarios like mobile wireless. Many applications require coordination between senders and receivers to avoid reception of duplicate packets. Many applications require codes of very small rate. But, the running time of fast codes like Tornado codes is proportional to the blocklength, rather than the length of the original content.

What we Really Want To design a completely receiver driven and scalable system one needs codes That adapt themselves to the individual loss rates of the clients; clients with more loss need longer to recover the content; For which the decoding time depends only on the length of the content; That achieve capacity of the erasure channel between server and any of the clients; That have fast encoding and decoding algorithms.

Beyond Tornado: LT Codes Michael Luby has invented a class of codes that achieves all these goals. Input Symbols Output Symbols Their per-bit complexity of encoding/decoding is log(k). (This is optimal.) If a receiver has loss rate p, then the code corresponding to that receiver has rate 1 p c/ k.

Relationship to Random Graphs Consider codes in which all message nodes have degree 2. These are graphs on the set of check nodes. 1 2 2 3 3 4 5 1 5 4 Choose neighbors of message nodes randomly (Poisson distribution). This yields a random graph on the set of check nodes. λ(x) = x, ρ(x) = e a(x 1), a is average degree. λ(1 ρ(1 x)) < x, i.e., 1 x e ax < 0.

Relationship to Random Graphs Largest solution of 1 x e ax = 0 in the interval (0,1) gives fraction of the largest component of the graph. There is a giant component if a > 1. The decoder will not be able to correct all erasures iff a > 1, i.e., iff there is a giant component. 1 1 x e 1.5x 1 x e x 1 x e 0.5x

Relationship to Random Graphs Above relationship can be put into a general framework, which works even when not all the message nodes are of degree 2. The Tornado distribution can be obtained from this observation using a self-similarity assumption. This connection also yields linear time encoders.