High Performance Switching and Routing Telecom Center Workshop: Sept 4, 1997.

Delaunay Triangulation Overlays
Jörg Liebeherr, 2003

HyperCast Project
HyperCast is a set of protocols for large-scale overlay multicasting and peer-to-peer networking.
Motivating research problems:
How to organize thousands of applications in a virtual overlay network?
How to do multicasting in very large overlay networks?
Overlay Multicasting
Logical overlay resides on top of the Layer-3 network.
Data is transmitted between neighbors in the overlay.
No network support needed.
Overlay topology should match the Layer-3 infrastructure.

HyperCast Approach
Build overlay network as a graph with known properties:
N-dimensional (incomplete) hypercube
Delaunay triangulation
Advantages:
Achieve good load-balancing
Exploit symmetry
Next-hop routing in the overlay is free
Claim: Can improve scalability of multicast and peer-to-peer networks by orders of magnitude over existing solutions.
HyperCast Software
Applications organize themselves to form a logical overlay network with a given topology.
No central control.
Dynamic membership.
[Figure: hypercube and Delaunay triangulation overlays.]

Data Transfer
Data is distributed neighbor-to-neighbor in the overlay network.
[Figure: hypercube with node labels 110, 101, 111, 010, 011, 000, 001.]
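As a rough illustration of neighbor-to-neighbor distribution in a hypercube, the sketch below treats two members as neighbors when their bit-string labels differ in exactly one position. The function name `hypercube_neighbors` is hypothetical, and the rule for how HyperCast links nodes around missing labels in an incomplete hypercube is not given on the slides, so this only covers the plain one-bit-difference case.

```python
def hypercube_neighbors(label, members):
    """Return the members whose label differs from `label` in exactly one bit.

    Assumption: labels are equal-length bit strings. Labels that are not
    present (incomplete hypercube) are simply skipped.
    """
    flipped = (
        label[:i] + ("1" if bit == "0" else "0") + label[i + 1:]
        for i, bit in enumerate(label)
    )
    return sorted(n for n in flipped if n in members)


# Node labels as they appear on the Data Transfer slide; 100 is absent.
members = {"000", "001", "010", "011", "101", "110", "111"}

for node in sorted(members):
    print(node, "->", hypercube_neighbors(node, members))
```

Node 000 ends up with two neighbors rather than three because label 100 is not a member.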
Delaunay Triangulation Overlays

Nodes in a Plane
Nodes are assigned x-y coordinates (e.g., based on geographic location).
[Figure: coordinate plane with axis marks 0,0 through 3,0 and 0,1 through 0,3.]
Voronoi Regions
The Voronoi region of a node is the region of the plane that is closer to this node than to any other node.

Delaunay Triangulation
The Delaunay triangulation has edges between nodes in neighboring Voronoi regions.
Delaunay Triangulation
An equivalent definition: a triangulation such that, for each circumscribing circle of a triangle formed by three vertices, no vertex is in the interior of the circle.

Locally Equiangular Property
Sibson 1977: Maximize the minimum angle.
For every convex quadrilateral formed by triangles ACB and ABD that share a common edge AB, the minimum internal angle of triangles ACB and ABD is at least as large as the minimum internal angle of triangles ACD and CBD.
[Figure: quadrilateral with vertices A, C, B, D and angles α and β.]
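The empty-circumcircle definition above can be checked directly. The following is a minimal brute-force sketch, not the method used by the DT protocol: it tests every triple of points and keeps the triangles whose circumscribing circle contains no other point. The function names and the four sample points are hypothetical, and the approach is only practical for a handful of nodes.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc; positive when counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b and c."""
    if orient(a, b, c) < 0:
        b, c = c, b  # the determinant test below expects counterclockwise order
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    return det > 0


def delaunay_triangles(points):
    """Brute force: keep each non-degenerate triangle with an empty circumcircle."""
    triangles = []
    for a, b, c in combinations(points, 3):
        if orient(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (d for d in points if d not in (a, b, c))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((a, b, c))
    return triangles


# Hypothetical node coordinates forming a convex quadrilateral.
points = [(0, 0), (3, 0), (0, 3), (2, 2)]
print(delaunay_triangles(points))
```

For these four points the two surviving triangles share the diagonal from (0, 0) to (2, 2); the other diagonal is rejected because each of its triangles has the fourth point inside its circumscribing circle.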
Next-hop routing with Compass Routing
A node's parent in a spanning tree is its neighbor which forms the smallest angle with the root.
A node need only know information on its neighbors; no routing protocol is needed for the overlay.
[Figure: Root, Node, and neighbors A and B, with angles of 30° and 15°; B is the Node's parent.]

[Figure: spanning tree when a given node is root.]
The tree can be calculated by both parents and children.
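The parent rule above needs only local information, which a short sketch can make concrete. The code below assumes nodes are (x, y) tuples and measures, at the node, the angle between the direction to each neighbor and the direction to the root; the names `angle_at` and `compass_parent` and the sample coordinates are hypothetical.

```python
import math


def angle_at(node, p, q):
    """Angle in radians, measured at `node`, between the rays toward p and q."""
    a1 = math.atan2(p[1] - node[1], p[0] - node[0])
    a2 = math.atan2(q[1] - node[1], q[0] - node[0])
    diff = abs(a1 - a2)
    return min(diff, 2 * math.pi - diff)


def compass_parent(node, neighbors, root):
    """Pick the neighbor forming the smallest angle with the root.

    Returns None for the root itself, which has no parent.
    """
    if node == root:
        return None
    return min(neighbors, key=lambda n: angle_at(node, n, root))


# Hypothetical coordinates: the node sits at the origin, the root lies due east.
node, root = (0, 0), (10, 0)
neighbors = [(3, 3), (4, 1), (-2, 0)]
print(compass_parent(node, neighbors, root))
```

Because a child can evaluate the same rule from its own neighborhood table, and a parent can evaluate it on behalf of each neighbor, both ends of a tree edge can agree on it without exchanging routing messages.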
Evaluation of Delaunay Triangulation overlays
Delaunay triangulation can consider location of nodes in an (x,y) plane, but is not aware of the network topology.
Question: How does Delaunay triangulation compare with other overlay topologies?

Hierarchical Delaunay Triangulation
2-level hierarchy of Delaunay triangulations.
The node with the lowest x-coordinate in a domain DT is a member in 2 triangulations.
Multipoint Delaunay Triangulation
Different ("implicit") hierarchical organization.
Virtual nodes are positioned to form a bounding box around a cluster of nodes.
All traffic to nodes in a cluster goes through one of the virtual nodes.

Overlay Topologies
Delaunay triangulation and variants: DT, Hierarchical DT, Multipoint DT
Hypercube: overlays used by HyperCast
Degree-6 Graph: similar to graphs generated in Narada
Degree-3 Tree: similar to graphs generated in Yoid
Logical MST: minimum spanning tree overlays that assume knowledge of network topology
Transit-Stub Network
Transit-Stub: Georgia Tech topology generator
4 transit domains
4 × 16 stub domains
1024 total routers
128 hosts on stub domain

Evaluation of Overlays
Simulation: network with 1024 routers (Transit-Stub topology), 2-512 hosts.
Performance measures for trees embedded in an overlay network:
Degree of a node in an embedded tree
Relative Delay Penalty: ratio of delay in overlay to shortest path delay
Stress: number of duplicate transmissions over a physical link
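The two path-based measures can be sketched in a few lines. This is an illustrative computation, not the simulator used for the evaluation: it assumes each overlay edge is given as the list of physical hops it traverses, and the host and router names are hypothetical. The delays 6 and 4 are the values from the deck's own illustration of relative delay penalty.

```python
from collections import Counter


def relative_delay_penalty(overlay_delay, unicast_delay):
    """Ratio of the delay along the overlay to the shortest-path (unicast) delay."""
    return overlay_delay / unicast_delay


def link_stress(overlay_edge_paths):
    """Count, per physical link, how many overlay edges send a copy over it.

    Each element of `overlay_edge_paths` is the sequence of physical nodes
    that one overlay edge traverses. Links are treated as undirected.
    """
    stress = Counter()
    for path in overlay_edge_paths:
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1
    return stress


# Hypothetical layout: host A forwards to hosts B and C, and both overlay
# edges cross the same two physical links A-R1 and R1-R2.
paths = [
    ["A", "R1", "R2", "B"],
    ["A", "R1", "R2", "C"],
]
stress = link_stress(paths)
print(stress[frozenset(("A", "R1"))], stress[frozenset(("R2", "B"))])
print(relative_delay_penalty(6, 4))
```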
Illustration of Stress and Relative Delay Penalty
[Figure: hosts A and B on a physical network; two links are marked Stress = 2.]
Unicast delay A to B: 4
Delay A to B in overlay: 6
Relative delay penalty for A to B: 1.5

Average Relative Delay Penalty
[Figure: plot of the average relative delay penalty.]
90th Percentile of Relative Delay Penalty
[Figure: plot, with the Delaunay triangulation curve labeled.]

Average Stress
[Figure: plot, with the Delaunay triangulation curve labeled.]
90th Percentile of Stress
[Figure: plot, with the Delaunay triangulation curve labeled.]

The DT Protocol
Protocol which organizes members of a network in a Delaunay triangulation.
Each member only knows its neighbors.
Soft-state protocol.
Topics:
Nodes and Neighbors
Example: A node joins
State Diagram
Rendezvous
Measurement Experiments
Each node sends messages to its neighbors periodically.

Each message contains the clockwise (CW) and counterclockwise (CCW) neighbors.
The receiver of a message runs a neighbor test (locally equiangular property).
CW and CCW are used to detect new neighbors.
[Figure: neighborhood table of a node, with columns Neighbor, CW, CCW.]
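One way to picture how the CW and CCW columns reveal new neighbors is the sketch below. It assumes the neighborhood table is a dictionary mapping each neighbor to the (CW, CCW) pair reported in that neighbor's latest message; any node named in those columns but missing from the neighbor column becomes a candidate. The function names and node labels are hypothetical, and in the protocol a candidate would still have to pass the neighbor test before being added.

```python
def candidate_neighbors(table):
    """Nodes listed in the CW or CCW columns that are not yet neighbors.

    `table` maps neighbor -> (cw, ccw); either entry may be None.
    """
    reported = {n for pair in table.values() for n in pair if n is not None}
    return reported - set(table)


def is_stable(table):
    """Stable when every node in the CW and CCW columns is also a neighbor."""
    return not candidate_neighbors(table)


# Hypothetical table of a node with neighbors B, C and D.
table = {"B": ("C", "D"), "C": ("D", "B"), "D": ("B", "C")}
print(is_stable(table))

# Neighbor B now reports an unknown node E as its CCW neighbor.
table["B"] = ("C", "E")
print(candidate_neighbors(table), is_stable(table))
```

The `is_stable` check mirrors the stability condition stated later in the deck for the sub-states of a node.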
A node that wants to join the triangulation contacts a node that is close.
[Figure label: New node.]

The contacted node updates its Voronoi region and the triangulation.

It then sends a message which contains information for contacting its clockwise and counterclockwise neighbors.

The new node contacts these neighbors...

...which update their respective Voronoi regions.

These neighbors send messages and provide information for contacting their respective clockwise and counterclockwise neighbors.

The new node contacts the new neighbor...

...which updates its Voronoi region...

...and responds with a message.

This completes the update of the Voronoi regions and the Delaunay triangulation.
Rendezvous Methods
Rendezvous problems:
How does a new node detect a member of the overlay?
How does the overlay repair a partition?
Three solutions:
1. Announcement via broadcast
2. Use of a rendezvous server
3. Use of "likely" members ("Buddy List")

Rendezvous Method 1: Announcement via broadcast (e.g., using IP Multicast).
[Figure label: New node.]
Rendezvous Method 1: A Leader is a node with a Y-coordinate higher than any of its neighbors.
[Figure label: Leader.]

Rendezvous Method 2: New node and leader contact a rendezvous server. The server keeps a cache of some other nodes.
[Figure labels: Server, New node, Server Request, Server Reply, NewNode.]
Rendezvous Method 3: Each node has a list of likely members of the overlay network.
[Figure labels: New node with Buddy List, NewNode.]

State Diagram of a Node
States: Stopped, Leader without Neighbor, Leader with Neighbor, Not Leader, Leaving.
Transition labels, as they appear in the diagram:
Application starts
Neighbor added (with smaller coordinates)
All neighbors leave or timeout (two transitions)
Neighbor added (with larger coordinates)
Application exits
A new neighbor with greater coordinates is added
After removing some neighbor, this node has largest coordinates
Send Goodbye (three transitions)
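The three leadership states in the diagram above can be derived from a node's current neighbor set, which the sketch below does instead of replaying individual transitions. It is an illustration under assumptions, not the protocol's state machine: the ordering of coordinates (y first, ties broken on x) is a guess that extends the Leader definition given for Rendezvous Method 1, the function names are hypothetical, and the Stopped and Leaving states are left out.

```python
def coord_key(p):
    """Assumed ordering of coordinates: compare y first, then x."""
    return (p[1], p[0])


def leadership_state(me, neighbors):
    """Classify a running node from its own coordinates and its neighbors'."""
    if not neighbors:
        return "Leader without Neighbor"
    if all(coord_key(me) > coord_key(n) for n in neighbors):
        return "Leader with Neighbor"
    return "Not Leader"


# Hypothetical coordinates.
me = (2, 5)
print(leadership_state(me, []))
print(leadership_state(me, [(0, 1), (4, 3)]))
print(leadership_state(me, [(0, 1), (3, 7)]))
```

Recomputing the state after every neighbor addition, departure, or timeout yields the same outcomes as the labeled transitions between these three states, for example moving to Not Leader when a neighbor with greater coordinates is added.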
Sub-states of a Node
Sub-states: Stable Without Candidate Neighbor, Stable With Candidate Neighbor, Not Stable.
Transition labels, as they appear in the diagram:
Node contained in NewNode passes neighbor test
After handling the candidate neighbor, node remains stable
After neighborhood updating, node becomes not stable (two transitions)
After neighborhood updating, node becomes stable
A node is stable when all nodes that appear in the CW and CCW neighbor columns of the neighborhood table also appear in the neighbor column.

Measurement Experiments
Experimental platform: Centurion cluster at UVA (cluster of 300 Linux PCs)
2 to 10,000 overlay members
1 to 100 members per PC
Random coordinate assignments
[Figure: cluster network, with Centurion hosts attached to Switches 3 through 11 over Gigabit Ethernet and a connection to the Internet.]
Experiment: Adding Members
How long does it take to add M members to an overlay network of N members?
[Figure: time to complete (sec) versus M+N members.]

Experiment: Throughput of Multicasting
100 MB bulk transfer for N=2-100 members (1 node per PC)
10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
[Figure: average throughput (Mbps) versus number of members N, showing measured values and bandwidth bounds (due to stress).]
Experiment: Delay
100 MB bulk transfer for N=2-100 members (1 node per PC)
10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
[Figure: delay of a packet (msec) versus number of nodes N.]