IP Multicast Routing Protocols

Term Paper
By: Priyank Porwal (97255)
Course: Advanced Computer Networks (CS625)
Instructor: Dr. Dheeraj Sanghi
Department of CSE, IIT Kanpur. April, 2000.

Table of Contents
1. Introduction
2. Basic Algorithms to Construct Multicast Distribution Trees
3. Distance Vector Multicast Routing Protocol (DVMRP)
4. Protocol Independent Multicast Dense Mode (PIM DM)
5. Multicast Extensions to OSPF (MOSPF)
6. Protocol Independent Multicast Sparse Mode (PIM SM)
7. Core Based Trees (CBT)
8. Interoperability of Different Multicast Routing Protocols
9. Conclusion
10. References

1. Introduction

One of the most pressing needs for enhanced communication protocols comes from multipoint (or group) applications. Group communication supports information transfer among a set of participants that together define a "group". It is becoming more relevant as the number of applications requiring it grows. These applications cover a very wide spectrum, including software distribution, replicated database update, command and control systems, audio/video conferencing, distance education, remote collaboration, distributed games, and distributed interactive simulation. Existing unicast routing protocols and communication systems are not well suited to these applications. What is needed is an efficient, adaptive multicast data distribution protocol.

1.1 Issues & Challenges of Multicast Routing

Designing a multicast routing algorithm is a complex problem: group membership can change, and the network topology can evolve as links and nodes fail. The challenges of multicast routing are as follows:

1) Minimizing the network load
To make good use of network resources, loops and traffic concentration on a link or a subnetwork must be avoided, as must forwarding data onto links that do not lead to any group members.

2) Providing basic support for reliable transmission
Route changes should have minimal effect, and link failures should not increase transmission delay or decrease resource reliability.

3) Designing optimal routes
Routes should take into account different cost functions: bandwidth, end to end delay, node connectivity, available resources, price, etc.

4) Minimizing the state stored in routers
When the number of groups increases considerably, memory at the routers must not become the bottleneck.

5) Scalability
The algorithm should not perform badly when the domain on which it operates grows.

6) Robustness
The protocol must be able to adapt to the dynamics of group membership, i.e. hosts should be able to join or leave a multicast group at any time without any major effect on routing.

7) Interoperability
The protocol must be interoperable with other multicast routing protocols.

In the light of these issues and challenges of one to many and many to many data transfer, we study various algorithms and protocols for efficient multicast data distribution, and compare them.

2. Basic Algorithms to Construct Multicast Distribution Trees

All protocols distribute multicast datagrams (packets addressed to multicast groups) by means of a distribution tree, along whose branches copies of the packets are sent out. The most important aspect is therefore the construction of a good distribution tree. There are three basic algorithms to construct multicast distribution trees:

1) Source Based Routing / Reverse Path Forwarding
These algorithms construct one multicast tree per source network and per group. Tree construction is initiated by the source (data driven), that is, it starts when the first packet addressed to a specific group arrives from a specific source network. The resulting tree is implicitly a Shortest Path Tree (SPT) rooted at the source. The technique used to construct the distribution tree is essentially "broadcast and prune". Source based algorithms are best suited to dense mode (dense receiver distribution) conditions. They also assume that plenty of bandwidth is available. Examples: DVMRP, MOSPF, PIM DM.

2) Steiner Tree
This category of tree construction algorithms tries to optimize network resources globally, by looking for a tree that spans all the group members and sources and has the least cost among all such trees. The problem of finding a Steiner Tree for an undirected weighted graph G(V,E,c), where c : E → R assigns a cost to each edge, is to find a tree T that spans all the vertices of the group D ⊆ V and whose cost (i.e. the sum of the costs of all its edges) is minimum among all such trees. T may contain vertices that do not belong to the group. These vertices are called Steiner Points. There are several problems with these algorithms: they involve a centralized calculation of the minimal cost tree, and they are monolithic. By monolithic, it is meant that each time there is a change in group membership or network topology, the algorithm has to be run again to get the least cost tree.

Moreover, the problem of finding a Steiner Tree is NP complete. Therefore, these algorithms are mainly of theoretical interest. However, there are some good heuristics for approximating Steiner Trees, such as MSPH (Minimum Spanning Tree Heuristic) and ADH (Average Distance Heuristic).

3) Center Based Trees
These algorithms construct a single distribution tree per group, rooted at a group mapped "center" router or a set of center routers. They target situations with multiple senders and multiple receivers. Tree construction is initiated by the receivers through an explicit join mechanism. There is no broadcast in these algorithms, so they are useful when group membership is sparse, that is, when only a small fraction of subnets have group members. Although the end to end delay is higher than with source based trees, it is still bounded by a small factor. Where delay is not critical, center based trees are a good choice. Examples: CBT, PIM SM.

3. Distance Vector Multicast Routing Protocol (DVMRP)

Topic Index
3.1 Introduction
3.2 Reverse Path Forwarding
3.3 Protocol Overview
3.3.1 Neighbor Discovery
3.3.2 Source Location
3.3.3 Dependent Downstream Routers
3.3.4 Designated Forwarder
3.3.5 Building Multicast Trees
3.3.6 Pruning Multicast Trees
3.3.7 Grafting Multicast Trees
3.4 Analysis
3.5 Hierarchical DVMRP (HDVMRP)
3.5.1 Introduction
3.5.2 Motivation for Deploying HDVMRP on the MBone
3.5.3 Description
3.5.4 Issues Related to HDVMRP

3.1 Introduction

DVMRP was modeled by S. Deering in his Ph.D. thesis (Dec. 1991). It is the first, and still the predominant, multicast routing protocol used in the Internet. The MBone is based on DVMRP for routing. DVMRP is an example of source based routing, also known as Reverse Path Forwarding (RPF). It is best suited to dense mode conditions, i.e. when most subnets of the domain have group members, and when bandwidth is plentiful.
DVMRP uses a distance vector distributed routing algorithm to build per source group multicast delivery trees. DVMRP can be summarized as a "broadcast & prune" multicast routing protocol. It builds per source broadcast trees based upon routing exchanges, then dynamically creates per source group multicast delivery trees by pruning (removing branches from) the source's truncated broadcast tree.

3.2 Reverse Path Forwarding

Datagrams follow multicast delivery trees from a source to all members of a multicast group, and a packet is replicated only at the necessary branches of the delivery tree. When a datagram arrives on an interface, the reverse path to the source of the datagram is determined by examining a DVMRP routing table of known source networks. If the datagram arrives on the interface that would be used to transmit datagrams back to the source, it is forwarded to the appropriate list of downstream interfaces. Otherwise, it is not on the optimal delivery tree and should be discarded. In this way duplicate packets can be filtered when loops exist in the network topology. This technique is known as Reverse Path Forwarding.

3.3 Protocol Overview

3.3.1 Neighbor Discovery

Neighbor DVMRP routers are discovered dynamically by sending Neighbor Probe messages on local multicast capable network interfaces and tunnel pseudo interfaces. These messages are sent periodically to the All DVMRP Routers (224.0.0.4) IP multicast group address with the IP TTL set to 1. Each Neighbor Probe message contains the list of neighbor DVMRP routers (addresses) from which Neighbor Probe messages have been received on that interface. In this way, neighbor DVMRP routers can ensure that they are seen by each other and that the adjacency is two way. Each router keeps a list of neighbor addresses for each of its interfaces. Probe messages provide:

1) a mechanism for DVMRP routers to locate each other.
2) a way for DVMRP routers to determine the capabilities of each other, by using the minor, major and capability fields of the message header.
3) a keep alive function in order to quickly detect neighbor loss.

3.3.2 Source Location

When an IP multicast datagram is received by a router running DVMRP, the router first looks up the source network in the DVMRP routing table and performs the RPF check on the incoming interface. If the check fails, the datagram is discarded. Otherwise, the datagram can be forwarded to one or more downstream interfaces.

DVMRP maintains and propagates a separate routing table. This table is constructed by exchanging routing information that contains a list of source networks and an appropriate metric. The metric used is a hop count which is incremented by the cost of the incoming interface. Although maintaining a separate table carries some additional overhead, it has two useful consequences:

1) there are no inconsistencies between routers when determining the upstream interface, hence the risk of creating routing loops or black holes is greatly reduced.
2) it is convenient to have separate paths for unicast and multicast datagrams.
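
The RPF check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the interface names and the helper names `rpf_interface` and `rpf_check` are hypothetical, and the table is a plain dictionary rather than a real DVMRP routing table.

```python
import ipaddress

# Hypothetical DVMRP routing table: source network -> (upstream interface, metric).
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): ("eth0", 3),
    ipaddress.ip_network("10.1.2.0/24"): ("eth1", 2),
}

def rpf_interface(source, routes=ROUTES):
    """Return the interface leading back to `source`, or None if no route is known."""
    addr = ipaddress.ip_address(source)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    # Assumption: the most specific (longest prefix) source network wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best][0]

def rpf_check(source, arrival_interface, routes=ROUTES):
    """Accept a datagram only if it arrived on the reverse path interface."""
    return rpf_interface(source, routes) == arrival_interface
```

A datagram from 10.1.2.5 would pass the check on `eth1` and be discarded on any other interface; a datagram from an unknown source network fails the check everywhere.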

3.3.3 Dependent Downstream Routers

DVMRP has a mechanism that allows an upstream router to determine whether any downstream router depends on it for forwarding from particular source networks. This mechanism is known as "Poison Reverse". If a downstream router selects an upstream router as the best next hop to a particular source network, it indicates this by echoing back the route (by means of a DVMRP Route Report message) on the upstream interface with a metric equal to the original metric plus infinity (32 hops). When the upstream router receives the report and sees a metric that lies between infinity and twice infinity, it adds the downstream router from which it received the report to a list of dependent routers for this source. The list of dependent routers thus helps in determining when it is appropriate to prune back the IP source specific multicast trees.

Route reports are generally sent periodically over all interfaces with neighbors present. In addition, flash updates may be sent as needed. Flash updates reduce the chances of routing loops and black holes occurring when source networks become unreachable through a particular path. The dependency is canceled when a dependent neighbor later reports a path metric less than infinity.

3.3.4 Designated Forwarder

When two or more multicast routers are connected to a multi access network, duplicate packets may get forwarded onto the network (one copy from each router). DVMRP prevents this by electing a forwarder among these routers for each source, as a side effect of its route exchange. When two routers on a multi access network exchange source networks, each router learns the other's metric back to each source network. Therefore, of all the DVMRP routers on a shared network, the router with the lowest metric to a source network is responsible for forwarding data from that source network onto the shared network. If two or more routers have an equally low metric, the router with the lowest IP address becomes the designated forwarder for the network. In this way, DVMRP does an implicit designated forwarder election for each source network on each downstream interface.

Figure 3.1: Designated Forwarder (DF) for the Multi Access Network (N1). R1 is DF for source n/w N4, R2 is DF for source n/w N2, R3 is DF for source n/w N2.
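
The two side effects of route exchange described in sections 3.3.3 and 3.3.4 can be sketched as follows. This is a minimal illustration, not an implementation of the protocol: the function names are hypothetical, routers are identified by plain IP address strings, and the exact boundaries of the "between infinity and twice infinity" range are an assumption.

```python
import ipaddress

INFINITY = 32  # DVMRP "infinity" metric, as stated in the text

def is_poison_reverse(metric):
    """Assumption: strictly between infinity and twice infinity marks a dependent."""
    return INFINITY < metric < 2 * INFINITY

def update_dependents(dependents, neighbor, reported_metric):
    """Add or remove `neighbor` from the dependent set kept for one source network."""
    if is_poison_reverse(reported_metric):
        dependents.add(neighbor)
    else:
        # The neighbor no longer echoes the route with metric + infinity.
        dependents.discard(neighbor)
    return dependents

def designated_forwarder(candidates):
    """Pick the forwarder from {router_ip: metric}: lowest metric, then lowest IP."""
    return min(candidates,
               key=lambda ip: (candidates[ip], ipaddress.ip_address(ip)))
```

Comparing addresses through `ipaddress.ip_address` rather than as strings matters for the tie break: as strings, "10.0.0.10" would sort before "10.0.0.9".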

Initially, a DVMRP router should assume it is the designated forwarder for all source networks on all downstream interfaces. As it receives route reports, it can determine whether other routers on multi access networks have better routes back to a particular source network. A route is considered better if the adjusted received metric is less than the metric that the router will advertise for the source network on the receiving interface, or if the metrics are the same but the IP address of the neighbor is lower. If the upstream RPF interface changes, the router should become the designated forwarder on the previous upstream interface (which is now a potential downstream interface) until it hears from a better candidate.

Figure 3.2: Change in Designated Forwarder

3.3.5 Building Multicast Trees

Adding Local Group Members: The IGMP local group database is maintained by all IP multicast routers on each physical, multicast capable network. If the destination group address is listed in the local group database, and the router is the designated forwarder for the source, then the interface is included in the list of downstream interfaces. If there are no group members on the interface, then the interface is removed from the outgoing interface list for that source group forwarding cache entry.

Adding Interfaces with Neighbors: Initially, all interfaces with downstream dependent neighbors should be included in the downstream interface list when a forwarding cache entry is first created. This allows the downstream routers to be aware of traffic destined for a particular (source network, group) pair. The downstream routers then have the option to send prunes and subsequent grafts for this (source network, group) pair as requirements change from their respective downstream routers and local group members.

3.3.6 Pruning Multicast Trees

Leaf routers (i.e. routers at the end of a source specific multicast delivery tree) must detect that there are no further downstream dependent routers. If there are no group members present for a particular multicast datagram received, the leaf routers start the pruning process by removing their downstream interfaces and sending a prune to the upstream router for that source. If a router removes all of its downstream interfaces from a forwarding cache entry, it notifies the upstream router that it no longer wants traffic destined for a particular (source network, group) pair. This is accomplished by sending a DVMRP Prune message to the upstream router it depends on for forwarding datagrams from that particular source. A Prune message can only be sent by a dependent neighbor.

A DVMRP Prune message is specific to a (source network, group) pair, although what it signals is the absence of downstream group members. Each Prune message carries a source address, a group address and a prune time out value (the time for which the prune is applicable). If, on receiving a prune message, a router finds that a prune is currently active from the same dependent neighbor for this (source network, group) pair, the timer is reset to the new time out value. If the router finds that there are no group members active on the interface, then the interface is removed from all forwarding cache entries for this group. Otherwise, the prune only affects the specific (source network, group) forwarding cache entry. When a prune is propagated upstream, the lifetime of the prune sent must be the minimum of the remaining lifetimes of the received prunes.

3.3.7 Grafting Multicast Trees

To support dynamic group membership, DVMRP provides for grafting of multicast trees. If hosts join a multicast group at some time and the local router had earlier sent prunes for that group, the DVMRP router uses grafts to cancel these prunes. Separate Graft messages must be sent to the appropriate upstream neighbor for each source network that has been pruned. Since there would otherwise be no way to tell whether a Graft message sent upstream was lost or the source simply quit sending traffic, each Graft message must be acknowledged with a DVMRP Graft Ack message. If an acknowledgment is not received within a Graft Time out period, the Graft message should be retransmitted using binary exponential back off between retransmissions. Duplicate Graft Ack messages should simply be ignored. The purpose of the Graft Ack message is simply to acknowledge the receipt of a Graft message. It does not imply that any action was taken as a result of receiving the Graft message. Therefore, all Graft messages received from a neighbor with whom a two way neighbor relationship has been formed should be acknowledged, whether or not they cause an action on the receiving router.

3.4 Analysis

DVMRP works well for a multicast group that is densely represented within a subnet. However, for multicast groups that are sparsely represented over a wide area network, the periodic broadcast behavior would cause serious performance problems. Another problem with DVMRP is the amount of multicast routing state information that must be stored in the multicast routers. Every multicast router must hold state information for every (source, group) pair: either information designating the interfaces to be used for forwarding multicast messages, or prune state information. Thus O(G*S) forwarding information has to be maintained by each router, where G is the number of active groups and S is the number of active sources. For these reasons, DVMRP does not scale to support multicast groups that are sparsely distributed over a large network.
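
Two of the timer rules from sections 3.3.6 and 3.3.7 are simple enough to state as code. The sketch below is illustrative only; the function names and the time values used in the example are hypothetical and not taken from the DVMRP specification.

```python
def upstream_prune_lifetime(remaining_lifetimes):
    """Lifetime to put in a prune sent upstream: the minimum of the
    remaining lifetimes of the prunes received from downstream."""
    return min(remaining_lifetimes)

def graft_retransmit_delays(initial_timeout, attempts):
    """Waiting times between Graft retransmissions when the time out is
    doubled after every unacknowledged attempt (binary exponential back off)."""
    return [initial_timeout * 2 ** i for i in range(attempts)]
```

For example, with received prunes that have 7200, 1800 and 3600 seconds left, the prune sent upstream would carry 1800 seconds, and an initial Graft time out of 5 seconds would give waits of 5, 10, 20 and 40 seconds over four attempts.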

3.5 Hierarchical DVMRP (HDVMRP)

3.5.1 Introduction

The MBone is currently organized as one "flat" region, with most routers maintaining explicit routing information for each subnet in the network. Exponential growth of the MBone in recent years has resulted in increased routing overhead and processing costs. The solution to this problem is to introduce a level of hierarchy in the routing model. Hence, we have the Hierarchical Distance Vector Multicast Routing Protocol, with two levels of hierarchy in the MBone.

3.5.2 Motivation for Deploying HDVMRP on the MBone

1) Reduction of the amount of topological information that MBone routers must store and exchange with other routers.
2) Any region can operate any multicast routing protocol, irrespective of other regions.
3) Effects of low level topology changes, such as link or router failures and recoveries, are isolated to only those routers in the same topological region as the affected components.
4) Newer protocols, which require the unicast and multicast topologies to be the same, can be operated within a region satisfying this constraint, without waiting for it to be satisfied everywhere.

3.5.3 Description

HDVMRP partitions the MBone into non overlapping regions, using DVMRP as the inter region routing protocol; intra region routing may be accomplished by any of the existing multicast protocols. It is flexible enough to accommodate additional levels of hierarchy, and protocols other than DVMRP at the higher levels. The routing protocol in each domain maintains detailed topological information only for its own domain, not for other domains, while the inter domain protocol maintains information only about the interconnection of domains, not about their internal topologies. This approach employs region identifiers (not encoded in addresses) and uses encapsulation for inter region forwarding of packets. A region describes a cluster of routers and subnets, each region encompassing one or more unicast routing domains.

Each topological region contains one or more boundary routers responsible for interconnecting different regions and forwarding multicast traffic between them. Let us say that the multicast routers internal to a region run a "Level 1" (L1) multicast protocol for forwarding multicast traffic within the region, and the boundary routers run a "Level 2" (L2) multicast protocol for forwarding inter region traffic. The boundary routers must include L1 functionality in order to participate in the L1 routing of each region to which they are attached. L2 routers exchange routing information using "region identifiers" instead of subnet addresses internal to the region. Packets generated in a region are tagged with the region's identifier, which is placed in an encapsulation header for transit across regions. Boundary router(s) of the destination region(s) remove the encapsulation header before final delivery to group members within the region.
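
The tagging and encapsulation steps at the boundary routers can be pictured with a small sketch. All names here (`Packet`, `Encapsulated`, `l2_next_hops`, the region labels) are hypothetical, and the L2 table is modeled as a dictionary keyed on the region identifier and the destination group, which is an assumption about how the L2 forwarding state might be organized.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    source: str
    group: str
    payload: bytes

@dataclass(frozen=True)
class Encapsulated:
    region_id: str   # identifier of the region where the packet originated
    inner: Packet    # original multicast packet, carried unchanged

def encapsulate(packet, region_id):
    """Done by a boundary router of the source region."""
    return Encapsulated(region_id=region_id, inner=packet)

def l2_next_hops(encap, l2_table):
    """L2 lookup keyed on the region identifier and group, not on subnets."""
    return l2_table.get((encap.region_id, encap.inner.group), [])

def decapsulate(encap):
    """Done by a boundary router of a destination region before L1 delivery."""
    return encap.inner
```

The point of the sketch is that nothing inside `l2_next_hops` looks at the source subnet of the inner packet, which is what keeps the L2 routing tables small.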

3.5.4 Issues Related to HDVMRP

Figure 3.3: Logical Representation of a Boundary Router

Enabling Multiple Levels of Hierarchy
This can be achieved by using either:
1) multiple levels of encapsulation, or
2) CIDR (Classless Inter Domain Routing) for Region Ids.

Avoiding Multiple Levels of Encapsulation
Several multicast routers have tunnels between them. If two L2 routers are connected via a tunnel, traffic exchanged between them has to be encapsulated twice: once for the tunnel and once for inter region forwarding. This dual encapsulation overhead on every datagram can be avoided by using the address of the tunnel end point as the destination address in the encapsulation header instead of the ABR group, while the source region identifier is retained as a tag in the encapsulated packet. Since L2 forwarding is based only on the Region Id and the destination group of the original packet, the change in the destination field of the encapsulation header does not affect the forwarding mechanism at the destination L2 router.

HDVMRP uses address independent region identifiers as the basis of the top level routing, thereby enabling a significant reduction in routing table size, regardless of the "aggregatability" of the MBone subnet addresses. Deployment of HDVMRP will also reduce the degree of topological volatility that any router must handle and relax the constraints on the maximum MBone diameter.

4. Protocol Independent Multicast Dense Mode (PIM DM)

Topic Index
4.1 Introduction
4.2 Protocol Overview
4.2.1 Multicast Datagram Forwarding
4.2.2 Pruning Tree Branches
4.2.3 Joining an Existing Group
4.2.4 Parallel Paths to a Source
4.3 Analysis

4.1 Introduction

PIM DM belongs to the class of source specific multicast distribution tree construction algorithms. It constructs a per source, per group distribution tree. As the name implies, it has been designed for dense receiver distribution. In addition, it is independent of the unicast routing protocol: unlike DVMRP, it does not have its own topology discovery mechanism. Instead it imports unicast routes from whatever unicast routing protocol is used in the domain. Otherwise the protocol is more or less the same as DVMRP.

4.2 Protocol Overview

Dense mode PIM assumes that when a source starts sending, all downstream systems want to receive multicast datagrams. Initially, multicast datagrams are flooded to all areas of the network. If some areas of the network do not have group members, dense mode PIM prunes off the forwarding branch by setting up prune state. The prune state has an associated timer; when the timer expires, the prune state turns into forward state, allowing data to go down the branch that was previously pruned. The prune state contains source and group address information. When a new member appears in a pruned area, a router can "graft" toward the source for the group, turning the pruned branch into forward state. The forwarding branches form a tree rooted at the source and leading to all members of the group. This tree is called a source rooted tree.

4.2.1 Multicast Datagram Forwarding

Just like DVMRP, PIM DM forwards a multicast datagram only if it passes the RPF check. The interface on which multicast data packets from a source S arrive should match the unicast interface leading to the best next hop router towards S. If a receiving router does not already have a forwarding entry, it creates one for the source S and group G. This forwarding entry is called an (S,G) entry. It includes the following contents: source address, group address, the incoming interface, a list of outgoing interfaces, a few flags and a few timers.
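
A minimal sketch of such an (S,G) entry, including the prune timer behavior from section 4.2, is given below. The class name `SGEntry`, the field names, and the `hold_time` parameter are hypothetical, and time is modeled as a plain number supplied by the caller rather than a real clock.

```python
from dataclasses import dataclass, field

@dataclass
class SGEntry:
    source: str
    group: str
    iif: str                                          # incoming (RPF) interface
    oifs: set = field(default_factory=set)            # outgoing interfaces
    pruned_until: dict = field(default_factory=dict)  # interface -> prune expiry time

    def prune(self, interface, now, hold_time):
        """Put `interface` into prune state until now + hold_time."""
        if interface in self.oifs:
            self.pruned_until[interface] = now + hold_time

    def graft(self, interface):
        """Return a pruned interface to forward state immediately."""
        self.pruned_until.pop(interface, None)

    def forwarding_oifs(self, now):
        """Interfaces in forward state; a prune whose timer has expired
        no longer blocks forwarding."""
        return {i for i in self.oifs if self.pruned_until.get(i, now) <= now}
```

With an entry whose outgoing interfaces are `eth1` and `eth2`, pruning `eth1` at time 0 for 180 time units leaves only `eth2` forwarding at time 10, and both interfaces forwarding again from time 180 onward.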
The incoming interface for (S,G) is determined by an RPF lookup in the unicast routing table. The (S,G) outgoing interface list contains the interfaces that have PIM routers present or host members for group G. Later the outgoing interface list is modified as prunes are received.

4.2.2 Pruning Tree Branches

Unlike PIM SM, PIM DM transmits no periodic joins; only explicitly triggered grafts and prunes are used to modify the tree branches. If a router creates an (S,G) entry with an empty outgoing interface list after receiving a multicast datagram, it must trigger a PIM Prune message (addressed to the all PIM routers group address 224.0.0.13) towards the source S. This type of entry is called a negative cache entry. Negative cache entries can be found on leaf routers with no local group members, or on routers where prune messages received from downstream routers caused the outgoing interface list to become NULL. To avoid prune storms, prunes must not be sent upstream for every data packet matching a negative cache entry. Instead, there must be some policy deciding when a prune is to be sent upstream.

Prune information is flushed periodically. This causes multicast datagrams to be sent to all downstream PIM routers again, which may once more trigger prune messages. When a prune message arrives on a point to point link, the link is pruned immediately. Prunes received on an interface to a multi access network, however, are delayed for some time (~3 seconds), so that other routers on the LAN may send a prune override join message if they still expect multicast datagrams from the upstream router. This reduces the chance of join latency in case a new member joined immediately after the last prune.

4.2.3 Joining an Existing Group

If a router is directly connected to a host that wants to become a member of a group, the router may send a Graft message toward known sources. This allows join latency to be reduced below that implied by the relatively large time out value suggested for prune information. A router receiving the Graft message adds the interface on which it was received to the matching (S,G) entry's outgoing interface list. If the entry transitions to forward state because of this added outgoing interface, the router must itself send a Graft message toward the source.

The Graft message is the only PIM message that uses a positive acknowledgment strategy. Senders of Graft messages unicast them to their upstream RPF neighbors. The neighbor processes each (S,G) and immediately acknowledges each (S,G) in a GraftAck message. This is relatively easy, since the receiver simply changes the PIM message type from Graft to GraftAck and unicasts the original packet back to the sender of the Graft. The sender periodically retransmits the Graft message for any (S,G) that has not been acknowledged.

4.2.4 Parallel Paths to a Source

If two routers have equal cost paths to a source and are connected to a common multi access network, duplicate datagrams will travel downstream onto the LAN. Dense mode PIM will detect such a situation and will not let it persist.
If a router receives a multicast datagram on an outgoing interface on a multi access LAN, the packet must be a duplicate. In this case a single forwarder must be elected. The upstream routers decide which one becomes the forwarder, using Assert messages addressed to 224.0.0.13 on the LAN. Downstream routers listen to the Asserts so that they know which router was elected. The upstream router elected is the one that has the best metric to the source. When a packet is received on an outgoing interface, a router sends an Assert packet on the LAN indicating the metric it uses to reach the source of the data packet. If the metrics are comparable, the router with the best metric becomes the forwarder. If the metrics are incomparable, the preference values associated with each kind of metric are used to select the forwarder. This is useful when upstream routers run different unicast routing protocols. All other upstream routers prune the interface from their outgoing interface list. The downstream routers also do the comparison, in case the forwarder is different from their RPF neighbor. This is important so that downstream routers send subsequent Prunes or Grafts to the correct neighbor.

4.3 Analysis

Like DVMRP, PIM DM performs well if the receiver distribution is dense. Since PIM DM has no concept of dependent downstream routers, there is some additional overhead because multicast packets are also sent on links leading to downstream routers that do not consider the upstream router to be the best next hop router towards the source of the multicast packet. On the other hand, there is a saving in route report exchanges, because PIM DM has no topology discovery protocol of its own; instead it uses unicast routes. Moreover, the simplicity of the protocol makes it easier to implement and leads to higher performance.

5. Multicast Extensions to OSPF (MOSPF)

Topic Index
5.1 Introduction
5.2 Tree Construction & Forwarding
5.3 Support for Non Broadcast Networks
5.4 Analysis

5.1 Introduction

As the name says, MOSPF is an extension of the unicast routing protocol OSPF (Open Shortest Path First) to support multicast routing. Like OSPF, it provides a database describing the Autonomous System's topology. A new OSPF link state advertisement is added describing the location of multicast destinations, i.e. group membership information. A multicast packet's path is then calculated by building a pruned shortest path tree rooted at the packet's IP source. These trees are built on demand, without even flooding the first datagram of a group transmission, and the results of the calculation are cached for use by subsequent packets. MOSPF builds a per source, per group multicast distribution tree. Since MOSPF involves heavy computation at each router and requires a lot of exchange of topology and membership information, it is generally used only within a small domain, namely an Autonomous System.

5.2 Tree Construction & Forwarding

Each router stores the complete topology of the network, including the cost of each link. This is achieved by periodic broadcast of link state advertisements (LSAs). Each router also maintains the group membership information of all routers in the domain. This is achieved by broadcast of group membership advertisements. Therefore, each router has a consistent view of the network topology and group membership. Tree construction in MOSPF is data driven ("on demand").
By data driven we mean that tree construction for a (source S, group G) pair at a router starts only when a data packet originated at S and addressed to G arrives at the router. When a data packet with an (S,G) pair arrives at a router, the router first performs the RPF check. If the check fails the packet is discarded; otherwise it is considered for forwarding. The router then checks whether it has a forwarding cache entry for this pair. If it has such an entry, the packet is forwarded on all outgoing interfaces corresponding to this entry. If no such entry exists, the router determines the shortest path tree rooted at the packet's source (S) using the link state information and Dijkstra's algorithm. Then, using the group membership information, it removes the branches not leading to any group members. This way it gets the pruned shortest path tree rooted at the packet's source. The packet is then forwarded on all interfaces leading to the router's children on this tree.
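
The calculation a router performs on a cache miss can be sketched as Dijkstra's algorithm followed by a pruning pass. The sketch assumes the link state database has already been reduced to a dictionary `graph` mapping each router to its neighbors and outbound link costs, and that group membership is given as a set of router names; these representations and the function names are illustrative, not part of MOSPF.

```python
import heapq

def shortest_path_tree(graph, source):
    """Dijkstra: return {node: parent} for the shortest path tree rooted at
    `source`. `graph` maps node -> {neighbor: outbound link cost}."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def pruned_tree(parent, member_routers):
    """Keep only the branches that lead to a router with group members."""
    keep = set()
    for router in member_routers:
        node = router
        while node is not None and node not in keep:
            keep.add(node)
            node = parent.get(node)
    return {n: p for n, p in parent.items() if n in keep}

def children(tree, router):
    """Routers to which `router` forwards the packet on the pruned tree."""
    return sorted(n for n, p in tree.items() if p == router)
```

Every router runs the same calculation on the same database, so each one can find its own children in the pruned tree and forward on the interfaces leading to them without any further signaling.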

Cost is expressed in terms of the OSPF link state metric. For example, if the OSPF metric represents delay, a minimum delay path is chosen. OSPF metrics are configurable. A metric is assigned to each outbound router interface, representing the cost of sending a packet on that interface. The cost of a path is the sum of its constituent (outbound) router interfaces. The forwarding entries have a timer, after they are removed from the cache. This allows the protocol to adapt to dynamics of both network topology and group membership. 5.3 Support for Non Broadcast Networks When forwarding multicast datagrams over non broadcast networks, the datagram cannot be sent as a link level multicast (since neither link level multicast nor broadcast are supported on these networks), but must instead be forwarded separately to specific neighbors. To facilitate this, forwarding cache entries can also contain downstream neighbors as well as downstream interfaces. The IGMP protocol is not defined over non broadcast networks. For this reason, there cannot be group members directly attached to non broadcast networks, nor do non broadcast networks ever appear in local group database entries. 5.4 Analysis The flooding (reliable broadcasting) of group membership information is the predominant factor preventing the link state multicast algorithm (MOSPF) being applicable over the wide area. The other limiting factor is the processing cost of the Dijkstra calculation to compute the shortest path tree for each active source. Therefore, MOSPF is used only for multicast routing within an autonomous system. 6. 
Protocol Independent Multicast Sparse Mode (PIM SM) Topic Index 6.1 Introduction 6.2 Protocol Overview 6.2.1 Forwarding Cache Entries 6.2.2 Local Hosts Joining a Group 6.2.3 Establishing the RP rooted Shared Tree 6.2.4 Hosts Sending to a Group 6.2.5 Switching from Shared Tree (RP Tree) to SPT 6.2.6 Steady State Maintenance of Distribution Tree (Router State) 6.2.7 Multicast Data Packet Processing 6.2.8 Operation over Multi Access Network 6.2.8.1 Designated Router Election 6.2.8.2 Parallel Paths Resolution Assert Process 6.2.8.3 Join/Prune Suppression 6.2.9 Unicast Routing Changes 6.3 Rendezvous Point (RP) Discovery 6.3.1 Bootstrap Mechanism

6.1 Introduction This protocol aims at routing to multicast groups that may span the wide area (and inter domain) Internet. It is known as PIM SM because it is not dependent on any particular unicast routing protocol and has been designed for sparse distribution of receivers. This protocol belongs to the class of center based schemes, where there is a single distribution tree per group, but it supports both the shared tree and the source specific tree (SPT, shortest path tree). It is aimed mainly at the multiple sender group scenario. The PIM architecture has been designed to avoid the overhead of broadcasting packets when group members sparsely populate the internet. Efficiency of the protocol is measured in terms of the router state, control packet processing, and data packet processing required across the entire network in order to deliver data packets to the members of the group. 6.2 Protocol Overview 6.2.1 Forwarding Cache Entries These are used for the purpose of forwarding multicast datagrams to the destination host group. Since PIM SM supports both the shared tree and the source specific tree, there have to be two types of forwarding entries : 1) (*,G) Shared Tree (RP Tree) : any source may match this entry. RP (Rendezvous Point) is the center of the shared tree. 2) (S,G) Source Specific Tree (SPT) : only packets from S and addressed to G can be forwarded according to this entry. With each forwarding entry there are some associated flags indicating the type of the entry : WC bit : the WC (wild card) bit indicates the type of the entry, 1 for (*,G) and 0 for (S,G). SPT bit : indicates whether the (S,G) entry is valid (1) or not (0). RPT bit : indicates whether the Join/Prune messages for this entry have to be propagated up the RP tree (1) or the SPT (0). There is also an incoming interface (iif) and a list of outgoing interfaces (oifs) associated with each entry. An arriving interface check is performed on each multicast datagram matching a particular forwarding entry. 
Each outgoing interface has a timer associated with it, which is refreshed each time a join message comes for it or when a packet is forwarded to it. After time out the interface is removed from oif of the entry. 6.2.2 Local hosts joining a group In order to join a multicast group, G, a host (R) conveys its membership information through the Internet Group Management Protocol (IGMP) to its designated router (DR). When a DR (e.g., router A in figure 6.1) gets a membership indication from IGMP for a new group, G, the DR looks up the associated RP. The DR creates a wildcard multicast route entry for the group, referred to here as a (*,G) entry; if there is no more specific match for a particular source, the packet will be forwarded according to this entry. The RP address is included in a special field in the route entry and is included in periodic upstream PIM Join/Prune messages. The outgoing interface is set to that included in the IGMP membership indication for the new member. The incoming interface is set to the interface used to send unicast packets to the RP. WC & RPT bits are set for this entry indicating that it is a (*,G) entry.
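A forwarding entry with its flags and per interface timers, and the DR's reaction to an IGMP membership indication, might be modelled as below. This is a minimal sketch under stated assumptions: the class and function names are invented for illustration, the lifetime `OIF_LIFETIME` is an arbitrary example value, and the RP address and the interface towards the RP are simply passed in as arguments (in a real router they would come from RP discovery and the unicast routing table).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

OIF_LIFETIME = 180.0  # seconds; illustrative value, not taken from the text

@dataclass
class ForwardingEntry:
    group: str
    source: Optional[str]   # None stands for the wildcard "*"
    rp: Optional[str]       # RP address stored in the entry
    iif: Optional[str]      # incoming interface
    wc_bit: bool = False
    rpt_bit: bool = False
    spt_bit: bool = False
    oif_expiry: Dict[str, float] = field(default_factory=dict)

    def refresh_oif(self, interface: str, now: float) -> None:
        """Add the interface, or restart its timer (join seen or packet sent)."""
        self.oif_expiry[interface] = now + OIF_LIFETIME

    def expire_oifs(self, now: float) -> None:
        """Remove every outgoing interface whose timer has run out."""
        for interface in [i for i, t in self.oif_expiry.items() if t <= now]:
            del self.oif_expiry[interface]

    @property
    def oifs(self):
        return sorted(self.oif_expiry)

def on_igmp_join(cache, group, member_interface, rp_address, iface_towards_rp, now):
    """DR behaviour when IGMP reports a member of `group` on member_interface."""
    key = ("*", group)
    entry = cache.get(key)
    if entry is None:
        # New group: create the (*,G) entry with the WC and RPT bits set.
        entry = ForwardingEntry(group=group, source=None, rp=rp_address,
                                iif=iface_towards_rp, wc_bit=True, rpt_bit=True)
        cache[key] = entry
    entry.refresh_oif(member_interface, now)
    return entry

cache = {}
entry = on_igmp_join(cache, "224.1.1.1", "eth1", "10.0.0.1", "eth0", now=0.0)
print(entry.oifs, entry.wc_bit, entry.rpt_bit)
```

Keeping one expiry time per outgoing interface, rather than a single timer per entry, mirrors the description above, where each oif is removed independently once its timer runs out.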

Figure 6.1 : Receiver Joining & Shared Tree Construction

When there are no longer directly connected members for the group, IGMP notifies the DR. If the DR has neither local members nor downstream receivers, the (*,G) state is deleted. 6.2.3 Establishing the RP rooted Shared Tree Triggered by the (*,G) state, the DR creates a Join/Prune message with the RP address in its join list and the WC and RPT bits set to 1. The WC bit set to 1 indicates that the address is an RP and that the downstream receivers expect to receive packets from all sources via this (shared tree) path: any source may match and be forwarded according to this entry if there is no longer (more specific) match. The RPT bit set to 1 indicates that the join is associated with the shared RP tree, and therefore the Join/Prune message is propagated along the RP tree. The prune list is left empty.

Figure 6.2 : Setting up Shared Tree
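The shared tree join and its hop by hop effect on the routers between the DR and the RP can be sketched as follows. The message is reduced to the fields named in the text (group, join list, prune list, WC bit, RPT bit); the `JoinPrune` class, the function names and the use of router names as interface identifiers are simplifying assumptions for this example only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class JoinPrune:
    group: str
    join: Tuple[str, ...]
    prune: Tuple[str, ...]
    wc_bit: bool
    rpt_bit: bool

def shared_tree_join(group, rp):
    """Join/Prune a DR sends for (*,G): RP in the join list, empty prune list."""
    return JoinPrune(group=group, join=(rp,), prune=(), wc_bit=True, rpt_bit=True)

def propagate_join(path, msg):
    """Walk the join along `path`, the routers from the DR up to the RP.

    Returns the (*,G) state created at every router upstream of the DR.
    Each interface is named after the neighbor it leads to."""
    state = {}
    for i in range(1, len(path)):
        router, downstream = path[i], path[i - 1]
        entry = state.setdefault(router, {"iif": None, "oifs": set()})
        # The interface on which the join arrived becomes an outgoing interface.
        entry["oifs"].add(downstream)
        if router in msg.join:
            # The RP recognizes its own address: iif stays null, no further join.
            break
        entry["iif"] = path[i + 1] if i + 1 < len(path) else None
    return state

msg = shared_tree_join("224.1.1.1", "RP")
state = propagate_join(["A", "B", "C", "RP"], msg)
print(state)
```

Here router A plays the role of the DR, and the resulting state at B, C and the RP forms the branch of the shared tree that leads down to A.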

Each upstream router creates or updates its multicast route entry for (*,G) when it receives a Join/Prune with the RPT bit and WC bit set. The interface on which the Join/Prune message arrived is added to the list of outgoing interfaces (oifs) for (*,G). Based on this entry each upstream router between the receiver and the RP sends a Join/Prune message in which the join list includes the RP. The packet payload contains Multicast Address=G, Join=RP, WC bit, RPT bit, Prune=NULL. The RP recognizes its own address and does not attempt to send join messages for this entry upstream. The incoming interface in the RP's (*,G) entry is set to null. 6.2.4 Hosts Sending to a Group When a host starts sending multicast data packets to a group, initially its DR must deliver each packet to the RP for distribution down the RP tree. The sender's DR initially encapsulates each data packet in a Register message (see figure 1) and unicasts it to the RP for that group. The RP decapsulates each Register message and forwards the enclosed data packet natively to downstream members on the shared RP tree. If the data rate of the source warrants the use of a source specific shortest path tree (SPT), the RP may construct a new multicast route entry that is specific to the source, hereafter referred to as (S,G) state, and send periodic Join/Prune messages toward the source. When and if the RP does switch to the SPT, the routers between the source and the RP build and maintain (S,G) state in response to these messages and send (S,G) messages upstream toward the source. Thereafter, the source's DR stops encapsulating data packets in Register messages once it receives Register Stop messages from the RP. The RP triggers Register Stop messages in response to Registers if the RP has no downstream receivers for the group or if the RP has already joined the (S,G) tree. Each source's DR maintains, per (S,G), a Register Suppression timer. 
The Register Suppression timer is started by the Register Stop message; upon expiration, the source's DR resumes sending data packets to the RP, encapsulated in Register messages. 6.2.5 Switching from Shared Tree (RP Tree) to SPT Only the RP and a router with directly connected group members can initiate the process of SPT construction. A router with directly connected members first joins the shared RP tree. The router can switch to a source's shortest path tree (SPT) after receiving packets from that source over the shared RP tree. It may do so based on some policy; for example, the switch may be made when the rate at which a source is sending data packets exceeds a certain threshold value.
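The Register Suppression timer of section 6.2.4 and a rate based switching policy of the kind mentioned above can be sketched together. Everything named here is an assumption made for illustration: the `SourceDR` class, the two helper functions, and in particular the constants `REGISTER_SUPPRESSION_TIME` and `SPT_SWITCH_THRESHOLD_BPS`, whose values are arbitrary examples and not protocol defaults.

```python
REGISTER_SUPPRESSION_TIME = 60.0    # seconds; illustrative value
SPT_SWITCH_THRESHOLD_BPS = 64_000   # bits per second; illustrative value

class SourceDR:
    """Designated router of a sending host."""

    def __init__(self):
        self.suppressed_until = {}  # (source, group) -> time suppression ends

    def on_register_stop(self, source, group, now):
        # A Register Stop from the RP starts the Register Suppression timer.
        self.suppressed_until[(source, group)] = now + REGISTER_SUPPRESSION_TIME

    def handle_data(self, source, group, payload, now):
        """Return the Register to unicast to the RP, or None while suppressed."""
        if now < self.suppressed_until.get((source, group), 0.0):
            return None
        return {"type": "Register", "source": source,
                "group": group, "payload": payload}

def rp_sends_register_stop(has_downstream_receivers, joined_sg_tree):
    """RP side: answer a Register with a Register Stop in these two cases."""
    return (not has_downstream_receivers) or joined_sg_tree

def should_switch_to_spt(bytes_received, interval_seconds,
                         threshold_bps=SPT_SWITCH_THRESHOLD_BPS):
    """Example policy: switch when the source's data rate exceeds a threshold."""
    return (bytes_received * 8) / interval_seconds > threshold_bps

dr = SourceDR()
first = dr.handle_data("S", "G", b"data", now=0.0)     # encapsulated
dr.on_register_stop("S", "G", now=1.0)
during = dr.handle_data("S", "G", b"data", now=30.0)   # suppressed
after = dr.handle_data("S", "G", b"data", now=61.0)    # timer expired
print(first is not None, during is None, after is not None)
```

The threshold policy is only one possible choice; the text leaves the switching policy open.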

Figure 6.3 : Switching from Shared Tree to SPT

As shown in figure 6.3, router A initiates an (S,G) state. When the (S,G) entry is created, the oif list is copied from (*,G), i.e., all local shared tree branches are replicated in the new shortest path tree. The iif for this entry is set to the interface used to send unicast packets to the source S. The SPT bit is cleared to indicate that the SPT branch from S has not been completely set up, and hence that it should not yet be used for forwarding packets from that source. A timer (S timer) is started indicating the lifetime of the entry; when it expires, the entry is removed. When an (S,G) entry is activated (and periodically so long as the state exists), a Join/Prune message is sent upstream (to the best next hop) towards the source, S, with S in the join list. The payload contains Multicast Address=G, Join=S, Prune=NULL. An (S,G) state is kept alive by data packets arriving from that source, which refresh the S timer. Each upstream router adds or updates its (S,G) entry and propagates the join towards S. To ensure that there are no duplicate packets on the SPT branch, a router with an (S,G) entry and a cleared SPT bit sets the SPT bit when it starts to receive packets from the new source S on the iif for the (S,G) entry. The router then sends a Join/Prune message towards the RP if its shared tree incoming interface differs from its SPT incoming interface, indicating that it no longer wants to receive packets from S via the shared RP tree. The Join/Prune message sent towards the RP includes S in the prune list, with the RPT bit set to indicate that Join/Prune messages for this entry must be propagated up the shared tree. The Join/Prune message payload contains Multicast Address=G, Join=NULL, Prune=S, RPT bit=1. If the router receiving the Join/Prune message has (S,G) state (with or without the route entry's RPT bit flag set), it deletes the arriving interface from the (S,G) oif list. 
If the router has only (*,G) state, it creates an (S,G) entry with the RPT bit flag set to 1, oif list = oif list of (*,G) minus the arriving interface. An (S,G) entry with WC bit=0 & RPT bit=1 is said to be a Negative cache entry. 6.2.6 Steady State Maintenance of Distribution Tree (Router State) In the steady state each router sends periodic Join/Prune messages for each active PIM route entry; the Join/Prune messages are sent to the neighbor indicated in the corresponding entry.

These messages are sent periodically to capture state, topology, and membership changes. A Join/Prune message is also sent on an event triggered basis each time a new route entry is established for some new source. Join/Prune messages do not elicit any form of explicit acknowledgment; routers recover from lost packets using the periodic refresh mechanism. 6.2.7 Multicast Data Packet Processing A router first performs a longest match on the source and group address in the data packet. An (S,G) entry is matched first if one exists; a (*,G) entry is matched otherwise. If neither exists, the packet is dropped. If a state is matched, the router compares the interface on which the packet arrived to the incoming interface field in the matched route entry. If the iif check fails the packet is dropped; otherwise the packet is forwarded to all interfaces listed in the outgoing interface list. Some special actions are needed to deliver packets continuously while switching from the shared tree to the shortest path tree. In particular, when an (S,G) entry is matched, incoming packets are forwarded as follows:
1) If the SPT bit is set, then:
   a) if the incoming interface is the same as the matching (S,G) iif, the packet is forwarded to the oif list of (S,G).
   b) if the incoming interface is different from the matching (S,G) iif, the packet is discarded.
2) If the SPT bit is cleared, then:
   a) if the incoming interface is the same as the matching (S,G) iif, the packet is forwarded to the oif list of (S,G). In addition, the SPT bit is set for that entry if the incoming interface differs from the incoming interface of the (*,G) entry.
   b) if the incoming interface is different from the matching (S,G) iif, the incoming interface is tested against a matching (*,G) entry. If the iif is the same as that of (*,G), the packet is forwarded to the oif list of (*,G).
   c) otherwise the iif does not match any entry for G and the packet is discarded.
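The forwarding rules above translate almost directly into code. The following is a minimal sketch, not a complete PIM SM data path: entries are plain dictionaries keyed by (source, group), with "*" standing for the wildcard, the function name `forward_packet` is invented for the example, and negative cache entries (RPT bit) are deliberately left out. Treating a missing (*,G) entry as "differing" from the (S,G) iif is an assumption of this sketch.

```python
def forward_packet(cache, source, group, arrival_iif):
    """Return the list of interfaces the packet is forwarded on.

    An empty list means the packet is dropped."""
    sg = cache.get((source, group))
    star_g = cache.get(("*", group))

    if sg is None:
        # No (S,G) state: fall back to (*,G) with the usual iif check.
        if star_g is not None and arrival_iif == star_g["iif"]:
            return list(star_g["oifs"])
        return []

    if arrival_iif == sg["iif"]:
        # Packet arrived on the SPT interface. If the bit was still cleared
        # and the SPT iif differs from the shared tree iif, set it now.
        # (Assumption: no (*,G) entry at all also counts as "differs".)
        if not sg["spt_bit"] and (star_g is None or sg["iif"] != star_g["iif"]):
            sg["spt_bit"] = True
        return list(sg["oifs"])

    if not sg["spt_bit"] and star_g is not None and arrival_iif == star_g["iif"]:
        # SPT not yet set up: keep delivering via the shared tree.
        return list(star_g["oifs"])

    return []  # iif matches no usable entry for G

cache = {
    ("*", "G"): {"iif": "eth0", "oifs": ["eth2", "eth3"]},
    ("S", "G"): {"iif": "eth1", "oifs": ["eth2"], "spt_bit": False},
}
via_shared = forward_packet(cache, "S", "G", "eth0")
via_spt = forward_packet(cache, "S", "G", "eth1")
late_shared = forward_packet(cache, "S", "G", "eth0")
print(via_shared, via_spt, late_shared)
```

In the example the first packet from S still arrives over the shared tree and is delivered there; the first packet on the SPT interface sets the SPT bit, after which copies arriving over the shared tree are discarded, so no duplicates are delivered.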
6.2.8 Operation over Multi Access Network 6.2.8.1 Designated Router election When there are multiple routers connected to a multi access network, one of them must be chosen to operate as the designated router (DR) at any point in time. The DR is responsible for sending triggered Join/Prune and Register messages toward the RP. A simple designated router (DR) election mechanism is used for both SM and traditional IP multicast routing. Neighboring routers send Hello messages to each other. The sender with the largest network layer address assumes the role of DR. Each router connected to the multi access LAN sends the Hellos periodically in order to adapt to changes in router status. 6.2.8.2 Parallel Paths Resolution Assert process If a router receives a multicast datagram on a multi access LAN from a source whose corresponding (S,G) outgoing interface list includes the interface to that LAN, the packet must be a duplicate. In this case a single forwarder must be elected. Using Assert messages addressed to `224.0.0.13' (ALL PIM ROUTERS group) on the LAN, upstream routers can resolve which one will act as the forwarder.
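Both elections on a multi access LAN reduce to picking an extreme value among the candidates. A minimal sketch, assuming IPv4 addresses given as strings and invented function names: the DR is the Hello sender with the largest network layer address, and the Assert forwarder is the router advertising the smallest metric to the source, with ties broken by the highest address (the rule this section goes on to state).

```python
import ipaddress

def elect_dr(hello_senders):
    """Designated router: the Hello sender with the largest address."""
    return max(hello_senders, key=lambda a: int(ipaddress.ip_address(a)))

def elect_forwarder(asserts):
    """Assert winner. `asserts` maps router address -> metric to the source.

    Smallest metric wins; among equal metrics the highest address wins."""
    return min(asserts,
               key=lambda a: (asserts[a], -int(ipaddress.ip_address(a))))

routers = ["10.0.0.2", "10.0.0.10", "10.0.0.9"]
dr = elect_dr(routers)
forwarder = elect_forwarder({"10.0.0.2": 5, "10.0.0.10": 3, "10.0.0.9": 3})
print(dr, forwarder)
```

Addresses are compared numerically rather than as strings; a plain string comparison would wrongly rank "10.0.0.9" above "10.0.0.10".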

Downstream routers listen to the Asserts so they know which one was elected, and therefore where to send subsequent Joins. The upstream router elected is the one that has the shortest distance to the source. Therefore, when a packet is received on an outgoing interface, a router sends an Assert message on the multi access LAN indicating what metric it uses to reach the source of the data packet. The router with the smallest numerical metric (with ties broken by highest address) will become the forwarder. All other upstream routers will delete the interface from their outgoing interface list. The downstream routers also do the comparison in case the forwarder is different from the RPF neighbor. 6.2.8.3 Join/Prune Suppression Join/Prune suppression may be used on multi access LANs to reduce duplicate control message overhead; it is not required for correct performance of the protocol. If a Join/Prune message arrives and matches on the incoming interface for an existing (S,G) or (*,G) route entry, and the Holdtime included in the Join/Prune message is greater than the recipient's own [Join/Prune Holdtime] (with ties resolved in favor of the higher network layer address), a timer (the Join/Prune Suppression timer) in the recipient's route entry may be started to suppress further Join/Prune messages. After this timer expires, the recipient triggers a Join/Prune message, and resumes sending periodic Join/Prunes, for this entry. The Join/Prune Suppression timer should be restarted each time a Join/Prune message is received with a higher Holdtime. 6.2.9 Unicast Routing Changes When unicast routing changes, an RPF check is done on all affected multicast forwarding entries, and the entries are updated accordingly. In particular, if the new incoming interface appears in the oif list, it is deleted from the oif list. The PIM router sends a PIM join message out its new interface to inform upstream routers that it expects multicast datagrams over the interface. 
It also sends a PIM prune message out the old interface, if the link is operational, to inform upstream routers that this part of the distribution tree is going away. 6.3 Rendezvous Point (RP) Discovery The most difficult and controversial aspect of center based multicast tree schemes is locating the center (the RP in PIM SM and the core in CBT). In fact, the problem of finding an optimal center is considered to be an NP complete problem. The bootstrap mechanism is generally used for intra domain center discovery, while for inter domain center discovery manual configuration is the only solution at present. Still, there have been several proposals for choosing a center : RSST (Random Source Specific Tree) : center is chosen randomly among the sources, with an average tree cost of 1 to 2 times that of the minimal Steiner tree. MSPT (Minimum Shortest Path Tree) : tree costs are calculated for the trees rooted at each group member and the member with the lowest cost is chosen as the center. MCT (Maximum Centered Tree) : node with the lowest maximum distance to any group member is chosen as the center. ACT (Average Centered Tree) : node with the lowest average distance to all group members is chosen as the center. DCT (Diameter Centered Tree) : node with the lowest maximum diameter, defined as the

sum of the distances to the two furthest away group members, is chosen as the center. GCT (Globally Centered Tree) : node with the lowest average distance to all nodes in the network, regardless of membership, is chosen as the center. TOURNEY : runs a tournament between nodes to determine the center. 6.3.1 Bootstrap Mechanism The bootstrap mechanism is the one most commonly used by domains running center based multicast routing protocols to decide the centers and the group prefixes for which each of them acts as center. In the domain of operation there is a configured set of routers known as Candidate Centers (CCs), which elect a bootstrap router (BSR) among themselves by a simple election mechanism. Each of them then sends the BSR a Candidate Center Advertisement containing its address and, optionally, a group address and group address prefix for which it is willing to act as center. The BSR then allots group addresses to these CCs and returns a CC set containing entries of the form <center, group address, prefix length>. This information is sent hop by hop and each router stores the CC set. When a router requires the address of the center for a group, it uses a hash function on the group address to get an index into the CC set, which gives the corresponding center's address. All bootstrap messages are distributed hop by hop to all routers in the domain. 7. Core Based Trees (CBT) Topic Index 7.1 Introduction 7.2 Design Objectives of CBT 7.3 Protocol Overview 7.3.1 Tree Construction in CBT Architecture 7.3.2 Tree Maintenance 7.3.3 Multicast Data Packet Forwarding 7.3.4 Non Member Sending 7.3.5 Designated Router on Multi Access Networks 7.4 Comparison of CBT with PIM SM 7.1 Introduction The CBT architecture for multicast routing belongs to the category of center based tree algorithms. It builds a single delivery tree per group that is shared by all of the group's senders and receivers. 
The primary advantage of the shared tree approach is that it typically offers more favorable scaling characteristics than all other multicast algorithms. Delay in these trees will not be minimum, but will always be within some acceptable bounds. The router state is O(G), where G is the number of groups operating on the router. Also there is very little protocol overhead, because the protocol is very simple. The scalability of the architecture is measured in terms of network state maintenance, bandwidth efficiency, and protocol overhead. Other factors that can affect these parameters include sender set size, and wide area distribution of group members.

7.2 Design Objectives of CBT 1) Scalability To provide the O(G) scaling characteristic, irrespective of delay, unlike PIM SM. It has been designed primarily for many sender applications. 2) Robustness It is not easy to achieve the same degree of robustness in shared tree algorithms as in source specific trees, because the core itself might be a single point of failure. Therefore, mechanisms have been built into the protocol to detect core failure and to substitute another core. 3) Simplicity The protocol is very simple, thereby enhancing performance. 4) Interoperability CBT has well defined interoperability mechanisms with DVMRP, which is the protocol being used on the MBone. 7.3 Protocol Overview 7.3.1 Tree Construction in CBT Architecture As with PIM SM, tree construction in CBT is receiver initiated. A host first expresses its interest in joining a group through IGMP messages to its designated CBT router. On receiving this report, a local CBT aware router invokes the tree joining process (unless it has already done so) by generating a JOIN_REQUEST message, which is sent to the next hop on the path towards the group's core router. (Core router discovery in CBT is the same as RP discovery in PIM SM.) This join message must be explicitly acknowledged (JOIN_ACK) either by the core router itself, or by another router on the unicast path between the sending router and the core that has itself already successfully joined the tree.

Figure 7.1 : Example of Core Based Tree. Note that state is bi directional.

The JOIN_REQUEST message sets up a transient join state in the routers it traverses. This state consists of <group, incoming interface, outgoing interface>. "Incoming interface" and "outgoing interface" may be "previous hop" and "next hop", respectively, if the corresponding links do not support multicast transmission. 
"Previous hop" is taken from the incoming control packet's IP source address, and "next hop" is gleaned from the routing table the next hop to the specified core address. This transient state eventually times out unless it is "confirmed" with a join acknowledgement (JOIN_ACK) from upstream. The JOIN_ACK traverses the reverse path of the corresponding join message, which is possible due to the presence of the transient join