

Hardware-Accelerated Signaling: Design, Implementation and Implications

DISSERTATION
for the Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)

HAOBO WANG
November 2004

Hardware-Accelerated Signaling: Design, Implementation and Implications

DISSERTATION
Submitted in Partial Fulfillment of the REQUIREMENTS for the Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)
at the POLYTECHNIC UNIVERSITY
by Haobo Wang
November 2004

Approved: Department Head Copy No., 20 Date

Approved by the Guidance Committee:

Major: Electrical Engineering
Malathi Veeraraghavan, Associate Professor of Electrical Engineering (Date)
Ramesh Karri, Associate Professor of Electrical Engineering (Date)
Shivendra Panwar, Professor of Electrical Engineering (Date)

Minor: Computer Science
Haldun Hadimioglu, Associate Professor of Computer Science (Date)

Microfilm or other copies of this dissertation are obtainable from UMI Dissertations Publishing, Bell & Howell Information and Learning, 300 North Zeeb Road, P. O. Box 1346, Ann Arbor, Michigan

VITA

Mr. Haobo Wang received his B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2000, respectively. From April 2000 to August 2000, he worked for Lucent Technologies, China, as a hardware (ASIC) design engineer. Since September 2000, Mr. Wang has conducted research towards his Ph.D. degree in Electrical Engineering at Polytechnic University, where he has been a research fellow of the Center of Advanced Technology in Telecommunications (CATT). The work presented in this dissertation was undertaken between January 2001 and November 2004 under the supervision of Professor Malathi Veeraraghavan and Professor Ramesh Karri. Mr. Wang is a student member of the Institute of Electrical and Electronics Engineers (IEEE), the IEEE Communications Society (ComSoc), and the International Society for Optical Engineering (SPIE).

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to the numerous people who have been instrumental in the completion of this dissertation.

First and foremost, I want to thank my advisors, Professor Malathi Veeraraghavan and Professor Ramesh Karri, for their guidance and encouragement throughout the course of my Ph.D. program. Professor Veeraraghavan tirelessly guided me in every step and aspect of my research work, from the selection of my research topic to the final preparation of my dissertation defense. Apart from her immense assistance with my research work, she helped shape me as a researcher by helping me with the most important fundamentals, from better English writing skills to performing complex theoretical analyses. She has been the single biggest motivating factor during these important years of my life. I would especially like to thank Professor Karri for having devoted his time and energy to my everyday research work during the last two years.

I would also like to thank the other members of my dissertation guidance committee, Professor Shivendra S. Panwar and Professor Haldun Hadimioglu, for their thoughtful and insightful advice. Professor Panwar gave me valuable comments on the applications of hardware-accelerated signaling, which greatly enriched my dissertation. Professor Hadimioglu took the time to proofread the dissertation and helped iron out many mistakes, both technical and grammatical.

Many thanks go to my colleagues in the Wireless Networks lab and the CAD Research lab. It has always been enjoyable to discuss technical and non-technical issues with them. I especially thank Tao Li, Dr. Xuan Zheng, and Zhifeng Tao, with whom I spent my first two years at Poly. I also thank Dr. Kaijie Wu, Bo Yang, Dr. Piyush Mishra, Nikhil Joshi, and Tongquan Wei, with whom I have had a great time during the past two years. These wonderful times with them have resulted in great friendship at both a technical and a personal level. The journey through these past four years of graduate life at Poly has been one of the best experiences of my life.

I owe a huge debt of gratitude to my family, my parents and my sister Minmei, for their dedicated and endless love and support, not only throughout the course of my Ph.D. studies but throughout my entire life. Without them, I could not have achieved anything that I have achieved today.

Finally, I thank the National Science Foundation (NSF) and the Center of Advanced Technology in Telecommunications (CATT) at Polytechnic University for providing me with the required financial support for my Ph.D. research.

Abbreviations and Acronyms

API: Application Programming Interface
APS: Automatic Protection Switch
ASIC: Application-Specific Integrated Circuit
ATM: Asynchronous Transfer Mode
BGP: Border Gateway Protocol
BNF: Backus-Naur Form
CAC: Connection Admission Control
CAM: Content Addressable Memory
CCAMP WG: Common Control and Measurement Plane Working Group
CR-LDP: Constraint-based Routed Label Distribution Protocol
DCC: Data Communication Channel
DDR: Double Data Rate
DSP: Digital Signal Processing
DWDM: Dense Wavelength Division Multiplexing
FIFO: First-In First-Out
FPGA: Field Programmable Gate Array
FTP: File Transfer Protocol
GbE: Gigabit Ethernet
GMPLS: Generalized Multi-Protocol Label Switching
GPU: Graphics Processing Unit
HBA: Host Bus Adapter
IANA: Internet Assigned Numbers Authority
IETF: Internet Engineering Task Force
IntServ: Integrated Services
IP: Internet Protocol
LDCC: Line Data Communication Channel
LDP: Label Distribution Protocol
LMP: Link Management Protocol
LOL: Loss of Light
LOS: Loss of Signal
LPM: Longest Prefix Match
LSB: Least Significant Bit
LSP: Label Switched Path
LSR: Label Switching Router
LVDS: Low Voltage Differential Signaling
MAC: Media Access Control
MPLS: Multi-Protocol Label Switching
MSB: Most Significant Bit
MTP: Message Transfer Part
NIC: Network Interface Card
NSE: Network Search Engine
OC: Optical Carrier
OCSP: Optical Circuit-switched Signaling Protocol
OIF: Optical Internetworking Forum
OSPF-TE: Open Shortest Path First-Traffic Engineering
OTN: Optical Transport Network
PCB: Printed Circuit Board
PCI: Peripheral Component Interconnect
PCS: Physical Coding Sublayer
PNNI: Private Network to Network Interface
QoS: Quality of Service
RAM: Random Access Memory
RFC: Request for Comments
RSVP: Resource ReSerVation Protocol
RSVP-TE: Resource ReSerVation Protocol-Traffic Engineering
RTT: Round-Trip Time
SAN: Storage Area Network
SD: Signal Degrade
SDCC: Section Data Communication Channel
SDH: Synchronous Digital Hierarchy
SerDes: Serializer/Deserializer
SONET: Synchronous Optical NETwork
SRAM: Static Random Access Memory
SS7: Signaling System 7
STS: Synchronous Transport Signal
TCAM: Ternary Content Addressable Memory
TCP: Transmission Control Protocol
TLV: Type-Length-Value
TOE: TCP Offload Engine
TTL: Time-to-Live
UDP: User Datagram Protocol
UNI: User Network Interface
VHDL: VHSIC Hardware Description Language
VHSIC: Very High Speed Integrated Circuits
VLSI: Very Large Scale Integration
WDM: Wavelength Division Multiplexing

AN ABSTRACT

Hardware-Accelerated Signaling: Design, Implementation and Implications
by Haobo Wang
Advisor: Malathi Veeraraghavan
Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (Electrical Engineering)
November 2004

Despite the dominance of connectionless IP networks, i.e., the Internet, connection-oriented networks are gaining attention because of their inherent support for Quality of Service (QoS). Even IP routers are being enhanced with connection-oriented features. Signaling protocols are used in connection-oriented networks to set up and tear down connections. Signaling protocols are primarily implemented in software for two reasons: complexity and the requirement for flexibility. Although these are two good reasons for software implementations, the price paid is in performance. Software implementations of signaling protocols are rarely capable of handling over 1000 calls/sec. Correspondingly, per-switch call processing delays are on the order of milliseconds.

We propose to implement signaling protocols in hardware, expecting an improvement of two to three orders of magnitude in the call handling capacities of switches. As a first step, we define the Optical Circuit-switched Signaling Protocol (OCSP), a performance-oriented signaling protocol designed for SONET switches. We implement OCSP on the WILDFORCE FPGA evaluation board. The simulation results show a call handling rate of 150,000 calls/sec and a per-switch call processing delay of 6.6 µs.

We then choose RSVP-TE for hardware acceleration. As a signaling protocol targeting almost all connection-oriented networks, RSVP-TE for GMPLS is complex and flexible, and not intended for hardware implementation. However, it is not only impractical but also unnecessary to implement the complete signaling protocol in hardware. Instead, we extract a subset of RSVP-TE for hardware acceleration and relegate the functionality beyond this subset to software. The subset is large enough to cover time-critical operations while small enough to make hardware implementation feasible. We implement the subset on a Xilinx Virtex-II FPGA device, which we call the hardware signaling accelerator. With a fully pipelined architecture and innovative design techniques, a call handling rate of 250,000 calls/sec is achieved, and the per-switch call processing delay is 7.2 µs.

In order to demonstrate a complete switching system equipped with the hardware signaling accelerator, we design a PCI-based prototype board including both user-plane and control-plane devices. The user plane carries the user traffic while the control plane controls the operation of the user plane. The hardware signaling accelerator is the core component of the control plane.

The total call setup delay consists of three components: round-trip propagation delay, call processing delays, and signaling message emission delays. The round-trip propagation delay is limited by the speed of light. With hardware-accelerated signaling, the call processing delays can be reduced to the order of microseconds. The last component is determined by the choice of signaling transport option: in-band signaling or out-of-band signaling. We compare the total call setup delay for both options under different scenarios.

The impact of hardware-accelerated signaling is far-reaching. It will fundamentally change our current view of connection-oriented networks. With the per-switch call processing delay decreasing to the order of microseconds, and the call handling rate increasing to the order of 10^5 calls/sec, connections can be established and released much faster, and switches can handle more requests and maintain more simultaneous connections. New architectures and applications that can better exploit hardware-accelerated signaling are proposed and analyzed.

Contents

Acknowledgment
Abbreviations and Acronyms
Abstract

1 Introduction
    Background
    Problem statement
    Related work
    Outline

2 Optical Circuit-switched Signaling Protocol (OCSP): Specification and Implementation
    OCSP - a signaling protocol for performance
    Signaling messages
    State transition diagram of a connection
    Data tables
    Implementation of OCSP on WILDFORCE platform
    Parameters
    State transition diagram of the hardware signaling accelerator
    Managing the available time-slots in hardware
    Managing the connection references in hardware
    Implementation and simulation results

3 A Subset of RSVP-TE for Hardware-Acceleration
    Review of RSVP-TE for GMPLS networks
    A Subset of RSVP-TE for GMPLS networks
    RSVP messages
    RSVP common header
    RSVP messages
    RSVP objects
    FILTER_SPEC/SENDER_TEMPLATE object
    FLOWSPEC/SENDER_TSPEC object
    Generalized LABEL object
    Generalized LABEL_REQUEST object
    RSVP_HOP object
    SESSION object
    Optional objects support
    SUGGESTED_LABEL object
    MESSAGE_ID/MESSAGE_ID_ACK object
    Connection setup and teardown with RSVP-TE signaling messages

4 RSVP-TE Procedures for Hardware-Acceleration
    Registers
    Data tables
    Routing table
    Incoming Connectivity table
    Outgoing Connectivity table
    Outgoing CAC table
    User/Control Mapping table
    State table
    Procedures

5 An FPGA-Based Hardware Signaling Accelerator
    The architecture of the hardware signaling accelerator
    Functional modules
    Register bank
    Input message buffering system
    Two-level object dispatching
    Data table management
    Resource (bandwidth) management
    Re-transmission management
    Implementation and simulation results
    Re-configurability

6 Design of a Prototype Switching System
    Architecture of the PCI-Based prototype switching system
    Main on-board devices
    Gbps signaling channel
    Incoming message buffer - FIFO device
    Data tables - the TCAM and SRAM devices
    The TCAM and SRAM devices
    The organization of the data tables
    User-plane device - VSC
    Clock distribution networks
    Power distribution scheme

7 Design and Analysis of Signaling Networks
    Introduction to in-band signaling and out-of-band signaling
    Delay models
    Assumptions
    Notation
    Queueing model
    Model without retransmissions
    Model including re-transmissions
    Numerical results and implications
    Input parameter values
    Hardware-accelerated signaling engine, and µ_tx = µ_proc
    Hardware-accelerated signaling engine, and µ_tx ≪ µ_proc
    Software signaling processor
    Some comments

8 Applications of Hardware-Accelerated Signaling
    Utilization analysis of circuit-switched networks for file transfers
    Problem statement
    Utilization analysis and the numerical results
    Hardware-accelerated signaling and network survivability
    Utilization and delay analysis for path restoration

9 Conclusions and Future Work

List of Figures

Figure 1. Illustration of connection setup
Figure 2. Unfolded view of a switch in connection-oriented networks
Figure 3. GMPLS protocol stack diagram
Figure 4. An example of connection-oriented networks
Figure 5. OCSP signaling messages
Figure 6. State transition diagram of a connection
Figure 7. Data tables used by the signaling protocol
Figure 8. Architecture of WILDFORCE™ board
Figure 9. State transition diagram of the hardware signaling accelerator
Figure 10. Time-slot manager
Figure 11. Connection reference manager
Figure 12. Timing simulation for Setup message
Figure 13. RSVP message common header
Figure 14. RSVP object
Figure 15. TLV structure of the SESSION object
Figure 16. FILTER_SPEC/SENDER_TEMPLATE object
Figure 17. FLOWSPEC/SENDER_TSPEC object
Figure 18. LABEL object
Figure 19. Format of a SONET/SDH label
Figure 20. SONET multiplexing hierarchy and generalized label
Figure 21. LABEL_REQUEST object
Figure 22. RSVP_HOP object
Figure 23. SESSION object
Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object
Figure 25. Illustration of connection setup with RSVP-TE messages
Figure 26. Network used for illustrative examples
Figure 27. Example of Avail_BW
Figure 28. Processing of Common header
Figure 29. Processing of Path message
Figure 30. Processing of SESSION object
Figure 31. At the end of Path message processing
Figure 32. Architecture of the hardware signaling accelerator
Figure 33. Dynamic round-robin style pipelining
Figure 34. Two-level of input message buffering
Figure 35. Buffer management module
Figure 36. TLV style object processing
Figure 37. Dependencies among the data tables
Figure 38. Priority arbitrator for TCAM
Figure 39. Resource management module
Figure 40. Comparison of resource allocation schemes
Figure 41. Re-transmission management (buffers and timers)
Figure 42. Processing of Path message
Figure 43. Processing of Resv message
Figure 44. Processing of PathTear message
Figure 45. Architecture of the prototype board
Figure 46. 1Gbps signaling channel
Figure 47. Block diagram of L
Figure 48. Block diagram of IDT72V
Figure 49. RAM and CAM
Figure 50. How does TCAM work
Figure 51. FPGA/TCAM/SRAM configuration
Figure 52. TCAM command bus
Figure 53. Bit look-up timing diagram
Figure 54. Configurations of the five data tables in the TCAM device
Figure 55. Use matched address as interface ID
Figure 56. Organization of the data tables
Figure 57. Block diagram of VSC
Figure 58. Address bus and data bus
Figure 59. Timing diagram to program the switch fabric
Figure 60. Clock distribution scheme
Figure 61. In-band and out-of-band signaling options
Figure 62. Three cases of in-band signaling
Figure 63. Queueing models of the signaling protocol processor and the signaling channel transmitters
Figure 64. Signaling channel transmitter model including re-transmissions
Figure 65. In-band/out-of-band signaling with hardware signaling, metro area, µ_tx = µ_proc
Figure 66. In-band/out-of-band signaling with hardware signaling, wide area, µ_tx = µ_proc
Figure 67. In-band/out-of-band signaling with hardware signaling, metro area, µ_tx ≪ µ_proc
Figure 68. In-band/out-of-band signaling with hardware signaling, wide area, µ_tx ≪ µ_proc
Figure 69. In-band/out-of-band signaling with software signaling, metro area
Figure 70. In-band/out-of-band signaling with software signaling, wide area
Figure 71. Aggregate network utilization vs. offered traffic load
Figure 72. Crossover file size vs. percentage of the offered load
Figure 73. A model for a circuit-switched node operated in call-blocking mode
Figure 74. Crossover file size vs. total utilization (hardware signaling)
Figure 75. Crossover file size vs. total utilization (software signaling)
Figure 76. Pure software signaling needs more client-side interfaces to create the aggregate load to achieve the desired utilization
Figure 77. A sample network with a single link failure

List of Tables

Table 1. Clock cycles consumed by various OCSP messages
Table 2. Data registers
Table 3. Configuration registers
Table 4. Routing table
Table 5. Incoming Connectivity table
Table 6. Incoming Connectivity table at switch II in Fig
Table 7. Outgoing Connectivity table
Table 8. Outgoing Connectivity table for switch I in Fig
Table 9. Outgoing CAC table
Table 10. Outgoing CAC table for switch I in Fig
Table 11. User/Control Mapping table
Table 12. State table
Table 13. Implementation results
Table 14. Clock cycles to receive an RSVP-TE message
Table 15. L8104 system interface
Table 16. Main interface signals of IDT72V
Table 17. Typical TCAM and SRAM signals
Table 18. Signals used to program the switch fabric
Table 19. Devices and related clock signals
Table 20. Power dissipations of main devices
Table 21. Notation
Table 22. Input parameter values
Table 23. Notation
Table 24. Number of circuits needed for a given utilization
Table 25. Notation

Chapter 1
Introduction

1.1 Background

Since the debut of ARPAnet, the precursor to today's Internet, in 1969, the reach of connectionless IP networks has been growing. Nowadays the Internet is so successful that more than 800 million users around the world use it regularly [1]. The Internet Assigned Numbers Authority (IANA) has allocated more than 2 billion IP addresses to 229 countries and regions [2], which accounts for more than half of the total 2^32 IPv4 addresses.

The history of connection-oriented networks dates back even further, to 1879, when the then Bell Telephone Company set up the first telephone network in the world. Now different types of connection-oriented networks, such as Asynchronous Transfer Mode (ATM) networks, Synchronous Optical NETworks (SONET), Synchronous Digital Hierarchy (SDH) networks, and Dense Wavelength Division Multiplexed (DWDM) networks, are deployed widely. However, these connection-oriented networks are primarily used as carriers for IP traffic. For example, Abilene [3], the backbone of Internet2 [4], leases OC-192c/OC-48c SONET circuits from Qwest Communications to interconnect backbone routers. Internet2 provides high-speed IP service to end users while the aggregate user traffic is carried on the SONET circuits.

Two issues are typically associated with connection-oriented networks. First, a connection must be established before data transfer can take place. This process is called connection (call) setup, and it involves the exchange of signaling messages between end hosts and the associated per-hop resource reservation. The connection (call) setup overhead becomes significant if the holding time of the connection is short, which is typical for today's end-user traffic. Second, once a connection is established, the end-to-end connection is dedicated to the two end systems on that connection. There is no resource sharing, which may lead to poor utilization if the user traffic is bursty.

However, connection-oriented networks have been gaining more attention recently because of their inherent support for Quality of Service (QoS). Standardization organizations, such as the Optical Internetworking Forum (OIF) and the Internet Engineering Task Force (IETF) Common Control and Measurement Plane (CCAMP) working group, are creating protocols and standards for connection-oriented networks, both packet-switched and circuit-switched. New network architectures and applications are being proposed to better utilize connection-oriented networks. For example, in [5], the authors note that file transfers have no intrinsic burstiness, and hence identify file transfer as an application that can fully utilize connections.

Signaling protocols are used in connection-oriented networks to set up and tear down connections. Examples of signaling protocols include Signaling System 7 (SS7) in telephone networks [6]; the User Network Interface (UNI) [7] and Private Network to Network Interface (PNNI) [8] signaling protocols in ATM networks; Label Distribution Protocol (LDP) [9], Constraint-based Routed LDP (CR-LDP) [10] and Resource ReSerVation Protocol - Traffic Engineering (RSVP-TE) [11] in Multi-Protocol Label Switched (MPLS) networks; and the extensions of these protocols for Generalized MPLS (GMPLS) [12]-[15], which includes SONET/SDH and DWDM networks.

Setting up a connection at a switch involves five steps:

1. Determining the next-hop switch toward which the connection should be routed. This task typically requires a data table look-up. Routes to destinations are pre-computed using data gathered by routing protocols and stored in data tables.
2. Checking the availability of and reserving required resources (link capacity and optionally buffer space). This step is generally called connection admission control.
3. Assigning labels for the connection. The exact form of the label depends on the type of connection-oriented network. For example, in SONET/SDH switches, a label identifies time-slots on the incoming and outgoing switch interfaces.
4. Programming the switch fabric to map incoming labels to outgoing labels.
5. Updating the state information associated with the connection.

Figure 1. Illustration of connection setup.

In a typical connection setup procedure, as illustrated in Fig. 1, a Setup message¹ requesting the setup of a connection progresses hop-by-hop from the calling end device (source) toward the called end device (destination), and a Setup Success message travels in the reverse direction, again hop-by-hop. In this scenario, the first three steps should be performed in the forward direction so that resources are reserved as the Setup proceeds, while the remaining steps could be performed as the signaling messages proceed in either the forward or the reverse direction. Other variants of this procedure are possible, such as reverse-direction resource reservation [16].

¹ Here we use a generic name for the message, i.e., Setup. Different signaling protocols call this message by different names, e.g., Setup in ATM UNI, Label Request in CR-LDP, or Path in RSVP-TE.
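The five per-switch steps listed above can be made concrete with a minimal Python sketch. It is only an illustration under simplifying assumptions, not the procedure implemented in this dissertation: the class name `ToySwitch`, its table layouts, the state name `SETUP_FORWARDED`, and the rule of one time-slot per connection are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Hypothetical transit switch; all names and table layouts are illustrative."""
    routing_table: dict   # destination -> outgoing interface
    free_slots: dict      # outgoing interface -> list of free time-slot labels
    fabric: dict = field(default_factory=dict)  # (in_if, in_label) -> (out_if, out_label)
    state: dict = field(default_factory=dict)   # connection id -> state name

    def handle_setup(self, conn_id, dest, in_if, in_label):
        """Process one Setup message; return (out_if, out_label), or None if rejected."""
        # Step 1: look up the pre-computed route toward the destination.
        out_if = self.routing_table.get(dest)
        if out_if is None:
            return None
        # Step 2: connection admission control (assumed: one time-slot per connection).
        pool = self.free_slots.get(out_if, [])
        if not pool:
            return None
        # Step 3: assign the outgoing label; removing it from the pool reserves it.
        out_label = pool.pop(0)
        # Step 4: program the fabric to map the incoming label to the outgoing label.
        self.fabric[(in_if, in_label)] = (out_if, out_label)
        # Step 5: record the connection state.
        self.state[conn_id] = "SETUP_FORWARDED"
        return out_if, out_label
```

For instance, a switch created with `ToySwitch(routing_table={"D": "if2"}, free_slots={"if2": [7, 8]})` would route a Setup for destination `"D"` to interface `if2` and hand out time-slot 7. Steps 4 and 5 are done here in the forward direction for brevity; as the text notes, they could equally be deferred to the reverse (Setup Success) direction.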

Upon completion of data transfer, the connection is released with a similar end-to-end release procedure. Typically, release messages are also confirmed. Switches processing the release messages free bandwidth, label and, optionally, buffer resources for use by subsequent connections.

To support the connection setup and release procedures described above, a typical signaling protocol defines signaling messages, each with a set of parameters, some mandatory and some optional. In addition, signaling protocols include other messages to support notifications, keep-alive exchanges, etc.

Figure 2. Unfolded view of a switch in connection-oriented networks.

With regard to implementation, we illustrate a typical architecture of a switch (unfolded view) in connection-oriented networks in Fig. 2. The user-plane hardware consists of a switch fabric and line cards that terminate input/output interfaces carrying user traffic. In packet switches, the line cards perform network-layer protocol processing to determine how to forward packets. In circuit switches, the line cards are typically multiplexers/demultiplexers.

Figure 3. GMPLS protocol stack diagram.

The control-plane unit typically consists of three modules: routing, link management, and signaling. For example, the IETF CCAMP working group is defining GMPLS protocols for all three aspects: Open Shortest Path First - Traffic Engineering (OSPF-TE) [17] for routing, Link Management Protocol (LMP) [18] for link management, and RSVP-TE/CR-LDP for signaling, as shown in Fig. 3. These control-plane modules are typically implemented as software processes residing on the microprocessor, although in Fig. 2 we show a hardware signaling accelerator to speed up the processing of signaling messages.

Network Interface Cards (NICs) are shown on the control plane. These cards process the lower layers of the protocol stack on which the signaling messages are carried. For example, in SS7 networks, the NICs process the Message Transfer Part (MTP) layers, which are the lower layers of the SS7 protocol stack. In optical networks, the expectation is that an out-of-band IP network will be used to carry signaling messages between switches. In this case, the NICs may be Ethernet cards. It is also possible to carry the signaling messages on the same interface as the user data. For example, the Data Communication Channel (DCC) in each SONET/SDH signal is often used for signaling. This is referred to as in-band signaling. Fig. 4 shows an example of a connection-oriented network in which the user plane and the control plane are located in different networks ("out-of-band signaling").

Figure 4. An example of connection-oriented networks.

1.2 Problem statement

Signaling protocols are primarily implemented in software for two important reasons. First, signaling protocols are quite complex, with many messages, parameters and procedures, especially the signaling protocols for GMPLS, which target almost all connection-oriented networks. Second, signaling protocols are updated frequently, requiring a certain amount of flexibility for upgrading field implementations. For example, RSVP-TE with extensions for GMPLS evolved from RSVP-TE for MPLS, which, in turn, evolved from RSVP for IP networks. Request for Comments (RFCs) and Internet drafts are continuously being proposed and updated to enhance RSVP-TE. Hence an implementation of RSVP-TE must be capable of frequent upgrades.

While these are two good reasons for implementing signaling protocols in software, the price paid is in performance. Signaling protocol implementations in software are rarely capable of handling over 1000 calls/sec. Correspondingly, per-switch call setup delays are on the order of milliseconds [19].

Connection-oriented networks have been gaining more attention because of their inherent QoS support. Even IP routers are being enhanced with connection-oriented features. For example, MPLS is being added to IP networks along with RSVP-TE to establish Label Switched Path (LSP) tunnels. These trends require a breakthrough in call handling capacities.

The problem statement of this research is to determine whether signaling protocols can be implemented in hardware and, if so, to demonstrate it with an actual implementation [20]. In addition, we explore the impact of hardware-accelerated signaling protocol implementations and study the related question of how to reduce signaling message transmission delays.

Implementing signaling protocols in hardware is a great challenge considering the complexity of such protocols and their requirement for flexibility. The complexity problem can be partly overcome by defining a subset of the signaling protocol for hardware acceleration. This subset should cover most time-critical operations. Nevertheless, innovative hardware implementation techniques are required. To address the problem of flexibility, we propose to use re-configurable hardware, i.e., Field Programmable Gate Arrays (FPGAs), as the hardware platform for implementation. FPGAs can be re-configured as signaling protocols evolve. Our implementation is targeted at transit switches in SONET networks, and it supports point-to-point unidirectional connections.

1.3 Related work

There are many signaling protocol standards, as listed in Section 1.1. In addition, many other signaling protocols have been proposed in the literature [21]-[26]. Some of these protocols, such as fast reservation schemes [21][22], YESSIR [23], and PCC [24], have been designed to achieve low call setup delays by improving the signaling protocols themselves. Fast Reservation Protocol (FRP) [25] is the only signaling protocol that has been implemented in Application Specific Integrated Circuit (ASIC) hardware. Such an ASIC implementation is inflexible because upgrading the signaling protocol implementation entails a complete re-design of the ASIC. Recently, a simple signaling protocol called JumpStart, which was designed for hardware implementation, was proposed in [26] for burst-switched networks.

Because of the popularity of RSVP-TE for MPLS networks and the desire for scalable QoS support in large IP networks, some researchers have also explored hardware-accelerated implementation of the RSVP-TE signaling protocol. In [27], the Keep-It-Simple Signaling (KISS) protocol, a simplified version of RSVP-TE for hardware acceleration, was proposed. The authors implemented the KISS protocol in software for debugging and protocol development. They also discussed possible approaches for hardware implementation and estimated the achievable performance, but no hardware implementation was carried out. Because the KISS protocol is not compatible with RSVP-TE or any other available signaling protocol, it cannot interoperate with any deployed switches or MPLS-enabled IP routers. The KISS protocol is targeted at MPLS-enabled IP networks, and will not work with other connection-oriented network technologies.

Other comparable protocols implemented in hardware include TCP. In [28], Benz proposed to implement the normal TCP functionalities in hardware and to handle complex functionalities, such as congestion control and error control, in software. He implemented his approach on the Myrinet platform. A similar concept, the TCP Offload Engine (TOE) [29], is gaining some popularity in today's market. These solutions off-load part of the TCP functionality from the CPU to a co-processor located on a NIC or a Host Bus Adapter (HBA, the NIC equivalent in Storage Area Networks). Molinero-Fernandez and McKeown [30] proposed a technique called TCP switching, in which the TCP SYNchronize segment is used to trigger connection setup and the TCP FINish segment is used to trigger connection release. By processing these segments inside switches, the TCP SYN/FIN procedures become comparable to a signaling protocol for connection setup/release. The authors planned to implement this technique in FPGAs.
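To illustrate the idea behind TCP switching as summarized above, the following Python sketch shows how a switch might treat SYN and FIN segments as implicit setup and release requests. It is a rough sketch under stated assumptions and does not reproduce the design in [30]: the class `TcpSwitchingSketch`, the flow tuple format and the returned action names are hypothetical. Only the SYN and FIN flag bit values are taken from the TCP header definition.

```python
SYN = 0x02  # TCP header flag bit for SYNchronize
FIN = 0x01  # TCP header flag bit for FINish


class TcpSwitchingSketch:
    """Hypothetical switch that derives setup/release events from TCP flags."""

    def __init__(self):
        # Flows that currently have a circuit; the tuple layout is an assumption.
        self.circuits = set()

    def on_segment(self, flow, flags):
        """Return the action taken for a segment of `flow` carrying `flags`."""
        if flags & SYN and flow not in self.circuits:
            # A SYN on an unknown flow plays the role of a Setup message.
            self.circuits.add(flow)
            return "setup"
        if flags & FIN and flow in self.circuits:
            # A FIN on a known flow plays the role of a Release message.
            self.circuits.remove(flow)
            return "release"
        # Every other segment is simply forwarded.
        return "forward"
```

A flow would be identified here by a tuple such as `("10.0.0.1", 1234, "10.0.0.2", 80)`. A real design would also need to handle details the sketch ignores, for example connections that end without a FIN.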

1.4 Outline

This dissertation is organized as follows. In Chapter 2, we discuss our work on the design and implementation of a new performance-oriented signaling protocol called Optical Circuit-switched Signaling Protocol (OCSP) [31], which we define for SONET networks. While flexibility is the primary concern in typical signaling protocol definitions, our primary goal in designing this protocol is to achieve a high-performance implementation.

Concurrent with our definition and implementation of OCSP, the optical networking industry defined new signaling protocols for SONET/SDH/DWDM networks. Of these, RSVP-TE for GMPLS is the one that has gained popularity, and it is now implemented by many vendors. This makes it important to demonstrate our concept of hardware-accelerated signaling with an implementation of RSVP-TE. As a generalized signaling protocol targeting almost all connection-oriented networks, RSVP-TE for GMPLS is quite complex and not intended for hardware implementation. To meet this challenge, we define a subset of RSVP-TE [32] for hardware acceleration and relegate the remaining functionality to software. The subset should be large enough to handle most signaling messages and yet small enough to make hardware implementation feasible. Chapter 3 describes this work.

In Chapter 4, we describe the procedures needed to implement this subset of RSVP-TE [33]. Registers and data tables used in the hardware implementation are defined.

Based on the subset of RSVP-TE defined in Chapters 3 and 4, Chapter 5 describes the architecture of the hardware implementation. The design, which we call the hardware signaling accelerator [34], is implemented on a Xilinx Virtex-II FPGA. We explain the main functional

modules in detail, and present the implementation and simulation results. We also address the issue of re-configurability.

In order to demonstrate a real switching system, in which a switch fabric can work under the control of the hardware signaling accelerator described in Chapter 5, we build a PCI-based prototype board. We present this work in Chapter 6. The board-level architecture is given, and the main on-board devices are discussed in detail. We also explain the clock and power distribution schemes.

In Chapter 7, we discuss the related issue of how to reduce signaling message transmission delays. We compare the call setup delays of two options, in-band signaling and out-of-band signaling, under different scenarios. This issue is important because the selection of the signaling transport mechanism affects the total call setup delay. Reductions in call setup delay achieved through hardware-accelerated signaling must be supported by sound design of the signaling transport mechanism.

Our work on hardware-accelerated implementation of signaling protocols has a far-reaching impact on connection-oriented networks. With decreased call setup delay and increased call handling rate, connections can be established and released much faster, and switches can handle more requests and maintain more simultaneous connections. In Chapter 8, we model and analyze the impact of hardware-accelerated signaling on current connection-oriented networks, and propose applications that can fully exploit the benefit of hardware-accelerated signaling.

Chapter 2 Optical Circuit-switched Signaling Protocol (OCSP): Specification and Implementation

In Chapter 1, we mentioned many signaling protocols. Those protocols are complex and flexible, and not intended for hardware implementation. In order to demonstrate the feasibility and advantage of hardware-accelerated signaling, we design and implement a signaling protocol called OCSP, which is specifically engineered for SONET networks. While in typical signaling protocol definitions, flexibility is a primary concern, our primary goal in designing this protocol is to achieve high-performance implementations.

We introduce OCSP in Section 2.1. It is not a complete signaling protocol specification. Our approach is to define a subset large enough that a significant percentage of user requirements can be handled with this subset. Infrequent operations are delegated to the software signaling process. Therefore, often in this description, we will leave out details that are handled by the software. In Section 2.2, we demonstrate the hardware implementation of OCSP. The hardware platform, the design, and the implementation and simulation results are discussed.

2.1 OCSP - a signaling protocol for performance

In this section, we first introduce four OCSP signaling messages used to set up (Setup and Setup-Success), and tear down (Release and Release-Confirm) connections. We then give the state transition diagram of a connection at each switch. Finally we discuss the data tables we introduce to facilitate the hardware implementation of the OCSP signaling protocol.

Signaling messages

[Figure 5. OCSP signaling messages. Fields shown for each message:
Setup message: Message Length, TTL, Msg. Type (0001), Connection Ref. (prev.), Destination IP Address, Source IP Address, Previous Node IP Address, Bandwidth, Reserved, Interface Number / Timeslot Number pairs, Pad Bits, Checksum.
Setup-Success message: Message Length, Bandwidth, Msg. Type (0010), Connection Ref. (prev.), Connection Ref. (own), Reserved, Checksum.
Release/Release-Confirm message: Message Length, Cause, Msg. Type (0011/0100), Connection Ref. (prev.), Connection Ref. (own), Reserved, Checksum.]

We define four signaling messages: Setup, Setup-Success, Release, and Release-Confirm. Fig. 5 illustrates the detailed fields of these four messages. The Message Length field specifies the length of a message. The Time-to-Live (TTL) field has the same meaning as in the IP header. The Message Type field distinguishes the four messages. The Connection Reference identifies a connection locally. The Source IP Address and Destination IP Address specify the end hosts, and the Previous Node IP Address specifies the previous node. The Bandwidth field specifies the bandwidth requirement of a connection. The interface/time-slot pairs are used to program the switch fabric. If there is an odd number of interface/time-slot pairs, a 16-bit pad is inserted because the message is 32-bit aligned. The Checksum field covers the whole message. In the Setup-Success message, the Bandwidth field records the allocated bandwidth. In the Release and Release-Confirm messages, the Cause field gives the reason for the release. The Message Length, Message Type and Connection Reference fields are common to all messages and occupy the same relative position. Such an arrangement simplifies hardware design.
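The padding rule for the interface/time-slot pairs can be illustrated with a minimal Python sketch. The text only implies that one pair occupies 16 bits (since a 16-bit pad restores 32-bit alignment); the split into an 8-bit interface number and an 8-bit time-slot number, and the function name `pack_pairs`, are assumptions made here for illustration and are not taken from the OCSP specification.

```python
import struct

PAIR_BYTES = 2   # assumption: one interface/time-slot pair occupies 16 bits
WORD_BYTES = 4   # OCSP messages are 32-bit aligned

def pack_pairs(pairs):
    """Pack (interface, timeslot) pairs and pad to a 32-bit boundary.

    The 8-bit/8-bit split inside each pair is hypothetical; only the
    16-bit pair size and the 16-bit pad follow from the text.
    """
    body = b"".join(struct.pack("!BB", iface, slot) for iface, slot in pairs)
    if len(body) % WORD_BYTES:
        # Odd number of pairs: insert a 16-bit pad.
        body += b"\x00" * PAIR_BYTES
    return body

if __name__ == "__main__":
    print(pack_pairs([(3, 14)]).hex())           # one pair plus pad
    print(pack_pairs([(3, 14), (3, 15)]).hex())  # two pairs, no pad
```

An even number of pairs already fills whole 32-bit words, so the pad appears only in the odd case.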

State transition diagram of a connection

[Figure 6. State transition diagram of a connection. States: Closed (0001), Setup Sent (0010), Established (0011), Release Sent (0100). Transition labels: Setup Message Received, Setup Success Received, Release Message Received, Release or Release Confirm Received, Release Confirm Received.]

Each connection passes through a certain sequence of states at each switch. In our protocol, we define four states: Setup-Sent, Established, Release-Sent and Closed. Fig. 6 shows the state transition diagram. Initially, the connection is in the Closed state. When a switch receives a Setup message and resources are available, the switch accepts the request, reserves the resources, sends the message to the next switch on the path, and marks the state of the connection as Setup-Sent. When a switch receives a Setup-Success message, which means that all switches along the path have successfully established the connection, the state of the connection changes to Established. Release-Sent means that the switch has received a Release message, freed the allocated resources, and sent the message to the previous node. After the switch receives a Release-Confirm message, the connection is successfully terminated and the state of the connection returns to Closed.
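The per-connection state machine can be sketched as a small transition table in Python. The state codes are those shown in Fig. 6. Only the four transitions described in the paragraph above are modeled; the additional edge labeled "Release or Release Confirm Received" in the figure is left out because its endpoints are not stated in the text. The names `State`, `TRANSITIONS` and `next_state` are illustrative.

```python
from enum import Enum

class State(Enum):
    CLOSED = 0b0001
    SETUP_SENT = 0b0010
    ESTABLISHED = 0b0011
    RELEASE_SENT = 0b0100

# (current state, received message) -> next state, as described in the text.
TRANSITIONS = {
    (State.CLOSED, "Setup"): State.SETUP_SENT,
    (State.SETUP_SENT, "Setup-Success"): State.ESTABLISHED,
    (State.ESTABLISHED, "Release"): State.RELEASE_SENT,
    (State.RELEASE_SENT, "Release-Confirm"): State.CLOSED,
}

def next_state(state, message):
    """Return the next connection state, or raise on a state mismatch."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"state mismatch: {message} in {state.name}") from None

if __name__ == "__main__":
    s = State.CLOSED
    for msg in ("Setup", "Setup-Success", "Release", "Release-Confirm"):
        s = next_state(s, msg)
        print(msg, "->", s.name)
```

A table-driven form maps naturally onto hardware, where the pair of current state and message type selects the next state in a single look-up.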

Data tables

[Figure 7. Data tables used by the signaling protocol.
Routing table: index = Destination address; return value = Next node address, Next node interface#.
CAC table: index = Next node address; return/written value = Total bandwidth, Available bandwidth.
Connectivity table: index = Neighbor address, Neighbor interface#; return value = Own interface#.
State table: index = Own connection reference; return/written value = Connection reference (previous, next), Node address (previous, next), State, Bandwidth.
Switch-Mapping table: index = Own connection reference, Sequential offset (0 to BW-1); return/written value = Incoming Ch. ID (Interface#, Timeslot#), Outgoing Ch. ID (Interface#, Timeslot#).]

There are five data tables associated with the signaling protocol, namely, the Routing table, Connection Admission Control (CAC) table, Connectivity table, State table, and Switch-Mapping table, as shown in Fig. 7. The Routing table is used to determine the next-hop switch. The index is the destination address; the fields include the address of the next switch and the corresponding output interface. The CAC table maintains the available bandwidth on the interfaces leading to neighboring switches. The Connectivity table is used to map the interface numbers used at neighboring switches to local interface numbers. This information is used to program the switch fabric. The State table maintains the state information associated with each connection. The connection reference is the index into the table. The fields include the connection references and addresses of the previous and next switches, the bandwidth allocated for the connection, and most importantly, the state information as defined in Fig. 6.

Switch fabrics, such as PMC-Sierra PM5372, Agere TDCS6440G and Vitesse VSC9182, have similar programming interfaces. For example, VSC9182 has an 11-bit address bus A[10:0] and a 10-bit data bus D[9:0]. The switch is programmed by presenting the output interface/time-slot pair on A[10:0] and the input interface/time-slot pair on D[9:0]. We define a generic Switch-Mapping table to emulate this programming interface, with the connection reference as the index and the incoming interface/time-slot pair and the outgoing interface/time-slot pair as the fields.

2.2 Implementation of OCSP on WILDFORCE platform

[Figure 8. Architecture of the WILDFORCE™ board. Blocks shown: host, PCI bus, PCI/local bus chip, FIFO 0, FIFO 1, FIFO 4, SRAM, mezzanine memory, CPE 0 (signaling accelerator), PE 1 to PE 4, cross-bar, local bus.]

To demonstrate the feasibility and advantage of hardware-accelerated signaling, we implement OCSP on FPGAs. We use the WILDFORCE™ re-configurable computing board shown in Fig. 8. It consists of five XC4000 series Xilinx FPGAs: one XC4036 (CPE0) and four XC4013 (PEn). These five FPGAs can be used for user logic while the cross-bar provides programmable interconnections between the FPGAs. In addition, there are three First-In First-Out (FIFO) devices on the board, and one dual-port Random Access Memory (RAM) device attached to CPE0. The board is connected to the host system through the PCI bus. The board supports a C language based Application Programming Interface (API) through which the host system can dynamically configure the FPGAs and access the on-board FIFOs and RAMs.

For our prototype implementation, we use CPE0, PE1, FIFO0, FIFO1 and the dual-port RAM. CPE0 implements the hardware signaling accelerator state machine, the State and Switch-Mapping tables, the FIFO0 controller, and the dual-port RAM controller. The dual-port RAM implements the Routing, CAC and Connectivity tables. FIFO0 and FIFO1 work as receive and transmit buffers for signaling messages. PE1 implements the FIFO1 controller and provides the data path between CPE0 and FIFO1.

In the following subsections, we discuss some parameters selected for our hardware implementation, and the state transition diagram of the hardware signaling accelerator. We also present two novel approaches for managing resources such as time-slots and connection references.

Parameters

Our implementation supports 5-bit addresses for all nodes, 16 interfaces per switch, a maximum bandwidth of 16 STS-1s per interface, and a maximum of 32 simultaneous connections at each switch. We are primarily limited by the size of the on-board dual-port RAM on our off-the-shelf prototype board, and hence are forced to use these small values for address sizes, etc. The FPGA implementation itself does not limit these parameters. Hence, we can easily increase the values of these parameters when designing a customized board.

Recently, there has been significant progress in fast table look-up in both the research literature [35][36] and commercial products. Look-up/classification co-processors are widely available, such as Silicon Access Networks iap, PMC-Sierra ClassiPI, Solidum PAX1200, etc. These chips can easily process 100M look-ups/sec. In our prototype implementation, we assume that routing table look-ups can be off-loaded to an external

co-processor, and we simulate a routing table look-up with a delay equivalent to four memory accesses.

State transition diagram of the hardware signaling accelerator

[Figure 9. State transition diagram of the hardware signaling accelerator. State labels shown: reset, ready, receive, checksum, parse, chk TTL, rd route, rd CAC, wr CAC, rd conn, allc cref, allc ts, wr switch, rd state, rd switch, free ts, free cref, wr state, transmit, error. Error conditions shown: checksum failed, TTL expired, no route, bandwidth unavailable, invalid parameter, state mismatch.]

Fig. 9 shows the detailed state transition diagram of the hardware signaling accelerator. When a signaling message arrives, it is temporarily buffered in FIFO0. The hardware signaling accelerator then reads the message from FIFO0 and delimits the message according to the Message Length field. The Checksum field is verified. The State table is consulted to check the current state of the connection. The hardware signaling accelerator then processes the message according to its Message Type field. The processing of the Setup message involves checking the TTL field, reading the Routing table to determine the next switch and corresponding output interface, updating the CAC table, reading the Connectivity table to determine the input interface, allocating a connection reference to identify the connection, allocating time-slots and programming the Switch-Mapping table. The Setup-Success message requires no special processing. The processing of the Release

message involves updating the CAC table and releasing the time-slots reserved for the connection. When processing the Release-Confirm message, the allocated connection reference is freed and thus the connection is terminated. After processing any message, the State table is updated. The new message is generated and buffered in FIFO1 temporarily, and then transmitted to the next switch along the path.

Managing the available time-slots in hardware

The management of time-slots and connection references is easy in software through simple array manipulations. However, it poses a challenge in hardware implementations. Our solution is to use a priority decoder.

[Figure 10. Time-slot manager. Blocks shown: interface# input, scratch register, priority decoder, mark used, write back, output time-slot.]

Fig. 10 illustrates our implementation of a time-slot manager. Each entry in the time-slot table is a bit-vector corresponding to an output interface, with the bit-position determining the time-slot number and the bit-value determining the availability of the time-slot ("0" available, "1" used). The priority decoder is used to select the first available time-slot. When an interface number is provided by the signaling state machine to the time-slot manager, the bit-vector corresponding to the interface is sent into the priority decoder and the first available time-slot is returned. Then the bit corresponding to the time-slot is marked

as used (from "0" to "1") and the updated bit-vector is written back to the table. In the example shown in Fig. 10, the time-slot manager is required to find a free time-slot on interface 3. It returns time-slot 14 and marks it as used. De-allocating a time-slot follows a similar pattern, but the time-slot number is needed as an input in addition to the interface number in order for the time-slot manager to free the time-slot.

Managing the connection references in hardware

[Figure 11. Connection reference manager. Blocks shown: table, table pointer, pointer adjust, scratch register, priority decoder, mark used, write back, output connection reference.]

A connection reference is used to identify a connection locally. It is allocated when establishing a connection and de-allocated when terminating it. A straightforward implementation of a connection reference manager is a bit-vector combined with a priority decoder. The priority decoder finds the first available bit-position (a bit marked as "0"), sends its index as the connection reference and updates the bit as used (a bit marked as "1"). However, this approach is impractical when there are a large number of connections. While our actual implementation only used 32 connections per switch, we designed the connection reference manager to handle 2^12 simultaneous connections, which requires a bit-vector with 4096 entries. This is too large for the simple priority decoder implementation as used for time-slots. Our improvement to this basic approach is to use a table with 256 entries of 16-bit vectors to record the availability of a total of 4096 connection references. Fig. 11 illustrates this approach. With 4096 connections, we need a 12-bit connection reference. The first 8 bits of the connection reference correspond to the table pointer, while the remaining 4 bits correspond to the first available connection reference from among the 16 pointed to by the table pointer.

The connection reference manager starts with the table pointer set to 0. If any of the 16 connection references corresponding to this row of the table is available (i.e., a bit position is "0"), the priority decoder identifies this index and writes the output connection reference as a concatenation of the 8-bit table pointer and the 4-bit index extracted. In the example shown in Fig. 11, the 12th bit in the first row is a "0". Therefore the manager outputs the connection reference number 12. The bit-position is marked as used, as illustrated with steps 5-7 of Fig. 11. De-allocating follows a similar approach; the bit corresponding to the connection reference is reset to "0" and the updated bit-vector is written back to the table.

We can parallelize this approach by partitioning the table into several smaller tables, each with a pointer and a priority decoder, forming several smaller managers. All these managers work concurrently. A round-robin style counter can be used to choose a connection reference among the managers. Thus, this approach can be generalized if more than 4096 connections are to be handled.

Implementation and simulation results

We develop a prototype VHDL model for the hardware signaling accelerator, use the Synplify tool for synthesizing the design and the Xilinx Alliance tool for the placement and routing of the design. CPE0 (XC4036) uses 62% of its resources while PE1 (XC4013) uses 8% of its resources. We perform timing simulations of the hardware signaling accelerator using the ModelSim simulator.
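The time-slot manager and the connection reference manager described in the two preceding subsections can be mimicked in software with a short Python sketch. It is a behavioral model only, not the VHDL design: the priority decoder is modeled as a search for the lowest "0" bit, bit position 0 is assumed to be the highest-priority position, and the way the table pointer advances when a row is full is an assumption, since the text does not spell out the pointer-adjust logic. All class and function names are hypothetical.

```python
from typing import Optional

def first_zero_bit(vector: int, width: int) -> Optional[int]:
    """Stand-in for the priority decoder: index of the lowest 0 bit."""
    for i in range(width):
        if not (vector >> i) & 1:
            return i
    return None

class TimeSlotManager:
    """One bit-vector per output interface; 0 = available, 1 = used."""
    def __init__(self, interfaces: int = 16, slots: int = 16):
        self.slots = slots
        self.table = [0] * interfaces

    def allocate(self, interface: int) -> Optional[int]:
        slot = first_zero_bit(self.table[interface], self.slots)
        if slot is not None:
            self.table[interface] |= 1 << slot   # mark used and write back
        return slot

    def free(self, interface: int, slot: int) -> None:
        self.table[interface] &= ~(1 << slot)

class ConnRefManager:
    """256 rows of 16-bit vectors giving 4096 12-bit connection references."""
    ROW_BITS = 16

    def __init__(self, rows: int = 256):
        self.table = [0] * rows
        self.pointer = 0

    def allocate(self) -> Optional[int]:
        for _ in range(len(self.table)):
            idx = first_zero_bit(self.table[self.pointer], self.ROW_BITS)
            if idx is not None:
                self.table[self.pointer] |= 1 << idx
                # Concatenate the 8-bit table pointer and the 4-bit index.
                return (self.pointer << 4) | idx
            # Assumed pointer adjustment: move on to the next row.
            self.pointer = (self.pointer + 1) % len(self.table)
        return None

    def free(self, cref: int) -> None:
        self.table[cref >> 4] &= ~(1 << (cref & 0xF))

if __name__ == "__main__":
    ts = TimeSlotManager()
    ts.table[3] = (1 << 14) - 1      # time-slots 0..13 on interface 3 in use
    print(ts.allocate(3))            # first free time-slot on interface 3
    cm = ConnRefManager()
    cm.table[0] = (1 << 12) - 1      # references 0..11 in use
    print(cm.allocate())             # first free connection reference
```

The two-level layout keeps the decoder at 16 inputs regardless of the total number of connection references, which is the point of the table-plus-pointer arrangement.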

The simulation results for the Setup message are shown in Fig. 12. It can be seen that while receiving and transmitting a Setup message (requesting a bandwidth of STS-12 at a cross-connect rate of STS-1) consume 12 clock cycles each, processing of a Setup message consumes 53 clock cycles. Overall, this translates into 77 clock cycles to receive, process and transmit a Setup message. We estimate a maximum of 101 clock cycles if multiple (four) routing table look-ups are needed.

[Figure 12. Timing simulation for Setup message. Annotated phases: Setup message receive, Setup message process, Setup message transmit.]

Processing Setup-Success, Release and Release-Confirm messages consumes about 70 clock cycles in total, since these messages are much shorter (two 32-bit words, compared with the longer Setup message) and require simpler processing. A detailed breakdown of the clock cycles consumed to process each of these signaling messages is shown in Table 1. Assuming a 25 MHz clock, this translates into 3.1 to 4.0 microseconds for Setup message processing and about 2.8 microseconds for the combined processing of Setup-Success, Release and Release-Confirm messages. Thus, a complete setup and teardown of a connection consumes about 6.6 microseconds. Accordingly, the achievable call handling rate

is as high as 150,000 calls/sec. This is a 100x to 1000x speed-up compared to software-based implementations of signaling protocols.

Table 1. Clock cycles consumed by various OCSP messages (columns: Setup, Setup-Success, Release, Release-Confirm).
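The timing figures quoted in this section can be checked with a few lines of Python. The cycle counts and the 25 MHz clock are taken from the text; the helper names are illustrative. The sketch converts cycles to microseconds and derives a call handling rate from the quoted total of about 6.6 microseconds per call.

```python
CLOCK_HZ = 25_000_000  # 25 MHz clock assumed in the text

def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a clock-cycle count to microseconds."""
    return cycles / clock_hz * 1e6

def calls_per_second(total_us):
    """Call handling rate if each call occupies the accelerator for total_us."""
    return 1e6 / total_us

setup_cycles = 12 + 53 + 12   # receive + process + transmit
setup_worst_cycles = 101      # with four routing table look-ups
other_cycles = 70             # Setup-Success + Release + Release-Confirm

if __name__ == "__main__":
    print(setup_cycles, "cycles =", cycles_to_us(setup_cycles), "us")
    print(setup_worst_cycles, "cycles =", cycles_to_us(setup_worst_cycles), "us")
    print(other_cycles, "cycles =", cycles_to_us(other_cycles), "us")
    low = cycles_to_us(setup_cycles + other_cycles)
    high = cycles_to_us(setup_worst_cycles + other_cycles)
    print("per-call total between", low, "and", high, "us")
    print("rate at 6.6 us per call:", round(calls_per_second(6.6)), "calls/sec")
```

The quoted total of about 6.6 microseconds lies between the two computed per-call bounds, and its reciprocal is consistent with the stated rate of roughly 150,000 calls/sec.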

Chapter 3 A Subset of RSVP-TE for Hardware-Acceleration

The hardware implementation of OCSP demonstrated a call handling rate of 150,000 calls/sec, which is a 100x to 1000x speed-up vis-à-vis its software counterpart. However, OCSP, which we designed as a proof of concept, is specifically engineered for performance and limited to SONET networks. It is important to demonstrate our concept of hardware-accelerated signaling with an implementation of a widely used signaling protocol.

Among the signaling protocols we mentioned in Chapter 1, RSVP-TE for GMPLS networks attracts our attention. It is one of the most widely implemented signaling protocols. As one of the two signaling protocols recommended for GMPLS (the other is CR-LDP), RSVP-TE can work with different types of connection-oriented networks, such as SONET/SDH networks and DWDM networks. Most telecom equipment vendors support RSVP-TE in their switches.

Implementing RSVP-TE for GMPLS in hardware is a challenge because of its complexity. Our solution is to implement only the time-critical functions in hardware and relegate the non-time-critical functions to software. For this purpose, it is essential to extract a subset of RSVP-TE for hardware acceleration. In Section 3.1, we briefly review the history of RSVP-TE for GMPLS. We then detail the subset we extract in Section 3.2. This subset is targeted for implementation at a transit SONET switch and it only supports point-to-point unidirectional connections.

3.1 Review of RSVP-TE for GMPLS networks

The original RSVP was initially defined as a resource reservation setup protocol for IntServ [37]. RSVP can be used by a host to request specific QoS from the network. It can also be used by routers to deliver QoS requests to all nodes along the path, and to establish and maintain state to provide the requested service. RSVP was designed to work in a multicast scenario. In order to efficiently accommodate large groups, dynamic group membership, and heterogeneous receiver requirements, RSVP makes receivers responsible for requesting a specific QoS. RSVP requests resources for simplex flows, i.e., in one direction only. In order to reserve resources in both directions, two sets of RSVP message exchanges are required.

When the MPLS community was looking for a signaling protocol for MPLS, they first came up with LDP and CR-LDP, which are specifically designed for MPLS. At the same time, an enhanced version of RSVP with traffic engineering extensions (RSVP-TE) [11] was introduced as an alternative for MPLS signaling. RSVP-TE and CR-LDP are similar, but RSVP-TE is more popular in the industry.

RSVP-TE was first designed for packet-switched MPLS networks. When MPLS was generalized for circuit-switched networks, such as SONET/SDH networks and DWDM networks, RSVP-TE was also extended to support these network technologies [15]. In addition, technology-specific extensions were defined to address technology-specific issues. For example, [38][39] defined the traffic parameters and labels for SONET/SDH networks and G.709 Optical Transport Networks (OTNs), respectively.

RSVP supports the notion of soft state and periodic refreshing. If a refresh is not received before the time-out interval expires, connections are released. Since packet forwarding is based on the IP routing data table, the resource reservations need to follow as routing data changes. Hence RSVP included the use of refresh messages. A second reason for refresh messages is that, since RSVP uses unreliable IP service, the occasional loss of an RSVP message is handled through refreshes. However, the first reason does not hold for GMPLS networks, where once a circuit is established, the routing data table is not consulted for data forwarding. Nor is refreshing a good solution to the reliability issue: if the refresh interval is small, the overhead spent in processing refresh messages can become excessive, while if the refresh interval is large, it takes longer to detect the loss of an RSVP message. Therefore RFC2961 [40] introduced a TCP-like mechanism to ensure reliable message transport.

RSVP-TE and related protocols are revised often and enhanced to accommodate new applications while maintaining backward compatibility. The subset we describe in the next section is mainly based on the original RSVP [16], the RSVP-TE for MPLS networks [11], the RSVP-TE extensions for GMPLS networks [15], the SONET/SDH extensions for GMPLS networks [38], and the RSVP refresh overhead reduction extensions [40].

3.2 A Subset of RSVP-TE for GMPLS networks

RSVP messages

All RSVP messages begin with a common header, followed by a body consisting of a variable number of variable-length, typed objects.

RSVP common header

[Figure 13. RSVP message common header. Fields: Vers, Flags, Msg Type, RSVP Checksum, Send_TTL, (Reserved), RSVP Length.]

Fig. 13 shows the structure of the RSVP common header. The 4-bit Vers field specifies

the protocol version number. The current RSVP (including RSVP-TE and its extensions) is version 1. RSVP reserves 4 bits for Flags. No flag bits are defined in [11][15][16]; in [40], bit 0 was introduced as the Refresh-Reduction-Capable bit, where a value of 1 indicates support for the refresh overhead reduction extensions. The RSVP_Checksum field carries the one's complement of the one's complement sum of the whole message. This field helps to detect possible message errors. The RSVP_Length field indicates the total length of the RSVP message in bytes.

The Msg_Type field specifies the type of the message. There are 7 messages defined in traditional RSVP: Path, Resv, PathErr, ResvErr, PathTear, ResvTear, and ResvConf. RSVP-TE added the Hello message for the purpose of node failure detection. When RSVP-TE was enhanced for GMPLS, the Notify message was introduced to support fast failure notification.

RSVP messages are carried directly in IP packets. The corresponding protocol number in the IP header is 46. There is a TTL field in the IP header. The RSVP common header also has a Send_TTL field, which records the TTL value set by the transmitting node. This field can be used to detect the existence of non-RSVP-capable nodes. If the received TTL in the IP header is different from the transmission TTL stored in the common header, non-RSVP nodes must exist between the two neighboring RSVP-capable nodes.

RSVP messages

A total of nine RSVP messages are defined in [11][15][16]. Among these messages, Path and Resv messages are used to set up an LSP, and PathTear/ResvTear messages are used to tear down an LSP. These four messages should be processed by hardware. The ResvConf message, used to confirm a Resv message, is triggered by an optional object, RESV_CONFIRM, in the Resv message. We assume that normally there is no ResvConf message in our application. All other messages, PathErr, ResvErr, Hello, and Notify, are non-time-critical. These four messages, together with ResvConf, should be processed by software.

This section details the format of the four RSVP messages to be processed by hardware. Messages are defined in Backus-Naur Form (BNF) [footnote 1]. We omit all optional objects in the messages. The order of the objects in a message is recommended, but not mandatory, meaning that an implementation should create messages with the objects in the order shown below, but accept the objects in any permissible order.

1. Path message

A Path message is used by an upstream Label Switching Router (LSR) to ask a downstream LSR to allocate a label for an LSP. More generally, it is a request to set up an LSP. There are 6 mandatory objects in a Path message: SESSION, RSVP_HOP, TIME_VALUES, LABEL_REQUEST, SENDER_TEMPLATE, and SENDER_TSPEC.

    <Path Message> ::= <Common Header> <SESSION> <RSVP_HOP> <TIME_VALUES> <LABEL_REQUEST> <sender descriptor>
    <sender descriptor> ::= <SENDER_TEMPLATE> <SENDER_TSPEC>

Footnote 1: Backus-Naur Form, introduced by John Backus and improved by Peter Naur, is one of the most commonly used metasyntactic notations for specifying the syntax of programming languages, command sets, and the like. The meta-symbols of BNF are: "::=" meaning "is defined as"; "|" meaning "or"; "<>" angle brackets used to surround item names; "[]" square brackets used to surround optional items. The BNF implies an order for the items.
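The hardware/software partition of message types and the rule that the mandatory Path objects are accepted in any order can be illustrated with a minimal Python sketch. Messages and objects are represented by their names rather than by numeric codes, and the function names are hypothetical; this is a sketch of the classification logic described above, not of the accelerator itself.

```python
# Message types processed in hardware and in software, as stated in the text.
HARDWARE_MESSAGES = {"Path", "Resv", "PathTear", "ResvTear"}
SOFTWARE_MESSAGES = {"ResvConf", "PathErr", "ResvErr", "Hello", "Notify"}

# The six mandatory objects of a Path message, in the recommended order.
PATH_MANDATORY = (
    "SESSION", "RSVP_HOP", "TIME_VALUES",
    "LABEL_REQUEST", "SENDER_TEMPLATE", "SENDER_TSPEC",
)

def processed_by(msg_type):
    """Return 'hardware' or 'software' for a given RSVP message name."""
    if msg_type in HARDWARE_MESSAGES:
        return "hardware"
    if msg_type in SOFTWARE_MESSAGES:
        return "software"
    raise ValueError(f"unknown RSVP message type: {msg_type}")

def missing_path_objects(received):
    """List mandatory Path objects that are absent; order is not checked."""
    present = set(received)
    return [name for name in PATH_MANDATORY if name not in present]

if __name__ == "__main__":
    print(processed_by("Path"), processed_by("Hello"))
    shuffled = ["RSVP_HOP", "SESSION", "SENDER_TSPEC",
                "LABEL_REQUEST", "TIME_VALUES", "SENDER_TEMPLATE"]
    print(missing_path_objects(shuffled))        # nothing missing
    print(missing_path_objects(["SESSION"]))     # five objects missing
```

Checking membership rather than position reflects the requirement to create objects in the recommended order but accept them in any permissible order.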

2. Resv message

A Resv message is used by a downstream LSR to advertise the binding of a label to a specific LSP. After a label is allocated for an LSP, a Resv message, which carries the label assigned to the LSP, is sent to the upstream LSR. There are 7 mandatory objects in a Resv message: SESSION, RSVP_HOP, TIME_VALUES, STYLE, FLOWSPEC, FILTER_SPEC, and LABEL.

    <Resv Message> ::= <Common Header> <SESSION> <RSVP_HOP> <TIME_VALUES> <STYLE> <flow descriptor list>
    <flow descriptor list> ::= <FLOWSPEC> <FILTER_SPEC> <LABEL>

3. PathTear and ResvTear messages

In RSVP-TE, either the source or the destination can tear down an LSP. The source node tears down an LSP by sending out a PathTear message, while the destination node can do the same by sending out a ResvTear message.

    <PathTear Message> ::= <Common Header> <SESSION> <RSVP_HOP> <sender descriptor>
    <sender descriptor> ::= <SENDER_TEMPLATE>

    <ResvTear Message> ::= <Common Header> <SESSION> <RSVP_HOP> <STYLE> <flow descriptor list>
    <flow descriptor list> ::= <FLOWSPEC> <FILTER_SPEC>

RSVP objects

Objects following the common header are the real carriers of message fields. Each object, a Type-Length-Value (TLV) triplet, is a self-contained element, consisting of one or more 32-bit words with a one-word header. Fig. 14 shows the structure of an RSVP object.

[Figure 14. RSVP object. Fields: Length (bytes), Class-Num, C-Type, (Object contents).]

As the name suggests, a TLV-structured object consists of a Type field, a Length field, and a variable-length Value field. For RSVP, the Type field can be further split into a Class-Num field and a C-Type field. The Class-Num field specifies the object class, while the C-Type field further qualifies the type within the object class. The Length field specifies the length of the object. Fig. 15 shows the SESSION object as an example. The SESSION object is a mandatory object required in every RSVP message, and its Class-Num is 1. In [16], the SESSION object is used to identify an RSVP session, and the corresponding C-Type is 1. For this purpose, the object contents include the destination address, protocol ID, and destination port number (i.e., the application running on the destination for which this connection is being established), as shown in Fig. 15a. With the extension of RSVP to RSVP-TE, a new C-Type was defined (C-Type of 7). The new SESSION object, including destination address,

tunnel ID, and extended tunnel ID, is used to identify an LSP tunnel.

[Figure 15. TLV structure of the SESSION object.
(a) SESSION object defined in RFC2205 (RSVP): Length (12), Class-Num (1), C-Type (1), IPv4 Dest Address, Protocol ID, Flags, Dst Port.
(b) SESSION object defined in RFC3209 (RSVP-TE): Length (16), Class-Num (1), C-Type (7), IPv4 tunnel end point address, Must be zero, Tunnel ID, Extended Tunnel ID.]

As the RSVP-TE signaling protocols evolve, some fields and parameters become obsolete while new fields and parameters are introduced. The TLV-structured object is flexible, extensible, and suitable for these evolving protocols. In this section, we discuss all mandatory objects in Path, Resv, PathTear, and ResvTear messages.

FILTER_SPEC/SENDER_TEMPLATE object

The FILTER_SPEC/SENDER_TEMPLATE object specifies the source address of an LSP. The former is carried in the Resv message with a Class-Num of 10, while the latter is carried in the Path message with a Class-Num of 11 (Fig. 16). C-Type 7 is defined for an IPv4 tunnel. The FILTER_SPEC/SENDER_TEMPLATE object, together with the SESSION object, forms a five-tuple <Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID>, which uniquely identifies an LSP.

[Figure 16. FILTER_SPEC/SENDER_TEMPLATE object. Fields: Length, Class-Num (10/11), C-Type (7), IPv4 tunnel sender address, MUST be zero, LSP ID.]

The 32-bit IPv4 tunnel sender address is the IP address of the sender, and the 16-bit LSP ID is an ID for the LSP allocated locally at the sender. The FILTER_SPEC/SENDER_TEMPLATE object remains unchanged along the whole LSP.

FLOWSPEC/SENDER_TSPEC object

The FLOWSPEC/SENDER_TSPEC object specifies the required traffic parameters of

an LSP. The former is carried in the Resv message (Class-Num 9), while the latter is carried in the Path message (Class-Num 12), as shown in Fig. 17. The FLOWSPEC and SENDER_TSPEC objects were first introduced in [16], where their interpretation is governed by IntServ [37]. RSVP-TE (GMPLS) with extensions supporting SONET/SDH introduced a new C-Type (TBA).

Figure 17. FLOWSPEC/SENDER_TSPEC object. Layout: header word with Length, Class-Num (9/12), C-Type (TBA); Signal Type, RCC, NCC; NVC, Multiplier (MT); Transparency (T).

A SONET/SDH signal has a hierarchical structure. Each signal is built from an elementary signal, which can be concatenated (contiguously or virtually) to form a higher-rate signal. The Signal_Type field specifies the type of the elementary signal. The RCC field is used to request contiguous concatenation. If the RCC field is 1, contiguous concatenation is used, and the NCC field indicates the number of identical elementary signals that are requested to be concatenated. The NVC field indicates the number of signals that are requested to be virtually concatenated. Based on all the previous fields, the MT field indicates the number of identical signals that are requested for the LSP, i.e., that form the final signal. These signals can be identical elementary signals, identical contiguously concatenated signals, or identical virtually concatenated signals.

Generalized LABEL object

The LABEL object was initially defined in [16] to support the setup of an LSP tunnel, with Class-Num 16 and C-Type 1. In [13]-[15], the LABEL object is generalized to represent a time-slot, wavelength, switch port, etc., as shown in Fig. 18. The generalized LABEL object, carried in the Resv message, extends the traditional label by allowing the representation of not

only labels which travel in-band with associated data packets, but also labels which identify time-slots, wavelengths, or space-division-multiplexed positions. In our application, SONET, a label represents a set of time-slots.

Figure 18. LABEL object. Layout: header word with Length, Class-Num (16), C-Type (2); Label ...

Each generalized LABEL object carries a variable-length label. The format of the label depends on the application. Fig. 19 shows the format of a SONET/SDH label. The meaning of these fields is given in Fig. 20.

Figure 19. Format of a SONET/SDH label. Fields: S, U, K, L, M.

Figure 20. SONET multiplexing hierarchy and generalized label. The figure relates the label fields S, U, L, and M to the levels of the SONET multiplexing hierarchy (STS-3N, STS-3, STS-1 SPE, VT Group, VT). For example, S>0, U=1, K=0, L=0, M=0 denotes the unstructured SPE of the S-th STS-3 and the 1st STS-1.

Generalized LABEL_REQUEST object

The LABEL_REQUEST object defines the characteristics of the requested LSP. It was initially defined in [11] to support the setup of an LSP tunnel. In [13]-[15], a generalized LABEL_REQUEST object was introduced, as shown in Fig. 21. The suggested C-Type is

4.

Figure 21. LABEL_REQUEST object. Layout: header word with Length, Class-Num (19), C-Type (4); LSP Enc. Type, Switching Type, G-PID.

The 8-bit LSP Encoding Type indicates the encoding of the LSP being requested. The value for SONET/SDH is 5. The 8-bit Switching Type specifies the type of switching performed on a link. The value for Time-Division-Multiplex Capable is 100 (our implementation is for transit SONET switches, which use TDM technology). The G-PID identifies the payload carried by the LSP.

RSVP_HOP object

The RSVP_HOP object was initially defined in [16], with Class-Num 3 and C-Types 1-2. In [15], in order to support out-of-band signaling (separation of control channel and data channel), a new C-Type, with support for an interface ID, was added. The suggested C-Type value is 3. In RSVP, the RSVP_HOP object carries the IP address of the RSVP-capable node that sent the message; it is called PHOP ("previous hop") in downstream messages (Path messages) and NHOP ("next hop") in upstream messages (Resv messages). When RSVP-TE is extended for GMPLS, an IF_INDEX TLV is added to support the separation of control channel and data channel, as shown in Fig. 22.

Figure 22. RSVP_HOP object. Layout: header word with Length (24), Class-Num (3), C-Type (3); IPv4 Next/Previous Hop Address; Logical Interface Handle (LIH); IF_INDEX TLV with Type (3), Length (12), IP Address, Interface ID.

The 32-bit IPv4 Next/Previous Hop Address specifies the IP address of the previous

LSR (in a Path message) or next LSR (in a Resv message) on the control plane. The 32-bit LIH specifies the logical outgoing interface on which the reservation is required. IF_INDEX itself is a TLV. Inside it, the 32-bit IP Address specifies the IP address of the previous LSR or next LSR on the user plane. The 32-bit Interface ID can be used to identify a physical interface.

SESSION object

The SESSION object was initially defined in [16], with Class-Num 1 and C-Types 1/2. Reference [11] defined C-Types 7/8 for LSP tunnels. In our application, the C-Type is 7, meaning IPv4 tunnel, as shown in Fig. 23. The SESSION object is carried in all four RSVP-TE messages. As mentioned before, the SESSION object, together with the SENDER_TEMPLATE/FILTER_SPEC object, uniquely identifies an LSP.

Figure 23. SESSION object. Layout: header word with Length, Class-Num (1), C-Type (7); IPv4 tunnel end point address; Must be zero, Tunnel ID; Extended Tunnel ID.

The 32-bit IPv4 tunnel end point address specifies the destination address of an LSP. The 16-bit tunnel ID and 32-bit extended tunnel ID are used to identify an LSP tunnel.

The STYLE and TIME_VALUES objects are also mandatory in [16]. However, these two objects became obsolete in [11][15] when RSVP was extended for MPLS and GMPLS. We include these two objects in the subset because they are mandatory. We expect that the hardware will accept but not process them. Therefore the details of these two objects are omitted.

Optional objects support

Besides the mandatory objects described above, we also support several optional

objects in the subset, i.e., SUGGESTED_LABEL, MESSAGE_ID, and MESSAGE_ID_ACK. Although these objects are currently optional, we expect them to be widely adopted in implementations of RSVP-TE for GMPLS networks.

SUGGESTED_LABEL object

The SUGGESTED_LABEL object was first introduced in [15] for the purpose of allocating labels in the forwarding direction, with Class-Num 129 and C-Type 2. The SUGGESTED_LABEL object has the same structure as the generalized LABEL object.

There is a potential conflict in SONET networks if the SUGGESTED_LABEL object in the Path message is left as optional. Consider the following scenario. An outgoing interface (STS-12) has four available time slots. A first call requests an STS-3c; the upstream switch tentatively reserves the first three time slots. A second call requesting an STS-1 arrives next, and needs to be routed on this same outgoing interface. The switch then makes a tentative reservation of the remaining time slot. If the Resv message for the second call returns first, and the downstream switch assigns a time slot different from the one tentatively reserved in the forward direction, the first call, for which a tentative reservation was made, can no longer be accommodated because it requires a concatenated assignment. Hence, we recommend that the SUGGESTED_LABEL object be mandatory in Path messages to force the downstream switch to use the label selected by the upstream switch.

This conflict is the result of RSVP-TE growing out of RSVP, which was developed as a protocol for receiver-initiated additions to a multicast tree. In GMPLS networks, where hard resource reservations of time slots and wavelengths are necessary, a reservation and the corresponding time-slot/wavelength selection need to be made in the forward direction of call setup.
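The scenario above can be replayed in a few lines of Python. This is only an illustrative sketch: the helper names contiguous_run and replay_scenario, the zero-based slot numbering, and the assumption that the four available time slots are adjacent are ours, and SONET alignment rules for STS-3c are ignored.

```python
def contiguous_run(free, count):
    """Return the first index i such that slots i..i+count-1 are all free, else None."""
    for i in range(len(free) - count + 1):
        if all(free[i:i + count]):
            return i
    return None


def replay_scenario(slot_assigned_to_sts1):
    """Replay the two-call scenario on a hypothetical STS-12 interface.

    Slots 0..3 are available (assumed adjacent); slots 4..11 are in use.
    Call 1 (STS-3c) was tentatively given slots 0..2 and call 2 (STS-1)
    slot 3, but nothing is committed until a Resv message returns.
    """
    free = [True] * 4 + [False] * 8
    # The Resv message for call 2 returns first, carrying the slot chosen
    # by the downstream switch, which may differ from the tentative slot 3.
    free[slot_assigned_to_sts1] = False
    # Can call 1 still obtain three contiguous time slots?
    return contiguous_run(free, 3)


print(replay_scenario(3))  # downstream honours the suggested slot
print(replay_scenario(1))  # downstream picks a slot inside the planned STS-3c
```

When the downstream switch is forced to use the suggested slot, slots 0 to 2 remain intact for the STS-3c call; when it is free to pick slot 1 or 2, no run of three contiguous slots is left.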

MESSAGE_ID/MESSAGE_ID_ACK object

The MESSAGE_ID and MESSAGE_ID_ACK objects were defined in [40] to support reliable transmission of RSVP messages, the former with Class-Num 23 and C-Type 1, the latter with Class-Num 24 and C-Type 1, as shown in Fig. 24.

Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object. Layout: header word with Length, Class-Num (23/24), C-Type (1); Flags, Epoch; Message_Identifier.

RSVP-TE runs on top of the connectionless, unreliable IP protocol. In [40], a TCP-like mechanism is introduced to provide reliable message transmission. This mechanism includes MESSAGE_ID, MESSAGE_ID_ACK, and an exponential back-off retransmission algorithm. Unlike TCP, there is no window-based flow control.

This mechanism is defined on a per-hop basis. Each RSVP message (Path, Resv, PathTear/ResvTear, etc.) includes a MESSAGE_ID object. This MESSAGE_ID object, together with the message generator's IP address, uniquely identifies a message. When a message is received, the contents of the MESSAGE_ID object are copied to a MESSAGE_ID_ACK object, which is then sent back to the message generator as an acknowledgement. The MESSAGE_ID_ACK object can be piggy-backed on another RSVP message, or carried in a separate Ack message. Multiple MESSAGE_ID_ACKs can be carried in one RSVP message. If the message generator fails to receive the MESSAGE_ID_ACK before a re-transmission timer expires, it re-transmits the message. The time-out value of the re-transmission timer doubles with each re-transmission (exponential back-off).
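The acknowledgement and back-off behaviour described above can be sketched as follows. This is a minimal illustration rather than the mechanism of [40] in full: the names retransmission_timeouts and PendingMessages, the retry limit, and the choice of (generator IP address, Epoch, Message_Identifier) as the lookup key are assumptions made for the example.

```python
def retransmission_timeouts(initial_timeout, max_retransmissions):
    """Yield successive timer values; each one is double the previous one."""
    timeout = initial_timeout
    for _ in range(max_retransmissions):
        yield timeout
        timeout *= 2


class PendingMessages:
    """Messages sent to a neighbour and not yet acknowledged (per-hop state)."""

    def __init__(self):
        self._pending = {}

    def sent(self, generator_ip, epoch, message_id, message):
        # The MESSAGE_ID contents plus the generator's IP address
        # identify the message uniquely.
        self._pending[(generator_ip, epoch, message_id)] = message

    def acknowledged(self, generator_ip, epoch, message_id):
        """Process a MESSAGE_ID_ACK; return True if it matched a pending message."""
        return self._pending.pop((generator_ip, epoch, message_id), None) is not None

    def outstanding(self):
        return len(self._pending)


pending = PendingMessages()
pending.sent("10.0.0.1", 7, 42, b"Path...")
print(list(retransmission_timeouts(0.5, 4)))   # timer values in seconds
print(pending.acknowledged("10.0.0.1", 7, 42))
```

A message that is still outstanding when its current timer value expires would be re-sent and the next value from the generator used; an acknowledged message is simply dropped from the pending set.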

Connection setup and teardown with RSVP-TE signaling messages

Figure 25. Illustration of connection setup with RSVP-TE messages. (Nodes: Source, switches SW 1 to SW 5, Destination. Steps 1 to 4: Path messages travel from the Source through SW 1, SW 2, and SW 3 to the Destination. Steps 5 to 8: messages travel hop-by-hop in the reverse direction back to the Source. Step 9: connection (circuit or virtual circuit) established.)

Similarly to Fig. 1, Fig. 25 shows the procedure to set up an end-to-end unidirectional connection with RSVP-TE signaling messages. Path messages, carrying the destination address, the required traffic parameters, and other message fields, progress hop-by-hop from the source to the destination, and Resv messages travel in the reverse direction, back to the source. The same five steps as mentioned in Chapter 1 are carried out at each switch. Since we support the SUGGESTED_LABEL object in our subset, the first three steps are performed in the forward direction, i.e., when Path messages are processed.

Upon completion of data exchange, either the source or the destination can release the connection. The source node can initiate the release procedure with PathTear messages, and the destination node can do the same with ResvTear messages. In RSVP-TE, the release procedure is not confirmed.

Chapter 4 RSVP-TE Procedures for Hardware-Acceleration

In Chapter 3, we extracted a subset of RSVP-TE for hardware acceleration. This subset is described in the language of communication protocols. In this chapter, we describe the procedures needed to implement this subset of RSVP-TE. Registers and data tables used in the hardware implementation are defined.

Four types of messages are handled in hardware: Path, Resv, PathTear, and ResvTear. As each message is received, it is parsed, and the different fields of its objects are dispatched to the corresponding registers. We list these registers in Section 4.1. Processing of signaling messages involves reading and writing data tables. We describe these data tables in Section 4.2. In Section 4.3, we use the Path message as an example to illustrate the detailed procedures for processing a typical RSVP-TE signaling message.

4.1 Registers

Registers are commonly used in generic micro-processors, DSPs, ASICs, and other digital VLSI devices to store data temporarily. We define 36 registers to buffer message fields and intermediate data. We also define 8 configuration registers.

Table 2. Data registers.
Register | Width (bit) | Description
Msg_Type | 8 | Message type, Msg Type in Common Header
Msg_Len | 16 | Message length, RSVP length in Common Header
Local_Chksum | 16 | Locally calculated checksum
Send_TTL | 8 | Send TTL, Send_TTL in Common Header
Src_IP_Addr | 32 | Source IP address, IPv4 tunnel sender address in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message

LSP_ID | 16 | LSP ID in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message
Dest_IP_Addr | 32 | Destination IP address, IPv4 tunnel end point address in SESSION object within Path/Resv message
Tunnel_ID | 16 | Tunnel ID in SESSION object within Path/Resv message
Ext_Tunnel_ID | 32 | Extended tunnel ID in SESSION object within Path/Resv message
Pre_IP_Addr_Ctrl | 32 | Previous IP address on control plane, IPv4 Previous Hop Address in RSVP_HOP object within Path message
Next_IP_Addr_Ctrl | 32 | Next IP address on control plane, the result of User/Control Mapping table look-up
LIH | 32 | Logical Interface Handle in RSVP_HOP object within Path message
Pre_IP_Addr_User | 32 | Previous IP address on user plane, IP Address in IF_INDEX TLV within Path message
Pre_IF_ID_User | 32 | Upstream node output interface ID, Interface ID in IF_INDEX TLV within Path message
Next_IP_Addr_User | 32 | Next hop IP address, the result of Routing table look-up
Seq_Num | 4 | Sequence number of the links between two neighboring LSRs
Incoming_IF_ID_User | 32 | Incoming interface ID, the result of Incoming Connectivity table look-up
Outgoing_IF_ID_User | 32 | Outgoing interface ID, the result of Outgoing Connectivity/CAC table look-up
Incoming_PHY_ID | 8 | Incoming physical interface ID
Outgoing_PHY_ID | 8 | Outgoing physical interface ID
Incoming_Assigned_Timeslots | 12 | Time-slots assigned to a certain incoming logical link
Outgoing_Assigned_Timeslots | 12 | Time-slots assigned to a certain outgoing logical link
Incoming_Label(s) | 32 | Generalized label(s) for the incoming interface, locally allocated
Outgoing_Label(s) | 32 | Generalized label(s) for the outgoing interface, received from downstream LSR
Signal_Type | 8 | Signal Type in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
RCC | 8 | Requested Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NCC | 16 | Number of Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NVC | 16 | Number of Virtual Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
MT | 16 | Multiplier in SENDER_TSPEC object within Path message

State | 4 | Current state of an LSP
Avail_BW | 12 | Available bandwidth, the result of Outgoing CAC table look-up
LSP_Enc_Type | 8 | LSP Encoding Type in LABEL_REQUEST object within Path message
Switching_Type | 8 | Switching Type in LABEL_REQUEST object within Path message
G-PID | 16 | Generalized PID in LABEL_REQUEST object within Path message
Refresh_Period | 32 | Refresh Period in TIME_VALUES object within Path message
Style | 5 | Option vector in STYLE object within Resv message

Table 3. Configuration registers.
Register | Width (bits) | Init value | Meaning
Init_LSP_Enc_Type | 8 | 5 | SDH ITU-T G.707/SONET ANSI T1.105
Init_Switching_Type | 8 | 100 | Time-Division-Multiplex Capable (TDM)
Local_IP_Addr_Ctrl | 32 | - | Local IP address on the control plane
Local_IP_Addr_User | 32 | - | Local IP address on the user plane
Local_MAC_Addr_ctrl | 48 | - | Local Ethernet MAC address on the control plane
Supported_ST | 8 | 1 | The cross-connect rate of the switch; 1 means OC1
TO | 32 | - | The initial time-out value of the re-transmission timer
PTO | 32 | - | The time-out value of the piggy-back timer

4.2 Data tables

Figure 26. Network used for illustrative examples. (A client and a server; switches I, II, and III, each with a signaling engine; a network of IP routers that carries the signaling traffic, which is not necessarily the Internet and could be a specially designed IP network just for signaling traffic; a server-to-client SONET unidirectional circuit; a logical link; physical interfaces numbered 5, 5, 7, and 2.)

Fig. 26 is an illustrative network that shows: (i) switches can be connected via many

user-plane interfaces, (ii) logical links can be provisioned between non-adjacent switches, and (iii) a signaling engine may have many signaling links leading to the connectionless signaling network (IP network). We will refer to this figure extensively when we discuss the data tables.

Routing table

Table 4. Routing table.
Index: Network_Prefix, Subnet_Mask
Return value: Next_IP_Addr_User

The Routing table, as shown in Table 4, is initialized and maintained by a software routing module (for example, an implementation of OSPF-TE for GMPLS). It is read-only to the hardware signaling accelerator. The Next_IP_Addr_User is the user-plane IP address of the next-hop switch through which the destination IP address can be reached. A look-up of the routing table is a longest-prefix match. In general, a routing table can have alternative next-hop switches. However, for simplicity, we do not implement multiple routing table look-ups in the hardware signaling accelerator. It attempts only one route; if resources are not available, it passes the message to the software signaling process. For a subsequent release, we will reconsider this design decision.

Incoming Connectivity table

Table 5. Incoming Connectivity table.
Index: Pre_IP_Addr_User, Pre_IF_ID_User
Return value: Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Assigned_timeslots

The Incoming Connectivity table, as shown in Table 5, is initialized and updated by a software link management module. Similar to the Routing table, it is read-only to the hardware signaling accelerator. It shows how the interface number used by a neighbor maps to

an incoming interface number and the corresponding physical interface. The incoming assigned time-slots column is a bit-vector with 1's for the time-slots assigned to this logical interface and 0's for the other time-slots. Using the example shown in Fig. 26, the Incoming Connectivity table at switch II will have entries as shown in Table 6. Here we assume that time-slots 1-3 of physical interface 5 are terminated at switch II, while the remaining time-slots 4-12 (we assume all interfaces are STS-12s) are used for the logical link passing through switch II to terminate at switch III. On the other hand, we assume that all 12 time-slots of physical interface 2 terminate at switch II. In cases where all time-slots of a physical interface terminate at a switch, we use the same number for the physical interface as for the interface ID used in signaling messages. Otherwise, a logical interface number that maps to a physical interface and time-slots is used in signaling messages to identify logical links.

Table 6. Incoming Connectivity table at switch II in Fig. 26.
Index: Pre_IP_Addr_User, Pre_IF_ID_User
Return value: Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Assigned_timeslots
Switch I IP address
Switch I IP address

Outgoing Connectivity table

Table 7. Outgoing Connectivity table.
Index: Next_IP_Addr_User, Seq_Num
Return value: Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_timeslots

The Outgoing Connectivity table, Table 7, shows the outgoing interfaces leading to neighboring switches. It also maintains the mapping of logical interface IDs to physical interface IDs and the corresponding time-slots. Since a switch may have multiple links to a neighbor, there may be many rows in this data table corresponding to a single

Next_IP_Addr_User. Hence we have the Seq_Num column, which allows the hardware signaling accelerator to search this table multiple times until it either finds an interface to the neighboring switch with sufficient available resources or exhausts all interfaces to that switch. An example Outgoing Connectivity table for switch I in Fig. 26 is shown in Table 8. Time-slots 1-3 on physical interface 5 at switch I terminate at switch II, while the remaining time-slots are routed through as a logical link to switch III.

Table 8. Outgoing Connectivity table for switch I in Fig. 26.
Index: Next_IP_Addr_User, Seq_Num
Return value: Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_timeslots
Switch II IP address
Switch II IP address
Switch III IP address

Outgoing CAC table

Table 9. Outgoing CAC table.
Index: Outgoing_PHY_ID
Return value: Avail_BW

The Outgoing CAC table, Table 9, maintains the available time-slots for each physical outgoing interface. Table 10 shows an example Outgoing CAC table.

Table 10. Outgoing CAC table for switch I in Fig. 26.
Index: Outgoing_PHY_ID
Return value: Avail_BW

Avail_BW is a bit-map of the available time-slots on a certain outgoing interface. A 1 means the corresponding time-slot is available, while a 0 means it is not available. Fig. 27 shows an example of Avail_BW. A request for STS-1 can be satisfied but a request for

STS-3c cannot, because even though there are 8 time-slots available, the contiguous concatenation requirement cannot be satisfied. Reserving time-slots means setting the corresponding bits to 0.

Figure 27. Example of Avail_BW. (1: Available; 0: Not available (reserved).)

User/Control Mapping table

Table 11. User/Control Mapping table.
Index: Next_IP_Addr_User
Return value: Next_IP_Addr_Ctrl

Unlike packet-switched MPLS networks, where implicit in-band signaling is used and there is a one-to-one association between control channels and data links, circuit-switched networks may require out-of-band signaling and hence a separation of control channels from the corresponding data links. The User/Control Mapping table, Table 11, is used to map the user-plane IP addresses of neighboring switches to the corresponding control-plane IP addresses. Our implementation supports multiple data links between two neighboring LSRs, as shown in Fig. 26. A switch may also have multiple control-plane interfaces, as shown in Fig. 26 for switch II. We assume that load balancing of the signaling load across multiple control-plane interfaces is done outside the signaling module and that the appropriate data is downloaded to this User/Control Mapping table. We assume that there is only one Next_IP_Addr_Ctrl associated with each next-hop switch.
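As a software illustration of how two of these tables chain together, the sketch below performs a longest-prefix match on a Routing table and then maps the resulting user-plane address to a control-plane address. It is not the hardware look-up mechanism: the table contents, the addresses, and the function names are hypothetical, and a linear scan stands in for whatever look-up structure the accelerator uses.

```python
import ipaddress

# Hypothetical Routing table: (Network_Prefix/Subnet_Mask, Next_IP_Addr_User).
ROUTING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_address("192.168.1.2")),
    (ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_address("192.168.1.3")),
]

# Hypothetical User/Control Mapping table: Next_IP_Addr_User -> Next_IP_Addr_Ctrl.
# One control-plane address per next-hop switch, as assumed in the text.
USER_CONTROL_MAP = {
    ipaddress.ip_address("192.168.1.2"): ipaddress.ip_address("172.16.0.2"),
    ipaddress.ip_address("192.168.1.3"): ipaddress.ip_address("172.16.0.3"),
}


def lookup_next_ip_addr_user(dest_ip_addr):
    """Longest-prefix match on the Routing table; None if no route matches."""
    dest = ipaddress.ip_address(dest_ip_addr)
    best = None
    for network, next_hop in ROUTING_TABLE:
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else best[1]


def lookup_next_ip_addr_ctrl(next_ip_addr_user):
    """Map a user-plane next-hop address to its control-plane address."""
    return USER_CONTROL_MAP.get(next_ip_addr_user)


next_user = lookup_next_ip_addr_user("10.1.2.3")
print(next_user, lookup_next_ip_addr_ctrl(next_user))
```

Only a single route is returned, mirroring the design decision that the accelerator attempts one route and hands the message to software otherwise.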

State table

Table 12. State table.
Index (global connection reference): Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID
Return value, control plane: Pre_IP_Addr_Ctrl, LIH, Next_IP_Addr_Ctrl
Return value, data plane: Pre_IP_Addr_User, Pre_IF_ID_User, Incoming_Assigned_Timeslots, Outgoing_Assigned_Timeslots, Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Label(s), Next_IP_Addr_User, Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Label(s), Traffic_Spec, State

The State table, as shown in Table 12, is the most complex of all the data tables. It consists of two parts: the index part, which includes the Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, and Ext_Tunnel_ID and uniquely identifies an LSP, and the return value part, which consists of many parameters of the LSP.

4.3 Procedures

After an RSVP-TE signaling message is parsed, fields from within its objects are sent to the registers defined in Section 4.1. The parsed message is then processed according to its message type. In this process, the data tables defined in Section 4.2 are consulted and updated. In this section, we use the Path message as an example to detail the processing of an RSVP-TE signaling message. Inside the Path message, we focus on the processing of the SESSION object.

Fig. 28 shows the processing of the common header. After a message is completely received, the checksum and the protocol version are verified. If no error is detected, the fields in the common header are saved in registers. The message is then further processed according to the decoded message type. In this example, a Path message is detected

(Msg_Type=1).

Figure 28. Processing of Common header. (Flow: message arrives; calculate Local_Chksum; check that Local_Chksum equals the RSVP Checksum; check that Vers = 1; Msg_Type <- Msg Type, Msg_Len <- RSVP Length, Send_TTL <- (Send_TTL - 1); branch on Msg_Type to Path message, Resv message, PathTear message, ResvTear message, or other values.)

The body of a Path message consists of several self-contained objects. Fig. 29 shows the processing of the object header. The Class-Num, C-Type, and Length of the object are extracted. The object is further processed based on its type. In this example, it is a SESSION object (Class-Num=1).

Figure 29. Processing of Path message. (Flow: Path message; Obj_Class <- Class-Num, Obj_Type <- C-Type, Obj_Len <- Length; branch on Obj_Class to SESSION, RSVP_HOP, TIME_VALUES, SENDER_TEMPLATE, SENDER_TSPEC, LABEL_REQUEST, or other values.)
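The two flows above can be mirrored in software as a rough sketch. It assumes the RSVP common header layout of [16] (version and flags, message type, checksum, Send_TTL, a reserved byte, and length) and the usual one's-complement checksum computed with the checksum field set to zero; the function names and the returned dictionary are illustrative only.

```python
import struct

MSG_TYPES = {1: "Path", 2: "Resv", 5: "PathTear", 6: "ResvTear"}


def rsvp_checksum(data):
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_common_header(message):
    """Verify checksum and version, then fill the header registers.

    Returns None when the message must be handed to software.
    """
    vers_flags, msg_type, checksum, send_ttl, _reserved, length = \
        struct.unpack_from("!BBHBBH", message, 0)
    zeroed = message[:2] + b"\x00\x00" + message[4:]
    if rsvp_checksum(zeroed) != checksum:      # Local_Chksum mismatch
        return None
    if vers_flags >> 4 != 1:                   # Vers must be 1
        return None
    if msg_type not in MSG_TYPES:              # "other values" branch
        return None
    return {"Msg_Type": msg_type, "Msg_Len": length, "Send_TTL": send_ttl - 1}


def iter_objects(message):
    """Yield (Class-Num, C-Type, contents) for each object after the header."""
    offset = 8
    while offset + 4 <= len(message):
        length, class_num, c_type = struct.unpack_from("!HBB", message, offset)
        if length < 4 or length % 4 or offset + length > len(message):
            raise ValueError("malformed object length")
        yield class_num, c_type, message[offset + 4:offset + length]
        offset += length
```

A short usage example builds a Path message that carries only a SESSION object (C-Type 7) and walks through it:

```python
session = struct.pack("!HBB4sHHI", 16, 1, 7, bytes([10, 0, 0, 9]), 0, 5, 1)
body = struct.pack("!BBHBBH", 0x10, 1, 0, 64, 0, 8 + len(session)) + session
msg = body[:2] + struct.pack("!H", rsvp_checksum(body)) + body[4:]
print(parse_common_header(msg))
print([(c, t) for c, t, _ in iter_objects(msg)])
```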

Fig. 30 shows the processing of the SESSION object. First the C-Type is verified; only C-Type 7 (IPv4 tunnel) is supported in the subset we defined. Then the fields inside the object are extracted and sent to the corresponding registers. The next step is to look up the Routing table with the given destination IP address. If a match is found, the returned value (the next-hop IP address) is used to search the Outgoing Connectivity table to find the corresponding outgoing interface. Then the Outgoing CAC table is consulted to obtain the available bandwidth on that interface. With the bandwidth information fetched from the CAC table and the required traffic parameters extracted from the SENDER_TSPEC object, bandwidth (time-slot) reservation is attempted. If it succeeds, the reserved time-slots are marked, the Outgoing CAC table is updated, and a new SUGGESTED_LABEL object is created.
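The reservation step can be illustrated with a small bit-map routine. This is a sketch under stated assumptions, not the accelerator's logic: Avail_BW is modelled as a 12-bit integer in which bit i stands for time-slot i+1, a first-fit search is used, STS-3c alignment rules are ignored, and the example bit pattern is hypothetical (it is not the one in Fig. 27).

```python
NUM_SLOTS = 12  # all interfaces are assumed to be STS-12s


def find_contiguous(avail_bw, count):
    """Return a mask covering the first run of `count` available slots, or 0."""
    run = (1 << count) - 1
    for start in range(NUM_SLOTS - count + 1):
        mask = run << start
        if (avail_bw & mask) == mask:
            return mask
    return 0


def reserve(avail_bw, count):
    """Try to reserve `count` contiguous slots.

    Returns (updated Avail_BW, mask of reserved slots), or None on failure.
    Reserving means clearing the corresponding bits to 0.
    """
    mask = find_contiguous(avail_bw, count)
    if mask == 0:
        return None
    return avail_bw & ~mask, mask


# Eight slots available, but never three in a row.
avail_bw = 0b110110110110
print(reserve(avail_bw, 1))   # an STS-1 request succeeds
print(reserve(avail_bw, 3))   # an STS-3c request fails
```

The returned mask is what would be written back to the Outgoing CAC table and turned into the suggested label; on failure the message would be passed to the software signaling process.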

Figure 30. Processing of SESSION object. (Flow:
1. SESSION object: is the C-Type supported (7)? If not, stop.
2. Dest_IP_Addr <- IPv4 tunnel end point address; Tunnel_ID <- Tunnel ID; Ext_Tunnel_ID <- Extended Tunnel ID.
3. Look up F(State table, <Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID, Src_IP_Addr, LSP_ID>), with Src_IP_Addr and LSP_ID taken from SENDER_TEMPLATE; branch on yes/no.
4. Next_IP_Addr_User <- F(Routing table, Dest_IP_Addr); was a next hop found?
5. (Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_Timeslots) <- F(Outgoing Connectivity table, <Next_IP_Addr_User, Seq_Num>); was a match found?
6. Avail_BW <- F(Outgoing CAC table, Outgoing_PHY_ID).
7. Does Avail_BW satisfy <Signal_Type, MT, NCC> from SENDER_TSPEC?
8. Update Avail_BW (set the corresponding bits to 0's); update the Outgoing CAC table (write back Avail_BW).
9. Assign Suggested_Label; done.)

As mentioned in Chapter 1, setting up a connection involves five steps at each switch. The first three steps, i.e., determining the next-hop switch, checking for the availability of and reserving the required resources, and assigning labels for the connection, are accomplished when a Path message is processed.

Figure 31. At the end of Path message processing. (Flow: all objects done? If so, update the State table, assemble a new Path message, and transmit it to the next hop.)

After all objects are successfully processed, the State table is updated, and a new Path message is assembled and transmitted to the next-hop address (control plane), as shown in Fig. 31.
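The State table update can be pictured as a keyed store indexed by the five-tuple that identifies an LSP. The sketch below is only an analogy in software: the LspKey type, the dictionary-based table, the function name, and the example values are assumptions for illustration, and only a few of the return-value columns of Table 12 are shown.

```python
from typing import NamedTuple


class LspKey(NamedTuple):
    """Index part of the State table (global connection reference)."""
    src_ip_addr: str
    lsp_id: int
    dest_ip_addr: str
    tunnel_id: int
    ext_tunnel_id: int


state_table = {}


def update_state(key, **fields):
    """Create or update the State table entry for an LSP.

    Returns (is_new, entry); is_new is True when the LSP was not known before.
    """
    is_new = key not in state_table
    entry = state_table.setdefault(key, {})
    entry.update(fields)
    return is_new, entry


key = LspKey("10.0.0.1", 1, "10.0.0.9", 5, 1)
print(update_state(key, Next_IP_Addr_User="192.168.1.3",
                   Outgoing_PHY_ID=5, Outgoing_Labels=[0b111]))
print(update_state(key, State=1))
```

The second call finds the existing entry through the same five-tuple and adds to it, which is how later messages for the same LSP would locate the state written while the Path message was processed.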

Chapter 5 An FPGA-Based Hardware Signaling Accelerator

As mentioned before, complexity and the requirement for flexibility are two challenges for hardware implementations of signaling protocols. In Chapters 3 and 4, we defined a subset of RSVP-TE for hardware acceleration, with the part beyond this subset relegated to software. This approach partly addresses the challenge of complexity. In order to address the challenge of flexibility, we propose to use re-configurable hardware, i.e., FPGAs, as the hardware platform for our RSVP-TE implementation. These devices are a compromise between general-purpose processors used in software implementations at one end of the flexibility-performance spectrum, and ASICs at the opposite end of this spectrum. The latest FPGA devices, such as the Xilinx Virtex-II [41][42], even support run-time partial re-configuration [43]. We can re-configure FPGAs with updated versions of implementations as signaling protocols evolve, while significantly improving the performance relative to software implementations on general-purpose processors.

Based on the subset defined in Chapter 3 and the detailed descriptions given in Chapter 4, we discuss an FPGA-based hardware signaling accelerator in this chapter. In Section 5.1 we give an architecture-level view of the hardware signaling accelerator. We then detail the various functional modules in Section 5.2. We present the implementation and simulation results in Section 5.3. Finally, we discuss the issue of re-configurability in Section 5.4.

5.1 The architecture of the hardware signaling accelerator

Fig. 32 illustrates the architecture of the hardware signaling accelerator. It consists of three stages: message parsing, message processing, and message assembling. In the message parsing stage, signaling messages are buffered in the incoming message buffer for checksum verification. Meanwhile, message objects are checked and fields are dispatched to different registers in the register bank. In the message processing stage, message objects are processed in parallel. Finally, the appropriate objects are re-assembled into a new message in the message assembling stage. The new message is then sent to the next switch. These three stages are fully pipelined to achieve high throughput.

Figure 32. Architecture of the hardware signaling accelerator. (Blocks: FIFO Interface, Cross-connect Interface, and GbE Interface; Message Parsing with Incoming Msg Buf and Object Dispatcher; Message Processing with Resource Management, CAC Table, Data Table Management, and Retransmission Management; Message Assembling with Message Assembler; Register Bank; TCAM & SRAM Interfaces; SRAM I/F; PCI Local Bus Interface.)

5.2 Functional modules

In this section, we discuss the functional modules illustrated in Fig. 32.

Register bank

At the center of Fig. 32 we show a register bank. In Chapter 4, we described the registers that are necessary for our RSVP-TE implementation. Since message parsing, message processing, and message assembling are fully pipelined, each stage must be equipped with

a set of registers. These three sets of registers form the register bank.

Figure 33. Dynamic round-robin style pipelining. (a) Regular pipelining. (b) Round-robin style pipelining. (Blocks: Register Bank with Reg. Set 1, Reg. Set 2, and Reg. Set 3; data flow arbitrator; Message Dispatching; Message Processing; Message Assembling.)

Unlike a regular pipelining scheme, where each stage is attached to a fixed set of registers, we propose a dynamic round-robin style pipelining scheme, as shown in Fig. 33. In this scheme, a message, instead of a pipelining stage, is associated with a set of registers. Each register set has 3 flag bits indicating the pipelining stage of the register set and the associated signaling message. Similarly, each pipelining stage has 3 flag bits indicating the register set associated with the message it is currently handling. When a message enters the next pipelining stage, the associated flag bits change in a round-robin style. The flag bits for each pipelining stage change in a similar round-robin style, but in the reverse direction. For example, in Fig. 33, register sets 1, 2, and 3 are associated with three signaling messages, which are in the message assembling, message parsing, and message processing stages respectively. When all stages are done with their current messages, the register sets enter the next stage, and all flag bits are then updated.

All flag bits are connected to a data flow arbitrator, which controls the bidirectional data flows between the register bank and the three message handling units. Inside the data flow arbitrator, there are 3 flag bits indicating the status of the three pipelining stages. The round-robin style flag bits rotate only when all three stages are ready.

This scheme avoids data transfers between stages. It thus helps to reduce processing delay and to increase throughput. The cost of this scheme is the extra arbitration logic and flag bits.

Input message buffering system

Figure 34. Two-level input message buffering. (Blocks: 64K x 32 external message buffer (FIFO); inside the XC2V3000 FPGA, a 256 x 32 internal message buffer (dual-port RAM), the message buffer management module, and the object dispatcher, which receives arriving messages.)

In Fig. 32, parallel to the object dispatcher is an incoming message buffer. When a message is received and dispatched to registers, it is simultaneously sent to this buffer, which is a dual-port RAM inside the FPGA. The message is held in the buffer until it is successfully processed, at which point it is flushed out. If any error occurs in the message parsing or processing stage, such as the presence of an unknown message, object, or field, a routing table look-up failure, or a resource reservation failure, the message is moved to an external FIFO. The internal dual-port RAM, the external FIFO, and the related message buffer

management module (also inside the FPGA) constitute a two-level message buffering system, as shown in Fig. 34. The internal dual-port RAM buffers the messages in the dispatching and processing stages, while the external FIFO stores the messages that cannot be handled by the hardware signaling accelerator and need software intervention. The external FIFO, which we will discuss in Chapter 6, works as an interface between the hardware signaling accelerator and the software signaling process.

Figure 35. Buffer management module. (The buffer is 4 bytes wide and is divided into segments 0 to 3, each with Error, S2, S1, and Free flag bits.)

The internal dual-port RAM makes use of the BlockRAM resource in the Virtex-II FPGA device. The memory space is equally partitioned into four segments. Each segment is 256 bytes (64 x 32 bits), large enough to contain a typical RSVP-TE message. Although at most two messages are in the parsing and processing stages at any given instant, up to two error messages could also be waiting to be transferred to the external FIFO. Therefore the internal message buffer has four segments. The messages in the buffer are aligned to 256-byte segments instead of being stored contiguously. This approach simplifies buffer management by avoiding buffer fragmentation. Each segment can be in one of four possible states: Free, in the parsing stage (S1), in the processing stage (S2), or Error. Correspondingly, we assign four flag bits to each segment, as

shown in Fig. 35. Although the flag bits are physically associated with each buffer segment, the Free bits and the Error bits from all segments are logically organized into two separate queues. The Free segment queue indicates which segment is available to accommodate a new message. The Error segment queue indicates the segments waiting to be moved to the external FIFO. The queue pointers advance in round-robin style. The message buffer management module maintains the flag bits and queues, and controls the read and write operations on the internal dual-port RAM and the write operations to the external FIFO.

Two-level object dispatching

Figure 36. TLV style object processing. (Labels: object dispatcher; SESSION field dispatcher; RSVP_HOP field dispatcher; unknown object processor; two levels of dispatchers; distributed decoding.)

The TLV structure offers the flexibility required to introduce new objects as needed, which is essential to protocols that keep evolving, like RSVP-TE. However, it complicates the message parsing needed to extract parameters, making that parsing difficult to implement in hardware. For example, when processing an IP packet header, the hardware can always extract the 5th word from the IP header to obtain the destination IP address. But in RSVP-TE, since the SESSION object carrying the destination IP address can occur anywhere in a

message, the hardware cannot extract the destination IP address from a fixed location.

To address the challenge imposed by the flexible TLV structure, we propose a solution with two-level dispatching and distributed decoding. Fig. 36 illustrates this solution. It is the RTL circuit generated by the Synplify synthesis tool from a VHDL model of our solution. There are two levels of dispatchers: an object dispatcher and nine field dispatchers, one for each object type. The object dispatcher mainly consists of two registers and two counters. The registers store the lengths of the message and of the object being read. The two counters count the number of words received in an incoming message, one for the whole message and the other for the current object. The message length counter and the message length register are used to delimit a message. Similarly, objects are delimited according to the object length register and the object length counter. The delimited object is sent to all field dispatchers, but only the field dispatcher matching the object type is triggered (distributed decoding). The fields in the matched object are then dispatched into the corresponding registers. The unknown object processor captures all unsupported objects. If a message contains such an object (i.e., one outside the set of objects defined in the RSVP-TE subset), the message should be passed to the software signaling process.

Data table management

The processing of an RSVP-TE signaling message involves accessing multiple data tables. Among the six data tables we defined in Chapter 4, the Outgoing CAC table, which we will discuss in the next section, is located entirely inside the FPGA. All other data tables, including the Routing table, Incoming Connectivity table, Outgoing Connectivity table, User/Control Mapping table, and State table, reside in an external Ternary Content Addressable Memory (TCAM) device. Some data tables, such as the Routing table, User/

Control Mapping table, and State table, have extra data stored in an external Static RAM (SRAM) device associated with the TCAM. The organization of these data tables inside the TCAM and SRAM devices will be discussed in Chapter 6. Multiple data tables share the same TCAM and SRAM devices, and in the message processing stage different objects are processed concurrently and may request simultaneous access to different data tables. A data table management module is therefore designed to arbitrate and sequence the requests for the different data tables.

Figure 37. Dependencies among the data tables. (Nodes: State table (Read); Routing table; Incoming Connectivity table; User/Control Mapping table; Outgoing Connectivity table; State table (Write).)

The data table management module in the FPGA (see Fig. 32) implements a priority arbitrator. We follow three guidelines to set the priority. First, if the look-up of a data table results in data that is used for look-ups in other data tables, the former has higher priority. Fig. 37 shows the dependencies among the data tables. From Fig. 37 we can conclude that the Routing table has higher priority than the User/Control Mapping table and the Outgoing Connectivity table. Second, if the look-up of one data table has follow-up look-ups while that of another data table does not, the former has higher priority. For this reason, the Routing table has higher priority than the Incoming Connectivity table, whose look-up involves no other table before the State table is finally updated. Third, if a data table has more follow-up operations, it has higher priority. For this reason, the Outgoing Connectivity table has higher priority than the User/Control Mapping table. There is no further operation after the access of the User/Control Mapping table, while the time-consuming resource-reservation operation follows the access of the Outgoing Connectivity table.

Figure 38. Priority arbitrator for TCAM. (Request/grant pairs, from high to low priority: State table, Routing table, Outgoing Connectivity table, Incoming Connectivity table, User/Control Mapping table. Outputs to the TCAM: TCAM_Req and Table_Idx.)

Table index
State table                  000
Routing table                010
Outgoing Conn table          011
Incoming Conn table          100
User/Ctrl Mapping table      101

Fig. 38 shows the priority arbitrator for the TCAM device. It has two interfaces. On the left side, it interfaces to the other parts of the hardware signaling accelerator, accepting requests and sending out grant signals. Based on the priority, only one of the requests can be satisfied at any moment. On the right side, it interfaces to the TCAM device. If there is any request for a data table, TCAM_Req is active. Since multiple data tables reside in the same TCAM device, and a TCAM device is typically partitioned into segments, the data tables fit naturally into different segments. The table index indicates the segment of the TCAM in which each data table resides; it does not necessarily reflect the priority. The TCAM interface is very complex, and we only describe the signals related to the priority arbitrator in Fig. 38.

Resource (bandwidth) management

Signaling protocols are used in connection-oriented networks to set up and tear down connections. As mentioned in Chapter 1, five steps are involved at each switch to set up a connection. After the first step, i.e., route look-up, a switch checks the availability of resources (bandwidth and optionally buffer space) on the determined output interface. If

resources are available, the switch reserves the required resources. When tearing down a connection, the switch releases the allocated resources.

Figure 39. Resource management module. (A CAC table feeding a bandwidth allocator with STS-1, STS-3c, and STS-12c levels linked by AND gates.)

The hardware signaling accelerator targets SONET switches, in which case the resources are the time-slots on the output interface. Fig. 39 shows the resource management module implemented as part of the hardware signaling accelerator. It consists of a CAC table and a bandwidth allocator. The CAC table maintains the available time-slots on each output interface. Our target switch fabric has 64 output interfaces (see Chapter 6), each with 12 time-slots (STS-12 interface). Therefore the CAC table has 64 entries, each with a bit-vector of 12 bits. A bit value of 1 indicates that the corresponding time-slot is available; a value of 0 indicates that it is not. The CAC table is small enough to fit into on-chip memory in the FPGA. This is also desirable because the CAC table is tightly coupled with the bandwidth allocator: each request accesses the CAC table twice. The bit-vector is first read out from the CAC table and, after the resource reservation or release, the updated bit-vector is written back to the CAC table.

A hierarchical bandwidth allocator, which consists of an STS-1 designator, an STS-3c designator, and an STS-12c designator, is designed because of the hierarchy and concatenation features of SONET signals. There are four 3-input AND gates between the STS-1 designator and the STS-3c designator, and one 4-input AND gate between the STS-3c designator and the STS-12c designator. A priority decoder (MSB has the highest priority) is used to select

time-slots as per request. For example, if there is a request for 3 STS-1s on output interface 25, we first get the entry corresponding to output interface 25 in the CAC table. The entry is then copied to the bandwidth allocator. The designator corresponding to STS-1 is checked and three time-slots are reserved, as shown in Fig. 40a. If the request is for an STS-3c, the STS-3c designator is checked and the time-slots are reserved. If an STS-12c is requested, the STS-12c designator determines that this request cannot be satisfied. After resource reservation, the updated bit-vector is written back to the CAC table.

Figure 40. Comparison of resource allocation schemes. (a) Priority-decoder based resource allocator. (b) Round-robin style resource allocator. (Each panel shows the STS-12c, STS-3c, and STS-1 bit-vectors, MSB to LSB, before and after reservation.)

A priority decoder gives the best chance that the contiguous concatenation at a higher level of the hierarchy is not broken by allocating signals at a lower level. For example, in Fig. 40b, an allocation scheme other than priority decoding is used, resulting in no availability at the STS-3c level. A new request for an STS-3c cannot be fulfilled, even though enough time-slots are available.

Re-transmission management

Timers are required to support the solution proposed in RFC 2961 [40] for reliable message transmission. As discussed in Chapter 3, this solution requires every message to carry a MESSAGE_ID object, which is acknowledged by a MESSAGE_ID_ACK object carried in an Ack message or piggybacked in a message in the reverse direction. If the MESSAGE_ID_ACK is not received before a re-transmission timer times out at the sender, the message is re-transmitted. A second timer, which we call the piggyback timer, is used to

hold MESSAGE_ID_ACK objects awaiting a message to be sent in the reverse direction to avoid unnecessary Ack messages. An Ack message is generated only if this timer expires.

Figure 41. Re-transmission management (buffers and timers). (Transmitting side: FIFO buffers of messages with time tags, from the unacknowledged message buffer to the buffer for the 2nd re-transmission, with timers $timer_0$ to $timer_2$. Receiving side: FIFO buffers of MSG_ID_ACKs with time tags for neighbors 0 to k, with piggyback timers $p_0$ to $p_k$.)

Fig. 41 illustrates the proposed re-transmission management scheme. The hardware signaling accelerator maintains a system timer, $t_{SYS}$, which provides system timing for all other timers. On the transmitting side, when a signaling message is sent out, the message, together with a time tag marking the transmitting time (the $t_{SYS}$ value at the moment the message is transmitted), is also copied to the unacknowledged message buffer. The buffer is composed of equal-sized blocks, one block per message, and is organized as a queue (first in, first out). Therefore the head of the queue always contains the oldest message. The time tag of the head message is copied to the re-transmission timer, $timer_0$. Assuming the initial time-out value is $TO$ (this value is kept in a register and set at initialization time), when $t_{SYS} \geq timer_0 + TO$, the re-transmission timer times out, and the message is re-transmitted and copied to the buffer for the first re-transmission. The first re-transmission buffer is organized in a similar way, with a re-transmission timer, $timer_1$, and a time-out value of $2TO$ (exponential back-off). In accordance with [40], we support at most three re-transmissions.

On the receiving side, each received message must be acknowledged with a

MESSAGE_ID_ACK object. The acknowledgment can be delayed (bounded by the piggyback time-out value) so that a MESSAGE_ID_ACK object can be piggybacked in a message in the reverse direction, or multiple MESSAGE_ID_ACKs can be packed into one Ack message. For this purpose, one buffer is allocated per neighbor. Each entry in buffer $i$ is a MESSAGE_ID_ACK object destined for neighbor $i$. Associated with each entry is a time tag indicating the time at which the message carrying the corresponding MESSAGE_ID was received. Entries in each buffer are time ordered, given the FIFO nature of these buffers. For each neighbor $i$, we maintain a piggyback timer, $p_i$, which keeps the time tag value of the head entry. Assuming the piggyback time-out value is $PTO$ (this value is kept in a register and set at initialization time), when $t_{SYS} \geq p_i + PTO$, the piggyback timer times out. If this happens or the buffer is full, an Ack message is generated to carry all MESSAGE_ID_ACKs in the buffer. After these MESSAGE_ID_ACKs are flushed from the buffer, the piggyback timer $p_i$ is reset to the time tag value of the next available head entry. If a message destined to neighbor $i$ is sent before $p_i$ times out, the outstanding MESSAGE_ID_ACKs for this neighbor can be piggybacked and sent with the message (within, of course, the maximum message length limit).

The re-transmission timers and piggyback timers work in a similar way, but there are important differences. There are three re-transmission timers (and associated queues) in the system, each corresponding to one re-transmission attempt, and the time-out value doubles with each attempt. The number of piggyback timers, on the other hand, depends on the number of neighbors: each neighbor has a separate piggyback timer and queue, and all piggyback timers use the same time-out value.
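The transmit-side behavior described above can be sketched in software. The following Python model is only an illustration, not the VHDL design: the class and method names are hypothetical, time is an integer tick count standing in for the system timer, and dropping a message after its third re-transmission is an assumption, since the text does not state what happens next. Removing an acknowledged message from the middle of a queue is likewise a simplification.

```python
from collections import deque


class RetransmissionManager:
    """Toy model: one FIFO queue per re-transmission attempt, doubling time-outs."""

    def __init__(self, initial_timeout, levels=3):
        # Time-outs TO, 2*TO, 4*TO for the three queues (exponential back-off).
        self.timeouts = [initial_timeout * (2 ** k) for k in range(levels)]
        self.queues = [deque() for _ in range(levels)]

    def send(self, msg_id, t_sys):
        # A newly sent message is stored with its time tag in queue 0.
        self.queues[0].append((msg_id, t_sys))

    def acknowledge(self, msg_id):
        # An acknowledged message no longer needs re-transmission.
        self.queues = [
            deque(entry for entry in q if entry[0] != msg_id) for q in self.queues
        ]

    def tick(self, t_sys):
        """Return (msg_id, attempt) pairs re-transmitted at time t_sys."""
        retransmitted = []
        # Visit the last queue first so a message advances one level per call.
        for k in reversed(range(len(self.queues))):
            q = self.queues[k]
            # Only the head (oldest entry) needs to be compared with the timer.
            while q and t_sys >= q[0][1] + self.timeouts[k]:
                msg_id, _ = q.popleft()
                retransmitted.append((msg_id, k + 1))
                if k + 1 < len(self.queues):
                    self.queues[k + 1].append((msg_id, t_sys))
                # Assumption: after the last attempt the message is dropped.
        return retransmitted
```

With an initial time-out of 10 ticks, a message sent at tick 0 is re-transmitted at ticks 10, 30, and 70, which reflects the TO, 2TO, 4TO spacing. Because each queue is time ordered, comparing only the head entry against the system time is sufficient, which is the property the hardware scheme relies on.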

Implementation and simulation results

As a proof of concept, we developed a prototype VHDL model of the hardware signaling accelerator, using Synplify for synthesis and Xilinx ISE for placement and routing. The implementation uses 12% of the FPGA resources (Xilinx XC2V3000) without the PCI core, which we plan to include on the FPGA. Table 13 shows the detailed implementation results.

Table 13. Implementation results.

Device      PCI core   Resource   Eq. gates   Max freq.
XC2V3000    w/o PCI    12%        360,000     90 MHz
XC2V3000    w/ PCI     21%        630,000     50 MHz

We performed timing simulations of the hardware signaling accelerator using the ModelSim simulator. Fig. 42 to Fig. 44 show the simulation results for the processing of Path, Resv, and PathTear messages, respectively. Processing the Path message, which involves accessing and updating the data tables, takes 37 clock cycles. The time to receive a Path message (a message is parsed while it is being received) is 40 clock cycles, as is the time to transmit the outgoing Path message (a message is transmitted while it is being assembled). Receiving, processing, and transmitting all other messages each take no more than 40 clock cycles. A detailed breakdown of the typical processing time for each message is shown in Table 14.

Table 14. Clock cycles to receive an RSVP-TE message. (Columns: Path; Resv; PathTear/ResvTear.)

Because the parsing, processing, and assembling stages are fully pipelined, idle cycles are inserted if a stage is not ready. As a worst-case estimate, the total time for parsing,

processing, and assembling a single message consumes 120 clock cycles, which is 2.4 microseconds with a 50 MHz clock. Since connection setup/release requires the handling of three signaling messages, we require a total of 7.2 microseconds per call. The call handling capacity is as high as 400,000 calls/sec because of pipelining, which allows the system to accept a new message every 40 clock cycles.

Figure 42. Processing of Path message. (Traces: CAC table, TCAM interface, SRAM interface. Events: previous stage ready; Routing table; Incoming Conn table; Outgoing Conn table; U2C Mapping table; State table; end of processing.)

Figure 43. Processing of Resv message. (Traces: switch fabric interface, TCAM interface, SRAM interface. Events: previous stage ready; read State table; program switch fabric; end of processing.)
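The timing figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The constant and function names are illustrative only; the values (a 50 MHz clock, 40 cycles per pipeline stage, three stages, three messages per call) are those given in the text.

```python
CLOCK_HZ = 50_000_000      # 50 MHz clock (design with PCI core)
CYCLES_PER_STAGE = 40      # worst-case cycles for parsing, processing, or assembling
STAGES = 3                 # parsing, processing, assembling
MESSAGES_PER_CALL = 3      # signaling messages handled per connection setup/release


def cycles_to_microseconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


# Latency: one message passes through all three stages.
per_message_us = cycles_to_microseconds(CYCLES_PER_STAGE * STAGES)
per_call_us = per_message_us * MESSAGES_PER_CALL

# Throughput: pipelining admits a new message every CYCLES_PER_STAGE cycles.
messages_per_second = CLOCK_HZ / CYCLES_PER_STAGE
calls_per_second = messages_per_second / MESSAGES_PER_CALL

print(f"{per_message_us:.1f} us per message, {per_call_us:.1f} us per call")
print(f"{calls_per_second:,.0f} calls/sec")
```

The computed throughput is about 416,667 calls/sec, which is consistent with the rounded figure of 400,000 calls/sec. Latency is set by the sum of the three stages, whereas throughput is set by a single stage, which is why pipelining raises the call handling capacity without shortening the per-call delay.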


Figure 44. Processing of PathTear message. (Traces: CAC table, TCAM interface, SRAM interface. Events: previous stage ready; read State table; release allocated timeslot; end of processing.)

5.4 Re-configurability

We chose an FPGA as the hardware platform for the signaling accelerator because of its re-configurability. The FPGA device we use, the Xilinx Virtex-II, supports runtime partial re-configuration: we can re-configure it while it is working in the field, and we can selectively re-configure part of the device while keeping the rest unchanged. For example, our hardware signaling accelerator targets SONET switches. Specifically, it is designed for a Vitesse 64x64 switch fabric, with STS-12 interfaces and an STS-1 cross-connect rate. Accordingly, the Incoming/Outgoing Connectivity tables each have 64 entries, corresponding to the 64 interfaces. The CAC table also has 64 entries for the 64 outgoing interfaces, each with 12 bits for the 12 time-slots in an STS-12 signal. The bandwidth allocator consists of a 3-level hierarchy, with the lowest level corresponding to STS-1 (the cross-connect rate) and the highest level corresponding to STS-12c (the maximum interface rate). If we are going to support a 512x512 switch fabric with OC-192 interfaces and an OC-3c cross-connect rate (such as the Sycamore SN16000 [44]), we need to expand the Incoming/Outgoing Connectivity tables and the CAC table to 512 entries. In the CAC table, each entry has 64 bits,
