Ultra-Low Latency, Bit-Parallel Message Exchange in Optical Packet Switched Interconnection Networks

O. Liboiron-Ladouceur 1, C. Gray 2, D. Keezer 2 and K. Bergman 1
1 Department of Electrical Engineering, Columbia University, 530 West 120th Street, New York, New York 10027
2 School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250

Introduction
In large-scale high-performance computing systems with physically separated processing and memory elements, one of the most critical challenges is achieving low-latency data exchange among potentially thousands of terminals. Inserting an optical interconnection network provides ultra-high capacity communications as well as the opportunity to reduce latency by using the wavelength domain for bit-parallel message transmission. Bit-parallel transmission also eliminates the delay associated with the parallel-to-serial and serial-to-parallel conversions necessary in current interconnect schemes [1]. In this poster, we demonstrate the routing of WDM bit-parallel messages through a complete 12×12 optical packet switching (OPS) fabric. The OPS network, designed for ultra-high capacity processor-memory interconnection, is a time-slotted self-routing deflection architecture based on the data vortex topology [2]. In this highly scalable OPS network, congestion is resolved locally through distributed traffic control signaling, eliminating the need for internal optical buffering elements and enabling the transparent transmission of hybrid data structures such as WDM bit-parallel messages.

System overview
Two ports (source and destination) are emulated by a CMOS FPGA-based digital logic core (DLC) terminal designed to interface directly with the optical network's physical layer signaling protocol.
Providing a stand-alone, programmable and flexible bit-parallel message processor, the terminal emulator maps generated data bytes onto sequences of 32 optical packets, each containing four-wavelength WDM bit-parallel messages of 400 ps duration, along with lower-bit-rate header and frame signals for proper routing by the data vortex switching nodes. The PECL-level electrical signals are precisely aligned in time with a 1.25 GHz source-synchronous global reference clock, which is added to the messages for data sampling and processing at the destination node. The signal conversion interface between the terminal emulator and the OPS network is transparent, fully exploiting the available transmission bandwidth while delivering ultra-low latencies in a WDM bit-parallel format. The bit-parallel word channels and the global reference clock channel are encoded using LiNbO3 modulators. The header and frame control signals are encoded by directly modulating cooled WDM distributed-feedback lasers. All channels are multiplexed onto one fiber to form an optical packet, with guard time to allow for routing transients and dead time to distinguish each packet. At the destination node, the bit-parallel channels and the global reference clock are demultiplexed and converted to PECL-level electrical signals using commercial optical receiver modules. The terminal emulator uses the global reference clock and the frame signals to process the packet and measure bit error rates.

Results and conclusions
Routing of WDM bit-parallel messages that emulate processor/memory data exchange is demonstrated through the data vortex interconnection network. The high-speed digital terminal emulator, designed specifically to interface directly with the optical network, transmits and captures a sequence of 32 messages, each consisting of four bit-parallel 2.5 Gbps data wavelength channels, a 1.25 GHz reference clock signal, and five lower-bit-rate routing control signals.
The interconnection system can potentially reduce processor-to-memory access times in large-scale computing systems by exploiting the high degree of parallelism afforded by WDM.

References
1. M.L. Loeb and G.R. Stilwell, High-speed data transmission on an optical fiber using a byte-wide WDM system, J. Lightwave Technol., vol. 6, pp. 1306-11, Aug. 1988.
2. B.A. Small, O. Liboiron-Ladouceur, A. Shacham, J.P. Mack, K. Bergman, Demonstration of Complete 12-Port Terabit Capacity Optical Packet Switching Fabric, OFC 2005, Mar. 2005.

Low Latency Message Exchange
[Figure: block diagram. Labeled elements: FPGA-based digital logic core; bit-parallel E/O interface; bit-parallel O/E interface; routing signal E/O interface; MUX; DEMUX; OPS network; I/O Port A; I/O Port B. Inset of wavelength (λ) versus time (t) showing the channels GLOBAL CLOCK, Bit_Parallel[0], Bit_Parallel[1], Bit_Parallel[2], Bit_Parallel[3], FRAME and ROUTING ADDR. [0..3].]

Physical Layer Interconnection Protocol
[Figure: timing diagram of one packet slot, showing the frame/header, global clock and payload data (4 bits wide) signals, with bit positions marked 0 to 32.]
Packet slot time: 64 x 400 ps = 25.6 ns
Clock/data window: 46 x 400 ps = 18.4 ns
Valid data: 32 x 400 ps = 12.8 ns
Pre-clocks and post-clocks: global clock periods before and after the valid data
Guard time: 2.0 ns (marked twice in the diagram)
Dead time: 3.2 ns
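
The slot figures above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not part of the poster: it assumes that the two 2.0 ns guard times and the 3.2 ns dead time lie outside the 46-period clock/data window, and that the pre-clocks and post-clocks make up the part of that window not occupied by valid data. The function names and the derived throughput figure are introduced here for illustration.

```python
# Timing budget for one packet slot, in integer picoseconds.
# Values are taken from the protocol panel; NUM_GUARDS = 2 is an assumption
# based on the guard time being marked twice in the diagram.
BIT_PERIOD_PS = 400                 # one bit-parallel word at 2.5 Gbps
SLOT_PS = 64 * BIT_PERIOD_PS        # packet slot time, 25.6 ns
WINDOW_PS = 46 * BIT_PERIOD_PS      # clock/data window, 18.4 ns
VALID_PS = 32 * BIT_PERIOD_PS       # valid data, 12.8 ns
GUARD_PS = 2000                     # one guard time, 2.0 ns
DEAD_PS = 3200                      # dead time, 3.2 ns
NUM_GUARDS = 2                      # assumption, see above


def slot_budget():
    """Break the slot into window, overhead, and anything left over."""
    overhead_ps = NUM_GUARDS * GUARD_PS + DEAD_PS
    return {
        "slot_ps": SLOT_PS,
        "window_ps": WINDOW_PS,
        "overhead_ps": overhead_ps,
        # Zero here means window + guards + dead time fill the slot exactly.
        "unaccounted_ps": SLOT_PS - WINDOW_PS - overhead_ps,
        # Clock periods inside the window that carry no valid data
        # (assumed to be the pre-clocks plus post-clocks).
        "clock_only_periods": (WINDOW_PS - VALID_PS) // BIT_PERIOD_PS,
    }


def payload_throughput_gbps(channels=4, words=32):
    """Payload bits delivered per slot, averaged over the full slot time."""
    payload_bits = channels * words
    return payload_bits * 1000 / SLOT_PS   # bits per ns equals Gbps


if __name__ == "__main__":
    print(slot_budget())
    print(payload_throughput_gbps())
```

Under these assumptions the window, guard times and dead time add up to exactly 25.6 ns, and the 128 payload bits per slot average to half of the 4 x 2.5 Gbps peak line rate.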

Synchronous WDM Bit-Parallel Word
32 four-wavelength WDM bit-parallel messages
Low bit skew channels
Clock synchronous data

Packet length: 32 bits
4 bit-parallel signals: 2.5 Gbps
1 global clock signal: 1.25 GHz
1 frame signal: 78.125 Mbps
4 routing signals: 78.125 Mbps
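
To make the bit-parallel word format concrete, the Python sketch below spreads 16 data bytes over four channels of 32 bit periods each and recovers them again. The poster does not specify the byte-to-wavelength mapping, so the ordering used here (high nibble first, bit i of each nibble on channel Bit_Parallel[i]) is purely an assumption, and the names pack and unpack are hypothetical.

```python
WORDS_PER_PACKET = 32   # bit-parallel words per packet
CHANNELS = 4            # Bit_Parallel[0..3], one wavelength each


def pack(data):
    """Spread 16 bytes over CHANNELS bit streams of WORDS_PER_PACKET bits.

    Assumed mapping (not from the poster): each byte becomes two 4-bit
    words, high nibble first; bit i of a word goes to channel i.
    """
    if len(data) * 8 != WORDS_PER_PACKET * CHANNELS:
        raise ValueError("a packet carries exactly 16 bytes")
    channels = [[] for _ in range(CHANNELS)]
    for byte in data:
        for nibble in (byte >> 4, byte & 0xF):
            for ch in range(CHANNELS):
                channels[ch].append((nibble >> ch) & 1)
    return channels


def unpack(channels):
    """Inverse of pack: rebuild the 16 bytes from the four bit streams."""
    nibbles = [
        sum(channels[ch][t] << ch for ch in range(CHANNELS))
        for t in range(WORDS_PER_PACKET)
    ]
    return bytes(
        (nibbles[i] << 4) | nibbles[i + 1]
        for i in range(0, WORDS_PER_PACKET, 2)
    )


if __name__ == "__main__":
    payload = bytes(range(16))
    streams = pack(payload)
    assert unpack(streams) == payload
    print([len(s) for s in streams])
```

Because every channel carries one bit of each word in the same 400 ps period, the destination can sample all four with the shared global clock and no serial-to-parallel conversion is needed, which is the latency argument made in the introduction.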

Power Effective Interconnection Design
Bit-Parallel[0..3] transmission: external optical modulation (25/7 W)
Routing signals transmission: direct modulation of DFB (3.4 W)
Bit-Parallel[0..3] capture: bursty mode receivers (2 W)
Digital logic core: FPGA-based (30 W)