Ultra Low Latency Optical Networks


O. Liboiron-Ladouceur, B. A. Small, W. Lu, and K. Bergman
Department of Electrical Engineering, Columbia University, 500 West 120th Street, New York, NY 10027

J. S. Davis, C. Hawkins, J. Park, D. S. Wills, D. C. Keezer, and K. P. Martin
School of Electrical and Computer Engineering and Microelectronics Research Center, Georgia Tech, Atlanta, GA 30332

G. D. Hughes
Laboratory for Physical Sciences, College Park, MD 20742

Abstract

In many supercomputing applications, high data reference locality (HDL) allows hardware and software designers to reduce the impact of long data access latency through caching and migration techniques. Other applications (e.g., cryptography, data mining) exhibit low data reference locality (LDL), forcing system designers to pursue minimum data access latency using non-traditional techniques. This paper presents research on an ultra low latency optical network whose delay is on the order of the minimum time-of-flight limit on data access latency. Employed techniques include a non-blocking, bufferless topology; single pulse packets incorporating wavelength division multiplexing (WDM) for both routing header and payload data; and ultra-fast electro-optical interfaces built from commercially available components. Early results from simulations and prototype system components show concept feasibility and the potential to impact both LDL and HDL supercomputer applications of the future.

1. Introduction

Supercomputer designs have always recognized the importance of interconnect delay. The Cray-1 was built in a cylinder so that cross-machine wiring would require minimal length. As the physical scale of supercomputers has grown from less than a meter to tens of meters (the NEC Earth Simulator is 50 meters in diameter), techniques to minimize interconnect latency have been sought to reduce intrinsic communication cost and maximize opportunities to exploit parallelism. When application access patterns support explicit or automatic identification of high data reference locality (HDL), a range of techniques can be employed to reduce the performance impact of communication latency. Caching and migration techniques can reposition data to a node or cluster where access latency is small. However, these methods are ineffective when applications exhibit low data reference locality (LDL), since no effective data repositioning is possible; efforts to reposition data often result in increased access latency. Such applications are common in cryptography and data mining, where searches rely on indirect data accesses. To effectively execute LDL applications, there is no substitute for low latency memory access. Given the importance of low access latency, the interconnection network for a physically large system must be rethought. Even the best supercomputers today have access latencies dominated by buffer, routing, and interface delay rather than unavoidable time of flight. In the Earth Simulator, for instance, the internode MPI communication latency is 8.6 µs [1], whereas the processor cycle time is 2 ns. This 4,300x differential significantly limits how parallel computation can be exploited. Light in a vacuum can cross the 50 m diameter of the Earth Simulator processor core in 166 ns (83x the cycle time). While this fundamental limit is impossible to reach, the roughly 50-fold improvement in communication latency (8.6 µs versus 166 ns) would substantially improve performance, especially in LDL applications.
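As a quick check of these figures, the sketch below reproduces the time-of-flight and cycle-count arithmetic; all inputs are the numbers quoted above (50 m core diameter, 2 ns cycle time, 8.6 µs measured MPI latency).

```python
# Reproduces the arithmetic quoted above; all inputs are figures from the text.
C_VACUUM = 299_792_458.0        # speed of light in vacuum, m/s

diameter_m = 50.0               # Earth Simulator processor core diameter
cycle_s = 2e-9                  # processor cycle time
mpi_latency_s = 8.6e-6          # measured internode MPI latency

crossing_s = diameter_m / C_VACUUM
print(f"one-way crossing: {crossing_s * 1e9:.1f} ns "
      f"({crossing_s / cycle_s:.0f}x the cycle time)")                # ~166.8 ns, 83x
print(f"MPI latency: {mpi_latency_s / cycle_s:.0f}x the cycle time")  # 4300x
print(f"gap to time of flight: {mpi_latency_s / crossing_s:.0f}x")    # ~52x
```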
In general, a large-scale supercomputer can be modeled as a circular network core surrounded by processing and memory nodes, as shown in Figure 1. A limit on communications latency is one round trip from a processor node to a memory node and back. This covers a distance of roughly four times the network physical radius (4l), yielding a minimum delay of 4l/c.

Figure 1: Modeling speed-of-light limits on data access latency.

While chording paths could reduce the traveled length, they significantly complicate the routing mechanism. The goal of this research is to approach this fundamental limit by exploring a feasible interconnection network (topology, routing, medium, packet format, physical switching, electro-optical conversion) in which delay is minimized. Our metric of success, the latency ratio, is defined as the delivered latency divided by the fundamental limit (T_delivered · c / 4l). This ratio for the Earth Simulator is roughly 50. The objective of this research is to develop techniques to reach a far lower ratio.

Software
While software issues are outside the scope of this work, it is unlikely that a traditional message passing software mechanism (MPI) will be compatible with the system requirements of an LDL system. Software overhead often adds many hundreds of instructions (and cycles) to the latency of a data access. A non-cached shared memory model with atomic raising and lowering operators to support synchronization is a lower overhead approach.

Link Medium
For a physically large system, optical interconnect provides the greatest opportunity for low latency ratio communications. While free space optics offers the lowest time-of-flight delay, guided wave (fiber) simplifies physical design (and reduces alignment tolerances) for a modest increase in propagation delay. Furthermore, the use of fiber optics enables the leveraging of commercial telecom technologies for this application. The switch elements employ Semiconductor Optical Amplifiers (SOAs) to provide both fast switching and amplification of multi-spectral optical pulses.

Data Vortex Topology
For optical communication, electro-optical conversions must be minimized to keep latency and cost low. Indirect networks better fit the assumed system architecture, allow centralized switching resources, and are less dependent on locality not found in LDL applications. A wide range of well-studied indirect topologies exists. However, optical buffering is problematic. Since intra-network blocking and output blocking require buffering, the ideal topology for an optical network would exhibit only input blocking. Here, electro-optical conversion can be delayed until network access is available, eliminating the need for optical buffering. It also offers backpressure to the processors, reducing the need for explicit traffic regulation.

Single Pulse Packets
Wavelength Division Multiplexing (WDM) is used to encode both header and payload to maximize the data transfer efficiency and maintain low latency. This single pulse format further simplifies the processing required within each switch element, as the decoding of the header bits is achieved by passive wavelength filtering.

Fast Electro-Optical Interfaces
Ultra-high-speed interfaces for the single pulse WDM packets are developed to provide variable and controllable electronic data sources that closely emulate parallel data signals of interest in computer systems. The transmitters accurately reproduce, and accurately control, a variety of timing and encoding methods for the transmitted data. The receiver electronic interface is designed to capture and check single pulse data packets converted from optical receivers, emulating possible data-capture techniques to deliver appropriate electronic signals to the receiving computer system.

Paper Outline
This paper is organized as follows.
First, a background section presents recent supercomputer system information. Then the Data Vortex is presented, followed by a discussion of single pulse packets and the corresponding fast electro-optical interfaces. The paper concludes with a summary and future work.

2. Background

This section reviews recent supercomputer systems with special attention to the processor-to-processor and processor-to-memory interfaces. Table 1 shows information for several systems. The following paragraphs highlight each system.

Table 1: Supercomputer summary with communication rates.

Year | Computer | Measured GF/s | Max GF/s | PEs | Link BW | Network CS BW | Network Topology
2002 | Earth Simulator, NEC | 35,860 | 40,960 | 5,120 | 12.3 GB/s | 8 TB/s | Crossbar
2001 | ASCI White, IBM SP Power 3 | 7,226 | 12,288 | 8,192 | 2 GB/s | - | Crossbar & SP
2000 | ASCI White, IBM SP Power 3 | 4,938 | 12,288 | 8,192 | 2 GB/s | - | Crossbar & SP
1999 | ASCI Red, Intel Xeon core | 2,379 | 3,207 | 9,632 | 800 MB/s | - | Mesh
1999 | ASCI Blue Pacific SST, IBM 604E | 2,144 | 3,868 | 5,856 | - | - | Switch
1997 | Intel ASCI Option Red, Pentium Pro | 1,338 | 1,830 | 9,152 | 800 MB/s | - | Mesh
1996 | Hitachi CP-PACS | 368 | 614 | 2,048 | 300 MB/s | - | Hyper Crossbar
1995 | Intel Paragon XP/S MP | - | - | 1,768 | 200 MB/s | - | 2D Mesh
1994 | Fujitsu VPP-500 | - | - | - | 800 MB/s | - | Crossbar

Earth Simulator
The Earth Simulator is a distributed memory system consisting of 640 processor nodes connected by a 640 x 640 single-stage crossbar switch. Each node is a shared memory cluster composed of eight arithmetic vector processors (AP), a shared memory system of 16 GBytes, a remote access control unit (RCU), and an I/O processor (IOP). The peak performance of each AP is 8 GFlops. Therefore the total number of processors is 5,120, and the total peak performance and main memory capacity are 40 TFlops and 10 TB. The internode data transfer rate is 12.3 GB/s x 2 [1][2].

ASCI White
ASCI White is the third generation (following ASCI Red and ASCI Blue) on the way to 100 TeraOPS. This supercomputer, designed by IBM, is located at Lawrence Livermore. The system consists of 512 IBM RS/6000 SP Nighthawk-2 nodes operating at 375 MHz. Each node includes 16 processors, memory, and a network interface. Internode communication is provided by the SP switch, which delivers 800 MB/s of bandwidth [3].

ASCI Red
ASCI Red, the first step in the ASCI Platforms Strategy, is a massively parallel, MIMD computer installed at Sandia National Laboratories. It was the world's first 1 TeraOPS supercomputer. Standard parallel programming interfaces simplify porting parallel applications to this system [4].

ASCI Blue Pacific SST
The ASCI Blue-Pacific supercomputer is partitioned into two major clusters: the open (public) cluster and the closed (private) cluster. The two supercomputers are SP-architecture MPP systems. The open system consists of 1,344 332 MHz PowerPC 604e processors, and the larger closed computer is a 3.8 TeraOp machine of 5,856 processors. The open system is the machine that appears on the Top 500 lists, with a peak quoted speed of 0.9 Tflop/s and a sustained Linpack benchmark of 0.46 Tflop/s.

In theory the closed machine has a peak speed of 3.8 Tflop/s, but these results have not yet been reported [8][9].

Intel ASCI Option Red
The design of the ASCI Option Red supercomputer is loosely based on the Intel Paragon supercomputer. The Paragon used a 2D mesh interconnection facility (ICF) that could move messages at a peak unidirectional bandwidth of 200 Mbytes per second. Each Paragon node held two (the GP node) or three (the MP node) Intel i860 XP processors [4][10].

Hitachi CP-PACS
The CP-PACS is an MIMD (Multiple Instruction-streams, Multiple Data-streams) parallel computer with a theoretical peak speed of 614 Gflops and a distributed memory of 128 Gbytes. The system consists of 2,048 processing units (PUs) for parallel floating point processing and 128 I/O units (IOUs) for distributed input/output processing. These units are connected in an 8x17x16 three-dimensional array by a Hyper Crossbar network. A well-balanced performance of CPU, network, and I/O devices supports the high capability of CP-PACS for massively parallel processing [5][7].

Intel Paragon XP/S MP
The Paragon is a commercialized offspring of the experimental Touchstone Delta system. The latter machine was built for the Concurrent Supercomputing Consortium at CalTech. The Delta system used i860 processors as computational elements in its nodes but, unlike its predecessor, the iPSC/860, the nodes were arranged not in a hypercube topology but in a 2-D grid. A speed of 10.9 GFlops/s was reported for the Delta on an order-20,000 full linear system. The Paragon's i860/XP has processor communication hardware on-chip to increase communication bandwidth [5].

Fujitsu VPP-500
The VPP500 vector parallel processor is a highly parallel, distributed memory supercomputer that has a performance range of 6.4 to 355 gigaflops and a main memory capacity from 1 to 222 gigabytes. The system supports between 4 and 222 processors interconnected by a high-bandwidth crossbar network. The VPP500 is built from a custom 1.6 gigaflops vector processor and provides 800 MB/sec of point-to-point bandwidth between nodes [5][6].

Trends
Beyond the obvious trend towards higher performance, there is also significant growth in internode communication network and memory bandwidth. Based on these trends, the targets for this research are shown in Table 2. Note that Gops/sec is targeted since the applications base is non-floating point. The research outlined in the following sections offers possible paths to significant improvements in network bandwidth and latency.

Table 2: Target network parameters for research.

Max PEs | Gops/s | Link BW | Network CS BW | Network Topology
- | 1,000,000 | 8 GB/s | 8 TB/s | Data Vortex

3. Data Vortex

The Data Vortex architecture [14,15] can be viewed as a collection of richly connected routing nodes on multiple fiber cylinders, as seen in Figure 2. The switch fabric size is characterized by two parameters (A,H), representing the number of switching nodes along the angle and height dimensions respectively. Parameter A is typically set to be a small odd number (<10) and is independent of the choice of H. The available number of input/output (I/O) ports is given by the product HxA. The number of cylinder levels (C) scales as C = log2(H) + 1. In Figure 2, a switch fabric of (A,H) = (5,4) is shown with a top view of the routing tours and with a side view of the interconnection patterns at each of the C=3 cylinders.
Each cross point shown is a routing node, which can be labeled uniquely by the coordinates (a,c,h), where 0 ≤ a < A, 0 ≤ c < C, and 0 ≤ h < H.
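The sizing and addressing rules just stated are compact enough to express in a few lines. The sketch below (ours, not from the paper) enumerates an (A,H) fabric and checks the cylinder count and port count for the (5,4) example of Figure 2.

```python
from math import log2

def cylinders(H: int) -> int:
    """Cylinder levels: C = log2(H) + 1 (H must be a power of two)."""
    assert H > 0 and H & (H - 1) == 0, "H must be a power of two"
    return int(log2(H)) + 1

def nodes(A: int, H: int) -> list[tuple[int, int, int]]:
    """All routing-node coordinates (a, c, h) of an (A, H) fabric."""
    C = cylinders(H)
    return [(a, c, h) for a in range(A) for c in range(C) for h in range(H)]

A, H = 5, 4                        # the Figure 2 example
assert cylinders(H) == 3           # C = 3 cylinders
assert len(nodes(A, H)) == A * cylinders(H) * H   # 60 routing nodes
print("I/O ports:", H * A)         # H x A = 20 input and 20 output ports
```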

Packets are processed synchronously in a highly parallel manner. Each packet is of a fixed length and is routed in a slotted manner. Within each clock cycle, every packet in the switch progresses one angle forward in the given direction, either along the solid line towards the same cylinder or along the dashed line towards the inner cylinder. The solid routing paths along the same cylinder are shown in Figure 2 from the side view of each cylinder. These connection patterns are carefully designed and repeat from angle to angle to minimize the packet deflection probability. The dashed-line paths between neighboring cylinders maintain the same height index h; they are used to forward the packets inward. As shown, packets are injected at the outermost cylinder (c=0) from the input ports and emerge at the innermost cylinder (c=log2(H)) towards the output ports. Each packet is self-routed in the fashion of binary-tree decoding as it propagates from the outer cylinder towards the inner cylinder: every cylinder progression fixes a specific bit within the binary header address. The innermost cylinder (c=log2(H)) also allows the packet to circulate when the output buffers are busy. To avoid packet contention, the switching architecture employs a synchronous and distributed control mechanism to properly schedule the neighboring packet flow. As a result, each node encounters at most one packet at a time, and no optical buffering is necessary within the Data Vortex switch fabric. This also greatly simplifies the routing procedure at each hop and facilitates the photonic implementation of the architecture. Although packet deflection occurs under certain traffic conditions, the probability of that event and its incurred penalty are minimized. This is achieved because packets are provided multiple paths to the destination and are always provided an open path, by staying on the same cylinder, if they are deflected. The angle dimension thus provides a virtual buffering mechanism for the deflected packets while eliminating potential packet conflict. Importantly, the hierarchical routing procedure allows the employment of single bit WDM packet encoding, by which the single-bit based routing is accomplished by wavelength filtering in the header retrieval process.

Traffic Flow Control and Routing
A key design element of the system is a distributed control signaling mechanism among routing nodes, which achieves buffer-less operation and simple routing logic. With the embedded synchronous timing, this scheme schedules the traffic flow of neighboring nodes so that packet conflict is eliminated. To implement the scheme, control lines are applied between any pair of nodes that have competitive output paths. To see this more clearly, a small group of nodes around node C (1,1,2) is shown in Figure 3, where each node is labeled by coordinate (a,c,h). A specific example of control signaling (vertical line) between node A (0,1,3) and node B (0,0,2) is shown, because both send packets to node C (1,1,2). The mechanism is very simple: a deflection control message is automatically triggered from node A to node B whenever A sends a packet to C. Since it takes a latency of d0 to deliver the control message, the packet at node A must be slightly earlier than the packet at node B for proper scheduling.

Figure 2: Data Vortex topology (A,H) = (5,4) with routing tours seen from the top and the side. Each node is labeled uniquely by the coordinate (a,c,h), where 0 ≤ a < A, 0 ≤ c < C, and 0 ≤ h < H.
Figure 3: Control signaling (vertical lines) between competing nodes within the neighborhood.
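A toy version of the per-hop decision, with the control message modeled as a boolean, might look as follows. The bit conventions and the perm() placeholder for the same-cylinder wiring pattern are our assumptions; the paper does not spell them out.

```python
def route_one_hop(a, c, h, dest, deflected, A, C):
    """One slotted hop for a packet at node (a, c, h) bound for height dest.

    deflected: True if a neighbouring node (the 'A' of Figure 3) has
    claimed the shared inward node with a control message.
    """
    a_next = (a + 1) % A                  # every packet advances one angle
    bits = C - 1                          # log2(H) address bits
    if c < C - 1:
        want = (dest >> (bits - 1 - c)) & 1   # header bit tested at cylinder c
        have = (h >> (bits - 1 - c)) & 1      # same bit of the current height
        if want == have and not deflected:
            return (a_next, c + 1, h)     # dashed inward link keeps the height
    return (a_next, c, perm(c, h))        # stay on cylinder (virtual buffer)

def perm(c, h):
    # Placeholder for the engineered angle-to-angle height permutation;
    # the innermost cylinder simply circulates until its output frees up.
    return h
```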

If both B and A have packets addressed to C, the control message prevents B's packet from progressing to C in the same packet slot. The deflected packet remains at its current cylinder, propagating to node D in Figure 3 instead. A virtual buffering mechanism is thus provided for the deflected packet, and only a slight latency penalty is introduced, because the deflected packet recovers its direction vector towards the target every other clock cycle (in two hops) by staying on the same cylinder. The system can be kept synchronous by properly designing the link latency. As shown in Figure 3, d1 and d2 represent the propagation latencies for the same-cylinder link (East out path) and the neighbor-cylinder link (South out path) respectively. In practical implementations, the control latency d0 takes only a small percentage of the packet period, because the competing nodes are physically located close to each other. Therefore, by simply making d2 + d0 = d1, the switch system is able to maintain synchronous operation as well as allow the correct setup of the control mechanism.

Performance
This section presents initial simulation results comparing the Data Vortex and butterfly topologies. A message in the simulation is assumed to be one packet long, and each link holds only one packet per cycle. Traffic load is calculated as the percentage of the input ports into which packet injections are attempted on each cycle. For example, if there are 512 input ports and the load is to be 50%, a packet injection is attempted on 256 randomly-chosen input ports in each cycle, with each packet having a randomly-generated output destination address. The acceptance rate is calculated as the percentage of attempted injections that are successful. The Data Vortex performs virtual buffering by allowing packets to propagate around the angles on each clock cycle in an always-moving fashion, while the butterfly performs single message input buffering at each of the inputs of its constituent 2x2 switches. The Data Vortex switch allows direct optical switching of WDM-packed pulses without need for OEO conversion. Because the butterfly network can block at each switch, it must perform OE conversion to allow electrical buffering, followed by EO conversion to reproduce the optical signal. An all-electrical butterfly eliminates the OE and EO conversions, but also eliminates the opportunity for WDM packing. Since the Data Vortex employs links as virtual buffers, its angle dimension is used in this comparison to provide buffering only, whereas the cylinder height determines the number of inputs. While this increases the number of cross-section links, it substantially reduces switch complexity and size relative to a butterfly switch for the aforementioned reasons. In this simulation, the angle is set to 4. Figure 4 shows packet acceptance versus offered load. The Data Vortex exhibits a higher acceptance rate for 1024 input ports. This is because the Data Vortex does not block packets from leaving the input nodes, owing to the lack of buffering and the "always-moving" nature of data handling (i.e., the data packets flow away from the input and into lower cylinders of the vortex rather than buffering near the inputs and blocking additional packets from entering the network). Packets in the butterfly network often block at the first level of the topology due to output contention in the 2x2 switches, preventing additional packets from entering the network.
Figure 4: Packet acceptance rate versus offered load for 1024 inputs (accepted traffic % versus offered traffic %, Data Vortex and butterfly).
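The load and acceptance bookkeeping used in these simulations can be stated compactly. The harness below is our sketch; the network model itself (try_inject) is stubbed out and would be replaced by a Data Vortex or butterfly model.

```python
import random

def acceptance_rate(n_ports: int, load: float, try_inject) -> float:
    """Attempt injections on a load-fraction of randomly-chosen ports for
    one cycle and return successful / attempted injections."""
    attempts = random.sample(range(n_ports), int(n_ports * load))
    ok = sum(1 for p in attempts
             if try_inject(p, dest=random.randrange(n_ports)))
    return ok / max(1, len(attempts))

# Example from the text: 512 ports at 50% load -> 256 attempts per cycle.
rate = acceptance_rate(512, 0.50, lambda p, dest: True)  # stub: all accepted
print(rate)  # 1.0 for the stub; a real model would refuse blocked inputs
```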

Figure 5 shows that for varying network I/O sizes and a fixed 100% network workload, the Data Vortex accepts about 73% of all injected packets that are attempted, whereas the butterfly accepts only 30-44%. The Data Vortex exhibits roughly the same average number of hops per packet from input to output as the butterfly, while carrying about twice as many packets.

Figure 5: Packet acceptance rate versus network size at maximum offered traffic (accepted traffic %, Data Vortex and butterfly).

4. Single Pulse WDM Packets and Ultrafast Electro-Optic Interface

System Description
We developed a single-bit WDM transmission test-bed that demonstrates the feasibility of this low latency optical link [15]. The optical packets are a single bit in duration and encoded along the wavelength domain. In this fashion, the latency associated with parallel-to-serial and serial-to-parallel conversion is eliminated. The DWDM optical link provides ultra-high capacity data transmission in a cost-effective manner by leveraging components from commercial telecom technologies. The optoelectronic digital interface to the optical link times and formats the signals, distributes the clocks, and captures and pre-processes incoming data. In the demonstrated test-bed, four-bit parallel data are NRZ modulated and transmitted synchronously during each clock cycle to DWDM transmitters of different channel wavelengths. An additional fifth presence bit is co-transmitted to flag the data as valid. Each bit is encoded along a different channel wavelength within the C-band (1525nm-1625nm). The five bits are then multiplexed in a DWDM arrayed waveguide grating and transmitted along a single fiber. The fiber link dispersion is carefully managed to assure precise bit timing and reduce the skew between channels. Incoming data from the fiber is de-multiplexed by another DWDM arrayed waveguide grating into the four-bit parallel word and the presence bit, and converted to electrical pulses through optical receivers. The four-bit parallel electrical pulses are sampled by high-speed PECL circuits and transmitted to the logic interface. The interface performs analysis on the data for pattern error detection. A block diagram of the test bed is shown in Figure 6. Four digitally-programmable delay PECL chips set variable pulse widths and delays. The delay range is 10 ns with a resolution of 10 ps. The four-bit data words are synchronized, while the presence bit is offset to signal when the data is valid. Four high-speed input channels are also provided, with an input for a high-speed clock. The receiver channels are designed to operate either independently or synchronously with the transmitter. The data is captured using a PECL register and returned to the high-speed interface, where data analysis is performed. The constructed high-speed electro-optical interface board is shown in Figure 7.

Figure 6: Bit-parallel interconnection test bed (transmitter block with four data channels λ1-λ4 plus a presence bit λ5, RF clock source, and DWDM MUX; fiber; receiver block with DWDM DEMUX; digital core block with USB interface).
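The single-pulse packet format maps each bit of the parallel word onto its own wavelength within the same time slot. A sketch of that mapping follows; the channel wavelengths are those later listed with Figure 10, while the bit-to-channel ordering and the presence-bit wavelength are hypothetical.

```python
# Data channels 1-4 (nm), as listed with Figure 10; bit ordering is assumed.
DATA_CHANNELS_NM = [1537.40, 1550.72, 1554.94, 1556.55]
PRESENCE_NM = 1558.0   # hypothetical: the presence-bit wavelength isn't given

def encode_word(word: int) -> list[float]:
    """Wavelengths pulsed during one slot for a 4-bit word (plus presence)."""
    assert 0 <= word < 16
    pulses = [nm for i, nm in enumerate(DATA_CHANNELS_NM) if (word >> i) & 1]
    pulses.append(PRESENCE_NM)   # always sent: marks the slot as valid data
    return pulses

print(encode_word(0b1010))   # channels 2 and 4 plus the presence bit
```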

Figure 7: Photograph of the high-speed electro-optic interface board (power connector, clock distribution, Tx delay generator, Tx data formatter, Tx outputs, digital test core, Rx logic, Rx inputs).

In Figure 8, input data signals consisting of two four-bit parallel words are shown. They are generated by the four digitally-programmable delay PECL chips, which can set variable pulse widths and delays. The current FPGA used in the interface electronics is limited to a maximum data rate of 622 Mbps; this sets the upper limit on the demonstrated bit-parallel word rate. Nevertheless, the obtained data pulse widths were approximately 300 ps, indicating that much faster word rates are achievable. A minimum width data pulse is shown in Figure 9.

Figure 8: Two four-bit parallel words on channels λ1-λ4 with the presence bit.

Figure 9: Minimum width pulse from the optoelectronic transmitter interface.
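To make the headroom claim above concrete: with ~300 ps pulses, back-to-back slots could in principle run several times faster than the current FPGA-limited word rate. A rough estimate (our arithmetic, not the paper's):

```python
pulse_width_s = 300e-12          # measured minimum pulse width, ~300 ps
fpga_limit_hz = 622e6            # current interface ceiling, 622 Mbps

max_word_rate_hz = 1.0 / pulse_width_s   # ~3.3 GHz if pulses tile the slot
print(f"potential word rate: {max_word_rate_hz / 1e9:.1f} Gword/s, "
      f"{max_word_rate_hz / fpga_limit_hz:.1f}x the FPGA limit")
```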

The four optical channels were selected to enable investigation of possible channel cross-talk by closely spacing two channels. The second pair of channels was set at the edge of the C-band to evaluate the gain bandwidth. The optical spectrum of the link is shown in Figure 10. The overall link latency is the sum of contributions from the interface, which includes the transmitter and receiver, and from the propagation time of the lightwave. In the current test-bed the overall measured latency is approximately 4.5 ns, as shown in Figure 11. We find that the dominant contribution to this latency is in the transmitter module.

Figure 10: Spectrum of the fiber link with the four-bit word and its presence bit (Ch. 1 λ = 1537.40 nm, Ch. 2 λ = 1550.72 nm, Ch. 3 λ = 1554.94 nm, Ch. 4 λ = 1556.55 nm).

Figure 11: Bit-parallel interconnection signals (electrical differential input data signal, optical TX output data signal, and electrical differential output data signal).

It is expected that future commercial optical transmitters will exhibit lower delays. Importantly, we note that in the DWDM bit-parallel system presented, the latency does not scale with the number of wavelength channels. The current physical setup is shown in Figure 12. With four data channels and a presence channel, all operating differentially, a total of 20 cables is used for the transmitter and receiver. This setup was originally planned to divide the optical components from the electrical components, since each was being developed by a separate research team. The next generation of this project will be a single board with all of the components integrated on it. This will not only simplify the physical aspect of connecting the two projects, but also minimize transmission line effects from the cabling. The next generation of the optoelectronic tester is currently in the design stage. It will operate up to 10 Gbps using new SiGe parts, and all components will be integrated into one circuit board. The final goal is to reach 128 digital channels combined using WDM into one fiber, producing an aggregate data rate of 1.28 Terabits per second over a single optical fiber.

Figure 12: Opto-electronic setup.
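The latency accounting above (interface delay plus time of flight, independent of channel count) fits in a one-line budget. The ~5 ns/m fiber figure is a standard silica value; the transmitter/receiver split below is illustrative, and only the ~4.5 ns total comes from the measurement.

```python
FIBER_NS_PER_M = 5.0    # ~5 ns/m in silica fibre (group index ~1.5)

def link_latency_ns(tx_ns: float, rx_ns: float, fiber_m: float) -> float:
    """Interface plus propagation delay; the channel count does not appear."""
    return tx_ns + rx_ns + FIBER_NS_PER_M * fiber_m

# Illustrative split reproducing the ~4.5 ns measured on the short test-bed
# link, with the transmitter dominating as observed in the text.
print(link_latency_ns(tx_ns=3.0, rx_ns=1.0, fiber_m=0.1))   # 4.5 ns
```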

6. Summary and Future Work

The techniques presented in this paper will enable significant reductions in data access latency in supercomputers executing LDL applications. The methodologies employed include:

[1] A fiber-based link medium that simplifies physical design (and leverages existing telecom technology).
[2] Wavelength Division Multiplexing (WDM) used to encode both header and payload, maximizing data transfer efficiency while maintaining low latency.
[3] Ultra-high-speed interfaces for the single pulse WDM packets that provide variable and controllable electronic data sources closely emulating parallel data signals of interest in computer systems.
[4] A novel switch fabric architecture specifically designed for optical implementation that solves the optical buffering issue by forcing network blocking to the inputs.
[5] A hierarchical routing procedure that allows the employment of single bit WDM packet encoding, by which the single-bit based routing is accomplished by wavelength filtering in the header retrieval process.

This multi-faceted approach combines the strengths of high-speed electronics and photonics technologies in synergy to enable scalable, low latency optical network implementations that will propel future supercomputers to new levels of performance. Our future work will include a complete demonstration of a multinode network test-bed based on the Data Vortex architecture, incorporating the various physical subsystems described.

References

[1] Earth Simulator:
[2] Tetsuya Sato, Shigemune Kitawaki, and Mitsuo Yokokawa, "Earth Simulator Running," 2002.
[3] ASCI White:
[4] ASCI Red:
[5] Parallel Computing Hardware:
[6] T. Utsumi, M. Ikeda, and M. Takamura, "Architecture of the VPP500 Parallel Supercomputer," Proceedings of the 1994 Conference on Supercomputing, December 1994, Washington, DC.
[7] Netlib Repository:
[8] ASCI Blue Pacific News:
[9] H. Franke, J. Jann, J. E. Moreira, P. Pattnaik, and M. A. Jette, "An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific," Proceedings of Supercomputing 1999, Portland, Oregon, November 1999.
[10] T. Mattson and G. Henry, "The ASCI Option Red Supercomputer," Proceedings of the Intel Supercomputer Users Group, Albuquerque, New Mexico, June 1997.
[11] IBM Redbooks, "Understanding and Using the SP Switch."
[12] Lawrence Livermore National Laboratory, ASCI program homepage.
[13] TOP500 Supercomputer site:
[14] C. Reed, "Multiple level minimum logic network," U.S. Patent 5,996,020, Nov. 30, 1999.
[15] Q. Yang, K. Bergman, G. D. Hughes, and F. G. Johnson, "WDM Packet Routing for High Capacity Data Networks," J. Lightwave Technology, Vol. 19, p. 1420 (2001).
[16] J. S. Davis, D. C. Keezer, O. Liboiron-Ladouceur, and K. Bergman, "Application Details for Embedded Digital Test Core: Optoelectronic Test Bed and Wafer-level Prober," Proc. of the Int'l Test Conf. (paper submitted).


MULTIPLEXER / DEMULTIPLEXER IMPLEMENTATION USING A CCSDS FORMAT

MULTIPLEXER / DEMULTIPLEXER IMPLEMENTATION USING A CCSDS FORMAT MULTIPLEXER / DEMULTIPLEXER IMPLEMENTATION USING A CCSDS FORMAT Item Type text; Proceedings Authors Grebe, David L. Publisher International Foundation for Telemetering Journal International Telemetering

More information

Communication Networks

Communication Networks Communication Networks Chapter 3 Multiplexing Frequency Division Multiplexing (FDM) Useful bandwidth of medium exceeds required bandwidth of channel Each signal is modulated to a different carrier frequency

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?

More information

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS TECHNOLOGY BRIEF March 1999 Compaq Computer Corporation ISSD Technology Communications CONTENTS Executive Overview1 Notice2 Introduction 3 8-Way Architecture Overview 3 Processor and I/O Bus Design 4 Processor

More information

LAN Systems. Bus topology LANs

LAN Systems. Bus topology LANs Bus topology LANs LAN Systems Design problems: not only MAC algorithm, not only collision domain management, but at the Physical level the signal balancing problem (signal adjustment): Signal must be strong

More information

Initial Performance Evaluation of the Cray SeaStar Interconnect

Initial Performance Evaluation of the Cray SeaStar Interconnect Initial Performance Evaluation of the Cray SeaStar Interconnect Ron Brightwell Kevin Pedretti Keith Underwood Sandia National Laboratories Scalable Computing Systems Department 13 th IEEE Symposium on

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Vector an ordered series of scalar quantities a one-dimensional array. Vector Quantity Data Data Data Data Data Data Data Data

Vector an ordered series of scalar quantities a one-dimensional array. Vector Quantity Data Data Data Data Data Data Data Data Vector Processors A vector processor is a pipelined processor with special instructions designed to keep the (floating point) execution unit pipeline(s) full. These special instructions are vector instructions.

More information

AWG-based Optoelectronic Router with QoS Support

AWG-based Optoelectronic Router with QoS Support AWG-based Optoelectronic Router with QoS Support Annette Böhm, Magnus Jonsson, and Kristina Kunert School of Information Science, Computer and Electrical Engineering, Halmstad University Box 823, S-31

More information

I/O Choices for the ATLAS. Insertable B Layer (IBL) Abstract. Contact Person: A. Grillo

I/O Choices for the ATLAS. Insertable B Layer (IBL) Abstract. Contact Person: A. Grillo I/O Choices for the ATLAS Insertable B Layer (IBL) ATLAS Upgrade Document No: Institute Document No. Created: 14/12/2008 Page: 1 of 2 Modified: 8/01/2009 Rev. No.: 1.00 Abstract The ATLAS Pixel System

More information

6.9. Communicating to the Outside World: Cluster Networking

6.9. Communicating to the Outside World: Cluster Networking 6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and

More information

Efficient Hybrid Multicast Routing Protocol for Ad-Hoc Wireless Networks

Efficient Hybrid Multicast Routing Protocol for Ad-Hoc Wireless Networks Efficient Hybrid Multicast Routing Protocol for Ad-Hoc Wireless Networks Jayanta Biswas and Mukti Barai and S. K. Nandy CAD Lab, Indian Institute of Science Bangalore, 56, India {jayanta@cadl, mbarai@cadl,

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

HPCC Random Access Benchmark Excels on Data Vortex

HPCC Random Access Benchmark Excels on Data Vortex HPCC Random Access Benchmark Excels on Data Vortex Version 1.1 * June 7 2016 Abstract The Random Access 1 benchmark, as defined by the High Performance Computing Challenge (HPCC), tests how frequently

More information

Design For High Performance Flexray Protocol For Fpga Based System

Design For High Performance Flexray Protocol For Fpga Based System IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) e-issn: 2319 4200, p-issn No. : 2319 4197 PP 83-88 www.iosrjournals.org Design For High Performance Flexray Protocol For Fpga Based System E. Singaravelan

More information

FIBER OPTIC NETWORK TECHNOLOGY FOR DISTRIBUTED LONG BASELINE RADIO TELESCOPES

FIBER OPTIC NETWORK TECHNOLOGY FOR DISTRIBUTED LONG BASELINE RADIO TELESCOPES Experimental Astronomy (2004) 17: 213 220 C Springer 2005 FIBER OPTIC NETWORK TECHNOLOGY FOR DISTRIBUTED LONG BASELINE RADIO TELESCOPES D.H.P. MAAT and G.W. KANT ASTRON, P.O. Box 2, 7990 AA Dwingeloo,

More information