PIC training: Interconnect System Design


1 PIC training: Interconnect System Design Keren Bergman PhoenixSim Optical hardware Meisam Bahadori, Sébastien Rumley Lightwave Research Laboratory Columbia University Network Application

2 Silicon Photonics for Computing DRAM CMPs 3DI Stack Exaflop-scale high-performance computing system Silicon Photonic Interconnection Network Seamless hierarchical photonic cross-layer communication to the chip Memory Stack CMPs Photonic interconnects support inter-rack communications

3 HPC and Data Centers toward Exascale in a nutshell Exascale equates to 10^18 FLoating point OPerations per second (FLOP/s). Reaching Exascale requires: one CPU performing 10 FLOPs per cycle, clocked at 10^8 GHz, OR 10^8 such CPUs clocked at 1 GHz. Consider 1,000 CPUs placed in a drawer: that's 100K drawers. With 100 drawers per rack, that's still 1,000 racks.

4 Supercomputing Performance Current world top supercomputers are Petascale: #1) Tianhe-2 (China), peak 55 PetaFLOPs (PF); #2) Titan (US), 27 PF; #3) Sequoia (US), 20 PF. Need a 20x improvement factor to reach Exascale. [Figure: average computing performance of the top 3 supercomputers over the past decade]

5 The Major Lag in Data Communications [Figure: top 10 supercomputers' computation capabilities over the past 5 years] Vast increases in parallelism require ever more communication, but bandwidth has stagnated. Over the past 5 years, while system compute power grew by 13X, node I/O bandwidth increased by only < 2X. Data movement is too expensive! ($ and energy)

6 The Real Performance in Decline Since 2010, a growing gap between computing operations and bandwidth: deterioration of Byte/FLOP ratios (communication bytes per computation FLOP). [Figure: Byte/FLOP of the top 10 supercomputers]

7 The Photonic Opportunity for Data Movement Energy-efficient, low-latency, high-bandwidth data interconnectivity is the core challenge to continued scalability across computing platforms. Energy consumption is completely dominated by the costs of data movement, and the bandwidth taper from chip to system forces extreme locality. Goals: reduce energy consumption; eliminate the bandwidth taper.

8 Current interconnect and memory bandwidths Memory interfaces: 100s of Gb/s to terabit/s. DDR4: 200 Gb/s; Wide I/O 2: 500 Gb/s; High Bandwidth Memory: 1-2 Tb/s; Hybrid Memory Cube: 1-4 Tb/s. Network links: 100G is the new standard in HPC (InfiniBand 4x EDR, Intel Omni-Path, Bull Exascale Interconnect); higher bandwidths proposed: 12x25 = 300G, 12x50 = 600G (InfiniBand, 2017). Router chip envelopes: several Tb/s (Cray Aries: 2.2 Tb/s; upcoming Intel Omni-Path: 4.8 Tb/s). Director switch envelope: 64 Tb/s for Mellanox's biggest switch. Era of multi-Tb/s!

9 Estimating bandwidth needs Bandwidth can be related to compute power through the verbosity metric, byte/flop (B/F). Memory bandwidth requirement: ideally up to 8 B/F (for the most demanding algorithms); can be reduced to 0.5 B/F with an HMC cache for fast/near RAM; can be less for bulk DRAM/NVRAM memory (~0.1 B/F). Interconnect requirement: ideally the same as bulk memory (0.1 B/F), but even 0.02 B/F would be progress. Corresponding global (link) bandwidths at Exascale: memory ~500 PB/s (0.5 B/F); interconnect ~400 PB/s (100 PB/s injected at 0.1 B/F, multiplied by 3-4 hops!)
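The verbosity arithmetic above can be sketched as follows (a minimal sketch; the function name and the 4-hop multiplier are illustrative):

```python
# Global bandwidth implied by a byte/flop verbosity at Exascale compute power.

EXAFLOPS = 1e18  # flop/s

def global_bandwidth_bytes(verbosity_bf, flops=EXAFLOPS):
    """Aggregate bandwidth (bytes/s) implied by a byte/flop verbosity."""
    return verbosity_bf * flops

mem_bw = global_bandwidth_bytes(0.5)    # near memory at 0.5 B/F -> 500 PB/s
injected = global_bandwidth_bytes(0.1)  # interconnect at 0.1 B/F -> 100 PB/s
link_bw = injected * 4                  # x 3-4 hops -> ~400 PB/s of link traffic

print(mem_bw / 1e15, injected / 1e15, link_bw / 1e15)  # in PB/s
```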

10 Supercomputing node architecture Exascale system: 20k to 100k such nodes. Multi-CPU die delivering 10s of TF; 3D-stacked near-memory modules such as the Hybrid Memory Cube; bulk and far memory (conventional DRAM or NVRAM); interconnect switch (opaque or transparent); optical network interface (O-NIC); photonic memory links.

11 Node-level bandwidth requirements Assume a 10 Teraflop (TF) node (Exascale with 100K nodes). Near-memory bandwidth: 10 TF x 0.5 B/F x 8 bit/byte = 40 Tb/s (split over ~6-10 individual ~5 Tb/s interfaces). Interconnect bandwidth: 0.01 B/F = 0.8 Tb/s; 0.05 B/F = 4 Tb/s. Bulk memory bandwidth: 0.1 B/F = 8 Tb/s; 0.2 B/F = 16 Tb/s (split over ~1-6 links).

12 Power requirements Today's largest envelopes: Tianhe-2 = 17 MW; RIKEN = 12 MW. Exascale at 100 MW is the maximal consideration: 10 GigaFLOP/J. A 20 MW total system power envelope is preferred: 50 GigaFLOP/J. [Figure: energy efficiencies for the Green500 benchmark (June 2015)]

13 System components power budget Need for 10 to 50 GigaFLOP/J in the next 5 years. ~30-50% of power is non-IT (cooling, power delivery, etc.) [1]

| Power envelope | 10 GigaFLOP/J | 50 GigaFLOP/J | 50 GigaFLOP/J |
| Budget per flop | 100 pJ | 20 pJ | 20 pJ |
| Network % of power | 10% | 10% | 10% |
| Networking budget per flop | 10 pJ | 2 pJ | 2 pJ |
| Network verbosity | 0.01 byte/flop | 0.01 byte/flop | 0.1 byte/flop |
| Budget for a network byte | 1 nJ/byte | 200 pJ/byte | 20 pJ/byte |
| Budget for a network bit | 125 pJ/bit | 25 pJ/bit | 2.5 pJ/bit |
| Memory % of power | 15% | 15% | 15% |
| Memory budget per flop | 15 pJ | 3 pJ | 3 pJ |
| Memory verbosity | 0.5 byte/flop | 0.5 byte/flop | 1 byte/flop |
| Budget for a memory byte | 30 pJ/byte | 6 pJ/byte | 3 pJ/byte |
| Budget for a memory bit | 3.75 pJ/bit | 0.75 pJ/bit | 0.375 pJ/bit |

[1] C.-H. Hsu, S.W. Poole, D. Maxwell, The Energy Efficiency of the Jaguar Supercomputer.
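The per-bit budgets in the table follow directly from the envelope split; a minimal sketch (the function name is illustrative):

```python
# Budget per transported bit = (share of envelope) * (energy per flop)
# / verbosity / 8. Inputs: efficiency (GigaFLOP/J), power share, verbosity (B/F).

def budget_per_bit_pj(gflop_per_joule, share, verbosity):
    """Energy budget per transported bit, in pJ."""
    pj_per_flop = 1e3 / gflop_per_joule        # e.g. 10 GF/J -> 100 pJ/flop
    pj_per_byte = share * pj_per_flop / verbosity
    return round(pj_per_byte / 8, 3)

print(budget_per_bit_pj(10, 0.10, 0.01))   # network column 1: 125.0 pJ/bit
print(budget_per_bit_pj(50, 0.10, 0.01))   # network column 2: 25.0 pJ/bit
print(budget_per_bit_pj(50, 0.10, 0.1))    # network column 3: 2.5 pJ/bit
print(budget_per_bit_pj(10, 0.15, 0.5))    # memory column 1:  3.75 pJ/bit
```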

14 Network energy budget [Figure: energy budget per networking bit (pJ) vs. verbosity (byte/flop), for 10 GigaFLOP/J and 50 GigaFLOP/J at 10% and 15% of the envelope] For verbosities below 0.05 B/F, the energy budget can be ~50 pJ/bit. Above 0.1 B/F, the total network budget falls to ~10 pJ/bit for 10 GF/J and ~2 pJ/bit for 50 GF/J.

15 Network energy requirements End-to-end data movement energy budget. [Figure: energy budget per bit (pJ) vs. verbosity (byte/flop), for 10 GigaFLOP/J and 50 GigaFLOP/J at 10% and 15% of the envelope] Budgets range from 100s of pJ/bit down to 10s of pJ, single pJ, and even sub-pJ (0.25 pJ/bit) at the highest verbosities.

16 Interconnection network energy budget breakdown Example path from source to destination compute node: N = 2 hops in the topology, traversing N+1 = 3 switches over N+2 = 4 links. Budget_network = (N+2) x Budget_link + (N+1) x Budget_switch + 2 x Budget_interface, with Budget_interface = 0 for simplification. Budget_switch: ~50 pJ/bit (today's Cray Aries); ~20 pJ/bit (upcoming Intel Omni-Path); ~5 pJ/bit (minimum for Exascale); ~1 pJ/bit (target for Exascale). What's the remaining link budget? S. Rumley et al., "Design Methodology for Optimizing Optical Interconnection Networks in High Performance Systems," ISC-HPC 2015.
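The budget decomposition above, written as a function (a sketch; the interface budget is zeroed as on the slide, and all figures are in pJ/bit):

```python
# Total network budget along an N-hop path:
# (N+2) links + (N+1) switches + 2 interfaces.

def network_budget(n_hops, link, switch, interface=0.0):
    """End-to-end energy budget (pJ/bit) for an n_hops path."""
    return (n_hops + 2) * link + (n_hops + 1) * switch + 2 * interface

# With today's ~50 pJ/bit switches (Cray Aries), the switch term alone
# already blows any Exascale budget for N = 2:
print(network_budget(2, link=0.0, switch=50.0))   # 150.0 pJ/bit
# At the ~1 pJ/bit Exascale switch target, a 0.25 pJ/bit link budget gives:
print(network_budget(2, link=0.25, switch=1.0))   # 4.0 pJ/bit end to end
```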

17 Link energy budget (network portion: 10% in all cases) [Table: for each combination of verbosity (Byte/Flop) and energy efficiency (GigaFLOP/J), the total network budget, switch budget (50 pJ/bit, 5 pJ/bit, or 1 pJ/bit), hop count N, and the resulting per-link budget; with 5 pJ/bit switches the link budgets are a few pJ/bit, and with 1 pJ/bit switches they fall to 100s of fJ/bit] N = 2 requires switch radix ~96; N = 3, switch radix ~48. N = 2: 3 switches, 4 links; N = 3: 4 switches, 5 links.

18 Interconnect costs The network is ~15% of total system cost. $200M is considered a typical Exascale price, so $30M max for the network. Total interconnect bandwidth ~300 PB/s (0.1 B/F): $30M / 300 PB/s = $1 per 10 GB/s = 1.25 cents/Gb/s. Cost reduction required: >100X for 0.1 B/F, >10X for 0.01 B/F. [1] M. Besta, T. Hoefler, "Slim Fly: A Cost Effective Low-Diameter Network Topology," Supercomputing 2014.
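The cost arithmetic above in a few lines (pure arithmetic on the slide's figures):

```python
# $30M network budget spread over ~300 PB/s of interconnect bandwidth.
network_cost = 30e6            # dollars
total_bw = 300e15              # bytes per second (0.1 B/F at Exascale)

usd_per_gbyte_s = network_cost / (total_bw / 1e9)        # $ per GB/s
cents_per_gbit_s = round(usd_per_gbyte_s / 8 * 100, 4)   # cents per Gb/s

print(usd_per_gbyte_s, cents_per_gbit_s)  # $0.1 per GB/s, 1.25 cents per Gb/s
```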

19 Realizing high-BW, low-energy links Not bandwidth per se: what matters is Gb/s/mW and Gb/s/$. This requires exploring the complete design space: relationships between materials/geometries, optical/electrical parameters, thermal effects, optical losses, and energy consumption; the impact of fabrication variability and limitations; applied to subsystems and systems [1]. Design trade-off example: a higher driver voltage increases ring modulator consumption but decreases laser consumption due to improved extinction ratio (ER) [1]. [1] R. Wu, C.-H. Chen, J.-M. Fideli, M. Fournier, R.G. Beausoleil, K.-T. Cheng, "Compact modeling and system implications of microring modulators in nanophotonic interconnects," ACM SLIP 2015.

20 Beyond wire replacement Optics-enabled system architecture transformations: distance-independent, cut-through, bufferless. Conventional hop-by-hop data movement (on-chip, short-distance PCB, long-distance PCB): 12 conversions! Fully flattened end-to-end data movement over an optical link: no conversions!

21 Columbia PhoenixSim: Integrated Multi-Level Modeling and Design Environment Novel design environment enabling HFI across three layers: (1) Application IO primitives: copy a memory array to a remote location; send, multicast, broadcast messages; thread synchronization (e.g., barrier). (2) Network architecture and protocols: link locking mechanisms (frame detection), network topology (routing), arbitration of shared buses and switches. (3) Si photonic hardware implementations: silicon photonic modulators, switches. A complete toolbox of models at each layer ensures interoperability among models and avoids manual adaptation of data between distinct software.

22 Multi-layer environment Application layer (per thread, identified by rank):

    void work_in_parallel(int rank) {
        int[] array = calculate_local_array(rank);
        int dest = determine_next_dest(array);
        copy_array_remote(array, dest, address);
    }

Network layer: handshake, payload transmission, flow control, integrity check; routing, path arbitration, and optical path setup between oNIC 1 (rank) and oNIC 2 (dest) through the switch; data transmission. Hardware layer: SiP switch and SiP WDM demux (ns timescales).

23 Cross-layer iterative optimization [Diagram: the PhoenixSim environment links the application layer (traces via SST [9]/SuperScalar and SST/Macro [10]; application characteristics [4,5]; IO requests), the network layer (optically-sound network architectures; network performance/cost trade-offs [1-3]; DSENT [8]; interface models), and the physical layer (device models [6]; FDTD circuit models [7]; Verilog; hardware-validated and optimized models [7]), exchanging key parameters and application needs across layers]
[1] K. Wen, S. Rumley, K. Bergman, "Reducing Energy per Delivered Bit in Silicon Photonic Interconnection Networks," Optical Interconnects 2014.
[2] S. Rumley, et al., "Low Latency, Rack Scale Optical Interconnection Network for Data Center Applications," ECOC 2013.
[3] R. Hendry, et al., "Modeling and Evaluation of Chip-to-Chip Scale Silicon Photonic Networks," IEEE Hot Interconnects 2014.
[4] S. Rumley, L. Pinals, G. Hendry, K. Bergman, "A Synthetic Task Model for HPC-Grade Optical Network Performance Evaluation," IA^
[5] K. Wen, et al., "Reuse Distance Based Circuit Replacement in Silicon Photonic Interconnection Networks for HPC," IEEE Hot Interconnects.
[6] D. Nikolova, R. Hendry, S. Rumley, K. Bergman, "Scalability of Silicon Photonic Microring Based Switch," ICTON 2014.
[7] S. Rumley, R. Hendry, K. Bergman, "Fast Exploration of Silicon Photonic Network Designs for Exascale Systems," ASCR ModSim Workshop.
[8] C. Sun, et al., "DSENT: A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling," NoCS 2012.
[9] S. Hammond, et al., "Towards a standard architectural simulation framework," Workshop on Modeling & Simulation of Exascale Systems & Applications.
[10] C. L. Janssen, et al., "A simulator for large-scale parallel architectures," International Journal of Parallel and Distributed Systems, 1(2):57-73, 2010.

24 Graphical interface Configuration of the kernel model Configuration of cross-layer parameters Configuration of networking aspects (e.g. switch arbitration) Configuration of hardware parameters and settings

25 Si Photonic physical hardware layer: current features Silicon Photonic WDM links and switch fabrics: Optical signal quality determinants (crosstalk, optical losses, etc.) Photonic network power consumption Photodetectors External laser Chip 1 Optical switch Chip 2 Other chips

26 Physical layer parameters

27 Multi-Level Modeling Environment: Interface Photodetectors External laser Chip 1 Chip 2

28 Multi-Level Modeling Environment: Interface Photodetectors External laser Chip 1 Chip 2

29 Multi-Level Modeling Environment: Interface

30 Multi-Level Modeling Environment: Interface [Diagram: link impairments modeled along the path: OOK modulation, imperfect ER, couplers, jitter, on-chip waveguide, modulator IL, demux truncation, intermodulation crosstalk, demux crosstalk, demux IL]

31 Environment automated optimization: Q-factor Finding the key parameters of ring resonators: the size of the ring impacts resonance and power consumption; size, internal geometry, and proximity to the waveguide impact the quality factor. Ring parameters (doping, size, proximity) are optimized for each architecture design and implementation, balancing per-channel requirements, ideal Q, global conditions, fabrication parameters and limitations, power, and signal quality.

32 Optimization of ring-based demultiplexers The quality factor (Q-factor) is the main ring parameter: Q = lambda/(Delta lambda), the inverse of the ring's 3 dB bandwidth. It must be optimized for each link format. Example: filtering at the demux is subject to a trade-off between truncation of the signal and crosstalk from other signals. [Figure: a low-Q ring causes high truncation of the signal spectrum but very small leakage into other channels; a high-Q ring causes low truncation but crosstalk due to leakage]

33 Optimization of ring-based modulators Penalty trade-offs: insertion loss vs. extinction ratio vs. multiplexing crosstalk. Parameter trade-offs: channel spacing vs. resonance shift vs. Q-factor. Example: low-Q ring (Q = 6000) vs. high-Q ring (Q = 15000) at 1 nm channel spacing. [Table: resonance shift (nm), insertion loss (dB), extinction ratio penalty (dB), crosstalk penalty (dB), ON-OFF penalty (dB), total penalty (dB), and optimum value, for Q = 6000 and Q = 15000] [Bahadori, JLT (under revision)]

34 Example end-to-end results Analysis of the demultiplexing power penalty (PP) for 1 Tb/s (includes filter Q-factor optimization); design optimized for link throughput. [Bahadori, Optical Interconnects 2015] [Bahadori, JLT (under revision)]

35 Network layer [Same multi-layer view as slide 22, now highlighting the network layer: handshake, payload transmission, flow control, integrity check; routing, path arbitration, and optical path setup between oNIC 1 (rank) and oNIC 2 (dest) through the switch; data transmission over the SiP switch and SiP WDM demux hardware]

36 Cross-layer software integration: 6-node example We assume 6 independent ranks (threads), each running on a distinct node. Nodes are connected in a peer-to-peer fashion (all-to-all). Hardware layer: point-to-point (chip-to-chip) SiP WDM links. Network layer: no arbitration, no flow control (simple design). Application layer: a test distributed algorithm. After an initialization phase, the algorithm has N rounds; during each round i, rank R sends a message to destination (R+i), waits until it receives a message from (R-i), then does some processing.

37 Timeline: thread activity visualization [Figure: per-thread processing timeline, with the total time-to-solution indicated]

38 Optimized network link designs for 0.5 Tb/s, 1.0 Tb/s, and 1.5 Tb/s bandwidth densities: 0.5 Terabit/s: 20 wavelengths x 25 Gb/s, 2.39 pJ/bit, 2710 ns. 1 Terabit/s: 38 wavelengths x ~26 Gb/s, 2.64 pJ/bit, 1510 ns. 1.5 Terabit/s: 54 wavelengths x ~28 Gb/s, 2.9 pJ/bit, 1110 ns.

39 Energy/time-to-solution Pareto fronts [Figure: Pareto-optimal vs. sub-optimal designs. 0.5 Terabit/s: 20 wavelengths x 25 Gb/s, 2.39 pJ/bit, 2710 ns. 1 Terabit/s: 38 wavelengths x ~26 Gb/s, 2.64 pJ/bit, 1510 ns. 1.5 Terabit/s: 54 wavelengths x ~28 Gb/s, 2.9 pJ/bit, 1110 ns]

40 Photonic link power-bandwidth trade-off [Figure: energy per bit vs. aggregate line rate (Gb/s), for channel rates below and at 25 Gb/s] Design assumptions: 20% laser wall-plug efficiency; 0.5 mW ring stabilization; 2 mW detector; 1.2 V modulator drive voltage; 100 fF modulator capacitance. System-wide optimizations realize multi-Tb/s links at < pJ/bit.

41 Interconnect power consumption with transparent optical switching Dimensioning an HPC/data center transparent optical network: 40k compute nodes (25 TF each), 0.05 B/F, 10 Tb/s per node. Topology: distance-optimized, uniform traffic at max rate [1]. Transceivers: silicon photonic energy-optimized WDM transceivers (Q_max = 15k, ring stabilization: 0.5 mW). Two switch cases: (1) MEMS-based switch: 320 ports, 0.14 W and 3 Tb/s per port (46 fJ/bit); assume 3 switches traversed at ~150 fJ/bit; 3.5 dB power penalty [2]. (2) Hybrid SOA/MZI switch fabric: ~1 W/port for 64-radix; ~6 dB power penalty; faster, ns-scale switching [3]. Transceiver launch power is calculated for the worst-case path; the laser is assumed always ON at 5% wall-plug efficiency. Model total consumption: (transceivers + switches) / (node injection bandwidth). [1] S. Rumley et al., ISC High Performance 2015. [2] Calient S320 OCS switch. [3] Q. Cheng, A. Wonfor, J.L. Wei, R.V. Penty, I.H. White, Optics Letters 39(18), 2014.

42 PhoenixSim Exascale interconnect power consumption Case 1: MEMS based 134 nodes (25 TFs) connected to each switch (4 depicted here) 299 switches per plane (10 here) 0.05 B/F 53,521 inter-switch connections per plane Up to 3 switches to be traversed (10.5dB) Aggregate line rate: 2 Tb/s (80 x 25Gb/s) 5 planes required to reach 10 Tb/s Launch power per wavelength: -2.22dBm Total consumption: 364 kw Resulting efficiency (end-to-end): 0.91 pj/bit
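The reported efficiency can be cross-checked from the slide's own figures (node count and injection rate from slide 41; a pure arithmetic sketch):

```python
# End-to-end efficiency = total network power / total injected bandwidth.

nodes = 40_000                 # compute nodes
injection_per_node = 10e12     # 10 Tb/s injection per node
total_power = 364e3            # 364 kW for Case 1 (MEMS based)

total_bits_per_s = nodes * injection_per_node        # 4e17 b/s
pj_per_bit = round(total_power / total_bits_per_s * 1e12, 2)
print(pj_per_bit)  # 0.91 pJ/bit, matching the slide
```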

43 PhoenixSim component breakdown analysis, Case 1: MEMS based [Bar chart: end-to-end energy efficiency (pJ/bit), broken into mod/demod, laser, and switch contributions, for: baseline; 3 dB PP (instead of 3.5 dB); 3D-MEMS consumption only; 10% laser WPE (instead of 5%); 960 ports (instead of 320)]

44 PhoenixSim Exascale interconnect power consumption, Case 2: SOA-MZI based 8 nodes connected to each switch (2 depicted here); 5000 switches (32 ports) per plane; 0.05 B/F. Up to 4 switches to be traversed (19.6 dB); 120,000 inter-switch connections per plane. 34 channels at ~59 Gb/s, totaling 2 Tb/s; 5 parallel planes. Launch power per wavelength: 6.27 dBm. Total power consumption: ~2.8 MW. Resulting efficiency (end-to-end): 6.9 pJ/bit.

45 Another factor: optical circuit switching Optical circuit switching has inherently low average utilization. Low utilization is a direct consequence of circuit switching: streaming circuit data cannot be slowed once in motion.

46 OCS: why low average utilizations The optical circuit is the transmission link. When a switch turns, no transmission can occur: turning the switch means breaking circuits, so no circuits can be active over a turning switch. Unless the circuit is never reconfigured, a circuit switch cannot be 100% utilized. Utilization can be high if reconfiguration time << circuit ON time; it is poor if reconfiguration time >= circuit ON time. [Diagram: optical switching (a unique circuit end to end) vs. packet (electrical) switching (input circuit, Xbar circuit, output circuit)]

47 Packet durations shrink with increased bandwidth Packet durations will trend to ~1-10 ns.

| Aggregate line rate | 100 B | 1 KB | 10 KB | 100 KB |
| 100 Gb/s | 8 ns | 80 ns | 800 ns | 8 us |
| 400 Gb/s | 2 ns | 20 ns | 200 ns | 2 us |
| 1 Tb/s | 800 ps | 8 ns | 80 ns | 800 ns |
| 2.5 Tb/s | 320 ps | 3.2 ns | 32 ns | 320 ns |
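The table above follows from duration = packet bits / line rate; a quick sketch (pure arithmetic):

```python
# Serialization time of a packet at a given aggregate line rate.

def duration_ns(packet_bytes, line_rate_bps):
    """Packet serialization time in ns."""
    return round(packet_bytes * 8 / line_rate_bps * 1e9, 3)

print(duration_ns(100, 100e9))       # 8.0 ns
print(duration_ns(100_000, 100e9))   # 8000.0 ns = 8 us
print(duration_ns(100, 2.5e12))      # 0.32 ns = 320 ps
print(duration_ns(1_000, 1e12))      # 8.0 ns
```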

48 Impact of optical circuit switching on utilization Link unavailability time is composed of switch configuration (optical path) and link re-establishment (equilibration, preamble, etc.).

Resulting utilization, worst case (switch turns after every packet):
| Packet duration \ unavailability | 1 ns | 10 ns | 100 ns |
| 100 ns | 99% | 91% | 50% |
| 10 ns | 91% | 50% | 9% |
| 1 ns | 50% | 9% | 1% |

Resulting utilization (switch turns after every second packet):
| Packet duration \ unavailability | 1 ns | 10 ns | 100 ns |
| 100 ns | 99% | 95% | 66% |
| 10 ns | 95% | 66% | 16% |
| 1 ns | 66% | 16% | 2% |

Need circuit down time of no more than ~1 ns!
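The two tables come from simple duty-cycle formulas; a minimal sketch (function names are illustrative):

```python
# Link utilization under circuit switching, given packet duration and
# link unavailability (switch turn + re-establishment), both in ns.

def util_every_packet(packet_ns, downtime_ns):
    """Worst case: the switch turns after every packet."""
    return packet_ns / (packet_ns + downtime_ns)

def util_every_second_packet(packet_ns, downtime_ns):
    """The switch turns after every second packet."""
    return 2 * packet_ns / (2 * packet_ns + downtime_ns)

print(round(util_every_packet(100, 100), 2))        # 0.5  (50%)
print(round(util_every_packet(10, 1), 2))           # 0.91 (91%)
print(round(util_every_second_packet(10, 10), 2))   # 0.67 (~66%)
```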

49 What about the laser energy consumption? Baseline case: 10 Gb/s per wavelength; detector sensitivity: -20 dBm; link optical budget including modulation: 10 dB. Launch power: -10 dBm = 0.1 mW. Laser "wall plug" efficiency: 10%, so laser power: 1 mW. Laser contribution to energy consumption: 0.1 pJ/bit* (*assuming no additional power penalties due to WDM).
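The budget chain above, end to end (a sketch using only the slide's numbers):

```python
# Laser energy per bit from receiver sensitivity, link budget,
# wall-plug efficiency, and line rate.

sensitivity_dbm = -20.0
link_budget_db = 10.0
wall_plug_eff = 0.10
line_rate = 10e9                                   # 10 Gb/s per wavelength

launch_dbm = sensitivity_dbm + link_budget_db      # -10 dBm
launch_w = 10 ** (launch_dbm / 10) * 1e-3          # 0.1 mW optical
laser_w = launch_w / wall_plug_eff                 # 1 mW electrical
pj_per_bit = round(laser_w / line_rate * 1e12, 3)
print(pj_per_bit)  # 0.1 pJ/bit at full utilization
```

At 10% link utilization the same laser power is amortized over 10x fewer bits, which is exactly the 10 dB penalty discussed on the next slides.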

50 The role of link utilization in energy consumption Assume the laser is ON continuously but the link carries real data traffic only 10% of the time. Energy efficiency is inversely proportional to utilization: with 10% utilization, the laser alone consumes the full 1 pJ/bit budget.

51 Laser energy consumption vs. utilization trade-off [Figure: energy efficiency (pJ/bit) vs. link utilization (1% to 100%)] 10% utilization adds 10 dB. Increase energy efficiency by: improved laser efficiency, reduced launch power, better receiver sensitivity, reduced link power penalties. Need a combined factor of 10X improvement to achieve 0.1 pJ/bit at 10% network utilization.

52 Low average utilization is desirable for performance Why is low utilization advantageous? Consider a close-to-100% utilization case: low utilization is needed to guarantee low queuing delays. In particular, queuing of synchronization messages threatens parallel efficiency. S. Rumley et al., "A Synthetic Task Model for HPC-Grade Optical Network Performance Evaluation," IA^

53 Need for ns-scale energy proportionality [Figure: transmission efficiency (pJ/b) vs. number of 10 Gb/s channels, for 1 KB and 100 KB packets, with setup times of 10 ns, 100 ns, 1 us, and 10 us, and with the laser always on] 1 KB packets require setup times of at most ~100 ns, and ~10 ns dynamic proportionality is optimal.

54 Latency performance impact [Figure: latency for 100 KB and 1 KB packets] Head-to-tail latency includes both queuing and serialization times. Keeping the laser ON yields the best performance but the highest energy cost. Adding channels improves performance (reduces serialization times). A laser setup time > 100 ns inflicts a substantial penalty.

55 Performance-energy WIN with dynamic proportionality [Figure: transmission efficiency (pJ/b) vs. average head-to-tail latency (ns) for 1 KB packets at Tb/s rates, with setup times of 10 ns, 100 ns, 1 us, and 10 us, and with the laser always on; adding wavelengths moves along each curve] For Tb/s rates, 1 KB packets require dynamic energy proportionality of ~10 ns. High performance, ultra-low latency AND low energy/bit with dynamic, energy-proportional sources.

56 Summary HPC scalability drives increased interconnect bandwidth: aggregated compute power (needed Byte/s) and growing parallelism and distributed algorithms (B/F). System-wide connectivity and data movement bandwidth are key to performance and scalability. Energy consumption, total interconnection network budget: 0.1 B/F at 50 GigaFLOP/J implies 1 pJ/bit switches and 0.25 pJ/bit links. Laser power: at 1 mW and 10% wall-plug efficiency, the laser consumes 0.1 pJ/bit at 100% utilization; 10% network utilization adds 10 dB, to 1 pJ/bit; a combined 10X improvement is needed to regain 0.1 pJ/bit at 10% network utilization. Unless a circuit is never reconfigured, it cannot be 100% utilized: utilization is high only if reconfiguration << circuit ON time, poor if reconfiguration >= circuit ON time. Packets last 1-10 ns for 1 KB at ~Tbit/s scale, so circuit down time must be minimized, and traffic patterns impact arbitration. Energy proportionality is key.


More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

Optical Interconnection Networks in Data Centers: Recent Trends and Future Challenges

Optical Interconnection Networks in Data Centers: Recent Trends and Future Challenges Optical Interconnection Networks in Data Centers: Recent Trends and Future Challenges Speaker: Lin Wang Research Advisor: Biswanath Mukherjee Kachris C, Kanonakis K, Tomkos I. Optical interconnection networks

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

CMOS Photonic Processor-Memory Networks

CMOS Photonic Processor-Memory Networks CMOS Photonic Processor-Memory Networks Vladimir Stojanović Integrated Systems Group Massachusetts Institute of Technology Acknowledgments Krste Asanović, Rajeev Ram, Franz Kaertner, Judy Hoyt, Henry Smith,

More information

THE PATH TO EXASCALE COMPUTING. Bill Dally Chief Scientist and Senior Vice President of Research

THE PATH TO EXASCALE COMPUTING. Bill Dally Chief Scientist and Senior Vice President of Research THE PATH TO EXASCALE COMPUTING Bill Dally Chief Scientist and Senior Vice President of Research The Goal: Sustained ExaFLOPs on problems of interest 2 Exascale Challenges Energy efficiency Programmability

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

The Road from Peta to ExaFlop

The Road from Peta to ExaFlop The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Switch Datapath in the Stanford Phictious Optical Router (SPOR)

Switch Datapath in the Stanford Phictious Optical Router (SPOR) Switch Datapath in the Stanford Phictious Optical Router (SPOR) H. Volkan Demir, Micah Yairi, Vijit Sabnis Arpan Shah, Azita Emami, Hossein Kakavand, Kyoungsik Yu, Paulina Kuo, Uma Srinivasan Optics and

More information

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp. Networks for Multi-core hips A A ontrarian View Shekhar Borkar Aug 27, 2007 Intel orp. 1 Outline Multi-core system outlook On die network challenges A simple contrarian proposal Benefits Summary 2 A Sample

More information

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper A Low Latency Solution Stack for High Frequency Trading White Paper High-Frequency Trading High-frequency trading has gained a strong foothold in financial markets, driven by several factors including

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 14: Photonic Interconnect Instructor: Ron Dreslinski Winter 2016 1 1 Announcements 2 Remaining lecture schedule 3/15: Photonics

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information

Steve Scott, Tesla CTO SC 11 November 15, 2011

Steve Scott, Tesla CTO SC 11 November 15, 2011 Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost

More information

Silicon Based Packaging for 400/800/1600 Gb/s Optical Interconnects

Silicon Based Packaging for 400/800/1600 Gb/s Optical Interconnects Silicon Based Packaging for 400/800/1600 Gb/s Optical Interconnects The Low Cost Solution for Parallel Optical Interconnects Into the Terabit per Second Age Executive Summary White Paper PhotonX Networks

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

FUTURE high-performance computers (HPCs) and data. Runtime Management of Laser Power in Silicon-Photonic Multibus NoC Architecture

FUTURE high-performance computers (HPCs) and data. Runtime Management of Laser Power in Silicon-Photonic Multibus NoC Architecture Runtime Management of Laser Power in Silicon-Photonic Multibus NoC Architecture Chao Chen, Student Member, IEEE, and Ajay Joshi, Member, IEEE (Invited Paper) Abstract Silicon-photonic links have been proposed

More information

Moving Forward with the IPI Photonics Roadmap

Moving Forward with the IPI Photonics Roadmap Moving Forward with the IPI Photonics Roadmap TWG Chairs: Rich Grzybowski, Corning (acting) Rick Clayton, Clayton Associates Integration, Packaging & Interconnection: How does the chip get to the outside

More information

170 Index. Delta networks, DENS methodology

170 Index. Delta networks, DENS methodology Index A ACK messages, 99 adaptive timeout algorithm, 109 format and semantics, 107 pending packets, 105 piggybacking, 107 schematic represenation, 105 source adapter, 108 ACK overhead, 107 109, 112 Active

More information

Hybrid Integration of a Semiconductor Optical Amplifier for High Throughput Optical Packet Switched Interconnection Networks

Hybrid Integration of a Semiconductor Optical Amplifier for High Throughput Optical Packet Switched Interconnection Networks Hybrid Integration of a Semiconductor Optical Amplifier for High Throughput Optical Packet Switched Interconnection Networks Odile Liboiron-Ladouceur* and Keren Bergman Columbia University, 500 West 120

More information

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary

More information

100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate

100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate 100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate @gonzopancho Agenda Edge Router Use Cases Need for Speed Cost, Flexibility, Control, Evolution The Engineering Challenge Solution

More information

Sort vs. Hash Join Revisited for Near-Memory Execution. Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot

Sort vs. Hash Join Revisited for Near-Memory Execution. Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot Sort vs. Hash Join Revisited for Near-Memory Execution Nooshin Mirzadeh, Onur Kocberber, Babak Falsafi, Boris Grot 1 Near-Memory Processing (NMP) Emerging technology Stacked memory: A logic die w/ a stack

More information

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad NoC Round Table / ESA Sep. 2009 Asynchronous Three Dimensional Networks on on Chip Frédéric ric PétrotP Outline Three Dimensional Integration Clock Distribution and GALS Paradigm Contribution of the Third

More information

OPTICAL INTERCONNECTS IN DATA CENTER. Tanjila Ahmed

OPTICAL INTERCONNECTS IN DATA CENTER. Tanjila Ahmed OPTICAL INTERCONNECTS IN DATA CENTER Tanjila Ahmed Challenges for Today s Data Centers Challenges to be Addressed : Scalability Low latency Energy Efficiency Lower Cost Challenges for Today s Data Center

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models Lecture: Memory, Multiprocessors Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models 1 Refresh Every DRAM cell must be refreshed within a 64 ms window A row

More information

AIM Photonics: Manufacturing Challenges for Photonic Integrated Circuits

AIM Photonics: Manufacturing Challenges for Photonic Integrated Circuits AIM Photonics: Manufacturing Challenges for Photonic Integrated Circuits November 16, 2017 Michael Liehr Industry Driving Force EXA FLOP SCALE SYSTEM Blades SiPh Interconnect Network Memory Stack HP HyperX

More information

InfiniBand SDR, DDR, and QDR Technology Guide

InfiniBand SDR, DDR, and QDR Technology Guide White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses

More information

EXASCALE COMPUTING: WHERE OPTICS MEETS ELECTRONICS

EXASCALE COMPUTING: WHERE OPTICS MEETS ELECTRONICS EXASCALE COMPUTING: WHERE OPTICS MEETS ELECTRONICS Overview of OFC Workshop: Organizers: Norm Jouppi HP Labs, Moray McLaren HP Labs, Madeleine Glick Intel Labs March 7, 2011 1 AGENDA Introduction. Moray

More information

Johnnie Chan and Keren Bergman VOL. 4, NO. 3/MARCH 2012/J. OPT. COMMUN. NETW. 189

Johnnie Chan and Keren Bergman VOL. 4, NO. 3/MARCH 2012/J. OPT. COMMUN. NETW. 189 Johnnie Chan and Keren Bergman VOL. 4, NO. 3/MARCH 212/J. OPT. COMMUN. NETW. 189 Photonic Interconnection Network Architectures Using Wavelength-Selective Spatial Routing for Chip-Scale Communications

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Future Routing Schemes in Petascale clusters

Future Routing Schemes in Petascale clusters Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract

More information

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 An Inter/Intra-Chip Optical Network for Manycore Processors Xiaowen Wu, Student Member, IEEE, JiangXu,Member, IEEE, Yaoyao Ye, Student

More information

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 9. Routing and Flow Control Intro What did we learn in the last lecture Topology metrics Including minimum diameter of directed

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Analyzing the Effectiveness of On-chip Photonic Interconnects with a Hybrid Photo-electrical Topology

Analyzing the Effectiveness of On-chip Photonic Interconnects with a Hybrid Photo-electrical Topology Analyzing the Effectiveness of On-chip Photonic Interconnects with a Hybrid Photo-electrical Topology Yong-jin Kwon Department of EECS, University of California, Berkeley, CA Abstract To improve performance

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

PSMC Roadmap For Integrated Photonics Manufacturing

PSMC Roadmap For Integrated Photonics Manufacturing PSMC Roadmap For Integrated Photonics Manufacturing Richard Otte Promex Industries Inc. Santa Clara California For the Photonics Systems Manufacturing Consortium April 21, 2016 Meeting the Grand Challenges

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Technology challenges and trends over the next decade (A look through a 2030 crystal ball) Al Gara Intel Fellow & Chief HPC System Architect

Technology challenges and trends over the next decade (A look through a 2030 crystal ball) Al Gara Intel Fellow & Chief HPC System Architect Technology challenges and trends over the next decade (A look through a 2030 crystal ball) Al Gara Intel Fellow & Chief HPC System Architect Today s Focus Areas For Discussion Will look at various technologies

More information

100 Gbit/s Computer Optical Interconnect

100 Gbit/s Computer Optical Interconnect 100 Gbit/s Computer Optical Interconnect Ivan Glesk, Robert J. Runser, Kung-Li Deng, and Paul R. Prucnal Department of Electrical Engineering, Princeton University, Princeton, NJ08544 glesk@ee.princeton.edu

More information

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing

More information

Electro-optic Switches Based on Space Switching of Multiplexed WDM Signals: Blocking vs Non-blocking Design Trade-offs

Electro-optic Switches Based on Space Switching of Multiplexed WDM Signals: Blocking vs Non-blocking Design Trade-offs 1 Electro-optic Switches Based on Space Switching of Multiplexed WDM Signals: Blocking vs Non-blocking Design Trade-offs Apostolos Siokis a,c, Konstantinos Christodoulopoulos b,c, Nikos Pleros d, Emmanouel

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Update on technical feasibility for PAM modulation

Update on technical feasibility for PAM modulation Update on technical feasibility for PAM modulation Gary Nicholl, Chris Fludger Cisco IEEE 80.3 NG00GE PMD Study Group March 0 PAM Architecture Overview [Gary Nicholl] PAM Link Modeling Analysis [Chris

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Hardware Evolution in Data Centers

Hardware Evolution in Data Centers Hardware Evolution in Data Centers 2004 2008 2011 2000 2013 2014 Trend towards customization Increase work done per dollar (CapEx + OpEx) Paolo Costa Rethinking the Network Stack for Rack-scale Computers

More information

The way toward peta-flops

The way toward peta-flops The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops

More information

Index 283. F Fault model, 121 FDMA. See Frequency-division multipleaccess

Index 283. F Fault model, 121 FDMA. See Frequency-division multipleaccess Index A Active buffer window (ABW), 34 35, 37, 39, 40 Adaptive data compression, 151 172 Adaptive routing, 26, 100, 114, 116 119, 121 123, 126 128, 135 137, 139, 144, 146, 158 Adaptive voltage scaling,

More information

Network Design Considerations for Grid Computing

Network Design Considerations for Grid Computing Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Package level Interconnect Options

Package level Interconnect Options Package level Interconnect Options J.Balachandran,S.Brebels,G.Carchon, W.De Raedt, B.Nauwelaers,E.Beyne imec 2005 SLIP 2005 April 2 3 Sanfrancisco,USA Challenges in Nanometer Era Integration capacity F

More information

New Approaches to Optical Packet Switching in Carrier Networks. Thomas C. McDermott Chiaro Networks Richardson, Texas

New Approaches to Optical Packet Switching in Carrier Networks. Thomas C. McDermott Chiaro Networks Richardson, Texas New Approaches to Optical Packet Switching in Carrier Networks Thomas C. McDermott Chiaro Networks Richardson, Texas Outline Introduction, Vision, Problem statement Approaches to Optical Packet Switching

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

1. INTRODUCTION light tree First Generation Second Generation Third Generation

1. INTRODUCTION light tree First Generation Second Generation Third Generation 1. INTRODUCTION Today, there is a general consensus that, in the near future, wide area networks (WAN)(such as, a nation wide backbone network) will be based on Wavelength Division Multiplexed (WDM) optical

More information

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University Material from: The Datacenter as a Computer: An Introduction to

More information

Exploiting Dark Silicon in Server Design. Nikos Hardavellas Northwestern University, EECS

Exploiting Dark Silicon in Server Design. Nikos Hardavellas Northwestern University, EECS Exploiting Dark Silicon in Server Design Nikos Hardavellas Northwestern University, EECS Moore s Law Is Alive And Well 90nm 90nm transistor (Intel, 2005) Swine Flu A/H1N1 (CDC) 65nm 45nm 32nm 22nm 16nm

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Photonics in computing: use more than a link for getting more than Moore

Photonics in computing: use more than a link for getting more than Moore Photonics in computing: use more than a link for getting more than Moore Nikos Pleros Photonics Systems and Networks (PhosNET) research group Dept. of Informatics, Aristotle Univ. of Thessaloniki, Center

More information

EDA for ONoCs: Achievements, Challenges, and Opportunities. Ulf Schlichtmann Dresden, March 23, 2018

EDA for ONoCs: Achievements, Challenges, and Opportunities. Ulf Schlichtmann Dresden, March 23, 2018 EDA for ONoCs: Achievements, Challenges, and Opportunities Ulf Schlichtmann Dresden, March 23, 2018 1 Outline Placement PROTON (nonlinear) PLATON (force-directed) Maze Routing PlanarONoC Challenges Opportunities

More information