EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect

Transcription

1 EECS 598: Integrating Emerging Technologies with Computer Architecture
Lecture 14: Photonic Interconnect
Instructor: Ron Dreslinski
Winter

2 Announcements
Remaining lecture schedule:
3/15: Photonics
3/17: Project Meetings
3/22: Student Presentations (2)
3/24: Student Presentations (2)
3/29: Student Presentations (2)
3/31: Student Presentations (2)
4/5: Student Presentations (2)
4/7: Student Presentations (2)
4/12: Project Writeup Due; Group Project Presentations (2 or 3)
4/14: Group Project Presentations (2 or 3)

3 Photonic Interconnect
Used heavily in the telecommunications industry.
Encodes data in photons (light) rather than electrons.
Multiple wavelengths of light provide natural communication channels over a single connection.
But can we integrate photonics into a CMOS system and use it for chip-to-chip or even on-chip communication?

4 Corona
Enter the Corona paper: "Corona: System Implications of Emerging Nanophotonic Technology." Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, and Jung Ho Ahn. In ISCA-35, Beijing, China, June
The paper discusses the use of 3D integration to provide all the components necessary for on-chip photonics, then addresses the architectural design.

5 How does it work on chip?
Figure (panels a to e). Labels: Ring resonator, SiGe Doped, Coupler, Waveguide. Annotations: can couple a selected wavelength; can pass the value onto another line; can detect the wavelength.
Using ring resonators, the system can couple only those signals that match a certain wavelength out of the waveguide, pass them between waveguides, or detect their presence.
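The three ring-resonator roles on this slide (couple a selected wavelength, pass it on, detect it) can be illustrated with a toy model. This is a minimal sketch, not the device physics: the waveguide is just a mapping from wavelength label to "lit or unlit", the function names `modulate`, `drop`, and `detect` are hypothetical, and the encoding (an active modulator ring removes its wavelength to send a 0) is an assumption made for illustration.

```python
# Toy model: a waveguide is a dict from wavelength label to True (lit) / False (unlit).
# All names and the bit encoding below are illustrative assumptions.

def modulate(waveguide, wavelength, bit):
    """Writer side: darken one wavelength to send a 0, leave it lit to send a 1."""
    out = dict(waveguide)
    out[wavelength] = out[wavelength] and bool(bit)
    return out

def drop(waveguide, wavelength):
    """A ring couples only its resonant wavelength out of the waveguide.

    Returns (remaining waveguide, whether light was coupled out).
    Other wavelengths pass by untouched.
    """
    out = dict(waveguide)
    coupled = out[wavelength]
    out[wavelength] = False
    return out, coupled

def detect(waveguide, wavelength):
    """Reader side: couple the wavelength out and report it as a bit."""
    remaining, coupled = drop(waveguide, wavelength)
    return remaining, int(coupled)

# Laser power: three wavelengths, all lit.
wg = {"r": True, "g": True, "b": True}
wg = modulate(wg, "r", 1)
wg = modulate(wg, "g", 0)  # the writer darkens green to send a 0
wg = modulate(wg, "b", 1)

wg, bit_g = detect(wg, "g")
wg, bit_r = detect(wg, "r")
print(bit_r, bit_g, wg)
```

Because each ring touches only its own wavelength, the blue channel is still lit after red and green have been read, which is the property that lets several channels share one waveguide.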

6 Putting them into a system
Figure labels (3D package stack): Fiber I/O, sTSVs, pgcTSVs, Heat Sink, Processor/L1 Die, Memory Controller/Directory/L2 Die, Analog Electronics Die, Optical Die, Package, Face to Face Bonds, Laser.

7 High-Level Architecture
Cluster-based approach (increases core count without increasing interconnect as much).
On-chip directory-based coherence.
A hub connects each cluster optically, on chip, to the other clusters.
Off-chip memory also uses optical connections to improve bandwidth.
Figure 2: Architecture Overview. (a) Cluster 0, Cluster 1, through Cluster 63 on the optical interconnect. (b) Labels within a cluster: Core, Shared L2 Cache, Directory, Memory Controller, Hub, Network Interface, Optical Interconnect.

8 More Detailed Architecture
Figure 3: Layout with Serpentine Crossbar and Resonator Ring Detail. Labels: Core Die, Cores 0 to 3, L1-I, L1-D, L1/L2 Interface, Through Silicon Via Array, Cache Die, L2 Cache, MC, Directory, Hub, NI, My X-bar Connection, Peer X-bar Connection, Optical Die, Star Coupler, Laser, Splitters, Detectors, Injectors, Broadcast, Arbitration, Crossbar, 4-waveguide bundles, Optically Connected Memory, Photonic Subsystem, Waveguides, Ring Resonators.
Text fragment on the slide: "ensures that the memory bandwidth grows linearly with increased core count, and it provides local memory accessible"

9 Multiple writer single reader (MWSR) interconnects
Latchless/wave-pipelined.
Arbitration prevents corruption of in-flight data.
Source: Mikko Lipasti, University of Wisconsin

10 Arbitration solutions
Token Channel: single token, serial writes. Token passing allows the token to pace the transmission tail (no bubbles).
Token Slot: multiple tokens, simultaneous writes. Token passing allows the token to directly precede the slot.
Source: Mikko Lipasti, University of Wisconsin

11 Token Protocol
Assign each cluster its own wavelength (color) to read.
How do you prevent more than one writer on a given wavelength (color)? Have a token that circulates through the system to indicate who is allowed to write.
This leads to underutilization of the potential interconnect (when the token is at a node that doesn't need it).
Figure labels: Cluster 0, Cluster 1, Cluster 2; injectors; detectors; Arbitration WG; Power WG; home cluster; wavelengths r, g, b; Active Ring Resonator; Inactive Ring Resonator; Lit; Unlit.
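The underutilization point can be made concrete with a small simulation of one reader's wavelength. This is a minimal sketch under stated assumptions, not the Corona arbitration hardware: the function `simulate_token` is hypothetical, the token advances one cluster per step, and a cluster holding the token sends at most one message before passing it on.

```python
def simulate_token(num_clusters, pending, steps):
    """Circulate a single token for one reader's wavelength.

    pending[c] is how many messages cluster c wants to send to the reader.
    Each step the token sits at one cluster. If that cluster has a message
    it sends exactly one; otherwise the step is wasted.
    Returns (list of senders in order, number of idle steps).
    """
    pending = list(pending)  # copy so the caller's list is untouched
    holder = 0
    sent_log = []
    idle_steps = 0
    for _ in range(steps):
        if pending[holder] > 0:
            pending[holder] -= 1
            sent_log.append(holder)
        else:
            idle_steps += 1
        holder = (holder + 1) % num_clusters  # pass the token along the ring
    return sent_log, idle_steps

# Only cluster 1 has traffic for this reader, yet the token still visits everyone.
log, idle = simulate_token(num_clusters=4, pending=[0, 3, 0, 0], steps=8)
print(log, idle)
```

In this run only two of cluster 1's three messages get out in eight steps, and the remaining six steps are spent with the token at clusters that have nothing to send.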

12 Evaluation Criteria

Resource | Value
Number of clusters | 64
Per cluster: L2 cache size/assoc | 4 MB/16-way
Per cluster: L2 cache line size | 64 B
Per cluster: L2 coherence | MOESI
Per cluster: Memory controllers | 1
Per cluster: Cores | 4
Per core: L1 ICache size/assoc | 16 KB/4-way
Per core: L1 DCache size/assoc | 32 KB/4-way
Per core: L1 I & D cache line size | 64 B
Per core: Frequency | 5 GHz
Per core: Threads | 4
Per core: Issue policy | In-order
Per core: Issue width | 2
Per core: 64 b floating point SIMD width | 4
Per core: Fused floating point operations | Multiply-Add

Resource |  | ECM
Memory controllers |  |
External connectivity | 256 fibers | 1536 pins
Channel width | 128 b half duplex | 12 b full duplex
Channel data rate | 10 Gb/s | 10 Gb/s
Memory bandwidth | TB/s | 0.96 TB/s
Memory latency | 20 ns | 20 ns

Synthetic Benchmark | Description | # Network Requests
Uniform | Uniform random | 1 M
Hot Spot | All clusters to one cluster | 1 M
Tornado | Cluster $(i, j)$ to cluster $((i + \lfloor k/2 \rfloor - 1) \bmod k,\ (j + \lfloor k/2 \rfloor - 1) \bmod k)$, where $k$ = network's radix | 1 M
Transpose | Cluster $(i, j)$ to cluster $(j, i)$ | 1 M

SPLASH-2 Benchmark | Data Set: Experimental (Default) | # Network Requests
Barnes | 64 K particles (16 K) | 7.2 M
Cholesky | tk29.o (tk15.o) | 0.6 M
FFT | 16 M points (64 K) | 176 M
FMM | 1 M particles (16 K) | 1.8 M
LU | matrix ( ) | 34 M
Ocean | grid ( ) | 240 M
Radiosity | roomlarge (room) | 4.2 M
Radix | 64 M integers (1 M) | 189 M
Raytrace | balls4 (car) | 0.7 M
Volrend | head (head) | 3.6 M
Water-Sp | 32 K molecules (512) | 3.2 M

Table 3: Benchmarks and Configurations
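The four synthetic traffic patterns in the table are destination functions over cluster coordinates, and a short sketch makes them easier to compare. The code below is illustrative only: the function names are hypothetical, the Tornado entry is read as ((i + floor(k/2) - 1) mod k, (j + floor(k/2) - 1) mod k), the hot-spot target (0, 0) is an arbitrary choice, and arranging the 64 clusters as an 8 x 8 grid with radix k = 8 is an assumption.

```python
import random

def uniform(i, j, k, rng=random):
    """Uniform random: any cluster in the k x k grid is equally likely."""
    return (rng.randrange(k), rng.randrange(k))

def hot_spot(i, j, k, target=(0, 0)):
    """Hot spot: every cluster sends to the same (assumed) target cluster."""
    return target

def tornado(i, j, k):
    """Tornado: shift both coordinates by floor(k/2) - 1, wrapping at k."""
    shift = k // 2 - 1
    return ((i + shift) % k, (j + shift) % k)

def transpose(i, j, k):
    """Transpose: cluster (i, j) sends to cluster (j, i)."""
    return (j, i)

k = 8  # assumed: 64 clusters laid out as an 8 x 8 grid
src = (1, 5)
print("tornado  ", tornado(*src, k))
print("transpose", transpose(*src, k))
print("hot spot ", hot_spot(*src, k))
print("uniform  ", uniform(*src, k))
```

With k = 8 the tornado shift is 3, so source (1, 5) maps to (4, 0) after wrap-around, while transpose maps it to (5, 1).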

13 Speedup
Figure 8: Normalized Speedup. Y-axis: normalized speedup. X-axis: Uniform, Hot Spot, Tornado, Transpose, Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-Sp. Series: LMesh/ECM, HMesh/ECM, LMesh/, HMesh/, XBar/.

14 Bandwidth
Figure 9: Achieved Bandwidth. Y-axis: bandwidth (TB/s). X-axis: Uniform, Hot Spot, Tornado, Transpose, Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-Sp. Series: LMesh/ECM, HMesh/ECM, LMesh/, HMesh/, XBar/.
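As a reference point for the bandwidth axis, the 0.96 TB/s peak memory bandwidth listed for ECM in the evaluation table can be reproduced from the other table entries. This is a back-of-the-envelope sketch, assuming one memory controller per cluster across 64 clusters, that every bit of the channel runs at the 10 Gb/s channel data rate, and that the figure counts one direction of the full-duplex channel. The function name is hypothetical.

```python
def peak_memory_bandwidth_tbps(controllers, channel_bits, gbps_per_bit):
    """Peak bandwidth in TB/s.

    controllers * channel_bits * gbps_per_bit gives Gb/s;
    divide by 8 for GB/s and by 1000 for TB/s.
    """
    total_gbps = controllers * channel_bits * gbps_per_bit
    return total_gbps / 8 / 1000

# ECM column: 64 controllers, 12 b channel, 10 Gb/s per bit.
ecm = peak_memory_bandwidth_tbps(controllers=64, channel_bits=12, gbps_per_bit=10)

# Same arithmetic applied to the 128 b channel in the other column (a calculation, not a quoted value).
wide = peak_memory_bandwidth_tbps(controllers=64, channel_bits=128, gbps_per_bit=10)

print(ecm, wide)
```

The ECM result matches the table entry. The pin count is consistent with the same reading: 64 controllers times 12 bits times 2 directions is 1536.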

15 L2 Miss Latency
Figure 10: Average L2 Miss Latency. Y-axis: average request latency (ns). X-axis: Uniform, Hot Spot, Tornado, Transpose, Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-Sp. Series: LMesh/ECM, HMesh/ECM, LMesh/, HMesh/, XBar/.

16 Power
Figure 11: On-chip Network Power. Y-axis: power (W). X-axis: Uniform, Hot Spot, Tornado, Transpose, Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-Sp. Series: LMesh/ECM, HMesh/ECM, LMesh/, HMesh/, XBar/.


More information

The SPLASH-2 Programs: Characterization and Methodological Considerations

The SPLASH-2 Programs: Characterization and Methodological Considerations Appears in the Proceedings of the nd Annual International Symposium on Computer Architecture, pages -36, June 995 The SPLASH- Programs: Characterization and Methodological Considerations Steven Cameron

More information

PREDICTION MODELING FOR DESIGN SPACE EXPLORATION IN OPTICAL NETWORK ON CHIP

PREDICTION MODELING FOR DESIGN SPACE EXPLORATION IN OPTICAL NETWORK ON CHIP PREDICTION MODELING FOR DESIGN SPACE EXPLORATION IN OPTICAL NETWORK ON CHIP SARA KARIMI A Thesis in The Department Of Electrical and Computer Engineering Presented in Partial Fulfillment of the Requirements

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

OWN: Optical and Wireless Network-on-Chip for Kilo-core Architectures

OWN: Optical and Wireless Network-on-Chip for Kilo-core Architectures OWN: Optical and Wireless Network-on-Chip for Kilo-core Architectures Md Ashif I Sikder, Avinash K Kodi, Matthew Kennedy and Savas Kaya School of Electrical Engineering and Computer Science Ohio University

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

Low-Power Reconfigurable Network Architecture for On-Chip Photonic Interconnects

Low-Power Reconfigurable Network Architecture for On-Chip Photonic Interconnects Low-Power Reconfigurable Network Architecture for On-Chip Photonic Interconnects I. Artundo, W. Heirman, C. Debaes, M. Loperena, J. Van Campenhout, H. Thienpont New York, August 27th 2009 Iñigo Artundo,

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

A Tuneable Software Cache Coherence Protocol for Heterogeneous MPSoCs. Marco Bekooij & Frank Ophelders

A Tuneable Software Cache Coherence Protocol for Heterogeneous MPSoCs. Marco Bekooij & Frank Ophelders A Tuneable Software Cache Coherence Protocol for Heterogeneous MPSoCs Marco Bekooij & Frank Ophelders Outline Context What is cache coherence Addressed challenge Short overview of related work Related

More information

Supporting Distributed Shared Memory. Axel Jantsch Xiaowen Chen, Zhonghai Lu Royal Institute of Technology, Sweden September 16, 2009

Supporting Distributed Shared Memory. Axel Jantsch Xiaowen Chen, Zhonghai Lu Royal Institute of Technology, Sweden September 16, 2009 Supporting Distributed Shared Memory Axel Jantsch Xiaowen Chen, Zhonghai Lu Royal Institute of Technology, Sweden September 16, 2009 Memory content in today s SoCs 3 Elements in SoC Processing: Well understood;

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, intro to multi-thread programming models

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, intro to multi-thread programming models Lecture: Memory, Coherence Protocols Topics: wrap-up of memory systems, intro to multi-thread programming models 1 Refresh Every DRAM cell must be refreshed within a 64 ms window A row read/write automatically

More information

Architectures. A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree.

Architectures. A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Dynamic Bandwidth and Laser Scaling for CPU-GPU Heterogenous Network-on-Chip Architectures A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University In partial

More information

A Case Study of Signal-to-Noise Ratio in Ring-Based Optical Networks-on-Chip

A Case Study of Signal-to-Noise Ratio in Ring-Based Optical Networks-on-Chip A Case Study of Signal-to-Noise Ratio in Ring-Based Optical Networks-on-Chip Luan H. K. Duong, Jiang Xu, Xiaowen Wu, Zhehui Wang, and Peng Yang Hong Kong University of Science and Technology Sébastien

More information

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks Department of Computer Science and Engineering, Texas A&M University Technical eport #2010-3-1 seudo-circuit: Accelerating Communication for On-Chip Interconnection Networks Minseon Ahn, Eun Jung Kim Department

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

Lect. 6: Directory Coherence Protocol

Lect. 6: Directory Coherence Protocol Lect. 6: Directory Coherence Protocol Snooping coherence Global state of a memory line is the collection of its state in all caches, and there is no summary state anywhere All cache controllers monitor

More information

NANOPHOTONIC INTERCONNECT ARCHITECTURES FOR MANY-CORE MICROPROCESSORS

NANOPHOTONIC INTERCONNECT ARCHITECTURES FOR MANY-CORE MICROPROCESSORS NANOPHOTONIC INTERCONNECT ARCHITECTURES FOR MANY-CORE MICROPROCESSORS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Proposal for Thesis Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

More information

Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory

Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory Colin Blundell (University of Pennsylvania) Joe Devietti (University of Pennsylvania) E Christopher Lewis (VMware,

More information

International Journal of Advanced Research in Computer Engineering &Technology (IJARCET) Volume 2, Issue 8, August 2013

International Journal of Advanced Research in Computer Engineering &Technology (IJARCET) Volume 2, Issue 8, August 2013 ISSN: 2278 323 Signal Delay Control Based on Different Switching Techniques in Optical Routed Interconnection Networks Ahmed Nabih Zaki Rashed Electronics and Electrical Communications Engineering Department

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Future Memory and Interconnect Technologies

Future Memory and Interconnect Technologies Future Memory and Interconnect Technologies Yuan Xie Pennsylvania State University, USA AMD Research, Advanced Micro Devices, Inc., USA Email: yuanxie@cse.psu.edu Abstract The improvement of the computer

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers

TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers Pablo Abad, Pablo Prieto, Lucia Menezo, Adrian Colaso, Valentin Puente, Jose-Angel Gregorio University

More information