Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1

Size: px
Start display at page:

Download "Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1"

Transcription

1 Future of Interconnect Fabric A ontrarian View Shekhar Borkar June 13, 2010 Intel orp. 1

2 Outline Evolution of interconnect fabric On die network challenges Some simple contrarian proposals Evaluation and benefits Summary 2

3 Good-old Bus hip hip hip hip Good at board level, does not extend well Transmission lines Loss, signal integrity Need signal termination Limited frequency Width limited by pins, board area Signal processing power Frequency Width Energy/bit Data rate Power MHz bit 1 nj 1-10 GB/sec 10+ W 3

4 Point-to-point Bus hip hip Fast signaling, long distance Between chips, boards, racks High frequency, narrow links 1D Ring, 2D Mesh and Torus to reduce latency Higher logic complexity and latency in each node Frequency MHz Width 8-32 bit Energy/bit 0.5 nj Bisection bandwidth 50+ GB/sec Power 200+ W Emergence of packet switched network 4

5 ASI Red The first TFLOP 32 X 32 X 2 Mesh Interconnect Simultaneous bidirectional signaling 50 GB/sec bisection bandwidth 5

6 21.72mm The 80-ore TFLOP hip (2006) 12.64mm I/O Area single tile 1.5mm 2.0mm 8 X 10 Mesh PLL I/O Area TAP 32 bit links 320 GB/sec bisection 5 GHz 6

7 DDR3 M DDR3 M 21.4mm DDR3 M DDR3 M 48 ore Single hip loud (2009) 26.5mm TIL E PLL TIL E JTAG VR System Interface + I/O 2 ore clusters in 6 X 4 Mesh (why not 6 x 8?) 128 bit links 256 GB/sec bisection 2 GHz 7

8 Power Breakdown 80 ore TFLOP hip 48 ore S hip lock dist. 11% Dual FPMAs 36% M & DDR % ores 70% Router + Links 28% 10-port RF 4% IMEM + DMEM 21% Routers & 2Dmesh 10% Global locking 1% Does pt-to-pt packet switched network on a chip make sense? 8

9 A Sample Multi-core System 10mm ore ache 50% 50% 65nm, 4 ores 1V, 3GHz 10mm die, 5mm each core ore Logic: 6MT, ache: 44MT Total transistors: 200M 45nm 10mm 32nm 10mm 22nm 10mm 16nm 10mm 8 ores, 1V, 3GHz 3.5mm each core 16 ores, 1V, 3GHz 2.5mm each core 32 ores, 1V, 3GHz 1.8mm each core 64 ores, 1V, 3GHz 1.3mm each core Total: 400MT Total: 800MT Total: 1.6BT Total: 3.2BT 9

10 A Sample M Network 5mm 0.4mm Packet Switched Mesh 16B=128 bit each direction 1.5u pitch 192GB/s Bisection BW Tech ore (mm) Port size (mm) 65nm nm nm nm nm Bisection BW GB/sec@3GHz 10

11 Switch Energy/bit (pj) Network Power (W) Mesh 3GHz, 1V Measured X 1.4X 1 Extrapolated u 0.18u 65nm 22nm 8nm 0 65nm 45nm 32nm 22nm 16nm 1. Network power will be too high 2. Worse if link width scales up each generation 3. ache coherency mechanism is complex 11

12 Delay (ps) Energy/bit (pj) Interconnect Delay & Energy u Pitch Length (mm)

13 Delay (ps) mw/bit Interconnect Delay & Power nm, 3GHz Router Delay Length (mm) 0 13

14 Packet Switched Interconnect STOP STOP STOP STOP 5mm 3-5 lk <1 lk 3-5 lk 5 lk 0.5mW 0.5mW 0.5mW 10mm 2-3 lk 3-5 lk 3-5 lk 6-7 lk 0.5mW 1mW 0.5mW 1. Router acts like STOP signs adds latency 2. Each hop consumes power (unnecessary) 14

15 Bus The Other Extreme Issues: Slow, < 300MHz Shared, limited scalability? Solutions: Repeaters to increase freq Wide busses for bandwidth Multiple busses for scalability Benefits: Power? Simpler cache coherency Move away from frequency, embrace parallelism 15

16 Repeated Bus (ircuit Switched) O R R R R R Arbitration: Each cycle for the next cycle Decision visible to all nodes Repeaters: Align repeater direction No driving contention R R R Assume: 10mm die, 1.5u bus pitch 50ps repeater delay ore (mm) Bus Seg Delay (ps) Max Bus Freq (GHz) 65nm nm nm nm nm Anders et al, A 2.9Tb/s 8W 64-ore ircuit-switched Network-on-hip in 45nm MOS, ESSIR 2008 Anders et al, A 4.1Tb/s Bisection-Bandwidth 560Gb/s/W Streaming ircuit-switched 8 8 Mesh Network-on-hip in 45nm MOS, ISS

17 Example of a Bus Repeater 17

18 Other Bus Enhancements Differential, low voltage swing Twisted to reduce cross-talk Optimal repeater placement Not necessarily at the core Higher bus frequency Wide bus, 1024 bit or more, transfer lots of data in one cycle Multiple busses for concurrency Employ interconnect engineering techniques 18

19 Bisection BW (GB/s) Bus BW (GB/s) Full Swing Bus Power (W) 100mV Swing Bus Power (W) Bus Power and Bandwidth X Includes bus and repeater power Full Swing 6 1X 0.1V Differential 5 1.4X nm 45nm 32nm 22nm 16nm 7 1.4X 65nm 45nm 32nm 22nm 16nm X 1.4X Mesh X 1.4X Bus nm 45nm 32nm 22nm 16nm 0 65nm 45nm 32nm 22nm 16nm 19

20 A ircuit Switched Network Routers 8x8 ircuitswitched No Packet-switched Request Plk Src 0 1 n Dest ircuit-switched Acknowledge lk ircuit-switched Data Transmission Routers lk 2mm links ircuit-switched No eliminates intra-route data storage Packet-switching used only for channel requests High bandwidth and energy efficiency (1.6 to 0.6 pj/bit) Anders et al, A 4.1Tb/s Bisection-Bandwidth 560Gb/s/W Streaming ircuit-switched 8 8 Mesh Network-on-hip in 45nm MOS, ISS

21 hannel Selection & Transmission ircuit-switched hannel Selection Router Queues 2 hannel Not Ready 1 2 Requested hannel Selected Slot hannel Ready Src Ack 1 Dest Ack Queue slots store multiple channel requests Two acknowledges required to indicate ready channel More possible channels 87% throughput increase 21

22 8 x 8 NO Die Photo lock 8x8 No I/O Traffic Generation and Measurement Ports ore North South East West Process 45nm Hi-K/MG MOS, 9 Metal u Nominal Supply 1.1V Arbitration and Router Logic Supports 512b data Data Bus Width 1b (2mm link distance) Number of Transistors 2.85M Die Area 6.25mm 2 22

23 Factors Affecting Latency Packet Switched Arbitration in each node, multiple arbitration cycles ircuit Switched Single arbitration for entire transaction Multiple hops from source to destination Pipelined data flow 3-5 lock latency in each node One time latency to establish a circuit Fast clock (3 GHz) Slow clock (1 GHz) One source and destination Broadcast possible 23

24 Truth in between Bus R Bus R Bus Bus to connect over short distances Bus R 2 nd Level Bus Bus R Hierarchy of Busses Or hierarchical circuit and packet switched networks 24

25 Link Width (a.u.) Bytes/Op Hierarchical & Heterogeneous Local Regional luster Global Local. Wide Slow 25

26 But wait, what about Optical? 65nm Optical: Pre-driver Driver VSEL TIA LA DeMUX DR lk Buffer Possible today Hopeful Source: Hirotaka Tamure (Fujitsu), ISS 08 Workshop on HS Transceivers 26

27 Summary Point to point busses are not necessary for on-die networks Rings and meshes were devised for point to point busses over long distances overkill for on chip network? Router power could be prohibitive Wide busses, circuit switched networks, show promise Hierarchical, heterogeneous, tapered, circuit and packet switched schemes suitable for on-die networks Go slower, wider, simpler, and efficient 27

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp. Networks for Multi-core hips A A ontrarian View Shekhar Borkar Aug 27, 2007 Intel orp. 1 Outline Multi-core system outlook On die network challenges A simple contrarian proposal Benefits Summary 2 A Sample

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Exascale Computing a fact or a fiction?

Exascale Computing a fact or a fiction? Exascale Computing a fact or a fiction? IPDPS 2013 Shekhar Borkar Intel Corp. May 21, 2013 This research was, in part, funded by the U.S. Government, DOE and DARPA. The views and conclusions contained

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Gigascale Integration Design Challenges & Opportunities Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Outline CMOS technology challenges Technology, circuit and μarchitecture solutions Integration

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub. es > 100 MB/sec Pentium 4 Processor L1 and L2 caches Some slides adapted from lecture by David Culler 3.2 GB/sec Display Memory Controller Hub RDRAM RDRAM Dual Ultra ATA/100 24 Mbit/sec Disks LAN I/O Controller

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary

More information

Brief Background in Fiber Optics

Brief Background in Fiber Optics The Future of Photonics in Upcoming Processors ECE 4750 Fall 08 Brief Background in Fiber Optics Light can travel down an optical fiber if it is completely confined Determined by Snells Law Various modes

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topics Taxonomy Metric Topologies Characteristics Cost Performance 2 Interconnection

More information

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1,

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1, Topics for Today Network Layer Introduction Addressing Address Resolution Readings Sections 5.1, 5.6.1-5.6.2 1 Network Layer: Introduction A network-wide concern! Transport layer Between two end hosts

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

INTERCONNECTION NETWORKS LECTURE 4

INTERCONNECTION NETWORKS LECTURE 4 INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source

More information

3D WiNoC Architectures

3D WiNoC Architectures Interconnect Enhances Architecture: Evolution of Wireless NoC from Planar to 3D 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan Sep 18th, 2014 Hiroki Matsutani, "3D WiNoC Architectures",

More information

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols

More information

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache Stefan Rusu Intel Corporation Santa Clara, CA Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

Interconnection Network

Interconnection Network Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) Topics

More information

Hybrid Memory Cube (HMC)

Hybrid Memory Cube (HMC) 23 Hybrid Memory Cube (HMC) J. Thomas Pawlowski, Fellow Chief Technologist, Architecture Development Group, Micron jpawlowski@micron.com 2011 Micron Technology, I nc. All rights reserved. Products are

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX

More information

Lecture 18: DRAM Technologies

Lecture 18: DRAM Technologies Lecture 18: DRAM Technologies Last Time: Cache and Virtual Memory Review Today DRAM organization or, why is DRAM so slow??? Lecture 18 1 Main Memory = DRAM Lecture 18 2 Basic DRAM Architecture Lecture

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

A 1.5GHz Third Generation Itanium Processor

A 1.5GHz Third Generation Itanium Processor A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1 Outline Processor highlights Process technology details Itanium processor evolution Block diagram

More information

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Silvano Gai 1 The sellable HPSR Seamless LAN/WLAN/SAN/WAN Network as a platform System-wide network intelligence as platform for

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Intel QuickPath Interconnect Electrical Architecture Overview

Intel QuickPath Interconnect Electrical Architecture Overview Chapter 1 Intel QuickPath Interconnect Electrical Architecture Overview The art of progress is to preserve order amid change and to preserve change amid order Alfred North Whitehead The goal of this chapter

More information

Platforms Design Challenges with many cores

Platforms Design Challenges with many cores latforms Design hallenges with many cores Raj Yavatkar, Intel Fellow Director, Systems Technology Lab orporate Technology Group 1 Environmental Trends: ell 2 *Other names and brands may be claimed as the

More information

Interconnection Networks

Interconnection Networks Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems Martin Schoeberl, Florian Brandner, Jens Sparsø, Evangelia Kasapaki Technical University of Denamrk 1 Real-Time Systems

More information

SoC Communication Complexity Problem

SoC Communication Complexity Problem When is the use of a Most Effective and Why MPSoC, June 2007 K. Charles Janac, Chairman, President and CEO SoC Communication Complexity Problem Arbitration problem in an SoC with 30 initiators: Hierarchical

More information

SCORPIO: 36-Core Shared Memory Processor

SCORPIO: 36-Core Shared Memory Processor : 36- Shared Memory Processor Demonstrating Snoopy Coherence on a Mesh Interconnect Chia-Hsin Owen Chen Collaborators: Sunghyun Park, Suvinay Subramanian, Tushar Krishna, Bhavya Daya, Woo Cheol Kwon, Brett

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

OPTICAL INTERCONNECTS IN DATA CENTER. Tanjila Ahmed

OPTICAL INTERCONNECTS IN DATA CENTER. Tanjila Ahmed OPTICAL INTERCONNECTS IN DATA CENTER Tanjila Ahmed Challenges for Today s Data Centers Challenges to be Addressed : Scalability Low latency Energy Efficiency Lower Cost Challenges for Today s Data Center

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods

NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods 1 NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods V. Venkatraman, A. Laffely, J. Jang, H. Kukkamalla, Z. Zhu & W. Burleson Interconnect Circuit

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

EE482C, L1, Apr 4, 2002 Copyright (C) by William J. Dally, All Rights Reserved. Today s Class Meeting. EE482S Lecture 1 Stream Processor Architecture

EE482C, L1, Apr 4, 2002 Copyright (C) by William J. Dally, All Rights Reserved. Today s Class Meeting. EE482S Lecture 1 Stream Processor Architecture 1 Today s Class Meeting EE482S Lecture 1 Stream Processor Architecture April 4, 2002 William J Dally Computer Systems Laboratory Stanford University billd@cslstanfordedu What is EE482C? Material covered

More information

Architecture or Parallel Computers CSC / ECE 506

Architecture or Parallel Computers CSC / ECE 506 Architecture or Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models 6/19/2006 Dr Steve Hunter Back to Basics Parallel Architecture = Computer Architecture + Communication Architecture

More information

Outline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued)

Outline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued) Cluster Computing Dichotomy of Parallel Computing Platforms (Continued) Lecturer: Dr Yifeng Zhu Class Review Interconnections Crossbar» Example: myrinet Multistage» Example: Omega network Outline Flynn

More information

EE482S Lecture 1 Stream Processor Architecture

EE482S Lecture 1 Stream Processor Architecture EE482S Lecture 1 Stream Processor Architecture April 4, 2002 William J. Dally Computer Systems Laboratory Stanford University billd@csl.stanford.edu 1 Today s Class Meeting What is EE482C? Material covered

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Application Performance on Dual Processor Cluster Nodes

Application Performance on Dual Processor Cluster Nodes Application Performance on Dual Processor Cluster Nodes by Kent Milfeld milfeld@tacc.utexas.edu edu Avijit Purkayastha, Kent Milfeld, Chona Guiang, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER Thanks Newisys

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it

More information

The Road from Peta to ExaFlop

The Road from Peta to ExaFlop The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units

More information

Tile Processor (TILEPro64)

Tile Processor (TILEPro64) Tile Processor Case Study of Contemporary Multicore Fall 2010 Agarwal 6.173 1 Tile Processor (TILEPro64) Performance # of cores On-chip cache (MB) Cache coherency Operations (16/32-bit BOPS) On chip bandwidth

More information

Index 283. F Fault model, 121 FDMA. See Frequency-division multipleaccess

Index 283. F Fault model, 121 FDMA. See Frequency-division multipleaccess Index A Active buffer window (ABW), 34 35, 37, 39, 40 Adaptive data compression, 151 172 Adaptive routing, 26, 100, 114, 116 119, 121 123, 126 128, 135 137, 139, 144, 146, 158 Adaptive voltage scaling,

More information

HyperTransport. Dennis Vega Ryan Rawlins

HyperTransport. Dennis Vega Ryan Rawlins HyperTransport Dennis Vega Ryan Rawlins What is HyperTransport (HT)? A point to point interconnect technology that links processors to other processors, coprocessors, I/O controllers, and peripheral controllers.

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges

More information

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers I N S T I T U T D E R E C H E R C H E T E C H N O L O G I Q U E L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers 10/04/2017 Les Rendez-vous de

More information

Different network topologies

Different network topologies Network Topology Network topology is the arrangement of the various elements of a communication network. It is the topological structure of a network and may be depicted physically or logically. Physical

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture 1 Physical Implementation of the DSPI etwork-on-chip in the FAUST Architecture Ivan Miro-Panades 1,2,3, Fabien Clermidy 3, Pascal Vivet 3, Alain Greiner 1 1 The University of Pierre et Marie Curie, Paris,

More information

The Processor That Don't Cost a Thing

The Processor That Don't Cost a Thing The Processor That Don't Cost a Thing Peter Hsu, Ph.D. Peter Hsu Consulting, Inc. http://cs.wisc.edu/~peterhsu DRAM+Processor Commercial demand Heat stiffling industry's growth Heat density limits small

More information

A 400Gbps Multi-Core Network Processor

A 400Gbps Multi-Core Network Processor A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,

More information

I/O Channels. RAM size. Chipsets. Cluster Computing Paul A. Farrell 9/8/2011. Memory (RAM) Dept of Computer Science Kent State University 1

I/O Channels. RAM size. Chipsets. Cluster Computing Paul A. Farrell 9/8/2011. Memory (RAM) Dept of Computer Science Kent State University 1 Memory (RAM) Standard Industry Memory Module (SIMM) RDRAM and SDRAM Access to RAM is extremely slow compared to the speed of the processor Memory busses (front side busses FSB) run at 100MHz to 800MHz

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

Large Scale Multiprocessors and Scientific Applications. By Pushkar Ratnalikar Namrata Lele

Large Scale Multiprocessors and Scientific Applications. By Pushkar Ratnalikar Namrata Lele Large Scale Multiprocessors and Scientific Applications By Pushkar Ratnalikar Namrata Lele Agenda Introduction Interprocessor Communication Characteristics of Scientific Applications Synchronization: Scaling

More information

New Approaches to Optical Packet Switching in Carrier Networks. Thomas C. McDermott Chiaro Networks Richardson, Texas

New Approaches to Optical Packet Switching in Carrier Networks. Thomas C. McDermott Chiaro Networks Richardson, Texas New Approaches to Optical Packet Switching in Carrier Networks Thomas C. McDermott Chiaro Networks Richardson, Texas Outline Introduction, Vision, Problem statement Approaches to Optical Packet Switching

More information

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: ABCs of Networks Starting Point: Send bits between 2 computers Queue

More information

The Tofu Interconnect 2

The Tofu Interconnect 2 The Tofu Interconnect 2 Yuichiro Ajima, Tomohiro Inoue, Shinya Hiramoto, Shun Ando, Masahiro Maeda, Takahide Yoshikawa, Koji Hosoe, and Toshiyuki Shimizu Fujitsu Limited Introduction Tofu interconnect

More information

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli Toward

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information