ECE/CS 757: Advanced Computer Architecture II Interconnects

Similar documents
ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures. Introduc1on. Prof. Natalie Enright Jerger

1/12/11. ECE 1749H: Interconnec3on Networks for Parallel Computer Architectures. Introduc3on. Interconnec3on Networks Introduc3on

Interconnection Networks

TDT Appendix E Interconnection Networks

Lecture 2: Topology - I

Lecture 1: Introduction

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Lecture 1: Introduction

Network on Chip Architecture: An Overview

Types of Computer Networks and their Topologies Three important groups of computer networks: LAN, MAN, WAN

BlueGene/L. Computer Science, University of Warwick. Source: IBM

Lecture 7: Flow Control - I

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Basic Low Level Concepts

Interconnection Networks and Clusters. Interconnection Networks and Clusters. Interconnection Networks and Clusters

Interconnection Networks

Lecture 9: MIMD Architecture

Interconnection Networks

Low-Power Interconnection Networks

Network-on-Chip Architecture

TYPES OF COMPUTER NETWORKS

Introduction to Computer Networks INTRODUCTION TO COMPUTER NETWORKS

Lecture 9: MIMD Architectures

IT 4504 Section 4.0. Network Architectures. 2008, University of Colombo School of Computing 1

Network Superhighway CSCD 330. Network Programming Winter Lecture 13 Network Layer. Reading: Chapter 4

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

F.1 Introduction F-2 F.2 Interconnecting Two Devices F-6 F.3 Connecting More Than Two Devices F-20 F.4 Network Topology F-30 F.

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Message Passing Models and Multicomputer distributed system LECTURE 7

What are Clusters? Why Clusters? - a Short History

CSCD 330 Network Programming

E.1 Introduction E-2 E.2 Interconnecting Two Devices E-5 E.3 Connecting More than Two Devices E-20 E.4 Network Topology E-29 E.

An introduction on the on-chip networks (NoC)

4. Networks. in parallel computers. Advances in Computer Architecture

Modular Platforms Market Trends & Platform Requirements Presentation for IEEE Backplane Ethernet Study Group Meeting. Gopal Hegde, Intel Corporation

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

ECE 551 System on Chip Design

INTERCONNECTION NETWORKS LECTURE 4

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

Transport is now key for extended SAN applications. Main factors required in SAN interconnect transport solutions are:

Multicomputer distributed system LECTURE 8

EE 6900: Interconnection Networks for HPC Systems Fall 2016

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

Communication Networks - 3 general areas: data communications, networking, protocols

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Basics of datacommunication

Interconnection Networks

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

High Performance Datacenter Networks

CS4961 Parallel Programming. Lecture 4: Memory Systems and Interconnects 9/1/11. Administrative. Mary Hall September 1, Homework 2, cont.

Introduction to Computer Science (I1100) Networks. Chapter 6

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Local Area Networks (LANs): Packets, Frames and Technologies Gail Hopkins. Part 3: Packet Switching and. Network Technologies.

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

Scientific Applications. Chao Sun

Parallel Computing Platforms

Fast Flexible FPGA-Tuned Networks-on-Chip

Real Parallel Computers

Network-on-chip (NOC) Topologies

Outline: Connecting Many Computers

Local Area Network Overview

ES1 An Introduction to On-chip Networks

Lecture 3: Topology - II

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Architecting the High Performance Storage Network

Lecture 3: Flow-Control

Outline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued)

The Impact of Optics on HPC System Interconnects

Digital Communication Networks

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System?

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

Uniprocessor Computer Architecture Example: Cray T3E

Lecture 15: NoC Innovations. Today: power and performance innovations for NoCs

Network.... communication system for connecting end- systems. End-systems a.k.a. hosts PCs, workstations dedicated computers network components

Prepared by Agha Mohammad Haidari Network Manager ICT Directorate Ministry of Communication & IT

Lecture 22: Router Design

Interconnect Routing

ECE 757 Review: Parallel Processors

Choosing the Right. Ethernet Solution. How to Make the Best Choice for Your Business

CMSC 611: Advanced. Interconnection Networks

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

Part IV: 3D WiNoC Architectures

Lecture 9: MIMD Architectures

Interconnect Technology and Computational Speed

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

Introduction. High Speed LANs. Emergence of High-Speed LANs. Characteristics of High Speed LANS. Text ch. 6, High-Speed Networks and

Switching and Forwarding Reading: Chapter 3 1/30/14 1

Data Center Interconnect Solution Overview

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27

3D WiNoC Architectures

1. What is a Computer Network? interconnected collection of autonomous computers connected by a communication technology

Transcription:

ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger

Lecture Outline Introduction to Networks Network Topologies Network Routing Network Flow Control Router Microarchitecture Mikko Lipasti-University of Wisconsin 2

Computer Architecture What is an Interconnection Network? Application Algorithm Programming Language Operating System Instruction Set Architecture Microarchitecture Register Transfer Level Circuits Devices Technology Logic Mem Logic Mem Communication Logic Mem Logic Mem 3

Computer Architecture What is an Interconnection Network? Application Algorithm Programming Language Operating System Instruction Set Architecture Microarchitecture Register Transfer Level Circuits Devices Technology Logic Mem Logic Mem Logic Mem Logic Mem Dedicated Channels Application: Ideally wants low-latency, high-bandwidth, dedicated channels between logic and memory Technology: Dedicated channels too expensive in terms of area and power 4

Computer Architecture What is an Interconnection Network? Application Algorithm Programming Language Operating System Instruction Set Architecture An Interconnection Network is a programmable system that transports data between terminals Technology: Interconnection network helps efficiently utilize scarce resources Application: Managing communication can be critical to performance Microarchitecture Register Transfer Level Circuits Devices Technology Logic Mem Logic Mem Interconnection Network Logic Mem Logic Mem 5

Interconnection Networks Introduction Interconnection networks should be designed to transfer the maximum amount of information within the least amount of time (and cost, power constraints) so as not to bottleneck the system 6

Types of Interconnection Networks Interconnection networks can be grouped into four domains Depending on number and proximity of devices to be connected On-Chip Networks (OCNs or NoCs) Devices include microarchitectural elements (functional units, register files), caches, directories, processors Current/Future systems: dozens, hundreds of devices Ex: Intel TeraFLOPS research prototypes 80 cores Intel Single-chip Cloud Computer 48 cores Proximity: millimeters 7

Types of Interconnection Networks (2) System/Storage Area Networks (SANs) Multiprocessor and multicomputer systems Interprocessor and processor-memory interconnections Server and data center environments Storage and I/O components Hundreds to thousands of devices interconnected IBM Blue Gene/L supercomputer (64K nodes, each with 2 processors) Maximum interconnect distance: tens of meters (typical) to a few hundred meters Examples (standards and proprietary) InfiniBand, Myrinet, Quadrics, Advanced Switching Interconnect 8

Types of Interconnection Networks (3) Local Area Networks (LANs) Interconnect autonomous computer systems Machine room or throughout a building or campus Hundreds of devices interconnected (1,000s with bridging) Maximum interconnect distance few kilometers to few tens of kilometers Example (most popular): Ethernet, with 10 Gbps over 40Km Wide Area Networks (WANs) Interconnect systems distributed across globe Internetworking support required Many millions of devices interconnected Max distance: many thousands of kilometers Example: ATM (asynchronous transfer mode) 9

Distance (meters) Interconnection Network Domains 10 6 10 3 10 0 Local Area Networks System/Storage Area Networks Metropolita n Area Networks Wide Area Networks 10-3 1 10 100 1,000 10,000 >100,000 Number of devices interconnected 10 Slide courtesy Timothy Mark Pinkston and José Duato

Distance (meters) Interconnection Network Domains 10 6 10 3 10 0 Local Area Networks System/Storage Area Networks Metropolitan Area Networks Wide Area Networks 10-3 On-Chip Interconnection Networks/Networks-on- Chip 1 10 100 1,000 10,000 >100,000 Number of devices interconnected 11 Slide courtesy Timothy Mark Pinkston and José Duato

Why Study Networks on Chip? 12

Why Study On-Chip Networks? 13

Example of Multi- and Many-Core Architectures 14

Why study interconnects? They provide external connectivity from system to outside world Also, connectivity within a single computer system at many levels I/O units, boards, chips, modules and blocks inside chips Trends: high demand on communication bandwidth increased computing power and storage capacity switched networks are replacing buses Computer architects/engineers must understand interconnect problems and solutions in order to more effectively design and evaluate systems Networks Slide (Enright courtesy Jerger) Timothy Mark Pinkston and José Duato 15

On-Chip Networks (OCN or NoCs) Why On-Chip Network? Ad-hoc wiring does not scale beyond a small number of cores Prohibitive area Long latency OCN offers scalability efficient multiplexing of communication often modular in nature (ease verification) 16

Differences between on-chip and offchip networks Significant research in multi-chassis interconnection networks (off-chip) Supercomputers Clusters of workstations Internet routers Leverage research and insight but Constraints are different New opportunities 17

Off-chip vs. on-chip Off-chip: I/O bottlenecks Pin-limited bandwidth Inherent overheads of off-chip I/O transmission On-chip Wiring constraints Metal layer limitations Horizontal and vertical layout Short, fixed length Repeater insertion limits routing of wires Avoid routing over dense logic Impact wiring density 18

Off-Chip vs On-Chip On-Chip Power Consume 10-15% or more of die power budget Latency Different order of magnitude Routers consume significant fraction of latency 19

New opportunities Abundant wiring Change in relative cost of wires and buffers Many short flits Tightly integrated into system Not commodity fully customized design Allows for optimization with uncore Cache coherence Emerging technology Optics 3D 20

On-Chip Network Evolution Ad hoc wiring Small number of nodes Buses and Crossbars Simplest variant of on-chip networks Low core counts Like traditional multiprocessors Bus traffic quickly saturates with a modest number of cores Crossbars: higher bandwidth Poor area and power scaling 21

Multicore Examples (1) Niagara 2: 8x9 crossbar (area ~= core) Rock: Hierarchical crossbar (5x5 crossbar connecting clusters of 4 cores) XBAR Sun Niagara 0 1 2 3 4 5 Networks (Enright Jerger) 0 1 2 3 4 5 22

Multicore Examples (2) RING IBM Cell Element Interconnect Bus 12 elements 4 unidirectional rings 16 Bytes wide Operates at 1.6 GHz IBM Cell 23

Many Core Example 2D MESH Intel TeraFLOPS 80 core prototype 5 GHz Each tile: Processing engine + on-chip network router

Many-Core Example (2): Intel SCC Intel s Single-chip Cloud Computer (SCC) uses a 2D mesh with state of the art routers Networks (Enright Jerger) 25

Saturation throughput Latency (sec) Performance and Cost Zero load latency Offered Traffic (bits/sec) Performance: latency and throughput Cost: area and power 26

Lecture Outline Introduction to Networks Network Topologies Network Routing Network Flow Control Router Microarchitecture Mikko Lipasti-University of Wisconsin 27

Readings Read: [18] N. Enright Jerger, L.-S. Pei, On-Chip Networks, Synthesis Lectures on Computer Architecture, http://www.morganclaypool.com/doi/abs/10.2200/s 00209ED1V01Y200907CAC008 Review: [19] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. B. III, and A. Agarwal. On-Chip Interconnection Architecture of the Tile Processor. IEEE Micro, vol. 27, no. 5, pp. 15-31, 2007 Mikko Lipasti-University of Wisconsin 28