Time-Division Multiplexing vs Network Calculus: A Comparison

Time-Division Multiplexing vs Network Calculus: A Comparison
Wolfgang Puffitsch, Rasmus Bo Sørensen, Martin Schoeberl
RTNS '15, Lille, France

Motivation
Modern multiprocessors use networks-on-chip
Congestion leads to excessive worst-case latencies
Real-time systems require guaranteed service
Static arbitration with predefined schedule
  Time-division multiplexing
Dynamic arbitration with rate control
  Network calculus
  (Response-time analysis)

DTU Compute Time-Division Multiplexing vs Network Calculus 6.11.2015

Time-Division Multiplexing
Static schedule
  Packets delayed at source, then flow freely
  No collisions, no flow control, no buffering
Fits bottom-up approach
  Time-predictable computer architectures

TDM Latency
One packet per round, over $n$ hops:
  $L_{TDM}^{max} = T_r - 1 + (n - 1)p + nd + m_{max}$
  $T_r$ ... length of TDM round
  $p$ ... router traversal time
  $d$ ... link traversal time
  $m_{max}$ ... maximum packet length
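As a quick numerical check, the TDM latency bound can be evaluated directly. This is a minimal sketch, not code from the talk: the function name and argument names are hypothetical, and the default values (p = 2, d = 1, 3-word packets) as well as the 57-cycle round are the ones listed on the model and all-to-all slides of this deck.

```python
def tdm_latency(round_length, hops, router_time=2, link_time=1, max_packet=3):
    """Worst-case TDM latency: T_r - 1 + (n - 1)p + nd + m_max (in cycles)."""
    return (round_length - 1
            + (hops - 1) * router_time
            + hops * link_time
            + max_packet)

# Round length of 57 cycles, routes of 3 to 6 hops.
for hops in range(3, 7):
    print(hops, tdm_latency(57, hops))
# 3 66
# 4 69
# 5 72
# 6 75
```

The endpoints, 66 and 75 cycles, agree with the latency range the all-to-all slide reports for TDM.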

Network Calculus
Mathematical results for dynamically scheduled networks
Fits top-down approach
  Analysis of COTS platforms
Traffic described by arrival and departure curves
[Figure: cumulative data over time, with arrival curve, departure curve, delay, and backlog marked]
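In the usual reading of such a plot, the backlog is the vertical gap between the cumulative arrival and departure curves, and the delay is the horizontal gap. The sketch below illustrates this on per-cycle sampled curves; the function names and the two traces are hypothetical and only serve as an example, they are not data from the talk.

```python
from bisect import bisect_left

def backlog_bound(arrivals, departures):
    """Largest vertical gap between the two cumulative curves."""
    return max(a - d for a, d in zip(arrivals, departures))

def delay_bound(arrivals, departures):
    """Largest horizontal gap: for each cycle t, how long it takes until
    the departure curve reaches the amount that had arrived by t."""
    worst = 0
    for t, arrived in enumerate(arrivals):
        # departures is non-decreasing, so bisect finds the first cycle
        # at which at least `arrived` words have left the network
        done = bisect_left(departures, arrived)
        if done == len(departures):
            raise ValueError("data still queued at end of trace")
        worst = max(worst, done - t)
    return worst

# Hypothetical trace: two 3-word bursts, drained at one word per cycle.
arrivals   = [3, 3, 3, 6, 6, 6, 6, 6]
departures = [0, 1, 2, 3, 4, 5, 6, 6]

print(backlog_bound(arrivals, departures))  # 3
print(delay_bound(arrivals, departures))    # 3
```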

Network Calculus Latency
Approach by Dupont de Dinechin for Kalray NoC [1]
Bandwidth $\rho$ and burstiness $\sigma$
  $\rho$ follows from the number of packets $N_{max}$ that may be injected over a sliding time window $T$
  $\sigma = \rho(1 - \rho)T$
Over $n$ hops:
  $L_{RC}^{max} = \frac{\sigma}{\rho} + (n - 1)\frac{m_{max}}{\rho} + n(d + m_{max})$
  $d$ ... link traversal time
  $m_{max}$ ... maximum packet length
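The rate-controlled bound can be evaluated the same way. This is a sketch under stated assumptions: the function name is hypothetical, and the bandwidth is assumed to be one 3-word packet per 45-cycle window (so rho = 3/45 words per cycle), which is how the all-to-all numbers later in the deck appear to be set up. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def rc_latency(rho, window, hops, link_time=1, max_packet=3):
    """Worst-case latency sigma/rho + (n - 1) m_max/rho + n(d + m_max)."""
    sigma = rho * (1 - rho) * window  # burstiness: sigma = rho(1 - rho)T
    return (sigma / rho
            + (hops - 1) * max_packet / rho
            + hops * (link_time + max_packet))

# Assumption: one 3-word packet per flow in every 45-cycle window.
rho = Fraction(3, 45)
for hops in (3, 6):
    print(hops, rc_latency(rho, 45, hops))
# 3 144
# 6 291
```

Under this assumption the results match the 144 to 291 cycles reported for the all-to-all case. The term (n - 1) m_max / rho grows by a full window per extra hop here, which is why the hop count weighs so heavily in this bound.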

Network-on-Chip Model
TDM NoC modeled after Argo NoC [2]
Rate-controlled NoC like Kalray MPPA NoC [1]
Tiles
  Processing element (PE)
  Network interface (NI)
  Router (R)
Source routing, push communication
[Figure: tile consisting of PE, NI, and R]
4×4 bi-torus (max. 6 hops)
Link traversal time $d = 1$
Router traversal time $p = 2$
3-word packets, $m_{max} = 3$
Queue size $Q_{size} = 401$
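The hop count of the topology can be checked by enumeration. The sketch below is an illustration only: it assumes that the two local links (NI to router at the source, router to NI at the destination) are counted as hops, which is an assumption on my part that reproduces the stated maximum of 6; the names are hypothetical.

```python
from itertools import product

def torus_distance(a, b, size=4):
    """Shortest router-to-router distance on a size x size bidirectional torus."""
    return sum(min((x - y) % size, (y - x) % size) for x, y in zip(a, b))

LOCAL_LINKS = 2  # assumption: NI-to-router and router-to-NI links count as hops

nodes = list(product(range(4), repeat=2))
hops = [torus_distance(a, b) + LOCAL_LINKS
        for a in nodes for b in nodes if a != b]

print(min(hops), max(hops))  # 3 6
```

With this counting, routes span 3 to 6 hops.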

Evaluation Setup
Meta-heuristic scheduler for TDM [3]
  Shortest-path routing
  Multiple packets on different routes
  Optimizes for round length $T_r$
Network calculus approach by Dupont de Dinechin
  Assume fixed routes (one per flow)
  Generate bandwidth constraints
  We use ILOG OPL constraint solver to solve for bandwidths $\rho$ and time window $T$
  Maximize bandwidths or minimize latencies

All-to-All Schedule
Every node sends one packet to every other node
Time-division multiplexing
  Solution with $T_r = 19\,m_{max} = 57$ cycles
  Optimal solution: $T_r = 18\,m_{max} = 54$ cycles
  $L_{TDM}^{max}$: 66 to 75 cycles
Network calculus approach
  Solution with $T = 15\,m_{max} = 45$ cycles
  Optimal
  $L_{RC}^{max}$: 144 to 291 cycles

First Observations
26% higher bandwidth with network calculus
Significantly lower latencies with TDM
Number of hops critical in network calculus
Application mapping important for network calculus, less so for TDM
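The bandwidth figure can be traced back to the all-to-all periods. This sketch assumes that each flow sends one packet per TDM round or per sliding window, so that its guaranteed bandwidth is inversely proportional to that period; the function name is hypothetical.

```python
def bandwidth_gain(tdm_round, nc_window):
    """Relative per-flow bandwidth gain of the shorter period (one packet per period)."""
    return tdm_round / nc_window - 1

M_MAX = 3  # words per packet

# Schedule found by the TDM scheduler (19 slots) vs. network calculus window (15 packets)
print(f"{bandwidth_gain(19 * M_MAX, 15 * M_MAX):.1%}")  # 26.7%

# Against the optimal TDM schedule (18 slots) the gap would shrink
print(f"{bandwidth_gain(18 * M_MAX, 15 * M_MAX):.1%}")  # 20.0%
```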

Application-Specific Schedules
All-to-all schedule may not be representative
Communication benchmarks
  FFT, encoders/decoders, robot control, ...
  28 to 226 flows, varying bandwidths
Worst-case latencies and bandwidths
  Minimum, maximum, average
  Different flows have different latencies
  Detailed tables in paper

Example: Reed-Solomon Decoder

        Worst-case latency        Bandwidth
        min     avg     max       min      avg      max
TDM      72   100.7     108    0.0323   0.0579   0.0690
NC_L     30   176.0     285    0.0667   0.1325   0.7333
NC_B     42   218.9     338    0.0645   0.1332   0.7419

Latencies in cycles, bandwidths as fraction of link capacity
NC_L: Network calculus, optimize for average latency
NC_B: Network calculus, optimize for bandwidth

Lower average latency with TDM
Higher average bandwidth with network calculus approach
Network calculus approach makes use of spare capacities
  Some flows are assigned a very high bandwidth
For some flows, network calculus can guarantee lower latencies than TDM
  TDM scheduler only optimizes $T_r$, not distribution of slots
For other flows, network calculus leads to much higher latencies than TDM

Hardware Costs I
Hardware costs depend on many factors
  Feature size: 130 nm, 65 nm, 32 nm, ...
  Optimizing for area/speed/power
  Network interfaces, routers, links
  Published results not comparable
Main functionality of routers similar
  Rate-controlled NoC needs buffers, TDM does not
  Relate size of TDM router to size of SRAM cell

Hardware Costs II
Argo router in 65 nm: 8000 μm²
SRAM cell in 65 nm: 0.5 μm²
  500 32-bit words same size as TDM router
Buffers in Kalray NoC 16× larger than Argo router
Increasing bandwidth through wider links may be a better investment than buffers
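The 500-word equivalence follows from the two area figures. The sketch below assumes one SRAM cell stores one bit and ignores decoders and other periphery; the function and constant names are hypothetical.

```python
ROUTER_AREA_UM2 = 8000   # Argo router in 65 nm
SRAM_CELL_UM2 = 0.5      # one SRAM cell in 65 nm (assumed: one bit per cell)
WORD_BITS = 32

def words_per_router_area(router_area=ROUTER_AREA_UM2,
                          cell_area=SRAM_CELL_UM2,
                          word_bits=WORD_BITS):
    """Number of buffer words that fit in the area of one TDM router."""
    bits = router_area / cell_area
    return bits / word_bits

words = words_per_router_area()
print(words)       # 500.0
print(16 * words)  # 8000.0, word-equivalent of buffers 16x the router size
```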

Best-Effort/Mixed Criticality
TDM NoC cannot accommodate best-effort traffic without adding buffers
Rate-controlled NoC only needs adding flow control
  Relatively minor change
Network calculus can handle priorities
  Guaranteed service vs best-effort traffic
  High-criticality vs low-criticality traffic

Future Work
More aggressive scheduling for TDM
  Minimize latencies, not only schedule length
  Use spare capacities
More advanced network calculus techniques
  Use different windows $T$ for different flows
Extend comparison to response-time analysis
Combine low latencies with high bandwidth?

Conclusion
TDM provides better latencies
  Potential for even better latencies
Network calculus leads to higher bandwidths
  Use of spare capacities
TDM NoC hardware smaller
  Wider links vs buffers?
Network calculus more flexible

Bibliography
[1] B. Dupont de Dinechin, Y. Durand, D. van Amstel, and A. Ghiti. Guaranteed services of the NoC of a manycore processor. In International Workshop on Network on Chip Architectures (NoCArc), pages 11-16, New York, NY, USA, Dec. 2014. ACM.
[2] E. Kasapaki, M. Schoeberl, R. B. Sørensen, C. T. Müller, K. Goossens, and J. Sparsø. Argo: A real-time network-on-chip architecture with an efficient GALS implementation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, PP, 2015.
[3] R. B. Sørensen, J. Sparsø, M. R. Pedersen, and J. Højgaard. A metaheuristic scheduler for time division multiplexed network-on-chip. In Software Technologies for Future Embedded and Ubiquitous Systems (SEUS), 2014. IEEE, 2014.