NoCo: ILP-based Worst-Case Contention Estimation for Mesh Real-Time Manycores
1 NoCo: ILP-based Worst-Case Contention Estimation for Mesh Real-Time Manycores. Jordi Cardona (1,2), Carles Hernandez (1), Enrico Mezzetti (1), Jaume Abella (1) and Francisco J. Cazorla (1,3). (1) Barcelona Supercomputing Center (BSC); (2) Universitat Politècnica de Catalunya (UPC); (3) IIIA-CSIC. 39th IEEE Real-Time Systems Symposium (RTSS 2018), Nashville, USA, December 13th.
2 Critical Real-Time Embedded Systems (CRTES). Used in industries such as avionics, railway and space. They require functional correctness and timing correctness. The validation and verification (V&V) process needs to provide evidence against the safety standards (avionics: DO-178B/C; automotive: ISO).
3 Increasing Performance Needs in CRTES. New software implements complex functionalities, such as complex AI algorithms, and manages huge amounts of data, so performance needs increase significantly: ARM predicts that the performance requirements of ADAS (autonomous driving) will grow 100x from 2016 to 2024.
4 Covering high-performance needs. How can the performance needed by CRTES software be delivered efficiently? By embracing high-performance hardware from the mainstream market: multicores and manycores, caches, networks on chip (NoCs) and accelerators, as in the Snapdragon (automotive), Nvidia Pascal (automotive) and Kalray MPPA-256 (aviation).
5 The other side of the coin. High-performance (complex) hardware complicates timing analysis, i.e., deriving WCET estimates for tasks. The source of the problem is contention, which must be bounded and reduced: the worst-case contention delay (WCD) adds to the worst-case execution time (WCET). (Figure: 2x2 2D mesh with 4 cores.)
6 Related work. Real-time-specific NoC designs provide contention-free NoCs that are easy to V&V, but they do not scale well (generally poor average performance) and have high adoption costs for industry. Best-effort wormhole NoC designs (wNoC, using wormhole switching) are used in commercial off-the-shelf processors (low costs for industry), but deriving upper bounds for them is more difficult and the bounds can be very pessimistic. One option is to optimize the parameters of these NoCs: mapping, routing and bandwidth distribution.
7 Worst-Case Execution Time (WCET): ZLL. WCET = f(ZLL, WCD). The zero-load latency (ZLL) is a function of distance, so it depends on 1. mapping. (Figure: example mappings at 3 hops and at 5 hops.)
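Because ZLL depends only on distance, placing a task closer to its target node lowers it. A minimal sketch, assuming minimal routing in a 2D mesh (hop count = Manhattan distance) and a hypothetical ZLL model of `hop_cycles` per hop plus serialization of the remaining flits of a packet; the function names, parameters and coordinates are illustrative assumptions, not NoCo's model.

```python
def hops(src, dst):
    """Hop count of a minimal route between routers (x, y) in a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def zero_load_latency(src, dst, hop_cycles=1, packet_flits=1):
    """Hypothetical ZLL model: each hop costs `hop_cycles` (router + link),
    plus (packet_flits - 1) cycles for the body flits that follow the header
    under wormhole switching. No contention is assumed."""
    return hops(src, dst) * hop_cycles + (packet_flits - 1)


# A task 3 hops away from the target node vs. one 5 hops away (assumed layout).
target = (0, 0)
near = zero_load_latency((1, 2), target, hop_cycles=2, packet_flits=4)
far = zero_load_latency((2, 3), target, hop_cycles=2, packet_flits=4)
```

Under these assumed constants the nearer placement pays 9 cycles per traversal and the farther one 13, before any contention is added.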
8 Worst-Case Execution Time (WCET). WCET = f(ZLL, WCD), where ZLL = f(distance) depends on 1. mapping, and the worst-case contention delay WCD = f(routing, arbitration) depends on 2. routing and 3. bandwidth weighted allocation (walloc).
9 Worst-Case Execution Time (WCET): WCD and 2. routing. (Figure: flow mapping in a 3x3 mesh using XY routing vs. using a combination of XY and YX routing.)
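Routing changes which flows share links, and therefore how much they can contend. A minimal sketch, assuming dimension-ordered routing on (x, y) router coordinates and using the maximum number of flows sharing any directed link as a rough proxy for contention (an assumption, not NoCo's WCD model). The two example flows are hypothetical.

```python
from collections import Counter


def route(src, dst, order="xy"):
    """Directed links ((x, y), (x', y')) of a dimension-ordered minimal route.
    order="xy" moves along X first, order="yx" moves along Y first."""
    x, y = src
    links = []
    for axis in order:
        if axis == "x":
            while x != dst[0]:
                nx = x + (1 if dst[0] > x else -1)
                links.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dst[1]:
                ny = y + (1 if dst[1] > y else -1)
                links.append(((x, y), (x, ny)))
                y = ny
    return links


def max_link_load(flows, orders):
    """Largest number of flows that traverse the same directed link."""
    load = Counter()
    for (src, dst), order in zip(flows, orders):
        load.update(route(src, dst, order))
    return max(load.values(), default=0)


# Two hypothetical flows in a 3x3 mesh heading to the same corner router.
flows = [((0, 0), (2, 2)), ((0, 1), (2, 2))]
all_xy = max_link_load(flows, ["xy", "xy"])
mixed = max_link_load(flows, ["xy", "yx"])
```

With both flows on XY they share the link (2, 1) -> (2, 2); routing the second flow YX makes their paths disjoint.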
10 Worst-Case Execution Time (WCET): WCD and 3. walloc. WCET is affected by all three parameters: mapping, routing and walloc. (Figure: 2x2 2D mesh with XY flow mapping; WCD = 15 with round-robin (RR) arbitration vs. WCD = 10 with weighted round-robin (WRR) arbitration.)
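Why weighted arbitration helps can be illustrated with an idealized rate-based model, assuming all flows meet at a single arbitrated port whose arbiter spreads each flow's slots evenly, so a flow with bandwidth share b waits for at most (1/b - 1) packets of other flows per access. With RR among N flows (b = 1/N) this reduces to the familiar (N - 1)-packet bound. It ignores indirect blocking along wormhole paths, so it is a hypothetical sketch, not the paper's WCD analysis; the task values are invented.

```python
from fractions import Fraction


def wcd_per_access(share, packet_cycles):
    """Idealized rate-based bound: a flow owning `share` of an arbitrated port
    waits for at most (1/share - 1) packets of other flows per access."""
    share = Fraction(share)
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return (1 / share - 1) * packet_cycles


def max_contention(accesses, shares, packet_cycles=4):
    """Largest total contention delay (accesses x per-access WCD) over tasks."""
    return max(a * wcd_per_access(s, packet_cycles) for a, s in zip(accesses, shares))


accesses = [1000, 200, 200]  # hypothetical: one memory-intensive task
rr = max_contention(accesses, [Fraction(1, 3)] * 3)  # RR: equal shares
wrr = max_contention(accesses, [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)])
```

Under this model RR leaves the heavy task with 8000 cycles of contention, while shifting bandwidth toward it lowers the maximum over all tasks to 3200 cycles.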
11 Parameters are inter-dependent. WCET = f(ZLL, WCD) = f(mapping, routing, walloc). Optimizing each parameter individually, or in pairs, does not provide a globally optimal configuration, only a local one: all the NoC parameters need to be optimized at the same time, subject to mapping, routing and bandwidth constraints.
12 Our proposal: NoCo. Given a workload (tasks) and a wormhole mesh NoC configuration, NoCo optimizes the WCET of the applications by finding the best mesh configuration: mapping, routing and weights allocation (walloc). NoCo uses stochastic exploration to optimize routing, and integer linear programming (ILP) to optimize mapping and walloc.
13 Agenda: Introduction and motivation; Background and problem analysis; NoCo: stochastic/ILP model; Evaluation; Conclusions.
14 PROPOSAL: NOCO
15 Approach/concept: the NoCo optimization framework. Routing is explored stochastically: random routes are generated and passed to the ILP optimizer. Placement (mapping) and walloc are optimized with integer linear programming (ILP) for each routing. Finally, the best setup is selected. (Figure: random route generation, ILP-based mapping and walloc optimization of each NoC configuration, and selection of the best configuration.)
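The outer loop described above can be sketched as follows, assuming each routing is encoded as a bitmask with one XY/YX choice per flow. `optimize_for_routing` is a hypothetical callback standing in for the ILP step, and the toy optimizer in the usage lines is invented purely to exercise the loop.

```python
import random


def noco_search(num_flows, num_samples, optimize_for_routing, seed=0):
    """Sketch of a stochastic + per-routing optimization loop: sample routings
    at random (without replacement), optimize mapping and walloc for each
    one, and keep the configuration with the lowest maximum WCET.

    `optimize_for_routing(routing)` must return (max_wcet, configuration)."""
    rng = random.Random(seed)
    population = 2 ** num_flows  # one XY/YX bit per flow (assumed encoding)
    routings = rng.sample(range(population), min(num_samples, population))
    best = None
    for routing in routings:
        max_wcet, config = optimize_for_routing(routing)
        if best is None or max_wcet < best[0]:
            best = (max_wcet, routing, config)
    return best  # (max_wcet, routing, config) of the selected global solution


# Toy stand-in optimizer: pretend every YX flow adds 10 cycles to the max WCET.
def toy_optimizer(routing):
    return 100 + 10 * bin(routing).count("1"), {"routing": routing}


best = noco_search(num_flows=4, num_samples=16, optimize_for_routing=toy_optimizer)
```

Sampling without replacement avoids re-running the expensive per-routing optimization on duplicates; here 16 samples cover all 2^4 routings, so the toy optimum (all XY) is found.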
16 NoCo proposal: problem description. The inputs are the task information (execution time observed (ETO) and number of memory accesses), the NoC information (target node location and number of routers), and the constraints (only one task per core). (Figure: main stages of the NoCo framework.)
17 NoCo proposal: routing. Stochastic exploration randomly generates routing configurations using minimal-distance, deterministic routing policies (i.e., XY and YX). Deadlock is avoided by prohibiting certain turns so that no cycles can form.
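Mixing XY and YX routes can create cyclic dependencies between links, which is what the turn restrictions guard against. The classic channel-dependency-graph criterion (deterministic wormhole routing is deadlock-free if the graph of "holds link a while requesting link b" edges is acyclic) can screen candidate configurations. A minimal sketch, assuming a routing configuration is a bitmask choosing XY or YX per flow; the encoding and the four-flow example are hypothetical.

```python
def route(src, dst, order):
    """Directed links of a dimension-ordered route; order is "xy" or "yx"."""
    x, y = src
    links = []
    for axis in order:
        if axis == "x":
            while x != dst[0]:
                nx = x + (1 if dst[0] > x else -1)
                links.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dst[1]:
                ny = y + (1 if dst[1] > y else -1)
                links.append(((x, y), (x, ny)))
                y = ny
    return links


def orders_from_mask(mask, num_flows):
    """Hypothetical encoding: bit i set means flow i uses YX, else XY."""
    return ["yx" if (mask >> i) & 1 else "xy" for i in range(num_flows)]


def channel_dependencies(flows, orders):
    """Edge a -> b whenever a packet can hold link a while requesting link b."""
    graph = {}
    for (src, dst), order in zip(flows, orders):
        links = route(src, dst, order)
        for a, b in zip(links, links[1:]):
            graph.setdefault(a, set()).add(b)
    return graph


def has_cycle(graph):
    """Depth-first search for a back edge (a cycle) in a directed graph."""
    state = {}  # absent = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for v in graph.get(u, ()):
            if state.get(v) == 1:
                return True
            if v not in state and visit(v):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))


# Four hypothetical flows around a 2x2 mesh.
FLOWS = [((0, 0), (1, 1)), ((1, 1), (0, 0)), ((1, 0), (0, 1)), ((0, 1), (1, 0))]
all_xy = has_cycle(channel_dependencies(FLOWS, orders_from_mask(0b0000, 4)))
mixed = has_cycle(channel_dependencies(FLOWS, orders_from_mask(0b1100, 4)))
```

All-XY routing yields an acyclic dependency graph, while routing flows 2 and 3 with YX closes a cycle around the 2x2 mesh, so such a configuration would have to be rejected or handled by other means.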
18 NoCo proposal: routing by random sampling (finite population). C is the probability that none of the top X% routings is in the random sample. The probability of having at least one routing from the top 1% of routings in a sample of size 1000 is 1 - C = 99.99%. (Figure: routings ordered from worst to best, with the top 1% and top 0.1% marked.)
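The finite-population probability above is hypergeometric: C is the chance that a sample drawn without replacement contains none of the top routings. A minimal sketch, assuming the population is 2^16 routings and "top 1%" means the best floor(1% of the population) routings; the function name is illustrative.

```python
def prob_hit_top(population, top_fraction, samples):
    """P(a sample drawn without replacement contains at least one of the best
    `top_fraction` of `population` routings), i.e. 1 - C (hypergeometric)."""
    if not 0 <= samples <= population:
        raise ValueError("samples must be between 0 and population")
    top = int(population * top_fraction)
    p_miss = 1.0  # C: probability that no top routing has been sampled yet
    for k in range(samples):
        p_miss *= (population - top - k) / (population - k)
        if p_miss <= 0.0:  # only non-top routings were left to draw
            return 1.0
    return 1.0 - p_miss


p = prob_hit_top(2 ** 16, 0.01, 1000)  # the slide's case: top 1%, 1000 samples
```

The result exceeds 0.9999, consistent with the 99.99% quoted on the slide.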
19 NoCo proposal: stochastic routing. This approach stochastically guarantees (with high probability) finding one of the best routing solutions at low cost, without exploring all possible routings. With 330 samples out of 2^16 = 65,536 routings (0.5% of the population), it finds the best routing.
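Conversely, the number of random routings needed to land in the top X% with a given confidence can be estimated. A minimal sketch using the with-replacement bound (1 - X)^n <= C; sampling without replacement from a finite population only lowers the miss probability, so the resulting n is sufficient. The function name and the chosen confidence are illustrative; note that this bound concerns the top X%, not finding the single best routing.

```python
import math


def samples_needed(top_fraction, miss_prob):
    """Smallest n such that (1 - top_fraction) ** n <= miss_prob.

    With-replacement bound; sampling without replacement from a finite
    population only lowers the miss probability, so n is sufficient."""
    return math.ceil(math.log(miss_prob) / math.log1p(-top_fraction))


n = samples_needed(0.01, 1e-4)  # top 1% with 99.99% confidence
```

Because the bound does not depend on the population size, the same sample count applies whether the space holds 2^9 or 2^16 routings.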
20 NoCo proposal: mapping and walloc ILP optimization.
21 NoCo proposal: ILP model, objective function. For parallel applications (threads W_C1, W_C2, W_C3, ...), the WCET of the application is determined by the WCET of the slowest thread, so the objective is to minimize it.
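Because the objective is the WCET of the slowest thread, it is a min-max objective. The standard way to express it in an ILP is an auxiliary variable T bounded below by every thread's WCET. A sketch of that textbook linearization for threads i = 1..n, not necessarily the exact formulation used in NoCo:

```latex
\min_{\text{mapping},\,\text{walloc}} \; \max_{i \in \{1,\dots,n\}} \mathrm{WCET}_i
\quad\Longleftrightarrow\quad
\min\; T \quad \text{s.t.}\quad T \ge \mathrm{WCET}_i \quad \forall i \in \{1,\dots,n\}
```

At the optimum, T equals the largest WCET_i, so minimizing T minimizes the application's WCET.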
22 NoCo proposal: ILP model, computing the WCET. Bandwidth and WCD modeling combine the WCET in isolation, the number of memory accesses, the bandwidth distribution constraints derived from the routing configuration, and the flow paths derived from the routing configuration.
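One way to combine these inputs is shown below. It assumes each memory access pays its zero-load latency plus its worst-case contention delay on top of the execution time observed in isolation (ETO, here taken to exclude NoC traversal). Here A_i is the number of memory accesses of task i, h_i its hop distance to the target node, b_i the bandwidth share it receives from walloc, and L the service time of one packet; the rate-based WCD term is an illustrative assumption, not the paper's WCD analysis.

```latex
\mathrm{WCET}_i = \mathrm{ETO}_i + A_i\,\bigl(\mathrm{ZLL}(h_i) + \mathrm{WCD}_i\bigr),
\qquad
\mathrm{WCD}_i = \Bigl(\frac{1}{b_i} - 1\Bigr) L,
\qquad
\sum_i b_i = 1,\quad b_i > 0
```

Because WCD_i is nonlinear in b_i, an ILP formulation would have to discretize or linearize this term.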
23 NoCo proposal: ILP model, computing the WCET. Routing rules (bandwidth distribution and path restrictions) are encoded in Boolean matrices. Other constraints: each task is assigned to exactly one core; each core runs at most one task; the bandwidth assigned to each core is > 0.0; the WCD of every task is > 0.0; and the total bandwidth in the mesh must sum to 1.
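The constraints above can be illustrated on a tiny instance for one fixed routing. A minimal sketch under stated assumptions: exhaustive search stands in for the ILP solver; bandwidth is split into 12 discrete units, so every share is > 0 and the shares sum to 1; the mapping is a permutation (one task per core, one core per task); and each task's WCET follows the hypothetical model ETO + accesses x (ZLL + rate-based WCD), with all contention assumed to occur at the target node. The mesh layout, constants and task values are invented for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical 2x2 mesh: target (memory) node at (0, 0), three cores elsewhere.
TARGET = (0, 0)
CORES = [(0, 1), (1, 0), (1, 1)]
HOP_CYCLES = 2     # assumed cost per hop
PACKET_CYCLES = 4  # assumed service time of one packet at the target port
UNITS = 12         # bandwidth split into 12 units: shares are k / 12


def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def compositions(total, parts):
    """All tuples of `parts` positive integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def wcet(eto, accesses, core, units):
    share = Fraction(units, UNITS)  # > 0 by construction
    zll = hops(core, TARGET) * HOP_CYCLES
    wcd = (1 / share - 1) * PACKET_CYCLES
    return eto + accesses * (zll + wcd)


def optimize(tasks):
    """Return (max WCET, mapping, bandwidth units) minimizing the max WCET."""
    best = None
    for mapping in permutations(range(len(CORES)), len(tasks)):  # one task per core
        for units in compositions(UNITS, len(tasks)):  # shares sum to 1
            worst = max(wcet(eto, acc, CORES[c], u)
                        for (eto, acc), c, u in zip(tasks, mapping, units))
            if best is None or worst < best[0]:
                best = (worst, mapping, units)
    return best


TASKS = [(10_000, 1_000), (10_000, 200), (10_000, 200)]  # (ETO, accesses)
rr_baseline = max(wcet(e, a, CORES[i], UNITS // 3) for i, (e, a) in enumerate(TASKS))
best = optimize(TASKS)
```

With equal shares and the identity mapping the memory-intensive task dominates at 20,000 cycles; the joint search over mapping and weights finds a strictly lower maximum, which mirrors the slide's point that mapping and walloc should be optimized together.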
24 NoCo proposal: stochastic + ILP model. Each local solution (one per routing) provides the WCET of each task, the mapping and the bandwidth distribution (arbitration weights). Post-processing selects the local solution with the minimum WCET as the global solution.
25 EVALUATION
26 Evaluation setup. Cycle-accurate simulator: the SoCLib simulator integrated with gnocsim. Benchmarks: the key parameter is the frequency of NoC accesses for loads/stores. Workloads cover the range shown by MediaBench and EEMBC auto; MIX workloads combine benchmarks (e.g., MIX1 => ABCDEFFGH).
27 Evaluation: impact of optimizing each parameter (incremental evaluation). Optimization versions evaluated, from the baseline to NoCo: Static-base (RR, baseline), Static-opt (WRR), Map, Map + Walloc, and Map + Walloc + R (NoCo). The versions differ in which of routing, weights and mapping are optimized, either through the NoC configuration or through ILP optimization.
28 Results: incremental evaluation, Static_opt (WRR) vs. Static_base (RR): -1% to 16%. (Figure: effect of incremental optimizations (mapping, walloc and routing) on 3x3 heterogeneous workloads.)
29 Results: incremental optimizations, Map vs. Static-base (RR): 17%, 30%, 23%. (Figure: effect of incremental optimizations (mapping, walloc and routing) on 3x3 heterogeneous workloads.)
30 Results: incremental optimizations, Map_Walloc vs. Static-base (RR): 31%, 41%, 37%. (Figure: effect of incremental optimizations (mapping, walloc and routing) on 3x3 heterogeneous workloads.)
31 Results: incremental optimizations, Map_Walloc_Routing (NoCo) vs. XY_RR: 9%, 14%, 50%, 40%, 46%, 23%, 3%. (Figure: effect of incremental optimizations (mapping, walloc and routing) on 3x3 heterogeneous workloads.)
32 CONCLUSIONS
33 Conclusions. Optimizing the NoC to reduce WCET is a multidimensional problem: zero-load latency (mapping) and worst-case delay (routing and arbitration). Some proposals in the state of the art optimize one, or a combination, of the parameters that contribute to the WCET of applications. We propose NoCo, a hybrid stochastic/ILP solution that optimizes at the same time routing (combinations of XY and YX), arbitration (walloc) and application mapping. NoCo reduces the maximum WCET (maxWCET) of heterogeneous tasks in 3x3 meshes by between 40% and 50% with respect to the XY-RR configuration.
35 BACKUP
36 Reliability of the stochastic method: random routing vs. optimal routing, for 2^9 = 512 and 2^16 = 65,536 routings. (Figure values: 88.7%; 94.6% with 100 samples; 98.5% with 330 samples; best solution in five examples.)
37 Reliability of the stochastic method: improvement for homogeneous tasks running in parallel, reducing the max WCET of all tasks. rilp (9 threads): average 74% w.r.t. RR and 26% w.r.t. WRR. rilp (16 threads): average 88% w.r.t. RR and 29% w.r.t. WRR. (Figures: rilp(m,w) maxWCET results for 9 threads and for 16 threads.)
38 Reliability of the stochastic method: improvement for heterogeneous tasks running in parallel, reducing the sum of the max WCETs of all tasks. rilp (9 threads): average 26% w.r.t. RR and 19% w.r.t. WRR. rilp (16 threads): average 30% w.r.t. RR and 23% w.r.t. WRR. (Figures: rilp(m,w) sumWCET results for 9 threads and for 16 threads.)
WMC MPSoCs for Mixed-Criticality Systems: Challenges and Opportunities Mohamed Hassan 1 IBM s Acorn Smart Phones Automotive 1943 1989 2010s 1981 2000s Now-Near Colossus NEC s UltaLite Wearables IoT/Smart
More informationIntegration of Mixed Criticality Systems on MultiCores: Limitations, Challenges and Way ahead for Avionics
Integration of Mixed Criticality Systems on MultiCores: Limitations, Challenges and Way ahead for Avionics TecDay 13./14. Oct. 2015 Dietmar Geiger, Bernd Koppenhöfer 1 COTS HW Evolution - Single-Core Multi-Core
More informationPrediction Router: Yet another low-latency on-chip router architecture
Prediction Router: Yet another low-latency on-chip router architecture Hiroki Matsutani Michihiro Koibuchi Hideharu Amano Tsutomu Yoshinaga (Keio Univ., Japan) (NII, Japan) (Keio Univ., Japan) (UEC, Japan)
More informationPerformance Tools for Technical Computing
Christian Terboven terboven@rz.rwth-aachen.de Center for Computing and Communication RWTH Aachen University Intel Software Conference 2010 April 13th, Barcelona, Spain Agenda o Motivation and Methodology
More informationQuest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling
Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer
More informationInteractive Realtime Multimedia Applications on SOIs
Interactive Realtime Multimedia Applications on SOIs Advance Reservations for Distributed Real-Time Workflows with Probabilistic Service Guarantees Tommaso Cucinotta Real-Time Systems Laboratory Scuola
More informationIntelligent Interconnect for Autonomous Vehicle SoCs. Sam Wong / Chi Peng, NetSpeed Systems
Intelligent Interconnect for Autonomous Vehicle SoCs Sam Wong / Chi Peng, NetSpeed Systems Challenges Facing Autonomous Vehicles Exploding Performance Requirements Real-Time Processing of Sensors Ultra-High
More informationPartitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions
Partitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions *, Alessandro Biondi *, Geoffrey Nelissen, and Giorgio Buttazzo * * ReTiS Lab, Scuola Superiore Sant Anna, Pisa, Italy CISTER,
More informationReal-Time Communication Services for Networks on Chip. Zheng Shi
Real-Time Communication Services for Networks on Chip Zheng Shi Submitted for the degree of Doctor of Philosophy Computer Science The University of York November 2009 Abstract Networks-on-Chip (NoCs),
More informationManaging Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems
Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems Zimeng Zhou, Lei Ju, Zhiping Jia, Xin Li School of Computer Science and Technology Shandong University, China Outline
More informationOptimal Implementation of Simulink Models on Multicore Architectures with Partitioned Fixed Priority Scheduling
The 39th IEEE Real-Time Systems Symposium (RTSS 18) Optimal Implementation of Simulink Models on Multicore Architectures with Partitioned Fixed Priority Scheduling Shamit Bansal, Yecheng Zhao, Haibo Zeng,
More informationExploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API
EuroPAR 2016 ROME Workshop Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API Suyang Zhu 1, Sunita Chandrasekaran 2, Peng Sun 1, Barbara Chapman 1, Marcus Winter 3,
More informationOverview of Potential Software solutions making multi-core processors predictable for Avionics real-time applications
Overview of Potential Software solutions making multi-core processors predictable for Avionics real-time applications Marc Gatti, Thales Avionics Sylvain Girbal, Xavier Jean, Daniel Gracia Pérez, Jimmy
More informationSTLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip
STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip Codesign for Tiled Manycore Systems Mingyu Wang and Zhaolin Li Institute of Microelectronics, Tsinghua University, Beijing 100084,
More informationMeasurement-Based Probabilistic Timing Analysis and Its Impact on Processor Architecture
Measurement-Based Probabilistic Timing Analysis and Its Impact on Processor Architecture Leonidas Kosmidis, Eduardo Quiñones,, Jaume Abella, Tullio Vardanega, Ian Broster, Francisco J. Cazorla Universitat
More informationA Detailed GPU Cache Model Based on Reuse Distance Theory
A Detailed GPU Cache Model Based on Reuse Distance Theory Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal Eindhoven University of Technology (Netherlands) Henri Bal Vrije Universiteit Amsterdam
More informationHybrid Implementation of 3D Kirchhoff Migration
Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation
More information