Parallel Code Generation of Synchronous Programs for a Many-core Architecture
1 Parallel Code Generation of Synchronous Programs for a Many-core Architecture
Amaury Graillat
Supervisors: Pascal Raymond (Verimag), Matthieu Moy (LIP), Benoît Dupont de Dinechin (Kalray)
Reviewers: Jan Reineke (Universität des Saarlandes), Robert de Simone (INRIA Sophia-Antipolis, Aoste)
Examiners: Anne Bouillard (ENS Paris), Alain Girault (INRIA Grenoble), Christine Rochange (IRIT)
Thesis defense, November 16th
2 2/48 SPACE, WEIGHT, AND POWER
4 3/48 SAFETY-CRITICAL SYSTEMS
Example: aircraft flight controller (control-command).
[Figure: the pilot (stick) and the sensors (altitude) feed the processor, which drives the actuator; the sensors close the feedback loop.]
- Time-critical: latency constraints are part of the specification (e.g. < about 10 ms).
- Designed with, e.g., synchronous languages.
5 4/48 THE SYNCHRONOUS DATA-FLOW LANGUAGES (1/2)
[Figure: network of six nodes (Node 1 to Node 6); one edge goes through a pre operator with an init value.]
- Lustre (academic), Scade (industrial).
- A program is a network of nodes.
6 5/48 THE SYNCHRONOUS DATA-FLOW LANGUAGES (2/2)
Example node: o = 2 × i.
Logical time:
    instant    n-1    n    n+1
    i           1     3     7
    o           2     6    14
One execution is called a logical instant.
Physical time: at each instant the input $i_n$ arrives, one step runs, and the output $o_n$ is produced.
Requirement: $o_n$ must be produced before $i_{n+1}$ arrives, hence the need for the Worst-Case Execution Time (WCET) of a step.
8 6/48 LUSTRE/SCADE DELAY OPERATOR
    flow                   instant 1   instant 2   instant 3
    a                          1           2           3
    b = a^2                    1           4           9
    c                          1           2           3
    d = pre b (init = 0)       0           1           4
    e = c + d                  1           3           7
pre is the "previous" operator: a logical (functional) delay of one instant.
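A minimal sketch of how the delay in this example might be compiled to C: the pre operator becomes one memory cell that is read before it is overwritten. The names `delay_state`, `delay_init` and `delay_step` are hypothetical, and taking c as a plain copy of a is an assumption read off the values on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Memory of the example node: the cell behind `pre b`. */
typedef struct {
    int pre_b;   /* value of b at the previous instant */
} delay_state;

/* Reset: the delay holds its init value (0 on the slide). */
void delay_init(delay_state *s)
{
    assert(s != NULL);
    s->pre_b = 0;
}

/* One logical instant: reads a, returns e, then updates the memory. */
int delay_step(delay_state *s, int a)
{
    assert(s != NULL);
    int b = a * a;      /* b = a^2 */
    int c = a;          /* assumed: c carries a unchanged */
    int d = s->pre_b;   /* d = pre b: value from the previous instant */
    int e = c + d;
    s->pre_b = b;       /* becomes visible at the next instant only */
    return e;
}
```

Calling `delay_step` with a = 1, 2, 3 after `delay_init` reproduces the e row of the slide.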
9 7/48 CODE GENERATION FOR SINGLE-CORE
[Figure: Node 1, Node 2, Node 3 go through code generation (.c) and compilation, then run on a single-core processor with the static schedule N1 ; N2 ; N3, which is analysed for WCET.]
- Synchronous languages (Lustre, Heptagon, SCADE): formal semantics, determinism.
- Industrial use: SCADE in the Airbus A380.
Generated code (sketch) with a static schedule:
    for each period {
        i = sensors();
        o = main_step(i);
        actuators(o);
    }

    main_step(i) {
        o1 = N1_step(i);
        o2 = N2_step(i);
        return N3_step(o1, o2);
    }
OK for sequential execution. What about parallel?
11 8/48 THE KALRAY MPPA2
Properties of the Kalray MPPA2:
- Cores: no complex branch prediction; only LRU caches.
- Cluster (= multi-core): banked shared memory (16 × 128 KB), with an independent arbiter for each memory bank.
- Network-on-Chip (NoC) between clusters: bandwidth limiter (network calculus possible).
Performance + predictability.
12 9/48 OVERVIEW
[Figure: the network of nodes (Node 1 to Node 6, with a pre/init delay) next to the Kalray MPPA2 Bostan it has to run on.]
13 10/48
Part I: Semantics Preserving Parallelization
- Extraction of Parallelism
- Mapping / Scheduling
- Code Generation
- Communication Channels Implementation
Part II: Real-Time Guarantees
Part III: Evaluation and Conclusion
14 11/48 EXTRACTION OF PARALLELISM
[Figure: tool flow (legend: contribution / external tool). The node network N1 to N6 goes through Parallelism Extraction, which produces the functional code (N1.c to N6.c) and a communication and dependency graph; these feed Mapping + Non-preemptive scheduling and NoC Routing, then Code Generation (system + communication), which yields the executable for Kalray.]
Method 1:
- In the top-level node: 1 node = 1 task (runnable).
- Generation of sequential code for each node.
- Similar to Architecture Description Languages (Prelude, Giotto).
Method 2: based on fork-join; see the manuscript.
16 12/48 MAPPING / SCHEDULING
[Figure: tool flow, Mapping + Non-preemptive scheduling step.]
- A non-preemptive static schedule is needed.
- It is produced by an external scheduling tool.
- We can easily check the schedule.
17 13/48 MAPPING/SCHEDULING: EXAMPLE
[Figure: network of Node 1 to Node 6.]
    Core 0: N1 ; N4 ; N6
    Core 1: N2 ; N5
    Core 2: N3
The schedule is checked using the dependency graph.
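One way such a check could look, as a sketch rather than the tool's actual method: add an edge between consecutive tasks of each core to the dependency edges, and accept the schedule only if the combined graph has no cycle (a cycle would mean some task waits forever). The function names are hypothetical. Only the edges N1→N4, N2→N4 and N2→N5 are visible in the communication code of the slides; any other edge used with this sketch is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_NODES  6
#define MAX_EDGES 32

typedef struct { int from, to; } edge;

/* Returns true when the precedence graph has no cycle (Kahn's algorithm). */
static bool acyclic(const edge *e, int nb_edges)
{
    int indeg[NB_NODES] = {0};
    bool done[NB_NODES] = {false};
    for (int i = 0; i < nb_edges; i++) indeg[e[i].to]++;
    for (int round = 0; round < NB_NODES; round++) {
        int pick = -1;
        for (int n = 0; n < NB_NODES; n++)
            if (!done[n] && indeg[n] == 0) { pick = n; break; }
        if (pick < 0) return false;   /* every remaining node waits on another */
        done[pick] = true;
        for (int i = 0; i < nb_edges; i++)
            if (e[i].from == pick) indeg[e[i].to]--;
    }
    return true;
}

/* schedule[c] lists the node ids run on core c, in order, ended by -1.
 * Each node is expected to appear on exactly one core. */
bool schedule_is_valid(const edge *deps, int nb_deps,
                       int nb_cores, const int schedule[][NB_NODES + 1])
{
    edge all[MAX_EDGES];
    int n = 0;
    assert(nb_deps + NB_NODES <= MAX_EDGES);
    for (int i = 0; i < nb_deps; i++) all[n++] = deps[i];
    /* Non-preemptive static order: each task precedes the next on its core. */
    for (int c = 0; c < nb_cores; c++)
        for (int k = 0; schedule[c][k] >= 0 && schedule[c][k + 1] >= 0; k++) {
            assert(n < MAX_EDGES);
            all[n].from = schedule[c][k];
            all[n].to = schedule[c][k + 1];
            n++;
        }
    return acyclic(all, n);
}
```

With node ids 0 to 5 standing for N1 to N6, the mapping of the slide is accepted, while swapping N1 and N4 on core 0 is rejected because N4 would wait for a task scheduled after it.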
18 14/48 CODE GENERATION
[Figure: tool flow, Code Generation (system + communication) step.]
Dependencies are compiled into "wait for input".
Sketch of the code for core 0:
    for each period {
        wait_inputs_n1();   // N1
        N1_step();
        write_outputs_n1();

        wait_inputs_n4();   // N4
        N4_step();
        write_outputs_n4();

        wait_inputs_n6();   // N6
        N6_step();
        write_outputs_n6();
    }
20 15/48 COMMUNICATION CHANNELS
[Figure: tool flow, communication part of the code generation.]
Two kinds of communication:
- instantaneous
- delayed (through the pre operator)
21 16/48 INSTANTANEOUS COMMUNICATION
[Figure: tool flow, communication part of the code generation.]
    void write_outputs_n2() {
        copy(N4.in2, N2.out);
        channel_n2_n4 = true;    // + cache management
        notify(core_n4);
        copy(N5.in1, N2.out);
    }

    void wait_for_inputs_n4() {
        while (!channel_n2_n4) { wait(); }
        channel_n2_n4 = false;
        while (!channel_n1_n4) { wait(); }
        channel_n1_n4 = false;   // + cache management
    }
- Efficient hardware synchronization.
- Software-based cache coherency.
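The flag protocol of this slide can be sketched in portable C11 as follows. This is an illustration under assumptions, not the generated code: the `channel` type and its functions are hypothetical, release/acquire atomics stand in for the software cache management, and a busy loop stands in for the hardware wait()/notify() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int data;           /* consumer-side input buffer (e.g. N4.in2) */
    atomic_bool full;   /* the channel flag */
} channel;

void channel_init(channel *ch)
{
    ch->data = 0;
    atomic_init(&ch->full, false);
}

/* Producer side: copy the value first, then raise the flag. */
void channel_write(channel *ch, int value)
{
    /* Sketch-only sanity check: the previous value has been consumed. */
    assert(!atomic_load_explicit(&ch->full, memory_order_acquire));
    ch->data = value;
    atomic_store_explicit(&ch->full, true, memory_order_release);
}

/* Consumer side: wait for the flag, read the value, lower the flag. */
int channel_wait(channel *ch)
{
    while (!atomic_load_explicit(&ch->full, memory_order_acquire)) {
        /* busy wait; the slide relies on a hardware wait() here */
    }
    int value = ch->data;
    atomic_store_explicit(&ch->full, false, memory_order_release);
    return value;
}
```

The release store after the copy and the acquire load before the read ensure the consumer never sees the flag raised with stale data, which is the role the slide assigns to cache management.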
24 18/48 DELAYED COMMUNICATION
[Figure: nodes A and B (inputs i1, i2; outputs o1, o2) connected through a pre with init value b; the pre is replaced by a swap task S placed between A and B.]
Transformation into a SWAP + scheduling constraints.
[Figure: contents of the buffers A.in and S.in (values X and Y) over instants n-1, n, n+1.]
- Scenario 1: execution order A, B, S. Constraint: B before S.
- Scenario 2: execution order B, A, S. Constraint: A before S.
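A small sketch of the swap idea, assuming (from the buffer names) that A reads A.in and B writes S.in; the type and function names are hypothetical, and a real implementation might swap buffer pointers rather than values. As long as S runs after both A and B, the order of A and B within the instant does not matter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int a_in;   /* A.in: what the consumer A reads during the instant */
    int s_in;   /* S.in: what the producer B writes during the instant */
} delayed_channel;

void delayed_init(delayed_channel *ch, int init_value)
{
    assert(ch != NULL);
    ch->a_in = init_value;   /* the first instant sees the init value of pre */
    ch->s_in = init_value;
}

int consumer_read(const delayed_channel *ch)
{
    return ch->a_in;
}

void producer_write(delayed_channel *ch, int value)
{
    ch->s_in = value;
}

/* Task S: runs once per instant, after both A and B. */
void swap_task(delayed_channel *ch)
{
    int tmp = ch->a_in;
    ch->a_in = ch->s_in;
    ch->s_in = tmp;
}
```

Running three instants in the order A, B, S and in the order B, A, S gives the consumer the same sequence: the init value, then each produced value one instant late.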
27 19/48 CONCLUSION OF PART I
[Figure: tool flow from Parallelism Extraction to the executable for Kalray.]
- Semantics preserved.
- Next step: temporal guarantees.
28 20/48
Part I: Semantics Preserving Parallelization
Part II: Real-Time Guarantees
- Shared Memory Interference
- Time-Triggered Execution Model
- Real-Time Guarantees with Network-on-Chip
Part III: Evaluation and Conclusion
29 21/48 INTERFERENCE AND REACTION TIME
[Figure: Core 0 to Core 15 share a memory; Task A on Core 0 and Task B on Core 1 both request access, which stretches the WCET of each task into its WCRT.]
- Single-core: the WCET is sufficient.
- Multi-core: WCET + interference on shared resources = Worst-Case Response Time (WCRT).
- WCET + interference is too pessimistic in the general case, so the analysis exploits the execution model.
31 22/48 BANKED MEMORY
[Figure: cores (Core 0, Core 1, ...) connected to memory banks (Bank 0, Bank 1, ...), each bank behind its own round-robin arbiter.]
- Private arbiter for each bank.
- Available in the Kalray MPPA2 (silicon).
32 23/48 BANKED MEMORY IN KALRAY MPPA2
[Figure: Core 0 runs N1, N4, N6 out of Bank 0; Core 1 runs N2, N5 out of Bank 1; banks are arbitrated round-robin; only communication crosses banks.]
- Choice: the code, input buffers and local variables of a core are mapped in that core's bank.
- Interference on communication only.
How to activate tasks?
36 24/48 PROBLEM WITH ASAP EXECUTION
[Figure: Core 0 runs Task A then Task B, Core 1 runs Task C then Task D, each drawn with its WCET against a deadline. In a second timeline A finishes before its WCET, B starts earlier and overlaps with C.]
- Faster execution of A leads to interference between B and C: a timing anomaly.
- Solution: release dates for tasks (time-triggered).
39 25/48 TIME-TRIGGERED EXECUTION MODEL
Principle:
[Figure: Core 0 runs Task A (release = 0) then Task B (release = 427); Core 1 runs Task C (release = 0) then Task D (release = 398); each task is drawn with its WCRT.]
- Compute a static release date at which the data is guaranteed to be available.
- Time-triggered execution prevents tasks from starting earlier.
Implementation:
    void wait_inputs_n1() {
        while (time() < t_period + release_date_n1) {
            // wait
        }
    }
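To illustrate where such release dates could come from, here is a simplified sketch that takes the WCRT of each task as given and releases a task once all its predecessors (data dependencies and the previous task on the same core) are guaranteed to be finished. The `task` type, the function name and the WCRT values used with it are assumptions; an interference analysis has to compute WCRTs and release dates together, since interference depends on which tasks overlap, and this sketch does not attempt that.

```c
#include <assert.h>

#define MAX_PREDS 4

typedef struct {
    long wcrt;               /* worst-case response time, in cycles */
    int  nb_preds;
    int  preds[MAX_PREDS];   /* indices of tasks that must finish first */
} task;

/* tasks[] must be ordered so that every predecessor has a smaller index.
 * release[t] is the earliest date, relative to the start of the period,
 * at which all predecessors of t are guaranteed to have finished. */
void compute_release_dates(const task *tasks, int nb_tasks, long *release)
{
    for (int t = 0; t < nb_tasks; t++) {
        release[t] = 0;
        for (int k = 0; k < tasks[t].nb_preds; k++) {
            int p = tasks[t].preds[k];
            assert(p >= 0 && p < t);   /* topological order */
            long ready = release[p] + tasks[p].wcrt;
            if (ready > release[t]) release[t] = ready;
        }
    }
}
```

With hypothetical WCRTs of 427 cycles for A and 398 cycles for C, B waiting for A and C, and D waiting for C, the sketch yields the release dates shown on the slide (0, 427, 0, 398).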
40 26/48 MULTI-CORE INTERFERENCE ANALYSIS
- Inputs: WCET, dependencies, memory accesses, mapping.
- Outputs: release dates of tasks, WCRT.
Multi-Core Interference Analysis (MIA) tool [Hamza Rihani, RTNS 2016]
www-verimag.imag.fr/multi-core-interference-analysis.html
42 27/48 FRAMEWORK
[Figure: tool flow (legend: contribution / external tool), extended with NoC Routing + Rate Attribution, Network Calculus (WCTT), WCET Analysis and MIA, which produces the release dates.]
- WCET analysis (Otawa, aiT).
- Computation of the release dates (MIA tool).
- Insertion in the executable.
[Submitted: Graillat A., Rihani H., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., Real-Time Systems Journal]
43 28/48 OUR EXECUTION MODEL
- Static scheduling.
- Bare metal, no interrupt, no preemption.
- Banked memory mapping (1 core, 1 bank).
- Time-triggered.
Result: deterministic execution and WCRT guarantee.
44 29/48 OVERVIEW
[Figure: complete tool flow: Parallelism Extraction, Mapping + Non-preemptive scheduling, NoC Routing + Rate Attribution, Network Calculus (WCTT), WCET Analysis, MIA (release dates), Code Generation (system + communication), executable for Kalray.]
45 30/48 THE KALRAY MPPA2'S NOC
[Figure: in Cluster A, Core 0 and the memory feed a DMA, a TX buffer and a rate limiter; packets (head, route, data) cross the routers over full-duplex links and reach the RX buffer and memory of Cluster B. A route is a list of directions, e.g. East, North, East, Local. Where flow 1 and flow 2 share a link, each gets rate = 1/2 B.]
49 31/48 REAL-TIME GUARANTEES WITH NETWORK-ON-CHIP
[Figure: flows f1 and f2 across a grid of clusters and routers, with candidate routes f1.1, f1.2, f2.1, f2.2 and per-hop directions (East, South, Local).]
1. Routing
   - Route = sequence of directions computed by the sender.
   - Deadlock-free algorithms (XY, Hamiltonian).
   - Algorithm with path diversity: Hamiltonian Odd-Even (HOE).
2. Route selection
   - Choose one static route per flow.
   - Optimize performance and fairness, e.g. {f1.1, f2.2}, {f1.2, f2.2}, ...
3. Rate attribution
   - Fair attribution, e.g. {0.5, 0.5}, {1.0, 1.0}, ...
4. Network calculus
   - Compute latency and buffer level.
[Dupont de Dinechin B., Graillat A., NoCArc 2017]
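As an illustration of "route = sequence of directions computed by the sender", here is a sketch of XY routing, the simplest of the deadlock-free algorithms named above. It assumes a plain 2D mesh with x growing eastwards and y growing northwards; the coordinate convention, the `direction` type and `xy_route` are hypothetical and not taken from the Kalray tooling.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EAST, WEST, NORTH, SOUTH, LOCAL } direction;

/* Fills route[] with the XY route from (sx, sy) to (dx, dy): all hops
 * along x first, then all hops along y, then LOCAL to leave the network.
 * Returns the number of directions written, or -1 if route[] is too small. */
int xy_route(int sx, int sy, int dx, int dy, direction *route, int capacity)
{
    assert(capacity > 0);
    int hops = abs(dx - sx) + abs(dy - sy);
    if (hops + 1 > capacity) return -1;

    int n = 0;
    for (int x = sx; x != dx; x += (dx > sx) ? 1 : -1)
        route[n++] = (dx > sx) ? EAST : WEST;
    for (int y = sy; y != dy; y += (dy > sy) ? 1 : -1)
        route[n++] = (dy > sy) ? NORTH : SOUTH;
    route[n++] = LOCAL;
    return n;
}
```

XY routing offers a single route per source and destination pair, which is why the slides move on to algorithms with path diversity before selecting routes.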
59 32/48 1. UNICAST ROUTING
[Figure: flow f1 under Hamiltonian routing, showing the up-network and the down-network.]
- Deadlock-free.
- Path diversity.
60 33/48 1. MULTICAST ROUTING PROBLEM
[Figure: multicast flow f from one source to several destinations.]
Deadlock-free heuristic for the travelling salesman problem:
- One route? A tree is not possible with the Kalray architecture.
- Hamiltonian: minimum of two routes.
62 34/48 1. MULTICAST ROUTING: HAMILTONIAN
- Dual-Path + Hamiltonian without path diversity (Lin et al., 1992).
- Small path diversity; we can do better.
63 35/48 1. MULTICAST ROUTING: HOE
[Figure: even and odd columns of the mesh, with a turn that is allowed with HOE.]
- Hamiltonian Odd-Even (Bahrebar et al.) routing allows some non-Hamiltonian paths.
- Our choice: Dual-Path + HOE.
64 36/48 EVALUATION OF DEADLOCK-FREE MULTICAST ROUTING
[Figure: minimum rate vs. network load (higher is better) for Dual-Path + Hamiltonian and Dual-Path + HOE, with the realistic network load range marked.]
The minimum rate increases by up to 19% with Dual-Path HOE compared to Dual-Path Hamiltonian.
66 37/48 2. ROUTE SELECTION
[Figure: flows f1 and f2 across cluster routers, each with three candidate routes (f1.1 to f1.3, f2.1 to f2.3).]
- Input: a set of routes for each flow.
- Output: one route per flow.
- Goal: maximize fairness.
Naive exploration: enumerate all combinations, run step 3 (rate attribution) on each solution, keep the best.
[Table: one value per combination of f1.x and f2.y, 9 combinations of possible routes; most entries lost in extraction, surviving values: 0.5, 1.0, 1.0.]
[Boyer M., Dupont de Dinechin B., Graillat A., Havet L., ERTS 2018]
67 38/48 2. ROUTE SELECTION
[Figure: flows f1 and f2 across cluster routers, each with three candidate routes (f1.1 to f1.3, f2.1 to f2.3).]
- Input: a set of routes for each flow.
- Output: one route per flow.
- Goal: maximize fairness.
Naive exploration: enumerate all combinations (9 combinations).
Exploration with pruning: ignore combinations with a non-minimal bottleneck, then apply rate attribution (step 3) to the remaining ones.
[Table: one value per remaining combination, 4 combinations; most entries lost in extraction, surviving values: 1.0, 1.0.]
[Boyer M., Dupont de Dinechin B., Graillat A., Havet L., ERTS 2018]
69 39/48 Exploration vs. Our Exploration with Pruning
71 40/48 2. ROUTE SELECTION
[Figure: flows f1 and f2 across cluster routers with all candidate routes active at once; some links carry a rate of 1/4.]
- Input: a set of routes for each flow.
- Output: one route per flow.
- Goal: maximize fairness.
Naive exploration: enumerate all combinations (9 combinations).
Our exploration with pruning: ignore non-minimal bottlenecks (4 combinations).
Our LP-based heuristic: consider all alternative routes at once.
  1. Attribute fair rates.
  2. Remove the smallest alternative routes.
  3. Enumerate the remaining combinations (3 here).
[Boyer M., Dupont de Dinechin B., Graillat A., Havet L., ERTS 2018]
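The naive exploration can be sketched as follows, under simplifying assumptions that are not in the slides: every link has capacity 1, each route is a bitmask of the links it uses, and the fairness indicator is the minimum rate. With max-min fair rates that minimum equals 1 divided by the largest number of flows sharing a link, so the sketch only has to count flows per link. All names and sizes are hypothetical.

```c
#include <assert.h>

#define NB_FLOWS  2
#define NB_ROUTES 3
#define NB_LINKS  8

/* A route is a bitmask over links: bit l set means the route uses link l. */
typedef unsigned route;

/* Largest number of flows sharing one link for a given choice of routes. */
static int max_link_load(const route routes[NB_FLOWS][NB_ROUTES],
                         const int choice[NB_FLOWS])
{
    int worst = 0;
    for (int l = 0; l < NB_LINKS; l++) {
        int load = 0;
        for (int f = 0; f < NB_FLOWS; f++)
            if (routes[f][choice[f]] & (1u << l)) load++;
        if (load > worst) worst = load;
    }
    return worst;
}

/* Tries all NB_ROUTES^NB_FLOWS combinations, stores the best choice in
 * best[] and returns the minimum rate it achieves. */
double select_routes(const route routes[NB_FLOWS][NB_ROUTES],
                     int best[NB_FLOWS])
{
    int choice[NB_FLOWS] = {0};
    int best_load = NB_FLOWS + 1;
    for (;;) {
        int load = max_link_load(routes, choice);
        assert(load >= 1);   /* every route must use at least one link */
        if (load < best_load) {
            best_load = load;
            for (int f = 0; f < NB_FLOWS; f++) best[f] = choice[f];
        }
        /* Advance the mixed-radix counter over route choices. */
        int f = 0;
        while (f < NB_FLOWS && ++choice[f] == NB_ROUTES) choice[f++] = 0;
        if (f == NB_FLOWS) break;
    }
    return 1.0 / best_load;
}
```

The pruning idea of the slides fits this view: a combination whose most loaded link carries more flows than the best one found so far cannot improve the minimum rate, so its full rate attribution can be skipped.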
76 41/48 EVALUATION OF OUR LP-BASED HEURISTIC
78 42/48 LP-BASED HEURISTIC VS. OPTIMAL EXPLORATION Optimal algorithms (100%): naive exploration, exploration with pruning Minimal rate as indicator of fairness
79 43/48
Part I: Semantics Preserving Parallelization
Part II: Real-Time Guarantees
Part III: Evaluation and Conclusion
- Use Cases and Evaluation
- Conclusion
80 44/48 THE ROSACE CASE STUDY
- Altitude-only flight controller.
- Open source (Simulink, Lustre, Giotto) [Pagetti, Saussié, RTAS 14].
[Figure: chart with series Sequential, Best effort, pure, +100 cycles, +200 cycles; values not recoverable.]
- +100: each task augmented with 100 cycles.
- +200: each task augmented with 200 cycles.
- Best effort (ASAP): no real-time guarantees.
- Time-Triggered (TT): real-time guarantees.
82 45/48 SYNTHETIC BENCHMARK ON 64 CORES
[Figure: task graph with three phases: Dispatch, Compute, Gather.]
- 3 phases: Dispatch, Compute, Gather.
- 20 bytes per flow; high network congestion during the gather phase.
- 54% of the WCRT is spent in functional code.
- 46% of the WCRT is spent in communication and system code.
84 46/48 CONCLUSION
[Figure: complete tool flow, from Parallelism Extraction to the executable for Kalray, with NoC Routing + Rate Attribution, Network Calculus (WCTT), WCET Analysis and MIA (release dates).]
- Semantics-preserving code generation.
- Time-triggered execution model.
- Hard real-time guarantees.
- Many-core performance.
- Bridge between academia and industry.
89 47/48 FUTURE WORK
Starting points: semantics-preserving code generation, time-triggered execution model, hard real-time guarantees, many-core performance.
- Release date computation and memory interference analysis with the NoC.
- Memory size problem (overlays?).
- Input/output management.
- What should be the standard real-time multi-core processor?
Thank you for your attention. Questions?
95 48/48
Published
- Graillat A., Dupont de Dinechin B., DATE 2018. Parallel Code Generation of Synchronous Programs for a Many-core Architecture.
- Boyer M., Dupont de Dinechin B., Graillat A., Havet L., ERTS 2018. Computing Routes and Delay Bounds for the Network-on-Chip of the Kalray MPPA2 Processor.
- Dupont de Dinechin B., Graillat A., NoCArc 2017. Feed-Forward Routing for the Wormhole Switching Network-on-Chip of the Kalray MPPA2-256 Processor.
- Dupont de Dinechin B., Graillat A., AISTECS 2017. Network-on-Chip Service Guarantees on the Kalray MPPA-256 Bostan Processor.
Submitted
- Graillat A., Rihani H., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., Real-Time Systems Journal. Implementation Framework for Real-Time Data-Flow Synchronous Programs on Many-Cores.
- Graillat A., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., DATE 2019. Response Time Analysis of Dataflow Applications on a Many-Core Processor with Shared-Memory and Network-on-Chip.
96 48/48 REFERENCES
[1] Bertsekas, D. P., Gallager, R. G., and Humblet, P. (1992). Data networks, vol. 2. Prentice Hall International, Englewood Cliffs, New Jersey, 7632:
[2] Chen, S. and Nahrstedt, K. (1998). Maxmin fair routing in connection-oriented networks. In Proc. Euro-Parallel and Distributed Systems Conf, pages
97 48/48 Backup
98 48/48 DEADLOCK IN WORMHOLE NETWORKS
[Figure: flows A and B across routers R1, R2, R3, R4.]
This instance of flows deadlocks:
- A wormhole packet is spread along the route.
- Links 1-4 and 3-2 are shared.
- A holds 1-4 but waits for 3-2.
- B holds 3-2 but waits for 1-4.
Deadlock-freeness can be ensured at routing time.
Solutions: XY, Hamiltonian Odd-Even, Turn Prohibition, etc.
99 48/48 MAX-MIN FAIR RATE ATTRIBUTION
[Figure: flows f1, f2, f3, f4 across cluster routers.]
- The rate limiters are configured to avoid buffer overflow.
- The rate of a flow $f_i$ is denoted $\rho_i$.
- Valid: for each link, $\sum_i \rho_i \le 1$ flit/cycle.
- Fair attribution: max-min fairness [2]: a rate cannot be increased without decreasing an already smaller (or equal) rate.
Example 1: f1 = 1/2, f2 = 1/2, f3 = 1/4, f4 = 1/4.
Valid but not max-min fair (f3 or f4 can be increased by reducing the greater f2).
Example 2: f1 = 2/3, f2 = 1/3, f3 = 1/3, f4 = 1/3.
Valid and max-min fair (f2, f3 or f4 cannot be increased without reducing one of them).
Classical solution: the Water Filling algorithm [1].
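A compact sketch of water filling (progressive filling) for unit-capacity links is given below. The function name and array bounds are hypothetical, and the topology suggested for trying it (one link shared by f1 and f2, one link shared by f2, f3 and f4) is inferred from the two examples and from the later backup slides, so it should be read as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FLOWS 8
#define MAX_LINKS 8

/* uses[l][f] is true when flow f crosses link l. Every link has a capacity
 * of 1 flit/cycle and every flow must cross at least one link.
 * On return, rate[f] holds the max-min fair rate of flow f. */
void water_filling(int nb_links, int nb_flows,
                   const bool uses[MAX_LINKS][MAX_FLOWS],
                   double rate[MAX_FLOWS])
{
    bool frozen[MAX_FLOWS] = {false};
    int remaining = nb_flows;
    assert(nb_links <= MAX_LINKS && nb_flows <= MAX_FLOWS);
    for (int f = 0; f < nb_flows; f++) rate[f] = 0.0;

    while (remaining > 0) {
        double share[MAX_LINKS];
        double level = 2.0;   /* above any reachable share */
        /* Fair share each link can still give to its unfrozen flows. */
        for (int l = 0; l < nb_links; l++) {
            double used = 0.0;
            int active = 0;
            for (int f = 0; f < nb_flows; f++) {
                if (!uses[l][f]) continue;
                if (frozen[f]) used += rate[f]; else active++;
            }
            share[l] = (active > 0) ? (1.0 - used) / active : 2.0;
            if (share[l] < level) level = share[l];
        }
        assert(level <= 1.0);   /* fails if a flow crosses no link */
        /* Freeze the flows that cross a bottleneck link at this level. */
        for (int f = 0; f < nb_flows; f++) {
            if (frozen[f]) continue;
            for (int l = 0; l < nb_links; l++) {
                if (uses[l][f] && share[l] <= level + 1e-12) {
                    rate[f] = level;
                    frozen[f] = true;
                    remaining--;
                    break;
                }
            }
        }
    }
}
```

On the assumed topology the first round freezes f2, f3 and f4 at 1/3 on the three-flow link, and the second round gives the leftover 2/3 of the other link to f1, which is the allocation of Example 2.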
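104 48/48 DETERMINISTIC NETWORK CALCULUS (DNC) PRINCIPLE
[Figure: flows f1, f2, f3, f4 across cluster routers. Three plots over time t: the arrival curve $\gamma_{r,b}(t)$ (burst b, rate r), the service curve $\beta_{R,T}(t)$ (rate R, latency T), and the convolution $(\beta_{R,T} \otimes \gamma_{r,b})(t)$, on which the delay and the backlog are read.]
- The arrival curve is the maximum traffic entering the network.
- The service curve is the minimum traffic handled by the network.
How to compute the service curve?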
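For reference, the standard network-calculus bounds for these two curve shapes can be written out as below. This is textbook material added as a reminder, not a formula taken from the slide, and it assumes the usual stability condition $r \le R$.

```latex
% Token-bucket arrival curve and rate-latency service curve
\gamma_{r,b}(t) = r\,t + b \quad (t > 0), \qquad
\beta_{R,T}(t) = R \cdot \max(0,\; t - T)

% Bounds for a flow constrained by \gamma_{r,b} crossing an element
% offering \beta_{R,T}, assuming r \le R
\text{delay} \le T + \frac{b}{R}, \qquad
\text{backlog} \le b + r\,T

% Arrival curve of the flow at the output of the element
\gamma_{r,\; b + rT}
```

The delay bound is the largest horizontal distance between the two curves and the backlog bound is the largest vertical distance, which is what the two arrows in the plot stand for.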
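105 48/48 KALRAY MPPA2 NETWORK-ON-CHIP
[Figure: flows f1 to f4 across cluster routers, including routers 7 and 11 (rate labels partly lost in extraction; f2 = 1/3 survives). Each router output is drawn as FIFO queues served by a round-robin arbiter. Caption: Kalray MPPA2 network elements.]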
106 48/48 SEPARATED FLOW ANALYSIS (1/2)
[Figure: flows f1 to f4 (f2 = f3 = f4 = 1/3; the rate of f1 is not legible here) entering FIFO queues served by a round-robin arbiter.]
Separated Flow Analysis:
- Compute the service curve of each network element.
- Compute the successive arrival curves at each network element.
- Convolve the element service curves to obtain the network service curve.
107 48/48 SEPARATED FLOW ANALYSIS (2/2)
[Figure: flows f1 to f4 (f2 = f3 = f4 = 1/3) crossing FIFO queues and round-robin arbiters]
Service curve offered to f2? At routers 1, 2, 3, 7 and 11.
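108 48/48 BLIND MULTIPLEXING
[Figure: f1 = 2/3 and f2 = 1/3 share an output link of peak rate $R = 1$. Subtracting the arrival curve of f1 (burst $b_1$, rate $r_1$) from the link service leaves the service curve for f2, with rate $R - r_1$ and latency $T = \frac{b_1}{R - r_1}$]
No information about the arbitration: consider that f2 has low priority. Can we do better?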
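The rate and latency above follow from the usual residual-service argument, sketched here under the assumption that the output link offers a strict service curve $R\,t$ and that f1 is bounded by $\gamma_{r_1,b_1}$ with $r_1 < R$:

```latex
\begin{align*}
  \beta_{f_2}(t) &= \bigl[\, R\,t - (r_1 t + b_1) \,\bigr]^{+} \\
                 &= (R - r_1)\,\Bigl[\, t - \frac{b_1}{R - r_1} \,\Bigr]^{+}
                  = \beta_{R - r_1,\; T}(t),
  \qquad T = \frac{b_1}{R - r_1}
\end{align*}
```

With $R = 1$ and $r_1 = 2/3$, f2 is guaranteed a rate of $1/3$ flit/cycle after a latency of $3\,b_1$ cycles.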
109 48/48 ROUND ROBIN MULTIPLEXING
[Figure: f1 = 2/3 and f2 = 1/3 arbitrated in round-robin, N = 2, packets of size $l_{max}$]
Service curve for f2: rate $R = \frac{1}{N}$, latency $T = (N - 1)\, l_{max}$.
Restriction: flow rate $\le R$ (not applicable to f1).
Blind multiplexing is the conservative solution.
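The two service curves can be compared numerically with a small sketch. The formulas are those of the slides (rate $R - r_1$ and latency $b_1 / (R - r_1)$ for blind multiplexing, rate $1/N$ and latency $(N - 1)\, l_{max}$ for round-robin). The function names and the burst of f1 (17 flits) are assumptions made for the example.

```python
from fractions import Fraction


def blind_service(link_rate, r_cross, b_cross):
    """Residual (rate, latency) left by cross traffic of rate r_cross and burst b_cross."""
    residual = link_rate - r_cross
    if residual <= 0:
        raise ValueError("cross traffic saturates the link")
    return residual, b_cross / residual


def round_robin_service(n_flows, l_max, flow_rate):
    """(rate, latency) offered to one of n_flows flows with packets of l_max flits."""
    rate = Fraction(1, n_flows)
    if flow_rate > rate:
        # Restriction of the slide: only usable by flows of rate at most 1/N.
        raise ValueError("flow rate exceeds the round-robin share")
    return rate, (n_flows - 1) * l_max


L_MAX = 17   # packet size in flits
B1 = 17      # assumed burst of f1, in flits (hypothetical)

blind_f2 = blind_service(Fraction(1), Fraction(2, 3), B1)
rr_f2 = round_robin_service(2, L_MAX, Fraction(1, 3))
```

Under these assumptions blind multiplexing offers f2 a rate of 1/3 after 51 cycles, whereas the round-robin model offers a rate of 1/2 after 17 cycles. Calling `round_robin_service(2, L_MAX, Fraction(2, 3))` for f1 raises an error, which reflects the restriction stated on the slide.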
110 48/48 WORST-CASE TRAVERSAL TIME (WCTT): APPLICATION (1/2)
[Figure: flows f1 to f4 (f2 = f3 = f4 = 1/3) crossing FIFO queues and round-robin arbiters]
Service curve offered to f2?
- Router 1: round-robin multiplexing (N = 2).
- Router 2: not active (f2 is alone).
- Router 3: round-robin multiplexing (N = 2; f3 and f4 are aggregated with $b_a = \sum_{i \in \{3,4\}} b_i$ and $r_a = \sum_{i \in \{3,4\}} r_i$).
- Routers 7 and 11: not active.
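A rough sketch of this application is given below: the two active routers are modelled as round-robin rate-latency servers, their curves are convolved, and a delay bound is derived for f2. It is deliberately simplified. Routers that are not active are assumed to add no latency, fixed per-hop traversal times are ignored, and the burst of f2 (17 flits) is a hypothetical value, so the result is not the WCTT reported in the thesis.

```python
from fractions import Fraction
from functools import reduce


def round_robin(n_flows, l_max):
    """Rate-latency pair (R, T) of a round-robin arbiter: R = 1/N, T = (N-1)*l_max."""
    return Fraction(1, n_flows), (n_flows - 1) * l_max


def convolve(first, second):
    """Min-plus convolution of two rate-latency curves: min of rates, sum of latencies."""
    (rate_1, latency_1), (rate_2, latency_2) = first, second
    return min(rate_1, rate_2), latency_1 + latency_2


def delay_bound(r, b, service):
    """Delay bound T + b/R of a (r, b) token-bucket flow through a rate-latency server."""
    rate, latency = service
    if r > rate:
        raise ValueError("arrival rate exceeds service rate")
    return latency + b / rate


L_MAX = 17  # packet size in flits
# Active elements on the path of f2: router 1 and router 3, both with N = 2.
path_f2 = [round_robin(2, L_MAX), round_robin(2, L_MAX)]
network_service = reduce(convolve, path_f2)

# Hypothetical burst for f2; its rate 1/3 comes from the slides.
bound_f2 = delay_bound(Fraction(1, 3), 17, network_service)
```

With these assumptions the network service curve for f2 has rate 1/2 and latency 34 cycles, and the delay bound is 68 cycles. Convolving first and bounding once is the usual choice because it accounts for the burst of f2 a single time rather than at every router.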
111 48/48 WORST-CASE TRAVERSAL TIME (WCTT): APPLICATION (2/2)
[Figure: flows f1 to f4 (f2 = f3 = f4 = 1/3) crossing FIFO queues and round-robin arbiters]
[Table: WCTT (cycles) of f1, f2, f3 and f4, with packets of $l_{max} = 17$ flits]
112 48/48 EVALUATION OF OUR LP-BASED HEURISTIC
114 48/48 LP-BASED HEURISTIC VS. OPTIMAL EXPLORATION
- Optimal algorithms (100%): naive exploration, exploration with pruning.
- Minimal rate as indicator of fairness.
115 48/48
[1] Bertsekas, D. P., Gallager, R. G., and Humblet, P. (1992). Data networks, vol. 2. Prentice Hall International, Englewood Cliffs, New Jersey, 7632:
[2] Chen, S. and Nahrstedt, K. (1998). Maxmin fair routing in connection-oriented networks. In Proc. Euro-Parallel and Distributed Systems Conf, pages