Parallel Code Generation of Synchronous Programs for a Many-core Architecture


Amaury Graillat
Supervisors: Pascal Raymond (Verimag), Matthieu Moy (LIP), Benoît Dupont de Dinechin (Kalray)
Reviewers: Jan Reineke (Universität des Saarlandes), Robert de Simone (INRIA Sophia-Antipolis, Aoste)
Examiners: Anne Bouillard (ENS Paris), Alain Girault (INRIA Grenoble), Christine Rochange (IRIT)
Thesis defense, November 16th

SPACE, WEIGHT, AND POWER

SAFETY CRITICAL SYSTEMS
Example of aircraft flight controller (control-command)
[Figure: pilot (stick) and sensors (altitude) feed the processor, which drives the actuator; feedback loop]
Time-critical: latency constraints are part of the specification (e.g. < some 10 ms)
Designed with, e.g., synchronous languages

THE SYNCHRONOUS DATA-FLOW LANGUAGES (1/2)
[Figure: network of six nodes (Node 1 to Node 6) with a pre/init delay]
Lustre (academic), Scade (industrial)
Network of nodes

THE SYNCHRONOUS DATA-FLOW LANGUAGES (2/2)
Logical time: one execution is called a logical instant
[Figure: a node doubling its input over instants n-1, n, n+1: inputs $i_{n-1} = 1$, $i_n = 3$, $i_{n+1} = 7$; outputs $o_{n-1} = 2$, $o_n = 6$, $o_{n+1} = 14$]
Physical time: each instant is one Step that reads $i_n$ and produces $o_n$
Requirement: $o_n$ before $i_{n+1}$. Worst-Case Execution Time (WCET)

LUSTRE/SCADE DELAY OPERATOR
[Figure: a feeds c directly and b through a squaring operator; d = pre b with init = 0; e = c + d]
     instant 1   instant 2   instant 3
a    1           2           3
b    1           4           9
c    1           2           3
d    0           1           4
e    1           3           7
pre: previous operator. Logical/functional delay
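
The delay operator can be read as a one-cell memory that is updated at the end of each logical instant. The sketch below is only an illustration in C of the values shown on this slide; the names `delay_state`, `delay_reset` and `delay_step` are hypothetical, and reading c as a plain copy of a is an assumption about the figure.

```c
#include <assert.h>
#include <stddef.h>

/* Memory cell behind "pre b": holds the value of b from the previous instant. */
typedef struct {
    int pre_b;
} delay_state;

/* Before the first instant, the cell holds the init value (0 on the slide). */
void delay_reset(delay_state *s)
{
    assert(s != NULL);
    s->pre_b = 0;
}

/* One logical instant: consumes a, returns e = c + d. */
int delay_step(delay_state *s, int a)
{
    assert(s != NULL);
    int b = a * a;     /* b is a squared */
    int c = a;         /* assumed: c carries a unchanged */
    int d = s->pre_b;  /* d is the value b had at the previous instant */
    int e = c + d;
    s->pre_b = b;      /* remember b for the next instant */
    return e;
}
```

Calling `delay_step` with a = 1, 2, 3 after a reset follows the table row by row: the state update is the last action of the step, so d always lags b by one instant.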

CODE GENERATION FOR SINGLE-CORE
Synchronous languages: Lustre, Heptagon, SCADE
Formal semantics, determinism
Industrial use: SCADE in Airbus A380
Code generation, then compilation (.c) for a single-core processor
Static schedule: N1, N2, N3, ... within the WCET
    for each period {
      i = sensors();
      o = main_step(i);
      actuators(o);
    }
    void main_step(i) {
      o1 = N1_step(i);
      o2 = N2_step(i);
      return N3_step(o1, o2);
    }
OK for sequential execution. What about parallel?
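
As a runnable illustration of the sequential scheme above, the following C sketch replaces the sensors and actuators by arrays and gives the three nodes arbitrary bodies. Everything except the `main_step` / `N1_step` / `N2_step` / `N3_step` call structure is an assumption made for the example (`run_periods`, the traces, the arithmetic inside the nodes).

```c
#include <assert.h>

/* Hypothetical node bodies; real ones would come from the synchronous compiler. */
static int N1_step(int i) { return i + 1; }
static int N2_step(int i) { return 2 * i; }
static int N3_step(int o1, int o2) { return o1 + o2; }

/* Static schedule of one logical instant: N1, then N2, then N3. */
int main_step(int i)
{
    int o1 = N1_step(i);
    int o2 = N2_step(i);
    return N3_step(o1, o2);
}

/* Periodic loop with array stand-ins for sensors() and actuators(). */
void run_periods(const int *sensor_trace, int *actuator_trace, int n_periods)
{
    assert(n_periods >= 0);
    for (int p = 0; p < n_periods; p++) {
        int i = sensor_trace[p];   /* i = sensors();     */
        int o = main_step(i);      /* o = main_step(i);  */
        actuator_trace[p] = o;     /* actuators(o);      */
    }
}
```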

THE KALRAY MPPA2
Properties of the Kalray MPPA2:
Cores
- No complex branch prediction
- Only LRU caches
Cluster (= multi-core)
- Banked shared memory (16 x 128 kB)
- Independent arbiter for each memory bank
Network-on-Chip (NoC) between clusters
- Bandwidth limiter (network calculus possible)
Performance + predictability

OVERVIEW
[Figure: network of nodes (Node 1 to Node 6, pre/init) and the Kalray MPPA2 Bostan]

Part I: Semantics Preserving Parallelization
- Extraction of Parallelism
- Mapping / Scheduling
- Code Generation
- Communication Channels Implementation
Part II: Real-Time Guarantees
Part III: Evaluation and Conclusion

EXTRACTION OF PARALLELISM
[Figure: tool flow, with contributions and external tools distinguished. Program (N1 to N6, pre/init) -> Parallelism Extraction -> functional code (N1.c to N6.c) and communication and dependency graph -> Mapping + non-preemptive scheduling, NoC routing -> Code generation (system + communication) -> executable for Kalray]
Method 1:
- In the top-level node: 1 node = 1 task (runnable)
- Generation of sequential code for each node
- Similar to Architecture Description Languages (Prelude, Giotto)
Method 2: based on fork-join: see manuscript

MAPPING / SCHEDULING
[Figure: tool flow, mapping + non-preemptive scheduling step]
Need non-preemptive static schedule
External scheduling tool
We can easily check the schedule

MAPPING/SCHEDULING: EXAMPLE
[Figure: network of Node 1 to Node 6]
Core 0: N1 ; N4 ; N6
Core 1: N2 ; N5
Core 2: N3
Schedule checked using the dependency graph
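
One way to picture the check of a schedule against the dependency graph is a cycle test: each core's execution order adds sequencing edges to the data dependencies, and the union must stay acyclic, otherwise some task would wait forever for its inputs. The C sketch below is only an assumed illustration, not the tool used in the thesis; `edge`, `schedule_is_consistent` and the task numbering (0 for N1 up to 5 for N6) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N_TASKS 6
#define MAX_EDGES 64

typedef struct { int from, to; } edge;

/* order[c][0..len[c]-1] lists the tasks of core c in execution order.
   Returns true when every task is scheduled exactly once and the data
   dependencies plus the per-core sequencing form an acyclic graph. */
bool schedule_is_consistent(const edge *deps, int n_deps, int n_cores,
                            const int order[][N_TASKS], const int *len)
{
    edge all[MAX_EDGES];
    int n_all = 0;
    int seen[N_TASKS] = { 0 };
    int indeg[N_TASKS] = { 0 };
    bool done[N_TASKS] = { false };

    assert(n_deps >= 0 && n_deps <= MAX_EDGES);
    for (int k = 0; k < n_deps; k++) all[n_all++] = deps[k];

    for (int c = 0; c < n_cores; c++) {
        for (int k = 0; k < len[c]; k++) {
            int t = order[c][k];
            if (t < 0 || t >= N_TASKS) return false;
            seen[t]++;
            if (k > 0) {
                if (n_all >= MAX_EDGES) return false;
                all[n_all++] = (edge){ order[c][k - 1], t };
            }
        }
    }
    for (int t = 0; t < N_TASKS; t++)
        if (seen[t] != 1) return false;

    /* Kahn's algorithm: repeatedly retire a task with no pending predecessor. */
    for (int k = 0; k < n_all; k++) indeg[all[k].to]++;
    for (int round = 0; round < N_TASKS; round++) {
        int pick = -1;
        for (int t = 0; t < N_TASKS; t++)
            if (!done[t] && indeg[t] == 0) { pick = t; break; }
        if (pick < 0) return false;   /* cycle: the schedule would deadlock */
        done[pick] = true;
        for (int k = 0; k < n_all; k++)
            if (all[k].from == pick) indeg[all[k].to]--;
    }
    return true;
}
```

With an assumed dependency set N1->N4, N2->N4, N2->N5, N3->N5, N4->N6, N5->N6, the mapping of this slide passes the check, whereas running N4 before N1 on core 0 is rejected.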

CODE GENERATION
[Figure: tool flow, code generation step]
Dependencies compiled into "wait for input".
Sketch of code for core 0:
    for each period {
      wait_inputs_n1(); // N1
      N1_step();
      write_outputs_n1();
      wait_inputs_n4(); // N4
      N4_step();
      write_outputs_n4();
      wait_inputs_n6(); // N6
      N6_step();
      write_outputs_n6();
    }

COMMUNICATION CHANNELS
Two kinds of communications:
- instantaneous
- delayed

INSTANTANEOUS COMMUNICATION
    void write_outputs_n2() {
      copy(n4.in2, N2.out);
      channel_n2_n4 = true; // + cache management
      notify(core N4);
      copy(n5.in1, N2.out);
    }
    void wait_for_inputs_n4() {
      while (!channel_n2_n4) { wait(); }
      channel_n2_n4 = false;
      while (!channel_n1_n4) { wait(); }
      channel_n1_n4 = false; // + cache management
    }
Efficient hardware synchronization
Software-based cache coherency
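
The flag-based channel above can be approximated on a generic platform with C11 atomics. This is a hedged sketch, not the Kalray implementation: release/acquire ordering stands in for the explicit cache management, a busy loop stands in for the hardware `wait()`/`notify()`, and the names `channel`, `channel_init`, `channel_write` and `channel_wait` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_bool full;  /* plays the role of the channel_n2_n4 flag */
    int data;          /* consumer-side input buffer, e.g. n4.in2 */
} channel;

void channel_init(channel *ch)
{
    assert(ch != NULL);
    atomic_init(&ch->full, false);
    ch->data = 0;
}

/* Producer side (write_outputs): copy the value, then publish the flag. */
void channel_write(channel *ch, int value)
{
    ch->data = value;
    atomic_store_explicit(&ch->full, true, memory_order_release);
}

/* Consumer side (wait_for_inputs): spin until the flag is set, read the
   value, then clear the flag for the next period. */
int channel_wait(channel *ch)
{
    while (!atomic_load_explicit(&ch->full, memory_order_acquire)) {
        /* busy wait */
    }
    int value = ch->data;
    atomic_store_explicit(&ch->full, false, memory_order_release);
    return value;
}
```

As on the slide, the producer never waits for the flag to be cleared; the sketch relies on the static schedule to guarantee that the consumer has read the value before the next write.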

DELAYED COMMUNICATION
[Figure: A (input i1, output o1) feeds B (input i2, output o2) through a pre with init b; transformed into A -> S -> B, where S is a swap task]
Transformation into a SWAP + scheduling constraints.
Scenario 1: within instant n, order A, B, S. Constraint: B before S
Scenario 2: within instant n, order B, A, S. Constraint: A before S
[Figure: contents X and Y of the buffers A.in and S.in over instants n-1, n, n+1 in both scenarios]
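
The swap transformation can be pictured as a double buffer: A writes the value for the next instant, B reads the value of the previous instant, and S exchanges the two buffers once both are done. The C sketch below is an assumed illustration of that idea; `delayed_channel` and its functions are hypothetical names, not the generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int buf[2];
    int front;  /* index read by B; A writes the other buffer */
} delayed_channel;

/* Both buffers start with the init value of the pre operator. */
void delayed_init(delayed_channel *ch, int init_value)
{
    assert(ch != NULL);
    ch->buf[0] = init_value;
    ch->buf[1] = init_value;
    ch->front = 0;
}

/* Task A: produce the value that B will see at the next instant. */
void delayed_write(delayed_channel *ch, int value)
{
    ch->buf[1 - ch->front] = value;
}

/* Task B: read the value produced at the previous instant. */
int delayed_read(const delayed_channel *ch)
{
    return ch->buf[ch->front];
}

/* Task S: swap the buffers; must run after both A and B in the instant. */
void delayed_swap(delayed_channel *ch)
{
    ch->front = 1 - ch->front;
}
```

Because A and B touch different buffers, their relative order within an instant does not matter; only the two constraints "A before S" and "B before S" remain, which matches the two scenarios above.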

CONCLUSION OF PART I
[Figure: tool flow from parallelism extraction to the executable for Kalray]
Semantics preserved
Next step: temporal guarantees

Part I: Semantics Preserving Parallelization
Part II: Real-Time Guarantees
- Shared Memory Interference
- Time-Triggered Execution Model
- Real-Time Guarantees with Network-on-Chip
Part III: Evaluation and Conclusion

INTERFERENCE AND REACTION TIME
[Figure: cores 0 to 15 sharing one memory; task A on core 0 and task B on core 1 request memory access at the same time, which stretches the WCET of task A into its WCRT]
Single-core: WCET is sufficient
Multi-core: WCET + interference on shared resources = Worst-Case Response Time (WCRT).
WCET + interference is too pessimistic in the general case -> exploit the execution model in the analysis

BANKED MEMORY
[Figure: cores 0, 1, ... accessing banks 0, 1, ... through round-robin arbiters]
Private arbiter for each bank
Available in the Kalray MPPA2 (silicon)

BANKED MEMORY IN KALRAY MPPA2
[Figure: core 0 runs N1, N4, N6 and uses bank 0; core 1 runs N2, N5 and uses bank 1; round-robin arbitration; only communication crosses banks]
Choice: code, input buffers and local variables of a core are mapped in that core's own bank
Interference on communication only
How to activate tasks?

PROBLEM WITH ASAP EXECUTION
[Figure: core 0 runs task A then task B, core 1 runs task C then task D, each within its WCET and before the deadline]
Faster execution of A -> interference between B and C: timing anomaly
Solution: release dates for tasks (time-triggered)

TIME-TRIGGERED EXECUTION MODEL
Principle
[Figure: core 0 runs task A (release = 0) then task B (release = 427); core 1 runs task C (release = 0) then task D (release = 398); each task completes within its WCRT]
Compute a static release date when data is guaranteed to be available
Time-triggered execution prevents tasks from starting earlier
Implementation
    void wait_inputs_n1() {
      while (time() < t_period + release_date_n1) {
        // wait
      }
    }
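
To make "release date when data is guaranteed to be available" concrete, the sketch below takes the WCRT of each task as given and sets each release date to the latest completion bound among its predecessors (data producers and the previous task on the same core). This is a deliberately simplified assumption: in the framework the WCRTs themselves depend on interference and are computed by the MIA tool. The function name, the array layout and the WCRT values used in the usage note are hypothetical.

```c
#include <assert.h>

#define MAX_TASKS 16

/* pred[j][0..n_pred[j]-1] lists the predecessors of task j.
   Tasks are assumed to be numbered in topological order. */
void compute_release_dates(int n_tasks, const int pred[][MAX_TASKS],
                           const int *n_pred, const long *wcrt,
                           long *release)
{
    assert(n_tasks >= 0 && n_tasks <= MAX_TASKS);
    for (int j = 0; j < n_tasks; j++) {
        release[j] = 0;  /* tasks without predecessor start with the period */
        for (int k = 0; k < n_pred[j]; k++) {
            int i = pred[j][k];
            assert(i >= 0 && i < j);           /* topological numbering */
            long ready = release[i] + wcrt[i]; /* latest end of task i  */
            if (ready > release[j]) release[j] = ready;
        }
    }
}
```

If task A is given a WCRT of 427 cycles and task C a WCRT of 398 cycles (assumed values chosen to match the figure), B and D obtain release dates 427 and 398.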

MULTI-CORE INTERFERENCE ANALYSIS
Inputs: WCET, dependencies, memory accesses, mapping
Outputs: release dates of tasks, WCRT
Multi-Core Interference Analysis (MIA) Tool [Hamza Rihani, RTNS 2016]
www-verimag.imag.fr/multi-core-interference-analysis.html

FRAMEWORK
[Figure: tool flow extended with NoC routing + rate attribution, network calculus (WCTT), WCET analysis, MIA and release dates]
WCET analysis (Otawa, aiT)
Computation of the release dates (MIA tool)
Insertion in the executable.
[Submitted: Graillat A., Rihani H., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., Real-Time Systems Journal]

OUR EXECUTION MODEL
Static scheduling
Bare metal, no interrupt, no preemption
Banked memory mapping (1 core, 1 bank)
Time-triggered
Deterministic, with WCRT guarantee

OVERVIEW
[Figure: complete tool flow: parallelism extraction, mapping + non-preemptive scheduling, NoC routing + rate attribution, network calculus (WCTT), WCET analysis, MIA, release dates, code generation, executable for Kalray]

THE KALRAY MPPA2'S NOC
[Figure: in cluster A, core 0 and its memory feed a DMA with a TX buffer and a rate limiter; packets (header, route, data) cross the routers to the RX buffer and memory of cluster B. A route is a list of directions, e.g. East, North, East, Local for flow 1. Flow 1 and flow 2 share a link, each with rate = 1/2 B]
Full duplex links

REAL-TIME GUARANTEES WITH NETWORK-ON-CHIP
[Figure: grid of cluster routers with flows f1 and f2 and their alternative routes f1.1, f1.2, f2.1, f2.2; route labels East, East, South, South, Local]
1. Routing
- Route = sequence of directions computed by sender
- Deadlock-free algorithms (XY, Hamiltonian)
- Algorithm with path diversity (Hamiltonian Odd Even (HOE))
2. Route Selection
- Choose one static route per flow
- Optimize performance and fairness
- e.g. {f1.1, f2.2}, {f1.2, f2.2}...
3. Rate attribution
- Fair attribution
- e.g. {0.5, 0.5}, {1.0, 1.0}...
4. Network Calculus
- Compute latency and buffer level
[Dupont de Dinechin B., Graillat A., NoCArc 2017]
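
As an illustration of "route = sequence of directions computed by sender", here is a minimal sketch of XY routing, the simplest of the deadlock-free algorithms named above. It assumes a plain 2D mesh with East as +x and North as +y; the coordinate convention, the `direction` type and `xy_route` are assumptions for the example and do not describe the actual MPPA2 topology or API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EAST, WEST, NORTH, SOUTH, LOCAL } direction;

/* Writes the XY route from (src_x, src_y) to (dst_x, dst_y) into route[]:
   all moves along x first, then all moves along y, then LOCAL to leave the
   network. Returns the route length, or -1 if it does not fit. */
int xy_route(int src_x, int src_y, int dst_x, int dst_y,
             direction *route, int capacity)
{
    int n = 0;
    int x = src_x, y = src_y;

    assert(route != NULL && capacity > 0);
    while (x != dst_x) {
        if (n >= capacity - 1) return -1;   /* keep room for LOCAL */
        if (x < dst_x) { route[n++] = EAST; x++; }
        else           { route[n++] = WEST; x--; }
    }
    while (y != dst_y) {
        if (n >= capacity - 1) return -1;
        if (y < dst_y) { route[n++] = NORTH; y++; }
        else           { route[n++] = SOUTH; y--; }
    }
    route[n++] = LOCAL;
    return n;
}
```

XY routing yields exactly one route per source and destination pair, which is why the slides move on to Hamiltonian Odd Even routing when path diversity is wanted.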

1. UNICAST ROUTING
[Figure: flow f1 on the router grid]
Hamiltonian routing: up-network, down-network
Deadlock-free
Path diversity

1. MULTICAST ROUTING PROBLEM
[Figure: flow f from one source to several destinations]
One route? Deadlock-free heuristic for the travelling salesman problem
Tree: not possible with the Kalray architecture
Hamiltonian: minimum of two routes

1. MULTICAST ROUTING: HAMILTONIAN
Dual-Path + Hamiltonian without path diversity (Lin et al., 1992)
Small path diversity, we can do better

1. MULTICAST ROUTING: HOE
[Figure: even and odd columns of the grid, with the turns allowed by HOE]
Hamiltonian Odd Even (Bahrebar et al.) routing: allows some non-Hamiltonian paths
Our choice: Dual-Path + HOE

EVALUATION OF DEADLOCK-FREE MULTICAST ROUTING
[Plot: minimum rate vs. network load (higher is better), Dual Path + Hamiltonian vs. Dual Path + HOE, with the realistic network load marked]
Minimal rate increased by up to 19% with Dual Path + HOE compared to Dual Path + Hamiltonian.

2. ROUTE SELECTION
[Figure: flows f1 and f2 on the router grid, with candidate routes f1.1, f1.2, f1.3 and f2.1, f2.2, f2.3]
Input: set of routes for each flow
Output: one route per flow
Maximize fairness
Naive exploration: enumerate all combinations (9 combinations), run step 3 on each solution, keep the best.
[Table: rate obtained for each combination of f1.1 to f1.3 with f2.1 to f2.3]
Exploration with pruning: ignore combinations with non-minimal bottleneck, then apply rate attribution (step 3). Enumeration of 4 combinations.
[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

[Plot: Exploration vs. Our Exploration with Pruning]

2. ROUTE SELECTION
[Figure: fair rates (e.g. 1/4) attributed to all alternative routes of f1 and f2 at once]
Input: set of routes for each flow
Output: one route per flow
Maximize fairness
Naive exploration: enumerate all combinations (9 combinations)
Our exploration with pruning: ignore non-minimal bottleneck (4 combinations)
Our LP-based heuristic: consider all alternative routes at once
1. Attribute fair rates
2. Remove smallest alternative routes
3. Enumerate the 3 combinations
[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

EVALUATION OF OUR LP-BASED HEURISTIC

LP-BASED HEURISTIC VS. OPTIMAL EXPLORATION
Optimal algorithms (100%): naive exploration, exploration with pruning
Minimal rate as indicator of fairness

Part I: Semantics Preserving Parallelization
Part II: Real-Time Guarantees
Part III: Evaluation and Conclusion
- Use Cases and Evaluation
- Conclusion

THE ROSACE CASE STUDY
Altitude-only flight controller
Open source (Simulink, Lustre, Giotto) [Pagetti, Saussié, RTAS 14]
[Plot: sequential vs. best effort execution for the pure, +100 cycles and +200 cycles variants]
+100: each task augmented with 100 cycles
+200: each task augmented with 200 cycles
Best effort (ASAP): no real-time guarantees
Time-Triggered (TT): real-time guarantees

SYNTHETIC BENCHMARK ON 64 CORES
3 phases: Dispatch, Compute, Gather
20 bytes per flow, high network congestion for the gather phase
[Figure: Dispatch, Compute and Gather phases]
54% of WCRT for functional code
46% of WCRT for communication and system code

CONCLUSION
[Figure: complete tool flow, from parallelism extraction to the executable for Kalray, with network calculus (WCTT), WCET analysis, MIA and release dates]
Semantics-preserving code generation
Time-triggered execution model
Hard real-time guarantees
Many-core performance
Bridge between academia and industry

FUTURE WORK
Input/output management
Memory size problem (overlays?)
Release date computation and memory interference analysis with NoC
What should be the standard real-time multi-core processor?
Thank you for your attention. Questions?

Published
- Graillat A., Dupont de Dinechin B., DATE 2018. Parallel Code Generation of Synchronous Programs for a Many-core Architecture.
- Boyer M., Dupont de Dinechin B., Graillat A., Havet L., ERTS 2018. Computing Routes and Delay Bounds for the Network-on-Chip of the Kalray MPPA2 Processor.
- Dupont de Dinechin B., Graillat A., NoCArc 2017. Feed-Forward Routing for the Wormhole Switching Network-on-Chip of the Kalray MPPA2-256 Processor.
- Dupont de Dinechin B., Graillat A., AISTECS 2017. Network-on-Chip Service Guarantees on the Kalray MPPA-256 Bostan Processor.
Submitted
- Graillat A., Rihani H., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., Real-Time Systems Journal. Implementation Framework for Real-Time Data-Flow Synchronous Programs on Many-Cores.
- Graillat A., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., DATE 2019. Response Time Analysis of Dataflow Applications on a Many-Core Processor with Shared-Memory and Network-on-Chip.

REFERENCES
[1] Bertsekas, D. P., Gallager, R. G., and Humblet, P. (1992). Data networks, vol. 2. Prentice Hall International, Englewood Cliffs, New Jersey.
[2] Chen, S. and Nahrstedt, K. (1998). Maxmin fair routing in connection-oriented networks. In Proc. Euro-Parallel and Distributed Systems Conf.

Backup

DEADLOCK IN WORMHOLE NETWORKS
[Figure: routers R1 to R4 in a ring, crossed by flows A and B]
This instance of flows deadlocks:
- A wormhole packet is spread along the route
- Links 1-4 and 3-2 are shared
- A holds 1-4 but waits for 3-2
- B holds 3-2 but waits for 1-4
Deadlock-freeness can be ensured at routing time
Solutions: XY, Hamiltonian Odd-Even, Turn Prohibition, etc.

MAX MIN FAIR RATE ATTRIBUTION
[Figure: flows f1, f2, f3, f4 crossing the cluster routers]
Rate limiter configuration to avoid buffer overflow
Rate of a flow $f_i$ noted $\rho_i$
Valid: for each link, $\sum_i \rho_i \le 1$ flit/cycle
Fair attribution: max-min fairness [2]: cannot increase a rate without decreasing an already smaller (or equal) rate
Example 1: f1 = 1/2, f2 = 1/2, f3 = 1/4, f4 = 1/4
Valid but not max-min fair (since increasing f3 or f4 can be done by reducing the greater f2)
Example 2: f1 = 2/3, f2 = 1/3, f3 = 1/3, f4 = 1/3
Valid and max-min fair (increasing f2, f3 or f4 cannot be done without reducing one of them)
Classical solution: the Water Filling algorithm [1]
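
The water-filling idea can be sketched as follows: raise the rates of all unfrozen flows together until some link saturates, freeze the flows crossing that link, and repeat. The C code below is a minimal illustration under assumptions, not the thesis implementation: the link/flow incidence matrix `uses`, the function name and the constants are hypothetical, and every flow is assumed to cross at least one link.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FLOWS 16
#define EPS 1e-9

/* uses[l * n_flows + f] is true when flow f crosses link l.
   capacity is the link capacity (1 flit/cycle on the slide). */
void water_filling(int n_links, int n_flows, const bool *uses,
                   double capacity, double *rate)
{
    bool frozen[MAX_FLOWS] = { false };
    int remaining = n_flows;

    assert(n_flows >= 0 && n_flows <= MAX_FLOWS);
    for (int f = 0; f < n_flows; f++) rate[f] = 0.0;

    while (remaining > 0) {
        /* Largest common increment that every link can still accept. */
        double delta = -1.0;
        for (int l = 0; l < n_links; l++) {
            double used = 0.0;
            int active = 0;
            for (int f = 0; f < n_flows; f++) {
                if (!uses[l * n_flows + f]) continue;
                used += rate[f];
                if (!frozen[f]) active++;
            }
            if (active == 0) continue;
            double share = (capacity - used) / active;
            if (delta < 0.0 || share < delta) delta = share;
        }
        assert(delta >= 0.0);  /* fails if an unfrozen flow crosses no link */

        for (int f = 0; f < n_flows; f++)
            if (!frozen[f]) rate[f] += delta;

        /* Freeze every flow that crosses a saturated link. */
        for (int l = 0; l < n_links; l++) {
            double used = 0.0;
            for (int f = 0; f < n_flows; f++)
                if (uses[l * n_flows + f]) used += rate[f];
            if (used < capacity - EPS) continue;
            for (int f = 0; f < n_flows; f++)
                if (uses[l * n_flows + f] && !frozen[f]) {
                    frozen[f] = true;
                    remaining--;
                }
        }
    }
}
```

With an assumed topology in which one link is shared by f2, f3, f4 and another by f1, f2 (chosen so that both examples above are valid), the first round gives 1/3 to every flow and freezes f2, f3, f4; the second round raises f1 to 2/3, which is the allocation of Example 2.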

DETERMINISTIC NETWORK CALCULUS (DNC) PRINCIPLE
[Figure: arrival curve $\gamma_{r,b}(t)$ (burst b, rate r), service curve $\beta_{R,T}(t)$ (rate R, latency T) and their convolution $(\beta_{R,T} \otimes \gamma_{r,b})(t)$, with the delay and backlog bounds marked]
An arrival curve is an upper bound on the traffic entering the network
A service curve is a lower bound on the traffic handled by the network
How to compute the service curve?
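
For the two curve shapes drawn on this slide, the delay and backlog bounds have well-known closed forms: with $r \le R$, the delay is at most $T + b/R$ and the backlog at most $b + rT$. The helper below is a small sketch of these textbook formulas; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Token-bucket arrival curve: gamma_{r,b}(t) = r * t + b for t > 0. */
typedef struct { double r, b; } arrival_curve;

/* Rate-latency service curve: beta_{R,T}(t) = R * max(0, t - T). */
typedef struct { double R, T; } service_curve;

/* Maximum horizontal distance between the curves (requires r <= R). */
double delay_bound(arrival_curve a, service_curve s)
{
    assert(s.R > 0.0 && a.r <= s.R);
    return s.T + a.b / s.R;
}

/* Maximum vertical distance between the curves (requires r <= R). */
double backlog_bound(arrival_curve a, service_curve s)
{
    assert(s.R > 0.0 && a.r <= s.R);
    return a.b + a.r * s.T;
}
```

For instance, a flow with r = 0.25 flit/cycle and b = 17 flits served with R = 0.5 and T = 17 cycles (illustrative values) gets a delay bound of 17 + 17/0.5 = 51 cycles and a backlog bound of 17 + 0.25 * 17 = 21.25 flits.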

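KALRAY MPPA2 NETWORK-ON-CHIP
[Figure: flows f1 to f4 with their attributed rates (f2 = 1/3) crossing numbered cluster routers, among them routers 7 and 11; each router output combines FIFO queues and round-robin arbitration]
Kalray MPPA2 network elements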

SEPARATED FLOW ANALYSIS (1/2)
[Figure: flows f1 to f4 (f2 = f3 = f4 = 1/3) through FIFO queues and round-robin arbiters]
Separated Flow Analysis:
- Compute the service curve of each network element
- Compute the successive arrival curves at each network element
- Convolution of the element service curves -> network service curve

107 48/48 SEPARATED FLOW ANALYSIS (2/2)
[Figure: flows f1 = 2/3, f2 = f3 = f4 = 1/3 crossing numbered routers; FIFO queues with round-robin arbitration]
Service curve offered to f2? At routers 1, 2, 3, 7 and 11.

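108 48/48 BLIND MULTIPLEXING
[Figure: flows f1 = 2/3 and f2 = 1/3 share an output link of peak rate R = 1. The arrival curve of f1 has burst $b_1$ and rate $r_1$. The service curve for f2 has rate $R - r_1$ and latency $T = \frac{b_1}{R - r_1}$]
No information about the arbitration: consider that f2 has low priority. Can we do better?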
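As a small worked sketch of the residual service curve above: the low-priority flow is left with rate $R - r_1$ after a latency of $b_1 / (R - r_1)$. The function name is illustrative, and the burst of 17 flits used in the usage line is a hypothetical value (one maximum-size packet), not a figure from this slide.

```python
from fractions import Fraction

def blind_multiplexing(R, r1, b1):
    """Residual rate-latency curve (rate, latency) left to the low-priority flow."""
    if r1 >= R:
        raise ValueError("no residual service: the competing flow saturates the link")
    rate = R - r1
    latency = Fraction(b1) / rate
    return rate, latency

# f1 has rate 2/3 on a link of peak rate 1; its burst is assumed to be 17 flits.
residual = blind_multiplexing(1, Fraction(2, 3), 17)
```

Under these assumptions f2 is guaranteed only 1/3 flit/cycle after 51 cycles, which illustrates how pessimistic the bound is when nothing is known about the arbiter.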

109 48/48 ROUND ROBIN MULTIPLEXING
[Figure: flows f1 = 2/3 and f2 = 1/3, N = 2, packets of size $l_{max}$. The service curve for f2 has rate $R = \frac{1}{N}$ and latency $T = (N - 1)\, l_{max}$]
Restriction: rate $\le R$ (not applicable to f1). Blind multiplexing is the conservative solution.
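The round-robin service curve can be sketched in the same style, assuming a link of 1 flit/cycle shared by N flows and a restriction of the form "flow rate at most 1/N". The function name is illustrative, and the packet size of 17 flits in the usage lines is an assumed value.

```python
from fractions import Fraction

def round_robin(N, l_max, r):
    """Rate-latency curve (R, T) offered to one of N round-robin flows.

    Raises ValueError when the flow rate r exceeds the guaranteed share 1/N;
    in that case a conservative fallback such as blind multiplexing is needed.
    """
    R = Fraction(1, N)
    if r > R:
        raise ValueError("flow rate exceeds the round-robin share 1/N")
    T = (N - 1) * l_max  # wait for one maximum-size packet of each other flow
    return R, T

# f2 (rate 1/3) against one competitor, with assumed packets of 17 flits.
curve_f2 = round_robin(2, 17, Fraction(1, 3))

# f1 (rate 2/3) exceeds the share 1/2, so the curve does not apply to it.
try:
    round_robin(2, 17, Fraction(2, 3))
    f1_applicable = True
except ValueError:
    f1_applicable = False
```

Compared with blind multiplexing, the latency no longer depends on the burst of the competing flow, only on the number of competitors and the maximum packet size.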

110 48/48 WORST-CASE TRAVERSAL TIME (WCTT): APPLICATION (1/2)
[Figure: flows f1 = 2/3, f2 = f3 = f4 = 1/3 crossing numbered routers; FIFO queues with round-robin arbitration]
Service curve offered to f2?
Router 1: Round Robin multiplexing (N = 2).
Router 2: non active (alone).
Router 3: Round Robin multiplexing (N = 2; f3 and f4 are aggregated with $b_a = \sum_{i \ne 2} b_i$, $r_a = \sum_{i \ne 2} r_i$).
Routers 7 and 11: non active.

111 48/48 WORST-CASE TRAVERSAL TIME (WCTT): APPLICATION (2/2)
[Figure: flows f1 = 2/3, f2 = f3 = f4 = 1/3 crossing numbered routers; FIFO queues with round-robin arbitration]
[Table: WCTT (cycles) of f1, f2, f3 and f4]
With packets of $l_{max}$ = 17 flits.

112 48/48 EVALUATION OF OUR LP-BASED HEURISTIC

114 48/48 LP-BASED HEURISTIC VS. OPTIMAL EXPLORATION Optimal algorithms (100%): naive exploration, exploration with pruning Minimal rate as indicator of fairness

115 48/48
[1] Bertsekas, D. P., Gallager, R. G., and Humblet, P. (1992). Data Networks, vol. 2. Prentice Hall International, Englewood Cliffs, New Jersey.
[2] Chen, S. and Nahrstedt, K. (1998). Maxmin fair routing in connection-oriented networks. In Proc. Euro-Parallel and Distributed Systems Conf.


More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Efficient And Advance Routing Logic For Network On Chip

Efficient And Advance Routing Logic For Network On Chip RESEARCH ARTICLE OPEN ACCESS Efficient And Advance Logic For Network On Chip Mr. N. Subhananthan PG Student, Electronics And Communication Engg. Madha Engineering College Kundrathur, Chennai 600 069 Email

More information

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Introduction WCET of program ILP Formulation Requirement SPM allocation for code SPM allocation for data Conclusion

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

ARTIST-Relevant Research from Linköping

ARTIST-Relevant Research from Linköping ARTIST-Relevant Research from Linköping Department of Computer and Information Science (IDA) Linköping University http://www.ida.liu.se/~eslab/ 1 Outline Communication-Intensive Real-Time Systems Timing

More information

Resource Control and Reservation

Resource Control and Reservation 1 Resource Control and Reservation Resource Control and Reservation policing: hold sources to committed resources scheduling: isolate flows, guarantees resource reservation: establish flows 2 Usage parameter

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

RSVP 1. Resource Control and Reservation

RSVP 1. Resource Control and Reservation RSVP 1 Resource Control and Reservation RSVP 2 Resource Control and Reservation policing: hold sources to committed resources scheduling: isolate flows, guarantees resource reservation: establish flows

More information

Atacama: An Open Experimental Platform for Mixed-Criticality Networking on Top of Ethernet

Atacama: An Open Experimental Platform for Mixed-Criticality Networking on Top of Ethernet Atacama: An Open Experimental Platform for Mixed-Criticality Networking on Top of Ethernet Gonzalo Carvajal 1,2 and Sebastian Fischmeister 1 1 University of Waterloo, ON, Canada 2 Universidad de Concepcion,

More information

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs Jindun Dai *1,2, Renjie Li 2, Xin Jiang 3, Takahiro Watanabe 2 1 Department of Electrical Engineering, Shanghai Jiao

More information

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture Generic Architecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information