HeMPS Platform v7.3. Marcelo Ruaro, Eduardo Wachter, Guilherme Madalozzo, Guilherme Castilhos, André del Mestre Fernando G. Moraes

Size: px

Start display at page:

Download "HeMPS Platform v7.3. Marcelo Ruaro, Eduardo Wachter, Guilherme Madalozzo, Guilherme Castilhos, André del Mestre Fernando G. Moraes"

Kory Thomas
5 years ago
Views:

1 1 HeMPS Platform v7.3 Marcelo Ruaro, Eduardo Wachter, Guilherme Madalozzo, Guilherme Castilhos, André del Mestre Fernando G. Moraes PUCRS University, Computer Science Department, Porto Alegre, Brazil

2 Platform Overview 2 PE PE PE PE PE PE PE Architecture PE PE PE PE PE PE PE PE PE PE PE PE Router Processor DMNI LOCAL MEMORY PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE 6x6 MPSoC instance Homogeneous MPSoC Each PE has the same architecture PE is composed of one processor, local memory, DMNI, and router

3 Platform Organization 3 Processing Element - PE APPLICATION REPOSITORY Router Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel CM Cluster Manager PE / SP Slave PE Cluster-based organization Provides scalability of management and traffic isolation Reclustering is allowed Each cluster is managed by a cluster manager (CM) One CM is responsible for access a external repository containing the application task code

4 4 Architectural Features

Processor 5 Plama v2 microprocessor 1 32 bits RISC Processing Element - PE 3-stage pipeline MIPS I ISA APPLICATION REPOSITORY Router Processor DMNI Memory page 3 task

5 Processor 5 Plama v2 microprocessor 1 32 bits RISC Processing Element - PE 3-stage pipeline MIPS I ISA APPLICATION REPOSITORY Router Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel Add. pagination support CM Cluster Manager PE / SP Slave PE UART Memory mapped registers syscall 1.

6 Local Memory 6 Scratchpad memory RAM Dual port Size is parameterizable Pages are logically managed CM Cluster Manager PE / SP Slave PE APPLICATION REPOSITORY Router Processing Element - PE Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel The memory implement a true dual-port interface enabling simultaneous access of processor and DMNI

7 DMNI 7 Direct Memory Network Interface 2 Processing Element - PE APPLICATION REPOSITORY Router Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel CM Cluster Manager PE / SP Slave PE The DMNI implements a direct interface between the local memory and the NoC. It is an approach specialized to design of NoC-based MPSoC systems 2. DMNI: A Specialized Network Interface for NoC-based MPSoCs. In: ISCAS, 2016.

8 Router 8 Hermes NoC 3 XY addressing XY and WF routing Packet Switching Wormhole with credit-based flow control Takes 5 clock cycles to arbitrage and routing a packet CM Cluster Manager PE / SP Slave PE APPLICATION REPOSITORY Router Processing Element - PE Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel HeMPS v7.3 uses the simplest Hermes NoC implementation. There are several others Hermes derivations Asynchronous Virtual-channel Frequency Scaling Circuit-Switching Multicast, 3. HERMES: an infrastructure for low area overhead packet-switching networks on chip. In Jornal of Integration on VLSI, 2004

9 NoC packet and message structure 9 Packet header Packet payload Target Address Payload Size Service Header Service Payload (optional) Message header Message payload From the NoC point of view, the packet has a header and a payload From a task point of view a message contains Message header Ø Encapsulates the packet and service header Message payload Ø Optional field. It may contain for example user data or an object code of a task

10 Application Repository 10 An external memory (off-chip) Stores the application description and its task object code APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE Router Processing Element - PE Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel

11 11 Logical Features

12 Logical Features 12 Logical Features are implemented by software components µkernels Ø Slave Ø Manager User s tasks Logical Features: System Management User s Application Execution

13 13 Logical Features System Management

14 Cluster-based Management 14 Means that the system is logically divided into groups of processors managed by one Cluster Manager (CM) APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE Router Processing Element - PE Processor DMNI Memory page 3 task page 2 task page 1 task page 0 kernel CM is a PE that runs the manager µkernel Performs management functions Task mapping Task migration Reclustering

15 Task Mapping 15 App. is stored into AR and request to execute into MPSoC App APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

16 Task Mapping 16 App. is stored into AR and request to execute into MPSoC App APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

17 Task Mapping 17 CM handles the request and select an appropriated cluster to receive the App description APPLICATION REPOSITORY App CM Cluster Manager PE / SP Slave PE

18 Task Mapping 18 CM start to map the application s tasks into its SP cluster SP SP SP SP SP t2 SP t3 Task code is transferred from AP CM SP SP CM t1 SP SP App APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

19 Reclustering 19 SP SP SP SP t2 t1 SP SP t2 SP SP SP t1 SP t2 SP SP t3 t3? CM SP SP CM t1 SP SP t3 APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

20 Reclustering 20 CM as for borrowed resource from neighbors CMs SP SP SP SP t2 t1 SP SP t2 SP SP SP t1 SP t2 SP SP t3 t3? CM SP SP CM t1 SP SP t3 APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

21 Reclustering 21 CMs response the resource request SP SP SP SP t2 t1 SP SP t2 SP SP SP t1 SP t2 SP SP t3 t3? CM SP SP CM t1 SP SP t3 APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

22 Reclustering 22 SP SP SP SP t2 t1 SP SP t2 SP SP SP t1 SP t2 SP SP t3 t3? CM SP SP CM t1 SP SP t3 APPLICATION REPOSITORY CM choice one resource, releasing others resources CM Cluster Manager PE / SP Slave PE

23 Reclustering 23 Task is allocated into borrowed resource SP SP SP t3 SP t2 t1 SP SP t2 SP SP SP t1 SP t2 SP SP t3 CM SP SP CM t1 SP SP t3 APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

t1 t2 SP SP Task migration is based on recreation process

24 24 Task Migration SP SP SP SP SP t2 SP Task is stopped at source processor and migrated to new processor SP SP SP SP t1 t2 SP SP Task migration is based on recreation process APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

25 Task Migration 25 Migration occurs into steps Task keeps running during its some migration steps Task is only stopped when safe points are automatically identified by the migration process (software) Safe point are moment which the task is not waiting for a message from another task Migration order Task keeps running Task is stoped and the context is saved Source PE CODE MIGRATION Safe state DATA MIGRATION Context is restored and task resume its execution 2. data migration Target PE Task recreation overhead time T M in RUNNING state Migration overhead T M in READY state 1. Code migration

26 26 Logical Features User s Application Execution

27 Application 27 An application is a set of communicating tasks (each task is a.c file) Application are described as a CTG: Communicating Task Graph. Example of applications: P1 RECOG P2 P3 BANK INPUT IVLC IQUANT IDCT OUTPUT MJPEG DTW P4

28 Task 28 Task is a.c file which perform some computation and communication with other(s) task(s) Example of a task code Example of an application task files P1 P2 RECOG BANK P3 DTW P4

29 User s Application Execution 29 SP are dedicated to execute the user applications SP is a PE that runs the slave µkernel Performs support for user task execution TCB Task Control Block Inter-task communication Scheduling Interruption Handling API by System Calls Idle APPLICATION REPOSITORY CM Cluster Manager PE / SP Slave PE

30 API 30 HeMPS API (MPI-based) void Send(Msg * msg, unsigned int target _task_id) void Receive (Msg * msg, unsigned int source _task_id) unsigned int GetTick(void) void Echo(char * string) void Exit(char * string) Task communicate using Sendand Receive primitives

31 Inter-task Communication 31 Task communicate using Sendand Receive primitives Producer Task Case 1: Send() before REQUEST, store msg in PIPE pipe MESSAGE REQUEST 1 Consumer Task Receive() READY RUNNING pipe pipe MESSAGE DELIVERY 1 TIME WAITING MESSAGE REQUEST 2 Case 2: Send() after REQUEST, deliveries the msg Receive() (a) pipe pipe MESSAGE DELIVERY 2 (b)

32 Communication Layers 32 Application Send the messsage by calling the Send API primitive Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

33 Communication Layers 33 ukernel programs the DMNI to send memory block in a packet format Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

34 Communication Layers 34 DMNI (send) copies packet from memory and inject into NoC Can perform serialization Must to implement the NoC flow control Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

35 Communication Layers 35 Producer PE Consumer PE LOCAL MEMORY Header Paylo. Size Payload... Need to send a packet: send_packet() DMNI programing DMNI µkernel Receive the packet: read_packet() DMNI programing Interruption DMNI Header Paylo. Size Payload... LOCAL MEMORY Send Hardware Receive Network On Chip send_packet(): function that programs DMNI to copy a memory block to the NoC Ø Assumes that the memory block is in the format of a NoC packet

36 DMNI - Send 36 Objective: copy a memory block injecting into the NoC The particular feature of this module is the possibility to transfer two memory blocks with one software programming send_packet() API is responsible for to expose the DMNI send feature to the software by configuring MMR

37 Communication Layers 37 NoC Network on Chip Send the packet to the destination PE Packet is divided in flits Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

38 Communication Layers 38 NoC Network on Chip Send the packet to the destination PE Packet is divided in flits Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

39 Communication Layers 39 DMNI (receive) copies packet from NoC and transfers into memory Can perform deserialization Must to implement the NoC flow control Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

DMNI programing Interruption DMNI Header Paylo. Size Payload.

40 Communication Layers 40 Producer PE Consumer PE LOCAL MEMORY Header Paylo. Size Payload... Need to send a packet: send_packet() DMNI programing DMNI µkernel Receive the packet: read_packet() DMNI programing Interruption DMNI Header Paylo. Size Payload... LOCAL MEMORY Send Hardware Receive Network On Chip read_packet(): function that programs the DMNI to copy a NoC packet to a memory block Ø Fired by a interruption

41 DMNI - Receive 41 Objective: receiving a NoC packet coping to a specified memory address Also it generates a software interruption when detects a incoming packet receive_packet() API is responsible for expose the DMNI receive feature to the software by configuring MMR Called through a software interruption, generated when a incoming packet is detected by DMNI

42 Communication Layers 42 ukernel Handle packets from DMNI by implementing a interruption handling mechanism (OS_InterruptServiceRoutine) Is responsible to program the DMNI Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

43 Communication Layers 43 Application Receive the packet by calling the Receive primitive API Application Application ukernel ukernel Processor «DMNI Processor «DMNI Interconnection - NoC

44 44 Debugging

45 Debugging 45 Debugging can be performed from two perspective From the system developer viewpoint By using the HeMPS Debugger Tool (HDT) From the user viewpoint By using Deloream Ø Currently integrated into HDT

46 Debugging Framework 46 Data Extraction (back-end) Extracts simulated data from platform Inserts into a DB or generated log files Data extraction following a standard to be generic Graphical Debugging (front-end) Read extracted data from DB or log files Enable easy debugability by the graphical features Operating System object code Simulator MPSoC Description (RTL, TLM, Virtual) Database Set of applicatons object code Data Extraction Layer (DEL) Scope of the proposed framework Proposed Debugging Graphical Tool Set Others debugging front-end

47 Overview 47 Operating System object code Set of applicatons object code Simulator MPSoC Description (RTL, TLM, Virtual) Data Extraction Layer (DEL) GDB Log files waveforms table insertions Communication Table Database queries Computation Table Proposed Debugging Graphical Tool Set platform.cfg packet.cfg CPU.cfg

48 Main View 48 Debug: Communication flows Routing Algorithms Link utilization Management Protocols Parallel communications

49 Mapping View 49 Debug Task mapping algorithm PEs occupation Task execution status

50 CPU Utilization View 50 Enables the designer to verify the CPU use by different software parts over the time Debug Scheduling algorithms OS and task bugs Other software malfucntions

51 Deloream 51 Used to debug the application s tasks logs It is a debugging tool for the application developer

Achieving Composability in NoC-Based MPSoCs Through QoS Management at Software Level

Achieving Composability in NoC-Based MPSoCs Through QoS Management at Software Level Everton Carara 1, Gabriel Marchesan Almeida 2, Gilles Sassatelli 2, Fernando Gehm Moraes 1 1 PUCRS, Porto Alegre, Brazil,