Getting to Work with OpenPiton

1 Getting to Work with OpenPiton
Jonathan Balkind, Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Alexey Lavrov, Mohammad Shahrad, Adi Fuchs, Samuel Payne, Xiaohua Liang, Matthew Matl, David Wentzlaff
Princeton University

2 Extension ASPLOS

3 Interfacing with the Networks-on-Chip
1. Packet format: highlighting key packet fields
2. Definition files: .h files
3. Instantiations in the Verilog design

4 NoC: packet format
- 64-bit flits
- 1 packet header flit (64b) + X packet payload flits (64b * X)
- Ex: Cache request from L1.5 to L2: header flit + request address flit + metadata flit
- Ex: Cache response from L2 to L1.5: header flit + 2x data flits (16B cache line)
- Ex: Instruction cache response: header flit + 4x data flits (32B cache line)
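The flit counts in the three examples above follow directly from the 64-bit flit width. A minimal sketch of that arithmetic, where the function names are hypothetical and chosen only for illustration:

```python
FLIT_BITS = 64                 # flit width stated on the slide
FLIT_BYTES = FLIT_BITS // 8    # 8 bytes per flit


def payload_flits_for(data_bytes: int) -> int:
    """Number of 64-bit payload flits needed to carry data_bytes of data."""
    return -(-data_bytes // FLIT_BYTES)  # ceiling division


def packet_flits(payload_flits: int) -> int:
    """Total flits in a packet: one header flit plus the payload flits."""
    return 1 + payload_flits


# The three examples from the slide.
l15_to_l2_request = packet_flits(2)                       # address flit + metadata flit
l2_to_l15_response = packet_flits(payload_flits_for(16))  # 16B cache line
icache_response = packet_flits(payload_flits_for(32))     # 32B cache line
```

Under these definitions the request and the 16B response are both three flits long, and the instruction cache response is five flits long.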

5 NoC: packet format
- CHIPID: The highest bit indicates whether the destination is on-chip or off-chip; the remaining bits indicate the chip ID
- XPOS: The position of the destination tile in the X dimension
- YPOS: The position of the destination tile in the Y dimension
- FBITS: The router output port to the destination
- PAYLOAD LENGTH: The number of payload flits
- UNSPECIFIED: Bits not used by the NoC itself (used by the underlying coherence protocol)
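To make the header fields concrete, here is a minimal sketch of packing and unpacking a 64-bit header flit. The slides give the field names and say the routing fields occupy bits 63-22, but not the individual widths; the bit positions below (14/8/8/4/8 bits) are assumptions for illustration and should be checked against piton/design/include/network_define.h. The helper names are hypothetical.

```python
# ASSUMED field layout, name -> (msb, lsb). Only the overall 63..22 range
# comes from the slides; the per-field split is an illustrative assumption.
FIELDS = {
    "chipid": (63, 50),
    "xpos": (49, 42),
    "ypos": (41, 34),
    "fbits": (33, 30),
    "payload_length": (29, 22),
}


def pack_header(**values: int) -> int:
    """Build a header flit from field values; omitted fields are zero."""
    header = 0
    for name, value in values.items():
        msb, lsb = FIELDS[name]
        width = msb - lsb + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        header |= value << lsb
    return header


def unpack_header(header: int) -> dict:
    """Extract every routing field from a header flit."""
    return {
        name: (header >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, (msb, lsb) in FIELDS.items()
    }


def chipid_top_bit(header: int) -> int:
    """Return the on-chip/off-chip flag (the slides do not state its polarity)."""
    return (header >> 63) & 1
```

For example, `unpack_header(pack_header(xpos=2, ypos=3, payload_length=2))` returns the same three values with `chipid` and `fbits` set to zero, and bits 21-0 stay clear for the fields the coherence protocol uses.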

6 NoC: .h files
- piton/design/include/network_define.h defines header flit bits 63-22 (all fields except messageid, tag, and options 1)
- piton/design/include/define.vh defines the rest

7 NoC: instantiations
- piton/design/chip/rtl/chip.v.pyv
- Chip-wide connections between tiles
- Auto-generated using PyHP

8 NoC: instantiations
- piton/design/chip/tile/rtl/tile.v.pyv: instantiation of NoC1/2/3
- piton/design/chip/tile/rtl/tile.v.pyv: selectable between router and crossbar design

9 Cache Coherence Protocol
Directory-based MESI coherence protocol
- Four-hop message communication (no direct communication between private L1.5 caches)
- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock
- The directory and L2 are co-located, but their state information is maintained separately
- Silent eviction in E and S states
- No acknowledgement is needed upon write-back of dirty lines from L1.5 to L2

10 Memory Hierarchy Datapath
[Figure: the private L1.5 caches connect to the distributed shared L2 over NoC1, NoC2, and NoC3; the L2 connects to the off-chip chipset over NoC1, NoC2, and NoC3.]

11 Cache Coherence Protocol
Directory-based MESI coherence protocol
- Four-hop message communication (no direct communication between private L1.5 caches)
- Uses 3 physical NoCs with point-to-point ordering to avoid deadlock
[Figure: four-hop read. The requesting L1.5 (I->S) sends ReqRd to the L2; the directory (M->S) sends FwdRd to the owner L1.5 (M->S); the owner returns FwdRdAck to the L2; the L2 sends AckDt to the requester.]

12 Cache Coherence Protocol (2)
Directory-based MESI coherence protocol
- The directory and L2 are co-located, but their state information is maintained separately
[Figure: L2 entry with the fields L2 State, Dir State, Tag, Data, and Sharer List.]

13 Cache Coherence Protocol (3)
Directory-based MESI coherence protocol
- Silent eviction in E and S states
- No acknowledgement is needed upon write-back of dirty lines from L1.5 to L2
[Figure: silent eviction (Req S->I, Req E->I); write-back using the WbGuard and Wb messages (Req M->I; Dir M->I, E->I).]

14 NoC Messages
In order to avoid deadlock, NoC3 messages will never be blocked.
- NoC1 (L1.5 to L2): Load, Store, Ifill
- NoC2 (L2 to L1.5/memory): Load Ack, Store Ack, Downgrade, Inv, Mem Req
- NoC3 (L1.5/memory to L2): DG ack, Inv ack, Mem Reply
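The assignment of message classes to networks can be written down as a lookup table, which makes it easy to see which NoC each hop of a transaction uses. This is a sketch built only from the message names on this slide; the table and function names are hypothetical, and the two example sequences are assumed orderings consistent with the four-hop protocol described earlier.

```python
# Message class -> physical NoC, as listed on the slide.
NOC_OF_MESSAGE = {
    "Load": 1, "Store": 1, "Ifill": 1,                 # L1.5 -> L2 requests
    "Load Ack": 2, "Store Ack": 2, "Downgrade": 2,     # L2 -> L1.5
    "Inv": 2, "Mem Req": 2,                            # L2 -> L1.5 / memory
    "DG ack": 3, "Inv ack": 3, "Mem Reply": 3,         # L1.5 / memory -> L2
}


def nocs_used(messages):
    """Return the NoC number used by each message in a transaction."""
    return [NOC_OF_MESSAGE[m] for m in messages]


# Assumed orderings for illustration: a store that downgrades another
# L1.5, and a load that misses in the L2 and goes to memory.
store_with_downgrade = ["Store", "Downgrade", "DG ack", "Store Ack"]
load_from_memory = ["Load", "Mem Req", "Mem Reply", "Load Ack"]
```

Both sequences travel over NoC1, NoC2, NoC3, and then NoC2 again. The final hop waits on a NoC3 message, which is why the guarantee that NoC3 messages are never blocked matters for forward progress.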

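15 Coherence Transaction Example
[Figure: Core 1 issues a load (Ld). ❶ Core 1's L1.5 sends Load to the L2; ❷ the L2 sends Mem Req to memory; ❸ memory returns Mem Reply; ❹ the L2 sends Data Ack to Core 1's L1.5. Core 1's L1.5 goes I->E and the L2 goes I->E; Core 2's L1.5 stays in I.]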

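16 Coherence Transaction Example (2)
[Figure: Core 2 issues a store (St) after Core 1's load. ❶ Core 2's L1.5 sends Store to the L2; ❷ the L2 sends Downgrade to Core 1's L1.5; ❸ Core 1's L1.5 returns DG Ack; ❹ the L2 sends Data Ack to Core 2's L1.5. Core 1's L1.5 goes E->I, Core 2's L1.5 goes I->M, and the L2 goes E->M.]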

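17 Coherence Transaction Example (3)
[Figure: Core 2 writes the line back (Wb) after its store. ❶ Core 2's L1.5 sends WbGuard to the L2; ❷ Core 2's L1.5 sends Writeback. Core 2's L1.5 goes M->I and the L2 goes M->I; Core 1's L1.5 stays in I.]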
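The three example transactions (a load by Core 1, a store by Core 2, then a write-back by Core 2) can be replayed with a toy state model. This is a sketch under strong simplifying assumptions: it tracks a single cache line, models only the transitions shown in the examples, and the class `ToyLine` is hypothetical rather than anything in the OpenPiton code base.

```python
class ToyLine:
    """Toy model of one cache line: per-core L1.5 state plus the L2 state.

    Only the three transitions from the example slides are modelled;
    anything else raises an exception.
    """

    def __init__(self, cores):
        self.l15 = {core: "I" for core in cores}  # MESI state in each L1.5
        self.l2 = "I"                             # state held at the L2
        self.messages = []                        # messages in issue order

    def load(self, core):
        """Example 1: load to a line that is not cached anywhere."""
        if self.l2 != "I":
            raise NotImplementedError("only a load to an uncached line is modelled")
        self.messages += ["Load", "Mem Req", "Mem Reply", "Data Ack"]
        self.l15[core] = "E"
        self.l2 = "E"

    def store(self, core):
        """Example 2: store to a line held in E or M by one other core."""
        owners = [c for c, s in self.l15.items() if c != core and s in ("E", "M")]
        if len(owners) != 1:
            raise NotImplementedError("only a store with one remote owner is modelled")
        self.messages += ["Store", "Downgrade", "DG Ack", "Data Ack"]
        self.l15[owners[0]] = "I"
        self.l15[core] = "M"
        self.l2 = "M"

    def writeback(self, core):
        """Example 3: write-back of a modified line; no acknowledgement is sent."""
        if self.l15[core] != "M":
            raise ValueError("only a modified line is written back")
        self.messages += ["WbGuard", "Writeback"]
        self.l15[core] = "I"
        self.l2 = "I"


line = ToyLine(cores=[1, 2])
line.load(1)       # Core 1: I -> E
line.store(2)      # Core 1: E -> I, Core 2: I -> M
line.writeback(2)  # Core 2: M -> I
```

After the three calls every state is back to I, and `line.messages` lists the ten messages in the order the figures number them.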

18 Example: Add an on-chip accelerator
1. Implement the NoC interface for the accelerator
2. Design and implement the control flow for the accelerator
   - Use interrupt packets to initialize and stop the accelerator
   - Use special loads and stores to configure the accelerator
   - Follow the coherence protocol if a coherent cache is maintained
3. Connect the accelerator to the NoCs and assign it a new tile ID
4. Modify the OS code to initialize the accelerator if needed
5. Write tests for the accelerator
