Dynamic Dataflow. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology


Dataflow Machines

Static - mostly for signal processing
- NEC - NEDIP and IPP
- Hughes, Hitachi, AT&T, Loral, TI, Sanyo...

Dynamic
- Manchester ( 8)
- M.I.T. - TTDA, Monsoon ('88)
- M.I.T./Motorola - Monsoon ( 9) (8 PEs, 8 IS)
- ETL - SIGMA- ('88) (8 PEs, 8 IS)
- ETL - EM4 ('90) (80 PEs), EM-X ('96) (80 PEs)
- Sandia - EPS88, EPS-
- IBM - Empire
- ...

Related machines: Burton Smith's Denelcor HEP, Horizon, Tera

[Photos: EM4, a single-chip dataflow micro in an 80-PE multiprocessor (ETL, Japan); Sigma-, the largest dataflow machine (ETL, Japan); the M.I.T. engineering model; Monsoon. Captions include "Shown at Supercomputing 96" and "Shown at Supercomputing 9". People pictured: K. Hiraki, S. Sakai, T. Shimada, Y. Kodama, Greg Papadopoulos, John Gurd, Andy Boughton, Chris Joerg, Jack Costanza.]

Outline

- Static Dataflow Machines
  - Not general-purpose enough
- Dynamic Dataflow Machines
  - As easy to build as a simple pipelined processor
- The software view
  - The memory model: I-structures
- Monsoon and its performance

Dataflow Graphs

    {x = a + b;
     y = b * 7
     in (x-y) * (x+y)}

- Values in dataflow graphs are represented as tokens: token <ip, p, v>, where ip is the instruction pointer, p the port, and v the data.
- An operator executes when all its input tokens are present; copies of the result token are distributed to the destination operators.
- There is no separate control flow.

[Figure: dataflow graph for the expression above. Inputs a and b feed + (producing x) and *7 (producing y); x and y feed - and +, whose results feed the final *. An example token is shown arriving on port L.]
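The firing rule above can be sketched as a small token-driven interpreter. This is only an illustration: the node names (`add1`, `mul7`, `sub`, `add2`, `mul`), the `GRAPH` and `ARITY` tables, and the `run` function are hypothetical and do not come from the slides. Tokens follow the slide's `<ip, p, v>` shape.

```python
from collections import deque

# Hypothetical encoding of the example graph: node -> (operator, destinations).
# Each destination is an (instruction, port) pair.
GRAPH = {
    "add1": (lambda l, r: l + r, [("sub", "L"), ("add2", "L")]),  # x = a + b
    "mul7": (lambda l: l * 7, [("sub", "R"), ("add2", "R")]),     # y = b * 7
    "sub": (lambda l, r: l - r, [("mul", "L")]),                  # x - y
    "add2": (lambda l, r: l + r, [("mul", "R")]),                 # x + y
    "mul": (lambda l, r: l * r, [("out", "L")]),                  # final product
}
ARITY = {"add1": 2, "mul7": 1, "sub": 2, "add2": 2, "mul": 2}


def run(a, b):
    """Evaluate (x-y) * (x+y) by moving <ip, port, value> tokens around."""
    tokens = deque([("add1", "L", a), ("add1", "R", b), ("mul7", "L", b)])
    waiting = {}  # instruction -> {port: value} for operands that have arrived
    while tokens:
        ip, port, value = tokens.popleft()
        if ip == "out":
            return value
        slots = waiting.setdefault(ip, {})
        slots[port] = value
        if len(slots) == ARITY[ip]:  # all input tokens present: fire
            op, dests = GRAPH[ip]
            args = [slots[p] for p in ("L", "R") if p in slots]
            result = op(*args)
            del waiting[ip]
            # One copy of the result token per destination operator.
            tokens.extend((dest, dest_port, result) for dest, dest_port in dests)
    return None


print(run(2, 3))  # x = 5, y = 21, so (5 - 21) * (5 + 21) = -416
```

No program counter appears anywhere in the sketch; the order of execution is decided only by which tokens have arrived.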

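Static Dataflow Machine: Instruction Templates

        Opcode  Destination 1  Destination 2  Operand 1  Operand 2
    1   +       3L             4L
    2   *       3R             4R
    3   -       5L
    4   +       5R
    5   *       out

Each operand slot carries presence bits. Each arc in the graph has an operand slot in the program.

[Figure: the dataflow graph for x = a + b, y = b * 7, (x-y) * (x+y), drawn next to the templates.]

Static Dataflow Machine (Jack Dennis, 97)

[Figure: a Receive unit feeding a store of instruction templates (an opcode, two destinations, and a presence bit plus source operand for each input), a bank of functional units (FU), and a Send unit that emits result tokens of the form <s, p, v>.]

- Many such processors can be connected together.
- Programs can be statically divided among the processors.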
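A minimal sketch of how such templates might behave, assuming the templates are numbered 1 to 5 in the order +, *, -, +, * and that the constant 7 lives in a permanently present operand slot. The `Template` class, `make_program`, `send`, and the `constant_ports` field are hypothetical names for illustration, not part of any real machine's interface.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Template:
    opcode: Callable[[int, int], int]
    dests: list  # (template number, port) pairs, or ("out", port)
    operands: dict = field(default_factory=lambda: {"L": None, "R": None})
    present: dict = field(default_factory=lambda: {"L": False, "R": False})
    constant_ports: set = field(default_factory=set)  # assumption: slots for literals


def make_program():
    """Templates for x = a + b, y = b * 7, (x-y) * (x+y)."""
    times7 = Template(operator.mul, [(3, "R"), (4, "R")], constant_ports={"R"})
    times7.operands["R"], times7.present["R"] = 7, True
    return {
        1: Template(operator.add, [(3, "L"), (4, "L")]),
        2: times7,
        3: Template(operator.sub, [(5, "L")]),
        4: Template(operator.add, [(5, "R")]),
        5: Template(operator.mul, [("out", "L")]),
    }


def send(program, outputs, dest, port, value):
    """Deliver one token; fire the template once every presence bit is set."""
    if dest == "out":
        outputs.append(value)
        return
    t = program[dest]
    if t.present[port]:
        # Only one token of storage per arc: a second token has nowhere to go.
        raise RuntimeError(f"operand slot {dest}{port} is already full")
    t.operands[port], t.present[port] = value, True
    if all(t.present.values()):
        result = t.opcode(t.operands["L"], t.operands["R"])
        for p in t.present:
            if p not in t.constant_ports:
                t.present[p] = False  # free the slot for reuse
        for next_dest, next_port in t.dests:
            send(program, outputs, next_dest, next_port, result)


program, outputs = make_program(), []
send(program, outputs, 1, "L", 2)  # a
send(program, outputs, 1, "R", 3)  # b
send(program, outputs, 2, "L", 3)  # b again, for b * 7
print(outputs)  # [-416]
```

Sending two tokens to the same slot before the template fires raises an error in this sketch, which is one way to see why a single operand slot per arc cannot stand in for an unbounded FIFO queue.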

Static Dataflow: Problems/Limitations

- Mismatch between the model and the implementation
  - The model requires unbounded FIFO token queues per arc, but the architecture provides storage for one token per arc
  - The architecture does not ensure FIFO order in the reuse of an operand slot
  - The merge operator has a unique firing rule
- The static model does not support
  - Function calls
  - Data structures
- There is no easy solution in the static framework; dynamic dataflow provided a framework for solutions

Dynamic Dataflow Architectures

- Allocate instruction templates, i.e., a frame, dynamically to support each loop iteration and procedure call
  - Termination detection is needed to deallocate frames
- The code can be shared if we separate the code and the operand storage
- A token becomes <fp, ip, port, data>, where fp is the frame pointer and ip the instruction pointer

A Frame in Dynamic Dataflow

    Program
    1   +   3L, 4L
    2   *   3R, 4R
    3   -   5L
    4   +   5R
    5   *   out

[Figure: the program above next to a frame with one slot per instruction; a token <fp, ip, p, v> selects the frame through fp and the instruction through ip. The graph computes x = a + b and y = b * 7.]

Need to provide storage for only one operand per operator.
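The separation of shared code from per-activation frames can be sketched as follows. The sketch assumes the instruction numbering 1 to 5 used above and treats the literal 7 as part of the instruction; `CODE`, `EMPTY`, `new_frame`, and `run` are hypothetical names. Each frame slot holds at most one waiting operand, and a frame is empty again once its activation finishes.

```python
import operator

# Shared code: ip -> (opcode, literal operand or None, destinations).
CODE = {
    1: (operator.add, None, [(3, "L"), (4, "L")]),  # x = a + b
    2: (operator.mul, 7, [(3, "R"), (4, "R")]),     # y = b * 7
    3: (operator.sub, None, [(5, "L")]),            # x - y
    4: (operator.add, None, [(5, "R")]),            # x + y
    5: (operator.mul, None, [("out", "L")]),        # final product
}
EMPTY = object()  # marks a frame slot with no waiting operand


def new_frame():
    """One operand slot per instruction in the shared code."""
    return {ip: EMPTY for ip in CODE}


def run(tokens, frames):
    """Process <fp, ip, port, data> tokens; return {fp: result}."""
    results = {}
    queue = list(tokens)
    while queue:
        fp, ip, port, value = queue.pop()  # LIFO on purpose: order must not matter
        if ip == "out":
            results[fp] = value
            continue
        opcode, literal, dests = CODE[ip]
        frame = frames[fp]
        if literal is not None:
            left, right = value, literal
        elif frame[ip] is EMPTY:
            frame[ip] = (port, value)  # first operand waits in the frame
            continue
        else:
            _, other = frame[ip]
            frame[ip] = EMPTY  # second operand arrived: slot is free again
            left, right = (value, other) if port == "L" else (other, value)
        result = opcode(left, right)
        queue.extend((fp, dest, dest_port, result) for dest, dest_port in dests)
    return results


frames = {0: new_frame(), 1: new_frame()}
tokens = [
    (0, 1, "L", 2), (0, 1, "R", 3), (0, 2, "L", 3),  # activation 0: a = 2, b = 3
    (1, 1, "L", 1), (1, 1, "R", 0), (1, 2, "L", 0),  # activation 1: a = 1, b = 0
]
print(run(tokens, frames))
```

Both activations execute the same `CODE` table; only the frame pointer in each token keeps their operands apart.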


Monsoon Processor (Greg Papadopoulos)

Instruction format: op r d,d

[Figure: pipeline of Instruction Fetch (code memory indexed by ip), Operand Fetch (frame memory indexed by fp+r), ALU, and Form Token, fed by a token queue and connected to the network.]

Temporary Registers & Threads (Robert Iannucci)

Instruction format: op r S,S

- n sets of registers (n = pipeline depth)
- Registers evaporate when an instruction thread is broken
- Registers are also used for exceptions & interrupts

[Figure: the same pipeline with register sets added alongside the code and frame memories.]

Actual Monsoon Pipeline: Eight Stages

[Figure: Instruction Fetch (instruction memory), Effective Address, Presence Bit Operation (presence bits), Frame Operation (frame memory), ALU (with registers), and Form Token, with user and system token queues and the network.]

Instructions directly control the pipeline

The opcode specifies an operation for each pipeline stage:

    opcode r dest [dest]
    EA  WM  RegOp  ALU  FormToken

Easy to implement; no hazard detection.

EA - effective address
- FP + r: frame relative
- r: absolute
- IP + r: code relative (not supported)

WM - waiting matching
- Unary; Normal; Sticky; Exchange; Imperative
- PBs X port -> PBs X Frame op X ALU inhibit

Register ops

ALU: V L X V R -> V L X V R, CC

Form token: V L X V R X Tag X Tag X CC -> Token X Token

Procedure Linkage Operators

Like standard call/return, but caller & callee can be active simultaneously.

[Figure: procedure call graph. Labels: f, arguments a... an, get frame, extract tag, change Tag 0 through change Tag n, Fork, Graph for f, token in frame 0, token in frame, change Tag 0, change Tag.]

Parallel Language Model

- Tree of activation frames (f:, g:, h:, loop), each with active threads
- Global heap of shared objects
- Asynchronous and parallel at all levels

Id World

- Implicit parallelism
- Id -> Dataflow Graphs + I-Structures + ... -> TTDA, Monsoon, *T, *T-Voyager


Id World people

Rishiyur Nikhil, Keshav Pingali, Vinod Kathail, David Culler, Ken Traub, Steve Heller, Richard Soley, Dinart Mores, Jamey Hicks, Alex Caro, Andy Shaw, Boon Ang, Shail Anditya, R Paul Johnson, Paul Barth, Jan Maessen, Christine Flood, Jonathan Young, Derek Chiou, Arun Iyangar, Zena Ariola, Mike Bekerle, K. Eknadham (IBM), Wim Bohm (Colorado), Joe Stoy (Oxford), ...

[Photos: R.S. Nikhil, Keshav Pingali, David Culler, Boon S. Ang, Jamey Hicks, Derek Chiou, Ken Traub, Steve Heller]

Data Structures in Dataflow

- Data structures reside in a structure store; tokens carry pointers
- I-structures: write-once, read multiple times, or allocate, write, read, ..., read, deallocate
- No problem if a reader arrives before the writer at the memory location

[Figure: processors P ... P connected to a shared memory; I-fetch takes an address a, and I-store takes an address a and a value v.]
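The write-once, read-many behavior can be sketched with a presence bit and a list of deferred readers per location. This is a minimal illustration under assumptions: the `IStructure` class, its method names, and the use of Python callbacks to stand in for reply tokens are all hypothetical.

```python
class IStructure:
    """Write-once storage in which a read may arrive before the write."""

    def __init__(self, size):
        self.present = [False] * size              # presence bit per location
        self.value = [None] * size
        self.deferred = [[] for _ in range(size)]  # readers waiting per location

    def i_fetch(self, addr, reply):
        """Answer now if the value is present, otherwise defer the reader."""
        if self.present[addr]:
            reply(self.value[addr])
        else:
            self.deferred[addr].append(reply)

    def i_store(self, addr, value):
        """Write once, then satisfy every reader that arrived early."""
        if self.present[addr]:
            raise RuntimeError(f"location {addr} has already been written")
        self.value[addr] = value
        self.present[addr] = True
        waiting, self.deferred[addr] = self.deferred[addr], []
        for reply in waiting:
            reply(value)


store = IStructure(4)
seen = []
store.i_fetch(2, seen.append)                   # reader arrives before the writer
store.i_fetch(2, lambda v: seen.append(v * 10)) # a second early reader
store.i_store(2, 5)                             # both deferred reads complete here
store.i_fetch(2, seen.append)                   # a late reader is answered at once
print(seen)  # [5, 50, 5]
```

Because a location is written at most once, every reader sees the same value regardless of whether it arrived before or after the writer.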

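I-Structure Storage: Split-phase operations & Presence bits

[Figure: an I-Fetch instruction at s receives the token <s, fp, a>, where a is the address to be read, and sends the request <a, Read, (t,fp)> to the I-structure memory, where (t,fp) is the forwarding address. The memory holds locations with presence bits; some hold values v, and one holds a deferred read with forwarding address fp.ip. The reply goes to instruction t, which makes the operation split phase.]

- Need to deal with multiple deferred reads
- Other operations: fetch/store, take/put, clear

Next time

- Compiling Id/pH into dataflow graphs
- Monsoon performance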
