History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion

Size: px

Start display at page:

Download "History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion"

Bridget Carson
5 years ago
Views:

1 By Matthew Johnson

2 History Importance Comparison to Tomasulo Demo Dataflow Overview: Static Dynamic Hybrids Problems: CAM I-Structures Potential Applications Conclusion

1964-1 st Scoreboard Computer: CDC 6600 1967- Tomasulo 1970- MIT Static Dataflow Processors: Tokens trigger next instruction 1970- Kahn Dataflow Processors: Sequential Processes communicate with each

3 st Scoreboard Computer: CDC Tomasulo MIT Static Dataflow Processors: Tokens trigger next instruction Kahn Dataflow Processors: Sequential Processes communicate with each via messages through FIFO s (Hybrid) MIT Dynamic Dataflow Processors 1985 Exception Solutions in Out of Order Execution : Smith & Pleszkun devise precise exception handling Restricted Out of Order Execution > RISC: Yale Pratt publishes paper on in which he simulates a restricted OO architecture and it outperforms RISC benchmarks 1988 MIT Monsoon Processors 1990 IBM POWER1: 1 st Out of Order Microprocessor (floating point only) 1995 Out of Order goes Mainstream KEY Data Flow Hybrid Out of Order

4 Naturally latency tolerant because of its ability to dynamically switch between threads Eliminate processor switching problems Eliminates Data dependencies Naturally takes advantage of Inherent parallelism in code

6 Token- Output of an operation and its destination Operation Packet- contains instructions, destination and 2 operands Instruction Cell- The location in main memory that holds the operation packet

7 Y=4 Y = X = 4 * Y Z = 4 * X Memory Distribution Network Y=4 Multiple X 4 Y ADD Y 1 3 Arbitration Network Y=4 Multiple Z 4 Y

8 Y = X = 4 * Y Z = 4 * X Memory Distribution Network Multiple X 4 4 Multiple Z 4 4 Arbitration Network

9 X execution X SWITCH T F T execution SWITCH T F X execution X SWITCH T F F execution SWITCH T F MERGE node SWITCH node

10 x n i SWITCH T F b P n j n k f g X

11 Describes dataflow graph Data tokens propagate along the arc Single Assignment Rule: variables can only be assigned once Examples: VAL, LUCID, Mean = (x + y + z)/3; StDev = SQRT( (x^2 + y^2 + z^2 )/3 Mean^2); z y x Mean sqrt StDev

12 Static Dynamic

13 Allows at most 1 data token on the arc of data flow To Ensure 1 token per arc, must use hand shaking Can only run 1 iteration of a loop at a time Static Pipeline: o Check instructions for operands o Select subset of available instruction o Update instructions that are complete Example: MIT Static Dataflow Machines x 2 * sqrt n i n j y 3 z data token acknowledge signal data arc acknowledgement arc

14 Pros: Simple Model Cons: Loops can not be run in parallel Token Traffic Double Hand shaking wastes time No support for procedure calls

15 Allows multiple data tokens on the arc of data flow Allows multiple loop iterations at one time An extra tag is added to tokens and contains address context and loop number Instruction Cells are changed to accept multiple tokens of different tags Overall effect is the creation of reentrant subgraphs Examples: MIT Tagged-Token Data Flow, Manchester Dataflow

16 Pros: Greater Performance( No handshaking, multiple tokens/arc, loops can run in parallel) Supports procedure calls Cons: Tokens need to be buffered because matching tokens takes LOTS of time(large Buffer Needed) No instruction locality (can t use registers)

17 Threaded Dataflow- subgraphs with low degrees of parallelisms are converted into sequential thread Monsoon- Replaces associative memory with ram Uses eight stage pipeline to inject instructions and execute enabled instructions

18 Lack of locality make current storage hierarchy ineffective Matching tokens takes a large amount of time and requires buffering Building CAMs large enough to hold dependencies of a real program Data Structures are difficult to implement

19 Content Addressable Memory Read- returns the address of the data Size is small, RAM is 8X (DRAM: 64 MB CAM: 8MB single chip ) Expensive Large footprint Excessive power

20 Main problem: every write to a structure requires the entire structure to be rewritten Solution: Define size of data structure Each element has a series of status bits (states) Can only write an element once

21 Digital Signal Processing Network routing Graphics processing Telemetry Data warehousing Everywhere Parallel computing is used

22 Potential to unlock the inherent parallelism of the algorithm Naturally eliminates data hazards Current technologies in CAM are not advanced enough to machines based in RAM Serious Problems still remain such as implementing data structures

24 SILC, JURIJ, BORUT ROBIC, and THEO UNGERER, eds. "DATAFLOW ARCHITECTURES."DATAFLOW ARCHITECTURES. Jožef Stefan Institute. Web. 7 May Wikipedia contributors. "DATAFLOW ARCHITECTURES." Wikipedia, The Free Encyclopedia. < Wikipedia contributors. "Out of Order Execution."Wikipedia, The Free Encyclopedia. < K. Pagiamtzis, A. Sheikholeslami, Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey, IEEE J. of Solid-state circuits. March 2006 Rajaraman, V. Elements of Parallel Computing. EE. New Delhi: PHI Learning, Web. < 02&dq=data flow "acknowledgement tokens&source=bl&ots=1akpb6pd8t&sig=mpfuclfj1ogbpjo3sts1byzbs8y &hl=en&sa=x&ei=ksyjuc23mijm0wh_iyhoda&ved=0ceqq6aewba Dennis, Jack, and David Misunas. "A preliminary architecture for a basic data-flow processor." ACM SIGARCH Computer Architecture News. 3.4 (1974): Web. 7 May < Najjar, Walid, Edward Lee, and Guang Gao. "Advances in the Dataflow Computational Model." Parallel Computing. 25. (1999): Web. 7 May

Computer Architecture: Dataflow/Systolic Arrays

Data Flow Computer Architecture: Dataflow/Systolic Arrays he models we have examined all assumed Instructions are fetched and retired in sequential, control flow order his is part of the Von-Neumann model