DIAMOND RINGS ACKNOWLEDGED EVENT PROPAGATION IN MANY-CORE PROCESSORS


1 th August DIAMOND RINGS ACKNOWLEDGED EVENT PROPAGATION IN MANY-CORE PROCESSORS Stefan Nürnberger, Randolf Rotta, Gabor Drescher, Daniel Danner, Jörg Nolte

2 ACKNOWLEDGED EVENT PROPAGATION
What does it do?
- Make events observable in a networked system
- Make sure events are globally observable
- Enforce ordering of events
What is it good for?
- Memory Consistency
- Coherence Protocols
- Atomic Operations
How to implement it?
- Just use broadcast with acknowledgement...

3-13 EXAMPLE: READ FOR OWNERSHIP
[Animated figure: memory holding x with a directory ($Dir), and cores C ... Cn, each with a cache ($). The frames show read (x) requests, after which copies of x appear in the caches, then an rfo (x) request followed by invalidate (x).]
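
The read-for-ownership example can be sketched as a toy directory model. This is a minimal sketch, not the protocol of any particular processor: the `Directory` class, its method names, and its return value are hypothetical and chosen only for illustration. It shows why the requesting core must wait for one acknowledgement per sharer before it owns the line.

```python
class Directory:
    """Toy directory tracking which cores hold a copy of each address."""

    def __init__(self):
        self.sharers = {}  # address -> set of core ids holding a copy

    def read(self, core, addr):
        # A read adds the core to the sharer set of the address.
        self.sharers.setdefault(addr, set()).add(core)

    def rfo(self, core, addr):
        """Read for ownership: invalidate all other copies and count acks.

        Returns the number of acknowledgements that must arrive before
        `core` may be treated as the exclusive owner.
        """
        others = self.sharers.get(addr, set()) - {core}
        acks = 0
        for _sharer in sorted(others):
            # invalidate(addr) goes to the sharer; ownership is granted
            # only after every sharer has acknowledged it.
            acks += 1
        self.sharers[addr] = {core}
        return acks


directory = Directory()
for c in range(4):
    directory.read(c, "x")       # cores 0..3 cache x
acks_needed = directory.rfo(4, "x")
print(acks_needed)               # 4
```

The invalidation here is an acknowledged broadcast to the sharer set, which is the operation the rest of the talk optimizes.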

14 OUTLINE
1. Throughput & Latency of Broadcast
2. The Diamond Ring Topology
3. Evaluation

15 THROUGHPUT & LATENCY
Latency
- time from sending out a message to reception of the acknowledgement
- determined by the longest path (#hops + processing at each node)
- lower is better
Throughput
- number of messages processed within a fixed time span
- determined by the node with maximum overhead (i.e. the bottleneck)
- requires pipelining of messages (latency hiding)
- higher is better
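
The two metrics can be illustrated with a simple cost model. This is only a sketch under stated assumptions: every hop costs the same transfer time plus the same processing time, and every message a node handles costs the same fixed time. The function names and all numbers are hypothetical and not taken from the talk.

```python
def latency_ns(longest_path_hops, hop_ns, proc_ns):
    """Latency: each hop on the longest path pays transfer plus processing."""
    return longest_path_hops * (hop_ns + proc_ns)


def throughput_per_us(max_msgs_per_node, per_msg_ns):
    """Pipelined throughput: bounded by the node that handles most messages."""
    return 1000 / (max_msgs_per_node * per_msg_ns)


# Example: a ring of 8 nodes. The longest path is 8 hops, and every node
# handles 2 messages per broadcast (one received, one forwarded).
ring_latency = latency_ns(8, hop_ns=100, proc_ns=50)
ring_throughput = throughput_per_us(2, per_msg_ns=50)
print(ring_latency, ring_throughput)  # 1200 10.0
```

In this model a ring has a low per-node load but a long path, which is the trade-off the later slides address.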

16-25 ACKNOWLEDGED BROADCAST USING BALANCED TREES
[Animated figure: acknowledged broadcast over a balanced tree.]

26-34 ACKNOWLEDGED BROADCAST USING SKEWED TREES
[Animated figure: acknowledged broadcast over a skewed tree.]

35-46 ACKNOWLEDGED BROADCAST USING RINGS
[Animated figure: acknowledged broadcast over a ring.]

47 FORWARD PROCESS ACK
Message Forwarding as Acknowledgement
- possible in ring structures
- halves the number of sent messages (network contention)
- may increase latency (processing time at node)
Ring Structure
1. Receive Message
2. Process Message
3. Forward Message (Ack)
Tree Structure
1. Receive Message
2. Forward Message (except leaves)
3. Process Message
4. Receive Ack (except leaves)
5. Forward Ack
Not an issue if only message reception needs acknowledgement.
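
The effect of forwarding as acknowledgement on the message count can be sketched by counting messages for one broadcast. This is a counting model only, not the active message framework used in the evaluation: the function names are hypothetical, the ring is numbered 0..n-1 with node 0 as root, and the tree is assumed to be a k-ary tree in heap layout.

```python
def ring_broadcast(n):
    """Root 0 sends around the ring; arrival back at 0 is the acknowledgement."""
    messages = 0
    order = []                  # order in which nodes process the event
    node = 0
    while True:
        nxt = (node + 1) % n
        messages += 1           # forwarded message doubles as the ack
        if nxt == 0:
            break
        order.append(nxt)       # receive, process, then forward
        node = nxt
    return messages, order


def tree_broadcast(n, k):
    """Count messages for a k-ary tree (heap layout) with explicit acks."""
    messages = 0

    def visit(node):
        nonlocal messages
        first = k * node + 1
        for child in range(first, min(first + k, n)):
            messages += 1       # forward message to the child
            visit(child)
            messages += 1       # ack from the child, after its subtree is done

    visit(0)
    return messages


ring_msgs, _ = ring_broadcast(8)
tree_msgs = tree_broadcast(8, 2)
print(ring_msgs, tree_msgs)     # 8 14
```

With n nodes the ring needs n messages while the tree needs 2(n-1), which is where the roughly halved message count comes from.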


49-50 THE DIAMOND RING TOPOLOGY
Combine Ring and Balanced Tree
- Logarithmic path length for low latency
- Forwarding is acknowledgement
- Parallel message propagation
- Computable topology
Diamond Ring: Directed Graph $D_k^l$
- $k$: arity of tree nodes
- $l$: levels of tree scattering
- Based on a balanced tree $B_k^l$
- Mirrored at the leaves
- Closed to a ring at the root
$$|D_k^l| = \frac{(k+1)\,k^l - (k+1)}{k-1}, \qquad |D_k^{l+1}| = |D_k^l| + k^l + k^{l+1}$$
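
The construction described above (a balanced tree, mirrored at the leaves, closed to a ring at the root) can be sketched as follows. This is one reading of the slide, not the authors' code: the function names, the node numbering, and the choice of which k nodes gather into which parent are assumptions. The closed form used for checking is (k+1)(k^l - 1)/(k - 1), which requires k >= 2 and l >= 1.

```python
def diamond_ring(k, l):
    """Return (levels, edges) for arity k and l scatter levels.

    levels: lists of node ids for root, scatter levels, center, gather levels.
    edges:  directed (src, dst) pairs; the last level links back to the root.
    """
    widths = [k**i for i in range(l + 1)] + [k**i for i in range(l - 1, 0, -1)]
    levels, next_id = [], 0
    for w in widths:
        levels.append(list(range(next_id, next_id + w)))
        next_id += w
    edges = []
    for i, level in enumerate(levels):
        nxt = levels[(i + 1) % len(levels)]
        if len(nxt) > len(level):       # scatter: each node feeds k successors
            for j, src in enumerate(level):
                edges += [(src, nxt[j * k + c]) for c in range(k)]
        else:                           # gather: k nodes feed one successor
            for j, src in enumerate(level):
                edges.append((src, nxt[j // k]))
    return levels, edges


def node_count(k, l):
    """Closed form for the number of nodes (k >= 2, l >= 1)."""
    return ((k + 1) * k**l - (k + 1)) // (k - 1)


levels, edges = diamond_ring(2, 2)
n = sum(len(level) for level in levels)
print([len(level) for level in levels], n, len(edges))  # [1, 2, 4, 2] 9 12
```

For k = 2 and l = 2 this yields the stages root, scatter, center, gather with 1, 2, 4, and 2 nodes, and the gather level connects back to the root.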

51-54 THE PERFECT DIAMOND RING
[Figure: a diamond ring D with its stages labelled root, scatter, center, gather, root, followed by a "no bottleneck version" with additional node(s) and the stages root, scatter, center, gather. The parameters and node counts are missing.]

55-57 SOME MORE EXAMPLES
[Figures: three further diamond rings D. The parameters and node counts are missing.]

58-66 ACKNOWLEDGED BROADCAST USING DIAMOND RINGS
[Animated figure: acknowledged broadcast over a diamond ring.]

67-70 DEALING WITH ODD NODE COUNTS
[Figures: a diamond ring D with nodes removed ("- nodes") and one with nodes added ("+ nodes"), each with the stages root, scatter, center, gather, root. The parameters and node counts are missing.]

71-72 COMPARISON TO BALANCED TREES
- Latency is reduced due to the shorter longest path
- Throughput is increased since nodes have fewer communication partners
- Contention on the network is reduced since fewer messages are sent

                 Balanced Tree   Diamond Ring   Ring
Longest Path     log_k(n)        log_k(n)       n
Max. Overhead    (k + )          k +
Messages sent    (n )            k k+ n+        n
[Digits and operators in the table cells are missing; the cells are shown as they survive.]
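
The three rows of the comparison can be worked through with a small counting sketch. The values below are derived from the topology definitions under stated assumptions and are not taken from the slide's table: overhead is counted as messages received plus messages sent by one node per broadcast, the longest path is counted in hops from the root back to the root, the balanced tree uses explicit acks, and the diamond ring is the variant in which the root both scatters and gathers. All function names are hypothetical.

```python
def balanced_tree_metrics(k, depth):
    """Complete k-ary tree with explicit acks (depth >= 2 edge levels)."""
    n = sum(k**i for i in range(depth + 1))
    return {
        "nodes": n,
        "longest_path": 2 * depth,    # down to a leaf and back up
        "max_overhead": 2 * (k + 1),  # inner node: 1 in, k out, k acks in, 1 ack out
        "messages": 2 * (n - 1),      # one message and one ack per tree edge
    }


def diamond_ring_metrics(k, l):
    """Diamond ring with arity k and l scatter levels (k >= 2, l >= 1)."""
    n = 1 + k**l + 2 * sum(k**i for i in range(1, l))
    return {
        "nodes": n,
        "longest_path": 2 * l,        # l scatter hops plus l gather hops
        "max_overhead": 2 * k,        # root: k sent, k received
        "messages": 2 * sum(k**i for i in range(1, l + 1)),
    }


def ring_metrics(n):
    """Plain ring: every node receives one message and forwards one."""
    return {"nodes": n, "longest_path": n, "max_overhead": 2, "messages": n}


tree = balanced_tree_metrics(2, 3)
diamond = diamond_ring_metrics(2, 3)
ring = ring_metrics(diamond["nodes"])
for name, m in (("tree", tree), ("diamond", diamond), ("ring", ring)):
    print(name, m)
```

In this example the binary tree with 15 nodes and the binary diamond ring with 21 nodes have the same longest path (6 hops) and the same message count (28), while the busiest diamond ring node handles 4 messages instead of 6. In this counting model the diamond ring message count equals 2k/(k+1) times its node count.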


74 EVALUATION OF DIAMOND RINGS
Hypothesis
Acknowledged broadcasts using diamond rings should have
1. lower latency,
2. higher throughput
than balanced trees.
Benchmark Setup
- Custom active message framework
- Messages in shared memory
- Topologies: Balanced Tree (BT), Diamond Ring (DR), Sequenced Diamond Ring (SDR)
- Three different evaluation platforms

75 EVALUATION PLATFORMS
- EZ-Chip Tilera TILE-Gx (in-order): Low-Latency Mesh Network (UDN)
- Intel Xeon E v, Sockets, Cores (out-of-order): Slotted Rings, QPI between Sockets
- Intel Xeon Phi P, Cores (in-order): Slotted Ring Network

76 EZ-CHIP TILERA TILE-GX
[Plots, one panel per arity, series BT, DR, SDR: median latency [µs] over number of cores; median events per µs over number of pipelined broadcasts.]

77 INTEL XEON V
[Plots, one panel per arity, series BT, DR, SDR: median latency [µs] over number of hardware threads; median events per µs over number of pipelined broadcasts.]

78 INTEL XEON PHI P
[Plots, one panel per arity, series BT, DR, SDR: median latency [µs] over number of hardware threads; median events per µs over number of pipelined broadcasts.]

79-80 RESULTS OVERVIEW
[Plots for TILE Gx, Xeon E v, and XeonPhi P, series BT, DR, SDR: median latency [µs]; max median throughput [broadcasts per µs].]

81 SUMMARY
- Acknowledged Event Propagation is very important in consistency management.
- Latency and throughput require a trade-off.
- Diamond Rings offer a better trade-off than balanced trees.
- [...] are acknowledged broadcast's best friend.
Thank you for your attention! Questions?
This work was supported by the German Research Foundation (DFG) under grant no. NO /- and SCHR /-


Diamond Rings: Acknowledged Event Propagation in Many-Core Processors Stefan Nürnberger, Randolf Rotta, Gabor Drescher, Daniel Danner, and Jörg Nolte Brandenburg University of Technology, Cottbus-Senftenberg,