FairRide: Near-Optimal Fair Cache Sharing


FairRide: Near-Optimal Fair Cache Sharing. Qifan Pu, Haoyuan Li, Matei Zaharia, Ali Ghodsi, Ion Stoica (UC Berkeley)

Caches are crucial.

Cache sharing. Increasingly, caches are shared among multiple users, especially with the advent of the cloud. Benefits: provide low latency and reduce the load on the backend (storage/network).

Problems with cache algorithms. Classic policies (LRU, LFU, LRU-K) cache the data most likely to be accessed in the future, optimizing global efficiency. As a result, a single user can end up with an arbitrarily small share of the cache, and the policies are prone to strategic behavior.

Solution? Static allocation: give each user a dedicated cache partition in front of the backend (storage/network). This provides isolation and is strategy-proof.

What we want: the isolation and strategy-proofness of static allocation, combined with the higher utilization and data sharing of a globally shared cache.

Our contribution. The first analysis of cache allocation policies with well-defined resource-sharing properties. An impossibility result: no policy achieves all the good properties! And a new policy that is near-optimal and outperforms other policies when users cheat.

A simple model. Users access equal-sized files at constant rates: r_ij is the rate at which user i accesses file j. An allocation policy decides which files to cache: p_j is the fraction of file j kept in cache. Users care about their hit ratio:

HR_i = total_hits / total_accesses = (Σ_j p_j r_ij) / (Σ_j r_ij)

The results also hold with varied file sizes, partial file accesses, binary p_j, etc.
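As a quick illustration of the model (a sketch with made-up rates and cache fractions, not code from the paper):

```python
# Hit ratio in the simple model: user i accesses file j at rate r_ij,
# and the allocation policy caches fraction p_j of file j.
# HR_i = sum_j p_j * r_ij / sum_j r_ij

def hit_ratio(rates, p):
    """rates: {file: req/sec} for one user; p: {file: cached fraction}."""
    return sum(p.get(f, 0.0) * r for f, r in rates.items()) / sum(rates.values())

# Illustrative numbers: a user reads A at 5 req/sec and B at 10 req/sec;
# B is fully cached, A is half cached.
print(round(hit_ratio({"A": 5.0, "B": 10.0}, {"A": 0.5, "B": 1.0}), 3))  # 0.833
```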

Outline. What properties do we want? Can we extend max-min fairness to solve the problem? How do we solve it? (FairRide) How well does FairRide work in practice?

Properties. Isolation guarantee: no user should be worse off than under static allocation. Strategy-proofness: no user can improve by cheating. Pareto efficiency: we can't improve a user without hurting others.

Strategy-proofness. It is very easy to cheat and hard to detect, e.g., by making spurious accesses, and it can happen in practice. [Experiment: two web sites sharing an Amazon ElastiCache instance in front of a MySQL instance; when one site makes spurious accesses, the other site's miss ratio roughly doubles (2x).]



What is max-min fairness? Maximize the allocation of the user with the minimum allocation. Solution: allocate each user 1/n (the fair share): 33% / 33% / 33%. It also handles users who want less than their fair share: 20% / 40% / 40%. Max-min fairness has been widely successful for other resources. OS: round robin, proportional sharing, lottery scheduling. Networking: fair queueing, WFQ, WF2Q, CSFQ, DRR. Datacenter: DRF, Hadoop fair scheduler, Quincy.
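The 1/n allocation with redistribution of unused share can be sketched as a small water-filling routine (an illustrative sketch with hypothetical demands, not the talk's code):

```python
# Max-min fairness by water-filling: repeatedly split the remaining
# capacity equally among users whose demand is not yet met; users who
# want less than the equal share keep only what they demand.

def max_min(demands, capacity):
    alloc = dict.fromkeys(demands, 0.0)
    active = set(demands)
    while active and capacity > 1e-12:
        share = capacity / len(active)
        for u in sorted(active):
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            capacity -= give
            if demands[u] - alloc[u] <= 1e-12:
                active.discard(u)
    return alloc

# One user wants less than the 1/3 fair share; the others split the rest.
print(max_min({"u1": 0.2, "u2": 0.9, "u3": 0.9}, 1.0))
# u1 gets 0.2; u2 and u3 get 0.4 each
```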

An example. Alice accesses file A at 5 req/sec and file B at 10 req/sec; Bob accesses file B at 10 req/sec and file C at 5 req/sec. File sizes are 1 GB and the total cache is 2 GB. Max-min fairness gives each user a 1 GB share and splits the cost of the shared file B between them, so B is 100% cached while A and C are each 50% cached. Both users get HR = (0.5 × 5 + 1 × 10) / 15 = 83.3%.

Properties. Max-min fairness provides the isolation guarantee and Pareto efficiency, but is it strategy-proof?

An example (continued). Is it possible for a strategic Bob to get a higher hit ratio from the system? Suppose Bob adds +10 req/sec of spurious accesses to file C. C now looks hot enough to be cached fully, B stays fully cached, and A is evicted: A 0%, B 100%, C 100%. Alice's hit ratio drops from 83.3% to 66.7%, while Bob's real hit ratio rises to 100%. By gaming the system, a user can increase performance by hurting others!
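The numbers in this example can be verified with a few lines (the rates are reconstructed to match the slides' hit ratios; the helper is illustrative):

```python
# Max-min example, before and after Bob cheats.
# Alice: A at 5, B at 10 req/sec; Bob: B at 10, C at 5 req/sec.

def hit_ratio(rates, p):
    return sum(p[f] * r for f, r in rates.items()) / sum(rates.values())

alice, bob = {"A": 5, "B": 10}, {"B": 10, "C": 5}

honest = {"A": 0.5, "B": 1.0, "C": 0.5}   # fair split: B shared, A/C half cached
cheated = {"A": 0.0, "B": 1.0, "C": 1.0}  # after Bob's spurious +10 req/sec on C

print(round(hit_ratio(alice, honest), 3), round(hit_ratio(bob, honest), 3))
# 0.833 0.833
print(round(hit_ratio(alice, cheated), 3), round(hit_ratio(bob, cheated), 3))
# 0.667 1.0
```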

Properties. Max-min fairness: isolation guarantee and Pareto efficiency, but not strategy-proof. Static allocation: isolation guarantee and strategy-proofness, but not Pareto efficient. Other policies such as priority allocation and max-min rate also fail to satisfy all three properties.

Theorem. No allocation policy can satisfy all three properties! The best we can do is two of the three.

What makes cache sharing different? Unlike CPU or network links: (1) The cost of switching is high: cache misses go to slow storage, which prevents efficient time multiplexing. (2) A cache can be shared in space: shared data can be accessed non-exclusively, whereas a CPU cycle is used by only one thread and a network link sends one packet at a time.


Properties. FairRide satisfies the isolation guarantee and strategy-proofness, and is near-optimal on Pareto efficiency; the other policies (max-min fairness, static allocation, priority allocation, max-min rate) each miss at least one property.

FairRide. FairRide starts with max-min fairness: allocate 1/n of the cache to each user and split the cost of shared files equally among the users sharing them. The only difference: users who don't pay for a file are blocked from accessing it. The blocking is probabilistic (with some probability) and is implemented by delaying the request.

FairRide: Blocking. In the earlier example, a cheating Bob no longer pays for file B, so FairRide blocks his accesses to B with some probability: of his 10 req/sec to B, 5 are allowed and 5 are blocked, and his hit ratio drops from 83.3% to 66.7%. Cheating always gives worse performance, which dis-incentivizes strategic behavior.

Probabilistic blocking. FairRide blocks a non-paying user with probability p(n_j) = 1/(n_j + 1), where n_j is the number of other users caching file j; e.g., p(1) = 50%, p(4) = 20%. This is the best you can do in the general case: any less blocking does not prevent cheating.
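A sketch of the blocking rule and its effect on the running example (illustrative; the rates are the reconstructed ones from the earlier example):

```python
# FairRide blocks a user who does not pay for file j with probability
# p(n_j) = 1 / (n_j + 1), where n_j is the number of OTHER users
# caching file j.

def block_prob(n_other):
    return 1.0 / (n_other + 1)

print(block_prob(1), block_prob(4))  # 0.5 0.2

# Running example: a cheating Bob stops paying for B (cached by Alice
# alone, so n_B = 1). Half of his 10 req/sec to B are blocked
# ("allow 5, block 5"); C is fully cached by him.
hr_bob = ((1 - block_prob(1)) * 10 + 1.0 * 5) / 15
print(round(hr_bob, 3))  # 0.667, worse than the honest 0.833
```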


Strategy-proofness vs. Pareto efficiency. FairRide is more efficient than other policies when users cheat, and has minimal impact on efficiency when no user cheats. The cost of blocking/delaying is a small insurance premium against the cost of cheating, and strategy-proofness makes the system stable.


Evaluation. Implemented in Alluxio (formerly Tachyon); FairRide delays a request as if it were blocked. Compared against max-min fairness, benchmarked with TPC-H, YCSB, and Facebook workloads. Questions: Does FairRide prevent cheating? What is the cost of FairRide? How does it perform end-to-end?

Cheating under max-min fairness. [Chart: miss ratio and average response time over time for two users; when user 2 cheats, user 1's performance degrades, and user 1 eventually cheats back.] Cheating can greatly hurt user performance.

Cheating under FairRide. [Chart: the same experiment under FairRide; cheating does not improve either user's performance.] FairRide dis-incentivizes users from cheating.

Many users. [Chart: average miss ratio and response time for strategic vs. other users as the number of strategic users (out of 20) grows; FairRide: 351 ms average response.] FairRide has minimal loss.

Facebook experiments. [Chart: reduction in median job time (%) by bin (#tasks), max-min vs. FairRide.] FairRide outperforms max-min fairness by 29%.

Conclusion. No policy can satisfy all desirable properties: isolation guarantee, strategy-proofness, and Pareto efficiency. FairRide provides the isolation guarantee and strategy-proofness through probabilistic blocking; it outperforms static allocation and other sharing policies when users cheat, and achieves this with the least overhead.


More information

Scheduling II. Today. Next Time. ! Proportional-share scheduling! Multilevel-feedback queue! Multiprocessor scheduling. !

Scheduling II. Today. Next Time. ! Proportional-share scheduling! Multilevel-feedback queue! Multiprocessor scheduling. ! Scheduling II Today! Proportional-share scheduling! Multilevel-feedback queue! Multiprocessor scheduling Next Time! Memory management Scheduling with multiple goals! What if you want both good turnaround

More information

First-In-First-Out (FIFO) Algorithm

First-In-First-Out (FIFO) Algorithm First-In-First-Out (FIFO) Algorithm Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1 3 frames (3 pages can be in memory at a time per process) 15 page faults Can vary by reference string:

More information

COMP/ELEC 429/556 Introduction to Computer Networks

COMP/ELEC 429/556 Introduction to Computer Networks COMP/ELEC 429/556 Introduction to Computer Networks Weighted Fair Queuing Some slides used with permissions from Edward W. Knightly, T. S. Eugene Ng, Ion Stoica, Hui Zhang T. S. Eugene Ng eugeneng at cs.rice.edu

More information

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power

More information

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process

More information

Locality of Reference

Locality of Reference Locality of Reference 1 In view of the previous discussion of secondary storage, it makes sense to design programs so that data is read from and written to disk in relatively large chunks but there is

More information

Enlightening the I/O Path: A Holistic Approach for Application Performance

Enlightening the I/O Path: A Holistic Approach for Application Performance Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim 13, Hwanju Kim 2, Joonwon Lee 3, and Jinkyu Jeong 3 Apposha 1 Dell EMC 2 Sungkyunkwan University 3 Data-Intensive

More information

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced? Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!

More information

Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud

Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud Sean Barker and Prashant Shenoy University of Massachusetts Amherst Department of Computer Science Cloud Computing! Cloud

More information

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems

Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard

More information

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand

Spring 2016 :: CSE 502 Computer Architecture. Caches. Nima Honarmand Caches Nima Honarmand Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required by all of the running applications

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 2 Baochun Li Department of Electrical and Computer Engineering University of Toronto Keshav Chapter 9.1, 9.2, 9.3, 9.4, 9.5.1, 13.3.4 ECE 1771: Quality

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Datacenter Simulation Methodologies Case Studies

Datacenter Simulation Methodologies Case Studies This work is supported by NSF grants CCF-1149252, CCF-1337215, and STARnet, a Semiconductor Research Corporation Program, sponsored by MARCO and DARPA. Datacenter Simulation Methodologies Case Studies

More information

MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems

MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems 1 MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems Akshay Singh, Xu Cui, Benjamin Cassell, Bernard Wong and Khuzaima Daudjee July 3, 2014 2 Storage Resources

More information

Page Replacement Algorithms

Page Replacement Algorithms Page Replacement Algorithms MIN, OPT (optimal) RANDOM evict random page FIFO (first-in, first-out) give every page equal residency LRU (least-recently used) MRU (most-recently used) 1 9.1 Silberschatz,

More information

Beyond Beyond Dominant Resource Fairness : Indivisible Resource Allocation In Clusters

Beyond Beyond Dominant Resource Fairness : Indivisible Resource Allocation In Clusters Beyond Beyond Dominant Resource Fairness : Indivisible Resource Allocation In Clusters Abstract Christos-Alexandros Psomas alexpsomi@gmail.com Jarett Schwartz jarett@cs.berkeley.edu Resource Allocation

More information

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center Benjamin Hindman Andrew Konwinski Matei Zaharia Ali Ghodsi Anthony D. Joseph Randy H. Katz Scott Shenker Ion Stoica Electrical Engineering

More information

Varys. Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley

Varys. Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica. UC Berkeley Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley Communication is Crucial Performance Facebook analytics jobs spend 33% of their runtime in communication 1 As in-memory

More information

Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints

Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints Ali Ghodsi Matei Zaharia Scott Shenker Ion Stoica UC Berkeley {alig,matei,shenker,istoica}@cs.berkeley.edu Abstract Max-Min Fairness is

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 2 Baochun Li Department of Electrical and Computer Engineering University of Toronto Outline What is scheduling? Why do we need it? Requirements of a scheduling

More information

Fair Multi-Resource Allocation with External Resource for Mobile Edge Computing

Fair Multi-Resource Allocation with External Resource for Mobile Edge Computing Fair Multi-Resource Allocation with External Resource for Mobile Edge Computing Erfan Meskar and Ben Liang Department of Electrical and Computer Engineering, University of Toronto emeskar, liang@ece.utoronto.ca

More information

RxNetty vs Tomcat Performance Results

RxNetty vs Tomcat Performance Results RxNetty vs Tomcat Performance Results Brendan Gregg; Performance and Reliability Engineering Nitesh Kant, Ben Christensen; Edge Engineering updated: Apr 2015 Results based on The Hello Netflix benchmark

More information

Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency

Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What

More information

OASIS: Self-tuning Storage for Applications

OASIS: Self-tuning Storage for Applications OASIS: Self-tuning Storage for Applications Kostas Magoutis, Prasenjit Sarkar, Gauri Shah 14 th NASA Goddard- 23 rd IEEE Mass Storage Systems Technologies, College Park, MD, May 17, 2006 Outline Motivation

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required

More information

Pricing Intra-Datacenter Networks with

Pricing Intra-Datacenter Networks with Pricing Intra-Datacenter Networks with Over-Committed Bandwidth Guarantee Jian Guo 1, Fangming Liu 1, Tao Wang 1, and John C.S. Lui 2 1 Cloud Datacenter & Green Computing/Communications Research Group

More information

Discretized Streams. An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

Discretized Streams. An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters Discretized Streams An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica UC BERKELEY Motivation Many important

More information

Managing Storage: Above the Hardware

Managing Storage: Above the Hardware Managing Storage: Above the Hardware 1 Where we are Last time: hardware HDDs and SSDs Today: how the DBMS uses the hardware to provide fast access to data 2 How DBMS manages storage "Bottom" two layers

More information

VM Migration, Containers (Lecture 12, cs262a)

VM Migration, Containers (Lecture 12, cs262a) VM Migration, Containers (Lecture 12, cs262a) Ali Ghodsi and Ion Stoica, UC Berkeley February 28, 2018 (Based in part on http://web.eecs.umich.edu/~mosharaf/slides/eecs582/w16/021516-junchenglivemigration.pptx)

More information

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy COMPUTER ARCHITECTURE Virtualization and Memory Hierarchy 2 Contents Virtual memory. Policies and strategies. Page tables. Virtual machines. Requirements of virtual machines and ISA support. Virtual machines:

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 3 Baochun Li Department of Electrical and Computer Engineering University of Toronto Outline What is scheduling? Why do we need it? Requirements of a scheduling

More information

Congestion Control for High Bandwidth-delay Product Networks. Dina Katabi, Mark Handley, Charlie Rohrs

Congestion Control for High Bandwidth-delay Product Networks. Dina Katabi, Mark Handley, Charlie Rohrs Congestion Control for High Bandwidth-delay Product Networks Dina Katabi, Mark Handley, Charlie Rohrs Outline Introduction What s wrong with TCP? Idea of Efficiency vs. Fairness XCP, what is it? Is it

More information

SE Memory Consumption

SE Memory Consumption Page 1 of 5 SE Memory Consumption view online Calculating the utilization of memory within a Service Engine is useful to estimate the number of concurrent connections or the amount of memory that may be

More information

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Today s Papers Disco: Running Commodity Operating Systems on Scalable Multiprocessors, Edouard

More information

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee @kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture

More information

Re- op&mizing Data Parallel Compu&ng

Re- op&mizing Data Parallel Compu&ng Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel

More information

Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications

Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications Di Niu, Hong Xu, Baochun Li University of Toronto Shuqiao Zhao UUSee, Inc., Beijing, China 1 Applications in the Cloud WWW

More information

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,

More information

CS 425 / ECE 428 Distributed Systems Fall 2015

CS 425 / ECE 428 Distributed Systems Fall 2015 CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Measurement Studies Lecture 23 Nov 10, 2015 Reading: See links on website All Slides IG 1 Motivation We design algorithms, implement

More information

Mace, J. et al., "2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services," Proc. of ACM SIGCOMM '16, 46(4): , Aug

Mace, J. et al., 2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services, Proc. of ACM SIGCOMM '16, 46(4): , Aug Mace, J. et al., "2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services," Proc. of ACM SIGCOMM '16, 46(4):144-159, Aug. 2016. Introduction Server has limited capacity Requests by clients are

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Announcements. Reading. Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) CMSC 412 S14 (lect 5)

Announcements. Reading. Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) CMSC 412 S14 (lect 5) Announcements Reading Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) 1 Relationship between Kernel mod and User Mode User Process Kernel System Calls User Process

More information

CSCI-GA Operating Systems Lecture 3: Processes and Threads -Part 2 Scheduling Hubertus Franke

CSCI-GA Operating Systems Lecture 3: Processes and Threads -Part 2 Scheduling Hubertus Franke CSCI-GA.2250-001 Operating Systems Lecture 3: Processes and Threads -Part 2 Scheduling Hubertus Franke frankeh@cs.nyu.edu Processes Vs Threads The unit of dispatching is referred to as a thread or lightweight

More information

15-744: Computer Networking. Overview. Queuing Disciplines. TCP & Routers. L-6 TCP & Routers

15-744: Computer Networking. Overview. Queuing Disciplines. TCP & Routers. L-6 TCP & Routers TCP & Routers 15-744: Computer Networking RED XCP Assigned reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [KHR02] Congestion Control for High Bandwidth-Delay Product Networks L-6

More information

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

More information

CS644 Advanced Networks

CS644 Advanced Networks What we know so far CS644 Advanced Networks Lecture 6 Beyond TCP Congestion Control Andreas Terzis TCP Congestion control based on AIMD window adjustment [Jac88] Saved Internet from congestion collapse

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could

More information

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3. 5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000

More information

Towards Deadline Guaranteed Cloud Storage Services Guoxin Liu, Haiying Shen, and Lei Yu

Towards Deadline Guaranteed Cloud Storage Services Guoxin Liu, Haiying Shen, and Lei Yu Towards Deadline Guaranteed Cloud Storage Services Guoxin Liu, Haiying Shen, and Lei Yu Presenter: Guoxin Liu Ph.D. Department of Electrical and Computer Engineering, Clemson University, Clemson, USA Computer

More information

Scheduling in the Supermarket

Scheduling in the Supermarket Scheduling in the Supermarket Consider a line of people waiting in front of the checkout in the grocery store. In what order should the cashier process their purchases? Scheduling Criteria CPU utilization

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation

Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Meng Xu Linh Thi Xuan Phan Hyon-Young Choi Insup Lee Department of Computer and Information Science

More information

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization

Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Wei Chen, Jia Rao*, and Xiaobo Zhou University of Colorado, Colorado Springs * University of Texas at Arlington Data Center

More information

what is cloud computing?

what is cloud computing? what is cloud computing? (Private) Cloud Computing with Mesos at Twi9er Benjamin Hindman @benh scalable virtualized self-service utility managed elastic economic pay-as-you-go what is cloud computing?

More information

CS 268: Computer Networking

CS 268: Computer Networking CS 268: Computer Networking L-6 Router Congestion Control TCP & Routers RED XCP Assigned reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [KHR02] Congestion Control for High Bandwidth-Delay

More information

CS 326: Operating Systems. CPU Scheduling. Lecture 6

CS 326: Operating Systems. CPU Scheduling. Lecture 6 CS 326: Operating Systems CPU Scheduling Lecture 6 Today s Schedule Agenda? Context Switches and Interrupts Basic Scheduling Algorithms Scheduling with I/O Symmetric multiprocessing 2/7/18 CS 326: Operating

More information

Coflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan

Coflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan Coflow Recent Advances and What s Next? Mosharaf Chowdhury University of Michigan Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow Networking Open Source Apache Spark Open

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

A Comparative Study of High Performance Computing on the Cloud. Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez

A Comparative Study of High Performance Computing on the Cloud. Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez A Comparative Study of High Performance Computing on the Cloud Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez What is The Cloud? The cloud is just a bunch of computers connected over

More information

Wide area networks: packet switching and congestion

Wide area networks: packet switching and congestion Wide area networks: packet switching and congestion Packet switching ATM and Frame Relay Congestion Circuit and Packet Switching Circuit switching designed for voice Resources dedicated to a particular

More information

RobinHood: Tail Latency-Aware Caching Dynamically Reallocating from Cache-Rich to Cache-Poor

RobinHood: Tail Latency-Aware Caching Dynamically Reallocating from Cache-Rich to Cache-Poor RobinHood: Tail Latency-Aware Caching Dynamically Reallocating from -Rich to -Poor Daniel S. Berger (CMU) Joint work with: Benjamin Berg (CMU), Timothy Zhu (PennState), Siddhartha Sen (Microsoft Research),

More information

Interactive Scheduling

Interactive Scheduling Interactive Scheduling 1 Two Level Scheduling Interactive systems commonly employ two-level scheduling CPU scheduler and Memory Scheduler Memory scheduler was covered in VM We will focus on CPU scheduling

More information

Congestion Control for High Bandwidth-delay Product Networks

Congestion Control for High Bandwidth-delay Product Networks Congestion Control for High Bandwidth-delay Product Networks Dina Katabi, Mark Handley, Charlie Rohrs Presented by Chi-Yao Hong Adapted from slides by Dina Katabi CS598pbg Sep. 10, 2009 Trends in the Future

More information