CS246: Mining Massive Datasets Jure Leskovec, Stanford University
2 Web advertising: We discussed how to match advertisers to queries in real time, but we did not discuss how to estimate the CTR (click-through rate). Recommendation engines: We discussed how to build recommender systems, but we did not discuss the cold-start problem. 3/5/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets
3 What do CTR and cold start have in common? With every ad we show / product we recommend, we gather more data about the ad/product. Theme: learning through experimentation.
4 Google's goal: maximize revenue. The old way: pay per impression (CPM). Best strategy: go with the highest bidder. But this ignores the effectiveness of an ad. The new way: pay per click (CPC). Best strategy: go with the highest expected revenue. What's the expected revenue of ad a for query q? E[revenue_{a,q}] = P(click_a | q) * amount_{a,q}, where P(click_a | q) is the probability the user will click on ad a given that she issues query q (unknown! we need to gather information), and amount_{a,q} is the bid amount for ad a on query q (known).
5 Clinical trials: investigate the effects of different treatments while minimizing patient losses. Adaptive routing: minimize delay in the network by investigating different routes. Asset pricing: figure out product prices while trying to make the most money.
8 Each arm a: wins (reward = 1) with fixed (unknown) prob. μ_a; loses (reward = 0) with fixed (unknown) prob. 1 - μ_a. All draws are independent given μ_1 … μ_k. How to pull arms to maximize total reward?
9 How does this map to our setting? Each query is a bandit; each ad is an arm. We want to estimate the arm's probability of winning μ_a (i.e., the ad's CTR). Every time we pull an arm, we do an experiment.
10 The setting: a set of k choices (arms). Each choice a is associated with an unknown probability distribution P_a supported on [0,1]. We play the game for T rounds. In each round t: (1) we pick some arm j; (2) we obtain a random sample X_t from P_j. Note the reward is independent of previous draws. Our goal is to maximize Σ_{t=1}^T X_t. But we don't know μ_a! Every time we pull some arm a, though, we get to learn a bit about μ_a.
11 Online optimization with limited feedback. [Figure: at each time step t we choose among arms a_1 … a_k and observe only the reward X_t of the chosen arm.] Like in online algorithms: we have to make a choice each time, but we only receive information about the chosen action.
12 Policy: a strategy/rule that in each iteration tells me which arm to pull. Hopefully the policy depends on the history of rewards. How do we quantify the performance of the algorithm? Regret!
13 Let μ_a be the mean of P_a. Payoff/reward of the best arm: μ* = max_a μ_a. Let i_1, i_2, …, i_T be the sequence of arms pulled. Instantaneous regret at time t: r_t = μ* - μ_{i_t}. Total regret: R_T = Σ_{t=1}^T r_t. Typical goal: we want a policy (arm allocation strategy) that guarantees R_T / T → 0 as T → ∞.
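To make the bookkeeping concrete, here is a minimal sketch of the regret definitions above, using made-up arm means and a hypothetical pull sequence:

```python
# Hypothetical arm means (in practice these are unknown to the algorithm).
mus = [0.3, 0.7, 0.5]
mu_star = max(mus)                 # payoff of the best arm, mu*

# A hypothetical sequence of pulled arms i_1, ..., i_T (here T = 5).
pulls = [0, 1, 1, 2, 1]

# Instantaneous regret r_t = mu* - mu_{i_t}; total regret R_T = sum_t r_t.
inst_regret = [mu_star - mus[i] for i in pulls]
total_regret = sum(inst_regret)    # here: 0.4 + 0 + 0 + 0.2 + 0 = 0.6
```

Note that regret is measured against the best fixed arm in hindsight, not against the realized random rewards.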
14 If we knew the payoffs, which arm would we pull? Pick arg max_a μ_a. What if we only care about estimating the payoffs μ_a? Pick each of the k arms equally often: T/k times each. Estimate: μ̂_a = (k/T) Σ_{j=1}^{T/k} X_{a,j}. Regret: R_T = (T/k) Σ_a (μ* - μ_a).
15 Regret is defined in terms of average reward. So, if we can estimate the average reward, we can minimize regret. Consider the Greedy algorithm: take the action with the highest average reward. Example: consider 2 actions. A1 has reward 1 with prob. 0.3; A2 has reward 1 with prob. 0.7. Play A1, get reward 1. Play A2, get reward 0. Now the average reward of A1 will never drop to 0, and we will never play action A2.
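A quick simulation (a sketch matching the example above, with the same made-up probabilities) shows the failure mode: once A1's empirical mean is positive and A2's is zero, Greedy never plays A2 again.

```python
import random

random.seed(0)
p = {"A1": 0.3, "A2": 0.7}        # true success probabilities (unknown to Greedy)
mean = {"A1": 1.0, "A2": 0.0}     # empirical means after the two initial plays
count = {"A1": 1, "A2": 1}

chosen = []
for t in range(100):
    a = max(mean, key=mean.get)            # Greedy: arm with highest empirical mean
    chosen.append(a)
    r = 1 if random.random() < p[a] else 0
    count[a] += 1
    mean[a] += (r - mean[a]) / count[a]    # running-mean update

# A1's mean stays > 0 forever (it has at least one observed success), A2's
# stays 0, so Greedy is locked onto the worse arm A1.
```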
16 The example illustrates a classic problem in decision making: we need to trade off exploration (gathering data about arm payoffs) and exploitation (making decisions based on data already gathered). Greedy does not explore sufficiently. Exploration: pull an arm we have never pulled before. Exploitation: pull an arm a for which we currently have the highest estimate of μ_a.
17 The problem with our Greedy algorithm is that it is too certain in its estimate of μ_a. When we have seen a single reward of 0, we shouldn't conclude the average reward is 0. Greedy does not explore sufficiently!
18 Algorithm: Epsilon-Greedy. For t = 1:T: set ε_t = O(1/t). With prob. ε_t: explore by picking an arm chosen uniformly at random. With prob. 1 - ε_t: exploit by picking the arm with the highest empirical mean payoff. Theorem [Auer et al. '02]: For a suitable choice of ε_t it holds that R_T = O(k log T), so R_T / T = O(k log T / T) → 0.
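A runnable sketch of Epsilon-Greedy on Bernoulli arms (the concrete ε_t = 1/t schedule and the Bernoulli reward model are illustrative assumptions):

```python
import random

def epsilon_greedy(true_means, T, seed=0):
    """Run Epsilon-Greedy for T rounds on Bernoulli arms."""
    rng = random.Random(seed)
    k = len(true_means)
    count = [0] * k
    est = [0.0] * k                  # empirical mean payoff per arm
    total_reward = 0
    for t in range(1, T + 1):
        eps = 1.0 / t                # one concrete choice of the O(1/t) schedule
        if rng.random() < eps:
            a = rng.randrange(k)                        # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: est[i])     # exploit: best estimate
        r = 1 if rng.random() < true_means[a] else 0    # Bernoulli reward
        count[a] += 1
        est[a] += (r - est[a]) / count[a]               # update running mean
        total_reward += r
    return est, count, total_reward
```

Because ε_t decays, the algorithm explores a lot early and mostly exploits later.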
19 What are some issues with Epsilon-Greedy? Not "elegant": the algorithm explicitly distinguishes between exploration and exploitation. More importantly: exploration makes suboptimal choices (since it picks any arm with equal probability). Idea: when exploring/exploiting we need to compare arms.
20 Suppose we have done some experiments and observed mean arm values: Arm 1: 5/10, Arm 2: 1/1, Arm 3: 8/10. Which arm would you pick next? Idea: don't just look at the mean (that is, the expected payoff) but also at the confidence!
21 A confidence interval is a range of values within which we are sure the mean lies with a certain probability. For example, we could believe μ_a is within [0.2, 0.5] with probability 0.95. If we have tried an action less often, our estimated reward is less accurate, so the confidence interval is larger. The interval shrinks as we get more information (try the action more often).
22 Assume we know the confidence intervals. Then, instead of trying the action with the highest mean, we can try the action with the highest upper bound on its confidence interval. This is called an optimistic policy: we believe an action is as good as possible given the available evidence.
23 [Figure: confidence interval around the estimate μ̂_i of arm i; after more exploration the interval around μ̂_i shrinks.]
24 Suppose we fix arm a. Let Y_{a,1} … Y_{a,m} be the payoffs of arm a in the first m trials; so Y_{a,1} … Y_{a,m} are i.i.d. random variables taking values in [0,1]. Mean payoff of arm a: μ_a = E[Y_{a,m}]. Our estimate: μ̂_{a,m} = (1/m) Σ_{ℓ=1}^m Y_{a,ℓ}. Goal: we want to bound P(|μ_a - μ̂_{a,m}| ≥ b), i.e., find b such that with high probability |μ_a - μ̂_{a,m}| ≤ b. We also want b to be as small as possible (why? a larger b gives a looser, less informative interval).
25 Hoeffding's inequality: Let X_1 … X_m be i.i.d. random variables taking values in [0,1]. Let μ = E[X] and μ̂_m = (1/m) Σ_{ℓ=1}^m X_ℓ. Then: P(|μ - μ̂_m| ≥ b) ≤ 2 exp(-2 b² m) = δ. To find the confidence interval b (for a given confidence level δ) we solve 2 exp(-2 b² m) ≤ δ; then -2 b² m ≤ ln(δ/2), so: b ≥ √(ln(2/δ) / (2m)).
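Solving for b is a one-liner (a sketch; the name `hoeffding_radius` is chosen here for illustration):

```python
import math

def hoeffding_radius(delta, m):
    """Radius b with P(|mu - mu_hat_m| >= b) <= delta, from 2*exp(-2*b*b*m) = delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

# The interval shrinks as we try the action more often: b ~ 1/sqrt(m).
b_10 = hoeffding_radius(0.05, 10)
b_1000 = hoeffding_radius(0.05, 1000)
```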
26 [Auer et al. '02] UCB1 (upper confidence sampling) algorithm. Set μ̂_1 = … = μ̂_k = 0 and m_1 = … = m_k = 0 (μ̂_a is our estimate of the payoff of arm a; m_a is the number of pulls of arm a so far). For t = 1:T: For each arm a calculate the upper confidence bound (from Hoeffding's inequality): UCB_a = μ̂_a + α √(2 ln t / m_a). Pick arm j = arg max_a UCB_a. Pull arm j and observe y_t. Set m_j ← m_j + 1 and μ̂_j ← (1/m_j) (y_t + (m_j - 1) μ̂_j).
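A runnable sketch of UCB1 on Bernoulli arms (pulling each arm once first so m_a > 0 before the UCB formula is used; α = 1 is an illustrative default):

```python
import math
import random

def ucb1(true_means, T, alpha=1.0, seed=0):
    """UCB1: pick arg max_a  mu_hat_a + alpha * sqrt(2 ln t / m_a)."""
    rng = random.Random(seed)
    k = len(true_means)
    m = [0] * k                      # pull counts m_a
    mu = [0.0] * k                   # empirical means mu_hat_a
    for t in range(1, T + 1):
        if t <= k:
            j = t - 1                # initialization: pull each arm once
        else:
            j = max(range(k),
                    key=lambda a: mu[a] + alpha * math.sqrt(2 * math.log(t) / m[a]))
        y = 1 if rng.random() < true_means[j] else 0   # observe reward y_t
        m[j] += 1
        mu[j] += (y - mu[j]) / m[j]                    # running-mean update
    return mu, m
```

With enough rounds, the best arm accumulates most of the pulls while every arm keeps being tried occasionally, since the bonus term grows with ln t.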
27 UCB_a = μ̂_a + α √(2 ln t / m_a). The confidence interval grows with the total number of actions t we have taken, but shrinks with the number of times m_a we have tried arm a. This ensures each arm is tried infinitely often, but still balances exploration and exploitation. α plays a role analogous to the confidence level δ in b ≥ √(ln(2/δ) / (2m)). Optimism in the face of uncertainty: the algorithm believes that it can obtain extra rewards by reaching the unexplored parts of the state space.
28 Theorem [Auer et al. 2002]: Suppose the optimal mean payoff is μ* = max_a μ_a, and for each arm let Δ_a = μ* - μ_a. Then it holds that E[R_T] ≤ 8 Σ_{a: μ_a < μ*} (ln T)/Δ_a + (1 + π²/3) Σ_a Δ_a. So E[R_T] = O(k ln T), and R_T / T = O(k ln T / T) → 0.
29 The k-armed bandit problem is a formalization of the exploration-exploitation tradeoff. It is an analog of online optimization (e.g., SGD, BALANCE), but with limited feedback. Simple algorithms are able to achieve no regret (in the limit): Epsilon-Greedy and UCB (upper confidence sampling).
30 [Li et al., WWW '10] Every round we receive context. Context: user features, articles viewed before. We build a model for each article's click-through rate.
31 Feature-based exploration: select articles to serve users based on contextual information about the user and the articles. Simultaneously adapt the article selection strategy based on user-click feedback to maximize the total number of user clicks.
32 Contextual bandit algorithm, round t: (1) The algorithm observes user u_t and a set A of arms together with their features x_{t,a}. The vector x_{t,a} summarizes both the user u_t and arm a; we call x_{t,a} the context. (2) Based on payoffs from previous trials, the algorithm chooses arm a ∈ A and receives payoff r_{t,a}. Note that only feedback for the chosen a is observed. (3) The algorithm improves its arm selection strategy with each observation (x_{t,a}, a, r_{t,a}).
33 Payoff of arm a: E[r_{t,a} | x_{t,a}] = x_{t,a}^T θ_a, where x_{t,a} is a d-dimensional feature vector and θ_a is an unknown coefficient vector we aim to learn. Note that the θ_a are not shared between different arms! What's the difference between LinUCB and UCB1? UCB1 directly estimates μ_a through experimentation (without any knowledge about arm a), while LinUCB estimates μ_a by regression: μ̂_a = x_{t,a}^T θ_a. The hope is that we will be able to learn faster as we consider the context x_{t,a} (user, ad) of arm a.
34 Payoff of arm a: E[r_{t,a} | x_{t,a}] = x_{t,a}^T θ_a (x_{t,a}: d-dimensional feature vector; θ_a: unknown coefficient vector we aim to learn). How to estimate θ_a? Let D_a be the m × d matrix of m training inputs [x_{t,a}], and b_a the m-dimensional vector of responses to a (click/no-click). The regularized linear regression solution for θ_a is then θ̂_a = arg min_θ Σ_{t=1}^m (x_{t,a}^T θ - b_a^{(t)})², which (with ridge regularization) is solved by: θ̂_a = (D_a^T D_a + I_d)^{-1} D_a^T b_a, where I_d is the d × d identity matrix.
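The closed-form solution can be checked numerically on synthetic data (a sketch; the coefficients and data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 200
theta_true = np.array([0.5, -0.2, 0.1])   # hypothetical "true" coefficients
D = rng.random((m, d))                    # D_a: m context rows for arm a
b = D @ theta_true + 0.01 * rng.standard_normal(m)   # noisy responses b_a

# theta_hat_a = (D_a^T D_a + I_d)^{-1} D_a^T b_a  (ridge / regularized least squares)
theta_hat = np.linalg.solve(D.T @ D + np.eye(d), D.T @ b)
```

Using `np.linalg.solve` on the normal equations avoids forming the explicit matrix inverse; the `+ I_d` term keeps the system well-conditioned even with few observations.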
35 One can then show (using techniques similar to those we used for UCB) that with high probability |x_{t,a}^T θ̂_a - E[r_{t,a} | x_{t,a}]| ≤ α √(x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}). So the LinUCB arm selection rule is: pick a_t = arg max_a [ x_{t,a}^T θ̂_a + α √(x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}) ], i.e., the estimated μ_a plus a confidence interval (a standard-deviation term).
36 Initialization: for each arm a: A_a = I_d (d × d identity matrix) and b_a = 0_d (d-dimensional vector of zeros). Online algorithm: for t = 1, 2, 3, …, T: Observe the features of all arms a: x_{t,a} ∈ R^d. For each arm a: θ̂_a = A_a^{-1} b_a (regression coefficients); p_{t,a} = θ̂_a^T x_{t,a} + α √(x_{t,a}^T A_a^{-1} x_{t,a}) (arm's upper confidence bound). Choose arm a_t = arg max_a p_{t,a}. Update for the chosen arm a_t only: A_{a_t} ← A_{a_t} + x_{t,a_t} x_{t,a_t}^T and b_{a_t} ← b_{a_t} + r_t x_{t,a_t}.
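A minimal sketch of one LinUCB round following the pseudocode above (disjoint per-arm models; numpy is used for the linear algebra, and the contexts are made up):

```python
import numpy as np

def linucb_choose(A, b, contexts, alpha=1.0):
    """Score every arm with p_{t,a} = theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
    p = []
    for A_a, b_a, x in zip(A, b, contexts):
        A_inv = np.linalg.inv(A_a)
        theta = A_inv @ b_a                              # regression coefficients
        p.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
    return int(np.argmax(p))

def linucb_update(A, b, a, x, r):
    """Update A_a and b_a for the chosen arm only."""
    A[a] += np.outer(x, x)
    b[a] += r * x

# One round with d = 2 features and k = 3 arms, starting from A_a = I, b_a = 0.
d, k = 2, 3
A = [np.eye(d) for _ in range(k)]
b = [np.zeros(d) for _ in range(k)]
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
a_t = linucb_choose(A, b, xs)     # all theta are 0, so the widest interval wins
linucb_update(A, b, a_t, xs[a_t], r=1.0)
```

In the first round every θ̂_a is zero, so the score reduces to the confidence term α √(x^T x) and the arm with the largest context norm is picked.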
37 LinUCB's computational complexity is linear in the number of arms and at most cubic in the number of features. LinUCB works well for a dynamic arm set (arms come and go): in news article recommendation, for instance, editors add/remove articles to/from a pool.
38 What to put in slots F1, F2, F3, F4 to make the user click?
40 We want to choose a set that caters to as many users as possible. Users may have different interests, and queries may be ambiguous. We want to optimize both relevance and diversity.
42 Alternate final: Tue 3/14, 7:00-10:00pm in Cubberley Auditorium. Register here: we have 100 slots, first come first served! Final: Mon 3/17, 12:15-3:15pm in NVIDIA Auditorium (last name starting with A-J), Gates B01 (K-S), and Packard 101 (T-Z). Practice finals are posted on Piazza! SCPD students can take the exam at Stanford!
43 Exam protocol for SCPD students: On Friday 3/14 your exam proctor will receive the PDF of the final exam from SCPD. If you take the exam at Stanford: ask the exam monitor to delete the SCPD exam PDF. If you don't take the exam at Stanford: arrange a 3h slot with your exam monitor. You can take the exam anytime, but return the exam PDF to cs246.mmds@gmail.com by Tuesday 3/15, 11:59pm Pacific time.
More informationIn section 8.1, we began by introducing the sine function using a circle in the coordinate plane:
Chapter 8.: Degrees and Radians, Reference Angles In section 8.1, we began by introducing the sine function using a circle in the coordinate plane: y (3,3) θ x We now return to the coordinate plane, but
More informationD-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview
Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,
More informationBox-Cox Transformation for Simple Linear Regression
Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are
More informationFormal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.
Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement
More informationInternational Journal of Industrial Engineering Computations
International Journal of Industrial Engineering Computations 7 (2016) 295 308 Contents lists available at GrowingScience International Journal of Industrial Engineering Computations homepage: www.growingscience.com/ijiec
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions
More informationThe ABC s of Web Site Evaluation
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz The ABC s of Web Site Evaluation by Kathy Schrock Digital Literacy by Paul Gilster Digital literacy is the ability to understand
More informationstriatum Documentation
striatum Documentation Release 0.0.1 Y.-Y. Yang, Y.-A. Lin November 11, 2016 Contents 1 API Reference 3 1.1 striatum.bandit package......................................... 3 1.2 striatum.storage package.........................................
More informationInvitation to fixed-parameter algorithms
Invitation to fixed-parameter algorithms Jisu Jeong (Dept. of Math, KAIST) joint work with Sigve Hortemo Sæther and Jan Arne Telle (Univ. of Bergen, Norway), Eun Jung Kim (CNRS / Univ. Paris-Dauphine)
More informationLarge Scale Data Analysis Using Deep Learning
Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting
More informationZero-Inflated Poisson Regression
Chapter 329 Zero-Inflated Poisson Regression Introduction The zero-inflated Poisson (ZIP) regression is used for count data that exhibit overdispersion and excess zeros. The data distribution combines
More informationNCSS Statistical Software
Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 15: Networking overview Prof. (amislove@ccs.neu.edu) What is a network? 2 What is a network? Wikipedia: A telecommunications network is a network
More informationMODEL SELECTION AND REGULARIZATION PARAMETER CHOICE
MODEL SELECTION AND REGULARIZATION PARAMETER CHOICE REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING Francesca Odone and Lorenzo Rosasco odone@disi.unige.it - lrosasco@mit.edu June 3, 2013 ABOUT THIS
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu //8 Jure Leskovec, Stanford CS6: Mining Massive Datasets High dim. data Graph data Infinite data Machine learning
More informationReminder: Homework 4. Due: Friday at the beginning of class
Reminder: Homework 4 Due: Friday at the beginning of class 1 Cryptography CS 555 Topic 33: Digital Signatures Part 2 2 Recap El-Gamal/RSA-OAEP Digital Signatures Similarities and differences with MACs
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu /2/8 Jure Leskovec, Stanford CS246: Mining Massive Datasets 2 Task: Given a large number (N in the millions or
More informationMath 381 Discrete Mathematical Modeling
Math 381 Discrete Mathematical Modeling Sean Griffin Today: -Equipment Replacement Problem -Min Spanning Tree Problem -Clustering Friday s Plan Intro to LPSolve software Download before class (link on
More informationLesson 12: Angles Associated with Parallel Lines
Lesson 12 Lesson 12: Angles Associated with Parallel Lines Classwork Exploratory Challenge 1 In the figure below, LL 1 is not parallel to LL 2, and mm is a transversal. Use a protractor to measure angles
More informationRobust Linear Regression (Passing- Bablok Median-Slope)
Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #1: Course Introduction U Kang Seoul National University U Kang 1 In This Lecture Motivation to study data mining Administrative information for this course U Kang 2
More informationEureka Math. Geometry, Module 4. Student File_B. Contains Exit Ticket, and Assessment Materials
A Story of Functions Eureka Math Geometry, Module 4 Student File_B Contains Exit Ticket, and Assessment Materials Published by the non-profit Great Minds. Copyright 2015 Great Minds. No part of this work
More informationDS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand
More informationUnit 1. Name. Basics of Geometry Part 1. 2 Section 1: Introduction to Geometry Points, Lines and Planes
Name Period Date Points, lines, and planes are the building blocks of geometry. What is geometry? Unit 1 Basics of Geometry Part 1 Geometry means, and it involves the properties of points, lines, planes
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationTabu Search for the Dynamic Bipartite Drawing Problem
Tabu Search for the Dynamic Bipartite Drawing Problem RAFAEL MARTÍ Departamento de Estadística e Investigación Operativa, Universidad de Valencia, Spain Rafael.Marti@uv.es ANNA MARTÍNEZ-GAVARA Departamento
More informationIntroduction to Deep Learning
ENEE698A : Machine Learning Seminar Introduction to Deep Learning Raviteja Vemulapalli Image credit: [LeCun 1998] Resources Unsupervised feature learning and deep learning (UFLDL) tutorial (http://ufldl.stanford.edu/wiki/index.php/ufldl_tutorial)
More informationWisconsin Retirement Testing Preparation
Wisconsin Retirement Testing Preparation The Wisconsin Retirement System (WRS) is changing its reporting requirements from annual to every pay period starting January 1, 2018. With that, there are many
More informationTABLE OF CONTENTS. 3 Intro. 4 Foursquare Logo. 6 Foursquare Icon. 9 Colors. 10 Copy & Tone Of Voice. 11 Typography. 13 Crown Usage.
BRANDBOOK TABLE OF CONTENTS 3 Intro 4 Foursquare Logo 6 Foursquare Icon 9 Colors 10 Copy & Tone Of Voice 11 Typography 13 Crown Usage 14 Badge Usage 15 Iconography 16 Trademark Guidelines 2011 FOURSQUARE
More informationrecruitment Logo Typography Colourways Mechanism Usage Pip Recruitment Brand Toolkit
Logo Typography Colourways Mechanism Usage Primary; Secondary; Silhouette; Favicon; Additional Notes; Where possible, use the logo with the striped mechanism behind. Only when it is required to be stripped
More information