EE228a - Lecture 20 - Spring 2006 Concave Games, Learning in Games, Cooperative Games
Jean Walrand - scribed by Michael Krishnan (mnk@berkeley.edu)

Abstract

In this lecture we extend our basic game theory concepts to more complex situations. We start with games in a continuous action space, looking at a specific class (concave games) that still has a Nash Equilibrium. We also go into repeated games and the effects of learning. Finally, we look at cooperative games, where we must find a fair way to distribute a resource among cooperative players. After this we take a look at some interesting fixed-point theorems.

1 Concave Games

1.1 Motivation

In many applications, the possible actions belong to a continuous set. For instance, one chooses prices, transmission rates, or power levels. In such situations, one specifies reward functions instead of a matrix of rewards.

1.2 Preliminaries

Figure 1: Best-response curves for a game with a continuous action space. In the first, the discontinuity results in no Nash Equilibrium. In the second, the curvature of the curves results in multiple Nash Equilibria. In the third, there is a unique Nash Equilibrium. [1]

From figure 1, we can see that it is possible to have no Nash Equilibrium, several Nash Equilibria, or one unique Nash Equilibrium. We are interested in a subset of continuous games that have a unique Nash Equilibrium. It turns out that a class of games satisfying this criterion is the class of concave games.
Definition 1 (Concave Game) We shall define a concave game as a game in which player i chooses an action x_i ∈ R^{m_i} so that x = (x_1, ..., x_n) ∈ C, where C is a closed, bounded, convex set. The payoff function of player i is φ_i(x), continuous in x and concave in x_i for x ∈ C.

Definition 2 (Nash Equilibrium of a Concave Game) We can then define the Nash Equilibrium to be x^0 ∈ C such that φ_i(x^0) ≥ φ_i(x^0_{-i}, y_i) for all y_i such that (x^0_{-i}, y_i) ∈ C. (Here (x^0_{-i}, y_i) is the notation used for the vector x^0 in which the action of the i-th player is replaced by y_i.)

We will now show that concave games have a unique Nash Equilibrium. We begin by proving existence:

Theorem 1 (Existence) A concave game always has at least one Nash Equilibrium.

Proof: Define ρ(x, y) := Σ_{i=1}^n φ_i(x_{-i}, y_i) for (x, y) ∈ C². Define ρ*(x) := max_{z ∈ C} ρ(x, z). Define Γ(x) := {y ∈ C : ρ(x, y) = ρ*(x)}. Intuitively, Γ(x)_i is the strategy that player i would play given that the other players are playing x. Thus a fixed point x ∈ Γ(x) of Γ must be a Nash Equilibrium. If we know that Γ is convex-valued and graph-continuous, then by Kakutani's theorem (proof in section 4.2) it has a fixed point.

In general, it may be hard to show that Γ is convex-valued and graph-continuous, so we will look at a specialized case. We will require that C = {x : h(x) ≥ 0}, where h(·) : R^m → R^k is a vector of C¹, concave functions. We assume that C is bounded and that there is some x^0 ∈ C such that h(x^0) > 0 (strictly greater, to avoid the trivial case). Also, φ_j(x) has a continuous first derivative ∇_j φ_j(x) w.r.t. x_j. Thus, the objective functions are concave and the constraints are concave (h(·) ≥ 0).

The proof of uniqueness is harder. To simplify things, we will use a new definition:

Definition 3 (Diagonally Strictly Concave) The function σ(x, r) := Σ_{i=1}^n r_i φ_i(x), r ∈ R^n_+, is diagonally strictly concave (DSC) if (x^1 − x^0)' g(x^0, r) + (x^0 − x^1)' g(x^1, r) > 0 for all x^0 ≠ x^1 ∈ C, where g_i(x, r) := r_i ∇_i φ_i(x) and ' is the transpose operator.
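The DSC condition can be checked numerically. The following sketch uses a toy two-player quadratic game of my own (not one from the lecture), with φ_1(x) = −x_1² + x_1 x_2, φ_2(x) = −x_2² + x_1 x_2, and r = (1, 1); it verifies both the DSC inequality on random pairs and the negative-definiteness condition discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-player quadratic game (an illustrative example, not from the lecture):
#   phi_1(x) = -x1^2 + x1*x2,  phi_2(x) = -x2^2 + x1*x2,  r = (1, 1).
# Then g_i(x, r) = r_i * d(phi_i)/d(x_i):
def g(x):
    return np.array([-2.0 * x[0] + x[1], x[0] - 2.0 * x[1]])

# Sufficient condition: G + G' negative definite, where G is the Jacobian of g
# (g is linear here, so G is a constant matrix).
G = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
eigs = np.linalg.eigvalsh(G + G.T)
print(eigs)  # both eigenvalues are negative

# Direct check of the DSC inequality on random pairs x0 != x1.
for _ in range(1000):
    x0, x1 = rng.normal(size=2), rng.normal(size=2)
    assert (x1 - x0) @ g(x0) + (x0 - x1) @ g(x1) > 0
print("DSC inequality holds on all sampled pairs")
```

By Theorem 2 below, this game therefore has a unique Nash Equilibrium.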
We can then see that, since σ(x + z, r) − σ(x, r) ≤ z' g(x, r), DSC implies that there are diminishing returns along any direction.

Theorem 2 (Uniqueness) The NE is unique if there exists r > 0 such that σ(x, r) := Σ_{i=1}^n r_i φ_i(x) is DSC.

The key is that x is a NE if x_j maximizes σ over the j-th coordinate, and DSC implies that this maximizer is unique. A sufficient condition is that the matrix [G(x, r) + G'(x, r)] is negative definite for x ∈ C, where G(x, r) is the Jacobian of g(x, r) with respect to x. In the bilinear case where φ_i(x) = Σ_{j=1}^n (e_ij + x_i' C_ij) x_j for e_ij ∈ R^{m_j} and C_ij ∈ R^{m_i × m_j}, there is a unique NE if every eigenvalue of C has a negative real part, where

C := [ 2C_{1,1}  C_{1,2}  ...  C_{1,n}
       C_{2,1}   2C_{2,2} ...  C_{2,n}
       ...
       C_{n,1}   C_{n,2}  ...  2C_{n,n} ]

(the block matrix of the C_{i,j} with the diagonal blocks doubled).

We can get a more intuitive feel for all of this by looking at local improvements. Choose r > 0 and define x(t), t > 0, by dx_i/dt = r_i ∇_i φ_i(x) + Σ_j u_j ∇h_j(x), where the u_j ≥ 0 project the trajectory onto C. Then, if G + G' is negative definite, the unique equilibrium is asymptotically stable. We can see this graphically in figure 2. With appropriate r_i's and u_j's, x(t) stays in C. Eventually, it moves to the Nash Equilibrium.

Figure 2: Local Improvements

2 Learning in Games

2.1 Motivation

We now return to our simpler games with a discrete set of actions for each player, but extend to repeated games. Through repetition of the game, the players will learn about the strategies of the other players. We will then be able to explain equilibria as the result of the players learning over time rather than being fully rational. This is somewhat more intuitively satisfying, because the fully-rational assumption does not seem to apply to what we see in real life.

2.2 Examples

Figure 3: A 2-player game with a Nash Equilibrium (D,L) strictly dominated by another strategy profile (U,R).

We will look first at a simple 2-player game with the reward matrix given in figure 3. We can see that the Nash Equilibrium is (D,L), but both players can benefit if they play (U,R) instead. If P1 is patient and knows that P2 chooses her play based on her forecast of P1's plays, then P1 should always play U to lead P2 to play R. In this way, a sophisticated and patient player who faces a naïve opponent can develop a reputation for playing a fixed strategy and obtain the rewards of a Stackelberg leader. However, most theory avoids this situation by assuming random pairings in a large population. Without being able to assume the other player is naïve, the myopic strategy is optimal.

It is also important to note that with repeated games, we need to assume the games are repeated infinitely many times (or that the players do not know how many times they will be played). We can see this through the example game in figure 3 repeated 100 times. Suppose you are P1. You may want to play U to encourage P2 to play R, but on the last trial there are no future benefits to playing U, so you should switch to D. Knowing that your actions do not affect the future, the myopic strategy is clearly the right way to play. But we can iterate on this result. Now that the strategies are fixed for the 100th trial, actions on the 99th trial do not affect the future, so it should also be played at the NE. By induction, it follows that every trial should be played at the NE.

In the Cournot game, we can reach a Nash Equilibrium without assuming fully rational players with full information.
The repetition of the game allows the players to learn, each adjusting their strategy to the best response to the strategy the other player used in the previous trial. This adjustment is a bit naïve in that it ignores the effect your previous play has on the other player, but it converges to the NE nonetheless.
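This naive adjustment process can be sketched for a linear Cournot duopoly. The parameter values below are illustrative assumptions, not numbers from the lecture.

```python
# Naive best-response dynamics in a linear Cournot duopoly (parameter values
# are illustrative assumptions): inverse demand P = a - b*(q1 + q2), constant
# marginal cost c for both firms.
a, b, c = 100.0, 1.0, 10.0

def best_response(q_other):
    # argmax over q of (a - b*(q + q_other) - c) * q  =>  (a - c - b*q_other) / (2b)
    return max(0.0, (a - c - b * q_other) / (2.0 * b))

q1, q2 = 0.0, 50.0  # arbitrary starting quantities
for _ in range(60):
    q1, q2 = best_response(q2), best_response(q1)

q_star = (a - c) / (3.0 * b)  # the Cournot-Nash quantity of each firm
print(q1, q2, q_star)         # the dynamics converge to q_star = 30
```

Each step halves the distance to the equilibrium, so even this myopic rule reaches the Cournot-Nash point quickly.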
2.3 Models

A learning model specifies rules for individual players and examines their interactions in a repeated game. Usually the same game is repeated, though a few studies have been done on learning from similar games. Three models are:

1. Fictitious Play: Players observe the results of their own matches and play a best response to the historical frequency of play.
2. Partial Best-Response Dynamics: In each period, a fixed fraction of the population switches to a best response to the aggregate statistics from the previous period.
3. Replicator Dynamics: The share of the population using each strategy grows at a rate proportional to that strategy's current payoff.

2.4 Fictitious Play

In fictitious play, each player computes the frequency of the actions of the other players (with initial weights). Each player then selects the best response to this empirical distribution.

Theorem 3 Strict Nash Equilibria are absorbing for Fictitious Play. (If s is a pure strategy profile that is a steady state for FP, then it is a NE.)

Proof: Assume s(t) = s is a strict NE. Let a = a(t) be the weight of the period-t observation in the empirical distribution. The empirical distribution at time t + 1 is

p(t + 1) = (1 − a) p(t) + a δ(s),   (1)

so the utility for playing strategy r at time t + 1 is

u(t + 1, r) = (1 − a) u(p(t), r) + a u(δ(s), r),   (2)

which is maximized by r = s if u(p(t), r) is maximized by r = s.

Figure 4: A reward matrix for the matching pennies game

We can see an example of convergence in the matching pennies game, for which the reward matrix is given in figure 4. Suppose P1 has initial weights (1.5, 2) and P2 has initial weights (2, 1.5). Then the strategies progress as follows:

Strategy | New Empirical Distribution
(T,T)    | (1.5, 3), (2, 2.5)
(T,H)    | (2.5, 3), (2, 3.5)
(T,H)    | (3.5, 3), (2, 4.5)
(H,H)    | (4.5, 3), (3, 4.5)
(H,H)    | (5.5, 3), (4, 4.5)
(H,H)    | (6.5, 3), (5, 4.5)
(H,T)    | (6.5, 4), (6, 4.5)

Eventually, the distribution converges to each player playing 50/50.
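The progression above can be simulated directly. Since figure 4 is not reproduced here, the payoff convention below is an assumption (P1 wins on a match, P2 on a mismatch); with it, the first joint play is (T,T) as in the table, and the empirical frequencies drift toward the mixed equilibrium (1/2, 1/2).

```python
import numpy as np

# Fictitious play in matching pennies with the lecture's initial weights.
# Payoff convention assumed: P1 gets +1 on a match, -1 otherwise; P2 is the
# mirror image. Actions: 0 = H, 1 = T.
U1 = np.array([[1, -1],
               [-1, 1]])
U2 = -U1

w1 = np.array([1.5, 2.0])  # P1's weights on P2's past plays (H, T)
w2 = np.array([2.0, 1.5])  # P2's weights on P1's past plays (H, T)

T = 20_000
counts1, counts2 = np.zeros(2), np.zeros(2)
for t in range(T):
    a1 = int(np.argmax(U1 @ (w1 / w1.sum())))    # P1's best response to P2's mix
    a2 = int(np.argmax(U2.T @ (w2 / w2.sum())))  # P2's best response to P1's mix
    w1[a2] += 1
    w2[a1] += 1
    counts1[a1] += 1
    counts2[a2] += 1

freq1, freq2 = counts1 / T, counts2 / T
print(freq1, freq2)  # both empirical frequencies approach (0.5, 0.5)
```

The realized play cycles through growing runs of each action pair, exactly as in the hand-computed table, while the long-run frequencies settle at 50/50.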
Theorem 4 If, under FP, the empirical distributions converge, then their product converges to a NE.

Proof: Intuitively, if the strategies converge, the players do not want to deviate, so the limit must be a NE.

Further, there are conditions that guarantee convergence of the empirical distributions under FP. These include, but are not limited to:

- 2×2 games with generic payoffs
- zero-sum games
- games solvable by iterated strict dominance

Figure 5: A reward matrix for a coordination game

But not all empirical distributions converge. Take, for example, the coordination game with the reward matrix given in figure 5, with initial weights (1, √2) for both P1 and P2. Then the strategies progress as follows:

Strategy | New Empirical Distribution
(A,A)    | (2, √2)
(B,B)    | (2, 1 + √2)
(A,A)    | (3, 1 + √2)
(B,B)    | (3, 2 + √2)

The empirical frequencies converge to the Nash Equilibrium, yet the players get 0. The problem is that even though the frequencies are right, the players' choices are correlated; they are not independent. This problem can be fixed by adding some randomness to the players' strategies. This leads us to stochastic fictitious play.

2.5 Stochastic Fictitious Play

The goal of stochastic fictitious play is to get a stronger form of convergence: not only of the marginals, but also of the intended plays. Let s be the strategies being played. We will define the reward of player i to be u(i, s) + n(i, s_i), where the perturbation n has positive support on an interval and depends only on player i's own action s_i. The best response of player i to a strategy distribution σ is

BR(i, σ)(s_i) = P[n(i, s_i) is such that s_i is a best response to σ].   (3)

σ is a Nash distribution if σ_i = BR(i, σ) for all i. Harsanyi's Purification Theorem states that, for generic payoffs, the Nash Distribution approaches the Nash Equilibrium as the support of the perturbation approaches 0. The key feature of this stochastic fictitious play model is that the BR curve is close to the original BR curve but with the discontinuities removed. This is illustrated in figure 6.
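One concrete way to produce such a smoothed curve is a logit perturbation. This is a standard choice, but an assumption here: the lecture only requires the noise n(i, s_i) to have support on an interval.

```python
import numpy as np

# Logit-smoothed best response: as the noise scale tau -> 0, the smoothed
# curve approaches the discontinuous best-response step (cf. figure 6).
def smoothed_br(expected_payoffs, tau):
    u = np.asarray(expected_payoffs, dtype=float) / tau
    u -= u.max()                 # shift for numerical stability
    p = np.exp(u)
    return p / p.sum()

# P1 in matching pennies (win +1 on a match) against P2 playing H w.p. q:
for q in (0.3, 0.5, 0.7):
    u = [2 * q - 1, 1 - 2 * q]   # expected payoffs of playing H and T
    print(q, smoothed_br(u, tau=0.5), smoothed_br(u, tau=0.05))
```

At q = 0.5 the smoothed response is exactly (0.5, 0.5); away from 0.5 it bends smoothly toward the pure best response, and sharpens into a step as tau shrinks.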
Other results regarding stochastic fictitious play have also been found. In 1993, Fudenberg and Kreps showed that if the smoothing is small enough in a 2×2 game with a unique mixed NE, then the NE is globally stable for SFP. In 1995, it was shown that for a 2×2 game with a unique strict NE, the unique intersection of the smoothed BR curves is a global attractor for SFP.
Figure 6: The best-response curve for the matching pennies game. The red curve is the regular BR curve, while the blue curve includes a random perturbation.

Then, in 1996, it was shown that if a 2×2 game has two strict NE and one mixed NE, then SFP converges to one of the strict NE with probability 1. While SFP has all of these nice results, it is important to note that if we increase the number of players, these results do not hold and cycling is still possible.

Another possible justification for randomization is as protection against opponents' mistakes. A learning rule should be safe (average utility at least the minimax value) and universally consistent (utility at least as good as if we knew the frequencies, but not the order, of the opponent's plays). Universal consistency can be achieved through randomization, as we saw in the matching pennies example.

There is also an alternative to SFP called Stimulus-Response (Reinforcement Learning). In SR, you increase the probability of plays that give good results. It is difficult to discriminate among learning models on the basis of experimental data: SFP, SR, etc. all seem about comparable.

3 Cooperative Games

3.1 Motivation

We have seen that in some games, like the Cournot game, the NE is not the most desirable outcome for the players. They can do better if they cooperate. In this section, we will explore some notions of equilibrium that players achieve under cooperation.

3.2 Notions of Equilibria

Figure 7: Some different equilibria in a 2-player game. The shaded region is dominated by its border, which is the Pareto set. If we sweep a line with slope −1, the highest point at which the line is entirely in the shaded region gives us the max-min, and the lowest point at which the line is entirely in the unshaded region gives us the social welfare point.

Figure 7 shows some of the possible points we can operate at. Here x_j is the reward of player j.
Max-min is a nice point because it is best for the player who gets less, but it is not a very efficient equilibrium. On the other hand, the social equilibrium maximizes the total utility, but this can be very unfair, in that one player may get much more than the other. The Nash bargaining equilibrium is a nice compromise.
3.3 Nash Bargaining Equilibrium

Definition 4 (Nash Bargaining Equilibrium (NBE)) x^N ∈ R is a NBE if, for every feasible reward vector x ∈ R, one has Σ_j (x_j − x^N_j) / x^N_j ≤ 0.

Intuitively, this means that moving to any other vector of rewards results in a negative sum of relative changes of the rewards. The NBE maximizes Π_j x_j over the feasible vectors of rewards.

Let us use an example in which Alice and Bob must decide how to share $100. Alice's utility for x dollars is a(x) = √(10 + x). Bob's utility for x dollars is b(x) = √(80 + 2x). In the NBE, with Alice receiving x and Bob receiving 100 − x, we want

x_NBE = arg max_x √(10 + x) · √(80 + 2(100 − x)) = arg max_x √((10 + x)(280 − 2x)) = arg max_x (10 + x)(280 − 2x) = 65.   (4)

This means that Alice gets x = 65 dollars and Bob gets the remaining 100 − 65 = 35 dollars. Total welfare = a(65) + b(35) = √75 + √150 ≈ 20.9.

Using the social equilibrium,

x_social = arg max_x ( √(10 + x) + √(80 + 2(100 − x)) ) = 40,   (5)

and total welfare = a(40) + b(60) = √50 + √200 ≈ 21.2. We can see that the NBE has less total utility than the social equilibrium. Nevertheless, we consider the NBE better in some sense. It has a nice axiomatic justification. This idea can be seen in figure 8.

Figure 8: An axiomatic justification of the Nash Bargaining Equilibrium. In the first figure, both players have equal utility, so the point where R_1 = R_2 is best. In the second, some of the points are no longer feasible, but the old fair equilibrium is, so it should still be considered the best. In the third figure, the R_1 axis is stretched while the R_2 axis is compressed, but since a fair equilibrium should be independent of currency, (a_1/2, a_2/2) should still be the best equilibrium.

3.4 Shapley Value

The Shapley Value addresses a slightly different kind of problem. We now want to split the money based on each player's value, not their utility. Let us look, for example, at a situation in which Farmer Bob hires two workers. With one worker, he can produce $100.00. With two, he can produce $200.00. Alone he can produce nothing, and without Bob, the workers can produce nothing.
What is the fair way to split the $200.00? To determine the Shapley Value of each contributor, we assume they show up in random order. We call one's contribution the marginal increase in value when he arrives. Each person should then receive a share equal to his average contribution. In this case, Bob's average contribution is $100, and the workers each have an average contribution of $50.
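The random-arrival-order computation can be written out by enumerating all orders. The two-worker output of $200 is from the text; the one-worker output of $100 is an assumption, chosen as the value consistent with the stated averages.

```python
from itertools import permutations

# Shapley values for Farmer Bob's firm. v(S) is the value produced by
# coalition S: nothing without Bob, $100 with one worker (assumed),
# $200 with both workers.
def v(S):
    if "Bob" not in S:
        return 0
    return {1: 0, 2: 100, 3: 200}[len(S)]

players = ["Bob", "W1", "W2"]
orders = list(permutations(players))
shapley = {p: 0.0 for p in players}
for order in orders:
    coalition = set()
    for p in order:
        marginal = v(coalition | {p}) - v(coalition)  # value added on arrival
        shapley[p] += marginal / len(orders)          # average over all orders
        coalition.add(p)

print(shapley)  # Bob's share is about 100, each worker's about 50
```

Averaging over the 3! = 6 arrival orders reproduces the split in the text: Bob receives $100 and each worker $50.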
4 Fixed-Point Theorems

4.1 Brouwer

Theorem 5 (Brouwer) Let S = {x = (x_1, ..., x_n) : x ≥ 0, Σ_j x_j = 1} and let f : S → S be a continuous function. Then f admits a fixed point.

Figure 9: The triangle used for the proof of Brouwer's theorem

For the proof, we draw a large triangle that contains S (figure 9). We then label the corners 1, 2, 3 as shown in figure 9, and divide the big triangle into many small triangles as shown. We then give each vertex a label according to the following procedure: the vertex represents some value x; draw a ray from the vertex in the direction of f(x); the ray must exit the triangle through one of its edges; label the vertex with the same label as the corner opposite the edge through which the ray exits.

Figure 10: A fully labeled triangulation and paths through doors

A fully labeled triangulation is shown in figure 10. Note that the vertices on an edge of the big triangle can only be labeled with the same numbers as the endpoints of that edge. This is because f maps S to S, so the rays must point into the triangle.

We can now show that there must be a small triangle whose three vertices carry all three different labels (call this a fully labeled triangle). We do this by calling a "door" any edge that connects a 1 and a 2. We can then draw paths through the triangulation that only go through doors. As long as a path has not entered a fully labeled triangle, there is always another door out of the current triangle. But we observe that the bottom edge of the big triangle must have an odd number of doors: there is a 1 on the far left and a 2 on the far right, and since there are no 3's on that edge, there must be an odd number of transitions from 1 to 2 and back. This means that at least one path that enters through the bottom cannot come out again. This path must end in a fully labeled triangle.

We can then iterate this argument on the fully labeled triangle repeatedly, refining it into ever smaller fully labeled triangles. In the limit, the triangles converge to a point; call it z. If f(z) were outside of the limiting triangle, the triangle would not be fully labeled (see figure 11).
So f(z) must be in the triangle. But in the limit, the triangle is just the point z. This means that f(z) = z, so z is a fixed point of f.
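The door-and-refinement argument can be mimicked in one dimension, where the simplex is an interval and a "door" is a cell whose endpoints get different labels. This is an illustration of my own, not from the lecture.

```python
import math

# 1-D analogue of the labeling argument: label a point 1 if f(x) >= x and
# 2 if f(x) < x. A cell whose endpoints carry different labels is a "door";
# repeatedly refining the door cell shrinks it onto a fixed point of f.
def fixed_point(f, lo=0.0, hi=1.0, iters=60):
    assert f(lo) >= lo and f(hi) <= hi  # the two ends carry different labels
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) >= mid:   # mid is labeled 1: a door remains in [mid, hi]
            lo = mid
        else:               # mid is labeled 2: a door remains in [lo, mid]
            hi = mid
    return 0.5 * (lo + hi)

z = fixed_point(math.cos)  # cos maps [0, 1] into itself
print(z, math.cos(z))      # z is (numerically) a fixed point of cos
```

As in the proof, the nested labeled cells shrink to a point z with f(z) = z; here z ≈ 0.7391, the fixed point of the cosine.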
Figure 11: An infinitely small triangle containing the point z

4.2 Kakutani

Theorem 6 (Kakutani) Let S be a simplex and f(·) : S → 2^S a set-valued function. Assume that f is nonempty- and convex-valued, which means that f(x) ≠ ∅ and f(x) is a convex set for all x ∈ S. Assume further that f is graph-continuous. Then there is a fixed point, i.e., some x ∈ S with x ∈ f(x). Here graph-continuous means that if u_n ∈ f(s_n) for n ≥ 1 and (s_n, u_n) → (s, u) as n → ∞, then u ∈ f(s).

Proof: Triangulate the simplex. For the n-th triangulation, define the function f_n as follows: for a vertex x of the triangulation, we pick a point arbitrarily in f(x) and call that point f_n(x); for a point x that is not a vertex, we define f_n(x) as the linear interpolation of the values of f_n at the vertices of the triangle containing x. The function f_n is continuous and, by Brouwer's theorem, admits a fixed point x_n. The sequence {x_n, n ≥ 1} has a limit point x. Using the graph continuity of f(·), we show that x ∈ f(x). Designate by v_n the triangulation vertex closest to x_n, breaking ties arbitrarily. Also, let u_n = f_n(v_n) ∈ f(v_n). The triangulation is getting finer, so v_n → x. Also, because of the linear interpolation, u_n → x. The graph continuity then implies that x ∈ f(x).

References

[1] J.B. Rosen, "Existence and Uniqueness of Equilibrium Points for Concave N-Person Games," Econometrica, 33(3), 520-534, July 1965.

[2] Fudenberg, D. and D.K. Levine, The Theory of Learning in Games, MIT Press, Cambridge, Massachusetts, 1998. Chapters 1, 2, 4.
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationMath 5593 Linear Programming Lecture Notes
Math 5593 Linear Programming Lecture Notes Unit II: Theory & Foundations (Convex Analysis) University of Colorado Denver, Fall 2013 Topics 1 Convex Sets 1 1.1 Basic Properties (Luenberger-Ye Appendix B.1).........................
More informationLecture 3: Art Gallery Problems and Polygon Triangulation
EECS 396/496: Computational Geometry Fall 2017 Lecture 3: Art Gallery Problems and Polygon Triangulation Lecturer: Huck Bennett In this lecture, we study the problem of guarding an art gallery (specified
More informationLinear Programming Duality and Algorithms
COMPSCI 330: Design and Analysis of Algorithms 4/5/2016 and 4/7/2016 Linear Programming Duality and Algorithms Lecturer: Debmalya Panigrahi Scribe: Tianqi Song 1 Overview In this lecture, we will cover
More informationE-Companion: On Styles in Product Design: An Analysis of US. Design Patents
E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing
More informationConvexity: an introduction
Convexity: an introduction Geir Dahl CMA, Dept. of Mathematics and Dept. of Informatics University of Oslo 1 / 74 1. Introduction 1. Introduction what is convexity where does it arise main concepts and
More informationError-Correcting Codes
Error-Correcting Codes Michael Mo 10770518 6 February 2016 Abstract An introduction to error-correcting codes will be given by discussing a class of error-correcting codes, called linear block codes. The
More information{ 1} Definitions. 10. Extremal graph theory. Problem definition Paths and cycles Complete subgraphs
Problem definition Paths and cycles Complete subgraphs 10. Extremal graph theory 10.1. Definitions Let us examine the following forbidden subgraph problems: At most how many edges are in a graph of order
More informationSection 1.5 Transformation of Functions
6 Chapter 1 Section 1.5 Transformation of Functions Often when given a problem, we try to model the scenario using mathematics in the form of words, tables, graphs and equations in order to explain or
More informationLecture 2 September 3
EE 381V: Large Scale Optimization Fall 2012 Lecture 2 September 3 Lecturer: Caramanis & Sanghavi Scribe: Hongbo Si, Qiaoyang Ye 2.1 Overview of the last Lecture The focus of the last lecture was to give
More information16.410/413 Principles of Autonomy and Decision Making
16.410/413 Principles of Autonomy and Decision Making Lecture 17: The Simplex Method Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology November 10, 2010 Frazzoli (MIT)
More informationAn Introduction to Bilevel Programming
An Introduction to Bilevel Programming Chris Fricke Department of Mathematics and Statistics University of Melbourne Outline What is Bilevel Programming? Origins of Bilevel Programming. Some Properties
More informationPebble Sets in Convex Polygons
2 1 Pebble Sets in Convex Polygons Kevin Iga, Randall Maddox June 15, 2005 Abstract Lukács and András posed the problem of showing the existence of a set of n 2 points in the interior of a convex n-gon
More informationCDG2A/CDZ4A/CDC4A/ MBT4A ELEMENTS OF OPERATIONS RESEARCH. Unit : I - V
CDG2A/CDZ4A/CDC4A/ MBT4A ELEMENTS OF OPERATIONS RESEARCH Unit : I - V UNIT I Introduction Operations Research Meaning and definition. Origin and History Characteristics and Scope Techniques in Operations
More informationTargeting Nominal GDP or Prices: Expectation Dynamics and the Interest Rate Lower Bound
Targeting Nominal GDP or Prices: Expectation Dynamics and the Interest Rate Lower Bound Seppo Honkapohja, Bank of Finland Kaushik Mitra, University of Saint Andrews *Views expressed do not necessarily
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can
More informationBROUWER S FIXED POINT THEOREM. Contents
BROUWER S FIXED POINT THEOREM JASPER DEANTONIO Abstract. In this paper we prove Brouwer s Fixed Point Theorem, which states that for any continuous transformation f : D D of any figure topologically equivalent
More informationModeling and Simulating Social Systems with MATLAB
Modeling and Simulating Social Systems with MATLAB Lecture 7 Game Theory / Agent-Based Modeling Stefano Balietti, Olivia Woolley, Lloyd Sanders, Dirk Helbing Computational Social Science ETH Zürich 02-11-2015
More informationLecture 3. Corner Polyhedron, Intersection Cuts, Maximal Lattice-Free Convex Sets. Tepper School of Business Carnegie Mellon University, Pittsburgh
Lecture 3 Corner Polyhedron, Intersection Cuts, Maximal Lattice-Free Convex Sets Gérard Cornuéjols Tepper School of Business Carnegie Mellon University, Pittsburgh January 2016 Mixed Integer Linear Programming
More informationLecture Notes 2: The Simplex Algorithm
Algorithmic Methods 25/10/2010 Lecture Notes 2: The Simplex Algorithm Professor: Yossi Azar Scribe:Kiril Solovey 1 Introduction In this lecture we will present the Simplex algorithm, finish some unresolved
More informationLecture 6: Graph Properties
Lecture 6: Graph Properties Rajat Mittal IIT Kanpur In this section, we will look at some of the combinatorial properties of graphs. Initially we will discuss independent sets. The bulk of the content
More informationLecture 1: Sperner, Brouwer, Nash. Philippe Bich, PSE and University Paris 1 Pantheon-Sorbonne, France. Lecture 1: Sperner, Brouwer, Nash
.., PSE and University Paris 1 Pantheon-Sorbonne, France. 1. Simplex A n-simplex (or simplex of dimension n) is (x 0,...x n ) = { n i=0 λ ix i : (λ 0,..., λ n ) R n+1 + : n i=0 λ i = 1}, where x 0,...,
More informationModelling competition in demand-based optimization models
Modelling competition in demand-based optimization models Stefano Bortolomiol Virginie Lurkin Michel Bierlaire Transport and Mobility Laboratory (TRANSP-OR) École Polytechnique Fédérale de Lausanne Workshop
More informationNetwork Topology and Equilibrium Existence in Weighted Network Congestion Games
Network Topology and Equilibrium Existence in Weighted Network Congestion Games Igal Milchtaich, Bar-Ilan University August 2010 Abstract. Every finite noncooperative game can be presented as a weighted
More informationUNIT 3 EXPRESSIONS AND EQUATIONS Lesson 3: Creating Quadratic Equations in Two or More Variables
Guided Practice Example 1 Find the y-intercept and vertex of the function f(x) = 2x 2 + x + 3. Determine whether the vertex is a minimum or maximum point on the graph. 1. Determine the y-intercept. The
More informationHow Bad is Selfish Routing?
How Bad is Selfish Routing? Tim Roughgarden and Éva Tardos Presented by Brighten Godfrey 1 Game Theory Two or more players For each player, a set of strategies For each combination of played strategies,
More informationA Game-Theoretic Framework for Congestion Control in General Topology Networks
A Game-Theoretic Framework for Congestion Control in General Topology SYS793 Presentation! By:! Computer Science Department! University of Virginia 1 Outline 2 1 Problem and Motivation! Congestion Control
More information1 Variations of the Traveling Salesman Problem
Stanford University CS26: Optimization Handout 3 Luca Trevisan January, 20 Lecture 3 In which we prove the equivalence of three versions of the Traveling Salesman Problem, we provide a 2-approximate algorithm,
More informationApplied Lagrange Duality for Constrained Optimization
Applied Lagrange Duality for Constrained Optimization Robert M. Freund February 10, 2004 c 2004 Massachusetts Institute of Technology. 1 1 Overview The Practical Importance of Duality Review of Convexity
More informationAlgorithmic Game Theory and Applications. Lecture 6: The Simplex Algorithm
Algorithmic Game Theory and Applications Lecture 6: The Simplex Algorithm Kousha Etessami Recall our example 1 x + y
More informationGraph Theory II. Po-Shen Loh. June edges each. Solution: Spread the n vertices around a circle. Take parallel classes.
Graph Theory II Po-Shen Loh June 009 1 Warm-up 1. Let n be odd. Partition the edge set of K n into n matchings with n 1 edges each. Solution: Spread the n vertices around a circle. Take parallel classes..
More informationA Short SVM (Support Vector Machine) Tutorial
A Short SVM (Support Vector Machine) Tutorial j.p.lewis CGIT Lab / IMSC U. Southern California version 0.zz dec 004 This tutorial assumes you are familiar with linear algebra and equality-constrained optimization/lagrange
More information6 Randomized rounding of semidefinite programs
6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can
More informationCS261: Problem Set #2
CS261: Problem Set #2 Due by 11:59 PM on Tuesday, February 9, 2016 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Submission instructions:
More informationCourse Summary Homework
Course Summary Homework (Max useful score: 100 - Available points: 210) 15-382: Collective Intelligence (Spring 2018) OUT: April 21, 2018, at 1:00am DUE: May 1, 2018 at 1pm - Available late days: 0 Instructions
More informationNOTATION AND TERMINOLOGY
15.053x, Optimization Methods in Business Analytics Fall, 2016 October 4, 2016 A glossary of notation and terms used in 15.053x Weeks 1, 2, 3, 4 and 5. (The most recent week's terms are in blue). NOTATION
More information11.1 Facility Location
CS787: Advanced Algorithms Scribe: Amanda Burton, Leah Kluegel Lecturer: Shuchi Chawla Topic: Facility Location ctd., Linear Programming Date: October 8, 2007 Today we conclude the discussion of local
More informationSection 1.5 Transformation of Functions
Section 1.5 Transformation of Functions 61 Section 1.5 Transformation of Functions Often when given a problem, we try to model the scenario using mathematics in the form of words, tables, graphs and equations
More informationarxiv: v1 [math.lo] 9 Mar 2011
arxiv:03.776v [math.lo] 9 Mar 0 CONSTRUCTIVE PROOF OF BROUWER S FIXED POINT THEOREM FOR SEQUENTIALLY LOCALLY NON-CONSTANT FUNCTIONS BY SPERNER S LEMMA YASUHITO TANAKA Abstract. In this paper using Sperner
More informationConvex sets and convex functions
Convex sets and convex functions Convex optimization problems Convex sets and their examples Separating and supporting hyperplanes Projections on convex sets Convex functions, conjugate functions ECE 602,
More informationNetwork Improvement for Equilibrium Routing
Network Improvement for Equilibrium Routing UMANG BHASKAR University of Waterloo and KATRINA LIGETT California Institute of Technology Routing games are frequently used to model the behavior of traffic
More informationPlanar Graphs. 1 Graphs and maps. 1.1 Planarity and duality
Planar Graphs In the first half of this book, we consider mostly planar graphs and their geometric representations, mostly in the plane. We start with a survey of basic results on planar graphs. This chapter
More informationSAMPLING AND THE MOMENT TECHNIQUE. By Sveta Oksen
SAMPLING AND THE MOMENT TECHNIQUE By Sveta Oksen Overview - Vertical decomposition - Construction - Running time analysis - The bounded moments theorem - General settings - The sampling model - The exponential
More informationWeek 5. Convex Optimization
Week 5. Convex Optimization Lecturer: Prof. Santosh Vempala Scribe: Xin Wang, Zihao Li Feb. 9 and, 206 Week 5. Convex Optimization. The convex optimization formulation A general optimization problem is
More informationEXTREME POINTS AND AFFINE EQUIVALENCE
EXTREME POINTS AND AFFINE EQUIVALENCE The purpose of this note is to use the notions of extreme points and affine transformations which are studied in the file affine-convex.pdf to prove that certain standard
More informationIEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 55, NO 5, MAY 2007 1911 Game Theoretic Cross-Layer Transmission Policies in Multipacket Reception Wireless Networks Minh Hanh Ngo, Student Member, IEEE, and
More informationEquilibrium Tracing in Bimatrix Games
Equilibrium Tracing in Bimatrix Games Anne Balthasar Department of Mathematics, London School of Economics, Houghton St, London WCA AE, United Kingdom A.V.Balthasar@lse.ac.uk Abstract. We analyze the relations
More informationDistributed Planning in Stochastic Games with Communication
Distributed Planning in Stochastic Games with Communication Andriy Burkov and Brahim Chaib-draa DAMAS Laboratory Laval University G1K 7P4, Quebec, Canada {burkov,chaib}@damas.ift.ulaval.ca Abstract This
More information15-780: MarkovDecisionProcesses
15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic
More informationDiscrete Optimization. Lecture Notes 2
Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The
More information