EC 512 MATHEMATICAL METHODS FOR ECONOMICS
Lecture: Convex Sets
Murat YILMAZ, Boğaziçi University

In this section we focus on convex sets, separating hyperplane theorems and the Farkas Lemma. As an application, we look at a linear production model and characterize efficiency. We begin with the definition of a convex set. These lecture notes are mostly based on Chapter 3 of Advanced Mathematical Economics by R. V. Vohra.

Definition 1 A set $C$ of vectors/points is called convex if for all $x, y \in C$ and $\lambda \in [0,1]$, $\lambda x + (1-\lambda)y \in C$.

[Figure: a convex set and a non-convex set.]

Remark 1.
(1) A set in $\mathbb{R}^n$ is convex if whenever it contains two vectors/elements, it also contains the entire line segment connecting them.
(2) If $x_1, x_2, \dots, x_k \in C$ and $C$ is convex, then $\sum_{i=1}^k \lambda_i x_i \in C$ whenever $\sum_{i=1}^k \lambda_i = 1$ and $\lambda_i \geq 0$ for all $i$.
(3) Let $X, Y \subseteq \mathbb{R}^n$ be two convex sets. Then:
(i) $X + Y = \{z \in \mathbb{R}^n : z = x + y,\ x \in X,\ y \in Y\}$ is convex.
(ii) $\alpha X = \{z \in \mathbb{R}^n : z = \alpha x,\ x \in X\}$ is convex for any $\alpha \in \mathbb{R}$.
(iii) $X \cap Y$ is convex (in fact, the intersection of any collection of convex sets is convex).
(iv) $X \cup Y$ might not be convex.
(4) $\{x : Ax = b,\ x \geq 0\}$ is convex. Why? Let $x_1, x_2 \in C = \{x : Ax = b,\ x \geq 0\}$. Then $Ax_1 = b = Ax_2$, and
$$A(\lambda x_1 + (1-\lambda)x_2) = \lambda Ax_1 + (1-\lambda)Ax_2 = \lambda b + (1-\lambda)b = b.$$
Since $\lambda x_1 + (1-\lambda)x_2 \geq 0$ as well, $\lambda x_1 + (1-\lambda)x_2 \in C$.
(5) If $f : S \to \mathbb{R}$ is concave, then the set $\{(x, y) \in \mathbb{R}^{n+1} : y \leq f(x),\ x \in S \subseteq \mathbb{R}^n\}$ is convex. Why? Let $(x_1, y_1), (x_2, y_2) \in \{(x, y) : y \leq f(x)\}$. Then $y_1 \leq f(x_1)$ and $y_2 \leq f(x_2)$, so by concavity
$$\lambda y_1 + (1-\lambda)y_2 \leq \lambda f(x_1) + (1-\lambda)f(x_2) \leq f(\lambda x_1 + (1-\lambda)x_2).$$
Thus, we get the following: $(\lambda x_1 + (1-\lambda)x_2,\ \lambda y_1 + (1-\lambda)y_2) \in \{(x, y) : y \leq f(x)\}$.
Note the reflection of this property in consumer theory: $u$ concave implies the upper contour sets of $u$ are convex.

Separating Hyperplane Theorems

Main idea: given a convex set $C$ and a point $b \notin C$, a straight line (a hyperplane) separates them.

[Figure: the point $b$, the convex set $C$, the closest point $x^*$, and the separating line $L$.]

Here $x^*$ is chosen to be the closest element of $C$ to the point $b$, with $b \neq x^*$. The line $L$ is perpendicular to the segment $[b, x^*]$ and passes midway between $x^*$ and $b$. For $L$ to be our separator, we need to show that every $y \in C$ lies on the far side of $L$. If some $y \in C$ did not, then all points $z \in [y, x^*]$ would lie in $C$ by convexity, but some such $z$ would be closer to $b$ than $x^*$ is, a contradiction.
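Remark 1(4) can be checked numerically. The sketch below (Python with NumPy; the matrix $A$ and the feasible points are made-up illustrative data, not from the notes) verifies that every convex combination of two feasible points of $\{x : Ax = b,\ x \geq 0\}$ remains feasible.

```python
import numpy as np

# Numerical sanity check of Remark 1(4): {x : Ax = b, x >= 0} is convex.
# A, x1, x2 are illustrative data chosen so that both points are feasible.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x1 = np.array([1.0, 2.0, 1.0])
x2 = np.array([3.0, 0.0, 3.0])
b = A @ x1
assert np.allclose(A @ x2, b)       # both points satisfy Ax = b

# Every convex combination lam*x1 + (1-lam)*x2 satisfies Az = b and z >= 0.
for lam in np.linspace(0.0, 1.0, 11):
    z = lam * x1 + (1 - lam) * x2
    assert np.allclose(A @ z, b) and np.all(z >= 0)
print("all convex combinations remain feasible")
```

The same check works for any $A$, $b$ and feasible pair, since the verification only uses linearity of $A$ and non-negativity, exactly as in the proof.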
First, we make sure such an $x^*$ (closest point to $b$) exists.

Lemma 1 Let $C$ be a compact set not containing the origin. Then there exists an $x^* \in C$ such that $d(x^*, 0) = \inf_{x \in C} d(x, 0) > 0$.

Proof. Follows directly from the continuity of $d(\cdot, 0)$ and the Weierstrass Maximum Theorem. $\square$

Definition 2 Let $h \in \mathbb{R}^n$, $h \neq 0$, and $\beta \in \mathbb{R}$.
A hyperplane is a set $H_{h,\beta} = \{x \in \mathbb{R}^n : h \cdot x = \beta\}$.
A halfspace (below $H_{h,\beta}$) is a set $\{x \in \mathbb{R}^n : h \cdot x \leq \beta\}$.
A halfspace (above $H_{h,\beta}$) is a set $\{x \in \mathbb{R}^n : h \cdot x \geq \beta\}$.

[Figure: the hyperplane $H_{(a,b),1} : a x_1 + b x_2 = 1$ and the halfspace $\{x \in \mathbb{R}^2 : a x_1 + b x_2 \leq 1\}$.]

Theorem 1 (Strict Separating Hyperplane Theorem) Let $C$ be a closed convex set and $b \notin C$. Then there is a hyperplane $H_{h,\beta}$ such that $h \cdot b < \beta < h \cdot x$ for all $x \in C$.

Proof. By a translation of the coordinates we may assume that $b = 0$, without loss of generality. Choose $x^* \in C$ that minimizes $d(x, 0)$ over $x \in C$. By Lemma 1 above, such an $x^*$ exists and $d(x^*, 0) > 0$. (Note that Lemma 1 assumes compactness, but here we do not. Here is why it still applies: pick any $y \in C$ and let $C' = C \cap \{x : d(x, 0) \leq d(y, 0)\}$. Notice that $C'$ is closed because both $C$ and $\{x : d(x, 0) \leq d(y, 0)\}$ are closed; $C'$ is also bounded, hence compact. Now it is easy to see that the point in $C'$ closest to $0$ is also the point in $C$ closest to $0$.)

Let $m$ be the midpoint of the segment joining $0$ to $x^*$, i.e. $m = x^*/2$. Choose $H_{h,\beta}$ that goes through $m$ and is perpendicular to the segment joining $0$ and $x^*$. That is, we choose $h$ to be the vector $x^*$ scaled by $d(x^*, 0)$: $h = x^*/d(x^*, 0)$. Set $\beta = h \cdot m$. Notice that
$$\beta = h \cdot m = \frac{x^* \cdot x^*}{2\, d(x^*, 0)} = \frac{d(x^*, 0)}{2}.$$
Next, we verify that $b = 0$ is on one side of $H_{h,\beta}$ and $x^*$ is on the other side. Observe that $h \cdot b = 0 < d(x^*, 0)/2 = h \cdot m = \beta$. Next, $h \cdot x^* = \frac{x^* \cdot x^*}{d(x^*, 0)} = d(x^*, 0) > \frac{d(x^*, 0)}{2} = h \cdot m = \beta$.

Now, pick any $x \in C$ with $x \neq x^*$. Since $C$ is convex, $(1-\lambda)x^* + \lambda x \in C$ for $\lambda \in [0,1]$. From the choice of $x^*$, $d(x^*, 0) \leq d((1-\lambda)x^* + \lambda x, 0)$. Since $d(z, 0)^2 = z \cdot z$, we have
$$d(x^*, 0)^2 \leq [(1-\lambda)x^* + \lambda x] \cdot [(1-\lambda)x^* + \lambda x] = (x^* + \lambda(x - x^*)) \cdot (x^* + \lambda(x - x^*)) = d(x^*, 0)^2 + 2\lambda\, x^* \cdot (x - x^*) + \lambda^2\, d(x - x^*, 0)^2.$$
That is, $0 \leq 2\, x^* \cdot (x - x^*) + \lambda\, d(x - x^*, 0)^2$. Since $\lambda$ can be picked arbitrarily small, we get $x^* \cdot (x - x^*) \geq 0$ for all $x \in C$. Using $x^* = d(x^*, 0)\, h$ and $x^* = 2m$, this reads $0 \leq [d(x^*, 0)\, h] \cdot (x - 2m)$, that is, $h \cdot x \geq 2\, h \cdot m > h \cdot m = \beta$. So for all $x \in C$, $h \cdot x > \beta$. $\square$

[Figure: the set $C$, the points $b = 0$, $m$ and $x^*$, and the hyperplane $H_{h,\beta} : h \cdot x = \beta$.]

This theorem says that a hyperplane $H_{h,\beta}$ strictly separates $C$ from $b$ whenever $C$ is closed and convex. If we drop the requirement that $C$ be closed, we obtain a weaker result.

Theorem 2 (Weak Separating Hyperplane Theorem) Let $C$ be a convex set and $b \notin C$. Then there is a hyperplane $H_{h,\beta}$ such that $h \cdot b \leq \beta \leq h \cdot x$ for all $x \in C$.

Proof. The only difference from the proof of the previous theorem is that $x^*$ is chosen so that $d(x^*, 0) = \inf_{x \in C} d(x, 0)$. Since it is possible that this infimum is $0$ (i.e., if $b$ were on the boundary of $C$, the closest point is not necessarily in $C$), the strict inequalities of the previous theorem must be replaced by weak inequalities. $\square$

Theorem 3 Let $C, D \subseteq \mathbb{R}^n$ be two non-empty, disjoint, convex sets. Then there exists a hyperplane $H_{h,\beta}$ such that $h \cdot x \geq \beta \geq h \cdot y$ for all $x \in C$ and $y \in D$.

Proof. $K = \{z : z = x - y,\ x \in C,\ y \in D\}$ is convex and $0 \notin K$. By the weak separating hyperplane theorem, there is $H_{h,\beta'}$ such that $h \cdot 0 \leq \beta' \leq h \cdot z$ for all $z \in K$. Pick any $x \in C$, $y \in D$. Then
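The construction in the proof of Theorem 1 can be carried out numerically for a concrete set. The sketch below (an illustrative assumption: $C$ is the box $[1,2] \times [1,2]$ and $b = 0$, so the closest point is found by clipping coordinates) builds $x^*$, $h = x^*/d(x^*,0)$ and $\beta = d(x^*,0)/2$ exactly as in the proof, then checks strict separation.

```python
import numpy as np

# Theorem 1 for a concrete closed convex set: C = [1,2] x [1,2], b = 0.
# For a box, the Euclidean projection of a point is coordinate-wise clipping.
lo, hi = np.array([1.0, 1.0]), np.array([2.0, 2.0])
x_star = np.clip(np.zeros(2), lo, hi)   # closest point of C to b = 0, here (1, 1)
d = np.linalg.norm(x_star)              # d(x*, 0)
h = x_star / d                          # unit normal, as in the proof
beta = d / 2                            # hyperplane through the midpoint m = x*/2

# b = 0 lies strictly on one side of the hyperplane...
assert h @ np.zeros(2) < beta
# ...and every point of C strictly on the other (checked on a sample grid).
grid = np.stack(np.meshgrid(np.linspace(1, 2, 5),
                            np.linspace(1, 2, 5)), -1).reshape(-1, 2)
assert np.all(grid @ h > beta)
print("strict separation: h =", h, " beta =", round(beta, 4))
```

The grid check is of course not a proof; it only illustrates that the $h$ and $\beta$ from the construction do what the theorem promises for this particular $C$.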
$h \cdot (x - y) = h \cdot x - h \cdot y \geq 0$. In particular, $h \cdot x \geq \inf_{u \in C} h \cdot u \geq \sup_{v \in D} h \cdot v \geq h \cdot y$. Choose $\beta \in [\sup_{v \in D} h \cdot v,\ \inf_{u \in C} h \cdot u]$ to complete the proof. $\square$

What if both $C$ and $D$ are also closed? Do we get the strict version of the above theorem? No; only if one of them is bounded. (Counterexample: $C = \{(x_1, x_2) : x_2 \leq 0\}$ and $D = \{(x_1, x_2) : x_1 > 0,\ x_2 \geq 1/x_1\}$ are closed, convex and disjoint, but the distance between them is zero, so they cannot be strictly separated.)

Theorem 4 Let $C, D \subseteq \mathbb{R}^n$ be two non-empty, disjoint, closed and convex sets, with at least one of them, say $C$, bounded. Then there exists a hyperplane $H_{h,\beta}$ such that $h \cdot x > \beta > h \cdot y$ for all $x \in C$ and $y \in D$.

Proof. Similar to the one above: just show that $K$ is now closed and apply the strict separating hyperplane theorem. $\square$

Definition 3 The set of all non-negative linear combinations of the columns of $A_{m \times n}$ is called the finite cone generated by the columns of $A_{m \times n}$ and is denoted by $\mathrm{cone}(A)$. That is,
$$\mathrm{cone}(A) = \{y \in \mathbb{R}^m : y = A_{m \times n}\, x \text{ for some } x \in \mathbb{R}^n_+\}.$$

Lemma 2 $\mathrm{cone}(A)$ is convex and closed.

Proof. Convexity is easy. For closedness, first show that $\mathrm{cone}(B)$ is closed if all columns of $B$ are linearly independent. Complete the proof as an exercise. $\square$

Theorem 5 (Farkas Lemma) Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Let $F = \{x \in \mathbb{R}^n : Ax = b,\ x \geq 0\}$. Then either $F \neq \emptyset$, or there exists $y \in \mathbb{R}^m$ such that $yA \geq 0$ and $y \cdot b < 0$, but not both.

Proof. First, we show the "not both" part. Suppose $F \neq \emptyset$ and choose any $x \in F$. Then for any $y$ with $yA \geq 0$ we have $y \cdot b = y \cdot Ax = (yA)x \geq 0$, since $x \geq 0$.

Now, suppose $F = \emptyset$. Then $b \notin \mathrm{cone}(A)$. Since $\mathrm{cone}(A)$ is closed and convex, we can use the strict separating hyperplane theorem to identify a hyperplane $H_{h,\beta}$ that separates $b$ from $\mathrm{cone}(A)$. Without loss of generality, we can assume that $h \cdot b < \beta < h \cdot z$ for all $z \in \mathrm{cone}(A)$. Since the origin is in $\mathrm{cone}(A)$, it is easy to see that $\beta < 0$. Let $a_j$ be the $j$-th column vector of the matrix $A$. We show that $h \cdot a_j \geq 0$. Suppose not, i.e. $h \cdot a_j < 0$. Note that $\lambda a_j \in \mathrm{cone}(A)$ for any $\lambda \geq 0$, so $h \cdot (\lambda a_j) > \beta$. But since $\lambda$ can be chosen arbitrarily large and $h \cdot a_j < 0$, $h \cdot (\lambda a_j)$ can be made smaller than $\beta$, which gives a contradiction. Thus $h \cdot a_j \geq 0$ for all columns of $A$. Hence $y = h$ is our required vector, with $yA \geq 0$ and $y \cdot b < \beta < 0$. $\square$
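The two alternatives in the Farkas Lemma can both be computed with linear programming. A minimal sketch, using `scipy.optimize.linprog` as the solver and an illustrative $A$ and $b$ for which $F$ is empty: phase 1 checks feasibility of $F$, and phase 2 finds a certificate $y$ with $yA \geq 0$, $y \cdot b < 0$ by minimizing $y \cdot b$ over (a bounded piece of) the alternative cone.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: A = I and b with a negative entry, so Ax = b, x >= 0
# has no solution and the Farkas certificate must exist.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([-1.0, 2.0])

# Phase 1: is F = {x : Ax = b, x >= 0} non-empty?
feas = linprog(c=np.zeros(2), A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
assert not feas.success                 # F is empty for this data

# Phase 2: minimize y.b subject to yA >= 0 (written as -A^T y <= 0),
# with box bounds on y only to keep the LP bounded.
res = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(2), bounds=[(-1, 1)] * 2)
y = res.x
assert np.all(y @ A >= -1e-9) and y @ b < 0
print("Farkas certificate y =", y)
```

If instead $F$ were non-empty, phase 1 would succeed and phase 2 would return an optimal value of $0$ (no certificate), matching the "not both" part of the theorem.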
Polyhedrons and Polytopes

Definition 4 Let $S \subseteq \mathbb{R}^n$. A vector $v \in \mathbb{R}^n$ can be expressed as a convex combination of vectors in $S$ if there is a finite set $\{v_1, \dots, v_m\} \subseteq S$ such that $v = \sum_{j=1}^m \lambda_j v_j$ with $\sum_{j=1}^m \lambda_j = 1$ and $\lambda_j \geq 0$ for all $j$.

Definition 5 Let $S \subseteq \mathbb{R}^n$. The convex hull of $S$, $\mathrm{conv}(S)$, is the set of all vectors that can be expressed as a convex combination of vectors in $S$. (Alternatively: $\mathrm{conv}(S)$ is the smallest convex set containing $S$, or $\mathrm{conv}(S)$ is the intersection of all convex sets that contain $S$.)

Definition 6 A set $P \subseteq \mathbb{R}^n$ is called a polytope if there is a finite $S \subseteq \mathbb{R}^n$ such that $P = \mathrm{conv}(S)$.

Definition 7 A non-empty set $P \subseteq \mathbb{R}^n$ is called a polyhedron if there is an $m \times n$ matrix $A$ and a vector $b \in \mathbb{R}^m$ such that $P = \{x \in \mathbb{R}^n : Ax \leq b\}$.

Theorem 6 The set of all convex combinations of a finite number of vectors is a polyhedron. Thus, a polytope is a polyhedron. Conversely, a polyhedron is a polytope if it is also bounded.

Definition 8 Let $S \subseteq \mathbb{R}^n$ be convex. An extreme point of $S$ is a point of $S$ that cannot be expressed as a convex combination of other points in $S$.

Theorem 7 If $P$ is a polytope, then each $x \in P$ can be written as a convex combination of the extreme points of $P$.

Application: Linear Production Model

Let $x \in \mathbb{R}^m$ be a non-negative input vector and $y \in \mathbb{R}^n$ a non-negative output vector. Let $P$ be an $m \times n$ production matrix that relates outputs to inputs as follows: $y_{1 \times n} = x_{1 \times m} P_{m \times n}$. Here $p_{ij}$ is the amount of the $j$-th output generated from one unit of the $i$-th input. Let $b \in \mathbb{R}^k$ be a non-negative resource/capacity vector that lists the amounts of raw materials available for production. Let $C_{m \times k}$ be an $m \times k$ non-negative consumption matrix that relates inputs to resources: $x_{1 \times m} C_{m \times k} \leq b_{1 \times k}$. Here $c_{ij}$ is the amount of resource $j$ consumed to produce one unit of input $i$.

The input space is $X = \{x \in \mathbb{R}^m : xC \leq b,\ x \geq 0\}$. The output space is $Y = \{y \in \mathbb{R}^n : y = xP,\ x \in X,\ y \geq 0\}$. An output vector $y^*$ is efficient if there is no other $y \in Y$ such that $y \geq y^*$ (with $y \neq y^*$).
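Theorem 7 can be illustrated numerically on an input space of the kind just defined. A sketch, with illustrative numbers: for a single-resource model, $X = \{x \in \mathbb{R}^2 : x_1 + x_2 \leq 10,\ x \geq 0\}$ is a polytope with extreme points $(0,0)$, $(10,0)$, $(0,10)$, and the convex-combination weights for any $x \in X$ can be recovered by solving a small feasibility LP.

```python
import numpy as np
from scipy.optimize import linprog

# Extreme points of X = {x in R^2 : x1 + x2 <= 10, x >= 0} (illustrative data).
V = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x = np.array([2.0, 5.0])                # a point of X to decompose

# Feasibility LP: find lam >= 0 with V.T @ lam = x and sum(lam) = 1.
A_eq = np.vstack([V.T, np.ones((1, 3))])
b_eq = np.append(x, 1.0)
res = linprog(c=np.zeros(3), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
lam = res.x
assert res.success
assert np.allclose(V.T @ lam, x) and np.isclose(lam.sum(), 1.0)
print("convex-combination weights:", np.round(lam, 3))
```

The same LP is the computational content of Claim 1 later on: feasibility of this system for every $x \in X$ is what lets each output $xP$ be written as a convex combination of the points $x_j P$.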
Theorem 8 A vector $y^* \in Y$ is efficient if and only if there exists a non-negative, non-trivial price vector $p$ such that $y^* \cdot p \geq y \cdot p$ for all $y \in Y$.

Proof. ($\Leftarrow$): This is almost trivial. If $y^* \cdot p \geq y \cdot p$ for all $y \in Y$ for some such price vector $p$, then there is no other $y \in Y$ with $y \geq y^*$. Thus $y^*$ is efficient.

($\Rightarrow$): Suppose that $y^*$ is efficient. First we prove the following claim:

Claim 1 There exists a matrix $D$ with $n$ rows and a vector $r$ such that $Y = \{y \in \mathbb{R}^n : yD \leq r\}$.

Proof. Let $x_1, x_2, \dots, x_k$ be the extreme points of $X$. Pick any $y \in Y$. Then there is an $x \in X$ such that $y = xP$. Since $X$ is a polytope ($X$ is a polyhedron and bounded, and every bounded polyhedron is a polytope), any element of $X$ can be expressed as a convex combination of its extreme points. Thus there are $\{\lambda_j\}_{j=1}^k$ such that $x = \lambda_1 x_1 + \lambda_2 x_2 + \dots + \lambda_k x_k$, and we can write $y = \lambda_1 x_1 P + \lambda_2 x_2 P + \dots + \lambda_k x_k P$. This means each $y \in Y$ can be written as a convex combination of $\{x_1 P, x_2 P, \dots, x_k P\}$. It is straightforward to see that any convex combination of these vectors is also in $Y$. Hence $Y$ is the set of convex combinations of a finite number of points, i.e., $Y = \mathrm{conv}(\{x_1 P, \dots, x_k P\})$. Thus $Y$ is a polytope and hence a polyhedron; that is, there exist a matrix $D$ with $n$ rows and a vector $r$ such that $Y = \{y \in \mathbb{R}^n : yD \leq r\}$, and the result follows. $\square$

So now we know $Y = \{y \in \mathbb{R}^n : yD \leq r\}$ for some $D$ and $r$. Let $S = \{j : y^* \cdot d_j = r_j\}$, where $d_j$ is the $j$-th column of $D$. We show $S \neq \emptyset$. Suppose not. Then $y^* \cdot d_j < r_j$ for all $j$. Let $w$ be the vector obtained from $y^*$ by adding $\epsilon > 0$ to the first component of $y^*$. Then $w \cdot d_j = y^* \cdot d_j + \epsilon d_{1j}$. The assumption $S = \emptyset$ allows us to choose $\epsilon$ sufficiently small so that $y^* \cdot d_j + \epsilon d_{1j} \leq r_j$ for all $j$. Thus $w \in Y$ and $w \geq y^*$ with $w \neq y^*$, contradicting the efficiency of $y^*$. Thus $S \neq \emptyset$.

Consider now the system $\{z \cdot d_j \leq 0\}_{j \in S}$. We claim that it has no non-trivial non-negative solution $z \in \mathbb{R}^n$. If there were one, there would be an $\epsilon > 0$ sufficiently small such that $(y^* + \epsilon z) \cdot d_j \leq r_j$ for all $j$ (for $j \in S$ because $z \cdot d_j \leq 0$, and for $j \notin S$ because of the strict slack), implying $y^* + \epsilon z \in Y$ and contradicting the efficiency of $y^*$.
Since the system $\{z \cdot d_j \leq 0\}_{j \in S}$ does not admit a non-trivial non-negative solution, we have, by a version of the Farkas Lemma, non-negative numbers $\{\lambda_j\}_{j \in S}$ such that $\sum_{j \in S} \lambda_j d_j > 0$. Setting $p = \sum_{j \in S} \lambda_j d_j$ completes the proof of Theorem 8, since $y^* \cdot p \geq y \cdot p$ for all $y \in Y$: note that
$$y \cdot \sum_{j \in S} \lambda_j d_j = \sum_{j \in S} \lambda_j\, y \cdot d_j \leq \sum_{j \in S} \lambda_j r_j = \sum_{j \in S} \lambda_j\, y^* \cdot d_j = y^* \cdot p. \qquad \square$$

See Propositions 5.F.1 and 5.F.2 in Mas-Colell et al., Microeconomic Theory, pages 150-151, for a similar result. Theorem 8 is a simpler version of the first and second welfare theorems.
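The easy direction of Theorem 8 can be tried out numerically: an output that maximizes revenue $p \cdot y$ over $Y$ should be undominated. A sketch with illustrative data ($P$, $C$, $b$ and the price vector $p$ below are invented for the example; `linprog` solves the revenue-maximization LP over inputs):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative production data: 2 inputs, 2 outputs, 1 resource.
P = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # p_ij: units of output j per unit of input i
C = np.array([[1.0], [1.0]])      # c_ij: units of resource j per unit of input i
b = np.array([10.0])              # resource capacity
p = np.array([1.0, 1.0])          # a positive price vector

# max p.(xP) subject to xC <= b, x >= 0  (linprog minimizes, so negate).
res = linprog(c=-(P @ p), A_ub=C.T, b_ub=b, bounds=[(0, None)] * 2)
x_star = res.x
y_star = x_star @ P               # revenue-maximizing output
assert res.success

# No randomly sampled feasible output dominates y*.
rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.random(2)
    x *= b[0] / max(x @ C[:, 0], 1e-12)   # scale onto the resource constraint
    y = x @ P
    assert not (np.all(y >= y_star) and np.any(y > y_star + 1e-9))
print("revenue-maximizing output y* =", y_star)
```

The random sampling is only a spot check, not a proof of efficiency; the hard direction of Theorem 8 is what guarantees that some supporting price vector exists for every efficient output.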