Approximating the surface volume of convex bodies

Hariharan Narayanan
Advisor: Partha Niyogi
Department of Computer Science, University of Chicago
Surface volumes

A natural measure of the quality of a cut in binary classification is its volume. If the data points are drawn from an open set in $\mathbb{R}^d$, the cut is the boundary of an open set, and computing its volume is of interest.
Complexity of surface volumes

The following is known about the hardness of approximating the volume of a convex set deterministically.

Theorem 1 (Bárány-Füredi). There is no polynomial-time deterministic algorithm that computes a lower bound $\underline{\mathrm{vol}}(K)$ and an upper bound $\overline{\mathrm{vol}}(K)$ such that

$$\frac{\overline{\mathrm{vol}}(K)}{\underline{\mathrm{vol}}(K)} \le \left(\frac{c\,n}{\log n}\right)^{n}.$$
Complexity of surface volumes

Let $K$ be a convex body and $C(K)$ be the cylinder over it of height $h$. Then

$$\mathrm{vol}\,K = \frac{\mathrm{vol}\,\partial C(K) - h\,\mathrm{vol}\,\partial K}{2}.$$

For small $h$, $\mathrm{vol}\,\partial C(K)/2$ approximates the volume of $K$. So approximating the surface volume is at least as hard as approximating the volume.

Figure 1: A cylinder $C(K)$ of height $h$ over $K$
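As a small sanity check of the cylinder identity, the sketch below takes $K$ to be an $a \times b$ rectangle in the plane, so that $C(K)$ is an $a \times b \times h$ box whose surface area is known in closed form. The rectangle and the function names are illustrative assumptions, not part of the talk.

```python
def box_surface_area(a, b, h):
    """Surface area of an a x b x h box, i.e. the cylinder of height h over an a x b rectangle."""
    return 2 * (a * b + a * h + b * h)


def volume_from_cylinder(surface_of_cylinder, h, surface_of_base):
    """Recover vol K from the surface volume of C(K): (vol dC(K) - h * vol dK) / 2."""
    return (surface_of_cylinder - h * surface_of_base) / 2


a, b = 2.0, 3.0
area = a * b                 # vol K for the rectangle
perimeter = 2 * (a + b)      # vol dK for the rectangle

for h in (1.0, 0.1, 0.001):
    s = box_surface_area(a, b, h)
    exact = volume_from_cylinder(s, h, perimeter)
    crude = s / 2            # drops the h * vol dK term, which vanishes as h -> 0
    print(h, exact, crude)
```

The `crude` column is what a surface-volume oracle alone would give; it differs from `area` by `h * perimeter / 2`, which is why a small `h` is needed.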
Randomized algorithms for volume

The volume of a convex body can be approximated in randomized polynomial time, as shown by Dyer, Frieze and Kannan [2]. Their algorithm took $O(d^{23})$ steps. In a series of papers, this was brought down to the current best, $O^*(d^4)$, by Lovász and Vempala [9].
Randomized algorithms for surface volume

Grötschel, Lovász and Schrijver (1987) mention computing the surface volume of a convex body as an open problem. The first and, to our knowledge, only work on surface volumes of convex bodies is by Dyer, Gritzmann and Hufnagel (1998) [3], who gave a randomized polynomial-time algorithm for this task. Their complexity analysis is only sketched.
Randomized algorithms for surface volume

The time complexity of their algorithm is {time for volume} $\times$ {time for a quadratic program}. This appears to be $O(d^9)$ ($d$ = dimension) with present technology. On the other hand, our running time is (in a restricted setting)

$$O^*\!\left(\frac{d^4}{\epsilon^2} + \frac{d^{3.5} R^3}{r^2\,\tau\,\epsilon^3}\right),$$

where $1/\tau$ is a condition number.
Diffusion and Surface Volume

The heat equation

$$\Delta u = \frac{\partial u}{\partial t}, \qquad u(x, 0) = f(x)$$

has the solution $u(\cdot, t) = f(x) * K_t(x, \cdot)$, where $K_t(x, y)$, the heat kernel, is

$$K_t(x, y) = (4\pi t)^{-d/2}\, e^{-\|x - y\|^2/4t}.$$
Diffusion and Surface Volume

Let $f$ be the function that takes the value 1 on points in $M$ and 0 outside, i.e. $f = \mathbf{1}_M$. Then $u(x, t) = \mathbf{1}_M * K_t(\cdot, x)$. Define

$$F_t(M) = \sqrt{\pi/t}\int_{\mathbb{R}^d \setminus M} u(y, t)\,dy.$$
Diffusion and Surface Volume

In other words, we are assuming that the initial heat content per unit volume of $M$ is 1. The quantity of heat that diffuses across the boundary from time 0 to $t$ is

$$\int_{\mathbb{R}^d \setminus M}\int_M K_t(x, y)\,dx\,dy = \sqrt{t/\pi}\; F_t(M).$$
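A worked special case may help explain the factor $\sqrt{\pi/t}$. Assume, purely for illustration, that $M$ is the half-space $\{x_1 < 0\}$, so that only the component $n$ of the Gaussian step normal to the boundary matters, with $n \sim N(0, 2t)$. The heat crossing a unit area of the boundary by time $t$ is then:

```latex
% Heat crossing a unit area of the hyperplane x_1 = 0 by time t,
% for M = {x_1 < 0} and n ~ N(0, 2t) the normal component of the step.
\begin{align*}
\int_0^\infty \Pr[n > r]\,dr
  &= \mathbb{E}[\max(n, 0)] \\
  &= \int_0^\infty \frac{s}{\sqrt{4\pi t}}\, e^{-s^2/4t}\,ds \\
  &= \frac{2t}{\sqrt{4\pi t}}
   = \sqrt{t/\pi}.
\end{align*}
```

Multiplying by $\sqrt{\pi/t}$ therefore gives exactly 1 per unit of boundary area in the flat case, which is the behaviour one would expect of a quantity meant to approximate the surface volume.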
Diffusion and Surface Volume

We prove that

$$\lim_{t \to 0} F_t(M) = \mathrm{vol}\,\partial M.$$

Theorem 2. If the condition number of $M$ is 1 (that is, $\tau = 1$), then

$$\mathrm{vol}\,\partial M = \left(1 + O\!\left(d^{3/2}\sqrt{t\ln(1/t)}\right)\right) F_t(M) + \left(e\,t\ln(1/t)\right)^{d/2} O(\sqrt{t})\,\mathrm{vol}\,M.$$
Condition Number

The condition number of a submanifold $X$ of $\mathbb{R}^d$ is defined to be $1/\tau$, where $\tau$ is the largest number satisfying the following property: the open normal bundle about $X$ of radius $r$ is embedded in $\mathbb{R}^d$ for every $r < \tau$.
Condition Number

Figure 2: $M$ and two tangent spheres of radius $\tau$
Condition Number

Alternate definition when $X$ is the boundary $\partial M$ of an open set $M$: the condition number is defined to be $1/\tau$, where $\tau$ is the largest number satisfying the following property: for any $r < \tau$, to every point $p$ on $\partial M$ it is possible to draw two tangent spheres $S_1(p, r)$ and $S_2(p, r)$ such that $S_1(p, r) \subseteq M$ and $S_2(p, r) \cap M = \emptyset$.
$\mathrm{vol}\,\partial M$ and $F_t(M)$

Proposition 1. Let the dimension $d \ge 3$, and let $t < e^{-1}$ satisfy

$$\sqrt{t\ln(1/t)} < \frac{\epsilon}{40\sqrt{2}\,d^{3/2}}.$$

Then

$$\left(1 - \frac{2\epsilon}{5}\right) F_t(M) < \mathrm{vol}\,\partial M < \left(1 + \frac{\epsilon}{2}\right) F_t(M).$$
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Let $M$ be a convex body in $\mathbb{R}^d$. Let $O$ be a point inside $M$ with the property that a ball $B(r)$ of radius $r$ and centre $O$ is contained entirely inside $M$, and a concentric ball $B(R)$ of radius $R$ contains $M$.
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Consider the random variable $z$ defined according to the following process.

Definition 1.
1. Choose a random point $x$ from the uniform distribution on the convex body $M$.
2. Add to $x$ a Gaussian random variable $n$ having density function $K_t(0, n) = \dfrac{e^{-\|n\|^2/4t}}{(4\pi t)^{d/2}}$.
3. If $x + n$ is outside $M$, set $z$ to 1, else set $z$ to 0.
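Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Lemma 1. $(\mathrm{vol}\,M)\,\sqrt{\pi/t}\;\mathbb{E}[z] = F_t(M)$.

Proof: By the definition of $z$,

$$\mathbb{E}[z] = \int_{\mathbb{R}^d \setminus M}\int_M K_t(x, y)\,\frac{1}{\mathrm{vol}\,M}\,dx\,dy.$$

The lemma follows, since

$$F_t(M) := \sqrt{\pi/t}\int_{\mathbb{R}^d \setminus M}\int_M K_t(x, y)\,dx\,dy.$$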
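The sampling process and the identity in Lemma 1 can be tried out on a body where exact uniform sampling is easy. The sketch below is an illustration only: it assumes $M$ is the unit ball in $\mathbb{R}^3$ (condition number 1), uses its known volume instead of a volume estimate, and the function names, the value of `t` and the sample count are all hypothetical choices rather than anything prescribed in the talk.

```python
import math
import random


def sample_uniform_ball(d, rng):
    """Uniform point in the unit ball of R^d: Gaussian direction, radius U^(1/d)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    radius = rng.random() ** (1.0 / d)
    return [radius * c / norm for c in g]


def estimate_surface_area(d, t, num_samples, seed=0):
    """Heat-flow estimate vol(M) * sqrt(pi / t) * mean(z) for M the unit ball."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * t)  # K_t(0, .) is N(0, 2t) in each coordinate
    escaped = 0
    for _ in range(num_samples):
        x = sample_uniform_ball(d, rng)
        y = [c + rng.gauss(0.0, sigma) for c in x]
        if sum(c * c for c in y) > 1.0:  # membership test for the unit ball
            escaped += 1
    mean_z = escaped / num_samples
    vol_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return vol_ball * math.sqrt(math.pi / t) * mean_z


if __name__ == "__main__":
    d = 3
    estimate = estimate_surface_area(d, t=1e-3, num_samples=200_000)
    exact = d * math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # 4*pi for d = 3
    print(estimate, exact)
```

For a general convex body the uniform sampler would be replaced by a random walk with a membership oracle, and the exact volume by an estimate, as the later slides describe.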
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Unfortunately, we cannot sample exactly from the uniform distribution. So, consider the random variable $z$ defined according to the following process.

Choose a random point $x$ from the convex body $M$, from some fixed distribution with density $\rho_f$ that is within

$$\sqrt{\frac{t}{\pi}}\;\frac{d\,\epsilon}{10R\,(1 + \epsilon/2)}$$

of the uniform distribution on $M$ in variation distance.
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Add to $x$ a Gaussian random variable $n$ having density function $K_t(0, n) := \dfrac{e^{-\|n\|^2/4t}}{(4\pi t)^{d/2}}$.

If $x + n$ is outside $M$, set $z$ to 1, else set $z$ to 0.
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Lemma 2. Let $z$ be the random variable defined above. Let the dimension $d \ge 3$, and let $t < e^{-1}$ satisfy

$$\sqrt{t\ln(1/t)} < \frac{\epsilon}{40\sqrt{2}\,d^{3/2}}.$$

Then

$$1 - \frac{\epsilon}{10} < \frac{(\mathrm{vol}\,M)\,\sqrt{\pi}\;\mathbb{E}[z]}{\sqrt{t}\;F_t(M)} < 1 + \frac{\epsilon}{10}.$$
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

1. Compute an estimate $\hat{v}(M)$ of the volume of $M$ to within a multiplicative error of $\epsilon/3$ with confidence 7/8.
2. Compute an estimate $\hat{\mathbb{E}}[z]$ of $\mathbb{E}[z]$ with error $\epsilon/3$ and confidence 7/8. Using a form of Hoeffding's inequality, we find that this takes $O\!\left(\dfrac{R\sqrt{d}}{\epsilon^3}\right)$ samples.
3. Output $\sqrt{\pi/t}\;\hat{\mathbb{E}}[z]\;\hat{v}(M)$.
Computing $\mathrm{vol}\,\partial M$ when $M$ is convex

Theorem 3. Let $M$ be a convex body in $\mathbb{R}^d$. Let $O$ be a point in $M$ such that the ball of radius $r$ centered at $O$ is contained in $M$, and the concentric ball of radius $R$ contains it. Let the condition number of $M$ be $1/\tau$. Then it is possible to find the surface area of $M$ in time

$$O^*\!\left(\frac{d^4}{\epsilon^2} + \frac{d^{3.5} R^3}{r^2\,\tau\,\epsilon^3}\right),$$

within an error of $\epsilon$ with probability greater than 3/4.
Upper bound for $\int_M K(x, y)\,dx$

[Figure: construction for the upper bound. Labels: $M$, $H_1$, $B_1$, $O_1$, $A$, $C$, $D_1$, $E_1$, $F_1$, $G_1$; marked lengths $1$, $R$, $R^2$.]
Upper bound

$$\int_M K(x, y)\,dx < \int_{\mathbb{R}^d \setminus B_1} K(x, y)\,dx \le \int_{H_1 \cup (\mathbb{R}^d \setminus B_1)} K(x, y)\,dx \le \int_{H_1} K(x, y)\,dx + \int_{B_1^c} K(x, y)\,dx.$$
Upper bound

Choose $R = \sqrt{2dt\ln(1/t)}$. Let $\epsilon$ be the mass that the Gaussian with density $\dfrac{1}{(4\pi t)^{d/2}}\,e^{-\|x\|^2/4t}$ places outside the ball of radius $R$. Then

$$\epsilon = O\!\left(\left(e\,t\ln(1/t)\right)^{d/2}\right).$$

$B_1$ has radius $> R$, and so

$$\int_M K(x, y)\,dx < \int_{H_1} K(x, y)\,dx + \epsilon.$$
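The order of $\epsilon$ can be checked numerically. The sketch below assumes the radius is $R = \sqrt{2dt\ln(1/t)}$ (the square root is a reading of the slide's formula) and restricts to even $d$, where the tail of the chi-square distribution has a finite closed form. The function names are hypothetical.

```python
import math


def gaussian_mass_outside_ball(d, t, radius):
    """P(|n| > radius) for n ~ N(0, 2t I_d), valid for even d (closed-form chi-square tail)."""
    if d % 2 != 0:
        raise ValueError("closed form used here needs even d")
    half_x = radius * radius / (4.0 * t)  # (|n|^2 / 2t) / 2 evaluated at the threshold
    partial_sum = sum(half_x ** j / math.factorial(j) for j in range(d // 2))
    return math.exp(-half_x) * partial_sum


def tail_bound(d, t):
    """The quantity (e * t * ln(1/t))^(d/2) quoted as the order of epsilon."""
    return (math.e * t * math.log(1.0 / t)) ** (d / 2)


for d in (2, 4, 8):
    for t in (0.1, 0.01, 0.001):
        radius = math.sqrt(2.0 * d * t * math.log(1.0 / t))
        print(d, t, gaussian_mass_outside_ball(d, t, radius), tail_bound(d, t))
```

In each printed row the exact tail mass should sit below the quoted bound, which is what a Chernoff bound on a chi-square variable with $d$ degrees of freedom gives when $t < e^{-1}$.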
Lower bound for $\int_M K(x, y)\,dx$

[Figure: construction for the lower bound. Labels: $M$, $H_2$, $B_2$, $O_2$, $A$, $C$, $D_2$, $E_2$, $F_2$, $G_2$; marked lengths $1$, $R$, $R^2/2$.]
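Lower bound

$$\begin{aligned}
\int_M K(x, y)\,dx &> \int_{B_2} K(x, y)\,dx \quad (\text{since } B_2 \subseteq M) \\
&\ge \int_{H_2 \cap B_2} K(x, y)\,dx \\
&= \int_{H_2} K(x, y)\,dx - \int_{H_2 \cap B_2^c} K(x, y)\,dx \\
&\ge \int_{H_2} K(x, y)\,dx - \int_{B_2^c} K(x, y)\,dx.
\end{aligned}$$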
Bounds for $F_t(M)$

[Figure: a point $y$ at distance $r$ from a half-space $H$.]

Let $h(r) := \int_H K(x, y)\,dx$.
Bounds for $F_t(M)$

Then, we have shown that
1. $h(r + R^2/2) - \epsilon < \int_M K(x, y)\,dx$.
2. If $r > R^2$, $h(r - R^2/2) + \epsilon > \int_M K(x, y)\,dx$.
Bounds for $F_t(M)$

Definition 2. Let $[\partial M]_r$ denote the set of points at a distance of $r$ to the manifold $\partial M$. Let $\pi_r$ be the map from $[\partial M]_r$ to $\partial M$ that takes a point $P$ on $[\partial M]_r$ to the foot of the perpendicular from $P$ to $\partial M$.

Lemma 3. Let $y \in [\partial M]_r$. Let the Jacobian of a map $f$ be denoted by $Df$. Then

$$(1 - r)^{d-1} \le |D\pi_r(y)| \le (1 + r)^{d-1}.$$
Bounds for $F_t(M)$

Figure 4: $M$ and two tangent spheres of radius $\tau$ (labelled points $P$, $Q$)
Bounds for $F_t(M)$

Lemma 4.

$$\int_{\mathbb{R}^d \setminus M_R}\int_M K(x, y)\,dx\,dy \le \epsilon\,\mathrm{vol}\,M.$$

Lemma 5.

$$\frac{1 - e^{-\alpha^2/4t}}{\sqrt{\pi/t}} \le \int_0^\alpha h(r)\,dr \le \frac{1}{\sqrt{\pi/t}}.$$
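The half-space integral in Lemma 5 can be checked numerically. The sketch below assumes that $h(r)$ is the mass the kernel $K_t(\cdot, y)$ places on a half-space at distance $r$ from $y$, which for this kernel equals $\tfrac{1}{2}\,\mathrm{erfc}\!\left(r / (2\sqrt{t})\right)$, and it reads the lemma as saying that $\sqrt{\pi/t}\int_0^\alpha h(r)\,dr$ lies between $1 - e^{-\alpha^2/4t}$ and 1. The function names, the step count and the values of $t$ and $\alpha$ are illustrative choices.

```python
import math


def h(r, t):
    """Mass that K_t(., y) puts on a half-space at distance r from y: P(N(0, 2t) > r)."""
    return 0.5 * math.erfc(r / (2.0 * math.sqrt(t)))


def integral_of_h(alpha, t, steps=20_000):
    """Trapezoidal approximation of the integral of h(r) over [0, alpha]."""
    dr = alpha / steps
    total = 0.5 * (h(0.0, t) + h(alpha, t))
    total += sum(h(i * dr, t) for i in range(1, steps))
    return total * dr


t = 0.01
for alpha in (0.05, 0.2, 0.5):
    scaled = math.sqrt(math.pi / t) * integral_of_h(alpha, t)
    lower = 1.0 - math.exp(-alpha * alpha / (4.0 * t))
    print(alpha, lower, scaled)
```

The `scaled` column should fall between `lower` and 1, up to the small discretisation error of the trapezoidal rule.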
References

[1] M. Belkin and P. Niyogi (2004). Semi-supervised learning on Riemannian manifolds. Machine Learning 56, Special Issue on Clustering, 209-239.
[2] M. Dyer, A. Frieze and R. Kannan (1991). A random polynomial time algorithm for approximating the volume of convex sets. Journal of the Association for Computing Machinery, 38:1-17.
[3] M. Dyer, P. Gritzmann and A. Hufnagel (1998). On the complexity of computing mixed volumes. SIAM J. Comput., 27(2), 356-400.
[4] M. R. Jerrum, L. G. Valiant and V. V. Vazirani (1986). Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43, 169-188.
[5] E. Levina and P. J. Bickel (2005). Maximum likelihood estimation of intrinsic dimension. In Advances in NIPS 17, Eds. L. K. Saul, Y. Weiss, L. Bottou.
[6] R. M. Karp and M. Luby (1983). Monte-Carlo algorithms for enumeration and reliability problems. Proc. of the 24th IEEE Foundations of Computer Science (FOCS 83), 56-64.
[7] P. Niyogi, S. Weinberger and S. Smale (2004). Finding the homology of submanifolds with high confidence from random samples. Technical Report TR-2004-08, University of Chicago.
[8] L. Lovász and S. Vempala (2004). Hit-and-run from a corner. Proc. of the 36th ACM Symposium on the Theory of Computing, Chicago.
[9] S. Vempala and L. Lovász (2003). Simulated annealing in convex bodies and an $O^*(n^4)$ volume algorithm. Proc. of the 44th IEEE Foundations of Computer Science (FOCS 03), Boston.