Recursive Estimation
Raffaello D'Andrea
Spring 2

Problem Set: Probability Review

Last updated: February 28, 2

Notes:

Notation: Unless otherwise noted, x, y, and z denote random variables, f_x(x) (or the shorthand f(x)) denotes the probability density function of x, and f_{x|y}(x|y) (or f(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs.

Please report any errors found in this problem set to the teaching assistants (strimpe@ethz.ch or aschoellig@ethz.ch).
Problem Set

Problem 1
Prove f(x|y) = f(x) ⇔ f(y|x) = f(y).

Problem 2
Prove f(x|y, z) = f(x|z) ⇔ f(x, y|z) = f(x|z) f(y|z).

Problem 3
Come up with an example where f(x, y|z) = f(x|z) f(y|z) but f(x, y) ≠ f(x) f(y) for continuous random variables x, y, and z.

Problem 4
Come up with an example where f(x, y|z) = f(x|z) f(y|z) but f(x, y) ≠ f(x) f(y) for discrete random variables x, y, and z.

Problem 5
Let x and y be binary random variables: x, y ∈ {0, 1}. Prove that
f_xy(x, y) ≥ f_x(x) + f_y(y) − 1,
which is called Bonferroni's inequality.

Problem 6
Let y be a scalar continuous random variable with probability density function f_y(y), and g be a function transforming y to a new variable x by x = g(y). Derive the change of variables formula, i.e. the equation for the probability density function f_x(x), under the assumption that g is continuously differentiable with dg(y)/dy < 0 for all y.

Problem 7
Let x ∈ X be a scalar valued continuous random variable with probability density function f_x(x). Prove the law of the unconscious statistician, that is, for a function g(·) show that the expected value of y = g(x) is given by
E[y] = ∫_X g(x) f_x(x) dx.
You may consider the special case where g(·) is a continuously differentiable, strictly monotonically increasing function.

Problem 8
Prove E[ax + b] = a E[x] + b, where a and b are constants.

Problem 9
Let x and y be scalar random variables. Prove that, if x and y are independent, then for any functions g(·) and h(·),
E[g(x) h(y)] = E[g(x)] E[h(y)].
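Identities like the one in Problem 9 can be spot-checked numerically before proving them. The following minimal Python sketch (the distributions and test functions g, h are chosen arbitrarily for illustration) estimates both sides of the identity by Monte Carlo; it is a sanity check, not a proof:

```python
import math
import random

# Spot check of Problem 9: for independent x and y,
# E[g(x)h(y)] should equal E[g(x)] E[h(y)].
random.seed(0)
N = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]    # x ~ N(0,1)
ys = [random.uniform(0.0, 1.0) for _ in range(N)]  # y ~ U[0,1], drawn independently

g = lambda x: math.sin(x)  # arbitrary test functions
h = lambda y: y * y

lhs = sum(g(x) * h(y) for x, y in zip(xs, ys)) / N   # estimate of E[g(x)h(y)]
rhs = (sum(map(g, xs)) / N) * (sum(map(h, ys)) / N)  # estimate of E[g(x)] E[h(y)]
print(abs(lhs - rhs))  # small (Monte Carlo error on the order of 1/sqrt(N))
```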
Problem 10
From above, it follows that if x and y are independent, then E[xy] = E[x] E[y]. Is the converse true, that is, if E[xy] = E[x] E[y], does it imply that x and y are independent? If yes, prove it. If no, find a counterexample.

Problem 11
Let x ∈ X and y ∈ Y be independent, scalar valued discrete random variables. Let z = x + y. Prove that
f_z(z) = Σ_{y∈Y} f_x(z − y) f_y(y) = Σ_{x∈X} f_x(x) f_y(z − x),
that is, the probability density function of z is the convolution of the individual probability density functions.

Problem 12
Let x be a scalar continuous random variable that takes on only non-negative values. Prove that, for a > 0,
Pr(x ≥ a) ≤ E[x]/a,
which is called Markov's inequality.

Problem 13
Let x be a scalar continuous random variable. Let x̄ = E[x], σ² = Var[x]. Prove that for any k > 0,
Pr(|x − x̄| ≥ k) ≤ σ²/k²,
which is called Chebyshev's inequality.

Problem 14
Let x_1, x_2, ..., x_n be independent random variables, each having a uniform distribution over [0, 1]. Let the random variable m be defined as the maximum of x_1, ..., x_n, that is, m = max{x_1, x_2, ..., x_n}. Show that the cumulative distribution function of m, F_m(·), is given by
F_m(m) = m^n,  0 ≤ m ≤ 1.
What is the probability density function of m?

Problem 15
Prove that E[x²] ≥ (E[x])². When does the statement hold with equality?

Problem 16
Suppose that x and y are independent scalar continuous random variables. Show that
Pr(x ≤ y) = ∫_{−∞}^{∞} F_x(y) f_y(y) dy.
Problem 17
Use Chebyshev's inequality to prove the weak law of large numbers. Namely, if x_1, x_2, ... are independent and identically distributed with mean µ and variance σ² then, for any ε > 0,
Pr(|(x_1 + x_2 + ··· + x_n)/n − µ| > ε) → 0 as n → ∞.

Problem 18
Suppose that x is a random variable with mean 10 and variance 15. What can we say about Pr(5 < x < 15)?

Problem 19
Let x and y be independent random variables with means µ_x and µ_y and variances σ_x² and σ_y², respectively. Show that
Var[xy] = σ_x² σ_y² + µ_y² σ_x² + µ_x² σ_y².

Problem 20
a) Use the method presented in class to obtain a sample x of the exponential distribution
f̂_x(x) = λ e^{−λx} for x ≥ 0, and 0 for x < 0,
from a given sample u of a uniform distribution.
b) Exponential random variables are used to model failures of components that do not wear, for example solid state electronics. This is based on their property
Pr(x > s + t | x > t) = Pr(x > s) for all s, t ≥ 0.
A random variable with this property is said to be memoryless. Prove that a random variable with an exponential distribution satisfies this property.
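Several of the claims above also lend themselves to quick numerical spot checks. As one illustration (parameters n, N, and t chosen arbitrarily), a minimal Monte Carlo estimate of the CDF claimed in Problem 14:

```python
import random

# Spot check of Problem 14: for n independent U[0,1] random variables,
# F_m(t) = Pr(max <= t) should equal t**n on [0, 1].
random.seed(1)
n, N, t = 5, 100_000, 0.8
hits = sum(
    max(random.random() for _ in range(n)) <= t
    for _ in range(N)
)
estimate = hits / N  # empirical Pr(m <= t)
exact = t ** n       # claimed CDF value: 0.8**5 = 0.32768
print(estimate, exact)
```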
Sample solutions

Problem 1
From the definition of conditional probability, we have f(x, y) = f(x|y) f(y) and f(x, y) = f(y|x) f(x), and therefore
f(x|y) f(y) = f(y|x) f(x).   (1)
We can assume that f(x) ≠ 0 and f(y) ≠ 0; otherwise f(y|x) and f(x|y), respectively, are not defined, and the statement under consideration does not make sense. We will check both directions of the equivalence statement:
f(x|y) = f(x):  by (1), f(x) f(y) = f(y|x) f(x), and dividing by f(x) ≠ 0 gives f(y) = f(y|x).
f(y|x) = f(y):  by (1), f(x|y) f(y) = f(y) f(x), and dividing by f(y) ≠ 0 gives f(x) = f(x|y).
Therefore, f(x|y) = f(x) ⇔ f(y|x) = f(y).

Problem 2
Using
f(x, y|z) = f(x|y, z) f(y|z)   (2)
both directions of the equivalence statement can be shown as follows:
f(x|y, z) = f(x|z):  by (2), f(x, y|z) = f(x|y, z) f(y|z) = f(x|z) f(y|z).
f(x, y|z) = f(x|z) f(y|z):  by (2), f(x|y, z) f(y|z) = f(x|z) f(y|z), and therefore f(x|y, z) = f(x|z), since f(y|z) ≠ 0 (otherwise f(x|y, z) is not defined).

Problem 3
Let x, y, z ∈ [0, 1] be continuous random variables. Consider the joint pdf
f(x, y, z) = c g_x(x, z) g_y(y, z),
where g_x and g_y are functions (not necessarily pdfs) and c is a constant such that
∫∫∫ f(x, y, z) dx dy dz = 1.
Then
f(x, y|z) = f(x, y, z)/f(z) = c g_x(x, z) g_y(y, z) / (c G_x(z) G_y(z)),
with
G_x(z) = ∫_0^1 g_x(x, z) dx,  G_y(z) = ∫_0^1 g_y(y, z) dy,
since
f(z) = ∫∫ c g_x(x, z) g_y(y, z) dx dy = c G_x(z) ∫_0^1 g_y(y, z) dy = c G_x(z) G_y(z).
Furthermore,
f(x|z) = f(x, z)/f(z) = (∫ f(x, y, z) dy)/f(z) = c g_x(x, z) G_y(z) / (c G_x(z) G_y(z)) = g_x(x, z)/G_x(z)
and, by analogy, f(y|z) = g_y(y, z)/G_y(z). Continuing the above,
f(x, y|z) = g_x(x, z) g_y(y, z) / (G_x(z) G_y(z)) = f(x|z) f(y|z),
that is, x and y are conditionally independent.
Now, let g_x(x, z) = x + z and g_y(y, z) = y + z. Then,
f(x, y) = ∫_0^1 f(x, y, z) dz = c ∫_0^1 (x + z)(y + z) dz = c ∫_0^1 (z² + (x + y)z + xy) dz
= c [ (1/3)z³ + (1/2)(x + y)z² + xyz ]_{z=0}^{z=1}
= c (1/3 + (1/2)(x + y) + xy)
f(y) = ∫_0^1 f(x, y) dx = c (1/3 + 1/4 + (1/2)y + (1/2)y) = c (y + 7/12)
f(x) = c (x + 7/12) (by symmetry).
Therefore,
f(x) f(y) = c² (49/144 + (7/12)(x + y) + xy) ≠ f(x, y),
that is, x and y are not independent.

Problem 4
As in Problem 3, let f(x, y, z) = c g_x(x, z) g_y(y, z). This ensures that f(x, y|z) = f(x|z) f(y|z). Let x, y, z ∈ {0, 1}.
Define g_x and g_y by value tables:

x z g_x(x, z)
0 0 1
0 1 0
1 0 0
1 1 1

y z g_y(y, z)
0 0 1
0 1 0
1 0 0
1 1 1

x y z f(x, y, z) = (1/2) g_x(x, z) g_y(y, z)
0 0 0 0.5
1 1 1 0.5
(all other combinations) 0

Compute f(x, y), f(x), f(y):

x y f(x, y)
0 0 0.5
0 1 0
1 0 0
1 1 0.5

x f(x)
0 0.5
1 0.5

y f(y)
0 0.5
1 0.5

x y f(x) f(y)
0 0 0.25
0 1 0.25
1 0 0.25
1 1 0.25

From this we see f(x, y) ≠ f(x) f(y).

Remark: For verification, one may also compute f(x, y|z), f(x|z), f(y|z) in order to verify f(x, y|z) = f(x|z) f(y|z), although this holds by construction of f(x, y, z) and the derivation in Problem 3.

Problem 5
For all x, y ∈ {0, 1} we have, by marginalization,
f_x(x) = Σ_{y∈{0,1}} f_xy(x, y) = f_xy(x, 0) + f_xy(x, 1) = f_xy(x, y) + f_xy(x, 1−y)
f_y(y) = Σ_{x∈{0,1}} f_xy(x, y) = f_xy(x, y) + f_xy(1−x, y).
Therefore,
f_x(x) + f_y(y) = 2 f_xy(x, y) + f_xy(x, 1−y) + f_xy(1−x, y)
= 2 f_xy(x, y) + f_xy(x, 1−y) + f_xy(1−x, y) + f_xy(1−x, 1−y) − f_xy(1−x, 1−y)
= f_xy(x, y) + [f_xy(x, y) + f_xy(x, 1−y) + f_xy(1−x, y) + f_xy(1−x, 1−y)] − f_xy(1−x, 1−y),
where the bracketed sum equals Σ_{x∈{0,1}} Σ_{y∈{0,1}} f_xy(x, y) = 1. Hence
f_x(x) + f_y(y) = 1 + f_xy(x, y) − f_xy(1−x, 1−y) ≤ 1 + f_xy(x, y),
since f_xy(1−x, 1−y) ≥ 0, and therefore
f_xy(x, y) ≥ f_x(x) + f_y(y) − 1.
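The inequality just proved can also be checked numerically. The sketch below draws random joint pmfs on {0, 1} × {0, 1} and verifies Bonferroni's inequality on each one (a spot check on sampled pmfs, not a proof):

```python
import random

# Spot check of Bonferroni's inequality (Problem 5):
# for binary x, y: f_xy(x, y) >= f_x(x) + f_y(y) - 1 for every (x, y).
random.seed(2)
for _ in range(1000):
    # draw a random joint pmf on {0,1} x {0,1}
    w = [random.random() for _ in range(4)]
    s = sum(w)
    p = {(x, y): w[2 * x + y] / s for x in (0, 1) for y in (0, 1)}
    fx = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}  # marginal of x
    fy = {y: p[(0, y)] + p[(1, y)] for y in (0, 1)}  # marginal of y
    for x in (0, 1):
        for y in (0, 1):
            assert p[(x, y)] >= fx[x] + fy[y] - 1 - 1e-12
ok = True
print("Bonferroni's inequality holds on all sampled pmfs")
```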
Problem 6
Consider the probability that y is in a small interval [ȳ, ȳ + Δy] (with Δy > 0 small),
Pr(y ∈ [ȳ, ȳ + Δy]) = ∫_ȳ^{ȳ+Δy} f_y(y) dy,
which approaches f_y(ȳ) Δy in the limit as Δy → 0. Let x̄ := g(ȳ). For small Δy, we have by a Taylor series approximation,
g(ȳ + Δy) ≈ g(ȳ) + (dg/dy)(ȳ) Δy = x̄ + Δx,  with Δx := (dg/dy)(ȳ) Δy.
Since g(y) is decreasing, Δx < 0, see Figure 1.

[Figure 1: Decreasing function g(y) on a small interval [ȳ, ȳ + Δy].]

We are interested in the probability of x ∈ [x̄ + Δx, x̄]. For small Δx, this probability is equal to the probability that y ∈ [ȳ, ȳ + Δy] (since y ∈ [ȳ, ȳ + Δy] ⇔ g(y) ∈ [g(ȳ + Δy), g(ȳ)] ⇔ x ∈ [x̄ + Δx, x̄] for small Δy and Δx). Therefore, for small Δy (and thus small Δx),
Pr(y ∈ [ȳ, ȳ + Δy]) = f_y(ȳ) Δy = Pr(x ∈ [x̄ + Δx, x̄]) = f_x(x̄) |Δx|
⇒ f_x(x̄) = f_y(ȳ) Δy/|Δx|.
As Δy → 0 (and thus Δx → 0), Δy/|Δx| → 1/|dg/dy(ȳ)| (note |Δx| = −Δx here, since dg/dy < 0), and therefore
f_x(x̄) = f_y(ȳ)/|dg/dy(ȳ)|,  or  f_x(x) = f_y(y)/|dg/dy(y)| with y such that x = g(y).

Problem 7
From the change of variables formula (see lecture), we have
f_y(y) = f_x(x)/((dg/dx)(x)),
since g is continuously differentiable and strictly monotonically increasing (so (dg/dx)(x) > 0). Furthermore, y = g(x) implies dy = (dg/dx)(x) dx. Let Y = g(X), that is, Y = {y | there exists x ∈ X such that y = g(x)}. Then
E[y] = ∫_Y y f_y(y) dy = ∫_X g(x) (f_x(x)/((dg/dx)(x))) (dg/dx)(x) dx = ∫_X g(x) f_x(x) dx.

Problem 8
Using the law of the unconscious statistician (Problem 7), we have for a discrete random variable x,
E[ax + b] = Σ_{x∈X} (ax + b) f_x(x) = a Σ_{x∈X} x f_x(x) + b Σ_{x∈X} f_x(x) = a E[x] + b,
since Σ_{x∈X} x f_x(x) = E[x] and Σ_{x∈X} f_x(x) = 1, and for a continuous random variable x,
E[ax + b] = ∫ (ax + b) f_x(x) dx = a ∫ x f_x(x) dx + b ∫ f_x(x) dx = a E[x] + b,
since ∫ x f_x(x) dx = E[x] and ∫ f_x(x) dx = 1.

Problem 9
Using the law of the unconscious statistician (Problem 7) applied to the vector random variable [x, y]^T with pdf f_xy(x, y), we have
E[g(x) h(y)] = ∫∫ g(x) h(y) f_xy(x, y) dx dy
= ∫∫ g(x) h(y) f_x(x) f_y(y) dx dy   (by independence of x, y)
= ∫ g(x) f_x(x) dx · ∫ h(y) f_y(y) dy
= E[g(x)] E[h(y)].
And similarly for discrete random variables.

Problem 10
We will construct a counterexample. Let x be a discrete random variable that takes values −1, 0, and 1 with probabilities 1/4, 1/2, and 1/4, respectively.
Let w be a discrete random variable taking the values −1, 1 with equal probability. Let x and w be independent. Let y be a discrete random variable defined by y = xw (it follows that y can take values −1, 0, 1). By construction, it is clear that y depends on x. We will now show that y and x are not independent, but E[xy] = E[x] E[y].

PDF f_xy(x, y):

x  y  f_xy(x, y)
−1 −1 1/8   (x = −1 (prob 1/4) and w = +1 (prob 1/2))
−1  0 0     (impossible since w ≠ 0)
−1  1 1/8   (x = −1 (prob 1/4) and w = −1 (prob 1/2))
 0 −1 0     (impossible)
 0  0 1/2   (w either ±1, x = 0)
 0  1 0     (impossible)
 1 −1 1/8
 1  0 0
 1  1 1/8

Marginalization f_x(x), f_y(y):

x  f(x)
−1 0.25
 0 0.5
 1 0.25

y  f(y)
−1 0.25
 0 0.5
 1 0.25

It follows that f_xy(x, y) = f_x(x) f_y(y) does not hold for all x, y. For example, f_xy(−1, 0) = 0 ≠ f_x(−1) f_y(0) = 1/8.

Expected values:
E[xy] = Σ_x Σ_y x y f_xy(x, y) = (−1)(−1)(1/8) + (−1)(1)(1/8) + (0)(0)(1/2) + (1)(−1)(1/8) + (1)(1)(1/8) = 0
E[x] = Σ_x x f_x(x) = −1/4 + 1/4 = 0
E[y] = 0.
So E[xy] = E[x] E[y], but f_xy(x, y) = f_x(x) f_y(y) does not hold for all x, y. Hence, we have shown by counterexample that E[xy] = E[x] E[y] does not imply independence of x and y.

Problem 11
From the definition of the probability density function for discrete random variables,
f_z(z̄) = Pr(z = z̄).
The probability that z takes on z̄ is given by the sum of the probabilities of all possibilities of x and y such that z̄ = x + y, that is, all ȳ ∈ Y such that x = z̄ − ȳ and y = ȳ (or, alternatively, all x̄ ∈ X such that x = x̄ and y = z̄ − x̄). That is,
f_z(z̄) = Pr(z = z̄) = Σ_{ȳ∈Y} Pr(x = z̄ − ȳ) Pr(y = ȳ)   (by independence assumption)
= Σ_{ȳ∈Y} f_x(z̄ − ȳ) f_y(ȳ).
Rewriting the last equation yields f_z(z) = Σ_{y∈Y} f_x(z − y) f_y(y). Similarly, f_z(z) = Σ_{x∈X} f_x(x) f_y(z − x) may be obtained.

Problem 12
E[x] = ∫_0^∞ x f(x) dx = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx   (the first term is non-negative)
≥ ∫_a^∞ a f(x) dx = a ∫_a^∞ f(x) dx = a Pr(x ≥ a).

Problem 13
This inequality can be obtained using Markov's inequality (Problem 12): (x − x̄)² is a non-negative random variable; apply Markov's inequality with a = k² (k > 0):
Pr((x − x̄)² ≥ k²) ≤ E[(x − x̄)²]/k² = Var[x]/k² = σ²/k².
Since (x − x̄)² ≥ k² ⇔ |x − x̄| ≥ k, the result follows:
Pr(|x − x̄| ≥ k) ≤ σ²/k².

Problem 14
For 0 ≤ t ≤ 1, we write
F_m(t) = Pr(m ∈ [0, t]) = Pr(max{x_1, ..., x_n} ∈ [0, t])
= Pr(x_i ∈ [0, t] for all i ∈ {1, ..., n})
= Pr(x_1 ∈ [0, t]) Pr(x_2 ∈ [0, t]) ··· Pr(x_n ∈ [0, t])   (by independence)
= t^n.
By substituting m for t, we obtain the desired result. Since
F_m(m) = ∫_{−∞}^m f_m(τ) dτ,
the pdf of m is obtained by differentiating:
f_m(m) = (dF_m/dm)(m) = d/dm (m^n) = n m^{n−1}.

Problem 15
Let x̄ = E[x]. Then
Var[x] = ∫ (x − x̄)² f(x) dx = ∫ x² f(x) dx − 2x̄ ∫ x f(x) dx + x̄² ∫ f(x) dx
= E[x²] − 2x̄² + x̄² = E[x²] − x̄²,
using ∫ x f(x) dx = x̄ and ∫ f(x) dx = 1. Since Var[x] ≥ 0, we have
E[x²] = x̄² + Var[x] ≥ x̄²,
with equality if Var[x] = 0.

Problem 16
For fixed ȳ, Pr(x ≤ ȳ) = F_x(ȳ). The probability that y ∈ [ȳ, ȳ + Δy] is
Pr(y ∈ [ȳ, ȳ + Δy]) = ∫_ȳ^{ȳ+Δy} f_y(y) dy ≈ f_y(ȳ) Δy for small Δy.
Therefore, the probability that x ≤ ȳ and y ∈ [ȳ, ȳ + Δy] is
Pr(x ≤ ȳ and y ∈ [ȳ, ȳ + Δy]) = Pr(x ≤ ȳ) Pr(y ∈ [ȳ, ȳ + Δy]) ≈ F_x(ȳ) f_y(ȳ) Δy   (3)
by independence of x and y. Now, since we are interested in Pr(x ≤ y), i.e. in the probability of x being less than any y and not just a fixed ȳ, we can sum (3) over all possible intervals [ȳ_i, ȳ_i + Δy]:
Pr(there exists ȳ_i such that x ≤ ȳ_i) = lim_{N→∞} Σ_{i=−N}^{N} F_x(ȳ_i) f_y(ȳ_i) Δy.
By letting Δy → 0, we obtain
Pr(x ≤ y) = ∫_{−∞}^{∞} F_x(y) f_y(y) dy.
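This result can be spot-checked by simulation with a concrete (illustrative) pair of distributions: for x ~ Exp(1) and y ~ U[0,1] independent, the formula gives Pr(x ≤ y) = ∫_0^1 (1 − e^{−y}) dy = 1/e. A minimal Monte Carlo sketch:

```python
import math
import random

# Spot check of Problem 16 with x ~ Exp(1), y ~ U[0,1], independent.
# The formula predicts Pr(x <= y) = 1/e ≈ 0.3679.
random.seed(3)
N = 200_000
count = 0
for _ in range(N):
    x = -math.log(1.0 - random.random())  # Exp(1) via the inverse transform
    y = random.random()                   # U[0,1]
    if x <= y:
        count += 1
estimate = count / N
exact = math.exp(-1.0)
print(estimate, exact)
```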
Problem 17
Consider the random variable (x_1 + x_2 + ··· + x_n)/n. It has the mean
E[(x_1 + x_2 + ··· + x_n)/n] = (1/n)(E[x_1] + ··· + E[x_n]) = (1/n)(n µ) = µ
and variance
Var[(x_1 + x_2 + ··· + x_n)/n] = E[((x_1 + x_2 + ··· + x_n)/n − µ)²]
= (1/n²) E[(x_1 − µ)² + ··· + (x_n − µ)² + cross terms].
The expected value of the cross terms is zero, for example,
E[(x_i − µ)(x_j − µ)] = E[x_i − µ] E[x_j − µ] for i ≠ j, by independence,
= 0.
Hence,
Var[(x_1 + x_2 + ··· + x_n)/n] = (1/n²)(Var[x_1] + ··· + Var[x_n]) = n σ²/n² = σ²/n.
Applying Chebyshev's inequality with ε > 0 gives
Pr(|(x_1 + x_2 + ··· + x_n)/n − µ| ≥ ε) ≤ σ²/(n ε²) → 0 as n → ∞
⇒ Pr(|(x_1 + x_2 + ··· + x_n)/n − µ| > ε) → 0 as n → ∞.
Remark: Changing ≥ to > is valid here, since
Pr(|(x_1 + ··· + x_n)/n − µ| ≥ ε) = Pr(|(x_1 + ··· + x_n)/n − µ| > ε) + Pr(|(x_1 + ··· + x_n)/n − µ| = ε),
and the last term is 0 since (x_1 + ··· + x_n)/n is a continuous random variable.

Problem 18
First, we rewrite
Pr(5 < x < 15) = Pr(|x − 10| < 5) = 1 − Pr(|x − 10| ≥ 5).
Using Chebyshev's inequality with x̄ = 10, σ² = 15, k = 5 yields
Pr(|x − 10| ≥ 5) ≤ 15/5² = 15/25 = 3/5.
Therefore, we can say that
Pr(5 < x < 15) ≥ 1 − 3/5 = 2/5.
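The weak law of large numbers can also be illustrated by simulation (an illustration, not a proof; the distribution U[0,1] and the tolerance ε = 0.05 below are arbitrary choices):

```python
import random

# Illustration of the weak law of large numbers (Problem 17):
# the fraction of sample means farther than eps from mu shrinks as n grows.
random.seed(4)
mu, eps, trials = 0.5, 0.05, 2000  # x_i ~ U[0,1] has mean 0.5

def freq_far(n):
    """Empirical Pr(|mean of n U[0,1] samples - mu| > eps)."""
    far = 0
    for _ in range(trials):
        m = sum(random.random() for _ in range(n)) / n
        if abs(m - mu) > eps:
            far += 1
    return far / trials

p_small = freq_far(10)
p_large = freq_far(500)
print(p_small, p_large)  # the second is much smaller
```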
Problem 19
First, we compute the mean of xy: E[xy] = E[x] E[y] = µ_x µ_y. Hence, by using independence of x and y and the result of Problem 7,
Var[xy] = E[(xy)²] − (µ_x µ_y)² = E[x²] E[y²] − µ_x² µ_y².
Using E[x²] = σ_x² + µ_x² and E[y²] = σ_y² + µ_y², we obtain the result
Var[xy] = (σ_x² + µ_x²)(σ_y² + µ_y²) − µ_x² µ_y² = σ_x² σ_y² + µ_y² σ_x² + µ_x² σ_y².

Problem 20
a) The cumulative distribution function is
F̂_x(x) = ∫_0^x λ e^{−λt} dt = [−e^{−λt}]_{t=0}^{x} = 1 − e^{−λx}.
Let u be a sample from a uniform distribution over [0, 1). According to the method presented in class, we obtain a sample x by solving u = F̂_x(x) for x, that is,
u = 1 − e^{−λx}  ⇒  e^{−λx} = 1 − u  ⇒  −λx = ln(1 − u)  ⇒  x = −ln(1 − u)/λ.
Notice the following properties: u = 0 ⇒ x = 0, u → 1 ⇒ x → +∞.

b) For s, t ≥ 0,
Pr(x > s + t | x > t) = Pr(x > s + t and x > t)/Pr(x > t) = Pr(x > s + t)/Pr(x > t)
= (∫_{s+t}^∞ λ e^{−λx} dx)/(∫_t^∞ λ e^{−λx} dx) = ([−e^{−λx}]_{s+t}^∞)/([−e^{−λx}]_t^∞) = e^{−λ(s+t)}/e^{−λt}
= e^{−λs} = ∫_s^∞ λ e^{−λx} dx = Pr(x > s).
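Part a) translates directly into code. The sketch below samples an exponential via x = −ln(1 − u)/λ and also looks at the memoryless property from part b) empirically (the rate λ = 2 and the probe points s = t = 0.3 are arbitrary choices for illustration):

```python
import math
import random

# Inverse-transform sampling of Exp(lambda) (Problem 20 a),
# plus an empirical check of the memoryless property (Problem 20 b).
random.seed(5)
lam = 2.0
N = 400_000

# x = -ln(1 - u)/lambda for u ~ U[0,1)
samples = [-math.log(1.0 - random.random()) / lam for _ in range(N)]

# the empirical mean should approach 1/lambda = 0.5
mean = sum(samples) / N

# memorylessness: Pr(x > s + t | x > t) vs Pr(x > s), with s = t = 0.3
s = t = 0.3
tail_t = [x for x in samples if x > t]
p_cond = sum(x > s + t for x in tail_t) / len(tail_t)
p_uncond = sum(x > s for x in samples) / N
print(mean, p_cond, p_uncond)  # p_cond ≈ p_uncond ≈ exp(-lam*s)
```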