Probabilistic multi-resolution scanning for cross-sample differences
1 Probabilistic multi-resolution scanning for cross-sample differences
Li Ma (joint work with Jacopo Soriano)
Duke University
10th Conference on Bayesian Nonparametrics
2 Example: Flow cytometry
[Figure: pairwise scatterplots of flow cytometry channels (FSC-A, FSC-H, SSC-A, Aqua, Dext, CD8) for two cell populations]
Is there any difference between the two cell populations? What is the difference?
Often only a small subset (<1%) of the cells is involved.
3 Challenges: Identifying highly local differences in large data sets
- Rich distributional features: multi-modality, skewness, and other tail behaviors.
- Potential differences can be of a variety of shapes or forms.
- Differences are often highly local, involving small portions of the data.
- Fitting nonparametric models requires large amounts of computation.
4 What is needed?
Detectors (not just tests!) for cross-sample differences that are
- Flexible (i.e., nonparametric).
- Sensitive to highly local structures.
- Computationally efficient.
Bonus: allow principled uncertainty quantification and decision making.
5 A simple inference strategy: multi-resolution scanning
- Scan over the sample space using moving windows.
- Use windows of a variety of sizes (i.e., resolutions).
- Carry out a base inference task (e.g., a two-sample test) on each window.
- Combine evidence across windows to summarize the inferred structure.
6 Divide-and-conquer: multi-resolution scanning
Multi-resolution scanning (MRS) transforms
- a large data set into small information packets,
- a complex nonparametric problem into simple parametric problems.
This brings computational efficiency and lends itself to parallelized computing. The strategy is simple and flexible.
Caveat: the information packets may be too small for identifying highly local structures.
7 Two-sample comparison
Observe two data samples
$x_{11}, x_{12}, \ldots, x_{1n_1} \sim Q_1$
$x_{21}, x_{22}, \ldots, x_{2n_2} \sim Q_2.$
How are $Q_1$ and $Q_2$ different, if at all?
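To make the later window-level computations concrete, here is a minimal simulated instance of this setup (a sketch, not from the talk: the baseline Beta distribution, the bump location, and the 0.1% contamination rate are all made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Sample 1 ~ Q_1: a smooth baseline distribution on [0, 1).
n1 = 50_000
x1 = rng.beta(2, 5, size=n1)

# Sample 2 ~ Q_2: same baseline, except ~0.1% of points are moved into a
# narrow interval near 0.9: the kind of highly local difference MRS targets.
n2 = 50_000
bump = rng.random(n2) < 0.001
x2 = np.where(bump, rng.uniform(0.895, 0.905, size=n2), rng.beta(2, 5, size=n2))
```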
8 A multi-resolution windowing scheme
Construct windows of various sizes through nested dyadic partitioning:
[Figure: dyadic partition tree of the sample space Ω over levels k = 0, 1, 2, 3, with a window A split into children A_l and A_r]
We shall refer to the partitioning also as a windowing scheme (tree).
9 Multi-resolution decomposition of probability distribution
[Figure: a window A at scale k splits into A_l and A_r at scale k+1, carrying proportions θ(A) and 1 − θ(A)]
On each window A, θ(A) gives the proportion of probability in A_l. Q can be fully represented by the θ(A) on all windows:
$Q \longleftrightarrow \{\theta(A) : A \in \mathcal{T}\}$
where $\mathcal{T}$ is a windowing tree that generates the Borel σ-algebra. We shall call the θ(A)'s probability assignment coefficients (PACs).
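As a concrete one-dimensional illustration (a sketch assuming the data live in [0, 1) and the windows are the canonical dyadic intervals; the helper name is ours), the empirical PACs are simply left-child proportions:

```python
import numpy as np

def empirical_pacs(x, K):
    """Empirical PACs theta_hat(A) = n(A_l) / n(A) on the dyadic windows
    of [0, 1) up to resolution K, keyed by (level, window index)."""
    pacs = {}
    for k in range(K):
        width = 2.0 ** -k
        for j in range(2 ** k):
            a = j * width
            in_A = x[(x >= a) & (x < a + width)]
            if len(in_A) > 0:
                # proportion of A's points that fall in the left child A_l
                pacs[(k, j)] = np.mean(in_A < a + width / 2)
    return pacs
```

For example, `empirical_pacs(x1, K=8)` summarizes sample 1 over eight resolutions.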
10 Induced decomposition of the statistical experiment
The transform of a distribution into PACs induces a decomposition of the experiment of observing an i.i.d. sample into a collection of sequential Binomial experiments:
[Figure: the observations (red ticks) in window A, n(A) in total, split into n(A_l) and n(A_r) across the children, governed by θ(A)]
The experiment on window A is Binomial:
$n(A_l) \mid n(A) \sim \mathrm{Binomial}(n(A), \theta(A)).$
11 Induced decomposition of the statistical experiment
The likelihood of the i.i.d. sample can be written as
$L(Q) = \prod_{i=1}^{n} q(x_i) = \prod_{A \in \mathcal{T}} L_A(Q)$, where $L_A(Q) = C_A(x)\, \theta(A)^{n(A_l)} (1-\theta(A))^{n(A_r)}.$
So each window contributes to the empirical evidence in an orthogonal manner:
$x_1, x_2, \ldots, x_n \longleftrightarrow \{(n(A_l), n(A_r)) : A \in \mathcal{T}\}.$
The sufficient statistic $(n(A_l), n(A_r))$ forms the information packet for window A. The likelihood principle implies that, in the Bayesian paradigm, one can model the Binomial experiments while ignoring their sequential nature.
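One way to see the decomposition at work (continuing the sketch above, with our own helper name): multiplying the PACs along the path from Ω to a level-K leaf telescopes to that leaf's probability, so the window-level packets jointly carry exactly the information in the level-K histogram of the sample:

```python
def path_probability(pacs, leaf, K):
    """Multiply theta_hat or (1 - theta_hat) along the root-to-leaf path;
    with empirical PACs this telescopes to n(leaf) / n."""
    prob = 1.0
    for k in range(K):
        j = leaf >> (K - k)                       # ancestor window at level k
        theta = pacs.get((k, j))
        if theta is None:                         # empty window: no mass below
            return 0.0
        went_left = ((leaf >> (K - k - 1)) & 1) == 0
        prob *= theta if went_left else 1.0 - theta
    return prob
```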
12 Two-sample comparison
[Figure: a window A splitting into A_l and A_r, with proportions θ_1(A), 1 − θ_1(A) under Q_1 and θ_2(A), 1 − θ_2(A) under Q_2]
$Q_1$ and $Q_2$ are different if and only if $\theta_1(A) \neq \theta_2(A)$ on some windows.
Base inference task: on each window A, test a simple two-sample hypothesis
$H_0(A) : \theta_1(A) = \theta_2(A)$
under the corresponding Binomial experiment.
13 MRS for two-sample difference
- Scan over the windows (up to some maximum resolution).
- Carry out a hypothesis test for $H_0(A) : \theta_1(A) = \theta_2(A)$ on each window under the model
  $n_1(A_l) \sim \mathrm{Binomial}(n_1(A), \theta_1(A))$
  $n_2(A_l) \sim \mathrm{Binomial}(n_2(A), \theta_2(A)).$
- Combine the evidence, and report the significant windows.
14 Base inference task
A variety of testing strategies can be used, such as a classical LR test or χ² test that gives a p-value. Look for windows A with very small p-values.
How small is small? We need to adjust for multiple testing in a resolution-specific way.
We shall take a fully probabilistic Bayesian approach to complete the base inference task. Main advantage: it facilitates borrowing strength across windows.
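For the classical option above, each window's information packet is just a 2×2 table of left/right counts from the two samples, so the base test is one contingency-table test per window. A sketch (our helper, using scipy's standard test):

```python
from scipy.stats import chi2_contingency

def window_pvalue(n1l, n1r, n2l, n2r):
    """Chi-square test of H_0(A): theta_1(A) = theta_2(A) from the 2x2
    table of child counts in the two samples."""
    table = [[n1l, n1r], [n2l, n2r]]
    if min(n1l + n1r, n2l + n2r, n1l + n2l, n1r + n2r) == 0:
        return 1.0  # a degenerate table carries no cross-sample evidence
    return chi2_contingency(table)[1]
```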
15 Bayesian hypothesis testing on each window A
Introduce a latent state variable S(A) such that S(A) = 0 if $H_0(A)$ is true and S(A) = 1 if $H_0(A)$ is false. Hypothesis testing then amounts to inferring the state S(A).
16 This requires specifying priors on S(A) and on $(\theta_1(A), \theta_2(A))$ given S(A). Adopt conjugate Beta priors on $(\theta_1(A), \theta_2(A))$:
$\theta_1(A) = \theta_2(A) \mid S(A) = 0 \sim \mathrm{Beta}(\alpha_l(A), \alpha_r(A))$
$\theta_1(A), \theta_2(A) \mid S(A) = 1 \overset{\text{ind}}{\sim} \mathrm{Beta}(\alpha_l(A), \alpha_r(A)).$
We shall write this as
$\theta_1(A), \theta_2(A) \mid S(A) \sim \text{paired-Beta}(\alpha(A), S(A)),$
and place a Bernoulli prior on S(A): $S(A) \sim \mathrm{Bernoulli}(\rho(A)).$
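The paired-Beta prior is easy to transcribe as a sampler (a sketch; the function name is ours):

```python
def sample_paired_beta(S, al, ar, rng):
    """Draw (theta_1(A), theta_2(A)) given the state S(A): one shared
    Beta(al, ar) draw under the null, two independent draws otherwise."""
    if S == 0:
        theta = rng.beta(al, ar)
        return theta, theta
    return rng.beta(al, ar), rng.beta(al, ar)
```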
17 Hierarchical model representation
The MRS for two-sample comparison can be expressed as a hierarchical model:
$S(A) \sim \mathrm{Bernoulli}(\rho(A))$
$\theta_1(A), \theta_2(A) \mid S(A) \sim \text{paired-Beta}(\alpha(A), S(A))$
independently for all windows A, with
$n_1(A_l) \mid \theta_1(A) \sim \mathrm{Binomial}(n_1(A), \theta_1(A))$
$n_2(A_l) \mid \theta_2(A) \sim \mathrm{Binomial}(n_2(A), \theta_2(A)).$
The model is fully conjugate, and the evidence on each window is summarized in the posterior probability of S(A) = 1:
$P(S(A) = 1 \mid x) = \left(1 + \frac{1-\rho(A)}{\rho(A)} \cdot \frac{1}{\mathrm{BF}(A)}\right)^{-1},$
where BF(A) is the Bayes factor in favor of the alternative on A. We call this the posterior marginal alternative probability (PMAP) on A.
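Conjugacy makes the window-level evidence available in closed form via Beta-Binomial marginal likelihoods. A sketch (the function name, the symmetric pseudo-counts, and the rho = 0.5 default are ours, not the talk's choices):

```python
import math
from scipy.special import betaln

def window_pmap(n1l, n1r, n2l, n2r, al=0.5, ar=0.5, rho=0.5):
    """PMAP P(S(A) = 1 | x) on one window under the conjugate model."""
    def log_marginal(l, r):
        # log of the Binomial kernel integrated against Beta(al, ar)
        return betaln(al + l, ar + r) - betaln(al, ar)
    log_m0 = log_marginal(n1l + n2l, n1r + n2r)      # pooled: theta_1 = theta_2
    log_m1 = log_marginal(n1l, n1r) + log_marginal(n2l, n2r)
    log_post_odds = math.log(rho / (1 - rho)) + (log_m1 - log_m0)
    if log_post_odds > 0:                            # numerically safe logistic
        return 1.0 / (1.0 + math.exp(-log_post_odds))
    odds = math.exp(log_post_odds)
    return odds / (1.0 + odds)
```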
18 Summarizing evidence and reporting significant windows
For proper multiple testing adjustment, we need to make it harder to call a window significant at high resolutions. Why? The number of windows doubles with each resolution level. Specifically, we need to let
$\rho(A) = P(S(A) = 1) \propto 2^{-k}$
where k is the resolution level of A, so that the prior expected number of alternative windows stays bounded at each resolution.
This is really bad news for identifying local differences, because small windows tend to contain limited data in the first place!
19 Borrowing strength across scanning windows
The MRS strategy treats each window as an independent inferential unit: the PMAP on A does not take into account empirical evidence from other windows.
Do nearby windows provide useful information? Yes! Differential structures tend to cluster spatially, so we essentially need smoothing on the cross-sample variability.
How to borrow strength across windows? Incorporate dependency into the hypotheses across windows through a graphical model on the latent variables $\{S(A) : A \in \mathcal{T}\}$.
20 Markov tree (Crouse et al. 1998)
Model the state variables S(A) for all A using a Markov process on the multi-resolution tree.
[Figure: a scale-k window with S(A) = g transitioning to children at scale k+1 with S(A_l) = h and S(A_r) = s, with probabilities ρ_{g,h}(A_l) and ρ_{g,s}(A_r)]
This lets the null/alternative states on the windows attain spatial-scale dependency.
21 Hierarchical model representation
The new hierarchical model:
$\{S(A) : A \in \mathcal{T}\} \sim \mathrm{MT}(\rho)$
$\theta_1(A), \theta_2(A) \mid S(A) \sim \text{paired-Beta}(\alpha(A), S(A))$
[Figure: graphical model in which S(A) generates (θ_1(A), θ_2(A)) and transitions, each with matrix ρ, to the children states S(A_l) and S(A_r), which generate (θ_1(A_l), θ_2(A_l)) and (θ_1(A_r), θ_2(A_r))]
22 Specifying the Markov transition matrix
If A's parent is in state g, then A is in state h with probability ρ_{g,h}(A):
$P(S(A) = h \mid S(\mathrm{parent}(A)) = g) = \rho_{g,h}(A).$
A parsimonious two-parameter specification:
$\rho(A) = \begin{pmatrix} \rho_{0,0}(A) & \rho_{0,1}(A) \\ \rho_{1,0}(A) & \rho_{1,1}(A) \end{pmatrix} = \begin{pmatrix} 1 - \gamma 2^{-k} & \gamma 2^{-k} \\ 1 - \gamma \beta^{-k} & \gamma \beta^{-k} \end{pmatrix}$
where k is the level of A. The $2^{-k}$ factor is included to control the prior expected number of rejections at each resolution; γ controls the prior expected number of significant windows; β controls the level of spatial dependency.
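In code, the specification reads as follows (a sketch: the slide's superscripts are reconstructed here, so the exact decay in the second row, and the γ and β defaults, should be treated as assumptions):

```python
import numpy as np

def transition_matrix(k, gamma=0.07, beta=1.5):
    """2x2 Markov-tree transition matrix at resolution k; row g gives
    P(S(A) = . | S(parent(A)) = g)."""
    p01 = gamma * 2.0 ** -k   # null parent -> alternative child
    p11 = gamma * beta ** -k  # alternative parent -> alternative child
    return np.array([[1.0 - p01, p01],
                     [1.0 - p11, p11]])
```

With β < 2 the alternative state persists down the tree more readily than it arises afresh, which is what encodes the spatial clustering of differences.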
23 Full posterior
Given i.i.d. samples from $Q_1$ and $Q_2$, the joint posterior is still a Markov-MRS:
$\{S(A) : A \in \mathcal{T}\} \mid \rho, x \sim \mathrm{MT}(\tilde{\rho})$
$\theta_1(A), \theta_2(A) \mid S(A), \alpha, x \sim \text{paired-Beta}(\tilde{\alpha}(A), S(A))$
with $\tilde{\rho} = \{\tilde{\rho}(A) : A \in \mathcal{T}\}$ given by
$\tilde{\rho}(A) = \mathrm{diag}(\phi(A))^{-1}\, \rho(A)\, \mathrm{diag}(m(A))\, \mathrm{diag}(\phi(A_l) \circ \phi(A_r)),$
where $m(A) = (m_0(A), m_1(A))$ collects the marginal likelihoods of the information packet on A under the two states, $\circ$ represents the Hadamard product, and $\phi : \mathcal{T} \to \mathbb{R}^2$ is given by
$\phi(A) = \begin{cases} \rho(A)\, \mathrm{diag}(\phi(A_l) \circ \phi(A_r))\, m(A) & \text{if } n_1(A) + n_2(A) > 1 \\ (1/\mu(A),\, 1/\mu(A)) & \text{if } n_1(A) + n_2(A) = 1 \\ (1, 1) & \text{otherwise.} \end{cases}$
Computing φ involves a bottom-up recursive information passing.
24 Computing PMAPs
For each $A \in \mathcal{T}$, let $\varphi(A) = (P(S(A) = 0 \mid x),\, P(S(A) = 1 \mid x))$. Then $\varphi(\Omega) = (\tilde{\rho}_{0,0}(\Omega), \tilde{\rho}_{0,1}(\Omega))$.
Now suppose $\varphi(A)$ has been computed for all windows up to resolution $k \geq 0$; then for any resolution-(k+1) window A,
$\varphi(A) = \tilde{\rho}(A)^{\top} \varphi(A_p)$
where $A_p$ is the parent of A at resolution k. This involves a top-down recursive information passing.
In essence this is a forward-backward belief propagation algorithm.
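A sketch of the top-down pass (our helper; windows indexed heap-style on a complete binary tree with root 1, and `rho_tilde` assumed to hold the 2×2 posterior transition matrices from the bottom-up pass):

```python
import numpy as np

def downward_pmaps(rho_tilde, K):
    """Posterior state marginals (P(S=0|x), P(S=1|x)) for every window
    of a depth-K tree, given posterior transition matrices rho_tilde."""
    marg = {1: rho_tilde[1][0]}        # root marginal: row 0, as on the slide
    for i in range(1, 2 ** K):         # all internal nodes, top-down
        for child in (2 * i, 2 * i + 1):
            # P(S(child) = t | x) = sum_g P(S(i) = g | x) * rho_tilde[child][g, t]
            marg[child] = rho_tilde[child].T @ marg[i]
    return marg
```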
25 Marginal posterior consistency
Theorem. Suppose we observe two independent i.i.d. samples $x_1 = (x_{1,1}, \ldots, x_{1,n_1})$ and $x_2 = (x_{2,1}, \ldots, x_{2,n_2})$, respectively from two distributions $P_1$ and $P_2$ supported on the entire sample space. Under weak conditions on the prior parameters, as $n_1, n_2 \to \infty$ with $n_1/(n_1+n_2) \to \zeta \in (0,1)$, for any window A,
$P(S(A) = 1 \mid x) \overset{p}{\to} \begin{cases} 1 & \text{if } P_1(A_l \mid A) \neq P_2(A_l \mid A) \\ 0 & \text{if } P_1(A_l \mid A) = P_2(A_l \mid A). \end{cases}$
26 Joint posterior consistency
Theorem. Suppose we observe two independent i.i.d. samples $x_1 = (x_{1,1}, \ldots, x_{1,n_1})$ and $x_2 = (x_{2,1}, \ldots, x_{2,n_2})$, respectively from two distributions $P_1$ and $P_2$ supported on the entire sample space. Under weak conditions on the prior parameters, as $n_1, n_2 \to \infty$ with $n_1/(n_1+n_2) \to \zeta \in (0,1)$,
$P\big(S(A) = 1\{P_1(A_l \mid A) \neq P_2(A_l \mid A)\} \text{ for all } A \text{ up to resolution } K \mid x\big) \overset{p}{\to} 1.$
27 Example: Testing performance
[Figure: four scenarios (local shift, local dispersion, global shift, global dispersion); panels show the densities and sensitivity vs. specificity curves comparing MRS, KNN, Cramér, OPT, PT, and CH]
28 Example: Visualizing the empirical evidence
[Figure: two window-by-resolution-level maps. Left: the PMAPs. Right: the posterior expected effect size.]
Effect size is measured by the absolute log odds ratio
$\mathrm{eff}(A) = \left| \log \frac{\theta_1(A)/(1-\theta_1(A))}{\theta_2(A)/(1-\theta_2(A))} \right|.$
Note the clustering of differential structures!
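A Monte Carlo sketch of the posterior expected effect size on one window (our helper; since eff(A) = 0 under the null, the alternative-branch average is weighted by the window's PMAP):

```python
import numpy as np

def expected_effect_size(n1l, n1r, n2l, n2r, pmap, al=0.5, ar=0.5,
                         draws=20_000, seed=0):
    """Approximate E(eff(A) | x) by sampling theta_1, theta_2 from their
    posterior Betas under the alternative."""
    rng = np.random.default_rng(seed)
    t1 = rng.beta(al + n1l, ar + n1r, size=draws)
    t2 = rng.beta(al + n2l, ar + n2r, size=draws)
    eff = np.abs(np.log(t1 / (1 - t1)) - np.log(t2 / (1 - t2)))
    return pmap * eff.mean()
```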
29 Reporting significant windows
We take a fully decision theoretic approach. Candidate loss functions (Müller et al. 2007):
$L(\delta, c) = c \sum_A 1\{S(A)=0\}\, \delta(A) + (1-c) \sum_A 1\{S(A)=1\}\, (1-\delta(A))$
Optimal rule: $\delta(A) = 1\{P(S(A)=1 \mid x) > c\}$.
$L(\delta, c) = -\sum_A \mathrm{eff}(A)\, \delta(A) + \sum_A \mathrm{eff}(A)\, (1-\delta(A)) + 2c \sum_A \delta(A)$
Optimal rule: $\delta(A) = 1\{E(\mathrm{eff}(A) \mid x) > c\}$.
30 Multiple testing control
One can control the posterior expected number of false rejections (NFR),
$\mathrm{NFR}(c) = E(\#\text{ of falsely rejected } H_0(A)\text{'s} \mid x) = \sum_{A : \delta(A) = 1} P(S(A) = 0 \mid x),$
which is computable using the PMAPs. Or control the posterior expected false discovery rate (FDR),
$\mathrm{FDR}(c) = E\!\left(\frac{\#\text{ of falsely rejected } H_0(A)\text{'s}}{\#\text{ of rejected } H_0(A)\text{'s}} \,\middle|\, x\right) = \frac{\mathrm{NFR}(c)}{|\{A : \delta(A) = 1\}|}.$
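Both quantities are direct functions of the PMAPs, so the cutoff c can be calibrated numerically. A sketch (our helper) that finds the most generous PMAP cutoff whose posterior expected FDR stays at or below a target:

```python
import numpy as np

def fdr_cutoff(pmaps, target=0.05):
    """Return the PMAP cutoff for the largest rejection set whose
    posterior expected FDR is <= target (None if no set qualifies)."""
    p = np.sort(np.asarray(pmaps))[::-1]             # most significant first
    # expected FDR after rejecting the top j windows: mean of their (1 - PMAP)
    efdr = np.cumsum(1.0 - p) / np.arange(1, len(p) + 1)
    passing = np.nonzero(efdr <= target)[0]
    return p[passing[-1]] if len(passing) else None
```

Rejecting every window with PMAP at or above the returned cutoff keeps the posterior expected FDR within the target; the same cumulative sums give NFR(c) directly.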
31 Two extensions through hierarchical modeling
Optional pruning: Bayesian model averaging over the maximum resolution for scanning. Achieved through another layer of hyperprior on $\{S(A) : A \in \mathcal{T}\}$:
$\{S(A) : A \in \mathcal{T}\} \mid \rho, \eta \sim \mathrm{pMT}(\rho, \eta).$
32 Two extensions through hierarchical modeling
Adaptive partitioning: inferring a good windowing scheme in multivariate problems. Achieved through a hyperprior on the windowing tree:
$\mathcal{T} \mid \lambda \sim \mathrm{RP}(\lambda)$
$\{S(A) : A \in \mathcal{T}\} \mid \rho, \eta, \mathcal{T} \sim \mathrm{pMT}(\rho, \eta, \mathcal{T})$
$\theta_1(A), \theta_2(A) \mid S(A), \alpha, \mathcal{T} \sim \text{paired-Beta}(\alpha(A), S(A))$
The full posterior is available analytically, and PMAPs are computable through forward-backward recursion. Significant windows are reported and visualized on the MAP tree $\hat{\mathcal{T}}$ (empirical Bayes), which is computable through a forward-backward-forward algorithm.
33 Back to the flow cytometry data
[Figure: pairwise scatterplots of the flow cytometry channels (FSC-A, FSC-H, SSC-A, Aqua, Dext, CD8)]
One sample is transfected with a small number of cells that are high in Dext and CD8.
34 Visualizing the difference on the MAP windowing scheme
[Figure: posterior expected effect size on the windows of the MAP tree]
For such big data: adopt a loss that takes the effect size into account. Optimal decision rule: $\delta(A) = 1\{E(\mathrm{eff}(A) \mid x) > c\}$ with 1% expected FDR.
35 Example: Identifying differential cells in flow cytometry
[Figure: scatterplots over the channels (FSC-A, FSC-H, SSC-A, Aqua, Dext, CD8) highlighting the identified cells]
The identified differential region involves 0.004% and 0.213% of the two cell populations, and is indeed high in Dext and CD8.
36 Summary
- A hierarchical model formulation for MRS.
- Utilizing graphical modeling to borrow strength across windows, thereby enhancing the ability to identify local structures.
- A fully principled decision theoretic approach to hypothesis testing and multiplicity adjustment.
- Applicable to the k-sample setting and to other distributional families with a corresponding multi-resolution decomposition.
- R package MRS to become available.
Thank you!