SL&DM © Hastie & Tibshirani, March 26, 2002. Supervised Learning


Classification of microarray samples

Example: small round blue cell tumors (SRBCTs); Khan et al., Nature Medicine, 2001.

• Tumors are classified as BL (Burkitt lymphoma), EWS (Ewing sarcoma), NB (neuroblastoma) and RMS (rhabdomyosarcoma).
• There are 63 training samples and 25 test samples, although five of the latter were not SRBCTs.
• Khan et al. report zero training and test errors, using a complex neural network model. They decided that 96 genes were "important".
• Upon close examination, the network is linear. It is essentially extracting linear principal components and classifying in their subspace.
• But even principal components is unnecessarily complicated for this problem!

[Figure: Khan data; classes BL, EWS, NB, RMS]

[Figure: Class centroids; panels BL, EWS, NB, RMS and a test sample; axes: gene versus average expression]

Nearest Shrunken Centroids

Idea: shrink each class centroid towards the overall centroid. First normalize by the within-class standard deviation for each gene.

Details

• Let $x_{ij}$ be the expression for genes $i = 1, 2, \ldots, p$ and samples $j = 1, 2, \ldots, n$.
• We have classes $1, 2, \ldots, K$, and let $C_k$ be the indices of the $n_k$ samples in class $k$.
• The $i$th component of the centroid for class $k$ is $\bar{x}_{ik} = \sum_{j \in C_k} x_{ij}/n_k$, the mean expression value in class $k$ for gene $i$; the $i$th component of the overall centroid is $\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n$.

• Let $d_{ik} = (\bar{x}_{ik} - \bar{x}_i)/s_i$, where $s_i$ is the pooled within-class standard deviation for gene $i$:

$$s_i^2 = \frac{1}{n-K} \sum_{k} \sum_{j \in C_k} (x_{ij} - \bar{x}_{ik})^2.$$

• Shrink each $d_{ik}$ towards zero, giving $d'_{ik}$ and new shrunken centroids or prototypes

$$\bar{x}'_{ik} = \bar{x}_i + s_i d'_{ik}.$$

• The shrinkage is by soft-thresholding:

$$d'_{ik} = \operatorname{sign}(d_{ik})\,(|d_{ik}| - \Delta)_+,$$

where $(t)_+ = \max(t, 0)$ denotes the positive part.
• Choose $\Delta$ by cross-validation.
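The centroid, standardization and soft-thresholding steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' software: the function name `shrunken_centroids`, the genes-by-samples layout of `X`, and the returned tuple are assumptions made for the example.

```python
import numpy as np

def shrunken_centroids(X, y, delta):
    """Illustrative sketch (names and layout are assumptions).

    X     : array of shape (p, n), genes in rows, samples in columns
    y     : length-n array of class labels
    delta : soft-threshold amount
    """
    classes = np.unique(y)                      # sorted class labels
    n, K = X.shape[1], len(classes)

    overall = X.mean(axis=1)                    # overall centroid, shape (p,)
    centroids = np.stack(
        [X[:, y == k].mean(axis=1) for k in classes], axis=1
    )                                           # class centroids, shape (p, K)

    # Pooled within-class standard deviation s_i for each gene.
    class_index = np.searchsorted(classes, y)   # column of `centroids` per sample
    resid = X - centroids[:, class_index]
    s = np.sqrt((resid ** 2).sum(axis=1) / (n - K))

    # Standardized differences d_ik, then soft-thresholding by delta.
    d = (centroids - overall[:, None]) / s[:, None]
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

    # Shrunken centroids: overall centroid plus rescaled shrunken differences.
    shrunk = overall[:, None] + s[:, None] * d_shrunk
    return shrunk, d_shrunk, s, classes
```

With `delta = 0` the function returns the ordinary class centroids. A gene whose row of `d_shrunk` is zero for every class has the same shrunken centroid value in all classes, so it cannot influence which centroid is nearest.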

K-Fold Cross-Validation

Primary method for estimating a tuning parameter (here the threshold $\Delta$). Divide the data into $K$ roughly equal parts.

[Figure: the data divided into five parts, one labeled Test and four labeled Train]

• For each $k = 1, 2, \ldots, K$, fit the model with parameter $\Delta$ to the other $K-1$ parts, and compute its error in predicting the $k$th part. Average this error over the $K$ parts to give the estimate $CV(\Delta)$.
• Do this for many values of $\Delta$. Draw the curve $CV(\Delta)$ and choose the value of $\Delta$ that makes $CV(\Delta)$ smallest.

Typically we use $K = 5$ or $10$.
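The cross-validation loop described above might look as follows. This is a minimal sketch under stated assumptions: `cv_curve` and the callback `fit_predict(X_train, y_train, X_test, param)` are hypothetical names, `X` is stored genes-by-samples, and error is measured as the misclassification rate.

```python
import numpy as np

def cv_curve(X, y, params, fit_predict, K=5, seed=0):
    """Estimate CV(param) for each candidate value and pick the minimizer.

    X           : array of shape (p, n), samples in columns
    y           : length-n array of labels
    params      : sequence of candidate tuning-parameter values
    fit_predict : hypothetical callback; fits on the training part and
                  returns predicted labels for the test part
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)   # K roughly equal parts

    cv = []
    for param in params:
        fold_errors = []
        for k in range(K):
            test = folds[k]
            train = np.concatenate([folds[m] for m in range(K) if m != k])
            pred = fit_predict(X[:, train], y[train], X[:, test], param)
            fold_errors.append(np.mean(pred != y[test]))
        cv.append(np.mean(fold_errors))             # average over the K parts

    cv = np.array(cv)
    return cv, params[int(np.argmin(cv))]
```

The same random folds are reused for every candidate value, so differences along the curve reflect the parameter and not the split.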

[Figure: Results. Training (tr), cross-validation (cv) and test (te) error versus amount of shrinkage $\Delta$; the top axis shows the number of genes]

Advantages

• Simple; includes the nearest centroid classifier as a special case.
• Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.
• With more than two classes, the method can select different genes, and different numbers of genes, for each class.

[Figure: The genes that matter; classes BL, EWS, NB, RMS]

[Figure: Estimated class probabilities; panels Training Data and Test Data; axes: sample versus probability; legend BL, EWS, NB, RMS, O]

Class probabilities

• For a test sample $x^* = (x^*_1, x^*_2, \ldots, x^*_p)$, we define the discriminant score for class $k$ as

$$\delta_k(x^*) = \sum_{i=1}^{p} \frac{(x^*_i - \bar{x}'_{ik})^2}{s_i^2} - 2 \log \pi_k,$$

where $\pi_k$ is the prior probability of class $k$.
• The classification rule is then

$$C(x^*) = \ell \quad \text{if} \quad \delta_\ell(x^*) = \min_k \delta_k(x^*).$$

• Estimates of the class probabilities, by analogy to Gaussian linear discriminant analysis, are

$$\hat{p}_k(x^*) = \frac{e^{-\frac{1}{2}\delta_k(x^*)}}{\sum_{\ell=1}^{K} e^{-\frac{1}{2}\delta_\ell(x^*)}}.$$

• Still very simple. In statistical parlance, this is a restricted version of a naive Bayes classifier (also called idiot's Bayes!).
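As a rough illustration of the discriminant score, classification rule and probability estimate above, the following sketch computes all three for one test sample. The function name `class_probabilities` and its argument layout are assumptions for the example; subtracting the minimum score before exponentiating is a numerical-stability choice that cancels in the ratio.

```python
import numpy as np

def class_probabilities(x_star, shrunk, s, priors):
    """Illustrative sketch (names and layout are assumptions).

    x_star : length-p test sample
    shrunk : array of shape (p, K) of shrunken centroids
    s      : length-p pooled within-class standard deviations
    priors : length-K class prior probabilities
    """
    # Discriminant score: standardized squared distance minus 2 log prior.
    z = (x_star[:, None] - shrunk) / s[:, None]
    scores = (z ** 2).sum(axis=0) - 2.0 * np.log(priors)

    # Classify to the class with the smallest score.
    label = int(np.argmin(scores))

    # Probabilities proportional to exp(-score / 2); shifting by the minimum
    # avoids underflow and leaves the normalized values unchanged.
    w = np.exp(-0.5 * (scores - scores.min()))
    return scores, label, w / w.sum()
```

The returned `label` is a column index into `shrunk`; mapping it back to a class name is left to the caller.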

Adaptive threshold scaling

• Idea: define class-dependent scaling factors $\theta_k$ for each class:

$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \theta_k s_i}. \qquad (1)$$

• Use smaller factors for hard-to-classify classes ⇒ same test error with a smaller total number of genes.
• Adaptive procedure: start with all $\theta_k = 1$, and then reduce $\theta_k$ by 10% for the class $k$ with the largest area under the training error curve.
• Repeat 20 times and choose the solution with the smallest area under the curve for all classes.
• Can dramatically reduce the total number of genes used, without increasing the error rate.
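The adaptive procedure above could be organized as the loop below. This is a hedged sketch, not the authors' implementation: the scaling factors are called `theta` here, `error_area` is a hypothetical callback returning the area under the training error curve for each class, and "smallest area for all classes" is interpreted as the smallest sum of per-class areas, which is an assumption.

```python
import numpy as np

def adapt_scaling(error_area, K, n_iter=20, factor=0.9):
    """Greedy search over class-dependent scaling factors.

    error_area : hypothetical callback; maps a length-K vector of scaling
                 factors to a length-K vector of per-class areas under the
                 training error curve
    K          : number of classes
    n_iter     : number of reduction steps (20 in the text above)
    factor     : multiplier per step (0.9 is a 10% reduction)
    """
    theta = np.ones(K)                               # start with all factors 1
    areas = np.asarray(error_area(theta), dtype=float)
    best_theta, best_total = theta.copy(), float(areas.sum())

    for _ in range(n_iter):
        # Reduce the factor of the class with the largest area.
        theta[int(np.argmax(areas))] *= factor
        areas = np.asarray(error_area(theta), dtype=float)

        # Keep the setting with the smallest total area (assumed criterion).
        total = float(areas.sum())
        if total < best_total:
            best_theta, best_total = theta.copy(), total

    return best_theta, best_total
```

In practice `error_area` would refit the classifier over a grid of shrinkage values for the given factors, which is the expensive part; the loop itself is only bookkeeping.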

Lymphoma data

Scaling factors changed from $(1, 1, 1)$ to $(1.9, 1, 1.5)$.

[Figure: two panels showing training (tr) and test (te) error against amount of shrinkage; axes labeled Error, Size and Amount of Shrinkage]
