Using Procrustes Analysis and Principal Component Analysis to Detect Schizophrenic Brains

Anthony Dotterer


Introduction

Bookstein has studied the differences between normal and schizophrenic brains by performing shape analysis on the landmark points described in [1]. This paper applies Procrustes Analysis to his dataset of 14 normal and 14 schizophrenic brains to compute a mean shape for each group. With these means, Principal Component Analysis is applied to find the distribution of possible shapes for normal brains and for schizophrenic brains.

Setup

As described in the introduction, this paper uses 14 normal and 14 schizophrenic brain shapes from Bookstein's experiments. Each shape contains 13 landmark points that correspond across all of the shapes. The following figures show the 14 normal brain shapes and then the 14 schizophrenic brain shapes.

Illustration 1: Normal Brain Shapes
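For concreteness, the landmark data is a 3-D Matlab array, as documented in the comments of Appendix E; a minimal sketch of indexing it (assuming the 2 x 13 x 28 layout those comments describe):

% Load Bookstein's landmark data: 2 coordinates x 13 landmarks x 28 subjects,
% where subjects 1-14 are normal and subjects 15-28 are schizophrenic.
load 'schizo.txt'
schizo = reshape(schizo', [2, 13, 28]);

normalshapes = schizo(:, :, 1:14);    % the 14 normal brain shapes
schizoshapes = schizo(:, :, 15:28);   % the 14 schizophrenic brain shapes
p = schizo(:, 5, 3);                  % (x, y) of landmark 5 on subject 3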

Illustration 2: Schizophrenic Brain Shapes

General Procrustes Analysis

The first step in applying Principal Component Analysis (PCA) to a set of shapes is finding the shapes' mean shape. Finding a mean shape from a set of shapes is best done with General Procrustes Analysis. Let us define a set of N shapes, S, such that each shape contains a set of n points; in our case N = 14 and n = 13:

$S = \{S_1, S_2, \ldots, S_N\}$

$S_i = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$

$S_{i,j} = (x_j, y_j)$

Now we normalize all of the shapes so that their center of mass is at the origin and the square root of their sum of squares equals 1.

The shapes are first normalized in translation and scale:

$S_i' = S_i - \frac{1}{n} \sum_j S_{i,j}$

$S_i = \frac{S_i'}{\sqrt{\sum_j \left( x_{i,j}^2 + y_{i,j}^2 \right)}}$

Next we take the first shape as the initial mean shape and iterate until the previous mean shape is within some sum of squared distances of the newest mean shape. In each iteration, every shape is rotationally normalized to the mean shape:

$V D V^T = \mathrm{SVD}\!\left( (S_i^T \bar{S}) (S_i^T \bar{S})^T \right)$

$W D W^T = \mathrm{SVD}\!\left( (S_i^T \bar{S})^T (S_i^T \bar{S}) \right)$

$R = V W^T$

$S_i = S_i R$

In the next step of the iteration, the new mean shape is calculated by averaging the point locations of the rotation-normalized shapes and then normalizing the result's scale and translation:

$\bar{S}_{new} = \frac{1}{N} \sum_i S_i$

$\bar{S}_{new} = \mathrm{norm}(\bar{S}_{new})$

Once convergence has been reached, the loop stops iterating. The following figures are examples of the mean shapes of the normal brain and the schizophrenic brain with a convergence threshold of 0.1.
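As a quick sanity check of the two normalization formulas, here is a minimal sketch on a made-up 3-point shape (the values are purely illustrative):

% A made-up 3-point shape stored as 2 x n (x coordinates in row 1, y in row 2).
S = [0 4 2;
     0 0 3];

% Translation normalization: subtract the center of mass from each point.
Sprime = S - repmat(mean(S, 2), 1, size(S, 2));

% Scale normalization: divide by the square root of the sum of squares.
Snorm = Sprime / sqrt(sum(Sprime(:).^2));

disp(mean(Snorm, 2));     % center of mass is now at the origin: [0; 0]
disp(sum(Snorm(:).^2));   % sum of squares is now 1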

Illustration 3: Mean Normal Brain Shape

Illustration 4: Mean Schizophrenic Brain Shape

The described General Procrustes Analysis is implemented in Matlab using the normalizing function in Appendix A and the Procrustes function in Appendix B.

Principal Component Analysis

PCA will create two sets, Φ and b. Φ will contain the largest eigenvectors of the covariance matrix of the set of shapes about their mean shape. b will contain the parameters that, when used with Φ and the mean shape, will recreate a particular shape. In order to begin PCA, all of the shapes in a set must be normalized to the mean shape. The following figures show the normalized shapes. Notice the strange alignment in the schizophrenic figures.

Illustration 5: Normalized Normal Brain Shapes
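The recreation mentioned above is the standard linear shape-model relation; written out in the notation of this paper (the original does not state it explicitly):

$S_i \approx \bar{S} + \Phi b_i$

where $\bar{S}$ is the mean shape and $b_i$ is the parameter vector of shape $i$.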

Illustration 6: Normalized Schizophrenic Brain Shapes

The first step is to create the covariance matrix of the set of shapes about their mean shape:

$\Sigma = \frac{1}{N} \sum_i (S_i - \bar{S})(S_i - \bar{S})^T$

Next, the largest eigenvectors are extracted from the covariance matrix into Φ. With Φ and the mean shape, the parameter set can be created:

$b_i = \Phi^T (S_i - \bar{S})$

The following figure shows the set b in two dimensions, where green represents normal brains, blue represents schizophrenic brains, and their means are denoted in red. Notice how the strange alignment in the schizophrenic brain shapes causes a small group of outliers in that parameter set.
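A compact sketch of these two formulas, mirroring the flattened-vector representation used in Appendix C (variable names are illustrative):

% shapes: 2 x n x N array of Procrustes-aligned shapes; meanshape: 2 x n.
% Flatten each shape to a 2n-vector and subtract the flattened mean shape.
N = size(shapes, 3);
X = zeros(2 * size(shapes, 2), N);
for i = 1:N
    X(:, i) = [shapes(1, :, i), shapes(2, :, i)]' ...
            - [meanshape(1, :), meanshape(2, :)]';
end

% Covariance of the shapes about their mean shape.
Sigma = (X * X') / N;

% Phi: eigenvectors of the two largest eigenvalues; b: one column per shape.
[Phi, lambda] = eigs(Sigma, 2);
b = Phi' * X;

% Any shape can then be recreated approximately from its parameters:
recon = [meanshape(1, :), meanshape(2, :)]' + Phi * b(:, 1);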

Illustration 7: Parameter Sets of Normal (green) and Schizophrenic (blue) Brain Shapes and their Means (red)

This PCA method is implemented in Matlab as a function in Appendix C.

Matching Brain Shapes

Once Φ and b of both sets of brains have been calculated, each shape can be matched to either the normal or the schizophrenic brain set. The first step in matching a brain shape is finding that shape's normalized shape with respect to each set's mean shape. With the new normalized shapes, their parameter sets, $b_N$ and $b_S$, can be calculated using each shape set's Φ. Finally, the shape is matched to the brain type whose mean parameter set is closest, in Euclidean distance, to the shape's parameter set. This method requires Φ to be calculated for only two eigenvectors. The final matching results are not at all satisfactory: only 10 of the 14 normal shapes and 2 of the 14 schizophrenic shapes match their correct type. The following table shows the type each shape matched.

Normal Shapes    Normal   Schizophrenic     Schizophrenic Shapes   Normal   Schizophrenic
 1               TRUE     FALSE              1                     TRUE     FALSE
 2               TRUE     FALSE              2                     FALSE    TRUE
 3               TRUE     FALSE              3                     TRUE     FALSE
 4               TRUE     FALSE              4                     TRUE     FALSE
 5               TRUE     FALSE              5                     TRUE     FALSE
 6               TRUE     FALSE              6                     TRUE     FALSE
 7               FALSE    TRUE               7                     TRUE     FALSE
 8               TRUE     FALSE              8                     TRUE     FALSE
 9               FALSE    TRUE               9                     TRUE     FALSE
10               FALSE    TRUE              10                     TRUE     FALSE
11               TRUE     FALSE             11                     TRUE     FALSE
12               TRUE     FALSE             12                     TRUE     FALSE
13               FALSE    TRUE              13                     FALSE    TRUE
14               TRUE     FALSE             14                     TRUE     FALSE

The matching algorithm is implemented in Matlab using the driver in Appendix E and the rotational normalizing function in Appendix D.

Discussion

This paper uses General Procrustes Analysis and Principal Component Analysis to find a parameter set that is used to classify new shapes. Unfortunately, this approach employed only the two largest eigenvectors in Φ and the Euclidean distance to label shapes. A better approach could use more eigenvectors in Φ together with a multi-dimensional Gaussian probability distribution built from the mean of the parameter set and the covariance matrix of all the parameters, as sketched below. In fact, my original approach did use such a method. Unfortunately, my implementation would complain about eigenvectors that were too small, and the probability distribution's output would not be between 0 and 1 but far larger in magnitude. Therefore, my approach was simplified to the described method.
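The Gaussian scoring described above can be sketched compactly; this is a hypothetical reconstruction, not the abandoned implementation (here b is one class's k x N parameter set and bnew the parameter vector of the shape being scored). Note that a Gaussian density is not a probability and may legitimately exceed 1, which may account for the outputs observed:

% Fit a multi-dimensional Gaussian to a class's parameter set and score a
% new parameter vector under it (illustrative sketch; bnew is assumed given).
bmean = mean(b, 2);                        % mean of the parameter set
bcov = cov(b') + 1e-6 * eye(size(b, 1));   % covariance, with a small ridge to
                                           % guard against tiny eigenvalues
d = bnew - bmean;
k = size(b, 1);
p = exp(-0.5 * d' * (bcov \ d)) / sqrt((2*pi)^k * det(bcov));
% Equivalently, classes can be compared by Mahalanobis distance alone:
mdist = sqrt(d' * (bcov \ d));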

References

[1] D. Mackenzie, "The Shape of Madness: A mathematician figures out how to predict who will become schizophrenic," Discover.

Appendix A: normalize.m

% Normalizes the passed shapes in the following ways:
%   Translation normalization: center of mass at the origin
%   Scale normalization: divide by the square root of the sum of squares
% input:
%   - shapes      the shapes before normalization
% output:
%   - normshapes  the translation and scale normalized shapes
% authored by: Anthony Dotterer
function normshapes = normalize(shapes)

% get the dimensions of the shapes
[ncoords, npoints, nshapes] = size(shapes);

% normalize the translation
means = mean(shapes, 2);
for i = 1:nshapes
    normshapes(1, :, i) = shapes(1, :, i) - means(1, i);
    normshapes(2, :, i) = shapes(2, :, i) - means(2, i);
end

% normalize the scale
SqrtSSQ = sqrt(sum(normshapes(1, :, :).^2 + normshapes(2, :, :).^2, 2));
for i = 1:nshapes
    normshapes(:, :, i) = normshapes(:, :, i) / SqrtSSQ(1, 1, i);
end

return;

Appendix B: procrustes.m

% Performs Procrustes analysis on the passed shapes
% input:
%   - shapes       the shapes to align to a mean shape
%   - convergence  the difference between iterations at which to stop
% output:
%   - meanshape    the mean shape for the passed shapes
%   - normalshapes the shape set normalized to the mean shape
% authored by: Anthony Dotterer
function [meanshape, normalshapes] = procrustes(shapes, convergence)

% normalize the shapes by translation and scale
normalshapes = normalize(shapes);

% iterate through shapes to find the mean shape
i = 0;
meanshapes(:, :, 1) = normalshapes(:, :, 1);
SSD = [ convergence+1 ];
while (i == 0 || SSD(i) > convergence)
    i = i + 1;

    % rotation normalize shapes to the mean shape

    for j = 1:size(normalshapes, 3)
        % calculate the rotation
        tmp = normalshapes(:, :, j)' * meanshapes(:, :, i);
        [V, D, Vt] = svd(tmp * tmp');
        [W, d, Wt] = svd(tmp' * tmp);
        rotation = V * W';

        % apply the rotation to the shape
        normalshapes(:, :, j) = normalshapes(:, :, j) * rotation;
    end

    % find the new mean shape
    tmpmeanshape = mean(normalshapes, 3);

    % normalize scale and translation
    meanshapes(:, :, i+1) = normalize(tmpmeanshape);

    % get this iteration's sum of squared differences
    if (i ~= 1)
        SSD(i) = sum(sum((meanshapes(:, :, i-1) - meanshapes(:, :, i)).^2));
    end
end

meanshape = meanshapes(:, :, i+1);

return;

Appendix C: PCA.m

% Finds the PCA distribution matrix and eigenvector matrix
% input:
%   - shapes     the shapes on which to do PCA
%   - meanshape  the mean shape of the passed shapes
%   - eigens     the number of eigenvectors to add to phi
% output:
%   - b          the PCA distribution matrix
%   - phi        the eigenvector matrix of the shape covariance matrix
% authored by: Anthony Dotterer
function [b, phi] = PCA(shapes, meanshape, eigens)

% get the covariance matrix
tmp = zeros(size(shapes, 2)*2);
for i = 1:size(shapes, 3)
    tmp2 = [shapes(1, :, i), shapes(2, :, i)]' ...
         - [meanshape(1, :), meanshape(2, :)]';
    tmp(:, :) = tmp(:, :) + (tmp2 * tmp2');
end
covar = tmp / size(shapes, 3);

% get the eigenvectors and eigenvalues of the covariance matrix
[phi, eigvalues] = eigs(covar, eye(size(covar)), eigens);

% calculate the PCA distributions for each shape
% (the original second operand read meanshape(1, :) twice; the y half is
% corrected here to meanshape(2, :))
b = [];
for i = 1:size(shapes, 3)
    b(:, i) = phi' * ([shapes(1, :, i), shapes(2, :, i)]' ...
                    - [meanshape(1, :), meanshape(2, :)]');
end

return;

Appendix D: normalizewithmean.m

% Normalizes the passed shapes in the following ways:
%   Translation normalization: center of mass at the origin
%   Scale normalization: divide by the square root of the sum of squares
%   Rotation normalization: rotate with respect to the passed mean shape
% input:
%   - shapes      the shapes before normalization
%   - meanshape   the normalized mean shape
% output:
%   - normshapes  the translation, scale, and rotation normalized shapes
% authored by: Anthony Dotterer
function normshapes = normalizewithmean(shapes, meanshape)

% get the dimensions of the shapes
[ncoords, npoints, nshapes] = size(shapes);

% normalize the translation
means = mean(shapes, 2);
for i = 1:nshapes
    normshapes(1, :, i) = shapes(1, :, i) - means(1, i);
    normshapes(2, :, i) = shapes(2, :, i) - means(2, i);
end

% normalize the scale
SqrtSSQ = sqrt(sum(normshapes(1, :, :).^2 + normshapes(2, :, :).^2, 2));
for i = 1:nshapes
    normshapes(:, :, i) = normshapes(:, :, i) / SqrtSSQ(1, 1, i);
end

% normalize rotation (applied to normshapes so the returned shapes are
% actually rotated; the original operated on the un-normalized shapes array)
for i = 1:size(shapes, 3)
    % calculate the rotation
    tmp = normshapes(:, :, i)' * meanshape;
    [V, D, Vt] = svd(tmp * tmp');
    [W, d, Wt] = svd(tmp' * tmp);

    rotation = V * W';

    % apply the rotation to the shape
    normshapes(:, :, i) = normshapes(:, :, i) * rotation;
end

return;

Appendix E: driver.m

% test shell file for reading Bookstein small schizophrenia dataset
% for testing algorithms on Procrustes Analysis
% Bob Collins, Penn State, CSE586 Midterm, Feb 2006
load 'schizo.txt'

% note, landmark point data will be in a 3D matrix
%   first index:  2 coordinates, x and y
%   second index: 13 points per person
%   third index:  28 people; first 14 are "normal", second 14 are schizophrenic
% see comments in "schizo.txt" for more info on the landmarks
schizo = reshape(schizo', [2, 13, 28]);

% plot the normal shapes and the schizo shapes in preshape
showshapes(schizo(:, :, 1:14), 1, [-1, 1, -1, 1]);
showshapes(schizo(:, :, 15:28), 2, [-1, 1, -1, 1]);

% separate the normal from the schizo
normalshapes = schizo(:, :, 1:14);
schizoshapes = schizo(:, :, 15:28);

% do procrustes on normal and schizo brains
[normalmean, normalizeNShapes] = procrustes(normalshapes, 0.1);
[schizomean, normalizeSShapes] = procrustes(schizoshapes, 0.1);

% show the normalized shapes
showshapes(normalizeNShapes, 3, [-1, 1, -1, 1]);
showshapes(normalizeSShapes, 4, [-1, 1, -1, 1]);

% show the means
showshapes(normalmean, 5, [-1, 1, -1, 1]);
showshapes(schizomean, 6, [-1, 1, -1, 1]);

% calculate the PCA of the shapes
[bnormal, phinormal] = PCA(normalizeNShapes, normalmean, 2);
[bschizo, phischizo] = PCA(normalizeSShapes, schizomean, 2);

% find the mean of the normal and the schizo parameter sets
bnormalmean = mean(bnormal, 2);
bschizomean = mean(bschizo, 2);

% show the parameter sets
figure(7);
hold on;
axis([ 0, 3, -.5, 1.5 ]);
plot(bnormal(1, :), bnormal(2, :), 'g*');
plot(bschizo(1, :), bschizo(2, :), 'b+');

plot(bnormalmean(1), bnormalmean(2), 'r*');
plot(bschizomean(1), bschizomean(2), 'r+');
hold off;

% try to score each of the normal brains versus the schizo brains
normalmatches = [];
schizomatches = [];
for i = 1:size(schizo, 3)
    % normalize this shape against each mean
    normnshape = normalizewithmean(schizo(:, :, i), normalmean);
    normsshape = normalizewithmean(schizo(:, :, i), schizomean);

    % calculate the normal and schizo b distributions
    normalbparams = phinormal' * ( [normnshape(1, :), normnshape(2, :)]' ...
                                 - [normalmean(1, :), normalmean(2, :)]' );
    schizobparams = phischizo' * ( [normsshape(1, :), normsshape(2, :)]' ...
                                 - [schizomean(1, :), schizomean(2, :)]' );

    % find the distances of the parameter sets for the normal and schizo shape sets
    normaldistance = sqrt(sum((normalbparams - bnormalmean).^2));
    schizodistance = sqrt(sum((schizobparams - bschizomean).^2));

    % fill the match array with this shape's matching parameters
    if (normaldistance < schizodistance)
        normalmatches(size(normalmatches, 2)+1) = i;
    else
        schizomatches(size(schizomatches, 2)+1) = i;
    end
end

normalmatches
schizomatches
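The two index lists printed at the end can be tallied against the known class boundary (subjects 1 through 14 are normal, 15 through 28 schizophrenic); a small sketch, not part of the original driver:

% Count correct matches per class from the index lists printed above.
truenormal = sum(normalmatches <= 14);   % normal shapes matched as normal
trueschizo = sum(schizomatches >= 15);   % schizophrenic shapes matched as schizophrenic
fprintf('normal shapes correctly matched: %d of 14\n', truenormal);
fprintf('schizophrenic shapes correctly matched: %d of 14\n', trueschizo);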
