Dimensionality Reduction of fMRI Data Using PCA

Elio Andia
ECE-699 Learning from Data
George Mason University
Fairfax, VA, USA

INTRODUCTION

The aim of this project was to apply principal component analysis (PCA) as a dimensionality reduction method to fMRI data in order to improve a classifier's results. The data was collected by Marcel Just and colleagues at the Center for Cognitive Brain Imaging at Carnegie Mellon University. Their experiment tested human subjects on a sentence-picture comparison task while brain activation was measured using fMRI (functional magnetic resonance imaging).

The data set consists of fMRI images, each containing about 5,000 voxels. A voxel is a volume element of a 3D image whose position is defined relative to the other voxels rather than encoded explicitly as coordinates [4]. An fMRI image of a portion of the brain was taken every 500 ms. The data marks 25-30 Regions of Interest, regions of the brain where some activity would be expected [1].

Code is available that provides functions for displaying and analyzing the data. The data is available in .mat files, one per subject. Loading a file defines three variables: Info, Data, and Meta.
Info is a 1x54 structure array that describes the data in terms of its time intervals. It records, among other things, whether the picture or the sentence was displayed first, whether the sentence matches the picture, the subject's reaction time in deciding whether the sentence matches the picture, and other timing information about the experiment.

Data contains the raw observations: fMRI images collected in sequence over a period of time. Each element of the 54x1 cell array holds an NxV array of fMRI activations, with one row per image and one column per voxel.

Meta holds information about the data set as a whole, such as the human subject identifier, the number of images in the data set, the number of trials, and the definitions of the regions of interest.

Below is a picture of one slice of the brain activation for one of the subjects at a specific time and trial (subject 04847, trial 5, time 5).
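The layout of the three variables described above can be sketched in Python with synthetic stand-ins. This is only an illustration: the field names (`cond`, `first_stimulus`, `ntrials`, `nvoxels`), the shrunken sizes, and the helper functions are assumptions made for the sketch, not the names used in the actual .mat files or the provided code.

```python
import numpy as np

# Shrunken, synthetic sizes for illustration (real trials hold ~5,000 voxels).
N_TRIALS, N_IMAGES, N_VOXELS = 54, 6, 10
rng = np.random.default_rng(0)

# Info: one record per trial. Field names are hypothetical.
info = [{"cond": k % 4, "first_stimulus": "P"} for k in range(N_TRIALS)]

# Data: one N x V array per trial (rows are images, columns are voxels).
data = [rng.normal(size=(N_IMAGES, N_VOXELS)) for _ in range(N_TRIALS)]

# Meta: facts about the data set as a whole. Keys are hypothetical.
meta = {"subject": "04847", "ntrials": N_TRIALS, "nvoxels": N_VOXELS}


def activation(data, trial, image, voxel):
    """Activation of one voxel in one image of one trial (0-based indices)."""
    return float(data[trial][image, voxel])


def trial_to_example(trial_array):
    """Flatten an N x V trial into one feature vector of length N*V.

    Assumes every trial has the same N; real trials may differ in length,
    in which case a fixed window of images would have to be chosen first.
    """
    return trial_array.reshape(-1)


# One row per trial: the kind of matrix a classifier or PCA would consume.
examples = np.stack([trial_to_example(t) for t in data])
print(examples.shape)
```

Flattening each trial into a single row is one common way to build classifier inputs; it is assumed here for illustration and is not necessarily how the provided code builds its examples.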
Below is a plot of the brain activation for one of the subjects at a specific time and trial (subject 04847, trial 5, time 5). It displays all of the slices taken by the fMRI machine.

The condition value indicates one of four states for a segment. Zero means the data should be ignored. One indicates a rest interval. Two indicates that the sentence and picture match, and three indicates that a non-matching sentence and picture pair was displayed to the subject.
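The four condition values can be written down as a small enumeration, which makes it easy to drop segments that should be ignored or to pick out one kind of trial. The sketch below follows the meanings given in the text; the names `Condition`, `usable_trials`, and `trials_with` are hypothetical, and the list of condition values is made up for the example.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Condition codes as described in the text."""
    IGNORE = 0    # data should be ignored
    REST = 1      # rest interval
    MATCH = 2     # sentence and picture match
    NO_MATCH = 3  # sentence and picture do not match


def usable_trials(conds):
    """Indices of trials whose condition is anything other than IGNORE."""
    return [i for i, c in enumerate(conds) if Condition(c) is not Condition.IGNORE]


def trials_with(conds, wanted):
    """Indices of trials whose condition equals `wanted`."""
    return [i for i, c in enumerate(conds) if Condition(c) is wanted]


# Made-up condition values for eight trials.
conds = [0, 1, 2, 3, 2, 1, 0, 3]
print(usable_trials(conds))
print(trials_with(conds, Condition.MATCH))
```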
Below is a graph of the activity over time of a single voxel (at location 36,62,8). The red line indicates the condition for each segment, as described in the previous paragraph.

After transforming the data to include only trials 3 and 4, the plot below shows voxel (36,62,8) over those two trials. The third trial, marked by condition 2 (x = 0 to 54), is a sentence-picture combination that matches. It is followed by the fourth trial, in which the picture-sentence combination does not match (condition 3, x = 55 to 108).
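Restricting the data to two trials and following one voxel through them amounts to concatenating that voxel's column from each trial. The sketch below uses synthetic data and assumes 54 images per trial, which is only a guess based on the x ranges quoted above. The column index standing in for voxel (36,62,8) is hypothetical, since the mapping from coordinates to columns is held in the data set's metadata.

```python
import numpy as np


def voxel_series(data, trial_numbers, voxel_col):
    """Concatenate one voxel's activations over the given 1-based trials."""
    return np.concatenate([data[t - 1][:, voxel_col] for t in trial_numbers])


def condition_line(data, conds, trial_numbers):
    """Condition value repeated once per image, aligned with voxel_series."""
    return np.concatenate(
        [np.full(data[t - 1].shape[0], conds[t - 1]) for t in trial_numbers]
    )


# Synthetic stand-in: 5 trials, 54 images each, 8 voxels.
rng = np.random.default_rng(1)
conds = [0, 1, 2, 3, 1]  # trial 3 has condition 2, trial 4 has condition 3
data = [rng.normal(size=(54, 8)) for _ in conds]

series = voxel_series(data, [3, 4], voxel_col=2)
cond_line = condition_line(data, conds, [3, 4])
print(series.shape, cond_line.shape)
```

Plotting `series` and `cond_line` against the same x axis would give a figure of the kind described, with the condition line stepping from 2 to 3 where the fourth trial begins.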
There were various ways to manipulate the data, such as keeping only the n most active voxels in the data set. Below are two plots: the first is a snapshot containing the 2,200 most active voxels, and to its right is a snapshot containing the 400 most active voxels.

A Naive Bayes classifier was provided with the code. Applying it returns an array, predictions, where predictions(k,j) = log P(example_k | class_j). The average data set per subject contained about 5,000 voxels; the dimension was reduced to the 2,200 and to the 400 most active voxels.

In the future I would like to perform PCA on multiple subjects and test the classifier's results across all of them. The main obstacle is the variation in the dimensions of the subjects' brains and of the resulting fMRI images [3].
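The two reduction steps discussed in this report, keeping the n most active voxels and then applying PCA, can be sketched as follows. This is a minimal illustration on synthetic data, not the provided code: "activity" is approximated here by mean absolute activation, which is an assumption, and the function names (`select_most_active`, `pca_fit`, `pca_transform`) are hypothetical. PCA is computed with a singular value decomposition of the mean-centered data.

```python
import numpy as np


def select_most_active(X, n):
    """Keep the n voxels (columns) with the largest mean absolute activation.

    Mean absolute activation is an assumed stand-in for the activity
    measure; the provided code may rank voxels differently.
    """
    score = np.abs(X).mean(axis=0)
    keep = np.sort(np.argsort(score)[-n:])
    return X[:, keep], keep


def pca_fit(X, k):
    """Return column means, top-k principal directions, and variance ratios."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:k], explained[:k]


def pca_transform(X, mean, components):
    """Project the rows of X onto the principal directions."""
    return (X - mean) @ components.T


# Synthetic examples: 40 rows (examples) by 200 columns (voxels).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
X[:, :20] *= 5.0  # make the first 20 synthetic voxels clearly more active

X_active, kept = select_most_active(X, 20)
mean, components, explained = pca_fit(X_active, 5)
Z = pca_transform(X_active, mean, components)
print(X_active.shape, Z.shape)
```

The reduced matrix `Z` has one row per example and one column per retained component, so it could be passed to a classifier in place of the full voxel matrix. For a new subject or a held-out set, the same `mean` and `components` would be reused rather than refit.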
REFERENCES

[1] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman, "Learning to Decode Cognitive States from Brain Images," Machine Learning, vol. 57, no. 1-2, pp. 145-175, October 2004.
[2] X. Wang, R. Hutchinson, and T. M. Mitchell, "Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects," Neural Information Processing Systems 2003, December 2003.
[3] C. Archibald and E. Millar, "Classifying FMRI Images Using Sparse Coding: A Project for CS229," n.d., n. pag. Web.
[4] J. D. Foley, A. van Dam, J. F. Hughes, and S. K. Feiner, "Spatial-partitioning representations; Surface detail," in Computer Graphics: Principles and Practice, The Systems Programming Series, Addison-Wesley, 1990. ISBN 0-201-12110-7. Quoted passage: "These cells are often called voxels (volume elements), in analogy to pixels."
[5] R. Hutchinson, T. M. Mitchell, and I. Rustandi, "Learning to Identify Overlapping and Hidden Cognitive Processes from fMRI Data," submitted to HBM 2005.