Classification of Protein Crystallization Imagery
Xiaoqing Zhu, Shaohua Sun, Samuel Cheng (Stanford University)
Marshall Bern (Palo Alto Research Center)
September 2004, EMBC 04
Outline
- Background
  - X-ray crystallography
  - High-throughput automatic crystallization
  - Related work
- Proposed approach
  - Level-set boundary detection
  - Texture-based feature extraction
  - SVM and decision-tree classifiers
- Experimental results
- Conclusion
Crystallization Imagery, EMBC 2004
Background
- X-ray crystallography
  - Used to determine the 3-D structures of protein molecules
  - The protein crystallization process is highly sensitive to physico-chemical conditions
- High-throughput approach
  - Robotic setup of protein crystallization trials (thousands of trials per day)
  - Crystallization results recorded periodically by digital photography
  - The high volume of data calls for automatic classification
- Long-term goal: automatic data inference for crystallization
  - Prediction of crystallization results based on past experience
  - Recommendation of subsequent experimental conditions
Example Protein Crystallization Results
[Figure: example drop images labeled crystal-negative and crystal-positive]
Related Work
- Edge detection via the Hough transform [Zuk and Ward, 1991]
- Spectral features from the FFT [Jurisica et al., 2001]
- Sobel edge detector; reported accuracy ~75% [Wilson et al., 2002]
- Canny edge detector, least-squares circle fitting, texture and geometric features, and a self-organizing neural network; reported accuracy ~75% [Spraggon et al., 2002]
- Conic curve fitting for boundary detection and spectral analysis for classification; reported error rate ~15% [Cumbaa et al., 2002]
- Line tracking for boundary detection and a manually tuned decision tree for classification; reported error rates ~12% and ~14% [Bern et al., 2003]
Proposed Approach
- Drop boundary detection
  - The dynamic-programming line-tracking algorithm
  - The level-set method
- Feature extraction
  - Local geometric features based on gradient information and the Hough transform
  - Global texture features from the gray-level co-occurrence matrix (GLCM)
  - A combination of both
- Classification algorithms
  - Automatic decision tree with winnowing (C5.0)
  - Support vector machine (SVM-Light)
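The slides do not say how the Hough-transform feature is computed. As a minimal sketch, assuming edge points inside the drop have already been extracted (for example from the gradient magnitude), a standard Hough line accumulator can measure how strongly those points line up along one straight edge, such as a crystal facet. The function names and the normalization (peak votes divided by the number of edge points) are hypothetical, not the authors' definition of their normalized Hough feature.

```python
import math

def hough_peak(points, n_theta=180, rho_step=1.0):
    """Vote every edge point into (theta, rho) bins of the line x*cos(t) + y*sin(t) = rho.

    Returns (votes, theta_degrees, rho) for the most-voted bin.
    """
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho_bin = round((x * math.cos(theta) + y * math.sin(theta)) / rho_step)
            acc[(t, rho_bin)] = acc.get((t, rho_bin), 0) + 1
    (t, rho_bin), votes = max(acc.items(), key=lambda kv: kv[1])
    return votes, 180.0 * t / n_theta, rho_bin * rho_step

def normalized_hough_score(points):
    """Hypothetical normalization: fraction of edge points lying on the strongest line."""
    if not points:
        return 0.0
    votes, _, _ = hough_peak(points)
    return votes / len(points)
```

For ten points on the horizontal line y = 5, the peak collects all ten votes near theta = 90 degrees and the score is 1.0; adding two off-line points lowers it to 10/12.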
System Diagram
Input Images → Drop Boundary Detection → Segmented Images → Feature Extraction → Feature Vectors → Classification → Classification Results
- Feature vectors, e.g. Image 1: [f1, f2, ..., fN]; Image 2: [f1, f2, ..., fN]
- Classification results, e.g. Image 1 = positive; Image 2 = negative
- Additional component: Automatic Feature Selection
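The data flow in the diagram can be sketched as a chain of pluggable stages. This is a minimal illustration only: `run_pipeline` and the toy stages in the usage are hypothetical stand-ins for the level-set segmentation, the geometric and GLCM feature extractors, the feature-selection output, and a trained C5.0 or SVM model.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]   # gray levels, row-major
Mask = List[List[bool]]   # True for pixels inside the detected drop

# Hypothetical glue code; not the authors' implementation.
def run_pipeline(
    images: Sequence[Image],
    segment: Callable[[Image], Mask],
    extractors: Sequence[Callable[[Image, Mask], List[float]]],
    selected: Sequence[int],
    classify: Callable[[List[float]], bool],
) -> List[str]:
    """Segment each image, build [f1, ..., fN], keep the selected features, classify."""
    results = []
    for img in images:
        mask = segment(img)                                         # drop boundary detection
        features = [v for ex in extractors for v in ex(img, mask)]  # concatenated feature vector
        chosen = [features[i] for i in selected]                    # indices kept by feature selection
        results.append("positive" if classify(chosen) else "negative")
    return results
```

Keeping each stage a plain function makes it easy to swap the boundary detector (line tracking vs. level set) or the classifier while holding the rest fixed, which mirrors the comparisons in the experiments.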
Illustration of the Level-set Method
- The level-set method represents the 2-D boundary (the front) as the zero-level set of a 3-D surface, the level-set function.
- The algorithm follows changes in the 2-D boundary by tracking the simpler motion of this surface in 3-D space.
Courtesy of J.A. Sethian, Dept. of Mathematics, UC Berkeley: http://math.berkeley.edu/~sethian/explanations/level_set_explain.html
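A minimal sketch of the idea, assuming the simplest case of a constant outward speed F (the image-driven speed function used for drop detection is not given on the slides): the boundary is the zero level set of a signed-distance function phi, phi is updated with a first-order upwind scheme for phi_t + F|grad phi| = 0, and the boundary is read back as the zero crossing of phi. Grid size, speed and time step are illustrative values.

```python
import math

def init_phi(n, cx, cy, r):
    """Signed distance to a circle of radius r: negative inside, zero on the boundary, positive outside."""
    return [[math.hypot(i - cx, j - cy) - r for j in range(n)] for i in range(n)]

def evolve(phi, speed, dt, steps):
    """Advance phi_t + F*|grad phi| = 0 with a first-order upwind scheme (valid for F > 0).

    Border cells are left unchanged, so the front must stay away from them.
    """
    n = len(phi)
    for _ in range(steps):
        new = [row[:] for row in phi]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                dxm = phi[i][j] - phi[i - 1][j]   # backward difference along rows
                dxp = phi[i + 1][j] - phi[i][j]   # forward difference along rows
                dym = phi[i][j] - phi[i][j - 1]   # backward difference along columns
                dyp = phi[i][j + 1] - phi[i][j]   # forward difference along columns
                grad = math.sqrt(max(dxm, 0.0) ** 2 + min(dxp, 0.0) ** 2 +
                                 max(dym, 0.0) ** 2 + min(dyp, 0.0) ** 2)
                new[i][j] = phi[i][j] - dt * speed * grad
        phi = new
    return phi

def radius_along_row(phi, row, start):
    """Distance from (row, start) to the first zero crossing of phi to its right (linear interpolation)."""
    line = phi[row]
    for j in range(start, len(line) - 1):
        if line[j] < 0 <= line[j + 1]:
            return j - start + line[j] / (line[j] - line[j + 1])
    return None
```

Starting from a circle of radius 8 on a 41 x 41 grid and running 20 steps of dt = 0.25 with F = 1, the zero level set moves out to a radius of about 13 (8 + F * t). The curve itself is never parameterized; only the surface phi is updated.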
Texture Feature Extraction
- Gray-level co-occurrence matrix (GLCM)
  - Joint histogram of the gray levels of pixel pairs with a given spatial relationship
  - Captures the statistics of spatial gray-level variation
- Score functions computed from the GLCM
  - Contrast, correlation, entropy, etc.
  - Mean and variance over several GLCMs give orientation-invariant features [R. Haralick, 1973]
- Only pixels within the drop boundary are considered
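A minimal pure-Python sketch of the steps on this slide, assuming the drop image has already been quantized to `levels` gray levels and that a boolean mask marks the pixels inside the detected drop boundary (a pair is counted only if both pixels are inside). The one-pixel offsets at 0, 45, 90 and 135 degrees, the non-symmetric matrices, and the base-2 logarithm are assumptions, and only three of Haralick's scores are shown.

```python
import math

def glcm(img, mask, dx, dy, levels):
    """Normalized co-occurrence matrix of pixel pairs (r, c) and (r + dy, c + dx), both inside the mask."""
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols and mask[r][c] and mask[r2][c2]:
                P[img[r][c]][img[r2][c2]] += 1
                total += 1
    if total:
        P = [[v / total for v in row] for row in P]
    return P

def haralick(P):
    """Contrast, correlation and entropy (bits) of a normalized GLCM."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in cells))
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0
    entropy = -sum(p * math.log2(p) for _, _, p in cells if p > 0)
    return {"contrast": contrast, "correlation": correlation, "entropy": entropy}

# (dx, dy) offsets for 0, 45, 90 and 135 degrees; rows grow downward, so dy = -1 is "up".
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def texture_features(img, mask, levels):
    """Mean and variance of each score over the four orientations (orientation-invariant)."""
    scores = [haralick(glcm(img, mask, dx, dy, levels)) for dx, dy in OFFSETS]
    out = {}
    for name in scores[0]:
        vals = [s[name] for s in scores]
        mean = sum(vals) / len(vals)
        out[name + "_mean"] = mean
        out[name + "_var"] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out
```

On an image of vertical stripes, horizontal pairs always differ (contrast 1, correlation -1) while vertical pairs never do (contrast 0), which is exactly the directional variation that averaging over orientations smooths out.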
Support Vector Machine (SVM) Classifier
- Non-linear mapping of the input feature vectors into a high-dimensional space
- Construction of an optimal separating hyperplane that divides the transformed vectors
- Implementation: SVM-Light at http://svmlight.joachims.org/
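SVM-Light's own solver is not reproduced here. As a stand-in that shows the same two ideas, the sketch below trains a kernel SVM with the Pegasos stochastic sub-gradient method: the Gaussian (RBF) kernel plays the role of the non-linear mapping, and the hinge-loss objective yields a maximum-margin separating hyperplane in the mapped space. It omits the bias term, and `lam`, `gamma` and the iteration count are illustrative choices, not values from the paper.

```python
import math
import random

def rbf(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2); it corresponds to an implicit non-linear feature map."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def decision(alpha, X, y, x, gamma):
    """Unscaled kernel expansion sum_j alpha_j * y_j * K(x_j, x)."""
    return sum(a * yj * rbf(xj, x, gamma) for a, xj, yj in zip(alpha, X, y) if a)

def train_kernel_svm(X, y, lam=0.01, gamma=5.0, iters=2000, seed=0):
    """Kernelized Pegasos: stochastic sub-gradient steps on the hinge-loss SVM objective (no bias)."""
    rng = random.Random(seed)
    alpha = [0] * len(X)  # number of margin violations recorded for each example
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        if y[i] * decision(alpha, X, y, X[i], gamma) / (lam * t) < 1:
            alpha[i] += 1  # x_i violates the margin, so its weight as a support vector grows
    return alpha

def predict(alpha, X, y, x, gamma=5.0):
    """Class label +1 / -1; the positive 1/(lam*t) scale does not affect the sign."""
    return 1 if decision(alpha, X, y, x, gamma) > 0 else -1
```

On XOR-style data (labels depend on whether the two coordinates agree), no straight line separates the classes in the input space, yet the kernel machine classifies all training points and nearby new points correctly.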
Decision-Tree Classifier
- Automatic calculation of the decision thresholds from training samples
- Boosting: the final classification decision is based on voting among several classifiers
- Winnowing: selection of the most important features based on statistics of the training feature vectors
- Implementation: the C5.0 data mining tool at http://www.rulequest.com/see5-info.html
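C5.0's own boosting and winnowing procedures are not reproduced here. The sketch below illustrates the two generic ideas on this slide with textbook AdaBoost over one-level decision trees (stumps), a swapped-in technique rather than C5.0's algorithm: each stump's threshold is computed automatically from the weighted training samples, and the final label is a weighted vote of the stumps. Labels are assumed to be +1 / -1.

```python
import math

def best_stump(X, y, w):
    """Search every feature and midpoint threshold for the lowest weighted error.

    A stump predicts `pol` when x[f] > thr and `-pol` otherwise.
    Returns (weighted_error, feature, threshold, polarity).
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for thr in cuts:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(f, thr, pol, x):
    return pol if x[f] > thr else -pol

def adaboost(X, y, rounds=10):
    """AdaBoost with decision stumps."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []  # list of (vote_weight, feature, threshold, polarity)
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        vote = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((vote, f, thr, pol))
        if err == 0:
            break  # a perfect stump: further rounds would only repeat it
        # Re-weight: misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-vote * yi * stump_predict(f, thr, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def boosted_predict(ensemble, x):
    """Final decision: sign of the weighted vote of all stumps."""
    score = sum(v * stump_predict(f, thr, pol, x) for v, f, thr, pol in ensemble)
    return 1 if score > 0 else -1
```

On one-dimensional data whose positive class is an interval, the first stump cuts at 1.5; re-weighting then shifts attention to the samples it missed, so the second stump cuts at 3.5 with the opposite polarity.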
Experimental Setup
- Dataset: 520 manually annotated images from the Joint Center for Structural Genomics (JCSG)
- Boundary detection: comparison of the new level-set method with the previous line-tracking algorithm
- Classification
  - Binary classification: crystal-positive vs. crystal-negative
  - 10-fold cross-validation
  - Investigation of texture and geometric features
  - Comparison of SVM-Light vs. the C5.0 classifier
  - Feature selection via C5.0 winnowing
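The slides do not say how the folds were drawn. A minimal sketch of 10-fold cross-validation, assuming plain (unstratified) random folds: every one of the 520 images is held out exactly once, and each prediction comes from a model trained on the other nine folds. `train` and `predict` are hypothetical placeholders for either classifier.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # plain random folds (an assumption; stratification is not stated)
    return [idx[f::k] for f in range(k)]

def cross_validate(X, y, train, predict, k=10, seed=0):
    """Return a prediction for every sample, each made by a model that never saw that sample."""
    preds = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds
```

With 520 images, each fold holds exactly 52 images; the pooled predictions can then be scored against the manual annotations.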
Example Drop Detection Results
[Figure: detected drop boundaries from the line-tracking algorithm and from the level-set method]
- The level-set method reduces the error rate of drop detection by 5-10%
Classification Results: Error Rates
[Figure: bar chart of false-negative (F.N.) and false-positive (F.P.) error rates, 0-40%, for SVM-Light, C5.0, and C5.0 with boosting, each using texture, geometric, and combined features; annotated values: F.N. 14.6%, F.P. 9.6%]
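The slides report false-negative (F.N.) and false-positive (F.P.) rates without defining their denominators. A minimal sketch assuming the usual convention, with crystal-positive as the positive class: the F.N. rate is the fraction of crystal-positive images labeled negative, and the F.P. rate is the fraction of crystal-negative images labeled positive. The labels in the test are made up for illustration, not taken from the experiments.

```python
def error_rates(y_true, y_pred, positive=1):
    """Return (false_negative_rate, false_positive_rate) for binary labels.

    Assumes the usual definitions; the slides do not define the denominators.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    fnr = fn / n_pos if n_pos else 0.0  # missed crystals among all crystal-positive images
    fpr = fp / n_neg if n_neg else 0.0  # false alarms among all crystal-negative images
    return fnr, fpr
```

Reporting the two rates separately matters here because missing a crystal (a false negative) and flagging an empty drop (a false positive) carry different costs in a screening pipeline.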
Selected Features
- Geometric features:
  - F2: Overall directional gradient
  - F4: Normalized Hough transform
  - F5: Curve ratio
  - F7: Gray standard deviation
- Texture features:
  - F2: Contrast
  - F3: Correlation
  - F5: Sum average
  - F9: Entropy
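For reference, the four selected texture scores under their standard Haralick (1973) definitions, where $p(i,j)$ is the normalized GLCM entry for gray levels $i, j \in \{1,\dots,N_g\}$ and $\mu_x, \mu_y, \sigma_x, \sigma_y$ are the means and standard deviations of its row and column marginals. The exact variants used in the paper are not given on the slides, so these are the textbook forms.

```latex
\begin{aligned}
\text{Contrast} &= \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-j)^2\, p(i,j) \\
\text{Correlation} &= \frac{\sum_{i,j} i\,j\,p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y} \\
\text{Sum average} &= \sum_{k=2}^{2N_g} k\, p_{x+y}(k), \qquad p_{x+y}(k) = \sum_{i+j=k} p(i,j) \\
\text{Entropy} &= -\sum_{i,j} p(i,j) \log p(i,j)
\end{aligned}
```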
Conclusions
- Investigated several modern mathematical tools:
  - Level-set method for drop boundary detection
  - Texture features based on the GLCM
  - SVM, and boosting in the C5.0 classifier
  - Automatic feature selection
- Best classification results: C5.0 with boosting, using both feature types
  - False-positive rate: 9.6%; false-negative rate: 14.6%
Thank You