Statistical data integration: Challenges and opportunities

Genevera I. Allen 1,2
1 Departments of Statistics, Computer Science, and Electrical and Computer Engineering, Rice University, TX, USA.
2 Jan and Dan Duncan Neurological Research Institute, Baylor College of Medicine, TX, USA.

Address for correspondence: Genevera I. Allen, Department of Statistics, 6100 Main St. MS 138, Houston, TX 77005, USA. gallen@rice.edu
2017 SAGE Publications

The authors Morris and Baladandayuthapani should be congratulated on this comprehensive and compelling review of statistical contributions in bioinformatics. In the last section, the authors discuss a burgeoning new area of research they term integromics, which involves integrating and jointly analyzing multiple types of omics data. While we prefer the term data integration, we agree that this is an exciting new area of statistical and bioinformatics research. In this piece, we aim to complement Morris and Baladandayuthapani by further discussing data integration from a methodological and practical standpoint. We will specifically outline some of the major challenges of data integration, describe some recent successes, and highlight open areas for future research.

1 Statistical data integration and multi-view data

The term data integration has come to refer to many things in many different fields. The type encountered in bioinformatics is typically integration of what we call multi-view or multi-modal data. Suppose that there are multiple types of omics data profiled on the same set of subjects or samples. The multiple omics platforms offer multiple views of the data or multiple data modes. If data is organized as a typical data matrix where the rows correspond to observations and the columns to features, then multi-view data yields a series of coupled data matrices, each with different features (different columns) but measured on the same observations (shared rows).

In this sense, the task of integrating multi-view data can be thought of as the opposite of meta-analysis: in data integration, common observations are analyzed across different sets of features, whereas in meta-analysis, common features are analyzed across different sets of observations.
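The coupled-matrix layout described in this section can be sketched in a few lines of Python. This is only an illustration: the helper name `couple_views`, the subject IDs and the toy values are assumptions made for the example and do not come from any real omics dataset.

```python
import numpy as np

def couple_views(views):
    """Align several data views on their shared subjects.

    views: dict mapping a view name to (subject_ids, matrix), where
    matrix has one row per entry of subject_ids, in the same order.
    Returns the sorted shared IDs and a dict of row-aligned matrices.
    """
    shared = sorted(set.intersection(*(set(ids) for ids, _ in views.values())))
    coupled = {}
    for name, (ids, X) in views.items():
        row_of = {sid: i for i, sid in enumerate(ids)}
        # Reorder rows so that row k is the same subject in every view.
        coupled[name] = X[[row_of[sid] for sid in shared], :]
    return shared, coupled

# Toy (hypothetical) data: a count-valued view and a view bounded on [0, 1].
rng = np.random.default_rng(0)
views = {
    "expression": (["s1", "s2", "s3", "s4"], rng.poisson(5.0, size=(4, 6))),
    "methylation": (["s3", "s1", "s4", "s5"], rng.uniform(0, 1, size=(4, 3))),
}
shared_ids, coupled = couple_views(views)
```

Each matrix in `coupled` has the same rows (subjects) in the same order but its own set of columns (features), which is the shared-rows, different-columns structure of multi-view data.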

1.1 Why data integration?

First, we pause to address a logical question: Why is data integration necessary and especially, why are new statistical methods needed for data integration? From a scientific perspective, and as Morris and Baladandayuthapani well motivate, integration of multi-view bioinformatics data is critically important for scientists to gain a holistic understanding of biological systems. Statistically, we are accustomed to analyzing multivariate data, but the problem of data integration, especially for omics data, brings new challenges.

First, each single view of multi-view data is typically high-dimensional, meaning that the number of features in each view is larger than the number of observations. Analyzing high-dimensional data of one type is often a challenge, and thus jointly analyzing multi-view data where each data view is high-dimensional is doubly challenging; many high-dimensional statistical techniques cannot be straightforwardly applied to high-dimensional multi-view data.

Second, many examples of data integration problems in bioinformatics consist of multi-view data of mixed types, meaning that each data view consists of variables from different domains (e.g., continuous, count-valued, categorical, skewed continuous, bounded, among others). For example in integrative genomics, genotype data is typically categorical, gene expression as measured via RNA-sequencing is count-valued or non-negative skewed continuous, DNA methylation data is bounded on the interval zero to one, and so forth. Thus, each data view consists of variables of a different type. While many techniques have been developed to model each of these individual data types separately, there are currently few methods that can jointly analyze high-dimensional mixed multi-view data.
2 Data challenges

Before statistical modelling can occur, multi-view omics data pose many data challenges that must first be addressed. As Morris and Baladandayuthapani note, many of these challenges have not been the primary focus of statisticians, but their involvement is important.

First, acquiring and preparing multi-view omics data for statistical data integration can be challenging. For example, with The Cancer Genome Atlas (TCGA) data available from the TCGA data portal (TCGA Research Network, 2011, 2017), different omics types are often represented and stored differently, each requiring specific domain expertise to understand and pre-process the data. Thus, acquiring, formatting and linking the joint subjects to yield coupled data matrices in a unified format conducive to statistical modelling is nontrivial. For TCGA data, we developed the TCGA2STAT R package (Wan et al., 2015) that automatically downloads and wrangles the TCGA data, yielding a series of coupled data frames linked by subjects or genes that are ready for integrated statistical analyses.

Related to this challenge, statisticians or bioinformaticians often specialize in analyzing a few, but not all, types of omics data. This means that in order to pre-process and jointly analyze multi-view omics data, a collaborative team is often needed with members that have expertise in each data type. This further creates challenges for ensuring reproducible research, as different team members often use different pipelines and platforms for processing different omics data. Care must be taken to ensure that all of these processing steps are fully documented and reproducible across platforms and across team members.

Finally, problems such as batch effects and missing data, which can be worrisome for individual datasets, can be further exacerbated when working with multi-view omics data. Batch effect detection and correction methods are designed for a single dataset, and typically with multi-view data, batch effects are removed independently for each data type (Leek et al., 2010). However, the statistical power to detect and remove batch effects could be substantially increased if all views of multi-view data are considered; new statistical techniques for this task are needed. Furthermore, such considerations can inform experimental design of multi-view studies: if batches of observations are different and randomized across data views (i.e., the group of samples comprising a batch in one data view is not grouped together in a batch for another data view), then this can be exploited to improve detection and removal of batch effects from multi-view data.

Next, missing data, or more appropriately, missing views from multi-view data, can be a major problem. Consider TCGA ovarian cancer, for example, where there are n = 592 unique subjects, which includes n = 210 patients with somatic mutation data, n = 296 with RNA-sequencing gene expression, n = 578 with miRNA-array expression, n = 588 with array-CGH copy number variation and n = 572 with methylation array data; only n = 204 patients have complete data views across these omics types (TCGA Research Network, 2011). The pattern of missing data, then, is not at all random; instead, entire data views are completely missing for many subjects. If one only uses the complete cases across all data views, then the statistical power is severely limited.
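The gap between the number of unique subjects and the number of complete cases can be tallied with plain set operations. The sketch below uses hypothetical patient IDs and view names chosen purely for illustration; it does not reproduce the TCGA counts quoted above.

```python
def view_coverage(subjects_by_view):
    """Summarize which subjects are present in which views.

    subjects_by_view: dict mapping a view name to a set of subject IDs.
    Returns (all subjects, complete cases, views available per subject).
    """
    all_subjects = set().union(*subjects_by_view.values())
    complete = set.intersection(*(set(ids) for ids in subjects_by_view.values()))
    per_subject = {
        s: sorted(v for v, ids in subjects_by_view.items() if s in ids)
        for s in sorted(all_subjects)
    }
    return all_subjects, complete, per_subject

# Hypothetical example: four patients, three views, whole views missing.
subjects_by_view = {
    "mutation": {"p1", "p2"},
    "expression": {"p1", "p2", "p3"},
    "methylation": {"p1", "p3", "p4"},
}
all_subjects, complete, per_subject = view_coverage(subjects_by_view)
print(len(all_subjects), "subjects,", len(complete), "complete case(s)")
```

In this toy layout only one of four subjects has every view, even though each subject is observed in at least one view, which mirrors the block-wise missingness pattern described in the text.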
At the same time, it is ill-advised to simply impute an entire data view that is missing for a given subject. Thus, handling missing views is a wide-open area of statistical research that is of great practical importance.

3 Statistical modelling challenges

Integrating multi-view omics data poses many statistical modelling challenges and has become a ripe area for statistical research. If one's goal is to use multi-view data to predict an outcome for each of the observations or subjects, then there are several readily available approaches. First, some existing machine learning methods such as random forests or deep learning are adept at handling data of mixed types; these methods, however, are not ideally suited to high-dimensional data and hence may not be the best choice for multi-view omics data. Second, one could build a completely independent predictive model for each data view and then use ensemble learning techniques to combine the predictions from each data view. Finally, one could use feature learning techniques such as feature selection, dimension reduction or pattern recognition to learn features for each data view separately; a joint predictive model can then be fit to the learned features from all the data views. Thus, for prediction with multi-view data, several possible approaches exist, although there is certainly room for further research and methods development in this area.
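The second approach, one model per view followed by an ensemble step, might look like the following minimal sketch. Ridge regression and simple averaging are stand-ins chosen for brevity, not methods prescribed by the text; the function names, the penalty `lam` and the simulated data are all assumptions of this example.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Ridge coefficients via the dual form X^T (X X^T + lam I)^{-1} y.

    Only an n x n system is solved, which is convenient when the
    number of features exceeds the number of observations.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

def late_fusion_predict(train_views, y, test_views, lam=1.0):
    """Fit one ridge model per view, then average the per-view predictions."""
    preds = []
    for name in train_views:
        Xtr, Xte = train_views[name], test_views[name]
        mu, y_mean = Xtr.mean(axis=0), y.mean()
        beta = fit_ridge(Xtr - mu, y - y_mean, lam)
        preds.append((Xte - mu) @ beta + y_mean)
    per_view = np.column_stack(preds)
    return per_view, per_view.mean(axis=1)

# Simulated (hypothetical) data: two high-dimensional views driven by one latent signal z.
rng = np.random.default_rng(1)
z = rng.normal(size=50)
view_a = np.outer(z, rng.normal(size=100)) + rng.normal(size=(50, 100))
view_b = np.outer(z, rng.normal(size=60)) + rng.normal(size=(50, 60))
y = 2.0 * z + rng.normal(scale=0.1, size=50)
train = {"a": view_a[:40], "b": view_b[:40]}
test = {"a": view_a[40:], "b": view_b[40:]}
per_view, fused = late_fusion_predict(train, y[:40], test)
```

Averaging is the simplest possible combiner; a weighted or stacked combiner could be substituted without changing the per-view fitting step.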

Beyond prediction, one's goal is typically to explore the data to make data-driven discoveries that generate new hypotheses. Techniques for this include exploratory data visualization, dimension reduction, pattern recognition, clustering, feature selection and network structure learning, among many others. With multi-view data and especially mixed multi-view data, existing techniques for data-driven discovery cannot be applied in a straightforward manner, and new statistical approaches need to be developed.

Recently, there has been a flurry of data integration techniques proposed for dimension reduction based on canonical correlation analysis (Rossouw et al., 2008; Witten and Tibshirani, 2009), coupled matrix factorizations (Acar et al., 2011; Lock et al., 2013), multi-step principal components analysis (PCA) methods (Di et al., 2009), and the generalized singular value decomposition (SVD) (Van Loan, 1976; Alter and Golub, 2004). These methods offer important advances for multi-view data, but they typically assume that all data views consist of the same types of variables (e.g., all continuous data). Hence, these are not ideally suited for mixed multi-view data. One could use these integrated dimension reduction methods in a latent variable or hierarchical model to capture different types of variables in each data view, but such models may not capture a full range of dependencies and can be computationally more demanding.

Another area of recent success in data integration methodology includes methods for integrative clustering (Shen et al., 2009; Lock and Dunson, 2013). As with the dimension reduction methods, these techniques typically use latent variable or hierarchical models to capture mixed types of data and hence may present some of the same caveats. An open area of research is to develop clustering or dimension reduction techniques that can more directly model mixed multi-view data.
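To make the idea of a shared low-dimensional representation concrete, here is a deliberately simple stand-in: center and scale each view, concatenate the views column-wise, and take a truncated SVD so that all views share one set of subject scores. This is not JIVE, the generalized SVD, or any of the cited methods, and it assumes all views are continuous, which is exactly the limitation noted above. The function name and simulated data are assumptions of the example.

```python
import numpy as np

def joint_scores(views, rank):
    """Shared scores and per-view loadings from an SVD of concatenated views.

    views: list of (n x p_k) arrays measured on the same n subjects.
    Each view is column-centered and scaled to unit Frobenius norm so
    that no single view dominates because of its size or units.
    """
    blocks = []
    for X in views:
        Xc = X - X.mean(axis=0)
        norm = np.linalg.norm(Xc)
        blocks.append(Xc / norm if norm > 0 else Xc)
    Z = np.hstack(blocks)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :rank] * s[:rank]                 # n x rank, shared by all views
    cuts = np.cumsum([b.shape[1] for b in blocks])[:-1]
    loadings = np.split(Vt[:rank].T, cuts, axis=0)  # one (p_k x rank) block per view
    return scores, loadings, blocks

# Simulated (hypothetical) views that share an exact rank-2 row structure.
rng = np.random.default_rng(3)
F = rng.normal(size=(20, 2))
X1 = F @ rng.normal(size=(2, 30))
X2 = F @ rng.normal(size=(2, 12))
scores, loadings, blocks = joint_scores([X1, X2], rank=2)
```

Because the simulated views share an exact rank-2 structure, `scores @ loadings[k].T` reproduces each scaled view; with real, noisy data the reconstruction would only be approximate.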
3.1 Integration via mixed graphical models

One example where new statistical methods have recently been developed that directly model mixed multi-view data is that of graphical models. Graphical models, when applied to bioinformatics data, typically assume that each gene, miRNA, CpG site or other biomarker is a node in the network; graphical models then seek to model and estimate relationships between different biomarkers and represent these as a network where an edge between two biomarkers denotes a form of dependence between them. Recently, Yang et al. (2012, 2015) proposed to build graphical models by assuming that every variable conditional on all others arises from a univariate exponential family distribution. This then leads to a joint graphical model distribution that is suitable for data from a variety of domains (e.g., Poisson or negative binomial graphical models for count-valued data such as from next generation sequencing) and that greatly extends the class of graphical models beyond the typical examples of Ising or Gaussian graphical models, which are special cases.

To yield graphical models that are appropriate for mixed multi-view data, Yang et al. (2014a) and Chen et al. (2015) proposed to build graphical models by assuming all conditional distributions arise from potentially different exponential families. While this idea is appealing, Yang et al. (2014a) and Chen et al. (2015) also show that the types of dependencies between variables of different types are severely limited, making this model impractical. In another line of work, Lauritzen and Wermuth (1989) and Lauritzen (1996) proposed chain graphical models consisting of a Gaussian graphical model conditional on a discrete (Ising) model; Lee and Hastie (2015) and Cheng et al. (2016) later considered structural graph estimation in the high-dimensional case. These instances, however, are only appropriate for integrating continuous and discrete-valued variables and thus, they are not suitable for count-valued data such as with next generation sequencing.

Most recently, Yang et al. (2014b) proposed to combine the concept of chain graphical models and graphical models via exponential families to yield mixed chain graphical models. These models assume that groups of variables form a chain graph and that all relevant conditional distributions arise from potentially different exponential families. Interestingly, by conditioning on other groups of variables through chain graphs, Yang et al. (2014b) show that this class of models permits a wide and flexible range of dependencies between variables of different types. Furthermore, the chaining of groups of variables is a particularly relevant assumption for omics data where, for example, we know that mutations influence gene expression but gene expression does not influence mutations. Hence, we could assume that mutations point to gene expression variables in the mixed chain graphical model. As Figure 1 in Morris and Baladandayuthapani nicely illustrates, this chaining or directionality assumption is known from the underlying biology for integrative analyses of biomedical data; hence, mixed chain graphical models could be a particularly relevant tool for modelling mixed multi-view omics data. Related to these models, however, there is still much room for further research to yield a practical tool that can be applied to large-scale integrative analyses.
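For reference, the exponential family construction of Yang et al. (2012, 2015) described above can be written out in its pairwise form. The notation here (sufficient statistic B, base measure C, parameters θ, neighbourhood N(s), edge set E, log-normalizers A and D-bar) is a sketch chosen for this note rather than a quotation from those papers.

```latex
% Node-conditional distribution: each variable, given all the others,
% follows a univariate exponential family whose natural parameter is
% linear in the sufficient statistics of its neighbours.
P(X_s \mid X_{V \setminus s})
  = \exp\Big\{ \Big( \theta_s + \sum_{t \in N(s)} \theta_{st}\, B(X_t) \Big) B(X_s)
      + C(X_s) - \bar{D}(X_{V \setminus s}) \Big\}

% Corresponding joint distribution over the graph G = (V, E):
P(X) = \exp\Big\{ \sum_{s \in V} \theta_s B(X_s)
      + \sum_{(s,t) \in E} \theta_{st}\, B(X_s) B(X_t)
      + \sum_{s \in V} C(X_s) - A(\theta) \Big\}
```

Taking B(x) = x with C(x) = 0 on binary variables gives the Ising model, while B(x) = x with C(x) = -log(x!) on counts gives a Poisson graphical model; in the mixed setting, B and C are allowed to differ from node to node.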
Some examples of open areas include developing methods to better fit the models and learn the graph structure in high-dimensional settings, methods to test the model's parametric assumptions or even use semi-parametric approaches as in Yang et al. (2014c), and finally methods to assess the model fit or model uncertainty. Overall, mixed chain graphical models yield an exciting approach to directly integrating mixed multi-view data that could be used to make many discoveries about how biomarkers of different types are related.

4 Discussion

In summary, mixed multi-view data found in bioinformatics offer a host of opportunities for new statistical research. The specific challenges that we have outlined with these data are likely to yield a whole new sub-field of high-dimensional statistics that will spur a flurry of research over the next decade. As new statistical techniques that allow scientists to explore their data holistically are developed, statisticians are poised to lead the way with data-driven scientific discoveries.

Acknowledgements

The author acknowledges support from NSF DMS and NSF DMS

References

Acar E, Kolda TG and Dunlavy DM (2011) All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint.
Alter O and Golub GH (2004) Integrative analysis of genome-scale data by using pseudoinverse projection predicts novel correlation between DNA replication and RNA transcription. Proceedings of the National Academy of Sciences of the United States of America, 101.
Chen S, Witten DM and Shojaie A (2015) Selection and estimation for mixed graphical models. Biometrika, 102, 47.
Cheng J, Li T, Levina E and Zhu J (2016) High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics (to appear).
Di C-Z, Crainiceanu CM, Caffo BS and Punjabi NM (2009) Multilevel functional principal component analysis. The Annals of Applied Statistics, 3, 458.
Lauritzen SL (1996) Graphical Models, volume 17. New York: Clarendon Press.
Lauritzen SL and Wermuth N (1989) Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, 17.
Lee JD and Hastie TJ (2015) Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics, 24.
Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K and Irizarry RA (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11.
Lock EF and Dunson DB (2013) Bayesian consensus clustering. Bioinformatics, page btt425.
Lock EF, Hoadley KA, Marron JS and Nobel AB (2013) Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics, 7, 523.
TCGA Research Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature, 474.
TCGA Research Network (2017) The Cancer Genome Atlas. URL nih.gov/ (last accessed 21 April 2017).
Rossouw D, Robert-Granié C, Besse P et al. (2008) A sparse PLS for variable selection when integrating omics data. Genetics and Molecular Biology, 7, 35.
Shen R, Olshen AB and Ladanyi M (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 25.
Van Loan CF (1976) Generalizing the singular value decomposition. SIAM Journal on Numerical Analysis, 13.
Wan Y-W, Allen GI and Liu Z (2015) TCGA2STAT: Simple TCGA data access for integrated statistical analysis in R. Bioinformatics, page btv677.
Witten DM and Tibshirani RJ (2009) Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8.
Yang E, Allen G, Liu Z and Ravikumar PK (2012) Graphical models via generalized linear models. Advances in Neural Information Processing Systems, 25.
Yang E, Baker Y, Ravikumar P, Allen GI and Liu Z (2014a) Mixed graphical models via exponential families. In International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W & CP, 33.
Yang E, Ravikumar P, Allen GI, Baker Y, Wan Y-W and Liu Z (2014b) A general framework for mixed graphical models. arXiv preprint.
Yang E, Ravikumar P, Allen GI and Liu Z (2015) Graphical models via univariate exponential family distributions. Journal of Machine Learning Research, 16.
Yang Z, Ning Y and Liu H (2014c) On semiparametric exponential family graphical models. arXiv preprint.


Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization 10 th World Congress on Structural and Multidisciplinary Optimization May 19-24, 2013, Orlando, Florida, USA Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization Sirisha Rangavajhala

More information

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

Bagging for One-Class Learning

Bagging for One-Class Learning Bagging for One-Class Learning David Kamm December 13, 2008 1 Introduction Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one

More information

Research Article Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis and Iterative Local Least Squares

Research Article Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis and Iterative Local Least Squares Mathematical Problems in Engineering, Article ID 162938, 5 pages http://dxdoiorg/101155/2013/162938 Research Article Missing Value Estimation for Microarray Data by Bayesian Principal Component Analysis

More information

SAS High-Performance Analytics Products

SAS High-Performance Analytics Products Fact Sheet What do SAS High-Performance Analytics products do? With high-performance analytics products from SAS, you can develop and process models that use huge amounts of diverse data. These products

More information

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1 Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1,2 Keyue Ding, Ph.D. Nov. 8, 2014 1 NCIC Clinical Trials Group, Kingston, Ontario, Canada 2 Dept. Public

More information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Mustafa Berkay Yilmaz, Hakan Erdogan, Mustafa Unel Sabanci University, Faculty of Engineering and Natural

More information

Enumerating the decomposable neighbours of a decomposable graph under a simple perturbation scheme

Enumerating the decomposable neighbours of a decomposable graph under a simple perturbation scheme Enumerating the decomposable neighbours of a decomposable graph under a simple perturbation scheme Alun Thomas Department of Biomedical Informatics University of Utah Peter J Green Department of Mathematics

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART

ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART ECONOMIC DESIGN OF STATISTICAL PROCESS CONTROL USING PRINCIPAL COMPONENTS ANALYSIS AND THE SIMPLICIAL DEPTH RANK CONTROL CHART Vadhana Jayathavaj Rangsit University, Thailand vadhana.j@rsu.ac.th Adisak

More information

Class Prediction Methods Applied to Microarray Data for Classification

Class Prediction Methods Applied to Microarray Data for Classification Class Prediction Methods Applied to Microarray Data for Classification Fatima.S. Shukir The Department of Statistic, Iraqi Commission for Planning and Follow up Directorate Computers and Informatics (ICCI),

More information

Multiresponse Sparse Regression with Application to Multidimensional Scaling

Multiresponse Sparse Regression with Application to Multidimensional Scaling Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,

More information

Contents. Introduction 2. PerturbationClustering 2. SubtypingOmicsData 5. References 8

Contents. Introduction 2. PerturbationClustering 2. SubtypingOmicsData 5. References 8 PINSPlus: Clustering Algorithm for Data Integration and Disease Subtyping Hung Nguyen, Sangam Shrestha, and Tin Nguyen Department of Computer Science and Engineering University of Nevada, Reno, NV 89557

More information

RADIOMICS: potential role in the clinics and challenges

RADIOMICS: potential role in the clinics and challenges 27 giugno 2018 Dipartimento di Fisica Università degli Studi di Milano RADIOMICS: potential role in the clinics and challenges Dr. Francesca Botta Medical Physicist Istituto Europeo di Oncologia (Milano)

More information

srna Detection Results

srna Detection Results srna Detection Results Summary: This tutorial explains how to work with the output obtained from the srna Detection module of Oasis. srna detection is the first analysis module of Oasis, and it examines

More information

Effectiveness of Sparse Features: An Application of Sparse PCA

Effectiveness of Sparse Features: An Application of Sparse PCA 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

/ Computational Genomics. Normalization

/ Computational Genomics. Normalization 10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

ROTS: Reproducibility Optimized Test Statistic

ROTS: Reproducibility Optimized Test Statistic ROTS: Reproducibility Optimized Test Statistic Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo fatsey (at) utu.fi March 3, 2016 Contents 1 Introduction 2 2 Algorithm overview 3 3 Input data 3 4 Preprocessing

More information

Introduction to GE Microarray data analysis Practical Course MolBio 2012

Introduction to GE Microarray data analysis Practical Course MolBio 2012 Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Gradient of the lower bound

Gradient of the lower bound Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level

More information

IJCAI Dept. of Information Engineering

IJCAI Dept. of Information Engineering IJCAI 2007 Wei Liu,Xiaoou Tang, and JianzhuangLiu Dept. of Information Engineering TheChinese University of Hong Kong Outline What is sketch-based facial photo hallucination Related Works Our Approach

More information

A novel approach to motion tracking with wearable sensors based on Probabilistic Graphical Models

A novel approach to motion tracking with wearable sensors based on Probabilistic Graphical Models A novel approach to motion tracking with wearable sensors based on Probabilistic Graphical Models Emanuele Ruffaldi Lorenzo Peppoloni Alessandro Filippeschi Carlo Alberto Avizzano 2014 IEEE International

More information

TieDIE Tutorial. Version 1.0. Evan Paull

TieDIE Tutorial. Version 1.0. Evan Paull TieDIE Tutorial Version 1.0 Evan Paull June 9, 2013 Contents A Signaling Pathway Example 2 Introduction............................................ 2 TieDIE Input Format......................................

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

Relative Constraints as Features

Relative Constraints as Features Relative Constraints as Features Piotr Lasek 1 and Krzysztof Lasek 2 1 Chair of Computer Science, University of Rzeszow, ul. Prof. Pigonia 1, 35-510 Rzeszow, Poland, lasek@ur.edu.pl 2 Institute of Computer

More information

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS

More information

PROMO 2017a - Tutorial

PROMO 2017a - Tutorial PROMO 2017a - Tutorial Introduction... 2 Installing PROMO... 2 Step 1 - Importing data... 2 Step 2 - Preprocessing... 6 Step 3 Data Exploration... 9 Step 4 Clustering... 13 Step 5 Analysis of sample clusters...

More information

The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy

The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy Sokratis K. Makrogiannis, PhD From post-doctoral research at SBIA lab, Department of Radiology,

More information

A Survey of Statistical Models to Infer Consensus 3D Chromosomal Structure from Hi-C data

A Survey of Statistical Models to Infer Consensus 3D Chromosomal Structure from Hi-C data A Survey of Statistical Models to Infer Consensus 3D Chromosomal Structure from Hi-C data MEDHA UPPALA, University of California Los Angeles The spatial organization of the genomic material leads to interactions

More information

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

More information

We deliver Global Engineering Solutions. Efficiently. This page contains no technical data Subject to the EAR or the ITAR

We deliver Global Engineering Solutions. Efficiently. This page contains no technical data Subject to the EAR or the ITAR Numerical Computation, Statistical analysis and Visualization Using MATLAB and Tools Authors: Jamuna Konda, Jyothi Bonthu, Harpitha Joginipally Infotech Enterprises Ltd, Hyderabad, India August 8, 2013

More information

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani

More information

Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix

Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix Carlos Ordonez, Yiqun Zhang Department of Computer Science, University of Houston, USA Abstract. We study the serial and parallel

More information

Microarray Data Analysis with PCA in a DBMS

Microarray Data Analysis with PCA in a DBMS Microarray Data Analysis with PCA in a DBMS Waree Rinsurongkawong University of Houston Department of Computer Science Houston, TX 7724, USA Carlos Ordonez University of Houston Department of Computer

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Applied Bioinformatics

Applied Bioinformatics Applied Bioinformatics Course Overview & Introduction to Linux Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu What is bioinformatics Bio Bioinformatics

More information

Practical Guidance for Machine Learning Applications

Practical Guidance for Machine Learning Applications Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering

More information

Package EBglmnet. January 30, 2016

Package EBglmnet. January 30, 2016 Type Package Package EBglmnet January 30, 2016 Title Empirical Bayesian Lasso and Elastic Net Methods for Generalized Linear Models Version 4.1 Date 2016-01-15 Author Anhui Huang, Dianting Liu Maintainer

More information

A New Meta-heuristic Bat Inspired Classification Approach for Microarray Data

A New Meta-heuristic Bat Inspired Classification Approach for Microarray Data Available online at www.sciencedirect.com Procedia Technology 4 (2012 ) 802 806 C3IT-2012 A New Meta-heuristic Bat Inspired Classification Approach for Microarray Data Sashikala Mishra a, Kailash Shaw

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Correlation Motif Vignette

Correlation Motif Vignette Correlation Motif Vignette Hongkai Ji, Yingying Wei October 30, 2018 1 Introduction The standard algorithms for detecting differential genes from microarray data are mostly designed for analyzing a single

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems

Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems J o s e p h G e r a c i, M o y e z D h a r s e e, P a u l o N u i n, A l e x a n d r

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Graphs,EDA and Computational Biology. Robert Gentleman

Graphs,EDA and Computational Biology. Robert Gentleman Graphs,EDA and Computational Biology Robert Gentleman rgentlem@hsph.harvard.edu www.bioconductor.org Outline General comments Software Biology EDA Bipartite Graphs and Affiliation Networks PPI and transcription

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Robust Principal Component Analysis (RPCA)

Robust Principal Component Analysis (RPCA) Robust Principal Component Analysis (RPCA) & Matrix decomposition: into low-rank and sparse components Zhenfang Hu 2010.4.1 reference [1] Chandrasekharan, V., Sanghavi, S., Parillo, P., Wilsky, A.: Ranksparsity

More information

Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions

Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions 2016 IEEE International Conference on Big Data (Big Data) Predicting Rare Failure Events using Classification Trees on Large Scale Manufacturing Data with Complex Interactions Jeff Hebert, Texas Instruments

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

PROBLEM FORMULATION AND RESEARCH METHODOLOGY

PROBLEM FORMULATION AND RESEARCH METHODOLOGY PROBLEM FORMULATION AND RESEARCH METHODOLOGY ON THE SOFT COMPUTING BASED APPROACHES FOR OBJECT DETECTION AND TRACKING IN VIDEOS CHAPTER 3 PROBLEM FORMULATION AND RESEARCH METHODOLOGY The foregoing chapter

More information

Integrating Logistic Regression with Knowledge Discovery Systems

Integrating Logistic Regression with Knowledge Discovery Systems Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1997 Proceedings Americas Conference on Information Systems (AMCIS) 8-15-1997 Integrating Logistic Regression with Knowledge Discovery

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Learning Orthographic Transformations for Object Recognition

Learning Orthographic Transformations for Object Recognition Learning Orthographic Transformations for Object Recognition George Bebis,Michael Georgiopoulos,and Sanjiv Bhatia, Department of Mathematics & Computer Science, University of Missouri-St Louis, St Louis,

More information