DI TRANSFORM MVstats™ Algorithm Overview
July 2, 2015

Summary

The DI Transform Multivariate Statistics (MVstats™) package includes five algorithm options that operate on most types of geologic, geophysical, and engineering data. Three of them are classification tools that use different machine learning algorithms to sort data into clusters based on similarity and return a class assignment for each data point. These options are unsupervised, supervised, and hierarchical classification. All three are used to identify features, explore their properties, and, if the input data has a spatial component, determine where they occur.

The MVstats™ package also includes two predictive multivariate regression tools: linear regression and nonlinear regression. The regression analyses identify relationships between predictor variables and a response variable to construct a model that you can use to predict the value of the response variable where it is unknown. In addition, the nonlinear regression model has applications beyond basic variable prediction because it includes simulation tools that let you run what-if queries against the model. Both regression tools have out-of-sample model validation features that make it easy to assess the accuracy of the model.

Finally, any MVstats™ algorithm produces high-quality results faster if you run the outlier analysis and multicollinearity analysis data preparation tools before building a model. Both tools are part of the MVstats™ package.

Classification Methods

Classification is used to explore data and identify features. With geophysical data, you can use classification to identify facies in a volume from seismic attributes. The same approach works with well logs for facies identification in a vertical profile.
You might also use classification to analyze large volumes of completions and production information to identify the most effective completion design. When a classification model is applied, a class assignment is determined for each data point, which could be a well, a measured depth in a well log, or a location within a seismic volume.

Unsupervised Classification

The unsupervised classification tool does not require training data and is often the best option for exploring large datasets because the algorithm operates efficiently on the raw data. Unsupervised classification uses k-means [1] clustering, which partitions the data into a specified number of mutually exclusive clusters. These clusters are optimized so that the data points within each cluster are as close to one another as possible and as far as possible from data points in other clusters. Each cluster is represented by a centroid, and a centroid value is reported for each input variable. The centroid values describe the properties of each cluster. These values are calculated at the location within the cluster where the sum of distances from all data points is minimized.

Hierarchical Classification

The hierarchical classification algorithm [2] identifies classes that have a genetic relationship to one another. An advantage of this approach is control: you can direct the model to search for smaller, more nuanced classes contained within a larger group. The algorithm starts with a single originating class that is subdivided into child classes. These can then be further subdivided to form a tree. Child classes of the same parent are more similar to each other than to child classes of a different parent. The lowest-level classes, those that are children but not parents, are the ones defined in the final model. Hierarchical classification is sensitive to outliers, so it is important to perform Outlier Analysis before modeling.

[1] A technical description of the k-means algorithm: Ding, C. and He, X. (2004). K-means Clustering via Principal Component Analysis. Proceedings of the 21st International Conference on Machine Learning. Banff, Canada.
[2] Additional algorithm details: Luo, F., Khan, L., Bastani, F., Yen, I., and Zhou, J. (2004). A dynamically growing self-organizing tree for hierarchical clustering gene expression profiles. Bioinformatics Advance Access.

Supervised Classification

A training dataset is required to perform supervised classification, which is also known as discriminant analysis.
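As a minimal sketch of the idea, the Python below trains a simplified Gaussian discriminant on a labeled set of log readings. It summarizes each class by the mean and variance of each log, then assigns a reading to the class under which it is most likely. This is an illustrative stand-in, not DI Transform's implementation: the facies names, log values, and function names are hypothetical, the logs are treated as independent within a class, and the classes are assumed equally likely.

```python
import math
from statistics import mean, pvariance

# Hypothetical training data: (facies label, gamma ray in API units, resistivity in ohm-m).
TRAINING = [
    ("sand", 35.0, 30.0), ("sand", 42.0, 25.0), ("sand", 38.0, 34.0), ("sand", 47.0, 22.0),
    ("shale", 110.0, 3.0), ("shale", 125.0, 4.0), ("shale", 102.0, 2.5), ("shale", 118.0, 5.0),
]


def fit(samples):
    """Summarize each class by the (mean, variance) of every log."""
    by_class = {}
    for label, *logs in samples:
        by_class.setdefault(label, []).append(logs)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple of readings per log
        model[label] = [(mean(col), pvariance(col)) for col in columns]
    return model


def log_likelihood(params, logs):
    """Gaussian log-likelihood of one reading, up to a constant shared by all classes."""
    return sum(
        -0.5 * math.log(var) - (value - mu) ** 2 / (2.0 * var)
        for (mu, var), value in zip(params, logs)
    )


def classify(model, logs):
    """Return the class whose parameters best match the reading."""
    return max(model, key=lambda label: log_likelihood(model[label], logs))


model = fit(TRAINING)
predicted = [classify(model, logs) for _, *logs in TRAINING]
# Count the depths where the modeled class differs from the supplied label.
mismatches = sum(p != label for p, (label, *_) in zip(predicted, TRAINING))
print(f"{mismatches} of {len(TRAINING)} depths differ from the training labels")
```

Comparing the modeled classes with the supplied labels, as the last lines do, is the same kind of check that a table of differences between an original and a modeled facies log provides.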
Currently, DI Transform supports only facies logs for training a supervised model, which limits the tool to well log analysis. In addition to a facies log, you must supply a set of standard well logs (for example, gamma ray and resistivity). These logs are analyzed to describe each facies class, with the ultimate goal of producing a model that can identify facies from a set of standard well logs alone. If a facies log is available, supervised classification is a powerful tool for well log classification because the model sees the answer and is allowed to work backwards from the desired results.

Supervised classification is accomplished in four steps. First, the facies log supplies the model with a class assignment for every measured depth. Then, discriminant analysis is performed on the data within each class to produce characteristic parameters describing the class. Next, the tool examines the standard well log values at every measured depth and assigns the class whose characteristic parameters most closely match the data. Finally, differences between the original facies log and the modeled facies are reported in a table and can be examined visually in a side-by-side comparison of the logs. These differences are a signal that additional information is needed to distinguish the facies of interest.

Predictive Methods

Regression models analyze data collected in the past to identify relationships that can be applied in the future or used to fill gaps in data. A geologist might use a regression model to predict porosity or pore pressure from well logs. An engineer might use a regression model to predict production from completions parameters and geologic characteristics.

DI Transform offers linear and nonlinear regression modeling tools. With both approaches, relationships between multiple independent predictor variables and a single dependent response variable are identified and combined linearly to produce a model that predicts the response variable. Both models search for the combination of regression coefficients, applied to the predictor variables, that minimizes the error between the model's prediction of the response variable and the actual value. The major difference between the two methods is the shape that the relationships between predictor and response variables are allowed to take. With linear regression, relationships must be linear; with nonlinear regression, relationships can be more complex.

Out-of-sample validation tools are offered for both linear and nonlinear regression. These tools withhold a portion of the available regression data, build a model with the remainder, and compare the model's prediction of the withheld data to the actual values. The N folds tool divides the regression data into N portions and then performs the out-of-sample analysis N times, once with each fold withheld. The leave-one-out method withholds a single regression sample, and the out-of-sample analysis is performed as many times as the user specifies. The average absolute error and error standard deviation of the out-of-sample analyses are reported for both methods.

Linear Principal Components Regression Analysis

DI Transform linear regression harnesses the power of principal components analysis (PCA). The advantage of this approach is that results are not negatively affected when redundant variables are included in a model. This makes it a good option for well log analysis, where certain logs might track one another within different materials.
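A rough sketch of principal components regression with N folds validation follows. It assumes just two predictor logs so that the first principal component can be found in closed form, regresses the response on that single component, and reports the average absolute error and error standard deviation of the withheld samples. All data values and function names are hypothetical, and the code is a simplified illustration rather than DI Transform's implementation.

```python
import math
from statistics import mean, pstdev

# Hypothetical samples: (log A, log B, porosity). Log B closely tracks log A,
# so the two predictors carry largely redundant information.
SAMPLES = [
    (40.0, 21.0, 0.225), (55.0, 27.0, 0.188), (60.0, 31.0, 0.182), (75.0, 37.0, 0.149),
    (80.0, 41.0, 0.143), (95.0, 47.0, 0.108), (110.0, 56.0, 0.079), (120.0, 59.0, 0.062),
]


def fit_pcr(train):
    """Fit a one-component principal components regression; return a predictor function."""
    a, b, y = (list(col) for col in zip(*train))
    mu_a, sd_a, mu_b, sd_b, mu_y = mean(a), pstdev(a), mean(b), pstdev(b), mean(y)
    za = [(v - mu_a) / sd_a for v in a]
    zb = [(v - mu_b) / sd_b for v in b]
    # Direction of greatest variance of the standardized 2-D cloud (first principal component).
    saa = sum(v * v for v in za)
    sbb = sum(v * v for v in zb)
    sab = sum(p * q for p, q in zip(za, zb))
    theta = 0.5 * math.atan2(2.0 * sab, saa - sbb)
    wa, wb = math.cos(theta), math.sin(theta)
    # Regress the centered response on the component scores.
    scores = [wa * p + wb * q for p, q in zip(za, zb)]
    slope = sum(s * (v - mu_y) for s, v in zip(scores, y)) / sum(s * s for s in scores)

    def predict(log_a, log_b):
        score = wa * (log_a - mu_a) / sd_a + wb * (log_b - mu_b) / sd_b
        return mu_y + slope * score

    return predict


def n_fold_errors(samples, n_folds):
    """Withhold each fold once; return (average absolute error, error standard deviation)."""
    errors = []
    for fold in range(n_folds):
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        held_out = [s for i, s in enumerate(samples) if i % n_folds == fold]
        predict = fit_pcr(train)
        errors.extend(predict(a, b) - y for a, b, y in held_out)
    return mean(abs(e) for e in errors), pstdev(errors)


mae_4, spread_4 = n_fold_errors(SAMPLES, 4)
# Using one fold per sample withholds a single sample at a time.
mae_loo, spread_loo = n_fold_errors(SAMPLES, len(SAMPLES))
print(f"4 folds: mean |error| = {mae_4:.4f}, std = {spread_4:.4f}")
print(f"leave-one-out: mean |error| = {mae_loo:.4f}, std = {spread_loo:.4f}")
```

Because both redundant logs are folded into a single component before the regression is fit, the model does not have to split one underlying signal between two nearly identical predictors.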
PCA fits a series of orthogonal vectors through the multidimensional cloud of input data so that the data is described as efficiently as possible. The first eigenvector, or principal component, is fit through the data cloud in its widest direction, so it explains the largest possible share of the variance in the data. The second principal component, which must be orthogonal to the first, describes the largest amount of remaining variance. More components are added until the data is sufficiently explained or until the number of components equals the number of variables. A regression model is then built using the principal components. When the model is applied, the predictor variable values are mapped onto the coordinate system of the principal components, and the response variable is predicted from the principal component regression model.

Nonlinear Regression

Nonlinear regression allows for complex transformations of the predictor variables. This increases the predictive power of the model because it is better able to use information from predictor variables that do not have a linear relationship with the response variable. The tool is also purposefully designed not to be a black box. The optimal transformations identified by the model are displayed so that you can exercise your expertise and intuition to evaluate and tune the model. This helps ensure that the model is built on physically reasonable relationships and is not biased by unique features of the regression data. This is not the case with neural network-based prediction models, which do not allow for expert override and are vulnerable to over-fitting unless analyses are performed using very large datasets. The transparency of the DI Transform approach also lets you pull meaningful information from the variable transforms, including optimal predictor variable values and points of diminishing returns.

A weakness of the nonlinear regression method, however, is that it is sensitive to data redundancy, which can produce unintuitive predictor variable transforms. We recommend performing multicollinearity analysis before running nonlinear regression to safeguard against that possibility.

The first step in the nonlinear regression algorithm is to convert the response variable data to a standard normal distribution. This entails subtracting the mean from each data point and dividing the result by the standard deviation of the data. The predictor variable data is then also transformed to have mean values of zero, sorted from smallest to largest, and scaled. Point-wise continuous transforms are applied to the predictor variables within the allowed relationships (linear, monotonic, higher order, or periodic) using a proprietary method. The algorithm iterates among the different transform options to minimize the error between the model's prediction of the response variable and the actual value. This is a data-driven, non-parametric approach, meaning that no single equation describes the transform applied to a given predictor variable.

The model returns a validation plot comparing the model's prediction of the response variable values to the actual values. The model also returns significance and sensitivity values for each predictor variable. The sensitivity value reports how much the model correlation coefficient would change if the variable were not included in the model.
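The standardization step and the sensitivity measure described above can be sketched in Python as follows. Because the predictor transforms are proprietary, this sketch substitutes ordinary least squares on standardized variables for the nonlinear model, so it illustrates the bookkeeping only and not the DI Transform algorithm. The well data, variable names, and function names are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical wells: two predictor variables and a production response.
PREDICTORS = {
    "lateral_ft": [4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0],
    "proppant_klb_per_ft": [3.1, 2.4, 3.8, 2.9, 3.5, 2.6],
}
PRODUCTION = [96.0, 111.0, 140.0, 154.0, 178.0, 192.0]


def standardize(values):
    """Subtract the mean from each value and divide the result by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    return sum(p * q for p, q in zip(standardize(a), standardize(b))) / len(a)


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def model_correlation(columns, response):
    """Fit least squares on standardized columns; correlate the prediction with the response."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    moments = [sum(p * q for p, q in zip(ci, response)) for ci in columns]
    coefficients = solve(gram, moments)
    predicted = [sum(c * col[i] for c, col in zip(coefficients, columns))
                 for i in range(len(response))]
    return correlation(predicted, response)


z_predictors = {name: standardize(values) for name, values in PREDICTORS.items()}
z_response = standardize(PRODUCTION)
full = model_correlation(list(z_predictors.values()), z_response)

# Sensitivity: drop in the correlation coefficient when one variable is left out.
sensitivity = {
    name: full - model_correlation(
        [col for other, col in z_predictors.items() if other != name], z_response)
    for name in z_predictors
}
print(sensitivity)
```

In this made-up dataset the response is driven mostly by lateral length, so leaving that variable out lowers the correlation coefficient far more than leaving out the proppant variable.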
The significance value is the ratio of the range of the predictor variable in its transformed space to the range of the response variable in its transformed space. Large values indicate that a change in the predictor variable has a large impact on the value of the response variable.

Predictor variable contribution to the model is further examined in transformation plots. The model produces transformation plots for every predictor variable and for the response variable, which display the original variable values compared with the transformed values. Because the model is built in standard normal data space, the transformed variable axes are shown in relative units representing the contribution of the predictor variable to the prediction of the response variable, unless a simulation is performed. When a simulation is performed on a particular predictor variable, discrete values or data ranges of the other predictor variables are supplied to the model. The response variable is then predicted in physical units (for example, barrels of oil, bbls) using the supplied values over the full range of the predictor variable. Specifying predictor variable values lets you query the model with what-if scenarios.

Data Preparation Tools

Outlier Analysis

Outliers make fundamental patterns and relationships in data difficult to identify. A model built on data that contains outliers will underperform at best and produce completely incorrect predictions at worst. We recommend removing outliers prior to any modeling effort. DI Transform includes an outlier analysis tool to make that process fast and straightforward.

Outlier analysis is launched from any correlation table, and the analysis is performed only on the data in the table. A probability density function (PDF), which describes the relative likelihood of a random sample taking a particular value, is calculated for each variable from the supplied data using the mean and standard deviation. A smoothing factor lets you control whether the PDF tracks the actual data distribution or a more idealized distribution. You specify an alpha, which controls when data is flagged as an outlier. For example, if alpha is set to 0.01, data points that fall in either tail of the PDF, at or beyond the 0.5% probability cut-off on the low or high side, are flagged as outliers. You can then decide whether to remove the flagged data points from the correlation table or retain them. Data is removed only from the correlation table; it is not removed from the database.

Multicollinearity Analysis

Multicollinearity analysis determines when two variables contain redundant information. Redundant information supplied to the nonlinear regression tool can produce unintuitive predictor variable transforms and should be avoided. Multicollinearity analysis is launched from any correlation table. First, a maximum multiple correlation coefficient (RSQMAX) is specified. Then, the multiple correlation coefficient is calculated for different combinations of variables within the correlation table. If the multiple correlation coefficient exceeds RSQMAX, the variable with the highest pair-wise correlation with other variables is flagged as a candidate for rejection. You determine which variables to reject or retain. A variable that is rejected using the multicollinearity analysis tool is removed only from the correlation table, not from the database.

Conclusion

DI Transform offers a variety of multivariate analysis tools to take your geophysical, geological, or engineering workflow to a higher level without the pain of exporting information into a statistical software package.

Copyright 2015, Drillinginfo, Inc. All rights reserved.

PROACTIVE. EFFICIENT. COMPETITIVE.

By monitoring the market, Drillinginfo continuously delivers innovative oil & gas solutions that enable our customers to sustain a competitive advantage in any environment. Drillinginfo customers constantly perform above the rest because they are able to be more efficient and more proactive than the competition.

WP_DI Transform MVstats_RB_Q315; 07/31/15
Clustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationCS 521 Data Mining Techniques Instructor: Abdullah Mueen
CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationUNIT 2 Data Preprocessing
UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and
More informationBy Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad
By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad Data Analytics life cycle Discovery Data preparation Preprocessing requirements data cleaning, data integration, data reduction, data
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationArtificial Neural Networks (Feedforward Nets)
Artificial Neural Networks (Feedforward Nets) y w 03-1 w 13 y 1 w 23 y 2 w 01 w 21 w 22 w 02-1 w 11 w 12-1 x 1 x 2 6.034 - Spring 1 Single Perceptron Unit y w 0 w 1 w n w 2 w 3 x 0 =1 x 1 x 2 x 3... x
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationData Preprocessing. Komate AMPHAWAN
Data Preprocessing Komate AMPHAWAN 1 Data cleaning (data cleansing) Attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. 2 Missing value
More informationUsing Statistical Techniques to Improve the QC Process of Swell Noise Filtering
Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering A. Spanos* (Petroleum Geo-Services) & M. Bekara (PGS - Petroleum Geo- Services) SUMMARY The current approach for the quality
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationChemometrics. Description of Pirouette Algorithms. Technical Note. Abstract
19-1214 Chemometrics Technical Note Description of Pirouette Algorithms Abstract This discussion introduces the three analysis realms available in Pirouette and briefly describes each of the algorithms
More informationSandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing
Generalized Additive Model and Applications in Direct Marketing Sandeep Kharidhi and WenSui Liu ChoicePoint Precision Marketing Abstract Logistic regression 1 has been widely used in direct marketing applications
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationVariable Selection 6.783, Biomedical Decision Support
6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Fall 2013 Reading: Chapter 3 Han, Chapter 2 Tan Anca Doloc-Mihu, Ph.D. Some slides courtesy of Li Xiong, Ph.D. and 2011 Han, Kamber & Pei. Data Mining. Morgan Kaufmann.
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationData Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality
Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationPredict Outcomes and Reveal Relationships in Categorical Data
PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationBasic Statistical Terms and Definitions
I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can
More informationMultiple Regression White paper
+44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationExploratory Data Analysis using Self-Organizing Maps. Madhumanti Ray
Exploratory Data Analysis using Self-Organizing Maps Madhumanti Ray Content Introduction Data Analysis methods Self-Organizing Maps Conclusion Visualization of high-dimensional data items Exploratory data
More informationSummary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4
Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is
More informationSPSS INSTRUCTION CHAPTER 9
SPSS INSTRUCTION CHAPTER 9 Chapter 9 does no more than introduce the repeated-measures ANOVA, the MANOVA, and the ANCOVA, and discriminant analysis. But, you can likely envision how complicated it can
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationData Preprocessing. Data Mining 1
Data Preprocessing Today s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogenous sources.
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationUNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES
UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And
More informationCertified Data Science with Python Professional VS-1442
Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional Certified Data Science with Python Professional Certification Code VS-1442 Data science has become
More informationCSE 158. Web Mining and Recommender Systems. Midterm recap
CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158
More informationAbstractacceptedforpresentationatthe2018SEGConventionatAnaheim,California.Presentationtobemadeinsesion
Abstractacceptedforpresentationatthe2018SEGConventionatAnaheim,California.Presentationtobemadeinsesion MLDA3:FaciesClasificationandReservoirProperties2,onOctober17,2018from 11:25am to11:50am inroom 204B
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationImproving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu CS 229 Fall
Improving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu (fcdh@stanford.edu), CS 229 Fall 2014-15 1. Introduction and Motivation High- resolution Positron Emission Tomography
More informationOptimizing Completion Techniques with Data Mining
Optimizing Completion Techniques with Data Mining Robert Balch Martha Cather Tom Engler New Mexico Tech Data Storage capacity is growing at ~ 60% per year -- up from 30% per year in 2002. Stored data estimated
More informationWELCOME! Lecture 3 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationTensor Based Approaches for LVA Field Inference
Tensor Based Approaches for LVA Field Inference Maksuda Lillah and Jeff Boisvert The importance of locally varying anisotropy (LVA) in model construction can be significant; however, it is often ignored
More informationSAS (Statistical Analysis Software/System)
SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS: