Priyank Srivastava (PE 5370: Mid-Term Project Report)


Contents

Executive Summary
PART-1: Identify Electrofacies from Given Logs Using Data Mining Algorithms
    Selection of Wells
    Data Cleaning and Preparation of Data for Input to Data Mining
    Selection of Data Mining Technique & Workflow
    Mathematical Background of PCA and K-Means Clustering
    Interpretation of Results
    Relationship of Predicted Electrofacies with Original Variables
    The Folly of Trusting Data Mining
PART-2: Clustering Using SOM and the R Package
    Clustering and SOM in R
PART-3: Clustering Using the Merged Dataset of All Wells
Conclusion
Appendix A: R Code for Part III

Executive Summary

The objective of the present project is to prepare a data mining model that estimates electrofacies from a set of open-hole well logs. The trained model can then be used as a predictive tool for estimating unknown logs at a new location. The present workflow uses principal component analysis (PCA) and the K-means clustering algorithm to build the model.

This report is divided into three parts. In Part 1 the data mining algorithm is run on individual wells, using different attributes for each well depending on availability. The resulting clusters are mapped back to the individual wells based on gamma ray values, which broadly show Facies 1 as high gamma ray, Facies 3 as a mixed sand-shale sequence, and Facies 2 as low gamma ray. The presence of these facies is then correlated with the corresponding production rates from the different wells to assess the reservoir quality of each facies. Although K-means always converges, the answer it gives depends on the initial centers, and the centers it returns are simply averages of data points. Some of the wells (Young Joe, Flamik Randal) that lack a complete dataset do not show any clusters, so it is difficult to generalize the interpretation from this model. Part 1 ends by discussing several disadvantages of K-means clustering. The unknown logs in these wells could be predicted with the present data mining model, but that is out of the scope of this project.

Data mining helps uncover hidden patterns in a dataset by exposing the relationships between attributes. The issue is that it also uncovers many patterns that are not useful; it is up to the domain expert to filter through the patterns and accept the ones that validly answer the objective question. In Part 2, therefore, some of the wells are clustered using self-organizing maps (SOM). In Part 3, five attributes (GR, AT90, PEF, RHOB and NPHI) are merged for all 10 selected wells and the same workflow (PCA + K-means) is run to generate a generalized three-cluster model, from which the different facies and their characteristics are identified.

Based on the study in Part 3, my findings can be summarized as follows:

Cluster 1: Shales/sands with low porosity (0.09) and low resistivity (9.12). Probably tight shales with high clay-bound water (high NPHI, 0.289).
Cluster 3: Shales/sands with very low porosity (0.038) but higher resistivity (16.26) and higher grain density than Facies 1. Probably contains hydrocarbon saturation and less water.
Cluster 2: Probably the hottest spot in this region, with good porosity and high hydrocarbon saturation; the well with the highest proportion of Facies 2 should be the most prolific producer.

PART-1: Identify Electrofacies from Given Logs Using Data Mining Algorithms

Selection of Wells

I chose the wells according to their API numbers, so 10 wells in Parker County (API: ) were chosen. Not all wells carry the same amount of data: some have processed logs and some do not. The table below gives the API numbers with the corresponding well names and production rates for the chosen wells.

API    Well name            Production rate* (Mscf/day)
       Moore
       Deaton
       Frank-Mask
       Sugar Tree
       Westhoff John
       Flamik Randal
       Young Joe
       Kinyon
       Hagler
       Lake Wheatherford    965

*From drillinginfo.com

Based on the production rates, the wells can be divided into three categories. Our goals in this project are to (1) classify each well into electrofacies and (2) relate the performance of each well to the newly classified electrofacies.

Data Cleaning and Preparation of Data for Input to Data Mining

The logs given to us were processed and contain many redundant and missing parameters, so the data must be selected and cleaned to obtain the attributes we want as input to the data mining algorithms. We want to develop electrofacies for the upper Barnett and lower Barnett zones; the local stratigraphy of the subsurface is given in Figure 1. As observed there, the Barnett Shale is divided into two parts by the Forestburg Limestone, so before feeding the data into any data mining algorithm we need to remove these limestone zones. Since the mud resistivity in all of the given logs is of the order of 0.4 ohm-m, we can be sure that all the wells were drilled with water-based muds; hence we can use the photoelectric (PE) log as a lithology indicator, since carbonates usually have high PE values of about 5. We can therefore screen out all depths where the log shows PE > 4, keeping only PE < 4. Additional filtering is done by screening out all depths with density (RHOB) > 2.7 g/cc. Figure 2 shows the workflow used for cleaning and filtering the depths, so that the final output contains the depths and parameters of only the upper and lower Barnett Shale.

Figure 3 lists the attributes selected for each well. It can be observed that the Flamik Randal and Young Joe wells contain the fewest attributes.

Figure 1: General stratigraphy of the Ordovician to Pennsylvanian section in the Fort Worth Basin (Loucks & Ruppel, 2007)

Figure 2: Workflow for data cleaning.
1. Select all depths with PEF < 4.
2. Select all depths with non-zero GR, RHOB and AT90, and 0 < NPHI < 1.
3. Normalize every parameter with its mean and variance.
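The workflow in Figure 2 reduces to a short R filter. Below is a minimal sketch, assuming one well's samples sit in a data frame called logs with columns DEPT, GR, PEF, AT90, NPHI and RHOB; these names, and the function name clean_logs, are illustrative rather than part of the original workflow:

# Minimal sketch of the Figure 2 cleaning workflow. 'logs' and its
# column names (DEPT, GR, PEF, AT90, NPHI, RHOB) are assumptions.
clean_logs <- function(logs) {
  keep <- logs$PEF < 4 &                     # screen out carbonate intervals (high PE)
    logs$RHOB > 0 & logs$RHOB < 2.7 &        # drop nulls and high-density carbonates
    logs$GR > 0 & logs$AT90 > 0 &            # drop null readings
    logs$NPHI > 0 & logs$NPHI < 1
  out <- logs[keep, ]
  curve_cols <- setdiff(names(out), "DEPT")
  out[curve_cols] <- scale(out[curve_cols])  # normalize: zero mean, unit variance
  out
}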

Figure 3: Summary of the meaningful curves that could be extracted from each well.

Moore (9 attributes): GR (Max: 368; Min: 18), PEF (Max: 6.2; Min: 2.2), AT90 (Max: 862; Min: 0.68), NPHI (Max: 0.397; Min: 0.002), RHOB (Max: 2.76; Min: 2.34), WCLC (Avg: 0.183), WILL (Avg: 0.69), WQUA (Avg: 0.471), VCL (Avg: 0.332)

Deaton (11 attributes): GR (Max: 337; Min: 12), PEF (Max: 5.18; Min: 1.8), AT90, NPHI (Max: 0.374; Min: 0), RHOB (Max: 2.825; Min: 2.39), WCLC (Avg: 0.176), WDOL (Avg: 0.096), WILL (Avg: 0.136), WQUA (Avg: 0.474), WTOC (Avg: 0.022), VCL (Avg: 0.237)

Frank Mask (6 attributes): GR (Max: 346; Min: 0), NPHI (Max: 0.30; Min: 0), RHOB (Max: 2.705; Min: 0), VCL (Avg: 0.289), PR (Avg: 0.227), CB (Avg: 0.205)

Sugar Tree (10 attributes): GR (Max: 201; Min: 0), PEF (Min: 0; Max: 9.776), AT90 (Min: 0.224; Max: 173), NPHI (Min: -0.014; Max: 0.569), RHOB (Min: 0.30; Max: 2.75), WILL, WQUA, VCL, PR, BULKMOD

Westhoff John (9 attributes): GR (Max: 368; Min: 18), PEF (Min: 2.28; Max: 6.234), NPHI (Min: 0.002; Max: 0.397), RHOB (Max: 2.76; Min: 2.34), WCAR (Avg: 0.025), WCLC (Avg: 0.183), WILL (Avg: 0.311), WQUA (Avg: 0.471), VCL (Avg: 0.332)

Flamik Randal (5 attributes): GR (Min: 0; Max: 883), PEF (Min: 0; Max: 11.54), AT90 (Min: 0; Max: 927), NPHI (Min: 0; Max: 2.7), RHOB (Min: 0; Max: 164)

Young Joe (5 attributes): GR (Min: 0; Max: 883), PEF (Min: 0; Max: 11.54), AT90, NPHI, RHOB

Kinyon (7 attributes): GR, PEF, AT90, NPHI, RHOB, PR, YME

Hagler (9 attributes): GR, PEF, AT90, NPHI, RHOB, WCLC, WILL, WQUA, VCL

Lake Wheatherford (8 attributes): GR, PEF, AT90, NPHI, RHOB, WILL, WQUA, WPYR

Selection of Data Mining Technique & Workflow

Because of the high volume of log data, it is desirable to first use unsupervised data mining techniques to find out whether the data contain any hidden trends or patterns. Since some wells have as many as 200 log attributes, it is necessary to reduce the dimensionality of the data before applying any clustering algorithm. I use principal component analysis (PCA) to reduce the data to three principal components and then apply the K-means clustering algorithm to optimize and generate clusters in the data. Figure 4 gives the PCA and clustering density plots for the different wells in sequence. Clustering is done with the X-means algorithm, which automatically optimizes the number of clusters by iteration. However, given the uneven cluster sizes shown in Figure 4, it can be argued that this method is not giving us the clusters we want: in the quest to minimize the within-cluster sum of squared errors, X-means gives more weight to larger clusters. This clustering technique therefore could not be applied here as-is, since K-means assumes that each cluster has roughly the same number of observations. Also, PCA is a methodology for correlated attributes, since it relies on variance being concentrated along particular directions; if the data show no correlation, applying PCA is not a meaningful task.

Table 1: Parameters used in X-means clustering and PCA analysis

PCA
    Number of components: selected to keep 90% of the variance
X-means clustering
    Min. clusters: 2
    Max. clusters: 60
    Numerical measure: Euclidean distance
    Max. runs: 10
    Max. optimization steps: 100
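The 90% variance rule in Table 1 is easy to express with base R's prcomp. A minimal sketch, assuming x is the cleaned, all-numeric attribute matrix for one well:

# Keep the smallest number of principal components that together
# explain at least 90% of the variance ('x' is an assumed input).
pca <- prcomp(x, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(cum_var >= 0.90)[1]
scores <- pca$x[, 1:k, drop = FALSE]   # reduced data passed on to clustering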

Figure 4: PCA density plots with X-means clustering for the following wells, in order from top left: 1. Moore, 2. Deaton, 3. Frank Mask, 4. Sugar Tree, 5. Westhoff John, 6. Flamik Randal, 7. Young Joe, 8. Kinyon, 9. Hagler, 10. Lake Wheatherford. Under X-means clustering most of the wells can be described by three clusters in the PCA data, but wells 6 and 7 do not display any specific clusters.

Mathematical Background of PCA and K-Means Clustering

PCA is a dimensionality reduction technique for datasets with correlated attributes. The 1st principal component is the direction of maximum variance in the data, and each principal component is independent of and orthogonal to the others. Every attribute needs to be scaled before the PCA algorithm is applied. PCA is a very useful tool for exploratory data analysis and predictive modeling of high-dimensional datasets.

While PCA helps to expose internal patterns in the data, the next step in data mining is clustering. Although the literature is rich with different algorithms for doing clustering efficiently, the fundamental workflow is shown in Table 2.

Table 2: Workflow for clustering algorithms
1. Determine the number of clusters (centroids) to be placed.
2. Find the distance of each data point to each centroid and assign every data point to the centroid that minimizes that distance.
3. Recompute the centroids of the clusters formed in the previous step and reclassify each data point to its nearest centroid, again minimizing the sum of distances.
4. Iterate until the assignments converge and the number of clusters is optimized.

Interpretation of Results

Since principal components as such do not have any physical meaning, I have to transform the predicted clusters back to the original data. The table below gives the distribution of data points across the clusters for all the analyzed wells:

Well name            Points after cleaning   Cluster 1   Cluster 2   Cluster 3   Cluster 4
Moore
Deaton
Frank Mask
Sugar Tree
Westhoff John
Flamik Randal
Young Joe
Kinyon
Hagler
Lake Wheatherford
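A minimal sketch of this back-transformation, assuming logs holds the original (unscaled) curves for the rows that survived cleaning and fit is the K-means result computed on their PCA scores (both names are illustrative):

# Map clusters back to physical log responses ('logs' and 'fit' are assumed).
facies <- fit$cluster

# Count of data points per cluster, as tabulated above
table(facies)

# Per-facies means of the original curves give each electrofacies a
# physical signature (e.g. Facies 1 = high mean GR)
aggregate(logs, by = list(Facies = facies), FUN = mean)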

Relationship of Predicted Electrofacies with Original Variables

Figure 5: GR and electrofacies versus depth for the Moore well, with Facies 1-dominated, Facies 3-dominated and Facies 2-dominated intervals marked. The Moore well can be subdivided into three electrofacies using data mining, and these can be correlated with gamma ray values: Facies 1 shows high gamma ray and is most probably a shale interval, Facies 2 has less radioactivity than Facies 1, and Facies 3 has the lowest gamma ray reading.

Figure 6: GR and electrofacies versus depth for the Deaton well (left) and the Frank Mask well (right). The Deaton well seems to contain only Facies 1 and Facies 3, with very little Facies 2. In the Frank Mask well only two facies types are present, and they are not easy to classify from the gamma ray log alone.

GR and electrofacies versus depth for the Kinyon well (left) and the Hagler well (right).

The Folly of Trusting Data Mining

Most data mining algorithms are heuristic processes that require no physical understanding to apply. The process of data mining is supposed to reveal hidden trends, but applying a data mining task blindly can lead to completely wrong outputs. Some caveats of applying K-means clustering to real-life datasets:

1. K-means assumes the variance of the distribution of each attribute is spherical, so it does not work well on non-spherical clusters.
2. The higher the dimensionality of the data, the more difficult it is to apply K-means efficiently.
3. The curse of unevenly sized clusters: K-means assumes the prior probability of all K clusters is the same, i.e. that each cluster has roughly the same number of observations, which is obviously not true of our dataset.

PART-2: Clustering Using SOM and the R Package

Figure 7 shows a self-organizing map (SOM) U-matrix plot with K-means clustering for the wells, using the same attributes as in Part 1.

Figure 7: SOM clustering for the Moore well
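A U-matrix plot like the one in Figure 7 can be produced with the kohonen package. The sketch below is one possible way, not the exact settings used here; the grid size, training length, three-cluster choice and the scaled attribute matrix x are all assumptions:

library(kohonen)

# 'x', the 10x10 grid, rlen = 200 and k = 3 are illustrative assumptions.
set.seed(1)
som_fit <- som(as.matrix(x),
               grid = somgrid(xdim = 10, ydim = 10, topo = "hexagonal"),
               rlen = 200)

# U-matrix: distance of each map unit to its neighbours
plot(som_fit, type = "dist.neighbours", main = "U-matrix")

# K-means on the learned codebook vectors, then colour map units by cluster
unit_cluster <- kmeans(getCodes(som_fit), centers = 3, nstart = 25)$cluster
plot(som_fit, type = "mapping", bgcol = rainbow(3)[unit_cluster])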

Even so, it is again difficult to evaluate the accuracy of the clustering.

Clustering and SOM in R

Since R provides some flexibility and quality checks for clustering, the filtered data obtained from the Part 1 data cleaning workflow, with the additional constraint GR > 120, were used as input to R, where I applied the K-means clustering technique to see how it performs. This was done for the following four wells: Moore, Deaton, Frank Mask and Kinyon. This section describes the results of using R; the optimization curves are shown in the figures below.
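One such quality check is the average silhouette width from the cluster package, computed alongside the elbow (within-groups sum of squares) curve shown in the optimization figures. A minimal sketch, assuming x is the filtered, scaled attribute matrix for one well:

library(cluster)

# Elbow and silhouette curves for k = 2..10 ('x' is an assumed input)
wss <- numeric(10)
sil <- numeric(10)
d <- dist(x)
for (k in 2:10) {
  fit <- kmeans(x, centers = k, nstart = 25)
  wss[k] <- fit$tot.withinss                       # within-groups sum of squares
  sil[k] <- mean(silhouette(fit$cluster, d)[, 3])  # average silhouette width
}
plot(2:10, wss[2:10], type = "b", xlab = "Number of clusters",
     ylab = "Within groups sum of squares")
plot(2:10, sil[2:10], type = "b", xlab = "Number of clusters",
     ylab = "Average silhouette width")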

Figure 8: Clustering optimization for the Moore well

Figure 9: Clustering optimization for the Deaton well

Figure 10: Clustering optimization for the Frank Mask well

Figure 11: Clustering optimization for the Kinyon well

Figure 12: Clustering optimization for the Hagler well

PART-3: Clustering Using the Merged Dataset of All Wells

This time I used only the wells that contain all five curves: GR, AT90, PEF, NPHI and RHOB. The following wells were selected for the analysis:

Bonds Ranch C-1
Hyder 1H
Jerome Russell
John W Porter 3
Massey Unit
McFarland-Dixon
Moore-Price
Sol Carpenter Heirs
Sugar Tree
Upham Joe Johnson

Applying the same workflow to the merged dataset gives the three clusters shown in Figure 13.

Figure 13: PCA clusters for the merged dataset
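A minimal sketch of the merging step, assuming one CSV per selected well in a wells/ directory, each containing at least the five shared curves (the file layout and column names are assumptions):

# Merge the five common curves from every well ('wells/' layout assumed).
curves <- c("GR", "AT90", "PEF", "NPHI", "RHOB")
files <- list.files("wells", pattern = "\\.csv$", full.names = TRUE)

merged <- do.call(rbind, lapply(files, function(f) {
  w <- read.csv(f)[, curves]    # keep only the shared curves
  w$WELL <- basename(f)         # remember which well each row came from
  w
}))

# 'merged' then goes through the same cleaning, scaling, PCA and
# K-means workflow as in Part 1 (see Appendix A).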

The table below gives the centroid for each cluster:

Cluster   PC1   PC2   Avg. GR (API)   Avg. DPHI   Avg. PEF   Avg. AT90   Avg. RHOB   Avg. NPHI
1
2
3

Conclusion

The clusters can be interpreted as follows:

Cluster 1: Shales/sands with low porosity (0.09) and low resistivity (9.12). Probably tight shales with high clay-bound water (high NPHI, 0.289).
Cluster 3: Shales/sands with very low porosity (0.038) but higher resistivity (16.26) and higher grain density than Facies 1. Probably contains hydrocarbon saturation and less water.
Cluster 2: Probably the hottest spot in this region, with good porosity and high hydrocarbon saturation; the well with the highest proportion of Facies 2 should be the most prolific producer.

Appendix A: R Code for Part III

setwd("c:/users/priya/desktop/dmp_midterm/r")

# Read the merged dataset and replace missing values with 0
ms <- read.table("book1_final.csv", header = TRUE, sep = ",")
ms[is.na(ms)] <- 0
attach(ms)
ls.str(ms)

# Keep only the curve columns of interest
ms <- ms[, c(1, 2, 4, 5, 6, 7, 8)]

# Keep depths with PEF < 4 and GR > 110
msfilter <- ms[(ms$pef < 4 & ms$gr > 110), ]

## K-means clustering in R
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1))

## Apply PCA to the scaled variables
mspca <- prcomp(msfilter, center = TRUE, scale. = TRUE, retx = TRUE)
fulldata <- data.frame(msfilter, mspca$x)
mydata <- mspca$x

# Determine the number of clusters (elbow plot)
wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
for (i in 2:15) wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
dev.copy(pdf, "myplot.pdf")
plot(1:15, wss, type = "b", xlab = "Number of clusters",
     ylab = "Within groups sum of squares")
dev.off()   # close the PDF device so the plot file is written

# Final three-cluster fit
fit <- kmeans(mydata, 3, iter.max = 100, nstart = 50)

# Get cluster means
aggregate(mydata, by = list(fit$cluster), FUN = mean)

# Append the cluster assignment and plot the clusters
mydata <- data.frame(fulldata, fit$cluster)
library(cluster)
clusplot(mydata, fit$cluster, color = TRUE, shade = TRUE, labels = 0, lines = 0)
write.table(mydata, "c:/users/priya/desktop/dmp_midterm/r/mergeddata.txt", sep = "\t")
