University of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences
CPEN 405: Artificial Intelligence
Lab 7: Unsupervised Learning
November 15, 2017

Contents
1 Introduction
2 Background
  2.1 Machine Learning
  2.2 Supervised Learning
  2.3 Unsupervised Learning: Clustering
  2.4 Reinforcement Learning
3 K-Means Clustering
  3.1 Introduction
  3.2 The algorithm
  3.3 Implementation
    The Data class
    The KMeans class
    The cluster function
4 Challenge: Identifying the Black Pod disease
  4.1 Requirements
5 Tasks
1 Introduction

Now we would like to see Artificial Intelligence in a different dimension: one where an agent program can recognize patterns and learn. Within Artificial Intelligence, Machine Learning is now at the forefront, to the extent that it is applied in Natural Language Processing and Automated Reasoning. In this lab we shall see how to develop a program that can categorize data into groups on its own using the K-Means Clustering algorithm. We shall then apply this to the detection of diseased plants in agriculture.

2 Background

2.1 Machine Learning

Learning is the ability of an agent to improve its behavior based on observations made about the world. This could mean the following:

- The range of behaviors is expanded; the agent can do more.
- The accuracy on tasks is improved; the agent can do things better.
- The speed is improved; the agent can do things faster.

There are three main types of learning, described in the following sections.

2.2 Supervised Learning

In supervised learning, the agent is presented with example input-output pairs and learns a function that maps an input to an output, so that when it is presented with new inputs it can automatically determine the corresponding outputs. An abstract definition of supervised learning is as follows. Assume the learner is given the following data:

- a set of input features, X_1, ..., X_n;
- a set of target features, Y_1, ..., Y_k;
- a set of training examples, where the values for the input features and the target features are given for each example; and
- a set of test examples, where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and for as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples. As an example, take a spam filter.
As you identify and mark emails as spam and allow others to pass as not-spam, it learns a model from the input (emails) to the output (spam or not-spam), based on which it can classify an incoming email.

2.3 Unsupervised Learning: Clustering

In unsupervised learning, the agent is simply presented with raw inputs and learns patterns in the input. Most commonly the agent classifies the input into bins; this is called clustering [?]. Thus in clustering, or unsupervised learning, the target features are not given in the training examples. The aim is to construct a natural classification that can be used to cluster the data.
Given several images of people's faces, an unsupervised classifier should be able to identify two groups, one male and the other female; or it should be able to identify three groups, one being toddlers, another youth and the third the elderly.

In hard clustering, each example is placed definitively in a class. The class is then used to predict the feature values of the example. The alternative to hard clustering is soft clustering, in which each example has a probability distribution over its class. The prediction of the values for the features of an example is the weighted average of the predictions of the classes the example is in, weighted by the probability of the example being in the class. [?]

2.4 Reinforcement Learning

In reinforcement learning the agent learns from rewards and punishments. For example, in learning how to ride a bike, you take several actions. Some of them keep your balance and enable you to move forward, while others make you fall. You learn to ride by avoiding the actions that make you fall and doing more of those that keep your balance. Again, imagine a robot that can act in a world, receiving rewards and punishments and determining from these what it should do. This is the problem of reinforcement learning. [?]
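To make the difference between the hard and soft clustering described above concrete, the following is a minimal Python sketch. The function names and the numbers are illustrative assumptions, not part of the lab resources. It predicts one feature value for an example, first from its single most probable class and then as the probability-weighted average over all classes.

```python
def hard_prediction(membership, class_predictions):
    """Hard clustering: use the prediction of the single most probable class."""
    best = max(range(len(membership)), key=lambda c: membership[c])
    return class_predictions[best]


def soft_prediction(membership, class_predictions):
    """Soft clustering: average the class predictions, weighted by membership."""
    if abs(sum(membership) - 1.0) > 1e-9:
        raise ValueError("membership probabilities must sum to 1")
    return sum(p * v for p, v in zip(membership, class_predictions))


# Hypothetical example: the point belongs to class 0 with probability 0.75
# and to class 1 with probability 0.25; the classes predict 10.0 and 20.0.
membership = [0.75, 0.25]
class_predictions = [10.0, 20.0]

print(hard_prediction(membership, class_predictions))  # 10.0
print(soft_prediction(membership, class_predictions))  # 12.5
```

The soft prediction moves toward class 1 in proportion to the example's membership in it, whereas the hard prediction ignores class 1 entirely.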
3 K-Means Clustering

3.1 Introduction

In clustering a dataset we are concerned with putting the data into groups such that similar items are in the same group and dissimilar items are in different groups. The k-means algorithm is one of the most common clustering algorithms. Take a look at the image below.

Figure 1: Raw data in scatter diagram

Can you see 4 clusters in there? That was easy, right? What about the dataset below?

(0.7, 5.1) (1.5, 6) (2.1, 4.5) (2.4, 5.5) (3, 4.4) (3.5, 5) (4.5, 1.5)
(5.2, 0.7) (5.3, 1.8) (6.2, 1.7) (6.7, 2.5) (8.5, 9.2) (9.1, 9.7) (9.5, 8.5)

Now things are getting tougher. You may plot this data (provided in SmallRandData.csv in Lab 6 Resources) and identify the three clusters. A MATLAB script, clusterexample.m, is provided to help you with this. Let us take a look at the clusters we obtain from Figure 1.
Figure 2: Raw data partitioned into 4 clusters

The red circles appear to be in the center of each cluster. These are known as the centroids of the clusters. Mathematically, the centroid of a cluster is simply the arithmetic mean of the data in the cluster. Notice that every point is closer to the centroid of its own cluster than to the centroid of any other cluster. You may try openexample('stats/partitiondataintotwoclustersexample') if you have MATLAB installed.

3.2 The algorithm

Now that we have a clear picture of the result of the algorithm, let us see how it works.

The K-Means algorithm

The k-means algorithm aims to group n observations into k clusters such that each observation belongs to the cluster with the nearest mean. The value of k has to be specified before the algorithm starts. The algorithm assumes that each feature is given on a numerical scale, and it tries to find classes that minimize the sum-of-squares error when the predicted values for each example are derived from the class to which it belongs.

A brute-force approach to clustering numeric data would be to examine all possible groupings of the source data set and then determine which of those groupings is best. Even for a dataset of 50 observations to be clustered into 3 groups, the number of possible groupings is 119,649,664,052,358,811,373,730. If you are daring enough to proceed and you can examine a billion groupings per second, it will take over 3 million years to analyze all the combinations.

The algorithm proceeds as follows:

1. Randomly assign data items to clusters.
2. Compute the mean/centroid of each cluster.
3. Reassign each data item to the cluster of the closest centroid, i.e. the cluster that minimizes the data point-to-cluster-centroid distance.
4. If there were no reassignments, a stable assignment has been found and hence clustering is complete.
5. Else go back to step 2.

The procedure is illustrated in the diagrams below. [?]

Figure 3: k-means Problem and Cluster Initialization
Figure 4: Compute centroids and reassign clusters
Figure 5: Update centroids and update clustering until there is no change
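The brute-force figure quoted in Section 3.2 can be checked with a short calculation. The sketch below assumes that a "grouping" means a split of the n observations into k non-empty, unlabeled groups, which is counted by the Stirling number of the second kind; the function name is chosen here for illustration. The rate of one billion groupings per second is the one assumed in the text.

```python
from math import comb, factorial


def count_groupings(n, k):
    """Number of ways to split n distinct items into k non-empty, unlabeled groups.

    This is the Stirling number of the second kind, computed with the
    inclusion-exclusion formula S(n, k) = (1/k!) * sum_j (-1)^j C(k, j) (k - j)^n.
    """
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)


groupings = count_groupings(50, 3)
print(f"{groupings:,}")  # 119,649,664,052,358,811,373,730

# Rate assumed in the text: one billion groupings examined per second.
seconds = groupings / 1e9
years = seconds / (365.25 * 24 * 3600)
print(f"about {years / 1e6:.1f} million years")  # about 3.8 million years
```

Under these assumptions the count matches the number in the text, and the running time works out to a little under 4 million years, consistent with "over 3 million years".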
3.3 Implementation

1. Begin by creating a console application.
2. Each observation of the data has two values, so create a class with two floating-point attributes.
3. Read the Longitude and Latitude from the data provided in HEALTH FACILITIES IN GHANA.csv [?] in Lab 6 resources into an array of your class's type. As a test, first use the data we saw earlier (SmallRandData.csv) with k = 3 to check whether your clustering is working correctly.
4. Create a function to display some of the data that has been read.
5. Create a class KMeans to handle the clustering. Create a KMeans object, set its data to the data you read from the file and set the number of clusters, k, to a desired value.
6. Create a cluster function in your KMeans class to cluster the data and assign each data item a cluster. Invoke this function on the KMeans object to cluster the data.
7. Create another function to display the clustered data, showing the clusters clearly.
8. You may write the clusters to files and then import them into MATLAB. On clustering with k = 2 and plotting the imported data you should see that the data has been clustered into the Northern and Southern halves of the country and that the Southern part has more health facilities.

The Data class

Data
  - x : Double
  - y : Double
  - cluster : Integer
  + getters and setters, etc.

The KMeans class

KMeans
  - rawdata : List of Data
  - centroids : List of Data
  - k : Integer
  + cluster() : void
  + getters and setters, etc.

The cluster function

Function cluster()    /* k-means clustering algorithm */
  Data: rawdata is an array of Data from file; k is the number of clusters
  Result: rawdata with stable cluster assignments

  1. Randomly assign data items to clusters;
  while a stable assignment has not been found do
      2. Compute the centroid of each cluster;
      3. Reassign each data item to the cluster of the closest centroid,
         i.e. the cluster that minimizes the data point-to-cluster-centroid distance;

Algorithm 1: The summary of k-means clustering algorithm
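Steps 2 to 4 of the implementation outline and the Data class can be sketched in Python as follows. This is a minimal sketch, not the required solution: the helper names `load_points` and `show`, the use of -1 for "unassigned", and the column headers and coordinate values in the sample text are all assumptions made for illustration. The headers of the real CSV files should be checked before use.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Data:
    """One observation: two coordinates plus the cluster it is assigned to."""
    x: float
    y: float
    cluster: int = -1  # -1 means "not yet assigned" (a convention assumed here)


def load_points(source, x_column, y_column):
    """Read two numeric columns from CSV text into a list of Data objects.

    Rows whose values are missing or not numeric are skipped.
    """
    points = []
    for row in csv.DictReader(source):
        try:
            points.append(Data(float(row[x_column]), float(row[y_column])))
        except (KeyError, TypeError, ValueError):
            continue
    return points


def show(points, limit=5):
    """Print the first few observations (step 4 of the outline)."""
    for p in points[:limit]:
        print(f"({p.x}, {p.y}) -> cluster {p.cluster}")


# Stand-in for a file: made-up values and assumed header names.
sample = io.StringIO("Longitude,Latitude\n-0.19,5.56\n-1.62,6.69\n,\n-0.84,9.40\n")
points = load_points(sample, "Longitude", "Latitude")
show(points)
```

To read an actual file, an open file object can be passed in place of the `io.StringIO` stand-in, for example `open(path, newline="")`.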
Function cluster()    /* k-means clustering algorithm */
  Data: rawdata is an array of Data from file; k is the number of clusters
  Result: rawdata with stable cluster assignments

  /* 1. a. Randomly assign data items to clusters */
  for i = 0 to rawdata.length - 1 do
      rawdata[i].cluster = i mod k;

  /* 1. b. Refine randomization with the Fisher-Yates shuffle */
  for i = 0 to rawdata.length - 1 do
      r = generate random number in range [i, rawdata.length - 1];
      swapclusters(rawdata[i], rawdata[r]);

  /* Until a stable assignment has been found */
  stableassignment = false;
  while !stableassignment do
      /* 2. Compute centroid for each cluster */
      centroids.clear();
      for i = 0 to k - 1 do
          cluster_i = rawdata.where(cluster == i);
          centroid_i = new Data;
          centroid_i.x = cluster_i.sumx() / cluster_i.length;
          centroid_i.y = cluster_i.sumy() / cluster_i.length;
          centroid_i.cluster = i;
          centroids.add(centroid_i);

      /* For each point */
      stableassignment = true;
      for i = 0 to rawdata.length - 1 do
          /* 3. a. Compute and minimize the point-to-centroid distance */
          mincluster = 0;
          mindistance = distance(rawdata[i], centroids[0]);
          for j = 1 to k - 1 do
              dist = distance(rawdata[i], centroids[j]);
              if dist < mindistance then
                  mincluster = j;
                  mindistance = dist;
          /* 3. b. Update cluster assignment if necessary */
          if mincluster ≠ rawdata[i].cluster then
              stableassignment = false;
              rawdata[i].cluster = mincluster;

For point-to-cluster-centroid distance, the Euclidean distance may be used:

dist = \sqrt{(rawdata[i].x - centroids[j].x)^2 + (rawdata[i].y - centroids[j].y)^2}

Algorithm 2: The full k-means clustering algorithm
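Algorithm 2 can be translated into a short, runnable Python sketch. The version below makes a few assumptions that are not in the handout: points are plain (x, y) tuples rather than Data objects, `random.Random.shuffle` stands in for the explicit Fisher-Yates loop, a cluster that becomes empty keeps its previous centroid (the pseudocode would divide by zero in that case), and the function returns the labels and centroids instead of mutating the input.

```python
import math
import random


def kmeans(points, k, seed=None):
    """Cluster 2-D points following Algorithm 2; returns (labels, centroids)."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: balanced initial labels (i mod k), then shuffle them.
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)

    centroids = [None] * k
    stable = False
    while not stable:
        # Step 2: each centroid is the mean of the points in its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )

        # Step 3: move each point to the cluster of its nearest centroid.
        stable = True
        for i, (x, y) in enumerate(points):
            nearest = min(
                range(k),
                key=lambda c: math.hypot(x - centroids[c][0], y - centroids[c][1]),
            )
            if nearest != labels[i]:
                labels[i] = nearest
                stable = False
    return labels, centroids


# The small dataset from Section 3.1, clustered with k = 3.
data = [(0.7, 5.1), (1.5, 6), (2.1, 4.5), (2.4, 5.5), (3, 4.4), (3.5, 5),
        (4.5, 1.5), (5.2, 0.7), (5.3, 1.8), (6.2, 1.7), (6.7, 2.5),
        (8.5, 9.2), (9.1, 9.7), (9.5, 8.5)]
labels, centroids = kmeans(data, k=3, seed=1)
for c, (cx, cy) in enumerate(centroids):
    print(f"cluster {c}: centroid ({cx:.2f}, {cy:.2f}), {labels.count(c)} points")
```

Because the starting assignment is random, different seeds can lead to different stable assignments; k-means only guarantees that the final assignment is stable, not that it is the best possible one.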
4 Challenge: Identifying the Black Pod disease

Black pod disease, also known as Phytophthora pod rot, is a destructive disease of cocoa pods that reduces the yield of cocoa. The symptoms are:

1. Translucent spots on the pod surface which develop into small, dark, hard spots.
2. The entire pod becomes black and necrotic within 14 days of the initial symptoms.
3. White to yellow downy growth on the black areas.
4. Internal tissues become dry and shriveled, resulting in mummified pods.

To limit the spread, mummified pods should be removed and destroyed. [?]

Figure 6: TOP: Healthy cocoa pods. BOTTOM: Cocoa suffering black pod. Small dark ones are mummified

Assume that, on a cocoa farm, we have a robot equipped with a computer vision system that is able to isolate images of cocoa pods from its input images. We wish to employ k-means clustering to identify the diseased pods.
4.1 Requirements

1. The program should have a user interface that allows one to choose an image of a pod to be analyzed.
2. For the selected image, read the pixel values into an RGB matrix and apply the k-means clustering algorithm to it.
3. Plot a histogram of the cluster counts versus the centroids. For each centroid, draw a vertical bar that is colored with the color the centroid value represents and whose height corresponds to the number of points in that cluster.
4. One may then analyze these histograms and make a judgment as to whether the pod is diseased or not. Automation of this will constitute supervised learning. You may use a library like JFreeChart or the inbuilt charts in JavaFX.
5. Extra credit: Modify your program so that, after clustering, it displays the image with the darkest regions highlighted purple.

5 Tasks

1. Identify 3 strengths and weaknesses of the k-means clustering algorithm.
2. Would you say k-means is a hard clustering algorithm or a soft clustering algorithm?
3. What is the difference between the k-means, the k-medoids and the k-medians algorithms?
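For requirements 3 and 5 of Section 4.1, the following Python sketch shows one possible way to prepare the histogram data and the highlighted image once clustering has been done. It assumes that the pixels have already been read as (R, G, B) tuples and clustered, so that a label per pixel and a centroid per cluster are available. The function names, the choice of the Rec. 601 luma weights to decide which cluster is "darkest", and the exact purple value are assumptions made here. Drawing the bars is left to whichever chart library is used.

```python
from collections import Counter

PURPLE = (128, 0, 128)  # highlight colour assumed for the extra-credit task


def luminance(rgb):
    """Approximate brightness of an (R, G, B) colour using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def histogram_bars(labels, centroids):
    """One (hex colour, pixel count) bar per cluster, in cluster order."""
    counts = Counter(labels)
    bars = []
    for c, centroid in enumerate(centroids):
        r, g, b = (round(v) for v in centroid)
        bars.append((f"#{r:02x}{g:02x}{b:02x}", counts[c]))
    return bars


def highlight_darkest(pixels, labels, centroids, colour=PURPLE):
    """Replace every pixel in the darkest cluster with the highlight colour."""
    darkest = min(range(len(centroids)), key=lambda c: luminance(centroids[c]))
    return [colour if lab == darkest else px for px, lab in zip(pixels, labels)]


# Made-up example: two dark pixels and three yellowish pixels, already clustered.
pixels = [(30, 20, 10), (34, 24, 14), (200, 150, 40), (210, 160, 50), (205, 155, 45)]
labels = [0, 0, 1, 1, 1]
centroids = [(32.0, 22.0, 12.0), (205.0, 155.0, 45.0)]

print(histogram_bars(labels, centroids))  # [('#20160c', 2), ('#cd9b2d', 3)]
print(highlight_darkest(pixels, labels, centroids)[:2])  # [(128, 0, 128), (128, 0, 128)]
```

Each bar pairs the centroid's colour with the size of its cluster, which is the information the histogram in requirement 3 is meant to display.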
Artificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationA Comparative study of Clustering Algorithms using MapReduce in Hadoop
A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationComputational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions
Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................
More informationA Review on Plant Disease Detection using Image Processing
A Review on Plant Disease Detection using Image Processing Tejashri jadhav 1, Neha Chavan 2, Shital jadhav 3, Vishakha Dubhele 4 1,2,3,4BE Student, Dept. of Electronic & Telecommunication Engineering,
More informationThe k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationClustering. Stat 430 Fall 2011
Clustering Stat 430 Fall 2011 Outline Distance Measures Linkage Hierachical Clustering KMeans Data set: Letters from the UCI repository: Letters Data 20,000 instances of letters Variables: 1. lettr capital
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationUnsupervised Learning Partitioning Methods
Unsupervised Learning Partitioning Methods Road Map 1. Basic Concepts 2. K-Means 3. K-Medoids 4. CLARA & CLARANS Cluster Analysis Unsupervised learning (i.e., Class label is unknown) Group data to form
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationLAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA
LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to
More informationMachine Learning - Clustering. CS102 Fall 2017
Machine Learning - Fall 2017 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationSearch. The Nearest Neighbor Problem
3 Nearest Neighbor Search Lab Objective: The nearest neighbor problem is an optimization problem that arises in applications such as computer vision, pattern recognition, internet marketing, and data compression.
More informationClustering Color/Intensity. Group together pixels of similar color/intensity.
Clustering Color/Intensity Group together pixels of similar color/intensity. Agglomerative Clustering Cluster = connected pixels with similar color. Optimal decomposition may be hard. For example, find
More informationAssociative Cellular Learning Automata and its Applications
Associative Cellular Learning Automata and its Applications Meysam Ahangaran and Nasrin Taghizadeh and Hamid Beigy Department of Computer Engineering, Sharif University of Technology, Tehran, Iran ahangaran@iust.ac.ir,
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationData Mining. SPSS Clementine k-means Algorithm. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine
Data Mining SPSS 12.0 6. k-means Algorithm Spring 2010 Instructor: Dr. Masoud Yaghini Outline K-Means Algorithm in K-Means Node References K-Means Algorithm in Overview The k-means method is a clustering
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationHCR Using K-Means Clustering Algorithm
HCR Using K-Means Clustering Algorithm Meha Mathur 1, Anil Saroliya 2 Amity School of Engineering & Technology Amity University Rajasthan, India Abstract: Hindi is a national language of India, there are
More informationINF 4300 Classification III Anne Solberg The agenda today:
INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15
More informationk-means Clustering Todd W. Neller Gettysburg College
k-means Clustering Todd W. Neller Gettysburg College Outline Unsupervised versus Supervised Learning Clustering Problem k-means Clustering Algorithm Visual Example Worked Example Initialization Methods
More informationData Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science
Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 2016 201 Road map What is Cluster Analysis? Characteristics of Clustering
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More information[7.3, EA], [9.1, CMB]
K-means Clustering Ke Chen Reading: [7.3, EA], [9.1, CMB] Outline Introduction K-means Algorithm Example How K-means partitions? K-means Demo Relevant Issues Application: Cell Neulei Detection Summary
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationClustering. Chapter 10 in Introduction to statistical learning
Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationLecture 12 Recognition
Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationRobot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning
Robot Learning 1 General Pipeline 1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation construction 3. Robot learning: e.g., classification (recognition) or clustering (knowledge
More informationCluster Analysis. CSE634 Data Mining
Cluster Analysis CSE634 Data Mining Agenda Introduction Clustering Requirements Data Representation Partitioning Methods K-Means Clustering K-Medoids Clustering Constrained K-Means clustering Introduction
More information11/2/2017 MIST.6060 Business Intelligence and Data Mining 1. Clustering. Two widely used distance metrics to measure the distance between two records
11/2/2017 MIST.6060 Business Intelligence and Data Mining 1 An Example Clustering X 2 X 1 Objective of Clustering The objective of clustering is to group the data into clusters such that the records within
More informationUnsupervised Learning. CS 3793/5233 Artificial Intelligence Unsupervised Learning 1
Unsupervised CS 3793/5233 Artificial Intelligence Unsupervised 1 EM k-means Procedure Data Random Assignment Assign 1 Assign 2 Soft k-means In clustering, the target feature is not given. Goal: Construct
More informationCHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM
96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays
More informationCharacter Recognition
Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches
More informationMachine Learning : Clustering, Self-Organizing Maps
Machine Learning Clustering, Self-Organizing Maps 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps Clustering The task: partition a set of objects into meaningful subsets (clusters). The
More informationSTAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions
STAT 515 --- STATISTICAL METHODS Statistics: The science of using data to make decisions and draw conclusions Two branches: Descriptive Statistics: The collection and presentation (through graphical and
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationFigure 1 shows unstructured data when plotted on the co-ordinate axis
7th International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN) Key Frame Extraction and Foreground Modelling Using K-Means Clustering Azra Nasreen Kaushik Roy Kunal
More informationhttp://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review
More informationk-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University
k-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University Outline Unsupervised versus Supervised Learning Clustering Problem k-means Clustering Algorithm Visual
More informationColor based segmentation using clustering techniques
Color based segmentation using clustering techniques 1 Deepali Jain, 2 Shivangi Chaudhary 1 Communication Engineering, 1 Galgotias University, Greater Noida, India Abstract - Segmentation of an image defines
More informationEE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR
EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real
More informationCS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008
CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof. Ruiz Problem
More informationNearest Neighbor Predictors
Nearest Neighbor Predictors September 2, 2018 Perhaps the simplest machine learning prediction method, from a conceptual point of view, and perhaps also the most unusual, is the nearest-neighbor method,
More informationLab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD
Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Goals. The goal of the first part of this lab is to demonstrate how the SVD can be used to remove redundancies in data; in this example
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationIntroduction to Geospatial Analysis
Introduction to Geospatial Analysis Introduction to Geospatial Analysis 1 Descriptive Statistics Descriptive statistics. 2 What and Why? Descriptive Statistics Quantitative description of data Why? Allow
More informationHard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering
An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other
More informationData Science and Statistics in Research: unlocking the power of your data Session 3.4: Clustering
Data Science and Statistics in Research: unlocking the power of your data Session 3.4: Clustering 1/ 1 OUTLINE 2/ 1 Overview 3/ 1 CLUSTERING Clustering is a statistical technique which creates groupings
More informationMaximum Entropy (Maxent)
Maxent interface Maximum Entropy (Maxent) Deterministic Precise mathematical definition Continuous and categorical environmental data Continuous output Maxent can be downloaded at: http://www.cs.princeton.edu/~schapire/maxent/
More informationMachine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham