KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14: Machine Learning. K-means, kNN

Contents K-means clustering K-Nearest Neighbour

Power Systems Analysis: an automated learning approach. Understanding states in the power system is established through observation of inputs and outputs, without regard to the physical electrotechnical relations between the states: some input goes into a learned model P(X), which produces some output. Adding knowledge about the electrotechnical rules means adding heuristics to the learning.

Supervised learning - a preview. In the scope of this course, we will study three forms of supervised learning. Decision Trees: overview and practical work in an exercise session. Statistical methods (k-Nearest Neighbour): overview and practical work in an exercise session, also included in the Project Assignment; the nearest-neighbour idea can also be used for unsupervised clustering. Artificial Neural Networks: overview only, no practical work.

Lazy vs. Eager learning. In eager learning, the pre-classified training set is processed up front: a model is built from the learning set before any new object is presented. Ex.: Artificial Neural Networks. In lazy learning, no model is built in advance; only when a new object is input to the algorithm are the distances to the stored training objects calculated. Ex.: k Nearest Neighbours.

Contents K-means clustering K-Nearest Neighbour

K-means clustering. K-means clustering involves creating clusters of data. It is iterative and continues until the cluster assignments no longer change. It requires the value of k to be defined at the start. Consider for instance a table like the following:

Sepal length  Sepal width  Petal length  Petal width
5.1           3.5          1.4           0.2
4.9           3.0          1.4           0.2
4.7           3.2          1.3           0.2

Plotted, the data looks something like this: [scatter plot of the sepal and petal measurements]

K-means clustering (continued). In k-means clustering, first pick k mean points randomly in the space. Then iterate: calculate the Euclidean distance from each data point in the dataset to each mean point; assign every data point to its closest mean point; recalculate each mean as the centroid of the points assigned to it. Once the assignments stop changing, we have k clusters.
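As a rough sketch of this loop (not the lecture's code; the function name kmeans and the initialisation by sampling data points are our own illustrative choices), in Python with NumPy:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        # X is an (n, d) array of data points; k is chosen up front.
        rng = np.random.default_rng(seed)
        # Pick k initial means by sampling k distinct data points.
        means = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Euclidean distance from every point to every mean: an (n, k) array.
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            # Assign each data point to its closest mean point.
            labels = dists.argmin(axis=1)
            # Recalculate each mean as the centroid of its assigned points
            # (empty clusters are not handled in this sketch).
            new_means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_means, means):  # assignments have stabilised
                break
            means = new_means
        return labels, means

For the iris table above, kmeans(data, k=3) would return a cluster index for each row plus the three final mean points.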

Contents K-means clustering k-nearest Neighbour

The k Nearest Neighbour algorithm. The k Nearest Neighbour algorithm is a way to classify an object, described by its attributes, according to its nearest neighbours in the learning set. In the kNN method, the k nearest neighbours are considered. "Nearest" is measured as distance in Euclidean space.
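For two attribute vectors x and y, the Euclidean distance is d(x, y) = sqrt((x1 - y1)^2 + ... + (xd - yd)^2). A one-function version in Python (the name euclidean is purely illustrative):

    import numpy as np

    def euclidean(x, y):
        # Straight-line distance between two attribute vectors.
        return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

    # Distance between the first two iris rows from the earlier table:
    print(euclidean([5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]))  # ~0.539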

k-Nearest Neighbour classification. Assume instead a table like this, where we have labels for the clusters:

Sepal length  Sepal width  Petal length  Petal width  Species
5.1           3.5          1.4           0.2          iris setosa
4.9           3.0          1.4           0.2          iris setosa
4.7           3.2          1.3           0.2          iris setosa
...
7.0           3.2          4.7           1.4          iris versicolor
6.3           3.3          6.0           2.5          iris virginica

K-Nearest Neighbour algorithm. Given a new set of measurements, perform the following test: find, using Euclidean distance for example, the k nearest entities from the training set. These entities have known labels. The choice of k is left to us. Among these k entities, which label is most common? That is the label for the unknown entity.
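Put together, a minimal kNN classifier might look like this in Python; knn_classify and the toy training rows are illustrative, not the lecture's code:

    from collections import Counter
    import numpy as np

    def knn_classify(X_train, y_train, x_new, k=3):
        dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
        nearest = np.argsort(dists)[:k]                  # indices of the k closest
        votes = Counter(y_train[i] for i in nearest)     # tally their known labels
        return votes.most_common(1)[0][0]                # most common label wins

    # Toy example using rows from the labelled iris table:
    X_train = np.array([[5.1, 3.5, 1.4, 0.2],
                        [4.9, 3.0, 1.4, 0.2],
                        [7.0, 3.2, 4.7, 1.4],
                        [6.3, 3.3, 6.0, 2.5]])
    y_train = ["iris setosa", "iris setosa", "iris versicolor", "iris virginica"]
    print(knn_classify(X_train, y_train, np.array([5.0, 3.4, 1.5, 0.2]), k=3))
    # -> "iris setosa": two of the three nearest rows carry that label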

Example from Automatic Learning techniques in Power Systems. One Machine Infinite Bus (OMIB) system. We assume a fault close to the generator will be cleared within 155 ms by protection relays. We need to identify the situations in which this clearing time is sufficient and those in which it is not: under certain loading situations, 155 ms may be too slow. Source: Automatic Learning techniques in Power Systems, L. Wehenkel

OMIB further information. In the OMIB system, the following parameters influence security: the amount of active and reactive power of the generator (Pu, Qu); the amount of load nearby the generator (P1); the voltage magnitudes at the load bus and at the infinite bus; and the short-circuit reactance Xinf, representing the effect of variable topology in the large system represented by the infinite bus. In the example, the voltages at the generator and the infinite bus are assumed similar and constant for simplicity. Source: Automatic Learning techniques in Power Systems, L. Wehenkel

Our database of objects with attributes. In a simulator, we randomly sample values for Pu and Qu, creating a database with 5000 samples (objects); for each object we have a set of attributes (Pu, Qu, V1, P1, Vinf, Xinf, CCT), as per below. Source: Automatic Learning techniques in Power Systems, L. Wehenkel
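A hypothetical sketch of how such a database could be assembled; simulate_omib is a stand-in of our own invention for the actual power-system simulator, and the sampling ranges and returned values are dummy placeholders so the sketch runs end to end:

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples = 5000

    # Randomly sample generator operating points (ranges are illustrative).
    Pu = rng.uniform(0.0, 1.0, n_samples)    # active power
    Qu = rng.uniform(-0.5, 0.5, n_samples)   # reactive power

    def simulate_omib(pu, qu):
        # Stand-in for the real simulator: it would compute V1, P1, Vinf,
        # Xinf and the critical clearing time (CCT) for this operating point.
        # Dummy values are returned here so the sketch is runnable.
        return 1.0, 0.8 * pu, 1.0, 0.3, 0.155 + 0.1 * (0.5 - pu)

    database = [(pu, qu, *simulate_omib(pu, qu)) for pu, qu in zip(Pu, Qu)]

    # An object is secure if its CCT exceeds the 155 ms clearing time.
    labels = ["secure" if row[-1] > 0.155 else "insecure" for row in database]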

Plot of database content. Source: Automatic Learning techniques in Power Systems, L. Wehenkel

In the OMIB example database: sample 4984 and its neighbours.

Error in the 1-NN classification