Machine Learning: Basic Principles

Teaching demonstration
Kalle Palomäki
Department of Signal Processing and Acoustics, Aalto University

Content
1. Goal
2. Machine learning: definition
3. Classification: an important machine learning approach
4. A machine learning problem: hands-on problem solving and demonstration
5. Summary

Goal
Part of the introductory sessions, adjusted to 20 minutes, for 4th-year students with no background in machine learning. The aim is to start building an understanding of machine learning through concrete examples and by solving simple hands-on problems.

Machine learning: definition
Wikipedia: "Machine learning deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions."

Common-sense definition: machines that learn a little like the brain does. (Images: http://www.paranormalpeopleonline.com/boskop-man-big-brains-and-increased-intelligence/ and http://oldentech.files.wordpress.com/2010/07/1028528_29880053.jpg)

Internet and machine learning: far beyond the capacity of a single brain. (Image: http://www.slate.com/blogs/future_tense/2014/10/24/internet_sleep_new_research_from_usc_shows_internet_activity_changes_in.html)

Machine learning categories
- Supervised learning: classification
- Unsupervised learning: clustering
- Reinforcement learning

Classifier

Problem: Lisa is a tailor... (Images: http://ecx.images-amazon.com/images/i/51f9cnkx90l._sy300_.jpg and http://upload.wikimedia.org/wikipedia/commons/3/39/leonardo_da_vinci_043-mod.jpg)

Lisa makes uniforms. Salvation Army uniforms: men wear trousers, women wear skirts. (Image: http://www.bilerico.com/2009/03/army%20uniforms.jpg)

Sometimes she makes mistakes: these should be skirts.

Once she made a skirt for Prince Charles! (Image: http://i.dailymail.co.uk/i/pix/2009/05/21/article-1186234-050b9cb2000005dc-834_224x423.jpg)

[Figure: waist and hip measurements]

Here is Lisa's data:

waist (cm)   hip (cm)   gender
29.6         34.4       Female
28.9         34.4       Female
31.3         34.5       ???
30.8         33.7       Male
29.8         34.5       ???
32.5         33.6       Male
30.6         34.4       ???
...          ...        ...

[Figure: Lisa's data. Female samples: red; male samples: blue; samples with missing gender information: *]

Can we help Lisa? Discuss in pairs for 2 minutes: How would you approach this problem? What kind of algorithm would you design? Try to come up with some ideas! Use the picture provided to assist your discussion.

K-nearest neighbours algorithm
1. Determine K, the number of nearest neighbours.
2. Calculate the distance between the test sample and all the training samples, using the Euclidean distance measure.
3. Sort the distances and determine the K nearest neighbours.
4. Gather the categories of the nearest neighbours.
5. Use majority voting to predict the class of the test sample.
(Source: http://people.revoledu.com/kardi/tutorial/knn/)
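The five steps can be sketched in Python as below. This is a minimal sketch, not the code behind these slides: the function names `knn_classify` and `euclidean_distance`, and the representation of the training data as (features, label) pairs, are assumptions for illustration. The example data are the four training samples from Lisa's classification problem later in these slides.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train, test_sample, k):
    """Predict the class of test_sample by majority vote of its k nearest neighbours.

    train is a list of (features, label) pairs; features are sequences of numbers.
    """
    # Step 1: k is chosen by the caller.
    # Step 2: distance from the test sample to every training sample.
    distances = [(euclidean_distance(features, test_sample), label)
                 for features, label in train]
    # Step 3: sort by distance and keep the k nearest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Step 4: gather the categories of the nearest neighbours.
    labels = [label for _, label in nearest]
    # Step 5: majority vote. Counter.most_common keeps tied labels in the order
    # they were first counted, so on a tie the label of the closer neighbour wins.
    return Counter(labels).most_common(1)[0][0]


# Usage: ((waist cm, hip cm), gender) pairs; test sample waist 28 cm, hip 34 cm.
train = [((28, 32), "Male"), ((33, 35), "Male"),
         ((27, 33), "Female"), ((31, 36), "Female")]
print(knn_classify(train, (28, 34), k=3))  # Female
```

The tie-breaking rule is a design choice of this sketch; other implementations may break ties differently, for example by reducing K.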

[Figure: Lisa's data with K = 3. Female samples: red; male samples: blue; samples with missing gender information: *]

Euclidean distance

$$d_j = \sqrt{\sum_{m=1}^{M} \left(x_{j,m} - y_m\right)^2}$$

where $d_j$ is the Euclidean distance between training sample $x_j$ (training sample index $j$) and the test sample $y$, $m$ is the dimension index and $M$ is the data dimension. In Lisa's problem $M = 2$ (waist and hip). (Source: http://people.revoledu.com/kardi/tutorial/knn/)
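The formula translates directly into code. A minimal sketch in Python, assuming each sample is a sequence of M numbers (here M = 2: waist and hip); the helper name `euclidean_distance` is hypothetical, and the standard-library `math.dist` (Python 3.8 and later) computes the same quantity and is used only as a cross-check.

```python
import math


def euclidean_distance(x_j, y):
    """d_j = sqrt(sum over m of (x_{j,m} - y_m)^2) for M-dimensional samples."""
    if len(x_j) != len(y):
        raise ValueError("samples must have the same dimension M")
    return math.sqrt(sum((x_jm - y_m) ** 2 for x_jm, y_m in zip(x_j, y)))


# Test sample and one training sample from Lisa's problem: (waist, hip) in cm.
y = (28, 34)
x_j = (33, 35)
d = euclidean_distance(x_j, y)
print(round(d, 3))                          # 5.099, i.e. sqrt(26)
print(math.isclose(d, math.dist(x_j, y)))   # True
```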

[Figures: Euclidean distances $d_1$ to $d_6$ from the test sample (*) to individual female and male samples of the training data]

[Figure: the 3 nearest neighbours of the test sample (*) among the female and male samples of the training data]

All 3 nearest neighbours are male, so the test sample is classified as male.

[Figure: the 3 nearest neighbours of another test sample (*) among the female and male samples of the training data]
2 neighbours are female and 1 is male. There are more females than males, so the test sample is classified as female.
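Steps 4 and 5 (gather the neighbours' categories, then vote) can be sketched on their own. The neighbour labels below simply restate the two figures: three male neighbours, then two female and one male; the helper name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(neighbour_labels):
    """Return the most common label among the K nearest neighbours."""
    counts = Counter(neighbour_labels)
    label, _ = counts.most_common(1)[0]
    return label


# First case: all 3 nearest neighbours are male.
print(majority_vote(["Male", "Male", "Male"]))      # Male
# Second case: 2 female neighbours and 1 male neighbour.
print(majority_vote(["Female", "Male", "Female"]))  # Female
```

With two classes and an odd K such as K = 3, the vote cannot end in a tie, which is one reason to choose an odd K for problems like Lisa's.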

Classification problem
Lisa has lost the gender information of one of her customers and does not know whether to make a skirt or trousers. She is planning to toss a coin. Can you help her make a better decision?

The customer who is missing gender information:

gender   waist (cm)   hip (cm)
?        28           34

Training data:

gender   waist (cm)   hip (cm)
Male     28           32
Male     33           35
Female   27           33
Female   31           36

Sources: http://www.dcs.gla.ac.uk/~srogers/firstcourseml/matlab/chapter5/knnexample.html#1; Molarius A, Seidell JC, Sans S, Tuomilehto J, Kuulasmaa K. (1999) "Waist and hip circumferences, and waist-hip ratio in 19 populations of the WHO MONICA Project", International Journal of Obesity and Related Metabolic Disorders: J. Internat. Association Study Obesity, 23:116-125.

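Solution
Test sample: waist 28 cm, hip 34 cm; K = 3. The distance column uses the squared Euclidean distance: because the square root is an increasing function, it gives the same ranking as the Euclidean distance itself.

gender   waist (cm)   hip (cm)   squared distance              rank   in neighbourhood?   gender if in neighbourhood
Male     28           32         (28-28)^2 + (34-32)^2 = 4     2      Yes                 Male
Male     33           35         (28-33)^2 + (34-35)^2 = 26    4      No
Female   27           33         (28-27)^2 + (34-33)^2 = 2     1      Yes                 Female
Female   31           36         (28-31)^2 + (34-36)^2 = 13    3      Yes                 Female

Votes in the neighbourhood: Male 1, Female 2. There are more females than males, so the class is Female.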
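The whole worked solution can be reproduced in a few lines of Python. This is a sketch, assuming the four training samples and the test sample (28, 34) from the problem, K = 3, and the squared Euclidean distance used in the table; the variable names are illustrative.

```python
from collections import Counter

# Training data from the problem: (gender, waist cm, hip cm).
train = [("Male", 28, 32), ("Male", 33, 35), ("Female", 27, 33), ("Female", 31, 36)]
test_waist, test_hip = 28, 34
k = 3

# Squared Euclidean distance from the test sample, as in the solution table.
sq_dist = [(test_waist - waist) ** 2 + (test_hip - hip) ** 2
           for _, waist, hip in train]

# Sorting the sample indices by distance gives the ranking (rank 1 = nearest).
order = sorted(range(len(train)), key=lambda i: sq_dist[i])
rank = {i: r + 1 for r, i in enumerate(order)}

for i, (gender, waist, hip) in enumerate(train):
    in_neighbourhood = "Yes" if rank[i] <= k else "No"
    print(f"{gender:<6} {waist:>3} {hip:>3} {sq_dist[i]:>3} {rank[i]:>2} {in_neighbourhood}")

# Majority vote among the k nearest samples: Female wins 2 votes to 1.
votes = Counter(train[i][0] for i in order[:k])
print(dict(votes), "->", votes.most_common(1)[0][0])
```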

Summary
We briefly addressed the principles of machine learning:
1. First, we defined machine learning.
2. We introduced classification as an important machine learning task.
3. We solved a hands-on classification problem using the K-nearest neighbour algorithm.
Check out my website for these slides, the exercise, and the code for the decision boundary calculations in the previous slides: http://users.spa.aalto.fi/kpalomak/demonstration_session
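The decision boundary code mentioned above is on the author's website and is not reproduced here. As a rough, hypothetical illustration of the idea, one can classify every point of a grid of (waist, hip) values with the K-nearest neighbour rule and print the predicted class as a character map; the boundary lies where the characters change. The grid range, the step of 1 cm and the helper names `knn_predict` and `decision_map` are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Training data from the classification problem: ((waist cm, hip cm), gender).
TRAIN = [((28, 32), "Male"), ((33, 35), "Male"), ((27, 33), "Female"), ((31, 36), "Female")]


def knn_predict(point, train=TRAIN, k=3):
    """K-nearest-neighbour prediction with Euclidean distance and majority vote."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def decision_map(waists, hips, k=3):
    """One text row per hip value (largest first); 'F' = Female, 'M' = Male."""
    rows = []
    for hip in sorted(hips, reverse=True):
        # First letter of the predicted label, one character per waist value.
        rows.append("".join(knn_predict((waist, hip), k=k)[0] for waist in waists))
    return rows


if __name__ == "__main__":
    waists = range(26, 35)   # horizontal axis: waist 26..34 cm
    hips = range(31, 38)     # vertical axis: hip 31..37 cm
    for hip, row in zip(sorted(hips, reverse=True), decision_map(waists, hips)):
        print(f"hip {hip:>2}: {row}")
```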

What next
- Supervised learning: classification
- Unsupervised learning: clustering
- Reinforcement learning

Face recognition (Source: http://cs.nyu.edu/~roweis/data.html)

Speech recognition: spectrum over time for the word "cat" (sounds k, a, t)

Searches