MIA - Master on Artificial Intelligence
- Janice Osborne
- 6 years ago
Transcription
2 1 Hierarchical Non-hierarchical Evaluation
4 The Concept of Similarity
Related notions: similarity, proximity, affinity, distance, difference, divergence.
We use distance when the metric properties hold:
- $d(x, x) = 0$
- $d(x, y) > 0$ when $x \neq y$
- $d(x, y) = d(y, x)$ (symmetry)
- $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality)
We use similarity in the general case:
- Function: $sim : A \times B \to S$ (where $S$ is often $[0, 1]$)
- Homogeneous: $sim : A \times A \to S$ (e.g. word-to-word)
- Heterogeneous: $sim : A \times B \to S$ (e.g. word-to-document)
- Not necessarily symmetric, and need not satisfy the triangle inequality.
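5 The Concept of Similarity
If $A$ is a metric space, the distance in $A$ may be used:
$$D_{euclidean}(\vec{x}, \vec{y}) = |\vec{x} - \vec{y}| = \sqrt{\sum_i (x_i - y_i)^2}$$
Similarity vs. distance:
- $sim_D(A, B) = \frac{1}{1 + D(A, B)}$
- monotonic: $\min\{sim(x, y), sim(x, z)\} \geq sim(x, y \cup z)$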
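The conversion from a distance to a similarity can be sketched in a few lines. This is a minimal illustration, assuming plain Python tuples as vectors; the function names `euclidean` and `sim_from_distance` are hypothetical and not part of the slides.

```python
import math

def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def sim_from_distance(d):
    """Map a distance d >= 0 to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1.0 / (1.0 + d)

a, b = (0.0, 0.0), (3.0, 4.0)
sim_ab = sim_from_distance(euclidean(a, b))  # distance 5 gives 1/6
sim_aa = sim_from_distance(euclidean(a, a))  # distance 0 gives 1
```

Identical objects get similarity 1, and the similarity decays towards 0 as the distance grows.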
6 Applications
- case-based reasoning, IR, ...
- Discovering related words: distributional similarity
- Resolving syntactic ambiguity: taxonomic similarity
- Resolving semantic ambiguity: ontological similarity
- Acquiring selectional restrictions/preferences
7 Relevant Information
Content (information about the compared units):
- Words: form, morphology, PoS, ...
- Senses: synset, topic, domain, ...
- Syntax: parse trees, syntactic roles, ...
- Documents: words, collocations, NEs, ...
Context (information about the situation in which similarity is computed):
- Window-based vs. syntax-based
External knowledge:
- Monolingual/bilingual dictionaries, ontologies, corpora
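8 Vectorial methods (1)
- $L_1$ norm, Manhattan distance, taxi-cab distance, city-block distance:
  $$L_1(\vec{x}, \vec{y}) = \sum_{i=1}^{N} |x_i - y_i|$$
- $L_2$ norm, Euclidean distance:
  $$L_2(\vec{x}, \vec{y}) = |\vec{x} - \vec{y}| = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$
- Cosine (a similarity: larger values mean closer vectors):
  $$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}| \, |\vec{y}|} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2} \, \sqrt{\sum_i y_i^2}}$$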
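The three measures on this slide can be sketched directly from their definitions. This is an illustrative sketch using only the standard library; the function names and the example vectors are assumptions made for the example.

```python
import math

def l1_distance(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def l2_distance(x, y):
    """Euclidean distance: square root of summed squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine(x, y):
    """Cosine of the angle between x and y (undefined for zero vectors)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
d1 = l1_distance(x, y)   # |1-4| + |2-6| + |3-3|
d2 = l2_distance(x, y)   # sqrt(9 + 16 + 0)
```

Note that the cosine ignores vector length: `cosine(x, [2.0, 4.0, 6.0])` is 1 (up to rounding) even though the two vectors are far apart under $L_1$ and $L_2$.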
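9 Vectorial methods (2)
- The $L_1$ and $L_2$ norms are particular cases of the Minkowski measure:
  $$D_{minkowski}(\vec{x}, \vec{y}) = L_r(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{N} |x_i - y_i|^r \right)^{\frac{1}{r}}$$
- Canberra distance:
  $$D_{canberra}(\vec{x}, \vec{y}) = \sum_{i=1}^{N} \frac{|x_i - y_i|}{|x_i| + |y_i|}$$
- Chebyshev distance:
  $$D_{chebyshev}(\vec{x}, \vec{y}) = \max_i |x_i - y_i|$$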
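A small sketch of the Minkowski family follows, assuming list-of-floats vectors. The function names are hypothetical, and skipping Canberra terms whose denominator is zero is a common convention adopted here, not something stated on the slide.

```python
def minkowski(x, y, r):
    """L_r distance; r=1 gives Manhattan, r=2 gives Euclidean."""
    return sum(abs(xi - yi) ** r for xi, yi in zip(x, y)) ** (1.0 / r)

def canberra(x, y):
    """Sum of |x_i - y_i| / (|x_i| + |y_i|); 0/0 terms count as 0 (assumed convention)."""
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom > 0:
            total += abs(xi - yi) / denom
    return total

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
manhattan = minkowski(x, y, 1)
euclid = minkowski(x, y, 2)
cheb = chebyshev(x, y)
```

As `r` grows, the Minkowski distance approaches the Chebyshev distance, which is why Chebyshev is sometimes written $L_\infty$.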
10 Set-oriented methods (3): Binary-valued vectors seen as sets
- Dice: $S_{dice}(X, Y) = \frac{2 |X \cap Y|}{|X| + |Y|}$
- Jaccard: $S_{jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$
- Overlap: $S_{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)}$
- Cosine: $\cos(X, Y) = \frac{|X \cap Y|}{\sqrt{|X| \cdot |Y|}}$
The similarities above are in $[0, 1]$ and can be used as distances simply by subtracting: $D = 1 - S$
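With Python sets, these four coefficients are one-liners. The sketch below assumes both sets are non-empty (otherwise the denominators are zero); the function names and the example sets are made up for illustration.

```python
import math

def dice(X, Y):
    return 2 * len(X & Y) / (len(X) + len(Y))

def jaccard(X, Y):
    return len(X & Y) / len(X | Y)

def overlap(X, Y):
    return len(X & Y) / min(len(X), len(Y))

def set_cosine(X, Y):
    return len(X & Y) / math.sqrt(len(X) * len(Y))

X = {"a", "b", "c", "d"}
Y = {"c", "d", "e"}
scores = {
    "dice": dice(X, Y),           # 2*2 / (4+3)
    "jaccard": jaccard(X, Y),     # 2 / 5
    "overlap": overlap(X, Y),     # 2 / 3
    "cosine": set_cosine(X, Y),   # 2 / sqrt(12)
}
distance = 1 - scores["jaccard"]  # D = 1 - S
```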
11 Set-oriented methods (4): Agreement contingency table

                 Object j
                 1        0
  Object i  1    a        b        a + b
            0    c        d        c + d
                 a + c    b + d    p

- Dice: $S_{dice}(X, Y) = \frac{2a}{2a + b + c}$
- Jaccard: $S_{jaccard}(X, Y) = \frac{a}{a + b + c}$
- Overlap: $S_{overlap}(X, Y) = \frac{a}{\min(a + b, a + c)}$
- Cosine: $S_{cos}(X, Y) = \frac{a}{\sqrt{(a + b)(a + c)}}$
- Matching coefficient: $S_{mc}(i, j) = \frac{a + d}{p}$
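The contingency counts can be computed from two binary vectors and plugged into the formulas. This is a sketch under the assumption that `a` counts attributes present in both objects, `b` those present only in object i, `c` those present only in object j, and `d` those absent from both; the function name `contingency` is hypothetical.

```python
def contingency(u, v):
    """Return (a, b, c, d) for two equal-length 0/1 vectors u (object i) and v (object j)."""
    a = sum(1 for ui, vi in zip(u, v) if ui == 1 and vi == 1)
    b = sum(1 for ui, vi in zip(u, v) if ui == 1 and vi == 0)
    c = sum(1 for ui, vi in zip(u, v) if ui == 0 and vi == 1)
    d = sum(1 for ui, vi in zip(u, v) if ui == 0 and vi == 0)
    return a, b, c, d

u = [1, 1, 1, 1, 0, 0]
v = [0, 0, 1, 1, 1, 0]
a, b, c, d = contingency(u, v)
p = a + b + c + d

s_dice = 2 * a / (2 * a + b + c)
s_jaccard = a / (a + b + c)
s_matching = (a + d) / p
```

Unlike Dice and Jaccard, the matching coefficient also rewards shared absences (`d`), which matters for sparse representations where most attributes are 0 for every object.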
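12 Distributional Similarity
Particular case of vectorial representation where the attribute vector is a probability distribution:
$$\vec{x}^T = [x_1 \ldots x_N] \text{ such that } \forall i, \; 0 \leq x_i \leq 1 \text{ and } \sum_{i=1}^{N} x_i = 1$$
- Kullback-Leibler divergence (relative entropy), which is not symmetric:
  $$D(q \| r) = \sum_{y \in Y} q(y) \log \frac{q(y)}{r(y)}$$
- Mutual information (the KL divergence between the joint distribution and the product of the marginals):
  $$I(A, B) = D(h \| f \cdot g) = \sum_{a \in A} \sum_{b \in B} h(a, b) \log \frac{h(a, b)}{f(a) \cdot g(b)}$$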
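Both quantities can be sketched with lists of probabilities. The code below is illustrative only: it uses natural logarithms, treats $0 \log 0$ as 0, and returns infinity when $r(y) = 0$ while $q(y) > 0$; the function names and the example distributions are assumptions.

```python
import math

def kl_divergence(q, r):
    """D(q || r) for two distributions given as equal-length probability lists."""
    total = 0.0
    for qy, ry in zip(q, r):
        if qy > 0:
            if ry == 0:
                return math.inf  # q puts mass where r has none
            total += qy * math.log(qy / ry)
    return total

def mutual_information(joint):
    """I(A, B) from a joint table h[a][b]; marginals f and g are derived from it."""
    f = [sum(row) for row in joint]
    g = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, h in enumerate(row):
            if h > 0:
                total += h * math.log(h / (f[a] * g[b]))
    return total

q = [0.5, 0.5]
r = [0.9, 0.1]
forward = kl_divergence(q, r)
backward = kl_divergence(r, q)   # differs from forward: KL is not symmetric

independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

For the independent table the joint equals the product of the marginals, so the mutual information is 0; for the fully dependent table it equals $\log 2$.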
13 Semantic Similarity
- Project objects onto a semantic space: $D_A(x_1, x_2) = D_B(f(x_1), f(x_2))$
- Semantic spaces: an ontology (WordNet, CYC, SUMO, ...) or a graph-like knowledge base (e.g. Wikipedia).
- It is not easy to project words, since the semantic space is composed of concepts, and a word may map to more than one concept.
- It is not obvious how to compute distance in the semantic space.
14 WordNet
15 WordNet
16 Distances in WordNet
WordNet::
Some definitions:
- $SLP(s_1, s_2)$ = Shortest Path Length from concept $s_1$ to $s_2$ (Which subset of arcs is used? antonymy, gloss, ...)
- $depth(s)$ = Depth of concept $s$ in the ontology
- $MaxDepth = \max_{s \in WN} depth(s)$
- $LCS(s_1, s_2)$ = Lowest Common Subsumer of $s_1$ and $s_2$
- $IC(s) = \log \frac{1}{P(s)}$ = Information Content of $s$ (given a corpus)
17 Distances in WordNet
- Shortest Path Length: $D(s_1, s_2) = SLP(s_1, s_2)$
- Leacock & Chodorow: $D(s_1, s_2) = -\log \frac{SLP(s_1, s_2)}{2 \cdot MaxDepth}$
- Wu & Palmer: $D(s_1, s_2) = \frac{2 \cdot depth(LCS(s_1, s_2))}{depth(s_1) + depth(s_2)}$
- Resnik: $D(s_1, s_2) = IC(LCS(s_1, s_2))$
- Jiang & Conrath: $D(s_1, s_2) = IC(s_1) + IC(s_2) - 2 \cdot IC(LCS(s_1, s_2))$
- Lin: $D(s_1, s_2) = \frac{2 \cdot IC(LCS(s_1, s_2))}{IC(s_1) + IC(s_2)}$
- Gloss overlap: sum of squares of the lengths of word overlaps between glosses
- Gloss vector: cosine of second-order co-occurrence vectors of glosses
Of these, Shortest Path Length and Jiang & Conrath behave as distances (smaller means closer); the others behave as similarities (larger means closer).
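To make the definitions concrete, here is a sketch on a hand-made six-concept taxonomy rather than on WordNet itself. Everything in it is an assumption for illustration: the `PARENT` tree, the `PROB` concept probabilities, the convention that the root has depth 1, and counting path length in edges. The Leacock & Chodorow formula is used in its usual form with a leading minus sign.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and toy concept probabilities.
PARENT = {"entity": None, "animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "tree": "plant"}
PROB = {"entity": 1.0, "animal": 0.7, "plant": 0.3,
        "dog": 0.4, "cat": 0.3, "tree": 0.3}

def path_to_root(s):
    """Concepts from s up to the root, s first."""
    path = [s]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def depth(s):
    return len(path_to_root(s))  # assumption: the root has depth 1

def lcs(s1, s2):
    """Lowest common subsumer: first ancestor of s2 that also subsumes s1."""
    ancestors = set(path_to_root(s1))
    return next(node for node in path_to_root(s2) if node in ancestors)

def slp(s1, s2):
    """Shortest path length in edges, going through the LCS (valid in a tree)."""
    c = lcs(s1, s2)
    return (depth(s1) - depth(c)) + (depth(s2) - depth(c))

def ic(s):
    return -math.log(PROB[s])

MAX_DEPTH = max(depth(s) for s in PARENT)

def leacock_chodorow(s1, s2):
    return -math.log(slp(s1, s2) / (2 * MAX_DEPTH))  # undefined when s1 == s2

def wu_palmer(s1, s2):
    return 2 * depth(lcs(s1, s2)) / (depth(s1) + depth(s2))

def resnik(s1, s2):
    return ic(lcs(s1, s2))

def jiang_conrath(s1, s2):
    return ic(s1) + ic(s2) - 2 * ic(lcs(s1, s2))

def lin(s1, s2):
    return 2 * ic(lcs(s1, s2)) / (ic(s1) + ic(s2))
```

On this toy tree, `dog` and `cat` share the subsumer `animal`, so every similarity-style measure ranks them closer than `dog` and `tree`, whose only common subsumer is the root. With real data one would use a WordNet interface and corpus-derived probabilities instead of these hand-set values.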
18 Distances in Wikipedia
- Measures using links, including the measures used on WordNet, but applied to the Wikipedia graph
- Measures using the content of articles (vector spaces)
- Measures using Wikipedia categories
20 Clustering
Partition a set of objects into clusters.
- Objects: features and values
- Similarity measure
- Utilities:
  - Exploratory Data Analysis (EDA).
  - Generalization (learning). Ex: on Monday, on Sunday, ? Friday
- Supervised vs. unsupervised classification
- Object assignment to clusters:
  - Hard: one cluster per object.
  - Soft: distribution $P(c_i \mid x_j)$. Degree of membership.
21 Produced structures
- Hierarchical (set of clusters + relationships)
  - Good for detailed data analysis
  - Provides more information
  - Less efficient
  - No single best algorithm
- Flat / Non-hierarchical (set of clusters)
  - Preferable if efficiency is required or for large data sets
  - K-means: simple method, sufficient starting point.
  - K-means assumes a Euclidean space; if that is not the case, EM may be used.
- Cluster representative
  - Centroid: $\vec{\mu} = \frac{1}{|c|} \sum_{\vec{x} \in c} \vec{x}$
22 Dendrogram (Hierarchical Clustering)
[Figure: single-link clustering of 22 frequent English words, represented as a dendrogram. Leaf order: be, not, he, I, it, this, the, his, a, and, but, in, on, with, for, at, from, of, to, as, is, was]
23 Hierarchical Clustering
- Bottom-up (agglomerative): start with individual objects, iteratively group the most similar.
- Top-down (divisive): start with all the objects, iteratively divide them, maximizing within-group similarity.
24 Agglomerative (Bottom-up) Hierarchical Clustering
Input:  A set X = {x_1, ..., x_n} of objects
        A function sim: P(X) × P(X) → R
Output: A cluster hierarchy
for i := 1 to n do
    c_i := {x_i}
end
C := {c_1, ..., c_n}; j := n + 1
while |C| > 1 do
    (c_n1, c_n2) := arg max over (c_u, c_v) ∈ C × C of sim(c_u, c_v)
    c_j := c_n1 ∪ c_n2
    C := (C \ {c_n1, c_n2}) ∪ {c_j}
    j := j + 1
end while
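The bottom-up loop above can be sketched in Python as follows. This is a naive illustration rather than an efficient implementation: it assumes distinct hashable objects, only compares pairs of different clusters, and records the merge history as the hierarchy. The names `agglomerative` and `single_link_sim`, and the 1-D example data, are hypothetical.

```python
def agglomerative(objects, sim):
    """Repeatedly merge the two most similar clusters; return the merge history."""
    clusters = [frozenset([x]) for x in objects]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sim(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((clusters[i], clusters[j], merged))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

def single_link_sim(c1, c2):
    """Similarity of the two closest members (negated 1-D distance)."""
    return max(-abs(a - b) for a in c1 for b in c2)

points = [1.0, 1.2, 5.0, 5.1, 9.0]
merges = agglomerative(points, single_link_sim)
```

Each of the `n - 1` merges corresponds to one internal node of the dendrogram; the last merge is the root containing all objects.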
25 Cluster Similarity (Hierarchical Clustering)
- Single link: similarity of the two most similar members
  - Local coherence (close objects are in the same cluster)
  - Elongated clusters (chaining effect)
- Complete link: similarity of the two least similar members
  - Global coherence, avoids elongated clusters
  - Better (?) clusters
- UPGMA (Unweighted Pair Group Method with Arithmetic Mean):
  $$\frac{1}{|X| \cdot |Y|} \sum_{x \in X} \sum_{y \in Y} D(x, y)$$
  - Average pairwise similarity between members
  - Trade-off between global coherence and efficiency
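The three linkage criteria differ only in how they aggregate the pairwise values between two clusters. The sketch below states them in terms of a distance `d`, so "most similar" becomes "smallest distance"; the function names and the toy clusters are assumptions.

```python
def single_link(X, Y, d):
    """Distance between the two closest members."""
    return min(d(x, y) for x in X for y in Y)

def complete_link(X, Y, d):
    """Distance between the two farthest members."""
    return max(d(x, y) for x in X for y in Y)

def upgma(X, Y, d):
    """Arithmetic mean of all pairwise distances between X and Y."""
    return sum(d(x, y) for x in X for y in Y) / (len(X) * len(Y))

def abs_diff(x, y):
    return abs(x - y)

X = [0.0, 1.0]
Y = [3.0, 5.0]
# Pairwise distances: 3, 5, 2, 4
linkage = (single_link(X, Y, abs_diff),
           complete_link(X, Y, abs_diff),
           upgma(X, Y, abs_diff))
```

The average always lies between the single-link and complete-link values, which matches its description as a trade-off.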
26 Examples (Hierarchical Clustering)
[Figure: a cloud of points in a plane, shown with single-link clustering, intermediate clustering, and complete-link clustering]
27 Divisive (Top-down) Hierarchical Clustering
Input:  A set X = {x_1, ..., x_n} of objects
        A function coh: P(X) → R
        A function split: P(X) → P(X) × P(X)
Output: A cluster hierarchy
C := {X}; c_1 := X; j := 1
while ∃ c_i ∈ C s.t. |c_i| > 1 do
    c_u := arg min over c_v ∈ C of coh(c_v)
    (c_{j+1}, c_{j+2}) := split(c_u)
    C := (C \ {c_u}) ∪ {c_{j+1}, c_{j+2}}
    j := j + 2
end while
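A sketch of the top-down loop, with deliberately simple choices for the two plug-in functions: coherence is the negated diameter of a 1-D cluster, and splitting cuts at the largest gap between sorted neighbours. Both choices, the function names, and the restriction of the arg min to clusters with more than one member (singletons cannot be split) are assumptions made for the example.

```python
def divisive(objects, coh, split):
    """Repeatedly split the least coherent splittable cluster; return the split history."""
    clusters = [tuple(objects)]
    splits = []
    while any(len(c) > 1 for c in clusters):
        candidates = [c for c in clusters if len(c) > 1]
        worst = min(candidates, key=coh)
        left, right = split(worst)
        splits.append((worst, left, right))
        clusters.remove(worst)
        clusters.extend([left, right])
    return splits

def coherence(c):
    """Negated diameter: widely spread clusters are less coherent."""
    return -(max(c) - min(c))

def split_at_largest_gap(c):
    """Sort the members and cut between the two most distant neighbours."""
    s = sorted(c)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    k = gaps.index(max(gaps)) + 1
    return tuple(s[:k]), tuple(s[k:])

points = [1.0, 1.2, 5.0, 5.1, 9.0]
splits = divisive(points, coherence, split_at_largest_gap)
```

In practice `split` is itself a clustering task, for instance a 2-means run on the members of the chosen cluster.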
28 Top-down clustering (Hierarchical Clustering)
- Cluster splitting: finding two sub-clusters
- Split clusters with lower coherence: single-link, complete-link, group-average
- Splitting is a sub-clustering task:
  - Non-hierarchical clustering
  - Bottom-up clustering
- Example: distributional noun clustering (Pereira et al., 93)
  - Nouns with similar verb probability distributions
  - KL divergence as distance between distributions:
    $$D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}$$
  - Bottom-up clustering is not applicable because some $q(x) = 0$
29 Non-hierarchical clustering
- Start with a partition based on random seeds
- Iteratively refine the partition by reallocating objects
- Stop when cluster quality doesn't improve further. Possible quality measures:
  - group-average similarity
  - mutual information between adjacent clusters
  - likelihood of the data given the cluster model
- Number of desired clusters?
  - Testing different values
  - Minimum Description Length: the goodness function includes information about the number of clusters
30 K-means (Non-hierarchical Clustering)
- Clusters are represented by centers of mass (centroids) or a prototypical member (medoid)
- Euclidean distance
- Sensitive to outliers
- Hard clustering
- $O(n)$
31 K-means algorithm (Non-hierarchical Clustering)
Input:  A set X = {x_1, ..., x_n} ⊆ R^m
        A distance measure d: R^m × R^m → R
        A function for computing the mean µ: P(R^m) → R^m
Output: A partition of X into clusters
Select k initial centers f_1, ..., f_k
while stopping criterion is not true do
    for all clusters c_j do
        c_j := {x_i | ∀ f_l: d(x_i, f_j) ≤ d(x_i, f_l)}
    end
    for all means f_j do
        f_j := µ(c_j)
    end
end while
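The algorithm can be sketched with the standard library alone. This version makes a few choices the slide leaves open, all of which are assumptions: the initial centers are passed in explicitly, the stopping criterion is "centers did not change" (with an iteration cap), ties go to the lowest-numbered center, and an empty cluster keeps its previous center.

```python
import math

def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def kmeans(X, centers, d=math.dist, max_iter=100):
    """Alternate assignment and mean recomputation until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in X:
            j = min(range(len(centers)), key=lambda j: d(x, centers[j]))
            clusters[j].append(x)
        # Recomputation step: each center moves to the mean of its cluster.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, centers = kmeans(X, centers=[X[0], X[3]])
```

The result depends on the initial centers; running with several random seeds and keeping the best partition is a common safeguard.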
32 K-means example (Non-hierarchical Clustering)
[Figure: two panels, "Assignment" and "Recomputation of means"]
33 EM algorithm (Non-hierarchical Clustering)
Estimate the (hidden) parameters of a model given the data.
Estimation-Maximization deadlock:
- Estimation: if we knew the parameters, we could compute the expected values of the hidden structure of the model.
- Maximization: if we knew the expected values of the hidden structure of the model, we could compute the MLE of the parameters.
NLP applications:
- Forward-Backward algorithm (Baum-Welch reestimation)
- Inside-Outside algorithm
- Unsupervised WSD
34 EM example (Non-hierarchical Clustering)
Can be seen as a soft version of K-means:
- Random initial centroids
- Soft assignments
- Recompute (averaged) centroids
[Figure: an example of using the EM algorithm for soft clustering, showing clusters C1 and C2 in the initial state, after iteration 1, and after iteration 2]
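The "soft K-means" view can be sketched for 1-D data as a mixture of Gaussians with a fixed, shared variance and equal priors, so that only the means are re-estimated. These simplifications, the function name `em_soft_clusters`, and the example data are assumptions; a full EM for Gaussian mixtures would also update variances and priors.

```python
import math

def em_soft_clusters(xs, means, sigma=1.0, iterations=20):
    """Alternate soft assignment (E-step) and weighted mean update (M-step)."""
    means = list(means)
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for x in xs:
            w = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M-step: each mean becomes the responsibility-weighted average.
        for j in range(len(means)):
            weight = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / weight
    return means, resp

xs = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
means, resp = em_soft_clusters(xs, means=[2.0, 7.0])
```

Each row of `resp` is a distribution $P(c_i \mid x_j)$ over clusters, which is the soft assignment mentioned earlier; taking the arg max of each row recovers a hard clustering. For points extremely far from every mean the exponentials can underflow to zero, so a robust version would work with log-probabilities.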
35 Clustering evaluation
Related to a reference clustering: Purity and Inverse Purity.
$$P = \frac{1}{|D|} \sum_{c} \max_{x} |c \cap x| \qquad IP = \frac{1}{|D|} \sum_{x} \max_{c} |c \cap x|$$
Where:
- $c$ = obtained clusters
- $x$ = expected clusters
Without a reference clustering: cluster quality measures such as coherence, average internal distance, average external distance, etc.
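Purity and inverse purity are the same computation with the roles of the two clusterings swapped. The sketch below assumes both clusterings are partitions of the same object set $D$, given as lists of Python sets; the function names and the toy clusterings are illustrative.

```python
def purity(obtained, expected):
    """Fraction of objects that fall in the best-matching expected cluster of their obtained cluster."""
    n = sum(len(c) for c in obtained)  # |D|, since the clusters partition D
    return sum(max(len(c & x) for x in expected) for c in obtained) / n

def inverse_purity(obtained, expected):
    """Purity with the roles of obtained and expected clusters swapped."""
    return purity(expected, obtained)

obtained = [{1, 2}, {3, 4}, {5, 6}]
expected = [{1, 2, 3, 4}, {5, 6}]
p = purity(obtained, expected)
ip = inverse_purity(obtained, expected)
```

Here every obtained cluster sits entirely inside one expected cluster, so purity is perfect, while inverse purity drops to 4/6 because the first expected cluster was split in two. Purity alone rewards over-splitting (one cluster per object scores 1), which is why the two measures are reported together.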
CS 2750: Machine Learning Clustering Prof. Adriana Kovashka University of Pittsburgh January 17, 2017 What is clustering? Grouping items that belong together (i.e. have similar features) Unsupervised:
More informationMEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI
MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI 1 KAMATCHI.M, 2 SUNDARAM.N 1 M.E, CSE, MahaBarathi Engineering College Chinnasalem-606201, 2 Assistant Professor,
More information4. Ad-hoc I: Hierarchical clustering
4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time. Hierarchical
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Slides From Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationNatural Language Processing
Natural Language Processing Machine Learning Potsdam, 26 April 2012 Saeedeh Momtazi Information Systems Group Introduction 2 Machine Learning Field of study that gives computers the ability to learn without
More informationLesson 3. Prof. Enza Messina
Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical
More informationClustering. Bruno Martins. 1 st Semester 2012/2013
Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany
More informationk-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out
Machine learning: Unsupervised learning" David Kauchak cs Spring 0 adapted from: http://www.stanford.edu/class/cs76/handouts/lecture7-clustering.ppt http://www.youtube.com/watch?v=or_-y-eilqo Administrative
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More informationAutomatic Data Analysis in Visual Analytics Selected Methods
Automatic Data Analysis in Visual Analytics Selected Methods Multimedia Information Systems 2 VU (SS 2015, 707.025) Vedran Sabol Know-Center March 15 th, 2016 2 Lecture Overview Visual Analytics Overview
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More information