DATA CLUSTERING. Satu Virtanen. T Seminar on String Algorithms
2 OUTLINE
- Introduction
- General clustering methods
- Clustering in metric spaces
- Clustering string data
- Clustering in graphs
- Concluding remarks
3 CLUSTERING Practical applications of information processing involve massive data sets. Only a small fraction of the data contains semantically interesting information. Clustering = process of organizing properly represented data into meaningful groups, possibly ignoring noise, by interpreting which data points are in some sense connected.
4 A COMMON ABSTRACTION P = a set of n data points represented as d-dimensional coordinate vectors p = (p_1, p_2, ..., p_d) ∈ P. A cluster C ⊆ P is a set of points that are sufficiently close to each other w.r.t. some proximity measure.
5 SOME QUESTIONS TO BE SETTLED How close is close enough? Does such a threshold need to be fixed beforehand? How many clusters will emerge? Is the number of clusters fixed? How many points are needed to form a cluster? Will the clustering relation be symmetric?
6 THERE ARE NO CORRECT ANSWERS! One can define strict rules for what is a proper cluster and what is not; these are bound to be application-specific. Issues worth considering [BYCHN03]:
- natural justification of the definition of a cluster
- computational complexity of determining the clusters
7 WHAT ALL GETS CLUSTERED? Most clustering algorithms will produce clusters regardless of the data even for uniformly random data [JMF99]. Outliers = noise or erroneous data to be left outside of all clusters; recognizing these is often quite error prone. For example, large holes in the ozone layer went unnoticed for a while as they were classified as outliers [BYCHN03].
8 METHODS FOR FINDING k CLUSTERS Some algorithms require the number of clusters k as a parameter. The user must either have some a priori information on the suitable number of clusters, or needs to iterate the algorithm several times with different numbers of clusters to find the most convincing clustering.
9 THE k-means CLUSTERING ALGORITHM Universe X in which the n data points p ∈ P are located. A set K ⊆ X of k points is chosen as cluster centers. Each p ∈ P is assigned to some c ∈ K. This is iterated l times to minimize the total distance of the points to the assigned centers. Complexity: O(nkl) time and O(n + k) space.
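The loop above can be sketched in a few lines of plain Python (a minimal illustration with hypothetical names, not the seminar's implementation):

```python
import random

def k_means(points, k, iterations=10, seed=0):
    """Lloyd-style k-means on d-dimensional tuples: assign each point to the
    nearest of k centers, then move each center to the mean of its points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers drawn from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest center by squared Euclidean distance
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:                            # an empty cluster keeps its old center
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters
```

With well-separated data the assignment stabilizes after a few iterations, so a small fixed l usually suffices.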
10 THRESHOLD ALGORITHMS Many algorithms require as a parameter a value that determines the boundaries of the clusters. The threshold determines how close (in the sense of the defined proximity measure) two data points need to be in order to be classified into the same cluster. It may be absolute or relative. Different values of the threshold yield different clusterings.
13 WHY THRESHOLDS? In practice, an ideal algorithm would not require such a threshold but would rather interpret it dynamically depending on the input data. In some application areas, naturally defined threshold values exist and hence clustering algorithms that need them are also justified.
14 APPLICATIONS
- image processing: recognition of e.g. hand-written characters
- image segmentation: reducing noise in images
- genome comparison: recognition of e.g. proteins
- data mining: extracting useful information from massive databases
15 THE GENERAL APPROACH Usually a clustering algorithm produces clusters for all of the data points simultaneously, iteratively moving the clusters around until some fitness function is satisfied. If the data points are clustered sequentially such that the cluster of one will be completely decided before another is considered, the algorithm is called incremental.
16 CLASSIFICATION OF THE METHODS Based on the output of the algorithm; if it is
- just one absolute clustering → partitional clustering
- a hierarchy of possible clusterings → hierarchical clustering
17 PARTITIONAL METHODS Produces a single collection C of clusters C_i ⊆ P such that ⋃_{C_i ∈ C} C_i = P. If C_i ∩ C_j = ∅ for all C_i, C_j ∈ C, i ≠ j, the clustering is crisp, and if there exist C_i, C_j ∈ C, i ≠ j, such that C_i ∩ C_j ≠ ∅, the clustering is fuzzy. [BYCHN03]
18 AN EXAMPLE: SQUARED ERROR APPROACH Calculate the centroid c_i of each C_i ∈ C, and then measure the total squared error of the data points p = (p_1, p_2, ..., p_d) ∈ P w.r.t. their centroid c_i:

  E = Σ_{C_i ∈ C} Σ_{p_j ∈ C_i} Σ_{k=1}^{d} (c_i(k) − p_j(k))²

Choose the partitioning C that minimizes E. Some limits on the set of feasible partitions need to be imposed, as assigning each data point to its own cluster yields E = 0.
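The criterion can be evaluated directly on a candidate partitioning; the sketch below (plain Python, hypothetical function name) follows the formula term by term:

```python
def squared_error(clusters):
    """Total squared error E: for each cluster, sum the squared deviations of
    its points from the cluster centroid, coordinate by coordinate."""
    E = 0.0
    for cluster in clusters:
        d = len(cluster[0])
        centroid = [sum(p[k] for p in cluster) / len(cluster) for k in range(d)]
        E += sum((centroid[k] - p[k]) ** 2 for p in cluster for k in range(d))
    return E
```

A singleton cluster contributes zero error, which is why the feasible partitions must be restricted as noted above.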
19 HIERARCHICAL CLUSTERING Methods that produce several different levels of clustering, going from rather coarse clustering to an excessively fine clustering. It depends on the application whether an absolute definition of a good cluster is at all possible. If not, the user must choose between the possible clusterings provided by a hierarchical method.
20 DENDROGRAMS
21 CLUSTERING IN METRIC SPACES There exist algorithms that produce a clustering based on a distance matrix only. This usually means that the data points lie in a metric space, which consists of a universe X together with a distance function d(u, v) ≥ 0 that is a metric. For a metric d defined on X, the following holds for all u, v, w ∈ X: it is symmetric: d(u, v) = d(v, u); it has zero reflexive distance: d(v, v) = 0; and the triangle inequality holds: d(u, w) ≤ d(u, v) + d(v, w).
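These axioms are easy to verify mechanically on a finite point set. A small brute-force checker (hypothetical name; cubic in the number of points because of the triangle inequality) might look like:

```python
def is_metric(points, d, eps=1e-9):
    """Check the metric axioms for distance function d on a finite point set:
    nonnegativity, zero self-distance, symmetry, and the triangle inequality."""
    for u in points:
        if abs(d(u, u)) > eps:                       # zero reflexive distance
            return False
        for v in points:
            if d(u, v) < -eps or abs(d(u, v) - d(v, u)) > eps:
                return False                         # nonnegativity / symmetry
            for w in points:
                if d(u, w) > d(u, v) + d(v, w) + eps:
                    return False                     # triangle inequality
    return True
```

For example, squared Euclidean distance fails this check, which is one reason the squared form is used only as an optimization criterion, not as a metric.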
22 CLUSTERING IN METRIC SPACES Usually only a finite subset U ⊆ X is considered. A clustering is now a partition (or a cover) of U, and P is partitioned (or covered) accordingly. The aim is a partition such that the distance between points in the same cluster is as small as feasible and the distance between points in different clusters is as large as possible. For practical applications, the distance function d should be cheap to compute, especially for large U.
23 EXAMPLES OF METRIC CLUSTERINGS When a feasible metric d has been defined on X, the clusters of P may be formed by selecting the k nearest points of each data point to be its cluster. This is very likely to produce a fuzzy clustering, and the value of k must be reasoned out by the user. Another clustering results from iteratively connecting the points nearest to each other into the same initial cluster until every point has been connected to at least one other point. This produces a crisp clustering, possibly with just one cluster.
24 EUCLIDEAN DISTANCE A common choice for a metric in continuous spaces. For two d-dimensional vectors a = (a_1, a_2, ..., a_d) and b = (b_1, b_2, ..., b_d), this is

  d(a, b) = √( Σ_{i=1}^{d} (a_i − b_i)² )
25 A NON-METRICAL DISTANCE-BASED METHOD The mutual neighbor distance method by Gowda and Krishna (see [JMF99] and the references therein): For each p_i ∈ P, n = |P|, label the other n − 1 points, assigning label 1 to the nearest point, 2 to the second nearest, and finally n − 1 to the farthest. Denote the matrix of these labels by N, such that N_ij is the label assigned to p_j by p_i; then M(p_i, p_j) = N_ij + N_ji. The triangle inequality is not satisfied and hence M is not a metric. Note that a threshold value needs to be defined.
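A sketch of the construction (hypothetical function name; d may be any distance function, and the result is queried by point index):

```python
def mutual_neighbor_distance(points, d):
    """Gowda-Krishna mutual neighbor distance: N[i][j] is the neighbor rank
    (1 = nearest, n-1 = farthest) that point i assigns to point j;
    MND(i, j) = N[i][j] + N[j][i]."""
    n = len(points)
    N = [[0] * n for _ in range(n)]
    for i in range(n):
        # rank all other points by their distance from point i
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: d(points[i], points[j]))
        for rank, j in enumerate(others, start=1):
            N[i][j] = rank
    return lambda i, j: N[i][j] + N[j][i]
```

Mutual nearest neighbors get the minimum value 2; a point that ranks another highly without being ranked highly in return gets a large value, which is the asymmetry the method exploits.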
26 CLUSTERING STRING DATA No intuitive definition of distance is readily available; no dimensional information or obvious coordinate vector is attached to the data → define transformations that introduce a proximity measure:
- Hamming distance
- Levenshtein or edit distance
27 EDIT DISTANCE Defined for two sequences A = (a_1, a_2, ..., a_s) and B = (b_1, b_2, ..., b_t) as the smallest number of substitutions, insertions, and deletions required to transform A into B. This is a metric and can be calculated by dynamic programming in O(st) time. Works well in associating a mistyped word to the intended word.
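The dynamic program can be written with a single rolling row, which keeps the space linear in t while preserving the O(st) time bound (a standard formulation, not the seminar's code):

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming: prev[j] holds the distance
    between the processed prefix of a and b[:j]."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion from a
                           cur[j - 1] + 1,               # insertion into a
                           prev[j - 1] + (ca != cb)))    # substitution (free on match)
        prev = cur
    return prev[len(b)]
```

For example, a single mistyped character in one of the Finnish forms listed below costs exactly one edit.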
28 WEAKNESSES OF THE EDIT DISTANCE Pronunciation may be significant in determining whether two words are similar. Other measures are needed for languages with complicated stemming and conflation rules. All of the following words have the same stem talo: talot, talossa, talosta, taloon. A stemmer that aims to take into account the structural properties of words is called morphological and can be based on the holomorphic distance, which is in turn based on feature extraction on subsequences of the original string.
29 EXAMPLE: HOLOMORPHIC VS. EDIT DISTANCE The 10 nearest neighbors in a dictionary for the Spanish word apetitosa (engl. tasty, appetizing, or tempting).

Holom. dist.  Translation    | Edit distance  Translation
apetitosa     tasty          | apetitosa      tasty
apetito       appetite       | apetitoso      tasty
apetitoso     tasty          | aceitosa       oily
apetite       appetite       | apestosa       sickening
apetitiva     appetizing     | apetitiva      appetizing
apetitivo     appetizing     | apetito        appetite
apetible      tempting       | aceitoso       oily
apetecer      crave          | acetoso        sour
apetecedor    tempting       | alentosa       reassuring
apetencia     hunger (for)   | aparatosa      pretentious
30 APPLICATIONS OF EDIT DISTANCE Data that is not originally in textual format can be transformed into a string, such as DNA sequences. Any binary vector or matrix can also be handled by string methods.
31 CLUSTERING BOOKS BY ACM CLASSIFICATION LABELS Jain et al. [JMF99] consider clustering books by defining the similarity w.r.t. the ACM CR classification labels. They use the ratio of the length of the longest common prefix to the length of the first string. Examples of such labels: H242, I233, and I522.
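The similarity they describe reduces to a longest-common-prefix computation; a sketch (hypothetical function name) could be:

```python
def prefix_similarity(a, b):
    """Ratio of the longest common prefix length to the length of the
    first string, as used for ACM CR classification labels."""
    lcp = 0
    for x, y in zip(a, b):
        if x != y:
            break
        lcp += 1
    return lcp / len(a) if a else 0.0
```

Note that the measure is asymmetric, since it divides by the length of the first string only; labels sharing only the top-level letter, such as I233 and I522, score low.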
32 ACM COMPUTING CLASSIFICATION SYSTEM
A General Literature
B Hardware
C Computer Systems Organization
D Software
  D.0 General
  D.1 Programming Techniques
  D.2 Software Engineering
  D.3 Programming Languages
      D.3.0 General
      D.3.1 Formal Definitions and Theory
      D.3.2 Language Classifications
      D.3.3 Language Constructs and Features
            (e.g. Abstract data types, Classes and objects, Concurrent programming structures, Constraints, Control structures, ...)
      D.3.4 Processors
      D.3.m Miscellaneous
  D.4 Operating Systems
  D.m Miscellaneous
E Data
F Theory of Computation
G Mathematics of Computing
H Information Systems
I Computing Methodologies
J Computer Applications
K Computing Milieux
33 CLUSTERING IN GRAPHS Graph G = (V, E), |V| = n, |E| = m:
- data points are represented by the vertex set V
- connections between them are represented by the edge set E
For different types of data, simple transformations into graphs exist.
34 DELAUNAY GRAPHS A transformation into a graph for a set of points on a plane (generalizable to higher dimensions): represent each point with a vertex and place an edge between each pair of points that are Voronoi neighbors.
35 THE VORONOI NEIGHBORHOOD RELATION Define a partitioning of the plane containing the n data points into n convex polygons such that there is exactly one data point inside each polygon, and all points inside a polygon are closer to the data point of that polygon than to any other data point. The resulting diagram of polygons is called a Voronoi diagram or a Dirichlet tessellation. Two points are Voronoi neighbors if they can be connected by a straight line that only passes through their own two polygons.
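One resolution-limited way to recover the Voronoi neighbor relation without computational-geometry machinery is to rasterize the plane and look for adjacent grid cells owned by different data points. This is an approximation sketch with hypothetical names; an exact construction would use a Delaunay triangulation algorithm:

```python
def voronoi_neighbors(points, steps=200):
    """Approximate Voronoi neighbors: assign each grid cell to its nearest
    data point; report a pair as neighbors when their regions meet in
    adjacent cells. Accuracy is limited by the grid resolution."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs) - 1.0, max(xs) + 1.0
    y0, y1 = min(ys) - 1.0, max(ys) + 1.0

    def owner(x, y):
        # index of the nearest data point: (x, y) lies in its polygon
        return min(range(len(points)),
                   key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2)

    grid = [[owner(x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * j / steps)
             for j in range(steps + 1)]
            for i in range(steps + 1)]
    neighbors = set()
    for i in range(steps + 1):
        for j in range(steps):
            # adjacent cells with different owners straddle a polygon boundary
            for a, b in ((grid[i][j], grid[i][j + 1]),
                         (grid[j][i], grid[j + 1][i])):
                if a != b:
                    neighbors.add((min(a, b), max(a, b)))
    return neighbors
```

Connecting each returned pair with an edge yields (an approximation of) the Delaunay graph of the point set.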
36-39 AN EXAMPLE (figures): a set of points; its Voronoi diagram; adding the edges; the resulting graph.
40 USING MINIMUM SPANNING TREES A common approach to clustering graphs is to use a minimum spanning tree of the graph [JMF99]. Clusters are obtained by deleting the edges in decreasing order of their weights.
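A Kruskal-style sketch (hypothetical function name): processing edges in increasing weight order and stopping once k components remain is equivalent to building the MST and then deleting its k − 1 heaviest edges:

```python
def mst_clusters(n, edges, k):
    """Cluster vertices 0..n-1 into k groups: add edges in increasing weight
    order (union-find), stopping when k connected components remain.
    edges = [(weight, u, v)]."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            if components == k:             # desired number of clusters reached
                break
            parent[ru] = rv
            components -= 1
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())
```

The heaviest MST edges are exactly the ones never added before the stop, so the resulting components are the clusters left after the deletions described above.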
41-48 MST EXAMPLE (figures): the minimum spanning tree is computed and its edges are deleted one by one in decreasing order of weight, splitting the point set into progressively smaller clusters.
49 DENSITY-BASED METHODS For a graph G = (V, E) with n = |V| and m = |E|, density is the ratio of m to the maximum possible number of edges, C(n, 2):

  δ = m / C(n, 2) = 2m / (n(n − 1))

In simple terms, a cluster in a graph can be, for example, a surprisingly dense induced subgraph. Some authors have only considered the maximal cliques of graphs as proper clusters.
50 RELATIVE DENSITY One may also consider the relative density δ_r of a subgraph with vertex set S ⊆ V:

  δ_r = |{(u, v) ∈ E : u, v ∈ S}| / |{(u, v) ∈ E : {u, v} ∩ S ≠ ∅}|

Mihail et al. [MGSZ02] use spectral methods to identify clusters that have a high relative density.
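Both density notions are short computations over an edge list (hypothetical function names; edges are unordered pairs of vertex indices):

```python
def density(n, edges):
    """delta = m / C(n, 2): the fraction of possible edges that are present."""
    return 2 * len(edges) / (n * (n - 1))

def relative_density(edges, S):
    """delta_r: edges with both endpoints in S, divided by edges with at
    least one endpoint in S."""
    S = set(S)
    internal = sum(1 for u, v in edges if u in S and v in S)
    touching = sum(1 for u, v in edges if u in S or v in S)
    return internal / touching
```

A relative density near 1 means almost all edges touching S stay inside S, i.e. S is well separated from the rest of the graph.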
51 CONNECTIVITY-BASED CLUSTERING The connectivity of a graph may be characterized by the number of (disjoint) paths between pairs of vertices. For a pair of vertices to belong to the same cluster, they should be highly connected, whereas there should not be many paths connecting them to vertices outside the cluster. One may for example split the graph in two parts such that there are as few edges as possible between the parts, considering a remaining part of h vertices a proper cluster if more than h/2 edges would be needed to split it further [HS00].
52 ANOTHER CONNECTIVITY APPROACH Many clustering approaches assign weights to the edges of the graph and prune the edges with respect to the weights until the graph has decomposed into satisfactory clusters (as in the MST example). One such weight function is edge-betweenness, which for an edge e is the number of shortest paths that contain e [NG03].
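For an unweighted graph, edge-betweenness can be computed by brute force from per-source BFS distances and shortest-path counts: the shortest s-t paths crossing the directed edge (u, v) number sigma(s, u) * sigma(v, t) whenever dist(s, u) + 1 + dist(v, t) = dist(s, t). The sketch below counts ordered pairs s ≠ t and is quadratic per edge; Brandes' algorithm would be the efficient choice:

```python
from collections import deque

def edge_betweenness(n, edges):
    """Number of shortest paths (over ordered pairs s != t) through each edge."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def bfs(s):
        # dist[x] = length of a shortest s-x path, sigma[x] = count of such paths
        dist, sigma = [None] * n, [0] * n
        dist[s], sigma[s] = 0, 1
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if dist[y] is None:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    sigma[y] += sigma[x]
        return dist, sigma

    D, S = zip(*(bfs(s) for s in range(n)))
    betweenness = {}
    for u, v in edges:
        total = 0
        for s in range(n):
            for t in range(n):
                if s == t or D[s][t] is None:
                    continue
                for a, b in ((u, v), (v, u)):   # the edge can be crossed either way
                    if (D[s][a] is not None and D[b][t] is not None
                            and D[s][a] + 1 + D[b][t] == D[s][t]):
                        total += S[s][a] * S[b][t]
        betweenness[(u, v)] = total
    return betweenness
```

Repeatedly removing the highest-betweenness edge and recomputing is the pruning strategy of the Girvan-Newman approach cited above.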
53 LOCAL ONLINE CLUSTERING Finding the cluster for only one vertex instead of all vertices, using only local information. We formulate a fitness function similar to relative density and optimize by simulated annealing [KCDGV83]. Useful for massive and partially unknown graphs such as the World Wide Web.
54 OTHER APPROACHES
- artificial neural networks
- evolutionary algorithms [JMF99]
55 In summary, clustering is an interesting, useful, and challenging problem. It has great potential in applications like object recognition, image segmentation, and information filtering and retrieval. However, it is possible to exploit this potential only after making several design choices carefully. [JMF99]
56 References
[BYCHN03] Ricardo Baeza-Yates, E. Chávez, N. Herrera, and Gonzalo Navarro. Clustering in metric spaces with applications in information retrieval. In Wu and Xiong, editors, Information Retrieval and Clustering. Kluwer Academic Publishers. To appear.
[HS00] E. Hartuv and R. Shamir. A clustering algorithm based on graph connectivity. Information Processing Letters, 76(4-6):175-181, 2000.
[JMF99] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, September 1999.
[KCDGV83] Scott Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.
[MGSZ02] Milena Mihail, Christos Gkantsidis, Amin Saberi, and Ellen Zegura. On the semantics of Internet topologies. Technical Report GIT-CC-02-07, College of Computing, Georgia Institute of Technology, January 2002.
[NG03] Mark E. J. Newman and Michelle Girvan. Mixing patterns and community structure in networks. In Proceedings of the XVIII Sitges Conference on Statistical Mechanics, Berlin, Germany, 2003. Springer-Verlag.
More information[8] that this cannot happen on the projective plane (cf. also [2]) and the results of Robertson, Seymour, and Thomas [5] on linkless embeddings of gra
Apex graphs with embeddings of face-width three Bojan Mohar Department of Mathematics University of Ljubljana Jadranska 19, 61111 Ljubljana Slovenia bojan.mohar@uni-lj.si Abstract Aa apex graph is a graph
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationNearest Neighbor Predictors
Nearest Neighbor Predictors September 2, 2018 Perhaps the simplest machine learning prediction method, from a conceptual point of view, and perhaps also the most unusual, is the nearest-neighbor method,
More informationChapter 8. Voronoi Diagrams. 8.1 Post Oce Problem
Chapter 8 Voronoi Diagrams 8.1 Post Oce Problem Suppose there are n post oces p 1,... p n in a city. Someone who is located at a position q within the city would like to know which post oce is closest
More informationSegmentation Computer Vision Spring 2018, Lecture 27
Segmentation http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 218, Lecture 27 Course announcements Homework 7 is due on Sunday 6 th. - Any questions about homework 7? - How many of you have
More informationA Generalized Method to Solve Text-Based CAPTCHAs
A Generalized Method to Solve Text-Based CAPTCHAs Jason Ma, Bilal Badaoui, Emile Chamoun December 11, 2009 1 Abstract We present work in progress on the automated solving of text-based CAPTCHAs. Our method
More informationRedefining and Enhancing K-means Algorithm
Redefining and Enhancing K-means Algorithm Nimrat Kaur Sidhu 1, Rajneet kaur 2 Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India 1 Assistant Professor,
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationCHAPTER-6 WEB USAGE MINING USING CLUSTERING
CHAPTER-6 WEB USAGE MINING USING CLUSTERING 6.1 Related work in Clustering Technique 6.2 Quantifiable Analysis of Distance Measurement Techniques 6.3 Approaches to Formation of Clusters 6.4 Conclusion
More informationVoronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013
Voronoi Region K-means method for Signal Compression: Vector Quantization Blocks of signals: A sequence of audio. A block of image pixels. Formally: vector example: (0.2, 0.3, 0.5, 0.1) A vector quantizer
More informationLecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic
SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association
More informationUse of Multi-category Proximal SVM for Data Set Reduction
Use of Multi-category Proximal SVM for Data Set Reduction S.V.N Vishwanathan and M Narasimha Murty Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India Abstract.
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationClustering Using Graph Connectivity
Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the
More informationPlanar Graphs. 1 Graphs and maps. 1.1 Planarity and duality
Planar Graphs In the first half of this book, we consider mostly planar graphs and their geometric representations, mostly in the plane. We start with a survey of basic results on planar graphs. This chapter
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationUniformity and Homogeneity Based Hierachical Clustering
Uniformity and Homogeneity Based Hierachical Clustering Peter Bajcsy and Narendra Ahuja Becman Institute University of Illinois at Urbana-Champaign 45 N. Mathews Ave., Urbana, IL 181 E-mail: peter@stereo.ai.uiuc.edu
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More informationLoad Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations
Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations Stefan Bischof, Ralf Ebner, and Thomas Erlebach Institut für Informatik Technische Universität München D-80290
More informationLecture-17: Clustering with K-Means (Contd: DT + Random Forest)
Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Medha Vidyotma April 24, 2018 1 Contd. Random Forest For Example, if there are 50 scholars who take the measurement of the length of the
More informationStructural and Syntactic Pattern Recognition
Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationClustering in Data Mining
Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,
More informationCollaborative Rough Clustering
Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationMidterm Examination CS540-2: Introduction to Artificial Intelligence
Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search
More informationIn what follows, we will focus on Voronoi diagrams in Euclidean space. Later, we will generalize to other distance spaces.
Voronoi Diagrams 4 A city builds a set of post offices, and now needs to determine which houses will be served by which office. It would be wasteful for a postman to go out of their way to make a delivery
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More informationStatistical Methods and Optimization in Data Mining
Statistical Methods and Optimization in Data Mining Eloísa Macedo 1, Adelaide Freitas 2 1 University of Aveiro, Aveiro, Portugal; macedo@ua.pt 2 University of Aveiro, Aveiro, Portugal; adelaide@ua.pt The
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More information