Dimension Induced Clustering
|
|
- Diana Norman
- 5 years ago
- Views:
Transcription
1 Dimension Induced Clustering Aris Gionis Alexander Hinneburg Spiros Papadimitriou Panayiotis Tsaparas HIIT, University of Helsinki Martin Luther University, Halle Carnegie Melon University HIIT, University of Helsinki SIGKDD 2005 Page 1 Data Objects Introduction Vector Space Motivation Mapping into high-dimensional Feature Space High-dimensional Feature Space Observation: data points cannot fill the space => Data lie on one or more low-dimensional manifolds Real data exhibit patterns and regularities SIGKDD 2005 Page 2
2 The red points form a 1d manifold in the 2d space. Example A low dimensional manifold must contain sufficient number of points that are densely packed density-based methods? SIGKDD 2005 Page 3 Dimension Induced Clustering How to separate river and lake River and lake have same density Both are spatially connected But they differ in dimensionality Density is still necessary for separating lake from surroundings SIGKDD 2005 Page 4
3 Dimension Induced Clustering Problem - Given a set of data objects with a distance function - Find dense subsets of objects with similar dimensionality SIGKDD 2005 Page 5 Indexing Other Applications efficient approximation of nearest neighbor for metric data, assumes bounded intrinsic dimensionality [Krauthgammer & Lee, ICALP 2004] Mixture Models of PCA needs average dimensionality as parameter [Aggarwal & Yu, SIGMOD 2002], [P. Agarwal et al, PODS 2004] SIGKDD 2005 Page 6
4 What is dimension? Approach using representation dimension is the number of coordinates decompose data space into set of linear sub-spaces densely filled with points Drawbacks assumes vector spaces only linear manifolds SIGKDD 2005 Page 7 What is dimension? Approach using relative distances use distances between objects only extend notion of fractal dimension Advantages complex curved manifolds applicable to metric spaces SIGKDD 2005 Page 8
5 Fractal Dimension Correlation Dimension Set of objects X, X =n Distance function Points in the ball of radius r around x Correlation Integral Correlation Dimension SIGKDD 2005 Page 9 Fractal Dimension Intuition behind definition d corr = 1 d corr = 2 SIGKDD 2005 Page 10
6 Fractal Dimension In real life, datasets are finite C () r = 1 n x X ( x, r) Calculation of correlation dimension: fit a line on the log-log plot of C(r) B n SIGKDD 2005 Page 11 Fractal Dimension 1D Data d= 2D Data SIGKDD 2005 Page 12
7 Fractal Dimension What if the data is non-homogeneous? SIGKDD 2005 Page 13 Local Fractal Dimension However, looking at A and B individually SIGKDD 2005 Page 14
8 Local Fractal Dimension Local Growth Curve Local Correlation Dimension For finite data G x () r = B ( x, r ) n d x is estimated by fitting a line 1 SIGKDD 2005 Page 15 Local Fractal Dimension Linear Growth Model of an object x i SIGKDD 2005 Page 16
9 Linear Growth Model d i : rate of growth of log G x (r) dimensionality b i : value of L x (log r) at radius 1-- density L x (log r*) : density at radius r* The model can be summarized with two values: d i and c i = L x (log r*) how do we select r*? SIGKDD 2005 Page 17 Selecting r* Idea: choose r* such that c i s and d i s are un-correlated SIGKDD 2005 Page 18
10 Local Representation Local Representation of point x i Captures the view of the world for each point SIGKDD 2005 Page 19 The fitting interval The linear growth model is defined over a subset of the neighbors of x Clipping from above Clipping from below SIGKDD 2005 Page 20
11 Algorithm SIGKDD 2005 Page 21 Experiments Detection of m-flats in high-dimensional space (a) 2d flat in 3d space Classification error = 8.1% (a) 40d flat in 50d space Classification error = 1.2% SIGKDD 2005 Page 22
12 Comparison to Optics Optics: density based hierarchical clustering SIGKDD 2005 Page 23 Low Rank Sub-Matrix Combinatorial low-rank sub-matrix in a random Matrix Apply DIC to set of columns and set of rows Final Clusters are the Cartesian product of row and column clusters Rank 2 SIGKDD 2005 Page 24
13 Experiments Gene Expression Data (gene clustering) Yeast data from George Church, Harvard SIGKDD 2005 Page 25 Experiments Gene Expression Data (gene clustering) Neither density nor dimensionality alone can detect the structure SIGKDD 2005 Page 26
14 Conclusion Find subsets with low fractal dimensionality Local Representation local fractal dimensionality local density Visualization of the cluster structure SIGKDD 2005 Page 27
Dimension Induced Clustering
Dimension Induced Clustering Aristides Gionis Basic Research Unit, HIIT University of Helsinki, Finland Spiros Papadimitriou Department of Computer Science Carnegie Mellon University, USA Alexander Hinneburg
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationInternational Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at
Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,
More informationOn Exploring Complex Relationships of Correlation Clusters
In Proc. 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), Banff, Canada, 2007 On Exploring Complex Relationships of Correlation Clusters Elke Achtert, Christian
More informationParametrizing the easiness of machine learning problems. Sanjoy Dasgupta, UC San Diego
Parametrizing the easiness of machine learning problems Sanjoy Dasgupta, UC San Diego Outline Linear separators Mixture models Nonparametric clustering Nonparametric classification and regression Nearest
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationThe Ball-Pivoting Algorithm for Surface Reconstruction
The Ball-Pivoting Algorithm for Surface Reconstruction 1. Briefly summarize the paper s contributions. Does it address a new problem? Does it present a new approach? Does it show new types of results?
More informationGene expression & Clustering (Chapter 10)
Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationNature Methods: doi: /nmeth Supplementary Figure 1
Supplementary Figure 1 Schematic representation of the Workflow window in Perseus All data matrices uploaded in the running session of Perseus and all processing steps are displayed in the order of execution.
More informationMath 734 Aug 22, Differential Geometry Fall 2002, USC
Math 734 Aug 22, 2002 1 Differential Geometry Fall 2002, USC Lecture Notes 1 1 Topological Manifolds The basic objects of study in this class are manifolds. Roughly speaking, these are objects which locally
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationLatent Variable Models for Structured Prediction and Content-Based Retrieval
Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio
More informationOUTLIER MINING IN HIGH DIMENSIONAL DATASETS
OUTLIER MINING IN HIGH DIMENSIONAL DATASETS DATA MINING DISCUSSION GROUP OUTLINE MOTIVATION OUTLIERS IN MULTIVARIATE DATA OUTLIERS IN HIGH DIMENSIONAL DATA Distribution-based Distance-based NN-based Density-based
More informationClustering Lecture 4: Density-based Methods
Clustering Lecture 4: Density-based Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced
More informationSparse Manifold Clustering and Embedding
Sparse Manifold Clustering and Embedding Ehsan Elhamifar Center for Imaging Science Johns Hopkins University ehsan@cis.jhu.edu René Vidal Center for Imaging Science Johns Hopkins University rvidal@cis.jhu.edu
More informationClustering Lecture 9: Other Topics. Jing Gao SUNY Buffalo
Clustering Lecture 9: Other Topics Jing Gao SUNY Buffalo 1 Basics Outline Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Miture model Spectral methods Advanced topics
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 15: Microarray clustering http://compbio.pbworks.com/f/wood2.gif Some slides were adapted from Dr. Shaojie Zhang (University of Central Florida) Microarray
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:
More information2. Department of Electronic Engineering and Computer Science, Case Western Reserve University
Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,
More informationOn Order-Constrained Transitive Distance
On Order-Constrained Transitive Distance Author: Zhiding Yu and B. V. K. Vijaya Kumar, et al. Dept of Electrical and Computer Engineering Carnegie Mellon University 1 Clustering Problem Important Issues:
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationLecture-17: Clustering with K-Means (Contd: DT + Random Forest)
Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Medha Vidyotma April 24, 2018 1 Contd. Random Forest For Example, if there are 50 scholars who take the measurement of the length of the
More informationPart I. Graphical exploratory data analysis. Graphical summaries of data. Graphical summaries of data
Week 3 Based in part on slides from textbook, slides of Susan Holmes Part I Graphical exploratory data analysis October 10, 2012 1 / 1 2 / 1 Graphical summaries of data Graphical summaries of data Exploratory
More informationAdvanced visualization techniques for Self-Organizing Maps with graph-based methods
Advanced visualization techniques for Self-Organizing Maps with graph-based methods Georg Pölzlbauer 1, Andreas Rauber 1, and Michael Dittenbach 2 1 Department of Software Technology Vienna University
More informationCOMP 465: Data Mining Still More on Clustering
3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following
More informationDBSCAN. Presented by: Garrett Poppe
DBSCAN Presented by: Garrett Poppe A density-based algorithm for discovering clusters in large spatial databases with noise by Martin Ester, Hans-peter Kriegel, Jörg S, Xiaowei Xu Slides adapted from resources
More informationChapter 5: Outlier Detection
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 5: Outlier Detection Lecture: Prof. Dr.
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2018
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2018 Admin Assignment 2 is due Friday. Assignment 1 grades available? Midterm rooms are now booked. October 18 th at 6:30pm (BUCH A102
More informationGraph projection techniques for Self-Organizing Maps
Graph projection techniques for Self-Organizing Maps Georg Pölzlbauer 1, Andreas Rauber 1, Michael Dittenbach 2 1- Vienna University of Technology - Department of Software Technology Favoritenstr. 9 11
More informationDensity estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate
Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,
More informationWorking with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan
Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationOPTICS-OF: Identifying Local Outliers
Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 99), Prague, September 1999. OPTICS-OF: Identifying Local Outliers Markus M. Breunig, Hans-Peter
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Cluster Analysis Reading: Chapter 10.4, 10.6, 11.1.3 Han, Chapter 8.4,8.5,9.2.2, 9.3 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber &
More informationChapter DM:II. II. Cluster Analysis
Chapter DM:II II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis DM:II-1
More informationCSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CX 4242 DVA March 6, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Analyze! Limited memory size! Data may not be fitted to the memory of your machine! Slow computation!
More informationTopological estimation using witness complexes. Vin de Silva, Stanford University
Topological estimation using witness complexes, Acknowledgements Gunnar Carlsson (Mathematics, Stanford) principal collaborator Afra Zomorodian (CS/Robotics, Stanford) persistent homology software Josh
More informationLecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic
SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationEstimation of Intrinsic Dimension via Clustering
Estimation of Intrinsic Dimension via Clustering Brian Eriksson Department of Computer Science Boston University eriksson@cs.bu.edu Mark Crovella Department of Computer Science Boston University crovella@bu.edu
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationGeneralized Transitive Distance with Minimum Spanning Random Forest
Generalized Transitive Distance with Minimum Spanning Random Forest Author: Zhiding Yu and B. V. K. Vijaya Kumar, et al. Dept of Electrical and Computer Engineering Carnegie Mellon University 1 Clustering
More informationTopology - I. Michael Shulman WOMP 2004
Topology - I Michael Shulman WOMP 2004 1 Topological Spaces There are many different ways to define a topological space; the most common one is as follows: Definition 1.1 A topological space (often just
More informationObserving Information: Applied Computational Topology.
Observing Information: Applied Computational Topology. Bangor University, and NUI Galway April 21, 2008 What is the geometric information that can be gleaned from a data cloud? Some ideas either already
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationStatistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.
Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or
More informationDistance-based Methods: Drawbacks
Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find
More informationAn Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data
An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationdoc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague
Praha & EU: Investujeme do vaší budoucnosti Evropský sociální fond course: Searching the Web and Multimedia Databases (BI-VWM) Tomáš Skopal, 2011 SS2010/11 doc. RNDr. Tomáš Skopal, Ph.D. Department of
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationInterestingness Measurements
Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationJeff Howbert Introduction to Machine Learning Winter
Collaborative Filtering Nearest es Neighbor Approach Jeff Howbert Introduction to Machine Learning Winter 2012 1 Bad news Netflix Prize data no longer available to public. Just after contest t ended d
More informationIndian Institute of Technology Kanpur. Visuomotor Learning Using Image Manifolds: ST-GK Problem
Indian Institute of Technology Kanpur Introduction to Cognitive Science SE367A Visuomotor Learning Using Image Manifolds: ST-GK Problem Author: Anurag Misra Department of Computer Science and Engineering
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1
Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group
More informationData Mining Algorithms
for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationLesson 5 overview. Concepts. Interpolators. Assessing accuracy Exercise 5
Interpolation Tools Lesson 5 overview Concepts Sampling methods Creating continuous surfaces Interpolation Density surfaces in GIS Interpolators IDW, Spline,Trend, Kriging,Natural neighbors TopoToRaster
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationExample for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)
Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows) Average clustering coefficient of a graph Overall measure
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More information1 Point Set Topology. 1.1 Topological Spaces. CS 468: Computational Topology Point Set Topology Fall 2002
Point set topology is something that every analyst should know something about, but it s easy to get carried away and do too much it s like candy! Ron Getoor (UCSD), 1997 (quoted by Jason Lee) 1 Point
More informationVisualization and text mining of patent and non-patent data
of patent and non-patent data Anton Heijs Information Solutions Delft, The Netherlands http://www.treparel.com/ ICIC conference, Nice, France, 2008 Outline Introduction Applications on patent and non-patent
More informationNonparametric Importance Sampling for Big Data
Nonparametric Importance Sampling for Big Data Abigael C. Nachtsheim Research Training Group Spring 2018 Advisor: Dr. Stufken SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES Motivation Goal: build a model
More informationHow do microarrays work
Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationCPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017
CPSC 340: Machine Learning and Data Mining Finding Similar Items Fall 2017 Assignment 1 is due tonight. Admin 1 late day to hand in Monday, 2 late days for Wednesday. Assignment 2 will be up soon. Start
More informationTowards a hybrid approach to Netflix Challenge
Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the
More informationIntroduction to Functions of Several Variables
Introduction to Functions of Several Variables Philippe B. Laval KSU Today Philippe B. Laval (KSU) Functions of Several Variables Today 1 / 20 Introduction In this section, we extend the definition of
More informationFaster Clustering with DBSCAN
Faster Clustering with DBSCAN Marzena Kryszkiewicz and Lukasz Skonieczny Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland Abstract. Grouping data
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2016
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2016 Admin Assignment 1 solutions will be posted after class. Assignment 2 is out: Due next Friday, but start early! Calculus and linear
More informationOBE: Outlier by Example
OBE: Outlier by Example Cui Zhu 1, Hiroyuki Kitagawa 2, Spiros Papadimitriou 3, and Christos Faloutsos 3 1 Graduate School of Systems and Information Engineering, University of Tsukuba 2 Institute of Information
More informationSelection of Scale-Invariant Parts for Object Class Recognition
Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract
More informationCSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationContents. 2.5 DBSCAN Pair Auto-Correlation Saving your results References... 18
Contents 1 Getting Started... 2 1.1 Launching the program:... 2 1.2 Loading and viewing your data... 2 1.3 Setting parameters... 5 1.4 Cropping points... 6 1.5 Selecting regions of interest... 7 2 Analysis...
More informationDENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE
DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE Sinu T S 1, Mr.Joseph George 1,2 Computer Science and Engineering, Adi Shankara Institute of Engineering
More informationSimilarity measures in clustering time series data. Paula Silvonen
Similarity measures in clustering time series data Paula Silvonen paula.silvonen@vtt.fi Introduction Clustering: determine the similarity or distance between profiles group the expression profiles according
More informationHeterogeneous Density Based Spatial Clustering of Application with Noise
210 Heterogeneous Density Based Spatial Clustering of Application with Noise J. Hencil Peter and A.Antonysamy, Research Scholar St. Xavier s College, Palayamkottai Tamil Nadu, India Principal St. Xavier
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationMore about liquid association
More about liquid association Liquid Association (LA) LA is a generalized notion of association for describing certain kind of ternary relationship between variables in a system. (Li 2002 PNAS) low (-)
More informationTopological Classification of Data Sets without an Explicit Metric
Topological Classification of Data Sets without an Explicit Metric Tim Harrington, Andrew Tausz and Guillaume Troianowski December 10, 2008 A contemporary problem in data analysis is understanding the
More informationMultidimensional Indexing The R Tree
Multidimensional Indexing The R Tree Module 7, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Single-Dimensional Indexes B+ trees are fundamentally single-dimensional indexes. When we create
More informationDensity-preserving projections for large-scale local anomaly detection
Knowl Inf Syst DOI 10.1007/s10115-011-0430-4 REGULAR PAPER Density-preserving projections for large-scale local anomaly detection Timothy de Vries Sanjay Chawla Michael E. Houle Received: 11 January 2011
More informationDetecting Novel Associations in Large Data Sets
Detecting Novel Associations in Large Data Sets J. Hjelmborg Department of Biostatistics 5. februar 2013 Biostat (Biostatistics) Detecting Novel Associations in Large Data Sets 5. februar 2013 1 / 22 Overview
More informationData Stream Clustering Using Micro Clusters
Data Stream Clustering Using Micro Clusters Ms. Jyoti.S.Pawar 1, Prof. N. M.Shahane. 2 1 PG student, Department of Computer Engineering K. K. W. I. E. E. R., Nashik Maharashtra, India 2 Assistant Professor
More informationThis research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight
This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight local variation of one variable with respect to another.
More informationHigh Information Rate and Efficient Color Barcode Decoding
High Information Rate and Efficient Color Barcode Decoding Homayoun Bagherinia and Roberto Manduchi University of California, Santa Cruz, Santa Cruz, CA 95064, USA {hbagheri,manduchi}@soe.ucsc.edu http://www.ucsc.edu
More information