Visualization with Data Clustering DIVA Seminar Winter 2006 University of Fribourg


Terreaux Patrick, February 15, 2007

Abstract

In this paper, several visualizations that use data clustering are presented. The first part explains the various kinds of clustering methods used to organize data; the two most widely used are hierarchical clustering and locality-based clustering. The following parts present six visualizations, among them Hierarchical Parallel Coordinates, Dendrogram, H-BLOB, and Mountains and hills.

Keywords: clustering, visualization, hierarchical, partitioning, locality-based, tree, parallel coordinates, mountains and hills

1 Introduction

The arrival of computers facilitated the acquisition and processing of data. After several years of use, and with the help of the Internet, a great deal of data has been collected. Analysing these data can become very difficult because of their quantity.

The analysis must be done with an interactive visualization to gain a better understanding. Such a visualization is limited to drawing a simple shape for each object. With a large amount of data, it is not possible today to draw all the data at the same time. The solution is to introduce levels of detail: similar objects are grouped together, and the level of detail indirectly defines the number of groups. This is where data clustering comes in. The first step is to compute the clustering of the data. The second is to assign a region of space to each cluster.

2 Data clustering methods

Data clustering is the method used to analyse large quantities of data. The characteristics of each object can be represented by a vector. The main principle is to group nearby objects together, based on the Euclidean distance or another distance between objects. There are three categories of clustering: partitioning, hierarchical and locality-based. The following parts explain these three categories. The first four visualizations use hierarchical clustering, and the following two use locality-based clustering.

2.1 Partitioning

The goal of partitioning is to subdivide a set of data objects into a certain number of clusters. The two best-known methods are C-Means and K-Means.

C-Means

The principle of c-means is to create several clusters, each containing objects that are similar within a threshold value δ. The resulting number of clusters cannot be known in advance. Take cereals as an example: today there is a very large variety of cereals with various characteristics, adapted for children, for diets, and so on.

Figure 1: a) The construction of clusters. Object x is in 2 clusters and this is not possible. b) Final result

The disadvantage of this method is that the user must define the threshold value δ.

K-Means

The principle of k-means is to create k clusters containing all objects. The similarity between objects within the same cluster must be maximal, and the similarity to objects in other clusters must be minimal.
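The k-means principle can be illustrated with a minimal sketch. It assumes Euclidean distance, the usual alternation of an assignment step and a mean-update step, and a simple farthest-point initialisation; the initialisation, the function name k_means and the sample points are choices made for this illustration and do not come from the paper.

```python
import math


def k_means(points, k, iterations=20):
    """Group `points` (tuples of floats) into k clusters."""
    # Assumed initialisation (not from the paper): start with the first point,
    # then repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each object joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


# Hypothetical 2D objects forming three visibly separate groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
labels, centers = k_means(points, k=3)
```

Changing k means running the whole procedure again, which is the static behaviour of partitioning methods that hierarchical clustering avoids.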

Figure 2: a) The same scene as in Figure 1. b) K = 3

The big disadvantage of partitioning methods is that the result is static, so the level of detail cannot be changed. If we want to change the threshold δ or the value of k, everything must be recalculated, which is not feasible for a large quantity of data.

2.2 Hierarchical

The principle is to group similar objects together at each step. Every cluster receives a weight corresponding to the number of objects it contains. The result is a tree.

Figure 3: a) Arrangement of 8 objects. b) Hierarchical representation (tree)

The big advantage of this data structure is the ease of navigating the tree. This is why it is the most widely used.

2.3 Locality-based

This method groups neighboring data elements into clusters based on local conditions, such as attraction/repulsion. More explanation is given with Figure 11.

3 Hierarchical Parallel Coordinates

3.1 About

Usually, the complexity of the data isn't represented in the view. Parallel coordinates show each characteristic on its own axis. However, with a large quantity of data, the axes can become illegible, and the only thing that remains readable about the objects is the range of values.

Figure 4

The goal of the visualization developed at Worcester Polytechnic Institute [2] is to extend parallel coordinates to support large data sets.

3.2 Interaction

Each node in the hierarchical cluster tree stores the number of enclosed objects, the mean of the enclosed objects, and the minimum and maximum bounds. A node is drawn as a line through its mean values, surrounded by a variable-width opacity band that marks the bounds, and different nodes are drawn in different colors. At the lowest level of the tree, there are as many lines as objects. At the top level, there is only one line, the mean, with its opacity band.

3.3 Visualization

Figure 5: Variation of the level of detail
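The per-node summary described under 3.2 Interaction (number of objects, mean, and minimum and maximum bounds) can be maintained while the tree is built. The following is only a sketch: the names ClusterNode, leaf and merge and the sample values are hypothetical and are not taken from the system in [2].

```python
from dataclasses import dataclass


@dataclass
class ClusterNode:
    count: int    # number of enclosed objects
    mean: tuple   # per-axis mean of the enclosed objects
    lower: tuple  # per-axis minimum (lower edge of the band)
    upper: tuple  # per-axis maximum (upper edge of the band)


def leaf(obj):
    """A single object: its mean and both bounds are the object itself."""
    values = tuple(obj)
    return ClusterNode(1, values, values, values)


def merge(a, b):
    """Combine two nodes into their parent without revisiting the objects."""
    n = a.count + b.count
    mean = tuple((a.count * x + b.count * y) / n for x, y in zip(a.mean, b.mean))
    lower = tuple(min(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(max(x, y) for x, y in zip(a.upper, b.upper))
    return ClusterNode(n, mean, lower, upper)


# Three hypothetical objects with two characteristics each.
root = merge(merge(leaf((1.0, 10.0)), leaf((3.0, 20.0))), leaf((8.0, 60.0)))
```

Because a parent is computed from its children alone, each level of detail can be drawn from the stored summaries: one mean line per node, with a band running from lower to upper on every axis.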

3.4 Personal opinion

Hierarchical clustering provides a multiresolution view of the data and aids in revealing data trends at different degrees of summarization. So, when parallel coordinates are the appropriate view for an application, the quantity of data is no longer a problem.

4 Dendrogram

4.1 About

A dendrogram is a binary tree diagram used to illustrate the arrangement of the clusters produced by a clustering algorithm. Developed at the University of Maryland [3], this view shows the entire hierarchy. Clustering is performed both on objects and on characteristics.

4.2 Interaction

Among many functionalities such as pattern recognition, the main interaction is choosing the level of detail with a horizontal bar that defines the threshold value. By moving the bar down, the desired number of clusters is chosen. After a specific cluster is selected, another hierarchical view is shown in which the clustering is based on characteristics.

4.3 Visualization

Figure 6: a) Top level, 1 cluster. b) 5 clusters and, below, the clustering of characteristics.

4.4 Personal opinion

This view represents the real structure of hierarchical clustering, a tree. The interaction is very intuitive and easy to understand.

5 Hierarchical tree-map

5.1 About

Developed at Bell Laboratories [6], this visualization draws all objects in a 2D space. All data is organized by a hierarchical clustering. The visual method used, a space-filling recursive division of a rectangular area, always keeps items at the same screen position.

5.2 Interaction

Let us take this example:

Figure 7: The number under each node indicates the dissimilarity value

The tree-map has no concept of tree dissimilarity. It assumes that the user wants to cut the tree at given depths. Its alternating horizontal and vertical splits are good at showing the tree depth, but tend to create very long, skinny rectangles with unbalanced trees. The view focuses attention on the tree depth, not on clusters.

The solution is to define several dissimilarity values:

Figure 8: Tree-map (left) compared with the Hierarchical Clustering View (right). The tree-map is cut at depths 2 and 3. The clustering view is cut at dissimilarities 5.0 and 3.5
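Cutting the cluster tree at a dissimilarity value instead of at a depth can be sketched as follows. The Node class, the tree and its dissimilarity values are hypothetical and only illustrate the idea; they are not the data of Figure 7. Only the two thresholds, 5.0 and 3.5, are taken from the caption of Figure 8.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    dissimilarity: float = 0.0  # dissimilarity at which the children merge
    children: list = field(default_factory=list)
    label: str = ""             # set on leaves only


def leaves(node):
    """Labels of all objects below a node."""
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaves(child)]


def cut(node, threshold):
    """Clusters obtained by cutting the tree at a dissimilarity threshold."""
    if not node.children or node.dissimilarity <= threshold:
        return [leaves(node)]  # this whole subtree is one cluster
    return [cluster for child in node.children for cluster in cut(child, threshold)]


# Hypothetical unbalanced tree over five objects A..E.
bc = Node(1.0, [Node(label="B"), Node(label="C")])
abc = Node(4.0, [Node(label="A"), bc])
de = Node(2.0, [Node(label="D"), Node(label="E")])
root = Node(6.0, [abc, de])

coarse = cut(root, 5.0)  # [['A', 'B', 'C'], ['D', 'E']]
fine = cut(root, 3.5)    # [['A'], ['B', 'C'], ['D', 'E']]
```

In this example a cut at depth 2 would separate A from the pair B, C in the same way as the cut at 3.5, but it would also split D from E, although they merge at a dissimilarity of only 2.0. The threshold cut keeps them together.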

5.3 Visualization

Figure 9: Various dissimilarity values.

5.4 Personal opinion

This is a good adaptation of the tree-map. The focus is now on clusters and not on the depth of the tree. Hierarchical clustering makes it possible to interact with the level of dissimilarity.

6 H-BLOB

6.1 About

Meaning Hierarchical Binary Large Object, this visualization was created at ETH Zurich [1]. Based on hierarchical and locality-based clustering, it is a 3D visualization in which each object is represented by an ellipse.

6.2 Interaction

Because the clustering is locality-based, each object always stays at the same screen position. Because the clustering is hierarchical, we can choose the level of detail. At the lowest level of detail, one circle contains all objects. At the highest, each object is represented by its own circle. As nearby objects are grouped, the circles grow and shapes like oil on a frying pan appear.

6.3 Visualization

Figure 10: a) 1 cluster. b) 5 clusters. c) 10 clusters. d) 20 clusters.

6.4 Personal opinion

This is a great visualization. The color and the thickness are very helpful for understanding the different clusters. I see just one disadvantage: a cluster can be hidden if it lies in the middle of another.

7 Mountains and hills

7.1 About

The principle of the visualization developed at Sandia National Laboratories [4], with its software VxInsight, is to represent data as mountains and valleys. Objects are stored in a database and are first laid out on a 2D plane. Each object increases the height of its region, so the height of a mountain depends on the number of data objects beneath it. A graph is used in which each node is an object and each link carries a similarity value. Objects with no similarity have a link of 0.0, and two identical objects have a link of 1.0. The objects are then placed on the 2D plane using the principle of attraction/repulsion.

Figure 11: Data processing
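The idea that each object raises the height of its region can be illustrated with a simple grid count. This is only a sketch under stated assumptions: the objects are already placed on the 2D plane with non-negative coordinates, the plane is divided into square cells, and the height of a cell is the number of objects inside it. The function name height_map and the sample positions are hypothetical, and nothing here is claimed about how VxInsight computes its terrain.

```python
def height_map(positions, cell_size, width, height):
    """Count the objects in each grid cell; the count acts as terrain height."""
    cols = max(1, int(width / cell_size))
    rows = max(1, int(height / cell_size))
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Clamp so that points on the far border fall into the last cell.
        col = min(int(x / cell_size), cols - 1)
        row = min(int(y / cell_size), rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical layout: three similar objects close together, one far away.
positions = [(0.5, 0.5), (0.7, 0.6), (0.6, 0.9), (3.5, 3.2)]

one_mountain = height_map(positions, cell_size=4.0, width=4.0, height=4.0)
detailed = height_map(positions, cell_size=1.0, width=4.0, height=4.0)
```

With a single large cell, all objects pile up into one mountain of height 4. With smaller cells, the three nearby objects form a mountain of height 3 and the isolated object forms a small hill of its own.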

7.2 Interaction

The interaction is done by zooming in and out. At the lowest level of detail, there is only one mountain. At the highest, each object forms its own mountain.

7.3 Visualization

Figure 12: Global view of mountains

Figure 13: Global view with mountains and objects

7.4 Personal opinion

This is the first visualization in this paper that doesn't use hierarchical clustering. After trying the VxInsight software, I found the interaction very user-friendly.

8 HD-Eye

8.1 About

This last visualization is close to mountains and hills. It was developed at the University of Halle in Germany [5].

8.2 Interaction

The interaction is done by choosing the threshold for the most prominent clusters.

8.3 Visualization

Figure 14: Global view of mountains with different threshold values

8.4 Personal opinion

This last visualization presents another principle of interaction with mountains. Apart from that, it is similar.

9 Conclusion

In this paper we have seen that the clustering method used is specific to each visualization. Some, like hierarchical parallel coordinates, are adapted for large data sets. Others, like mountains and hills, are created specifically for large data sets. The choice of a visualization depends on the kind of data you have to analyze. With the power of future computers, new visualizations are going to appear.

References

[1] T. C. Sprenger, R. Brunella, M. H. Gross. H-BLOB: A Hierarchical Visual Clustering Method Using Implicit Surfaces. Proceedings of the 11th IEEE Visualization 2000 Conference (VIS 2000).
[2] Ying-Huey Fua, Matthew O. Ward, Elke A. Rundensteiner. Hierarchical parallel coordinates for exploration of large datasets. Proceedings of the conference on Visualization 99.
[3] Jinwook Seo, Ben Shneiderman. Interactively Exploring Hierarchical Clustering Results. Computer, vol. 35, no. 7, July.
[4] George S. Davidson, Brian N. Wylie, Kevin W. Boyack. Cluster Stability and the Use of Noise in Interpretation of Clustering. infovis, p. 23.
[5] Alexander Hinneburg, Daniel A. Keim, Markus Wawryniuk. HD-Eye: Visual Mining of High-Dimensional Data. IEEE Computer Graphics and Applications, vol. 19, no. 5, September/October.
[6] G. J. Wills. An Interactive View for Hierarchical Clustering. infovis, p. 26.


More information

On Multi-Stack Boundary Labeling Problems

On Multi-Stack Boundary Labeling Problems On Multi-Stack Boundary Labeling Problems MICHAEL A. BEKOS 1, MICHAEL KAUFMANN 2, KATERINA POTIKA 1, ANTONIOS SYMVONIS 1 1 National Technical University of Athens School of Applied Mathematical & Physical

More information

6th Grade Math. Parent Handbook

6th Grade Math. Parent Handbook 6th Grade Math Benchmark 3 Parent Handbook This handbook will help your child review material learned this quarter, and will help them prepare for their third Benchmark Test. Please allow your child to

More information

Hidden surface removal. Computer Graphics

Hidden surface removal. Computer Graphics Lecture Hidden Surface Removal and Rasterization Taku Komura Hidden surface removal Drawing polygonal faces on screen consumes CPU cycles Illumination We cannot see every surface in scene We don t want

More information

Advanced data visualization (charts, graphs, dashboards, fever charts, heat maps, etc.)

Advanced data visualization (charts, graphs, dashboards, fever charts, heat maps, etc.) Advanced data visualization (charts, graphs, dashboards, fever charts, heat maps, etc.) It is a graphical representation of numerical data. The right data visualization tool can present a complex data

More information

The difficult task of sifting through the

The difficult task of sifting through the USING PROJECTIONS TO VISUALLY CLUSTER HIGH-DIMENSIONAL DATA The High-Dimensional Eye system proves that a tight integration of advanced clustering algorithms and state-of-the-art visualization techniques

More information

An Introduction to Data Analysis, Statistics, and Graphing

An Introduction to Data Analysis, Statistics, and Graphing An Introduction to Data Analysis, Statistics, and Graphing What is a Graph? Present processes, relationships, and changes in a visual format that is easily understandable Attempts to engage viewers by

More information

Ray Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner)

Ray Tracing. Cornell CS4620/5620 Fall 2012 Lecture Kavita Bala 1 (with previous instructors James/Marschner) CS4620/5620: Lecture 37 Ray Tracing 1 Announcements Review session Tuesday 7-9, Phillips 101 Posted notes on slerp and perspective-correct texturing Prelim on Thu in B17 at 7:30pm 2 Basic ray tracing Basic

More information

Using R-trees for Interactive Visualization of Large Multidimensional Datasets

Using R-trees for Interactive Visualization of Large Multidimensional Datasets Using R-trees for Interactive Visualization of Large Multidimensional Datasets Alfredo Giménez, René Rosenbaum, Mario Hlawitschka, and Bernd Hamann Institute for Data Analysis and Visualization (IDAV),

More information

Aim: How do we find the volume of a figure with a given base? Get Ready: The region R is bounded by the curves. y = x 2 + 1

Aim: How do we find the volume of a figure with a given base? Get Ready: The region R is bounded by the curves. y = x 2 + 1 Get Ready: The region R is bounded by the curves y = x 2 + 1 y = x + 3. a. Find the area of region R. b. The region R is revolved around the horizontal line y = 1. Find the volume of the solid formed.

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Description Produces a set of nested clusters organized as a hierarchical tree. Can be visualized

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Image Segmentation for Image Object Extraction

Image Segmentation for Image Object Extraction Image Segmentation for Image Object Extraction Rohit Kamble, Keshav Kaul # Computer Department, Vishwakarma Institute of Information Technology, Pune kamble.rohit@hotmail.com, kaul.keshav@gmail.com ABSTRACT

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

FUNCTIONS AND MODELS

FUNCTIONS AND MODELS 1 FUNCTIONS AND MODELS FUNCTIONS AND MODELS In this section, we assume that you have access to a graphing calculator or a computer with graphing software. FUNCTIONS AND MODELS 1.4 Graphing Calculators

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

CS Information Visualization Sep. 19, 2016 John Stasko

CS Information Visualization Sep. 19, 2016 John Stasko Multivariate Visual Representations 2 CS 7450 - Information Visualization Sep. 19, 2016 John Stasko Learning Objectives Explain the concept of dense pixel/small glyph visualization techniques Describe

More information

You will begin by exploring the locations of the long term care facilities in Massachusetts using descriptive statistics.

You will begin by exploring the locations of the long term care facilities in Massachusetts using descriptive statistics. Getting Started 1. Create a folder on the desktop and call it your last name. 2. Copy and paste the data you will need to your folder from the folder specified by the instructor. Exercise 1: Explore the

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Rebecca C. Steorts, Duke University STA 325, Chapter 10 ISL 1 / 63 Agenda K-means versus Hierarchical clustering Agglomerative vs divisive clustering Dendogram (tree) Hierarchical

More information

Algorithms for GIS:! Quadtrees

Algorithms for GIS:! Quadtrees Algorithms for GIS: Quadtrees Quadtree A data structure that corresponds to a hierarchical subdivision of the plane Start with a square (containing inside input data) Divide into 4 equal squares (quadrants)

More information

Computer Graphics 7: Viewing in 3-D

Computer Graphics 7: Viewing in 3-D Computer Graphics 7: Viewing in 3-D In today s lecture we are going to have a look at: Transformations in 3-D How do transformations in 3-D work? Contents 3-D homogeneous coordinates and matrix based transformations

More information

Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering World Journal of Computer Application and Technology 5(2): 24-29, 2017 DOI: 10.13189/wjcat.2017.050202 http://www.hrpub.org Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

More information

Lesson 1 Parametric Modeling Fundamentals

Lesson 1 Parametric Modeling Fundamentals 1-1 Lesson 1 Parametric Modeling Fundamentals Create Simple Parametric Models. Understand the Basic Parametric Modeling Process. Create and Profile Rough Sketches. Understand the "Shape before size" approach.

More information

Hierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1

Hierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1 Hierarchy An arrangement or classification of things according to inclusiveness A natural way of abstraction, summarization, compression, and simplification for understanding Typical setting: organize

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

WHOLE NUMBER AND DECIMAL OPERATIONS

WHOLE NUMBER AND DECIMAL OPERATIONS WHOLE NUMBER AND DECIMAL OPERATIONS Whole Number Place Value : 5,854,902 = Ten thousands thousands millions Hundred thousands Ten thousands Adding & Subtracting Decimals : Line up the decimals vertically.

More information

Interacting with Layered Physical Visualizations on Tabletops

Interacting with Layered Physical Visualizations on Tabletops Interacting with Layered Physical Visualizations on Tabletops Simon Stusak University of Munich (LMU) HCI Group simon.stusak@ifi.lmu.de Abstract Physical visualizations only recently started to attract

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

Introduction to Mobile Robotics

Introduction to Mobile Robotics Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,

More information

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

More information

5 Mathematics Curriculum. Module Overview... i. Topic A: Concepts of Volume... 5.A.1

5 Mathematics Curriculum. Module Overview... i. Topic A: Concepts of Volume... 5.A.1 5 Mathematics Curriculum G R A D E Table of Contents GRADE 5 MODULE 5 Addition and Multiplication with Volume and Area GRADE 5 MODULE 5 Module Overview... i Topic A: Concepts of Volume... 5.A.1 Topic B:

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information