Visual Data Mining. Overview. Apr. 24, 2007 Visual Analytics presentation Julia Nam

Similar documents
As is typical of decision tree methods, PaintingClass starts by accepting a set of training data. The attribute values and class of each object in the

StarClass: Interactive Visual Classification Using Star Coordinates

PolyCluster: An Interactive Visualization Approach to Construct Classification Rules

Towards an Effective Cooperation of the User and the Computer for Classification

RINGS : A Technique for Visualizing Large Hierarchies

DBSCAN. Presented by: Garrett Poppe

Interactive Machine Learning Letting Users Build Classifiers

CSE 5243 INTRO. TO DATA MINING

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

Analysis and Extensions of Popular Clustering Algorithms

Hierarchical Document Clustering

Data Sets. of Large. Visual Exploration. Daniel A. Keim

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

DATA WAREHOUING UNIT I

Interactive Visual Decision Tree for Developing Detection Rules of Attacks on Web Applications

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

DS504/CS586: Big Data Analytics Big Data Clustering II

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

DS504/CS586: Big Data Analytics Big Data Clustering II

Clustering. Supervised vs. Unsupervised Learning

CS570: Introduction to Data Mining

Multidimensional Visualization and Clustering

DATA MINING AND WAREHOUSING

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE

Visual Computing. Lecture 2 Visualization, Data, and Process

9. Conclusions. 9.1 Definition KDD

Chapter 1, Introduction

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

SCHEME OF COURSE WORK. Data Warehousing and Data mining

A Parallel Community Detection Algorithm for Big Social Networks

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS

AutoEpsDBSCAN : DBSCAN with Eps Automatic for Large Dataset

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

Faster Clustering with DBSCAN

Performance Analysis of Video Data Image using Clustering Technique

Multiple Dimensional Visualization

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Supervised vs. Unsupervised Learning

Cse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University

Supervised Clustering of Yeast Gene Expression Data

Introduction to Data Mining and Data Analytics

Clustering Part 4 DBSCAN

Image Mining: frameworks and techniques

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

C-NBC: Neighborhood-Based Clustering with Constraints

Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems

Efficient and Effective Clustering Methods for Spatial Data Mining. Raymond T. Ng, Jiawei Han

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Using R-trees for Interactive Visualization of Large Multidimensional Datasets

Data warehouses Decision support The multidimensional model OLAP queries

Understanding Clustering Supervising the unsupervised

Unsupervised Learning

8. Automatic Content Analysis

Courtesy of Prof. Shixia University

Research on Data Mining Technology Based on Business Intelligence. Yang WANG

WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM

Clustering to Reduce Spatial Data Set Size

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Automatic Domain Partitioning for Multi-Domain Learning

Visual Analysis of Lagrangian Particle Data from Combustion Simulations

Scalable Varied Density Clustering Algorithm for Large Datasets

Information Visualization. Jing Yang Spring Multi-dimensional Visualization (1)

MIS2502: Data Analytics Clustering and Segmentation. Jing Gong

Lecture 11: Classification

Comparative Study of Subspace Clustering Algorithms

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Overview: GeoViz Toolkit Interface:

Parallel Coordinates ++

The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti

Data Preprocessing. Slides by: Shree Jaswal

Performance Analysis of K-Mean Clustering on Normalized and Un-Normalized Information in Data Mining

Improved K-Means Algorithm Based on COATES Approach. Yeola, Maharashtra, India. Yeola, Maharashtra, India.

Hierarchical Density-Based Clustering for Multi-Represented Objects

Decision Trees Oct

Object-Based Classification & ecognition. Zutao Ouyang 11/17/2015

Implementation of Data Clustering With Meta Information Using Improved K-Means Algorithm Based On COATES Approach

Exploratory Data Analysis EDA

by Prentice Hall

An Autoassociator for Automatic Texture Feature Extraction

Classification with Decision Tree Induction

HIERARCHICAL MULTILABEL CLASSIFICATION

Estimating Data Center Thermal Correlation Indices from Historical Data

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Credit card Fraud Detection using Predictive Modeling: a Review

Basic Data Mining Technique

Statistical Pattern Recognition

Anomaly Detection on Data Streams with High Dimensional Data Environment

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

Transcription:

Overview Visual Data Mining Apr. 24, 2007 Visual Analytics presentation Julia Nam Visual Classification: An Interactive Approach to Decision Tree Construction M. Ankerst, C. Elsen, M. Ester, H. Kriegel, University of Munich StarClass: Interactive Visual Classification Using Star Coordinates Soon Tee Teoh, Kwan-Liu Ma, University of California, Davis PaintClass: Interactive Construction, Visualization and Exploration of Decision Trees Soon Tee Teoh, Kwan-Liu Ma, University of California, Davis Decision Trees Decision Trees

Decision Tree Classifier Most classification systems are designed for minimal user intervention. Using visualization, they allows to integrate the domain knowledge of an expert in the tree construction phase. Also, they allow to explore the decision tree interactively. Visual Classification: An Interactive Approach to Decision Tree Construction (KDD 99) M. Ankerst, C. Elsen, M. Ester, H. Kriegel, University of Munich PBC (Perception Based Classification) Circle Segments They represent all values of one attribute in a segment of a circle with the proposed arrangement inside a segment. The color of pixel is determined by the class label of the object.

Interactive Classification PBC (Perception Based Classification) Experimental Evaluation Experimental Evaluation

Contribution Introduce a fully interactive method for decision tree construction based on a multidimensional visualization. Domain knowledge of an expert can be profitably included in the tree construction phase. After that, the user has a much deeper understanding of the data than just knowing the decision tree generated by an arbitrary algorithm. StarClass & PaintingClass: Interactive Construction, Visualization and Exploration of Decision Trees (Conf. On Data Mining SDM 03 & SIGKDD 03) Soon Tee Teoh, Kwan-Liu Ma University of California, Davis PaintingClass Visual Projections in PaintingClass Start with a set of training data. Every object in the set is projected and displayed visually into 2-D space. The user creates a projection that best separates the data objects belonging in different classes. Then it is partitioned by the user into regions. Repeat projection and partitioning until the user has a satisfactory decision tree. The class is leaf nodes of the tree. Star Coordinates Good at showing dimensions with numerical attributes StarClass: Interactive Visual Classification Using Start Coordinates Parallel Coordinates Good at showing dimensions with categorical attributes

Star Coordinates User can select any projection. The color of the data object is the assigned class. Star Coordinates Painting Regions Can scale or rotate an axis by moving an axis. Can generate different axis mapping. Can be zoomed by selecting region.

Building the decision Tree Parallel Coordinates To display the categorical dimensions, Painting Regions Decision Tree Visualization & Exploration Focus + Context

Decision Tree Visualization & Exploration The Objectives of PaintingClass To create a user-directed decision tree classification system To enable users to explore and visualize multidimensional data and their corresponding decision trees to improve user understanding and to gain knowledge. First Goal : Good enough Accuracy Second Goal : Knowledge Gained User can see the hierarchy of split points. They reveal which dimensions, combinations of dimensions and values are most correlated to different classes. The shape of the decision tree indicates certain characteristics of the data. Each node in the tree is itself a visual projection of a subset of the data. It can reveal patterns, clusters, shapes and outliers. Hierarchy allows the user to focus on subsets of the data.

Conclusion The user interactively edits projections of multidimensional data and paints regions to build a decision tree. Exploration of decision trees fulfills some general goals of data mining beyond classification. PaintingClass also provides several useful extensions to StarClass, the most important of which is the use of Parallel Coordinates to display categorical values so that even datasets with categorical dimensions can be classified and visualized.