Rough Sets in Data Mining and Spatial Computing

Similar documents
DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY

Fuzzy Entropy based feature selection for classification of hyperspectral data

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Texture Image Segmentation using FCM

A Vector Agent-Based Unsupervised Image Classification for High Spatial Resolution Satellite Imagery

(Refer Slide Time: 0:51)

Classification with Diffuse or Incomplete Information

Collaborative Rough Clustering

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE

I. INTRODUCTION. Image Acquisition. Denoising in Wavelet Domain. Enhancement. Binarization. Thinning. Feature Extraction. Matching

Hybrid Model with Super Resolution and Decision Boundary Feature Extraction and Rule based Classification of High Resolution Data

Contents. Foreword to Second Edition. Acknowledgments About the Authors

STRATIFIED SAMPLING METHOD BASED TRAINING PIXELS SELECTION FOR HYPER SPECTRAL REMOTE SENSING IMAGE CLASSIFICATION

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Table Of Contents: xix Foreword to Second Edition

NCC 2009, January 16-18, IIT Guwahati 267

INCREASING CLASSIFICATION QUALITY BY USING FUZZY LOGIC

Hybrid Approach for Classification using Support Vector Machine and Decision Tree

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

COLOR BASED REMOTE SENSING IMAGE SEGMENTATION USING FUZZY C-MEANS AND IMPROVED SOBEL EDGE DETECTION ALGORITHM

Introduction to digital image classification

Object Extraction Using Image Segmentation and Adaptive Constraint Propagation

Neural Network Approach for Automatic Landuse Classification of Satellite Images: One-Against-Rest and Multi-Class Classifiers

Mass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour Equality

Spectral Classification

A Review on Cluster Based Approach in Data Mining

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

1. AN.Valliyappan 2. Assistant Professor/ECE VCET, Madurai VCET, Madurai

Remote Sensed Image Classification based on Spatial and Spectral Features using SVM

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

Computational color Lecture 1. Ville Heikkinen

Change Detection in Remotely Sensed Images Based on Image Fusion and Fuzzy Clustering

Fuzzy C-means Clustering with Temporal-based Membership Function

Tri-modal Human Body Segmentation

American International Journal of Research in Science, Technology, Engineering & Mathematics

Outline of Presentation. Introduction to Overwatch Geospatial Software Feature Analyst and LIDAR Analyst Software

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

An Efficient Clustering for Crime Analysis

A Hybrid Recommender System for Dynamic Web Users

Granular Computing: A Paradigm in Information Processing Saroj K. Meher Center for Soft Computing Research Indian Statistical Institute, Kolkata

Face Recognition with Rough-Neural Network: A Rule Based Approach

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL

ROUGH SETS THEORY AND UNCERTAINTY INTO INFORMATION SYSTEM

Experimentation on the use of Chromaticity Features, Local Binary Pattern and Discrete Cosine Transform in Colour Texture Analysis

PERSONALIZATION OF MESSAGES

APPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING

Mining Local Association Rules from Temporal Data Set

An Efficient Analysis for High Dimensional Dataset Using K-Means Hybridization with Ant Colony Optimization Algorithm

Recognition of Changes in SAR Images Based on Gauss-Log Ratio and MRFFCM

Efficient Rule Set Generation using K-Map & Rough Set Theory (RST)

International Journal of Advanced Research in Computer Science and Software Engineering

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

A Competent Segmentation of Volcanic Eruption Flow through Volcano Using Fuzzy C-Means and Texture Segmentation Filter

Detection of Defect on Fruit using Computer Vision Technique

Random projection for non-gaussian mixture models

Data: a collection of numbers or facts that require further processing before they are meaningful

CHAPTER 5 OBJECT ORIENTED IMAGE ANALYSIS

The Role of Biomedical Dataset in Classification

Index Terms Data Mining, Classification, Rapid Miner. Fig.1. RapidMiner User Interface

Hybrid Feature Selection for Modeling Intrusion Detection Systems

On Generalizing Rough Set Theory

FINGERPRINT MATCHING BASED ON STATISTICAL TEXTURE FEATURES

Image Analysis, Classification and Change Detection in Remote Sensing

Global Journal of Engineering Science and Research Management

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Applied Statistics for Neuroscientists Part IIa: Machine Learning

Unsupervised Change Detection in Optical Satellite Images using Binary Descriptor

Analyzing Outlier Detection Techniques with Hybrid Method

Density Based Clustering using Modified PSO based Neighbor Selection

A New Online Clustering Approach for Data in Arbitrary Shaped Clusters

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

Dynamic Clustering of Data with Modified K-Means Algorithm

Chapter 1, Introduction

Data Clustering With Leaders and Subleaders Algorithm

RPKM: The Rough Possibilistic K-Modes

IMAGINE Objective. The Future of Feature Extraction, Update & Change Mapping

Region Based Image Fusion Using SVM

A Comparative Study of Conventional and Neural Network Classification of Multispectral Data

IMAGE ENHANCEMENT USING FUZZY TECHNIQUE: SURVEY AND OVERVIEW

Coefficient of Variation based Decision Tree (CvDT)

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

RSES 2.2 Rough Set Exploration System 2.2 With an application implementation

Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm

Connected Component Analysis and Change Detection for Images

IMAGE CLUSTERING AND CLASSIFICATION

Novel Intuitionistic Fuzzy C-Means Clustering for Linearly and Nonlinearly Separable Data

Remote Sensing Image Analysis via a Texture Classification Neural Network

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values

Thematic Mapping with Remote Sensing Satellite Networks

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Fuzzy Multilevel Graph Embedding for Recognition, Indexing and Retrieval of Graphic Document Images

Climate Precipitation Prediction by Neural Network

Improving Classifier Performance by Imputing Missing Values using Discretization Method

PATTERN RECOGNITION USING NEURAL NETWORKS

Modeling the Real World for Data Mining: Granular Computing Approach

A Survey on Image Segmentation Using Clustering Techniques

A Technical Insight into Clustering Algorithms & Applications

A Naïve Soft Computing based Approach for Gene Expression Data Analysis

Transcription:

Rough Sets in Data Mining and Spatial Computing Sonajhaira Minz Professor School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi 110 067 INDIA http://www.jnu.ac.in/facultystaff/showrofile.asp?sendusername=sminz

Outline PART I Rough Set Theory Applied for Data Mining Data Mining using Rough Sets PART - II Remote sensing data classification and Rough Sets Spatial Computing 2

Background Data Mining & Rough Set Theory Classification, Clustering: Rajni Jain, Girish Kumar Singh, Arun, Ashish, Puran, Yashpal, Sunil, Kalicharan, Surender, Mary, Ibrahim, Rangbahadur, Multi Agent Systems for Scheduling Problems: Ahmad Balid Privacy Preserving Data Mining: Fuad Al Yamiri Time-Series Analysis: Ibrahim Abu Ghali Multi-view Ensemble Learning: Vipin Kumar 3

PART-I Applications of Rough Set Concepts for Data Analysis Real Data Data Models Issues in dimension reduction 4

Real Data Vector/multidimensional representation Numeric values of the attributes Mostly dirty Imprecise Not essentially large 5

Data Models Patterns that describe the data Concepts, Class Descriptions, Clusters, Association Rules 6

Data Mining: Machine Learning Decision Tree Induction: Attribute Selection for test at an interior node Problems: Conceptually/quantitatively - Correlated Attributes Over fitted training samples Attribute selection error 7

Rough Sets: Real Data Experience Reduct Types Data reduction Concept description: Cluster analysis POS Discretization Granulation Tree Association Rule mining Clustering 8

Types of Reduct (Decision) Relative Reduct Core = Ri, Reduct = {Ri} i= 1,2, Variable Precision RS: Approximate Core Dynamic Reduct 9

Decision relative Reduct Reduct in comparison to simple diversity index, entropy, information gain, gain ratio, GINI index Complexity - reduct computation: O(m 2 n log n) tree induction: O(m n log n) DT induction (with tree prunning): O(m n log n) + O(n (log n) 2 ) RDT: [O(m 2 n log n) + O( R n log n) + O(n (log n) 2 ) 10

Variable Precision RS: Approximate Core R = {R 1, R 2,..., R n } Set of all reducts Core = R 1 R 2... R n A = R 1 R 2... R n A = m core core A Approximate core - (core : ') where, degree of approximation of the set core a superset of core. 11

Approximate Core Cont. Approximate core a superset of core. C 1 : (core : 1) core, say {a 1 } C 2 : (core : 0.98) {a 1, a 5 } C 3 : (core : 0.95) {a 1, a 2, a 5, a 7 } C 4 : (: (core : 0) {a i : a i A} The additional attribute added to core from A is selected from smallest reduct in R. Approximate core instead of reduct RDT core : 12

Dynamic Reduct Dynamic RDT Dynamic reduct based decision tree for handling noise. Information System T = (U, A, V, F), where A = C D C: Conditional attributes D: Decision attribute Let power set of T be P(T). A subtable of T is a member of P (T) with respect to the attribute set A. 13

Dynamic Reduct Let a subtable corresponding the attribute set P (T). DR(T, ): Dynamic Reduct of information T. RED(T, D): Decision relative reduct of T. RED (B, D): Decision relative reduct of B with respect to subatable. DR(T, ) = π P (T) RED (B, D) 14

Discretization Boolean reasoning Using positive region POS of class A supervised ( labelled data) groupingmaximizing the cardinality of POS ai (C p ) for numeric values of attribute a i, with respect to decision class C p using a rough membership function f ai,c p 15

Discretization Rough membership Function f ai,c p : V ai R arg max x I V f ai,c (x)= x card(a i,i X Cp ) ai p card(x ai,i ) Where, X a i,i = {x x U, a i (x) I, I V ai } and, a i,i X Cp = { x a i (x) I, I V ai, D(x) = c P } Also, POS a i (D) = {a ix: X [x] D } 16

Reduct: RDT for Classification Achieve Dimension Reduction using reducts 17

RDT Experiment 18

RDT Experiment 19

Reduct in other models Applied to Covering algorithm Applied to Cluster analysis 20

Reduct based Covering Algorithm 21

Reduct based Covering Algorithm 22

Reduct based Covering Algorithm 23

Reduct based Cluster Analysis Iris Data 24

Reduct based Cluster Analysis 25

Variable Precision RS: Approximate Core R = {R 1, R 2,..., R n } Set of all reducts Core = R 1 R 2... R n A = R 1 R 2... R n A = m core core A Approximate core - (core : ') where, is the degree of approximation of the set core a superset of core. 26

Approximate Core Cont. Approximate core a superset of core. C 1 : (core : 1) core, say {a 1 } C 2 : (core : 0.98) {a 1, a 5 } C 3 : (core : 0.95) {a 1, a 2, a 5, a 7 } C 4 : (core : 0) {a i : a i A} The additional attribute added to core from A is selected from smallest reduct in R. Approximate core instead of reduct RDT core : 27

Approximate Core: RDT core : 28

RDT core : 29

RDT core : 30

Dynamic Reduct Dynamic RDT Dynamic reduct based decision tree for handling noise. Information System T = (U, A, V, F), where A = C D C: Conditional attributes D: Decision attribute Let power set of T be P(T). A subtable of T is a member of P (T) with respect to the attribute set A. 31

Dynamic Reduct Let a subtable corresponding the attribute set P (T). DR(T, ): Dynamic Reduct of information T. RED(T, D): Decision relative reduct of T. RED (B, D): Decision relative reduct of B with respect to subatable. DR(T, ) = π P (T) RED (B, D) 32

Dynamic Reduct based RDT: Nutrition Dataset (Real) 33

Discretization Boolean reasoning Using positive region POS of class A supervised ( labelled data) groupingmaximizing the cardinality of POS ai (C p ) for numeric values of attribute a i, with respect to decision class C p using a rough membership function f ai,c p 34

Discretization Rough membership Function f ai,c p : V ai R arg max x I V f ai,c (x)= x card(a i,i X Cp ) ai p card(x ai,i ) Where, X a i,i = {x x U, a i (x) I, I V ai } and, a i,i X Cp = { x a i (x) I, I V ai, D(x) = c P } Also, POS a i (D) = {a ix: X [x] D } 35

Discretization using POS 36

Discretization using POS Result-1 37

Discretization using POS Result-2 38

Discretization using POS Result-3 39

Granulation Tree 40

Granulation Tree 41

Clustering using Granules Data Description 42

Clustering using Granules 43

Histon for Image processing using Rough Sets Let an image I of size M N of L intensity levels g i,j 0, L 1. The histogram indicates the number of pixels along Y-axis corresponding intensity values 0-255 along X-axis, indicate lower approximation or positive region. For segmentation an intensity to identify different concepts in a data 44

Image Processing: Histogram 45

Rough Set: Histon Color images have three histograms each for R, G and B intensity values. For i {R, G, B} and 0 g L a histogram is M N h(g) = m=1 n=1 δ(i m, n g) ; Where δ(.) is the Dirac impulse function. Histon H(g) H(g) = M m=1 N n=1 (1 + X m, n )δ(i m, n g) 1 d τ (m, n) < threshhold Where, X m, n =, d 0 otherwise τ (m, n) indicates difference in intensities of a pixel (m,n) with its neighbours 46

Part -II Spatial Data Spatial Computing 47

Spatial & Remote Sensing Data: The Team Mining Spatial & Remote Sensing Data (Big Data Analytics) Spatial Data Mining: A Machine Learning Approach, Dr. Anshu Dixit Ph.D. 2012 (IASRI) Change Detection Using Unsupervised Learning Algorithms for Delhi, India, Asian Journal of Geoinformatics, Vol 13, No. 4, 12-15, 2013, Hemant Kr. Aggarwal, M.Tech. 2013 (Ph.D. IIITD) Active Learning for Semi-supervised Classification in Hyperspectral Remote Sensing Images, Monoj Pradhan (Ph.D) Semi-Supervised Classification, Prem Shankar (Ph.D) Content-based Classification, Saroj Kumar Sahu (Ph.D) Rough Set Based Extreme Learning Machine for Hyper Spectral Data Classification, Ankit Malviya (M.Tech) Comparing Decision Tree and Markov Random Field based Classification for Spatial Data, Mahedra Gupta (M.Tech) 48

Spatial Data: Satellite Image 49

Spatial Data: Multidimensinal 50

Spatial Data: Medical Images 51

Geo-Spatial 52

Knowledge-oriented Remote Sensing Image Analysis Hemant Kumar Aggarwal, Sonajharia Minz, Change Detection Using Unsupervised Learning Algorithms for Delhi, India, Asian Journal of Geoinformatics, Vol 13, No. 4, 12-15, 2013 53

Motivation Knowledge-oriented Change Detection Measure effectiveness of Machine Learning for Change Detection Explore potential for dimensionality reduction 54

Experimental Results Original Water Vegetation Urban Features K-means Fuzzy C mean EM 55

Percentage of Pixels per Class Year Kmeans EM FCM Kmeans EM FCM Kmeans EM FCM 1998 52.8 76.6 45.8 41.929 15.81 49.23 5.265 7.58 4.97 1999 35.16 55.43 43.25 59.19 41.22 53.97 5.648 3.35 2.78 2000 37.318 56 42.89 58.773 40.34 53.6 3.909 3.65 3.51 2001 36.016 38.72 51.85 59.881 55.78 44.45 4.104 5.49 3.69 2002 34.283 64.61 63.25 62.282 32.95 33.68 3.435 2.43 3.06 2009 37.867 35.025 38.02 58.837 61.19 58.79 3.296 3.78 3.18 2010 35.717 52.87 40.39 58.438 41.57 54.3 5.848 5.56 5.3 2011 34.662 58.54 36.16 61.904 37.11 60.56 3.434 4.35 3.27 56

Total Percentage Change Class Water Built-up Vegetation Algorithm K-means -0.26 2.85-2.59 EM -0.46 3.04-2.58 FCM -0.24 1.62-1.38 57

Conclusion Partitioning based methods are more effective than probabilistic and fuzzy. Decrease in vegetated area and increase in urban area. Dimensionality Reduction by 50% Future Work Environmental Footprint and More Spatial Footprint Change Discovery 58

59

60

Rough Set based classification of Hyperspectral Data (Master s Dissertation: Ankit Malvia ) 61

Framework 62

Results: Dimension Reduction 63

S.N. Datasets Elapsed time Total Bands No. Selected bands Selected Features 1. Indian Pines 1156.42 200 4 B1, B22, B43, B84 2. Pavia University scene 72165.33 103 4 B1, B2, B87, B103 64

Spatial Computing Challenges pertaining to Data Characteristics Not iid Spatial Auto correlation (Tobler s First Law of Geography nearby objects are related to each other more than the objects at a distance) Class imbalance problem (challenge to Statistical and probabilistic methods) Classification very small labelled data 65

Spatial Computing: Patterns 66

Spatial Patterns: Hotspot 67

Spatial Computing: Patterns 68

Patterns based on 2D plane 69

Spatial Patterns: Spherical Earth 70

Spatial Computing: Transformative Technology GPS Remote Sensing GIS Spatial Database Management Systems Spatial Statistics 71

Spatial Computing: Opportunities Short Term Spatial Predictive Analysis Geocollaborative Systems Moving Spatial Computing Long Term Fusion to Synergies Sensors to Clouds Spatial Cognitive first Geoprivacy 72

Spatial Computing Shashi Shekhar: McKnight Distinguished University Professor, Department of Computer Science at the University of Minnesota, MN, USA https://vimeo.com/148128607 http://cacm.acm.org/videas/sptial-computing 73

Some Important References 1. Komorowski, J., Pawlak, Z., Polkowski, L., Skowron, A., Rough Sets: A tutorial, 1997 2. Zadeh, L.A., Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems, 90, 1997 3. Zadeh, L.A., Granular Computing and Rough Set theory, LNAI 4585, 2007 4. Grzymala, J.B., Introduction to Rough Set Theory and Application, ppt 5. Jain, Rajni, Rough Set Theory based Decision Tree Induction for Data Mining, Ph.D. Thesis, 2005 6. Singh, Girish, Rough Set Theory for Data Mining, Ph.D. Thesis, 2007 7. Jain Rajni and Sonajharia Minz. 2008. Drawing Conclusions from Forest Cover Type Data - The Hybridized Rough Set Model, Journal of the Indian Society of Agricultural Statistics, 62(1):75-84 8. Girish Kumar Singh and Sonajharia Minz. 2007. Discretization Using Clustering and Rough Set Theory. In: Proceedings of International Conference on Computing: Theory and Applications (ICCTA'07). ieeecomputersociety.org/10.1109/iccta.2007.51 9. Mushrif, M.M., Ray, A.K., A-IFS Histon Based Multithresholding Algorithm for Color Image Segmentation, IEEE Signal Processing Letter, Vol 16, No. 3, 2009 74

Jain, Rajni and Sonajharia Minz. 2007. Intelligent data analysis for identifying rich: The rough set way. In: Proceedings of 2 nd National Conference on Methods and Models in Computing (NCM2C 2007), Eds: S. Minz and D.K. Lobiyal, JNU, New Delhi, Allied Publishers Pvt. Ltd., pp. 117-130. Minz, S. and Rajni Jain. 2005. Refining decision tree classifiers using rough set tools, International Journal of Hybrid Intelligent System 2(2):133-148. Rajni Jain and Sonajharia Minz. 2005. Dynamic RDT model for data mining, Proceedings of 2nd Indian International Conference on Artificial Intelligence (IICAI-05), Pune, India. Rajni Jain and Sonajharia Minz. 2005. Dynamic RDT model for mining rules from real data, Journal of the Indian Society of Agricultural Statistics, 59(2). Sonajharia Minz, Rajni Jain. 2003. Rough set-based Decision Tree model for Classification, Proceedings of 5th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2003 Prague, Czech Republic, LNCS 2737, 172-181. Rajni Jain, Sonajharia Minz. 2003. Classifying Mushrooms in the Hybridized Rough Sets Framework, Proceedings of 1st Indian International Conference on Artifical Intelligence (IICAI- 03), Bhanu Prasad (Editor) 554-567. Sonajharia Minz, Rajni Jain. 2003. Hybridizing Rough set framework for Classification: An Experimental View, Design and Application of Hybrid Intelligent Systems A. Abraham et. al (Eds.), IOS Press, 631-640. Rajni Jain, Sonajharia Minz. 2003. Should decision trees be learned using rough sets?, Proceedings of 1st Indian International Conference on Artificial Intelligence (IICAI-03), Bhanu Prasad (Editor)1466-1479. 75

Thank YOU 76