1 Detecting Spatial Outliers: Algorithm and Application
Chang-Tien Lu
Spatial Database Lab, Department of Computer Science, University of Minnesota
Phone:
2 Outline
- Introduction
  - Motivation
  - General Definition of Spatial Outlier
  - Related Work
- Proposed Approach and Algorithm
- Evaluation of Proposed Approach
- Conclusions
3 Spatial Data Mining
- Spatial databases are too large to analyze manually
  - NASA Earth Observing System (EOS)
  - National Institute of Justice: crime mapping
  - Census Bureau, Dept. of Commerce: census data
- Spatial data mining
  - Discover frequent and interesting spatial patterns for post-processing (knowledge discovery)
  - Pattern examples: outliers, hot spots, land-use classification
4 Spatial Outlier
- Definition
  - A data point that is extreme relative to its neighbors
  - The individual attribute value is not necessarily extreme in the total population, but it is extreme in its adjacent area
- An example
  - Item: Palm Beach County; Neighbors(item) = counties in Florida
  - Source:
5 Application Domain
- I-35W northbound
- Volume: the number of vehicles passing a station within 5 minutes
- Figure: Traffic Volume (Time vs. Station); axes: I-35W station ID (northbound), time
6 Application Domain
- Minneapolis-St. Paul (Twin Cities) traffic data set
- 930 detectors (stations) installed on major highways
- Periodically measured attributes: volume, occupancy, speed
- Interesting spatial outliers: discontinuities
  - Assumes a smooth spatial attribute
7 Application Domain: Twin-Cities Traffic Data
- Figure 1: Traffic volume vs. time for Station 138 on 1/
- Figure 2: Traffic volume vs. time for Station 139 on 1/
- Figure 3: Traffic volume vs. time for Station 140 on 1/
- Axes in each figure: time, volume
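8 An Example of Spatial Outlier
- Spatial outlier: S; global outlier: G
- Figure: Original data points with fitting curve; x axis: location, y axis: attribute values; labeled points include G, S, Q, D, L
- $Z_{s(x)}$ approach
  - $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - If $Z_{s(x)} = \left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$, declare x as an outlier
- Figure: Outliers detected using the spatial statistic $Z_{s(x)}$ test; x axis: location, y axis: $Z_{s(x)}$; labeled points include S, Q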
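The $Z_{s(x)}$ test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based data layout, the chain of ten stations, and the threshold value 2.0 are all assumptions made for the example. It uses the population standard deviation, which matches the $\sigma_s$ formula given later in the model-building table.

```python
from statistics import mean, pstdev

def spatial_statistic_outliers(f, neighbors, theta=2.0):
    """Return locations x with |(S(x) - mu_s) / sigma_s| > theta.

    f:         dict mapping location -> attribute value
    neighbors: dict mapping location -> list of neighboring locations
    theta:     threshold (hypothetical default, chosen for the example)
    """
    # S(x) = f(x) minus the average attribute value over the neighbors of x
    s = {x: f[x] - mean(f[y] for y in neighbors[x]) for x in f}
    mu_s = mean(s.values())
    sigma_s = pstdev(s.values())
    if sigma_s == 0:
        return []  # every difference is identical, so nothing stands out
    return [x for x in f if abs((s[x] - mu_s) / sigma_s) > theta]

# Hypothetical chain of ten stations; each station neighbors the adjacent ones.
volumes = [10, 11, 10, 12, 11, 30, 11, 10, 12, 11]
f = dict(enumerate(volumes))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(volumes)] for i in f}

outliers = spatial_statistic_outliers(f, neighbors, theta=2.0)
print(outliers)  # only station 5 exceeds the threshold
```

In this toy data the neighbors of station 5 also have large differences S(x), but only station 5 itself exceeds the threshold.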
9 Evaluation of Statistical Assumption
- Distribution of traffic station attribute f(x) looks normal
  - Figure: Volume distribution at 10:00am on 1/15/; x axis: volume value, y axis: number of stations
- Distribution of $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$ looks normal too
  - Figure: Normalized volume difference over spatial neighborhood at 10:00am on 1/15/; x axis: normalized volume difference over spatial neighborhood, y axis: number of stations
10 General Definition of Spatial Outlier
- Given
  - A spatial framework S consisting of locations $s_1, s_2, \ldots, s_n$
  - An attribute function $f: s_i \to R$ (R: set of real numbers)
  - A neighborhood relationship $N \subseteq S \times S$
  - A neighborhood aggregation function $f^N_{aggr}: R^N \to R$
  - A difference function $F_{diff}: R \times R \to R$
  - A test procedure $T_F: R \to \{True, False\}$
- General definition: SO-outlier
  - An object $x \in S$ is an SO-outlier $(f, f^N_{aggr}, F_{diff}, T_F)$ if $T_F\{F_{diff}[f(x), f^N_{aggr}(f(x), N(x))]\}$ is true
- Constraints
  - $F_{diff}$ and $T_F$ are composed (e.g., ratio, product, linear combination) of a few statistical functions of f and $f^N_{aggr}$
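The general definition above is essentially a higher-order function: plug in an aggregation function, a difference function, and a test, and the same loop yields different outlier tests. The sketch below illustrates that reading. All names are hypothetical, and the aggregation function is simplified to take only the neighbors' attribute values.

```python
from statistics import mean

def so_outliers(locations, f, neighbors, f_aggr, f_diff, test):
    """Return the locations x for which test(f_diff(f(x), f_aggr(...))) holds.

    f:         attribute function, location -> real number
    neighbors: neighborhood relationship, location -> list of locations
    f_aggr:    aggregation over the neighbors' attribute values
    f_diff:    difference function of (own value, neighborhood aggregate)
    test:      test procedure, real number -> bool
    """
    flagged = []
    for x in locations:
        aggregate = f_aggr([f(y) for y in neighbors(x)])
        if test(f_diff(f(x), aggregate)):
            flagged.append(x)
    return flagged

# Hypothetical example: five locations on a line, one discontinuity.
values = [1.0, 1.0, 9.0, 1.0, 1.0]

def chain_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(values)]

outliers = so_outliers(
    locations=range(len(values)),
    f=lambda i: values[i],
    neighbors=chain_neighbors,
    f_aggr=mean,                      # average over the neighborhood
    f_diff=lambda own, agg: own - agg,
    test=lambda d: abs(d) > 5.0,      # fixed threshold, chosen for the example
)
print(outliers)
```

Swapping `f_diff` and `test` for a regression residual or a standardized difference would give the other tests discussed in the related-work slides.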
11 Related Work - Outlier Detection Tests
- Outlier detection methods
  - One-dimensional (linear): frequency distribution over attribute value
  - Multi-dimensional
    - Homogeneous dimensions
    - Bi-partite dimensions (spatial outlier detection)
      - Graphical: variogram cloud, Moran scatterplot
      - Quantitative: scatterplot, spatial statistic $Z_{s(x)}$
12 Related Work - Outlier Detection Tests
- Related work: two families
  - One-dimensional: ignores geographic location
  - Homogeneous multi-dimensional: mixes location with attributes
- Spatial outlier
  - Two classes of dimensions: location, attributes
  - Neighborhood: based on location dimensions
  - Difference: compares attribute dimensions
- Comparison of outlier detection methods

| | One-dimensional (linear) | Homogeneous multi-dimensional | Bi-partite |
|---|---|---|---|
| Neighbor definition | N/A | location and attribute | location |
| Comparison with | population distribution | location and attribute | attribute values of neighbors |
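13 Related Work
- Different spatial outlier tests
  - Spatial statistic approach
  - Scatterplot approach (Luc Anselin 94)
  - Moran scatterplot approach (Luc Anselin 95)
  - Variogram cloud approach (graphic)
- All of these are special cases of SO-outlier
  - Will show this for one case
Table 1: Test for Outlier Detection
Each entry gives the difference function $F_{diff}(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr})$ and the test function $T_F(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}, \theta)$.
- Spatial statistic $Z_{s(x)}$
  - Difference function: $S(x) = f(x) - E(x)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Test function: $\left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$
- Scatterplot
  - Difference function: $\epsilon = E(x) - (m f(x) + b)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Test function: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$
- Moran scatterplot
  - Difference functions: $Z_i = \frac{f(x) - \mu_f}{\sigma_f}$, $I_i = \frac{Z_i}{m_2} \sum_j W_{ij} Z_j$
  - Test function: ($Z_i$: +, $I_i$: -) or ($Z_i$: -, $I_i$: +)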
14 Scatterplot Approach
- Figure: Scatterplot; x axis: attribute values, y axis: average attribute values over neighborhood; labeled points include Q, S
- Lemma: the scatterplot is a special case of SO-outlier
- Given
  - An attribute function f(x)
  - A neighborhood relationship N(x)
  - An aggregation function $f^N_{aggr}$: $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - A difference function $F_{diff}$: $\epsilon = E(x) - (m f(x) + b)$
- Detect spatial outliers by
  - Test function $T_F$: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$
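A minimal sketch of the scatterplot test follows, assuming a least-squares regression of the neighborhood average E(x) on f(x) and a residual standard deviation computed with n - 2 degrees of freedom, with the residual mean taken as 0. The function name, data layout, sample values, and threshold are hypothetical.

```python
from math import sqrt

def scatterplot_outliers(f, neighbors, theta=2.0):
    """Flag x when the standardized regression residual exceeds theta."""
    xs = list(f)
    fx = [f[x] for x in xs]
    # E(x): average attribute value over the neighbors of x
    ex = [sum(f[y] for y in neighbors[x]) / len(neighbors[x]) for x in xs]
    n = len(xs)
    sf, se = sum(fx), sum(ex)
    sff = sum(v * v for v in fx)
    see = sum(v * v for v in ex)
    sfe = sum(a * b for a, b in zip(fx, ex))
    denom = n * sff - sf * sf
    if n < 3 or denom == 0:
        return []  # regression is undefined for this input
    m = (n * sfe - sf * se) / denom      # slope
    b = (sff * se - sf * sfe) / denom    # intercept
    sxx = sff - sf * sf / n
    syy = see - se * se / n
    # Guard against tiny negative values caused by rounding
    sigma = sqrt(max(0.0, syy - m * m * sxx) / (n - 2))
    if sigma == 0:
        return []  # perfect fit, no residual to test
    return [x for x, a, e in zip(xs, fx, ex)
            if abs((e - (m * a + b)) / sigma) > theta]

# Hypothetical chain with a linear trend and one discontinuity at station 5.
values = [0, 10, 20, 30, 40, 90, 60, 70, 80, 90, 100]
f = dict(enumerate(values))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in f}

outliers = scatterplot_outliers(f, neighbors, theta=2.0)
print(outliers)
```

On data without a trend, a single extreme value gains high leverage over the fitted line, so this test can behave differently from the spatial statistic test on the same input.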
15 Moran Scatterplot Approach
- X axis: $Z_i = \frac{f(x) - \mu_f}{\sigma_f}$, in standardized deviation units
- Y axis (local Moran value): $I_i = \frac{Z_i}{m_2} \sum_j W_{ij} Z_j$
  - where $m_2 = \sum_j Z_j^2$ and $\sum_j W_{ij} = 1$
- Four clusters
  - High-high: high attribute value, high attribute value neighbors (positive association)
  - High-low: high attribute value, low attribute value neighbors (negative association)
  - Low-high: low attribute value, high attribute value neighbors (negative association)
  - Low-low: low attribute value, low attribute value neighbors (positive association)
- Spatial outliers
  - High-low; low-high
- Figure: Moran scatterplot; x axis: z-score of attribute values, y axis: weighted neighbor z-score of attribute values; labeled points include Q, S
16 Moran Scatterplot Approach
- Lemma: the Moran scatterplot is a special case of SO-outlier
- Given
  - An attribute function f(x)
  - A neighborhood relationship N(x): neighbor matrix $W_{ij}$
  - Difference functions: $Z_i = \frac{f(x) - \mu_f}{\sigma_f}$, $I_i = \frac{Z_i}{m_2} \sum_j W_{ij} Z_j$
- Detect spatial outliers by
  - Test function: ($Z_i$: +, $I_i$: -) or ($Z_i$: -, $I_i$: +)
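The Moran scatterplot test can be sketched as follows. The sketch assumes row-normalized weights (each of the k neighbors of x gets weight 1/k, so the weights of a row sum to 1) and flags a location when its z-score and its weighted neighbor z-score have opposite signs, i.e. the high-low and low-high quadrants of the plot. With $I_i$ defined as a product of the two, that condition is the same as $I_i < 0$; reading the slide's sign test this way is an assumption of the example, as are all names and data.

```python
from statistics import mean, pstdev

def moran_scatterplot_outliers(f, neighbors):
    """Return {location: quadrant} for the high-low and low-high quadrants."""
    mu_f = mean(f.values())
    sigma_f = pstdev(f.values())
    if sigma_f == 0:
        return {}  # constant attribute, z-scores are undefined
    z = {x: (f[x] - mu_f) / sigma_f for x in f}
    m2 = sum(v * v for v in z.values())
    flagged = {}
    for x in f:
        w = 1.0 / len(neighbors[x])                 # row-normalized weight
        lag = sum(w * z[y] for y in neighbors[x])   # weighted neighbor z-score
        i_local = z[x] * lag / m2                   # local Moran value I_i
        if i_local < 0:                             # opposite signs
            flagged[x] = "high-low" if z[x] > 0 else "low-high"
    return flagged

# Hypothetical chain of ten stations with one spike at station 5.
volumes = [10, 11, 10, 12, 11, 30, 11, 10, 12, 11]
f = dict(enumerate(volumes))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(volumes)] for i in f}

flagged = moran_scatterplot_outliers(f, neighbors)
print(flagged)
```

In this data the spike lands in the high-low quadrant and both of its neighbors land in the low-high quadrant, which shows that the graphical test marks the surroundings of a discontinuity as well as the discontinuity itself.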
17 Variogram Cloud Approach
- Approach
  - X axis: pairwise distance between neighbors
  - Y axis: square root of the absolute difference of attribute values
- Spatial outliers
  - Locations that are near to one another
  - Locations with large attribute differences
- An example
  - S1: relation of the S and Q nodes
  - S2: relation of the S and P nodes
- Figure: Variogram cloud; x axis: pairwise distance, y axis: square root of absolute difference of attribute values; labeled pairs (Q,S) and (P,S)
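As a rough illustration of how the points of a variogram cloud are produced, the sketch below computes one (distance, root difference) point per neighbor pair and then picks out pairs that are close together but differ strongly. The coordinates, attribute values, and both cutoffs are made up for the example; the original approach is graphical and does not prescribe cutoffs.

```python
from math import dist, sqrt

def variogram_cloud(coords, f, neighbor_pairs):
    """One cloud point per pair: (a, b, distance, sqrt(|f(a) - f(b)|))."""
    return [
        (a, b, dist(coords[a], coords[b]), sqrt(abs(f[a] - f[b])))
        for a, b in neighbor_pairs
    ]

def suspicious_pairs(cloud, max_distance, min_root_diff):
    """Pairs that are near one another yet have a large attribute difference."""
    return [(a, b) for a, b, d, r in cloud
            if d <= max_distance and r >= min_root_diff]

# Hypothetical locations on a line; S differs sharply from its neighbors.
coords = {"P": (0.0, 0.0), "S": (1.0, 0.0), "Q": (2.0, 0.0), "D": (3.0, 0.0)}
f = {"P": 2.0, "S": 11.0, "Q": 2.0, "D": 3.0}
pairs = [("P", "S"), ("Q", "S"), ("Q", "D")]

cloud = variogram_cloud(coords, f, pairs)
flagged = suspicious_pairs(cloud, max_distance=1.5, min_root_diff=2.0)
print(flagged)
```

A location that appears in several flagged pairs, such as S here, is the natural candidate for a spatial outlier.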
18 Outline
- Introduction
- Proposed Approach and Algorithm
  - Problem formulation
  - Our approach
  - Efficient algorithm
  - Cost model
- Evaluation of Proposed Approach
- Conclusions
19 Problem Formulation
- Given
  - A spatial framework S consisting of locations $s_1, s_2, \ldots, s_n$
  - An attribute function $f: s_i \to R$
  - A neighborhood relationship $N \subseteq S \times S$
  - An aggregation function $f^N_{aggr}: R^N \to R$
  - A difference function $F_{diff}: R \times R \to R$
  - A test procedure $T_F: R \to \{True, False\}$
- General definition: SO-outlier
  - An object $x \in S$ is an SO-outlier $(f, f^N_{aggr}, F_{diff}, T_F)$ if $T_F\{F_{diff}[f(x), f^N_{aggr}(f(x), N(x))]\}$ is true
- Design
  - An efficient algorithm to detect SO-outliers, i.e., $O = \{s_i \mid s_i \in S, s_i \text{ is a spatial outlier}\}$
- Objective
  - Efficiency: minimize the computation time
- Constraints
  - $F_{diff}$ and $T_F$ are composed (e.g., ratio, product, linear combination) of a few statistical functions of f and $f^N_{aggr}$
  - The size of the data set is much larger than the main memory size
  - Computation time is determined by I/O time
20 Our Approach
- Separate two phases
  - Model building
  - Testing a node (or a set of nodes)
- Computation structure of model building
  - Key insight: only a few statistics of f(x) and $f^N_{aggr}$ are needed
  - Spatial self-join using the N(x) relationship
- Computation structure of testing
  - Single node: spatial range query
    - Get-All-Neighbors(x) operation
  - A given set of nodes
    - Sequence of Get-All-Neighbors(x) operations
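21 An Example of Our Approach
Consider the scatterplot (n is the number of locations; sums run over all locations x)
- Model building
  - Neighborhood aggregate function $f^N_{aggr}$: $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Global aggregate functions $f^{G_1}_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}$:
    - $\sum f(x)$, $\sum E(x)$, $\sum f(x)E(x)$, $\sum f^2(x)$, $\sum E^2(x)$
  - Global parameters:
    - $m = \frac{n \sum f(x)E(x) - \sum f(x) \sum E(x)}{n \sum f^2(x) - (\sum f(x))^2}$
    - $b = \frac{\sum f^2(x) \sum E(x) - \sum f(x) \sum f(x)E(x)}{n \sum f^2(x) - (\sum f(x))^2}$
    - $\sigma_\epsilon = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{n - 2}}$
    - where $S_{xx} = \sum f^2(x) - \frac{(\sum f(x))^2}{n}$
    - and $S_{yy} = \sum E^2(x) - \frac{(\sum E(x))^2}{n}$
- Testing
  - Difference function: $\epsilon = E(x) - (m f(x) + b)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Test function: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$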
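22 Scatterplot
- X axis: attribute value of each point
- Y axis: average attribute value in the adjacent area
- Run a least-squares linear regression on the plot
  - Regression line: $y = mx + b$
  - $m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$
  - $b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}$
  - Residual: $\epsilon = y - (mx + b)$
  - Mean: $\mu_\epsilon = 0$
  - Standard deviation: $\sigma_\epsilon = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{n - 2}}$
  - $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
  - $S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$
- If the standardized residual $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$, declare x as an outlier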
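The standard deviation formula on this slide follows from the least-squares slope in a few steps. The short derivation below is added as a reading aid; the cross term $S_{xy}$ is not named on the slide and is introduced here only for the derivation.

```latex
\begin{align*}
S_{xy} &= \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n},
  \qquad m = \frac{S_{xy}}{S_{xx}} \\
\sum \epsilon_i^2 &= \sum \bigl(y_i - (m x_i + b)\bigr)^2
  = S_{yy} - 2 m S_{xy} + m^2 S_{xx} \\
  &= S_{yy} - 2 m^2 S_{xx} + m^2 S_{xx}
  = S_{yy} - m^2 S_{xx} \\
\sigma_\epsilon &= \sqrt{\frac{\sum \epsilon_i^2}{n - 2}}
  = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{n - 2}}
\end{align*}
```

The second line uses $b = \bar{y} - m\bar{x}$, which centers both variables, and the divisor n - 2 reflects the two fitted parameters m and b.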
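23 Spatial Outlier Detection: Summary
- Model building
  - Compute simple statistics on f(x) and $f^N_{aggr}$
  - Details in Table 2
  - Computable in a single spatial self-join on N(x)
Table 2: Model Building (n is the number of locations; sums run over all locations x)
- Spatial statistic
  - Attribute function f: f(x)
  - Neighborhood aggregate function $f^N_{aggr}$: $S(x) = f(x) - E(x)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Global aggregate functions: $\sum S(x)$, $\sum S^2(x)$
  - Global parameters: $\mu_s = \frac{\sum S(x)}{n}$, $\sigma_s = \sqrt{\frac{1}{n}\left[\sum S^2(x) - \frac{(\sum S(x))^2}{n}\right]}$
- Scatterplot
  - Attribute function f: f(x)
  - Neighborhood aggregate function $f^N_{aggr}$: $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Global aggregate functions: $\sum f(x)$, $\sum E(x)$, $\sum f(x)E(x)$, $\sum f^2(x)$, $\sum E^2(x)$
  - Global parameters:
    - $m = \frac{n \sum f(x)E(x) - \sum f(x) \sum E(x)}{n \sum f^2(x) - (\sum f(x))^2}$
    - $b = \frac{\sum f^2(x) \sum E(x) - \sum f(x) \sum f(x)E(x)}{n \sum f^2(x) - (\sum f(x))^2}$
    - $\sigma_\epsilon = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{n - 2}}$, where $S_{xx} = \sum f^2(x) - \frac{(\sum f(x))^2}{n}$ and $S_{yy} = \sum E^2(x) - \frac{(\sum E(x))^2}{n}$
- Moran scatterplot
  - Attribute function f: f(x)
  - Global aggregate functions: $\sum f(x)$, $\sum f^2(x)$
  - Global parameters: $\mu_f = \frac{\sum f(x)}{n}$, $\sigma_f = \sqrt{\frac{1}{n}\left[\sum f^2(x) - \frac{(\sum f(x))^2}{n}\right]}$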
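24 Spatial Outlier Detection: Summary
Table 3: Test for Outlier Detection
Each entry gives the difference function $F_{diff}(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr})$ and the test function $T_F(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}, \theta)$.
- Spatial statistic $Z_{s(x)}$
  - Difference function: $S(x) = f(x) - E(x)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Test function: $\left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$
- Scatterplot
  - Difference function: $\epsilon = E(x) - (m f(x) + b)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
  - Test function: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$
- Moran scatterplot
  - Difference functions: $Z_i = \frac{f(x) - \mu_f}{\sigma_f}$, $I_i = \frac{Z_i}{m_2} \sum_j W_{ij} Z_j$
  - Test function: ($Z_i$: +, $I_i$: -) or ($Z_i$: -, $I_i$: +)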
25 Model Building Algorithm
- Algorithm A1: steps
  - For each location x
    - Retrieve the data record of x = (f(x), identities of neighbors(x))
    - Get-All-Neighbors(x): retrieve the data records of neighbors(x); if a neighbor y is not in the memory buffer, request another I/O operation
    - Compute the neighborhood aggregate function $f^N_{aggr}$
    - Accumulate the global aggregate functions $f^{G_1}_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}$
  - Compute the global parameters from the accumulated aggregates G1, G2, ..., Gk
- I/O cost
  - Dominant operation: Get-All-Neighbors(x)
  - The I/O cost of Get-All-Neighbors(x) is determined by the clustering efficiency
    - Grouping of nodes into disk pages
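The steps of A1 can be simulated in memory to see why page clustering drives the I/O cost. The sketch below is a toy model under stated assumptions: a hypothetical `PagedStore` class stands in for the disk, the buffer uses LRU replacement (the slides do not name a replacement policy), and the model being built is the spatial statistic ($\mu_s$, $\sigma_s$).

```python
from collections import OrderedDict
from math import sqrt

class PagedStore:
    """Toy disk: nodes grouped into pages, read through an LRU page buffer."""

    def __init__(self, page_of, records, n_buffers=2):
        self.page_of = page_of        # node -> page id
        self.records = records        # node -> (f value, neighbor list)
        self.n_buffers = n_buffers
        self.buffer = OrderedDict()   # page ids in memory, oldest first
        self.page_accesses = 0

    def read(self, node):
        page = self.page_of[node]
        if page in self.buffer:
            self.buffer.move_to_end(page)        # mark as recently used
        else:
            self.page_accesses += 1              # simulated I/O
            self.buffer[page] = True
            if len(self.buffer) > self.n_buffers:
                self.buffer.popitem(last=False)  # evict least recently used
        return self.records[node]

def build_model(store, nodes):
    """Algorithm A1 sketch: one pass accumulating sum S(x) and sum S(x)^2."""
    n = 0
    sum_s = sum_s2 = 0.0
    for x in nodes:
        fx, nbrs = store.read(x)
        # Get-All-Neighbors(x): may trigger additional page accesses
        e = sum(store.read(y)[0] for y in nbrs) / len(nbrs)
        s = fx - e
        n += 1
        sum_s += s
        sum_s2 += s * s
    mu_s = sum_s / n
    sigma_s = sqrt(max(0.0, (sum_s2 - sum_s * sum_s / n) / n))
    return mu_s, sigma_s

# Hypothetical chain of 8 nodes, 4 nodes per page in the clustered layout.
values = [10, 11, 10, 12, 30, 11, 10, 12]
records = {i: (values[i], [j for j in (i - 1, i + 1) if 0 <= j < 8])
           for i in range(8)}
clustered = PagedStore({i: i // 4 for i in range(8)}, records)
scattered = PagedStore({i: i % 4 for i in range(8)}, records)

model_a = build_model(clustered, range(8))
model_b = build_model(scattered, range(8))
print(clustered.page_accesses, scattered.page_accesses)
```

Both layouts produce the same model, but the layout that keeps neighbors on the same page needs far fewer simulated page accesses.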
26 Test Algorithm
- Algorithm A2(a): steps
  - For each location x along a route
    - Retrieve the data record of x = (f(x), identities of neighbors(x))
    - Get-All-Neighbors(x): retrieve the data records of neighbors(x); if a neighbor y is not in the memory buffer, request another I/O operation
    - Compute the difference function $F_{diff}(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr})$
    - If $T_F(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}, \theta)$ is true, declare x as an outlier
- Algorithm A2(b): steps
  - For each location x within the random set
    - Retrieve the data record of x = (f(x), identities of neighbors(x))
    - Get-All-Neighbors(x): retrieve the data records of neighbors(x); if a neighbor y is not in the memory buffer, request another I/O operation
    - Compute the difference function $F_{diff}(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr})$
    - If $T_F(f, f^N_{aggr}, f^{G_2}_{aggr}, \ldots, f^{G_k}_{aggr}, \theta)$ is true, declare x as an outlier
27 I/O Cost Model
- Definitions
  - CE: clustering efficiency
  - N: total number of nodes
  - Bfr: blocking factor (number of nodes in a page)
  - K: number of neighbors for each node
  - L: number of nodes in a route
  - R: number of nodes in a random set
- Assume two memory buffers
- Cost model of A1: (N/Bfr) + N*K*(1-CE)
  - The cost to retrieve all nodes: (N/Bfr)
  - The cost to retrieve the neighbors of all nodes: N*K*(1-CE)
- Cost model of A2(a): L*(1-CE) + L*K*(1-CE)
- Cost model of A2(b): R + R*K*(1-CE)
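The three cost formulas translate directly into code, which makes it easy to see how the estimate reacts to CE. The function names and the sample parameter values below are hypothetical; only the formulas come from the slide.

```python
def cost_a1(n, bfr, k, ce):
    """Model building: read every page once, plus neighbor reads that miss."""
    return n / bfr + n * k * (1 - ce)

def cost_a2_route(l, k, ce):
    """Testing along a route: consecutive nodes may share a page."""
    return l * (1 - ce) + l * k * (1 - ce)

def cost_a2_random(r, k, ce):
    """Testing a random set: one access per node, plus neighbor misses."""
    return r + r * k * (1 - ce)

# Hypothetical setting: 1000 nodes, 10 nodes per page, 2 neighbors per node.
for ce in (0.5, 0.8):
    print(ce, cost_a1(1000, 10, 2, ce))
```

In this setting, raising CE from 0.5 to 0.8 cuts the estimated model-building cost by more than half, because the neighbor term N*K*(1-CE) dominates the page-scan term N/Bfr.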
28 Clustering Efficiency Parameter
- Computation cost (I/O cost) is determined by the clustering efficiency (CE)
- CE definition
  - (Total number of unsplit edges) / (total number of edges)
  - Probability[$v_i$ and a neighbor of $v_i$ are stored in the same disk page]
- An example
  - $CE = \frac{9 - 3}{9} = \frac{6}{9} \approx 0.67$
  - Figure: nodes V1 to V9 stored across disk pages A, B, and C
- CE depends on
  - Disk block size
  - Data set size, data set distribution
  - Clustering method
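The CE definition is a simple ratio over the edge list. The sketch below computes it for a made-up graph with nine nodes on three pages; the edges and page assignment are hypothetical and are not the ones in the slide's figure, although they are chosen to give the same 6/9.

```python
def clustering_efficiency(edges, page_of):
    """Fraction of edges whose two endpoints are stored on the same page."""
    unsplit = sum(1 for u, v in edges if page_of[u] == page_of[v])
    return unsplit / len(edges)

# Hypothetical layout: three nodes per page.
page_of = {
    "V1": "A", "V2": "A", "V3": "A",
    "V4": "B", "V5": "B", "V6": "B",
    "V7": "C", "V8": "C", "V9": "C",
}
edges = [
    ("V1", "V2"), ("V2", "V3"),   # inside page A
    ("V4", "V5"), ("V5", "V6"),   # inside page B
    ("V7", "V8"), ("V8", "V9"),   # inside page C
    ("V3", "V4"), ("V6", "V7"), ("V9", "V1"),  # split across pages
]

ce = clustering_efficiency(edges, page_of)
print(round(ce, 2))
```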
29 Outline
- Introduction
- Proposed Approach and Algorithm
- Evaluation of Proposed Approach
  - Candidates (clustering methods)
  - Experiment design
  - Results
- Conclusions
30 Experimental Evaluation (Summary)
- Hypothesis: the I/O cost of the algorithms is determined by the clustering efficiency
- Physical data page clustering methods
  - Graph-based method: CCAM
  - Geometric method: Cell tree
  - Geometric method: Z-order
- Metrics: clustering efficiency (CE), I/O cost
- Benchmark data
  - Minneapolis-St. Paul traffic data
- Benchmark tasks
  - Model building
  - Test spatial outlier (route)
  - Test spatial outlier (set of random nodes)
- Variable parameters
  - Buffer size
  - Disk page size
31 Clustering Method: CCAM
- Connectivity-Clustered Access Method
- Cluster the nodes via min-cut graph partitioning
- Use a B+ tree with Z-order as the secondary index
- Figure: B+ tree index (keys 0 to 26) over data pages holding nodes grouped by connectivity
33 Clustering Method: Cell Tree
- Binary Space Partitioning (BSP)
- Decompose the universe into disjoint convex subspaces
- Each leaf node corresponds to one of the subspaces
- Each tree node is stored on one disk page
- Cannot exploit edge information; purely geometric
- Figure: Cell-tree index (partitions H1, H2, H3) over data pages holding nodes grouped by subspace
35 Clustering Method: Z-order
- Impose a total order on the nodes
- Z-order = interleave(bits of X, bits of Y)
- Use a B+ tree as the primary index
- Cannot exploit edge information; purely geometric
- Figure: B+ tree index (keys 0 to 26) over data pages holding nodes in Z-order
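Bit interleaving is easy to show concretely. In the sketch below, which coordinate supplies the lowest bit (X here) and the fixed bit width are conventions assumed for the example; the slide does not specify them.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y; x takes the even positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bit i of x -> position 2i
        z |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> position 2i + 1
    return z

# Sorting hypothetical grid points by their Z-order key yields the total order
# under which nodes would be packed into pages.
points = [(1, 1), (0, 0), (2, 2), (1, 0), (0, 1), (3, 3)]
ordered = sorted(points, key=lambda p: z_order(*p))
print(ordered)
```

Points that are close in the plane tend to receive nearby keys, which is why the order gives some geometric clustering without looking at the edges of the network.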
37 Experiment Design
- Experiment data set: Twin-Cities traffic data
- Each data object (node)
  - Attribute values
  - Neighbor list
  - Size: 256 bytes
- Figure 4: Experimental Layout. Diagram labels: Twin-Cities highway map; clustering method (CCAM, Z-order, Cell-tree); page size; sets of pages of data; model building (spatial self-join); buffer size; number of neighbors; global parameters; test spatial outlier (A2) on (a) a set of random nodes and (b) nodes in a route; I/O cost; clustering efficiency
38 Experimental Observation and Results
- Step 1: Model building
  - Compute global aggregate functions
  - Compute global parameters
  - Spatial self-join structure
- Step 2: Test spatial outlier
  - Detect outliers along a route
  - Detect outliers in a set of random nodes
39 Model Building: Effect of Page Size
- Fixed parameters
  - Buffer size: 64K
- Variable parameters
  - Page size, clustering strategy
- Figures: number of page accesses vs. page size (K bytes), and CE value vs. page size (K bytes), for Z-order, Cell tree, and CCAM
- CCAM has the best performance
- CCAM has the highest CE value
- High CE => low I/O cost
  - Cost model: (N/Bfr) + N*K*(1-CE)
- Increasing page size => fewer page accesses
40 Model Building: Effect of Buffer Size
- Fixed parameters
  - Page size: 2K
  - Clustering efficiency: CCAM = 0.81, Cell tree = 0.69, Z-order = 0.51
- Variable parameters
  - Number of buffers, clustering strategy
- Figure: number of page accesses vs. number of buffers, for Z-order, Cell tree, and CCAM
- Increasing buffer size => fewer page accesses
- CCAM has the best performance
41 Test Spatial Outlier (Route): Effect of Page Size
- Average I/O cost of outlier query over 50 routes
- Fixed parameters
  - Buffer size: 4 Kbytes; data point size: 256 bytes
- Variable parameters
  - Page size, clustering strategy
- Figures: number of page accesses vs. page size (K bytes), and CE value vs. page size (K bytes), for Z-order, Cell tree, and CCAM
- Increasing page size => fewer page accesses
- CCAM has the best performance
- CCAM has the highest CE value
- High CE => low I/O cost
  - Cost model: L*(1-CE) + L*K*(1-CE)
- Cell tree has zero CE value when Bfr = 2
- Increasing page size => performance gap shrinks
42 Test Spatial Outlier (Random Nodes): Effect of Buffer Size
- Average I/O cost of outlier query over 50 sets of random nodes
- Fixed parameters
  - Number of random nodes: 150; page size: 1 Kbyte
  - CE value: CCAM = 0.68, Cell tree = 0.53, Z-order = 0.31
- Variable parameters
  - Buffer size, clustering strategy
- Figure: number of page accesses vs. number of buffers, for Z-order, Cell tree, and CCAM
- Increasing buffer size => no effect
- CCAM has the best performance
43 Test Spatial Outlier (Random Nodes): Effect of Page Size
- Average I/O cost over 50 sets of random nodes
- Fixed parameters
  - Number of random nodes: 150; buffer size: 4K
- Variable parameters
  - Page size, clustering strategy
- Figure: number of page accesses vs. page size (K bytes), for Z-order, Cell tree, and CCAM
- Increasing page size => fewer page accesses
- CCAM has the best performance
44 Summary of Experimental Results
- Clustering efficiency
  - CCAM achieves higher clustering efficiency than Cell tree and Z-order
- Test parameter and test result computation
  - CCAM has lower I/O cost than Cell tree and Z-order
- Higher CE leads to lower I/O cost
  - CE is a good predictor of relative I/O performance
- Increasing page size improves the clustering efficiency of all methods
  - Reduces the performance gap between methods
45 Conclusion
- SO-outlier
  - A general definition of spatial outlier
  - Shows that existing definitions are special cases of SO-outlier
    - Scatterplot, Moran scatterplot, spatial statistic
- Developed efficient algorithms to detect spatial outliers
  - Model building
  - Test spatial outlier
- Recognized the computation structure of spatial outlier detection algorithms
  - Self-join
  - Get-All-Neighbors() dominates I/O cost
- Developed algebraic cost models
- Evaluated alternative page clustering methods
46 Future Direction
- Extend spatial outlier detection tests
  - Multiple (non-location) attributes
  - Location attribute includes time
    - Temporal and spatial-temporal outliers
- Extend experiments
  - NASA data sets: uniform grid
  - Hypothesis: geometric clustering may perform well
- Explore other spatial patterns beyond spatial outliers
  - Land-use classification
More informationLecture 22 The Generalized Lasso
Lecture 22 The Generalized Lasso 07 December 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Class Notes Midterm II - Due today Problem Set 7 - Available now, please hand in by the 16th Motivation Today
More informationCourse on Microarray Gene Expression Analysis
Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level
More informationAssociation Pattern Mining. Lijun Zhang
Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms
More informationNearest Neighbor Methods for Imputing Missing Data Within and Across Scales
Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales Valerie LeMay University of British Columbia, Canada and H. Temesgen, Oregon State University Presented at the Evaluation of
More informationThis research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight
This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight local variation of one variable with respect to another.
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationModule 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 9: Performance Issues in Shared Memory. The Lecture Contains:
The Lecture Contains: Data Access and Communication Data Access Artifactual Comm. Capacity Problem Temporal Locality Spatial Locality 2D to 4D Conversion Transfer Granularity Worse: False Sharing Contention
More informationScatterplot: The Bridge from Correlation to Regression
Scatterplot: The Bridge from Correlation to Regression We have already seen how a histogram is a useful technique for graphing the distribution of one variable. Here is the histogram depicting the distribution
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationSimulations of the quadrilateral-based localization
Simulations of the quadrilateral-based localization Cluster success rate v.s. node degree. Each plot represents a simulation run. 9/15/05 Jie Gao CSE590-fall05 1 Random deployment Poisson distribution
More informationTopic:- DU_J18_MA_STATS_Topic01
DU MA MSc Statistics Topic:- DU_J18_MA_STATS_Topic01 1) In analysis of variance problem involving 3 treatments with 10 observations each, SSE= 399.6. Then the MSE is equal to: [Question ID = 2313] 1. 14.8
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationStorage and Access Methods for Advanced Traveler Information Systems ... Report 9626 S~1PT O TANSPORTAT57ON E.S. Number 96-U CTS TE
Report 9626 Number 96-U S~1PT O TANSPORTAT57ON 3 0314 000235712 L il 4 4 4 4 4.* E.S... Highway Based - - - - Sensors _ Drivers Traffic Persional Report nu Devices ation Road Maps City Maps Construction
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationTDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.
Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide
More informationW[1]-hardness. Dániel Marx. Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017
1 W[1]-hardness Dániel Marx Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017 2 Lower bounds So far we have seen positive results: basic algorithmic techniques for fixed-parameter
More informationFosca Giannotti et al,.
Trajectory Pattern Mining Fosca Giannotti et al,. - Presented by Shuo Miao Conference on Knowledge discovery and data mining, 2007 OUTLINE 1. Motivation 2. T-Patterns: definition 3. T-Patterns: the approach(es)
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationQuery optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.
Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE
More informationGPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC
GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT
More informationFundamentals of Integer Programming
Fundamentals of Integer Programming Di Yuan Department of Information Technology, Uppsala University January 2018 Outline Definition of integer programming Formulating some classical problems with integer
More informationLatent Variable Models for Structured Prediction and Content-Based Retrieval
Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio
More informationInteractive Math Glossary Terms and Definitions
Terms and Definitions Absolute Value the magnitude of a number, or the distance from 0 on a real number line Addend any number or quantity being added addend + addend = sum Additive Property of Area the
More information8. Automatic Content Analysis
8. Automatic Content Analysis 8.1 Statistics for Multimedia Content Analysis 8.2 Basic Parameters for Video Analysis 8.3 Deriving Video Semantics 8.4 Basic Parameters for Audio Analysis 8.5 Deriving Audio
More informationStatistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS
Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix Paula Ahonen-Rainio Maa-123.3530 Visual Analysis in GIS 11.11.2015 Topics today YOUR REPORTS OF A-2 Thematic maps with charts
More informationInternational Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA
International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI
More informationCS 664 Segmentation. Daniel Huttenlocher
CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical
More informationHow and what do we see? Segmentation and Grouping. Fundamental Problems. Polyhedral objects. Reducing the combinatorics of pose estimation
Segmentation and Grouping Fundamental Problems ' Focus of attention, or grouping ' What subsets of piels do we consider as possible objects? ' All connected subsets? ' Representation ' How do we model
More informationCluster Analysis for Microarray Data
Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS
ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/ TEXTURE ANALYSIS Texture analysis is covered very briefly in Gonzalez and Woods, pages 66 671. This handout is intended to supplement that
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data
More informationDATA MINING II - 1DL460
DATA MINING II - 1DL460 Spring 2016 A second course in data mining!! http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn! Uppsala Database Laboratory! Department of Information Technology,
More informationStatistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) *
OpenStax-CNX module: m39305 1 Statistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) * Free High School Science Texts Project This work is produced by OpenStax-CNX
More informationOverview of various smoothers
Chapter 2 Overview of various smoothers A scatter plot smoother is a tool for finding structure in a scatter plot: Figure 2.1: CD4 cell count since seroconversion for HIV infected men. CD4 counts vs Time
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationChapter 3: Data Mining:
Chapter 3: Data Mining: 3.1 What is Data Mining? Data Mining is the process of automatically discovering useful information in large repository. Why do we need Data mining? Conventional database systems
More informationCode No: R Set No. 1
Code No: R05321204 Set No. 1 1. (a) Draw and explain the architecture for on-line analytical mining. (b) Briefly discuss the data warehouse applications. [8+8] 2. Briefly discuss the role of data cube
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationGeostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents
GMS 7.0 TUTORIALS 1 Introduction Two-dimensional geostatistics (interpolation) can be performed in GMS using the 2D Scatter Point module. The module is used to interpolate from sets of 2D scatter points
More informationA Survey on Spatial Co-location Patterns Discovery from Spatial Datasets
International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 A Survey on Spatial Co-location Patterns Discovery from Spatial Datasets Mr.Rushirajsinh L. Zala 1, Mr.Brijesh
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More information3.1. Solution for white Gaussian noise
Low complexity M-hypotheses detection: M vectors case Mohammed Nae and Ahmed H. Tewk Dept. of Electrical Engineering University of Minnesota, Minneapolis, MN 55455 mnae,tewk@ece.umn.edu Abstract Low complexity
More informationOverview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce
Overview Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Core Ideas in Data Mining Classification Prediction Association Rules Data Reduction Data Exploration
More information