Traffic Volume(Time v.s. Station)

Size: px
Start display at page:

Download "Traffic Volume(Time v.s. Station)"

Transcription

1 Detecting Spatial Outliers: Algorithm and Application Chang-Tien Lu Spatial Database Lab Department of Computer Science University of Minnesota hone:

2 Outline Introduction Motivation General Denition of Spatial Outlier Related Work roposed Approach and Algorithm Evaluation of roposed Approach Conclusions

3 $ Spatial Data Mining Spatial Databases are too large to analyze manually NASA Earth Observation System(EOS) National Institute of Justice - Crime mapping Census Bureau, Dept. of Commence - Census Data Spatial Data Mining Discover frequent and interesting spatial patterns for post processing (knowledge discovery) attern examples: outliers, hot-spots, land-use classication 1

4 Spatial Outlier $ Denition A data point that is extreme relative to its neighbors Individual attribute value is not necessarily extreme in the total population, but is extreme in its adjacent area An example Item: alm Beach County, Neighbors(item) = Counties in Florida Source: 2

5 Application Domain I-35W North Bound Volume: the number of vehicles passing a station within 5 minutes Traffic Volume(Time v.s. Station) I35W Station ID (North Bound) Time 0 3

6 Application Domain Minneapolis-St. aul (Twin-Cities) Trac Data Set 930 detectors (stations) installed on major highways eriodical measuring attributes: volume, occupancy, speed Interesting spatial outliers - discontinuities { Assume smooth spatial attribute

7 $ Application Domain: Twin-Cities Trac Data Traffic Volume v.s. Time for Station 138 on 1/ Volume Time Figure 1: Station 138 on 1/ Traffic Volume v.s. Time for Station 139 on 1/ Volume Time Figure 2: Station 139 on 1/ Traffic Volume v.s. Time for Station 140 on 1/ Volume Time Figure 3: Station 140 on 1/

8 An Example of Spatial Outlier Spatial outlier: S, global outlier: G 8 7 G Original Data oints S Data oint Fitting Curve 6 Attribute Values Q D 2 1 L Location Z s(x) approach: S(x) = [f (x)? 1 k y2n(x) (f (y))] if Z s(x) = j S(x)? s j >, declare x as an outlier s 4 Outliers Detected Using Spatial Test S 3 Spatial Test Zs(x) Q Location 4

9 Evaluation of Statistical Assumption Distribution of trac station attribute f (x) looks normal 70 Volume Distribution at 10:00am on 1/15/ Number of Stations Volume Value Distribution of S(x) = [f (x)? 1 k looks normal too! y2n(x) (f (y))] Normalized Volume Difference over Spatial Neighborhood at 10:00am on 1/15/ Number of Stations Normalized Volume Difference over Spatial Neighborhood 5

10 General Denition of Spatial Outlier Given A spatial framework S consisting of locations s 1 ; s 2 ; : : : ; s n An attribute function f : s i! R (R : set of real numbers) A neighborhood relationship N S S A neighborhood aggregation function f N aggr : RN! R A dierence function F diff : R R! R A test procedure T F : R! ft rue; F alseg General denition: SO-outlier An object O 2 S is a SO-outlier (f, faggr, N F diff, T F ) if T F ff diff [f (x), faggr(f N (x), N (x))]g == TRUE Constraints F diff, T F are composed (e.g., ratio, product, linear combination) of few statistical function of f, f N aggr 6

11 $ Related Work - Outlier Detection Tests Outlier Detection Methods One-dimensional (linear) Multi-dimensional Frequency distribution over attribute value Homogeneous Dimensions Bi-partite dimension (Spatial outlier detection) Graphical Quantitative Varigram Cloud Moran Scatterplot Scatterplot Spatial Statistic Z s(x) 7

12 Related Work - Outlier Detection Tests Related work: two families 1-dimension - ignores geographic location Homogeneous Multi-dimension - mixes location with attributes Spatial outlier { 2 classes of dimensions - location, attributes { Neighborhood - based on location dimensions { Dierence - compares attribute dimensions Comparison of outlier detection methods One-dimensional ( linear ) Homogeneous Multi-dimensional Bi-partite Neighbor Definition N/A location and attribute location Comparison with population distribution location and attribute attribute values of neighbors 8

13 Related Work Dierent Spatial Outlier Tests Spatial Statistic Approach Scatter lot Approach (Luc Anselin 94) Moran Scatter plot Approach (Luc Anselin 95) Variogram Cloud Approach (Graphic) All these are special case of SO-outlier Will show this for one case Test for Outlier Detection Outlier Denition Dierence function F diff (f; f N aggr, f G2 aggr, : : :, f Gk aggr ) Test function T F (f; f N aggr, f G2 aggr, : : :, f Gk aggr, ) Spatial Statistic (Z s(x) ) Scatter lot Moran Scatter lot f(x)? S(x) = [f(x)? E(x)] = E(x)? (m f(x) + b) Z i = f, I i = (E(x) = 1 k f(y)) y2n(x) (E(x) = 1 k f(y)) f Z i y2n(x) m 2 j W ijz j, j S(x)? s s j > j? j > (Z i :+, I i : -) or (Z i :-, I i : +) Table 1: Test for Outlier Detection 9

14 $ Scatter lot Approach 7 Scatter lot Average Attribute Values Over Neighborhood Q S Attribute Values Lemma Scatter plot is a special case of SO-outlier Given An attribute function f (x) A neighborhood relationship N (x) An aggregation function f N aggr : E(x) = 1 k y2n(x) f (y) A dierence function F diff : = E(x)? (m f (x) + b) Detect spatial outlier by Test function T F : j? j > 10

15 $ Moran Scatterplot Approach X axis : Z i = f(x)? standardized deviation units Y axis (local moran value ): I i = Z i { where m 2 = Four clusters j Z2 j, j W ij = 1 m 2 j W ijz j { High-high: High attribute value, high attribute value neighbors (ositive association) { High-low: High attribute value, low attribute value neighbors (Negative association) { low-high: Low attribute value, high attribute values neighbors (Negative association) { low-low: Low attribute value, low attribute value neighbors (ositive association) Spatial outliers { High low; Low high Weighted Neighbor Z Score of Attribute Values Moran Scatter lot Q S Z Score of Attribute Values

16 Moran Scatter lot Approach Lemma Moran Scatter plot is a special case of SO-outlier Given An attribute function f (x) A neighborhood relationship N (x): neighbor matrix W ij Dierence functions Z i = f(x)? f f, I i = Z i m 2 j W ijz j Detect Spatial Outlier by Test function (Z i : +, I i : -) or (Z i : -, I i : +)

17 Variogram Cloud Approach Approach X axis: airwise distance between neighbors Y axis: Square root dierence of attribute value Spatial outlier { Locations that are near to one another { Locations with large attribute dierences An example { S1 : relation of S and Q node { S2 : relation of S and node Square Root of Absolute Difference of Attribute Values (Q,S) (,S) Variogram Cloud airwise Distance

18 $ Outline Introduction roposed Approach and Algorithm roblem formulation Our approach Ecient algorithm Cost model Evaluation of roposed Approach Conclusions

19 $ roblem Formulation Given A spatial framework S consisting of locations s 1 ; s 2 ; : : : ; s n An attribute function f : s i! R A neighborhood relationship N S S An aggregation function faggr N : RN! R A dierence function F diff : R R! R A test procedure T F : R! ft rue; F alseg General denition: SO-outlier An object O 2 S is a SO-outlier (f, faggr, N F diff, T F ) if T F ff diff [f (x), faggr(f N (x), N (x))]g == TRUE Design An ecient algorithm to detect SO-outier, i,e., O = fs i j s i 2 S; s i is a spatial outlierg Objective Eciency: to minimize the computation time Constraints F diff, T F are composed (e.g., ratio, product, linear combination) of few statistical function of f, f N aggr The size of the data set the main memory size Computation time is determined by I/O time 11

20 Our Approach Separate two phases Model Building Test a node (or a set of nodes) Computation Structure of Model Building Key insights: { Need a few statistics of f (x) and f N aggr Spatial self join using N (x) relationship Computation Structure of Testing Single node: spatial range query { Get All Neighbors(x) operation A given set of nodes { Sequence of Get All Neighbor(x) 12

21 An Example of Our Approach Consider Scatter lot Model Building Neighborhood aggregate function f N aggr : E(x) = 1 k Global aggregate function f G1 aggr, f G2 aggr,.., f Gk aggr y2n(x) f (y) { f (x); E(x); f (x)e(x); f 2 (x); E 2 (x) Global parameter G1 aggr, G2 aggr,.., Gk aggr { m = N f(x)e(x)? f(x) E(x) N f 2 (x)?( f(x) 2 f(x) E { b 2 (x)? f(x) f(x)e(x) = N f 2 (x)?( f(x)) q 2 S { yy?(m 2 S xx ) =, (n?2) { where S xx = { and S yy = f 2 (x)? [ ( f(x)) 2 E 2 (x)? [ ( E(x)) 2 n n ] ] Testing Dierence function { = E(x)? (m f (x) + b) { where E(x) = 1 k Test function { j? j > y2n(x) f (y) 13

22 Scatter lot X axis: Attribute value of each point Y axis: Average attribute value in adjacent area Run a linear square regression on the plot { Regression line y = mx + b { m = N x i y i? x i yi N x 2?( x i i ) 2 { b = xi yi 2? xi xi y i N x 2 i?( x i ) 2 { = ^y? (mx + b) { Standard deviation = { Mean = 0 { S xx = { S yy = x 2 i? [ ( x i ) 2 n y 2 i? [ ( y i ) 2 = ^y? (mx + b) n ] ] q S yy?(a 2 S xx ) (N?2) { if standardized residual j? j > { Declare x as an outlier

23 Spatial Outlier Detection: Summary Model Building Compute simple statistics on f (x); f N aggr Details in Table 2 Computable in a single spatial-self-join on N (x) Outlier Denition Attribute function f Neighborhood aggregate function Model Building Spatial Statistic Scatter lot Moran Scatter lot f(x) f(x) f(x) S(x) = f(x)? E(x) E(x) = 1 k y2n(x) f(y) f N aggr Global function faggr G1,.., faggr Gk Global parameter aggr G1,.., Gk aggr f G2 aggr, aggregate G2 aggr, S(x), S 2 (x) S(x) s =, n s = q 1 [ S 2 (x) ( S(x)) 2? ] n n f(x), f(x)e(x), E 2 (x) E(x), f 2 (x), m = N f(x)e(x)? f(x) E(x), N f 2 (x)?( f(x) 2 b = f(x) E 2 (x)? f(x) f(x)e(x), N f 2 (x)?( f(x)) 2 = q S yy?(m 2 S xx), where (n?2) S xx = f 2 (x)? [ ( f(x)) 2 n ], S yy = E 2 (x)? [ ( E(x)) 2 n ], f(x), f 2 (x) f(x) f =, n f = q 1 [ f 2 (x) ( f(x)) 2? ] n n Table 2: Model Building 14

24 Spatial Outlier Detection: Summary Test for Outlier Detection Outlier Denition Dierence function F diff (f; f N aggr, f G2 aggr, : : :, f Gk aggr ) Test function T F (f; f N aggr, f G2 aggr, : : :, f Gk aggr, ) Spatial Statistic (Z s(x) ) Scatter lot Moran Scatter lot f(x)? S(x) = [f(x)? E(x)] = E(x)? (m f(x) + b) Z i = f, I i = (E(x) = 1 k f(y)) y2n(x) (E(x) = 1 k f(y)) f Z i y2n(x) m 2 j W ijz j, j S(x)? s s j > j? j > (Z i :+, I i : -) or (Z i :-, I i : +) Table 3: Test for Outlier Detection 15

25 Model Building Algorithm Algorithm A1: steps For each location x { Retrieve data record of x = (f(x), identities of neighbors(x)) { Get-All-Neighbors(x): Retrieve data records of neighbor(x) if neighbor y is not in the memory buer request another I/O operation { Compute neighborhood aggregate function f N aggr { Accumulate global aggregate function: f G1 aggr, f G2 aggr,.., f Gk aggr Compute global parameter G1 aggr, G2 aggr,.., Gk aggr I/O cost is determined by Dominant operation: Get-All-Neighbors(x) I/O cost of Get-All-Neighbors(x) is determined by the clustering eciency { Grouping nodes into disk page 16

26 $ Test Algorithm Algorithm A2(a): steps For each location x along a route { Retrieve data record of x = (f(x), identies of neighbors(x)) { Get-All-Neighbors(x): Retrieve data records of neighbor(x) if neighbor y is not in the memory buer request another I/O operation { Compute Dierence function F diff (f; f N aggr, f G2 aggr, : : :, f Gk aggr ) if T F (f; f N aggr, f G2 aggr, : : :, f Gk aggr, ) == T rue { Declare x as an outlier Algorithm A2(b): steps For each location x within the random set { Retrieve data record of x = (f(x), identities of neighbors(x)) { Get-All-Neighbors(x): Retrieve data records of neighbor(x) if neighbor y is not in the memory buer request another I/O operation { Compute Dierence function F diff (f; f N aggr, f G2 aggr, : : :, f Gk aggr ) if T F (f; f N aggr, f G2 aggr, : : :, f Gk aggr, ) == T rue { Declare x as an outlier 17

27 I/O Cost Model Denition CE: Clustering Eciency N: Total number of nodes Bfr: Blocking factor (number of nodes in a page) K: Number of neighbors for each node L: Number of nodes in a route R: Number of nodes in a random set Assume two memory buers Cost model of A1 (N/Bfr)+N*K*(1-CE) { The cost to retrieve all nodes: (N/Bfr) { The cost to retrieve neighbors of all nodes: N*K*(1-CE) Cost model of A2(a) L*(1-CE) + L*K*(1-CE) Cost model of A2(b) R+R*K(1-CE) 18

28 Clustering Eciency arameter Computation cost (I/O cost) is determined by Clustering Eciency(CE) CE denition: { (Total number of unsplit edges)/(total number of edges) { robability[v i and a neighbor of v i are stored in the same disk page] An example { CE = 9?3 9 = 6 9 = 0:66 age B V1 V3 V2 V4 V5 V6 V7 V8 age A V9 age C CE depends on { Disk block size { Data set size, data set distribution { Clustering method 19

29 $ Outline Introduction roposed Approach and Algorithm Evaluation of roposed Approach Candidates (Clustering Methods) Experiment Design Results Conclusions

30 Experimental Evaluation (Summary) Hypothesis: I/O cost of the algorithms is determined by the clustering eciency hysical Data age Clustering Method Graph-based method: CCAM Geometric method: Cell Tree Geometric method: Z-order Metrics: Clustering Eciency (CE), I/O cost Benchmark data Minneapolis - St. aul Trac data Benchmark tasks Model Building Test Spatial Outlier (Route) Test Spatial Outlier (Set of Random Nodes) Variable arameters Buer size Disk page size 20

31 Clustering Method: CCAM Connectivity Clustered Access Method Cluster the nodes via min-cut graph partitioning Use B+ tree with Z-order as the secondary index Key 8 Key 0 Key 1 Key 3 Key 4 Key 5 Key 6 Key 7 Key 8 Node 0 Node 1 Node 4 Node 5 Node 3 Node 6 Node 8 Node B tree index Key 9 Key 11 Key 12 Key 13 Key 14 Key 15 Key 18 Key 26 Node 7 Node 12 Node 13 Node 18 Node 11 Node 14 Node 15 Node 26 Data age 21

32 Clustering Method: CCAM Connectivity Clustered Access Method Cluster the nodes via min-cut graph partitioning Use B+ tree with Z-order as the secondary index

33 Clustering Method: Cell Tree Binary Space artitioning(bs) Decompose universe into disjoint convex subspaces Each leaf node corresponds to one of the subspaces Each tree node is stored on one disk page Cannot exploit edge information, pure geometric H H H3 H1 H2 H3 Cell-tree Index Node 0 Node 1 Node 3 Node 6 Node 4 Node 5 Node 7 Node 18 Node 8 Node 9 Node 11 Node 14 Node 12 Node 13 Node 15 Node 26 Data age 22

34 Clustering Method: Cell Tree Binary Space artitioning(bs) Decompose universe into disjoint convex subspaces Each leaf node corresponds to one of the subspaces Each tree node is stored on one disk page Cannot exploit edge information, pure geometric H H H

35 Clustering Method: Z-order Impose a total order on the nodes Z order = interleave (bits of X, bits of Y) Use B+ tree as the primary index Cannot exploit edge information, pure geometric Key 8 B + tree index Key 0 Key 1 Key 3 Key 4 Key 5 Key 6 Key 7 Key 8 Key 9 Key 11 Key 12 Key 13 Key 14 Key 15 Key 18 Key 26 Node 0 Node 1 Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 Node 9 Node 11 Node 12 Node 13 Node 14 Node 15 Node 18 Node 26 Data age 23

36 Clustering Method: Z-order Impose a total order on the nodes Z order = interleave (bits of X, bits of Y) Use B+ tree as the primary index Cannot exploit edge information, pure geometric

37 Experiment Design Experiment Data Set Twin-Cities Trac Data Each data object (node) { Attribute Values { Neighbor List { Size: 256 bytes CCAM Z-order Cell-tree age Size Buffer Size No of Neighbors Twin-Cities Highway Map Clustering method Sets of pages of data Sets of pages of data Global arameters Test Spatial Outlier (A2) (a) Set of random nodes (b) Nodes in a Route Buffer size No of neighbors Model Building (spatial self- join) I/O Cost Clustering Efficiency Figure 4: Experimental Layout 24

38 Experimental Observation and Results Step 1: Model Building Compute global aggregate function Compute global parameters Spatial self join structure Step 2: Test Spatial Outlier Detect outliers along a Route Detect outliers in a set of random nodes 25

39 Model Building: Eect of age Size Fixed arameters Buer Size: 64k Variable arameters age size, clustering strategy Number of page accesses Zord CELL CCAM CE Value Zord CELL CCAM age Size (K bytes) age Size (K Bytes) CCAM has the best performance CCAM has the highest CE value High CE => Low I/O cost { Cost Model: (N/Bfr)+N*K*(1-CE) Increase page size => reduce number of page accesses 26

40 Model Building: eect of Buer Size Fixed arameters age size: 2K Clustering Eciency: { CCAM=0.81, Cell=0.69, Z-ord=0.51 Variable arameters Number of buers, clustering strategy Number of page accesses Zord CELL CCAM Number of Buffers Increase Buer size => reduce number of page accesses CCAM has the best performance 27

41 $ Test Spatial Outlier (Route): Eect of age Size Average I/O cost of outlier query over 50 routes Fixed arameters Buer size: 4 Kbytes, Data point size = 256 bytes Variable arameters age size, clustering strategy Number of age Accesses Zord CELL CCAM CE Value Zord CELL CCAM age Size (K Bytes) age Size (K Bytes) Increase page size => reduce no of page accesses CCAM has the best performance CCAM has the highest CE value High CE => Low I/O cost { Cost Model: L*(1-CE) + L*K*(1-CE) Cell Tree has zero CE value when Bfr=2 Increase page size => erformance gap reduces 28

42 $ Test Spatial Outlier (Random Nodes): Eect of Buer Size Average I/O cost of outlier query over 50 sets of random nodes Fixed arameters Number of random nodes: 150, page size : 1 Kbytes CE value : CCAM = 0.68 Cell = 0.53 Z-ord = 0.31 Variable arameters Buer Size, clustering strategy Number of page accesses Zord CELL CCAM Number of Buffers Increase buer size => No eect CCAM has the best performance

43 $ Test Spatial Outlier (Random Nodes): Eect of age Size Average I/O cost over 50 sets of random nodes Fixed arameters Number of random nodes: 150 points, buer size: 4K Variable arameters age size, clustering strategy Number of age Accesses Zord CELL CCAM age Size (K Bytes) Increase page size => reduce no of page accesses CCAM has the best performance

44 $ Summary of Experimental Results Clustering Eciency CCAM achieves higher clustering eciency than Cell tree and Z-order Test parameter and test result computation CCAM has lower I/O than Cell tree and Z-order Higher CE leads to lower I/O cost CE is a good predicator of relative I/O performance age size improves clustering eciency of all methods Reduces performance gap between methods 29

45 Conclusion SO-outlier A general denition of spatial outlier Show that existing denition are special cases of SO-outlier Scatter plot, Moran Scatterplot, Spatial Statistic Develop ecient algorithm to detect spatial outlier Model Buliding Test Spatial Outlier Recognize the computation structure of spatial outlier detection algorithms Self Join Get-All-Neighbor() dominates I/O cost Develop Algebraic Cost Models Evaluate Alternate age Clustering Methods 30

46 Future Direction Extend Spatial Outlier Detection Test Multi attributes (non-location) Location attribute includes time Temporal and Spatial-Temporal Outliers Extend Experiments NASA data sets - uniform grid Hypothesis - Geometric clustering may perform well Explore other spatial patterns beyond spatial outlier Land-use classication 31

Unified approach to detecting spatial outliers Shashi Shekhar, Chang-Tien Lu And Pusheng Zhang. Pekka Maksimainen University of Helsinki 2007

Unified approach to detecting spatial outliers Shashi Shekhar, Chang-Tien Lu And Pusheng Zhang. Pekka Maksimainen University of Helsinki 2007 Unified approach to detecting spatial outliers Shashi Shekhar, Chang-Tien Lu And Pusheng Zhang Pekka Maksimainen University of Helsinki 2007 Spatial outlier Outlier Inconsistent observation in data set

More information

Spatial Outlier Detection

Spatial Outlier Detection Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point

More information

Exploratory Data Analysis EDA

Exploratory Data Analysis EDA Exploratory Data Analysis EDA Luc Anselin http://spatial.uchicago.edu 1 from EDA to ESDA dynamic graphics primer on multivariate EDA interpretation and limitations 2 From EDA to ESDA 3 Exploratory Data

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value Access Methods This is a modified version of Prof. Hector Garcia Molina s slides. All copy rights belong to the original author. Basic Concepts search key pointer Value record? value Search Key - set of

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Introduction. Product List. Design and Functionality 1/10/2013. GIS Seminar Series 2012 Division of Spatial Information Science

Introduction. Product List. Design and Functionality 1/10/2013. GIS Seminar Series 2012 Division of Spatial Information Science Introduction Open GEODA GIS Seminar Series 2012 Division of Spatial Information Science University of Tsukuba H.Malinda Siriwardana The GeoDa Center for Geospatial Analysis and Computation develops state

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

What we have covered?

What we have covered? What we have covered? Indexing and Hashing Data warehouse and OLAP Data Mining Information Retrieval and Web Mining XML and XQuery Spatial Databases Transaction Management 1 Lecture 6: Spatial Data Management

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

M. Andrea Rodríguez-Tastets. I Semester 2008

M. Andrea Rodríguez-Tastets. I Semester 2008 M. -Tastets Universidad de Concepción,Chile andrea@udec.cl I Semester 2008 Outline refers to data with a location on the Earth s surface. Examples Census data Administrative boundaries of a country, state

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS

DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS S. E. Pawar and Agwan Priyanka R. Dept. of I.T., University of Pune, Sangamner, Maharashtra, India M.E. I.T., Dept. of I.T., University of

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Non-convex Multi-objective Optimization

Non-convex Multi-objective Optimization Non-convex Multi-objective Optimization Multi-objective Optimization Real-world optimization problems usually involve more than one criteria multi-objective optimization. Such a kind of optimization problems

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information

An Introduction to Spatial Databases

An Introduction to Spatial Databases An Introduction to Spatial Databases R. H. Guting VLDB Journal v3, n4, October 1994 Speaker: Giovanni Conforti Outline: a rather old (but quite complete) survey on Spatial DBMS Introduction & definition

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties. Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Multidimensional Data and Modelling - DBMS

Multidimensional Data and Modelling - DBMS Multidimensional Data and Modelling - DBMS 1 DBMS-centric approach Summary: l Spatial data is considered as another type of data beside conventional data in a DBMS. l Enabling advantages of DBMS (data

More information

Mesh segmentation. Florent Lafarge Inria Sophia Antipolis - Mediterranee

Mesh segmentation. Florent Lafarge Inria Sophia Antipolis - Mediterranee Mesh segmentation Florent Lafarge Inria Sophia Antipolis - Mediterranee Outline What is mesh segmentation? M = {V,E,F} is a mesh S is either V, E or F (usually F) A Segmentation is a set of sub-meshes

More information

Clustering in Ratemaking: Applications in Territories Clustering

Clustering in Ratemaking: Applications in Territories Clustering Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, PhD FIA ASTIN 13th-16th July 2008 INTRODUCTION Structure of talk Quickly introduce clustering and its application in insurance ratemaking

More information

Spatial Interpolation & Geostatistics

Spatial Interpolation & Geostatistics (Z i Z j ) 2 / 2 Spatial Interpolation & Geostatistics Lag Lag Mean Distance between pairs of points 1 Tobler s Law All places are related, but nearby places are related more than distant places Corollary:

More information

Searching for Similar Trajectories in Spatial Networks

Searching for Similar Trajectories in Spatial Networks Journal of Systems and Software (accepted), 2008 Searching for Similar Trajectories in Spatial Networks E. Tiakas A.N. Papadopoulos A. Nanopoulos Y. Manolopoulos Department of Informatics, Aristotle University

More information

Chapter 5: Outlier Detection

Chapter 5: Outlier Detection Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 5: Outlier Detection Lecture: Prof. Dr.

More information

Nearest Neighbor Queries

Nearest Neighbor Queries Nearest Neighbor Queries Nick Roussopoulos Stephen Kelley Frederic Vincent University of Maryland May 1995 Problem / Motivation Given a point in space, find the k NN classic NN queries (find the nearest

More information

CCAM: A Connectivity-Clustered Access Method for Networks and Network Computations

CCAM: A Connectivity-Clustered Access Method for Networks and Network Computations 102 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 9, NO. 1, JANUARY-FEBRUARY 1997 CCAM: A Connectivity-Clustered Access Method for Networks and Network Computations Shashi Shekhar, Senior Member,

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary

Chap4: Spatial Storage and Indexing. 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Chap4: Spatial Storage and Indexing 4.1 Storage:Disk and Files 4.2 Spatial Indexing 4.3 Trends 4.4 Summary Learning Objectives Learning Objectives (LO) LO1: Understand concept of a physical data model

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

PSS718 - Data Mining

PSS718 - Data Mining Lecture 5 - Hacettepe University October 23, 2016 Data Issues Improving the performance of a model To improve the performance of a model, we mostly improve the data Source additional data Clean up the

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Mobility Data Management & Exploration

Mobility Data Management & Exploration Mobility Data Management & Exploration Ch. 07. Mobility Data Mining and Knowledge Discovery Nikos Pelekis & Yannis Theodoridis InfoLab University of Piraeus Greece infolab.cs.unipi.gr v.2014.05 Chapter

More information

Spatial Interpolation - Geostatistics 4/3/2018

Spatial Interpolation - Geostatistics 4/3/2018 Spatial Interpolation - Geostatistics 4/3/201 (Z i Z j ) 2 / 2 Spatial Interpolation & Geostatistics Lag Distance between pairs of points Lag Mean Tobler s Law All places are related, but nearby places

More information

Data Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1

Data Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques Chapter 7.1-4 March 8, 2007 Data Mining: Concepts and Techniques 1 1. What is Cluster Analysis? 2. Types of Data in Cluster Analysis Chapter 7 Cluster Analysis 3. A

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

GEOGRAPHIC INFORMATION SYSTEMS Lecture 18: Spatial Modeling

GEOGRAPHIC INFORMATION SYSTEMS Lecture 18: Spatial Modeling Spatial Analysis in GIS (cont d) GEOGRAPHIC INFORMATION SYSTEMS Lecture 18: Spatial Modeling - the basic types of analysis that can be accomplished with a GIS are outlined in The Esri Guide to GIS Analysis

More information

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization

Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization Double Patterning Layout Decomposition for Simultaneous Conflict and Stitch Minimization Kun Yuan, Jae-Seo Yang, David Z. Pan Dept. of Electrical and Computer Engineering The University of Texas at Austin

More information

Network visualization techniques and evaluation

Network visualization techniques and evaluation Network visualization techniques and evaluation The Charlotte Visualization Center University of North Carolina, Charlotte March 15th 2007 Outline 1 Definition and motivation of Infovis 2 3 4 Outline 1

More information

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 1 3.1 Linearization and Optimization of Functions of Vectors 1 Problem Notation 2 Outline 3.1.1 Linearization 3.1.2 Optimization of Objective Functions 3.1.3 Constrained

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

Appendix A: Scenarios

Appendix A: Scenarios Appendix A: Scenarios Snap-Together Visualization has been used with a variety of data and visualizations that demonstrate its breadth and usefulness. Example applications include: WestGroup case law,

More information

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University CS443: Digital Imaging and Multimedia Binary Image Analysis Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines A Simple Machine Vision System Image segmentation by thresholding

More information

A Mixed Fragmentation Methodology For. Initial Distributed Database Design. Shamkant B. Navathe. Georgia Institute of Technology.

A Mixed Fragmentation Methodology For. Initial Distributed Database Design. Shamkant B. Navathe. Georgia Institute of Technology. A Mixed Fragmentation Methodology For Initial Distributed Database Design Shamkant B. Navathe Georgia Institute of Technology Kamalakar Karlapalem Hong Kong University of Science and Technology Minyoung

More information

Lecture 22 The Generalized Lasso

Lecture 22 The Generalized Lasso Lecture 22 The Generalized Lasso 07 December 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Class Notes Midterm II - Due today Problem Set 7 - Available now, please hand in by the 16th Motivation Today

More information

Course on Microarray Gene Expression Analysis

Course on Microarray Gene Expression Analysis Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales

Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales Valerie LeMay University of British Columbia, Canada and H. Temesgen, Oregon State University Presented at the Evaluation of

More information

This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight

This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight local variation of one variable with respect to another.

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe

More information

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Improving the Performance of OLAP Queries Using Families of Statistics Trees Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University

More information

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager

More information

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 9: Performance Issues in Shared Memory. The Lecture Contains:

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 9: Performance Issues in Shared Memory. The Lecture Contains: The Lecture Contains: Data Access and Communication Data Access Artifactual Comm. Capacity Problem Temporal Locality Spatial Locality 2D to 4D Conversion Transfer Granularity Worse: False Sharing Contention

More information

Scatterplot: The Bridge from Correlation to Regression

Scatterplot: The Bridge from Correlation to Regression Scatterplot: The Bridge from Correlation to Regression We have already seen how a histogram is a useful technique for graphing the distribution of one variable. Here is the histogram depicting the distribution

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Simulations of the quadrilateral-based localization

Simulations of the quadrilateral-based localization Simulations of the quadrilateral-based localization Cluster success rate v.s. node degree. Each plot represents a simulation run. 9/15/05 Jie Gao CSE590-fall05 1 Random deployment Poisson distribution

More information

Topic:- DU_J18_MA_STATS_Topic01

Topic:- DU_J18_MA_STATS_Topic01 DU MA MSc Statistics Topic:- DU_J18_MA_STATS_Topic01 1) In analysis of variance problem involving 3 treatments with 10 observations each, SSE= 399.6. Then the MSE is equal to: [Question ID = 2313] 1. 14.8

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Storage and Access Methods for Advanced Traveler Information Systems ... Report 9626 S~1PT O TANSPORTAT57ON E.S. Number 96-U CTS TE

Storage and Access Methods for Advanced Traveler Information Systems ... Report 9626 S~1PT O TANSPORTAT57ON E.S. Number 96-U CTS TE Report 9626 Number 96-U S~1PT O TANSPORTAT57ON 3 0314 000235712 L il 4 4 4 4 4.* E.S... Highway Based - - - - Sensors _ Drivers Traffic Persional Report nu Devices ation Road Maps City Maps Construction

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

W[1]-hardness. Dániel Marx. Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017

W[1]-hardness. Dániel Marx. Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017 1 W[1]-hardness Dániel Marx Recent Advances in Parameterized Complexity Tel Aviv, Israel, December 3, 2017 2 Lower bounds So far we have seen positive results: basic algorithmic techniques for fixed-parameter

More information

Fosca Giannotti et al,.

Fosca Giannotti et al,. Trajectory Pattern Mining Fosca Giannotti et al,. - Presented by Shuo Miao Conference on Knowledge discovery and data mining, 2007 OUTLINE 1. Motivation 2. T-Patterns: definition 3. T-Patterns: the approach(es)

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag. Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Fundamentals of Integer Programming

Fundamentals of Integer Programming Fundamentals of Integer Programming Di Yuan Department of Information Technology, Uppsala University January 2018 Outline Definition of integer programming Formulating some classical problems with integer

More information

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Latent Variable Models for Structured Prediction and Content-Based Retrieval Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio

More information

Interactive Math Glossary Terms and Definitions

Interactive Math Glossary Terms and Definitions Terms and Definitions Absolute Value the magnitude of a number, or the distance from 0 on a real number line Addend any number or quantity being added addend + addend = sum Additive Property of Area the

More information

8. Automatic Content Analysis

8. Automatic Content Analysis 8. Automatic Content Analysis 8.1 Statistics for Multimedia Content Analysis 8.2 Basic Parameters for Video Analysis 8.3 Deriving Video Semantics 8.4 Basic Parameters for Audio Analysis 8.5 Deriving Audio

More information

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix Paula Ahonen-Rainio Maa-123.3530 Visual Analysis in GIS 11.11.2015 Topics today YOUR REPORTS OF A-2 Thematic maps with charts

More information

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

How and what do we see? Segmentation and Grouping. Fundamental Problems. Polyhedral objects. Reducing the combinatorics of pose estimation

How and what do we see? Segmentation and Grouping. Fundamental Problems. Polyhedral objects. Reducing the combinatorics of pose estimation Segmentation and Grouping Fundamental Problems ' Focus of attention, or grouping ' What subsets of piels do we consider as possible objects? ' All connected subsets? ' Representation ' How do we model

More information

Cluster Analysis for Microarray Data

Cluster Analysis for Microarray Data Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that

More information

Mining di Dati Web. Lezione 3 - Clustering and Classification

Mining di Dati Web. Lezione 3 - Clustering and Classification Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised

More information

ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS

ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/ TEXTURE ANALYSIS Texture analysis is covered very briefly in Gonzalez and Woods, pages 66 671. This handout is intended to supplement that

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining!! http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn! Uppsala Database Laboratory! Department of Information Technology,

More information

Statistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) *

Statistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) * OpenStax-CNX module: m39305 1 Statistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) * Free High School Science Texts Project This work is produced by OpenStax-CNX

More information

Overview of various smoothers

Overview of various smoothers Chapter 2 Overview of various smoothers A scatter plot smoother is a tool for finding structure in a scatter plot: Figure 2.1: CD4 cell count since seroconversion for HIV infected men. CD4 counts vs Time

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Chapter 3: Data Mining:

Chapter 3: Data Mining: Chapter 3: Data Mining: 3.1 What is Data Mining? Data Mining is the process of automatically discovering useful information in large repository. Why do we need Data mining? Conventional database systems

More information

Code No: R Set No. 1

Code No: R Set No. 1 Code No: R05321204 Set No. 1 1. (a) Draw and explain the architecture for on-line analytical mining. (b) Briefly discuss the data warehouse applications. [8+8] 2. Briefly discuss the role of data cube

More information

Lecture Topic Projects

Lecture Topic Projects Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data

More information

Geostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents

Geostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents GMS 7.0 TUTORIALS 1 Introduction Two-dimensional geostatistics (interpolation) can be performed in GMS using the 2D Scatter Point module. The module is used to interpolate from sets of 2D scatter points

More information

A Survey on Spatial Co-location Patterns Discovery from Spatial Datasets

A Survey on Spatial Co-location Patterns Discovery from Spatial Datasets International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 A Survey on Spatial Co-location Patterns Discovery from Spatial Datasets Mr.Rushirajsinh L. Zala 1, Mr.Brijesh

More information

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection

More information

3.1. Solution for white Gaussian noise

3.1. Solution for white Gaussian noise Low complexity M-hypotheses detection: M vectors case Mohammed Nae and Ahmed H. Tewk Dept. of Electrical Engineering University of Minnesota, Minneapolis, MN 55455 mnae,tewk@ece.umn.edu Abstract Low complexity

More information

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce Overview Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Core Ideas in Data Mining Classification Prediction Association Rules Data Reduction Data Exploration

More information