Outlier Detection Methodologies Overview

Mohd. Noor Md. Sap
Department of Computer and Information Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor Bahru, Malaysia
mohdnoor@fksm.utm.my

Ehsan Mohebi
Department of Computer and Information Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor Bahru, Malaysia
saeh_hamo@yahoo.com

Abstract

The outlier detection problem is an important issue in many safety-critical environments. Outliers arise due to mechanical faults, changes in system behavior, fraudulent behavior, human error, instrument error, or simply through natural deviations in populations. The most popular outlier detection methods suggested so far are density- and distribution-based methods that employ a metric equation to identify outliers. Other methods apply neural network methodologies to keep track of outliers. In this paper we compare recent known outlier detection techniques and consider the strengths and weaknesses of each approach separately.

Keywords: Outliers, Statistics, Spatial data, K-NN.

1. Introduction

Outliers can be defined as given by [1]: "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." Statistical approaches were the earliest algorithms used for outlier detection; they are suited to quantitative real-valued data sets or, at the very least, quantitative ordinal data distributions. One of the earliest outlier detection methods was suggested by [2]. It calculates a Z value as the difference between the mean value of the attribute and the query value, divided by the standard deviation, where the mean and standard deviation are calculated from all attribute values.

A common criterion used for outlier detection is the K-Nearest Neighbor (K-NN) algorithm. In this case, to find outliers, the neighbors of each point must be calculated with a complexity of $O(mn^2)$, where $m$ is the dimension and $n$ is the number of points.
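The brute-force neighbor search referred to above can be sketched as follows. This is a minimal illustration, not code from any of the cited papers: the function name `knn_distances` and the toy points are hypothetical, and the Euclidean distance is assumed. Every pair of points is compared, which is where the quadratic dependence on the number of points comes from.

```python
import math

def knn_distances(points, k):
    """Return, for each point, the sorted distances to its k nearest neighbours.

    Brute force: each point is compared with every other point, and each
    comparison touches all coordinates of both points.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

# Four points in a tight square plus one far-away point (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

# Distance to the 2nd nearest neighbour of each point; the far-away point
# has by far the largest value.
kth = [d[-1] for d in knn_distances(points, k=2)]
```
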
This method is expensive for large and high-dimensional data sets; [3], [4] and [5] have proposed new methods to overcome this issue. All K-NN methods use a distance metric such as the Euclidean or Mahalanobis distance to measure the distance between points. The latter is expensive because it requires the correlation matrix between all the related point records.

One of the most popular and widely studied clustering methods for objects in Euclidean space is the k-means clustering algorithm [6]. K-means requires the user to specify the number of clusters $k$, and it provides a local model of the data. The algorithm represents each of the $k$ clusters by a prototype vector whose attribute values are the mean values across all points in the cluster. It updates the cluster centers to reflect each new instance.

Section 2 of this paper discusses distance-based outlier detection methods. Density-based methods are presented in Section 3. Section 4 presents two types of spatial outlier detection methodology. The last part of the paper presents the discussion and conclusion with regard to time complexity.

2. Distance Based

Knorr and Ng (1998) presented a K-NN algorithm that is efficient because it does not calculate all $k$ neighbors of a point; only $m < k$ neighbors are determined. As a result it is not sensitive to computational growth: when dealing with large data sets, the method still gives an acceptable time complexity. An outlier is defined as follows: if there are fewer than $k$ neighbors inside the distance threshold $d$, then the instance is an outlier. The shortcoming is that the user must define the parameters $k$ and $d$ in advance, so the method is susceptible to reporting normal points as false outliers and vice versa.

Ramaswamy (2000) introduced an optimized K-NN outlier detection method that produces a list of potential outliers. The optimization combines K-NN with partitioning the data into cells; Ramaswamy used nested-loop, index-based and partition-based algorithms to define outliers.
In this method an outlier is defined as follows: $p$ is an outlier if no more than $n-1$ points in the data set have a higher $D^m$ (distance to the $m$-th neighbor), where $m$ is user defined. Its complexity does not scale well with computational growth, however, because all K-NN must be calculated. The time complexity of the three algorithms has been compared (see Figure 1), where $N$ is the number of instances. A drawback of the method proposed by Ramaswamy is that the user has to know in advance how many outliers there are in the data set [7], whereas in some cases there is only one outlier, whose distance to its neighbors is so large that it clearly lies in sparse space and is obviously detected as an outlier.
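The two distance-based definitions in this section can be compared on a small example. The sketch below is an illustration under stated assumptions rather than either paper's algorithm: it uses plain brute-force search (none of the cell, index or partition optimizations), Euclidean distance, and hypothetical function names and data. `threshold_outliers` follows the neighbor-count rule as paraphrased above, and `top_n_outliers` ranks points by the distance to their k-th nearest neighbor.

```python
import math

def count_within(points, i, d):
    """Number of other points whose distance to points[i] is at most d."""
    return sum(1 for j, q in enumerate(points)
               if j != i and math.dist(points[i], q) <= d)

def threshold_outliers(points, k, d):
    """Indices of points with fewer than k neighbours inside distance d."""
    return [i for i in range(len(points)) if count_within(points, i, d) < k]

def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-th neighbour distance."""
    ranked = sorted(range(len(points)),
                    key=lambda i: kth_neighbor_distance(points, i, k),
                    reverse=True)
    return ranked[:n]

# A tight square of four points and two isolated points (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (5.0, 5.0), (10.0, 10.0)]

by_threshold = threshold_outliers(points, k=2, d=1.5)
by_ranking = top_n_outliers(points, k=2, n=2)
```

The threshold rule needs both a count and a radius chosen in advance, while the ranking rule needs the number of outliers to report; the example makes that difference in required parameters concrete.
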

Figure 1. Performance results for N [5]

3. Density Based

To achieve better results in finding interesting outliers and to overcome some of the shortcomings of distance-based methods (which capture only global outliers, etc.), density-based methods have been proposed. M. Breunig et al. (2000) introduced the concept of local density outliers and a measure, LOF (Local Outlier Factor), which captures the degree of outlier-ness of every object in the data set in order to pick up local outliers.

Aggarwal and Yu (2001) use lower-dimensional projections of the data set and focus on key attributes. They use an evolutionary search algorithm and a brute-force algorithm that examine all $k$-dimensional projections and retain the projections with the most negative sparsity coefficient; the outliers are then found using the search algorithm. In this method all points within the same cell are regarded either as normal objects or as outliers. The method therefore has the drawback that normal objects may sometimes be detected as outliers, and vice versa.

G. Kollios et al. (2003) proposed a density-based sampling method to detect outliers. A kernel density estimator is built from randomly sampled points to approximately represent the density of the data set. The estimator can be used to estimate the probability that each data point belongs to the data set. For each object $x$, the function $N(x, k)$ is defined to be the number of objects in the data set whose distance from $x$ is at most $k$. Outliers are defined as follows: an object $x$ is a $DB(p, k)$-outlier only if $N(x, k) \le p$. The proposed algorithm takes one pass over the data set to compute the density estimator function, and the complexity of this step is $O(n)$. Since each object in the data set has to be read once in order to compute the value of $N(x, k)$, one more full data set scan is needed. The complexity of this step is $O(bn)$, where $b$ is the number of samples used to construct the density estimator. One drawback of this method is that a large $b$ improves the accuracy but increases the running time.
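The idea of scoring every object against a density estimator built from a random sample can be illustrated with a simplified sketch. It is not the estimator of the cited paper: a Gaussian product kernel with a fixed, hand-picked bandwidth is assumed, the score is the estimated density itself rather than an estimated neighbor count, and all names and data are hypothetical. The cost of the scoring pass is one kernel evaluation per (object, sampled point) pair.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a density function built from Gaussian kernels centred on `sample`."""
    dim = len(sample[0])
    norm = len(sample) * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for s in sample:
            sq = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return total / norm

    return density

rng = random.Random(0)
# 200 points drawn around the origin (hypothetical data).
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

# 50 randomly sampled kernel centres. The far-away point below is appended
# after sampling so that it cannot become a kernel centre itself.
sample = rng.sample(data, 50)
data.append((8.0, 8.0))

density = gaussian_kde(sample, bandwidth=0.5)
scores = [density(x) for x in data]          # one pass over the data set
lowest = min(range(len(data)), key=scores.__getitem__)
```

Increasing the number of sampled centres makes the estimate follow the data more closely, at the price of more kernel evaluations per object.
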
In fact, how well a kernel density estimator works in high-dimensional space has not been fully explored, but it seems to be less accurate. We will discuss a different density estimation strategy that overcomes some shortcomings of Brito's method [7].

Brito et al. (1997) proposed a Mutual $k$-Nearest Neighbor (MkNN) graph based approach. The MkNN graph is a graph in which an edge exists between vectors $v_i$ and $v_j$ if each belongs to the other's $k$-neighborhood. The MkNN graph is undirected and is a special case of the $k$-Nearest Neighbor (kNN) graph, in which every node has pointers to its $k$ nearest neighbors. Each connected component is considered a cluster if it contains more than one vector, and an outlier if it contains only one vector. A potential problem with Brito's definition is that an outlier that is too close to an inlier could be misclassified [7].

To improve on Brito's method, Hautamaki et al. (2004) proposed the Outlier Detection using In-degree Number (ODIN) algorithm, which utilizes the $k$-nearest neighbor graph. In this method an outlier is defined as follows: given the kNN graph for a data set, an outlier is a vertex whose in-degree is less than or equal to a threshold $T$. The threshold is a variant of Ramaswamy's definition, i.e. it is measured from the maximum kNN distances. Experimental results show that ODIN performs well and produces a lower error rate on synthetic data sets in comparison with the methods of Brito and Ramaswamy.

Bay and Schwabacher (2003) proposed an approach that can detect outliers in near-linear running time with respect to the data set size. The method is an optimized version of the nested-loop algorithm that makes use of randomization and a simple pruning rule. The data set is randomized and divided into small blocks, and the blocks are handled one by one. For the first block, each object is compared with every object in the whole data set in order to compute its score (which is the distance to its $k$-th nearest neighbor).
According to these scores, the top outliers in the first block can be decided, and the score of the weakest of them is used as a cut-off for the second block. As more blocks are processed, more extreme outliers are found and a larger cut-off can be used for the next block. As a result, pruning becomes more efficient after each iteration. Randomizing the whole data set is important for this method: the performance can be very poor if the data set is sorted, or if objects clustered together in space also appear together in the data set file. The main shortcoming is that the method needs to scan the whole data set once per block. When the whole data set cannot fit in main memory, expensive disk scans could result in very poor performance. Even though the worst-case complexity is still $O(n^2)$, the experimental results show that this method can achieve near-linear running time.

One of the shortcomings of the method proposed by Knorr et al. is that it cannot achieve good performance with very large or high-dimensional data sets. To overcome this disadvantage, D. Ren et al. (2004) improved Knorr's method by processing a vertical data structure instead of the traditional horizontal structure. The neighborhood of a data point $P$ with radius $r$ is defined as follows, where $X$ is the data set:

$$\mathrm{Nbr}(P, r) = \{\, x \mid x \in X,\ |P - x| \le r \,\}$$

And the definition of outliers is:

$$\mathrm{Ols}(X, p, D) = \{\, x \mid x \in X,\ N(\mathrm{Nbr}(x, D)) \le (1 - p)\,|X| \,\}$$

where $N(\mathrm{Nbr}(x, D))$ is the number of points in the neighborhood. They proposed a vertical by-neighbor outlier detection method with local pruning (PODMP, a P-Tree-based outlier detection method using pruning), which can detect outliers efficiently and scales well to large data sets. The vertical method works as follows. First, the data set to be mined is represented as a set of P-Trees. Second, one point $P$ in the data set is selected arbitrarily; then the $D$-neighbors of $P$ are searched using the fast computation of inequality P-Trees, and the $D$-neighbors are represented with an inequality P-Tree, which is called a neighborhood P-Tree. In the neighborhood P-Tree, 1 means the point is a neighbor of $P$, while 0 means it is not. Third, the number of points in the $D$-neighborhood is calculated efficiently by extracting values from the root node of the neighborhood P-Tree [12]. They compared the running time of their method with nested loop (NL), as shown in Figure 2.

Figure 2. Comparison of scalability of NL, PODM, and PODMP [10]

In conclusion, both distance-based definitions (the neighbor-count threshold and the ranking by neighbor distance) can only capture global outliers, because they take a global view of the data set. For a data set with simple structure, for example one that contains one or more clusters with similar density, these two definitions work well. However, for many real-world data sets with complex structure, methods based on these two definitions might not be able to find interesting outliers.

4. Spatial Outlier Detection

Spatial outliers are spatial objects whose non-spatial attribute values are significantly different from the values of their neighborhoods. Spatial outlier detection methods in the spatial statistics literature can be grouped into two categories: graphical approaches and quantitative tests.

4.1 Graphical approach

In graph-based spatial outlier detection the main idea is based on graph connectivity [13]. For spatial outlier detection methods, the choice of statistic is important and depends on what kind of data is considered. The proposed statistic is $S(x) = f(x) - E_{y \in N(x)}(f(y))$, where $f(x)$ is the attribute function, $N(x)$ is the fixed set of neighbors of $x$, and $E_{y \in N(x)}(f(y))$ is the average attribute value of the neighbors of $x$. $S(x)$ thus denotes the difference between the attribute value of each node and the average over its neighbors. Outliers are detected with the test $|(S(x) - \mu_s)/\sigma_s| > \theta$, where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of all $S(x)$. The most costly part of the algorithm is finding the neighbor node sets. The I/O cost of finding them is determined by the connectivity residue ratio (CRR), i.e. how the nodes are grouped into disk pages. If a node and all of its neighbor nodes reside in the same disk page, no redundant I/O operation is required.

4.2 Quantitative tests

Lu et al. (2003) proposed two iterative algorithms that detect outliers over multiple iterations, and a non-iterative algorithm that uses the median as the neighborhood function: the iterative $r$ algorithm, the iterative $z$ algorithm and the median algorithm, respectively. The first and second algorithms compute the set of $k$ nearest neighbors for each spatial point and a neighborhood function $g$, which is the average attribute value of those neighbors. In both algorithms, to detect spatial outliers, the attribute value of each point (given by the attribute function $f$) is compared with the attribute values of its neighbors by a comparison function $h$. A point $x$ is then an outlier if $h(x)$ is the maximum value of the set $\{h(x_1), h(x_2), \ldots, h(x_n)\}$ and is large enough compared with a threshold $\theta$. Once an outlier is detected, a correction is made immediately, such as replacing the attribute value of the outlier by the average of its neighbors, to avoid normal points being labeled as outlier candidates. In the third algorithm (median), $g$ is the median of the neighbors' attribute values (the middle value of the ordered values) instead of the average. All three proposed algorithms detect true outliers more effectively than the $z$ algorithm [15], the scatterplot [16] and the Moran scatterplot [17].
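The neighborhood-difference test behind the statistic of Section 4.1 and the median variant of Section 4.2 can be sketched in a few lines. This is a non-iterative illustration under assumptions that are not in the source: ten sites on a line, neighbors taken as the sites at most two positions away, a threshold of 2, and hypothetical names throughout. The `aggregate` argument switches between the average and the median as the neighborhood function.

```python
import statistics

def spatial_z_scores(values, neighbors, aggregate=statistics.mean):
    """Standardised difference between each value and its neighbourhood aggregate."""
    diffs = [values[i] - aggregate([values[j] for j in neighbors[i]])
             for i in range(len(values))]
    mu = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)
    return [abs((d - mu) / sigma) for d in diffs]

# Attribute values at ten sites on a line; site 4 is anomalous (hypothetical data).
values = [10.0, 11.0, 10.5, 11.5, 30.0, 11.0, 10.0, 10.5, 11.0, 10.0]

# Neighbours of site i: the sites at most two positions away.
neighbors = [[j for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < len(values)]
             for i in range(len(values))]

theta = 2.0
mean_z = spatial_z_scores(values, neighbors)
median_z = spatial_z_scores(values, neighbors, aggregate=statistics.median)

mean_flags = [i for i, z in enumerate(mean_z) if z > theta]
median_flags = [i for i, z in enumerate(median_z) if z > theta]
```

With the average as neighborhood function, the sites next to the anomalous one receive inflated differences because the anomalous value enters their average; the median is less affected, which is the motivation for the median algorithm and for correcting detected outliers in the iterative algorithms.
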
The method introduced next, by Zhan et al. (2004), considers a set of multi-attribute, multi-dimensional spatial objects (arranged in a matrix), each with attributes corresponding to a two-dimensional matrix. It can accurately detect spatial outliers once the attribute correlations have been calculated with the attribute function. The method also employs a set of attribute importance values (between 0 and 9 for each attribute), which express the degree of importance of the different attributes of the objects in the data set. Considering an object $s$ and the $k$ spatial objects $s_i$ in its neighborhood, an aggregate function of the attribute correlations is proposed in order to compute the distribution value of the neighborhood connected with $s$:

$$F'_{aggr}(s) = \sum_{i=0}^{k} R\,F'(s_i) / k$$

The estimation for the multi-attribute set is:

$$V(s) = P\,\bigl(F'(s) - F'_{aggr}(s)\bigr)$$

According to the theory of multi-dimensional distributions of random functions, if $\mu$ and $\sigma^2$ are the sample mean and variance of the set of $V(s)$ values, outliers are detected on the set of values $(V(s) - \mu)/\sigma$, the standardized value of each $V(s)$. We conclude that $V(s)$ is an extreme value in the original data set if it is extreme in the standardized data set; as before, it is compared with a threshold $\theta$. To reduce the computational complexity of this algorithm, an auxiliary secondary index (the dynamic R-tree index structure) on top of the data file is used to support the query operation. The experimental tests show that the algorithm detects true outliers more effectively than the $z$ and median algorithms.

Huang et al. (2005) introduced a new density-based spatial outlier detection method with a stochastic search algorithm, named SODSS. This method avoids many neighborhood queries: unlike DBSCAN, it does not scan the database point by point to find the neighborhood of each spatial point. The algorithm divides the data set into three labeled segments: the cluster set, the candidate set and the outlier set. Unlike DBSCAN and GDBSCAN, once the algorithm has labeled the neighbors as part of a cluster, it does not examine the neighborhood of each of those neighbors. A neighborhood query can be computed in $O(\log n)$ using a suitable data structure. With the new approach the complexity of the computation decreases to $O(m \log n)$, where $m$ is related to the threshold or maximum number of neighbors and is much smaller than $n$.

5. Discussion and Conclusion

The earliest methods require the user to have knowledge about the distribution of the data set [4]. All the earliest methods (distance or density based) perform poorly as the dimension increases. To obtain better results in high dimensions, researchers have proposed methods such as [19]. The other factor to consider is the time complexity of the existing algorithms. Some algorithms, such as nested loop (NL) [6], scan the data set at least twice, which is very expensive for large data sets when the result is needed immediately. Some methods have been presented that perform better on large data sets [10] [11]. The time complexity of the known algorithms is as follows (see Table 1).
Table 1. Time complexity of existing algorithms

Algorithm          Complexity
Nested-loop [6]    $O(n^2)$
Tree indexed       $O(n \log n)$
Cell based [19]    linear in $n$, exponential in the dimension
PODMP [11]         [...], where [...] is much smaller than $\log n$

6. Acknowledgments

I wish to thank my supervisor Dr Mohd Noor Md Sap and the reviewers for their insightful comments. This work was supported by the Ministry of Science, Technology and Innovation under grant vote 79224.

7. References

[1] D. Hawkins (1980). Identification of Outliers. Chapman and Hall, London.
[2] Grubbs, F. E. (1969). Procedures for detecting outlying observations. Technometrics, 11, 1-21.
[3] Aggarwal, C. C. & Yu, P. S. (2001). Outlier Detection for High Dimensional Data. Proceedings of the ACM SIGMOD Conference 2001.
[4] Knorr, E. M. & Ng, R. T. (1998). Algorithms for Mining Distance-Based Outliers in Large Datasets. Proceedings of the VLDB Conference, 392-403, New York, USA.
[5] Ramaswamy, S., Rastogi, R. & Shim, K. (2000). Efficient Algorithms for Mining Outliers from Large Data Sets. Proceedings of the ACM SIGMOD Conference on Management of Data, Dallas, TX, 427-438.
[6] J. Han and M. Kamber. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor. Morgan Kaufmann Publishers, 550 pages, August 2000.
[7] V. Hautamaki, Ismo Karkkainen and Pasi Franti (2004). Outlier Detection Using k-nearest Neighbor Graph. Proceedings of the 17th International Conference on Pattern Recognition (ICPR 04).
[8] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. LOF: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, ACM, 2000, pp. 93-104.
[9] M. R. Brito, E. L. Chavez, A. J. Quiroz, and J. E. Yukich. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters, 35(1):33-42, August 1997.
[10] Bay, S. D. and Schwabacher, M. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., USA, 2003, pp. 29-38.
[11] D. Ren, Imad Rahal, William Perrizo (2004). A Vertical Distance-based Outlier Detection Method with Local Pruning. CIKM 04, November 8-13, 2004, Washington, DC, USA. Copyright 2004 ACM.
[12] Q. Ding, M. Khan, A. Roy, and W. Perrizo. The P-tree algebra. Proceedings of the ACM SAC, Symposium on Applied Computing, 2002.
[13] S. Shekhar, C.-T. Lu, and P. Zhang (2002). Detecting Graph-based Spatial Outliers. Intelligent Data Analysis: An International Journal, 6(5):451-468.
[14] C.-T. Lu, D. Chen, and Y. Kou (2003). Algorithms for Spatial Outlier Detection. Proceedings of the Third IEEE International Conference on Data Mining (ICDM 03), pp. 597-600.
[15] S. Shekhar, C.-T. Lu, and P. Zhang. Detecting Graph-Based Spatial Outliers: Algorithms and Applications (A Summary of Results). In Proc. of the Seventh ACM-SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Aug 2001.
[16] L. Anselin. Exploratory Spatial Data Analysis and Geographic Information Systems. In M. Painho, editor, New Tools for Spatial Analysis, pages 45-54, 1994.
[17] L. Anselin. Local Indicators of Spatial Association: LISA. Geographical Analysis, 27(2):93-115, 1995.
[18] T. Huang, X. Qin, C. Chen, and Q. Wang (2005). Density Based Spatial Outlier Detecting. Springer-Verlag Berlin Heidelberg, ICCS 2005, LNCS 3514, pp. 979-986.
[19] Aggarwal, C. C. and Yu, P. S. An effective and efficient algorithm for high-dimensional outlier detection. VLDB J., Vol. 14, No. 2, 2005, pp. 211 2