Outlier Detection Methodologies Overview


Mohd. Noor Md. Sap
Department of Computer and Information Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia

Ehsan Mohebi
Department of Computer and Information Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia

Abstract

The outlier detection problem is an important issue in many safety-critical environments. Outliers arise due to mechanical faults, changes in system behavior, fraudulent behavior, human error, instrument error, or simply through natural deviations in populations. The most popular outlier detection methods suggested so far are density-based and distribution-based methods that employ a metric equation to identify outliers. Other methods apply neural network methodologies to keep track of outliers. In this paper we compare recent outlier detection techniques and consider the strengths and weaknesses of each approach separately.

Keywords: Outliers, Statistics, Spatial data, K-NN.

1. Introduction

Outliers can be defined as in [1]: "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism." Statistical approaches were the earliest algorithms used for outlier detection; they are suited to quantitative real-valued data sets or, at the very least, quantitative ordinal data distributions. One of the earliest outlier detection methods was suggested by [2]. It calculates a Z value as the difference between the attribute value and the mean value for the attribute, divided by the standard deviation, where the mean and standard deviation are calculated from all attribute values.

A common criterion used for outlier detection is the K-Nearest Neighbor (K-NN) algorithm. In this case, to find outliers, the neighbors of every point must be calculated, with complexity $O(mn^2)$, where $m$ is the dimension and $n$ is the number of points.
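The Z-value test of [2] described in this introduction can be illustrated with a minimal sketch. The function names and the cut-off value are assumptions made for this example and are not taken from the paper; the population standard deviation is used, which is also an assumption.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def z_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z value exceeds the threshold.

    The threshold is a hypothetical user-chosen cut-off.
    """
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]
```

Because the mean and standard deviation are computed from all values, a single extreme value inflates both, so on small samples a lower threshold may be needed for the test to fire at all.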
The K-NN approach is expensive for large and high-dimensional data sets. [3], [4] and [5] have proposed new methods to overcome this issue. All K-NN methods use a distance metric such as the Euclidean or Mahalanobis distance to measure the distance between points. The latter is expensive because it requires the correlation matrix of all the related point records.

One of the most popular and widely studied clustering methods for objects in Euclidean space is the k-means clustering algorithm [6]. K-means requires the user to specify the number of clusters k, and it provides a local model of the data. The algorithm represents each of the k clusters by a prototype vector whose attribute values are the mean values across all points in the cluster. It updates the cluster centers to reflect each new instance.

Section 2 of this paper discusses distance-based outlier detection methods. Density-based methods are presented in Section 3. Section 4 presents two types of spatial outlier detection methodology. The last part of the paper presents the discussion and conclusion with respect to time complexity.

2. Distance Based

Knorr and Ng (1998) presented an efficient K-NN algorithm. It is efficient because it does not calculate all k neighbors of a point: only m < k neighbors are determined. It is therefore not sensitive to computational growth, meaning that on large data sets the method results in an acceptable time complexity. An outlier is defined as follows: if fewer than the required number of neighbors lie inside the distance threshold, the instance is an outlier. The shortcoming is that the user must define both parameters, the neighbor count and the distance threshold, in advance. A poor choice may label normal points as false outliers and vice versa.

Ramaswamy (2000) introduced an optimized K-NN outlier detection method, which produces a list of potential outliers. The optimization combines K-NN with partitioning the data into cells. Ramaswamy used nested-loop, index-based and partition-based algorithms to find outliers.
In Ramaswamy's method an outlier is defined as follows: a point p is an outlier if no more than (n-1) points in the data set have a higher $D^m$ (the distance to the m-th nearest neighbor), where m is user defined. Its complexity does not scale well with computational growth, because all K-NN distances must be calculated. The time complexity of the three algorithms has been compared (see Figure 1), where N is the number of instances. A drawback of the method proposed by Ramaswamy is that the user has to know in advance how many outliers there are in the data set [7]. In some cases only a single point has a neighbor distance so large that it clearly lies in sparse space and is obviously detected as an outlier.
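The ranking definition above (the n points with the largest distance to their m-th nearest neighbor) can be sketched with a plain nested loop. This is only a brute-force illustration under assumed names (`kth_neighbor_distance`, `top_n_outliers`); it is not the index-based or partition-based algorithm of [5], and it uses Euclidean distance as an assumption.

```python
import math


def kth_neighbor_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor (k >= 1)."""
    dists = sorted(
        math.dist(points[index], q)
        for j, q in enumerate(points)
        if j != index
    )
    return dists[k - 1]


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-th neighbor distance.

    Nested-loop version: every point is compared with every other point,
    which is why the cost grows quadratically with the number of points.
    """
    scores = [(kth_neighbor_distance(points, i, k), i) for i in range(len(points))]
    scores.sort(reverse=True)
    return [i for _, i in scores[:n]]
```

The sketch makes the drawback noted above visible: `n` must be supplied in advance, so the caller has to guess how many outliers the data set contains.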

Figure 1. Performance results for N [5]

3. Density Based

To achieve better results in finding interesting outliers and to overcome some shortcomings of distance-based methods (such as capturing only global outliers), density-based methods have been proposed. M. Breunig et al. (2000) introduced the concept of local density outliers and a measure, LOF (Local Outlier Factor), which captures the degree of outlier-ness of every object in the data set in order to pick up local outliers.

Aggarwal and Yu (2001) use lower-dimensional projections of the data set and focus on key attributes. They use an evolutionary search algorithm and a brute-force algorithm, which examine the candidate lower-dimensional projections and retain the projections with the most negative sparsity coefficient, and then use the search algorithm to find the outliers. In this method all points within the same cell are regarded together as either normal objects or outliers. The method therefore has the drawback that normal objects may sometimes be detected as outliers, and vice versa.

G. Kollios et al. (2003) proposed a density-based sampling method to detect outliers. A kernel density estimator is built from randomly sampled points to approximate the density of the data set. The estimator can be used to estimate the probability that each data point belongs to the data set. For each object x, a function is defined as the number of objects in the data set whose distance from x is at most a given radius. An object is reported as an outlier only if this count does not exceed a given threshold. The proposed algorithm takes one pass over the data set to compute the density estimator function. Since each object in the data set has to be read once in order to compute the value of the neighbor-count function, one further full data set scan is needed. The complexity of this step depends on the number of samples used to construct the density estimator. One drawback of this method is that a large number of samples improves the accuracy but increases the running time.
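The sampling idea above can be sketched as follows. This is a simplified illustration, not the algorithm of Kollios et al.: it assumes a Gaussian kernel with a fixed, user-chosen bandwidth, and it ranks points directly by estimated density instead of estimating neighbor counts. All names (`build_density_estimator`, `lowest_density`) are hypothetical.

```python
import math
import random


def build_density_estimator(points, sample_size, bandwidth, seed=0):
    """Build a Gaussian kernel density estimator from a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    dim = len(points[0])
    # Normalising constant of a product Gaussian kernel in `dim` dimensions.
    norm = len(sample) * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(x):
        total = 0.0
        for s in sample:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm

    return density


def lowest_density(points, density, n):
    """Indices of the n points with the lowest estimated density."""
    ranked = sorted(range(len(points)), key=lambda i: density(points[i]))
    return ranked[:n]
```

Each density evaluation touches every sampled point, so a larger sample gives a smoother estimate at a proportionally higher cost per point, which is the trade-off described above.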
How well a kernel density estimator works in high-dimensional space has not been fully explored, but it seems to be less accurate. We now discuss a different density estimation strategy, together with work that overcomes some shortcomings of Brito's method [7].

Brito et al. (1997) proposed a Mutual k-Nearest Neighbor (MkNN) graph based approach. An MkNN graph is a graph in which an edge exists between two vectors if each belongs to the other's k-neighborhood. The MkNN graph is undirected and is a special case of the k-Nearest Neighbor (kNN) graph, in which every node has pointers to its k nearest neighbors. Each connected component is considered a cluster if it contains more than one vector, and an outlier if it contains only one vector. A potential problem with Brito's definition is that an outlier that is too close to an inlier can be misclassified [7].

To improve on Brito's method, Hautamaki et al. (2004) proposed Outlier Detection using In-degree Number (ODIN), an algorithm that utilizes the k-nearest neighbor graph. In this method an outlier is defined as follows: given the kNN graph of a data set, an outlier is a vertex whose in-degree is less than or equal to a threshold. The threshold is a variant of Ramaswamy's definition, i.e. it is measured from the maximum kNN distances. Experimental results show that ODIN performs well and produces a lower error rate on synthetic data sets in comparison with the methods of Brito and Ramaswamy.

Bay and Schwabacher (2003) proposed an approach that can detect outliers in running time near linear in the data set size. The method is an optimized version of the nested-loop algorithm that makes use of randomization and a simple pruning rule. The data set is randomized and divided into small blocks, and the blocks are handled one by one. For the first block, each object is compared with every object in the whole data set in order to compute its score (the distance to its nearest neighbor).
According to these scores, the top outliers in the first block can be decided, and the score of the weakest of these outliers is used as a cut-off for the second block. As more blocks are processed, more extreme outliers are found and a larger cut-off can be used for the next block. As a result, pruning becomes more efficient after each iteration. Randomizing the whole data set is essential for this method: performance can be very poor if the data set is sorted, or if objects clustered together in space also appear together in the data set file. The main shortcoming is that the method needs to scan the whole data set once per block. When the whole data set cannot fit in main memory, expensive disk scans can result in very poor performance. Even though the worst-case complexity is still quadratic, the experimental results show that this method can achieve near linear running time.

One shortcoming of the method proposed by Knorr et al. is that it does not perform well on very large or high-dimensional data sets. To overcome this disadvantage, D. Ren et al. (2004) improved Knorr's method by processing a vertical data structure instead of the traditional horizontal structure. The neighborhood of a data point with a given radius is defined as the set of points of the data set that lie within that radius of the point.

Outliers are then defined from the number of points in this neighborhood. They proposed a vertical by-neighbor outlier detection method with local pruning (PODMP, a P-Tree-based outlier detection method using pruning), which detects outliers efficiently and scales well on large data sets. The vertical method works as follows. First, the data set to be mined is represented as a set of P-Trees. Second, one point in the data set is selected arbitrarily; its neighbors are then searched using the fast computation of an inequality P-Tree, and the neighbors are represented with an inequality P-Tree, called a neighborhood P-Tree. In the neighborhood P-Tree, 1 means that a point is a neighbor of the selected point, while 0 means that it is not. Third, the number of points in the neighborhood is calculated efficiently by extracting values from the root node of the neighborhood P-Tree [12]. They compared the running time of their method with nested loop (NL) (see Figure 2).

Figure 2. Comparison of scalability of NL, PODM, and PODMP [10]

In conclusion, both of the distance-based definitions can only capture global outliers, because they take a global view of the data set. For a data set with simple structure, for example one that contains one or more clusters of similar density, these two definitions work well. However, for many real-world data sets with complex structure, methods based on these two definitions may not be able to find interesting outliers.

4. Spatial Outlier Detection

Spatial outliers are spatial objects whose non-spatial attribute values are significantly different from the values in their neighborhoods. Spatial outlier detection methods in the spatial statistics literature can be grouped into two categories: graphical approaches and quantitative tests.

4.1 Graphical approach

In graph-based spatial outlier detection the main idea is based on graph connectivity [13]. For spatial outlier detection methods, the choice of statistic is important and depends on the kind of data considered. The proposed statistic is $S(x) = f(x) - E_{y \in N(x)}(f(y))$, where $f(x)$ is the attribute function, $N(x)$ is the fixed set of neighbors of $x$, and $E_{y \in N(x)}(f(y))$ is the average attribute value over the neighbors of $x$. Thus $S(x)$ denotes the difference between the attribute value of each node and the average over its neighbors. A node is detected as an outlier when $|(S(x) - \mu_s)/\sigma_s| > \theta$, where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of all $S(x)$ and $\theta$ is a threshold. The most costly part of the algorithm is finding the set of neighbor nodes. The I/O cost of finding the neighbor node set is determined by the connectivity residue ratio (CRR), i.e. how the nodes are grouped into disk pages. If a node and all of its neighbor nodes reside in the same disk page, no redundant I/O operation is required.

4.2 Quantitative tests

Chang et al. (2003) proposed two iterative algorithms that detect outliers over multiple iterations, and a non-iterative algorithm that uses the median as the neighborhood function: the iterative r algorithm, the iterative z algorithm and the median algorithm, respectively. The first and second algorithms compute the set of nearest neighbors of each spatial point and a neighborhood function, which is the average attribute value over those neighbors. In both algorithms, to detect spatial outliers, the attribute value of each point is compared with the attribute values of its neighborhood by a comparison function. A point is then an outlier if its comparison value is the maximum over all points and is large enough compared with a threshold. Once an outlier is detected, corrections are made immediately, such as replacing the attribute value of the outlier with the average of its neighbors, to avoid normal points being labeled as outlier candidates. In the third algorithm (median), the neighborhood function is the median of the neighbors' attribute values instead of the average. All three proposed algorithms detect true outliers more efficiently than the z algorithm [15], Scatterplot [16] and Moran Scatterplot [17].
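The neighborhood-difference test used by the graph-based approach can be sketched as below. The sketch assumes the neighbor sets are already available as a dictionary and that location identifiers are sortable; the function name `spatial_outliers` and the default threshold are hypothetical, and none of the I/O or iterative-correction aspects discussed above are modeled.

```python
import statistics


def spatial_outliers(attr, neighbors, theta=2.0):
    """Flag locations whose attribute differs strongly from their neighbors.

    attr:      mapping location -> non-spatial attribute value
    neighbors: mapping location -> list of neighboring locations
    theta:     assumed cut-off on the standardized difference
    """
    # Difference between each value and the average over its neighbors.
    diffs = {
        x: attr[x] - statistics.fmean(attr[y] for y in neighbors[x])
        for x in attr
    }
    values = list(diffs.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # every location matches its neighborhood equally well
    return sorted(x for x, s in diffs.items() if abs((s - mu) / sigma) > theta)
```

Note that a single extreme value also shifts the differences of its neighbors (with the opposite sign), which is the effect the iterative algorithms counter by correcting a detected outlier before the next pass.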
The method introduced next, by Zhan et al. (2004), considers a set of multi-attribute, multi-dimensional spatial objects, each with its attributes stored in a two-dimensional matrix, and can accurately detect spatial outliers once the attribute correlations have been calculated with the attribute function. The method also employs a set of attribute importance values (each between 0 and 9), which express the degree of importance of the different attributes of the objects in the data set. For an object $s$ with $k$ spatial objects in its neighborhood, an aggregate function of attribute correlations is proposed to compute the distribution value of the neighborhood connected with $s$:

$F'_{aggr}(s) = \frac{1}{k} \sum_{i=0}^{k} R\,F'(s_i)$

The estimate for the multi-attribute set is:

$V(s) = P\,(F'(s) - F'_{aggr}(s))$

According to the theory of multi-dimensional distributions of random functions, the sample mean and variance of the set of $V(s)$ values are used to standardize each value, and outliers are detected on the standardized values. We therefore conclude that a value is extreme in the original data set if it is extreme in the standardized data set; as before, it is compared to a threshold. To improve the computational complexity of this last algorithm, an auxiliary secondary index (the dynamic R-tree index structure) on top of the data file is used to support the query operation. The experimental tests show that the algorithm detects true outliers more efficiently than the z and median algorithms.

Huang et al. (2005) introduced a new density-based spatial outlier detection method with a stochastic search algorithm, named SODSS. This method eliminates many neighborhood queries: unlike DBSCAN, it does not scan the database point by point to find the neighborhood of each spatial point. The algorithm divides the data set into three labeled segments: cluster set, candidate set and outlier set. Unlike DBSCAN and GDBSCAN, once the algorithm has labeled the neighbors as part of a cluster, it does not examine the neighborhood of each of those neighbors. A neighborhood query can be computed in logarithmic time using a suitable data structure. With the new approach the computational complexity decreases: the new bound depends on a threshold, the maximum number of neighbors, which is much smaller than the quantity it replaces.

5. Discussion and Conclusion

The earliest methods require the user to have knowledge about the distribution of the data sets [4]. All the earliest methods (distance or density based) perform poorly as the dimension increases, and researchers such as [19] have worked to obtain better results. Another factor to consider is the time complexity of the existing algorithms. Some algorithms, such as nested loop (NL) [6], scan the data set at least twice, which is very expensive for large data sets when the result is needed immediately. Some methods have been presented that perform better on large data sets [10] [11]. The time complexity of the known algorithms is as follows (see Table 1).
Table 1. Time complexity of existing algorithms

Algorithm          Complexity
Nested-loop [6]    ...
Tree Indexed       ... log ...
Cell Based [19]    linear in the number of points, exponential in the dimension
PODMP [11]         ..., where ... is much smaller than ... log ...

6. Acknowledgments

I wish to thank my supervisor Dr Mohd Noor Md Sap and the reviewers for their insightful comments. This work was supported by a Ministry of Science, Technology and Innovation grant vote.

References

[1] Hawkins (1980). Identification of Outliers. Chapman and Hall, London.
[2] Grubbs, F. E. (1969). Procedures for detecting outlying observations. Technometrics, 11.
[3] Aggarwal, C. C. & Yu, P. S. (2001). Outlier Detection for High Dimensional Data. Proceedings of the ACM SIGMOD Conference.
[4] Knorr, E. M. & Ng, R. T. (1998). Algorithms for Mining Distance-Based Outliers in Large Datasets. Proceedings of the VLDB Conference, New York, USA.
[5] Ramaswamy, S., Rastogi, R. & Shim, K. (2000). Efficient Algorithms for Mining Outliers from Large Data Sets. Proceedings of the ACM SIGMOD Conference on Management of Data, Dallas, TX.
[6] Han and M. Kamber. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor. Morgan Kaufmann Publishers, 550 pages, August.
[7] Hautamaki, Ismo Karkkainen and Pasi Franti (2004). Outlier Detection Using k-Nearest Neighbor Graph. Proceedings of the 17th International Conference on Pattern Recognition (ICPR 04).
[8] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. LOF: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, ACM, 2000.
[9] Brito, E. L. Chavez, A. J. Quiroz, and J. E. Yukich. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters, 35(1):33-42, August.
[10] Bay, S. D. and Schwabacher, M. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., USA, 2003.
[11] Ren, Imad Rahal, William Perrizo (2004). A Vertical Distance-based Outlier Detection Method with Local Pruning. CIKM 04, November 2004, Washington, DC, USA. Copyright 2004 ACM.
[12] Ding, M. Khan, A. Roy, and W. Perrizo. The P-tree algebra. Proceedings of the ACM SAC, Symposium on Applied Computing.
[13] Shekhar, Ch. T. Lu, and P. Zhang (2002). Detecting Graph-based Spatial Outliers. Intelligent Data Analysis: An International Journal, 6(5).
[14] Chang-Lu, D. Cheng, and Y. Kou (2003). Algorithms for Spatial Outlier Detection. Proceedings of the Third IEEE International Conference on Data Mining (ICDM 03).
[15] Shekhar, C.-T. Lu, and P. Zhang. Detecting Graph-Based Spatial Outlier: Algorithms and Applications (A Summary of Results). In Proc. of the Seventh ACM-SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Aug.
[16] A. Luc. Exploratory Spatial Data Analysis and Geographic Information Systems. In M. Painho, editor, New Tools for Spatial Analysis, pages 45-54.
[17] A. Luc. Local Indicators of Spatial Association: LISA. Geographical Analysis, 27(2):93-115.
[18] Huang, X. Qin, C. Chen, and Q. Wang (2005). Density Based Spatial Outlier Detecting. Springer-Verlag Berlin Heidelberg, ICCS 2005, LNCS 3514.
[19] Aggarwal, C. C. and Yu, P. S. An effective and efficient algorithm for high-dimensional outlier detection. VLDB J., Vol. 14, No. 2, 2005.


More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning A Post Randomzaton Framework for Prvacy-Preservng Bayesan Network Parameter Learnng JIANJIE MA K.SIVAKUMAR School Electrcal Engneerng and Computer Scence, Washngton State Unversty Pullman, WA. 9964-75

More information

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm 01 Internatonal Conference on Image, Vson and Computng (ICIVC 01) IPCSIT vol. 50 (01) (01) IACSIT Press, Sngapore DOI: 10.776/IPCSIT.01.V50.4 Vectorzaton of Image Outlnes Usng Ratonal Splne and Genetc

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 2 Sofa 2016 Prnt ISSN: 1311-9702; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-2016-0017 Hybrdzaton of Expectaton-Maxmzaton

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

STING : A Statistical Information Grid Approach to Spatial Data Mining

STING : A Statistical Information Grid Approach to Spatial Data Mining STING : A Statstcal Informaton Grd Approach to Spatal Data Mnng We Wang, Jong Yang, and Rchard Muntz Department of Computer Scence Unversty of Calforna, Los Angeles {wewang, jyang, muntz}@cs.ucla.edu February

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2 Pa. J. Statst. 5 Vol. 3(4), 353-36 A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN Sajjad Ahmad Khan, Hameed Al, Sadaf Manzoor and Alamgr Department of Statstcs, Islama College,

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

A NEW LINEAR APPROXIMATE CLUSTERING ALGORITHM BASED UPON SAMPLING WITH PROBABILITY DISTRIBUTING

A NEW LINEAR APPROXIMATE CLUSTERING ALGORITHM BASED UPON SAMPLING WITH PROBABILITY DISTRIBUTING A NEW LINEAR APPROXIMATE CLUSTERING ALGORITHM BASED UPON SAMPLING WITH PROBABILITY DISTRIBUTING CHANG-AN YUAN,, CHANG-JIE TANG, CHUAN LI, JIAN-JUN HU, JING PENG College of Computer, Schuan unversty, Chengdu,

More information

Research Article. A Novel Spectral Clustering and its Application in Image Processing. Gu Ruijun*, Chen Shenglei and Wang Jiacai

Research Article. A Novel Spectral Clustering and its Application in Image Processing. Gu Ruijun*, Chen Shenglei and Wang Jiacai Jestr Journal of Engneerng Scence and echnology Revew 6 (3 (03 0-5 Research Artcle JOURNAL OF Engneerng Scence and echnology Revew www.jestr.org A Novel Spectral Clusterng and ts Applcaton n Image Processng

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Analyzing Popular Clustering Algorithms from Different Viewpoints

Analyzing Popular Clustering Algorithms from Different Viewpoints 1000-9825/2002/13(08)1382-13 2002 Journal of Software Vol.13, No.8 Analyzng Popular Clusterng Algorthms from Dfferent Vewponts QIAN We-nng, ZHOU Ao-yng (Department of Computer Scence, Fudan Unversty, Shangha

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks 2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search

Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search Can We Beat the Prefx Flterng? An Adaptve Framework for Smlarty Jon and Search Jannan Wang Guolang L Janhua Feng Department of Computer Scence and Technology, Tsnghua Natonal Laboratory for Informaton

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

A Simple and Efficient Goal Programming Model for Computing of Fuzzy Linear Regression Parameters with Considering Outliers

A Simple and Efficient Goal Programming Model for Computing of Fuzzy Linear Regression Parameters with Considering Outliers 62626262621 Journal of Uncertan Systems Vol.5, No.1, pp.62-71, 211 Onlne at: www.us.org.u A Smple and Effcent Goal Programmng Model for Computng of Fuzzy Lnear Regresson Parameters wth Consderng Outlers

More information

Outlier Detection based on Robust Parameter Estimates

Outlier Detection based on Robust Parameter Estimates Outler Detecton based on Robust Parameter Estmates Nor Azlda Aleng 1, Ny Ny Nang, Norzan Mohamed 3 and Kasyp Mokhtar 4 1,3 School of Informatcs and Appled Mathematcs, Unverst Malaysa Terengganu, 1030 Kuala

More information

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Clustering Algorithm of Similarity Segmentation based on Point Sorting Internatonal onference on Logstcs Engneerng, Management and omputer Scence (LEMS 2015) lusterng Algorthm of Smlarty Segmentaton based on Pont Sortng Hanbng L, Yan Wang*, Lan Huang, Mngda L, Yng Sun, Hanyuan

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Sensors & Transducers 2015 by IFSA Publishing, S. L.

Sensors & Transducers 2015 by IFSA Publishing, S. L. Sensors & Transducers, Vol. 89, Issue 6, June 05, pp. 97-06 Sensors & Transducers 05 by IFSA Publshng, S. L. http://www.sensorsportal.com SGR: A New Effcent Kernel for Outler Detecton n Sensor Data Mnmzng

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Estimating Costs of Path Expression Evaluation in Distributed Object Databases

Estimating Costs of Path Expression Evaluation in Distributed Object Databases Estmatng Costs of Path Expresson Evaluaton n Dstrbuted Obect Databases Gabrela Ruberg, Fernanda Baão, and Marta Mattoso Department of Computer Scence COPPE/UFRJ P.O.Box 685, Ro de Janero, RJ, 2945-970

More information

Maximum Variance Combined with Adaptive Genetic Algorithm for Infrared Image Segmentation

Maximum Variance Combined with Adaptive Genetic Algorithm for Infrared Image Segmentation Internatonal Conference on Logstcs Engneerng, Management and Computer Scence (LEMCS 5) Maxmum Varance Combned wth Adaptve Genetc Algorthm for Infrared Image Segmentaton Huxuan Fu College of Automaton Harbn

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Wireless Sensor Network Localization Research

Wireless Sensor Network Localization Research Sensors & Transducers 014 by IFSA Publshng, S L http://wwwsensorsportalcom Wreless Sensor Network Localzaton Research Lang Xn School of Informaton Scence and Engneerng, Hunan Internatonal Economcs Unversty,

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information