MARKOV MODEL BASED TIME SERIES SIMILARITY MEASURING
Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2-5 November 2003

YUN-TAO QIAN, SEN JIA, WEN-WU SI
College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Abstract: Similarity or distance measures between two time series play an important role in the analysis and retrieval of time series databases, a fundamental problem in time series data mining. Mathematical models are widely used to represent time series, but few papers discuss them in the context of time series similarity. In this paper, we propose a Markov model based technique for similarity/distance measures of variable-length time sequences. The state space of the Markov model is partitioned by a hierarchical clustering method, and the state-transition information is used to represent a time series. The similarity/distance measures of time sequences can be defined as various functions of the difference between their state-transition information, and some widely used distance measures can be considered special cases of ours. In addition, the modeling procedure uses the vector sequence in reconstructed phase space instead of the original time sequence, which reflects the dynamical properties of the time series more effectively. Experimental results show that the method works well under strong noise, and its flexible definition makes it versatile across applications.

Keywords: Similarity measure; Markov model; Hierarchical clustering; Phase space reconstruction; Time series data mining

1 Introduction

Similarity or distance measuring of time series is a fundamental problem in time series data mining (TSDM). It is widely applied in speech recognition, retrieval of time series databases, trajectory analysis, rule extraction from time series, and clustering, classification and prediction of time series [1,2,3].
Many similarity measures have been developed for various applications [4]. Most similarity measures are derived directly from the original time series, and few are based on a model of the time series [5,6]. However, model-based time series analysis has a solid mathematical foundation and has proven effective in many applications. A model of a time series provides inherent information about the structure and parameters of the series, which can alleviate the effect of noise, outliers and other external factors. But model structure and parameter estimation are complicated and time-consuming, which impedes their application to time series similarity measuring, because time series databases are generally very large.

In this paper, a novel Markov model based time series similarity measuring method is proposed, in which hierarchical clustering is used to partition the state space, adaptively simplifying the model by a coarse-to-fine scheme, and phase space reconstruction is used to deal effectively with nonlinear time series. Moreover, various time-dependent and time-independent state transitions are defined to build different similarity measures that suit the corresponding applications. Many popular time series similarity measures can be considered special cases of ours from a certain point of view. Experimental results show the power and efficiency of our approach.

The rest of the paper is organized as follows. Section 2 surveys related work and background on time series similarity measuring. Section 3 summarizes our contributions to time series similarity and gives details of the model-based similarity method. An experimental evaluation of our similarity measures is given in Section 4. Finally, the proposed algorithm is summarized and the conclusions of our work are given in Section 5.

2 Summary of relevant research

Defining the similarity between two time series is at the heart of most time series data mining tasks.
The true meaning of similarity is open to interpretation, and time series may have different sampling rates and noisy or uncertain values. Similarity is therefore hard to define for time series, and all existing definitions are pragmatic. We briefly review two popular similarity measures, the Euclidean metric and dynamic time warping (DTW). Let Q and C be two time series of lengths n and m, where

$$Q = (q_1, q_2, \ldots, q_i, \ldots, q_n), \qquad C = (c_1, c_2, \ldots, c_j, \ldots, c_m).$$

In this paper, we only discuss similarity based on whole matching, because similarity based on subsequence matching can be derived from whole matching by a sliding-window technique [7].

Definition 1 (Minkowski distance): For n = m,

$$D(Q, C) = \left( \sum_{i=1}^{n} |q_i - c_i|^p \right)^{1/p} \tag{1}$$

If p = 1, it is the Manhattan distance; if p = 2, it is the Euclidean distance; and if p = ∞, it is the maximum distance.

To eliminate some distortions in the data, the Euclidean distance measure needs preprocessing procedures, including offset translation, amplitude scaling, linear trend removal, and noise removal. However, the Euclidean distance cannot deal with time series that have different sampling rates. The DTW method was proposed to solve this problem [8].

Definition 2 (Dynamic Time Warping): A warping path $W = (w_1, w_2, \ldots, w_k, \ldots, w_K)$ is a contiguous set of elements of the $n \times m$ distance matrix $D$ that defines a mapping between Q and C. It must satisfy the following requirements:
1) Boundary conditions: $w_1 = (1, 1)$, $w_K = (n, m)$.
2) Continuity conditions: if $w_k = (a, b)$ and $w_{k-1} = (a', b')$, then $a - a' \le 1$ and $b - b' \le 1$.
3) Monotony conditions: $a - a' \ge 0$ and $b - b' \ge 0$.

The time and space complexities of DTW are very high. Even though some fast algorithms have been developed, DTW is difficult to use in large time series databases.

Besides the above two similarity measures, many others have been proposed, each reflecting a different notion of similarity. In addition, to speed up similarity measuring and indexing, dimensionality reduction techniques are widely used, in which similarity is measured in a reduced space instead of the original space. Popular dimensionality reduction methods such as time-frequency transformation, singular value decomposition, piecewise linear approximation, and symbolic approximation have been studied in depth for time series [9,10].
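The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the DTW sketch assumes an absolute-difference local cost and a plain sum of costs along the path, neither of which is specified in the text.

```python
import math


def minkowski(q, c, p=2.0):
    """Minkowski distance of Definition 1 for two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("Minkowski distance needs equal-length sequences")
    diffs = [abs(a - b) for a, b in zip(q, c)]
    if math.isinf(p):
        return max(diffs)  # maximum distance (p = infinity)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def dtw(q, c):
    """Cost of the cheapest warping path between q and c.

    Assumption: local cost |q_i - c_j|, path cost = sum of local costs.
    The three allowed predecessors encode the continuity and monotony
    conditions; starting at (1, 1) and ending at (n, m) encodes the
    boundary conditions.
    """
    n, m = len(q), len(c)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - c[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

The table `acc` has (n+1)(m+1) entries, which makes the quadratic time and space cost mentioned above concrete. Unlike `minkowski`, `dtw` accepts sequences of different lengths.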
The evaluation of a similarity measure depends mostly on the user's opinion; therefore, machine-learning-based weighted similarity measures have been proposed to improve the quality of similarity with feedback information.

Model-based methods play an important role in time series data mining, and various models such as linear and nonlinear sequence models have been studied in depth. Among them, the Markov model or hidden Markov model (HMM) is a good choice in many cases. Assume a set of states $\{s_1, s_2, \ldots, s_m\}$ and an output chain $\{x_n, n = 1, 2, \ldots, N\}$. The random sequence is a Markov sequence if it has the following property:

$$P(x_{n+1} = s_j \mid x_n = s_i, x_{n-1} = s_k, \ldots, x_1 = s_l) = P(x_{n+1} = s_j \mid x_n = s_i) \tag{2}$$

A Markov model is characterized by an initial distribution $\pi$ and a state-transition probability matrix $A$ with

$$a_{ij} = P(x_{n+1} = s_j \mid x_n = s_i) \tag{3}$$

In practice, a Markov sequence is always corrupted by noise in the observation process, which motivates the HMM. Assume there is a Markov sequence $\{x_n, n = 1, 2, \ldots, N\}$ and its observation $\{y_n, n = 1, 2, \ldots, N\}$. If an observation is generated by adding Gaussian white noise to a Markov sequence, its density function is

$$p(y_n \mid x_n = s_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left( -\frac{(y_n - \mu_j)^2}{2\sigma_j^2} \right) \tag{4}$$

Therefore, an HMM is characterized by $\pi$, $A$, $\mu$, and $\sigma$.

Since similarity measuring is typically applied to large time series databases, the algorithm must be simple and fast. But parameter estimation for a Markov model or HMM is very complicated, so these models have seldom been used in this field so far [5,6].

3 Model-based time series similarity

In this paper, we propose a novel model-based method for time series similarity measuring, whose main features are phase space reconstruction and hierarchical state space partition.
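To make the Markov and HMM definitions of Section 2 concrete, the following sketch samples a state sequence from an initial distribution and a transition matrix, then adds Gaussian white noise to obtain observations. It is an illustration under stated assumptions: the function names are hypothetical, states are represented by integer indices, and a single noise level `sigma` is shared by all states.

```python
import random


def sample_markov_chain(pi, A, length, rng):
    """Draw state indices x_1..x_N from initial distribution pi and matrix A.

    A[i][j] plays the role of a_ij = P(x_{n+1} = s_j | x_n = s_i).
    """
    states = list(range(len(pi)))
    x = [rng.choices(states, weights=pi)[0]]
    for _ in range(length - 1):
        # The next state depends only on the current one (Markov property).
        x.append(rng.choices(states, weights=A[x[-1]])[0])
    return x


def observe(x, mu, sigma, rng):
    """Noisy observations y_n drawn from N(mu[x_n], sigma^2).

    Assumption: one shared sigma for all states.
    """
    return [rng.gauss(mu[s], sigma) for s in x]


rng = random.Random(0)
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
x = sample_markov_chain(pi, A, 20, rng)
y = observe(x, mu=[0.0, 5.0], sigma=0.3, rng=rng)
```

Going the other way, from `y` back to `pi`, `A`, `mu` and `sigma`, is the parameter estimation problem that the text describes as too costly for large databases.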
The theory of dynamical systems is becoming more and more important in time series analysis, especially for nonlinear series. In dynamical systems theory, the time evolution of a sequence is defined in some phase space, i.e. the dynamics of a time sequence can be obtained by studying the dynamics of the corresponding phase space points. In practice, a scalar sequence of measurements is the only information that we can observe. We therefore have to convert the observations into state vectors in phase space. According to Takens' theorem, phase space reconstruction is technically solved by the method of delays [11]. Let $X = (x_1, x_2, \ldots, x_n, \ldots, x_N)$ be a time sequence, in which $x_n = x(n\Delta t)$. Its delay reconstruction in m dimensions is formed by the vectors

$$Y_n = (x_{n-(m-1)\tau}, x_{n-(m-2)\tau}, \ldots, x_{n-\tau}, x_n) \tag{5}$$

where $\tau$ is the lag or delay time, m is the embedding dimension, and $m\tau$ is the embedding time length. Finding a good embedding is a very difficult theoretical problem, and so far there is no clear solution to it. However, some semi-theoretical, semi-empirical methods have been presented to compute m and $\tau$.

A Markov model is used for nondeterministic systems, in which the future state is selected randomly according to the state-transition probabilities. A deterministic system can also be regarded as a limiting case of a Markov model. Since a Markov model in phase space is a general way to represent various time series correctly, it can be used for computing the similarity of time series.

The complexity of a discrete Markov model of a time sequence depends mainly on the number of states. Uniform partition of the state space for generating discrete states is frequently used in practice, and also frequently criticized because it ignores the distribution of points in the state space. Therefore, a hierarchical clustering method is used to adaptively partition the state space from coarse to fine.
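The delay reconstruction of Eq. (5) can be sketched as follows. The function name `delay_embed` is hypothetical, and the sketch simply starts at the first index for which all m delayed samples exist; how to choose m and the delay is outside its scope.

```python
def delay_embed(x, m, tau):
    """Delay vectors (x[n-(m-1)tau], ..., x[n-tau], x[n]) as in Eq. (5).

    x   : scalar sequence
    m   : embedding dimension
    tau : delay, in samples
    Returns one m-tuple per index n for which all delayed samples exist.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    start = (m - 1) * tau  # first index with a complete delay vector
    return [
        tuple(x[n - (m - 1 - k) * tau] for k in range(m))
        for n in range(start, len(x))
    ]


# A sequence of length 6 embedded with m = 3, tau = 2 yields two vectors.
vectors = delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2)
```

A sequence of length N produces N - (m - 1) * tau vectors, so the embedded sequence is slightly shorter than the original.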
A hierarchical algorithm yields a dendrogram representing the nested clusters, built by an agglomerative or divisive scheme. In agglomerative hierarchical clustering, two clusters are merged into a new cluster if they have the minimal distance (or maximal similarity) among all pairs of clusters. The definition of the distance between two clusters is therefore the core of an agglomerative algorithm. Most hierarchical clustering algorithms are variants of the minimum, maximum, mean, and average distance based algorithms, and these four distances are defined as

$$d_{min}(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} |x - y| \tag{6}$$

$$d_{max}(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} |x - y| \tag{7}$$

$$d_{mean}(C_i, C_j) = |m_i - m_j| \tag{8}$$

$$d_{avg}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{x \in C_i} \sum_{y \in C_j} |x - y| \tag{9}$$

where $m_i$ is the mean of cluster $C_i$ and $n_i$ is the number of points in cluster $C_i$.

Here mean distance based hierarchical clustering is used, and its procedure can be summarized as follows.
1) Each state point in the state space is defined as a sub-cluster, and all these sub-clusters form the initial clustering result.
2) Find the two closest sub-clusters, i.e. those with minimal mean distance, and merge them into a new sub-cluster.
3) Repeat step 2 until the required number of sub-clusters is reached or the mean distance between every pair of sub-clusters is larger than a preset threshold.
4) Examine all sub-clusters and eliminate small sub-clusters whose number of state points is less than a threshold.

After the hierarchical clustering procedure, the phase space is partitioned into non-overlapping subspaces, and each vector point in phase space has a label that marks which subspace it belongs to. If the required precision of similarity is high, the number of subspaces (sub-clusters) is set to a large value; if the required precision is lower, it is set to a small value. Higher precision also means greater sensitivity to noise. The number of subspaces is therefore a compromise between precision and robustness to noise.
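The mean-distance clustering procedure described above can be sketched in plain Python. This is a naive illustration, not the authors' code: the function names and parameters (`n_clusters`, `max_dist`, `min_size`) are hypothetical, the Euclidean norm is assumed for the distance between cluster means, and the pairwise search is repeated from scratch at every merge, which is far too slow for large datasets.

```python
import math


def centroid(cluster):
    """Mean point m_i of a cluster of equal-dimension tuples."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def mean_distance_clustering(points, n_clusters, max_dist=math.inf, min_size=1):
    """Agglomerative clustering with the mean distance |m_i - m_j|."""
    clusters = [[tuple(p)] for p in points]  # every point starts alone
    while len(clusters) > max(n_clusters, 1):
        cents = [centroid(c) for c in clusters]
        # Find the pair of sub-clusters whose means are closest.
        d, i, j = min(
            (math.dist(cents[i], cents[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_dist:  # every remaining pair is farther than the threshold
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    # Drop small sub-clusters with fewer than min_size points.
    return [c for c in clusters if len(c) >= min_size]


def assign_labels(points, clusters):
    """Label each point with its sub-cluster index; points of eliminated
    sub-clusters share one extra label, len(clusters)."""
    lookup = {p: k for k, c in enumerate(clusters) for p in c}
    rest = len(clusters)
    return [lookup.get(tuple(p), rest) for p in points]


pts = [(0.0,), (0.1,), (5.0,), (5.2,), (100.0,)]
subclusters = mean_distance_clustering(pts, n_clusters=3, min_size=2)
labels = assign_labels(pts, subclusters)
```

Here the isolated point at 100.0 ends up in a sub-cluster of size one, is eliminated by the `min_size` filter, and receives the extra "rest" label.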
An original sequence in time space is transformed into a vector sequence in phase space, and a discrete state-transition (vector-transition) sequence is constructed by the hierarchical clustering based phase space partition. From the state-transition sequence we derive a state-transition probability matrix and a frequency matrix that records the number of occurrences of every specific state transition in the sequence.

We now discuss the similarity of two time series Q and C. Their corresponding vector sequences in reconstructed phase space are $Q_{phase}$ and $C_{phase}$. The phase space $H$ is partitioned into $l + 1$ subspaces $\{H_1, H_2, \ldots, H_l, H_{l+1}\}$, in which $H_1, H_2, \ldots, H_l$ are formed by the hierarchical clustering procedure and the rest of the space forms $H_{l+1} = H - H_1 - H_2 - \cdots - H_l$. The discrete state-transition sequences are $Q_{transit}$ and $C_{transit}$. Their corresponding state-transition probability matrices are $A_Q$ and $A_C$, and their frequency matrices are $F_Q$ and $F_C$, each element of which is the number of occurrences of a specific state transition. From these model parameters of the two time series, the following model-based similarity measures (MSM) can be defined.

Obviously, $MSM_1$ is similar to the Euclidean distance metric, but it uses a vector sequence in phase space instead of the original sequence in time space. From the viewpoint of dynamical systems theory, $MSM_1$ represents the inherent information more precisely than the Euclidean distance. $MSM_2$ can be considered a specific $MSM_1$ with a hybrid dimensionality reduction technique combining piecewise aggregate approximation (PAA) and symbolic approximation, which makes it more robust to noise than $MSM_1$. Both $MSM_1$ and $MSM_2$ are sensitive to the order of the sequence, i.e. they are not order-invariant. Unlike $MSM_2$, $MSM_3$ only uses the frequency information of the state transitions, so it does not depend on the order of these state transitions. Under the strict definition of model-based similarity, the similarity of two time series should only consider their models, and has no relationship with order and frequency [12]. $MSM_4$ can be regarded as a strict model-based similarity measure, because it only uses the state-transition probability matrices of the Markov model in phase space. Since the criterion of similarity is not unique, these four model-based similarity measures suit different applications. One advantage of our method is its flexibility: one model can produce several different similarity measures.

It should be noted that no model-based similarity measure corresponds to DTW. DTW takes the order of the sequence into account, but it also adaptively modifies the sampling rate of the sequences to reach a minimal distance. Adaptive sampling rate modification could be achieved by choosing suitable delay times in the phase space reconstruction of the two time series. As determining the delay time is very complicated, we will study it in another paper.

4 Experiments

A range of experiments has been done to verify our novel model-based similarity measures between two time series. Limited by space, we report only one experiment, on the Funnel-Bell-Cylinder dataset, which is widely used as a benchmark. The three groups of time series in the Funnel-Bell-Cylinder dataset are generated by the following formulas:

$$c(t) = (6 + \eta)\,\chi_{[a,b]}(t) + \varepsilon(t)$$

$$b(t) = (6 + \eta)\,\chi_{[a,b]}(t)\,\frac{t - a}{b - a} + \varepsilon(t)$$

$$f(t) = (6 + \eta)\,\chi_{[a,b]}(t)\,\frac{b - t}{b - a} + \varepsilon(t)$$

where $\chi_{[a,b]}(t)$ is 1 for $a \le t \le b$ and 0 otherwise, $\eta$ and $\varepsilon(t)$ are drawn from a standard normal distribution, $a$ is an integer drawn uniformly from the range [16, 32], and $(b - a)$ is an integer drawn from the range [32, 96]. Fig. 1 gives some examples of the dataset used in the experiment. Fig. 2 shows three sequences transformed into two-dimensional phase space.

Our time series dataset contains 120 sequences of length 1000, and each group has 40 examples. We use leave-one-out evaluation and the nearest neighbor algorithm in our classification experiment. The error rates of $MSM_1$, $MSM_2$, $MSM_3$, and $MSM_4$ are 25.4%, 20.3%, 13.7%, and 12.9%, respectively, when the number of sub-clusters is 50, the embedding dimension is 3, and the time delay is 2. These results are better than the 26.2% error rate of the Euclidean distance.
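The frequency and probability matrices described in Section 3 can be derived from a labeled state sequence as sketched below. The function names are hypothetical. Because the defining formulas of the similarity measures are not given in the text, `matrix_distance` uses the Frobenius norm of the matrix difference purely as an assumed stand-in, not as the authors' definition.

```python
import math


def transition_matrices(labels, n_states):
    """Frequency matrix F and row-normalised probability matrix A.

    F[i][j] counts transitions from state i to state j in the label
    sequence; A[i][j] estimates P(next = j | current = i).
    """
    F = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        F[a][b] += 1
    A = []
    for row in F:
        total = sum(row)
        # Rows of states that are never left stay all-zero.
        A.append([v / total if total else 0.0 for v in row])
    return F, A


def matrix_distance(M1, M2):
    """Frobenius norm of M1 - M2 (an assumed choice of distance)."""
    return math.sqrt(
        sum((a - b) ** 2 for r1, r2 in zip(M1, M2) for a, b in zip(r1, r2))
    )


# Two label sequences containing the same transitions in a different order.
F_q, A_q = transition_matrices([0, 1, 0, 1, 1], n_states=2)
F_c, A_c = transition_matrices([0, 1, 1, 0, 1], n_states=2)
```

The two sequences differ position by position, yet their frequency matrices coincide, which illustrates why a measure built on frequency information alone is insensitive to the order of the state transitions.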
Figure 1. Examples of Funnel-Bell-Cylinder dataset.

Figure 2. Original time series and their corresponding vector sequence in two-dimensional phase space.

5 Conclusions

In this paper, we propose a novel model-based time series similarity measuring method, motivated by the shortcomings of existing similarity measures and the great potential of the Markov model. To capture the inherent dynamical features, phase space reconstruction is used to transform an original sequence in time space into a vector sequence in phase space. We also use hierarchical clustering to partition the phase space and reduce the number of states, which significantly decreases the complexity of the model. From the Markov model in phase space, several similarity measures are derived for different applications. Many popular similarity measures of time series have corresponding model-based forms. Our model-based method could become a general framework for time series similarity measuring, which is our next research topic. In addition, the indexing problem based on our similarity measures is also future work.

Acknowledgements

This work was supported by a grant from the National Natural Science Foundation of China.

References

[1] K. Kalpakis, D. Gada and V. Puttagunta, Distance measures for effective clustering of ARIMA time-series. In Proceedings of the IEEE Int'l Conference on Data Mining, San Jose, CA, Nov 29-Dec 2, 2001.
[2] M. K. Ng and Z. Huang, Data-mining massive time series astronomical data: challenges, problems and solutions, Information and Software Technology, 41.
[3] R. Agrawal, K. Lin, H. S. Sawhney and K. Shim, Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In Proceedings of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, Sept. 1995.
[4] E. Keogh and S. Kasetty, On the need for time series data mining benchmarks: a survey and empirical demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, Edmonton, Alberta, Canada.
[5] M. H. Law and J. T. Kwok, Rival penalized competitive learning for model-based sequence clustering. In Proceedings of the 15th Int'l Conf. on Pattern Recognition, Barcelona, Spain, September 2000.
[6] X. Ge and P. Smyth, Deformable Markov model templates for time-series pattern matching. In Proceedings of the 6th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Boston, MA, Aug 20-23, 2000.
[7] S. Park, W. W. Chu, J. Yoon and C. Hsu, Efficient searches for similar subsequences of different lengths in sequence databases. In Proceedings of the 16th Int'l Conference on Data Engineering, San Diego, CA, Feb 28-Mar 3, 2000.
[8] E. Keogh and M. Pazzani, Scaling up dynamic time warping to massive datasets. In Proceedings of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases.
[9] E. Keogh and M. Pazzani, A simple dimensionality reduction technique for fast similarity search in large time series databases. In Proceedings of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2000.
[10] K. Chan and A. W. Fu, Efficient time series matching by wavelets. In Proceedings of the 15th IEEE Int'l Conference on Data Engineering, Sydney, Australia, Mar 23-26, 1999.
[11] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge Press.
[12] T. Kahveci, A. Singh and A. Gurel, An efficient index structure for shift and scale invariant search of multi-attribute time sequences. In Proceedings of the 18th Int'l Conference on Data Engineering, San Jose, CA, Feb 26-Mar 1.
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationFast trajectory matching using small binary images
Title Fast trajectory matching using small binary images Author(s) Zhuo, W; Schnieders, D; Wong, KKY Citation The 3rd International Conference on Multimedia Technology (ICMT 2013), Guangzhou, China, 29
More informationDATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm
DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)
More informationA SURVEY ON CLUSTERING ALGORITHMS Ms. Kirti M. Patil 1 and Dr. Jagdish W. Bakal 2
Ms. Kirti M. Patil 1 and Dr. Jagdish W. Bakal 2 1 P.G. Scholar, Department of Computer Engineering, ARMIET, Mumbai University, India 2 Principal of, S.S.J.C.O.E, Mumbai University, India ABSTRACT Now a
More informationCHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION
CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster)
More informationTime Series Clustering Ensemble Algorithm Based on Locality Preserving Projection
Based on Locality Preserving Projection 2 Information & Technology College, Hebei University of Economics & Business, 05006 Shijiazhuang, China E-mail: 92475577@qq.com Xiaoqing Weng Information & Technology
More information10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2
161 Machine Learning Hierarchical clustering Reading: Bishop: 9-9.2 Second half: Overview Clustering - Hierarchical, semi-supervised learning Graphical models - Bayesian networks, HMMs, Reasoning under
More informationA NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON DWT WITH SVD
A NEW ROBUST IMAGE WATERMARKING SCHEME BASED ON WITH S.Shanmugaprabha PG Scholar, Dept of Computer Science & Engineering VMKV Engineering College, Salem India N.Malmurugan Director Sri Ranganathar Institute
More informationFine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes
2009 10th International Conference on Document Analysis and Recognition Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes Alireza Alaei
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationKapitel 4: Clustering
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.
More informationData Mining Algorithms
for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationSYMBOLIC FEATURES IN NEURAL NETWORKS
SYMBOLIC FEATURES IN NEURAL NETWORKS Włodzisław Duch, Karol Grudziński and Grzegorz Stawski 1 Department of Computer Methods, Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract:
More informationAutomatic Linguistic Indexing of Pictures by a Statistical Modeling Approach
Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Abstract Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in content-based
More informationCluster Analysis. Angela Montanari and Laura Anderlucci
Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a
More informationADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.
ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. Dartmouth, MA USA Abstract: The significant progress in ultrasonic NDE systems has now
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationOlmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.
Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which
More informationClustering part II 1
Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:
More informationImage Segmentation Based on Watershed and Edge Detection Techniques
0 The International Arab Journal of Information Technology, Vol., No., April 00 Image Segmentation Based on Watershed and Edge Detection Techniques Nassir Salman Computer Science Department, Zarqa Private
More informationImage Classification Using Wavelet Coefficients in Low-pass Bands
Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan
More informationCluster analysis. Agnieszka Nowak - Brzezinska
Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that
More informationExploratory Analysis: Clustering
Exploratory Analysis: Clustering (some material taken or adapted from slides by Hinrich Schutze) Heejun Kim June 26, 2018 Clustering objective Grouping documents or instances into subsets or clusters Documents
More informationNORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM
NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM Saroj 1, Ms. Kavita2 1 Student of Masters of Technology, 2 Assistant Professor Department of Computer Science and Engineering JCDM college
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationDiscovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data
Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data Alain Saas, Anna Guitart and África Periáñez (Silicon Studio) IEEE CIG 2016 Santorini 21 September, 2016 About us Who are
More informationImage Analysis, Classification and Change Detection in Remote Sensing
Image Analysis, Classification and Change Detection in Remote Sensing WITH ALGORITHMS FOR ENVI/IDL Morton J. Canty Taylor &. Francis Taylor & Francis Group Boca Raton London New York CRC is an imprint
More informationAnomaly Detection on Data Streams with High Dimensional Data Environment
Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant
More informationDCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi
DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM Jeoong Sung Park and Tokunbo Ogunfunmi Department of Electrical Engineering Santa Clara University Santa Clara, CA 9553, USA Email: jeoongsung@gmail.com
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationA COMPARATIVE STUDY ON K-MEANS AND HIERARCHICAL CLUSTERING
A COMPARATIVE STUDY ON K-MEANS AND HIERARCHICAL CLUSTERING Susan Tony Thomas PG. Student Pillai Institute of Information Technology, Engineering, Media Studies & Research New Panvel-410206 ABSTRACT Data
More informationLinear Discriminant Analysis in Ottoman Alphabet Character Recognition
Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationMotivation. Technical Background
Handling Outliers through Agglomerative Clustering with Full Model Maximum Likelihood Estimation, with Application to Flow Cytometry Mark Gordon, Justin Li, Kevin Matzen, Bryce Wiedenbeck Motivation Clustering
More informationA Naïve Soft Computing based Approach for Gene Expression Data Analysis
Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2124 2128 International Conference on Modeling Optimization and Computing (ICMOC-2012) A Naïve Soft Computing based Approach for
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationResearch Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature
Research Journal of Applied Sciences, Engineering and Technology 7(19): 4100-4105, 2014 DOI:10.19026/rjaset.7.773 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:
More informationA Patent Retrieval Method Using a Hierarchy of Clusters at TUT
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan
More informationA Survey on Feature Extraction Techniques for Palmprint Identification
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 03 Issue. 12 A Survey on Feature Extraction Techniques for Palmprint Identification Sincy John 1, Kumudha Raimond 2 1
More informationEnhancing K-means Clustering Algorithm with Improved Initial Center
Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of
More informationClustering Algorithms In Data Mining
2017 5th International Conference on Computer, Automation and Power Electronics (CAPE 2017) Clustering Algorithms In Data Mining Xiaosong Chen 1, a 1 Deparment of Computer Science, University of Vermont,
More informationUSC Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams
Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams Cyrus Shahabi and Donghui Yan Integrated Media Systems Center and Computer Science Department, University of Southern California
More informationHandwritten Script Recognition at Block Level
Chapter 4 Handwritten Script Recognition at Block Level -------------------------------------------------------------------------------------------------------------------------- Optical character recognition
More information