ApproxMGMSP: A Scalable Method of Mining Approximate Multidimensional Sequential Patterns on Distributed System


Changhai Zhang, Kongfa Hu, Zhuxi Chen, Ling Chen
Department of Computer Science and Engineering, Yangzhou University, Yangzhou 225009, China
Yisheng Dong
Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China

Abstract
We present a scalable and effective algorithm called ApproxMGMSP (Approximate Mining of Global Multidimensional Sequential Patterns) to solve the problem of mining multidimensional sequential patterns from large databases in a distributed environment. Our method differs from previous work on mining multidimensional patterns on distributed systems mainly in that it is the first to apply an approximate mining method to large multidimensional sequence databases. To convert the mining of multidimensional sequential patterns into the mining of ordinary sequential patterns, the multidimensional information is embedded into the corresponding sequences. The sequences are then clustered, summarized, and analyzed at the distributed sites, and the local patterns are obtained by an effective approximate sequential pattern mining method. Finally, after all the local patterns have been collected at one site, the global multidimensional sequential patterns are mined quickly by the high vote sequential pattern model. Both theoretical analysis and experiments indicate that this method simplifies the problem of mining multidimensional sequential patterns and avoids mining redundant information. Because the communication cost is reduced, the global sequential patterns can be obtained effectively by this scalable method.

1. Introduction
Sequential pattern mining has become an essential data mining task, with broad applications including web log analysis, market and customer analysis, pattern discovery in protein sequences, and mining XML query access patterns for caching. Moreover, mining multidimensional sequential patterns can extract more useful information than mining sequential patterns alone.
At present, databases and data warehouses hold such huge amounts of data that data mining on a single PC is not very effective; in particular, a PC cannot meet the data-processing requirements in either functionality or performance. In actual applications, most large information systems are distributed, for example the data access systems of large interregional shopping markets. Distributed multidimensional pattern mining is therefore proposed to deal with this problem. Many studies on multidimensional sequential pattern mining have been carried out, producing well-known algorithms such as UniSeq, PSFP and HYBRID[1]. However, the overall performance of these algorithms is low when mining global multidimensional patterns from large amounts of data scattered across a distributed environment, so the problem can only be solved with distributed or parallel data mining technology. In 2003, S.C. Zhang proposed the technique of distributed multi-database mining[2] to address this problem, and methods for mining global association rules[3] and exceptional sequential patterns[4] from different data sources were subsequently proposed. More recently, H.C. Kum proposed a method of mining global sequential patterns[5] in multi-databases. Traditional sequential pattern mining methods, such as the well-known algorithms GSP[6], PrefixSpan[7] and SPADE[8], find all the patterns that satisfy a user-specified minimum support threshold. However, these support-based algorithms have inherent limitations: they tend to produce many redundant short patterns and have difficulty finding long patterns when the dimensionality is high. We therefore propose a novel method of mining approximate multidimensional sequential patterns on a distributed system. Our experiments indicate that the method simplifies the process of mining multidimensional sequential patterns and handles high dimensionality effectively. By reducing redundant information, the global multidimensional sequential patterns can be obtained effectively.

2. Problem formulation
Assume that there are n sites $S_1, S_2, \ldots, S_n$ in the distributed environment and that the multidimensional sequence database MSDB is partitioned over the n sites into $\{MSDB_1, MSDB_2, \ldots, MSDB_n\}$, respectively. The independent computers at the sites can communicate with each other. The schema of a multidimensional sequence database is $MSDB(TID, A_1, \ldots, A_m, S)$, where TID is the primary key, $A_1, \ldots, A_m$ are the dimensions carrying the multidimensional information, and S holds the sequences. Let * denote any value in any domain of $A_1, \ldots, A_m$. A multidimensional sequence takes the form $(a_1, \ldots, a_m, s)$, where $a_i \in A_i \cup \{*\}$ for $1 \le i \le m$ and s is a sequence.

Definition 1. Given a local sequence database $DB_x$, let $dist(seq_i, seq_j)$ be the distance measure between $seq_i$ and $seq_j$, with $0 \le dist(seq_i, seq_j) \le 1$. $DB_x$ can be partitioned into similarity clusters $G_{x1}, \ldots, G_{xn}$ such that $\sum_{i \ne j} dist(seq_{ia}, seq_{jb})$ is maximized and $\sum_{i} dist(seq_{ia}, seq_{ib})$ is minimized, where $seq_{ia}, seq_{ib} \in G_{xi}$ and $seq_{jb} \in G_{xj}$.

Definition 2. Let $G_{x1}, \ldots, G_{xn}$ be the similarity clusters of a local database $DB_x$. An approximate sequential pattern for group $G_{xi}$, denoted $lpat_{xi}$, is a sequence that minimizes $dist(lpat_{xi}, seq_a)$ over all $seq_a$ in similarity group $G_{xi}$.

Definition 3. Let M be the set of approximate sequential patterns from all sites. A subset $HS \subseteq M$ is a homogeneous set of range $\gamma$ if the similarity between any two patterns $p_i$ and $p_j$ in HS is not less than $\gamma$, i.e., $\forall p_i, p_j \in HS: sim(p_i, p_j) \ge \gamma$, where $sim(p_i, p_j) = 1 - dist(p_i, p_j)$.

Definition 4. The vote of a homogeneous set HS is defined as the size of the homogeneous set: $Vote(HS, \gamma) = |HS(\gamma)|$.

Definition 5. Let $\gamma$ and $\theta$ be the desired similarity level and the vote threshold, respectively. A high vote homogeneous set is a homogeneous set HS such that $Vote(HS, \gamma) \ge \theta$. Given a high vote homogeneous set, the high vote sequential pattern is the longest common subsequence of all local patterns in the set.

Definition 6. A weighted sequence $WS = \langle X_1 : v_1, \ldots, X_l : v_l \rangle : n$ carries the following information: the current alignment contains n sequences, and $v_i$ of them have a non-empty itemset aligned in the i-th itemset $X_i$ ($1 \le i \le l$). An itemset in the alignment takes the form $X_i = (x_{j1} : w_{j1}, \ldots, x_{jm} : w_{jm})$, meaning that in the current alignment $w_{jk}$ sequences have item $x_{jk}$ in the i-th position of the alignment ($1 \le i \le l$, $1 \le k \le m$). Given a user-specified minimum degree $\lambda$, an item $x_{jk}$ is collected into the approximate sequential pattern if $w_{jk}/n \ge \lambda$.

3. Multidimensional sequential pattern mining on a distributed system
3.1 Embedding multidimensional information into sequences
Inspired by UniSeq, the multidimensional information of a tuple in the multidimensional sequence database can be embedded into the corresponding sequence by introducing a special element. This simplifies the problem: mining over both the dimension information and the sequence is converted into mining over the sequence alone. For example, given the tuple q = (10, Business, Chicago, Middle, <(bd)cb(ac)>), the multidimensional information (Business, Chicago, Middle) can be embedded into the corresponding sequence <(bd)cb(ac)> as its first element; that is, the sequence x = <(bd)cb(ac)> in q is extended to y = <(Business Chicago Middle)(bd)cb(ac)>. In this way, mining the sequences of the multidimensional sequence database is converted into mining the extended sequences of the extended sequence database. Likewise, the multidimensional information can instead be embedded into the corresponding sequence as its last element. We now verify that approximate multidimensional sequential pattern mining can be carried out on the extended database.

Theorem 1. Given a multidimensional sequence database MSDB and its extended database ESDB, a multidimensional sequence $t = (a_1, \ldots, a_n, s)$ is an approximate sequential pattern in MSDB if and only if the sequence $t_1 = \langle (a_1, \ldots, a_n), s \rangle$ is an approximate sequential pattern in ESDB.

Proof.
If a multidimensional sequence $t = (a_1, \ldots, a_n, s)$ is an approximate sequential pattern in MSDB, then the Levenshtein distance $dist(t, seq)$ is minimal for all seq in its similarity group G. Hence $dist(t_1, seq)$, computed with the Levenshtein distance (Algorithm 1), is also minimal; that is, the sequence $t_1 = \langle (a_1, \ldots, a_n), s \rangle$ is an approximate sequential pattern in ESDB. Conversely, in the same way we can deduce from $t_1$ being an approximate sequential pattern in ESDB that the multidimensional sequence $t = (a_1, \ldots, a_n, s)$ is an approximate sequential pattern in MSDB.

3.2 Multidimensional sequence mining
The goal of multidimensional sequential pattern mining in the distributed environment is to reduce the communication cost in the network. Although the traditional method of mining patterns performs well at low dimensionality, its efficiency is very low at high dimensionality because long sequential patterns must be mined. We therefore adopt the approximate sequence mining method for the extended database at every site.

First, we introduce the Levenshtein distance, which is commonly used as a distance measure for sequences. It is the minimum cost of the insertions, deletions, and replacements needed to convert one sequence S into another sequence T. Given $S = \langle s_1, \ldots, s_n \rangle$ and $T = \langle t_1, \ldots, t_m \rangle$, the Levenshtein distance can be computed by dynamic programming with the following loop.

Algorithm 1. Calculating the Levenshtein distance
Input: Two sequences $S = \langle s_1, \ldots, s_n \rangle$ and $T = \langle t_1, \ldots, t_m \rangle$.
Output: The Levenshtein distance $D(S, T)$ between S and T.
1) If n = 0, return m and exit. If m = 0, return n and exit. Otherwise, construct a matrix dist with rows 0..n and columns 0..m.
2) Initialize the first column to 0..n (dist[i, 0] = i) and the first row to 0..m (dist[0, j] = j).
3) Examine each element $s_i$ of S (i from 1 to n) and, for each, every element $t_j$ of T (j from 1 to m).
4) If $s_i$ equals $t_j$, the cost is 0; otherwise the cost is 1.
5) Set cell dist[i, j] of the matrix to the minimum of:
   a. the cell immediately above plus 1: dist[i-1, j] + 1;
   b. the cell immediately to the left plus 1: dist[i, j-1] + 1;
   c. the cell diagonally above and to the left plus the cost: dist[i-1, j-1] + cost.
6) After the iteration steps (3, 4, 5) are complete, the distance is found in cell dist[n, m].

The normalized Levenshtein distance is given by Formula 1, where |S| denotes the length of S.

Formula 1.
$$dist(S, T) = \frac{D(S, T)}{\max\{|S|, |T|\}}$$

For sequences of itemsets, the normalized set difference of Formula 2 is used as the cost of replacing itemset s by itemset t, so that the distance measure fits sequences of sets properly.

Formula 2.
$$Repl(s, t) = \frac{|(s - t) \cup (t - s)|}{|s - t| + |t - s| + 2|s \cap t|} = 1 - \frac{2|s \cap t|}{|s| + |t|}$$

We adopt a density-based clustering algorithm to cluster sequences. For each sequence $s_i$ in the database S, let $d_1, \ldots, d_k$ be the k smallest non-zero values of $D(s_i, s_j)$, where $s_j \in S$ and $s_i \ne s_j$. Then $Den(s_i) = n / d$, where $d = \max\{d_1, \ldots, d_k\}$ and $n = |\{ s_j \in S \mid D(s_i, s_j) \le d \}|$.

Algorithm 2. Uniform kernel k-NN clustering
Input: A set of sequences $\{s_i\}$ and the number of neighbor sequences k.
Output: A set of clusters $\{C_j\}$.
1) Generate initial clusters. Make every sequence a cluster and set $Den(C_{s_i}) = Den(s_i)$.
2) Expand the initial clusters based on the density of sequences. Let $s_{i1}, \ldots, s_{in}$ be the nearest neighbors of $s_i$. For each $s_j \in \{s_{i1}, \ldots, s_{in}\}$, merge the cluster $C_{s_i}$ containing $s_i$ with the cluster $C_{s_j}$ containing $s_j$ if $Den(s_i) < Den(s_j)$ and there exists no $s_p$ with $D(s_i, s_p) < D(s_i, s_j)$ and $Den(s_i) < Den(s_p)$; set $Den(\text{new cluster}) = \max\{Den(C_{s_i}), Den(C_{s_j})\}$.
3) Merge based on the density of the new clusters. For sequences $s_i$ and $s_j$ with $Den(s_i) = Den(s_j)$, merge the two clusters $C_{s_i}$ and $C_{s_j}$ containing them if $Den(C_{s_i}) > Den(C_{s_j})$.

Clustering partitions the sequences in every database into several groups. Within a group, all sequences are sorted in descending order of density, and the first two sequences are compressed into the weighted sequence $ws_1$. A weighted replacement cost (Formula 3) is then used to ensure that the distance between each newly assigned sequence and the current weighted sequence is minimal. In Formula 3, $ws = (x_1 : w_1, \ldots, x_m : w_m) : v$ is an itemset in a weighted sequence, $t = (y_1, \ldots, y_l)$ is an itemset in a sequence of the database, and n is the global weight of the weighted sequence. Compressing the sequences of the group into the weighted sequence in this way yields $ws_{n-1}$, from which the approximate sequential patterns are collected as in Definition 6.

Formula 3.
$$REPL(ws, t) = \frac{R \cdot v + n - v}{n}, \qquad R = \frac{\sum_{i=1}^{m} w_i + |t| \cdot v - 2 \sum_{x_i \in t} w_i}{\sum_{i=1}^{m} w_i + |t| \cdot v}$$

The global multidimensional sequential patterns are obtained by the high vote sequential pattern model.

Algorithm 3. Global multidimensional sequence mining
Input: All local patterns $L_1, \ldots, L_n$ from sites $1, \ldots, n$.
Output: Global patterns G.
1) Collect all local patterns $L_1, \ldots, L_n$ at one site and generate the homogeneous sets.
2) From the results of step 1, collect the high vote homogeneous sets M, and then generate the global patterns G, i.e., the longest common subsequences of the patterns in M.
3) Broadcast G to each site.

4. Experimental evaluations
4.1 Effectiveness analysis of ApproxMGMSP

For the effectiveness analysis of ApproxMGMSP, we adopt a general evaluation method that measures the accuracy of the approximation in terms of how well it finds the real underlying patterns in the data and whether it generates any spurious patterns. The datasets were generated by the well-known IBM data generator[9]. Base patterns were generated randomly according to the user's specification; these base patterns were then corrupted and merged to generate the sequences in the database. Dimensional information was generated and merged randomly so that values were distributed evenly in every dimension. The evaluation criteria are: recoverability R; precision P; $N_{redun}$, the number of redundant patterns; $N_{spur}$, the number of spurious patterns; $N_{max}$, the number of base patterns; and L, the average length of a sequence. Tables 1 and 2 show how 7 of the 10 most frequent base patterns were uncovered from 1000 sequences using ApproxMGMSP.

Table 1. Base patterns (10 base patterns; L: pattern length)
B0  <(B, X, D, Y)(20)(63 24)(2)(5)(2 74)(95)(96)>  L = 13
B1  <(F, A, Z, F)(66 62 50)(16)(16 30 22)(58 66)>  L = 13
B2  <(W, A, D, F)(6)(24 65 93)(2 24 16 63)(58)(22)>  L = 14
B3  <(W, L, D, Y)(62)(66)(76 31)(2 74)(58 99)(15)(16 66)>  L = 15
B4  <(G, H, C, Y)(63 99)(16)(22 58)(51)(66)(96)(50)(45 36)(94)(96 29)(18)>  L = 19
B5  <(B, L, I, Y)(40 62)(15)(40)(29 40)(24 63)(2 74 88)>  L = 15
B6  <(G, H, I, J)(23 96)(50)(2 22)(16)(58)(10 74)(51 63)>  L = 15
B7  <(W, X, D, O)(22)(58)(96)(88)(58 78)>  L = 10
B8  <(B, A, I, O)(22 41)(2 74)(31 76)(2 74)(22)(58 66)>  L = 15
B9  <(W, H, C, F)(2 22)(24)(22 50 66)(50)(16)>  L = 12

Table 2. Local patterns (approximate sequential patterns; L: pattern length)
A0  <(B, X, D, Y)(20)(63 24)(2)(5)(2 74)(95)>  L = 12
A1  <(F, A, Z, F)(66 62 50)(16)(16 30 22)>  L = 11
A2  <(W, A, D, F)(6)(24 65 93)(2 24 16 63)(58)>  L = 13
A3  <(W, L, D, Y)(62)(66)(76 31)(2 74)(58 99)(15)>  L = 13
A4  <(G, H, C, Y)(63 99)(16)(22 58)(51)(66)(96)(50)(45 36)(94)(96 29)>  L = 18
A5  <(G, H, C, Y)(63 99)(16)(22 58)(51)(66)(96)(50)(45 36)>  L = 15
A6  <(G, H, I, J)(23 96)(50)(2 22)(16)(58)(10 74)(51 63)>  L = 13
A7  <(W, X, C, O)(22)(58 66)(96)(88)(58 78)>  L = 11

Eight local patterns are generated from the 1000 sequences; they recover the major parts of the base patterns with high expected frequency in the database, and each of the 8 approximate patterns matches a base pattern well. The recoverability is excellent at 90.66%, and the precision is also high at P = 1 - 2/87 ≈ 97.7%: among all the approximate patterns, only 2 items (the C in (W, X, C, O) and the 66 in (58 66), both in A7) do not appear at the corresponding position in the base pattern. There were no spurious patterns and only one redundant pattern, A5. This is because B4 is long: the sequences generated from the long base pattern B4 can be partitioned into multiple clusters by ApproxMGMSP. In summary, ApproxMGMSP is an effective method of mining multidimensional sequential patterns.

4.2 Scalability analysis of ApproxMGMSP
The following experiments were carried out to test the scalability of ApproxMGMSP.
Group 1: recoverability as the number of sequences N varies, with average sequence length L = 20, average itemset size I = 2.5, number of items = 10000, number of base patterns $N_{seq}$ = 1000, average base pattern length $L_{seq}$ = 14, average itemset size of base patterns $I_{seq}$ = 2, number of neighbor sequences k = 4, and minimum degree $\lambda$ = 50%. The results are shown in Figure 1.
Group 2: recoverability as the average sequence length L varies, with N = 100000, I = 2.5, number of items = 10000, $N_{seq}$ = 1000, $I_{seq}$ = 2, k = 4, and $\lambda$ = 50%. The results are shown in Figure 2.
Group 3: execution time of ApproxMGMSP as the number of dimensions varies, with N = 100000, L = 20, I = 2.5, number of items = 10000, $N_{seq}$ = 1000, $I_{seq}$ = 2, k = 4, and $\lambda$ = 50%. The results are shown in Figure 3.

Figure 1. Recoverability vs. N
Figure 2. Recoverability vs. L

Figure 3. Running time vs. dimension

Figure 1 shows that ApproxMGMSP is scalable with respect to database size: the more sequences in the database, the better the recoverability. For a base pattern that occurs with the same probability in the sequences, a larger database contains more sequences similar to the base pattern and yields more approximate sequential patterns, so the recoverability increases. Figure 2 shows that ApproxMGMSP is also scalable with respect to the average sequence length, because longer sequences contain more repeated items, which again increases the recoverability. Figure 3 shows that the execution time decreases as the number of dimensions increases. As the number of dimensions grows, the mining process consists more and more of mining dimensional information, which does not require finding the minimum distance between sequences by sequence comparison. Hence the running time decreases gradually as the dimensionality increases.

5. Conclusion and future work
This paper proposes a scalable method for mining multidimensional sequential patterns effectively. The method embeds the multidimensional information into the corresponding sequences, converting complex mining of multidimensional sequences into mining of sequences. When the dimensionality is low, a support-based mining method could be used at every site, and global multidimensional sequential patterns could be obtained by collecting the local patterns. However, at high dimensionality this traditional approach produces many redundant and short patterns and has difficulty finding long patterns. We therefore mine the local patterns with an approximate sequence mining method and finally collect the global patterns as high vote sequential patterns. The experiments show that this scalable method not only simplifies the problem of mining multidimensional patterns but also handles high dimensionality.
Although this approach is very efficient for mining multidimensional sequential patterns in large databases in a distributed environment, it has a high degree of complexity. Reducing the complexity of ApproxMGMSP and evaluating global sequential pattern mining are therefore the subjects of our future research.

Acknowledgements: The research in this paper is supported by the National Natural Science Foundation of China under Grant No. 60673060; the National Facilities and Information Infrastructure for Science and Technology of China under Grant No. 2004DKA20310; the Natural Science Foundation of Jiangsu Province under Grant No. BK2005047; and the Qing Lan Project Foundation of Jiangsu Province of China.

References
[1] H. Pinto, J. Han and J. Pei, Multi-dimensional Sequential Pattern Mining, In Proc. of the 10th Int. Conf. on Information and Knowledge Management (CIKM), ACM, Atlanta, Georgia, pp. 81-88, November 2001.
[2] S. Zhang, X. Wu, and C. Zhang, Multi-Database Mining, IEEE Computational Intelligence Bulletin, Vol.2, No.1, pp. 5-13, June 2003.
[3] X. Wu and S. Zhang, Synthesizing High-Frequency Rules from Different Data Sources, IEEE Transactions on Knowledge and Data Engineering, Vol.15, No.1, pp. 353-367, January 2003.
[4] C. Zhang, M. Liu, W. Nie, and S. Zhang, Identifying Global Exceptional Patterns in Multi-database Mining, IEEE Computational Intelligence Bulletin, Vol.3, No.1, pp. 19-24, February 2004.
[5] H.C. Kum, J.H. Chang, and W. Wang, Sequential Pattern Mining in Multi-Databases via Multiple Alignment, Data Mining and Knowledge Discovery, Vol.12, No.1, pp. 151-180, January 2006.
[6] R. Srikant and R. Agrawal, Mining Sequential Patterns: Generalizations and Performance Improvements, In Proc. of the 5th Int. Conf. on Extending Database Technology (EDBT), Springer, Avignon, France, pp. 3-17, March 1996.
[7] J. Pei, J. Han, H. Pinto, Q. Chen and U. Dayal, PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, IEEE Transactions on Knowledge and Data Engineering, Vol.16, No.1, pp. 1424-1440, January 2004.
[8] M. Zaki, SPADE: An Efficient Algorithm for Mining Frequent Sequences, Machine Learning, Vol.42, No.1/2, pp. 31-60, January 2001.
[9] R. Agrawal and R. Srikant, Mining Sequential Patterns, In Proc. of the 11th Int. Conf. on Data Engineering (ICDE), IEEE Computer Society, Taipei, Taiwan, pp. 3-14, March 1995.