MINING CTMSPS IN LBS

Pooja Mauskar*, Manisha Naoghare**
* Department of Computer Engineering, SVIT, Chincholi
** Department of Computer Engineering, SVIT, Chincholi

ABSTRACT

The increasing popularity of wireless technology has led to research on mining and predicting mobile movements and their associated transactions. Mobile patterns discovered from logs may not be precise enough for prediction, because the differences in mobile behavior among users and across time periods are not considered. In this work, a novel algorithm named CTMSP-Mine is proposed to discover Cluster-based Temporal Mobile Sequential Patterns (CTMSPs), together with a prediction strategy that uses them to predict mobile behaviors. In CTMSP-Mine, user clusters are constructed by a novel algorithm named Smart CAST, and similarities between users are evaluated by the proposed measure, LBS-Alignment. A time segmentation approach is also proposed to find the time intervals within which similar mobile characteristics exist.

Keywords - LBS, mining techniques, clustering methods, mobile environment

I. INTRODUCTION

Emerging trends in wireless communication techniques and the popularity of mobile phones, PDAs, and GPS-enabled cellular phones have contributed to a new business model. Mobile users can request services through their mobile devices via an Information Service and Application Provider (ISAP) from anywhere at any time. This business model is known as Mobile Commerce (MC), and it provides Location-Based Services (LBS) through mobile phones. MC is growing in popularity in the same way as e-commerce. MC is based on a cellular network composed of several base stations. Communication within each hexagonal area, called a cell, is controlled by a base station. While users move, information about their locations and service requests is stored in a centralized mobile transaction database.

The MC scenario in Fig. 1 shows a user moving within the mobile network and requesting services in the corresponding cells via a mobile device. The user moves in the sequence shown in Fig. 1a, where a cell is underlined only if a service is requested there. When the user moved to location A at time 5, the requested service was s1, which can be represented by a service transaction record as shown in Fig. 1b.

Fig. 1: An example of a mobile transaction sequence. a) Moving sequences. b) Service sequences [1].

Because a large number of mobile transaction records are produced by users' mobile behavior, the mobile transaction database is complicated. Each cluster has different mobile behaviors at various time intervals. Prediction can be more precise if the corresponding mobile patterns can be found in each user cluster and time interval. Effective mobile behavior mining systems are therefore required to provide precise location-based services for users.

In this project, a novel data mining algorithm named CTMSP-Mine [1] is proposed to mine the CTMSPs of users effectively, and prediction strategies are proposed to predict a user's subsequent behavior from the discovered CTMSPs. To mine CTMSPs, a transaction clustering algorithm named Smart CAST [18] first builds a cluster model for mobile transactions based on the proposed LBS-Alignment similarity measure. A genetic algorithm (GA) is then used to produce a more suitable time interval table. Finally, the proposed method discovers all CTMSPs based on the resulting user cluster table and time interval table.

II. LITERATURE SURVEY

In recent years, a number of studies have discussed the use of data mining techniques to discover useful rules and patterns from:
World Wide Web (WWW)
Transaction databases
Mobility data

These studies can be broadly classified into mobile pattern mining techniques, clustering methods, temporal pattern mining techniques, and mobile behavior prediction.

For transaction databases, association rule mining [2] was proposed to find important items in a transaction database. Agrawal and Srikant [2] proposed the Apriori algorithm to mine association rules. Sequential pattern mining was first introduced in [4] to search for time-ordered patterns, known as sequential patterns, within transaction databases.

Clustering analysis can be roughly divided into two categories. The first concerns similarity measures, which directly affect the final clustering result. LCSS [5], DTW [6], ERP [7], and Euclidean distance [9] are the most popular similarity measures for string sequences and time series data. Since a mobile transaction sequence is not only a time series of movements but also carries service sequences, it is crucial to define the similarity between sequences properly. The second category concerns clustering methods. The best-known clustering method is the k-means algorithm, which is partition based. Other partition-based methods include k-medoids and Partitioning Around Medoids (PAM). These methods partition the data set into k clusters based on similarities between data items, where k is a parameter specified by the user. Among density-based clustering methods, Ben-Dor and Yakhini [8] proposed the Cluster Affinity Search Technique (CAST), which requires an affinity threshold t, where 0 < t < 1. CAST, however, is a basic clustering algorithm that depends on this user-supplied threshold. Moreover, the similarity between mobile transactions cannot be measured by the Euclidean distance, and most clustering methods require users to set some parameters before the clustering task.
However, in real applications it is difficult to determine the right parameters manually for a clustering task.

Previous studies and applications consider time to be an important factor. The segmenting points of the time intervals influence the precision of mobile behavior prediction. Because it is not easy to find the best segmentation points, a genetic algorithm is generally used to solve such complicated problems.

Mobile behavior prediction can be roughly divided into two categories. The first is time-series-based prediction, which can be further divided into 1) linear models and 2) nonlinear models. The second is pattern-based prediction. However, these methods can only predict the next spatial locations of objects. SMAP-Mine [20] was the first method proposed to discover sequential mobile access rules and predict the user's next locations and services. Yun and Chen [21] proposed the MSP to predict the next mobile behaviors. However, no existing work considers the temporal factor, i.e., that users may have different mobile behaviors at different times.

III. IMPLEMENTATION DETAILS

1. Block Diagram

Figure 2 shows the conceptual block diagram with all the major tasks of the system. The system is divided into four main parts:
Clustering of mobile users
Time segmentation of mobile transaction sequences
Mining of mobile behaviors
Mobile behavior prediction for mobile users using a combined approach

Fig. 2: Conceptual Block Diagram

The methodology is organized as follows. Section 3 covers the clustering of mobile transactions: subsection 3.1 explains the LBS-Alignment algorithm and subsection 3.2 gives a brief description of the CAST algorithm. Section 3.3 explains the segmentation of mobile transactions in detail and describes the GA, with subsections on selection, crossover, and the fitness function. Section 3.4 describes the mining of mobile transactions, with subsections on frequent transaction mining, mobile transaction database transformation, and the CTMSP mining technique. Section 3.5 gives the prediction strategies.

2. System Framework and Methodology

Fig. 3 shows the proposed system framework. The system has an offline mechanism for mining CTMSPs and an online engine for mobile behavior prediction. When mobile users move within the mobile network, information that includes time, locations, and service requests is stored in the mobile transaction database. Table 1 shows an example of a mobile transaction database that contains seven records. Each user's record consists of tuples holding the time at which the user requested a service, the location of the user, and the service number.

TABLE 1: MOBILE TRANSACTION DATABASE

The offline data mining mechanism consists of two design techniques and the CTMSP-Mine algorithm. First, the CAST algorithm is used to cluster the mobile transaction sequences, with LBS-Alignment evaluating the similarity of mobile transaction sequences. Second, a GA-based time segmentation algorithm is proposed to find the most suitable time intervals. After clustering and segmentation, a user cluster table and a time interval table are generated, respectively. Third, the CTMSP-Mine algorithm mines the CTMSPs from the mobile transaction database according to the user cluster table and the time interval table. In the online prediction engine, a behavior prediction strategy predicts subsequent behaviors from the mobile user's previous mobile transaction sequences and the current time. The main purpose of this framework is to provide mobile users with a precise and efficient mobile behavior prediction system.

Fig. 3: System Framework

3. Clustering mobile transaction database

In a mobile transaction database, users in different user groups may have different mobile transaction behaviors. The first task is therefore to cluster the mobile transaction sequences. A parameter-less clustering algorithm, CAST, is proposed. Before CAST is performed, a similarity matrix S is generated from the mobile transaction database. The entry S_{i,j} of S represents the similarity of mobile transaction sequences i and j in the database, with a degree in the range [0, 1].

3.1 Location Based Service Alignment algorithm

A mobile transaction sequence can be viewed as a sequence string in which each element is a mobile transaction. LBS-Alignment is based on the consideration that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar. Based on this concept, the time penalty (TP) and the service reward (SR) of LBS-Alignment are defined. The base similarity score is set to 0.5. Two mobile transactions can be aligned if their locations are the same. Otherwise, a location penalty is generated to decrease the similarity score. The location penalty is defined as $0.5 / (|s_1| + |s_2|)$, where $|s_1|$ and $|s_2|$ are the lengths of sequences $s_1$ and $s_2$, respectively. When two sequences are totally different, their similarity score is 0.

When two mobile transactions are aligned, their time penalty and service reward are measured. TP reflects their time distance: the farther apart the two transactions are in time, the larger the penalty. TP decreases the similarity score and is defined as $|s_1.time - s_2.time| / len$, where len is the time length. SR reflects the similarity of the service requests: the more similar the requests, the larger the reward. SR increases the similarity score and is defined as $|s_1.services \cap s_2.services| / |s_1.services \cup s_2.services|$.

Fig. 4 shows the procedure of the LBS-Alignment measure. The input is two mobile transaction sequences (line 1). The output is the similarity between them, in the range from 0 to 1 (line 2). Some parameters are initialized (lines 4 to 7); the base similarity score is set to 0.5 (line 5). Dynamic programming is used to calculate $M_{i,j}$ (lines 8 to 18), where $M_{i,j}$ is the value in column i and row j of the LBS-Alignment score matrix M. If the locations of two transactions are the same (line 10), both the time penalty (line 11) and the service reward (line 12) are calculated to measure the similarity score (line 13). Otherwise (line 14), the location penalty is applied to decrease the similarity score (line 15). Finally, the entry of M indexed by the two sequence lengths is returned as the similarity score of the two mobile transaction sequences (line 19).

Fig. 4: LBS-Alignment algorithm

3.2 Cluster Affinity Search Technique (CAST)

The algorithm relies on the average similarity (affinity) between unassigned vertices and the current cluster seed to make its next decision. It differs from the theoretical algorithm in several aspects:
(1) The theoretical algorithm repeats the same process for many initial seeds. Here, cleaning steps remove spurious elements from cluster seeds and avoid the repetition.
(2) CAST adds (and removes) elements from the current seed one at a time (and not independently, as in the theoretical algorithm). Heuristically, this strengthens the constructed seed and thus improves the decision base for the next step.
(3) CAST handles more general inputs: it allows the user to specify both a real-valued similarity matrix and a threshold parameter that determines what is considered significantly similar. This parameter controls the number and sizes of the produced clusters.

The input to the algorithm is a pair <S, t>, where S is an n-by-n similarity matrix and t is a similarity cutoff. The clusters are constructed one at a time. The cluster currently under construction is denoted by Copen. The affinity of an element x, denoted by a(x), is defined as the sum of the similarity values between x and the elements in Copen. An element x has high affinity if a(x) >= t|Copen|; otherwise, x has low affinity. An element's status (high or low affinity) therefore depends on Copen. Roughly speaking, CAST alternates between adding high-affinity elements to Copen and removing low-affinity elements from it. When this process stabilizes, Copen is closed and a new cluster is started. Pseudo-code for the algorithm is given in Fig. 5.

The "cleaning" steps in CAST avoid a common shortcoming shared by many popular clustering techniques (such as single linkage, complete linkage, group average, and centroid): because of their "greedy" nature, once a decision to join two clusters is made, it cannot be reversed.

Fig. 5: CAST Algorithm [8]
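As an illustration of Sections 3.1 and 3.2 working together, the following Python sketch computes an LBS-Alignment score by dynamic programming and feeds the resulting similarity matrix to a CAST-style add/remove loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `lbs_alignment`, `cast`, and `max_rounds` are hypothetical, the scaling of TP and SR by the location penalty is an assumption (the text does not say how the terms are combined), the diagonal of the similarity matrix is set to 1, and the sample sequences are invented.

```python
def lbs_alignment(s1, s2, time_len):
    """Similarity of two sequences of (time, location, services) tuples."""
    n, m = len(s1), len(s2)
    p = 0.5 / (n + m)                      # location penalty from Section 3.1
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.5                          # base similarity score
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] - p
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] - p
    for i in range(1, n + 1):
        t1, loc1, sv1 = s1[i - 1]
        for j in range(1, m + 1):
            t2, loc2, sv2 = s2[j - 1]
            best = max(M[i - 1][j], M[i][j - 1]) - p   # skip one transaction
            if loc1 == loc2:
                union = sv1 | sv2
                tp = abs(t1 - t2) / time_len
                sr = len(sv1 & sv2) / len(union) if union else 0.0
                # Assumption: TP and SR are scaled by p before being applied.
                best = max(best, M[i - 1][j - 1] - p * tp + p * sr)
            M[i][j] = best
    return M[n][m]


def cast(S, t, max_rounds=1000):
    """Cluster indices 0..n-1 from similarity matrix S with threshold t."""
    n = len(S)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        c_open, affinity = [], [0.0] * n
        for _ in range(max_rounds):        # cap guards against oscillation
            changed = False
            while unassigned:              # ADD high-affinity elements
                u = max(sorted(unassigned), key=lambda x: affinity[x])
                if affinity[u] < t * len(c_open):
                    break
                c_open.append(u)
                unassigned.remove(u)
                for x in range(n):
                    affinity[x] += S[x][u]
                changed = True
            while c_open:                  # REMOVE low-affinity elements
                v = min(c_open, key=lambda x: affinity[x])
                if affinity[v] >= t * len(c_open):
                    break
                c_open.remove(v)
                unassigned.add(v)
                for x in range(n):
                    affinity[x] -= S[x][v]
                changed = True
            if not changed:
                break
        if not c_open:                     # safety net: never close an empty cluster
            c_open.append(unassigned.pop())
        clusters.append(sorted(c_open))
    return clusters


# Hypothetical sequences: (time, location, set of services)
users = [
    [(5, "A", {"s1"}), (10, "D", {"s2"})],
    [(6, "A", {"s1"}), (11, "D", {"s2"})],
    [(20, "K", {"s5"}), (25, "G", {"s1"})],
    [(21, "K", {"s5"}), (26, "G", {"s1"})],
]
S = [[1.0 if i == j else lbs_alignment(a, b, 32) for j, b in enumerate(users)]
     for i, a in enumerate(users)]
print(cast(S, 0.5))   # users 0 and 1 end up together, as do users 2 and 3
```

With a higher threshold such as `cast(S, 0.9)`, each of these four invented users forms its own cluster, which mirrors the threshold behavior described for CAST.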

3.3 Segmentation of Mobile Transactions

In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation so as to discriminate the characteristics of mobile behaviors in different time segments. A GA-based method is proposed to automatically obtain the most suitable time segmentation table with common mobile behaviors.

Fig. 7 shows the procedure of the time segmentation method, named the Get Number of Time Segmenting Points (GetNTSP) algorithm. The input is a mobile transaction database D and its time length T (line 01). The output is the number of time segmenting points (line 02). For each item, the total number of occurrences at each time point is accumulated (lines 07 to 11), so that each item (location, service) draws a curve of count distribution, as shown in Fig. 8. For every curve, the time point with the largest change rate is found (line 13). The change rate is defined as (c[i+1] - c[i]) / (1 + c[i]), where c[i] is the total number of occurrences of the item at time point i. The occurrences of all these time points are counted (line 15), and the time points whose counts are larger than or equal to the average count are taken as the time point sequence (TPS) (line 17). Within the TPS, the average time distance a between neighboring time points is calculated (line 18). Then the number of neighboring time point pairs whose time distance is larger than a is calculated (lines 19 to 23). The result is the time segmentation count (line 24).

Genetic Algorithm

Once the number of time segmenting points is obtained from the GetNTSP algorithm, a Genetic Algorithm (GA) is used to obtain the most suitable time intervals. A GA is a search heuristic based on the process of natural evolution, reflecting survival of the fittest.
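Before the GA is described further, the counting steps of GetNTSP (lines 13 to 24 of the procedure) can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function names are hypothetical, `largest_change_point` returns the index i before the jump (the text does not say whether i or i+1 is reported), and the exact average count is used rather than a rounded one. The input list is the twelve time points reported in the results for TABLE 1.

```python
from collections import Counter


def change_rate(c, i):
    """Change rate between time points i and i+1 of one count curve c."""
    return (c[i + 1] - c[i]) / (1 + c[i])


def largest_change_point(c):
    """Index i at which the change rate of curve c is largest."""
    return max(range(len(c) - 1), key=lambda i: change_rate(c, i))


def count_segmenting_points(peak_points):
    """Number of time segmenting points from the per-item peak time points."""
    freq = Counter(peak_points)
    avg_count = sum(freq.values()) / len(freq)
    # Time point sequence (TPS): points seen at least as often as the average.
    tps = sorted(t for t, n in freq.items() if n >= avg_count)
    gaps = [b - a for a, b in zip(tps, tps[1:])]
    if not gaps:
        return 0
    avg_gap = sum(gaps) / len(gaps)
    # Count neighboring pairs that are farther apart than the average gap.
    return sum(1 for g in gaps if g > avg_gap)


# Peak time points of the 12 (location, service) pairs listed for TABLE 1.
# The sorted listing in the results shows 20 where this list has 19;
# either value leaves the outcome unchanged.
peaks = [5, 10, 13, 30, 5, 28, 10, 7, 19, 30, 25, 28]
print(count_segmenting_points(peaks))  # 1
```

For this input the TPS is 5, 10, 28, 30, the average gap is about 8.3, and only the pair (10, 28) exceeds it, which agrees with the single segmenting point between 10 and 28 reported in the results.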
GA was initially proposed by John Holland in the early 1970s while researching cellular automata. Evaluating better candidates (phenotypes, in this case time segments) with a GA is an iterative process driven by a fitness function, which yields the fittest time segments for a given random input. The weakest chromosomes become obsolete at the end of each iteration as evolution proceeds through the GA operators described below. Fig. 8 shows the steps of the GA.

Fig. 8: Steps in GA

Fig. 7: GetNTSP Algorithm

Selection

In the selection operator, a proportion of the current time segments is selected in each iteration to produce the next population, resulting in a new generation. Individual chromosomes, i.e., time segments, are selected based on their fitness value. The fitness function measures the quality of the represented solution. The higher the fitness value of a candidate, the higher its probability of being selected.

Crossover

The next step in producing the second generation of chromosomes is crossover, which is essentially recombination. One-point crossover, governed by a crossover probability, is applied. To breed the next generation, a crossover point on both parent chromosomes is randomly selected. All time segments beyond the crossover point are interchanged between the two parent chromosomes.
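The selection and crossover operators described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the paper: a chromosome is assumed to be a list of segmenting time points of equal length, selection is interpreted as fitness-proportional (roulette-wheel) sampling, and the names `select`, `one_point_crossover`, and `p_cross`, as well as the population and fitness values, are made up for the example.

```python
import random


def select(population, fitnesses, k, rng):
    """Fitness-proportional (roulette-wheel) selection of k chromosomes."""
    return rng.choices(population, weights=fitnesses, k=k)


def one_point_crossover(parent1, parent2, rng, p_cross=0.8):
    """Swap everything beyond one random cut point with probability p_cross."""
    if len(parent1) < 2 or rng.random() >= p_cross:
        return list(parent1), list(parent2)        # no crossover: copy parents
    cut = rng.randrange(1, len(parent1))           # cut strictly inside the list
    # Sorting keeps the segmenting points of each child in time order.
    child1 = sorted(parent1[:cut] + parent2[cut:])
    child2 = sorted(parent2[:cut] + parent1[cut:])
    return child1, child2


rng = random.Random(0)
population = [[5, 13, 28], [7, 20, 30], [10, 19, 25]]
fitnesses = [1.2, 1.7, 0.9]                        # made-up fitness values
mother, father = select(population, fitnesses, 2, rng)
print(one_point_crossover(mother, father, rng, p_cross=1.0))
```

A chromosome with a single segmenting point, as in the small example of TABLE 1, has no interior cut point, so this sketch returns the parents unchanged in that case.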

The resulting chromosomes are the children, i.e., the next generation. Only the best time segments from the first generation are selected for breeding, so that a stronger generation is produced.

Fitness Function

A better time interval segmentation results in a higher standard deviation of the frequency table. Therefore, the fitness function of chromosome X is defined in (3.1).

$$\mathrm{Fitness}(X) = \sum_{i=1}^{Len_X + 1} \sqrt{\frac{1}{N_c \times N_s} \sum_{c=1}^{N_c} \sum_{s=1}^{N_s} \left( T_i[c, s] - \overline{T_i} \right)^2} \qquad (3.1)$$

where
$N_c$ is the total number of cells,
$N_s$ is the total number of services,
$T_i[c, s]$ is the request count of cell c and service s in time interval $T_i$, and
$\overline{T_i}$ is the average service request count in time interval $T_i$.

3.4 Mining mobile transactions

A mobile transaction database is complicated, since a huge number of mobile transaction logs are produced by users' mobile behaviors. Data mining is a widely used technique for discovering valuable information in a complex data set, and a number of studies have discussed mobile behavior mining. However, mobile behaviors vary among user clusters and across time intervals. The prediction of mobile behavior will be more precise if the corresponding mobile patterns can be found in each user cluster and time interval. To provide precise location-based services for users, effective mobile behavior mining systems are urgently required. To mine the cluster-based temporal mobile sequential patterns efficiently, a novel method named CTMSP-Mine is proposed. The CTMSP-Mine algorithm can be divided into three main steps:
1) Frequent-Transaction Mining
2) Mobile Transaction Database Transformation
3) CTMSP Mining

Frequent Transaction Mining

In this phase, the frequent transactions (F-Transactions) in each user cluster and time interval are mined by applying a modified Apriori algorithm [2]. At first, the support of each cell and service is counted in each user cluster and time interval according to the user cluster table and the time interval table. The patterns whose support satisfies the user-specified minimal support threshold T_SUP are kept as frequent 1-transactions. A candidate 2-transaction is generated by joining two frequent 1-transactions if their user clusters, time intervals, and cells are the same. The candidates whose support is larger than T_SUP are then kept as frequent 2-transactions. The same procedure is repeated until no candidate transaction is generated. The frequent transactions are shown in Table 2. In addition, a service mapping table is constructed to transform services into the F-Transactions in Table 2. Each service set is represented by a contiguous and unique symbol LS_i (Large Service i). The mapping reduces the time required to check whether a mobile sequential pattern is contained in a mobile transaction sequence. After frequent transaction mapping, the frequent 1-CTMSPs are obtained, as shown in Table 3.

Table 2. Frequent Transactions [1]

Mobile transaction database transformation

In this phase, F-Transactions are used to transform each mobile transaction sequence S into a frequent mobile transaction sequence. According to Table 2, if a transaction T in S is frequent, T is transformed into the corresponding F-Transaction. Otherwise, the cell of T becomes part of a path. Table 4 shows the frequent mobile transaction database transformed from Table 1. The main objectives and advantages are: 1) service sets can be represented by symbols for efficient processing; and 2) transactions whose support is less than the minimal support threshold can be eliminated to reduce the size of the database.
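The level-wise idea behind frequent-transaction mining can be illustrated with the small Python sketch below. It is a simplified, hypothetical rendering rather than the modified Apriori algorithm of the paper: support is assumed to be the number of records in the same user cluster, time interval, and cell whose service set contains the candidate, a single "greater than or equal to" test against the threshold is used at every level, and the function name and sample records are invented.

```python
from collections import defaultdict
from itertools import combinations


def frequent_transactions(records, min_sup):
    """records: iterable of (cluster, interval, cell, services).

    Returns {(cluster, interval, cell, frozenset_of_services): support}.
    """
    baskets = defaultdict(list)
    for cluster, interval, cell, services in records:
        baskets[(cluster, interval, cell)].append(frozenset(services))

    result = {}
    for key, group in baskets.items():
        # Level 1: single services requested in this cluster/interval/cell.
        level = {frozenset([s]) for basket in group for s in basket}
        while level:
            support = {c: sum(1 for b in group if c <= b) for c in level}
            frequent = [c for c in level if support[c] >= min_sup]
            for c in frequent:
                result[key + (c,)] = support[c]
            # Join step: merge frequent k-sets that differ in one service.
            level = {a | b for a, b in combinations(frequent, 2)
                     if len(a | b) == len(a) + 1}
    return result


# Hypothetical records: (user cluster, time interval, cell, services)
records = [
    (1, 1, "F", {"s3", "s4"}),
    (1, 1, "F", {"s3", "s4"}),
    (1, 1, "F", {"s3"}),
    (1, 2, "K", {"s2"}),
]
found = frequent_transactions(records, 2)
for pattern, sup in sorted(found.items(),
                           key=lambda kv: (kv[0][:3], sorted(kv[0][3]))):
    print(pattern[:3], sorted(pattern[3]), sup)
```

Because candidates are only joined inside one (cluster, interval, cell) group, the sketch respects the rule that two frequent 1-transactions are combined only when their user clusters, time intervals, and cells are the same.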

Table 3. Frequent 1 Transactions [1]

CTMSPs Mining

In this phase, all the CTMSPs are mined from the frequent mobile transaction database. Frequent 1-CTMSPs are already obtained in the frequent-transaction mining phase. The mining algorithm uses a two-level tree named the Cluster-based Temporal Mobile Sequential Pattern Tree (CTMSP-Tree). The internal nodes of the tree store the frequent mobile transactions, and the leaf nodes store the corresponding paths. Moreover, every parent node of a leaf node is designed as a hash table that stores the combinations of user cluster tables and time interval tables. The goal of the CTMSP-Tree is to generate candidate mobile sequential patterns efficiently, because the tree can quickly check whether two patterns have the same first and last transactions.

Table 4. Frequent Mobile Transaction Database

The procedure of CTMSP-Tree generation is as follows:
Step 1: CTMSP-Mine generates candidate 2-CTMSPs by hashing each combination of frequent transactions from the frequent mobile transaction sequences in each pair of user cluster and time interval, and stores the results in the CTMSP-Tree.
Step 2: To identify frequent 2-CTMSPs, CTMSP-Mine keeps the candidate patterns whose support is larger than the minimal support threshold.
Step 3: CTMSP-Mine counts the support of candidate 3-CTMSPs and identifies the frequent 3-CTMSPs.
Step 4: Step 3 is repeated until no more candidate patterns can be generated.

3.5 Prediction Strategies

Three prediction strategies for selecting the appropriate CTMSP are proposed to predict the mobile behaviors of users:
1) The patterns are selected only from the cluster the user belongs to;
2) The patterns are selected only from the time interval corresponding to the current time; and
3) The patterns are selected only from those that match the user's recent mobile behaviors.
If more than one pattern satisfies the above conditions, the one with the maximal support is selected. The CTMSPs are thus selected from the corresponding user cluster and time interval.

IV. EXPERIMENTAL RESULTS

After the coding was completed, the implemented algorithms were tested on the sample dataset shown in TABLE 1. All work was done in C# on a 2.4 GHz machine with 4 GB of memory running Windows 7.
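The experiments were implemented in C#; purely as an illustration, the selection rule of Section 3.5 can be sketched in Python as below. The sketch is hypothetical: patterns are held in a flat list instead of a CTMSP-Tree, a pattern is assumed to match when its leading transactions equal the user's most recent ones, and the names `Pattern` and `predict_next` and the sample patterns are invented.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Transaction = Tuple[str, str]              # (cell, service)


class Pattern(NamedTuple):
    cluster: int
    interval: int
    sequence: Tuple[Transaction, ...]      # last element is the predicted behavior
    support: int


def predict_next(patterns: Sequence[Pattern], cluster: int, interval: int,
                 recent: Sequence[Transaction]) -> Optional[Transaction]:
    """Pick the matching pattern with maximal support and return its last step."""
    best = None
    for p in patterns:
        if p.cluster != cluster or p.interval != interval:
            continue                        # strategies 1 and 2
        premise = p.sequence[:-1]
        if not premise or tuple(recent[-len(premise):]) != premise:
            continue                        # strategy 3
        if best is None or p.support > best.support:
            best = p                        # tie-break by maximal support
    return best.sequence[-1] if best else None


# Made-up patterns for two user clusters in time interval 1.
patterns = [
    Pattern(1, 1, (("D", "s2"), ("A", "s1")), 3),
    Pattern(1, 1, (("D", "s2"), ("K", "s5")), 2),
    Pattern(2, 1, (("D", "s2"), ("F", "s3")), 9),
]
print(predict_next(patterns, 1, 1, [("B", "s1"), ("D", "s2")]))  # ('A', 's1')
```

The pattern with support 9 is ignored here because it belongs to a different user cluster, so the prediction comes from the best-supported pattern within the user's own cluster and time interval.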

This chapter is organized as follows. Section 1 gives results for the dataset shown in TABLE 1. The results of LBS-Alignment are shown in subsection 1.1, and subsection 1.2 gives the results of the CAST algorithm. Subsection 1.3 gives the results of the time segmenting algorithm. Subsections 1.4, 1.5, and 1.6 give the results of the GA, of CTMSP mining, and of the prediction strategies, respectively.

1. Results for dataset shown in TABLE 1 (stepwise results)

The dataset in TABLE 1 is used for the first test of the algorithms, to check whether they give the correct results as provided in [1].

1.1 Results obtained for LBS Alignment Algorithm

LBS-Alignment is based on the consideration that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar (refer section 3.1.1). Based on this concept, the time penalty (TP) and the service reward (SR) in LBS-Alignment are specifically designed. The base similarity score is set to 0.5. Two mobile transactions can be aligned if their locations are the same. Otherwise, a location penalty is generated to decrease their similarity score.

Input: Mobile transaction sequences from TABLE 1
Output: Similarity matrix

TABLE 5 shows the similarities between the 7 users. The first row and first column show the user ids. From the second row and second column onward, each M_{i,j} shows the similarity between the i-th and j-th users.

TABLE 5: Similarity Matrix for 7 users

1.2 Results of CAST algorithm

In the CAST algorithm, various threshold values (pi) ranging from 0 to 1 are applied (refer section 3.1.2). The main idea is to narrow down the range of pi effectively. Therefore, a total of four threshold values are applied. The clustering results are shown in the figures below.

Input: N-by-N similarity matrix
Output: Clusters

1) For pi = 0.25, the obtained clusters are shown in Fig. 9. Two clusters are generated. The first cluster C1 contains four users with ids {1, 4, 2, 7}, and the second cluster C2 contains three users with ids {3, 5, 6}.

Fig. 9: Clusters with pi=0.25

2) For pi = 0.50, the obtained clusters are shown in Fig. 10. Six clusters are generated. The first cluster C1 contains two users with ids {1, 4}, C2 contains user {2}, C3 contains user {3}, C4 contains user {5}, C5 contains user {6}, and C6 contains user {7}.

Fig. 10: Clusters generated with pi=0.50

3) pi = 0.75 and pi = 1.0 give the same clusters, shown in Fig. 11. The number of clusters formed equals the number of users: as the similarity between any two users is not more than 0.75, every user forms its own cluster. C1 contains user {1}, C2 contains user {2}, and so on up to C7, which contains user {7}.

Fig. 11: Clusters generated with pi=0.75 and pi=1.0

In all four cases the CAST algorithm outputs some clusters, and the results depend on the value of the threshold. If the threshold is small, fewer clusters are formed. As the threshold increases, more clusters are formed.

1.3 Results of Time Segmenting Algorithm

Input: Mobile transaction database and time length (refer section 3.2).
Output: Number of time segmenting points.

Fig. 12: Accumulative distributions [1]

Fig. 12 shows the accumulative distributions of each location-service pair for the mobile transaction database in TABLE 1. There are 12 pairs of locations and services, and their time points with the largest change rates are 5, 10, 13, 30, 5, 28, 10, 7, 19, 30, 25, and 28. The same values are shown in TABLE 6.

TABLE 6: TIME POINTS WITH LARGEST CHANGE RATES

(Location, Service)    Maximum time change rate
(A, S1)                5
(D, S2)                10
(F, {S3, S4})          13
(K, S2)                30
(B, S1)                5
(D, S4)                28
(H, S3)                10
(C, S3)                7
(K, S5)                19
(A, S3)                30
(G, S1)                25
(F, S3)                28

These time points can be sorted as 5(2), 7(1), 10(2), 13(1), 20(1), 25(1), 28(2), and 30(2), where t(n) indicates that time point t occurs n times. The time points are arranged by frequency in TABLE 7.

TABLE 7: FREQUENCY OF TIME POINTS WITH LARGEST CHANGE RATE
Maximum time change rates    Frequency

The time point sequence whose frequency is equal to or greater than the average number of time points (refer section 3.2) is shown in TABLE 8. Here, the average number of time points is 2.

TABLE 8: TIME POINT SEQUENCE
Maximum time change rates    Frequency

Output: one time segmenting point, located between time points 10 and 28. This result gives the number of time slots in which users request services of the same type. In this example there were only 7 users and limited log entries, so only 1 time segmenting point was obtained.

1.4 Results of GA

Input: Number of time segmenting points and slot limits (refer section 3.3).
Output: Time segmenting points.

The frequency table obtained after selecting chromosomes is shown in Fig. 13.

10 990 Fig.13 Frequency tables of {13} and {20}. (a) T1: [1-13], T2: [14-32]. (b) T1: [1-20], T2: [21-32] Fitness values for {13} =1.738 and for {20} = After applying repeated crossover, finally best fitness obtained is {20} 1.5 Results of CTMSP Mining Input: Frequent transaction table (refer section 3.4) Output: CTMSP trees Step by step generated CTMSP trees are shown in Fig. 14: Fig. 14: Two CTMSP Tree After applying linear trimming technique Three CTMPs tree is generated as shown in Fig 15 Fig.15: Final CTMSP Tree Results of Prediction Strategies Final output of the system is the prediction (refer section 3.5). This framework gives next probable location and service users may request, using CTMSP tree. If network provider asks for next location of user 1, then it gives next location is A and service is 1.In this way we get predictions for all users. V. CONCLUSION AND FUTURE WORK In this paper, various algorithms for mining mobile transactions are proposed. Smart CAST algorithm and GA can be used for clustering and time segmentation approach respectively. Along with this, Cluster-Based Temporal Mobile Sequential Patterns mining technique can be used to predict next probable location of user. Using various evaluation measures we are going to test performance of the algorithms. Further work can be extended by improving time segmenting technique. Also CTMSP mining method can be applied to real datasets. In addition, the CTMSP-Mine can be used with other applications, such as GPS navigations, with the aim to enhance precision for predicting user behaviors. REFERENCES [1] Eric Hsueh-Chan Lu, Vincent S.Tseng and Philip S. Yu, Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environment, IEEE Trans. Knowledge and Data engineering, vol. 23, no. 6, June [2] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rule between Sets of Items in Large Databases, Proc. ACM SIGMOD Conf. Management of Data, pp , May [3] L. Chen, M. 
Tamer Özsu, and V. Oria, "Robust and Fast Similarity Search for Moving Object Trajectories," Proc. ACM SIGMOD Conf. Management of Data, June.
[4] M.-S. Chen, J.-S. Park, and P.S. Yu, "Efficient Data Mining for Path Traversal Patterns," IEEE Trans. Knowledge and Data Eng., vol. 10, no. 2, Apr.
[5] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, "Basic Local Alignment Search Tool," J. Molecular Biology, vol. 215, no. 3, Oct.
[6] L. Chen and R. Ng, "On the Marriage of Lp-Norms and Edit Distance," Proc. 30th Int'l Conf. Very Large Databases, Aug.
[7] L. Chen, M. Tamer Özsu, and V. Oria, "Robust and Fast Similarity Search for Moving Object Trajectories," Proc. ACM SIGMOD Conf. Management of Data, June.
[8] A. Ben-Dor and Z. Yakhini, "Clustering Gene Expression Patterns," J. Computational Biology, vol. 6, no. 3, July 1999.
[9] J. Han and M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, Sept.
[10] H. Jeung, Q. Liu, H.T. Shen, and X. Zhou, "A Hybrid Prediction Model for Moving Objects," Proc. 24th Int'l Conf. Data Eng., Apr.
[11] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, Mar.
[12] S.C. Lee, J. Paik, J. Ok, I. Song, and U.M. Kim, "Efficient Mining of User Behaviors by Temporal Mobile Access Patterns," Int'l J. Computer Science Security, vol. 7, no. 2, Feb.
[13] Y.B. Lin, "GSM Network Signaling," ACM Mobile Computing and Comm., vol. 1, no. 2, July.
[14] A. Monreale, F. Pinelli, R. Trasarti, and F. Giannotti, "WhereNext: A Location Predictor on Trajectory Pattern Mining," Proc. 15th Int'l Conf. Knowledge Discovery and Data Mining, June 2009.
[15] W.C. Peng and M.S. Chen, "Developing Data Allocation Schemes by Incremental Mining of User Moving Patterns in a Mobile Computing System," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 1, Feb.
[16] Y.B. Lin, "GSM Network Signaling," ACM Mobile Computing and Comm., vol. 1, no. 2, July.
[17] V.S. Tseng, H.C. Lu, and C.H. Huang, "Mining Temporal Mobile Sequential Patterns in Location-Based Service Environments," Proc. 13th IEEE Int'l Conf. Parallel and Distributed Systems, pp. 1-8, Dec.
[18] V.S. Tseng and C. Kao, "Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 2, no. 4, Oct.-Dec.
[19] J. Veijalainen, "Transaction in Mobile Electronic Commerce," Proc. Eighth Int'l Workshop Foundations of Models and Languages for Data and Objects, Sept.
[20] V.S. Tseng and W.C. Lin, "Mining Sequential Mobile Access Patterns Efficiently in Mobile Web Systems," Proc. 19th Int'l Conf. Advanced Information Networking and Applications, Mar.
[21] C.H. Yun and M.S. Chen, "Mining Mobile Sequential Patterns in a Mobile Commerce Environment," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 2, Mar.
[22] Wen-Chih Peng and Ming-Syan Chen, "Allocation of Shared Data Based on Mobile User Movement," Proc. Third Int'l Conf. Mobile Data Management, 2002.
