Identifying The Stay Point Using GPS Trajectory of Taxis Hao Xiao 1,a, Wenjun Wang 2,b, Xu Zhang 3,c

Applied Mechanics and Materials Online: 2013-08-08 ISSN: 1662-7482, Vols. 353-356, pp 3511-3515 doi:10.4028/www.scientific.net/amm.353-356.3511 © 2013 Trans Tech Publications, Switzerland

1 Tianjin Key Laboratory of Cognitive Computing and Application, 2 School of Computer Science and Technology, Tianjin University, Tianjin, China

a xiaohao1989825@163.com, b wwj@pku.org.cn, c xiaoyaoliu@tju.edu.cn

Keywords: GPS, Trajectory, Stay Point, CB-SMoT, POI, Taxi

Abstract. With the widespread use of location-aware personal mobile devices, large amounts of trajectory data are being produced and can be used in information services. These data capture patterns of human behavior and have attracted the interest of many researchers. The key to mining travel information from trajectory data is stay point recognition and semantic annotation. To overcome the poor adaptability and noise resistance of existing stay point identification methods, and taking into account the basic characteristics of taxi GPS data, we propose a method with a parameter optimization strategy that extracts stay points from a single trajectory. The experimental figures show that it works well, with high precision and recall and strong adaptability. Building on this result, we then apply a refined clustering method based on clustering radius and frequency parameters to obtain the POI results.

Introduction

A track records user activity in the real world, and these activities reflect, to a certain extent, personal intentions, preferences, and behavior patterns [1]. If the regularities of an individual's life are understood, a location-based service can give users effective recommendations at the appropriate time [2].
Simple route visualization and trajectory exchange do not fully uncover the knowledge hidden in the data. One of the most important steps of trajectory information mining is to model the user's trajectory at the semantic level, recognizing the stay points and finding the related Points Of Interest (POIs). With the development of Internet of Things technology, many vehicles are equipped with GPS devices [3]. In this article we experiment with taxi GPS data to confirm the effect of the method in a more general application setting.

A GPS trajectory usually consists of a series of timestamped coordinate points, each containing basic information such as latitude and longitude. From such a path, an algorithm can detect the places where the user stayed. A stay point does not refer to a point of zero speed but to a group of actual GPS points, as shown in Figure 1.

Figure 1. Stay points. Points a, b, c, d, e form one stay point, and points f, g, h, i, j form another.

A stay point indicates that the user remained within a certain area for longer than a certain time. Compared with other GPS points, these spots carry more important semantic information, for example that the user has been to a restaurant or a cinema. A user's historical trajectory can then be represented as a sequence of stay points, which provides the basis for user behavior modeling and significantly reduces the amount of data to process.
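As an illustration of the idea that a stay point is a group of GPS points rather than a single zero-speed fix, the following minimal Python sketch collapses such a group into one record. The class names `GpsPoint` and `StayPoint`, the function `summarize_stay`, and the choice of the mean position as the representative location are assumptions made for this example; the paper does not prescribe a representation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class GpsPoint:
    """One timestamped GPS fix (hypothetical structure for this sketch)."""
    lat: float
    lon: float
    time: datetime


@dataclass
class StayPoint:
    """A stay summarized by a representative position and a time span."""
    lat: float
    lon: float
    arrive: datetime
    leave: datetime


def summarize_stay(points: List[GpsPoint]) -> StayPoint:
    """Collapse a group of GPS points into one stay point.

    Assumption: the representative position is the mean latitude and
    longitude of the group, which is reasonable only for small areas.
    """
    if not points:
        raise ValueError("a stay point needs at least one GPS point")
    lat = sum(p.lat for p in points) / len(points)
    lon = sum(p.lon for p in points) / len(points)
    arrive = min(p.time for p in points)
    leave = max(p.time for p in points)
    return StayPoint(lat, lon, arrive, leave)
```

Representing a trajectory as a list of such `StayPoint` records is one way to obtain the reduced point sequence that the later clustering stage works on.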

Based on the continuity of GPS track states, this article proposes a clustering method that uses space-time relationships and is more adaptable to large-scale taxi trajectory data. Studying the POIs of an area requires GPS data from many different subjects (such as people or vehicles). However, to model a user's behavior, which is fundamental and important, we need to reveal the behavioral intention hidden in a single trajectory. Extracting the stay point sequence from a trajectory is therefore the first and vital step of our algorithm. The computation in this step is delicate, because the distance between consecutive track points is small and clusters must be combined on the basis of temporal or spatial similarity.

In the second stage we gather the stay points of all trajectories into one large point set, in which each point represents a stop on some track. We then assess each point by its frequency, that is, how many times the stay appears in different sequences. We cluster these single-track stay points again to eliminate repetition: the computed stops may be scattered around a building, while our goal is resolution at the level of a building, so stay points within a certain range are merged into one cluster to obtain the final result.

Related Works

Traditional stay point recognition uses a point that remains stationary for a fixed time as the basis for identification [5]. Some studies treat a 180-degree change in direction as one of the important marks of a stay [6]. Others use signal loss and the time difference between points as features for identification [7].
However, these three kinds of methods are all based on single track points. Their threshold logic on space-time characteristics is simple and their resistance to noise is poor, so they often fail to reach the desired accuracy and need other methods or manual assistance. We observe that during a stay the spatial density of track points is higher than during movement. Better methods therefore partition the trajectory by density, testing the common features of adjacent trajectory points. These methods mainly include the K-value algorithm, the DBSCAN algorithm, the DJ-Cluster algorithm, and the CB-SMoT algorithm. Because of single-point noise, K-value is very likely to fail to recognize a stay sequence. DJ-Cluster is a classical density algorithm that improves on DBSCAN clustering. It assumes that the GPS trajectory is recorded without gaps and does not take data loss into account, but in practice data loss in GPS trajectories is quite common. CB-SMoT changes the DBSCAN neighborhood criterion from a point count to a time threshold, which removes the earlier assumption about the recording interval and can handle missing data [8]. However, its neighborhood is computed by accumulating the distances between adjacent points, so it is strongly affected by drift in GPS trajectories; when the drift during a stay is large, the effect on accuracy can be fatal. Because of the distance accumulation and the inevitable interference within the neighborhood, a long drift splits one stay into several separate ones, so the identification accuracy is still limited.

Data Processing

The experiment uses taxi GPS data from the city of Tianjin. These data are geographic records with timestamps, which we converted into the GPX file format.
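The conversion of timestamped records into GPX can be sketched as follows. This is only an illustration using the Python standard library: the function name `records_to_gpx`, the `(lat, lon, time)` tuple layout, and the `creator` string are assumptions, since the paper does not describe the format of the raw Tianjin records or the tool used for conversion. The element names (`gpx`, `trk`, `trkseg`, `trkpt`, `time`) follow the GPX 1.1 schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

GPX_NS = "http://www.topografix.com/GPX/1/1"


def records_to_gpx(records, creator="taxi-gps-sketch"):
    """Serialize (lat, lon, datetime) tuples as a single-track GPX 1.1 string.

    Assumptions: records are already sorted by time, and naive datetimes
    are treated as UTC.
    """
    gpx = ET.Element("gpx", {"version": "1.1", "creator": creator, "xmlns": GPX_NS})
    trk = ET.SubElement(gpx, "trk")
    seg = ET.SubElement(trk, "trkseg")
    for lat, lon, t in records:
        if t.tzinfo is None:
            t = t.replace(tzinfo=timezone.utc)
        pt = ET.SubElement(seg, "trkpt", {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
        # GPX timestamps are ISO 8601 in UTC.
        stamp = t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        ET.SubElement(pt, "time").text = stamp
    return ET.tostring(gpx, encoding="unicode")
```

Fields that GPX does not define, such as a passenger flag, would need a GPX `extensions` element or a separate file; this sketch leaves them out.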
As shown in Figure 2, the red points mark the recorded GPS locations, and the yellow lines connect trajectory points that are adjacent in time. We preprocess the data by setting the time interval as small as possible and filling in the missing parts, which improves detection accuracy. The taxi track data itself also contains additional information. It records whether the taxi is carrying passengers, so the switch between the loaded and unloaded states accurately marks a moment at which the taxi stopped. These boarding and alighting locations are usually the origin or destination of the passenger's trip. If passengers get on or off within a certain area more often than a certain frequency, the region is likely to be a point of interest.
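The use of the passenger flag can be illustrated with a short sketch that reports every switch between the loaded and unloaded states. The record layout `(lat, lon, unix_time, occupied)` and the function name `passenger_events` are assumptions for this example; the actual field names of the taxi data are not given in the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (lat, lon, unix_time, occupied)
Record = Tuple[float, float, float, bool]
Event = Tuple[str, float, float, float]


def passenger_events(records: Iterable[Record]) -> List[Event]:
    """Return a pickup or dropoff event at each change of the occupied flag.

    Records are assumed to be sorted by time. The event is placed at the
    first record that carries the new state.
    """
    events: List[Event] = []
    prev = None
    for lat, lon, t, occupied in records:
        if prev is not None and occupied != prev:
            kind = "pickup" if occupied else "dropoff"
            events.append((kind, lat, lon, t))
        prev = occupied
    return events
```

Counting how often such events fall within the same small area is one way to apply the frequency criterion described above.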

Figure 2. Stay points and trajectories

In fact, the overlap of the stop positions of different taxis gives us a chance to discover and verify stay points: a computed stay point is probably a point of interest, and a point of interest should be independent of any single sample trajectory.

Methodology

Suppose we have obtained the trajectory sequences of different terminals and clustered each trajectory, which completes the preliminary modeling of individual behavior and forms a collection of stay points. The problems to overcome in this process are the effect of noise on the clustering results and the need to fully consider the characteristics of the trajectory itself, namely its continuity in time and space. Starting from the basic CB-SMoT algorithm, we made improvements in two respects.

Figure 3. Algorithm for detecting stay points.

As shown in Figure 3, the first improvement is the optimization of the Eps parameter. The original CB-SMoT algorithm computes Eps with the Gauss formula: it calculates the distances between all adjacent points, puts them into an array, computes the array's mean u and standard deviation r, sets a parameter a (0 < a < 1), and uses the Gauss formula to calculate Eps. As a result, when a = 0.5, Eps = u; as a increases Eps increases, and as a decreases Eps decreases, so Eps floats above and below the mean. This approach is flawed. Some taxis often stay at the same point, which makes the mean distance u very small, even as small as 1 m. For other taxis, because of signal loss, the mean distance is very large, even more than 500 m. An Eps that is too large or too small is very bad for computing stops: it reduces recall, which in turn reduces precision.

The optimized algorithm is shown in Figure 3. We apply the following simple optimization to Eps: if the Eps computed by the Gauss formula is less than 30 m, it is adjusted to 100 m; if Eps is larger than 500 m, it is adjusted to 350 m; and if Eps is smaller than 10 m, it is adjusted to 100 m. Other optimization methods are possible, but we found this one to be the most effective and adaptive.

The second improvement concerns clusters that are adjacent in time: if the time interval between two clusters is less than mintime and they intersect in space, the two clusters are combined. The result is a collection of point clusters for each trajectory.

Two parameters are required for the combination: the clustering radius and the frequency. Points within a certain range, such as a single building, are treated as one point, and points with very low frequency are eliminated, which yields the output point set. The higher the frequency, the more likely a stay point is a point of interest.

Data Analysis

We ran repeated experiments with different methods on data from different terminals, verifying the effect of each method under different parameter thresholds. Our purpose is to compare the methods in settings where precision and recall are both relatively high.

Figure 4. The statistical result in K-value and CB-SMoT

For the K-value algorithm, after many experiments we find that the clustering of taxi data is relatively good with d = 50 m. As shown in Figure 4, with m = 12 we can accurately identify all points; decreasing m causes a decline in precision, and increasing m causes a decline in recall.
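To make the Eps adjustment from the Methodology section concrete, the following sketch computes Eps from the distances between adjacent points and then applies the clamping rule. It rests on several assumptions: the "Gauss formula" is read as the inverse normal cumulative distribution evaluated at a, which matches the stated behavior that a = 0.5 gives Eps = u; the population standard deviation is used; and the function names `gauss_eps` and `adjust_eps` are hypothetical. The thresholds are those given in the text, and the separate rule for Eps below 10 m is already covered by the rule for Eps below 30 m.

```python
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def gauss_eps(distances: Sequence[float], a: float) -> float:
    """Eps as the a-quantile of a normal distribution fitted to the distances.

    Assumption: this is the interpretation of the Gauss formula; with
    a = 0.5 the result equals the mean distance u.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    u = mean(distances)
    r = pstdev(distances)
    if r == 0.0:
        # All distances are equal, so every quantile equals the mean.
        return u
    return NormalDist(u, r).inv_cdf(a)


def adjust_eps(eps: float) -> float:
    """Clamp Eps (in meters) with the thresholds stated in the text."""
    if eps < 30.0:
        return 100.0
    if eps > 500.0:
        return 350.0
    return eps


def optimized_eps(distances: Sequence[float], a: float) -> float:
    """Gauss-formula Eps followed by the adjustment step."""
    return adjust_eps(gauss_eps(distances, a))
```

For a taxi that mostly idles at one spot, the adjacent-point distances are around 1 m, so the raw Eps would be a few meters and `optimized_eps` returns 100 m instead.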
For the CB-SMoT algorithm on taxi data, a = 0.7 achieves relatively good results. At this value, with mintime = 250 s we can accurately identify all the stop points; as mintime rises, precision falls slightly, and as mintime falls, recall drops significantly.

Figure 5. The statistical result of CB-SMoT and Refined Algorithm

For the improved algorithm, to observe the influence of the two parameters on identification accuracy, we first fix mintime = 350 s. As shown in Figure 5, if a is less than 0.86 we can accurately identify all the real stops and the recall ratio is above 80%, which is a very clear improvement in performance. If a is greater than 0.86, the algorithm still performs well to a certain extent. We then fix a = 0.7 to measure the influence of different mintime values on the identification results. When mintime is 250 s or greater, all the stay points are identified, and the recall ratio is more stable than with the basic algorithm. When mintime drops, we can still accurately identify all points; accuracy is slightly lower, but not noticeably so. Overall, the algorithm is considerably more adaptable and stable than the original and obtains higher recall and precision.

Conclusion

Combining the basic characteristics of taxi GPS data, we proposed an optimization strategy on top of the basic CB-SMoT algorithm to extract stay points from a single trajectory; the figures show that it works well, with high precision and strong adaptability. The innovation is that, through experiments, we found a way to refine the Eps parameter for taxi GPS trajectories. Experiments on Tianjin taxi GPS data show that the method substantially improves both recall and precision. Building on this result, we apply a refined clustering method based on frequency to obtain the POIs.

Acknowledgements

We would like to express our gratitude to the National Science and Technology Pillar Program (2013BAK02B00, 2013BAK02B06, 2012BAK03B00 and 2012BAK03B06), the Major Research Plan of the National Natural Science Foundation (91224009), and the Science and Technology Pillar Program of Tianjin Municipal Science and Technology Commission (12CZDSF07200).

References

[1] Yang Ye, Yu Zheng, Yukun Chen, Jianhua Feng, Xing Xie, Mining Individual Life Pattern Based on Location History, Proceedings of the 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, p.1-10, May 18-20, 2009.
[2] Yu Zheng, Like Liu, Longhao Wang, Xing Xie, Learning transportation mode from raw GPS data for geographic applications on the web, Proceedings of the 17th International Conference on World Wide Web, April 21-25, 2008, Beijing, China.
[3] Yu Zheng, Longhao Wang, Ruochi Zhang, Xing Xie, Wei-Ying Ma, GeoLife: Managing and Understanding Your Past Life over Maps, Proceedings of the Ninth International Conference on Mobile Data Management, p.211-212, April 27-30, 2008.
[4] Spaccapietra S, Parent C, Damiani M L, et al. A conceptual view on trajectories [J]. Data & Knowledge Engineering, 2008, 65: 126-146.
[5] Nadine Schuessler, Kay W. Axhausen. Processing raw data from Global Positioning Systems without additional information [J]. Transportation Research Record: Journal of the Transportation Research Board, pp. 28-36, 2009.10.6.
[6] Andrey Tietbohl Palma, Vania Bogorny, Bart Kuijpers, et al. A clustering-based approach for discovering interesting places in trajectories [A]. Proceedings of the ACM Symposium on Applied Computing, Advances in Spatial and Image-Based Information Systems Track [C], Fortaleza, Brazil, 16-20 March, 2008, pp. 863-868.
[7] Jianhe Du, Lisa Aultman-Hall. Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: automatic trip end identification issues [J]. Transportation Research Part A, 2007, 41: 220-232.
[8] Zhihua Zhang, Minhe Ji. Hierarchical segmentation for identifying activity stop from GPS trajectories. IEEE Transactions on Engineering Management, 2010, 57: 9-21.

Advances in Civil and Industrial Engineering 10.4028/www.scientific.net/AMM.353-356 Identifying the Stay Point Using GPS Trajectory of Taxis 10.4028/www.scientific.net/AMM.353-356.3511