Identifying The Stay Point Using GPS Trajectory of Taxis Hao Xiao 1,a, Wenjun Wang 2,b, Xu Zhang 3,c


Applied Mechanics and Materials Online: 2013-08-08
ISSN: 1662-7482, Vols. 353-356, pp 3511-3515
doi:10.4028/www.scientific.net/amm.353-356.3511
2013 Trans Tech Publications, Switzerland

Identifying The Stay Point Using GPS Trajectory of Taxis

Hao Xiao 1,a, Wenjun Wang 2,b, Xu Zhang 3,c
1 Tianjin Key Laboratory of Cognitive Computing and Application, 2 School of Computer Science and Technology, Tianjin University, Tianjin, China
a xiaohao1989825@163.com, b wwj@pku.org.cn, c xiaoyaoliu@tju.edu.cn

Keywords: GPS, Trajectory, Stay Point, CB-SMoT, POI, Taxi

Abstract. With the widespread use of location-aware personal mobile communication devices, a large amount of trajectory data is produced and can be used in information services. These data carry information about patterns of human behavior and have attracted the interest of numerous researchers. The key to mining travel information from trajectory data is stay point recognition and semantic annotation. To overcome the poor adaptability and noise resistance of existing stay point identification methods, and taking into account the basic characteristics of taxi GPS data, we propose a method with a parameter optimization strategy that extracts the stay points from a single trajectory. The figures show that it works well, with strong adaptability and high recall and precision ratios. Building on this result, we then apply a refined clustering method based on clustering radius and frequency parameters to obtain the POI results.

Introduction

A trajectory records user activity in the real world, and these activities reflect, to a certain extent, personal intentions, preferences, and behavior patterns [1]. If the regularities of an individual's life are understood, a location-based service can give users effective recommendations at the appropriate time [2].
Simple route visualization and trajectory exchange do not fully uncover the hidden knowledge. One of the most important steps of trajectory information mining is to model the user's trajectory at the semantic level, recognizing the stay points and finding the related Points Of Interest (POI). Along with the development of Internet of Things technology, many vehicles are equipped with GPS devices [3]. In this article, we use taxi GPS data for our experiments, to confirm the effect of the approach in a more general application setting. A GPS trajectory usually consists of a series of coordinate points with timestamps. Each point contains basic information such as latitude and longitude coordinates. Along this path, an algorithm can detect the places where the user stayed. A stay point does not refer to a point of zero speed; it is formed by a group of actual GPS points, as shown in Figure 1.

Figure 1. Stay points. Points a, b, c, d, e form a stop point, and points f, g, h, i, j form a stay point.

A stay point indicates that the user remained in a certain area for longer than a certain time range. Compared with other GPS points, these points contain more important semantic information, such as the restaurants and cinemas the user has been to. With these points, a user's historical trajectory can be represented as a sequence of points, which provides the basis for user behavior modeling and significantly reduces the amount of data to process.
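As a minimal sketch of the data model described above, the following Python fragment represents a trajectory as timestamped coordinate points and collapses a group of such points into one stay point. The names GpsPoint, StayPoint and summarize_stay are hypothetical, and representing the stay by the mean coordinate of its points is an assumption made for illustration; the paper does not specify how a stay point is summarized.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GpsPoint:
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    t: float    # timestamp in seconds


@dataclass(frozen=True)
class StayPoint:
    lat: float      # representative latitude (assumed: mean of the group)
    lon: float      # representative longitude (assumed: mean of the group)
    arrive: float   # earliest timestamp in the group
    leave: float    # latest timestamp in the group
    n_points: int   # number of raw GPS points behind this stay


def summarize_stay(points: List[GpsPoint]) -> StayPoint:
    """Collapse a group of GPS points into a single stay point."""
    if not points:
        raise ValueError("a stay point needs at least one GPS point")
    n = len(points)
    return StayPoint(
        lat=sum(p.lat for p in points) / n,
        lon=sum(p.lon for p in points) / n,
        arrive=min(p.t for p in points),
        leave=max(p.t for p in points),
        n_points=n,
    )
```

Replacing each group of raw points by one such record is what reduces a long trajectory to a short sequence of stays.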

Based on the continuity of GPS track status, this article proposes a clustering method that relies on the space-time relationship and adapts better to large-scale taxi trajectory data. Studying the POIs of an area requires GPS data from many different subjects (such as persons or vehicles). However, to model a user's behavior, which is fundamental and important, we need to reveal the behavioral intention hidden in a single trajectory, that is, to extract the stay point sequence from the trajectory. This is the first and vital step of our algorithm. The calculation in this step is delicate, because the distance between consecutive track points is small and clusters must be combined based on time or space similarity. In the second stage, we gather the stay points obtained from all trajectories into one large point set, in which each point represents a stop on some track. We then judge the points according to their frequency, that is, how many times the stay appears in different sequences. We cluster these single-track stay points again in order to eliminate repetition: the calculated stops may lie around a building while our target is the building itself, so clusters that fall within a certain range are merged to obtain the final result.

Related Works

Traditional stay point recognition uses a point that remains stationary for a constant time as the basis of identification [5]. Some studies treat a 180-degree change of direction as one of the important marks of a stay [6]. Others identify stays from the loss of the GPS signal and the time difference between recorded points [7].
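To make the idea of single-point threshold rules concrete, the sketch below flags candidate stays in two ways: a long time gap between consecutive fixes (in the spirit of the signal-loss rule) and a near reversal of heading (in the spirit of the 180-degree rule). The function names, the tuple layout (latitude, longitude, time in seconds) and the 150-degree default tolerance are assumptions for illustration and are not taken from the cited works.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (lat_deg, lon_deg, t_seconds)


def bearing_deg(a: Fix, b: Fix) -> float:
    """Initial compass bearing from fix a to fix b, in [0, 360)."""
    lat1, lat2 = math.radians(a[0]), math.radians(b[0])
    dlon = math.radians(b[1] - a[1])
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def gap_candidates(track: List[Fix], max_gap_s: float) -> List[int]:
    """Indices i where the time from fix i to fix i+1 exceeds max_gap_s."""
    return [i for i in range(len(track) - 1)
            if track[i + 1][2] - track[i][2] > max_gap_s]


def reversal_candidates(track: List[Fix], min_turn_deg: float = 150.0) -> List[int]:
    """Indices i where the heading into fix i and out of it differ by >= min_turn_deg."""
    hits = []
    for i in range(1, len(track) - 1):
        b_in = bearing_deg(track[i - 1], track[i])
        b_out = bearing_deg(track[i], track[i + 1])
        # Smallest absolute angle between the two headings, in [0, 180].
        turn = abs((b_out - b_in + 180.0) % 360.0 - 180.0)
        if turn >= min_turn_deg:
            hits.append(i)
    return hits
```

Both rules look at one or two neighbouring fixes only, so a single noisy fix can create or hide a candidate.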
However, these three kinds of methods are based on single track points and apply simple threshold logic to space-time characteristics. Their noise resistance is poor, they often fail to reach the desired accuracy, and they need other methods or manual assistance. We observe that, during a stay, the spatial density of track points is higher than during moves. Some better methods therefore adopt a density-based partition that tests the common features of adjacent trajectory points. These methods mainly include the K-value algorithm, the DBSCAN algorithm, the DJ-Cluster algorithm, and the CB-SMoT algorithm. Because of the noise of single points, K-value is very likely to fail to recognize a stay sequence. DJ-Cluster is a classical density algorithm that improves on DBSCAN clustering. This algorithm assumes regularly recorded GPS trajectories and does not take data loss into account, but in practice GPS trajectory data loss is quite common. The CB-SMoT algorithm changes the neighborhood criterion of DBSCAN from a point count to a time threshold, which removes the earlier interval assumption and allows it to handle missing data [8]. However, its neighborhood computation accumulates the distances between adjacent points, so it is strongly affected by GPS drift during a stay; when the drift is far, the effect on accuracy can be fatal. Because of this distance accumulation and the inevitable interference within the neighborhood, a long drift splits a stay into multiple separate stays, so the identification accuracy is still limited.

Data Processing

The experiment uses taxi GPS data from Tianjin city. These data are geographic records with timestamps, which we converted into the GPX file format.
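The conversion step can be sketched as follows, assuming each record has already been parsed into a (latitude, longitude, Unix time in seconds) tuple. The paper does not describe the raw record layout or the tool used, so the function name track_to_gpx, the creator string and the input layout are hypothetical; the element names (gpx, trk, trkseg, trkpt, time) follow the GPX 1.1 schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Iterable, Tuple

GPX_NS = "http://www.topografix.com/GPX/1/1"


def track_to_gpx(fixes: Iterable[Tuple[float, float, float]],
                 creator: str = "taxi-demo") -> str:
    """Serialize (lat, lon, unix_seconds) fixes as a single-track GPX 1.1 document."""
    ET.register_namespace("", GPX_NS)  # emit GPX as the default namespace
    gpx = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": creator})
    trk = ET.SubElement(gpx, f"{{{GPX_NS}}}trk")
    seg = ET.SubElement(trk, f"{{{GPX_NS}}}trkseg")
    for lat, lon, ts in fixes:
        pt = ET.SubElement(seg, f"{{{GPX_NS}}}trkpt",
                           {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc)
        # GPX timestamps are UTC in ISO 8601 form.
        ET.SubElement(pt, f"{{{GPX_NS}}}time").text = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
    return ET.tostring(gpx, encoding="unicode")
```

Extra fields such as the passenger status would need a GPX extensions element, which this sketch leaves out.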
As shown in Figure 2, the red points are the recorded GPS locations, and the yellow line connects trajectory points that are adjacent in time. We preprocess the data by making the time interval as small as possible and filling in the missing parts, which improves detection accuracy. The taxi track data itself also contains a lot of information. The data record whether the taxi is carrying passengers, so the switch between the loaded and unloaded states accurately marks a moment when the taxi stopped. These pick-up and drop-off locations are usually the origin or destination of a passenger's trip; if passengers get on or off in a certain area above a certain frequency, that region is likely to be a point of interest.
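The frequency argument can be illustrated with a small sketch that groups stop locations lying within a radius of each other and keeps only the groups visited by enough different trajectories. This is a simple greedy grouping chosen for brevity, not necessarily the clustering the authors used; the names haversine_m and frequent_places, the tuple layout (latitude, longitude, track id) and the parameter names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def frequent_places(stops: List[Tuple[float, float, str]],
                    radius_m: float,
                    min_tracks: int) -> List[Tuple[float, float, int]]:
    """Group (lat, lon, track_id) stops by radius; keep groups seen in >= min_tracks tracks."""
    groups = []  # each group: running-mean centre, point count, set of track ids
    for lat, lon, track_id in stops:
        for g in groups:
            if haversine_m(lat, lon, g["lat"], g["lon"]) <= radius_m:
                g["n"] += 1
                g["lat"] += (lat - g["lat"]) / g["n"]  # update running mean
                g["lon"] += (lon - g["lon"]) / g["n"]
                g["tracks"].add(track_id)
                break
        else:
            groups.append({"lat": lat, "lon": lon, "n": 1, "tracks": {track_id}})
    return [(g["lat"], g["lon"], len(g["tracks"]))
            for g in groups if len(g["tracks"]) >= min_tracks]
```

Counting distinct track ids rather than raw points keeps one taxi that idles at the same spot from being mistaken for a popular place.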

Figure 2. Stay points and trajectories.

In fact, the overlap of the stop positions of different taxis gives us a chance to discover and verify stay points: a calculated stay point is probably a point of interest, and a point of interest should be independent of any single sample trajectory.

Methodology

Once the trajectory sequences of different terminals have been obtained and each trajectory has been clustered, the preliminary modeling of individual behavior is complete and yields a collection of stay points. The problems to overcome in this process are the effect of noise on the clustering results and the need to fully consider the characteristics of the trajectory itself, namely its continuity in time and space. Starting from the basic CB-SMoT algorithm, we made two improvements.

Figure 3. Algorithm for detecting stay points.

The first improvement, shown in Figure 3, is the optimization of the Eps parameter. The original CB-SMoT algorithm calculates Eps with the Gauss formula: it computes all distances between adjacent points, puts them into an array, takes the mean u and the standard deviation r of the array, sets a parameter a (0 < a < 1), and calculates Eps from these with the Gauss formula. As a result, Eps = u when a = 0.5; as a increases, Eps increases, and as a decreases, Eps decreases, so Eps floats up and down around the mean. This approach is flawed. Some taxis often stay at the same point, so their average distance u is very small, even 1 m. For other taxis, because of signal loss, the average distance is very large, even more than 500 m. An Eps that is too large or too small is very bad for calculating stops: it decreases the recall, which in turn decreases the precision. The optimized algorithm is shown in Figure 3. We apply the following simple optimization: if the Eps calculated by the Gauss formula is less than 30 m, it is adjusted to 100 m; if Eps is larger than 500 m, it is adjusted to 350 m; and if Eps is smaller than 10 m, it is adjusted to 100 m. Other optimization methods exist, but this one is the most effective and adaptive.

The second improvement concerns clusters that are adjacent in time: if the time interval between two clusters is less than mintime and they intersect in space, the two clusters are combined. The result is a collection of point clusters for each trajectory. Combining the stay points of different trajectories requires two parameters: the clustering radius and the frequency. Points within a certain range, such as one building, are treated as a single point, and points with very low frequency are eliminated. This yields the output point set; the higher the frequency, the more likely a stay point is a point of interest.

Data Analysis

We repeatedly ran experiments with different methods on data from different terminals, verifying the effect of each method under different parameter thresholds. Our purpose is to compare the methods in settings where precision and recall are both relatively high.

Figure 4. The statistical results of K-value and CB-SMoT.

For the K-value algorithm, after many experiments, we find that the clustering effect on taxi data is relatively good with d = 50 m. As shown in Figure 4, with m = 12 we can accurately identify all points; a smaller m causes a decline in precision, and a larger m causes a decline in recall.
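The two improvements from the Methodology section can be sketched as follows. The sketch reads the "Gauss formula" as the a-quantile of a normal distribution fitted to the adjacent-point distances, which matches the stated property that a = 0.5 gives Eps = u; that reading, the use of the population standard deviation, and the bounding-box test for "intersect in space" are assumptions. The names gauss_eps, refine_eps, Cluster and merge_adjacent are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist, mean, pstdev
from typing import List


def gauss_eps(step_distances_m: List[float], a: float) -> float:
    """Eps as the a-quantile of a normal fit to adjacent-point distances."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    u = mean(step_distances_m)
    r = pstdev(step_distances_m)  # assumed: population standard deviation
    return float(u) if r == 0.0 else NormalDist(mu=u, sigma=r).inv_cdf(a)


def refine_eps(eps_m: float) -> float:
    """Thresholds quoted in the text: below 30 m -> 100 m, above 500 m -> 350 m."""
    if eps_m < 30.0:   # also covers the separately stated "smaller than 10 m" case
        return 100.0
    if eps_m > 500.0:
        return 350.0
    return eps_m


@dataclass
class Cluster:
    t_start: float
    t_end: float
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def boxes_intersect(a: Cluster, b: Cluster) -> bool:
    """Assumed spatial test: the bounding boxes of the two clusters overlap."""
    return (a.min_lat <= b.max_lat and b.min_lat <= a.max_lat
            and a.min_lon <= b.max_lon and b.min_lon <= a.max_lon)


def merge_adjacent(clusters: List[Cluster], mintime_s: float) -> List[Cluster]:
    """Merge time-ordered clusters whose gap is < mintime_s and whose boxes overlap."""
    merged: List[Cluster] = []
    for c in sorted(clusters, key=lambda k: k.t_start):
        last = merged[-1] if merged else None
        if (last is not None and c.t_start - last.t_end < mintime_s
                and boxes_intersect(last, c)):
            merged[-1] = Cluster(
                last.t_start, max(last.t_end, c.t_end),
                min(last.min_lat, c.min_lat), max(last.max_lat, c.max_lat),
                min(last.min_lon, c.min_lon), max(last.max_lon, c.max_lon),
            )
        else:
            merged.append(c)
    return merged
```

A typical call would be refine_eps(gauss_eps(distances, a=0.7)), with the result passed to the clustering step as Eps.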
For the CB-SMoT algorithm on taxi data, a = 0.7 achieves relatively good results. At this setting, with mintime = 250 s, we can accurately identify all the stop points; as mintime rises, the precision falls slightly, and as mintime falls, the recall drops significantly.

Figure 5. The statistical results of CB-SMoT and the refined algorithm.

For the improved algorithm, we observe the influence of the two parameters on identification accuracy. First we take mintime = 350 s. As shown in Figure 5, if a is less than 0.86, we can accurately identify all the real stops and the recall ratio is above 80%, a very obvious improvement in performance. If a is greater than 0.86, the algorithm still performs well to a certain extent. Then we take a = 0.7 to measure the influence of different mintime values on the identification results. When mintime is 250 s or greater, all the stay points are identified, and the recall ratio is more stable than with the basic algorithm. When mintime drops, we can still accurately identify all points; the accuracy goes down slightly, but not noticeably. Overall, the algorithm is considerably more adaptable and stable than the original and achieves higher recall and precision.

Conclusion

Combining the basic characteristics of taxi GPS data, we proposed a method with an optimization strategy, derived from the basic CB-SMoT algorithm, that extracts the stay points from a single trajectory. The figures show that it works well, with high precision and strong adaptability. The innovation is that, through experiments, we found a way to refine the parameter Eps for taxi GPS trajectories. Our experiments on Tianjin taxi GPS data show that the calculation improves substantially in both recall and precision. Based on this result, we then apply a refined clustering method based on frequency to obtain the POIs.

Acknowledgements

We would like to express our gratitude to the National Science and Technology Pillar Program (2013BAK02B00, 2013BAK02B06, 2012BAK03B00 and 2012BAK03B06), the major research plan of the National Natural Science Foundation (91224009), and the Science and Technology Pillar Program of Tianjin Municipal Science and Technology Commission (12CZDSF07200).

References

[1] Yang Ye, Yu Zheng, Yukun Chen, Jianhua Feng, Xing Xie, Mining Individual Life Pattern Based on Location History, Proceedings of the 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, p.1-10, May 18-20, 2009.
[2] Yu Zheng, Like Liu, Longhao Wang, Xing Xie, Learning transportation mode from raw GPS data for geographic applications on the web, Proceedings of the 17th International Conference on World Wide Web, April 21-25, 2008, Beijing, China.
[3] Yu Zheng, Longhao Wang, Ruochi Zhang, Xing Xie, Wei-Ying Ma, GeoLife: Managing and Understanding Your Past Life over Maps, Proceedings of the Ninth International Conference on Mobile Data Management, p.211-212, April 27-30, 2008.
[4] Spaccapietra S, Parent C, Damiani M L, et al. A conceptual view on trajectories [J]. Data & Knowledge Engineering, 2008, 65: 126-146.
[5] Nadine Schuessler, Kay W. Axhausen. Processing raw data from Global Positioning Systems without additional information [J]. Transportation Research Record: Journal of the Transportation Research Board, 28-36, 2009.10.6.
[6] Andrey Tietbohl Palma, Vania Bogorny, Bart Kuijpers, et al. A clustering-based approach for discovering interesting places in trajectories [A]. Proceedings of the ACM Symposium on Applied Computing, Advances in Spatial and Image-Based Information Systems Track [C], Fortaleza, Brazil, 16-20 March, 2008, pp. 863-868.
[7] Jianhe Du, Lisa Aultman-Hall. Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: automatic trip end identification issues [J]. Transportation Research Part A, 2007, 41: 220-232.
[8] Zhihua Zhang, Minhe Ji. Hierarchical segmentation for identifying activity stop from GPS trajectories. IEEE Transactions on Engineering Management, 2010, 57: 9-21.

Advances in Civil and Industrial Engineering
10.4028/www.scientific.net/AMM.353-356

Identifying the Stay Point Using GPS Trajectory of Taxis
10.4028/www.scientific.net/AMM.353-356.3511