Publishing CitiSense Data: Privacy Concerns and Remedies


University of California, San Diego
Master's Project
Publishing CitiSense Data: Privacy Concerns and Remedies
Author: Kapil Gupta
Supervisor: Prof. Bill Griswold
March 15, 2013

Abstract

Publishing original spatial trajectories obtained from a location-based service (LBS) to the public or to a third party for data analysis can result in serious privacy breaches. CitiSense generates large collections of spatio-temporal data, variously called moving object data, trajectory data, or mobility data. In the first part of this report we study the privacy violations an individual may suffer, such as identity revelation, if the CitiSense data is made public. We then apply an existing methodology for privacy-preserving data publication, called (k, δ)-anonymity, and demonstrate its effectiveness on the CitiSense dataset. This technique exploits the inherent uncertainty of location in order to decrease the amount of distortion required to anonymize the data.

Location-based services data have great utility in data-analysis applications such as city traffic control, mobility management, urban planning, and location-based advertising, to mention just a few. Consequently, an extensive amount of research has been done on these data, as indicated by the large number of spatio-temporal data mining techniques developed in the recent past [28, 29, 27, 42, 43, 49, 36, 37, 44, 9]. It is therefore critical to develop techniques that transform a database of moving-object trajectories so that it satisfies some concept of anonymity while retaining most of its original utility.

Anonymity cannot be assured by simply replacing users' real identifiers (e.g., name, age, date of birth) with pseudonyms. As demonstrated in [1], using pseudonyms does not guarantee anonymity, since location itself is a property that can be used to identify an individual.
For example, if a person is known to follow a certain route every day, it is highly likely that the end-points of the route are that person's home and workplace (or school). Moreover, because of quasi-identifier locations, i.e., sets of locations that can be linked to external information to re-identify individuals, anonymous location data may be traced back to personally identifying information with the help of additional data sources [10]. Contemporary techniques for trajectory data mining and knowledge discovery have concentrated both on the geometrical properties of trajectories and on their background geographic information (semantic trajectory mining).

We cannot simply strip the location information from a reading in the CitiSense data, as doing so would hurt the utility of the published data. Adding noise to the location data to anonymize it would likewise hurt the utility of the sensor readings. On the other hand, if we publish the location information of the readings as is, we risk exposing the many forms of sensitive information that trajectories are likely to contain. Trajectories therefore cannot be released for public use before they are properly anonymized.

The problem of location privacy has been well studied in the context of location-based services [39, 46, 31, 22, 47]. Prior work covers both on-line, service-centric anonymity and off-line, data-centric anonymity (as in data publishing). In this report we focus on the latter and study anonymity-preserving publication of the CitiSense data. We use the NWA algorithm [6], which extends the concept of k-anonymity [2] to handle trajectory data and to exploit its inherent uncertainty [3, 4, 5].
Discussing the extent to which an individual's location is vulnerable information, or what exactly constitutes private and sensitive information, raises philosophical, social and individual concerns that are beyond the scope of this project.

Paper content and organization: The rest of the paper is organized as follows. Section I gives an overview of the CitiSense project and its dataset. Section II describes the preprocessing of the CitiSense dataset to remove outliers and to compress the data. Section III discusses privacy concerns raised by data publication and the kinds of information that can be extracted from the published data; it also presents privacy breaches on the CitiSense data. Section IV examines existing anonymization techniques and proposes Region of Interest based anonymization and temporal cloaking for the CitiSense data. Section V evaluates the dataset after anonymization and discusses the findings. Finally, Section VI concludes the paper and suggests some extensions to this work.

I. CitiSense Dataset

CitiSense is a portable pollution monitoring system that lets users get real-time air quality readings for their surroundings on a smartphone. The system consists of small sensors carried by users, the users' Android mobile phones, and a backend infrastructure that stores the collected data. CitiSense devices can estimate air quality in the area where they are deployed, providing information to everyone, not just those carrying sensors. For publishing this dataset to individuals and public health agencies, it is sufficient to provide only the sensor reading, the date sampled, and the location information. A sample of the publicly available dataset is shown in Table 1. The sensorid field can take 7 different values.

Table 1: Sample reading of the publicly available CitiSense sensor data. Columns: sensorid, reading, datesampled, Latitude, Longitude, locationaccuracy.

The dataset used in this project contains readings from 30 users over a period of five weeks (Jul 30 to Sep 7) and has more than 21.5 million rows. The sampling interval for sensor readings in the CitiSense system is very aggressive, usually a few seconds. Processing data at such a high rate is computationally very challenging.
Also, because of the high sampling rate, the database has enormous privacy implications: even over only 30 days, the data for one individual contains hundreds of thousands of data points, enough to identify his or her home, office, spatial patterns, etc.

II. Preprocessing

Choosing high sampling rates for acquiring sensor readings from individuals leads to massive data collections. It is therefore imperative to apply data compression during trajectory preprocessing. Filtering the data additionally helps to diminish noise and to assess higher-level properties such as speed and direction. Since trajectories are measured by a sensor, they inevitably contain some error, including occasional outliers. Simple techniques such as mean and median filtering can reduce these errors. Beyond error reduction, filters such as the Kalman filter and the particle filter can also provide error estimates and inferences on speed and direction.

Because data is acquired by sampling, object trajectories are represented in discrete form even though the object movement is continuous. However, object movements display predictable patterns owing to the linear structure of the underlying transportation network. Consequently, much of the redundant and erroneous data can be eliminated from a trajectory without losing much useful information [8]. An attacker would also need these preprocessing steps in order to mine the underlying hidden information.

A. Trajectory Filtering & Smoothing

Because of the uncertainty in data obtained from GPS devices, outliers need to be removed before behavior mining or region of interest extraction can be done. Filtering is particularly essential when one intends to derive other properties from the data, such as speed or direction. In this project, we discuss two filters that eliminate outliers on different bases, and we segment the trajectory data into trips.
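The mean and median filtering mentioned above can be illustrated with a small sliding-window sketch. It assumes planar (x, y) points in temporal order and applies the filter to each coordinate separately; the function name sliding_filter, the window size of 3 and the sample track are illustrative choices, not values taken from the CitiSense pipeline.

```python
from statistics import mean, median


def sliding_filter(points, n, reducer):
    """Smooth (x, y) points with a causal window of size n.

    Each output point is reducer() applied, per coordinate, to the point
    itself and up to n - 1 predecessors (fewer at the start of the track).
    """
    smoothed = []
    for i in range(len(points)):
        window = points[max(0, i - n + 1): i + 1]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        smoothed.append((reducer(xs), reducer(ys)))
    return smoothed


# Hypothetical track with one outlier at index 3.
track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (50.0, 50.0), (4.0, 4.0), (5.0, 5.0)]
mean_track = sliding_filter(track, 3, mean)
median_track = sliding_filter(track, 3, median)
```

On this toy track the mean filter is pulled strongly toward the outlier, whereas the median filter ignores it, which is the usual reason for preferring the median when outliers are expected.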
All calculations (speed, acceleration, compression, data mining, etc.) are done after converting the location data (latitude, longitude, altitude) into Earth coordinates using the Mercator projection [7]. This is a cylindrical map projection, which specifies how geographic detail is transferred from the globe to a cylinder tangential to it at the equator. The cylinder is then unrolled to give the planar map

(see Figure 1).

Figure 1: A cylindrical map projection used to find coordinates in the frame of reference of the Earth's center from latitude and longitude. Figure taken from [7].

Duplication filter: If the distance between two consecutive positions is smaller than a threshold, the duplication filter removes the second position. The CitiSense dataset contains multiple sensor types, which leads to multiple readings with the same location and time stamp. Removing these duplicate entries is essential for efficient computation on the dataset. Table 2 presents the results of applying the duplication filter to the CitiSense data.

Speed and acceleration filter: It is assumed that individuals move at a plausible speed between two consecutive positions and that there is a reasonable speed range for each means of transportation (walking, biking, car, bus, etc.). The speed and acceleration filter removes the second position if the speed and/or acceleration between two consecutive positions is unreasonable. For example, a few readings in the CitiSense data indicate an impossible speed of 546 km/hr with an acceleration of 52 m/s². These invalid readings need to be removed for trajectory analysis, and their neighboring data points need to be smoothed accordingly. Table 2 presents the results of applying the speed and acceleration filter to the CitiSense data with a speed limit of 150 km/hr and an acceleration limit of 10 m/s². Figure 2 shows the variation of speed for a user during a given trip, together with the smoothed speed after removal of outliers.

Table 2: Filtering of the CitiSense dataset. Columns: Raw; After duplication filter; After duplication + speed & acceleration filter. Row: #Rows.

Figure 2: Variation of speed before and after applying the speed and acceleration filter.

B. Trip Segmentation

Route pattern mining requires the recorded data to be segmented into trips. However, asking users to manually turn their GPS devices on and off several times a day for the purpose of trip segmentation would drastically decrease the usability of the system and the reliability of the data. This section discusses how to extract trips from GPS data using the concept of moves and stops [11].

The basic criterion for splitting GPS data is the time gap between two consecutive positions, since a stop indicates the end of a trip. Algorithm A is used to segment the trips. In this algorithm, T is the array containing all recorded trips of a person, λ_time_gap is the time threshold used to segment trips, λ_trj_len is the threshold used to remove short trips, and Funct() is one of the data filtering functions described above. Funct() returns true if the positions comply with the restriction of the data filter; otherwise it returns false, and the corresponding positions are removed.

procedure Trip Segmentation(A)
    Input: T, λ_time_gap, λ_trj_len, Funct()
    Output: T_tmp
    T_tmp ← φ
    for each route r_i in T do
        r_tmp = φ
        for each position p_j in r_i do
            if Funct(p_j, r_i) returns true then
                r_tmp = Append(r_tmp, p_j)
            else if Time(p_j) - Time(p_{j-1}) > λ_time_gap then
                if Size(r_tmp) > λ_trj_len then
                    T_tmp = Append(T_tmp, r_tmp)
                end if
            end if
        end for
    end for
    Return T_tmp
end procedure

The data filtering process removes noisy raw data and greatly reduces the amount of original trip data. Applying trip segmentation to the CitiSense data yields the number of trips per person over a period of 30 days. Note that a trip count of 0 indicates a stationary node in the CitiSense dataset; simple filters and trip segmentation can thus identify stationary users in the dataset. λ_time_gap is chosen to be 300 seconds and λ_trj_len to be 100 meters. Figure 3 shows all the trips made by a user on 10th Aug.

Figure 3: Various trips made by a user on 10th Aug. Different colors represent different trips.

C. Trajectory Smoothing

A simple method to smooth noise is to apply a mean filter. For a measured point p_i, the estimate of the (unknown) true value is the mean of p_i and its n − 1 predecessors in time. The mean filter can be thought of as a sliding window covering n temporally adjacent values of p_i. A major drawback of the mean filter is its sensitivity to outliers. This problem can be alleviated by using a median filter, which is identical to the mean filter except that the mean is replaced with the median [8]:

\hat{x}_i = \operatorname{median}\{p_{i-n+1}, p_{i-n+2}, \ldots, p_{i-1}, p_i\}    (1)

Figure 4 shows the median filter applied to a sample trajectory with outliers.

Figure 4: Example of the median filter for trajectory smoothing. Figure taken from [8].

Both the mean filter and the median filter are simple and effective smoothing techniques, but both suffer from lag. The Kalman filter and the particle filter are two more advanced techniques that reduce lag and can be designed to estimate more than just location. Although they are not used in this project, they are worth exploring.

D. Error Measure for Trajectory Compression

In this section, we discuss two error measures for the deviation of an approximate trajectory from its original trajectory: perpendicular Euclidean distance and time synchronized Euclidean distance. An estimate of the accuracy of the approximated

location values can be obtained from the distance between a location on the original trajectory and the estimated location on the approximated trajectory. The perpendicular Euclidean distance is the shortest distance from a sampled location point in the original trajectory to the approximated trajectory. An error measure can be obtained by averaging the perpendicular Euclidean distance over all sampled location points. Figure 5(a) illustrates the computation of this error measure between the original trajectory of a moving object and an approximated trajectory generated by one of the trajectory data reduction algorithms. However, projecting each point of the original trajectory onto the segments of the approximated trajectory takes into consideration only the geometric characteristics of the trajectories. The temporal component of object movement is not accounted for [8].

Figure 5: Error measures for trajectory compression. (a) Error measure based on perpendicular Euclidean distance, which takes into account the geometric relationship of the trajectories. (b) Error measure based on time synchronized Euclidean distance, which takes into account both the geometric relationship and the temporal factor of the trajectories. Figure taken from [8].

Notice that a sampled data point (x, y, t) in the original trajectory denotes the time t at which the moving object is located at (x, y). Thus, the temporal factor also needs to be considered in the projection. To take it into account, time synchronized Euclidean distance was proposed [8] as a new error measure for approximated trajectories generated by trajectory data reduction algorithms [24, 25].
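The difference between the two error measures can be made concrete with a short sketch. It assumes planar coordinates in meters and timestamps in seconds, and it assumes that the object moves at constant speed along an approximating segment; the function names and the sample points are illustrative only.

```python
import math


def perpendicular_distance(p, a, b):
    """Shortest distance from point p to the segment a-b, using x and y only."""
    (px, py), (ax, ay), (bx, by) = p[:2], a[:2], b[:2]
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot stays on the segment.
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def synchronized_distance(p, a, b):
    """Distance from p to the position on a-b that corresponds to p's timestamp.

    Points are (x, y, t); the position on the segment is found by linear
    interpolation in time between a and b.
    """
    (px, py, pt), (ax, ay, at), (bx, by, bt) = p, a, b
    if bt == at:
        return math.hypot(px - ax, py - ay)
    ratio = (pt - at) / (bt - at)
    sx, sy = ax + ratio * (bx - ax), ay + ratio * (by - ay)
    return math.hypot(px - sx, py - sy)


a = (0.0, 0.0, 0.0)
b = (10.0, 0.0, 10.0)
p = (8.0, 1.0, 2.0)  # geometrically close to a-b, but far from the time-matched position

geometric_error = perpendicular_distance(p, a, b)
time_aware_error = synchronized_distance(p, a, b)
```

For this sample point the geometric error is small while the time-aware error is several times larger, which shows how a purely geometric measure can hide a large temporal deviation.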
The time synchronized Euclidean distance requires a "time-synchronization" between the projected movement on the approximated trajectory and the real movement on the actual trajectory. Figure 5(b) illustrates the idea. As shown, the location points on the approximated trajectory, i.e., p_0, p_5 and p_16, are already synchronized by time. The other sampled location points, e.g., p_1, p_2, p_3 and p_4, are projected to the time synchronized location points p_1', p_2', p_3' and p_4' on the line segment p_0 p_5.

E. Trajectory Compression

Our aim here is to produce an approximate trajectory from the actual trajectory by eliminating some location points while keeping the error introduced negligible. This problem closely resembles the well-studied line generalization problem in computer graphics and cartography [8]. A very simple approximation technique is uniform sampling, in which every i-th location point (e.g., the 10th, 20th, 30th, etc.) is retained and the other points are rejected [27]. This approach works poorly when the location points of the original trajectory carry different amounts of the information needed to represent the trajectory.

Douglas-Peucker (DP), a well-known algorithm, can be employed to approximate the original trajectory [9, 15]. Given a curve composed of line segments, this algorithm finds a similar curve with fewer points. The objective is to replace the actual trajectory with an approximating line segment. If the replacement does not comply with the specified error condition, the problem is partitioned into two sub-problems by choosing the location point responsible for the maximum error as the split point.
This partitioning is a recursive process, and it continues until the stopping condition is met.
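The recursive partitioning described above can be sketched as follows. This is a minimal version that assumes planar (x, y) points and uses the point-to-segment distance as the error; it is not the compression code used in this project, and the names douglas_peucker and tolerance are illustrative.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def douglas_peucker(points, tolerance):
    """Return a subset of points whose polyline stays within tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    worst_index, worst_error = 0, -1.0
    for i in range(1, len(points) - 1):
        error = _point_segment_distance(points[i], first, last)
        if error > worst_error:
            worst_index, worst_error = i, error
    if worst_error <= tolerance:
        # The single segment first-last is an acceptable replacement.
        return [first, last]
    # Otherwise split at the point with the maximum error and recurse.
    left = douglas_peucker(points[:worst_index + 1], tolerance)
    right = douglas_peucker(points[worst_index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


zigzag = [(0.0, 0.0), (1.0, 0.0), (2.0, 5.0), (3.0, 0.0), (4.0, 0.0)]
simplified = douglas_peucker(zigzag, 1.0)
straight = douglas_peucker([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], 0.1)
```

Passing a time-aware distance function in place of _point_segment_distance would turn the same recursion into a time-sensitive variant.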

Table 3: Variation of each user's data after preprocessing. Columns: User Id; Raw; After Filtering; After Compression; Compression %. Final row: Total.

The stopping condition is that the error between the approximate and original trajectories falls below the given error threshold. A modified DP algorithm, the top-down time-ratio (TD-TR) algorithm [24], which uses the synchronous Euclidean distance (SED) instead of the perpendicular Euclidean distance, is also a very popular algorithm for trajectory compression.

In this project we have used the GTC trajectory compression algorithm [14], a greedy solution for trajectory approximation. It starts from the first point and finds the farthest point whose approximated SED is less than the given error tolerance. The pseudocode is shown in Figure 6. The rest of the analysis in this paper is done on the dataset compressed with SED = 5 m unless otherwise specified.

Figure 6: Pseudocode of the proposed GPS trajectory approximation process.

Figure 7: Variation of % compression vs SED.

Figure 7 shows the variation of the percentage of points/rows left for different values of SED

used. Table 3 shows the variation in the number of readings for each user after filtering and compression.

III. Privacy Breaches

There are many real-life situations in which attackers exploit location-detection technologies to gain access to private location information and other sensitive information about victims [16, 17, 18, 19]. The following techniques can be applied to LBS data to mine information about individuals.

A. Region of Interest (ROI)

In 2008, Spaccapietra et al. proposed the first data model that looks at trajectories from a conceptual point of view and supports robust semantic analysis, called stops and moves [11]. A stop is a semantically important part of a trajectory that is relevant for an application and where the object has stayed for a minimal amount of time. For instance, on weekdays a stop could be an office or workplace, and on weekends or holidays a stop could be a tourist attraction, a restaurant, a movie theater, etc. Figure 8 describes this idea pictorially.

Figure 8: Identifying stops and moves from GPS data points. Figure taken from [23].

To extract stops and moves from trajectory points, Alvares et al. introduced an algorithm called IB-SMoT (Intersection Based Stops and Moves of Trajectories) [12]. While IB-SMoT searches for intersections among trajectories, there are several other ways to find important points of interest, such as the speed-based spatio-temporal clustering approach CB-SMoT [13]. In this project we have used Weka-STPM [21] for the IB-SMoT and CB-SMoT analysis. Weka-STPM is an extension of Weka for spatio-temporal data.

Figure 9 shows some of the stops taken by an individual over a period of 15 days. If a stop is repeated more than a particular number of times, it can be inferred to be a region of interest for an attacker. Taking this notion a step further, plotting stops over time can identify regions of interest such as the victim's home, office, gym, preferred shopping mall, etc. Although this interpretation requires manual effort, semantic trajectory frameworks exist that perform it automatically [20].

Figure 9: Visualization of the stops of an individual over a period of 30 days. The markers are color-coded to emphasize the frequency of a stop taken by the user.

B. Behavior Mining

For most purposes, we can assume that individuals adhere (approximately) to the same paths over regular intervals in time. For instance, people usually follow a fixed routine throughout the day: they wake up at the same time, take roughly the same route to work, and follow daily or weekly chores in a regular way. Trajectory patterns therefore most likely represent summaries of repeated behavior in terms of both space (i.e., the regions of space visited during movements) and time (i.e., the duration of movements) [8]. The discovery of hidden periodic movement patterns in spatio-temporal data may violate the privacy of users. Figures 10 and 11 provide examples of the revelation of hidden information about an individual.
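The stop-based inference described in this section can be illustrated with a deliberately simplified sketch. It is not IB-SMoT or CB-SMoT: a stop is taken here to be a run of consecutive points that stay within a radius of the run's first point for a minimum duration, and repeated stops are counted per grid cell. All names, thresholds and the sample track are hypothetical, and coordinates are assumed to be planar meters with timestamps in seconds.

```python
import math
from collections import Counter


def detect_stops(track, radius_m, min_duration_s):
    """Return stops as (center_x, center_y, t_start, t_end) tuples.

    track is a list of (x, y, t) tuples sorted by t.
    """
    stops = []
    i = 0
    while i < len(track):
        j = i + 1
        while j < len(track) and math.hypot(
            track[j][0] - track[i][0], track[j][1] - track[i][1]
        ) <= radius_m:
            j += 1
        # Points i .. j-1 all lie within radius_m of point i.
        if track[j - 1][2] - track[i][2] >= min_duration_s:
            run = track[i:j]
            cx = sum(p[0] for p in run) / len(run)
            cy = sum(p[1] for p in run) / len(run)
            stops.append((cx, cy, track[i][2], track[j - 1][2]))
            i = j
        else:
            i += 1
    return stops


def frequent_stop_cells(stops, cell_m, min_visits):
    """Count stops per square grid cell and keep cells visited often enough."""
    counts = Counter(
        (math.floor(cx / cell_m), math.floor(cy / cell_m)) for cx, cy, _, _ in stops
    )
    return {cell: n for cell, n in counts.items() if n >= min_visits}


track = [
    (0.0, 0.0, 0), (0.0, 0.0, 600), (0.0, 0.0, 1200), (0.0, 0.0, 1800),
    (500.0, 0.0, 2100),
    (1000.0, 0.0, 2400), (1000.0, 0.0, 3000), (1000.0, 0.0, 3600), (1000.0, 0.0, 4200),
    (500.0, 0.0, 4500),
    (0.0, 0.0, 4800), (0.0, 0.0, 5400), (0.0, 0.0, 6000), (0.0, 0.0, 6600),
]
stops = detect_stops(track, radius_m=50.0, min_duration_s=1200)
regions_of_interest = frequent_stop_cells(stops, cell_m=100.0, min_visits=2)
```

Even this crude procedure singles out the location that the hypothetical user returns to, which is the kind of inference an attacker could draw from repeated stops.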

Figure 10: Loss of privacy. It can be inferred that the user is a faculty member at CSE, UCSD and uses faculty parking to park his/her car.

Figure 11: The trajectory paths of a user on weekdays, from 8 am to 11 am and from 4 pm to 9 pm. It can be inferred that the user is a student at CSE, UCSD who commutes by bike and takes the same route most of the time.

C. Predictive Query

Given the recent movements of an individual and the current time, predictive queries ask for the probable location of the individual at some future time. The approaches in [30, 32] accurately forecast locations when the forecast time is far from the current time. Long-term prediction uses previously extracted movement patterns, named Trajectory Patterns, which concisely represent the behavior of moving objects as sequences of regions frequently visited within a typical travel time. Prediction based on the trajectory patterns of an object has been shown to be a powerful method [35].

D. Regular Routes Mining

This technique is useful for mining regular (frequently repeated) routes from users' route sets. It involves the following steps [33]:

Trajectory Similarity: The similarity between two trajectories can be estimated by some form of aggregation of the distances between trajectory points. On this basis, several similarity functions have been developed for different purposes, including Closest-Pair Distance, Sum-of-Pairs Distance [34], Dynamic Time Warping (DTW) [38], Longest Common Subsequence (LCSS) [37], Edit Distance with Real Penalty (ERP) [40], and Edit Distance on Real Sequences (EDR) [41]. Although some of these functions were originally proposed for time series data, they can also be applied to trajectory data, since a trajectory can be viewed as a distinctive type of time series in multi-dimensional space. Figure 12 shows the basic step of breaking a route into frequent directed edges (FDE) in order to compare two trajectories [33].

Figure 12: Steps involved in converting a route into FDE so that it can be treated as time series data for further computation. Figure taken from [33].

Routes Grouping: We group the routes that someone follows at approximately the same times of day and that have high trajectory similarity (as defined above).

Finding Regular Routes: We then mine regular routes from each set of routes. To qualify as a regular route, a route must have been traveled frequently at approximately the same hours. Figure 13 shows the regular routes taken by a particular user. The open source code T-Pattern [48] is used to mine the regular trajectory patterns; it uses the algorithm proposed in [23].
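As an illustration of the trajectory similarity step, the sketch below computes a Dynamic Time Warping distance between two routes. It is a textbook dynamic-programming formulation, not the FDE-based comparison of [33]; routes are assumed to be non-empty lists of planar (x, y) points, and the function name dtw_distance is an illustrative choice.

```python
import math


def dtw_distance(route_a, route_b):
    """Dynamic Time Warping distance between two non-empty point sequences.

    cost[i][j] holds the cheapest cumulative Euclidean distance for aligning
    the first i points of route_a with the first j points of route_b.
    """
    n, m = len(route_a), len(route_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(route_a[i - 1], route_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance along route_a only
                cost[i][j - 1],      # advance along route_b only
                cost[i - 1][j - 1],  # advance along both routes
            )
    return cost[n][m]


morning = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
morning_resampled = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
parallel_street = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

same_route = dtw_distance(morning, morning_resampled)
other_route = dtw_distance(morning, parallel_street)
```

Because DTW tolerates different sampling rates, the resampled copy of a route is at distance zero from the original, while a route along a parallel street is not; routes could then be grouped by thresholding this distance.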

Figure 13: Regular routes taken by a user. Red denotes the most common route taken by the user, followed by green and blue respectively. On a side note, all the stops of the user are localized and point to his/her home (Rita Atkinson Residence) and office (CSE, UCSD) locations.

E. Recognizing Travel Modes

The different travel modes of a route can also be recognized. Public transport stops frequently, and it stops periodically at fixed positions. Therefore, the fixed stop rate (FSR) can be used along with speed variation to recognize different travel modes. Figure 14 compares the speed variation of 3 users who commute by walking, bike and car; the FSR can also be seen in this figure.

Figure 14: Speed variations for different modes of transportation. The top graph shows walking, with speeds of 2-4 miles/hr; the middle graph shows biking, with speeds of 3-7 miles/hr; the bottom graph shows a car, with speeds up to 70 miles/hr.

IV. Trajectory Anonymization

This section discusses privacy-preserving algorithms for trajectory data publication. With regard to the difficulty of privacy protection, trajectory data publication differs from continuous LBS in the following ways [45]:

(1) Privacy protection mechanisms need to be far more scalable for continuous LBS than for trajectory data publication. The anonymization module of a continuous LBS handles an enormous number of real-time location updates at high rates, whereas trajectory data publication can perform anonymization offline.

(2) Global optimization techniques can be applied to trajectory data publication, because its anonymization process can examine the entire (static) trajectory dataset for optimization possibilities. Global optimization is very hard to attain for continuous LBS, because its data arrives at run time from extremely dynamic, unpredictable user movements.
In the literature there are four major trajectory anonymization techniques for static trajectory data publication: clustering-based [6], generalization-based [50], suppression-based [51] and grid-based [30] anonymization. In this project we have used a combination of three techniques: clustering-based anonymization, temporal cloaking and ROI anonymization.

A. Clustering based Anonymization

The clustering-based approach [6] exploits the uncertainty of trajectory data to group k co-localized trajectories within the same time period into a k-anonymized aggregate trajectory. Given a trajectory T defined between times t_1 and t_n, i.e., in [t_1, t_n], and an uncertainty threshold δ, each location sample p_i = (x_i, y_i, t_i) in T is modeled by a horizontal disk of radius δ centered at (x_i, y_i). The union of all such disks constitutes the trajectory volume of T, as shown in Figure 15. Two trajectories T_p and T_q defined in [t_1, t_n] are said to be co-localized with respect to δ if the Euclidean distance between each pair of points in T_p and T_q at every time t ∈ [t_1, t_n] is less than or equal to δ. An anonymity set of k trajectories is defined as a set of at least k co-localized trajectories. The cluster of k co-localized trajectories is then transformed into an aggregate trajectory in which each location point is the arithmetic mean of the location samples at the same time.

The clustering-based anonymization algorithm consists of three main steps, as described in [6]:

1. Pre-processing step. The main task of this phase is to group all trajectories that have the same starting and ending times, i.e., that are in the same equivalence class with respect to time span. To increase the number of trajectories in an equivalence class, given an integer parameter π, all trajectories are trimmed if necessary so that only one timestamp in every π can be the starting or ending point of a trajectory.

2. Clustering step. This phase clusters trajectories using a greedy clustering scheme. For each equivalence class, a set of appropriate pivot trajectories is selected as cluster centers.
For each cluster center, its nearest k − 1 trajectories are assigned to the cluster, such that the radius of the bounding trajectory volume of the cluster is not larger than a certain threshold (e.g., δ/2).

3. Space transformation step. Each cluster is transformed into a k-anonymized aggregate trajectory by moving all points at the same time to the corresponding arithmetic mean of the cluster.

Figure 15: Uncertain trajectory: uncertainty area, trajectory volume and possible motion curve. Figure taken from [6].

In Figure 16, the trajectory volumes of T_p and T_q are represented by grey dotted lines. The trajectory volume drawn with black lines is a bounding trajectory volume for T_p and T_q. The bounding trajectory volume is then transformed into an aggregate trajectory, which is represented by the sequence of square markers.

Figure 16: A (2, δ)-anonymity set formed by two co-localized trajectories, their respective uncertainty volumes, and the central cylindrical volume of radius δ/2 that contains both trajectories. Figure taken from [6].

B. ROI based Anonymization

As mentioned earlier, ROIs are regions where a large number of moving objects remain for at least

a given time interval. As shown in previous sections, the main threat in publishing the CitiSense data is the revelation of users' home and office locations. Since this information is revealed by analyzing the stops and moves of trajectory data, the easiest remedy is to remove from the trajectory data the neighborhoods of stops that satisfy certain criteria, such as a stop duration greater than 30 minutes. This analysis requires two parameters: 1) the radius λ_r of the circular area around a stop, and 2) the minimum duration λ_t for a stop to qualify for anonymization. Figure 17 shows the result of applying ROI based anonymization to a user's trip.

Adding semantic analysis: The previous approach can be improved by taking into account the semantics of geographical location. Geographic semantic analysis is performed on the stops, and all locations are tagged as public (highways, shopping malls, parkways, etc.) or private (residential places, offices, etc.). For publication, we can then selectively choose the location data tagged as public.

Increasing utility: To further reduce the amount of data lost by discarding private location data in dense regions, we can take advantage of the notion of k-anonymity. If a sufficient number of data points is available from k or more users within a circular area of radius δ, we can average the readings for that area into buckets of minutes or hours and publish the averages. Publishing private location data in this way preserves our notion of (k, δ)-anonymity and maintains the utility of the CitiSense data for places tagged as private.

C. Temporal Cloaking

All trajectory pattern mining and behavior mining algorithms depend on the successful creation of trips from the raw GPS data. If the GPS data does not contain a user identifier (as is the case for the publicly available CitiSense data), trip segmentation depends heavily on temporal patterns.
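Because trip segmentation relies on the time gaps between consecutive readings, perturbing the timestamps disrupts it. A minimal sketch of such a perturbation is given here, assuming zero-mean Gaussian noise and timestamps in seconds; the function name cloak_timestamps, the standard deviation of 120 seconds and the seed are hypothetical choices, not parameters used in this project.

```python
import random


def cloak_timestamps(timestamps, sigma_s, seed=None):
    """Add independent zero-mean Gaussian noise (std sigma_s seconds) to each timestamp."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, sigma_s) for t in timestamps]


# Readings taken every 5 seconds; after cloaking, the gaps no longer reflect
# the true sampling pattern and the readings may even change order.
original = [0.0, 5.0, 10.0, 15.0, 20.0]
cloaked = cloak_timestamps(original, sigma_s=120.0, seed=7)
```

Since the noise can reorder readings, a publisher would have to decide whether to release the cloaked readings in their noisy order or in their original order.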
The idea of temporal cloaking is to blur a user's presence at a location at a particular time by adding Gaussian noise to the timestamps, so that the linear relation between distance and time no longer holds. Gaussian noise is statistical noise whose probability density function is that of the normal distribution, also known as the Gaussian distribution [53]. In other words, the values that the noise can take are Gaussian-distributed:

P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}    (2)

Temporal cloaking can drastically degrade trip segmentation and, hence, reduce the information revealed by trajectory data. Notably, the utility of the CitiSense data is not much affected by introducing an uncertainty of a few minutes in time.

D. Results

Figure 17: Application of ROI based anonymization on a user's trip.

All the analysis in this section uses the filtered dataset ( readings) and not the compressed dataset; data compression was needed only to make the data mining algorithms computationally efficient. As mentioned in the previous section, the inherent uncertainty parameter δ of the NWA algorithm is set to 50 m, and k is set to 2. The percentage of points changed by the NWA algorithm is only about 48%. On further analysis, most of these

points are located in one major region (CSE, UCSD). Hence, NWA does not anonymize the entire dataset. The reason behind this concentration of data points is the co-existence of different users at a given time. This exposes a problem in the CitiSense data: data points are sparsely distributed, and there is hardly any other region where CitiSense users coexist. This drawback of NWA is addressed by ROI based anonymization, which is discussed next.

Table 4: Variation in the number of stops, trips and active days for each user. This information is further used in ROI based anonymization. Columns: userid, # trips, # stops, # active days.

The number of stops can be markedly greater than the number of trips, because data points sampled in a period may be concentrated in a dense area, leading to no actual movement (see Figure 18). After recognizing the points that belong to stops, we need to remove them from the dataset for anonymization. Simply removing the points that belong to a stop cluster can still pose a privacy threat, as the surrounding points that survive can be extrapolated to the removed ROIs. To prevent this, data points from all stops are first clustered using DBSCAN (density-based spatial clustering of applications with noise). The advantage of DBSCAN over other clustering algorithms is that it can find arbitrarily shaped clusters. Once the diameters (d) of the clusters are found, all data points within the circular area of radius (d/2 + γ), centered at the mean position of the cluster members, are removed. Figures 19 and 20 depict this idea pictorially. The two parameters required by the DBSCAN algorithm are set to ɛ = 100 (the neighborhood radius, i.e. the maximum distance between two points for them to count as neighbors) and minpts = 200 (the minimum number of points in an ɛ-neighborhood for a point to count as a core point). Also, γ is set to 50 m. Table 5 presents the result after removing the stops in this manner.
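The removal step described above can be sketched as follows. This is a minimal illustration, assuming the stop clusters have already been produced by a clustering step such as DBSCAN and that coordinates are planar and in meters; the helper names (`cluster_center`, `cluster_diameter`, `remove_roi`) are hypothetical.

```python
import math
from itertools import combinations


def cluster_center(points):
    """Mean position of the cluster members."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def cluster_diameter(points):
    """Largest pairwise distance in the cluster (quadratic in cluster size)."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def remove_roi(all_points, clusters, gamma=50.0):
    """Drop every point inside a circle of radius d/2 + gamma around a cluster center."""
    regions = [(cluster_center(c), cluster_diameter(c) / 2 + gamma) for c in clusters]
    return [
        p for p in all_points
        if all(math.dist(p, center) > radius for center, radius in regions)
    ]


if __name__ == "__main__":
    # Hypothetical stop cluster 100 m wide, so the removal radius is 100 m.
    stop_cluster = [(0.0, 0.0), (100.0, 0.0)]
    trace = [(50.0, 0.0), (140.0, 0.0), (160.0, 0.0), (50.0, 120.0)]
    print(remove_roi(trace, [stop_cluster]))
```

The margin γ is what removes the surviving neighbors of a stop, so that the remaining points cannot simply be extrapolated back to the removed region.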
From the table, it appears that this approach removed a large share of the data points in the original dataset and hurt its utility. However, this is not the complete picture. To understand why the situation is not as bad as it appears from Table 5, we introduce the concept of coverage.

Table 5: Loss of data in terms of number of rows. Columns: # rows, # rows after anonymization, data loss (in %).

Figure 18: Concentrated stop, leading to 0 trips.

Table 4 shows the number of trips and stops taken by users.

Coverage: The coverage of a data point can be defined as the area over which the sensor readings can be considered the same. For example, in the CitiSense dataset, the CO2 reading at a location x can be treated as the same as the reading at location x + γ for a very small value of γ. Hence the surrounding area can be said to be covered by a single data point. The coverage of the clustered stop data points can be thought of as the circular area of radius (d/2 + ɛ) centered at the mean of the cluster members. Similarly, for each pair of adjacent moving points, the rectangular area spanned by those points can be thought of as the covered area.

Figure 19: DBSCAN clustering performed on the stop data points.

Figure 20: To find the coverage loss, the areas spanned by stops (circular areas) are removed. The total area is calculated using the notion that if a reading is present at a point, it covers some surrounding area.
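To make the coverage bookkeeping concrete, here is a minimal sketch under stated assumptions: each stop cluster contributes a circle of radius (d/2 + ɛ), each pair of adjacent moving points contributes a rectangle whose length is the distance between them, and overlaps are ignored. The rectangle half-width `half_width` is a hypothetical parameter, since the report does not state the width used, and the function names are likewise illustrative.

```python
import math


def stop_coverage(diameter, eps):
    """Area of the circle of radius d/2 + eps that a stop cluster covers."""
    return math.pi * (diameter / 2 + eps) ** 2


def path_coverage(path, half_width):
    """Sum of rectangle areas spanned by consecutive moving points.

    half_width is an assumed sensing margin on each side of the path.
    """
    return sum(math.dist(a, b) * 2 * half_width for a, b in zip(path, path[1:]))


def coverage_loss_percent(total_area, removed_area):
    """Share of the covered area that anonymization removes, in percent."""
    return 100.0 * removed_area / total_area


if __name__ == "__main__":
    # Hypothetical trajectory (meters) and one stop cluster 100 m in diameter.
    moving = path_coverage([(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)], half_width=10.0)
    stops = stop_coverage(diameter=100.0, eps=100.0)
    print(coverage_loss_percent(moving + stops, stops))
```

Measuring loss as a ratio of areas rather than of row counts is what lets a large drop in rows correspond to a small drop in coverage when the removed rows are concentrated in a few dense stops.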

This coverage is also shown pictorially in Figure 22. Further, Figure 21 shows this concept on a real trajectory.

Figure 21: The rectangular boxes around the trajectory path depict areas covered by the data points.

Figure 22: The circular area and rectangular boxes around the trajectory path depict areas covered by the data points.

Note that in the CitiSense data, utility is directly related to coverage, not to the number of data points. Using this notion of coverage, the approximate coverage loss for ROI based anonymization is calculated and shown in Table 6. Hence ROI based anonymization hurts utility by only 6%.

Table 6: Loss of data in terms of area coverage. Area is in m^2. Columns: area covered, area covered after anonymization, coverage loss (in %).

Interestingly, the coverage calculated above does not need to take into account the overlap of different users' trajectories. This is because, where trajectories overlap, we can apply the NWA algorithm to those points, thus keeping the utility unaltered.

We applied temporal cloaking to the dataset obtained by applying the above anonymization techniques. The Gaussian parameters µ and σ for temporal cloaking are set to 600 seconds and 1 respectively. Preprocessing this transformed dataset created 54% fewer trips than our previous analysis of the non-anonymized dataset. This further reduces the information that can be gained (for example, regular routes or mode of transportation).

V. Conclusion

Lately, it has been recognized in [7] and in many other works that k-anonymity alone does not put us on the safe side, because although one individual

is hidden in a group, if the group does not have enough diversity in its sensitive attributes, an attacker can still associate an individual with sensitive information. However, in the context of moving object data the problem is very challenging, because location is a particular kind of information that can be considered sensitive and a quasi-identifier at the same time. Moreover, the major privacy concern of identifying locations private to the user is resolved by the ROI based anonymization method with a mere 6% loss in coverage. Another concern, the lack of effectiveness of the clustering based anonymization technique mentioned in the results, will disappear when the data becomes denser (more precisely, when each region has more than one CitiSense user present at approximately the same time). Temporal cloaking needs more analysis in order to derive rigorous mathematical guarantees of immunity against attackers. Another interesting area to explore is continuous, real-time publication of CitiSense data. This is a relatively new field and worth exploring in the context of CitiSense.

VI. Acknowledgement

I would like to thank Prof. Bill Griswold, Department of Computer Science, for his constant support and guidance throughout the course of this project. I would also like to thank Prof. Sanjoy Dasgupta and Prof. Hovav Shacham for their valuable inputs. Last but certainly not least, I am grateful to Nima Nikzad and Celal Ziftci for their kind assistance and cooperation in helping me obtain the CitiSense data.

References

[1] C. Bettini, X. S. Wang, and S. Jajodia, "Protecting Privacy Against Location-Based Personal Identification," in Proc. of the Second VLDB Workshop on Secure Data Management (SDM '05).
[2] P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information (abstract)," in Proc. of the 17th ACM Symp. on Principles of Database Systems (PODS '98).
[3] O. Wolfson, S. Chamberlain, S. Dao, L. Jiang, and G. Mendez, "Cost and imprecision in modeling the position of moving objects," in Proc. of the 14th IEEE Int. Conf. on Data Engineering (ICDE '98).
[4] G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamberlain, "Managing uncertainty in moving objects databases," ACM Trans. Database Syst., vol. 29, no. 3.
[5] D. Pfoser and C. S. Jensen, "Capturing the uncertainty of moving-object representations," in Proc. of the 6th International Symp. on Advances in Spatial Databases (SSD '99).
[6] Osman Abul, Francesco Bonchi, Mirco Nanni, "Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases," Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, April 07-12, 2008.
[7] _projection
[8] Y. Zheng, X. Zhou, Computing with spatial trajectories. Springer.
[9] Douglas, D., Peucker, T.: Algorithms for the Reduction of the Number of Points Required to Represent a Line or its Caricature. The Canadian Cartographer 10(2) (1973).
[10] Francesco Bonchi, Laks V.S. Lakshmanan, Hui (Wendy) Wang, "Trajectory anonymity in publishing personal mobility data," ACM SIGKDD Explorations Newsletter, v.13 n.1, June 2011.
[11] Spaccapietra, S., Parent, C., Damiani, M. L., Macedo, J. A., Porto, F., Vangenot, C.: A Conceptual View on Trajectories. Data and Knowledge Engineering (DKE).
[12] L. O. Alvares, V. Bogorny, B. Kuijpers, J. A. F. de Macedo, B. Moelans, and A. Vaisman. A model for enriching trajectories with semantic geographical information. In ACM-GIS, New York, NY, USA. ACM Press.
[13] Nanni, M., Pedreschi, D.: Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems 27(3) (2006).
[14] M. Chen, M. Xu and P. Franti, "Compression of GPS trajectories," Proc. IEEE Data Compression Conf.
[15] Hershberger, J., Snoeyink, J.: Speeding up the Douglas-Peucker Line simplification Algorithm. In: International Symposium on Spatial Data Handling (1992).
[16] Dateline NBC: Tracing a stalker. (2007)
[17] FoxNews: Man accused of stalking ex-girlfriend with GPS. story/0,2933,131487,00.html (2004)
[18] USAToday: Authorities: GPS system used to stalk woman. com/tech/news/ gps-stalker_x.htm (2002)
[19] Voelcker, J.: Stalked by satellite: An alarming rise in GPS-enabled harassment. IEEE Spectrum 47(7) (2006).
[20] Yan, Z. (2009), "Towards Semantic Trajectory Data Analysis: A Conceptual and Computational Approach". VLDB '09, Lyon, France.
[21] L.O. Alvares, A. Palma, G. Oliveira, and V. Bogorny, "Weka-STPM: From Trajectory Samples to Semantic Trajectories", Proceedings of the XI Workshop de Software Livre, WSL '10, Porto Alegre, Brazil, 2010.
[22] Gedik, B., and Liu, L. Location Privacy in Mobile Systems: A Personalized Anonymization Model. In Proc. of the 25th Int. Conf. on Distributed Computing Systems (ICDCS '05).
[23] Norma Saiph Savage, Shoji Nishimura, Norma Elva Chavez, and Xifeng Yan. Frequent trajectory mining on GPS data. In Proceedings of the 3rd International Workshop on Location and the Web (LocWeb '10). ACM, New York, NY, USA.
[24] Meratnia, N., de By, R.: Spatio-Temporal Compression Techniques for Moving Point Objects. In: International Conference on Extending Database Technology (EDBT) (2004).
[25] Potamias, M., Patroumpas, K., Sellis, T.: Sampling Trajectory Streams with Spatio-Temporal Criteria. In: International Conference on Scientific and Statistical Database Management (SSDBM) (2006).
[26] Ye Qian, Chen Ling, Chen Gencai. Personal continuous route pattern mining [J]. Journal of Zhejiang University, 2009, 10(2).
[27] Lee, J.-G., and Han, J. Trajectory clustering: A partition-and-group framework. In Proc. of the 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '07) (2007).
[28] Lee, J.-G., Han, J., and Li, X. Trajectory outlier detection: A partition-and-detect framework. In Proc. of the 24th IEEE International Conference on Data Engineering (ICDE '08) (2008).
[29] Lee, J.-G., Han, J., Li, X., and Gonzalez, H. TraClass: Trajectory classification using hierarchical region-based and trajectory-based clustering. In Proc. of the 34th Int. Conf. on Very Large Databases (VLDB '08) (2008).
[30] Gidofalvi, G., Huang, X., Pedersen, T.B.: Privacy-preserving data mining on moving object trajectories. In: Proceedings of the International Conference on Mobile Data Management (2007).
[31] Gruteser, M., and Grunwald, D. Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. In Proc. of the First Int. Conf. on Mobile Systems, Applications, and Services (MobiSys 2003).
[32] Freudiger, J., Raya, M., Felegyhazi, M., Papadimitratos, P., Hubaux, J.P.: Mix-zones for location privacy in vehicular networks. In: Proceedings of the International Workshop on Wireless Networking for Intelligent Transportation Systems (2007).
[33] Wen He, Deyi Li, Tianlei Zhang, Mu Guo, Lifeng An. Mining Regular Routes from GPS Data for Ridesharing Recommendation.
[34] Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient similarity search in sequence databases. FODO (1993).
[35] Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti, "WhereNext: a location predictor on trajectory pattern mining," Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 28-July 01, 2009, Paris, France.
[36] Jeung, H., Liu, Q., Shen, H. T., and Zhou, X. A hybrid prediction model for moving objects. In Proc. of the 24th IEEE International Conference on Data Engineering (ICDE '08) (2008).
[37] Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations and travel sequences from GPS trajectories. WWW (2009).
[38] Yi, B.K., Jagadish, H., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. ICDE (1998).
[39] Kido, H., Yanagisawa, Y., and Satoh, T. An Anonymous Communication Technique using Dummies for Location-based Services. In Proc. of the Third Int. Conf. on Pervasive Computing (Pervasive 2005) (2005).
[40] Chen, Z., Shen, H.T., Zhou, X., Zheng, Y., Xie, X.: Searching trajectories by locations - an efficiency study. SIGMOD (2010).
[41] Chen, L., Ozsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. SIGMOD (2005).
[42] Li, X., Han, J., Kim, S., and Gonzalez, H. Anomaly detection in moving object.
[43] Li, X., Han, J., Lee, J.-G., and Gonzalez, H. Traffic density-based discovery of hot routes in road networks.
[44] Mamoulis, N., Cao, H., Kollios, G., Hadjieleftheriou, M., Tao, Y., and Cheung, D. W.: Mining, indexing, and querying historical spatiotemporal data.
[45] Chow, Chi-Yin: Trajectory Privacy in Location-based Services and Data. In: ACM SIGKDD Explorations Newsletter 13 (2011), no. 1.
[46] Mokbel, M. F., Chow, C.-Y., and Aref, W. G. Casper: Query processing for location services without compromising privacy. In Proceedings of the 32nd International Conference on Very Large Databases (VLDB '06).
[47] Mokbel, M. F., Chow, C.-Y., and Aref, W. G. The new Casper: A privacy-aware location-based database server. In Proc. of the 23rd IEEE International Conference on Data Engineering (ICDE '07).
[48]
[49] Nanni, M., and Pedreschi, D. Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems 27, 3 (2006).
[50] Nergiz, M.E., Atzori, M., Saygin, Y., Güç, B.: Towards trajectory anonymization: A generalization-based approach. Transactions on Data Privacy 2(1) (2009).
[51] Terrovitis, M., Mamoulis, N.: Privacy preservation in the publication of trajectories. In: Proceedings of the International Conference on Mobile Data Management (2008).
[52] Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (1996).
[53] _distribution


Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Richard Kershaw and Bhaskar Krishnamachari Ming Hsieh Department of Electrical Engineering, Viterbi School

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

AN IMPROVED TAIPEI BUS ESTIMATION-TIME-OF-ARRIVAL (ETA) MODEL BASED ON INTEGRATED ANALYSIS ON HISTORICAL AND REAL-TIME BUS POSITION

AN IMPROVED TAIPEI BUS ESTIMATION-TIME-OF-ARRIVAL (ETA) MODEL BASED ON INTEGRATED ANALYSIS ON HISTORICAL AND REAL-TIME BUS POSITION AN IMPROVED TAIPEI BUS ESTIMATION-TIME-OF-ARRIVAL (ETA) MODEL BASED ON INTEGRATED ANALYSIS ON HISTORICAL AND REAL-TIME BUS POSITION Xue-Min Lu 1,3, Sendo Wang 2 1 Master Student, 2 Associate Professor

More information

Solutions. Location-Based Services (LBS) Problem Statement. PIR Overview. Spatial K-Anonymity

Solutions. Location-Based Services (LBS) Problem Statement. PIR Overview. Spatial K-Anonymity 2 Location-Based Services (LBS) Private Queries in Location-Based Services: Anonymizers are Not Necessary Gabriel Ghinita Panos Kalnis Ali Khoshgozaran 2 Cyrus Shahabi 2 Kian Lee Tan LBS users Mobile devices

More information

CACHING IN WIRELESS SENSOR NETWORKS BASED ON GRIDS

CACHING IN WIRELESS SENSOR NETWORKS BASED ON GRIDS International Journal of Wireless Communications and Networking 3(1), 2011, pp. 7-13 CACHING IN WIRELESS SENSOR NETWORKS BASED ON GRIDS Sudhanshu Pant 1, Naveen Chauhan 2 and Brij Bihari Dubey 3 Department

More information

Detecting Anomalous Trajectories and Traffic Services

Detecting Anomalous Trajectories and Traffic Services Detecting Anomalous Trajectories and Traffic Services Mazen Ismael Faculty of Information Technology, BUT Božetěchova 1/2, 66 Brno Mazen.ismael@vut.cz Abstract. Among the traffic studies; the importance

More information

M Thulasi 2 Student ( M. Tech-CSE), S V Engineering College for Women, (Affiliated to JNTU Anantapur) Tirupati, A.P, India

M Thulasi 2 Student ( M. Tech-CSE), S V Engineering College for Women, (Affiliated to JNTU Anantapur) Tirupati, A.P, India Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Enhanced Driving

More information

DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS INTRODUCTION

DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS INTRODUCTION DETECTION AND ROBUST ESTIMATION OF CYLINDER FEATURES IN POINT CLOUDS Yun-Ting Su James Bethel Geomatics Engineering School of Civil Engineering Purdue University 550 Stadium Mall Drive, West Lafayette,

More information

Predicting Bus Arrivals Using One Bus Away Real-Time Data

Predicting Bus Arrivals Using One Bus Away Real-Time Data Predicting Bus Arrivals Using One Bus Away Real-Time Data 1 2 3 4 5 Catherine M. Baker Alexander C. Nied Department of Computer Science Department of Computer Science University of Washington University

More information

Efficient distributed computation of human mobility aggregates through User Mobility Profiles

Efficient distributed computation of human mobility aggregates through User Mobility Profiles Efficient distributed computation of human mobility aggregates through User Mobility Profiles Mirco Nanni, Roberto Trasarti, Giulio Rossetti, Dino Pedreschi KDD Lab - ISTI CNR Pisa, Italy name.surname@isti.cnr.it

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Location Privacy Protection in Contention Based Forwarding for VANETs

Location Privacy Protection in Contention Based Forwarding for VANETs Location Privacy Protection in Contention Based Forwarding for VANETs Qing Yang Alvin Lim Xiaojun Ruan and Xiao Qin Computer Science and Software Engineering Auburn University, Auburn, AL, USA 36849 Email:

More information

L2P2: Location-aware Location Privacy Protection for Location-based Services

L2P2: Location-aware Location Privacy Protection for Location-based Services L2P2: Location-aware Location Privacy Protection for Location-based Services Yu Wang Dingbang Xu Xiao He Chao Zhang Fan Li Bin Xu Department of Computer Science, University of North Carolina at Charlotte,

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf

More information

Trajectory Voting and Classification based on Spatiotemporal Similarity in Moving Object Databases

Trajectory Voting and Classification based on Spatiotemporal Similarity in Moving Object Databases Trajectory Voting and Classification based on Spatiotemporal Similarity in Moving Object Databases Costas Panagiotakis 1, Nikos Pelekis 2, and Ioannis Kopanakis 3 1 Dept. of Computer Science, University

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018 Spatiotemporal Access to Moving Objects Hao LIU, Xu GENG 17/04/2018 Contents Overview & applications Spatiotemporal queries Movingobjects modeling Sampled locations Linear function of time Indexing structure

More information

Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing

Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing A. Rao +, A.P. Jayasumana * and Y.K. Malaiya* *Colorado State University, Fort Collins, CO 8523 + PalmChip Corporation,

More information

Unsupervised learning on Color Images

Unsupervised learning on Color Images Unsupervised learning on Color Images Sindhuja Vakkalagadda 1, Prasanthi Dhavala 2 1 Computer Science and Systems Engineering, Andhra University, AP, India 2 Computer Science and Systems Engineering, Andhra

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Defining a Better Vehicle Trajectory With GMM

Defining a Better Vehicle Trajectory With GMM Santa Clara University Department of Computer Engineering COEN 281 Data Mining Professor Ming- Hwa Wang, Ph.D Winter 2016 Defining a Better Vehicle Trajectory With GMM Christiane Gregory Abe Millan Contents

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

Real-time Detection of Illegally Parked Vehicles Using 1-D Transformation

Real-time Detection of Illegally Parked Vehicles Using 1-D Transformation Real-time Detection of Illegally Parked Vehicles Using 1-D Transformation Jong Taek Lee, M. S. Ryoo, Matthew Riley, and J. K. Aggarwal Computer & Vision Research Center Dept. of Electrical & Computer Engineering,

More information

Constructing Popular Routes from Uncertain Trajectories

Constructing Popular Routes from Uncertain Trajectories Constructing Popular Routes from Uncertain Trajectories Ling-Yin Wei, Yu Zheng, Wen-Chih Peng presented by Slawek Goryczka Scenarios A trajectory is a sequence of data points recording location information

More information

Ad-hoc Trusted Information Exchange Scheme for Location Privacy in VANET

Ad-hoc Trusted Information Exchange Scheme for Location Privacy in VANET Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 1, January 2015,

More information

Semantic Representation of Moving Entities for Enhancing Geographical Information Systems

Semantic Representation of Moving Entities for Enhancing Geographical Information Systems International Journal of Innovation and Applied Studies ISSN 2028-9324 Vol. 4 No. 1 Sep. 2013, pp. 83-87 2013 Innovative Space of Scientific Research Journals http://www.issr-journals.org/ijias/ Semantic

More information

Development of an application using a clustering algorithm for definition of collective transportation routes and times

Development of an application using a clustering algorithm for definition of collective transportation routes and times Development of an application using a clustering algorithm for definition of collective transportation routes and times Thiago C. Andrade 1, Marconi de A. Pereira 2, Elizabeth F. Wanner 1 1 DECOM - Centro

More information

Offline Approaches for Preserving Privacy of Trajectories on the Road Networks

Offline Approaches for Preserving Privacy of Trajectories on the Road Networks Offline Approaches for Preserving Privacy of Trajectories on the Road Networks Rubina Shahin Zuberi Department of Electronics and Communications, Jamia Millia Islamia, New Delhi E-mail : rshahinz@gmail.com

More information

Route Pattern Mining From Personal Trajectory Data *

Route Pattern Mining From Personal Trajectory Data * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 31, 147-164 (2015) Route Pattern Mining From Personal Trajectory Data * MINGQI LV 1, YINGLONG LI 1, ZHENMING YUAN 2 AND QIHUI WANG 2 1 College of Computer

More information

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud R. H. Jadhav 1 P.E.S college of Engineering, Aurangabad, Maharashtra, India 1 rjadhav377@gmail.com ABSTRACT: Many

More information

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning

More information

Automated semantic trajectory annotation with indoor point-of-interest visits in urban areas

Automated semantic trajectory annotation with indoor point-of-interest visits in urban areas Automated semantic trajectory annotation with indoor point-of-interest visits in urban areas Victor de Graaff Dept. of Computer Science University of Twente Enschede, The Netherlands v.degraaff@utwente.nl

More information

Hotspot District Trajectory Prediction *

Hotspot District Trajectory Prediction * Hotspot District Trajectory Prediction * Hongjun Li 1,2, Changjie Tang 1, Shaojie Qiao 3, Yue Wang 1, Ning Yang 1, and Chuan Li 1 1 Institute of Database and Knowledge Engineering, School of Computer Science,

More information

Mobility Models. Larissa Marinho Eglem de Oliveira. May 26th CMPE 257 Wireless Networks. (UCSC) May / 50

Mobility Models. Larissa Marinho Eglem de Oliveira. May 26th CMPE 257 Wireless Networks. (UCSC) May / 50 Mobility Models Larissa Marinho Eglem de Oliveira CMPE 257 Wireless Networks May 26th 2015 (UCSC) May 2015 1 / 50 1 Motivation 2 Mobility Models 3 Extracting a Mobility Model from Real User Traces 4 Self-similar

More information

TRAJECTORY PATTERN MINING

TRAJECTORY PATTERN MINING TRAJECTORY PATTERN MINING Fosca Giannotti, Micro Nanni, Dino Pedreschi, Martha Axiak Marco Muscat Introduction 2 Nowadays data on the spatial and temporal location is objects is available. Gps, GSM towers,

More information

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM 4.1 Introduction Nowadays money investment in stock market gains major attention because of its dynamic nature. So the

More information

An Enhanced Density Clustering Algorithm for Datasets with Complex Structures

An Enhanced Density Clustering Algorithm for Datasets with Complex Structures An Enhanced Density Clustering Algorithm for Datasets with Complex Structures Jieming Yang, Qilong Wu, Zhaoyang Qu, and Zhiying Liu Abstract There are several limitations of DBSCAN: 1) parameters have

More information

City, University of London Institutional Repository

City, University of London Institutional Repository City Research Online City, University of London Institutional Repository Citation: Andrienko, N., Andrienko, G., Fuchs, G., Rinzivillo, S. & Betz, H-D. (2015). Real Time Detection and Tracking of Spatial

More information

Density-based clustering algorithms DBSCAN and SNN

Density-based clustering algorithms DBSCAN and SNN Density-based clustering algorithms DBSCAN and SNN Version 1.0, 25.07.2005 Adriano Moreira, Maribel Y. Santos and Sofia Carneiro {adriano, maribel, sofia}@dsi.uminho.pt University of Minho - Portugal 1.

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Final Report Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin Omer Shakil Abstract This report describes a method to align two videos.

More information

Trajectory analysis. Ivan Kukanov

Trajectory analysis. Ivan Kukanov Trajectory analysis Ivan Kukanov Joensuu, 2014 Semantic Trajectory Mining for Location Prediction Josh Jia-Ching Ying Tz-Chiao Weng Vincent S. Tseng Taiwan Wang-Chien Lee Wang-Chien Lee USA Copyright 2011

More information

Survey of Anonymity Techniques for Privacy Preserving

Survey of Anonymity Techniques for Privacy Preserving 2009 International Symposium on Computing, Communication, and Control (ISCCC 2009) Proc.of CSIT vol.1 (2011) (2011) IACSIT Press, Singapore Survey of Anonymity Techniques for Privacy Preserving Luo Yongcheng

More information

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique

Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique Improving Privacy And Data Utility For High- Dimensional Data By Using Anonymization Technique P.Nithya 1, V.Karpagam 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College,

More information