Efficient Processing of Multiple DTW Queries in Time Series Databases

1 Efficient Processing of Multiple DTW Queries in Time Series Databases
Hardy Kremer (1), Stephan Günnemann (1), Anca-Maria Ivanescu (1), Ira Assent (2), Thomas Seidl (1)
(1) RWTH Aachen University, Germany; (2) Aarhus University, Denmark
SSDBM 2011, Portland, Oregon, USA, July 20-22, 2011

2 Time Series Similarity Search
Time series: a sequence of time-related values, e.g. stock data, sensor data, EEG measurements, climate data.
Similarity search: find time series with similar patterns over time.
[Figure: time series from the EEG data set; x-axis: point in time (i); y-axis: value (t_i) [mV]]
With, e.g., the Euclidean distance, only corresponding time points are compared. But in many applications, time series are out of sync.

3 Dynamic Time Warping
[Figure: two alignment panels, "Dynamic Time Warping" and "DTW with k-band"]
DTW: the distance between time series is based on a new alignment, i.e. matching by stretching and scaling along the time axis. The k-band constraint avoids degenerate warpings.
Efficient processing: DTW has quadratic complexity in the length of the time series. There are many efficient query processing algorithms, but they are good for single, ad-hoc queries.

4 Multiple Query Processing for DTW
In today's applications, massive amounts of queries need to be processed in limited time:
- Sensor networks: timely reaction to events
- Data mining applications: scalability
- Interactive visualization: interactiveness
Often, queries are similar or even share subsequences, e.g. when determining the transitive closure in density-based clustering.
Definition: Multiple DTW Range Query
Input: set of time series queries $Q = \{q_1, \dots, q_c\}$, database $DB$, range $\varepsilon$
Output: multiple result sets $Res_i(DB) = \{t \in DB \mid DTW(q_i, t) \le \varepsilon\}$ with $i = 1, \dots, c$
Later on: kNN queries.

5 Multiple Query Processing for DTW (2)
Definition: Multiple DTW Range Query
Input: set of time series queries $Q = \{q_1, \dots, q_c\}$, database $DB$, range $\varepsilon$
Output: multiple result sets $Res_i(DB) = \{t \in DB \mid DTW(q_i, t) \le \varepsilon\}$ with $i = 1, \dots, c$
Our novel approach
- exploits the similarity among queries for combined pruning,
- uses a nested hierarchy of query subgroups for pruning with smaller and more similar query subsets,
- is combinable with existing single DTW query approaches,
- guarantees exact results.

6 Overview
1. Preliminaries
2. Processing of Multiple DTW Queries
3. Experiments
4. Conclusion

7 DTW Definition
k-band DTW:
$$DTW([s_1,\dots,s_n],[t_1,\dots,t_m]) = dist_{band}(s_n,t_m) + \min\left\{\begin{array}{l} DTW([s_1,\dots,s_{n-1}],[t_1,\dots,t_{m-1}]) \\ DTW([s_1,\dots,s_n],[t_1,\dots,t_{m-1}]) \\ DTW([s_1,\dots,s_{n-1}],[t_1,\dots,t_m]) \end{array}\right\}$$
with
$$dist_{band}(s_i,t_j) = \begin{cases} dist(s_i,t_j) & |i-j| \le k \\ \infty & \text{else} \end{cases}$$
and $DTW(\emptyset,\emptyset) = 0$, $DTW(x,\emptyset) = \infty$, $DTW(\emptyset,y) = \infty$.
The distance between points is measured by the ground distance $dist(s_i,t_j)$.
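The recurrence on this slide can be evaluated bottom-up with dynamic programming. The following is a minimal sketch, assuming an absolute-difference ground distance as the default; the function name `dtw_kband` and its parameters are illustrative and not taken from the talk.

```python
import math

def dtw_kband(s, t, k, dist=lambda a, b: abs(a - b)):
    """k-band DTW between sequences s and t (illustrative sketch)."""
    n, m = len(s), len(t)
    # D[i][j] holds the DTW distance of the prefixes s[:i] and t[:j].
    # Cells outside the band are never written and keep the value infinity,
    # which mirrors dist_band = infinity for |i - j| > k.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # DTW of two empty sequences
    for i in range(1, n + 1):
        for j in range(max(1, i - k), min(m, i + k) + 1):
            best = min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
            D[i][j] = dist(s[i - 1], t[j - 1]) + best
    return D[n][m]

print(dtw_kband([1, 2, 3], [1, 1, 2, 3], k=1))  # alignment exists inside the band
print(dtw_kband([1, 2, 3], [1, 1, 2, 3], k=0))  # band too narrow: infinity
```

With band width k, only O(n·k) cells are filled instead of the full n·m matrix.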

8 Efficient, exact single query DTW processing
DTW is computationally expensive. Many approaches therefore use a multistep filter-and-refine architecture. If the filter lower-bounds DTW, the architecture is lossless. There are filters that achieve substantial speed-ups, e.g. LB_Keogh.
Figure: Multistep filter-and-refine architecture (query, index (filter) on the database, candidates, refinement (exact), result)
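A filter-and-refine range query for a single query can be sketched as follows. This is an illustration under assumptions, not the implementation from the talk: it assumes equal-length series, an absolute-difference ground distance, and an LB_Keogh-style envelope bound; all function names are hypothetical.

```python
import math

def dtw_kband(s, t, k):
    """Exact k-band DTW with absolute-difference ground distance."""
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - k), min(m, i + k) + 1):
            D[i][j] = abs(s[i - 1] - t[j - 1]) + min(
                D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
    return D[n][m]

def lb_keogh(q, t, k):
    """Envelope lower bound: distance of each t[j] to the min/max of q
    within the band around position j (assumes len(q) == len(t))."""
    total = 0.0
    for j, tj in enumerate(t):
        window = q[max(0, j - k): j + k + 1]
        lo, hi = min(window), max(window)
        if tj > hi:
            total += tj - hi
        elif tj < lo:
            total += lo - tj
    return total

def range_query(q, db, eps, k):
    """Filter step prunes by the lower bound; refinement computes exact DTW."""
    result = []
    for t in db:
        if lb_keogh(q, t, k) > eps:
            continue  # safe to prune: the bound never exceeds the DTW distance
        if dtw_kband(q, t, k) <= eps:
            result.append(t)
    return result

db = [[1, 2, 3, 4], [1, 1, 2, 3], [9, 9, 9, 9]]
print(range_query([1, 2, 3, 4], db, eps=1.0, k=1))
```

Because the filter never overestimates the exact distance, no true result is dismissed; only the number of exact DTW computations shrinks.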

9 Baseline Solution for Multiple DTW Processing
Each query of the query set Q is processed independently, i.e., each single query passes through the filter cascade w.r.t. the whole DB.
[Figure: for each query q, the database DB passes through filter dist_1 (intermediate result Res_{q,d1}), then filter dist_2 (Res_{q,d2}), then exact DTW]
The baseline does not exploit knowledge about the whole query set Q. In our approach, queries are processed simultaneously.

11 Multiple Query Distance Function: Idea
For simultaneous processing, we need a distance function between several query objects and a database object: one single calculation for several queries.
[Figure: DB is filtered with multiDTW for the query set Q, giving Res_Q; the queries q_1, q_2, q_3 are then refined with DTW, giving Res_1, Res_2, Res_3]
Pruning: if $multiDTW(Q, t) > \varepsilon$, there is no matching query in Q.
Shared lower bounding property for preventing false dismissals:
$$\forall Q, t: \forall q \in Q: multiDTW(Q, t) \le DTW(q, t)$$

12 Multiple Query Distance Function: Solution
We use a compact, single representation of a time series query set, called the Multiple Query Bounding Box:
$$multiBox(Q) = [(L_1, U_1), \dots, (L_n, U_n)] = [B_1, \dots, B_n]$$
with $L_i = \min_{q \in Q} q_i$ and $U_i = \max_{q \in Q} q_i$.
[Figure: time series t together with the lower bound L and the upper bound U of the MultiBox]
The final multiDTW function uses the minimal distance from time series t to a MultiBox B as DTW ground distance:
$$dist(B_i, t_j) = dist((L_i, U_i), t_j) = \begin{cases} |t_j - U_i| & \text{if } t_j > U_i \\ |t_j - L_i| & \text{if } t_j < L_i \\ 0 & \text{otherwise} \end{cases}$$
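The MultiBox and the resulting multiDTW can be sketched in a few lines. This is a minimal illustration assuming equal-length queries and the k-band recurrence from the DTW definition; the names `multibox`, `box_dist`, and `multi_dtw` are chosen here for illustration.

```python
import math

def multibox(queries):
    """Per-position (L_i, U_i) over all queries of the set."""
    return [(min(col), max(col)) for col in zip(*queries)]

def box_dist(box, x):
    """Minimal distance from value x to the interval [L, U]."""
    lo, hi = box
    if x > hi:
        return x - hi
    if x < lo:
        return lo - x
    return 0.0

def multi_dtw(boxes, t, k):
    """k-band DTW that uses the box distance as ground distance."""
    n, m = len(boxes), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - k), min(m, i + k) + 1):
            D[i][j] = box_dist(boxes[i - 1], t[j - 1]) + min(
                D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
    return D[n][m]

queries = [[1, 2, 3, 4], [2, 3, 4, 5]]
boxes = multibox(queries)
t = [0, 1, 2, 3]
shared = multi_dtw(boxes, t, k=1)
# For a single query the box degenerates to L_i = U_i = q_i,
# so multi_dtw on it equals the ordinary DTW distance.
exact = [multi_dtw(multibox([q]), t, k=1) for q in queries]
print(shared, exact)
```

Since the distance to the interval is never larger than the distance to any query value inside it, the shared value lower-bounds every individual DTW distance, which is exactly the property needed for pruning without false dismissals.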

13 Multiple Query Tree
One single group means only one single query has to be processed. But we then have a relatively large intermediate result set Res_Q. Smaller, more similar groups could reduce the intermediate result size.
We introduce a hierarchy of query sets with:
$$\forall Q, Q' \subseteq Q, p: multiDTW(Q, p) \le multiDTW(Q', p)$$
[Figure: tree over DB; the root filters with multiDTW for the whole query set Q, inner nodes filter with multiDTW for query subsets, and leaves refine single queries q with DTW; each node produces its own intermediate result set]
Trade-off: smaller intermediate sets vs. higher computational costs.
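A multiple range query over such a hierarchy can be sketched as below. The grouping strategy is an assumption made for brevity: queries are simply split in halves by list order, whereas a grouping by similarity would give tighter boxes. All names are illustrative.

```python
import math

def multibox(queries):
    return [(min(col), max(col)) for col in zip(*queries)]

def multi_dtw(boxes, t, k):
    """k-band DTW with the distance to the box [L, U] as ground distance."""
    n, m = len(boxes), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = boxes[i - 1]
        for j in range(max(1, i - k), min(m, i + k) + 1):
            x = t[j - 1]
            d = x - hi if x > hi else lo - x if x < lo else 0.0
            D[i][j] = d + min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
    return D[n][m]

def build_tree(ids, queries):
    """Nested hierarchy: each node covers a subset of its parent's queries."""
    node = {"ids": ids, "box": multibox([queries[i] for i in ids]), "children": []}
    if len(ids) > 1:
        mid = len(ids) // 2  # assumption: naive split, not similarity-based
        node["children"] = [build_tree(ids[:mid], queries),
                            build_tree(ids[mid:], queries)]
    return node

def collect_hits(node, t, eps, k, hits):
    if multi_dtw(node["box"], t, k) > eps:
        return  # prunes t for every query in this subgroup at once
    if not node["children"]:
        hits.append(node["ids"][0])  # leaf box is exact, so t is a result
    for child in node["children"]:
        collect_hits(child, t, eps, k, hits)

def multiple_range_query(queries, db, eps, k):
    root = build_tree(list(range(len(queries))), queries)
    results = [[] for _ in queries]
    for t in db:
        hits = []
        collect_hits(root, t, eps, k, hits)
        for i in hits:
            results[i].append(t)
    return results

queries = [[1, 2, 3, 4], [1, 2, 3, 5], [8, 8, 9, 9], [8, 9, 9, 9]]
db = [[1, 2, 3, 4], [8, 8, 8, 9], [4, 5, 5, 6]]
print(multiple_range_query(queries, db, eps=1.0, k=1))
```

At a leaf the box degenerates to the single query, so the last multiDTW evaluation on the path is the exact DTW distance and no separate refinement call is needed in this sketch.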

14 Filter-supported Hierarchical Multiple DTW Query
Existing filter techniques are orthogonal to our approach, which gives flexibility w.r.t. pruning. In each node of the tree we can either go to a new query granularity, i.e. split the query group, or use a new lower bound filter.
[Figure: tree over DB in which query groups Q are filtered with multidist_2 (intermediate results Res_{Q,d2}) and multidist_1 (Res_{Q,d1}) at different levels, before single queries q are refined with DTW]

15 Multiple DTW kNN Query (1)
Definition: Multiple DTW kNN Query
Input: set of time series queries $Q = \{q_1, \dots, q_c\}$, database $DB$, $k$
Output: multiple result sets $Res_i(DB)$ with $|Res_i(DB)| = k$ and
$$\forall t \in Res_i(DB)\ \forall s \in DB \setminus Res_i(DB): DTW(q_i, t) \le DTW(q_i, s)$$
for all $i = 1, \dots, c$.

16 Multiple DTW kNN Query (2)
kNN processing: for kNN queries, the $\varepsilon$ range for pruning is not known a priori.
- We need one moving threshold per query subgroup.
- For each query we have a current set of k results.
- Both are constantly updated.
[Figure: the same filter-supported query tree as for range queries, with multidist_2 and multidist_1 filters on query groups Q and DTW refinement for single queries q]
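The moving thresholds can be sketched as follows. The assumptions here are mine: each query keeps its current k best results in a max-heap, a query's threshold is its current k-th best distance (infinity while fewer than k results are known), and a subgroup's threshold is the maximum over its member queries. The tree split by list order and all names are illustrative.

```python
import heapq
import math

def multibox(queries):
    return [(min(col), max(col)) for col in zip(*queries)]

def multi_dtw(boxes, t, band):
    n, m = len(boxes), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = boxes[i - 1]
        for j in range(max(1, i - band), min(m, i + band) + 1):
            x = t[j - 1]
            d = x - hi if x > hi else lo - x if x < lo else 0.0
            D[i][j] = d + min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])
    return D[n][m]

def build_tree(ids, queries):
    node = {"ids": ids, "box": multibox([queries[i] for i in ids]), "children": []}
    if len(ids) > 1:
        mid = len(ids) // 2  # assumption: naive split by list order
        node["children"] = [build_tree(ids[:mid], queries),
                            build_tree(ids[mid:], queries)]
    return node

def multiple_knn(queries, db, k, band):
    root = build_tree(list(range(len(queries))), queries)
    heaps = [[] for _ in queries]  # max-heaps via negated distances

    def threshold(i):
        h = heaps[i]
        return -h[0][0] if len(h) == k else math.inf

    def visit(node, idx):
        bound = multi_dtw(node["box"], db[idx], band)
        # Moving subgroup threshold: the weakest k-th distance of its members.
        if bound > max(threshold(i) for i in node["ids"]):
            return
        if node["children"]:
            for child in node["children"]:
                visit(child, idx)
            return
        h = heaps[node["ids"][0]]  # leaf: bound is the exact DTW distance
        if len(h) < k:
            heapq.heappush(h, (-bound, idx))
        elif bound < -h[0][0]:
            heapq.heapreplace(h, (-bound, idx))

    for idx in range(len(db)):
        visit(root, idx)
    # Per query: list of (distance, database index), nearest first.
    return [sorted((-nd, idx) for nd, idx in h) for h in heaps]

queries = [[1, 2, 3, 4], [1, 2, 3, 5], [8, 8, 9, 9]]
db = [[1, 2, 3, 4], [8, 8, 8, 9], [4, 5, 5, 6], [2, 2, 3, 4]]
print(multiple_knn(queries, db, k=2, band=1))
```

The thresholds start at infinity and only tighten as results accumulate, so pruning becomes more effective the further the scan progresses.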

18 Experimental Evaluation: Setup
Our approach: linear scan to process the database time series, but usage of an index and dimensionality reduction is also possible.
Measurements:
- wall clock time averaged over 10 queries
- relative number of exact refinements
Multiple query generator:
- for each multiple query, we select S random seeds
- for each seed, we generate g queries deviating from the seed by a standard deviation of 10%
As default, we use 8 seeds and 5 generated queries per seed, resulting in 40 queries per multiple query.
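One possible reading of the multiple query generator is sketched below. The slide does not say where seeds are drawn from or what the 10% refers to, so this sketch assumes that seeds are sampled from the database and that each generated query adds Gaussian noise whose standard deviation is 10% of the seed's own standard deviation. The function name and parameters are hypothetical.

```python
import random
import statistics

def generate_multiple_query(db, num_seeds=8, per_seed=5, rel_std=0.10, rng=None):
    """Return num_seeds * per_seed noisy copies of randomly chosen seeds."""
    rng = rng or random.Random()
    seeds = rng.sample(db, num_seeds)  # assumption: seeds come from the database
    queries = []
    for seed in seeds:
        # Assumption: noise level relative to the spread of the seed series.
        sigma = rel_std * statistics.pstdev(seed)
        for _ in range(per_seed):
            queries.append([v + rng.gauss(0.0, sigma) for v in seed])
    return queries

data_rng = random.Random(0)
db = [[data_rng.random() for _ in range(16)] for _ in range(20)]
multiple_query = generate_multiple_query(db, rng=random.Random(1))
print(len(multiple_query))  # 8 seeds * 5 queries per seed
```

With the defaults from the slide (8 seeds, 5 queries per seed), each multiple query contains 40 individual queries that form 8 groups of similar series.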

19 Comparison of the three variants of our method
[Figure: two charts on Random Walk data comparing multiDTW, MQ-tree, and FSMQ-tree for range queries and NN queries; left y-axis: average query time [s]; right y-axis: refinements [%]]
Both the MQ-tree and the FSMQ-tree dramatically reduce the number of exact DTW calculations. Filter support has a significant influence on efficiency.

20 Number of individual queries per multiple query
[Figure: EEG real-world data, kNN queries; x-axis: number of queries per multiple query; left plot: average query time [s] for LB_Keogh and FSMQ with relative improvement [%]; right plot: refinements [%] for LB_Keogh and FSMQ]
The runtime of the single query solution increases much faster.

21 Scalability: Time Series Length
[Figure: EEG real-world data, kNN queries; x-axis: time series length; left plot: average query time [s] for LB_Keogh and FSMQ with relative improvement [%]; right plot: refinements [%] for LB_Keogh and FSMQ]
The LB_Keogh filter has problems with the large scatter in EEG data. Grouping by similarity is the method of choice here.

24 Conclusion
In this talk, I discussed an approach that
- processes multiple DTW queries,
- exploits the similarity among queries,
- uses a nested hierarchy of query subgroups,
- is orthogonal to existing single DTW query approaches,
- guarantees exact results,
- outperforms the baseline solution.
Thank you for your attention. Questions?
