Time series representations: a state-of-the-art

Similar documents
Time Series Analysis DM 2 / A.A

Hierarchical Piecewise Linear Approximation

The Curse of Dimensionality. Panagiotis Parchas Advanced Data Management Spring 2012 CSE HKUST

Evaluating String Comparator Performance for Record Linkage William E. Yancey Statistical Research Division U.S. Census Bureau

Machine learning in fmri

Satellite Images Analysis with Symbolic Time Series: A Case Study of the Algerian Zone

A Supervised Time Series Feature Extraction Technique using DCT and DWT

Synopsis and Discussion of "Derivation of Lineof-Sight Stabilization Equations for Gimbaled-Mirror Optical Systems

In Proc. 4th Asia Pacific Bioinformatics Conference (APBC'06), Taipei, Taiwan, 2006.

CS 521 Data Mining Techniques Instructor: Abdullah Mueen

Approximate Clustering of Time Series Using Compact Model-based Descriptions

Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Package TSrepr. January 26, Type Package Title Time Series Representations Version Date

Efficient Time Series Data Classification and Compression in Distributed Monitoring

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery

High Dimensional Data Mining in Time Series by Reducing Dimensionality and Numerosity

Introduction to Wavelets

DYADIC WAVELETS AND DCT BASED BLIND COPY-MOVE IMAGE FORGERY DETECTION

MR Image Reconstruction from Pseudo-Hex Lattice Sampling Patterns Using Separable FFT

Increasing Efficiency of Time Series Clustering by Dimension Reduction Techniques

A Study of knn using ICU Multivariate Time Series Data

TREND DETECTION AND PATTERN RECOGNITION IN FINANCIAL TIME SERIES

Research Article Piecewise Trend Approximation: A Ratio-Based Time Series Representation

Symbolic Representation and Clustering of Bio-Medical Time-Series Data Using Non-Parametric Segmentation and Cluster Ensemble

Comparison of Di erent Strategies of Utilizing Fuzzy Clustering in Structure Identi cation

Aggregation and maintenance for database mining

Shortest-route formulation of mixed-model assembly line balancing problem

How to Solve Dynamic Stochastic Models Computing Expectations Just Once

Genetic algorithm with deterministic crossover for vector quantization

Finding Similarity in Time Series Data by Method of Time Weighted Moments

Comparison of Digital Image Watermarking Algorithms. Xu Zhou Colorado School of Mines December 1, 2014

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

The L-index: An Indexing Structure for Efficient Subsequence Matching in Time Sequence Databases

Overview of Clustering

Accelerometer Gesture Recognition

LOWER-DIMENSIONAL TRANSFORMATIONS FOR HIGH-DIMENSIONAL MINIMUM BOUNDING RECTANGLES. Received January 2012; revised May 2012

III Data Structures. Dynamic sets

Atime series data is a (potentially long) sequence of real

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

Similarity Search in Time Series Databases

Robust Image Watermarking based on Discrete Wavelet Transform, Discrete Cosine Transform & Singular Value Decomposition

Networks for Control. California Institute of Technology. Pasadena, CA Abstract

CERIAS Tech Report SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks by Christopher

Visualization of change in the Interactive Multimedia Atlas of Switzerland

Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.

Multiresolution Inverse Wavelet Reconstruction from a Fourier Partial Sum

Similarity Search on Time Series based on Threshold Queries

Adaptive Quantization for Video Compression in Frequency Domain

Replacement of Missing Data and Outliers Using Wavelet Transform Methods

Chapter 10. Conclusion Discussion

Data Preprocessing. Data Mining 1

Time Series Analysis with R 1

Classification. 1 o Semestre 2007/2008

isax: disk-aware mining and indexing of massive time series datasets

Fractal Based Approach for Indexing and Querying Heterogeneous Data Streams

Effective Pattern Similarity Match for Multidimensional Sequence Data Sets

Statistics for a Random Network Design Problem

Literature Review on Time Series Indexing

FPGA Implementation of 4-D DWT and BPS based Digital Image Watermarking

DIGITAL IMAGE HIDING ALGORITHM FOR SECRET COMMUNICATION

A Sliding Window Mann-Kendall Method for Piecewise Linear Representation

Math 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search

Spatially adaptive ltering as regularization in inverse imaging: compressive sensing, super-resolution and upsampling

real stock data stock closing price in UD$ time in days

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

6 Discrete Wavelet Transform-Based Time Series Analysis and Mining

Speech Modulation for Image Watermarking

Mining for Patterns and Anomalies in Data Streams. Sampath Kannan University of Pennsylvania

Relationship between Fourier Space and Image Space. Academic Resource Center

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

Data Mining: Data. What is Data? Lecture Notes for Chapter 2. Introduction to Data Mining. Properties of Attribute Values. Types of Attributes

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Similarity Search in Multimedia Time Series Data using Amplitude-Level Features

UNIT 2 Data Preprocessing

Image matching using run-length feature 1

Fast Algorithms for Coevolving Time Series Mining

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Comparison of Wavelet Based Watermarking Techniques for Various Attacks

Data preprocessing Functional Programming and Intelligent Algorithms

Searching time series with Hadoop in an electric power company

Modeling the Spectral Envelope of Musical Instruments

Redundant Data Elimination for Image Compression and Internet Transmission using MATLAB

Query Workloads for Data Series Indexes

A Comparison of Allocation Policies in Wavelength Routing Networks*

Data Mining. Jeff M. Phillips. January 9, 2013

3.5 Filtering with the 2D Fourier Transform Basic Low Pass and High Pass Filtering using 2D DFT Other Low Pass Filters

IMAGE COMPRESSION. Chapter - 5 : (Basic)

Interval-focused Similarity Search in Time Series Databases

Volume 2, Issue 9, September 2014 ISSN

Eciently Supporting Ad Hoc Queries in Large. H. V. Jagadish. Abstract

Missing Data Estimation in Microarrays Using Multi-Organism Approach

Department of Applied Mathematics. University of Twente. Faculty of EEMCS. Memorandum No. 1767

COMPARATIVE STUDY OF IMAGE FUSION TECHNIQUES IN SPATIAL AND TRANSFORM DOMAIN

Loughborough University Institutional Repository

Dictionary selection using partial matching

Determination of the Starting Point in Time Series for Trend Detection Based on Overlapping Trend

Transcription:

Laboratoire LIAS cyrille.ponchateau@ensma.fr www.lias-lab.fr ISAE - ENSMA 11 Juillet 2016 1 / 33

Content 1 2 3 4 2 / 33

Content 1 2 3 4 3 / 33

What is a time series? Time Series Time series are a temporal data class consisting in a list of values ordered chronologically (Fu 2011). 4 / 33

What is a time series? Time Series Time series are a temporal data class consisting in a list of values ordered chronologically (Fu 2011). Figure : Drawn time series example 4 / 33

What is a time series? Time Series Time series are a temporal data class consisting in a list of values ordered chronologically (Fu 2011). Figure : Drawn time series example Timestamp Valeurs 0.0000000e+00 0.0000000e+00 1.0000000e+00 9.4105346e-02 2.0000000e+00 1.8452077e-01 3.0000000e+00 2.7139095e-01 4.0000000e+00 3.5485491e-01 5.0000000e+00 4.3504619e-01 6.0000000e+00 5.1209313e-01 7.0000000e+00 5.8611902e-01...... Table : Example of time series 4 / 33

The time series data mining: a challenge Used in many scienti c elds. Needs e cient analysis method, but... Real values => no exact matching. Time series quality can be degraded by noise, outliers. The analysis must compare time series as a whole. 5 / 33

The time series data mining: a challenge Used in many scienti c elds. Needs e cient analysis method, but... Real values => no exact matching. Time series quality can be degraded by noise, outliers. The analysis must compare time series as a whole. 5 / 33

The time series data mining: a challenge Dimensionality curse In addition to the previously mentioned di culties, time series su er from the so called "dimensionality curse" problem: High-dimensional (one billion items) => Raw data processing is costly ; => "dimensionality reduction". 6 / 33

Content 1 2 3 4 7 / 33

: general idea TS Analysis program Result Figure : Naive processing chain 8 / 33

: general idea Analysis program TS Result Figure : Naive processing chain TS Reduction process reduced TS Analysis program Result Figure : processing chain 8 / 33

Temporal representations Time series representations categories The di erent representations of time series are classi ed into three categories (Fu 2011): temporal representations : keep a temporal point of view on the data ; spectral representations : project it into the frequency domain ; other representations : apply some other transformation that do not t into the two rst categories. 9 / 33

Temporal representations Simple sampling The most simple method, consist in sampling the series at xed length (Åström 1969). Figure : Sampling example 10 / 33

Temporal representations Extrema extraction The method consist in extracting all the local extrema of the series. Then the extrema are sorted in terms of importance, in order to eliminate some more points using an importance threshold value (Fink and Gandhi 2011). Figure : All extrema extraction Figure : Extrema sorting 11 / 33

Temporal representations Bit level representation Points are represented with only one bit equal to one if the value of the point is greater than the mean value of the series, zero otherwise (Bagnall et al. 2006; Ratanamahatana et al. 2005). Figure : Bit level representation 12 / 33

Temporal representations Summary of the method seen so far and some more: Sampling (Åström 1969) ; Extrema extraction (Fink and Gandhi 2011) ; Control points identi cation (Man and Wong 2001) ; Landmark model (Perng et al. 2000) ; Salient points identi cation (Perceptually Important Points) (Chung et al. 2001) ; Bit level representation (Bagnall et al. 2006; Ratanamahatana et al. 2005). 13 / 33

Spectral representations Discrete Fourier Transform Any continuous periodic function can be transformed into a combination of sines and cosines. The Discrete Fourier Transform maps a discrete periodic sequence f to a discrete sequence of coe cients F, representing the Fourier transform of the sequence (Shatkay 1995; Agrawal, Faloutsos, and Swani 1993). With N equal to the size of f : f [t ] = 1 1 NX N j =0 F [j ] e 2πitj N (1) 14 / 33

Spectral representations Discrete Fourier Transform Any continuous periodic function can be transformed into a combination of sines and cosines. The Discrete Fourier Transform maps a discrete periodic sequence f to a discrete sequence of coe cients F, representing the Fourier transform of the sequence (Shatkay 1995; Agrawal, Faloutsos, and Swani 1993). With N equal to the size of f : f [t ] = 10 8 6 4 2 00 2 4 1 1 NX N j =0 6 F [j ] e 8 2πitj (1) N 10 12 14 14 / 33

Spectral representations Discrete Wavelet Transform Any continuous function can be decomposed on a basis of wavelets functions. Wavelets are local functions, allowing more accuracy on local details and multiple resolutions (Sidney, Gopinath, and Guo 1997). Figure : Wavelet example 15 / 33

Other representations Hidden Markov Model (HMM) (Azzouzi and Nabney 1998) ; Reference time series decomposition (Kriegel et al. 2008) ; AutoRegressive Moving Average (ARMA) (Shumway and Sto er 2015) ; Singular Value Decomposition (SVD) (Korn, Jagadish, and Faloutsos 1997) ; SVD with Deltas (SVDD) (Korn, Jagadish, and Faloutsos 1997) ; Dictionary Compression Score (DCS) (Lang, Morse, and Patel 2010). 16 / 33

Content 1 2 3 4 17 / 33

Segment Segment A segment is a subsequence of a time series, also called time window. Figure : Segment example 18 / 33

process A segmentation process consist in dividing a time series into a set of non-overlapping segments. Inside a segment, the data can be represented by one speci c value or a continuous function (generally straight lines) (Lovri, Milanovi, and Stamenkovi 2014). Figure : example 19 / 33

process Segment data representation There is di erent ways to represent the data within a segment: Piecewise Aggregate Approximation (PAA) (Keogh et al. 2000) ; Adaptive Piecewise Constant Approximation (APCA) (Chakrabarti et al. 2002) ; Piecewise Linear, with two ways to de ne the lines (Keogh et al. 2004): Linear interpolation ; Linear regression. It is also possible to use polynomials ; Or any continuous and di erentiable function (Shatkay and Zdonik 1996). 20 / 33

process Segmented Sum of Variation The Segmented Sum of Variation consist in computing the di erence between each consecutive value and sum them (Lee, Kwon, and Lee 2003). Figure : SSV example 21 / 33

process Segment data representation There is di erent ways to represent the data within a segment: Piecewise Aggregate Approximation (PAA) (Keogh et al. 2000) ; Adaptive Piecewise Constant Approximation (APCA) (Chakrabarti et al. 2002) ; Piecewise Linear, with two ways to de ne the lines (Keogh et al. 2004): Linear interpolation ; Linear regression. It is also possible to use polynomials ; Or any continuous and di erentiable function (Shatkay and Zdonik 1996). 22 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Top-Down Figure : Top-Down approach example 23 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Top-Down Figure : Top-Down approach example 24 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Bottom-Up Figure : Bottom-Up approach example 25 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Sliding-Window Figure : Sliding-Window approach example 26 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Sliding-Window Figure : Sliding-Window approach example 27 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Sliding-Window Figure : Sliding-Window approach example 28 / 33

Di erent segmentation algorithms approaches algorithms can be classi ed into three classes: Top-Down, Bottom-Up and Sliding-Window. Sliding-Window Figure : Sliding-Window approach example 29 / 33

Performances of the segmentation approaches In most cases, sliding-window gives the worst performances ; In some cases, top-down gives slightly better results than bottom-up ; In general, bottom-up performances are signi cantly better than the other two. Another remark: Top-Down and Bottom-Up are o ine algorithms ; Sliding-Window are online algorithms ; SWAB (Sliding-Window And Bottom-Up) online, with performances competing with Bottom-Up (Keogh et al. 2004). 30 / 33

Content 1 2 3 4 31 / 33

Lots of di erent methods, for numerous algorithms and purposes. Lots of confusion on the superiority of a solution compared to others. DFT >< DWT ; PAA >< APCA... What is the best methods? What is the objective? Is their any assumptions on the time series? 32 / 33

Thank you. Do you have questions? 33 / 33