A Sliding Window Mann-Kendall Method for Piecewise Linear Representation


Sensors & Transducers, © 2014 by IFSA Publishing, S. L.
http://www.sensorsportal.com
http://www.sensorsportal.com/html/digest/p_44.htm

1 Jingbo CHEN, 2 Yu ZHANG, 1* Junbao ZHENG, 1 Yali SUN
1 School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2 Patent Examination Cooperation Center of the Patent Office, SIPO, Hubei, Wuhan 437, China
Tel.: +86-5718684338, fax: +86-57186843576
E-mail: horseechen@gmail.com, zhang_yu8@sipo.gov.cn, zhengjunbao@zstu.edu.cn, yalisunny@16.com

Received: 5 May 2014 / Accepted: 3 June 2014 / Published: 31 July 2014

Abstract: Piecewise Linear Representation has been widely used to compress online data collected by sensors. However, the existing methods, including Piecewise Aggregate Approximation and Perceptually Important Points, cannot offer both fast calculation and accurate recovery at the same time. In order to improve Piecewise Linear Representation performance, this work proposes a Sliding Window Mann-Kendall method. The method is applied to compress four typical data series, and the compression results are compared with the existing methods. Compared with the Piecewise Aggregate Approximation method, the proposed method has better recovery accuracy under the same compressing ratio. Compared with the Perceptually Important Points method, it has a faster calculating speed and occupies fewer resources. The parameter settings of the proposed method affect the Piecewise Linear Representation result. The experimental results show that the optimal window width differs for different data series, and, for change point picking, as the optimization threshold increases, both the compressing ratio and the recovery accuracy decrease. Copyright © 2014 IFSA Publishing, S. L.

Keywords: Change point detection, Sliding window Mann-Kendall method, Piecewise linear representation, Time series data compression.

1. Introduction

Series data are widely studied by researchers in many different domains, for example, water levels in hydrology [1], precipitation in meteorology [2], and stock trading prices in finance [3]. Regardless of what the data represent, they share some common characteristics, such as high dimensionality and continuous updating. The data are often acquired from sensor systems by equal-interval sampling, so the sequences are usually large in size. As the Internet of Things develops rapidly, massive data are produced by more and more sensors [4]. The storage of these data becomes a big problem lying before engineers. Down-sampling is a simple but effective solution; the basic idea is to sample at a lower rate. But this approach distorts the shape of the series curves. In order to find better ways, researchers have made efforts to reduce the dimensionality of the series. Kambhatla and Leen tried PCA in 1993 [5], Korn et al. used SVD in 1997 [6], Rafiei and Mendelzon used DFT in 2000 [7], Popivanov and Miller used DWT in 2002 [8], and Keogh et al. used Piecewise Aggregate Approximation (PAA) in 2000 [9]. However, although these methods make progress on reducing dimensionality, they may fail to retain the general shape of the time series after compression, because some turning points are neglected during the process. Fu et al. used Perceptually Important Points (PIP) to sort the points in a time series by each point's importance in 2005 [10] and 2007 [11]. With PIP, the curve's shape is described by the N most important points. In Fu et al.'s work, three distance measures are used to calculate the importance value, and a structure called the Specialized Binary Tree is also discussed to aid the sorting. This method keeps the shape of curves well, but it needs to store a large number of importance values to support online analysis.

Among the methods mentioned above, PAA and PIP are the two most popular ones. Based on Piecewise Linear Representation (PLR), researchers build various models using BPN or SVM to analyze and predict stock trading prices [12, 13]. In order to strike a balance between speed and accuracy, the compressing methods need to be readjusted: a new method should combine their advantages to produce a more accurate segmentation while occupying fewer resources and taking less time. Looking at the methods we already have, PAA's detection speed comes from its simple segmentation strategy, and PIP's low representation error comes from its special point searching. So combining a sliding window with turning point detection is a promising idea. Thus we obtain the new method introduced in this paper, the Sliding Window Mann-Kendall (SWMK) method.

The rest of the paper is arranged as follows. Part 2 briefly introduces related work on PLR methods and the Mann-Kendall method used in hydrology. Part 3 presents the Sliding Window Mann-Kendall method. Part 4 shows the results of three PLR methods and compares their compressing effects. In the last part, the conclusion and outlook are presented.

2. Related Works

2.1. PAA and PIP Methods

Piecewise Aggregate Approximation is a quick method for series compressing. The basic idea is to use the mean as the representative of all the time points in one interval. This simple but effective strategy has been widely used in industry, especially for online data segmentation [14]. But it also has an obvious disadvantage: PAA is not sensitive to changes inside one segment, which may cause large errors when tens of thousands of segments are added together. In 2007, Lin and Keogh [15] proposed the Symbolic Aggregate Approximation (SAX) method to modify PAA, and further work has been done since then. Lkhagva, Suzuki and Kawagoe extended SAX by picking the maximum and minimum values in each segment as important points [16]. But SAX focuses more on value clustering than on the detection of important points. The extended SAX method partly compensates for the ability to detect detailed changes within a segment, but the point-choosing strategy is only effective when the segment length is short enough. It seems that compression based on PAA, which uses the mean as the representation value, is difficult to modify well enough to raise the accuracy drastically.
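For reference, the PAA idea described above (one mean value per fixed-width segment) can be sketched as follows. This is an illustrative Python/NumPy version written for this article, not the authors' MATLAB implementation; the segment width `w` is simply an assumed parameter.

```python
import numpy as np

def paa(series, w):
    """Piecewise Aggregate Approximation (sketch): represent each block of
    w consecutive points by its mean value."""
    x = np.asarray(series, dtype=float)
    n_segments = int(np.ceil(len(x) / w))
    means = [x[i * w:(i + 1) * w].mean() for i in range(n_segments)]
    return np.array(means)

# Example: 1000 noisy points reduced to 50 segment means.
x = np.sin(np.linspace(0.0, 20.0, 1000)) + 0.1 * np.random.randn(1000)
print(paa(x, 20).shape)   # -> (50,)
```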
To raise the recovery accuracy after data compression, the Perceptually Important Point (PIP) method was proposed to focus on the points that contribute most to the tendency changes of the series curve [10]. Using this method, the original data series is divided into segments by some turning points. The method's logic is quite natural but distinctive: it first takes the line segment connecting the start point and the end point as the two-point representation. Then the distance to this line is calculated for every point between them, and the one with the largest distance is chosen as the third important point, giving the three-point (two-segment) representation. The following work repeats this procedure: the line segments between adjacent chosen points are treated in the same way as the whole curve, the distance of each unsorted point to the line segment it lies on is calculated, and the one with the largest distance is chosen as the next important point. This is repeated until the importance value of every point has been calculated and all points are sorted. PIP uses a specialized binary tree to record the distance value for each hierarchy level. If the method is applied in an online system, the binary tree is essential, because it is the key to updating the importance values as the series gains new data points. The binary tree therefore becomes more and more complicated as the number of points increases, and more and more memory is needed to store the tree structure. Fu et al. have done a lot of work on the specialized binary tree and on updating the PIP method to deal with stock data [17, 18]; they proposed several strategies to add a point or to attach another tree to the original one. Researchers are still working on reducing the calculating burden of PIP. However, the distance calculation and the storage occupied by the tree remain two problems when applying PIP to online series processing.
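The greedy point-selection idea can be sketched as below. This is an illustrative Python version written for this article: it uses only the perpendicular distance (one of the three distance measures the paper mentions) and a plain search over the current segments instead of the specialized binary tree, so its cost per added point is higher than the tree-based version described above.

```python
import numpy as np

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the straight line through a and b."""
    (x1, y1), (x2, y2), (x3, y3) = a, b, p
    if x2 == x1:                              # vertical line
        return abs(x3 - x1)
    s = (y2 - y1) / (x2 - x1)                 # slope of the line a-b
    return abs(s * x3 - y3 + y1 - s * x1) / np.hypot(s, 1.0)

def pip_select(y, n_points):
    """Greedy PIP selection (no binary tree): start from the two endpoints,
    then repeatedly add the point farthest from the current representation."""
    x = np.arange(len(y), dtype=float)
    chosen = [0, len(y) - 1]
    while len(chosen) < min(n_points, len(y)):
        chosen.sort()
        best_i, best_d = None, -1.0
        for a, b in zip(chosen[:-1], chosen[1:]):
            for i in range(a + 1, b):         # unsorted points inside this segment
                d = perp_dist((x[i], y[i]), (x[a], y[a]), (x[b], y[b]))
                if d > best_d:
                    best_i, best_d = i, d
        if best_i is None:
            break
        chosen.append(best_i)
    return sorted(chosen)
```

With this ordering, keeping only the first N selected points gives the N-point representation mentioned above.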

2.2. Mann-Kendall Method

The Mann-Kendall (MK) method is a trend detection method, first proposed by Mann in 1945 [19]. After Goossens applied it to inverse sequence analysis in 1987 [20], the method gained the ability to detect abrupt changes. The MK method is widely used in hydrology: hydrologists use this mathematical tool to analyze the water levels of a river or the precipitation of an area [1]. They focus on a conception called change-points. Change-points are turning points in trend analysis; they are the starting points where the series changes its variation trend. Hydrologists commonly need to find only two or three change points in a series to divide the curve into segments. After finding the related reasons that cause the change of trend, they can use the last segment to predict the series tendencies in the near future [22, 24]. Similar thoughts are also found in stock prediction [25]. It is quite obvious that the conception of change-points used in trend analysis is very close to the so-called important points: both focus on the main trend/shape of the curve.

Given a time series x_1, x_2, ..., x_n (n > 1), the statistic S_k is given by (1):

S_k = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{sign}(x_j - x_i), \quad k = 2, 3, 4, \ldots, n, (1)

where

\mathrm{sign}(x_j - x_i) = \begin{cases} 1, & x_j - x_i > 0, \\ 0, & x_j - x_i = 0, \\ -1, & x_j - x_i < 0. \end{cases}

The trend change is judged by the statistic UF_k given in (2):

UF_k = \frac{S_k - E(S_k)}{\sqrt{\mathrm{Var}(S_k)}}, \quad k = 1, 2, 3, \ldots, n. (2)

In (2), UF_1 = 0; E(S_k) is the mean of S_k and Var(S_k) is the variance of S_k. The UF_k curve shows the trend of the series. If the curve falls inside the confidence interval (-U_{\alpha/2}, +U_{\alpha/2}), the original series does not hold an obvious trend change; otherwise, a trend change exists. If UF_k > 0, the series turns upward; if UF_k < 0, the series turns downward. Thus we get some marks of the trend change direction, but we still do not get the change points. So we need to calculate another statistic, UB_k, which is actually the -UF_k of the inverse sequence; here k = n, n-1, ..., 1 and UB_1 = 0. The curves of UF_k and UB_k are drawn together. If the two curves have points of intersection, and those points fall inside the confidence interval (-U_{\alpha/2}, +U_{\alpha/2}), we get the change points. Generally, α = 0.05 is taken as the significance level. Fig. 1 shows an example of a change point found in a series: the No. 18 point (circled) is the change point detected by the MK method.

Fig. 1. An example of change point detection using the MK method (the original series and the UB and UF curves plotted against the sequence number).

The PIP method inspires this paper to find a less complicated PLR algorithm based on important points; the paper also takes some ideas from trend analysis in hydrology. In the following parts, the Sliding Window Mann-Kendall (SWMK) method is discussed.

3. Methods

3.1. Sliding Window Mann-Kendall Method

Although the MK method is a mature algorithm in trend analysis, it cannot fulfill the needs of time-series segmentation by itself. In order to add the ability to search out the points of detailed shape change, a sliding window is introduced. The basic idea is shown in Fig. 2. The original sequence is divided into equal-interval segments, with window width W. For each detection, the length of the subsequence is 3W. The MK method is applied to the subsequence, and only the change points that occur in the middle segment are retained. Each time, the window moves forward by W. For example, the n-th detection detects the change-points in the interval [nW+1, (n+1)W], while the (n+1)-th detection detects the change-points in the interval [(n+1)W+1, (n+2)W].

Fig. 2. Sliding window detection.
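The detection step of Sections 2.2 and 3.1 can be sketched as below. This is an illustrative Python version written for this article, not the authors' MATLAB code. Since the paper does not spell out E(S_k) and Var(S_k), the sketch assumes the standard no-trend moments of the sequential (Sneyers-type) Mann-Kendall statistic with ties ignored, and it uses 1.96 for U_{α/2} at α = 0.05.

```python
import numpy as np

def uf_statistic(x):
    """UF_k of Eq. (2) for every prefix of x, with S_k as in Eq. (1).
    Assumption: under the no-trend hypothesis E(S_k) = 0 and, ignoring ties,
    Var(S_k) = k(k-1)(2k+5)/18."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    uf = np.zeros(n)
    s = 0.0
    for k in range(1, n):                       # prefix x[0..k] has k+1 points
        s += np.sum(np.sign(x[k] - x[:k]))      # incremental update of S_k
        m = k + 1
        var = m * (m - 1) * (2 * m + 5) / 18.0
        uf[k] = s / np.sqrt(var)
    return uf

def mk_change_points(x, u_half_alpha=1.96):     # alpha = 0.05
    """Indices where the UF and UB curves intersect inside the confidence band."""
    uf = uf_statistic(x)
    ub = -uf_statistic(x[::-1])[::-1]           # UB_k: -UF of the inverse sequence
    d = uf - ub
    idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0]
    return [int(i) for i in idx
            if abs(uf[i]) < u_half_alpha and abs(ub[i]) < u_half_alpha]

def swmk_detect(series, w):
    """Sec. 3.1 (sketch): slide a 3*W block forward by W and keep only the
    change points detected in the middle W-point segment of each block."""
    series = np.asarray(series, dtype=float)
    found = set()
    for start in range(0, len(series) - 3 * w + 1, w):
        block = series[start:start + 3 * w]
        for i in mk_change_points(block):
            if w <= i < 2 * w:                  # middle segment only
                found.add(start + i)
    return sorted(found)
```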
3.2. Optimization: Change Points Picking

The change-point detection method produces quite a lot of possible change-points. They can only be called "possible" because some of them reflect detailed shape changes that are not necessary for our compressing work. Here we set a threshold T, defined as (3), where s_max is the maximum of the sequence, s_min is the minimum of the sequence, and r is a ratio between 0 and 1 (usually, r should be no larger than 0.1):

T = r \cdot (s_{\max} - s_{\min}). (3)

As Fig. 3 shows, P_1(x_1, y_1), P_2(x_2, y_2), P_3(x_3, y_3), P_4(x_4, y_4) are four possible change-points searched out by the PLR method. For the first point P_1, we assume that it has already been checked and belongs to the change-point set, and it is chosen as the starting point of the first segment. Then turn to the point P_2. In order to decide whether to put it into the final set, we connect P_1 and P_2 and calculate the perpendicular distance between P_3 and the line P_1P_2 as in (4)-(7):

\mathrm{slope}(P_1, P_2) = s = \frac{y_2 - y_1}{x_2 - x_1}, (4)

x_c = \frac{x_3 + s(y_3 - y_2) + s^2 x_2}{1 + s^2}, (5)

y_c = s(x_c - x_2) + y_2, (6)

D(P_3, P_c) = \sqrt{(x_c - x_3)^2 + (y_c - y_3)^2}. (7)

If the distance D < T, as Fig. 3(a) shows, P_3 is still on the same trend as P_1P_2; P_2 should be eliminated, and P_3 takes the place of P_2 in the possible change-point set. The calculation then turns to P_1P_3. If D(P_4, P_c) > T, as Fig. 3(b) shows, P_3P_4 does not follow the trend of P_1P_3, which identifies P_3 as a final change-point (T being the threshold defined in (3)). P_3 is then chosen as the starting point of the next segment. Turn to P_4 and the points after it, and continue the optimization until all the possible change-points have been checked. Thus we get the final change-point set.

Fig. 3. Set optimization by using the perpendicular distance judgment: (a) D < T, (b) D > T.
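A minimal Python sketch of this picking rule is shown below. It is written for this article (not the authors' code), assumes the candidate indices from the detection step are already sorted, and assumes consecutive candidates never share the same x so that the slope in Eq. (4) is defined.

```python
import numpy as np

def perpendicular_distance(p, a, b):
    """D(P, Pc) of Eqs. (4)-(7): distance from p to its foot on the line a-b."""
    (x1, y1), (x2, y2), (x3, y3) = a, b, p
    s = (y2 - y1) / (x2 - x1)                              # Eq. (4)
    xc = (x3 + s * (y3 - y2) + s * s * x2) / (1 + s * s)   # Eq. (5)
    yc = s * (xc - x2) + y2                                # Eq. (6)
    return float(np.hypot(xc - x3, yc - y3))               # Eq. (7)

def pick_change_points(candidates, series, r=0.1):
    """Filter the possible change points (sorted indices) with T = r*(max - min),
    Eq. (3), following the picking rule of Section 3.2 (illustrative sketch)."""
    y = np.asarray(series, dtype=float)
    t = r * (y.max() - y.min())
    if len(candidates) < 3:
        return list(candidates)
    final = [candidates[0]]
    start, cand = candidates[0], candidates[1]
    for nxt in candidates[2:]:
        d = perpendicular_distance((nxt, y[nxt]),
                                   (start, y[start]), (cand, y[cand]))
        if d < t:
            cand = nxt                  # same trend: candidate is replaced
        else:
            final.append(cand)          # trend breaks: candidate is confirmed
            start, cand = cand, nxt     # confirmed point starts the next segment
    final.append(cand)                  # keep the last point as a segment end
    return final
```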

3.3. Algorithm Description of PLR-SWMK

The total steps of the PLR-SWMK method are as follows:

Step 1. Divide the original sequence into equal-interval segments; the length of each segment is W.
Step 2. Take the i-th to (i+2)-th segments as the detection block, i = 1, 2, 3, ...
Step 3. Apply the MK method to detect the change-points.
Step 4. Retain the change-points in the middle segment.
Step 5. Set i = i + 1 and turn to Step 2. If the algorithm reaches the final segment, turn to Step 6.
Step 6. Optimize the gathered point set using the perpendicular distance.
Step 7. Use linear interpolation to decompress the data and get the new series.
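Putting the pieces together, Steps 1-7 could be driven by a routine like the one below. It reuses the illustrative helpers sketched above (`swmk_detect` for Steps 1-5 and `pick_change_points` for Step 6) and uses NumPy's `interp` for the linear-interpolation recovery of Step 7; it is an assumed reconstruction for this article, not the authors' code.

```python
import numpy as np

def plr_swmk(series, w, r=0.1):
    """Illustrative PLR-SWMK pipeline (Steps 1-7), built on the sketches above."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Steps 1-5: sliding-window MK detection of possible change points.
    candidates = swmk_detect(series, w)
    # Always keep the two endpoints of the series so the whole span is covered.
    candidates = sorted(set([0, n - 1]) | set(candidates))
    # Step 6: pick the final change points with the perpendicular-distance rule.
    keep = sorted(set(pick_change_points(candidates, series, r)))
    # Step 7: recover the series by linear interpolation between the kept points.
    recovered = np.interp(np.arange(n), keep, series[keep])
    return keep, recovered
```

A call such as `plr_swmk(x, w=10, r=0.01)` (example values only) would then return the kept indices, i.e. the segment endpoints, together with the recovered curve.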

4. Results and Discussion

4.1. Compressing Ratio and Residual Error

In order to judge and compare the compressing methods, we need to give the formulas for the compressing ratio and the residual error. Formula (8) shows the calculation of the compressing ratio (CR) and Formula (9) shows the definition of the residual error. In the following parts, the methods are mainly judged by these two indicators:

CR = \frac{\text{number of segments' endpoints after segmentation}}{\text{number of data points of the original time series}} \times 100\%, (8)

\text{Residual Error} = \sum \left| \text{point data after recovering} - \text{original point data} \right|. (9)

4.2. Comparison between PLR-PAA, PLR-SWMK and PLR-PIP Methods

Part 3 gave the details of the SWMK method. Based on the research on the threshold settings of the method, in this part we compare PLR-SWMK with two other PLR methods, PLR-PAA and PLR-PIP. The algorithms are coded and run in MATLAB. The computer system is Windows 7, the CPU is an Intel Core i5-3210M at 2.5 GHz, the RAM capacity is 4 GB, and the MATLAB version is 2013a. The algorithms studied in 4.3 are also coded and run on this device. The most important indicators we focus on are the residual error and the run time. By adjusting the segment quantity used in PAA and PIP, the compressing ratio of each method is controlled to be almost the same; under similar compressing ratios, the residual error and the run time are compared.

The original data come from four different domains. The first is the trajectory data of one characteristic point coming from the motion capture of a human face; the total number of points is 1. The second is the data of a segment of an electrocardiogram; the total number of points is 3751. The third is the data of an electronic olfactory sensor; the total number of points is 15. The last is the data of the Shanghai stock index; the total number of points is 5741. These four data series were chosen because their frequencies and data scales are different. Tables 1 to 4 show the results of the three PLR methods applied on the four series.

Table 1. Results of three PLR methods on the human face data.
Method | Endpoints Quantity | Compressing Ratio | Residual Error | Run Time /s
PAA | 77 | 7.7 % | 96.74 | .145
PIP | 77 | 7.7 % | 9.41 | 5.889
SWMK | 77 | 7.7 % | 7.79 | 1.88

Table 2. Results of three PLR methods on the electrocardiogram data.
Method | Endpoints Quantity | Compressing Ratio | Residual Error | Run Time /s
PAA | 186 | 4.96 % | 1.13 | .151
PIP | 186 | 4.96 % | 3.15 | 58.8
SWMK | 183 | 4.88 % | 4.77 | 3.981

Table 3. Results of three PLR methods on the electronic olfactory sensor data.
Method | Endpoints Quantity | Compressing Ratio | Residual Error | Run Time /s
PAA | 565 | 3.77 % | .153 | .185
PIP | 565 | 3.77 % | .43 | 1.37
SWMK | 559 | 3.73 % | .8 | 13.734

Table 4. Results of three PLR methods on the Shanghai stock index data.
Method | Endpoints Quantity | Compressing Ratio | Residual Error | Run Time /s
PAA | 16 | .79 % | 113.96 | .64
PIP | 16 | .79 % | 84.6 | 1466.538
SWMK | 1573 | .74 % | 976.4 | 3.486

Under similar compressing ratios, the PLR-PIP method shows the best recovery accuracy. But as the scale of the data points increases, the run time of PLR-PIP grows to a very large level: note in Table 4 that the run time of PLR-PIP is 1466 seconds, which means the process lasted for more than 3 hours. This is possibly because the code we use contains many loops; PLR-PIP has to calculate importance values for a great many points to obtain the right order and to store the Specialized Binary Tree, which means a lot of storage resources may be occupied by the algorithm. Turning to the PLR-SWMK method, it performs better than PLR-PAA in residual error, and its run time is much less than that of PLR-PIP. Especially for the electrocardiogram data and the Shanghai stock index data, the residual errors of PLR-SWMK are fairly close to those of PLR-PIP. Fig. 6, Fig. 7, Fig. 8 and Fig. 9 in the Appendix show the representation curves of all three methods applied to the four different data series.
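As a small illustration of how the two indicators of Eqs. (8)-(9) and Tables 1-4 could be computed, the sketch below derives both from the kept endpoints. It assumes Eq. (9) aggregates the absolute point-wise deviations (the exact aggregation is not fully legible in the source) and is not the authors' evaluation code.

```python
import numpy as np

def compressing_ratio(n_endpoints, n_original):
    """Eq. (8): kept segment endpoints as a percentage of the original points."""
    return 100.0 * n_endpoints / n_original

def residual_error(original, endpoints):
    """Eq. (9), assumed here as the sum of absolute differences between the
    original series and its linear-interpolation recovery from `endpoints`."""
    y = np.asarray(original, dtype=float)
    idx = np.asarray(sorted(endpoints))
    recovered = np.interp(np.arange(len(y)), idx, y[idx])
    return float(np.sum(np.abs(recovered - y)))

# e.g. compressing_ratio(len(keep), len(x)) and residual_error(x, keep)
# for the output of the plr_swmk sketch above.
```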

Take the result on the human face characteristic point's trajectory data as an example and compare the recovered curve of PIP with that of SWMK. The main bodies of the two curves are almost the same. But when zooming into the curves, as Fig. 4(a) and Fig. 4(b) show, the missing of some detailed turning points produces most of the errors in SWMK's result, while PIP performs much better. Fig. 4(c) shows the recovered result of SWMK when the window width is set to 5. Compared with Fig. 4(b), the change points found in Fig. 4(c) are more similar to those in Fig. 4(a), and the recovery error of Fig. 4(c), 33.97, is much smaller than that of Fig. 4(b), 7.789. Comparing Fig. 4(b) and Fig. 4(c), when the parameters used in SWMK change, the results are quite different. So in the next section, the relationship between the compressing results and the parameter settings of SWMK is discussed.

Fig. 4. Comparison between PIP and SWMK in detail: (a) PIP result, (b) SWMK result (window width = 10), (c) SWMK result (window width = 5).

4.3. Selections of Window Width and Optimization Thresholds

In the SWMK method, there are two parameters to be set in advance: the width of the window and the optimization distance. The following figures and tables show how the compressing results change with different threshold settings. The time series selected for Fig. 5(a) comes from a short part of an electrocardiogram, containing 3751 data points. The data for Fig. 5(b) come from the facial characteristic point's trajectory, containing 1 data points. All the detailed result data of Fig. 5 are shown in Table 6 and Table 7 in the Appendix.

Fig. 5 shows the performance of the SWMK method when the optimization distance is fixed and the window width changes. The optimization distance parameter r in Formula (3) is set to .1. As shown in Fig. 5(a), the compressing ratio decreases as the window width increases, except when the window width equals 2. When the window width equals 10, which means each detection block contains 30 points, the residual error reaches its minimum. So a window width of 2 cannot fulfill the needs of SWMK; this parameter must be set to at least 3. Fig. 5(b) shows a similar conclusion; the difference is that when the window width equals 5, and the detection block contains 15 points, the residual error reaches its minimum. Combining Fig. 5(a) and Fig. 5(b), the conclusion is that the parameters in SWMK should be adjusted according to the data series: for the ECG data, the best window width is 10, while for the facial characteristic point's trajectory, the best window width is 5. In this work, we cannot point out exactly which factors determine the choice of parameters. The signals usually have their own characteristics, including particular frequencies, changing tendencies and other features, and these characteristics may affect the selection of the window width; this is to be studied in the future.

Table 5 shows how the SWMK method's compressing performance changes as the optimization distance changes while the window width is fixed at 10. The data are the electrocardiogram series. Obviously, the compressing ratio decreases as the optimization distance increases; on the contrary, the residual error rises as the optimization distance increases.
This can be explained as follows: when the optimization distance is smaller, more points are retained and a smaller error is produced, while at the same time the compressing ratio becomes larger. Summarizing all the results of Fig. 5 and Table 5, the compressing ratio and the recovery accuracy cannot achieve their optimal values under the same parameter settings; at least one of the two indexes needs to be sacrificed. To get the best overall effect, both indexes need to be taken into consideration according to the actual performance requirements.
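The parameter study of this section amounts to sweeping the two settings and recording both indicators for each combination. A minimal sketch of such a sweep is given below; it reuses the hypothetical `plr_swmk`, `compressing_ratio` and `residual_error` helpers sketched earlier, and the candidate values for `w` and `r` are placeholders, not the authors' exact grid.

```python
import numpy as np

def sweep_parameters(series, widths, ratios):
    """Record (W, r, CR %, residual error) for every parameter combination."""
    series = np.asarray(series, dtype=float)
    rows = []
    for w in widths:
        for r in ratios:
            keep, _ = plr_swmk(series, w, r)
            rows.append((w, r,
                         compressing_ratio(len(keep), len(series)),
                         residual_error(series, keep)))
    return rows

# Example grid (placeholder values): window widths 2..30 at a fixed ratio.
# results = sweep_parameters(ecg_series, widths=range(2, 31), ratios=[0.01])
```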

Fig. 5. Compressing performance under different threshold settings: (a) ECG data (3751 points), (b) face data (1 points). Each panel plots the residual error and the compressing ratio (%) against the window width.

Table 5. Compressing performance when the optimization distance changes while the window width is fixed (ECG data).
Window Width | Optimization Distance | Compressing Ratio | Residual Error
10 | .1 | .9 % | 4.3
10 | .5 | 18.1 % | 4.7
10 | .5 | 1.3 % | 4.8
10 | .8 | 7.78 % | 4.75
10 | .1 | 6.8 % | 4.315
10 | .15 | 5.41 % | 4.55
10 | . | 4.88 % | 4.767

5. Conclusions

A Sliding Window Mann-Kendall method was discussed for performing Piecewise Linear Representation of time series data. Its compressing performance and efficiency were studied and compared with the PLR-PAA and PLR-PIP methods, and the parameters and thresholds used in the PLR-SWMK method were also studied. The compressing results show that PLR-SWMK performs better than PLR-PAA in accuracy, while PLR-PIP performs better than PLR-SWMK. Although PLR-SWMK cannot recover the data as precisely as PLR-PIP does, its recovery error is very close to that of PLR-PIP; what's more, PLR-SWMK is a much faster method and occupies acceptable computer resources. The experiments show that PLR-SWMK is a proper method for online sensor data compressing. The parameter settings of the method are very important to the compressing ratio and the residual error; for different data series, the parameters and thresholds should be set differently in order to obtain better overall compressing performance according to the actual requirements.

Acknowledgments

The first two authors contributed equally to this work. This work is financially supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY13F4, the Work-safety Science & Technology Foundation of the Zhejiang Administration of Work-safety under Grant No. 13A15, and the National Natural Science Foundation of China under Grants No. 617311 and 61763.

Appendix

Fig. 6. Representing curves of the three PLR methods (PLR_PAA, PLR_PIP, PLR_SWMK) for the trajectory data of the human face.

Fig. 7. Representing curves of the three PLR methods (PLR_PAA, PLR_PIP, PLR_SWMK) for the electrocardiogram data.

Fig. 8. Representing curves of the three PLR methods (PLR_PAA, PLR_PIP, PLR_SWMK) for the electronic olfactory sensor data.

Fig. 9. Representing curves of the three PLR methods (PLR_PAA, PLR_PIP, PLR_SWMK) for the Shanghai stock index data.

Table 6. Compressing performance when the window width changes while the optimization distance is fixed (ECG data).
Window Width | Optimization Distance | Compressing Ratio | Residual Error
2 | .1 | 6.69 % | 15.565
3 | .1 | 7.46 % | 4.91
4 | .1 | 7.54 % | 4.543
5 | .1 | 7.3 % | 4.4
6 | .1 | 7.57 % | 4.789
7 | .1 | 7. % | 4.765
8 | .1 | 7.1 % | 5.171
9 | .1 | 6.85 % | 4.59
10 | .1 | 6.8 % | 4.316
11 | .1 | 6.4 % | 6.35
12 | .1 | 6.53 % | 5.36
13 | .1 | 6.19 % | 6.11
14 | .1 | 6.19 % | 6.89
15 | .1 | 6.19 % | 6.44
16 | .1 | 6.13 % | 7.776
17 | .1 | 5.95 % | 6.199
18 | .1 | 6. % | 5.951
19 | .1 | 5. % | 7.18
20 | .1 | 5.39 % | 7.319
21 | .1 | 5.7 % | 7.175
22 | .1 | 4.91 % | 7.13
23 | .1 | 4.4 % | 7.471
24 | .1 | 4.35 % | 8.596
25 | .1 | 4.4 % | 9.37
26 | .1 | 4.48 % | 8.93
27 | .1 | 4.9 % | 9.558
28 | .1 | 4.8 % | 9.337
29 | .1 | 4. % | 9.49
30 | .1 | 4.9 % | 1.76

Table 7. Compressing performance when the window width changes while the optimization distance is fixed (face data).
Window Width | Optimization Distance | Compressing Ratio | Residual Error
2 | .1 | 6.6 % | 139.97
3 | .1 | 1.1 % | 39.98
4 | .1 | 1.1 % | 38.814
5 | .1 | 9.7 % | 33.97
6 | .1 | 9. % | 45.175
7 | .1 | 8.9 % | 53.51
8 | .1 | 8. % | 53.159
9 | .1 | 7.8 % | 67.155
10 | .1 | 7.7 % | 7.789
11 | .1 | 7.1 % | 58.58
12 | .1 | 6.1 % | 98.98
13 | .1 | 7. % | 84.637
14 | .1 | 7. % | 14.68
15 | .1 | 6.9 % | 89.815
16 | .1 | 6. % | 19.139
17 | .1 | 5.5 % | 14.41
18 | .1 | 6.3 % | 8.19
19 | .1 | 6. % | 11.356
20 | .1 | 6.1 % | 15.66
21 | .1 | 5.3 % | 19.639
22 | .1 | 5.1 % | 139.57
23 | .1 | 5.3 % | 17.6
24 | .1 | 6. % | 73.47
25 | .1 | 5.5 % | 117.836
26 | .1 | 4.9 % | 169.434
27 | .1 | 5. % | 16.445
28 | .1 | 5.5 % | 155.786
29 | .1 | 4.3 % | 167.663
30 | .1 | 5. % | 18.845

References

[1]. K. H. Hamed, Trend detection in hydrologic data: the Mann-Kendall trend test under the scaling hypothesis, Journal of Hydrology, Vol. 349, Issue 3, 2008, pp. 35-363.
[2]. T. Partal, E. Kahya, Trend analysis in Turkish precipitation data, Hydrological Processes, Vol., Issue 9, 2006, pp. 11-6.
[3]. T. Chen, C. Chen, H. J. Teoh, Fuzzy time-series based on Fibonacci sequence for stock price forecasting, Physica A: Statistical Mechanics and its Applications, Vol. 38, 2007, pp. 377-39.
[4]. L. Atzori, A. Iera, G. Morabito, The Internet of Things: A survey, Computer Networks, Vol. 54, Issue 15, 2010, pp. 787-85.
[5]. N. Kambhatla, T. K. Leen, Fast nonlinear dimension reduction, in Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March - 1 April 1993, pp. 113-118.
[6]. F. Korn, H. V. Jagadish, C. Faloutsos, Efficiently supporting ad hoc queries in large datasets of time sequences, ACM SIGMOD Record, Vol. 6, Issue, 1997, pp. 89-3.
[7]. D. Rafiei, A. Mendelzon, Querying time series data based on similarity, IEEE Transactions on Knowledge and Data Engineering, Vol. 1, Issue 5, 2000, pp. 675-693.
[8]. I. Popivanov, R. J. Miller, Similarity search over time-series data using wavelets, in Proceedings of the 18th IEEE International Conference on Data Engineering, San Jose, CA, USA, 26 February - 1 March 2002, pp. 1-1.
[9]. E. J. Keogh, M. J. Pazzani, Scaling up dynamic time warping for data mining applications, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20-23 August 2000, pp. 85-89.
[10]. T. Fu, F. Chung, P. Tang, et al., Incremental stock time series data delivery and visualization, in Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, Germany, 31 October - 5 November 2005, pp. 79-8.
[11]. T. Fu, F. Chung, R. Luk, et al., Representing financial time series based on data point importance, Engineering Applications of Artificial Intelligence, Vol. 1, Issue, 2008, pp. 77-3.

[12]. L. Luo, X. Chen, Integrating piecewise linear representation and weighted support vector machine for stock trading signal prediction, Applied Soft Computing, Vol. 13, Issue, 2013, pp. 86-816.
[13]. P. Chang, T. Liao, J. Lin, et al., A dynamic threshold decision system for stock trading signal detection, Applied Soft Computing, Vol. 11, Issue 5, 2011, pp. 3998-41.
[14]. E. J. Keogh, S. Chu, D. Hart, et al., An online algorithm for segmenting time series, in Proceedings of the IEEE International Conference on Data Mining (ICDM), San Jose, CA, USA, 29 November - 2 December 2001, pp. 89-96.
[15]. J. Lin, E. Keogh, L. Wei, et al., Experiencing SAX: a novel symbolic representation of time series, Data Mining and Knowledge Discovery, Vol. 15, Issue, 2007, pp. 17-144.
[16]. B. Lkhagva, Y. Suzuki, K. Kawagoe, Extended SAX: Extension of symbolic aggregate approximation for financial time series data representation, in Proceedings of the IEEE Data Engineering Workshop, Atlanta, GA, USA, 3-7 April 2006, pp. 4A-i8.
[17]. T. Fu, F. Chung, K. Kwok, et al., Stock time series visualization based on data point importance, Engineering Applications of Artificial Intelligence, Vol. 1, Issue 8, 2008, pp. 117-13.
[18]. T. Fu, F. Chung, R. Luk, et al., Stock time series pattern matching: Template-based vs. rule-based approaches, Engineering Applications of Artificial Intelligence, Vol., Issue 3, 2007, pp. 347-364.
[19]. H. Mann, Non-parametric test against trend, Econometrica, Vol. 13, 1945, pp. 45-59.
[20]. C. Goossens, A. Berger, Abrupt climatic change, Springer, Netherlands, 1987.
[21]. H. Tabari, S. Marofi, M. Ahmadi, Long-term variations of water quality parameters in the Maroon River, Iran, Environmental Monitoring and Assessment, Vol. 177, Issue 1-4, 2011, pp. 73-87.
[22]. V. Naddeo, D. Scannapieco, T. Zarra, et al., River water quality assessment: Implementation of non-parametric tests for sampling frequency optimization, Land Use Policy, Vol. 3, Issue 1, 2013, pp. 197-5.
[23]. J. Pärn, Ü. Mander, Increased organic carbon concentrations in Estonian rivers in the period 1992-2007 as affected by deepening droughts, Biogeochemistry, Vol. 18, Issue 1-3, 2012, pp. 351-358.
[24]. H. Tao, M. Gemmer, Y. Bai, et al., Trends of streamflow in the Tarim River Basin during the past 50 years: Human impact or climate change?, Journal of Hydrology, Vol. 4, Issue 1, 2011, pp. 1-9.
[25]. V. Khrapko, Testing the weak-form efficiency hypothesis in the Ukrainian stock market versus those of the USA, Russia, and Poland, Ekonomika/Economics, Vol. 9, Issue, 2013, pp. 18-11.

2014 Copyright © International Frequency Sensor Association (IFSA) Publishing, S. L. All rights reserved. (http://www.sensorsportal.com)