Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data
Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

Advances in Engineering Research (AER), volume 131
3rd Annual International Conference on Electronics, Electrical Engineering and Information Science (EEEIS 2017)

1 School of Computer Science and Technology, Donghua University, China
2 Department of Information Technology and Media, Mid Sweden University, Sweden
a yqij2929@163.com, b sywang@dhu.edu.cn, c tingting.zhang@miun.se
* Corresponding author

Keywords: Sensor Time Series, Association Rules, Rules Pruning, Rules Summarizing, BIGBAR.

Abstract. Sensors are widely used in all aspects of daily life, including factories, hospitals and even our homes. Discovering time series association rules from sensor data can reveal potential relationships between different sensors, which can be used in many applications. However, time series association rule mining algorithms usually produce far more rules than expected, which makes the rules hard to understand, present or use. We therefore need to prune and summarize the huge number of rules. In this paper, a two-step pruning method is proposed to reduce both the number and the redundancy of a large set of time series rules. In addition, we put forward the BIGBAR summarizing method to summarize the rules and present the results intuitively.

Introduction

Rule discovery is one of the central tasks of data mining [1]. Association rule mining is able to find hidden correlations among different items within a data set [2]. Existing research has proposed different algorithms for mining association rules from time series data. However, the number of discovered rules is usually too large, and the rule set may include many redundant rules. We can hardly use those rules directly or present such a large number of rules to users. In this paper, we propose two methods to analyze a large dataset of discovered time series association rules and give a summary of the rules.
Firstly, a pruning method is applied to cut down the redundant rules. We then introduce BIGBAR, a bipartite graph based association rules summarizing method, to summarize the remaining rules and find the interesting ones.

The rest of the paper is organized as follows. We introduce the state-of-the-art methods for association rule pruning and summarizing in the related work section. In the methods section, we explain the methods and algorithms used in this paper. After that, we show the experiments and results. Finally, we summarize our work and outline future work.

Related Work

Pruning methods can be used to reduce the number of rules and eliminate insignificant rules. Interestingness measures are an important technique for pruning methods. H. Toivonen et al. [3] use confidence as the interestingness measure, and Bing Liu et al. [4] use correlation, tested with the chi-square statistic between rules. Szymon Jaroszewicz et al. [5] introduced the maximum entropy principle to rule pruning. S. Kannan et al. [2] give a detailed summary of more than 40 interestingness measures. In addition, the researcher or user can define interesting or redundant rules themselves [6]. Another technique is based on closed itemsets or rule covers. These works give a subset of rules that covers all of the database transactions or the important information [7].

Usually, many rules still remain after pruning, and we need to summarize them and extract the useful ones. It is common to use clustering methods [8] to group these rules and give some representative rules for each cluster. In addition, [9] introduced a rule template method that tries to derive templates for different types of rules, which is useful for presenting the general idea of the rules.

Copyright 2017, the Authors. Published by Atlantis Press. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Methods

Pruning Redundant Rules

In most cases, the number of redundant rules is significantly larger than that of essential rules [10]. It is necessary to prune redundant rules before we use the rules or present them to users. When we group time series association rules by the same left item or by the same right item, many rules fall into the same group, which is confusing when we want to visualize the rules or use them for prediction. Our proposed pruning method is based on these two cases.

Let us start with the first case: pruning rules in a group with the same left item. For example, suppose two rules mined from time series A and B have the same left item:

$R_1: A[p] \rightarrow B[b]$ with confidence $conf_1$,
$R_2: A[p] \rightarrow B[cb]$ with confidence $conf_2$.

According to the definition of confidence, we get:

$conf_1 = \frac{supp(p \rightarrow b)}{supp(p)}$, (1)

$conf_2 = \frac{supp(p \rightarrow cb)}{supp(p)}$. (2)

Here $supp(p \rightarrow b)$ means that p happened in time series A and b happened in time series B, and $supp(p \rightarrow cb)$ means that p happened in time series A and cb happened in time series B, within a period of time. Every occurrence of cb in B contains an occurrence of b, so $supp(p \rightarrow b) \ge supp(p \rightarrow cb)$, and therefore $conf_1$ is greater than or equal to $conf_2$. If $conf_1$ equals $conf_2$, we define $R_1$ as a redundant rule. For the same pattern p in time series A, $R_1$ indicates that there will be a b in time series B with confidence $conf_1$, but $R_2$ tells us that there will be a cb with the same confidence, which gives us more information. We choose to keep the rule that provides more information.

The other case is pruning rules in a group with the same right item. For example, suppose the following two rules are mined from time series A and B:

$R_3: A[a] \rightarrow B[t]$ with confidence $conf_3$,
$R_4: A[ab] \rightarrow B[t]$ with confidence $conf_4$.

According to the definition of confidence, we get:

$conf_3 = \frac{supp(a* \rightarrow t)}{supp(a*)}$, (3)

$conf_4 = \frac{supp(ab \rightarrow t)}{supp(ab)}$, (4)

where * represents any time series data. $a* \rightarrow t$ means that a followed by any data in time series A leads to t happening in time series B, which is equal to $a \rightarrow t$. This is different from general association rules because a and t happen in different time series.

If $conf_3$ equals $conf_4$, we regard $R_4$ as a redundant rule. If there is an a followed by * (any data) in time series A, we know with confidence $conf_3$ that there will be a t in time series B; if there is an a followed by b in time series A, we know the same thing with the same confidence. The b in $R_4$ is redundant because it does not carry more information.

The formalized definitions of the two cases are given below:

Definition 1: For rules $R: A[l_m] \rightarrow B[g]$ with confidence $conf_o$ and $R': A[l'_m] \rightarrow B[g]$ with confidence $conf'_o$, which have the same right item, if $l_m$ is a substring of $l'_m$ and $conf_o = conf'_o$, then $R'$ is a redundant rule.

Definition 2: For rules $R: A[l] \rightarrow B[g_m]$ with confidence $conf_o$ and $R': A[l] \rightarrow B[g'_m]$ with confidence $conf'_o$, which have the same left item, if $g_m$ is a substring of $g'_m$ and $conf_o = conf'_o$, then $R$ is a redundant rule.
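The two pruning cases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Rule` class, the `prune_redundant` function and the `tol` parameter are hypothetical names introduced here, items are assumed to be strings with one character per discretized symbol, and "equal confidence" is assumed to mean equality within a small floating-point tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    left: str    # pattern in the left time series, e.g. "ab"
    right: str   # pattern in the right time series, e.g. "t"
    conf: float  # confidence of the rule


def prune_redundant(rules, tol=1e-9):
    """Drop rules that are redundant in the sense of the two pruning cases.

    Assumption: two confidences count as equal when they differ by at most tol.
    """
    redundant = set()
    for r in rules:
        for s in rules:
            if r is s or abs(r.conf - s.conf) > tol:
                continue
            # Same right item: the rule with the longer left item adds nothing.
            if r.right == s.right and r.left != s.left and r.left in s.left:
                redundant.add(s)
            # Same left item: the rule with the shorter right item says less.
            if r.left == s.left and r.right != s.right and r.right in s.right:
                redundant.add(r)
    return [r for r in rules if r not in redundant]


rules = [
    Rule("p", "b", 0.8), Rule("p", "cb", 0.8),   # same left item, equal confidence
    Rule("a", "t", 0.6), Rule("ab", "t", 0.6),   # same right item, equal confidence
    Rule("a", "u", 0.5), Rule("ab", "u", 0.4),   # confidences differ: both kept
]
kept = prune_redundant(rules)
```

The pairwise loop is quadratic in the number of rules; grouping the rules by left item and by right item first, as the text describes, would keep each comparison inside a much smaller group.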

BIGBAR Summarizing Method

Pruning is one way to cut down redundant rules. However, there is no guarantee that the result of pruning can be presented to users, because the number of rules could still be very large and hard to understand. What needs to be done next is to analyze and summarize the rules and extract useful information. In this paper, we introduce a new method to find the interesting clusters of rules and then extract interesting rules within each cluster. This method is the bipartite graph based association rule (BIGBAR) summarizing method, which presents the association rules in a bipartite graph.

A bipartite graph has two independent sets of nodes and a set of edges linking the two sets, as shown in Fig. 1. We denote one set of nodes as the clusters of left items and the other set as the clusters of right items. Nodes in the same set have no links. An edge between two nodes denotes that there is at least one rule whose left item is in one node and whose right item is in the other node. We record the average confidence and the number of rules on the edge. After we finish this bipartite graph, we can easily see how the different clusters of rules are distributed according to the average confidence and the number of rules on each edge.

Fig. 1. Bipartite Graph of Rule Item Clusters. (Two node sets: clusters of left items and clusters of right items; node labels G1, G2, R1, R2, C1, C2, M2, Y2; example edge label: num=7, conf=0.67.)

Before we draw the bipartite graph, the first step is to cluster the left items and the right items. We use hierarchical clustering for this purpose, as it provides a simple and practical way to capture the similarity structure of the items. It combines the two closest nodes into one cluster, and the new cluster is treated as a new node. This process is repeated until all the nodes are in one cluster. Detailed information on hierarchical clustering is given in [8]. The core of the algorithm is the definition of the distance between two items.
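As a rough illustration of the clustering step, the sketch below merges the two closest clusters repeatedly. Several things in it are assumptions rather than the paper's method: the distance `1 - len(lcs) / max(len)` is one hypothetical way to combine a longest common subsequence with the item lengths, single linkage is used to compare clusters, and merging stops at a requested number of clusters `k` instead of building the full hierarchy. The names `lcs_len`, `item_distance` and `cluster_items` are introduced here for illustration only.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two strings (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def item_distance(a, b):
    """Hypothetical LCS-based distance in [0, 1]; smaller means more similar."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty items are treated as identical
    return 1.0 - lcs_len(a, b) / longest


def cluster_items(items, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (an assumption): distance of the closest pair.
                d = min(item_distance(x, y)
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


left_clusters = cluster_items(["aab", "aabb", "ccd", "ccdd"], k=2)
```

With this toy input the items sharing a long common subsequence end up together, which is the behaviour the text asks of the distance: similar items should be close.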
Definition 3: The item distance between two items is defined as follows:

[Equation (5) is not legible in this transcription.]

where dis() is the distance of items, lcs() is the longest common subsequence and len() is the length of a sequence. We use the longest common subsequence to describe the similarity of the two items, and we take the lengths of the two items into account. In addition, the distance value should be smaller when the items are closer.

The second step is to draw the bipartite graph. Algorithm 1 summarizes the process.

Algorithm 1 BIGBAR
Input: Itemlist: leftclst, rightclst; Rulelist: rules
Output: a bipartite graph
for rule in rules
    if rule.left in leftclst[i] and rule.right in rightclst[j]
        if an edge links leftclst[i] and rightclst[j]
            tempconf = edge.conf * edge.num
            edge.num = edge.num + 1
            edge.conf = (tempconf + rule.conf) / edge.num
        else
            draw an edge from leftclst[i] to rightclst[j]
            edge.num = 1
            edge.conf = rule.conf
        end if
    end if
end for

When the graph is built, we can find the interesting rules from it. Before we explain how to find the interesting rules, we need to find the interesting rule clusters first.

Definition 4: A rule cluster is an abstract rule whose left item is a cluster of left items and whose right item is a cluster of right items. A rule cluster's confidence is the average confidence of all the rules in this cluster.

The last step is to find the interesting rule clusters and choose interesting rules in each cluster. Three measures can be considered to find interesting rule clusters: confidence, number of rules, or both. To select representative rules, we can simply choose the rules with higher confidence in each rule cluster.

Experiment and Results

Our experiment data is a large set of time series association rules from [11]. There are in total 198,405 association rules mined from the time series data of 23 sensors deployed on different parts of an industrial machine, including motors, coolers, pumps, drives and tanks. Firstly, we preprocess the rules and prune the redundant rules from them. After pruning, we summarize the remaining rules using the BIGBAR algorithm and extract the interesting rules in each rule cluster.

We show the results of 5 different rule sets in Table 1. Each line in Table 1 shows the result of one rule set. The first item is the name of the time series pair of the rules. For example, the first line shows the result for rules mined from the P2 time series and the T2 time series. We pruned 226 rules out of 332 rules in total. Using hierarchical clustering on the left and right items, we get 3 and 5 clusters respectively.
Finally, we extract 45 interesting rules with higher confidence after applying the BIGBAR summarizing method. The pruning rate and the reducing rate (including pruning and summarizing) are 68% and 86% respectively. Besides the results of the above 5 rule sets, the last line gives the final results over all the rule sets: we extract 21,825 interesting rules from 198,405 rules, and the reducing rate is nearly 89%.
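The edge bookkeeping of the BIGBAR graph and the rate arithmetic reported above can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the authors' code: `build_bigbar_edges`, `pruning_rate` and `reducing_rate` are hypothetical names, clusters are assumed to be given as dictionaries mapping each item to a cluster label, and the reducing rate is assumed to be the share of rules that are not kept as interesting, which is consistent with the 86% and 89% figures quoted in the text.

```python
def build_bigbar_edges(rules, left_cluster_of, right_cluster_of):
    """Aggregate rules into edges between left-item and right-item clusters.

    rules: iterable of (left_item, right_item, confidence) tuples.
    Returns {(left_cluster, right_cluster): (num_rules, average_confidence)}.
    """
    edges = {}
    for left, right, conf in rules:
        if left not in left_cluster_of or right not in right_cluster_of:
            continue  # item was not assigned to any cluster
        key = (left_cluster_of[left], right_cluster_of[right])
        num, avg = edges.get(key, (0, 0.0))
        # Running average: undo the old mean, add the new rule, divide again.
        edges[key] = (num + 1, (avg * num + conf) / (num + 1))
    return edges


def pruning_rate(pruned, total):
    """Share of rules removed by the pruning step."""
    return pruned / total


def reducing_rate(interesting, total):
    """Share of rules removed overall (assumed: everything not kept as interesting)."""
    return (total - interesting) / total


toy_rules = [("p", "b", 0.6), ("pq", "cb", 0.8), ("x", "y", 0.5)]
edges = build_bigbar_edges(
    toy_rules,
    left_cluster_of={"p": "L1", "pq": "L1", "x": "L2"},
    right_cluster_of={"b": "R1", "cb": "R1", "y": "R2"},
)

# Counts quoted in the text for the P2->T2 rule set and for all rule sets.
p2_t2 = (round(100 * pruning_rate(226, 332)), round(100 * reducing_rate(45, 332)))
overall = round(100 * reducing_rate(21825, 198405))
```

Storing only the count and the running mean per edge is enough to rank rule clusters by confidence, by number of rules, or by both, without keeping every rule attached to the graph.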

Table 1. Results of pruning and summarizing.

Time series pair   Total No. of rules   No. of pruned rules   Rules pruning rate   No. of rule item clusters   No. of interesting rules   Rules reducing rate
P2->T2             332                  226                   68%                  Left=3, right=5             45                         86%
C2->C1             352                  236                   67%                  Left=3, right=6             54                         84%
D2->C1             520                  493                   95%                  Left=2, right=3             18                         97%
C1->D1             100                  82                    82%                  Left=2, right=2             12                         88%
T1->P1             1043                 490                   47%                  Left=6, right=6             108                        90%
All                198,405              130,947               66%                                              21,825                     89%

Summary

Time series association rule mining leads us into a new world of association rules in the big data field. However, we are still facing many challenges. One of the biggest challenges is to understand the huge number of discovered time series association rules. In this paper, we introduced a two-step way to make the huge number of rules understandable. The first step, pruning, finds those rules that can represent other rules or carry more information than other rules, so the number of rules can be reduced considerably. The second step summarizes the remaining rules using the bipartite graph based association rules summarizing method, which shows the distribution of the rule clusters and summarizes the interesting rules. Time series association rules can also be mined between multiple time series. It is more complex to prune and summarize such multi-item rules. This is a problem that needs to be solved in the future.

References

[1] Liu, Bing, Yiming Ma, and Ronnie Lee. "Analyzing the interestingness of association rules from the time series dimension." Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. IEEE, 2001.
[2] Liu, Bing, Wynne Hsu, and Yiming Ma. "Pruning and summarizing the discovered associations." Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1999.
[3] Toivonen, Hannu, et al. "Pruning and grouping discovered association rules." (1995).
[4] Liu, Bing, Wynne Hsu, and Yiming Ma. "Pruning and summarizing the discovered associations." Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1999.
[5] Kannan, S., and R. Bhaskaran. "Association rule pruning based on interestingness measures with clustering." arXiv preprint arXiv:0912.1822 (2009).
[6] Batbarai, Ashwini, and Devishree Naidu. "Approach for Rule Pruning in Association Rule Mining for Removing Redundancy." International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, Issue 5, May 2014.
[7] Cristofor, Laurentiu, and Dan Simovici. "Generating an informative cover for association rules." Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.
[8] Jorge, Alipio. "Hierarchical clustering for thematic browsing and summarization of large sets of association rules." Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2004.
[9] Klemettinen, Mika, et al. "Finding interesting rules from large sets of discovered association rules." Proceedings of the third international conference on Information and knowledge management. ACM, 1994.
[10] Ashrafi, Mafruz Zaman, David Taniar, and Kate Smith. "A new approach of eliminating redundant association rules." International Conference on Database and Expert Systems Applications. Springer Berlin Heidelberg, 2004.
[11] Xue, Ruidong, et al. "Sensor time series association rule discovery based on modified discretization method." Computer Communication and the Internet (ICCCI), 2016 IEEE International Conference on. IEEE, 2016.