Advances in Engineering Research (AER), volume 131
3rd Annual International Conference on Electronics, Electrical Engineering and Information Science (EEEIS 2017)

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data

Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c
1 School of Computer Science and Technology, Donghua University, China
2 Department of Information Technology and Media, Mid Sweden University, Sweden
a yqij2929@163.com, b sywang@dhu.edu.cn, c tingting.zhang@miun.se
* Corresponding author

Keywords: Sensor Time Series, Association Rules, Rules Pruning, Rules Summarizing, BIGBAR.

Abstract. Sensors are widely used in all aspects of daily life, including factories, hospitals and even our homes. Discovering time series association rules from sensor data can reveal potential relationships between different sensors, which can be used in many applications. However, time series association rule mining algorithms usually produce far more rules than expected, which makes the rules hard to understand, present or use. We therefore need to prune and summarize the huge number of rules. In this paper, a two-step pruning method is proposed to reduce both the number of rules and the redundancy in a large set of time series rules. In addition, we put forward the BIGBAR summarizing method to summarize the rules and present the results intuitively.

Introduction
Rule discovery is one of the central tasks of data mining [1]. Association rule mining can find hidden correlations among different items within a data set [2]. Existing research has proposed different algorithms for mining association rules from time series data. However, the number of discovered rules is usually very large, and the rules may include many redundant ones. We can hardly use those rules directly or present them to users. In this paper, we propose two methods to analyze a large set of discovered time series association rules and give a summary of the rules.
Firstly, a pruning method for redundant rules is applied to cut down the redundant rules, and then we introduce BIGBAR, a bipartite graph based association rules summarizing method, to summarize the remaining rules and find the interesting ones. The rest of the paper is organized as follows. In Related Work, we introduce state-of-the-art methods for association rule pruning and summarizing. In the Methods section, we explain the methods and algorithms used in this paper. After that, we present the experiments and results. Finally, we summarize our work and outline future work.

Copyright 2017, the Authors. Published by Atlantis Press. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Related Work
Pruning methods can be used to reduce the number of rules and eliminate insignificant ones. Interestingness measures are an important technique for pruning. H. Toivonen et al. [3] use confidence as the interestingness measure, and Bing Liu et al. [4] use the correlation between rules, tested with the chi-square statistic. Szymon Jaroszewicz et al. [5] introduced the maximum entropy principle for pruning rules. S. Kannan et al. [2] give a detailed summary of more than 40 interestingness measures. In addition, researchers or users can define interesting or redundant rules themselves [6]. Another technique is known as closed itemsets or rule covers; these works give a subset of rules that covers all of the database transactions or the important information [7]. Usually, many rules still remain after pruning, so we need to summarize the remaining rules and extract useful rules from them. It is common to use clustering methods [8] to group these rules
and give some representative rules for each cluster. In addition, [9] introduced a rule template method that tries to derive templates for different types of rules, which is useful for presenting the general idea of the rules.

Methods

Pruning Redundant Rules
In most cases, the number of redundant rules is significantly larger than that of essential rules [10], so it is necessary to prune redundant rules before we use the rules or present them to users. When we group time series association rules with the same left item or with the same right item, a single group can contain many rules, which is confusing when we want to visualize the rules or use them for prediction. Our proposed pruning method is based on these two cases.

Let us start with the first case: pruning rules in a group with the same left item. For example, suppose two rules with the same left item, $R_1: p \Rightarrow b$ and $R_2: p \Rightarrow cb$, are mined from time series A and B, where p is a pattern in time series A, and b and cb are patterns in time series B. According to the definition of confidence, we get:

$$\mathrm{conf}(R_1) = \frac{\mathrm{support}(p, b)}{\mathrm{support}(p)} \quad (1)$$

$$\mathrm{conf}(R_2) = \frac{\mathrm{support}(p, cb)}{\mathrm{support}(p)} \quad (2)$$

Here $\mathrm{support}(p, b)$ counts the cases in which p happened in time series A and b happened in time series B, and $\mathrm{support}(p, cb)$ counts the cases in which p happened in time series A and cb happened in time series B, within a period of time. Because every occurrence of cb contains an occurrence of b, $\mathrm{support}(p, b) \ge \mathrm{support}(p, cb)$, and therefore $\mathrm{conf}(R_1) \ge \mathrm{conf}(R_2)$. If $\mathrm{conf}(R_1) = \mathrm{conf}(R_2)$, we define $R_1$ as a redundant rule: for the same pattern p in time series A, $R_1$ indicates that there will be a b in time series B with confidence $\mathrm{conf}(R_1)$, but $R_2$ tells us there will be a cb with the same confidence, which gives more information. We keep the rule that provides more information.

The second case is pruning rules in a group with the same right item. For example, suppose the two rules $R_3: a* \Rightarrow t$ and $R_4: ab \Rightarrow t$ are mined from time series A and B. According to the definition of confidence, we get:

$$\mathrm{conf}(R_3) = \frac{\mathrm{support}(a*, t)}{\mathrm{support}(a*)} \quad (3)$$

$$\mathrm{conf}(R_4) = \frac{\mathrm{support}(ab, t)}{\mathrm{support}(ab)} \quad (4)$$

where * represents any time series data. $\mathrm{support}(a*, t)$ counts the cases in which a, followed by any data in time series A, leads to t happening in time series B, which is equal to $\mathrm{support}(a, t)$. This differs from general association rules because a and t happen in different time series. If $\mathrm{conf}(R_3) = \mathrm{conf}(R_4)$, we regard $R_4$ as a redundant rule: if there is an a followed by any data in time series A, we know with confidence $\mathrm{conf}(R_3)$ that there will be a t in time series B, and if there is an a followed by b, we know the same thing with the same confidence. The b in $R_4$ is therefore redundant, because it carries no additional information.

The two cases are formalized below:

Definition 1: For rules $R_m$ and $R_n$ with the same right item, if the left item of $R_m$ is a substring of the left item of $R_n$ and $\mathrm{conf}(R_m) = \mathrm{conf}(R_n)$, then $R_n$ is a redundant rule.

Definition 2: For rules $R_m$ and $R_n$ with the same left item, if the right item of $R_m$ is a substring of the right item of $R_n$ and $\mathrm{conf}(R_m) = \mathrm{conf}(R_n)$, then $R_m$ is a redundant rule.
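The two definitions amount to a pairwise check within each group of rules. The following is a minimal Python sketch, not the authors' implementation; it assumes rules are hypothetical `Rule(left, right, conf)` tuples over symbol strings, models the wildcard item a* from the second case simply as the left item "a", and compares confidences with `math.isclose` instead of exact floating-point equality.

```python
import math
from typing import NamedTuple


class Rule(NamedTuple):
    left: str    # pattern observed in time series A, e.g. "ab"
    right: str   # pattern observed in time series B, e.g. "cb"
    conf: float  # confidence of left => right


def is_redundant(r: Rule, rules: list[Rule], tol: float = 1e-9) -> bool:
    """True if some other rule with (numerically) equal confidence makes r redundant."""
    for other in rules:
        if other == r or not math.isclose(other.conf, r.conf, abs_tol=tol):
            continue
        # Definition 1 (same right item): r is redundant if another rule's
        # left item is a proper substring of r's left item.
        if other.right == r.right and other.left != r.left and other.left in r.left:
            return True
        # Definition 2 (same left item): r is redundant if its right item is a
        # proper substring of another rule's right item.
        if other.left == r.left and other.right != r.right and r.right in other.right:
            return True
    return False


def prune(rules: list[Rule]) -> list[Rule]:
    """Keep only the rules that no other rule makes redundant (O(N^2) scan)."""
    return [r for r in rules if not is_redundant(r, rules)]


if __name__ == "__main__":
    rules = [
        Rule("p", "b", 0.8), Rule("p", "cb", 0.8),  # first case: same left item
        Rule("a", "t", 0.6), Rule("ab", "t", 0.6),  # second case: same right item
        Rule("a", "c", 0.5),
    ]
    print(prune(rules))  # keeps p=>cb, a=>t and a=>c
```

On the two examples above, the sketch drops p => b (Definition 2) and ab => t (Definition 1) and keeps the other rules. Comparing every pair of rules costs O(N²), which is manageable for rule sets of a few hundred to a few thousand rules; grouping rules by left or right item first would reduce the work for larger sets.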
BIGBAR Summarizing Method
Pruning is one way to cut down redundant rules. However, there is no guarantee that the result of pruning can be presented to users, because the number of rules could still be very large and hard to understand. The next step is to analyze and summarize the rules and extract useful information. In this paper, we introduce a new method to find interesting clusters of rules and then extract interesting rules within each cluster. This method, the bipartite graph based association rule (BIGBAR) summarizing method, presents the association rules in a bipartite graph. A bipartite graph has two independent sets of nodes and a set of edges linking nodes of one set to nodes of the other, as shown in Fig. 1. We denote one set of nodes as the clusters of left items and the other set as the clusters of right items. Nodes in the same set have no links. An edge between two nodes denotes that there is at least one rule whose left item is in one node and whose right item is in the other node. We record the average confidence and the number of rules on each edge. Once the bipartite graph is built, we can easily see how different clusters of rules are distributed according to the average confidence and the number of rules on each edge.

Fig. 1. Bipartite Graph of Rule Item Clusters (clusters of left items and clusters of right items form the two node sets; an example edge is labeled num=7, conf=0.67).

Before we draw the bipartite graph, we first cluster the left items and the right items. We use hierarchical clustering for this purpose, as it provides a simple and practical way to capture the similarity structure of the items. It merges the two closest nodes into one cluster, and the new cluster is treated as a new node. This process is repeated until all nodes belong to one cluster. Hierarchical clustering is described in detail in [8]. The core of the algorithm is the definition of the distance between two items.
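A minimal Python sketch of the two ingredients described above follows: agglomerative clustering of items, and the per-edge bookkeeping (rule count and running average confidence) that Algorithm 1 formalizes. It is an illustration under assumptions, not the authors' code. The helper names `lcs_len`, `item_distance`, `cluster_items` and `build_bipartite` are hypothetical; the normalization 1 - 2*len(lcs)/(len(x) + len(y)) is an assumed stand-in for the LCS-based item distance; single linkage is an assumed merge criterion; and the hierarchy is cut at a fixed number of clusters `k`.

```python
from itertools import combinations


def lcs_len(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if cx == cy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def item_distance(x: str, y: str) -> float:
    """ASSUMED normalization: 0 for identical items, 1 for items sharing no symbols."""
    return 1.0 - 2.0 * lcs_len(x, y) / (len(x) + len(y))


def cluster_items(items: list[str], k: int) -> list[list[str]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    ASSUMPTIONS: single linkage (closest pair of members) and stopping at k
    clusters instead of building the full hierarchy.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[it] for it in items]

    def linkage(pair: tuple[int, int]) -> float:
        a, b = pair
        return min(item_distance(x, y) for x in clusters[a] for y in clusters[b])

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i].extend(clusters.pop(j))  # i < j, so popping j leaves i in place
    return clusters


def build_bipartite(rules, left_clusters, right_clusters):
    """Aggregate rules onto (left cluster, right cluster) edges.

    rules: iterable of (left_item, right_item, conf); every item must be in a cluster.
    Returns {(i, j): {"num": rule count, "conf": average confidence}}.
    """
    left_of = {it: i for i, c in enumerate(left_clusters) for it in c}
    right_of = {it: j for j, c in enumerate(right_clusters) for it in c}
    edges = {}
    for left, right, conf in rules:
        key = (left_of[left], right_of[right])
        if key in edges:  # existing edge: update count and running mean
            edge = edges[key]
            temp = edge["conf"] * edge["num"]
            edge["num"] += 1
            edge["conf"] = (temp + conf) / edge["num"]
        else:  # first rule between these two clusters: create the edge
            edges[key] = {"num": 1, "conf": conf}
    return edges
```

For example, with left items "ab", "abc" and "xy" and k = 2, "ab" and "abc" (assumed distance 0.2) are merged first; two rules that fall between the same pair of clusters then end up on one edge with num = 2 and their mean confidence. Cutting the hierarchy at a fixed k is one simple way to obtain flat clusters; a distance threshold is an alternative.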
Definition 3: The item distance between two items $I_m$ and $I_n$, denoted $dis(I_m, I_n)$ (5), is computed from the length of their longest common subsequence, $len(lcs(I_m, I_n))$, relative to the lengths $len(I_m)$ and $len(I_n)$ of the two items, where dis() is the distance between items, lcs() is the longest common subsequence and len() is the length of a sequence. We use the longest common subsequence to describe the similarity of the two items, and we take the lengths of the two items into account. Moreover, the distance value should be smaller when the items are closer.

The second step is to draw the bipartite graph. Algorithm 1 summarizes the process.

Algorithm 1 BIGBAR
Input: Itemlist: leftclst, rightclst
       Rulelist: rules
Output: a bipartite graph
for rule in rules
    find i, j such that rule.left in leftclst[i] and rule.right in rightclst[j]
    if an edge links leftclst[i] and rightclst[j]
        tempconf = edge.conf * edge.num
        edge.num = edge.num + 1
        edge.conf = (tempconf + rule.conf) / edge.num
    else
        draw an edge from leftclst[i] to rightclst[j]
        edge.num = 1
        edge.conf = rule.conf
    end if
end for

Once the graph is built, we can find interesting rules from it. Before explaining how to find interesting rules, we first need to find the interesting rule clusters.

Definition 4: A rule cluster is an abstract rule whose left item is a cluster of left items and whose right item is a cluster of right items. The confidence of a rule cluster is the average confidence of all the rules in the cluster.

The last step is to find interesting rule clusters and choose interesting rules in each cluster. Three measures can be used to find interesting rule clusters: confidence, number of rules, or both. To select representative rules, we can simply choose the rules with higher confidence in each rule cluster.

Experiment and Results
Our experimental data is a large set of time series association rules from [11]: 198,405 association rules in total, mined from time series data of 23 sensors deployed on different parts of an industrial machine, including motors, coolers, pumps, drives and tanks. First, we preprocess the rules and prune the redundant ones. After pruning, we summarize the remaining rules using the BIGBAR algorithm and extract the interesting rules in each rule cluster. Table 1 shows the results for 5 different rule sets, one rule set per row; the first column gives the time series pair from which the rules were mined. For example, the first row shows the results for rules mined from the P2 and T2 time series. We pruned 226 of the 332 rules. Using hierarchical clustering on the left and right items, we get 3 and 5 clusters, respectively.
Finally, we extract 45 interesting rules with higher confidence using the BIGBAR summarizing method. The pruning rate and the reducing rate (which covers both pruning and summarizing) are 68% and 86%, respectively. Besides the results of these 5 rule sets, the last row of Table 1 gives the overall result for all rule sets: we extract 21,825 interesting rules from 198,405 rules, a reducing rate of nearly 89%.
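The rates quoted above can be checked from the raw counts. The formulas in this sketch are inferred from the reported numbers rather than stated in the text: the pruning rate appears to be pruned/total, and the reducing rate appears to be 1 - interesting/total, since the interesting rules are what remains after both pruning and summarizing. `rates` is a hypothetical helper.

```python
def rates(total: int, pruned: int, interesting: int) -> tuple[float, float]:
    """Return (pruning rate, reducing rate) as fractions.

    Formulas inferred from the reported numbers (an assumption):
      pruning rate  = pruned / total
      reducing rate = 1 - interesting / total   (pruning + summarizing together)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return pruned / total, 1 - interesting / total


if __name__ == "__main__":
    for name, total, pruned, interesting in [
        ("P2->T2", 332, 226, 45),
        ("All", 198_405, 130_947, 21_825),
    ]:
        p, r = rates(total, pruned, interesting)
        print(f"{name}: pruning {p:.0%}, reducing {r:.0%}")
    # P2->T2: pruning 68%, reducing 86%
    # All: pruning 66%, reducing 89%
```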
Table 1. Results of pruning and summarizing.

| Time series pair | Total no. of rules | No. of pruned rules | Rules pruning rate | No. of rule item clusters | No. of interesting rules | Rules reducing rate |
|---|---|---|---|---|---|---|
| P2->T2 | 332 | 226 | 68% | Left=3, right=5 | 45 | 86% |
| C2->C1 | 352 | 236 | 67% | Left=3, right=6 | 54 | 84% |
| D2->C1 | 520 | 493 | 95% | Left=2, right=3 | 18 | 97% |
| C1->D1 | 100 | 82 | 82% | Left=2, right=2 | 12 | 88% |
| T1->P1 | 1043 | 490 | 47% | Left=6, right=6 | 108 | 90% |
| All | 198,405 | 130,947 | 66% | | 21,825 | 89% |

Summary
Time series association rule mining opens up a new direction for association rules in the big data field. However, we still face many challenges, and one of the biggest is understanding the huge number of discovered time series association rules. In this paper, we introduced a two-step approach to make a huge number of rules understandable. The first step, pruning, finds the rules that can represent other rules or carry more information than other rules, which reduces the number of rules considerably. The second step summarizes the remaining rules with the bipartite graph based association rules summarizing method, which shows the distribution of the rule clusters and summarizes the interesting rules. Time series association rules can also be mined among multiple time series. Pruning and summarizing such multi-item rules is more complex and remains a problem for future work.

References
[1] Liu, Bing, Yiming Ma, and Ronnie Lee. "Analyzing the interestingness of association rules from the time series dimension." Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. IEEE, 2001.
[2] Liu, Bing, Wynne Hsu, and Yiming Ma. "Pruning and summarizing the discovered associations." Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1999.
[3] Toivonen, Hannu, et al. "Pruning and grouping discovered association rules." (1995).
[4] Liu, Bing, Wynne Hsu, and Yiming Ma. "Pruning and summarizing the discovered associations."
Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1999.
[5] Kannan, S., and R. Bhaskaran. "Association rule pruning based on interestingness measures with clustering." arXiv preprint arXiv:0912.1822 (2009).
[6] Batbarai, Ashwini, and Devishree Naidu. "Approach for Rule Pruning in Association Rule Mining for Removing Redundancy." International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, Issue 5, May 2014.
[7] Cristofor, Laurentiu, and Dan Simovici. "Generating an informative cover for association rules." Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.
[8] Jorge, Alipio. "Hierarchical clustering for thematic browsing and summarization of large sets of association rules." Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2004.
[9] Klemettinen, Mika, et al. "Finding interesting rules from large sets of discovered association rules." Proceedings of the third International Conference on Information and Knowledge Management. ACM, 1994.
[10] Ashrafi, Mafruz Zaman, David Taniar, and Kate Smith. "A new approach of eliminating redundant association rules." International Conference on Database and Expert Systems Applications. Springer Berlin Heidelberg, 2004.
[11] Xue, Ruidong, et al. "Sensor time series association rule discovery based on modified discretization method." Computer Communication and the Internet (ICCCI), 2016 IEEE International Conference on. IEEE, 2016.