Visualisation of Temporal Interval Association Rules

Chris P. Rainsford (1) and John F. Roddick (2)

(1) Defence Science and Technology Organisation, DSTO C3 Research Centre, Fernhill Park, Canberra, 2600, Australia. chris.rainsford@dsto.defence.gov.au
(2) School of Informatics and Engineering, Flinders University of South Australia, GPO Box 2100, Adelaide 5001, Australia. roddick@cs.flinders.edu.au

Abstract. Temporal intervals and the interaction of interval-based events are fundamental in many domains including medicine, commerce, computer security and various types of normalcy analysis. In order to learn from temporal interval data we have developed a temporal interval association rule algorithm. In this paper, we provide a definition for temporal interval association rules and present our visualisation techniques for viewing them. Visualisation techniques are particularly important because the complexity and volume of knowledge that is discovered during data mining often make it difficult to comprehend. We adopt a circular graph for visualising a set of associations that allows underlying patterns in the associations to be identified. To visualise the temporal relationships, we have developed a parallel coordinate graph.

1 Introduction

In recent years data mining has emerged as a field of investigation concerned with automating the process of finding patterns within large volumes of data [9]. The results of data mining are often complex in their own right and visualisation has been widely employed as a technique for assisting users in seeing the underlying semantics [12]. In addition, mining from temporal data has received increased attention recently as it provides insight into the nature of changes in data [11].

Temporal intervals are inherent in nature and in many business domains that are modelled within information systems. In order to capture these semantics, we have developed an extension to the definition of association rules [1] to accommodate temporal interval data [10]. Association rules have been widely used as a data mining tool for market analysis, inference in medical data and product promotion. By extending these rules to accommodate temporal intervals, we allow users to find patterns that describe the interaction between events and intervals over time.

For example, a financial services company may be interested to see the way in which certain products and portfolios are interrelated. Customers may initially purchase an insurance policy and then open an investment portfolio or superannuation fund with the same company. It may then be interesting to see which of these the customer terminates first. Likewise, a customer history may show that they have held investments in three different investment funds. It may then be interesting to see if all three were held simultaneously, one following the other, or in some overlapping fashion. Looking for underlying trends and patterns in this type of behaviour is likely to be highly useful for analysts who are seeking to market these products, both to new investors and long term clients.

In order to increase the comprehensibility of rules that describe such relationships, we have also developed two visualisation tools. The first tool uses a circular graph to display the underlying association rules. This allows the user to see patterns within the underlying associations. The second visualisation uses a parallel coordinate approach to present the temporal relationships that exist within the data in an easily comprehensible format. Importantly, both of these visualisation techniques are capable of displaying large numbers of rules and can be represented in a fixed two-dimensional format that is easily reproduced on paper or other media.

In the next section we provide a definition for temporal interval association rules. Section 3 discusses our association rule visualisation tool. Section 4 then describes our temporal relationship visualiser. A conclusion is provided in Section 5.

2 Temporal Interval Association Rules

We define a temporal interval association rule to be a conventional association rule that includes a conjunction of one or more temporal relationships between items in the antecedent or consequent. Building upon the original formalism in [1], temporal interval association rules can be defined as follows. Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of binary attributes or items and $T$ be a database of tuples. Association rules were first proposed for use within transaction databases, where each transaction $t$ is recorded with a corresponding tuple.
Hence attributes represented items and were limited to a binary domain where $t(k) = 1$ indicated that the item $I_k$ was positive in that case (for example, had been purchased as part of the transaction, observed in that individual, etc.), and $t(k) = 0$ indicated that it was not. Temporal attributes are defined as attributes with associated temporal points or intervals that record the time for which the item or attribute was valid in the modelled domain.

Let $X$ be a set of some attributes in $I$. It can be said that a transaction $t$ satisfies $X$ if, for all attributes $I_k$ in $X$, $t(k) = 1$. Consider a conjunction of binary temporal predicates $P_1 \wedge \ldots \wedge P_n$ defined on attributes contained in either $X$ or $Y$, where $n \geq 0$. Then by a temporal association rule, we mean an implication of the form $X \Rightarrow Y \wedge P_1 \wedge \ldots \wedge P_n$, where $X$, the antecedent, is a set of attributes in $I$ and $Y$, the consequent, is a set of attributes in $I$ that are not present in $X$.

The rule $X \Rightarrow Y \wedge P_1 \wedge \ldots \wedge P_n$ is satisfied in the set of transactions $T$ with the confidence factor $0 \leq c \leq 1$ iff at least a fraction $c$ of the transactions in $T$ that satisfy $X$ also satisfy $Y$. Likewise each predicate $P_i$ is satisfied with a temporal confidence factor $0 \leq tc_{P_i} \leq 1$ iff at least a fraction $tc_{P_i}$ of the transactions in $T$ that satisfy $X$ and $Y$ also satisfy $P_i$. The notation $X \Rightarrow Y \,|_{c} \wedge P_1 \,|_{tc} \wedge \ldots \wedge P_n \,|_{tc}$ is adopted to specify that the rule $X \Rightarrow Y \wedge P_1 \wedge \ldots \wedge P_n$ has a confidence factor of $c$ and that each predicate has its own temporal confidence factor $tc$.

As an illustration consider the following simple example rule:

$$\text{policyZ} \Rightarrow \text{investX}, \text{productY} \,|_{0.79} \wedge \text{during}(\text{investX}, \text{policyZ}) \,|_{0.75} \wedge \text{before}(\text{productY}, \text{investX}) \,|_{0.81}$$
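The two confidence measures defined above can be illustrated with a minimal sketch. It is not the learning algorithm of [10]; it only evaluates one given rule on a toy database. The data layout (a dictionary from item name to an integer interval), the function names and the five example transactions are all assumptions made for illustration, and the `before` and `during` predicates follow Allen's definitions [2].

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]          # (start, end) with start < end
Transaction = Dict[str, Interval]   # item -> interval; a missing item means t(k) = 0


def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b starts."""
    return a[1] < b[0]


def during(a: Interval, b: Interval) -> bool:
    """Allen's 'during': a starts after b starts and ends before b ends."""
    return b[0] < a[0] and a[1] < b[1]


def satisfies(t: Transaction, items: List[str]) -> bool:
    """A transaction satisfies an item set if every item is present."""
    return all(item in t for item in items)


def confidence(db: List[Transaction], X: List[str], Y: List[str]) -> float:
    """Fraction of the transactions satisfying X that also satisfy Y."""
    sat_x = [t for t in db if satisfies(t, X)]
    if not sat_x:
        return 0.0
    return sum(satisfies(t, Y) for t in sat_x) / len(sat_x)


def temporal_confidence(
    db: List[Transaction],
    X: List[str],
    Y: List[str],
    predicate: Callable[[Interval, Interval], bool],
    a: str,
    b: str,
) -> float:
    """Fraction of the transactions satisfying X and Y in which predicate(a, b) holds.

    Items a and b must belong to X or Y, so both intervals are always present.
    """
    sat_xy = [t for t in db if satisfies(t, X + Y)]
    if not sat_xy:
        return 0.0
    return sum(predicate(t[a], t[b]) for t in sat_xy) / len(sat_xy)


# Hypothetical toy database: one dictionary per customer history.
db: List[Transaction] = [
    {"policyZ": (0, 10), "investX": (3, 6), "productY": (1, 2)},
    {"policyZ": (0, 10), "investX": (2, 8), "productY": (9, 12)},
    {"policyZ": (0, 5), "investX": (4, 9), "productY": (1, 3)},
    {"policyZ": (0, 10), "investX": (1, 4)},
    {"investX": (0, 3), "productY": (5, 6)},
]

X, Y = ["policyZ"], ["investX", "productY"]
c = confidence(db, X, Y)
tc_during = temporal_confidence(db, X, Y, during, "investX", "policyZ")
tc_before = temporal_confidence(db, X, Y, before, "productY", "investX")
print(round(c, 2), round(tc_during, 2), round(tc_before, 2))  # prints 0.75 0.67 0.67
```

Here four transactions satisfy the antecedent and three of those also satisfy the consequent, giving a confidence of 0.75. Each temporal predicate holds in two of the three transactions that satisfy both sides.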

This rule can be read as follows: the purchase of investment X and product Y are associated with insurance policy Z with a confidence factor of 0.79. The investment in X occurs during the period of policy Z with a temporal confidence factor of 0.75, and the purchase of product Y occurs before investment X with a temporal confidence factor of 0.81.

Binary temporal predicates are defined using Allen's thirteen interval-based relationships between two intervals. A thorough description of these relationships can be found in [2]. We also use the neighbourhood relationships defined by Freksa that allow generalisation of relationships [5]. A detailed description of our learning algorithm is beyond the scope of this paper and readers are referred to [10].

3 Visualising Associations

Finding patterns within the temporal interval associations may be assisted with the use of visualisation techniques. This is particularly important where the number of association rules is found to be large and the discovery of underlying patterns by inspection is not possible. For this purpose we have devised two separate visualisation techniques. The first can be used to visualise any association rule and the second is specific to temporal associations.

The visualisation of sets of association rules has been addressed in a number of different ways. One approach has been to draw connected graphs [6]. However, if the number of rules is large this approach involves a complex layout process that needs to be optimised in order to avoid cluttering the graph. An elegant three-dimensional model is provided in the MineSet software tool [4]. We have chosen to develop a visualisation that can handle a large volume of associations and that can be easily reproduced in two dimensions, e.g. as a paper document or an overhead projection slide. In addition, it provides an at-a-glance view of the data that does not need to be navigated and explored to be fully understood. This approach complements the approaches of others and is more applicable in some circumstances.

We have adopted a circular graph layout where items involved in rules are mapped around the circumference of a circle, see Figure 1. Associations are then plotted as lines connecting these points, where a gradient in the colour of the line, from blue (dark) to yellow (light), indicates the direction of the association from the antecedent to the consequent. A green line highlights bidirectional associations so that they can be identified immediately. Circular graph layouts have been successfully used in several other data mining applications, including Netmap [3], [8]. A key characteristic of this type of visualisation is its ability to display large volumes of information.

The circle graph gives an intuitive feel for patterns within the underlying data. For example, items that have several other items associated with them will have a number of blue lines leaving their node on the circle. These items may be selected for marketing to attract new clients, because it is likely that the clients will also purchase other items or services as part of an overall basket. Note however that no temporal information is provided in this graph. In cases where the number of items is large, concept ascension may be employed to reduce complexity.

Fig. 1. A screenshot of our association rule visualisation window

4 Visualising Temporal Associations

Our first visualisation does not display the details of discovered temporal relationships between items in the association rules. In order to display this information it has been necessary to develop a new visualisation tool. We have developed a simple visualisation technique based upon parallel coordinate visualisation. Parallel coordinate visualisation has been used successfully in other data mining tools to display large volumes of data [7]. A screenshot of this visualisation is depicted in Figure 2.

We start by plotting all of the items on the right-hand side of temporal predicates along a vertical axis. The items on the left-hand side of the temporal predicate are plotted along an axis on the opposite side of the screen, with the labels for the thirty temporal relationships we have adopted arranged along a central vertical axis. The temporal relationships can be seen as semi-ordered based upon the location of two intervals with respect to each other along the time axis. Using simple heuristics we have imposed an artificial ordering upon the relationships in order to allow them to be represented meaningfully along a single line. We then draw lines between items that have a temporal relationship, and the lines intersect the central axis at the point that corresponds to the nature of that relationship. The lines are coloured to reflect the temporal confidence associated with the underlying relationship.

Fig. 2. A screenshot of our temporal interval visualisation window.

Based upon this visualisation it is possible to quickly determine patterns within the data. For example, a financial services company may seek to identify marketing opportunities for items to its current clients. The market analyst can look for items on the right-hand side of the graph that are connected via lines running predominantly through the top half of the temporal relationship axis (corresponding to items purchased after the item on the left-hand side), and may then seek to market these services to holders of the connected items on the left-hand side of the graph. The strongest such correlations can be identified quickly from the colour of the line, which indicates the confidence of the relationship.
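The parallel coordinate layout described in this section reduces to a small coordinate computation, sketched below under several assumptions. The tool itself uses thirty relationships in a heuristic ordering that is not given here, so the sketch substitutes Allen's thirteen relationships in an illustrative ordering. It also assumes that left-hand arguments of a predicate sit on the left axis and that every axis is scaled to the range 0 to 1. The function names are hypothetical, and mapping confidence to colour is left to whatever plotting library draws the lines.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Predicate = Tuple[str, str, str, float]  # (left item, relation, right item, temporal confidence)

# Illustrative ordering only; not the ordering used by the authors' tool.
RELATION_ORDER: List[str] = [
    "before", "meets", "overlaps", "finished-by", "contains", "starts",
    "equals", "started-by", "during", "finishes", "overlapped-by",
    "met-by", "after",
]


def axis_positions(labels: List[str]) -> Dict[str, float]:
    """Spread labels evenly over [0, 1] along one vertical axis."""
    if len(labels) == 1:
        return {labels[0]: 0.5}
    return {label: k / (len(labels) - 1) for k, label in enumerate(labels)}


def parallel_layout(predicates: List[Predicate]) -> List[Tuple[List[Point], float]]:
    """Return one three-point polyline per predicate, paired with its confidence.

    Each polyline runs from the left axis (x = 0) through the relation's
    position on the central axis (x = 0.5) to the right axis (x = 1).
    """
    left = axis_positions(sorted({p[0] for p in predicates}))
    right = axis_positions(sorted({p[2] for p in predicates}))
    middle = axis_positions(RELATION_ORDER)
    lines = []
    for left_item, relation, right_item, tc in predicates:
        polyline = [
            (0.0, left[left_item]),
            (0.5, middle[relation]),
            (1.0, right[right_item]),
        ]
        lines.append((polyline, tc))
    return lines


# The two temporal predicates of the example rule in Section 2.
predicates: List[Predicate] = [
    ("investX", "during", "policyZ", 0.75),
    ("productY", "before", "investX", 0.81),
]
lines = parallel_layout(predicates)
for polyline, tc in lines:
    print(polyline, tc)
```

Because the relation fixes where a line crosses the central axis, lines sharing a relationship converge on one point, which is what makes clusters of similar temporal behaviour visible at a glance.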

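The circular association graph of Section 3 can be sketched in the same spirit. This is a minimal illustration, not the authors' implementation: it assumes items are spaced evenly around a unit circle and that a rule is drawn as one line per antecedent-consequent item pair, neither of which is specified in the text. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (antecedent item, consequent item)


def circle_positions(items: List[str], radius: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Place the items at equal angular steps around a circle."""
    n = len(items)
    return {
        item: (radius * math.cos(2 * math.pi * k / n),
               radius * math.sin(2 * math.pi * k / n))
        for k, item in enumerate(items)
    }


def classify_edges(rules: List[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Split associations into one-way edges and bidirectional pairs.

    One-way edges would be drawn with the blue-to-yellow gradient and
    bidirectional pairs as a single green line.
    """
    rule_set = set(rules)
    directed = {(a, b) for a, b in rule_set if (b, a) not in rule_set}
    bidirectional = {
        (min(a, b), max(a, b))
        for a, b in rule_set
        if (b, a) in rule_set and a != b
    }
    return directed, bidirectional


def out_degree(rules: List[Edge]) -> Counter:
    """Count the associations leaving each item (the lines leaving its node)."""
    return Counter(a for a, _ in set(rules))


# Hypothetical associations between the items of the running example.
rules: List[Edge] = [
    ("policyZ", "investX"),
    ("policyZ", "productY"),
    ("investX", "productY"),
    ("productY", "investX"),
]
items = sorted({item for edge in rules for item in edge})
positions = circle_positions(items)
directed, bidirectional = classify_edges(rules)
degrees = out_degree(rules)
print(directed, bidirectional, degrees.most_common(1))
```

Items with a high out-degree correspond to the nodes with many lines leaving them, which the text singles out as candidates for marketing to new clients.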
5 Summary

In this paper we have detailed two visualisation techniques to support the analysis of temporal interval association rules. These techniques are designed to allow a rapid understanding of patterns existing within large sets of rules. The first technique is a circular association rule graph that displays patterns within association rules. The second technique is based upon a parallel coordinate visualisation and it displays the temporal interval relationships between items. Both of these techniques have been successfully used for other data mining applications. Importantly, they are able to handle high volumes of data in a way that still allows users to find underlying patterns. These two techniques are simple and can be represented in two dimensions so that they can be easily reproduced. Research at both DSTO and at Flinders University is continuing and we plan to further refine these techniques and to examine their scalability to larger datasets.

References

1. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items in Large Databases. International Conference on Management of Data (SIGMOD '93), May (1993) 207-216.
2. Allen, J. F.: Maintaining knowledge about temporal intervals. Communications of the ACM, Vol. 26, No. 11 (1983).
3. Aumann, Y., Feldman, R., Yehuda, Y.B., Landau, D., Liphstat, O., Schler, Y.: Circle Graphs: New Visualization Tools for Text-Mining. In: The 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-99), Prague, Czech Republic, September 15-18 (1999).
4. Brunk, C., Kelly, J., Kohavi, R.: MineSet: An Integrated System for Data Mining. Third International Conference on Knowledge Discovery and Data Mining (KDD '97), Newport Beach, California, AAAI Press, August 14-17 (1997) 135-138.
5. Freksa, C.: Temporal reasoning based on semi-intervals. Artificial Intelligence 54 (1992) 199-227.
6. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding Interesting Rules from Large Sets of Discovered Association Rules. Third International Conference on Information and Knowledge Management, Gaithersburg, Maryland, ACM Press (1994).
7. Lee, H., Ong, H., Sodhi, K.S.: Visual Data Exploration. The 3rd International Applied Statistics Conference, Dallas, Texas (1995).
8. Netmap Technical Manual. The Technical Manual for the Netmap System. Netmap Solutions Pty Ltd, North Sydney NSW, Australia (1994).
9. Piatetsky-Shapiro, G., Frawley, W. J. (Eds.): Knowledge Discovery in Databases. Menlo Park, California, AAAI Press (1991).
10. Rainsford, C.P., Roddick, J.F.: Adding Temporal Semantics to Association Rules. In: The 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-99), Prague, Czech Republic, September 15-18 (1999).
11. Roddick, J. F., Spiliopoulou, M.: A Survey of Temporal Knowledge Discovery Paradigms and Methods. IEEE Transactions on Knowledge and Data Engineering, to appear (2000).
12. Tattersall, G. D., Limb, P. R.: Visualisation Techniques for Data Mining. BT Technology Journal 12(4) (1994).