Non-Dominated Bi-Objective Genetic Mining Algorithm

Similar documents
Reality Mining Via Process Mining

Reality Mining Via Process Mining

ProM 6: The Process Mining Toolkit

Towards Process Instances Building for Spaghetti Processes

Discovering and Navigating a Collection of Process Models using Multiple Quality Dimensions

Data Streams in ProM 6: A Single-Node Architecture

Dierencegraph - A ProM Plugin for Calculating and Visualizing Dierences between Processes

Genetic Process Mining: A Basic Approach and its Challenges

The Multi-perspective Process Explorer

ProM 4.0: Comprehensive Support for Real Process Analysis

Dealing with Artifact-Centric Systems: a Process Mining Approach

Multiobjective Formulations of Fuzzy Rule-Based Classification System Design

Mining with Eve - Process Discovery and Event Structures

The multi-perspective process explorer

Online Conformance Checking for Petri Nets and Event Streams

The ProM Framework: A New Era in Process Mining Tool Support

Mining CPN Models. Discovering Process Models with Data from Event Logs. A. Rozinat, R.S. Mans, and W.M.P. van der Aalst

A Probabilistic Approach for Process Mining in Incomplete and Noisy Logs

Multi-phase Process mining: Building Instance Graphs

Discovering Concurrency Learning (Business) Process Models from Examples

Towards Improving the Representational Bias of Process Mining

An Improved Pseudo-parallel Genetic Algorithm for Process Mining

Published in: Petri Nets and Other Models of Concurrency - ICATPN 2007 (28th International Conference, Siedcle, Poland, June 25-29, 2007)

Lamarckian Repair and Darwinian Repair in EMO Algorithms for Multiobjective 0/1 Knapsack Problems

QUALITY DIMENSIONS IN PROCESS DISCOVERY: THE IMPORTANCE OF FITNESS, PRECISION, GENERALIZATION AND SIMPLICITY

Online Conformance Checking for Petri Nets and Event Streams

Process Mining Based on Clustering: A Quest for Precision

Comparison of Evolutionary Multiobjective Optimization with Reference Solution-Based Single-Objective Approach

BPMN Miner 2.0: Discovering Hierarchical and Block-Structured BPMN Process Models

Multi-objective Optimization

Genetic Process Mining

Recombination of Similar Parents in EMO Algorithms

Process Mining: Using CPN Tools to Create Test Logs for Mining Algorithms

Decomposed Process Mining with DivideAndConquer

Process Mining Discovering Workflow Models from Event-Based Data

Incorporation of Scalarizing Fitness Functions into Evolutionary Multiobjective Optimization Algorithms

SPEA2+: Improving the Performance of the Strength Pareto Evolutionary Algorithm 2

Process Mining in the Context of Web Services

Process Mining Put Into Context

Conformance Checking of Processes Based on Monitoring Real Behavior

Mining Process Performance from Event Logs

2015 International Conference on Computer, Control, Informatics and Its Applications

Measuring the Precision of Multi-perspective Process Models

A Simulation-Based Approach to Process Conformance

Process Mining Based on Clustering: A Quest for Precision

Evolutionary Algorithms: Lecture 4. Department of Cybernetics, CTU Prague.

Interoperability in the ProM Framework

Intra- and Inter-Organizational Process Mining: Discovering Processes Within and Between Organizations

THE SELECTION OF THE ARCHITECTURE OF ELECTRONIC SERVICE CONSIDERING THE PROCESS FLOW

Analysis of BPMN Models

Multidimensional Process Mining with PMCube Explorer

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

Data- and Resource-Aware Conformance Checking of Business Processes

A Similarity-Based Mating Scheme for Evolutionary Multiobjective Optimization

Decomposed Process Mining: The ILP Case

Multi-objective Optimization

Solving Multi-objective Optimisation Problems Using the Potential Pareto Regions Evolutionary Algorithm

Flexible evolutionary algorithms for mining structured process models Buijs, J.C.A.M.

Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models

Process Discovery: Capturing the Invisible

A genetic algorithms approach to optimization parameter space of Geant-V prototype

Business Intelligence & Process Modelling

XES, XESame, and ProM 6

Change Your History: Learning from Event Logs to Improve Processes

Mechanical Component Design for Multiple Objectives Using Elitist Non-Dominated Sorting GA

Evolutionary Multi-objective Optimization of Business Process Designs with Pre-processing

Multi-Objective Pipe Smoothing Genetic Algorithm For Water Distribution Network Design

Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances

An Evolutionary Multi-Objective Crowding Algorithm (EMOCA): Benchmark Test Function Results

Process Mining Tutorial

A Distance Metric for Evolutionary Many-Objective Optimization Algorithms Using User-Preferences

NEW DECISION MAKER MODEL FOR MULTIOBJECTIVE OPTIMIZATION INTERACTIVE METHODS

Bidimensional Process Discovery for Mining BPMN Models

Conformance Checking Evaluation of Process Discovery Using Modified Alpha++ Miner Algorithm

Multi-Objective Optimization using Evolutionary Algorithms

Part II Workflow discovery algorithms

Neural Network Regularization and Ensembling Using Multi-objective Evolutionary Algorithms

Scientific Workflows for Process Mining: Building Blocks, Scenarios, and Implementation

Effects of Three-Objective Genetic Rule Selection on the Generalization Ability of Fuzzy Rule-Based Systems

Flexab Flexible Business Process Model Abstraction

Multi-Objective Optimization using Evolutionary Algorithms

Hierarchical Clustering of Process Schemas

An Analysis on Selection for High-Resolution Approximations in Many-Objective Optimization

Evolutionary multi-objective algorithm design issues

Genetic-PSO Fuzzy Data Mining With Divide and Conquer Strategy

Finding Sets of Non-Dominated Solutions with High Spread and Well-Balanced Distribution using Generalized Strength Pareto Evolutionary Algorithm

APD tool: Mining Anomalous Patterns from Event Logs

FAULT-TOLERANCE AWARE MULTI OBJECTIVE SCHEDULING ALGORITHM FOR TASK SCHEDULING IN COMPUTATIONAL GRID

Performance Evaluation of Vector Evaluated Gravitational Search Algorithm II

Discovering Hierarchical Process Models Using ProM

Approximation-Guided Evolutionary Multi-Objective Optimization

Eindhoven University of Technology MASTER. Process mining in PeopleSoft. Ramesh, A. Award date: 2006

International Conference on Computer Applications in Shipbuilding (ICCAS-2009) Shanghai, China Vol.2, pp

Multi-objective Optimization Algorithm based on Magnetotactic Bacterium

Multiobjective RBFNNs Designer for Function Approximation: An Application for Mineral Reduction

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Improved Crowding Distance for NSGA-II

Vidushi: Parallel Implementation of Alpha Miner Algorithm and Performance Analysis on CPU and GPU Architecture

Towards Automated Process Modeling based on BPMN Diagram Composition

A Multi-Objective Evolutionary Approach to Pareto Optimal Model Trees. A preliminary study

Transcription:

Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 6 (2017) pp. 1607-1614 Research India Publications http://www.ripublication.com Non-Dominated Bi-Objective Genetic Mining Algorithm Sonia 1, Aarti Purwaha 1 and Aashu Rani 1 1 Department of Computer Science, University of Delhi, New Delhi, India Abstract Process mining generates process model from event logs and used to connect data mining techniques to process modelling, analysis, simulation etc. Many process mining techniques are used to generate model all create a single model. In this work bi-objective genetic process mining algorithm has been used to generate multiple models also called as non-dominated solutions. All of these models are equally good. Keywords: Process models, Petri-nets, Process discovery, Optimization, Genetic Mining. 1. INTRODUCTION Data generated from real world processes are stored in the form of event logs [3]. Event log is a collection of traces where each trace is a sequence of events and each event has an activity, a case, and the time at which the event occurred. Process models are in the form of BPMN, Petri-nets, EPC, or UML diagram. Most widely used graphical notation is Petri-net containing a place, an activity, and arc which shows the precedence, concurrency, or occurrence of events [8]. There are three types of process mining namely: Process discovery [2], the most prominent technique which discovers model from an event log with no a-priori information; Conformance checking, which shows the deviations between process model and the event log and Enhancement technique that is used to improve the process model with the help of information present about the process in the event log [1]. There are various perspectives being focused upon by process mining. The controlflow perspective discovers a combination of all possible paths having order of activities. The organizational perspective is the discovery of social networks, resource

1608 Sonia, Aarti Purwaha and Aashu Rani behavior and organizational structures. It also focuses on the hidden information (i.e., which actors are involved and how are they related) present in the log. The case perspective is concerned with properties of cases (i.e., values of corresponding data elements, path in the process or actors working on the process). The time perspective focuses on the timing and frequency of events. With existence of event log where each event refers to a case, activity and a point in time, it is possible to discover bottlenecks, monitor the utilization of resources, predict the remaining processing time of running cases, and measure service levels. Instead of being a specific type of data mining, process mining is also a missing link between traditional model-driven BPM and data mining. Traditional data mining techniques are not process centric and do not address process discovery, conformance checking, and enhancement. Concurrency and end-to-end process models are important for process mining. Process mining is useful for both offline and online analysis. For example completion time of an event can be predicted using discovered process models. 2. PROCESS DISCOVERY ALGORITHMS There are various process discovery algorithms [2]; Deterministic mining algorithms produce the same result for same input by considering the timestamp and calculate the control flow of events in the log (i.e. α-algorithm, α + -Algorithm). Heuristic mining [4] incorporates frequencies of traces and events with deterministic mining to reconstruct a process model. In this algorithm model produced from highly complex real processes are very complex and hard to understand, this algorithm neglects the infrequent paths while constructing a process model. Genetic mining algorithm [9] is not deterministic and uses the natural evolutionary approach by generating a random population of process models. The solution is produced in four steps (initialization, selection, reproduction and termination). The individuals are iteratively selected from the population and reproduced using crossover and mutation over different generations. Better fitted models are generated due to selection and reproduction. Various process mining tools [1] are used to generate models from the event log such as ProM, Disco, RapidProM etc. ProM supports a wide variety of plug-ins and aims to cover whole process mining spectrum. It supports many notations such as petri-nets, transition systems, C-nets, fuzzy models, BPMN, Declare, etc. This tool is very powerful, extendible, and supports conformance checking and operational support. Disco has powerful filtering capabilities for comparative process mining and adhoc checking of patterns. This tool focuses on discovery and performance analysis including animation and uses the variant of fuzzy models. This tool does not support conformance checking and operational support. RapidProM can be used to combine process mining with data mining, text mining etc. as it is an extension of ProM

Non-Dominated Bi-Objective Genetic Mining Algorithm 1609 framework with RapidMiner. This tool can be used to repeat the process mining analyses (scientific experiments, reusable macros). The quality of process model is based on the four quality dimensions (Generalization, Completeness, Simplicity, and Preciseness) [9]. Completeness tells that the traces in the event log can be reproduced with the mined model. Preciseness is used to describe under fitted behavior in log while generalization avoids overfitting. Simplicity describes the complexity of the model. All these four qualities are equally important to find a good process model. Completeness is necessary to find the fitness of a model so we use it as one of the measure to make sets for optimizing the objectives namely completeness-generalization, completeness-simplicity, and completeness-preciseness. In this work we have used multi-objective genetic mining optimization algorithm to find the Pareto front solutions [5]. The models generated from multi-objective algorithms are equally good. There are many evolutionary algorithms such as NSGA, PAES, SPEA, SPEA-II but we used NSGA-II to find mutually non-dominated solutions as NSGA-II is more efficient than other algorithms [6] and SPEA-II takes longer time for large event logs. 3. METHODOLOGY Initial population was randomly generated and evaluated to find the causal relations between different activities in an event log, then Non-dominated sorting algorithm [6] [7] (see algorithm 1) is used to find the individuals to which any individual (i) dominates and the number of individuals that dominate this individual (i). Algorithm 1: Non-Dominated Sorting Algorithm

1610 Sonia, Aarti Purwaha and Aashu Rani Then we select the individuals on the basis of binary tournament selection (see algorithm 2) and selected individuals under goes cross-over and mutation and then this mutated population is combined with the original population to give the best individuals(elitism) for the next iteration. After every iteration, first front stores the individuals which are non-dominated (explained in Fig.1). Algorithm 2: Binary Tournament Selection

Non-Dominated Bi-Objective Genetic Mining Algorithm 1611 Fig.1: NSGA-II bi-objective genetic mining algorithm. 4. RESULTS The algorithm was run in matlab-2014a for 100 iterations and for a population size of 100. We used data from real business processes BPI2013

1612 Sonia, Aarti Purwaha and Aashu Rani (http://www.win.tue.nl/bpi/doku.php?id=2013:challenge). This data set contains 13 activities, 65533 events and 7554 traces. As the number of activities increases in the log time required for convergence of the algorithm increases. For large event logs it is very hard to see the relation between different activities. Process mining helps in analyzing the data with the help of generated model and NSGA-II generates more than one model so user can choose out of them. Fig. 2, 3 and 4 shows that the nondominated solutions 16, 5 and 9 are obtained from bi-objective sets (completenesssimplicity, completeness-preciseness, and completeness-generalization resp.) Fig. 2: Non-dominated Solutions with objectives Completeness-simplicity Fig. 3: Non-dominated Solutions with objectives Completeness-Preciseness

Non-Dominated Bi-Objective Genetic Mining Algorithm 1613 Fig. 4: Non-Dominated Solutions with objectives Completeness-generalization 5. CONCLUSION The experimentation has demonstrated the effectiveness of the multi-objective evolutionary approach. According to the user requirement based on the features the algorithm produces more than one model, so the user has option to select from the models which are equally good. But we face many limitations and challenges while discovering process models for the event logs involving large volumes and high velocity of event log data as the process discovering task becomes computationally intensive. So in future we want to parallelize process mining algorithms. REFERENCES [1] Van der Aalst, Wil MP. Process mining: data science in action. Springer, 2016 [2] J.E. Cook, A.L. Wolf: Discovering models of software processes from eventbased data. ACM Trans. Softw. Eng. Methodol. (TOSEM) 7 (3), 215249, 1998. [3] Wil Van der Aalst, Ton Weijters, and Laura Maruster: Work-flow mining: Discovering process models from event logs. Knowledge and Data Engineering, IEEE Transactions on, 16(9):1128-1142, 2004. [4] AJMM Weijters, Wil MP van Der Aalst, and AK Alves De Medeiros: Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep. WP, 166:1-34, 2006. [5] K. Deb: Multi-Objective Optimization. In E.K. Burke and G. Kendall, editors,

1614 Sonia, Aarti Purwaha and Aashu Rani Search Methodologies, pages 273-316. Springer US, 2005. [6] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. Lecture notes in computer science, 1917:849-858, 2000. [7] N. Srinivas and K. Deb: Multi-objective optimization using non-dominated sorting in genetic algorithms, Evolutionary Computation, vol. 2, no. 3, 221248, Fall 1994. [8] T. Murata: Petri nets: properties, analysis and applications, Proc. IEEE 77 (4), 541580, 1989. [9] A. de Medeiros: Genetic Process Mining, Ph.D. thesis, Technische Universiteit Eindhoven, 2006.