Part II Workflow discovery algorithms

Similar documents
Types of Process Mining

Bidimensional Process Discovery for Mining BPMN Models

Introduction. Process Mining post-execution analysis Process Simulation what-if analysis

Process Mining. CS565 - Business Process & Workflow Management Systems. University of Crete, Computer Science Department

Reality Mining Via Process Mining

Lezione 14 Model Transformations for BP Analysis and Execution

Business Process Modelling

Process Mining Discovering Workflow Models from Event-Based Data

Refactor Business Process Models with Maximized Parallelism

Non-Dominated Bi-Objective Genetic Mining Algorithm

Business Process Management

Applicability of Process Mining Techniques in Business Environments

Artifact lifecycle discovery

Towards Process Instances Building for Spaghetti Processes

Semantics of ARIS Model

Diagnostic Information for Control-Flow Analysis of Workflow Graphs (aka Free-Choice Workflow Nets)

Process Discovery: Capturing the Invisible

Reality Mining Via Process Mining

Mining Conditional Partial Order Graphs from Event Logs

3. Data Preprocessing. 3.1 Introduction

Process Mining Based on Clustering: A Quest for Precision

Discovering Concurrency Learning (Business) Process Models from Examples

Conformance Checking Evaluation of Process Discovery Using Modified Alpha++ Miner Algorithm

Gestão e Tratamento da Informação

Genetic Process Mining: A Basic Approach and its Challenges

Sets 1. The things in a set are called the elements of it. If x is an element of the set S, we say

Ontologies & Business Process modeling languages: two proposals for a fruitful pairing

A Probabilistic Approach for Process Mining in Incomplete and Noisy Logs

The Difficulty of Replacing an Inclusive OR-Join

2. Data Preprocessing

Chapter Seven: Regular Expressions

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Business Processes Modelling MPB (6 cfu, 295AA)

Mining with Eve - Process Discovery and Event Structures

Formal Process Modelling

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Process Mining Based on Clustering: A Quest for Precision

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 16th January 2014 Time: 09:45-11:45. Please answer BOTH Questions

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Multi-phase Process mining: Building Instance Graphs

Mobility Data Management & Exploration

Conformance Checking of Processes Based on Monitoring Real Behavior

2 Discrete Dynamic Systems

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Defining a Data Mining Task. CSE3212 Data Mining. What to be mined? Or the Approaches. Task-relevant Data. Estimation.

Supervised and Unsupervised Learning (II)

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

Enabling Flexibility in Process-Aware

Relational Databases

Ulmer Informatik-Berichte. A Formal Semantics of Time Patterns for Process-aware Information Systems. Andreas Lanz, Manfred Reichert, Barbara Weber

CHAPTER 5 GENERATING TEST SCENARIOS AND TEST CASES FROM AN EVENT-FLOW MODEL

Multidimensional process discovery

Lecture 11: May 1, 2000

For example, in an assembly sub-floor technicians engaged in making a product,

Discovering Expressive Process Models by Clustering Log Traces

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

Data Mining. 2.4 Data Integration. Fall Instructor: Dr. Masoud Yaghini. Data Integration

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

The Multi-perspective Process Explorer

Data Preprocessing. Slides by: Shree Jaswal

ProM 4.0: Comprehensive Support for Real Process Analysis

Dynamic Skipping and Blocking, Dead Path Elimination for Cyclic Workflows, and a Local Semantics for Inclusive Gateways

QUALITY DIMENSIONS IN PROCESS DISCOVERY: THE IMPORTANCE OF FITNESS, PRECISION, GENERALIZATION AND SIMPLICITY

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

Functional Dependencies and Finding a Minimal Cover

Introduction to Data Mining

Hierarchical Clustering of Process Schemas

Document Clustering: Comparison of Similarity Measures

Schema Refinement and Normal Forms

Using Diposets to Model Concurrent Systems

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA

Vidushi: Parallel Implementation of Alpha Miner Algorithm and Performance Analysis on CPU and GPU Architecture

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Formal Verification of Business Process Configuration within a Cloud Environment

Genetic Process Mining

Note that in this definition, n + m denotes the syntactic expression with three symbols n, +, and m, not to the number that is the sum of n and m.

Lecture 4 Introduction to Data Flow Analysis

Process mining using BPMN: relating event logs and process models Kalenkova, A.A.; van der Aalst, W.M.P.; Lomazova, I.A.; Rubin, V.A.

Basic operators: selection, projection, cross product, union, difference,

Relational Model, Relational Algebra, and SQL

6. Relational Algebra (Part II)

Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Sets Fall / 31

MODELLING AND SOLVING SCHEDULING PROBLEMS USING CONSTRAINT PROGRAMMING

Data Mining 4. Cluster Analysis

Languages and Compilers

Temporal Databases. Week 8. (Chapter 22 & Notes)

UniLFS: A Unifying Logical Framework for Service Modeling and Contracting

Estimating the Quality of Databases

Preserving correctness during business process model configuration

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Chapter 2: Intro to Relational Model

Faster Or-join Enactment for BPMN 2.0

sflow: Towards Resource-Efficient and Agile Service Federation in Service Overlay Networks

Transcription:

Process Mining Part II Workflow discovery algorithms Induction of Control-Flow Graphs α-algorithm Heuristic Miner Fuzzy Miner

Outline Part I Introduction to Process Mining Context, motivation and goal General characteristics of the analyzed processes and logs Classification of Process Mining approaches Part II Workflow discovery Induction of basic Control Flow graphs Other approaches (α-algorithm, Heuristic Miner, Fuzzy mining) Part III Beyond control-flow mining Organizational mining Social net mining Extension algorithms Part IV Evaluation and validation of discovered models Conformance Check Log-based property verification Part V Clustering-based Process Mining Discovery of hierarchical workflow models Discovery of process taxonomies Outlier detection

Start Control-flow discovery Process Model Register order Prepare shipment (Re)send bill Ship goods Contact customer Receive payment Archive order End Organizational Model Social Network

Workflow (control flow) discovery Input: execution data of a process P (possibly unknown) log: a list of traces In the simplest case each trace just registers the sequence of tasks performed during one execution of P Output: a schema for process P captures the P s behavior, by representing all the ways its tasks are executed Usefulness of mined models? process P process enactments log recording abcdfg abcfd abcdfe. log data Workflow Discovery Workflow Schema (Process Model) for P Help better comprehend process behavior Support process (re)-design (What is the process?) Delta analysis (Are we doing what was specified?) Process Design is often a complex and time consuming task Sometimes, a fully-specified model is not available for the process

Representation of mined models A plethora of meta-models for representing workflow models Block-structured languages, Petri Nets, Logics, Process Algebra, Graph-based languages are a reasonable choice w.r.t. expressiveness, complexity and comprehensibility Most approaches derive some kind of graph over the tasks Few exceptions use alternative techniques (e.g., grammar induction, term rewriting) A simple formalism: Control Flow Graph Intuitively specifies which execution flows are allowed across the tasks A labeled, directed graph Each node corresponds to a task (and vice-versa) Each arc represents a (temporal) precedence between two tasks Cardinality constraints further (locally) restricts the possible execution flows

Control Flow Graph (CFG) models A CFG schema W for P is a tuple <A, E, a 0, A F, Fork, Join> where: Α is a finite set of activities (also nodes or tasks); E (Α Α F )x(α {a 0 }) is an acyclic relation of precedence among activities; a 0 Α is the starting activity, A F Αis the set of final activities; Local constraints are expressed through the functions Fork:(Α Α F )α{and, OR, XOR} and Join: (Α {a 0 })α{and, OR} Example: a CFG for the toy process Order Management a rec eive order AND b authenticate client c check sto c k XOR f register client OR XOR OR d ask suppliers i client relia bility g validate order plan XOR AND XOR l accept order OR h XOR decline order OR o fast dispatch m fidelity discount XOR OR n prepare bill abficgln, acbidpegln abficdgh

CFG models: executions schema W Instance of W : Connected sub-graph of S s CFG, containing at least the starting activity and one final activity, compliant with the constraints Trace of the process P: A sequence of P s tasks A trace s is compliant with the schema W if there is at least an instance Iw of W such that s is a topological order of Iw Es: the trace abfcgh is compliant with the instance, while the traces afbcgh and afblm are not

Conformance of a CFG schema w.r.t. a log Two criteria to compare a (mined) model W with a given log L: Completeness: the percentage of traces in the log that are compliant with W the larger the more complete Soundness: the percentage of traces that can be generated from W that actually occur in the log the larger the sounder.

CFG conformance: Example Schema W Admitted Instances = 20; Modeled Traces = 276. soundness({w, L})=16/276 =5.797% Log L completeness({w, L})=16/16 =100%

Example: a way to get higher soundness W 1 W 2 Modeled Traces = 64 Modeled Traces = 33? Considered trace Log (L) s 8,,12 comply with W 1 W 2 soundness( W 1 W 2,L)=11/97=11.34% completeness (W 1 W 2,L)= 11/16=68.75%

Other representation languages: Petri nets Sequence Splits Joins Loops Non-Free Choice Invisible Tasks Duplicate Tasks Travel by Train Ask Question Get Ready Travel by Train Ask Question Defense Ends Have Drinks Go Home Get Ready Defence Starts Give a Talk Defence Ends Have Drinks Go Home Travel by Car Travel by Car Defense Starts Give a Talk Travel by Train Travel by Train Pay for Parking Pay Parking Travel by Car Travel by Car

Other representation languages: Petri nets Sequence Splits Joins Loops Non-Free Choice Invisible Tasks Duplicate Tasks Travel by Train Ask Question Get Ready Travel by Train Ask Question Defense Ends Have Drinks Go Home Get Ready Defence Starts Give a Talk Defence Ends Have Drinks Go Home Travel by Car Travel by Car Defense Starts Give a Talk + noise in logs! Travel by Train Travel by Train Pay for Parking Travel by Car Pay Parking Travel by Car

Toy example: paper reviewing Event log: processes Per event: activity name (event type) (originator) (timestamp) (data) process instances events

A discovered Petri net model (α-algorithm)

Other workflow languages: EPCs EPC= Event Driven Process Chain An EPC consists of three kinds of elements, which define the flow of a business process as a chain of events. Functions: A function corresponds to an activity (task, process step) which needs to be executed. Events: Events describe the situation before and/or after a function is executed. Connectors: There are three types of connectors: ^ (and), X (xor) and V (or). Functions, events and connectors can be connected with edges in such a way that the following rules apply: Events have at most one incoming edge and at most one outgoing edge. Functions have precisely one incoming edge and precisely one outgoing edge. Connectors have either one incoming edge and multiple outgoing edges, or multiple incoming edges and one outgoing edge. In every path, functions and events alternate. No two functions are connected and no two events are connected, not even when there are connectors in between.

EPC model (SAP,ARIS, etc)

EPC model (SAP,ARIS, etc)

Workflow discovery algorithms: the case of CFG models s 1 : acdbfgih s 5 : abicglmn s 9 : abficgln s 13 : abcidglmn s 2 : abficdgh s 6 : acbiglon s 3 : acgbfih s 4 : abcgiln Event Log s 10 : acgbfilon s 14 : acdbiglmn s 7 : acbgilomn s 11 : abcfdigln s 15 : abcdgilmn s 8 : abcfgilon s 12 : acdbfigln s 16 : acbidgln Basic induction scheme 1. Mine a Dependency Graph encoding a minimal set of precedence links 2. Mine a set of cardinality (local) constraints, based on simple statistics 3. Introduce support thresholds to handle noisy data Mined Model

Dependency graph Dependency graph for a log L is a graph D L =<A,E> such that E={ (a, b) s L, i {1,..., length(s)-1} s.t. a=s[i] b=s[i+1]}; Parallel activities Two activities a and b are parallel in L, if they occur in some cycle of D L Precedence The activity a precedes b in L, denoted with a b, if a and b are not parallel and there is a path from a to b in D L Example: Log L={abcde, adbce, ae} a, b and c are parallel activities in L; a b; Dependency graph b e;

Basic Workflow Discovery scheme Build the dependency graph And make it coincide with the initial CFG model Removal of cycles Remove, from E, all edgee between parallel activities Connect the vertices of the the edge, with preceding nodes. Connect the vertices of the removed edge with following nodes Identification of the first node a 0 and the set of final nodes A F.

Basic Workflow Discovery Scheme Derive local constraints

Example: Algorithm simulation a b d c e Dependency graph Precedences: a b b c c b a c b d c d a d b e c e a e d b d c d e Parallel activities: b, c, d Edges to remove: (b, c), (b, d), (c, d).

Example: Algorithm simulation a b x c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph pre E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)};

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph post E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)}

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph pre E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)}

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph post E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)}

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)}

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)}

Example: Algorithm simulation a b c e Edges to remove: (b, c), (d, b), (c, d). log L = {abcde, adbce, ae}, a b a c a d b c b d b e c b c d c e d b d c d e a e d Dependency graph E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)} a 0 := a; A F := {e}

Example: Algorithm simulation a AND OR b AND AND c AND e OR log L = {abcde, adbce, ae}, A= {a, b, c, d, e}, AND d AND Dependency graph E:= {(a, b), (a, d), (a, e), (b, c), (c, d), (c, e), (d, b), (d, e)} {(a,c)} {(b,e)} a 0 := a; A F := {e}

Outline Part I Introduction to Process Mining Context, motivation and goal General characteristics of the analyzed processes and logs Classification of Process Mining approaches Part II Workflow discovery Basic CFG induction algorithm Other algorithms (α-algorithm, Heuristic Miner, Fuzzy mining) Part III Beyond the control-flow mining perspective Organizational mining Social net mining Extension algorithms Part IV Evaluation and validation of discovered models Conformance Check Log-based property verification Part V Clustering-based Process Mining Discovery of hierarchical process models Discovery of process taxonomies Outlier detection

Workflow discovery algorithms Outline Multi-phase PM α-algorithm Heuristics Miner Genetic PM Fuzzy Miner Part I Introduction to Process Mining Context, motivation and goal General characteristics of the analyzed processes and logs Classification of Process Mining approaches Part II Workflow discovery Basic CFG induction algorithm Other algorithms (α-algorithm, Heuristic Miner, Fuzzy mining) Part III Beyond the control-flow perspective Organizational mining Social net mining Extension techniques Part IV Evaluation and validation of discovered models Conformance Check Log-based property verification Part V Advanced Process Mining approaches Discovery of hierarchical process models Discovery of process taxonomies Outlier detection in a process mining setting 1

Multi-phase mining Main steps Convert each log trace into an execution graph, where each node corresponds to the execution of a task a task label can appear multiple times! Convert each instance graph into an instance graph each node is associated with a single task both nodes and edges are labelled with occurrence counters a fictive start node and a fictive final node are introduced Merge the instance graphs into an aggregated graph model The model is simply the union of all the instance graphs Arc/node counters are summed up Convert the CFG model into an EPC

Multi-phase mining: Example Execution graphs (acyclic): Instance graphs (some cycles can be created) Labels tell in how many log traces an arc (or a node) occurs

Multi-phase mining: Example (2) Instance graphs: Aggregated graph model:

Multi-phase mining: deriving an EPC ts tf

Multi-phase mining: Example (3) EPC model Aggregated graph model

Workflow discovery algorithms Outline Multi-phase PM α-algorithm Heuristics Miner Genetic PM Fuzzy Miner Part I Introduction to Process Mining Context, motivation and goal General characteristics of the analyzed processes and logs Classification of Process Mining approaches Part II Workflow discovery Basic CFG induction algorithm Other algorithms (α-algorithm, Heuristic Miner, Fuzzy mining) Part III Beyond the control-flow perspective Organizational mining Social net mining Extension techniques Part IV Evaluation and validation of discovered models Conformance Check Log-based property verification Part V Advanced Process Mining approaches Discovery of hierarchical process models Discovery of process taxonomies Outlier detection in a process mining setting 1

α-algorithm Output: a Petri net Method: Read the input log Get the set of tasks Infer a set of ordering relations Build the net based on inferred relations Return the net

α-algorithm - Ordering Relations >,,,# Direct succession: x>y iff for some case x is directly followed by y Causality: Parallel: x y iff x>y and not y>x x y iff x>y and y>x Unrelated: x#y iff not x>y and not y>x

From the ordering relations to the Petri net x y x z x y x y y z x z, y z, and x#y x y, x z, and y z y x x z z y x y, x z, and y#z x z, y z, and x y

α-algorithm - Formalization Let W be a workflow log over T. α(w) is defined as follows. 1. T W = { t T σ W t σ}, 2. T I = { t T σ W t = first(σ) }, 3. T O = { t T σ W t = last(σ) }, 4. X W = { (A,B) A T W B T W a A b B a W b a1,a2 A a 1 # W a 2 b1,b2 B b 1 # W b 2 }, 5. Y W = { (A,B) X (A,B ) X A A B B (A,B) = (A,B ) }, 6. P W = { p (A,B) (A,B) Y W } {i W,o W }, 7. F W = { (a,p (A,B) ) (A,B) Y W a A } { (p (A,B),b) (A,B) Y W b B } { (i W,t) t T I } { (t,o W ) t T O }, and 8. α(w) = (P W,T W,F W ).