Mediating Between Modeled and Observed Behavior: The Quest for the "Right" Process. prof.dr.ir. Wil van der Aalst

Similar documents
Discovering Concurrency Learning (Business) Process Models from Examples

A General Divide and Conquer Approach for Process Mining

Mine Your Own Business

QUALITY DIMENSIONS IN PROCESS DISCOVERY: THE IMPORTANCE OF FITNESS, PRECISION, GENERALIZATION AND SIMPLICITY

Decomposed Process Mining with DivideAndConquer

The Multi-perspective Process Explorer

Decomposed Process Mining: The ILP Case

Non-Dominated Bi-Objective Genetic Mining Algorithm

The multi-perspective process explorer

Towards Process Instances Building for Spaghetti Processes

Discovering and Navigating a Collection of Process Models using Multiple Quality Dimensions

Model Repair Aligning Process Models to Reality

BPMN Miner 2.0: Discovering Hierarchical and Block-Structured BPMN Process Models

Scalable Process Discovery and Conformance Checking

Measuring the Precision of Multi-perspective Process Models

Mining Process Performance from Event Logs

ProM 6: The Process Mining Toolkit

APD tool: Mining Anomalous Patterns from Event Logs

Data- and Resource-Aware Conformance Checking of Business Processes

Towards Improving the Representational Bias of Process Mining

Business Intelligence & Process Modelling

Bidimensional Process Discovery for Mining BPMN Models

Genetic Process Mining: A Basic Approach and its Challenges

Scientific Workflows for Process Mining: Building Blocks, Scenarios, and Implementation

Data Streams in ProM 6: A Single-Node Architecture

YAWL in the Cloud. 1 Introduction. D.M.M. Schunselaar, T.F. van der Avoort, H.M.W. Verbeek, and W.M.P. van der Aalst

Flexible evolutionary algorithms for mining structured process models Buijs, J.C.A.M.

Improving Process Model Precision by Loop Unrolling

Intra- and Inter-Organizational Process Mining: Discovering Processes Within and Between Organizations

ProM 4.0: Comprehensive Support for Real Process Analysis

Part II Workflow discovery algorithms

Reality Mining Via Process Mining

Diagnostic Information for Control-Flow Analysis of Workflow Graphs (aka Free-Choice Workflow Nets)

Finding suitable activity clusters for decomposed process discovery Hompes, B.F.A.; Verbeek, H.M.W.; van der Aalst, W.M.P.

Conformance Checking of Processes Based on Monitoring Real Behavior

Multidimensional Process Mining with PMCube Explorer

Behavioral Conformance of Artifact-Centric Process Models

Process Mining Based on Clustering: A Quest for Precision

Process Model Discovery: A Method Based on Transition System Decomposition

Conformance Checking Evaluation of Process Discovery Using Modified Alpha++ Miner Algorithm

Reality Mining Via Process Mining

Process Mining in the Context of Web Services

Finding Suitable Activity Clusters for Decomposed Process Discovery

Finding Suitable Activity Clusters for Decomposed Process Discovery

Dealing with Artifact-Centric Systems: a Process Mining Approach

Mining with Eve - Process Discovery and Event Structures

Dierencegraph - A ProM Plugin for Calculating and Visualizing Dierences between Processes

Online Conformance Checking for Petri Nets and Event Streams

Semantics of ARIS Model

Process Mining Based on Clustering: A Quest for Precision

Sub-process discovery: Opportunities for Process Diagnostics

Filter Techniques for Region-Based Process Discovery

Online Conformance Checking for Petri Nets and Event Streams

Workflow : Patterns and Specifications

Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models

Mining CPN Models. Discovering Process Models with Data from Event Logs. A. Rozinat, R.S. Mans, and W.M.P. van der Aalst

Process mining using BPMN: relating event logs and process models Kalenkova, A.A.; van der Aalst, W.M.P.; Lomazova, I.A.; Rubin, V.A.

Change Your History: Learning from Event Logs to Improve Processes

Process Model Consistency Measurement

A Simulation-Based Approach to Process Conformance

Discovering Hierarchical Process Models Using ProM

Process Discovery: Capturing the Invisible

CONFIGURABLE WORKFLOW MODELS

Published in: Petri Nets and Other Models of Concurrency - ICATPN 2007 (28th International Conference, Siedcle, Poland, June 25-29, 2007)

Process Mining Put Into Context

Modeling the Control-Flow Perspective. prof.dr.ir. Wil van der Aalst

Interoperability in the ProM Framework

Decomposition Using Refined Process Structure Tree (RPST) and Control Flow Complexity Metrics

Process cubes : slicing, dicing, rolling up and drilling down event data for process mining van der Aalst, W.M.P.

Faulty EPCs in the SAP Reference Model

2015 International Conference on Computer, Control, Informatics and Its Applications

Towards Automated Process Modeling based on BPMN Diagram Composition

Coverability Graph and Fairness

Concurrent Systems Modeling using Petri Nets Part II

Streaming process discovery and conformance checking

Lezione 14 Model Transformations for BP Analysis and Execution

Verification of EPCs: Using Reduction Rules and Petri Nets

Concurrent Systems Modeling using Petri Nets Part II

Artificial Intelligence. Programming Styles

Discovering Stochastic Petri Nets with Arbitrary Delay Distributions from Event Logs

arxiv: v1 [cs.ds] 12 Dec 2017

Enabling Flexibility in Process-Aware

Incorporating negative information in process discovery

Classification. Instructor: Wei Ding

Generation of Interactive Questionnaires Using YAWL-based Workflow Models

Faulty EPCs in the SAP Reference Model

Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Reachability Analysis

Business Process Management: A Comprehensive Survey

Process Mining Discovering Workflow Models from Event-Based Data

Eindhoven University of Technology MASTER. The business value of process mining bringing it all together. Bayraktar, I.

Qualitative Analysis of WorkFlow nets using Linear Logic: Soundness Verification

Process Mining Tutorial

DB-XES: Enabling Process Discovery in the Large

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

Advanced Reduction Rules for the Verification of EPC Business Process Models

A Study of Quality and Accuracy Tradeoffs in Process Mining

bflow* Toolbox - an Open-Source Modeling Tool

Exploring Econometric Model Selection Using Sensitivity Analysis

Process Modelling using Petri Nets

Transcription:

ediating Between odeled and Observed Behavior: The Quest for the "Right" Process prof.dr.ir. Wil van der Aalst

Outline Introduction to Process ining (short) ediating between a reference model and observed behavior (model repair) The 4+ dimensions of conformance Importance of alignments to relate observed and modeled behavior Representational bias Discovering configurable process models Decomposing process mining problems to deal with Big Data PAGE 1

Positioning Process ining performance-oriented questions, problems and solutions process model analysis (simulation, verification, etc.) process mining data-oriented analysis (data mining, machine learning, business intelligence) compliance-oriented questions, problems and solutions 2

oore's Law 2013 2060 3

Three Types of Process ining world business processes people machines components organizations models analyzes supports/ controls specifies configures implements analyzes software system records events, e.g., messages, transactions, etc. (process) model discovery conformance enhancement event logs

Play-Out process model event log PAGE 5

Play-Out (Classical use of models) B A p1 E p3 D start end p2 C p4 A B C D A E D A C B D A B C D A E D A C B D A E D A C B D PAGE 6

Play-In event log process model PAGE 7

Play-In A B C D A C B D A E D A E D A B C D A C B D A E D A C B D B A p1 E p3 D start end p2 C p4 PAGE 8

Example Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events) PAGE 9

Example Process Discovery (ASL, test process lithography systems, 154966 events) PAGE 10

Example Process Discovery (AC, 627 gynecological oncology patients, 24331 events) PAGE 11

Replay event log process model extended model showing times, frequencies, etc. diagnostics predictions recommendations PAGE 12

Replay A B C D B A p1 E p3 D start end p2 C p4 PAGE 13

Replay A E D B A p1 E p3 D start end p2 C p4 PAGE 14

Replay can detect problems A C D Problem! token left behind B Problem! missing token A p1 E p3 D start end p2 C p4 PAGE 15

Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988) PAGE 16

Replay can extract timing information A 5 B 8 C 9 D 13 4 5 3 6 8 B 2 5 7 8 A p1 E p3 D start 5 4 p2 3 3 7 C 9 p4 4 6 4 7 13 end PAGE 17

Performance Analysis Using Replay (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988) PAGE 18

Hundreds of plug-ins available covering the whole process mining spectrum open-source (L-GPL) Download from: www.processmining.org PAGE 19

Commercial Alternatives Disco (Fluxicon) Perceptive Process ining (before Futura Reflect and BP one) ARIS Process Performance anager QPR ProcessAnalyzer Interstage Process Discovery (Fujitsu) Discovery Analyst (StereoLOGIC) XAnalyzer (XPro) 20

Three Key Observations PAGE 21

#1 Alignments are essential! conformance checking to diagnose deviations squeezing reality into the model to do model-based analysis move on model move on log move on model (harmless) move on model PAGE 22

#2 odels are like the glasses required to see and understand event data! PAGE 23

#3 Process models as maps: Breathing life into models PAGE 24

Alignments Joint work with Arya Adriansyah, Boudewijn van Dongen, Elham Ramezani, Dirk Fahland, assimiliano de Leoni, et al. PAGE 25

Replaying trace abeg a b e g b examine thoroughly g start a register request r=1 c examine casually d check ticket e decide m=1 f reinitiate request pay compensation h reject request end 1 6 1 6 = 0.83333 26

From playing the token game to optimal alignments observed trace: abeg a b» e g a b d e g b move in model only start a register request examine thoroughly c examine casually d check ticket decide f e reinitiate request g pay compensation h reject request end 27

Another alignment observed trace: abcdeg a b c d e g a b» d e g b examine thoroughly g move in log only start a register request c examine casually d check ticket decide f e reinitiate request pay compensation h reject request end 28

oves have costs a» a a» a a b Standard cost function: c(x,») = 1 c(»,y) = 1 c(x,y) = 0, if x=y c(x,y) =, if x y 29

Any cost structure is possible send-letter(john,2 weeks, $400) send-email(sue,3 weeks,$500) Similar activities (more similarity implies lower costs). Resource conformance (done by someone that does not have the specified role). Data conformance (path is not possible for this customer). Time conformance (missed the legal deadline) 30

Using Alignments An optimal alignment has the lowest possible costs. If multiple alignments are optimal, pick one or use all. Like an "oracle" revealing paths in the model. These paths can be used for further analysis! Can be used for quantifying various types conformance (possibly using a different cost function). PAGE 31

Some pointers Wil. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisc. Rew.: Data ining and Knowledge Discovery 2(2): 182-192 (2012) Arya Adriansyah, Jorge unoz-gama, Josep Carmona, Boudewijn F. van Dongen, Wil. P. van der Aalst: Alignment Based Precision Checking. Business Process anagement Workshops 2012: 137-149 Arya Adriansyah, Boudewijn F. van Dongen, Wil. P. van der Aalst: Conformance Checking Using Cost-Based Fitness Analysis. EDOC 2011: 55-64 assimiliano de Leoni, Wil. P. van der Aalst, Boudewijn F. van Dongen: Dataand Resource-Aware Conformance Checking of Business Processes. BIS 2012: 48-59 Joos C. A.. Buijs, Boudewijn F. van Dongen, Wil. P. van der Aalst: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. OT Conferences (1) 2012: 305-322 Elham Ramezani, Dirk Fahland, Wil. P. van der Aalst: Where Did I isbehave? Diagnostic Information in Compliance Checking. BP 2012: 262-278 A. Adriansyah, B.F. van Dongen, W..P. van der Aalst. emory-efficient Alignment of Observed and odeled Behavior. BP Center Report BP-13-03, BPcenter.org, 2013 PAGE 32

Conformance: Taking a Step Back PAGE 33

Conventional Conformance Notions fitness lift ability to explain observed behavior avoiding overfitting precision thrust generalization Process ining Occam s Razor simplicity avoiding underfitting drag gravity Leaving out one of these dimensions during discovery will lead to degenerate cases! PAGE 34

Problem real process is unknown process discovery event logs covers only a fraction of all possible behavior process discovery real process unknown record conformance checking event data only examples conformance checking process model model needs to provide an abstraction: urphy's Law of Process ining PAGE 35

Traditional Notions such as Precision and Recall do NOT Apply ideal or desired model based on perfect knowledge of real process event log descriptive or normative model (man-made or discovered) 0 regular behavior in log not covered by model FN 1 TP L regular behavior in log covered by model Problem I: event log does not provide information about the whole universe of traces only a selected part 0 TN FP exceptional behavior in log covered by model Problem II: in practice it is unclear where this line is exceptional behavior in log not covered by the model PAGE 36

Operationalizing the Four Conformance dimensions avoiding overfitting thrust generalization fitness Process ining Occam s Razor gravity lift ability to explain observed behavior simplicity precision drag avoiding underfitting Fitness (fraction of observed behavior possible according to the model). easure at the case or event level. How to continue after a deviation (where to put the "blame"), cf. duplicate and silent activities. Precision (avoiding underfitting; fraction of allowed behavior never observed). Log only contains examples. etrics are e.g. based on escaping edges. Generalization (avoiding overfitting; probability that the next unseen case will not fit). Reasoning about unseen behavior, strongly related to log completeness. Simplicity (Occam's Razor: the simplest of two or more competing theories is preferable) Easy to operationalize (e.g., number of nodes or arcs). Often subjective (valuation of AND/XOR/OR-split/joins). PAGE 37

Some pointers Wil. P. van der Aalst: ediating Between odeled and Observed Behavior: The Quest for the Right Process. Seventh IEEE International Conference on Research Challenges in Information Science (RCIS 2013), (2013) Wil. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisc. Rew.: Data ining and Knowledge Discovery 2(2): 182-192 (2012) Joos C. A.. Buijs, Boudewijn F. van Dongen, Wil. P. van der Aalst: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. OT Conferences (1) 2012: 305-322 Wil. P. van der Aalst: Process ining - Discovery, Conformance and Enhancement of Business Processes. Springer 2011, isbn 978-3-642-19344-6, pp. I-XVI, 1-352 PAGE 38

Representational Bias in Process ining (not about visualization) Joint work with Joos Buijs, Sander Leemans, and Boudewijn van Dongen. PAGE 39

Typical Representational Bias (Labeled) Petri Nets, WFnets, etc. Subsets of BPN diagrams, UL Activity Diagrams, Event-Driven Process Chains (EPCs), YAWL, etc. Transition Systems (Hidden) arkov odels [start] register request start start start register request [c1,c2] register request examine thoroughly examine casually check ticket a register request XOR [c2,c3] [c1,c4] OR AND check ticket c1 reinitiate request examine casually examine thoroughly [c5] c2 e1 e2 e3 decide [c3,c4] b examine thoroughly c examine casually d check ticket pay compensation examine thoroughly reject request examine casually check ticket examine thoroughly examine casually check ticket [end] c3 c4 reinitiate request OR AND start register request e decide f decide reinitiate request e4 decide XOR c1 c2 c5 OR-split e5 e6 examine thoroughly examine casually pay compensation pay compensation reject request pay compensation check ticket OR-join reject request decide new information c3 g h reject request end pay compensation reject request end end end end PAGE 40

Huge Search Space When Discovering a Petri Net, BPN model, and the like PAGE 41

with just a few interesting candidates PAGE 42

Alternative Representational Bias 1. C-nets (XOR/AND/ORsplit/join graphs; more likely to be sound due to declarative semantics). 2. Declare models (constraint based, grounded in LTL; anything is possible unless forbidden) 3. Process Trees (similar to subsets of various process algebras; sound by structure) a register request today's focus b examine thoroughly c examine casually d check ticket f reinitiate request a e decide register request g pay compensation h reject request z end pay compensation g h reject request e decide PAGE 43

Another Representational Bias: Process Trees A Always sound because of the block structure Also Loop and OR operator A X D E B C D B C A B A (BA)* E PAGE 58

Petri Net Semantics (used for comparison and conformance checking only) A B A Sequence A Parallellism B Exclusive Choice B A A C Loop B Or Choice B PAGE 59

and BPN. Sequence Or Choice Exclusive Choice Loop Parallellism PAGE 60

A Discovery Algorithm Using Process Trees: Evolutionary Tree iner (ET) Process trees as representation (= limit search space to "good" models). Genetic approach (= very flexible) Fitness function uses all four criteria (= seamlessly balance the different "forces") Change Population No Create Initial Population easure Quality Stop? Yes Return Best Individual PAGE 61

Population Change Elite Population i Population i+1 Selection Replace Crossover utation PAGE 62

Example B E A C G D F A = send e-mail, B = check credit, C = calculate capacity, D = check system, E = accept, F = reject, G = send e-mail PAGE 64

Conventional Algorithms (1/3) ("best effort" mapping to process trees to allow for comparison) alpha miner ILP miner low fitness language-based region miner low precision low fitness PAGE 65

Conventional Algorithms (2/3) heuristic miner multi-phase miner PAGE 66

Conventional Algorithms (3/3) genetic miner state-based region miner PAGE 67

Often unsound result and no mechanism to seamlessly balance the four criteria PAGE 68

Genetic ining (ET) While Considering Only One Criterion best value possible for this log ET with weight zero to three out of four perspectives. PAGE 69

Considering Replay Fitness and One Other Criterion ET with weight zero to two out of four perspectives. PAGE 70

Considering 3 of 4 Criteria replay fitness needs to have a larger weight PAGE 71

Considering All Four Criteria with Emphasis on Fitness fitness has weight 10 PAGE 72

Initial odel Versus Discovered odel simulated discovered by ET B E A C G D F PAGE 73

1) Carefully choose your representational bias during discovery: Unrelated to presentation/visualization! 2) Consider all conformance dimensions (replay fitness, precision, generalization, and simplicity)! PAGE 74

Some pointers Wil. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Causal Nets: A odeling Language Tailored towards Process Discovery. CONCUR 2011: 28-42 Joos C. A.. Buijs, Boudewijn F. van Dongen, Wil. P. van der Aalst: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. OT Conferences (1) 2012: 305-322 Joos C. A.. Buijs, Boudewijn F. van Dongen, Wil. P. van der Aalst: A genetic algorithm for discovering process trees. IEEE Congress on Evolutionary Computation 2012: 1-8 S.J.J. Leemans, D. Fahland, W..P. van der Aalst. Discovering Block-Structured Process odels From Event Logs A Constructive Approach. BP Center Report BP-13-06, BPcenter.org, 2013 PAGE 75

ediating Between a Reference odel and Real Observed Behavior Joint work with Joos Buijs, Boudewijn van Dongen, and Dirk Fahland. PAGE 76

Compromise Based on Two ain Forces b examine thoroughly g start a register request c examine casually d check ticket decide f e reinitiate request pay compensation h reject request end fitness lift ability to explain observed behavior Various techniques to compare graphs, e.g., edit distance notions (add, remove, replace). thrust avoiding overfitting generalization Process ining Occam s Razor simplicity avoiding underfitting precision drag gravity PAGE 77

Force Between Reference odel and Candidate odel Can be Viewed as a 5 th Conformance Notion Replay Fitness Simplicity Optimal Process odel Similarity Boundary Candidate Process odel Reference Process odel Precision Generalization PAGE 78

Example From CoSeLoG Project PAGE 79

Some pointers J.C.A.. Buijs,. La Rosa, H.A. Reijers, B.F. van Dongen, and W..P. van der Aalst: Improving Business Process odels using Observed Behavior, SIPDA 2012 post-proceedings, Lecture Notes in Business Information Processing, 2013. Joos C. A.. Buijs, Boudewijn F. van Dongen, Wil. P. van der Aalst: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. OT Conferences (1) 2012: 305-322 Dirk Fahland, Wil. P. van der Aalst: Repairing Process odels to Reflect Reality. BP 2012: 229-245. Dirk Fahland, Wil. P. van der Aalst: Simplifying discovered process models in a controlled manner. Inf. Syst. 38(4): 585-605 (2013). PAGE 80

ining Configurable Process odels Joint work with Joos Buijs, Boudewijn van Dongen, and Florian Gottschalk. PAGE 81

Two variants of the same process PAGE 82

Configurable Process odel PAGE 83

Variants of the same process d a b g h f c d a b g h e c f d a g h e f PAGE 84

Approach 1: erge Separately ined odels event log 1 Process model 1 pure modelbased merging C1 event log 2 Process model 2 C2... pure modelbased configuration event log n Process model n Cn "normal" discovery Step 1: Process ining Step 2a: Process odel erging Configurable Process model Step 2b: Process Configuration PAGE 85

Results Approach 1 for running example PAGE 86

Results Approach 1 for running example poor generalization poor similarity PAGE 87

Approach 2 pure modelbased merging event log 1 Process model 1 C1 event log 2 Process model 2 C2... event log n Process model n Cn Step 1a: erge Event Logs erged event log Step 1b: Process ining Common Process model Step 1c: Process Individualization Step 1d: Process odel erging Configurable Process model Step 2: Process Configuration discovery based on whole event log discovery based on local event log and common model (edit distance as 5 th dimension) pure modelbased configuration PAGE 88

Results Approach 2 for running example PAGE 89

Results Approach 2 for running example poor generalization poor similarity PAGE 90

Approach 3 event log 1 discovery based on whole event log configuration based on common model and local event log (always limiting behavior, not extending it) C1 event log 2 C2... event log n Cn Step 1a: erge Event Logs erged event log Step 1b: Process ining Configurable Process model Step 2: Process Configuration PAGE 91

Results Approach 3 for running example better generalization better similarity PAGE 92

Approach 4 discovery of the process model and the configuration is combined into one algorithm (configuration costs are added to overall fitness) event log 1 C1 event log 2 C2... event log n Cn Step 1&2: Process ining & Process Configuration Configurable Process model PAGE 93

Results Approach 4 for running example small drop in replay fitness better simplicity even better generalization even better similarity PAGE 94

Genetic algorithms are versatile and can consider different forces at the same time PAGE 95

but often not fast enough PAGE 96

Some pointers J.C.A.. Buijs, B.F. van Dongen, and W..P. van der Aalst: ining Configurable Process odels from Collections of Event Logs, under review, 2013. Florian Gottschalk, Teun A. C. Wagemakers, onique H. Jansen-Vullers, Wil. P. van der Aalst, arcello La Rosa: Configurable Process odels: Experiences from a unicipality Case Study. CAiSE 2009: 486-500 Florian Gottschalk, Wil. P. van der Aalst, onique H. Jansen- Vullers: erging Event-Driven Process Chains. OT Conferences (1) 2008: 418-426 Wil. P. van der Aalst: Business Process Configuration in the Cloud: How to Support and Analyze ulti-tenant Processes? ECOWS 2011: 3-10 PAGE 97

Decomposing Process ining Problems Joint work with Eric Verbeek, Jorge unoz, and Josep Carmona. PAGE 98

Big Data: Opportunities and Challenges PAGE 99

System Net f t7 c7 t8 c8 t11 start a t1 Labeled Petri net (P,T,F,l) c1 c2 t2 b t3 c t4 a = register request b = examine file c = check ticket d = decide e = reinitiate request f = send acceptance letter g = pay compensation h = send rejection letter Silent transitions and visible transitions (unique or not), One initial marking init, one final marking final c3 c4 e t6 d t5 c5 c6 g t9 h t10 c9 end PAGE 100

Traces and Logs A system net SN has a set of possible visible traces ϕ(sn) starting in init and ending final only showing the visible steps. An event log L is a multiset of traces. Two main process mining problems: 1. Conformance checking: Given L and SN, evaluate the "conformance" (e.g., fitness, precision, generalization, etc.) of L and ϕ(sn) 2. Process discovery: Given L, create SN such that the conformance of L and ϕ(sn) is "as good as possible" PAGE 101

Valid Decomposition f t2 start a t1 c1 c2 t2 b t3 c t4 c3 c4 t7 d t5 e t6 c5 c7 c6 t8 g t9 h t10 c8 c9 t11 end start a t1 t7 c1 c2 c7 b t3 c t4 f t8 c3 c4 c8 d t5 e t6 t11 g c6 t9 c9 Union of subnets is original net No shared places d t5 c5 h t10 e t6 Shared transitions are visible and have unique label end PAGE 102

Another Decomposition f t7 c7 t8 c8 t11 start a t1 c1 c2 t2 b t3 c t4 c3 c4 d t5 e t6 c5 c6 g t9 h t10 c9 end start a t1 t2 a t1 c2 c t4 c4 e d t5 a t1 c1 b t3 c3 d t5 t6 f e t7 c7 t8 c8 t11 t6 g c6 t9 c9 d Requirement t5 c5 h end Union of subnets is original net e t10 No shared places t6 Shared transitions are visible and unique PAGE 103

aximal Valid Decomposition f t7 c7 t8 c8 t11 start a t1 c1 c2 t2 b t3 c t4 c3 c4 d t5 e c5 c6 g t9 h t10 c9 end t6 SN 1 start a t1 SN 2 a t1 c1 t2 b t3 c3 d t5 e t6 SN 5 t7 d t5 e c5 c7 c6 f t8 g t9 h t10 f t8 g t9 h t10 c8 c9 t11 end SN 6 a t6 t1 c2 c SN 4 d t5 SN 3 t4 e t6 c t4 c4 PAGE 104

aximal Decomposition start a t1 c1 c2 t2 b t3 c t4 c3 c4 t7 d t5 e t6 c5 c7 c6 f t8 g t9 h t10 c8 c9 t11 end SN 1 start a t1 SN 3 a t1 c2 SN 2 a t1 c t4 c1 e t6 t2 b t3 c3 SN 4 c t4 d t5 e t6 c4 SN 5 t7 d t5 e t6 d t5 c5 c7 c6 f t8 g t9 h t10 f t8 g t9 h t10 c8 c9 t11 end SN 6 x Construction: group arcs iteratively aximal decomposition is unique There is always a valid decomposition y y PAGE 105

Non-unique visible labels f t7 c7 t8 c8 t11 start a t1 c1 c2 t2 b t3 b t4 c3 c4 d t5 e c5 c6 g t9 h t10 c9 end t2 t6 a t1 c1 b t3 c3 d t5 SN 7 t2 a e t6 a t1 c1 c2 b t3 b t4 c3 c4 d t5 e t1 c2 b d t5 t6 t4 e t6 b t4 c4 Union of subnets is original net No shared places Shared transitions are visible and unique PAGE 106

Conformance checking can be decomposed!!! start Let L be an event log, SN a system net, and D={SN 1,SN 2, SN n } a valid decomposition L i is the sublog of SN i (L projected onto visible transitions of SN i ) a t1 c1 c2 t2 b t3 c t4 c3 c4 e t6 t7 d t5 c5 c7 c6 f t8 g t9 h t10 c8 c9 t11 end L is perfectly fitting SN if and only if each projected log is L i is perfectly fitting SN i SN 1 start a t1 SN 3 a t1 c2 SN 2 c t4 a t1 c1 e t6 t2 b t3 SN 4 c t4 c3 e t6 c4 d t5 d t5 SN 5 e t6 t7 d t5 c5 c7 c6 f t8 g t9 h t10 f t8 g t9 h t10 c8 c9 t11 end SN 6 PAGE 107

Example of alignment for observed trace a,b,c,d,e,c,d,g,f a,b,c,d,e,c,d,g,f t2 c1 b a t3 start t1 c2 c t4 c3 c4 t7 d t5 e t6 c5 c7 c6 f t8 g t9 h t10 c8 c9 t11 end SN 1 start a t1 SN 3 a t1 c2 SN 2 a t1 c t4 c1 e t6 t2 b t3 c3 SN 4 c t4 d t5 e t6 c4 SN 5 t7 d t5 e t6 d t5 c5 c7 c6 f t8 g t9 h t10 f t8 g t9 h t10 c8 c9 t11 end SN 6 Etc. PAGE 108

So What? Quantifying Conformance Exact result: Fraction of cases perfectly fitting SN equals the fraction of cases for which each projected log is L i is perfectly fitting SN i. Bound: For fitness at the event level (costs based alignment) it is possible to compute an optimistic value. PAGE 109

Discovery can also be distributed! 1. Assume the set of activities is split in overlapping sets. 2. Split log L in sublogs L i based on these sets. 3. Discover a model SN i per sublog. 4. erge the models SN i into SN and return result. All the earlier guarantees hold, e.g., L is perfectly fitting SN if and only if each projected log is L i is perfectly fitting SN i PAGE 111

Example Let's use the Alpha algorithm Alpha algorithm is not very suitable for discovering transition bordered subnets. By adding start and end activities we get (in this case) perfectly fitting subnets. PAGE 112

Discover model per sublog (Alpha algorithm) SN 1 i a d start b end e SN 2 a c d SN 3 d j start e end start e h end SN 4 f j k start g end h PAGE 113

erge models SN 1 i a d start b end SN 2 start a c e e d end L is perfectly fitting SN because each projected log is L i is perfectly fitting SN i SN 3 j d start e h end SN 4 f j k start g end p1 p2 p3 {1,2,3,4} t1 h p5 p6 {1,2} a t2 p7 p8 p9 t4 i t3 b c t5 {1} {1} {2} p10 p11 j t6 {3,4} {1,2,3} d t7 p13 p14 p15 f t9 {4} g k t10 p19 t12 {3,4} p18 {4} {4} h t11 p20 p21 {1,2,3,4} t13 p22 p23 p24 p4 p12 {1,2,3} e t8 p16 p17 p25 PAGE 114

Simplify by removing redundant places and initialization p1 p2 p3 {1,2,3,4} t1 p5 p6 {1,2} a t2 p7 p8 p9 t4 i t3 b c t5 {1} {1} {2} p10 p11 j t6 {3,4} {1,2,3} d t7 p13 p14 p15 {4} f t9 p18 {4} {4} g t10 p19 k t12 {3,4} h t11 p20 p21 {1,2,3,4} t13 p22 p23 p24 p4 p12 {1,2,3} e t8 p16 p17 p25 f i j g k a b d start end c h e Log is perfectly fitting this model! PAGE 115

PAGE 116

SESEs, Passages, and general result all define trees on process graph/event log Few large process mining tasks or many smaller process mining tasks. Choice of level is driven by computation time and desired diagnostics! All implemented in Pro. PAGE 117

Diagnosis PAGE 118

Example: Conformance Checking based on SESEs k=inf (no decompostion) some overhead for easy problems intractable problems become tractable stopped after 10 hours problems are often local suitable k-value depends on model/log k = maximal number of arcs in one SESE, P = # places, T = # transitions, f = fitness, t = time in seconds, S = # transition bounded SESEs, B = # bridges, >5 = # more than 5 arcs, nf = # non-fitting parts. PAGE 119

Some pointers Wil.P. van der Aalst. Decomposing Petri Nets for Process ining: A Generic Approach. BP Center Report BP-12-20, BPcenter.org, 2012 Wil. P. van der Aalst: Distributed Process Discovery and Conformance Checking. FASE 2012: 1-25 Wil. P. van der Aalst: Decomposing Process ining Problems Using Passages. Petri Nets 2012: 72-91 H.. W. (Eric) Verbeek, Wil. P. van der Aalst: An Experimental Evaluation of Passage-Based Process Discovery. Business Process anagement Workshops 2012: 205-210 Wil.P. van der Aalst and Eric Verbeek. Process Discovery and Conformance Checking Using Passages. BP Center Report BP-12-21, BPcenter.org, 2012 J. unoz-gama, J. Carmona, W. van der Aalst: Hierarchical Conformance Checking of Process odels Based on Event Logs. In: Applications and Theory of Petri Nets, 2013 PAGE 120

Conclusion PAGE 121

Conclusion (1/2) Alignments are essential for relating observed and modeled behavior! Conformance has (at least) four dimensions! Representational bias is important (and should not be confused with visualization)! New questions are emerging: mediating between a reference model and observed behavior discovering configurable process models Decomposing process mining problems to deal with Big Data. PAGE 122

Conclusion (2/2) Still many challenging and highly relevant open problems in process mining! performance-oriented questions, problems and solutions process model analysis (simulation, verification, etc.) process mining data-oriented analysis (data mining, machine learning, business intelligence) compliance-oriented questions, problems and solutions PAGE 123