A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

Size: px
Start display at page:

Download "A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants"

Transcription

1 A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl 2 Institute of Dtbses nd Informtion Systems, Ulm University, Germny mnfred.reichert@uni-ulm.de 3 Dtbse Group, University of Twente, The Netherlnds.wombcher@utwente.nl Abstrct. Recently, new genertion of dptive Process-Awre Informtion Systems (PAISs) hs emerged, which enbles structurl process chnges during runtime while preserving PAIS robustness nd consistency. Such flexibility, in turn, leds to lrge number of process vrints derived from the sme model, but differing in structure. Generlly, such vrints re expensive to configure nd mintin. This pper provides heuristic serch lgorithm which fosters lerning from pst process chnges by mining process vrints. The lgorithm discovers reference model bsed on which the need for future process configurtion nd dpttion cn be reduced. It dditionlly provides the flexibility to control the process evolution procedure, i.e., we cn control to wht degree the discovered reference model differs from the originl one. As benefit, we cn not only control the effort for updting the reference model, but lso gin the flexibility to perform only the most importnt dpttions of the current reference model. Our mining lgorithm is implemented nd evluted by simultion using more thn 7000 process models. Simultion results indicte strong performnce nd sclbility of our lgorithm even when fcing lrge-sized process models. 1 Introduction In tody s dynmic business world, success of n enterprise incresingly depends on its bility to rect to chnges in its environment in quick, flexible nd cost-effective wy [22]. However, current off-the-shelf enterprise softwre does not meet these needs [23]. It is deployed in different compnies, domins, nd countries, nd therefore tends to be too generic nd rigid. Generlly, introduction of enterprise softwre entils the problem of ligning business processes nd IT. This cuses huge customiztion efforts t the site of softwre buyers tht exceed the price for the softwre licenses by fctor five to ten [5]. Softwre vendors, in turn, mke endevors to close this lignment gp [34], nd mjor progress hs This work ws done in the MinAdept project, which hs been supported by the Netherlnds Orgniztion for Scientific Reserch under contrct number

2 2 been chieved by shifting from function- to process-centered softwre design. Along this trend vriety of process support prdigms s well s lnguges hve emerged. Using WS-BPEL [1], for exmple, n executble process cn be composed out of existing ppliction services. At runtime, execution of these services is then orchestrted by the PAIS ccording to the defined process logic. Recently, different pproches for dpting processes hve emerged. Generlly, structurl process dpttions re not only needed for configurtion purpose t build time [9, 32], but lso become necessry for single process instnces during runtime to del with exceptionl situtions nd chnging needs [25, 41]. In response to this need dptive process mngement technology hs emerged [41, 43]. It llows to configure nd dpt process models t different levels. This, in turn, results in lrge collections of process model vrints (process vrints for short) creted from the sme process model, but slightly differing from ech other in their structure. Fig. 1 depicts n exmple. The left hnd side shows high-level view on ptient tretment process s it is normlly executed: ptient is dmitted to hospitl, where he first registers, then receives tretment, nd finlly pys. In emergency situtions, however, it might become necessry to devite from this model, e.g., by first strting tretment of the ptient nd llowing him to register lter during tretment. To cpture this behvior in the model of the respective process instnce, we need to move ctivity receive tretment from its current position to the one prllel to ctivity register. This leds to instnce-specific process model vrint S s shown in Fig. 1b. Generlly, lrge number of process vrints derived from sme originl process model exist [22]. 1.1 Problem Sttement Though considerble efforts hve been mde to ese process configurtion nd customiztion [9, 25], most existing pproches hve not yet utilized the informtion resulting from pst process dpttions [40]. Fig. 2 describes the gol of this pper. We im t lerning from pst process chnges by merging process vrints into one generic process model, which covers these vrints best. By dopting this generic model s newreference process model within the PAIS, need for future process dpttions nd thus cost for chnge will decrese. =Move (S, register, dmitted, py) S[ >S register Admitted register receive tretment py dmitted py AND-Split receive tretment AND-Join e=<dmitted, receive tretment, register, py> ) S: originl process model b) S : finl execution & chnge Fig. 1. Originl Process Model S nd Process Vrint S

3 3 When deriving new process reference model, the originl one should be lso tken into ccount. In most cses, drmtic chnges of the current reference model re not preferred due to implementtion costs or socil resons. Process designers should therefore hve the flexibility to choose to wht degree they wnt to chnge the originl reference model to fit better to the vrints. Originl reference process model S customiztion & dpttion Process vrint S1 Process vrint S2 Process vrint Sn Control differences mining & lerning Discovered reference process model S Fig. 2. Discovering new reference model by lerning from pst process configurtions Bsed on the two ssumptions tht (1) process models re well-formed (i.e., block-structured like in WS-BPEL) nd (2) ll ctivities in process model hve unique lbels, this pper dels with the following fundmentl reserch question: Given reference model nd collection of process vrints configured from it, how to derive new reference process model by performing sequence of chnge opertions on the originl one, such tht the verge distnce between the reference model nd the process vrints becomes miniml? The distnce between the reference process model nd process vrint is mesured by the number of high-level chnge opertions (e.g., to insert, delete or move ctivities [25]) needed to trnsform the reference model into the respective vrint. Clerly, the shorter the distnce is, the less efforts needed for process dpttion re. Bsiclly, we discover new reference model by performing sequence of chnge opertions on the originl one. In this context, we provide users the flexibility to control the distnce between old reference model nd newly discovered one, i.e., to choose how mny chnge opertions shll be pplied to the old reference model. Clerly, the most relevnt chnges (which significntly contribute to reduce the verge distnce) should be considered first nd the less importnt ones lst. Prticulrly, if users decide to ignore the less relevnt chnges, the overll performnce of our lgorithm with respect to the described reserch gol will not be influenced too much. Such flexibility to control the difference between the originl nd the discovered model is significnt improvement when compred to our previous work [15]; the pproch presented in [15] enbles discovery of reference process model by mining collection of vrints, but is unble to tke the originl reference process model into ccount.

4 4 The reminder of this pper is orgnized s follows. Section 2 gives bckground informtion needed for understnding this pper. Section 3 introduces our heuristic serch lgorithm nd provides high-level overview on how it cn be used for mining process vrints. We describe two importnt spects of our heuristics lgorithm, (i.e., the fitness function nd the serch tree) in Sections 4 nd 5. To evlute the performnce of our mining lgorithm, we conduct simultion. Section 6 describes its setup, while Section 7 presents the simultion results. Finlly, Section 8 discusses relted work nd Section 9 concludes with summry nd outlook. 2 Bckgrounds We first introduce bsic notions needed in the following: Process Model: Let P denote the set of ll sound process models. A prticulr process model S = (N, E,...) 4 P is defined s well-structured Activity Net [25]. N constitutes the set of ctivities nd E the set of control edges (i.e., precedence reltions) linking them. To limit the scope, we ssume Activity Nets to be block-structured (similr to WS-BPEL). A simple exmple is depicted in Fig. 3. For detiled description nd correctness issues, we refer to [25]. Process chnge: A process chnge is ccomplished by pplying sequence of high-level chnge opertions to given process model S over time [25]. Such opertions modify the initil process model by ltering its set of ctivities nd their order reltions. Thus, ech ppliction of chnge opertion results in new process model. We define process chnge nd process vrint s follows: Definition 1 (Process Chnge nd Process Vrint). Let P denote the set of possible process models nd C be the set of possible process chnges. Let S, S P be two process models, let C be process chnge, nd let σ = 1, 2,... n C be sequence of chnges performed on initil model S. Then: S[ S iff is pplicble to S nd S is the (sound) process model resulting from the ppliction of to S. S[σ S iff S 1, S 2,... S n+1 P with S = S 1, S = S n+1, nd S i [ i S i+1 for i {1,... n}. We lso denote S s vrint of S. Exmples of high-level chnge opertions include insert ctivity, delete ctivity, nd move ctivity s implemented in the ADEPT chnge frmework [25]. While insert nd delete modify the set of ctivities in the process model, move chnges ctivity positions nd thus the structure of the process model. A forml semntics of these chnge ptterns is given in [31]. For exmple, opertion move(s, A,B,C) moves ctivity A from its current position within process model S to the position fter ctivity B nd before ctivity C. Opertion delete(s, A), in turn, deletes ctivity A from process model S. Issues concerning the correct 4 A Well-structured Activity Net contins more elements thn only node set N nd edge set E, which cn be ignored in the context of this pper.

5 5 use of these opertions, their generliztion nd forml pre-/post-conditions re described in [25]. Though the depicted chnge opertions re discussed in reltion to our ADEPT chnge frmework, they re generic in the sense tht they cn be esily pplied in connection with other process met models s well [31, 43]. For exmple, process chnge s relized in the ADEPT frmework cn be mpped to the concept of life-cycle inheritnce known from Petri Nets [37]. We refer to ADEPT since it covers by fr most high-level chnge ptterns nd chnge support fetures when compred to other dptive PAISs [41, 43]. Definition 2 (Bis nd Distnce). Let S, S P be two process models. Then: Distnce d (S,S ) between S nd S corresponds to the miniml number of high-level chnge opertions needed to trnsform S into S ; i.e., we define d (S,S ) := min{ σ σ C S[σ S }. Furthermore, sequence of chnge opertions σ with S[σ S nd σ = d (S,S ) is denoted s bis between S nd S. The distnce between S nd S is the miniml number of high-level chnge opertions needed for trnsforming S into S. The corresponding sequence of chnge opertions is denoted s bis B S,S between S nd S. 5 Usully, such distnce mesures the complexity for model trnsformtion (i.e., configurtion). As exmple tke Fig. 1. Here, distnce between model S nd vrint S 1 is one, since we only need to perform one chnge opertion move(s, register,dmitted,py) to trnsform S into S [17]. In generl, determining bis nd distnce between two process models hs complexity t N P level [17]. We consider high-level chnge opertions insted of chnge primitives (i.e., elementry chnges like dding or removing nodes / edges) to mesure the distnce between process models. This llows us to gurntee soundness of process models nd provides more meningful mesure for distnce [17, 41]. Definition 3 (Trce). Let S = (N, E,...) P be process model. We define t s trce of S iff: t < 1, 2,..., k > (with i N) constitutes vlid nd complete execution sequence of ctivities considering the control flow defined by S. We define T S s the set of ll trces tht cn be produced by process instnces running on process model S. t( b) is denoted s precedence reltionship between ctivities nd b in trce t < 1, 2,..., k > iff i < j : i = j = b. We only consider trces composing rel ctivities, but no events relted to silent ones, i.e., nodes within process model hving no ssocited ction nd only existing for control flow purpose [17]. At this stge, we consider two process models s being the sme if they re trce equivlent, i.e., S S iff T S T S. The stronger notion of bi-similrity [10] is not needed in our context. 5 Generlly, it is possible to hve more thn one miniml set of chnge opertions to trnsform S into S, i.e., given process models S nd S their bis does not need to be unique. A detiled discussion of this issue cn be found in [37, 17].

6 6 3 Overview of Our Heuristic Serch Algorithm Section 3.1 provides running exmple which we use throughout the pper. In Section 3.2, we introduce our heuristic serch lgorithm nd give high-level overview of how it cn be pplied for mining process vrints. 3.1 Running Exmple An illustrting exmple is given in Fig. 3. Out of n originl reference model S, six different process vrints S i P (i = 1, 2,... 6) hve been configured. These vrints do not only differ in structure, but lso with respect to their ctivity sets. For exmple, ctivity X ppers in 5 of the 6 vrints (except S 2 ), while Z only ppers in S 5. The 6 vrints re further weighted bsed on the number of process instnces creted from them. In our exmple, 25% of ll instnces were executed ccording to vrint S 1, while 20% rn on S 2. If we only know the process vrints, but hve no runtime informtion bout relted instnce executions, we ssume vrints to be eqully weighted; i.e., every process vrint then hs weight 1/n, where n corresponds to the totl number of vrints. We cn lso compute the distnce (cf. Def. 2) between originl reference model S nd ech vrint S i. For exmple, when compring S with S 1 we obtin distnce 4 (cf. Fig. 3); i.e., we need to pply four high-level chnge opertions (move(s, H,I,D), move(s, I,J, endf low), move(s, J,B, endf low) nd insert(s, X,E,B); cf. Def. 1) to trnsform S into S 1. Bsed on weight w i of ech vrint S i, we cn then compute verge weighted distnce between reference model S nd its vrints. Regrding our exmple, s distnces between S nd S i we obtin: 4(i = 1,..., 6) 6 (cf. Fig. 3). When considering the vrint weights, s verge weighted distnce, we obtin = 4.0. This mens we need to perform on verge 4.0 chnge opertions to configure process vrint (nd relted instnce respectively) out of the reference model. Generlly, verge weighted distnce between reference model nd its process vrints represents how close they re. The gol of our mining lgorithm is to discover reference model for collection of (weighted) process vrints with miniml verge weighted distnce to the vrints. 3.2 Heuristic Serch for Process Vrint Mining As discussed in Section 2, mesuring the distnce between two models is n N P problem, i.e., the time for computing the distnce is exponentil to the size of the process models. Consequently, the problem set out in our reserch question (i.e., finding reference model which hs miniml verge weighted distnce to the vrints), is n N P problem s well. When encountering rel-life cses (i.e., 6 In our exmple, ll vrints hve the sme distnce to the originl reference model. We specilly designed it in this wy in order to better explin our simultion s presented in Section 6. Clerly, our lgorithm cn lso be pplied when vrints hve different distnces to the originl reference model.

7 7 I A E F G C D J B H S : originl reference model Process configurtion Weight: w 1 = 25% E S F 1 C S 2 Distnce: d (S,S1) = 4 Bies: B (S,S1) = Weight: w 3 = 15% Y C S 3 A S 4 E Distnce: d (S,S3) = 4 Bies: B (S,S3)= Weight: w 5 = 20% A E X Y B J F S 5 G S 6 I H G D X F A X D G I J C Z D I H Distnce: d (S,S5) = 4 Distnce: d (S,S6) = 4 Bies: B (S,S5) = move (S, H, I, D), move(s, I, J, endflow), move (S, J, B, endflow), insert(s, X, E, B) move(s, J, {A,F}, B), insert(s, X, E, J), Insert (S, Y, strtflow, I), move(s, I, D, H) insert(s, Y, {A,F}, B), insert(s, X, E, Y), insert(x, Z, C, D), move (S, J, B, endflow) B B H J Weight: w 2 = 20% Distnce: d (S,S2)= 4 Bies: B (S,S2) = E Distnce: d (S,S4) = 4 B (S,S4)= Bies: B (S,S6) = E Weight: w 4 = 10% Bies: C Weight: w 6 = 10% E insert(s, Y, E, B, con), move(s, C, strtflow, I), move (S, J, B, endflow), move (S, I, D, H) A X F H A F Y C D B G G insert(s, X, E, B), insert(s, Y, strtflow, B), move (S, J, B, endflow), move (S, H, I, C) D move(s, H, strtflow, I), move (S, I, B, endflow), insert(s, X, E, B), move (S, J, B, endflow, con) I A X F Y I G I J B B H C D H J J Averge weighted distnce = 4 chnge / instnce AND-Split AND-Join XOR-Split XOR-Join Fig. 3. Illustrting exmple thousnds of vrints with complex structure), finding the optimum would therefore be either too time-consuming or not fesible. In this pper, we present heuristic serch lgorithm for vrint mining. Our overll gol is to find solution which is close to the optimum, but cn be computed in resonble mount of time. Heuristic lgorithms re widely used in vrious fields of computer science, e.g., rtificil intelligence [20], dt mining [36] nd mchine lerning [24]. A problem employs heuristics when it my hve n exct solution, but the computtionl cost of finding it my be prohibitive [20]. Although heuristic lgorithms do not im t finding the rel optimum (i.e., it is neither possible to theoreticlly prove tht the discovered result is the optimum nor cn we sy how

8 8 close it is to the optimum), they re widely used in prctice. Usully heuristic lgorithms provide nice blnce between the goodness of the discovered solution nd the computtion time for finding it [20]. Regrding the mining of process vrints, Fig. 4 illustrtes on how heuristic lgorithms cn be pplied in our context. Here we represent ech process vrint S i s single node in the two dimensionl spce (white node). The gol for vrint mining is then to find the center of these nodes (bull s eye S nc ), which hs miniml verge distnce to them. In ddition, s discussed in Section 1, we lso wnt to tke the originl reference model S (solid node) into ccount, such tht we cn control the difference between the newly discovered reference model nd the originl one. Bsiclly, this requires us to blnce two forces: one is to bring the newly discovered reference model closer to the vrints (i.e., to the bull s eye S nc t right) thn the old one; the other one is to move the discovered model not too fr wy from originl model S (the solid node t left) such tht it does not differ too much from the originl one. Process designer obtin the flexibility to blnce these two forces, i.e., they re ble to discover model (e.g., S c ), which is closer to the vrints thn the old one but which is still within limited distnce to the ltter. Clerly, the chnge opertions pplied first to the (originl) reference model should be more importnt (i.e., reduce the distnce between the reference model nd the vrints more) thn the ones positioned t end. Consequently, if we ignore the less relevnt chnges, we will not influence the overll distnce reduction between reference model nd vrints too much. Our heuristic lgorithm works s follows: 1. We use originl reference model S s strting point. 2. We serch for ll neighboring process models with distnce 1 to the currently considered reference process model. If we re ble to find better model S mong these cndidte models (i.e., one which hs lower verge weighted distnce to the given collection of vrints thn S), we replce S by S. 3. Repet Step 2 until we either cnnot find better model or the mximlly llowed distnce between originl nd new reference model is reched. Finlly, S corresponds to our discovered reference model S c. If we do not set ny serch limittion, our heuristic lgorithm is lso ble to find the center of the vrints (i.e., S nc ). This implies tht it cn be lso pplied to scenrios where there only exists collection of vrints, but the originl reference model is not known. In this cse, we cn rndomly select vrint S i s strting point nd serch unlimitedly until we find the center, i.e., the model with miniml verge weighted distnce to the collection of vrints. Generlly, most importnt for ny heuristic serch lgorithm re two spects: the heuristic mesure nd the lgorithm tht uses heuristics to serch the stte spce. Section 4 introduces the fitness function which mesures the qulity of prticulr cndidte model; i.e., it llows us to pproximtely evlute how close such cndidte model is to the given vrints. Section 5 then introduces best-first serch lgorithm to serch the stte spce; i.e., this lgorithm illustrtes how to serch for next cndidte process model.

9 9 Force 2: close to reference Force 1: close to vrints S i :Vrints d = 2 d =3 d=1 S: Originl reference model No constrint S c: Serch result with constrint S nc : Serch result without constrint Originl Reference model Process vrints Intermedite serch result Discovered Reference Model Serch steps Fig. 4. Heuristic serch pproch 4 Fitness Function of Heuristic Serch Algorithm Generlly, ny fitness function of heuristic serch lgorithm should be quickly computble. Since serch spce my become very lrge, we must be ble to mke quick decision on which pth to choose next. Averge weighted distnce cn not be used s fitness function since complexity for computing it is N P. In this section we introduce fitness function, which cn be used to pproximtely mesure the closeness between cndidte model nd the collection of vrints. In prticulr, it cn be computed in polynomil time. Like in most heuristic serch lgorithms, the chosen fitness function is resonble guessing rther thn precise mesurement. Therefore, in Section 7 we will investigte how the fitness function is correlted with the verge weighted distnce. 4.1 Activity Coverge For cndidte process model S c = (N c, E c,...) P, we first mesure to wht degree its ctivity set N c covers the ctivities tht occur in the considered collection of vrints. We denote this mesure s ctivity coverge AC(S c ) of S c. Before we cn compute ctivity coverge, we first need to determine the frequency with which ech ctivity j ppers within the collection of vrints. Definition 4 (Activity frequency). Let S i = (N i, E i,...) P, i = 1, 2,..., n be collection of vrints with weights w i nd ctivity sets N i. For ech j n i=1 N i, we define g( j ) s reltive frequency with which j ppers within the given vrint collection. Formlly: g( j ) = w i (1) S i : j N i Tble 1 shows the frequency of ech ctivities contined in ny of the vrints in our running exmple; e.g., X is present in 80% of the vrints (i.e., in S 1, S 3, S 4, S 5, nd S 6 ), while Z only occurs in 20% of the cses (i.e., in S 5 ).

10 10 Activity A B C D E F G H I J X Y Z g( j) Tble 1. Frequency of ech ctivity within the given vrint collection Definition 5 (Activity coverge). Let M = n i=1 N i be the set of ctivities which re present in t lest one of the vrints. Let further N c be the ctivity set of cndidte process model S c. Given ctivity frequency g( j ) of ech j M, we cn compute ctivity coverge AC(S c ) of model S c s follows: AC(S c ) = j N c g( j ) j M g( (2) j) The vlue rnge of AC(S c ) is [0, 1]. Let us tke originl reference model S s cndidte model. It contins ctivities A, B, C, D, E, F, G, H, I, nd J, but does not contin X, Y nd Z. Therefore, its ctivity coverge AC(S), which represents how much it covers the ctivities in the vrint collection, is Structure Fitting Though AC(S c ) mesures how representtive the ctivity set N c of cndidte model S c is with respect to given vrint collection, it does not sy nything bout the structure of the cndidte model. We therefore introduce structure fitting SF (S c ) s nother mesurement. It mesures to wht degree cndidte model S c structurlly fits to the given collection of vrints S i. We first sketch method which llows to represent process model S s order mtrix. Bsed on this, we introduce ggregted order mtrix which llows to represent collection of process vrints s mtrix. In ddition, we introduce the coexistence mtrix which shows the importnce of the order reltions. Finlly, we describe how to mesure structure fitting SF (S c ) of cndidte model S c. Representing Process Models s Order Mtrices One key feture of our ADEPT chnge frmework is to mintin the structure of the unchnged prts of process model [25]. For exmple, when deleting n ctivity this neither influences successors nor predecessors of this ctivity, nd therefore lso not their order reltions. To incorporte this feture in our pproch, rther thn only looking t direct predecessor-successor reltionships between ctivities (i.e. control edges), we consider the trnsitive control dependencies for ech pir of ctivities; i.e. for given process model S = (N, E,...) P, we exmine for ctivities i, j N, i j their trnsitive order reltion. Logiclly, we determine order reltions by considering ll trces the process model my produce (cf. Section 2). Results re ggregted in n order mtrix A N N, which considers four types of control reltions (cf. Def. 6): Definition 6 (Order mtrix). Let S = (N, E,...) P be process model with N = { 1, 2,..., n }. Let further T S denote the set of ll trces producible

11 11 on S. Then: Mtrix A N N is clled order mtrix of S with A ij representing the order reltion between ctivities i, j N, i j iff: A ij = 1 iff ( t T S with i, j t t( i j )) If for ll trces contining ctivities i nd j, i lwys ppers BEFORE j, we denote A ij s 1, i.e., i lwys precedes of j in the flow of control. A ij = 0 iff ( t T S with i, j t t( j i )) If for ll trces contining ctivities i nd j, i lwys ppers AFTER j, we denote A ij s 0, i.e. i lwys succeeds of j in the flow of control. A ij = * iff ( t 1 T S, with i, j t 1 t 1 ( i j )) ( t 2 T S, with i, j t 2 t 2 ( j i )) If there exists t lest one trce in which i ppers before j nd nother trce in which i ppers fter j, we denote A ij s *, i.e. i nd j re contined in different prllel brnches. A ij = - iff ( t T S : i t j t) If there is no trce contining both ctivity i nd j, we denote A ij s -, i.e. i nd j re contined in different brnches of conditionl brnching. Fig. 5 gives n exmple. Besides control edges, which express direct predecessorsuccessor reltionships, the depicted process model S contins different control connectors: AND-Split, AND-Join, XOR-Split nd XOR-join. The depicted order mtrix represents ll these reltions. For exmple, ctivities A nd B will never pper in the sme trce since they re contined in different brnches of n XOR block. Therefore, we ssign - to mtrix element A AB. Similrly, we obtin the reltion for ech pir of ctivities. The min digonl of the mtrix is empty since we do not compre n ctivity with itself. Under certin conditions, n order mtrix uniquely represents the process model it ws creted from [17]. Anlyzing its order mtrix (cf. Def. 6) is then sufficient for nlyzing the corresponding process model. B D A C E G F 0 : successor 1 : predecessor * : AND-block - : XOR-block AND-Split XOR-Split AND-Join XOR-Join ) Process model S b) Order mtrix of S Fig. 5. ) Process model nd b) relted order mtrix Aggregted Order Mtrix For given collection of process vrints, first, we compute the order mtrix of ech process vrint (cf. Def. 6). Regrding our

12 12 running exmple from Fig. 3, we need to compute six order mtrices (cf. Fig. 6). Note tht we only show prtil view of the order mtrices here, (ctivities H, I, J, X, Y nd Z) due to spce limittions. Afterwrds, we nlyze the order reltion for ech pir of ctivities considering ll order mtrices derived before. As the order reltion between two ctivities might be not the sme in ll order mtrices, this nlysis does not result in fixed reltion, but provides distribution for the four types of order reltions (cf. Def. 6). Regrding our exmple, in 65% of ll cses, H is successor of I (s in S 2, S 3, S 4 nd S 6 ), in 25% of ll cses H is predecessor of I (s in S 1 ), nd in 10% of ll cses H nd I re contined in different brnches of n XOR block (s in S 4 ) (cf. Fig. 6). Generlly, to collection of process vrints we cn define the order reltion between ctivities nd b s 4-dimensionl vector V b = (vb 0, v1 b, v b, v b ). Ech field then corresponds to the frequency of the respective reltion type ( 0, 1, * or - ) s specified in Def. 6. Tke our running exmple nd consider Fig. 6. Here, vhi 1 corresponds to the frequency of ll cses with ctivities H nd I hving order reltionship 1, i.e., ll cses for which H precedes I; we obtin V HI = (0.65, 0.25, 0, 0.1). Formlly, we define n ggregted order mtrix s follows: Definition 7 (Aggregted Order Mtrix). Let S i = (N i, E i,...) P, i = 1, 2,..., n be collection of process vrints with ctivity sets N i. Let further A i be the order mtrix of S i, nd let w i represent the number of process instnces being executed bsed on S i. The Aggregted Order Mtrix of ll process vrints is defined s 2-dimensionl mtrix V m m with m = N i nd ech mtrix element v jk = (vjk 0, v1 jk, v jk, v jk ) being 4-dimensionl vector. For τ {0, 1,, }, element vjk τ expresses to wht percentge, ctivities j nd k hve order reltion τ within the collection of process vrints S i. Formlly: j, k N i, j k : vjk τ = A ijk = τ w i j, k N w i. i Fig. 6 shows the ggregted order mtrix V for the process vrints from Fig. 3. Agin, due to spce limittions, we only consider order reltions for ctivities H, I, J, X, Y, nd Z. Generlly, in n ggregted order mtrix, the min digonl is lwys empty since we do not specify the order reltion of n ctivity with itself. For ll other elements, non-filled vlue in certin dimension mens it corresponds to zero. Importnce of the Order Reltions Generlly, the order reltions computed by n ggregted order mtrix my be not eqully importnt. For exmple, reltionship V HI between H nd I (cf. Fig. 6) would be more importnt thn reltion V HZ, since ctivities H nd I pper together in ll six process vrints while ctivities H nd Z only show up together in vrint S 5 (cf. Fig. 3). We therefore define co-existence mtrix CE in order to represent the importnce of the different order reltions occurring within n ggregted order mtrix V. Definition 8 (Coexistence Mtrix).

13 VH S1 13 Order mtrices :25% S2 :20% S3 :15% S4 :10% S5 :20% S6 :10% V 0 : 65% 1 : 25% * : 0% - : 10% I= (0.65, 0.25, 0, 0.10) Aggregted order mtrix 0 1 * - 0 : successor 1 : predecessor * : AND-block - : XOR-block Fig. 6. Aggregted order mtrix bsed on process vrints Let S i = (N i, E i,...) P, i = 1, 2,..., n be collection of process vrints with ctivity sets N i. Let further w i represent reltive frequency of process instnces being executed bsed on S i. The Coexistence Mtrix of process vrints S 1,..., S n is then defined s 2-dimensionl mtrix CE m m with m = N i. Ech mtrix element CE jk corresponds to the reltive frequency with which ctivities j nd k pper together within the given collection of vrints. Formlly: j, k N i, j k : CE jk = S i : j N i k N i w i. Regrding our running exmple, Tble 7 shows the corresponding coexistence mtrix. Agin, due to spce limittions, we only depict the coexistence mtrix for ctivities H, I, J, X, Y, nd Z. For instnce, CE HI = 1 nd CE HZ = 0.2. This indictes tht order reltion between H nd I is more importnt thn the one between H nd Z. Structure Fitness of Cndidte Process Models Since we cn represent cndidte process model S c by its corresponding order mtrix A c (cf. Def. 6), we determine the structure fitting SF (S c ) between S c nd the vrints (cf. subsection 4.2) by mesuring how similr order mtrix A c nd ggregted order mtrix V (representing the vrints) re. We cn compute S c by mesuring the order reltions between every pir of ctivities in A c nd in V.

14 14 Fig. 7. Pirwise co-existnce of ctivities For exmple, consider reference model S s cndidte process model S c (i.e., S c = S). A prtil view of the corresponding order mtrix A is depicted in Fig. 8. Obviously, A HI = 0 holds, i.e., H is successor of I in model S (cf. Fig. 8). Consider now ggregted order mtrix V. Here the order reltion between ctivities H nd I is represented by the 4-dimensionl vector V HI = (0.65, 0.25, 0, 0.1). If we now wnt to compre how close A HI nd V HI re, we first need to build n ggregted order mtrix V c purely bsed on our cndidte process model S c (S in our cse). Fig. 8 shows both the order mtrix A c nd the clculted ggregted order mtrix V c of process model S c (S c = S). As order reltion between H nd I in V c, we obtin V HI c = (1, 0, 0, 0), i.e., H is lwys successor of I. We now cn compre V HI (which represents the vrints) with V HI c (which represents the reference model). We use Eucliden metrics f(α, β) to mesure the closeness between two vectors α = (x 1, x 2,..., x n ) nd β = (y 1, y 2,..., y n ): f(α, β) = α β α β = n i=1 x iy i n i=1 x2 i n i=1 y2 i (3) f(α, β) [0, 1] computes the cosine vlue of the ngle θ between vectors α nd β in Eucliden spce. If f(α, β) = 1 holds, α nd β exctly mtch in their directions; f(α, β) = 0 mens, they do not mtch t ll. Regrding our running exmple, we obtin f(v HI, V HI c ) = This number indictes high similrity between the order reltions of the cndidte process model with respect to H nd I nd the ones cptured by the vrints. Bsed on (3) which mesures similrity between the order reltions, nd Coexistence mtrix CE (cf. Def. 8) which mesures importnce of the order reltions, we cn define structure fitness SF (S c ) of cndidte model S c s follows: Definition 9 (Structure Fitness). Let S i = (N i, E i,...) P, i = 1, 2,..., n be collection of process vrints with ctivity sets N i. Let further CE be the

15 15 ) b) A c : order mtrix of cndidte model S c s originl reference model S V c : Aggregted order mtrix by cndidte model S c s originl reference model S Fig. 8. Order mtrix A c nd ggregted order mtrix V c constructed by cndidte model S c = S coexistence mtrix, nd V be the ggregted order mtrix of the collection of vrints. For cndidte model S c, let m = N c corresponds to the number of ctivities in S c ; let further V c be ggregted order mtrix of S c. we cn compute structure fitness SF (S c ) s follows: SF (S c ) = m j=1 m k=1,k j (f(v j k, V c j k ) CE j k ) m (m 1) (4) For every pir of ctivities j, k N c, j k, we first compute similrity of corresponding order reltions (s cptured by V nd V c ) by mens of f(v j k, V c j k ), nd second the importnce of these order reltions by CE j k. Structure fitness SF (S c ) [0, 1] of cndidte model S c then equls the verge of the similrity multiplied by the importnce of every order reltion. Regrding our exmple from Fig. 3, structure fitting SF (S) of the originl reference model S corresponds to Fitness Function So fr, we hve described the two mesurements ctivity coverge AC(S c ) nd structure fitting SF (S c ) to evlute fitness of cndidte model S c. While AC(S c ) mesures to wht degree ctivities occurring in the vrints re covered by the cndidte model S c, SF (S c ) mesures to wht degree S c fits structurlly to the vrints. Definition 10 (Fitness). For cndidte model S c, let AC(S c ) be the ctivity cover of S c nd let further SF (S c ) be the structure fitness of S c. We compute

16 16 the fitness F it(s c ) of cndidte model S c s follows: F it(s c ) = AC(S c ) SF (S c ) (5) As AC(S c ) [0, 1] nd SF (S c ) [0, 1], vlue rnge of F it(s c ) is [0,1] s well. Fitness vlue F it(s c ) indictes how close cndidte model S c is to the given collection of vrints. If F it(s c ) = 1 holds, cndidte model S c will fit perfectly to the vrints; i.e., no dditionlly dpttion will be needed. Otherwise, further dpttions might be required. The higher F it(s c ) is, the closer S c will be to the vrints nd the less configurtion efforts will be needed. Regrding our exmple from Fig. 3, fitness vlue F it(s) of the originl reference process model S is F it(s) = AC(S) SF (S) = = As fitness of cndidte model S c is evluted by ctivity coverge AC(S c ) multiplied by structure fitting SF (S c ), we cn utomticlly blnce the number of ctivities to be considered in cndidte model S c. If too mny ctivities of low relevnce (i.e., ctivities which only pper in limited number of instnces; e.g., ctivity Z in our exmple) re considered in the cndidte model, we will obtin high AC(S c ) vlue. However, SF (S c ) then possibly will decrese since coexistence vlues (cf. Def. 8) of such less relevnt ctivities re often very low (cf. Fig. 7). On the contrry, if S c contins too few ctivities, SF (S c ) cn potentilly be very high, while AC(S c ) will be too low in order to qulify S c s good cndidte model. Therefore, high vlue for F it(s c ) does not only men tht S c structurlly fits well to the vrints, but lso tht resonble number of ctivities is considered in the cndidte model. The complexity of computing F it(s c ) is polynomil. To be more precise, let n be the number of vrints nd let m = n i=1 S i be the totl number of ctivities in the vrints. The complexity to compute ctivity frequency (cf. Def. 4) is O(mn) nd the complexity to compute ggregted order mtrix V (cf. Def. 7) is O(2m 2 n). Bsed on this, the complexity to compute the fitness function is O(m + 2m 2 ). Note tht it is significntly lower thn N P level complexity s needed for computing the verge weighted distnce. As lredy discussed, the fitness function F it(s c ) is only resonble guess rther thn n exct mesurement (like verge weighted distnce). Therefore, we nlyze the performnce of our fitness function lter in Section 7. 5 Constructing the Serch Tree We hve sketched the bsic steps of our heuristic mining lgorithm in Section 3.2. In Section 4, we hve then discussed how to evlute cndidte process model S c bsed on fitness function F it(s c ). In this section, we show how we cn find dequte cndidte process models. For tht purpose we present best-first lgorithm which llows us to construct serch tree in such wy tht we cn find the best cndidte model in the serch spce. Section 5.1 first provides n overview on how we construct the serch tree by compring the result models from chnging ech ctivity j N c in the cndidte model S c. In Section 5.2, we further describe in wht wys we cn dpt prticulr ctivity j in order

17 17 to find ll kids of given cndidte process model S c, i.e., ll models which hve distnce one to S c. In Section 5.3, we provide serch results nd n evlution of our exmple models from Fig. 3. Finlly, Section 5.4 provides prototype implementtion of the described lgorithm. 5.1 The Serch Tree Let us revisit Fig. 4 which gives generl overview of our heuristic serch pproch. Strting with the current cndidte model S c, in ech itertion, we serch for its neighbors (i.e., process models which hve distnce 1 to S c ) to see whether we still cn find better cndidte model S c with higher fitness vlue. Generlly, we cn construct neighbor model for given process model S c = (N c, E c,...) by pplying one insert, delete, or move opertion to S c. All ctivities j N i (N i corresponds to the ctivity set of vrint S i ), which hve ppered in the vrint collection, re cndidte ctivities for chnge. Obviously, n insert opertion dds n ctivity j / N c to S c, while the other two opertions delete or move n ctivity j lredy present in S c (i.e., j N c ). Generlly, numerous process models my result by chnging one prticulr ctivity j on S c. Note tht the positions where we cn insert ( j / N c ) or move ( j N c ) ctivity j cn be numerous. Section 5.2 provides detils on how to find ll process models resulting from the chnge of one prticulr ctivity j on S c. In this section, first of ll, we ssume tht we hve lredy found the best process model (i.e., with highest fitness vlue) from ll the models resulting from chnging prticulr ctivity j on S c. We denote this model s the best kid S j kid of S c when chnging j. Our bsic ide is to crete ll neighbor models, to evlute ech of them with the fitness function, nd to finlly choose the one with highest fitness vlue. We present best-first lgorithm to perform our heuristic vrint mining (cf. Algorithm 1). To illustrte this lgorithm, we use the serch tree depicted in Fig. 9. Our serch lgorithm strts with setting the originl reference model S s the initil stte, i.e., S c = S (see the node t the top of Fig. 9). We further define AS s ctive ctivity set, which contins ll ctivities vilble for chnge. At the beginning, AS = { j j n i=1 N i} contins ll ctivities tht pper in t lest one process vrint S i. For ech ctivity j AS, we determine the corresponding best kid S j kid of S c when chnging j on S c (i.e., when deleting, moving or inserting j ). If the best kid S j kid hs higher fitness vlue thn S c, we mrk it with lines; otherwise, we mrk it white nd remove j from AS (cf. Fig. 9). 7 Afterwrds, we find the best one mong ll the best kids S j kid, i.e., the 7 For ll nodes mrked s white, we remove them from ctive ctivity set AS. Consequently, we stop serching the best kid when chnging them. In principle, it is still possible tht when chnging them lter (i.e., bsed on nother cndidte model S c), we cn find better resulting model. However, such chnce is very low due to the fct tht we hve lredy enumerted ll possible solutions by chnging such ctivity on S c. We therefore remove them from AS in order to reduce the serch spce.

18 A B A B C 18 Originl reference model Z Y Best kid when chnging A Best kid when chnging B Z Best kid when chnging Z Best kid when chnging Y Serch result Best sibling of ll best kids Best kid is better thn prent Best kid is NOT better thn prent Strt / End B Terminting condition: No kid is better thn its prent Fig. 9. Construct the serch tree one with highest fitness vlue. We denote this model s best sibling S sib nd we mrk the corresponding ctivity s ccordingly. Since this model S sib is the best model we re ble to obtin by one chnge opertion on cndidte model S c, we set S sib s the first intermedite serch result nd replce S c by S sib for further serch (cf. Fig. 9, S sib re mrked s bull s eyes). Note tht we lso remove s from AS since this ctivity hs now been lredy considered for chnge. The described serch method goes on itertively, until termintion condition is met, i.e., we either cn not find better model, or the llowed serch distnce is reched. The finl serch result S sib corresponds to our discovered reference model S (the node mrked by bull s eye nd circle in Fig. 9). 5.2 Options for Chnging One Prticulr Activity Section 5.1 hs shown how to construct serch tree by compring the best kids S j kid. This section discusses how to find such best kid Sj kid when chnging prticulr ctivity j, i.e., we discuss how to find the neighbors of cndidte model S c by performing one high-level chnge opertion (cf. Def. 1) on j. The best kid S j kid is consequently the one with highest fitness vlue mong ll considered models. Regrding prticulr ctivity j, we consider three type of bsic chnge opertions: delete, move nd insert ctivity (cf. Section 5.1). The neighbor model resulting through deletion of n ctivity j N c cn be esily determined by removing j from the process model nd the corresponding order mtrix; furthermore, movement of n ctivity cn be simulted by its deletion nd subsequent re-insertion t the desired position. Thus, the bsic chllenge in finding neighbors of cndidte model is to pply one ctivity insertion such tht block

19 19 input : A process model S; collection of process vrints S i = (N i, E i,...), i = 1,..., n; llowed serch distnce d ; output: Resulting process model S AS = n 1 i=1 N i /* Define AS s ctive ctivity set */; 2 S c = S /* Define initil cndidte model */; 3 t = 1 /* Define initil serch step */ ; 4 while AS > 0 nd t d /* Serch condition */; 5 do 6 S sib = S c /* Set S c s initil S sib */ ; 7 Define s s the selected ctivity ; 8 forech j AS do 9 S kid = FindBestKid(S c) ; 10 if Fitness(S kid ) > Fitness(S c ) then 11 if Fitness(S kid ) > Fitness(S sib ) then 12 S sib = S kid ; 13 s = j ; 14 end 15 else 16 AS = AS \ { j } ; /* Best kid not better thn its prent */ 17 end 18 end 19 if Fitness(S sib ) > Fitness(S c) then 20 S c = S sib ; /* Initite next itertion */ ; 21 AS = AS \ { s} ; 22 else 23 brek ; 24 end 25 t = t+1 ; 26 end Algorithm 1: Heuristic serch lgorithm for vrint mining structuring nd soundness of the resulting model cn be gurnteed. Obviously, for prticulr ctivity j, the positions where we cn (correctly) insert it into cndidte model S c re the subjects of our interest. Inserting j t (correct) position within S c results in one neighbor model. Therefore, finding ll neighbors first requires finding ll vlid positions where we cn correctly insert j in S c. Fig. 10 provides one exmple. Given process model S, we would like to find ll process models tht my result when inserting ctivity X into S. We pply the following two steps to simulte the insertion of n ctivity: 1. First, we enumerte ll possible blocks the cndidte model S contins. A block cn be n tomic ctivity, self-contined prt of the process model, or the process model itself (cf Algorithm 2 for n lgorithm enumerting ll possible blocks of process model). Note tht the number of possible

20 20 cndidte blocks cn become very lrge; e.g., hundreds of potentil blocks my exist for process model contining 50 ctivities. 2. After hving determined ll blocks of the current model we now cn simulte ll possible insertions of ctivity X. For this purpose, we cn cluster X with ech block nd position it in reltion to this block, i.e., we cn set order reltion τ {0, 1,, } between X nd the selected block B (cf. Def. 6). This wy, we insert X to the respective position such tht it forms nother block together with B; or in nother word, we replce block B by nother block B which contins B nd X. Consequently, we obtin neighbor model S by inserting X into S. Following these two steps, we cn gurntee tht the resulting process model is sound nd block-structured. Every time we cluster n ctivity with block, we ctully dd this ctivity to the position where it cn form bigger block together with the selected one, i.e., we replce self-contined block of process model by bigger one. Consider our exmple from Fig. 10. Among the determined blocks, we cn find the sequentil block defined by ctivities C nd D (step 1). Then we cn cluster ctivity X with this block using order reltion τ = 0 for exmple (step 2). Consequently, we obtin S s one neighbor of S (cf. Fig. 10). In the following, we describe these two steps in detil. I G C D J H Enumerte blocks Blocks contining n ctivities n = 1 n = 2 Blocks {I}, {G}, {C}, {D}, {J}, {H} {C, D}, {J, H} Cluster X with block {C, D} by τ= 0 I C G D X J H S: process model n = 3 n = 4 {C, D, G} {I, C, D, G}, {C, D, G, H} S : one possible resulting model fter inserting ctivity X in S C D G H I J C 1 * D 0 * G * * H I J A S: Order mtrix of S Sme order reltions n = 5 n = 6 {I, C, D, G, J}, {C, D, G, J, H} {I, C, D, G, J, H} τ= 0 C D G H I J C D G H I J 0 X * * * X 1 1 * * * Copy of block {C,D} A S : Order mtrix of S Step 1: enumerte blocks ) b) Step 2: clustering X I C G D J H G I X J H C D I G X C D H J 56 potentil neighbors Cluster X with block {I, C, D, G, J, H} by τ= 1 Cluster X with block {G} by τ= * Cluster X with block {J, H} by τ= - c) Some exmple neighboring models by inserting X into S Fig. 10. Finding the neighboring models by inserting X into process model S

21 21 Step 1: Block-enumerting Algorithm We now present n lgorithm to enumerte ll possible blocks of process model S. Let S = (N, E,...) P be process model with N = { 1,..., n }. Let further A N N be the order mtrix of S. Two ctivities i nd j cn form block if nd only if [ k N \ { i, j } : A ik = A jk ] holds; i.e., two ctivities cn form block if nd only if they hve exctly sme order reltions to the remining ctivities. Consider our exmple from Fig. 10. Here ctivities C nd D cn form block since they hve sme order reltions to the remining ctivities G, H, I nd J. input : A process model S = (N, E,...) nd its order mtrix A output: A set BS with ll possible blocks 1 Define BS x be set of blocks contining blocks with x ctivities. x = (1,..., n); 2 Define ech ctivity i s block B, i = (1,..., n) ; 3 BS 1 = {B 1,..., B n }. /* initil stte */ ; 4 for i = 2 to n /* Compute BS i */; 5 do 6 let j = 1; let k = i; 7 while j k do 8 k = i - j /* A block contining k ctivities cn only be obtined by merging blocks contining i nd j ctivities */; 9 forech (B j, B k ) BS j BS k ; 10 merge = TRUE /* judge whether B j nd B k cn form block */; 11 do 12 if B j Bk = /* Disjoint? */ then 13 forech ( α, β, γ ) B j B k (N \ B j Bk ) do 14 if A αγ A βγ then 15 merge = FALSE /* two blocks con merge only if they show sme order reltions to the ctivities out side the two blocks */; 16 brek ; 17 end 18 end 19 else 20 merge = FALSE; 21 end 22 if merge = TRUE then 23 B p = B j Bk ; 24 BS i = BS i Bp ; 25 end 26 end 27 j = j + 1 ; 28 end 29 end BS = n x=1 BSx Algorithm 2: Block enumerting lgorithm

22 22 The block-enumerting lgorithm is depicted in Algorithm 2. Let us first define BS x s the set contining ll blocks comprising exctly x ctivities. In its initil stte, ech ctivity forms single block by its own (line 2) nd consequently we obtin BS 1 (line 3). The lgorithm strts by computing BS 2 (blocks contining 2 ctivities) nd continues itertively to compute BS i until it reches its upper boundry i = n. In ech itertion, we cn determine block contining x ctivities by merging two disjoint blocks contining j nd k ctivities respectively (i = j +k) (line 8). For exmple, block contining 2 ctivities cn only be obtined by merging two blocks of which ech contins 1 ctivity. Or we cn only obtin block contining 5 ctivities by merging two disjoint blocks contining either 1 nd 4 ctivities respectively or 2 nd 3 ctivities respectively (line 4-29). Lines 9 to 26 check whether or not two blocks B j nd B k cn be merged together. This is possible iff ny ctivities α B j nd β B k show sme order reltions to the remining ctivities outside the two blocks. Otherwise (lines 14-17), B j nd B k cnnot form block (i.e., merge = F ALSE). Until we obtin ll sets of blocks BS x with x = 1,..., n ctivities per block, we cn define set BS s BS = n x=1 BS x. BS consequently corresponds to ll blocks, process model S contins (line 28). In our context, we consider ech block s set rther thn process model, since structure of these blocks is lredy cler in S. 8 Consider the exmple from Fig. 10. For the given process model S, ll possible blocks re enumerted. For exmple, s ctivities C nd D show sme order reltions in respect to the remining ctivities in order mtrix A s, they my form block. Or, block {C, D} nd block {G} show sme order reltions in respect to remining ctivities H, I nd J; therefore they cn form bigger block {C, D, J}. As S contins 6 ctivities, its blocks re orgnized in 6 groups contining blocks of different sizes. Step 2: Cluster Inserted Activity with One Block In Step 1, we hve shown how to enumerte ll possible blocks for given cndidte model S c. Bsed on this, we describe where we cn insert prticulr ctivity j in S c such tht we obtin sound nd block-structured model gin. Assume tht we wnt to insert ctivity X in S (cf. Fig. 10). To ensure the block structure of the resulting model, we cluster X with n enumerted block, i.e., we replce one of the previously determined blocks B by bigger block B contining B s well s X. In the context of this clustering, we set the order reltion between block B nd inserted ctivity X s τ {0, 1,, } (see Def. 6), i.e., the order reltions between X nd ll ctivities of B re defined by τ. One exmple is given in Fig. 10b, where the inserted ctivity X is clustered with block {C, D} by order reltion τ = 0, i.e., we set X s successor of the 8 Worst-cse, the complexity of this lgorithm is 2 n where n corresponds to the number of ctivities. However, this worst-cse scenrio will only occur if ny combintion of ctivities my form block (like process model for which ll ctivities re ordered in prllel to ech other). During our simultion, in most cses we were ble to enumerte ll blocks of process model within few milliseconds. This indictes tht complexity is low in prctice.

23 23 sequence block contining C nd D. To relize this clustering, we hve to set the order reltions between X on the one hnd nd ctivities C nd D from the selected block on the other hnd to 0. Furthermore, order reltions between X nd the remining ctivities re the sme s for C nd D respectively. Afterwrds these three ctivities form new block {C, D, X} replcing the old one (i.e., {C, D}). This wy, fter inserting X into S, we obtin new sound nd block-structured process model S. Fig. 10b shows only one resulting model S which we obtin when inserting X in S. Obviously, S is not the only neighboring models in this context since we cn insert X t different positions in S; i.e., for ech block S enumerted in Step 1, we cn cluster it with X by ny one of the four order reltions τ {0, 1,, }. Regrding our exmple from Fig. 10, S contins 14 blocks. Consequently, the number of models tht my result when inserting X in S equls 14 4 = 56; i.e., we obtin 56 potentil models by inserting X into S. Fig. 10c shows some neighboring models of S. Note tht the 56 resulting models re not necessrily unique, i.e., it is possible tht some of them re sme. However, this is not n importnt issue in our context since our fitness function F it(s c ) cn be quickly computed. Therefore, some redundnt informtion will not significntly decrese the performnce of our heuristic serch lgorithm. 5.3 Serch Result for Our Running Exmple Regrding our exmple from Fig. 3, we now present the serch result we obtin when pplying our heuristics serch lgorithm. Fig. 11 does not only show the finl resulting model, but ll the intermedite process models discovered during the serch. Note tht in this scenrio, we do not set ny limittion on the number of serch steps, i.e., we llow the lgorithm to go s fr s possible to find the best reference model. Fig. 11 shows the evolution of the originl reference model S. The first opertion δ 1 = move(s, J,B, endf low) chnges S into intermedite result model R 1. According to Algorithm 1, R 1 constitutes tht neighbor model of S which cn be derived by pplying one vlid chnge opertion to S nd which shows highest fitness vlue in comprison to ll other neighbor models of S. Using R 1 s next input for our lgorithm, we discover process model R 2. In this context, chnge opertion δ 2 = insert(r 1, X, E, B) is pplied. Finlly, we obtin R 3 by performing chnge δ 3 = move(r 2, I,D,H) on model R 2. Since we cnnot find better process model by chnging R 3 nymore, we obtin R 3 s finl result. Note tht if we set constrints on llowed serch steps (i.e., we only llow to chnge originl reference model by mximl d chnge opertions), the finl serch result would be s follows: R d if d 3 or R 3 if d > 3. We further compre the originl reference model S nd ll (intermedite) serch results in Tble 2. We first show the fitness vlue of ll the models in Fig. 11. As our heuristic serch lgorithm is bsed on finding process models with better fitness vlues, we cn observe improvements of the fitness vlues with ech serch step. The fitness vlue F it(s) increses from (model S) to (model R 1 ), nd

24 1 24 = M o v e ( S, J, B, e n d F lo w ) S[ 1>R1 I E A E F B G J C D S: reference model A B J F G I C D H H E E I A X F C A X F C G D G B B D I J J H R3 : result fter three chnges (Finl result) H >R3 [ 3 R2 ) H, D I,, 2 ( R e v p M = 3 R1 : result fter one chnge R 1 [ 2 >R 2 2=Insert (R1, X, E, B) R2 : result fter two chnges Fig. 11. Serch result by every chnge opertions then to (model R 2 ). Finlly, it reches (model R 3 ). Though such fitness vlue is only resonble guessing of how good the result model is, the improvement of the fitness vlue t lest indictes tht discovered models is ssumed to get better in ech itertion. Still, we need to exmine whether or not the discovered process models re indeed getting better. We therefore compute the verge weighted distnce between the discovered model nd the vrints, which is precise mesurement in our context. From Tble 2, the improvement of verge weighted distnces fter pplying the bove chnges becomes cler, i.e., the verge weighted distnce drops monotoniclly from 4 (when considering model S) to 2.25 (when considering model R 3 ). Mesuring the verge weighted distnce shows tht for the given exmple, the lgorithm performs s expected. One importnt reson to design heuristic serch lgorithm in our context ws to be ble to only consider the most relevnt chnge opertions, i.e., the importnt chnges (reducing verge weighted distnce between reference model nd vrints most) should be discovered t beginning while the trivil ones should be either ignored or be put t the end (cf. Section 1). We therefore dditionlly evlute delt-fitness nd delt-distnce, which indicte the reltive S R 1 R 2 R 3 Fitness Averge weighted distnce Delt fitness Delt Distnce Tble 2. Serch result by every chnge

25 25 improvement of fitness vlues nd the reduction of verge weighted distnce for every itertion of the lgorithm. For exmple, the first chnge opertion δ 1 chnges S into R 1, nd consequently improves fitness vlue (delt-fitness) by nd reduces verge weighted distnce (delt-distnce) by 0.8. Similrly, δ 2 reduces verge weighted distnce by 0.6 nd δ 3 by It is obvious tht the delt-distnce is monotoniclly decresing s the number of chnge opertions increses. This indictes tht the importnt chnges re performed t beginning of the serch, while the less importnt ones re performed t the end. Another importnt feture of our heuristic serch is its bility to utomticlly decide on which ctivities shll be included in the reference model. A predefined threshold or filtering of the less relevnt ctivities in the ctivity set re not needed. In our exmple, X is utomticlly inserted, when Y nd Z re not. The only concern in our heuristic vrint mining is to reduce the verge weighted distnce, i.e., the three chnge opertions (insert, move, delete) re utomticlly blnced bsed on their influence on the reduction of verge weighted distnce. This is lso significnt improvement when compred to mny other process mining techniques in which preprocessing of trivil ctivities should be conducted before performing the mining [15, 38]. 5.4 Proof-of-Concept Prototype The described pproch hs been implemented nd tested using Jv. Figure 12 depicts screenshot of our prototype. We hve used our ADEPT2 Process Templte Editor [27] s tool for creting process vrints. For ech process model, the editor cn generte n XML representtion with ll relevnt informtion (like nodes, edges, blocks) being mrked up. We store creted vrints in vrints repository (cf. Fig. 12) which cn be ccessed by our mining procedure. Fig. 12. Screenshot of the prototype

26 26 The mining lgorithm hs been developed s stnd-lone Jv progrm, independent from the process editor. It cn red the originl reference model nd ll process vrints, nd it cn generte the result models ccording to the XML schem of the process editor. All intermedite serch results re lso stored nd cn be visulized using the ADEPT2 editor. ADEPT2 is next genertion dptive process mngement tool, which llows for the flexible execution of process instnces. Prticulrly, the ADEPT2 frmework enbles d-hoc chnges of single process instnces during runtime s well s chnges t the process type level nd their propgtion to running instnces if desired nd possible [26]. Bsed on the presented mining lgorithm, ADEPT is ble to provide full process lifecycle support. 6 Simultion Setup Clerly, using one exmple to mesure the performnce of our heuristic mining lgorithm is fr from being enough. Since computing the verge weighted distnce is t N P level, the fitness function, whose clcultion only needs polynomil time, is only n pproximtion of verge weighted distnce. Therefore, these two mesures cn NOT be correlted perfectly, i.e., it is not for sure tht the improvement of the fitness vlue will lwys result in deduction of verge weighted distnce. Therefore, the first question is how the fitness improvement (delt-fitness) correltes with the reduction of verge weighted distnce (deltdistnce)? Moreover, we wnt to nlyze whether the lgorithm cn scle up. Clerly, it will tke longer time to find the result if we hve to cope with lrge collection of vrints with dozens or even hundreds of ctivities, simply becuse the serch spce would be significntly lrger. Wht is more importnt to know is whether the performnce will lso chnge when fcing lrge models, i.e., does the correltion between delt-fitness nd delt-distnce depend on the size of the model? In ddition, we re interested in whether it is relly true tht the most importnt chnge opertions (i.e., the chnge opertions which lrgely reduce verge weighted distnce) re performed t beginning of the serch. If this is the cse, we do not need to fer too much when setting serch limittions or filtering out the chnge opertions performed t the end. Therefore, the third reserch question is s follows: To wht degree re the importnt chnge opertions positioned t the beginning of the serch steps? I.e., to wht degree re delt-fitness nd delt-distnce monotoniclly decresing s the number of chnge opertions increses? Finlly, we investigte whether we cn further improve the performnce of our heuristic vrint mining lgorithm using some other dt mining or rtificil intelligence techniques, i.e., we try to dopt the concept of pruning s commonly used in dt mining [36] nd rtificil intelligence [20]. In our context of vrint mining, we cn prune out the sitution when delt-fitness is not nicely correlted with delt-distnce, nd consequently improve the perfor-

27 27 mnce of our lgorithm by dpting our lgorithm to such sitution. Therefore, the lst reserch question is s follows: : How cn we improve the performnce of our heuristic mining lgorithm by dopting the concept of pruning? We try to nswer ll these questions using simultion, i.e., by generting thousnds of dt smples, we cn provide sttisticl nswer for these questions. To summrize, we use simultion to nswer the following reserch questions: 1. How much is delt-fitness correlted to delt-distnce in our heuristic lgorithm? 2. Cn the performnce of the lgorithm scle up? Tht is, does the correltion between fitness vlue nd verge weighted distnce depend on the size of models? 3. Is it relly true tht the importnt chnge opertions re performed t the beginning? Tht is, does improvement of the verge weighted distnce monotoniclly decreses s the number of chnge opertions increses? 4. Would it be possible to further prune the dt such tht we cn improve the performnce of our heuristic mining lgorithm? In this section, we describe how we setup the simultion, simultion results themselves re presented in Section 7. In totl, we creted 72 groups of dtsets bsed on different scenrios. Ech of these groups consists of 1 reference model nd 100 vrints configured out of it. In totl, we crete 7272 process models for nlysis. This section is orgnized s follows. Section 6.1 describes n lgorithm to generte rndom reference process model. Section 6.2 describes bsed on which prmeters we cn configure collection of vrints out of such rndomly creted reference model. Section 6.3 summrizes the considered scenrios nd describe how we dpt the prmeters when creting the respective vrints. Finlly, Section 6.4 shows how simultions re setup to mesure the performnce of our heuristic vrint mining lgorithm. 6.1 Generting the Reference Process Model Our generl ide of rndomly generting (block structured) reference model is to cluster blocks, i.e., we rndomly cluster ctivities (blocks) into bigger block nd this clustering continues itertively until ll the ctivities (blocks) re clustered together. The detil of our pproch is depicted in lgorithm 3. To illustrte how Algorithm 3 works, n exmple is given in Fig. 13. As input set of ctivities {A,B,C,D,E} re given, nd the gol is to construct vlid, block-structured process model S out of them. The lgorithm strts by considering ech ctivity i s bsic block B i, nd dding these blocks to set B (lines 1 nd 2). Regrding our exmple, B = {{A}, {B}, {C}, {D}, {E}}. The lgorithm first rndomly select two blocks B i, B j (lines 4) nd cluster them together with rndomly chosen order reltion τ (lines 5 nd 6). Regrding our exmple, blocks {B} nd {C} re selected to construct new block {B, C} with rndomly chosen order reltion 1 (which mens B precedes C). The newly

28 28 input : Set of ctivities i the process model to be generted should contin, i = (i,..., n) output: Vlid process model S 1 Define ech ctivity i s bsic block B i, i = (1,..., n); 2 Define set B := {B 1,..., B n} /* initil stte */ ; 3 while B > 1 do 4 rndomly selected two blocks B i, B j B ; 5 rndomly select n order reltion τ {0, 1,, } ; 6 build block B k which contins sub-blocks B i nd B j hving order reltion τ ; 7 8 B := B \ {B i, B j } ; B := B {B k } ; 9 end 10 S := B 0 with B 0 B Algorithm 3: Rndomly generting reference model creted block {B, C} will then replce blocks {B} nd {C} in the block set B, i.e., B = {{A}, {B, C}, {D}, {E}} (lines 7 nd 8). This procedure (lines 4-8) is repeted until block set B contins one single block B 0 (B 0 = {A,B,C,D,E} regrding our exmple). This block then represents our rndomly generted process model S. (line 10). Fig. 13 shows the process model we rndomly generted s well s the block constructed in ech itertion. Order reltion rndomly chosen 0 1 E 1 * B C E D A Result A B C 1 2 D 3 4 Fig. 13. Exmple of generting rndom process model In prctise, certin order reltions re used more often thn others. For exmple, the predecessor-successor reltion is used more frequently thn AND/XORsplits [46]. When rndomly generting process model, we therefore tke this into ccount s well. Rther thn rndomly setting the order reltion for two blocks, we set the probbility for choosing AND-split (τ = ) nd XORsplit (τ = ) to 10% respectively, while predecessor-successor reltionships (τ = {0, 1}) re chosen with probbility of 80%.

29 Prmeters Considered for Generting Process Vrints Tking generted reference process model, we control how vrints re configured by djusting specific prmeters. For exmple, these prmeters determine how mny chnge opertions needed to perform to configure prticulr vrint nd where ctivities should be moved or inserted to nd so forth. Bsiclly, we hve considered the following prmeters when generting the process vrints. 1. Prmeter 1 (Size of Process Models) The size of vrints (i.e., the number of its ctivities) cn potentilly influence results. Therefore, we need to check the behvior of our lgorithm when pplying them to vrints of different sizes. This is lso importnt to test the sclbility of our lgorithm, i.e., whether or not the correltion of delt-fitness nd delt-distnce depends on th size of the models. 2. Prmeter 2 (Similrity of Process Vrints) This prmeter mesures how close these vrints re, e.g., whether or not the vrints re similr to ech other. In this context, similrity mesures how difficult it is to configure one vrint into nother [17]. 3. Prmeter 3 (Activity Occurrence) One importnt feture of our heuristic mining lgorithm is its bility to utomticlly decide which ctivities should or should not be considered in the discovered model. Clerly, ctivities with low ctivity frequency (cf. Def. 4) my not be considered. We use this prmeter to perform detiled nlysis of this sitution. 4. Prmeter 4 (Activity Consistence) As ctivity frequency cn be quickly computed by scnning the ctivity sets of the vrints, it is lso importnt to know whether the position for one prticulr ctivity is consistent. e.g., if n ctivity is often inserted, we re lso interested in whether such ctivity is lwys inserted into prticulr position. This prmeter therefore helps us to exmine whether or not the position where ctivities re inserted or moved to cn influence our serch lgorithm. As prmeter 1 nd 2 re esy to understnd, we use Fig. 14 to provide further nlyze of prmeter 3 nd 4. Consider the sitution when n ctivity j is very often inserted when configuring vrints (high occurrence), nd its inserted position is lso very consistent (high consistency). The heuristic lgorithm therefore should hve high chnce to lso insert j in the reference model, since such insertion is very often used in configuring ech vrint (up-right prt of Fig 14). On the contrry, if n ctivity ppers in only few vrints (low occurrence) nd its positions in those vrints re lso chnging ll the time (low consistency), we therefore cn ignore such ctivity (down-left prt of Fig. 14) since such chnge does not repet often. The difficult prt is the rest two prts in the vlue spce (mrked with question mrk). It would be difficult to know whether we should insert one ctivity with high ctivity frequency but its positions re not very stble. Reson is tht even if we insert this ctivity in the reference model, we lso need to move it frequently since its position in the vrints re not stble, therefore inserting such ctivity is not necessrily led to the reduce of the verge weighted distnce. On the

30 30 1 Occurrence Drop Keep 0 Consistency 1 Fig. 14. Consistency nd occurrence contrry, if n ctivity does not pper often in the vrints, but its positions in these vrints re very consistent, it is lso difficult to determine whether or not we should insert this ctivity in the reference model. If we insert it, we need to often delete it since it does not shows up often in the vrints. Therefore, in the following subsections, we will explin how to simulte these situtions to cover the vlue spce in our simultion. 6.3 Methods for Generting Dt Sets When configuring the vrints bsed on given reference model, we vry the vlues of the prmeters described in subsection 6.2 nd consequently generte vrints bsed on different scenrios. The following choices re vilble for the different prmeters: Prmeter 1 (Size of Process Models) This prmeter controls how mny ctivities shll be contined in the originl reference model nd consequently control, in generl, the size of process vrints. There cn be three options: 1. Smll-sized reference models: contining 10 ctivities 2. Medium-sized reference models: contining 20 ctivities 3. Lrge-sized reference models: contining 50 ctivities In our simultion, we use the sme reference model for the groups contining process models of sme size. Reson is tht we wnt to void the influence of the rndomly generted reference model. According to [21], process models contining more thn 50 ctivities hve high risk of errors. therefore, it is not recommended to design such lrge model. Following this guideline, we lso set the lrgest size of process model for 50 ctivities in our simultion. Note tht the vrints my hve different ctivity sets thn the reference model since we lso employ insert nd delete opertions when configuring vrint. Prmeter 2 (Similrity of Process Vrints) The closeness between the vrints is mesured by the totl number of chnge opertions we pply when generting vrints (cf. Def. 2). Three possible choices exist:

31 31 1. Smll-chnge: 10% of ctivities re chnged 2. Medium-chnge: 20% of ctivities re chnged 3. Lrge-chnge: 30% of ctivities re chnged For exmple, for the dtsets comprising lrge-size process vrints (i.e., vrints with 50 ctivities), medium-chnge would men we need to perform 10 chnge opertions on the reference model to configure prticulr process vrint. This wy, we cn control the distnce between the reference model nd its vrint. And indirectly, we cn control the similrity between vrints since they re ll controlled in certin distnces with the reference model. Prmeter 3 (Activity Occurrence) We use Fig. 15 to illustrte how we configure prmeter 3. The X-xis represents list of ctivities nd the Y-xis shows their corresponding probbility being involved in process configurtions. Given reference model S, let j N, j = 1..., m be the ctivities in S. Bsed on prmeter 1 nd 2, we cn decide on by performing how mny chnge opertions we re ble to configure S into prticulr vrint S i. For exmple, if prmeter 1 is lrge-sized model (S contins 50 ctivities) nd prmeter 2 is lrge-chnge (chnge 30% of ctivities in S), we know tht we need to chnge 15 ctivities to configure one vrint out of S. Let x be the number of chnge opertions, we cn crete x new ctivities x, x = n n + x. Then the m + x ctivities constitute the X-xis in Fig. 15. Probbility for chnge x = number of chnge opertions m-x+1... m-1 m m+1... m+x stble moved inserted Activity Old New Fig. 15. Chnge occurrence for ctivities in the reference model The Y-xis therefore shows the probbility ech ctivity is involved in chnge opertions. In order to compre ctivities with different probbility being involved in chnge opertions, we ssign ech ctivity with different probbility. We use Gussin distribution to relize the difference. For ctivity j, j = (m x + 1)... (m + x), the probbility it involved in chnge opertions is j j 1 (j m) 2 1 2π e 2( x 3 )2 (i.e.,x (m, ( x 3 )2 ), which mens the expected vlue of the dis-

32 32 tribution is m with stndrd devition x 3. The probbility of chnging j is the integrl in the intervl [j-1,j) ). Tble 3 shows the probbility for the groups with prmeter 1 being smll-sized nd prmeter 2 being lrge-chnge. Note tht ctivity K, L nd M, re not in the originl reference model S. The purpose of using Gussin distribution here is only to simulte the sitution tht ctivities hve different probbility being involved in chnges. We do NOT ssume tht the probbility for ctivities being involved in chnge must follow Gussin distribution or ny other distributions. Activity A B C D E F G H I J K* L* M* Probbility Tble 3. The probbility for ech ctivity being involved in chnge opertions when configuring process vrints After we described how to simulte the probbility for ech ctivity being involved in chnges, now we describe wht kind of chnge opertions we consider on generting ech vrints. Since the originl reference model S only contin ctivities j, j = 1... m, if n ctivity j, j = m m + x re involved in process configurtions, we cn only insert such ctivity in S in order to configure prticulr vrint S i. For the ctivities j, j = m x m which lredy exist in the reference model S, we cn move such ctivity to nother position when configuring prticulr vrint S i. We ignored delete opertion when generting the dtset since it cn be esily hndled. 9 While how often n ctivity j is inserted when generting vrints cn be esily obtined by ctivity frequency g( j ), it is difficult to know how often ctivities re involved in move opertions becuse move opertion does not led to the chnge of the ctivity set nd structure chnges re difficult to identify. We design our simultion by considering both insert nd move opertion lso with the purpose of checking whether these two types of chnge opertions re considered eqully importnt by our heuristic mining lgorithm. Since the move nd insert opertions hve the sme probbility to be employed when generting dtset, we would lso expect the heuristic mining lgorithm cn lso pply reltively sme mount of chnge or insert opertions to discover reference model. If it is not the cse, it is n indiction tht the fitness function is not properly designed. Prmeter 4 (Activity Consistence) While prmeter 3 describe how often prticulr ctivity is chnged, prmeter 4 controls where these ctivities re chnged (moved or inserted) to. As we discussed in subsection 5.2, there re numerous options to insert n ctivity j into prticulr process model S c. j cn be clustered with ny block 9 Delete opertion is only not considered in generting dtset but is still considered in performing mining. Deleting ctivity j leds to the chnge of ctivity frequency g( j), which cn lso be relized by insert opertion.

33 33 in S c by one of the order reltion τ = {0, 1,, }. Since the number of blocks contined in process model is often lrge, the number of possible resulting models by inserting j in S c is lso lrge. Therefore, for prticulr group, we cn define severl chnge opertions s consistent chnge opertions, i.e., when n ctivity j is inserted into S c, it is lwys clustered with prticulr block by prticulr order reltion τ. Therefore, we cn control how frequent we perform the corresponding consistent chnge opertions to configure prticulr vrint S i. The first step is then to determine these consistent chnge opertions for prticulr group. For exmple, consider Fig. 15. For given reference model S, we only llow performing x chnges from ctivity set j, j = (m x+1)... (m+x) to configure prticulr process vrints S i. The remining ctivities, i.e., j, j [1, (m x)], re stble, i.e., will not be chnged. These ctivities re the suitble cndidte blocks to be cluster with. For ech ctivity j, j [(m x + 1), (m + x)], we cn find corresponding ctivity j, j [1, (m x)], so tht if we need to perform consistent chnge on j, it is lwys clustered with j by prticulr order reltion τ. For exmple, consider the cse we described in Tble 3. Activities A, B, C, D, E, F nd G will not be involved in chnge since their probbility for chnge ll equl to 0. Therefore, if we need to move H, I or J or to insert K, L or M, we lwys cluster them with one of the stble ctivities, e.g., if we need to move J to configure prticulr vrint by consistent chnge opertion, it will lwys be clustered with B by order reltion sying τ = 0. We therefore cn control ctivity consistence of ctivity j by setting how often consistent chnge opertion is performed. If j is to be chnge, we set the probbility to perform consistent chnge opertion to p, nd the probbility to perform rndom chnge opertion (i.e., rndomly select block nd cluster it with rndom order reltion) is consequently 1 p. In order to nlyze the reltion between the ctivity consistency nd ctivity occurrence, we hve designed four different scenrios to cover the spce s illustrted in Fig. 14. The scenrios re depicted in Fig. 16. These four scenrios re constructed by keeping either occurrence or consistency stble while chnging the others: 1. LowOcc scenrio keeps occurrence of ctivities ll t 30%, while mkes the consistency of these ctivities vrying from 0 to 80%. 2. HighOcc scenrio keeps the occurrence of ctivities ll t 70% while mking the consistency of them lso vrying from 0 to 80%. 3. LowCon scenrio keeps the consistency of ctivities ll t 30% while mking the occurrence chnge from 0 to 80%. 4. HighCon scenrio keeps the consistency of ctivities ll t 70% while mking the occurrence chnges from 0 to 80%. Note tht the scenrio is not describing prticulr ctivities but set of ctivities. For the group with prmeter 1 being lrge-sized nd prmeter 2 being lrge-chnge, 30 ctivities (15 for move nd 15 for insert) will be involved in configuring the 100 process vrints with different occurrence or consistency vlues. Therefore, one prticulr scenrio cn cover lrge spce in the vlue

34 34 Occurrence LowCon HighCon 70% HighOcc 30% LowOcc 30% 70% Consistency Fig. 16. Spce coverge using simple scenrios spce (cf. Fig. 16). This cn significntly enhnce our reserch since for ech scenrios, we hve collection of ctivities for nlysis. In order to better cover the vlue spce, we hve designed four dditionl scenrios. While the occurrence lwys follows the curve s depicted in Fig. 15, the consistency of the corresponding ctivities re depicted in Fig Positively correlted scenrio mkes the consistency of prticulr ctivity positively correlted to its occurrence, i.e., when ctivity j hs high occurrence, it lso hs high consistency. This pplies to both moved nd inserted ctivities. 2. Negtively correlted scenrio, on the contrry, mkes the consistency negtively correlted to the occurrence. When j hs high occurrence vlue, it should hve low consistency. This pplies lso to both moved nd inserted ctivities. 3. Focus on move scenrio ssigns high consistency to the moved ctivities while set low consistency to the inserted ctivities. 4. Focus on insert scenrio on the contrry ssigns high consistency to the inserted ctivities nd low consistency to the moved ctivities. Fig. 17 lso depicts the vlue spce s covered by ech scenrio. We dditionl design these scenrio not only to better cover the vlue spces, but lso to nlyze whether the insert nd move opertions re considered eqully importnt by the lgorithm, i.e., since insert nd move opertions re eqully used when generting the dtset, our heuristic mining lgorithm lso should not show significnt difference on move nd insertions when discovering the reference model. To sum up, this subsection describes how ech group of dtset re generted by djusting the vlues of different prmeters. Since we hve 3 options for prmeter 1 (smll-size, medium-size nd lrge-sized), 3 options for prmeter

35 1 2 - x m m +x 1 2 m -x m -1 m m +x o ns i ste n cy o ns i ste n cy x m m +x 1 2 m -x m -1 m m +x o ns i ste n cy o ns i ste n cy 35 Consistncy Consistncy 80% stble 80% m m m moved inserted Old New Positively correlted Occurrence Occurrence C C Consistncy Consistncy 80% stble 80% Old m moved inserted New Negtively correlted Occurrence Occurrence C C Activity stble m moved m m inserted stble moved m inserted Old Focus on move New Old New Focus on insert Fig. 17. Spce cover using complex scenrios 2 (smll-chnge, medium-chnge nd lrge-chnge) nd 8 scenrios to cover the vlue spce of occurrence nd consistency, we hve in turn generted = 72 groups of dtset contining 1 reference model nd 100 vrints configured from it. In the next subsection, we will describe wht informtion we cn obtin from ech group of dtset. 6.4 Simultion Setup For ech one of the 72 groups of dtset constructed bsed on different scenrios, we perform our heuristic mining to discover new reference model by mining the collection of vrints. We do not set ny constrints on serch steps, i.e., the lgorithm will only terminte when no better model cn be discovered. We used Dell Ltitude Lptop (2.4 GHZ CPU nd 3.5 GB RAM) to run our simultion under Windows environment. The following informtion re documented in ech group: 1. Originl reference model, i.e., the model bsed on which we perform the chnges (cf. subsection 6.1).

36 process vrints. Bsed on given reference model, we generte ech vrint by configuring the reference model ccording to the different scenrios s described in subsection 6.3. For ech group, we generte 100 process vrints. Note tht lthough the 100 vrints re generted by following sme scenrio, these models re NOT sme. The reson is tht the scenrio only depicted the feture of the collection of vrints but not prticulr vrint. 3. Serch results. We document ll the intermedite process models s well s the end serch result s obtined from the heuristic serch. The corresponding chnge opertions re lso documented. As exmple consider Fig. 11 s the heuristic serch result we cn obtin by mining the reference model S nd vrints S i from Fig Fitness nd verge weighted distnce. Similr to the evlution results we presented in Tble 2 of serch results (cf Fig. 11), we lso compute the fitness nd verge weighted distnce vlue of every intermedite process models s obtined from our heuristic mining. We dditionl documents delt-fitness nd delt-distnce in order to exmine the influence of every chnge opertion. 5. Execution time. At lst, we lso document The running time to perform our heuristic serch lgorithm. 7 Simultion Result In Section 6, we hve described how the simultion is setup nd wht reserch questions we wnt o nswer through the simultion, in this section we give the nswer to these reserch questions. In prticulr: 1. Subsection 7.1 will provide the bsic nlysis of the heuristic lgorithm, e.g., verge distnce reduction, running time etc. 2. Subsection 7.2 will show the correltion nlysis of delt-fitness nd deltdistnce. 3. Subsection 7.3 will compre the correltion vlues of process models with different size to nswer whether the performnce of our lgorithm cn scle up. 4. Subsection 7.4 will exmine whether nor not the importnt chnges re performed t the beginning. 5. Subsection 7.5 will provide method to trin threshold vlue to improve the performnce of our heuristic mining lgorithm. 7.1 Bsic Performnce Anlysis Improvement on verge weighted distnces In 60 (our of 72) groups, we re ble to discover new reference model different thn the originl one. The verge weighted distnce of the discovered model is shorter thn tht of the originl reference model, i.e., setting the discovered model s the

37 37 new reference model cn reduce on verge chnge opertions to configure the vrints. When compred to the verge weighted distnce of the originl reference process model, it is on verge 17.92% shorter. Number of Chnge nd Move opertions For the 60 (out of 72) groups which we re ble to discover different reference model, we perform in totl 284 chnge opertions, i.e., on verge 4.73 chnge opertions per group. Within the 284 chnge opertions, there re 132 insert opertions nd 152 move opertions. Clerly, we see no significnt difference between the number of insert or move opertions. which indictes tht the two opertions re well-blnced by our lgorithm. Reson is tht we performed reltively sme mount of insert or move opertions when generting the dt set (cf. subsection 6.3) nd such trend is lso shown during the discovery of the reference model. Running Time Clerly, the number of ctivities contined in the vrints cn significntly influence the execution time of our lgorithm. The serch spce is simply lrger for lrge models since the number of cndidte ctivities for chnge is higher nd the number of blocks contined in lrge model is lso higher. We therefore nlyze the execution time of our lgorithm by considering the size of process models. The verge running time is summrized in Tble 4. smll-sized medium-sized lrge-sized Averge serch time (s) Averge # of chnges performed Model sizes Tble 4. Averge serch time for process models of different sizes While it tkes only little time to discover the result model for the smllsized nd medium-sized models, it tkes considerble longer time (on verge seconds) to find the result for the lrge-sized model. It tkes long not only becuse the process models re lrger, but lso becuse the serch steps re longer s well. We need to perform on verge 8.43 chnge opertions on the originl reference model to discover the end result. Reson is tht the vrints in these groups re more different from ech other thn those in the smll-sized nd medium-sized model (the number of chnge opertions is determined by the model size, i.e., we chnge respectively 10%, 20% nd 30% of ctivities to configure vrint (cf. subsection 6.3)). The consequence is tht the discovered model could lso be more different from the originl model. However, we believe the running time is cceptble considering the complexity of the problem, especilly when compred with other dt mining problems which might tke hours or even dys to compute [36].

38 Correltion of Delt Fitness nd Delt Distnce One importnt issue we wnt to investigte is how fitness vlue is correlted with verge weighted distnce. As we discussed in Section 6, fitness vlue is only quick guess of how close cndidte model S c is to the collection of vrints, it is not s precise s the verge weighted distnce nd cn lso not perfectly correlted with the verge weighted distnce since computing it is n N P problem. In this subsection, we will nlyze how much they re correlted with ech other. Our heuristic serch lgorithm is best-first pproch, i.e., we serch whether we cn find process model with higher fitness vlue, therefore, it is more useful to mesure how much the delt-fitness (the difference between the fitness vlues before nd fter chnge) is correlted with the delt-distnce (the difference between the verge weighted distnces before nd fter chnge), becuse it is improvement of the fitness vlue (delt-fitness) tht guide the serch steps (cf. Section 5). Another reson not to directly nlyze the correltion of the fitness vlue nd the distnce vlue is their vlue rnges. While, the fitness vlue of model hs vlue rnge [0, 1], the verge weighted distnce hs vlue rnge [0, + ]. On the contrry, the delt-fitness nd the delt-distnce both hve vlue rnge [ 1, 1] since we only chnge one ctivity t time. Therefore, the correltion between the delt-fitness nd delt-distnce is more resonble to consider. Similr techniques for evluting fitness function is lso widely used in evluting other lgorithms [14]. Since every chnge opertion will led to prticulr chnge on the process model nd consequently cretes delt-fitness vlue nd delt-distnce vlue, we obtin 284 dt smples since we performed in totl 284 chnge opertions. We plot these dt smple in Fig. 18 s (delt-fitness, delt-distnce). For exmple, consider the serch result in Tble 2. We cn obtin three dt smples of delt-fitness nd delt-distnce from it, i.e., (0.171,0.8), (0.04,0.6) nd (0.017,0.25). In Fig. 18, the x-xis is the delt-fitness vlue nd the y-xis is the deltdistnce vlue. All delt-fitness re lrger thn 0. It is esy to understnd since the lgorithm would only perform chnge if nd only if it cn find model with higher fitness vlue. However, the delt-distnce is not lwys lrger thn 0, which indictes tht sometimes the chnge opertion cn mke the result even worse. It is not surprising to see this since the fitness vlue is only quick guessing of the distnce. We dditionl plot line with delt-distnce being 0 to seprte good smples (positive delt-distnce) nd bd smples (negtive delt-distnce). In Fig. 18, we lso mrked the dt smple from groups of different model sizes seprtely. It is cler tht these three groups form three different clusters, i.e., they re not overlpping too much with ech other nd the lrger the model size is, the more its cloud positions towrds the y-xis. This indictes tht the size of process model cn influence the delt-fitness vlue. Reson is tht the fitness vlue is mesured by the verge of how much ech pir of order reltions mtches ech other in the cndidte model nd in the vrints (cf. Formul 5),

39 39 Good Bd Fig. 18. Construct the serch tree therefore since one chnge opertion only chnge one ctivity, it would hve higher influence on the smll model compred to on the big ones. Therefore, it is more resonble to nlyze the correltion of the groups of different sizes seprtely. We use Person correltion to mesure the correltion between the deltfitness nd delt-distnce [35]. Let X be the delt-fitness nd Y be the deltdistnce. We obtin n dt smples (x i, y i ) where i = 1,..., n. Let x nd ȳ be the men of X nd Y, let s x nd s y be the stndrd devition of X nd Y. The Person correltion cn be computed using Formul 6. r xy = xi y i n xȳ (n 1)s x s y (6) We dditionl tested whether the correltions re significnt, i.e., whether the correltion coefficients re significntly different from 0 [35]. The results re summrized in Tble 5. Number of dt Correltion Probbility of r = 0 Significntly Correlted Smll-sized <1.0E-8 Yes Medium-sized <1.0E-8 Yes Lrge-sized <1.0E-8 Yes Tble 5. Delt fitness nd delt distnce correltions of groups with different sizes

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES

SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES MARCELLO DELGADO Abstrct. The purpose of this pper is to build up the bsic conceptul frmework nd underlying motivtions tht will llow us to understnd ctegoricl

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

Semistructured Data Management Part 2 - Graph Databases

Semistructured Data Management Part 2 - Graph Databases Semistructured Dt Mngement Prt 2 - Grph Dtbses 2003/4, Krl Aberer, EPFL-SSC, Lbortoire de systèmes d'informtions réprtis Semi-structured Dt - 1 1 Tody's Questions 1. Schems for Semi-structured Dt 2. Grph

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

Tool Vendor Perspectives SysML Thus Far

Tool Vendor Perspectives SysML Thus Far Frontiers 2008 Pnel Georgi Tec, 05-13-08 Tool Vendor Perspectives SysML Thus Fr Hns-Peter Hoffmnn, Ph.D Chief Systems Methodologist Telelogic, Systems & Softwre Modeling Business Unit Peter.Hoffmnn@telelogic.com

More information

Misrepresentation of Preferences

Misrepresentation of Preferences Misrepresenttion of Preferences Gicomo Bonnno Deprtment of Economics, University of Cliforni, Dvis, USA gfbonnno@ucdvis.edu Socil choice functions Arrow s theorem sys tht it is not possible to extrct from

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

UNIT 11. Query Optimization

UNIT 11. Query Optimization UNIT Query Optimiztion Contents Introduction to Query Optimiztion 2 The Optimiztion Process: An Overview 3 Optimiztion in System R 4 Optimiztion in INGRES 5 Implementing the Join Opertors Wei-Png Yng,

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

Improper Integrals. October 4, 2017

Improper Integrals. October 4, 2017 Improper Integrls October 4, 7 Introduction We hve seen how to clculte definite integrl when the it is rel number. However, there re times when we re interested to compute the integrl sy for emple 3. Here

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS

A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS Ji-Eun Roh (), Te-Eog Lee (b) (),(b) Deprtment of Industril nd Systems Engineering, Kore Advnced Institute

More information

Preserving Constraints for Aggregation Relationship Type Update in XML Document

Preserving Constraints for Aggregation Relationship Type Update in XML Document Preserving Constrints for Aggregtion Reltionship Type Updte in XML Document Eric Prdede 1, J. Wenny Rhyu 1, nd Dvid Tnir 2 1 Deprtment of Computer Science nd Computer Engineering, L Trobe University, Bundoor

More information

ISG: Itemset based Subgraph Mining

ISG: Itemset based Subgraph Mining ISG: Itemset bsed Subgrph Mining by Lini Thoms, Stynryn R Vlluri, Kmlkr Krlplem Report No: IIIT/TR/2009/179 Centre for Dt Engineering Interntionl Institute of Informtion Technology Hyderbd - 500 032, INDIA

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

Today. Search Problems. Uninformed Search Methods. Depth-First Search Breadth-First Search Uniform-Cost Search

Today. Search Problems. Uninformed Search Methods. Depth-First Search Breadth-First Search Uniform-Cost Search Uninformed Serch [These slides were creted by Dn Klein nd Pieter Abbeel for CS188 Intro to AI t UC Berkeley. All CS188 mterils re vilble t http://i.berkeley.edu.] Tody Serch Problems Uninformed Serch Methods

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

Eliminating left recursion grammar transformation. The transformed expression grammar

Eliminating left recursion grammar transformation. The transformed expression grammar Eliminting left recursion grmmr trnsformtion Originl! rnsformed! 0 0! 0 α β α α α α α α α α β he two grmmrs generte the sme lnguge, but the one on the right genertes the rst, nd then string of s, using

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

Tree Structured Symmetrical Systems of Linear Equations and their Graphical Solution

Tree Structured Symmetrical Systems of Linear Equations and their Graphical Solution Proceedings of the World Congress on Engineering nd Computer Science 4 Vol I WCECS 4, -4 October, 4, Sn Frncisco, USA Tree Structured Symmetricl Systems of Liner Equtions nd their Grphicl Solution Jime

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

a(e, x) = x. Diagrammatically, this is encoded as the following commutative diagrams / X

a(e, x) = x. Diagrammatically, this is encoded as the following commutative diagrams / X 4. Mon, Sept. 30 Lst time, we defined the quotient topology coming from continuous surjection q : X! Y. Recll tht q is quotient mp (nd Y hs the quotient topology) if V Y is open precisely when q (V ) X

More information

CSCI 446: Artificial Intelligence

CSCI 446: Artificial Intelligence CSCI 446: Artificil Intelligence Serch Instructor: Michele Vn Dyne [These slides were creted by Dn Klein nd Pieter Abbeel for CS188 Intro to AI t UC Berkeley. All CS188 mterils re vilble t http://i.berkeley.edu.]

More information

Computer-Aided Multiscale Modelling for Chemical Process Engineering

Computer-Aided Multiscale Modelling for Chemical Process Engineering 17 th Europen Symposium on Computer Aided Process Engineesing ESCAPE17 V. Plesu nd P.S. Agchi (Editors) 2007 Elsevier B.V. All rights reserved. 1 Computer-Aided Multiscle Modelling for Chemicl Process

More information

6.2 Volumes of Revolution: The Disk Method

6.2 Volumes of Revolution: The Disk Method mth ppliction: volumes by disks: volume prt ii 6 6 Volumes of Revolution: The Disk Method One of the simplest pplictions of integrtion (Theorem 6) nd the ccumultion process is to determine so-clled volumes

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

Stained Glass Design. Teaching Goals:

Stained Glass Design. Teaching Goals: Stined Glss Design Time required 45-90 minutes Teching Gols: 1. Students pply grphic methods to design vrious shpes on the plne.. Students pply geometric trnsformtions of grphs of functions in order to

More information

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications. 15-112 Fll 2018 Midterm 1 October 11, 2018 Nme: Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE 3.1 Scheimpflug Configurtion nd Perspective Distortion Scheimpflug criterion were found out to be the best lyout configurtion for Stereoscopic PIV, becuse

More information

CS 221: Artificial Intelligence Fall 2011

CS 221: Artificial Intelligence Fall 2011 CS 221: Artificil Intelligence Fll 2011 Lecture 2: Serch (Slides from Dn Klein, with help from Sturt Russell, Andrew Moore, Teg Grenger, Peter Norvig) Problem types! Fully observble, deterministic! single-belief-stte

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

pdfapilot Server 2 Manual

pdfapilot Server 2 Manual pdfpilot Server 2 Mnul 2011 by clls softwre gmbh Schönhuser Allee 6/7 D 10119 Berlin Germny info@cllssoftwre.com www.cllssoftwre.com Mnul clls pdfpilot Server 2 Pge 2 clls pdfpilot Server 2 Mnul Lst modified:

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

1 Quad-Edge Construction Operators

1 Quad-Edge Construction Operators CS48: Computer Grphics Hndout # Geometric Modeling Originl Hndout #5 Stnford University Tuesdy, 8 December 99 Originl Lecture #5: 9 November 99 Topics: Mnipultions with Qud-Edge Dt Structures Scribe: Mike

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

CSEP 573 Artificial Intelligence Winter 2016

CSEP 573 Artificial Intelligence Winter 2016 CSEP 573 Artificil Intelligence Winter 2016 Luke Zettlemoyer Problem Spces nd Serch slides from Dn Klein, Sturt Russell, Andrew Moore, Dn Weld, Pieter Abbeel, Ali Frhdi Outline Agents tht Pln Ahed Serch

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Integration. September 28, 2017

Integration. September 28, 2017 Integrtion September 8, 7 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my

More information

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an Scnner Termintion A scnner reds input chrcters nd prtitions them into tokens. Wht hppens when the end of the input file is reched? It my be useful to crete n Eof pseudo-chrcter when this occurs. In Jv,

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

Functor (1A) Young Won Lim 8/2/17

Functor (1A) Young Won Lim 8/2/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

The Distributed Data Access Schemes in Lambda Grid Networks

The Distributed Data Access Schemes in Lambda Grid Networks The Distributed Dt Access Schemes in Lmbd Grid Networks Ryot Usui, Hiroyuki Miygi, Yutk Arkw, Storu Okmoto, nd Noki Ymnk Grdute School of Science for Open nd Environmentl Systems, Keio University, Jpn

More information

Epson Projector Content Manager Operation Guide

Epson Projector Content Manager Operation Guide Epson Projector Content Mnger Opertion Guide Contents 2 Introduction to the Epson Projector Content Mnger Softwre 3 Epson Projector Content Mnger Fetures... 4 Setting Up the Softwre for the First Time

More information

Functor (1A) Young Won Lim 10/5/17

Functor (1A) Young Won Lim 10/5/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment File Mnger Quick Reference Guide June 2018 Prepred for the Myo Clinic Enterprise Khu Deployment NVIGTION IN FILE MNGER To nvigte in File Mnger, users will mke use of the left pne to nvigte nd further pnes

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Approximation by NURBS with free knots

Approximation by NURBS with free knots pproximtion by NURBS with free knots M Rndrinrivony G Brunnett echnicl University of Chemnitz Fculty of Computer Science Computer Grphics nd Visuliztion Strße der Ntionen 6 97 Chemnitz Germny Emil: mhrvo@informtiktu-chemnitzde

More information

Integration. October 25, 2016

Integration. October 25, 2016 Integrtion October 5, 6 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my hve

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Step-Voltage Regulator Model Test System

Step-Voltage Regulator Model Test System IEEE PES GENERAL MEETING, JULY 5 Step-Voltge Regultor Model Test System Md Rejwnur Rshid Mojumdr, Pblo Arboley, Senior Member, IEEE nd Cristin González-Morán, Member, IEEE Abstrct In this pper, 4-node

More information

such that the S i cover S, or equivalently S

such that the S i cover S, or equivalently S MATH 55 Triple Integrls Fll 16 1. Definition Given solid in spce, prtition of consists of finite set of solis = { 1,, n } such tht the i cover, or equivlently n i. Furthermore, for ech i, intersects i

More information

A Formalism for Functionality Preserving System Level Transformations

A Formalism for Functionality Preserving System Level Transformations A Formlism for Functionlity Preserving System Level Trnsformtions Smr Abdi Dniel Gjski Center for Embedded Computer Systems UC Irvine Center for Embedded Computer Systems UC Irvine Irvine, CA 92697 Irvine,

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yroslvsky. Fundmentls of Digitl Imge Processing. Course 0555.330 Lecture. Imge enhncement.. Imge enhncement s n imge processing tsk. Clssifiction of imge enhncement methods Imge enhncement is processing

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

Digital Design. Chapter 6: Optimizations and Tradeoffs

Digital Design. Chapter 6: Optimizations and Tradeoffs Digitl Design Chpter 6: Optimiztions nd Trdeoffs Slides to ccompny the tetbook Digitl Design, with RTL Design, VHDL, nd Verilog, 2nd Edition, by Frnk Vhid, John Wiley nd Sons Publishers, 2. http://www.ddvhid.com

More information

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation Alloctor Bsics Dynmic Memory Alloction in the Hep (mlloc nd free) Pges too corse-grined for llocting individul objects. Insted: flexible-sized, word-ligned blocks. Allocted block (4 words) Free block (3

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

Control-Flow Analysis and Loop Detection

Control-Flow Analysis and Loop Detection ! Control-Flow Anlysis nd Loop Detection!Lst time! PRE!Tody! Control-flow nlysis! Loops! Identifying loops using domintors! Reducibility! Using loop identifiction to identify induction vribles CS553 Lecture

More information

A Scalable and Reliable Mobile Agent Computation Model

A Scalable and Reliable Mobile Agent Computation Model A Sclble nd Relible Mobile Agent Computtion Model Yong Liu, Congfu Xu, Zhohui Wu, nd Yunhe Pn College of Computer Science, Zhejing University Hngzhou 310027, Chin cckffe@yhoo.com.cn Abstrct. This pper

More information

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1 Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

Essential Question What are some of the characteristics of the graph of a rational function?

Essential Question What are some of the characteristics of the graph of a rational function? 8. TEXAS ESSENTIAL KNOWLEDGE AND SKILLS A..A A..G A..H A..K Grphing Rtionl Functions Essentil Question Wht re some of the chrcteristics of the grph of rtionl function? The prent function for rtionl functions

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

MATH 2530: WORKSHEET 7. x 2 y dz dy dx =

MATH 2530: WORKSHEET 7. x 2 y dz dy dx = MATH 253: WORKSHT 7 () Wrm-up: () Review: polr coordintes, integrls involving polr coordintes, triple Riemnn sums, triple integrls, the pplictions of triple integrls (especilly to volume), nd cylindricl

More information

Unit 5 Vocabulary. A function is a special relationship where each input has a single output.

Unit 5 Vocabulary. A function is a special relationship where each input has a single output. MODULE 3 Terms Definition Picture/Exmple/Nottion 1 Function Nottion Function nottion is n efficient nd effective wy to write functions of ll types. This nottion llows you to identify the input vlue with

More information

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM Interntionl Journl of Innovtive Reserch in Advnced Engineering (IJIRAE) Volume1 Issue1 (Mrch 2014) A Comprison of the Discretiztion Approch for CST nd Discretiztion Approch for VDM Omr A. A. Shib Fculty

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

Revisiting the notion of Origin-Destination Traffic Matrix of the Hosts that are attached to a Switched Local Area Network

Revisiting the notion of Origin-Destination Traffic Matrix of the Hosts that are attached to a Switched Local Area Network Interntionl Journl of Distributed nd Prllel Systems (IJDPS) Vol., No.6, November 0 Revisiting the notion of Origin-Destintion Trffic Mtrix of the Hosts tht re ttched to Switched Locl Are Network Mondy

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-169 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information