A Data-centric Perspective for Workflow Model Management

Similar documents
On The Theoretical Foundation for Data Flow Analysis in Workflow Management

Correction of Data-flow Errors in Workflows

Analysis of BPMN Models

Data Flow Modeling and Verification in Business Process Management

Process Model Consistency Measurement

Business Process Modeling. Version 25/10/2012

Constructing User Interaction Behaviors Net from System Log Hua-Qiang SUN 1,a, Shu-Leng DONG 2,b, Bing-Xian MA 1,c,*

Workflow : Patterns and Specifications

MIT Sloan School of Management

Dealing with Artifact-Centric Systems: a Process Mining Approach

Research and Implementation of Event Driven Multi Process Collaboration Interaction Platform Xiang LI * and Shuai ZHAO

Adaptive Selection of Necessary and Sufficient Checkpoints for Dynamic Verification of Temporal Constraints in Grid Workflow Systems

Towards Automated Process Modeling based on BPMN Diagram Composition

Assisting Trustworthiness Based Web Services Selection Using the Fidelity of Websites *

Multi-dimensional database design and implementation of dam safety monitoring system

Generation of Interactive Questionnaires Using YAWL-based Workflow Models

Construction of BPMN-based Business Process Model Base

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Temporal Exception Prediction for Loops in Resource Constrained Concurrent Workflows

White Paper Workflow Patterns and BPMN

A Formal Model of Dataflow Repositories

Service-Based Realization of Business Processes Driven by Control-Flow Patterns

THE SELECTION OF THE ARCHITECTURE OF ELECTRONIC SERVICE CONSIDERING THE PROCESS FLOW

Leveraging Set Relations in Exact Set Similarity Join

Integration of UML and Petri Net for the Process Modeling and Analysis in Workflow Applications

On Capturing Process Requirements of Workflow Based Business Information Systems *

Automated Online News Classification with Personalization

Research and Design of Crypto Card Virtualization Framework Lei SUN, Ze-wu WANG and Rui-chen SUN

3.4 Data-Centric workflow

Open Access Research on the Prediction Model of Material Cost Based on Data Mining

NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology

Supporting Documentation and Evolution of Crosscutting Concerns in Business Processes

Business Process Modelling

Extending Enterprise Services Descriptive Metadata with Semantic Aspect Based on RDF

On Reusing Data Mining in Business Processes - A Pattern-based Approach

Seamless (and Temporal) Conceptual Modeling of Business Process Information

2017 International Conference on Economics, Management Engineering and Marketing (EMEM 2017) ISBN:

A Two Phase Verification Algorithm for Cyclic Workflow Graphs

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

IJESMR International Journal OF Engineering Sciences & Management Research

A PMU-Based Three-Step Controlled Separation with Transient Stability Considerations

An Improved Pre-classification Method for Offline Handwritten Chinese Character Using Four Corner Feature

Workflow-Centric Distribution of Organizational Knowledge: the Case of Document Flow Coordination

Graph Based Workflow Validation

Trust4All: a Trustworthy Middleware Platform for Component Software

Hierarchical Clustering of Process Schemas

Mining with Eve - Process Discovery and Event Structures

Detecting Approximate Clones in Process Model Repositories with Apromore

2. The Proposed Process Model of CBD Main phases of CBD process model are shown, in figure Introduction

Formal Modeling of BPEL Workflows Including Fault and Compensation Handling

Multi-path based Algorithms for Data Transfer in the Grid Environment

A Tagging Approach to Ontology Mapping

DLV02.01 Business processes. Study on functional, technical and semantic interoperability requirements for the Single Digital Gateway implementation

Making Business Process Implementations Flexible and Robust: Error Handling in the AristaFlow BPM Suite

User Interface Design: The WHO, the WHAT, and the HOW Revisited

ProcessGene Query a Tool for Querying the Content Layer of Business Process Models

VisoLink: A User-Centric Social Relationship Mining

Perspectives on User Story Based Visual Transformations

Conceptual Modeling and Specification Generation for B2B Business Processes based on ebxml

A Planning-Based Approach for the Automated Configuration of the Enterprise Service Bus

Business Process Management Seminar 2007/ Oktober 2007

Research on Design and Application of Computer Database Quality Evaluation Model

Flexab Flexible Business Process Model Abstraction

COMMIUS Project Newsletter COMMIUS COMMUNITY-BASED INTEROPERABILITY UTILITY FOR SMES

Design and Realization of Agricultural Information Intelligent Processing and Application Platform

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

A Design Method for Composition and Reuse Oriented Weaponry Model Architecture Meng Zhang1, a, Hong Wang1, Yiping Yao1, 2

Top-k Keyword Search Over Graphs Based On Backward Search

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

Petri-net-based Workflow Management Software

Final Project Report

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

Using Component-oriented Process Models for Multi-Metamodel Applications

Eliminating False Loops Caused by Sharing in Control Path

International Journal of Software and Web Sciences (IJSWS) Web service Selection through QoS agent Web service

Introduction to IRQA 4

Fault Class Prioritization in Boolean Expressions

INTRODUCING A MULTIVIEW SOFTWARE ARCHITECTURE PROCESS BY EXAMPLE Ahmad K heir 1, Hala Naja 1 and Mourad Oussalah 2

Web Service Usage Mining: Mining For Executable Sequences

Microsoft SharePoint Server 2013 Plan, Configure & Manage

Design and Implementation of unified Identity Authentication System Based on LDAP in Digital Campus

DATAFLOW ANALYSIS AND WORKFLOW DESIGN IN BUSINESS PROCESS MANAGEMENT

Managing test suites for services

Business Process Modeling. Version /10/2017

Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology

A Tool for Supporting Object-Aware Processes

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Incorporating Design Schedule Management into a Flow Management System*

UML Specification and Correction of Object-Oriented Anti-patterns

A Model Transformation from Misuse Cases to Secure Tropos

Business Intelligence & Process Modelling

Road Network Traffic Congestion Evaluation Simulation Model based on Complex Network Chao Luo

A Novel Image Classification Model Based on Contourlet Transform and Dynamic Fuzzy Graph Cuts

Support for development and test of web application: A tree-oriented model

Distributed Applications from Scratch: Using GridMD Workflow Patterns

A Finite State Mobile Agent Computation Model

Unstructured Data Migration and Dump Technology of Large-scale Enterprises

SQL Query Optimization on Cross Nodes for Distributed System

Mappings from BPEL to PMR for Business Process Registration

Transcription:

A Data-centric Perspective for Workflow odel anagement Zhiyong Liu niv. of Science and Technology of China - City niversity of Hong Kong Joint Research Center Suzhou, P.R. China liuzy@mail.ustc.edu.cn Research-in-Progress J. Leon Zhao Department of Information Systems, City niversity of Hong Kong Kowloon, Hong Kong jlzhao@cityu.edu.hk Harry Jiannan Wang Department of Accounting and IS, niversity of Delaware Newark, DE, SA hjwang@udel.edu Huaping Chen niversity of Science and Technology of China Hefei, P.R. China hpchen@ustc.edu.cn Abstract With increasingly widespread adoption of workflow technology as a standard solution to business process management, much effort is put on designing appropriate workflow models. The reuse of existing models has been recently suggested to support more efficient workflow design, which requires the storage, search and composition of workflow models. In this paper, we propose a new framework of workflow model management from a data-centric perspective, which provides unique features lacking in existing approaches based on control flow and function description. We develop a datacentric indexing method for organizing workflow models in the repository, design a flooding algorithm for searching workflow models, and present a composition approach for integrating different workflow models to meet specific user requirements. Keywords: workflow model management, model reuse, data flow modeling, data dependency, data-centric workflow composition Thirty Second International Conference on Information Systems, Shanghai 2011 1

Knowledge anagement and Business Intelligence Introduction With increasingly widespread adoption of workflow technology as a standard solution to business process management (BP), much effort is put on designing the appropriate workflow models that satisfy the business logic and constraints for the given company. Workflow models are designed to support complex process management in many business domains such as supply chain management, knowledge management and e-commerce (Stohr and Zhao 2001). Choosing the effective and efficient workflow models is a key factor of BP success. Workflow modeling plays a crucial role in BP, which captures the implicit process knowledge and domain knowledge in organizations. any modeling methods have been proposed and applied by both academic researchers and business practitioners,such as Petri net (urata 1989), L activity diagram (Dumas and ter Hofstede 2001), logic-based method (Bi and Zhao 2004), metagraph (Basu and Blanning 2000), BPN (Wohed et al. 2006), and data-flow-based approach (Sun et al. 2006). In particular, dataflow-based workflow modeling is a relatively new method, which designs the workflow models by analyzing the input and output data of tasks and their dependency relationships (Sun et al. 2006). Compared with the other workflow modeling methods, which require business users to provide plenty of information about specific processes, such as description of business function, sequence of tasks and structured representation of business rules, data-flow-based modeling method only needs the users to identify the input and output data in the processes, which is a easier task given those data items are usually contained in the documents they use on a daily basis. Besides starting from scratch, workflow models could also be developed based on existing similar models. odel reuse will certainly improve the quality and efficiency of business process design. Business practitioners have documented many workflow models in various business domains based on their knowledge and experiences, which are named Process Reference odels (PRs), such as SAP Process Reference odel and Oracle Best Practice Processes (Wang and Wu 2010). Besides the function-based reuse of reference models, researchers also introduced Workflow odel Patterns (or Workflow odel Components) as basic reusable units of models which can accomplish certain decomposed business functions (Zhuge 2003). This kind of model reuse usually mixes function-based reuse and structure-based reuse. In addition, case definitions at schema level and instance level, and case descriptions, can also be sources for reuse. Case repositories have been developed to support case-based model reuse (adhusudan et al. 2004). Nevertheless, little research has been done about workflow model reuse based on a data-centric approach, which takes the data flow structure of workflow models as the focus of model management. In this paper, we fill this research gap by proposing a framework of workflow model management, which consists of model storage, model search and model composition via a data-centric approach. We design the data-centric approach for workflow model management and evaluate the approach by applying it to a banking process and by comparing it with similar approaches. Our contributions are as follows: First, we define the formal data structure of a workflow model repository and propose a data-centric indexing method of workflow models based on data dependency relationships. Second, we propose a method for the match of user inquiries and candidate workflow models based on data similarity, and develop a flooding algorithm to search for model groups to satisfy user inquiry. Third, we develop a formal method to compose candidate groups of models to satisfy specific user requirements. Brief Literature Review We briefly survey the relevant works about data flow modeling methods and workflow model reuse. Data flow modeling is a relatively new workflow modeling method. Compared with traditional modeling methods that focus on the structure issues (Kiepuszewski et al. 2000), data flow is considered as an important complement to the specification of workflow. Sadiq et al. (2004) identify the importance of data flow issues to workflow research, and introduce the modeling, specification and validation of data flow. They also illustrate the essential requirements for data flow modeling and seven types of data anomalies. Russell et al. (2005) describe a series of workflow data patterns to show the different ways of representation and utilization of data flow in workflow system. The analysis of data flow is a useful tool for 2 Thirty Second International Conference on Information Systems, Shanghai 2011

Liu et al. / Data-centric Workflow odel anagement workflow verification, which has been used to validate the correctness of control flow (Sun et al. 2004). Sun et al. (2006) propose a formal data flow specification and develop algorithms to detect data anomalies based on data dependency. The data flow modeling method is further extended to support process integration (Guo et al. 2008). A formal workflow language based on Petri net and nested relational calculus has been proposed to support data flow modeling (Hidders et al. 2008). Reusing the existing models is an efficient and effective way to support model design. Wang and Wu (2010) propose a collaborative approach to integrate and manage process reference models (PRs). Their approach is based on spiral model for organizational knowledge creation and leverage Web 2.0 technologies, such as social tagging and classification to maintain PRs. Besides the reference models, the model patterns (or model components) are also reusable. Zhuge (2003) defines workflow component with four characteristics: independency, encapsulation, completeness and consistency. Thom et al. (2007) classify workflow patterns into nine categories, and Altintas et al. (2005) abstract workflows into several components according to their service functions. Cao et al. (2005) define workflows components, which consist of function, quality of service, control flow model and business domain. The model design cases are also a reusable resource, which includes more than control flow models but also case descriptions such as information on functions, agents and resource. adhusudan et al. (2004) propose a framework to reuse two categories of workflow cases, i.e., prototypical cases and instance cases. The indexing structure of the cases is a set of hierarchies, which includes functional, organizational, and task hierarchies. They also introduce a similarity flooding algorithm to support case retrieval. The similarity-based model matching is another important aspect of model reuse. adhusudan et al. (2004) support case retrieval, by matching the NAE properties of nodes in models through language processing and string matching techniques to get the initial similarities. The final similarities are calculated by iterative fix-point computation. Zhuge (2002) calculates the matching degree of two activities with activity-distance. The similarity degree of processes is calculated using matching degree of activities, numbers of activities and numbers of sub-processes of the processes. Xiao et al. (2009) define the overall similarity between a given workflow pattern and candidate Petri nets which is expressed as the weighted sum of the three similarities, i.e., vertices, arcs and paths similarities. A Data-centric Approach for Workflow odel anagement An Illustrative Example We will illustrate our approach using two simplified banking processes. A process model repository has been built for the bank to store the existing models to facilitate new workflow model design. The two example processes are shown in Figure 1 and 2 using extended L activity diagram as defined in (Sun et al. 2006). Transfer Application: After receiving a transfer application, the bank firstly verifies the application to ensure that the information is accurate, and checks the completeness to make certain that all the information needed is provided. When the status data items Application Verified and Application Complete are received, the Received Application data item is transformed to an Accepted Application. Then the password from the customer is checked before allowing the customer to access the Account Balance. If there is enough balance, the transfer will be executed and the transfer process finishes. Figure 1. Workflow odel of Transfer Application Thirty Second International Conference on Information Systems, Shanghai 2011 3

Knowledge anagement and Business Intelligence Figure 2. Workflow odel of Credit Payment Application Credit Payment Application: After Transaction Information and Authentication Information are received from the client,the bank accepts the credit payment application. Then the Credit History and Liquid Assets of the customer are checked to create a Credit Score. The information of this transaction and the credit score of the customer are analyzed to give an overall evaluation such as risk level and amount adjustment. Finally, the application with evaluation result is approved by the officer. A new workflow model of Loan Application process needs to be designed, which begins from receiving an application and ends with that application being approved. The requirement can be concluded as modeling a process that produces the data Application Approved with input data Received Application. Besides starting from scratch, we can reuse the two aforementioned models. The data structure of the models can be used as index. The candidate models are found by similarity-based searching. Considering the possible incapacity of individual models to fulfill the requirements, composition of candidate models is also supported. In following sections, we will explain our approach in detail, which facilitates datacentric model reuse. odel Storage Definition 1 (Workflow odel) A workflow model is denoted as W= {Description, Control-flow, Dataflow}: Description contains the business domain, function and information related to the application of the model; Control-flow describes the structure of the control flow, which consists of the task set and set of flows between tasks; Data-flow is a set of relationships between tasks and data, which contains tasks and their input and output data. The data structure of workflow models stored in the repository can be showed as follow: {odel Name W; odel Description; Control Flow Structure Г= { T, F}; Data Set D= {d i i=1, 2,, n }. Task-Data Relationship TD = {(t, D in-t, D out-t) t T }. } W is the unique identification of this model. odel Description records the information provided by model designers to describe the function of this model. Г is the control flow structure which consists of tasks and the flows between them. T is a finite set of tasks in W, T={t i i=1, 2,, m }, t is a specific task in T (t T). F is a finite set of the flows between tasks in T, F= (t i t j), t i T, t j T, i j. The data items which are used or produced in this model form a data set D. TD describes the relationship between input/output data and tasks, D in-t and D out-t are the sets of input and output data of task t. For example, the workflow model of Transfer Application shown in Figure 1 can be stored as follow: {odel Name: Transfer Application; odel Description: Receive, verify, check and accept transfer applications; Control Flow Structure: Г= {T= {t 1: Receive Application, t 2: Verify Application, t 3: Check Application Completeness, t 4: Accept Application, t 5: Check Password, t 6: Check Balance, t 7: Transfer}, F= {(t 1 t 2), (t 1 t 3), (t 2 t 4), (t 3 t 4), (t 4 t 5), (t 5 t 6), (t 6 t 7)}}; 4 Thirty Second International Conference on Information Systems, Shanghai 2011

Liu et al. / Data-centric Workflow odel anagement Data Set: D= {d 1: Received Application, d 2: Application Verified, d 3: Application Complete, d 4: Accepted Application, d 5: Password Correct, d 6: Account Balance, d 7: Adequate Balance, d 8: Transfer Finished}; Task-Data Relationship: TD = {(t 1, Φ, {d 1}), (t 2, {d 1}, {d 2}), (t 3, {d 1}, {d 3}), (t 4, {d 1, d 2, d 3}, {d 4}), (t 5, {d 4}, {d 5}), (t 6, {d 4, d 5}, {d 6, d 7}), (t 7, {d 4, d 7}, {d 8})}. } Definition 2 (Direct Data Dependency and Indirect Data Dependency) Data Direct Dependency relationship is defined as: a task cannot produce data B without data A as input, then, we call B directly depends on A, denoted as A->B. When A 1->A 2, A 2->A 3,, A n-1->a n (n=3, 4, ), we say A n indirectly depends on A 1, which is denoted as A 1->->A n. Definition 3 (Data Item Description) Data Item Description consists of optional description of the data item, dependency relationships and data-task relationship showing the tasks that use specific data as input or output. The data structure of data items can be showed as follow: {Data Name d; odel Name W; Data Description; Direct Dependency DP d = {(d->d i) d,d i D, i=1, 2, }; Indirect Dependency DP i = {(d->->d j) d,d j D, j=1, 2, }; Data-Task Relationship DT = {(d, T i, T o)}. } d is the identification of this data item. W is the name of the model that d belongs to. DP d and DP i are the sets of direct and indirect data dependency relationships of d respectively. DT records the tasks related to this data item d, T i and T o are the sets of tasks that use d as input and output respectively. In the Credit Payment Application example shown in Figure 2, the data item d c ( Accepted Application ) can be stored as follow: {Data Name: Accepted Application; odel Name: Credit Payment Application; Data Description: the information provided by the applicant is accepted and recorded; Direct Dependency: DP d = {(d c->d d), (d c->d e), (d c->d f), (d c->d g), (d c->d h)}; Indirect Dependency: DP i = {(d c->->d f), (d c->->d g), (d c->->d h)}; Data-Task Relationship DT = {(d c, {t 4, t 5, t 7, t 8}, {t 3})}. } A Data-centric Workflow odel Indexing ethod -- Data Dependency Structure Definition 4 (Data Dependency Structure) A Data Dependency Structure (DDS) is a mapping from the data flow structure of a specific workflow model. DDS consists of the data set and the direct dependency relationship between data, which is denoted as: DDS = {D, DP d}. (a) (b) Figure 3. Examples of Data Dependency Structure Thirty Second International Conference on Information Systems, Shanghai 2011 5

Knowledge anagement and Business Intelligence For example, in the model of Transfer Application,the data set D = {d 1, d 2, d 3, d 4, d 5, d 6, d 7, d 8 }, the direct dependency DP d = {(d 1->d 2), (d 1->d 3), (d 1->d 4), (d 2->d 4), (d 3->d 4), (d 4->d 5), (d 4->d 6), (d 4->d 7), (d 4- >d 8), (d 5->d 6), (d 6->d 7), (d 7->d 8)}. We represent the data items with ovals and indicate the direct dependency relationship with arrows. The DDS of this model is illustrated in Figure 3(a). From this visual index of this model, we can intuitively identify the direct and indirect dependency relationships. The data dependency structure of model of Credit Payment Application is created similarly, as shown in Figure 3(b). odel Search We show a sample model search requirement as follows. ser Inquiry: External Input ->->External Output Expected Result: atched individual odels or odel Groups The user describes the possible input and output data for the required model. For the Loan Application process, one search requirement can be expressed as {Received Application} ->-> {Application Approved}. The expected result could be an individual model which satisfies the user requirements or several models collectively fulfill the requirement after appropriate composition. According to the different types of expected results, we further classify the data-centric model matching into Individual odel atching and ultiple odels atching. Individual odel atching The user requirements for individual model matching are shown below: ser Inquiry: D in->->d out Expected Result: Individual odels contain d in->->d out, (d in D in, d out D out) D in (or D out) is a set of data items required by the user to be input (or output) of the model, d in and d out are data items, d in D in, d out D out. The user expects the target model to produce output data D out using input data D in. If an individual model satisfies this requirement, then the dependency relationship d in->->d out should exist in the data flow structure of this model. For example, the user requirement for individual model matching of the Loan Application process modeling can be expressed as follows: D in= {Received Application} and D out= {Application Approved}. The expected result is a set of models W= {w i}. The dependency relationship set of w i is denoted as w i.dp, which includes both direct and indirect dependencies in w i. The dependency required by user Received Application->->Application Approved should belong to w i.dp, which means the returned models are capable to produce Application Approved with input Received Application. ser Inquiry: {Received Application}->->{Application Approved} Expected Result: W={w i Received Application->->Application Approved w i.dp, i=1,2, } There may be more than one model that contains d in->->d out. The most appropriate result should be the one with the highest similarity with user requirement. By extending the definition of similarity measure defined in (Terversky 1977), we give the definition of similarity as follow: Definition 5 (Data Set Similarity) The data set described in the user requirement is denoted as D (D = D in D out), data set contained in a candidate model is denoted as D, data set similarity is defined as following: SI Data = D D D D + α D The data set similarity focuses on the proportion of the common elements between user data set and model data set. The variable α (α 0) denotes the weight of the data set that includes the data items in user requirements but not in the model. In practice, the variable α should be predefined. The initial value of α can be assigned with 1 which means that the common and uncommon data sets between user D 6 Thirty Second International Conference on Information Systems, Shanghai 2011

Liu et al. / Data-centric Workflow odel anagement requirement and model take the same importance. Different values can be assigned to α to search the one leads to best performance, which is beyond the scope of this paper. Definition 6 (Dependency Similarity) The data dependency relationship which is described in user requirements is denoted as DP, which is a set of dependencies like d in->->d out (d in D in, d out D out). We can simply express that as DP = D in->->d out. The dependency contained in a candidate model is denoted as DP (DP = DP d DP i). Dependency similarity is defined as following: SI Dep = DP DP DP DP + β DP DP Similar to the definition of data set similarity, dependency similarity focuses on the common part of dependency sets of user requirements and the candidate model. The variable β (β 0) denotes the weight of uncommon dependency set, and should be predefined in practice. Similar to α, β is assigned with an initial value of 1, and the most appropriate value can be obtained by experiments. We evaluate the similarity between user requirements and candidate models from two aspects: data set and data dependency. The overall similarity should be a synthesis of the above two, which is defined next. Definition 7 (Synthesized Similarity) Synthesized Similarity is the weighted sum of data set similarity and dependency similarity: SI Syn = γ SI + (1 γ ) SI Data For 0 γ 1, γ and 1-γ represent the weights of SI data and SI dep respectively, and should be predefined. We assign 0.5 to γ in practice, which means the importance of data set similarity and data dependency similarity is treated equally. In the Loan Application process modeling example, we assume the candidate individual model is Credit Payment Application model shown in Figure 2. The user requirement is described in Figure 6. The common data set between user requirements and the model data set is {Application Approved}, and the uncommon data set is {Received Application}. SI data= 0.5 (α=1). The user required dependency Received Application->->Application Approved is not contained in the model s dependency set, so the common dependency set is empty and SI dep=0. Then synthesized similarity between user requirement and the candidate model is o.25. (SI syn=0.5 0.5+0=0.25, γ=0.5) ultiple odel atching When individual model match returns no result or the similarities of returned models cannot reach the threshold, we can conduct additional search for model groups that may satisfy the requirements after composition. The user requirement for multiple models match is described as follow: ser Inquiry: D in->->d out Expected Result: odels Groups contain d in->-> ->->d k->-> ->->d out (i.e. A model set S={Wi i=1, 2,, n}, in which W1 contains dn->->dout, odel W2 contains dn-1->->dn,, odel Wn contains din->->d1, (din Din, dout Dout)) After finding a model W 1 which contains d out, and when the initial similarity between W 1 and user requirement exceeds a predefined threshold (denoted as a ), we scan W 1 to get a data set (denoted as D 1) where all data items are depended by d out. For every data item in this data set, e.g., d n, the second models W2 with a data item d n-1 which satisfies d n-1 ->-> d n can be derived. Then a new data set (denoted as D 2) from W 2 can be found where all data items are depended by the previous data item (d n). The above procedure can be repeated until d in appears in a new data set. Because the complexity of this procedure increases sharply when more models added to the group, a threshold of model group size should be predefined to limit the search space. When the number of models in a group exceeds the threshold and the requirement is still unsatisfied, the group should be abandoned with no result returned. We propose a Flooding Algorithm to support multiple model matching as following: Step 1: Search for the models that contain d out and SI data > a, name the returned model set as S 1, define a temporary data set as D t, D t = D out. Define a count of numbers of models as k, k=0; Step 2: For every model W i in S k (k=1,2, ), search for the data set D t+1, where for every d i D t+1 there exists d i->->d t (d t D t). If d i D in, add W i to the result model group G and finish the search, Dep Thirty Second International Conference on Information Systems, Shanghai 2011 7

Knowledge anagement and Business Intelligence Step 3: Repeat Step 2, until search finishes or k>=n. ( n is the predefined threshold to limit the size of a model group). odel Composition After identification of the candidate model groups, we need to compose the multiple models and delete the unnecessary parts. The candidate models are Transfer Application model (model I for short) and Credit Payment Application model (model II for short) shown in Figure 1 and 2. In this example, the user inquiry is {Received Application} ->-> {Application Approved}, which means with the input Received Application the composed model returned by search should produce Application Approved as output. In the model search phase, using the flooding algorithm, the dependency relationship Accepted Application->->Application Approved (d c->->d h) in model II is identified, d 4 in model I has the same description with d c, i.e., d 4=d c. Then the dependency relationships Received Application->->Accepted Application (d 1->->d 4) is identified which satisfies the ending condition of the model search. The resulting model should be a composition of the two models that contain d 1->->d 4 and d c->->d h respectively (d 4=d c). Definition 8 (Linking Node and Linked Node) Assume dependency d i->->d j (i j) exists in model W 1 and dependency d p->->d q (p q) exists in model W 2. If d j = d p, then the dependency d i ->-> d q can be achieved after composition of W 1 and W 2. We call d j and d p a Linking Nodes Pair. d j and d p are denoted as Link Node and Linked Node respectively. For example, d 4 is called link node and d c is called linked node in Figure 8. Figure 4. An Example of odel Composition A Data-centric odel Composition ethod: Prune: Taking the link nodes, e.g. d 4 in model I in Figure 11, and the external output node, e.g. d h in model II in Figure 11, as the ending nodes, we search the nodes that are not included in the dependency relationships of the ending nodes. Delete data items that are not depended by the output nodes or the link nodes, i.e., d 5, d 6, d 7, d 8, and the associated dependency relationships. Delete data items that are depended by linked nodes, i.e., d a, d b, and the associated dependency relationships. Delete the direct dependency relationships that end with linked nodes. Combine: After pruning the unnecessary data and dependency relationships, merge the linking nodes pair with the same description as a single node (d 4 and d c). Replace the link nodes and linked nodes in the dependency relationships with the merged node. 8 Thirty Second International Conference on Information Systems, Shanghai 2011

Liu et al. / Data-centric Workflow odel anagement Figure 5. Composition of Two Data Dependency Structures apping: The composition of the data dependency structure helps us to design a reference control flow model. We propose several principles: (1) After identifying the unnecessary data and data dependency relationships, we can identify the redundant tasks in the workflow. (2) The description of some tasks can be revised after composition which means some of these tasks output data are unnecessary and the associated work for producing these data can be omitted. (3) The structural relationships among remained tasks should be kept. (4) The data dependency relationship determines the execution sequences of tasks. (5) The reference model should be revised by incorporating domain knowledge to get the control flow model. Figure 6. The composed workflow model for the loan application process Conclusion and Future Research In this paper, we propose a new data-centric approach for storing, searching, and composing workflow models. We demonstrate and evaluate our approach via an illustrative example. We are working on several issues to improve on the preliminary results presented in this paper. First, we will refine the proposed approach by fine-tuning some concepts and variables, such as the synthesized model similarity, similarity weight variables, and the computational complexity of key algorithms. Second, we plan to further evaluate our approach in a number of ways, such as a comprehensive comparison with existing approaches, developing a prototype system, and conducting user experiments to test the usability, efficiency, and effectiveness of our approach. Third, we are looking into semantics of data item descriptions and plan to leverage concepts in semantic webs to enhance the model matching procedure presented in this paper. Acknowledgement: This work was partially supported by an SRG grant from the City niversity of Hong Kong (No. 7008121). Thirty Second International Conference on Information Systems, Shanghai 2011 9

Knowledge anagement and Business Intelligence References Altintas, I., Birnbaum, A., Baldridge, K.K., Sudholt, W., iller,., Amoreira, C., Potier, Y., and Ludaescher, B. 2004. "A Framework for the Design and Reuse of Grid Workflows," Scientific Applications of Grid Computing. Beijing, China: Springer, pp. 120-133. Basu, A., and Blanning, R.W. 2000. "A Formal Approach to Workflow Analysis," Information Systems Research (11:1), ar 2000, pp. 17-36. Basu, A., and Blanning, R.W. 2003. "Synthesis and Decomposition of Processes in Organizations," Information Systems Research (14:4), Dec 2003, pp. 337-355. Bi, H.H., and Zhao, J.L. 2004. "Applying Propositional Logic to Workflow Verification," Information Technology and anagement (5:3), pp. 293-318. Cao, J., ou, Y., Wang, J., Zhang, S., and Li,. 2005. "A Dynamic Grid Workflow odel Based on Workflow Component Reuse," Grid and Cooperative Computing-GCC 2005: Springer, pp. 424-429. Dumas,., and ter Hofstede, A. 2001. "L Activity Diagrams as a Workflow Specification Language," «L» 2001 The nified odeling Language. odeling Languages, Concepts, and Tools, Toronto, Canada: Springer, pp. 76-90. Guo, X.., Sun, S.X., and Vogel, D. 2008. "A Data Flow Perspective for Business Process Integration," International Conference on Information Systems (ICIS 2008), Paris. Hidders, J., Kwasnikowska, N., Sroka, J., Tyszkiewicz, J., and Van den Bussche, J. 2008. "DFL: A Dataflow Language Based on Petri Nets and Nested Relational Calculus," Information Systems (33:3), pp. 261-284. Kiepuszewski, B., ter Hofstede, A., and Bussler, C. 2000. "On Structured Workflow odelling," Conference on Advanced Information Systems Engineering (CAiSE 2000). Springer, pp. 431-445. Kumar, A., and Zhao, J.L. 1999. "Dynamic Routing and Operational Controls in Workflow anagement Systems," anagement Science (45:2), pp. 253-272. adhusudan, T., Zhao, J.L., and arshall, B. 2004. "A Case-Based Reasoning Framework for Workflow odel anagement," Data and Knowledge Engineering (50:1), pp. 87-115. urata, T. 1989. "Petri Nets: Properties, Analysis and Applications," Proc. of IEEE (77:4), pp. 541-580. Russell, N., ter Hofstede, A., Edmond, D., and van der Aalst, W. 2005. "Workflow Data Patterns: Identification, Representation and Tool Support," Conceptual odeling - ER 2005. Klagenfurt, Austria: Springer, pp. 353-368. Stohr, E.A., and Zhao, J.L. 2001. "Workflow Automation: Overview and Research Issues," Information Systems Frontiers (3:3), Sep, pp. 281-296. Sadiq, S., Orlowska,., Sadiq, W., and Foulger, C. 2004. "Data Flow and Validation in Workflow odelling," Proceedings of the 15th Australasian database conference, pp. 207-214. Sun, S.X., Zhao, J.L., and Sheng, O.R.L. 2004. "Data Flow odeling and Verification in Business Process anagement," Americas Conference on Information Systems (ACIS 2004), New York. Sun, S.X., Zhao, J.L., Nunamaker, J.E., and Sheng, O.R.L. 2006. "Formulating the Data-Flow Perspective for Business Process anagement," Information Systems Research (17:4), pp. 374-391. Thom, L.H., Lau, J.., Iochpe, C., and endling, J. 2007. "Extending Business Process odeling Tools with Workflow Pattern Reuse," International Conference on Enterprise Information Systems. Tversky, A. 1977. "Features of Similarity," Psychological Review (84:4), Jul 1977, pp. 327-352. van der Aalst, W..P. 1998. "The Application of Petri Nets to Workflow anagement," Journal of Circuits Systems and Computers (8:1), pp. 21-66. Wang, H.J., and Wu, H. 2010. "Supporting Process Design for E-Business via an Integrated Process Repository," Information Technology and anagement. Wohed, P., Van der Aalst, W., Dumas,., Ter Hofstede, A., and Russell, N. 2006. "On the Suitability of Bpmn for Business Process odelling," Business Process anagement (BP 2006), Vienna, Austria Xiao, L., Zheng, L., Xiao, J., and Huang, Y. 2009. "A Graphical Query Language for Querying Petri Nets," Information Systems: odeling, Development, and Integration. Springer, pp. 514-525. Zhuge, H. 2002. "A Process atching Approach for Flexible Workflow Process Reuse," Information and Software Technology (44:8), pp. 445-450. Zhuge, H. 2003. "Component-Based Workflow Systems Development," Decision Support Systems (35:4), pp. 517-536. 10 Thirty Second International Conference on Information Systems, Shanghai 2011