Redo Log Process Mining in Real Life: Data Challenges & Opportunities


E. González López de Murillas 1, G.E. Hoogendoorn 1, and H.A. Reijers 1,2

1 Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
2 Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
e.gonzalez@tue.nl, g.e.hoogendoorn@student.tue.nl, h.a.reijers@tue.nl

Abstract. Data extraction and preparation are the most time-consuming phases of any process mining project. Due to the variability of the sources of event data, they remain highly manual processes in most cases. Moreover, it is very difficult to obtain reliable event data in enterprise systems that are not process-aware. Some techniques, like redo log process mining, try to solve these issues by automating the process as much as possible, and by enabling event extraction in systems that are not process-aware. This paper presents the challenges faced by redo log and traditional process mining, comparing both approaches at the theoretical and practical levels. Finally, we demonstrate that the data obtained with redo log process mining in a real-life environment is at least as valid as that extracted by the traditional approach.

Key words: Process Mining, Databases, Redo Logs, Event Logs, Data Quality.

1 Introduction

Data extraction and preparation are among the first steps to take in any business intelligence or data analysis project. In many cases, up to 80% of the time and effort, and 50% of the cost, is spent during the data extraction and preparation phases [1]. This is due to the fact that the original sources of data come in great variety, differing in structure depending on the nature of the application or process under study. The standardization of this phase represents a challenge, given that a lot of domain knowledge is usually required to carry it out.
It is because of this that most of the work is done by hand, in an ad-hoc fashion, requiring many iterations in order to obtain the proper data in the right form. In process mining the situation is not much different. Studies have been carried out focusing on SAP [2, 3, 4], or on ERPs in general [5]. Also, efforts have been made to achieve a certain degree of generalization with the tool XESame [6], which assists in the task of defining mappings between database fields on the one side, and events, traces, and logs on the other. However, these solutions, which we refer to as part of the classical or traditional approach, are tightly coupled to the specific IT system or data schema they were designed to analyze. Moreover, they do not support the extraction of event data from systems that are non-process-aware and do not explicitly record historical information. For this reason, other techniques exist that try to leverage the existence of alternative sources of data. A very promising approach is redo log process mining [7]. Most modern relational database management systems (RDBMSs) implement different mechanisms to ensure consistency and fault tolerance. One of these mechanisms is redo log recording, which

consists of a set of files in which database operations are recorded before being applied to the actual data. This makes it possible to roll back the state of the database to previous points in time, undoing the last operations recorded in the redo log files. Redo log process mining exploits the information stored in database redo log files in order to obtain event data. This event data can be analyzed to understand the behavior of processes interacting with the database. One of the benefits of this approach is its independence from the specific application or process in execution, being able to extract behavioral information from both process-aware and non-process-aware systems. Also, the event extraction is carried out automatically, without the need for domain knowledge about how to build events from database tables, as is the case in the traditional approach. However, the prerequisites of this approach are that (a) the redo log system needs to be explicitly configured and enabled in order to record the events, and that (b) special database privileges are required to be able to read the content of the redo log files from the RDBMS. With respect to the traditional and redo log process mining approaches, we face two main questions. (1) Is redo log process mining feasible in a real-life environment? (2) Are the results of both approaches comparable in terms of data quality? Based on our intuition and experience with sample datasets, we propose the following hypothesis: the data obtained by the redo log process mining approach is at least as rich as the data obtained by traditional methods. The goal of this paper is to answer these questions and find support for this hypothesis by comparing the results of both process mining approaches on a real-life dataset. The content of this paper is based on the work developed in [8] as part of one of the authors' Master projects. The remainder of this paper is organized as follows.
First, Section 2 provides some background on the event data extraction techniques about to be compared. After that, a theoretical comparison is presented in Section 3. Then, Section 4 proposes the practical comparison, introducing the business case, explaining the execution of the data extraction, and showing the results. Section 5 compares the results of the application of both approaches, discussing their validity and equivalence. Finally, Section 6 presents the conclusion of this paper.

2 Background

We want to compare two approaches for event data extraction: traditional and redo log process mining. These two approaches differ with respect to the source of data, as well as the procedure they follow to extract it. This section provides some background on the particularities of both approaches, explaining the process to follow for their application, while focusing on the data extraction and processing stages.

2.1 Traditional Process Mining

In traditional process mining, event logs are constructed from the plain files or the database tables of the IT system under study. The main event attributes (activity name, case id, timestamp, etc.) are identified by hand, making use of domain knowledge, and extracted in order to build an event log. This is a rather laborious task, as described in the procedure in [9], but very common during the first stages of a process mining project. In some scenarios, data is obtained directly from the original IT systems that drive the process being analyzed. On other occasions, data has already been preprocessed and gathered in data warehouses or similar systems, somewhat alleviating the data extraction issue. In these cases, the work of extracting and processing the data cannot be avoided altogether; it must be performed in a previous phase. The complexity of the task is tackled before the analysis

is done, but the decisions made during the data warehouse design can dramatically affect the kind of analysis that can be performed on the resulting data. In order to apply process mining techniques, it is necessary to have access to event data that includes, at least, timestamps, activity names, and case identifiers. However, not all the data models of data warehouses guarantee that these aspects of the data are being preserved. In order to ensure that enough information is being collected, process-aware meta models like the one proposed in [10] can be adopted. Regardless of the location of the data, it is necessary to obtain a valid event log in order to do the process mining analysis. Different methodologies exist in the literature that describe the steps to take in a process mining project. For our purpose, we decided to focus on the process mining methodology PM2 [11], a recent methodology that covers all the stages in the life cycle of a process mining project, and which has been verified in a real-life environment. PM2 divides the project into six stages: planning, extraction, data processing, mining & analysis, evaluation, and process improvement & support. Given that we are interested in obtaining an event log, we focus on the first three stages: planning, extraction, and data processing. Each of these stages has sub-steps, as described in Table 1.

Table 1: First three steps of the PM2 process mining methodology.

  Stage 1: Planning               Stage 2: Extraction              Stage 3: Data Processing
  Selecting business processes    Determining scope                Creating views
  Identifying research questions  Extracting event data            Aggregating events
  Composing project team          Transferring process knowledge   Enriching logs
                                                                   Filtering logs

In traditional process mining, these three stages are carried out manually by the analyst or the process mining team.
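In practice, the extraction stage of the traditional approach often boils down to a hand-written SQL query over the application's tables. The following is a minimal sketch using an in-memory SQLite database; the table and column names are hypothetical and stand in for whatever schema the system under study happens to use:

```python
import sqlite3

# Sketch of the traditional extraction: a hand-written query turns rows
# of a history table into events with a case id, activity, and timestamp.
# The table and column names here are hypothetical.
def extract_event_log(conn):
    rows = conn.execute(
        """SELECT ticket_id, state, change_time
           FROM ticket_history
           ORDER BY ticket_id, change_time"""
    ).fetchall()
    return [{"case": t, "activity": s, "timestamp": c} for t, s, c in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket_history (ticket_id, state, change_time)")
    conn.executemany(
        "INSERT INTO ticket_history VALUES (?, ?, ?)",
        [(1, "StateUpdate", "2016-06-17 10:30"),
         (1, "NewTicket", "2016-06-17 09:00"),
         (2, "NewTicket", "2016-06-18 08:15")],
    )
    for event in extract_event_log(conn):
        print(event)
```

Note that the ORDER BY clause already encodes the case notion and event ordering; choosing both is exactly where the domain knowledge discussed below comes in.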
Usually, these stages require substantial domain knowledge to define the business questions, select the right database tables, determine the case notion, and include interesting event and case attributes, among other tasks. This domain knowledge is often obtained through interviews with the process owners and users. The data is usually retrieved from database tables by executing SQL queries to build the events and, finally, extract the event logs. However, the quality of the event data that can be obtained is constrained by the existence of historical information, timestamps, status changes, modifications, additional attributes, etc. As has been noted before, the structure of the data model strongly determines the usefulness of the resulting event logs. Other event data retrieval techniques, such as redo log process mining, try to mitigate these issues by exploiting the historical data automatically recorded by the database systems. Section 3 presents some of the challenges to face with the traditional process mining approach, and compares them to the ones faced by redo log process mining.

2.2 Redo Log Process Mining

Redo log process mining is a more automatic technique than the traditional approach. It requires less domain knowledge, and is independent of the system under study. It tries to exploit the execution information stored in database redo logs in order to extract event data. The database redo log system is a functionality of database management systems that, in order to ensure consistency and fault tolerance, records all the data modification actions executed on the database before they are actually applied. Generally, a set of files is configured to store the redo logs. The RDBMS stores the actions in the redo log files and, when a file is full, it moves on to the next file. When all the redo log files are full (according to a specific maximum size), the first file of the set is overwritten.
This means that, under a default setting, only a recent window of events can be retrieved from the redo logs. However,

database systems usually allow archiving the completed redo log files in a separate location for subsequent analysis. This is a crucial aspect to take into account in order to collect enough data to perform a meaningful analysis. In general, any modification action on the database is recorded in the redo logs. This means that we can not only observe insert, update, and delete operations performed on every piece of data, but also modifications of the data schema, transactions, rollback and commit operations, etc. The main advantage of this technique is that it makes it possible to analyze systems that are not process-aware and do not explicitly record any execution information. Also, deleted data, no longer present in the database, can be recovered from the redo logs. This has great value from the forensic audit point of view. On the other hand, the technique presents challenges in terms of data availability, permissions, and performance. The following section explores some of these difficulties and compares them to the ones faced by traditional process mining.

3 Theoretical Comparison

In the previous sections we have described the fundamentals of both the traditional and redo log process mining approaches. In this section we point out the main differences from a theoretical point of view, clarifying the challenges to face in order to apply either technique in a process mining project.
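As a concrete reference point for the redo log side of this comparison, Oracle exposes redo records through its LogMiner interface (the DBMS_LOGMNR package and the V$LOGMNR_CONTENTS view). The sketch below only composes the statements such an extraction session might run; the archived file name is hypothetical, and actually executing these statements requires the special privileges discussed later:

```python
# Sketch: the statements an Oracle session might run to read redo
# records through LogMiner. The file name is hypothetical; elevated
# privileges are needed, which is the "special privileges" requirement
# discussed in the text.
def logminer_statements(logfile):
    return [
        "BEGIN DBMS_LOGMNR.ADD_LOGFILE("
        f"LOGFILENAME => '{logfile}', OPTIONS => DBMS_LOGMNR.NEW); END;",
        "BEGIN DBMS_LOGMNR.START_LOGMNR("
        "OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG); END;",
        # Every row returned here is a candidate event: an operation on
        # a table, with a timestamp and the redo SQL itself.
        "SELECT SCN, TIMESTAMP, OPERATION, TABLE_NAME, SQL_REDO "
        "FROM V$LOGMNR_CONTENTS",
        "BEGIN DBMS_LOGMNR.END_LOGMNR; END;",
    ]

if __name__ == "__main__":
    for stmt in logminer_statements("/u01/archive/redo_0001.arc"):
        print(stmt)
```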
Table 2: Requirements for the traditional and redo log process mining approaches.

  Aspect                               Traditional PM         Redo Log PM
  Data elements:
    Timestamps                         Required               Guaranteed
    Case notion                        Required               Required
    Activity names                     Required               Guaranteed
  Technical aspects:
    Event recording                    Application dependent  Automatic
    Completeness of data               Desirable              Desirable
    DB read access                     Required               Required
    Special privileges                 Not required           Required
    Snapshot of DB                     Desirable              Required

Table 2 shows the requirements of both approaches with respect to the availability of data elements, and some technical aspects to take into account. For each data element, the approaches present different levels of exposure. Something is required when it must be explicitly recorded and available in the database tables. If it is guaranteed, this means that it is assured to be available, regardless of the data schema or the application under study. With respect to the technical aspects, something is required when it must be available at extraction time. If it is automatic, this means that it is guaranteed to be available. Desirable means that it will positively affect the data quality, but is not critical for the technique to work. An aspect is application-dependent when its availability depends on the application under study; therefore, some uncertainty exists. Finally, an aspect is not required if it is not necessary for the technique to work and, in fact, will not affect the quality of the data. We discuss these elements in more detail below.

3.1 Data Elements

Looking at the top part of Table 2, we can identify the data elements that are needed to extract event logs at all, and we can see how differently these approaches obtain them. The

presence of timestamps, a case notion, and activity names is required by the traditional approach. This means that these elements must be recorded by the application and be available in the database tables at the moment of extraction. This represents the first and most important challenge to face in a process mining project. Without these three elements, we cannot construct events and, therefore, no event log. If these elements are not explicitly available, we cannot apply the traditional approach, and must find different ways to obtain events. Redo log process mining has a partial solution for this situation. Thanks to the automatic recording of redo logs by the RDBMS, we can automatically obtain database events, which contain timestamps, activity names, and, implicitly within the data, one or several case notions.

3.2 Technical Aspects

With respect to the technical aspects, the first challenge to face is the actual event recording. As mentioned before, in traditional process mining we depend on the application to actively record the events and store them in the database tables. Without this, we cannot build the event logs. However, redo log process mining relies on the automatic recording of events in the database redo logs. The fact that this is an automatic system means that event recording in redo logs is application-independent. Yet, it needs to be enabled. Many RDBMSs have this functionality, but it is often not properly configured, or not even enabled, by default. Therefore, despite being automatic, it is useless if it is not activated. The events in redo log process mining will be available as long as the recording is enabled, properly configured, and the redo logs are archived instead of being overwritten in a rotary manner. For different reasons, the completeness of the data available to be extracted cannot be guaranteed in either of the approaches.
Missing events would lead to incomplete traces that could affect the quality of the resulting analysis. With respect to the traditional approach, incomplete data can be caused by clean-up activities performed in the database, for example, removing batches of historical information to save space. Also, recording failures could cause completeness issues in the data. Based on our experience, this problem appears even more often when dealing with redo logs. As pointed out previously, redo log recording needs to be enabled and properly configured in order to work well for our purpose. The redo logs will only start to be recorded from the moment they are enabled. Any event that happened before that moment will be unknown to us. Also, the redo log archiving must be configured so that the redo logs do not get overwritten or discarded. If that is not the case, gaps in the data can appear, resulting in incomplete or missing traces that affect the quality of the resulting event log. Normally, when extracting event data, read access to the database is required in order to execute queries and read the content of tables. This requirement is independent of the technique used for the data extraction. Nonetheless, because of how critical the original files are, the redo log approach needs special privileges in order to load and read the content of redo log files. These privileges are not easy to obtain when dealing with production systems in a real-life environment. In our experience, it is safer to perform the data extraction on a cloned instance of the database system. This can be desirable for traditional process mining as well, but there it is not critical, since the extraction method is less computationally intensive and intrusive than redo log process mining. Additionally, the extraction of events from redo logs has another relative drawback: it requires a snapshot of the database.
This is due to the fact that the events recorded correspond to insertions, modifications or deletions of rows, and only the affected fields are reflected in the events. Therefore, unless we possess the complete set of redo log files since the system

creation (which is extremely rare), it is not possible to reconstruct the content of the additional fields exclusively from the redo logs. To solve this issue, a snapshot of the database content is required, such that the values of the missing fields can be queried. To summarize, the main challenges to face when extracting event data from a database system are determined by (a) the presence of the event data in the database, (b) the correct configuration of the event recording systems, and (c) the access and connectivity to the data systems with sufficient privileges to obtain the necessary information. Until now, we have explained the particularities of two data extraction approaches, together with the challenges they face, at a theoretical level and in a very general way. The next section presents a practical comparison performed with data from a real-life system, using both data extraction approaches, to see how these issues work out in real life.

4 Practical Comparison

In the previous sections, the advantages and challenges of extracting events from redo logs, as opposed to database tables, have been presented. However, these claims have no value without a proper validation. The aim of performing a case study with both the traditional and redo log process mining approaches in this section is twofold. First, to show that applying redo log process mining in a real-life scenario is possible. Second, to demonstrate that, in situations that satisfy certain minimum requirements, the results of redo log process mining are of at least the same quality as the ones obtained with the traditional approach.

4.1 Business Case

In order to carry out this case study in a fair manner, it was important to select a system that fulfilled the minimum requirements of both process mining techniques. That is, a system that explicitly records events in the database tables, and that allows redo log recording to be enabled at the RDBMS level.
The software system selected for this study is the OTRS 1 ticketing system. OTRS is a web-based, open-source, process-aware information system (PAIS), commercialized by the OTRS Group and used for customer service, help desk, and IT service management. It offers ticket creation and management, automation, time management, and reporting, among other functionalities. The specific instance of OTRS to be analyzed is a production installation within a well-known ICT company based in the Netherlands. The company has been using this instance of OTRS for at least two years now, since the end of 2014, with the purpose of managing the incidents of the IT systems of their clients. In fact, only a subset of the whole plethora of functionalities that OTRS offers is being actively used within the company. In the daily use of the OTRS system, customers send messages reporting issues. This triggers the creation of tickets in the system, which are followed up by IT specialists. After some interaction between customers and specialists, trying to determine the root cause of the issue, the ticket status evolves until the issue, hopefully, gets solved. The goal of the system is to help the company with their customer support in order to maintain a high level of service availability and quality. There are several reasons to choose this specific instance of the OTRS ticketing system. First, the fact that it is a PAIS makes it very attractive for applying the traditional process mining approach. In addition to that, it runs on an Oracle RDBMS, with the possibility to enable redo

1 OTRS:

log recording, which is a basic requirement to apply redo log process mining. Also, the system was being used in production, with real-life customers. And finally, the company owning the instance was interested in applying process mining to assess the quality of their service. This means that they were willing to cooperate and provide access to the required data and domain knowledge to carry out this case study. The next section describes the execution of the study and how both process mining approaches were applied to the OTRS data.

4.2 Execution

To obtain an event log from the system under study, it is necessary to follow a specific set of steps, depending on the approach used to extract the event data. However, in both cases, first we must define the scope of the analysis. The company is interested in answering business questions related to the incident solving process. In particular, these questions relate to the service-level agreements (SLAs) they have with their customers. When looking at the data model 2 of the OTRS system, we observe that the table TICKET plays a central role in the general schema. This table contains the main attributes of a ticket in OTRS. Also, the table TICKET_HISTORY holds the historical information related to each ticket. This means that the changes in the tickets are stored in the form of events in that table.

Table 3: Steps in the execution of the traditional and redo log process mining approaches to obtain an event log.

  Traditional PM                               Redo Log PM
  1. Query the database (SQL Developer)        1. Connection to DB (PADAS)
  2. View of events and cases (SQL Developer)  2. Extraction of Data Model (PADAS)
  3. Export log to disk (SQL Developer)        3. Extract events for each table (PADAS)
  4. Add trace attributes to log (RapidProM)   4. Build log (PADAS)
  5. Load log for analysis (ProM)              5. Export log to XES format (PADAS)
                                               6. Load log for analysis (ProM)
Additionally, messages and extra data linked to each ticket are stored in the table ARTICLE. In conclusion, we consider the table TICKET as the case table, and TICKET_HISTORY and ARTICLE as event tables. With the scope defined, it is possible to proceed with the data extraction to build an event log. Starting with traditional process mining, we executed the steps in the left column of Table 3. The details regarding the execution of these steps are outside the scope of this paper; extensive information about the full study can be found in [8]. The result is an event log whose characteristics can be observed in Table 4, under the column Traditional PM. The data extraction process for the redo log process mining approach differs from the traditional one mainly in the source of data, which is the redo log files instead of the database tables. This means that special tools need to be used, in this case the Process Aware Data Suite 3 (PADAS). This tool allows connecting to an Oracle database, and is able to extract the data model and the events contained in the redo log files for any table of the schema. Also, once the events have been extracted, the tool supports the log creation step, grouping events into traces according to the desired case notion. More details on the log creation are available in [7]. The steps followed in the data extraction and log building phase for the redo log approach are listed in the right column of Table 3. The log exported from the PADAS tool presents the characteristics observable in Table 4, under the column Redo Log PM.

2 database.png
3 PADAS: egonzale/projects/padas/
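The log-building step just described, grouping database events into traces according to a case notion, can be sketched as follows. The dictionaries stand in for extracted database events; the field names are illustrative, not PADAS's internal format:

```python
from collections import defaultdict

# Sketch of the log-building step: events are grouped into traces by the
# chosen case notion (here the ticket id) and ordered by timestamp
# within each trace. Field names are illustrative.
def build_log(events, case_key="ticket_id"):
    traces = defaultdict(list)
    for event in events:
        traces[event[case_key]].append(event)
    for trace in traces.values():
        trace.sort(key=lambda ev: ev["timestamp"])
    return dict(traces)

if __name__ == "__main__":
    events = [
        {"ticket_id": 1, "activity": "StateUpdate", "timestamp": 2},
        {"ticket_id": 2, "activity": "NewTicket", "timestamp": 1},
        {"ticket_id": 1, "activity": "NewTicket", "timestamp": 1},
    ]
    log = build_log(events)
    print([ev["activity"] for ev in log[1]])  # the trace of ticket 1
```

Choosing a different case notion (for example, the customer instead of the ticket) only changes the grouping key, which is what makes this step configurable rather than schema-specific.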

4.3 Results

To discuss the results, we take a look at several aspects of the event logs obtained by traditional and redo log process mining, to evaluate their main differences. Analyzing Table 4, it is clear that there is a big difference in the covered period of time, as well as in the size of the event logs obtained by the two data extraction approaches. The redo log data is not as extensive as the data obtained by the traditional method. This is due to the fact that the redo log recording on the Oracle database hosting the OTRS data schema was enabled at the beginning of the project, around March 2016, and continued until July of the same year. However, the traditional approach was able to extract all the events in the TICKET_HISTORY table, which was never deleted or purged since the OTRS system was set up at the end of 2014. That is the main reason for the big difference in data quality between both approaches.

Table 4: Metrics of the resulting logs for both approaches on all the available data.

  Metric                            Traditional PM  Redo Log PM
  Time window captured (days)
  Magnitude (# of cases)
  Support (# of events)
  Number of distinct event classes
  Granularity of timestamps         seconds         seconds

Fig. 1: Missing archived logs over time in 2016. Shaded areas indicate the availability of archived logs, and white areas indicate the gaps.

Table 5: Metrics of the resulting logs for the period from June 17th to July 12th.

  Metric                            Traditional PM  Redo Log PM
  Time window captured (days)
  Magnitude (# of cases)
  Support (# of events)             6342
  Number of distinct event classes  22              22
  Granularity of timestamps         seconds         seconds

Additionally, after observing the resulting event log from the redo log process mining approach, one more data quality issue was identified. Big time gaps were spotted in the extracted data, as shown in Figure 1. However, this problem did not exist in the data obtained by the traditional approach, which was complete.
Further investigation of the root cause showed that the reason for this was a misconfiguration of the cloned server used in the study. On this server, a daily script would archive the already-filled redo log files to a storage location. However, in some cases, a race condition occurred with another script in charge of cleaning up storage for space-saving purposes. This caused the loss of redo log files for full days and, consequently, incomplete cases and data quality issues. The issue was fixed as soon as it was detected and, fortunately, data continued to be recorded, this time without interruption. In order to ensure a fair comparison of the process mining approaches, the following strategy was adopted: from the timeline of redo log data observable in Figure 1, the largest uninterrupted period was selected to be compared between both logs. This period runs from June 17th to July 12th. The resulting event logs were then compared, and the metrics are presented in Table 5. The following section provides a discussion of the equivalence of these two event logs, looking at them from the structural and behavioral points of view.
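The window-selection strategy just described can be sketched as follows: given the days for which archived redo logs survived, find the longest gap-free run. The dates below are illustrative, not the study's actual archive inventory:

```python
from datetime import date

# Sketch: find the largest uninterrupted period in a set of days for
# which archived redo logs are available, so that both approaches can be
# compared over the same gap-free window. Dates are illustrative.
def longest_uninterrupted_period(days):
    days = sorted(set(days))
    best_start = cur_start = days[0]
    best_end = prev = days[0]
    for day in days[1:]:
        if (day - prev).days > 1:   # a gap: start a new run
            cur_start = day
        prev = day
        if day - cur_start > best_end - best_start:
            best_start, best_end = cur_start, day
    return best_start, best_end

if __name__ == "__main__":
    available = (
        [date(2016, 6, 3), date(2016, 6, 4)]          # short early run
        + [date(2016, 6, d) for d in range(17, 31)]   # June 17-30
        + [date(2016, 7, d) for d in range(1, 13)]    # July 1-12
    )
    print(longest_uninterrupted_period(available))
```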

5 Discussion

It has been previously stated that the goal of this work is to find support for the hypothesis that the data obtained by the redo log process mining approach is at least as rich as the data obtained by traditional methods. Section 3 shows the intuition behind this hypothesis from the theoretical point of view. Then, Section 4 takes a practical perspective on the evaluation, applying both process mining approaches in a real-life environment. The aim of this section is to analyze the results of the practical comparison, in order to support the aforementioned hypothesis, and to explain the possible differences between the event logs obtained by both process mining approaches.

5.1 Event Labels Comparison

Table 5 shows that, when focusing on a period of time during which data is available for both approaches, the event logs coincide in the number of cases. Also, the number of events extracted by the redo log approach is higher than the number obtained by the traditional one. However, this does not guarantee that the former is a superset of the latter. To find evidence for this, we have to look at the event labels in both logs. Table 6 shows a list of event labels ordered by frequency for both event logs. At first sight, the event labels seem disjoint. However, further analysis shows that the two most frequent event labels in the redo log process mining event log, namely NewEventNoMsg and NewEventWithMsg, correspond to the redo log events obtained from the TICKET_HISTORY table. This table is the source of events for the traditional process mining approach. In fact, the sum of the frequencies of these two event labels, 5032 and 1310 respectively, is equal to 6342 events, the total number of events in the event log obtained with the traditional approach.
Table 6: Event labels and frequencies with the default classifiers for the two event logs.

  Traditional PM                      Redo Log PM
  Activity label      Freq  Rel Freq  Activity label                  Freq  Rel Freq
  Misc                                NewEventNoMsg                   5032
  OwnerUpdate                         NewEventWithMsg                 1310
  StateUpdate                         MessagePhoneOrNote
  CustomerUpdate                      MessageTicketMerged
  NewTicket                           AutoReplyTicketReceived
  SendAgentNotif                      NewArticleA
  AddNote                             NewArticleB
  Lock                                NewArticleC
  Unlock                              UpdateMsg-TicketId-Time-User
  Merged                              UpdateEvent-TicketId-Time-User
  TicketLinkAdd                       New Note-Customer Agent
  FollowUp                            NewArticleD
  Customer                            UpdateEvent-TicketId-Time
  SendAutoReply                       UpdateMessage-TicketId-Time
  SendAnswer                          UpdateMessage-TicketId-User
  Move                                UpdateEvent-TicketId-User
  PriorityUpdate                      NewMessage-CustomerOrAgent
  TypeUpdate                          New External
  SendCustomerNotif                   UpdateMessage-TicketId
  TimeAccounting                      UpdateEvent-TicketId
  SetPendingTime                      FromCustomerWithoutCC
  SendAutoFollowUp                    FromCustomerWithCC
  Total               6342            Total

The reason why the 22 event types of one log are grouped into only two in the other is that, in the latter, the event classifier is automatically provided by the approach. This classifier takes into account the table in which the event occurred,

and which fields were affected. However, in the traditional approach, the event classifier takes into account the value of the ticket_state_id field, which maps integer values to the event labels on the left side of Table 6. Therefore, using this event classifier on the events NewEventNoMsg and NewEventWithMsg of the redo log process mining event log would result in the same set of event labels, with the same frequencies. To be precise, the events from the redo log approach with the label NewEventNoMsg correspond to a subset of the events obtained through the traditional method with the following event labels: Misc, OwnerUpdate, StateUpdate, CustomerUpdate, NewTicket, SendAgentNotification, Lock, Unlock, Merged, TicketLinkAdd, Move, PriorityUpdate, TypeUpdate, and SetPendingTime. With respect to the events with the label NewEventWithMsg, they correspond to a subset of the events with the labels: OwnerUpdate, StateUpdate, AddNote, FollowUp, Customer, SendAutoReply, SendAnswer, SendCustomerNotification, TimeAccounting, and SendAutoFollowUp. Therefore, we see that there is not a 1:n mapping between the event classes obtained by both approaches. On the contrary, it is an n:m relation, with cases like the activity OwnerUpdate from the log of the traditional approach, which groups events that can correspond either to the activity NewEventWithMsg or to the activity NewEventNoMsg of the log of the redo log approach. It is important to note that the fact that in Table 5 the number of distinct event classes is the same for both logs (22) is just a coincidence. Actually, the real number of event classes in the redo log process mining event log using an appropriate event classifier should be 42, since two of the 22 event classes of this log correspond to the 22 obtained with the traditional method.

5.2 Control Flow Comparison

The equivalence of both event logs has been analyzed from the event labels point of view.
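The reclassification argument from Section 5.1 can be sketched as follows. The integer-to-label mapping below is hypothetical (the real OTRS code table is not reproduced here); what matters is that the traditional classifier can be re-applied to the coarse redo log events:

```python
# Sketch: re-apply the traditional classifier (an id-to-label mapping)
# to the coarse-grained redo log events. Both the mapping and the
# event fields are hypothetical, for illustration only.
STATE_ID_TO_LABEL = {1: "NewTicket", 2: "StateUpdate", 3: "OwnerUpdate"}

COARSE_LABELS = {"NewEventNoMsg", "NewEventWithMsg"}

def reclassify(redo_events):
    """Replace coarse redo log labels by fine-grained traditional ones."""
    out = []
    for event in redo_events:
        if event["activity"] in COARSE_LABELS:
            event = {**event,
                     "activity": STATE_ID_TO_LABEL[event["ticket_state_id"]]}
        out.append(event)
    return out

if __name__ == "__main__":
    redo_events = [
        {"activity": "NewEventNoMsg", "ticket_state_id": 1},
        {"activity": "NewEventWithMsg", "ticket_state_id": 2},
        {"activity": "NewArticleA", "ticket_state_id": None},
    ]
    print([ev["activity"] for ev in reclassify(redo_events)])
```

Events that do not originate from the history table (here NewArticleA) keep their redo log labels, which is why the fine-grained log would end up with more event classes than either original log.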
However, without mining the traces, we cannot guarantee that the two event logs represent equivalent behavior. To check this aspect, we mined the event logs using the same event classifier in both cases. As discussed previously, the event log obtained from the redo logs contains a superset of the events in the one extracted by the traditional approach. In order to compare the behavior of both logs, we must focus on the same subset of activities. Therefore, the event log obtained from the redo log was filtered to only include events corresponding to the labels NewEventNoMsg and NewEventWithMsg. Then, the same classifier as in the traditional approach was used, so that both event logs would have the same set of event classes. After this preparatory step, we mined both logs using the Inductive Miner Infrequent. The resulting process models can be observed in Figure 2. From observing both models, we see that they mostly represent the same control flow. However, some differences can be spotted immediately. First, the activities Customer and SendAutoReply occur in parallel in Figure 2a, while they are in sequence in Figure 2b. Second, the activity SendAnswer is part of a choice in Figure 2a, while it happens before the choice in Figure 2b. Third, the activities NewTicket and CustomerUpdate always happen in the 6th and 5th positions from the end of the trace in Figure 2b, while in Figure 2a they can only be executed in mutual exclusion with the bottom part of the process. These differences, though graphically subtle, can imply a considerable difference in behavior. Fortunately, there is an explanation for them. There are two main reasons for this disagreement in control flow between the two event logs. (1) The event timestamps obtained by the two approaches are set by different mechanisms. In the traditional approach, the timestamps of each event correspond to the ones written by the OTRS system in the timestamp field of the TICKET HISTORY table.
In the redo log approach, the timestamps correspond to the ones recorded by the Oracle RDBMS when processing the SQL statements sent by the OTRS system. Therefore, a difference in the order of events between the traditional and the redo log approach can occur, given that the timestamps in the former correspond to the behavior enforced by OTRS, while the timestamps in the latter correspond to the actual execution of the associated statements in the database. (2) The events obtained by the traditional approach correspond to rows in the table TICKET HISTORY of the database, and their content can be modified during the life-cycle of the process. The events recorded by the redo log system, however, are immutable: a modification of a row in TICKET HISTORY creates a new event in the redo log files. In fact, the OTRS system is known to modify the fields TicketID, User, and Time of the TICKET HISTORY rows whenever two tickets are merged together. The presence of the activities UpdateEvent-TicketId-Time, UpdateEvent-TicketId-User, UpdateEvent-TicketId-Time-User, and UpdateEvent-TicketId in the event log obtained from the redo logs is evidence of this behavior. Therefore, after this comparison at both the activity label and the control flow level, we can conclude that the behavior captured by the event log produced by the traditional approach is indeed a subset of the behavior captured by the redo log approach, and that the latter can easily be filtered in order to achieve a high degree of equivalence.

Fig. 2: Process models mined with the Inductive Miner Infrequent (noise threshold = 0.2). (a) Petri net mined for the event log obtained through traditional process mining. (b) Petri net mined for the event log obtained through redo log process mining.
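The effect described in reason (2) can be illustrated with a minimal simulation. The row structure, field names, and merge behavior below are simplified assumptions, not the actual OTRS implementation: the point is only that an in-place update rewrites what the traditional extraction will later read, while the redo records remain append-only.

```python
# HYPOTHETICAL sketch: why mutable TICKET_HISTORY rows and immutable
# redo records diverge after a ticket merge.

ticket_history = []   # the table that the traditional approach reads
redo_records = []     # append-only change records, as in the redo logs

def insert_row(row):
    ticket_history.append(dict(row))
    redo_records.append({"op": "INSERT", "data": dict(row)})

def update_row(index, changes):
    ticket_history[index].update(changes)  # overwrites history in place
    redo_records.append({"op": "UPDATE", "data": dict(changes)})

insert_row({"HistoryID": 1, "TicketID": 100, "Time": "10:00"})
# Merging two tickets rewrites TicketID and Time of the existing row:
update_row(0, {"TicketID": 200, "Time": "10:05"})

# Traditional extraction sees only the rewritten row ...
assert ticket_history == [{"HistoryID": 1, "TicketID": 200, "Time": "10:05"}]
# ... while the redo records still contain the original insert,
# plus a separate UPDATE event for the merge:
assert redo_records[0]["data"]["TicketID"] == 100
assert len(redo_records) == 2
```

In this toy setting, the extra UPDATE record plays the role of the UpdateEvent-TicketId-Time activities observed in the redo log event log.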

6 Conclusion

In this paper, two process mining approaches have been compared with respect to the data extraction phase: traditional process mining and redo log process mining. The evaluation was performed in a unique setting: both approaches were applied in a real-life environment, on real data from real systems, in order to determine the level of equivalence between the results obtained through the two methods. Analyzing the results, we concluded that, once the difficulties of applying the redo log approach are overcome, this method is able to retrieve richer event logs, of higher quality in terms of the number of events and the reliability of the captured behavior. Additionally, it has been shown that traditional approaches are vulnerable to event manipulation, which can alter the results of the analysis, while the redo log approach ensures the immutability of the events and is therefore more robust to data manipulation and fraud. In addition to these benefits, redo log process mining, unlike the traditional approach, can be applied to systems that are not process-aware, in which events are not explicitly recorded at the application level but which still use an RDBMS for data storage. However, this comes at a price: the need for special privileges to configure and enable redo log recording makes it difficult to set up in some environments, while the traditional approach only requires read access to the relevant database tables. All things considered, redo log process mining can be considered a viable alternative to traditional process mining. As future work, new sources of event data will be explored, in order to tackle the limitations of the redo log approach and to improve the quality of the extracted event logs with respect to traditional methods.

References

1. Watson, H.J., Wixom, B.H.: The current state of business intelligence. Computer 40(9) (2007)
2. Ingvaldsen, J.E., Gulla, J.A.: Preprocessing support for large scale process mining of SAP transactions. In: Business Process Management Workshops, Springer (2008)
3. Roest, A.: A practitioner's guide for process mining on ERP systems: the case of SAP order to cash. Master's thesis, Technische Universiteit Eindhoven, The Netherlands (2012)
4. Segers, I.: Investigating the application of process mining for auditing purposes. Master's thesis, Technische Universiteit Eindhoven, The Netherlands (2007)
5. Yano, K., Nomura, Y., Kanai, T.: A practical approach to automated business process discovery. In: Enterprise Distributed Object Computing Conference Workshops (EDOCW), IEEE (Sept 2013)
6. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: XES, XESame, and ProM 6. In: Information Systems Evolution. Springer (2011)
7. González-López de Murillas, E., van der Aalst, W.M.P., Reijers, H.A.: Process mining on databases: Unearthing historical data from redo logs. In: Business Process Management. Springer (2015)
8. Hoogendoorn, G.E.: A comparative study for process mining approaches in a real-life environment. Master's thesis, Eindhoven University of Technology (2017)
9. Jans, M.J.: From relational database to valuable event logs for process mining purposes: a procedure. Technical report, Hasselt University (2017)
10. González López de Murillas, E., Reijers, H.A., van der Aalst, W.M.P.: Connecting databases with process mining: A meta model and toolset. In: International Workshop on Business Process Modeling, Development and Support, Springer (2016)
11. van Eck, M.L., Lu, X., Leemans, S.J., van der Aalst, W.M.P.: PM2: A process mining project methodology. In: International Conference on Advanced Information Systems Engineering, Springer (2015)


More information

How to Manage your Process Mining Analysis - Best Practices and Challenges. Willy van de Schoot Process Mining Camp June 15 th, 2015

How to Manage your Process Mining Analysis - Best Practices and Challenges. Willy van de Schoot Process Mining Camp June 15 th, 2015 How to Manage your Process Mining Analysis - Best Practices and Challenges Willy van de Schoot Process Mining Camp June 15 th, 2015 Atos Managed Services Atos Managed Services Manage customer ICT infrastructure

More information

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0.

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0. IBM Optim Performance Manager Extended Edition V4.1.0.1 Best Practices Deploying Optim Performance Manager in large scale environments Ute Baumbach (bmb@de.ibm.com) Optim Performance Manager Development

More information

Certified Information Systems Auditor (CISA)

Certified Information Systems Auditor (CISA) Certified Information Systems Auditor (CISA) 1. Domain 1 The Process of Auditing Information Systems Provide audit services in accordance with IT audit standards to assist the organization in protecting

More information

Online Conformance Checking for Petri Nets and Event Streams

Online Conformance Checking for Petri Nets and Event Streams Online Conformance Checking for Petri Nets and Event Streams Andrea Burattin University of Innsbruck, Austria; Technical University of Denmark, Denmark andbur@dtu.dk Abstract. Within process mining, we

More information

Accurate study guides, High passing rate! Testhorse provides update free of charge in one year!

Accurate study guides, High passing rate! Testhorse provides update free of charge in one year! Accurate study guides, High passing rate! Testhorse provides update free of charge in one year! http://www.testhorse.com Exam : 70-467 Title : Designing Business Intelligence Solutions with Microsoft SQL

More information

A ProM Operational Support Provider for Predictive Monitoring of Business Processes

A ProM Operational Support Provider for Predictive Monitoring of Business Processes A ProM Operational Support Provider for Predictive Monitoring of Business Processes Marco Federici 1,2, Williams Rizzi 1,2, Chiara Di Francescomarino 1, Marlon Dumas 3, Chiara Ghidini 1, Fabrizio Maria

More information

Online Conformance Checking for Petri Nets and Event Streams

Online Conformance Checking for Petri Nets and Event Streams Downloaded from orbit.dtu.dk on: Apr 30, 2018 Online Conformance Checking for Petri Nets and Event Streams Burattin, Andrea Published in: Online Proceedings of the BPM Demo Track 2017 Publication date:

More information

20762B: DEVELOPING SQL DATABASES

20762B: DEVELOPING SQL DATABASES ABOUT THIS COURSE This five day instructor-led course provides students with the knowledge and skills to develop a Microsoft SQL Server 2016 database. The course focuses on teaching individuals how to

More information

Process Model Consistency Measurement

Process Model Consistency Measurement IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 7, Issue 6 (Nov. - Dec. 2012), PP 40-44 Process Model Consistency Measurement Sukanth Sistla CSE Department, JNTUniversity,

More information

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon The data warehouse environment - like all other computer environments - requires hardware resources. Given the volume of data and the type of processing

More information

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa Data Warehousing Data Warehousing and Mining Lecture 8 by Hossen Asiful Mustafa Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information,

More information

DATABASE SCALABILITY AND CLUSTERING

DATABASE SCALABILITY AND CLUSTERING WHITE PAPER DATABASE SCALABILITY AND CLUSTERING As application architectures become increasingly dependent on distributed communication and processing, it is extremely important to understand where the

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: 001-855-844-3881 & 001-800-514-06-9 7 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training

More information

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Alejandro Bellogín 1,2, Thaer Samar 1, Arjen P. de Vries 1, and Alan Said 1 1 Centrum Wiskunde

More information

Oracle 1Z0-053 Exam Questions & Answers

Oracle 1Z0-053 Exam Questions & Answers Oracle 1Z0-053 Exam Questions & Answers Number: 1Z0-053 Passing Score: 660 Time Limit: 120 min File Version: 38.8 http://www.gratisexam.com/ Oracle 1Z0-053 Exam Questions & Answers Exam Name: Oracle Database

More information

"Charting the Course... MOC C: Developing SQL Databases. Course Summary

Charting the Course... MOC C: Developing SQL Databases. Course Summary Course Summary Description This five-day instructor-led course provides students with the knowledge and skills to develop a Microsoft SQL database. The course focuses on teaching individuals how to use

More information

Features of the architecture of decision support systems

Features of the architecture of decision support systems Features of the architecture of decision support systems van Hee, K.M. Published: 01/01/1987 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume numbers)

More information

Db2 9.7 Create Table If Not Exists >>>CLICK HERE<<<

Db2 9.7 Create Table If Not Exists >>>CLICK HERE<<< Db2 9.7 Create Table If Not Exists The Explain tables capture access plans when the Explain facility is activated. You can create them using one of the following methods: for static SQL, The SYSTOOLS schema

More information

A Mechanism for Sequential Consistency in a Distributed Objects System

A Mechanism for Sequential Consistency in a Distributed Objects System A Mechanism for Sequential Consistency in a Distributed Objects System Cristian Ţăpuş, Aleksey Nogin, Jason Hickey, and Jerome White California Institute of Technology Computer Science Department MC 256-80,

More information

Hybrid Data Platform

Hybrid Data Platform UniConnect-Powered Data Aggregation Across Enterprise Data Warehouses and Big Data Storage Platforms A Percipient Technology White Paper Author: Ai Meun Lim Chief Product Officer Updated Aug 2017 2017,

More information

Microsoft. [MS20762]: Developing SQL Databases

Microsoft. [MS20762]: Developing SQL Databases [MS20762]: Developing SQL Databases Length : 5 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server Delivery Method : Instructor-led (Classroom) Course Overview This five-day

More information