VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages.

1 VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages. Jeevan Joishi, MTech Research Associate, Software Analytics Research Lab (SARL).

2 MTech Thesis Evaluation Committee Members. Thesis Adviser: Prof. Ashish Sureka, Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens Corporate Research and Technology; Faculty In-charge, Software Analytics Research Lab (SARL). External Examiner: Dr. Radha Krishna Pisipati, Principal Research Scientist at Infosys Technologies Limited. Internal Examiner: Prof. Sandip Aine, Faculty Member at IIIT-Delhi.

3 Outline
1. Research Motivation and Aim
2. Related Work and Novel Research Contributions
3. Implementation of Similar-Task and Sub-Contract Algorithms in SQL, RDBMS
4. Implementation of Similar-Task and Sub-Contract Algorithms in CYPHER, Graph Oriented
5. Experimental Dataset
6. Performance Comparison
7. Conclusion
8. Limitations and Future Work
9. References

4 Presentation Outline: 1. Research Motivation and Aim

5 Why NoSQL? The number of people accessing the internet has increased tremendously, and many applications are hosted in the cloud and must support users 24 hours a day, 365 days a year. Fig. 1: Scale of internet usage (figure taken from [17]).

6 Why NoSQL? Data is captured in huge volumes and includes both structured and unstructured data. The amount of data is growing rapidly, and the data itself is becoming more varied. Fig. 2: Growth of data (figure taken from [17]).

7 Why NoSQL? What is wrong with relational databases? Nothing! Relational databases follow a one-size-fits-all philosophy for storage and are the right choice when strong consistency is a must. However, they can become a problem when it is time to scale.

8 Why NoSQL? Social media sites such as Facebook and Twitter grew explosively and had very large data needs: they had to capture and process volumes of data that were difficult to handle with a traditional RDBMS. Traditional databases are designed to scale up, whereas these workloads required a database that can scale out. As a relational application becomes successful, its usage grows, and the joins inherent in an RDBMS become very slow. Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.

9 Why NoSQL? We require a technology that scales out rather than up. Scale up: add more processors and memory to a single server. Scale out: add more servers. Fig. 3: Scale-up vs. scale-out (figure taken from [18]).

10 Introduction to NoSQL. NoSQL ("Not Only SQL") databases were introduced to meet these needs. They are non-relational data stores that do not require a fixed table schema. They do not strictly follow the ACID properties; instead, their design is guided by the CAP trade-off (Consistency, Availability, Partition tolerance). Examples include column stores, graph databases and document stores.

11 RDBMS vs. NoSQL: scale up vs. scale out; normalization vs. de-normalization; ACID vs. CAP; fixed schema vs. schema-less; structured data vs. unstructured data.

12 Row Oriented vs. Graph Oriented Database.
Table 1: An RDBMS table.
Record No | Name | Address | City | State
01 | Jeevan Joishi | Uniworld Apartment | Bangalore | Karnataka
02 | Kunal Gupta | 15th Cross Road | Kanpur | Uttar Pradesh
03 | Priyanka Verma | Sector-7 | Jind | Haryana
04 | Nidhi Agarwal | JJ colony | Bhiwani | Haryana
Fig. 4: A graph model (figure taken from [19]).

13 Row Oriented vs. Graph Oriented Database. In a row-oriented database, the whole record must be read even when only specific attributes are needed, and joins are compute-intensive. A graph database, by contrast, can read individual values attached to nodes, relationships or properties, and it avoids joins by traversing relationships directly using index-free adjacency.
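A toy model makes the contrast concrete. The Python sketch below is illustrative only and does not describe how MySQL or Neo4j actually store data; the classes `IndexedStore` and `Node` are hypothetical. The first resolves every hop through global lookup indexes keyed by id (the join-like access path), while the second keeps direct references to neighbouring nodes, so a traversal simply follows pointers (index-free adjacency).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy classes, not MySQL or Neo4j internals.


class IndexedStore:
    """Relational-style access: relationships live in a separate edge table and
    every hop is resolved through global lookup indexes keyed by id."""

    def __init__(self) -> None:
        self.name_by_id: dict[int, str] = {}            # primary-key index
        self.targets_by_src: dict[int, list[int]] = {}  # index on the foreign key

    def add_node(self, node_id: int, name: str) -> None:
        self.name_by_id[node_id] = name

    def add_edge(self, src: int, dst: int) -> None:
        self.targets_by_src.setdefault(src, []).append(dst)

    def neighbours(self, node_id: int) -> list[str]:
        # Each hop costs an index lookup for the edges plus one per target row.
        return [self.name_by_id[dst] for dst in self.targets_by_src.get(node_id, [])]


@dataclass(eq=False)
class Node:
    """Native-graph-style access: a node keeps direct references to its
    neighbours, so following a relationship needs no global index."""

    name: str
    out: list[Node] = field(default_factory=list)

    def connect(self, other: Node) -> None:
        self.out.append(other)

    def neighbours(self) -> list[str]:
        return [n.name for n in self.out]  # pointer chasing only


# Both answer "who is Nidhi connected to?", through different access paths.
store = IndexedStore()
store.add_node(1, "Nidhi")
store.add_node(2, "Priyanka")
store.add_edge(1, 2)

nidhi, priyanka = Node("Nidhi"), Node("Priyanka")
nidhi.connect(priyanka)

print(store.neighbours(1), nidhi.neighbours())  # ['Priyanka'] ['Priyanka']
```

The point of the toy is the access path, not speed: the indexed store needs lookups whose cost grows with the size of the global indexes, while the pointer-based node only touches its own neighbours.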

14 Row Oriented vs. Graph Oriented Database. Fig. 5: Relationships in relational databases (figure taken from [20]).

16 Row Oriented vs. Graph Oriented Database. Fig. 6: Relationships in graph databases (figure taken from [20]).

17 Non-native vs. Native Graph Processing. Fig. 7: Non-native graph processing using a global lookup index. Fig. 8: Native graph processing using index-free adjacency.

18 Process Mining. Process mining is the analysis of a process using event log data. One of its key aspects is studying the social structure of an organization from its event logs. Fig. 9: Types of process mining techniques.

19 Process Mining. Process mining analyses a process using the data recorded in event logs. Each event in an event log records details of an activity: it is associated with a case identifier (CaseID), has a timestamp, names the activity being performed and the actor who handles it, and may additionally carry a unique event identifier.
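The event attributes listed above map naturally onto a small record type. The Python sketch below is illustrative only: the class `Event`, its field names, the helper `group_by_case` and the sample timestamps are hypothetical, not the schema used in the thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                    # case identifier (CaseID)
    timestamp: datetime             # when the activity was performed
    activity: str                   # activity being performed
    actor: str                      # actor who handled the event
    event_id: Optional[str] = None  # optional unique event identifier


def group_by_case(events: list[Event]) -> dict[str, list[Event]]:
    """Group events by case, ordering each case's events by timestamp."""
    ordered = sorted(events, key=lambda e: (e.case_id, e.timestamp))
    return {cid: list(evs) for cid, evs in groupby(ordered, key=lambda e: e.case_id)}


# Hypothetical timestamps; the sample logs in the slides list no times.
log = [
    Event("1", datetime(2014, 1, 1, 10, 0), "B", "Priyanka"),
    Event("2", datetime(2014, 1, 1, 8, 0), "A", "Nidhi"),
    Event("1", datetime(2014, 1, 1, 9, 0), "A", "Nidhi"),
]
for case_id, events in group_by_case(log).items():
    print(case_id, [(e.activity, e.actor) for e in events])
```

Grouping by case and ordering by time is the preparation the case-dependent Sub-Contract algorithm needs; the case-independent Similar-Task algorithm can ignore `case_id` entirely.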

20 Process Mining. Fig. 10: An example event log.

31 Process Mining. There are three types of process mining techniques: (1) process discovery, (2) process conformance and (3) process enhancement. There are also three process mining perspectives: (1) the control-flow perspective, (2) the organizational perspective and (3) the case perspective.

32 Similar-Task Algorithm. The Similar-Task algorithm belongs to the organizational perspective and identifies actors who perform similar activities. It considers which activities each actor performs, irrespective of cases, and is based on the notion that people doing similar things are more strongly related than people doing different things.

33 Similar-Task Algorithm.
Table 2: Sample event log.
Case Identifier | Activity Identifier | Actor
1 | A | Nidhi
2 | A | Nidhi
2 | C | Kunal
1 | B | Priyanka
3 | A | Pooja
1 | C | Nidhi
3 | D | Kunal
3 | B | Priyanka
2 | B | Pooja
2 | D | Astha
1 | D | Astha
Table 3: Actor-activity matrix (rows: Nidhi, Kunal, Priyanka, Pooja, Astha; columns: activities A, B, C, D).
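An actor-activity matrix such as Table 3 can be derived from an event log like Table 2. The Python sketch below assumes the matrix stores how often each actor performs each activity (a frequency matrix, one common choice); the function name `actor_activity_matrix` is hypothetical.

```python
from collections import Counter

# Rows of Table 2 as (case identifier, activity identifier, actor).
EVENT_LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"), (1, "B", "Priyanka"),
    (3, "A", "Pooja"), (1, "C", "Nidhi"), (3, "D", "Kunal"), (3, "B", "Priyanka"),
    (2, "B", "Pooja"), (2, "D", "Astha"), (1, "D", "Astha"),
]


def actor_activity_matrix(log):
    """Return the sorted activity list and {actor: [count per activity]}.
    Case identifiers are ignored, as the Similar-Task algorithm is case independent.
    Assumption: cells hold frequencies rather than 0/1 flags."""
    activities = sorted({activity for _, activity, _ in log})
    counts = Counter((actor, activity) for _, activity, actor in log)
    actors = dict.fromkeys(actor for _, _, actor in log)  # keeps first-seen order
    matrix = {a: [counts[(a, act)] for act in activities] for a in actors}
    return activities, matrix


activities, matrix = actor_activity_matrix(EVENT_LOG)
for actor, row in matrix.items():
    print(actor, dict(zip(activities, row)))
```

For the sample log this gives, for example, Nidhi a count of 2 for activity A and 1 for activity C; each row is the vector that the cosine-similarity step compares.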

34 Similar-Task Algorithm. Given two attribute vectors A and B, the cosine similarity is given by $\cos(A,B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$ (formula figure taken from [21]). Table 4: Cosine similarity values between actors (rows and columns: Nidhi, Kunal, Priyanka, Pooja, Astha).
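Cosine similarity, cos(A, B) = (A · B) / (|A| |B|), translates directly into code. This Python sketch applies it to actor rows of an actor-activity matrix; the function name `cosine_similarity` is hypothetical, and the vectors are activity counts over (A, B, C, D) taken from the sample log under the assumption that the matrix stores frequencies. They are not the values reported in Table 4.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(A, B) = (A . B) / (|A| |B|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Activity counts over (A, B, C, D), assumed from the sample log.
nidhi = [2, 0, 1, 0]
pooja = [1, 1, 0, 0]
priyanka = [0, 2, 0, 0]

print(round(cosine_similarity(nidhi, pooja), 4))     # 2 / sqrt(10) -> 0.6325
print(round(cosine_similarity(pooja, priyanka), 4))  # 2 / (sqrt(2) * 2) -> 0.7071
print(cosine_similarity(nidhi, priyanka))            # no common activity -> 0.0
```

Because all counts are non-negative, the result always lies between 0 (no shared activities) and 1 (proportional activity profiles).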

35 Similar-Task Algorithm at a glance.

36 Sub-Contract Algorithm. The Sub-Contract algorithm focuses on how work moves among performers. The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i; i is then said to have sub-contracted work to j. Unlike Similar-Task, the relation between individuals is case dependent: only events within the same case are compared.

37 Sub-Contract Algorithm.
Table 5: Sample event log.
Case Identifier | Activity Identifier | Actor
1 | A | Nidhi
2 | A | Nidhi
2 | C | Kunal
1 | B | Priyanka
3 | A | Pooja
1 | C | Nidhi
3 | D | Kunal
3 | B | Priyanka
2 | B | Pooja
2 | D | Astha
1 | D | Astha
Table 6: Organized event log (events grouped by case identifier, preserving their order within each case).
Case Identifier | Activity Identifier | Actor
1 | A | Nidhi
1 | B | Priyanka
1 | C | Nidhi
1 | D | Astha
2 | A | Nidhi
2 | C | Kunal
2 | B | Pooja
2 | D | Astha
3 | A | Pooja
3 | D | Kunal
3 | B | Priyanka
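The counting rule can be sketched in a few lines of Python. This is a minimal illustration that assumes each case is already an ordered list of actors, as in the organized log of Table 6, and that counts only direct i, j, i patterns (j performs the event immediately between two events of i). The function names are hypothetical, and `normal` is passed in as a parameter (4.0, the value slide 41 reports for this example) rather than derived, because the slides do not spell out its formula.

```python
from collections import Counter


def subcontract_counts(cases: dict[int, list[str]]) -> Counter:
    """Count (i, j) pairs where actor j performs the event immediately between
    two events of actor i in the same case (i == j is counted as well)."""
    counts: Counter = Counter()
    for actors in cases.values():
        # Slide a window of three events over the case; cases with fewer than
        # three events produce no windows.
        for i, j, k in zip(actors, actors[1:], actors[2:]):
            if i == k:
                counts[(i, j)] += 1
    return counts


def normalize(counts: Counter, normal: float) -> dict[tuple[str, str], float]:
    """Divide every raw sub-contraction count by the normalization value."""
    return {pair: n / normal for pair, n in counts.items()}


# Actor sequences per case from the organized event log (Table 6).
organized = {
    1: ["Nidhi", "Priyanka", "Nidhi", "Astha"],
    2: ["Nidhi", "Kunal", "Pooja", "Astha"],
    3: ["Pooja", "Kunal", "Priyanka"],
}
raw = subcontract_counts(organized)
print(raw)                         # Counter({('Nidhi', 'Priyanka'): 1})
print(normalize(raw, normal=4.0))  # {('Nidhi', 'Priyanka'): 0.25}
```

Only case 1 contains the pattern (Nidhi, Priyanka, Nidhi), so Nidhi sub-contracts to Priyanka once; every other pair stays at zero.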

41 Sub-Contract Algorithm. normal = 4.0. Table 7: Sub-contraction values before normalization (rows and columns: Nidhi, Kunal, Priyanka, Pooja, Astha). Table 8: Sub-contraction values after normalization, i.e., the Table 7 values divided by normal (same rows and columns).

48 Sub-Contract Algorithm at a glance (I).

49 Sub-Contract Algorithm at a glance (II).

50 Research Motivation and Aim. Query languages are the standard way to interact with a database. We therefore implement the process mining algorithms in the database query languages themselves, to the extent possible, so that our application is tightly coupled to the database. Our work lies at the intersection of process mining and NoSQL databases.

51 Research Aim.
1. To investigate the intersection of process mining and graph databases for detecting social and hierarchical structures.
2. To understand which application needs can be modelled in this new domain.
3. To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL.
4. To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j.
5. To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.

52 Presentation Outline: 2. Related Work and Novel Research Contributions

53 Implementation of Mining Algorithms in Relational Databases. Ordonez [5] implemented the k-means clustering algorithm in SQL to cluster large datasets inside an RDBMS: he defined suitable tables, indexed them and wrote queries for clustering. Ordonez et al. [6] extended the work in [5] with an efficient SQL implementation of the EM algorithm for clustering very large datasets.

54 Implementation of Mining Algorithms in Relational Databases. Berzal et al. [7] implemented tree-based association rule mining (TBAR) to discover interesting patterns in relational databases. Sattler and Dunemann [8] proposed SQL database primitives for decision tree classifiers, tightly coupling data mining with database systems.

55 Implementation of Mining Algorithms in Graph Databases. Wang et al. [9] studied structural pattern mining in large disk-based graph databases; they presented a novel ADI index structure and efficient algorithms for mining frequent patterns. Wang et al. [10] presented techniques for scalable mining of large disk-based graph databases.

56 Implementation of Mining Algorithms in Graph Databases. Huan et al. [11] presented SPIN, a technique for mining maximal frequent subgraphs from graph databases. Ozaki and Ohkawa [12] introduced hyperclique patterns in graph databases and used them to detect highly correlated subgraphs.

57 Performance Comparison of Mining Algorithms in Relational and Graph Databases. Vicknair et al. [13] compared the performance of a relational database and a graph database for data provenance systems. McColl et al. [14] evaluated the performance of a series of open-source graph databases, running various graph algorithms on a graph of 256 million nodes.

58 Performance Comparison of Mining Algorithms in Relational and Graph Databases. Ciglan et al. [15] benchmarked graph traversal operations over graph databases. Macko et al. [16] presented PIG, a performance introspection framework for graph databases that provides tools and mechanisms to understand their performance.

59 Novel Research Contributions. While there has been prior work on implementing data mining algorithms in relational and graph databases, we are the first to implement the organizational mining algorithms Similar-Task and Sub-Contract in the row-oriented database MySQL using SQL, and the first to implement them in the graph-oriented database Neo4j using CYPHER. We also benchmark the performance of both algorithms on MySQL and Neo4j.

60 Presentation Outline: 3. Implementation of Similar-Task and Sub-Contract Algorithms in SQL, RDBMS

61 Similar-Task Algorithm in SQL: Steps. The SQL implementation of the Similar-Task algorithm can be divided into four broad tasks: (1) declare and iterate a cursor to select the distinct tasks; (2) create a table to store the results; (3) fetch the actor vectors and calculate the cosine similarity; (4) write the results to the result table.
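The thesis drives this computation with cursors and dynamically prepared statements. For comparison, the same cosine similarity can also be written as a single set-based query. The sketch below uses Python's built-in `sqlite3` module rather than MySQL, and the table `event_log`, its columns, the registered function `py_sqrt` and the function `similar_task` are all hypothetical; it is an alternative formulation, not the thesis's stored procedure.

```python
import math
import sqlite3

# Hypothetical event_log rows: the sample log of Table 2.
ROWS = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"), (1, "B", "Priyanka"),
    (3, "A", "Pooja"), (1, "C", "Nidhi"), (3, "D", "Kunal"), (3, "B", "Priyanka"),
    (2, "B", "Pooja"), (2, "D", "Astha"), (1, "D", "Astha"),
]

SIMILARITY_SQL = """
WITH counts AS (              -- actor-activity matrix in long form
    SELECT actor, activity, COUNT(*) AS n
    FROM event_log
    GROUP BY actor, activity
),
norms AS (                    -- |A| for every actor vector
    SELECT actor, py_sqrt(SUM(n * n)) AS norm
    FROM counts
    GROUP BY actor
)
SELECT a.actor, b.actor, SUM(a.n * b.n) / (na.norm * nb.norm) AS cosine
FROM counts AS a
JOIN counts AS b ON a.activity = b.activity AND a.actor < b.actor
JOIN norms AS na ON na.actor = a.actor
JOIN norms AS nb ON nb.actor = b.actor
GROUP BY a.actor, b.actor, na.norm, nb.norm
ORDER BY a.actor, b.actor
"""


def similar_task(rows):
    conn = sqlite3.connect(":memory:")
    # SQLite's own sqrt() is a compile-time option, so register Python's instead.
    conn.create_function("py_sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE event_log (case_id INTEGER, activity TEXT, actor TEXT)")
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", rows)
    result = {(x, y): c for x, y, c in conn.execute(SIMILARITY_SQL)}
    conn.close()
    return result


for pair, value in similar_task(ROWS).items():
    print(pair, round(value, 4))
```

The condition `a.actor < b.actor` emits each unordered pair once, and pairs with no shared activity never join, so they are simply absent (similarity 0). Whether a set-based query beats the cursor-based procedure in MySQL was not measured in the thesis.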

62 Similar-Task Algorithm in SQL: Declare and iterate the cursor. Declare a cursor that selects the distinct tasks from the table, open the cursor, and loop through the results it returns.

63 Similar-Task Algorithm in SQL: Create the result table. Dynamically create a table with the specified table name by preparing an SQL statement from the query string and executing it.

64 Similar-Task Algorithm in SQL: Fetch actor vectors and calculate cosine similarity (I). Prepare the query that inserts into the result table, and define variables to hold the values needed for the cosine-similarity calculation.

65 Similar-Task Algorithm in SQL: Fetch actor vectors and calculate cosine similarity (II). Inside the cursor loop, collect the distinct tasks from the tables for the required calculation.

66 Similar-Task Algorithm in SQL: Fetch actor vectors and calculate cosine similarity (III). Append each part of the cosine-similarity calculation to the SQL query.

67 Similar-Task Algorithm in SQL: Update final results (I). Declare a cursor over all distinct teams and iterate through it.

68 Similar-Task Algorithm in SQL: Update final results (II). Form a query that creates a table with the distinct teams as columns; inside the cursor loop, append each distinct team as a column of the table.

69 Similar-Task Algorithm in SQL: Update final results (III). Form a query that inserts values into the resultant table; inside the cursor loop, place each similarity value in the column of the matching team.
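Slides 67 to 69 amount to a dynamic pivot: the column list of the result table is only known at run time. A minimal Python/SQLite sketch of that idea follows; the table name `sim_matrix`, the helpers `quote_ident` and `write_similarity_matrix`, and the sample values are hypothetical. Column names are quoted because actor or team names may contain spaces or other characters that are not valid in bare identifiers.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a column name for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def write_similarity_matrix(conn, actors, similarity):
    """Create a table with one REAL column per actor, then insert one row per
    actor with each similarity value placed in the matching actor's column."""
    columns = ", ".join(f"{quote_ident(a)} REAL" for a in actors)
    conn.execute(f"CREATE TABLE sim_matrix (actor TEXT PRIMARY KEY, {columns})")
    placeholders = ", ".join("?" * (len(actors) + 1))
    for row_actor in actors:
        values = [
            1.0 if col == row_actor
            # Pairwise results may be stored in either order, so try both keys.
            else similarity.get((row_actor, col), similarity.get((col, row_actor), 0.0))
            for col in actors
        ]
        conn.execute(f"INSERT INTO sim_matrix VALUES ({placeholders})", [row_actor, *values])


conn = sqlite3.connect(":memory:")
actors = ["Nidhi", "Kunal", "Pooja"]
similarity = {("Nidhi", "Pooja"): 0.6325, ("Kunal", "Nidhi"): 0.3162}  # illustrative values
write_similarity_matrix(conn, actors, similarity)
print(conn.execute("SELECT * FROM sim_matrix ORDER BY actor").fetchall())
```

Only identifiers are built into the SQL string; the values themselves still go through `?` placeholders, which keeps the dynamic statement safe from quoting problems in the data.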

70 Sub-Contract Algorithm in SQL: Steps. The SQL implementation of the Sub-Contract algorithm falls into four broad steps: (1) create a table to store the results; (2) find the distinct case identifiers; (3) update normal and find sub-contraction within each case; (4) normalize the result.

71 Sub-Contract Algorithm in SQL: Create the result table (I). Declare a cursor that selects the distinct actors and iterate through it to collect them.

72 Sub-Contract Algorithm in SQL: Create the result table (II). Form a query that creates the table; inside the cursor loop, append each distinct actor to the query as a column.

73 Sub-Contract Algorithm in SQL: Find distinct case identifiers. Declare a cursor that selects the distinct case identifiers with at least three events (a shorter case cannot contain the i, j, i pattern), iterate through it, and call the procedure ExecuteCase for each case identifier.

74 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction (I). Update normal.

75 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction (II). Declare a cursor to find the sub-contracting actors.

76 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction (III). Iterate through the cursor to find the IDs of the actors.

77 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction (IV). Declare a cursor to find the sub-contracting actors and iterate through it to find their IDs.

78 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction (V). For each pair of sub-contracting actors, insert or update the sub-contract value between them.

79 Sub-Contract Algorithm in SQL: Normalize the result. Declare a cursor that selects the distinct actors forming the columns of the result table; for each column, form an update query that divides its values by normal.
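Where the thesis walks through each case with cursors, detection and normalization can alternatively be expressed with SQL window functions. The sketch below runs on SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or later). The `seq` ordering column, the table layout and the function `sub_contract` are assumptions; `normal` is passed in (4.0, the value slide 41 reports for the sample log) instead of being computed, and only direct i, j, i patterns are counted.

```python
import sqlite3

# (case_id, seq, actor): seq is a hypothetical per-case ordering column
# standing in for the event timestamp (Table 6 order).
ROWS = [
    (1, 1, "Nidhi"), (1, 2, "Priyanka"), (1, 3, "Nidhi"), (1, 4, "Astha"),
    (2, 1, "Nidhi"), (2, 2, "Kunal"), (2, 3, "Pooja"), (2, 4, "Astha"),
    (3, 1, "Pooja"), (3, 2, "Kunal"), (3, 3, "Priyanka"),
]

SUBCONTRACT_SQL = """
WITH windowed AS (
    SELECT actor,
           LEAD(actor, 1) OVER (PARTITION BY case_id ORDER BY seq) AS middle_actor,
           LEAD(actor, 2) OVER (PARTITION BY case_id ORDER BY seq) AS third_actor
    FROM event_log
)
SELECT actor AS i, middle_actor AS j, COUNT(*) * 1.0 / ? AS normalized
FROM windowed
WHERE third_actor = actor           -- pattern i, j, i inside one case
GROUP BY actor, middle_actor
ORDER BY actor, middle_actor
"""


def sub_contract(rows, normal):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_log (case_id INTEGER, seq INTEGER, actor TEXT)")
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", rows)
    result = conn.execute(SUBCONTRACT_SQL, (normal,)).fetchall()
    conn.close()
    return result


print(sub_contract(ROWS, normal=4.0))  # [('Nidhi', 'Priyanka', 0.25)]
```

`PARTITION BY case_id` keeps the window inside one case, which is what makes the relation case dependent; the last two events of a case have no `third_actor`, so the NULL comparison filters them out.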

80 Presentation Outline: 4. Implementation of Similar-Task and Sub-Contract Algorithms in CYPHER, Graph Oriented

81 Similar-Task Algorithm in CYPHER: Steps. The CYPHER implementation of the Similar-Task algorithm consists of two broad functions: (1) load the data so that actor and activity nodes are unique; (2) calculate the cosine similarity between actors.

82 Similar-Task Algorithm in CYPHER: Load actor and activity nodes uniquely. Load the data directly from the data file and create a unique node for each actor and each activity.

83 Similar-Task Algorithm in CYPHER: Calculate cosine similarity. Match the activities common to each pair of actors and calculate their similarity.
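In the graph model, matching common activities is a two-hop traversal: from an actor through the activities it performed to the other actors who performed them. Without assuming a running Neo4j server, the Python sketch below mimics that traversal on an in-memory adjacency structure. The class `ActorActivityGraph` and its methods are hypothetical, storing a count on each actor-activity edge is an assumption, and this is neither CYPHER nor the thesis's query.

```python
from collections import defaultdict
from math import sqrt


class ActorActivityGraph:
    """Hypothetical in-memory stand-in for (actor)-[PERFORMED {count}]->(activity)."""

    def __init__(self) -> None:
        self.performed = defaultdict(dict)    # actor node -> {activity node: count}
        self.performed_by = defaultdict(set)  # activity node -> actor nodes

    def load(self, events) -> None:
        """One node per distinct actor/activity; repeated events bump the edge count."""
        for _case_id, activity, actor in events:
            edges = self.performed[actor]
            edges[activity] = edges.get(activity, 0) + 1
            self.performed_by[activity].add(actor)

    def _norm(self, actor: str) -> float:
        return sqrt(sum(c * c for c in self.performed[actor].values()))

    def similar_to(self, actor: str) -> dict:
        """Traverse actor -> shared activity -> other actor, then score with cosine."""
        dots = defaultdict(int)
        for activity, count in self.performed[actor].items():
            for other in self.performed_by[activity]:
                if other != actor:
                    dots[other] += count * self.performed[other][activity]
        return {o: d / (self._norm(actor) * self._norm(o)) for o, d in dots.items()}


graph = ActorActivityGraph()
graph.load([(1, "A", "Nidhi"), (2, "A", "Nidhi"), (1, "C", "Nidhi"),
            (2, "C", "Kunal"), (3, "A", "Pooja"), (2, "B", "Pooja")])
print({k: round(v, 4) for k, v in sorted(graph.similar_to("Nidhi").items())})
# {'Kunal': 0.4472, 'Pooja': 0.6325}
```

Only actors reachable through a shared activity are ever visited, which is the intuition behind the graph formulation; actors with nothing in common are never touched.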

84 Sub-Contract Algorithm in CYPHER: Steps. The CYPHER implementation of the Sub-Contract algorithm consists of four broad functions: (1) identify sub-contracting actors within each case; (2) collect the unique actor names and create a new node for each; (3) set the sub-contraction strength between unique actor nodes; (4) calculate normal and normalize the sub-contraction values.

85 Sub-Contract Algorithm in CYPHER: Identify sub-contracting actors. Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.

86 Sub-Contract Algorithm in CYPHER: Collect unique names and create unique actor nodes. Collect the unique actor names and create a new UNIQUEACTOR node for each distinct actor name found.

87 Sub-Contract Algorithm in CYPHER: Set sub-contraction strength between unique actors. For every pair of sub-contracting actors, determine the strength of sub-contraction between them.

88 Sub-Contract Algorithm in CYPHER: Calculate normal and normalize the result. Calculate normal, then normalize the sub-contraction strength between actors by dividing it by normal.
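Slides 86 to 88 move from per-event actor nodes to one UNIQUEACTOR node per name, with a normalized strength between them. Assuming, as slides 85 and 86 suggest, that the loaded actor nodes are per event, the Python sketch below shows only that aggregation idea on plain objects. `EventActorNode`, `collapse_to_unique_actors` and the example edges are hypothetical, `normal` is simply passed in, and the sketch does not reproduce the thesis's CYPHER queries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EventActorNode:
    """Hypothetical per-event actor node: one per event, so a name can repeat."""

    case_id: int
    position: int
    name: str


def collapse_to_unique_actors(related_to, normal):
    """Merge RELATED_TO edges between per-event nodes into one edge per pair of
    unique actors, weighted by how often the pair occurs, divided by `normal`."""
    strength = Counter((src.name, dst.name) for src, dst in related_to)
    unique_actors = sorted({name for pair in strength for name in pair})
    normalized = {pair: count / normal for pair, count in strength.items()}
    return unique_actors, normalized


# Hypothetical RELATED_TO edges (i -> j) found in two different cases.
edges = [
    (EventActorNode(1, 1, "Nidhi"), EventActorNode(1, 2, "Priyanka")),
    (EventActorNode(7, 3, "Nidhi"), EventActorNode(7, 4, "Priyanka")),
    (EventActorNode(7, 5, "Kunal"), EventActorNode(7, 6, "Pooja")),
]
actors, weights = collapse_to_unique_actors(edges, normal=4.0)
print(actors)   # ['Kunal', 'Nidhi', 'Pooja', 'Priyanka']
print(weights)  # {('Nidhi', 'Priyanka'): 0.5, ('Kunal', 'Pooja'): 0.25}
```

Collapsing to unique actor nodes turns many small per-case relationships into a compact weighted social network, which is the structure the organizational analysis reads.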

89 Presentation Outline: 5. Experimental Dataset

90 Experimental Dataset. We use the Business Process Intelligence Challenge 2014 (BPI 2014) dataset for our experiments. The log contains events from the incident and problem management system of Rabobank Group ICT, i.e., data about how requests are managed at Rabobank Group ICT. Contains total records.

91 Dataset Details. Fig. 11: Sample event log from MySQL.

92 Presentation Outline: 6. Performance Comparison

93 Similar-Task Algorithm: Load Time. Table 9: Data load time (columns: dataset size, for five dataset sizes; load time in msec for MySQL and Neo4j). Fig. 12: Load time.

94 Similar-Task Algorithm: Execution Time (I). Table 10: Execution time of Step-8 and Step-9 (columns: dataset size; execution time in msec of Step-8 and of Step-9, each for MySQL and Neo4j).

95 Similar-Task Algorithm: Execution Time (II). Fig. 13: Execution time of Step-8 and Step-9.

96 Similar-Task Algorithm: Disk Usage in MySQL (I). Table 11: Disk space usage in MySQL (columns: dataset size; size of the tables Dataset, OTMatrix, InitSim and FinalSim).

97 Similar-Task Algorithm: Disk Usage in MySQL (II). Fig. 14: Disk space usage in MySQL.

98 Similar-Task Algorithm: Disk Usage in Neo4j (I). Table 12: Disk space usage in Neo4j (columns: dataset size; graph elements: nodes, relationships, properties).

99 Similar-Task Algorithm: Disk Usage in Neo4j (II). Fig. 14: Disk space usage in Neo4j.

100 Sub-Contract Algorithm: Load Time. Table 13: Data load time (columns: dataset size, for five dataset sizes; load time in msec for MySQL and Neo4j). Fig. 15: Load time.

101 Sub-Contract Algorithm: Execution Time in MySQL (I). Table 14: Execution time of the four main steps in MySQL (columns: dataset size; execution time in msec for Update Normal, Sub-Contract Detection, Update Result and Normalize Result).

102 Sub-Contract Algorithm: Execution Time in MySQL (II). Fig. 16: Execution time of the four main steps in MySQL.

103 Sub-Contract Algorithm: Execution Time in Neo4j (I). Table 15: Execution time of the four main steps in Neo4j (columns: dataset size; execution time in msec for Update Normal, Sub-Contract Detection, Update Result and Normalize Result).

104 Sub-Contract Algorithm: Execution Time in Neo4j (II). Fig. 17: Execution time of the four main steps in Neo4j.

105 Sub-Contract Algorithm: Disk Space Usage in MySQL (I). Table 15: Disk space usage in MySQL (columns: dataset size; size of the tables Dataset, OrganisedData and ResultMatrix).

106 Sub-Contract Algorithm: Disk Space Usage in MySQL (II). Fig. 17: Disk space usage in MySQL.

107 Sub-Contract Algorithm: Disk Space Usage in Neo4j (I). Table 16: Disk space usage for graph elements in Neo4j (columns: dataset size; nodes, relationships, properties).

108 Sub-Contract Algorithm: Disk Space Usage in Neo4j (II). Fig. 18: Disk space usage for graph elements in Neo4j.

109 Presentation Outline: 7. Conclusion

110 Conclusion. Neo4j performs better when loading data. Read operations are comparatively faster in MySQL for a single-node setup. Neo4j performs much better whenever relationships are of prime importance. Write performance varied greatly in both cases. For smaller datasets MySQL performs better, whereas for larger datasets Neo4j performs better.

111 Presentation Outline: 8. Limitations and Future Work

112 Limitations and Future Work: Limitations. Only different sizes of a single dataset were used. Only single-node database setups were used. Only two organizational mining metrics were studied.

113 Limitations and Future Work: Future Work. Apply the algorithms to larger datasets. Create a multi-node Neo4j setup and implement the algorithms on it. Implement process enhancement and recommendation systems and study their impact. Experiment with more relational and graph-oriented databases.

114 Presentation Outline: 9. References

115 References (I)
[1] W. M. P. van der Aalst. Process Mining: Overview and Opportunities. ACM.
[2] P. Neubauer. Graph databases, NOSQL and Neo4j.
[3] I. Robinson, J. Webber, E. Eifrem. Graph Databases.
[4] M. Song, W. M. P. van der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.

116 References (II)
[5] C. Ordonez. Programming the K-means clustering algorithm in SQL.
[6] C. Ordonez, P. Cereghini. SQLEM: Fast clustering in SQL using the EM algorithm. International Conference on Management of Data.
[7] F. Berzal, J. C. Cubero, N. Marin, J. M. Serrano. TBAR: An efficient method for association rule mining in relational databases. Elsevier.
[8] K.-U. Sattler, O. Dunemann. SQL database primitives for decision tree classifiers. Conference on Information and Knowledge Management.

117 References (III)
[9] W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan. GraphMiner: A structural pattern mining system for large disk-based graph databases and its applications. ACM.
[10] C. Wang, W. Wang, Y. Zhu, B. Shi, J. Pei. Scalable mining of large disk-based graph databases. ACM.
[11] J. Huan, W. Wang, J. Prins. SPIN: Mining maximal frequent subgraphs from graph databases. ACM.
[12] T. Ozaki, T. Ohkawa. Mining correlated subgraphs in graph databases. Advances in Knowledge Discovery and Data Mining.

118 References (IV)
[13] C. Vicknair, M. Macias, Z. Zhao, X. Nan, Y. Chen. A comparison of a graph database and a relational database: A data provenance perspective. ACM.
[14] R. C. McColl, D. Ediger, J. Poovey, D. Campbell. A performance evaluation of open-source graph databases. ACM.
[15] M. Ciglan, A. Averbuch, L. Hluchy. Benchmarking graph traversal operations over graph databases. IEEE.
[16] P. Macko, D. Margo, M. Seltzer. Performance introspection of graph databases. ACM.

119 References (V)
[17] Why NoSQL? Couchbase.
[18] Scale-out vs. Scale-up.
[19] Introduction to Graph Databases and Neo4j.
[20] From Relational to Neo4j.
[21] Cosine Similarity.
