VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages.
1 VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages. Jeevan Joishi, MTech Research Associate, Software Analytics Research Lab (SARL).
2 MTech Thesis Evaluation Committee Members. Thesis Adviser: Prof. Ashish Sureka, Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens Corporate Research and Technology; Faculty In-charge, Software Analytics Research Lab (SARL). External Examiner: Dr. Radha Krishna Pisipati, Principal Research Scientist at Infosys Technologies Limited. Internal Examiner: Prof. Sandip Aine, Faculty Member at IIIT-Delhi.
3 Outline: 1. Research Motivation and Aim 2. Related Work and Novel Research Contributions 3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS 4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented 5. Experimental Dataset 6. Performance Comparison 7. Conclusion 8. Limitations 9. References
5 Research Motivation and Aim: Why NoSQL? The global population accessing the internet has increased tremendously. Most applications are hosted in the cloud and need to support users 24 hours a day, 365 days a year. Fig 1: Scale of internet usage (figure taken from [17]).
6 Research Motivation and Aim: Why NoSQL? Data is captured in huge volumes and consists of both structured and unstructured data. The amount of data is growing rapidly, and the nature of the data is changing as well. Fig 2: Growth of data (figure taken from [17]).
7 Research Motivation and Aim: Why NoSQL? What is wrong with relational databases? Nothing! Relational databases employ a one-size-fits-all philosophy for storage. They are used when strong consistency is a must. They can create problems when it is time to scale.
8 Research Motivation and Aim: Why NoSQL? The explosion of social media sites such as Facebook and Twitter brought large data needs. These sites had to capture and process very large volumes of data in ways that were difficult with a traditional RDBMS. Traditional databases are designed to scale up; we require a database that can scale out. When relational applications become successful, usage goes up. Joins are inherent in an RDBMS and become very slow as data grows. Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.
9 Research Motivation and Aim: Why NoSQL? We require a technology that scales out rather than scaling up. Scale up: add more processors and memory. Scale out: add more servers. Fig 3: Scale-up vs. scale-out (figure taken from [18]).
10 Research Motivation and Aim: Introduction to NoSQL. Hence, NoSQL ("Not Only SQL") databases were introduced. They are non-relational data stores. They do not require a fixed table schema. They do not strictly follow the ACID properties of databases and instead focus on CAP (Consistency, Availability, Partition Tolerance). Examples are column stores, graph databases and document stores.
11 Research Motivation and Aim: Introduction to NoSQL, RDBMS vs. NoSQL. Scale up vs. scale out; normalization vs. de-normalization; ACID vs. CAP; schema vs. schema-less; structured data vs. unstructured data.
12 Research Motivation and Aim: Row Oriented vs. Graph Oriented Database
Table 1: An RDBMS table.
Record No  Name            Address             City       State
01         Jeevan Joishi   Uniworld Apartment  Bangalore  Karnataka
02         Kunal Gupta     15th Cross Road     Kanpur     Uttar Pradesh
03         Priyanka Verma  Sector-7            Jind       Haryana
04         Nidhi Agarwal   JJ colony           Bhiwani    Haryana
Fig 4: A graph model (figure taken from [19]).
13 Research Motivation and Aim: Row Oriented vs. Graph Oriented Database. In a row-oriented database, reading specific attributes requires reading the whole record. Joins in relational databases are compute-intensive. Graph databases, in contrast, can read individual values based on nodes, relationships or properties. They avoid joins by traversing relationships using index-free adjacency.
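The contrast between a join and index-free adjacency can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the `knows` relationship, the `people` and `knows_rows` structures and the `Node` class are invented for illustration and are not how MySQL or Neo4j store data internally.

```python
from dataclasses import dataclass, field

# Relational-style storage: a table of people plus a separate link table.
people = {1: "Nidhi", 2: "Kunal", 3: "Priyanka"}
knows_rows = [(1, 2), (1, 3), (2, 3)]  # hypothetical (person_id, friend_id) rows


def friends_via_join(person_id):
    """Scan the link table, then look each match up in the people table."""
    return [people[fid] for pid, fid in knows_rows if pid == person_id]


@dataclass
class Node:
    """Graph-style storage: each node holds direct references to its neighbours."""
    name: str
    neighbours: list = field(default_factory=list)


def friends_via_adjacency(node):
    """Follow the stored references; no link table or index is consulted."""
    return [n.name for n in node.neighbours]


nidhi, kunal, priyanka = Node("Nidhi"), Node("Kunal"), Node("Priyanka")
nidhi.neighbours += [kunal, priyanka]
kunal.neighbours.append(priyanka)
```

Both functions return the same friends for Nidhi; the difference is that the first one has to search a shared structure while the second only touches the node it starts from.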
14 Research Motivation and Aim: Row Oriented vs. Graph Oriented Database. Fig. 5: Relationships in relational databases (figure taken from [20]).
16 Research Motivation and Aim: Row Oriented vs. Graph Oriented Database. Fig. 6: Relationships in graph databases (figure taken from [20]).
17 Research Motivation and Aim: Row Oriented vs. Graph Oriented Database, Non-native vs. Native Graph Processing. Fig 7: Non-native graph processing using a global lookup index. Fig 8: Native graph processing using index-free adjacency.
18 Research Motivation and Aim: Process Mining. Process mining is the analysis of a process using event log data. One of its key aspects is studying the social structure of the organization using event logs. Fig 9: Types of process mining techniques.
19 Research Motivation and Aim: Process Mining. Process mining focuses on the analysis of processes using the data present in event logs. Each event in an event log records the details of an activity. Each event is associated with a case identifier (CaseID). Each event has a timestamp. Each event has an activity that is being performed. Each event has an actor that handles it. Additionally, each event may include a unique identifier.
20 Research Motivation and Aim: Process Mining. Fig. 10: An example event log.
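The event attributes listed above map naturally onto a small record type. The following Python sketch is one possible representation; the field names, the `group_by_case` helper and the timestamps are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                    # case identifier (CaseID)
    activity: str                   # activity being performed
    actor: str                      # actor handling the event
    timestamp: datetime             # when the event happened
    event_id: Optional[str] = None  # optional unique identifier


def group_by_case(events):
    """Return {case_id: [events ordered by timestamp]}."""
    cases = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        cases.setdefault(ev.case_id, []).append(ev)
    return cases


# Illustrative events; the timestamps are invented.
log = [
    Event("1", "B", "Priyanka", datetime(2014, 1, 1, 10, 0)),
    Event("1", "A", "Nidhi", datetime(2014, 1, 1, 9, 0)),
    Event("2", "A", "Nidhi", datetime(2014, 1, 1, 9, 30)),
]
traces = group_by_case(log)
```

Grouping by case and ordering by timestamp gives the per-case traces that case-dependent algorithms such as Sub Contract operate on.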
31 Research Motivation and Aim: Process Mining. Three types of process mining techniques: 1. Process Discovery 2. Process Conformance 3. Process Enhancement. Three process mining perspectives: 1. Control Flow Perspective 2. Organizational Perspective 3. Case Perspective.
32 Research Motivation and Aim: Process Mining, Similar Task Algorithm. The Similar Task algorithm identifies actors who perform similar activities, within the organizational perspective. It considers the activities the actors perform irrespective of cases. It is based on the notion that people doing similar things have a stronger relation than people doing different things.
33 Research Motivation and Aim: Process Mining, Similar Task Algorithm
Table 2: Sample Event Log
Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
2                A                    Nidhi
2                C                    Kunal
1                B                    Priyanka
3                A                    Pooja
1                C                    Nidhi
3                D                    Kunal
3                B                    Priyanka
2                B                    Pooja
2                D                    Astha
1                D                    Astha
Table 3: Actor-Activity Matrix (rows: Nidhi, Kunal, Priyanka, Pooja, Astha; columns: A, B, C, D)
34 Research Motivation and Aim: Process Mining, Similar Task Algorithm. Given two vectors of attributes, A and B, the cosine similarity is given by $\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$ (figure taken from [21]). Table 4: Cosine Similarity Values (rows and columns: Nidhi, Kunal, Priyanka, Pooja, Astha).
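As a worked illustration of the Similar Task idea, the sketch below builds an actor-activity count matrix from the sample event log of Table 2 and applies cosine similarity to every pair of actors. It is a minimal Python sketch, not the thesis implementation; the function names are made up for this example.

```python
from collections import Counter
from math import sqrt

# Sample event log from Table 2: (case, activity, actor).
LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]


def actor_activity_matrix(log):
    """Count how often each actor performed each activity, ignoring cases."""
    matrix = {}
    for _case, activity, actor in log:
        matrix.setdefault(actor, Counter())[activity] += 1
    return matrix


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


matrix = actor_activity_matrix(LOG)
similarity = {(a, b): cosine(matrix[a], matrix[b]) for a in matrix for b in matrix}
```

For example, Nidhi (A twice, C once) and Kunal (C once, D once) share only activity C, so their dot product is 1 and their similarity is 1/sqrt(5 * 2); Nidhi and Astha share no activity, so their similarity is 0.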
35 Research Motivation and Aim: Similar Task Algorithm at a glance.
36 Research Motivation and Aim: Process Mining, Sub Contract Algorithm. The Sub Contract algorithm focuses on how work moves among performers. The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i. The relation between individuals is case dependent.
37 Research Motivation and Aim: Process Mining, Sub Contract Algorithm
Table 5: Sample Event Log
Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
2                A                    Nidhi
2                C                    Kunal
1                B                    Priyanka
3                A                    Pooja
1                C                    Nidhi
3                D                    Kunal
3                B                    Priyanka
2                B                    Pooja
2                D                    Astha
1                D                    Astha
Table 6: Organized Event Log
Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
1                B                    Priyanka
1                C                    Nidhi
1                D                    Astha
2                A                    Nidhi
2                C                    Kunal
2                B                    Pooja
2                D                    Astha
3                A                    Pooja
3                D                    Kunal
3                B                    Priyanka
41 Research Motivation and Aim: Process Mining, Sub Contract Algorithm. normal = 4.0. Table 7: Sub Contraction Values before Normalization (rows and columns: Nidhi, Kunal, Priyanka, Pooja, Astha). Table 8: Sub Contraction Values after Normalization (rows and columns: Nidhi, Kunal, Priyanka, Pooja, Astha).
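The counting step of the Sub Contract algorithm can be sketched in Python on the sample log. Two assumptions are made here and should be read as such: only direct patterns are counted (actor i, then actor j, then actor i again on three consecutive events of one case), and the divisor `normal` is passed in as a parameter, using the value 4.0 stated on the slide, because the slides do not spell out how it is derived.

```python
from collections import Counter, defaultdict

# Sample event log from Table 5: (case, activity, actor), in log order.
LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]


def organize(log):
    """Group actors by case, keeping the original event order (as in Table 6)."""
    cases = defaultdict(list)
    for case, _activity, actor in log:
        cases[case].append(actor)
    return cases


def subcontract_counts(log):
    """Count (i, j) whenever j acts between two consecutive acts of i in a case."""
    counts = Counter()
    for actors in organize(log).values():
        for first, middle, last in zip(actors, actors[1:], actors[2:]):
            if first == last and first != middle:
                counts[(first, middle)] += 1
    return counts


def normalize(counts, normal):
    """Divide every raw count by the given normalization constant."""
    return {pair: n / normal for pair, n in counts.items()}


raw = subcontract_counts(LOG)
scaled = normalize(raw, 4.0)  # 4.0 is the value of normal stated on the slide
```

Under these assumptions the only pattern in the sample is case 1, where Priyanka acts between two activities of Nidhi, giving a raw value of 1 and a normalized value of 0.25 for the pair (Nidhi, Priyanka).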
48 Research Motivation and Aim: Sub Contract Algorithm at a glance I.
49 Research Motivation and Aim: Sub Contract Algorithm at a glance II.
50 Research Motivation and Aim. Query languages provide the most standard way to interact with a database. We implement process mining algorithms in database query languages to the extent possible, so that our application is tightly coupled to the database. Our work lies at the intersection of process mining and NoSQL databases.
51 Research Motivation and Aim: Research Aim. To investigate the intersection of process mining and graph databases for detecting social and hierarchical structures. To understand application needs that can be modelled in this new domain. To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL. To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j. To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.
53 Related Work and Novel Research Contributions: Implementation of Mining Algorithms in Relational Databases. Ordonez et al. [5] implement the k-means clustering algorithm in SQL to cluster large datasets in an RDBMS; they define suitable tables, index them and write suitable queries for clustering. Ordonez et al. [6] extend their own work in [5] with an efficient implementation of the EM algorithm for clustering very large datasets.
54 Related Work and Novel Research Contributions: Implementation of Mining Algorithms in Relational Databases. Berzal et al. [7] implemented tree-based association rule mining to discover interesting patterns in relational databases. Sattler et al. [8] applied data mining techniques to decision tree classifiers, with a tight coupling of data mining and database systems.
55 Related Work and Novel Research Contributions: Implementation of Mining Algorithms in Graph Databases. Wang et al. [9] studied structural pattern mining for large disk-based graph databases; they presented a novel ADI index structure and efficient algorithms for mining frequent patterns. Wang et al. [10] presented techniques for scalable mining in graph databases.
56 Related Work and Novel Research Contributions: Implementation of Mining Algorithms in Graph Databases. Huan et al. [11] presented a novel technique to mine maximal frequent subgraphs in graph databases. Ozaki et al. [12] introduced hyper-clique patterns in graph databases and used them to detect highly correlated subgraphs.
57 Related Work and Novel Research Contributions: Performance Comparison of Mining Algorithms in Relational and Graph Databases. Vicknair et al. [13] compared the performance of relational and graph databases for data provenance systems. McColl et al. [14] evaluated the performance of a series of open-source graph databases, using various graph algorithms on a graph setup of 256 million nodes.
58 Related Work and Novel Research Contributions: Performance Comparison of Mining Algorithms in Relational and Graph Databases. Ciglan et al. [15] benchmarked graph databases on graph traversal algorithms. Macko et al. [16] presented PIG, a performance introspection framework for graph databases; PIG provides tools and mechanisms to understand the performance of a graph database.
59 Related Work and Novel Research Contributions: Novel Research Contributions. While there has been work on implementing data mining algorithms in relational and graph databases, we are the first to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the row-oriented database MySQL using SQL, and the first to implement them in the graph-oriented database Neo4j using CYPHER. We also benchmark the performance of these organizational mining algorithms on MySQL and Neo4j.
61 Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS: Similar Task Algorithm, Steps. The implementation of the Similar-Task algorithm in SQL can be divided into four broad tasks: declare and iterate a cursor to select distinct tasks; create a table to store the result; fetch the actor vectors and calculate the cosine similarity; write the results to the result table.
62 Similar Task Algorithm in SQL: Define and iterate cursor. Declare a cursor to select distinct tasks from the table. Open the cursor. Loop through the results returned by the cursor.
63 Similar Task Algorithm in SQL: Declare table to store results. Dynamically create a table with the specified table name. Prepare SQL statements from the query and execute them.
64 Similar Task Algorithm in SQL: Fetch actors vector and calculate Cosine-Similarity I. Prepare the query to insert into the table. Define variables to store values for the cosine-similarity calculation.
65 Similar Task Algorithm in SQL: Fetch actors vector and calculate Cosine-Similarity II. Inside the cursor loop, collect distinct tasks from the tables for the required calculation.
66 Similar Task Algorithm in SQL: Fetch actors vector and calculate Cosine-Similarity III. Append the parts of the cosine-similarity calculation to the SQL query.
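To make the SQL side of the similarity calculation concrete, here is a small runnable sketch using Python's built-in `sqlite3` module. It deliberately swaps techniques: instead of MySQL stored procedures with cursors and dynamically built queries, it uses set-based `GROUP BY` and a self-join in SQLite. The table and column names are assumptions (`ot_matrix` merely echoes the OTMatrix table name that appears in the disk usage results), and the square root is taken in Python because SQLite math functions are not available in every build.

```python
import sqlite3
from math import sqrt

# Sample event log from Table 2: (case_id, activity, actor).
EVENTS = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (case_id INTEGER, activity TEXT, actor TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)

# Actor-activity counts: one row per non-zero cell of the matrix.
conn.execute("""
    CREATE TABLE ot_matrix AS
    SELECT actor, activity, COUNT(*) AS n
    FROM events GROUP BY actor, activity
""")

# Squared vector length per actor.
norms = dict(conn.execute("SELECT actor, SUM(n * n) FROM ot_matrix GROUP BY actor"))

# Dot product for each pair of actors that share at least one activity.
dots = conn.execute("""
    SELECT a.actor, b.actor, SUM(a.n * b.n)
    FROM ot_matrix a JOIN ot_matrix b ON a.activity = b.activity
    WHERE a.actor < b.actor
    GROUP BY a.actor, b.actor
""").fetchall()

similarity = {(x, y): dot / sqrt(norms[x] * norms[y]) for x, y, dot in dots}
```

Pairs with no common activity do not appear in the join result, which corresponds to a similarity of zero. A cursor-based version, as described on the slides, would compute the same sums one distinct task at a time.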
67 Similar Task Algorithm in SQL: Update Final Results I. Declare a cursor to get all distinct teams. Iterate through the cursor to get the distinct teams.
68 Similar Task Algorithm in SQL: Update Final Results II. Form a query for creating a table that takes the distinct teams as columns. Inside the cursor loop, append the distinct teams as columns of the table.
69 Similar Task Algorithm in SQL: Update Final Results III. Form a query for inserting values into the resultant table. Inside the cursor loop, assign the similarity values to the respective columns (matching teams).
70 Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS: Sub-Contract Algorithm, Steps. The Sub-Contract algorithm implementation can be studied under four broad categories: create a table to store results; find distinct case identifiers; update normal and find sub-contraction within each case; normalize the result.
71 Sub-Contract Algorithm in SQL: Create table to store results I. Declare a cursor to select distinct actors. Iterate through the cursor to collect the distinct actors.
72 Sub-Contract Algorithm in SQL: Create table to store results II. Form a query to create a table. Inside the cursor loop, append each distinct actor as part of the query.
73 Sub-Contract Algorithm in SQL: Find distinct case identifiers. Declare a cursor to select distinct case identifiers with count >= 3. Iterate through the cursor. For each distinct case identifier, call the procedure ExecuteCase.
74 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction I. Update normal.
75 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction II. Declare a cursor to find sub-contracting actors.
76 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction III. Iterate through the cursor to find the IDs of the actors.
77 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction IV. Declare a cursor to find sub-contracting actors. Iterate through the cursor to find the IDs of the sub-contracting actors.
78 Sub-Contract Algorithm in SQL: Update normal and find sub-contraction V. For any pair of sub-contracting actors, insert or update the sub-contract value between them.
79 Sub-Contract Algorithm in SQL: Normalize the result. Declare a cursor to select the distinct actors that form the columns of the result table. For each column, form an update query and normalize it by normal.
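The four categories above can be walked through with a small runnable sketch, again using Python's `sqlite3` rather than MySQL stored procedures. Several things are assumptions and not taken from the slides: the `seq` column that records event order, the long-format `result` table (the slides describe a table with one column per actor), the Python function `execute_case` standing in for the ExecuteCase procedure, the restriction to direct i, j, i patterns, and the use of the fixed value 4.0 for normal.

```python
import sqlite3

# Sample event log from Table 5: (case_id, activity, actor), in log order.
EVENTS = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]

conn = sqlite3.connect(":memory:")
# seq is an assumed ordering column; it auto-increments in insertion order.
conn.execute("CREATE TABLE events (seq INTEGER PRIMARY KEY, case_id INTEGER, "
             "activity TEXT, actor TEXT)")
conn.executemany("INSERT INTO events (case_id, activity, actor) VALUES (?, ?, ?)", EVENTS)

# Find distinct case identifiers with at least three events.
eligible = [row[0] for row in conn.execute(
    "SELECT case_id FROM events GROUP BY case_id HAVING COUNT(*) >= 3 ORDER BY case_id")]


def execute_case(case_id):
    """Return (i, j) pairs where j acted between two consecutive acts of i."""
    actors = [row[0] for row in conn.execute(
        "SELECT actor FROM events WHERE case_id = ? ORDER BY seq", (case_id,))]
    return [(a, b) for a, b, c in zip(actors, actors[1:], actors[2:])
            if a == c and a != b]


# Long-format result table: one row per (actor, subcontractor) pair.
conn.execute("CREATE TABLE result (actor TEXT, subcontractor TEXT, strength REAL, "
             "PRIMARY KEY (actor, subcontractor))")
for case_id in eligible:
    for actor, sub in execute_case(case_id):
        # Insert the pair if it is new, then increment its strength.
        conn.execute("INSERT OR IGNORE INTO result VALUES (?, ?, 0)", (actor, sub))
        conn.execute("UPDATE result SET strength = strength + 1 "
                     "WHERE actor = ? AND subcontractor = ?", (actor, sub))

NORMAL = 4.0  # value of normal shown on the slide for this sample
conn.execute("UPDATE result SET strength = strength / ?", (NORMAL,))
rows = conn.execute("SELECT actor, subcontractor, strength FROM result").fetchall()
```

On the sample log all three cases have at least three events, and the only sub-contracting pair found is Nidhi with Priyanka in case 1.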
81 Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented: Similar Task Algorithm, Steps. The implementation of the Similar Task algorithm in CYPHER consists mainly of two broad functions: load the data with actor and activity nodes being unique; calculate the cosine similarity between actors.
82 Similar Task Algorithm in CYPHER: Load actor and activity nodes uniquely. Load the data directly from the data file. Make unique nodes for each actor and activity.
83 Similar Task Algorithm in CYPHER: Calculate Cosine-Similarity. Match common activities between actors and calculate the similarity.
84 Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented: Sub Contract Algorithm, Steps. The implementation of the Sub Contract algorithm in CYPHER consists mainly of four broad functions: identify sub-contracting actors within each case; collect unique names and make new nodes for each of them; set the sub-contraction strength between unique actor nodes; calculate normal and normalize the sub-contraction value.
85 Sub Contract Algorithm in CYPHER: Identify sub-contracting actors. Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.
86 Sub Contract Algorithm in CYPHER: Collect unique names and create unique actor nodes. Collect unique actor names. Make a new UNIQUEACTOR node for each distinct actor name found.
87 Sub Contract Algorithm in CYPHER: Set sub-contraction strength between unique actors. For all sub-contracting actors, determine the strength of sub-contraction between them.
88 Sub Contract Algorithm in CYPHER: Calculate normal and normalize the result. Calculate normal. Normalize the sub-contraction strength between actors.
90 Experimental Dataset. We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments. The log contains events from an incident and problem management system of Rabobank Group ICT. It contains data about managing requests from Rabobank Group ICT. Contains total records.
91 Experimental Dataset: Dataset Details. Fig. 11: Sample event log from MySQL.
93 Performance Comparison: Similar Task Algorithm, Load Time. Table 9: Data Load Time (columns: dataset size; load time in msec for MySQL and Neo4j). Fig 12: Load Time.
94 Performance Comparison: Similar Task Algorithm, Execution Time I. Table 10: Execution Time of Step-8 & Step-9 (columns: dataset size; execution time in msec of Step-8 and Step-9, each for MySQL and Neo4j).
95 Performance Comparison: Similar Task Algorithm, Execution Time II. Fig. 13: Execution Time of Step-8 & Step-9.
96 Performance Comparison: Similar Task Algorithm, Disk Usage in MySQL I. Table 11: Disk Space Usage in MySQL (tables: Dataset, OTMatrix, InitSim, FinalSim; by dataset size).
97 Performance Comparison: Similar Task Algorithm, Disk Usage in MySQL II. Fig 14: Disk Space Usage in MySQL.
98 Performance Comparison: Similar Task Algorithm, Disk Usage in Neo4j I. Table 12: Disk Space Usage in Neo4j (graph elements: nodes, relationships, properties; by dataset size).
99 Performance Comparison: Similar Task Algorithm, Disk Usage in Neo4j II. Fig. 14: Disk Space Usage in Neo4j.
100 Performance Comparison: Sub Contract Algorithm, Load Time. Table 13: Load Time (columns: dataset size; load time in msec for MySQL and Neo4j). Fig 15: Load Time.
101 Performance Comparison: Sub Contract Algorithm, Execution Time in MySQL I. Table 14: Execution Time for 4 main steps in MySQL (columns: dataset size; execution time in msec for Update Normal, Sub-Contract Detection, Update Result, Normalize Result).
102 Performance Comparison: Sub Contract Algorithm, Execution Time in MySQL II. Fig 16: Execution Time for 4 main steps in MySQL.
103 Performance Comparison: Sub Contract Algorithm, Execution Time in Neo4j I. Table 15: Execution Time for 4 main steps in Neo4j (columns: dataset size; execution time in msec for Update Normal, Sub-Contract Detection, Update Result, Normalize Result).
104 Performance Comparison: Sub Contract Algorithm, Execution Time in Neo4j II. Fig. 17: Execution Time for 4 main steps in Neo4j.
105 Performance Comparison: Sub Contract Algorithm, Disk Space Usage in MySQL I. Table 15: Disk Space Usage in MySQL (tables: Dataset, OrganisedData, ResultMatrix; by dataset size).
106 Performance Comparison: Sub Contract Algorithm, Disk Space Usage in MySQL II. Fig 17: Disk Space Usage in MySQL.
107 Performance Comparison: Sub Contract Algorithm, Disk Space Usage in Neo4j I. Table 16: Disk Space Usage for graph elements in Neo4j (graph elements: nodes, relationships, properties; by dataset size).
108 Performance Comparison: Sub Contract Algorithm, Disk Space Usage in Neo4j II. Fig. 18: Disk Space Usage for graph elements in Neo4j.
110 Conclusion. Neo4j performs better when loading data. Read operations in MySQL are comparatively faster for a single-node setup. Neo4j gives much improved performance whenever relationships are of prime importance. Write performance varied greatly in both cases. For smaller datasets MySQL performs better, whereas for larger datasets Neo4j gives improved performance.
112 Limitations and Future Work: Limitations. Different sizes of a single dataset were used. A single-node setup of each database was used. Only two organizational mining metrics were used.
113 Limitations and Future Work: Future Work. Apply the algorithms to larger datasets. Create a multi-node Neo4j setup and implement the algorithms on it. Implement and study the impact of process enhancement and recommendation systems. Experiment with more relational and graph-oriented databases.
115 References References I WIL VAN DER AALST. Process Mining: Overview and Opportunities. ACM, vi, 2, 11 P Neubauer. Graph databases, NOSQL and Neo4j? I Robinson, J Webber, E Eifrem. Graph Databases 115/109 Minseok Song, WIL M. P. Van Der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.
116 References References II Carlos Ordonez. Programming the K-means clustering algorithm in SQL C. Ordonez and P. Cereghini. SQLEM: fast clustering in SQL using the EM algorithm. International Conference on Management of Data Nicolas Marin Jose Maria Serrano Fernando Berzal, Juan Carlos Cubero. TBRAR: An ecient method for association rule mining in relational databases. Elsevier, K-U.Sattler and O.Dunemann. SQL Database Primitives for Decision Tree Classiers. Conference on Information and Knowledge Management, /109
117 References References III W Wang, C Wang, Y Zhu, B Shi, J Pei, X Yan. Graphminer: a structural pattern mining system for large disk based graph databases and its applications. ACM, C Wang, W Wang, Y Zhu, B Shi, J Pei. Scalable Mining of large disk based graph databases. ACM, J Huan, W Wang, J Prins. SPIN: mining maximal frequent subgraphs from graph databases. ACM, T Ozaki, T Okhwaha. Mining correlated subgraphs in graph databases. Advancement in Knowledge Discovery and Data Mining, /109
118 References IV
C. Vicknair, M. Macias, Z. Zhao, X. Nan, Y. Chen. A comparison of a graph database and a relational database: a data provenance perspective. ACM.
R. C. McColl, R. Ediger, J. Poovey, D. Campbell. A performance evaluation of open-source graph databases. ACM.
M. Ciglan, A. Averbuch, L. Hluchy. Benchmarking traversal operations over graph databases. IEEE.
P. Macko, D. Margo, M. Seltzer. Performance introspection of graph databases. ACM.
118/109
119 References V
Why NoSQL? Couchbase.
Scale-out vs. Scale-up.
Introduction to Graph Databases and Neo4j.
From Relational to Neo4j.
Cosine-Similarity.
119/109