University of Waterloo. Storing Directed Acyclic Graphs in Relational Databases
University of Waterloo
Software Engineering

Storing Directed Acyclic Graphs in Relational Databases

Spotify USA Inc
New York, NY, USA

Prepared by
Soheil Koushan
Student ID:
User ID: skoushan
4A Software Engineering
May 8, 2017
Soheil Koushan
212 Parkview Ave
Toronto, ON, M2N 3Y8

May 8, 2017

Dr. P. Lam, Director
Software Engineering
University of Waterloo
Waterloo, ON, N2L 3G1

Dear Dr. P. Lam:

This report, titled "Storing Directed Acyclic Graphs in Relational Databases", is my third work term report. It is based on my experience at Spotify USA Inc for my 4A co-op term.

Spotify is a music streaming service. During my co-op, I worked on internal tools with a focus on making data at Spotify easier for developers to discover and understand. Part of that work was uncovering the dependencies between different datasets. Hence, the need arose to store dependency information in a database that is easy to query. Our team designed a schema that performs much faster than the conventional approach. This report outlines our proposed design.

I would like to thank all my coworkers, as well as my supervisor, for giving me the opportunity to work on this problem. I would also like to thank the authors of the numerous online resources, cited in the References and Acknowledgements sections, that helped me reach my conclusions. I hereby confirm that I have received no help, other than what is mentioned above, in writing this report. I also confirm that this report has not been previously submitted for academic credit at this or any other academic institution.

Sincerely,
Soheil Koushan
Student ID:
Executive Summary

Spotify, a leading music streaming service, has thousands of internal datasets, which are produced by hundreds of thousands of jobs running daily. Mapping the dependencies between these datasets is valuable for understanding what went into a piece of data and who consumes it. This mapping takes the form of a directed acyclic graph (DAG).

In this report, I propose a schema for storing directed acyclic graphs in a relational database. We consider directed acyclic graphs which have well-defined breakpoints. For example, a breakpoint can be a dataset that is recommended for consumption, and non-breakpoints can be intermediary datasets that are used in the final product but should not be consumed by others. We query this data by asking for all nodes leaving a breakpoint up until the next breakpoint.

Our design criteria are write speed, read speed, and space complexity for storing dense graphs using the schema.

The conventional approach is to use a table of nodes and a table of edges with transitive closure. The problem is that for dense graphs, the number of edges that need to be stored grows quadratically with the number of nodes. In addition, performing a database join is costly.

Our proposed design stores only a nodes table, but includes an array field containing all the nodes that have a path to this node, up until the previous breakpoint. This way, we encode in a single column information that would otherwise have been spread over many rows. This improves write time and removes the need for a join.

Our proposed solution performs significantly better on all the design criteria and meets the design constraints. The recommendation is to use the proposed schema instead of the more general, conventional approach, because it is specifically optimized for the types of queries we require.
Table of Contents

Executive Summary
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Problem Specification
2.1 Problem Statement
2.2 Design Constraints
2.3 Design Criteria
3 Design alternatives
3.1 Conventional design: adjacency list with transitive closure
3.2 Proposed design: accumulation array
4 Evaluation
4.1 Experimental Setup
4.2 Write Speed
4.3 Read Speed
4.4 Space Complexity
5 Conclusion
6 Recommendations
References
Acknowledgements
List of Figures

Figure 2-1. An example directed acyclic graph
Figure 3-1. Downstream query for the conventional design
Figure 3-2. Downstream query for the proposed design
Figure 4-1. The type of graph used for measurements. Here, n = 3
Figure 4-2. Results for the write speed test
Figure 4-3. Results for the read speed test
Figure 4-4. Results for the space complexity test
List of Tables

Table 3-1. An example node table in the adjacency list solution
Table 3-2. An example edges table in the adjacency list solution
Table 3-3. An example edges table with transitive closure. C is transitively a descendant of A through node B, hence hops = 1
Table 3-4. The nodes table for the proposed design, using the graph in Figure 2-1
1 Introduction

Much of our data today is graph-like in nature. Social networks are an obvious example: people are nodes and friendships are edges. Another example is mapping, where nodes are places and edges are transportation options between them.

Directed acyclic graphs are a special type of graph where the edges are directed and no cycles exist in the graph. They are often used to model dependencies. At Spotify, this type of dependency information is valuable in understanding relationships between different datasets.

Because Spotify has tasks running hundreds of thousands of times a day, we need a storage solution that can support writes at this rate. Because of the amount of data that we need to store, it also needs to be space efficient. Most importantly, this data will be presented in a web user interface, meaning it needs to be fast to query. These are the three design criteria for a storage solution.

In terms of design constraints, Spotify runs on Google's Cloud Platform, which does not offer an off-the-shelf graph database solution. Hence, we are constrained to traditional relational databases. Also, the result of the query will be shown in a UI. Thus, it needs to run in under 50 ms for graphs with up to 1000 nodes.

This report begins by elaborating on the problem, the design constraints, and the design criteria. It then describes the conventional solution to the problem, followed by our proposed solution. Next is an evaluation of the two designs against our design criteria, followed by conclusions and recommendations.

The intended audience is software engineers looking to decide on a database storage solution for directed acyclic graphs. The report assumes basic algorithmic knowledge as well as knowledge of SQL.
2 Problem Specification

2.1 Problem Statement

The task is to store directed acyclic graphs (DAGs) in a database. We consider directed acyclic graphs with well-defined breakpoints. This is illustrated in Figure 2-1, where breakpoints are depicted by blue squares and non-breakpoints by green circles.

We will query the database by specifying a start node. The database should return all the nodes leaving that node up until the next breakpoint. We shall call this the downstream query. For example, a query for node A should return nodes B, D, E, and F. The schema should be optimized for queries of this type.

Figure 2-1. An example directed acyclic graph.

2.2 Design Constraints

There are two design constraints. The first is that Spotify runs on Google's Cloud Platform, which does not offer an off-the-shelf graph database solution. Hence, we are constrained to traditional relational databases. The second is that the downstream query for a graph with 1000 nodes needs to run in under 50 ms. This is to ensure that the UI presenting this data feels responsive.

2.3 Design Criteria

There are three design criteria. The first is write speed, which is defined as the amount of time it takes to write a graph into the database. The second is read speed, which is defined as the amount of time a downstream query takes. The third is space complexity, which is defined as the amount of space the database needs to store a graph. These three design criteria cover all aspects of performance for the storage solution.
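As a reference point for the downstream query defined in Section 2.1, the following Python sketch walks a graph directly in memory. It is an illustration only, not part of either database design. The edge list and the choice of A and G as breakpoints are assumptions reconstructed from the Parents column of Table 3-4, and the name `downstream_query` is hypothetical.

```python
from collections import deque

# Assumed edges (parent -> children), reconstructed from the Parents column of Table 3-4.
CHILDREN = {
    "A": ["B", "E"],
    "B": ["D"],
    "D": [],
    "E": ["F"],
    "F": [],
    "G": ["E"],
}
# Assumption: A and G are the breakpoints among the nodes listed above.
BREAKPOINTS = {"A", "G"}


def downstream_query(start, children=CHILDREN, breakpoints=BREAKPOINTS):
    """Return every node reachable from start, stopping at the next breakpoint."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child in seen:
                continue
            seen.add(child)
            # A breakpoint is part of the result, but the walk does not continue past it.
            if child not in breakpoints:
                queue.append(child)
    return sorted(seen)


print(downstream_query("A"))  # ['B', 'D', 'E', 'F']
```

Under these assumptions the result for node A matches the example given in Section 2.1. The two schemas discussed in the report aim to answer the same question with a single SQL query instead of a traversal.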
3 Design alternatives

Two design alternatives were considered. The first is the conventional approach for storing graphs in a relational database. The second is a design proposal that is optimized for the types of queries we are interested in.

3.1 Conventional design: adjacency list with transitive closure

One of the most common ways to store graphs in SQL databases is with an adjacency list table ([1], [2]). In this design, there exist two tables. The first contains nodes, as shown in Table 3-1.

Name
A
B
C

Table 3-1. An example node table in the adjacency list solution.

The second table contains edges. Each entry in the table represents one edge in the graph. An example is presented in Table 3-2, corresponding to the nodes table presented above.

Parent  Child
A       B
B       C

Table 3-2. An example edges table in the adjacency list solution.

This schema alone is sufficient to get all direct descendants of a node with a single query, but we want all transitive descendants, all the way up until the next breakpoint. This can be achieved using a recursive query, but it can be slow for long chains [1].

A common remedy for this is to use an adjacency list with transitive closure. This means that at write time, we create an entry in the edges table for each transitive descendant. For example, because C descends from B, and B descends from A, C transitively descends from A. An example is shown in Table 3-3.

Parent  Child  Hops
A       B      0
B       C      0
A       C      1

Table 3-3. An example edges table with transitive closure. C is transitively a descendant of A through node B, hence hops = 1.

For the types of queries specified in this report, transitive edges only need to be added up until the next breakpoint. Figure 3-1 contains the query which would return all nodes leaving a given node up until the next breakpoint.

    SELECT * FROM nodes JOIN edges ON name = child WHERE parent = 'X';

Figure 3-1. Downstream query for the conventional design.

3.2 Proposed design: accumulation array

The proposed solution also applies the idea of transitive closure. It works by accumulating dependencies from parent to child. There still exists a table of nodes, but an array field is added. This array field contains all accumulated nodes up until the previous breakpoint, and is called AccumulatedNodes.

This list of accumulated nodes is built by first applying a topological sort. Kahn's algorithm, for example, can be used to do this. From this, we obtain an ordering of the nodes such that for each edge from node A to node B, A comes before B in the ordering. Then, we iterate through this list. At each node, we first add the node itself to AccumulatedNodes. We then iterate through its parents. If the parent is not a breakpoint, we add the parent's AccumulatedNodes too. If it is a breakpoint, we just add the parent node itself.

Table 3-4 shows what the data should look like for the DAG in Figure 2-1. Note that we also need to add a field containing the node's parents. Otherwise, we would lose information about the graph.

Node  Parents  AccumulatedNodes
A     -        A
B     A        A, B
D     B        A, B, D
E     A, G     A, E, G
F     E        A, E, F, G
G     -        G
Table 3-4. The nodes table for the proposed design, using the graph in Figure 2-1.

To perform the downstream query for a given node, we search for that node in the AccumulatedNodes column. This query is provided in Figure 3-2. For a downstream query of node A, it should return A, B, D, E, and F, because A is in the AccumulatedNodes column for those nodes. Note that the start node itself is part of the result, since every node's AccumulatedNodes contains the node itself.

    SELECT * FROM nodes WHERE 'X' = ANY(AccumulatedNodes);

Figure 3-2. Downstream query for the proposed design.

The benefit of this approach over the conventional approach is that it does not require a join across two tables. All the data needed to find the nodes for the downstream query and to build the graph is stored in one table.
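The accumulation procedure described in Section 3.2 can be sketched in Python as follows. This is a minimal in-memory illustration, not the implementation used at Spotify. The function names `build_accumulated_nodes` and `downstream` are hypothetical, and treating A and G as the breakpoints is an assumption chosen to be consistent with Table 3-4.

```python
from collections import deque


def build_accumulated_nodes(parents, breakpoints):
    """Compute AccumulatedNodes for each node; parents maps node -> list of parents."""
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    # Kahn's algorithm: repeatedly emit nodes whose parents have all been emitted.
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle")

    accumulated = {}
    for node in order:
        acc = {node}  # every node accumulates itself
        for p in parents[node]:
            if p in breakpoints:
                acc.add(p)  # stop at a breakpoint: add only the parent itself
            else:
                acc |= accumulated[p]  # otherwise inherit everything the parent accumulated
        accumulated[node] = acc
    return accumulated


def downstream(accumulated, start):
    """In-memory equivalent of the query in Figure 3-2."""
    return sorted(n for n, acc in accumulated.items() if start in acc)


# Parents column of Table 3-4; the breakpoint set is an assumption.
parents = {"A": [], "B": ["A"], "D": ["B"], "E": ["A", "G"], "F": ["E"], "G": []}
accumulated = build_accumulated_nodes(parents, breakpoints={"A", "G"})
print(sorted(accumulated["F"]))      # ['A', 'E', 'F', 'G']
print(downstream(accumulated, "A"))  # ['A', 'B', 'D', 'E', 'F']
```

With these inputs the sketch reproduces the AccumulatedNodes column of Table 3-4. Processing nodes in topological order is what guarantees that a parent's accumulated set is complete before any child reads it.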
4 Evaluation

The two designs were evaluated by performing write speed, read speed, and space tests for graphs of various sizes.

4.1 Experimental Setup

A PostgreSQL database was used, running on macOS with a 2.6 GHz Intel Core i5 CPU and 8 GB of 1600 MHz DDR3 RAM.

The graph used for the experiment resembles a fully connected neural network, with the input layer and the output layer as breakpoints. The number of input nodes and the number of layers were always the same number, denoted by $n$. Figure 4-1 contains the graph for $n = 3$, which contains $n^2 = 9$ nodes. A downstream query was performed on the top-left node.

Figure 4-1. The type of graph used for measurements. Here, n = 3.

The graph used for the experiment is a dense one. The reason is that dense graphs are the most difficult for storage solutions to deal with. Hence, we are performing tests with the worst-case scenario, and can expect better performance in the average case.

4.2 Write Speed

Figure 4-2 displays the time taken to insert graphs of increasing size into the database.
Figure 4-2. Results for the write speed test.

Figure 4-2 shows that for all values of $n$, the proposed design performs better. Large graphs can take upwards of two minutes to insert into the database with the conventional design. This is because we must insert $O(n^4)$ edges into the database: there are $n^2$ nodes in the graph, and the average node has approximately $n^2/2$ edges due to transitive closure. For this reason, insertion slows down significantly for large values of $n$.

4.3 Read Speed

Figure 4-3 shows the amount of time the downstream query takes as the size of the graph increases.
Figure 4-3. Results for the read speed test.

Once again, the proposed design performs better for all values of $n$. The reason the proposed design is faster is that it eliminates the need for joins, which are one of the most expensive database operations [3]. This time, however, the difference in performance at $n = 30$ is only 40 ms, which is much smaller than the difference for write speed.

4.4 Space Complexity

Figure 4-4 shows the amount of space taken up by the database as the size of the graph grows.
Figure 4-4. Results for the space complexity test.

Once again, because of the large number of edges we need to insert for the conventional method, as described in Section 4.2, the amount of space needed by the conventional design grows much faster than for the proposed design.
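The row-count argument from Section 4.2 can be checked with a short Python sketch. It only counts rows and array elements for the layered test graph of Section 4.1 and does not reproduce any measured timing or size. The function names are hypothetical, and the counts assume that only the first and last layers are breakpoints, so every ancestor and descendant pair yields one closure row.

```python
def closure_edge_rows(n):
    """Rows in the edges table with transitive closure for the layered test graph."""
    # A node in layer i has one edge row for each of the n nodes in every later layer.
    return sum(n * n * (n - 1 - i) for i in range(n))


def node_rows(n):
    """Rows in the nodes table: one per node."""
    return n * n


def accumulated_entries(n):
    """Total number of array elements across all AccumulatedNodes values."""
    # A node in layer i stores itself plus all n * i nodes in the earlier layers.
    return sum(n * (1 + n * i) for i in range(n))


for n in (3, 10, 30):
    print(n, node_rows(n), closure_edge_rows(n), accumulated_entries(n))
# 3 9 27 36
# 10 100 4500 4600
# 30 900 391500 392400
```

The closure row count equals $n^3(n-1)/2$, which agrees with the $O(n^4)$ estimate in Section 4.2, while the proposed design writes only $n^2$ rows. Under these assumptions the arrays together still hold a comparable number of elements, so the counts are consistent with the report's explanation that the improvement comes from the reduced number of rows.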
5 Conclusion

In this report, two different schemas for storing directed acyclic graphs with breakpoints have been presented. The first is the conventional approach, which involves a nodes table and an edges table with transitive closure. The second approach has only a nodes table, but adds a field containing the accumulated nodes traversed since the last breakpoint.

Through experimentation, the proposed design was shown to perform much better than the conventional design on all the design criteria: write speed, read speed, and space complexity. Write time and space usage are greatly reduced because far fewer rows need to be written. Read speed is improved because we avoid a database join, which is a costly operation [3].

The proposed solution also meets all the design constraints. Firstly, the schema works with almost any relational database. In addition, the design constraint of a downstream query taking less than 50 ms for a graph with 1000 nodes was met, as this query took 44.2 ms.
6 Recommendations

Based on the conclusions, implementing the proposed solution is highly recommended. The proposed design performs better than the conventional design in all three design criteria (write speed, read speed, and space complexity) and meets all the design constraints.
References

[1] K. Erdogan, "A Model to Represent Directed Acyclic Graphs (DAG) on SQL Databases," CodeProject, 14-Jan [Online]. Available: Graphs-DAG-o. [Accessed: 04-May-2017].
[2] J. Horak, "DAG structures in SQL databases," Apache Software Foundation, 19-Sep [Online]. Available: [Accessed: 04-May-2017].
[3] B. A. Johnson, "Joins are slow, memory is fast," Database Science, 28-Nov [Online]. Available: [Accessed: 04-May-2017].
Acknowledgements

I want to acknowledge my coworkers Stephen Enders and Rouzbeh Delavari, who came up with the design of the proposed schema. I want to acknowledge my employer Spotify for giving me the opportunity to work on implementing the solution discussed in this report.
More informationColumn Stores vs. Row Stores How Different Are They Really?
Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background
More informationFigure 1: A directed graph.
1 Graphs A graph is a data structure that expresses relationships between objects. The objects are called nodes and the relationships are called edges. For example, social networks can be represented as
More information3.1 Basic Definitions and Applications. Chapter 3. Graphs. Undirected Graphs. Some Graph Applications
Chapter 3 31 Basic Definitions and Applications Graphs Slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley All rights reserved 1 Undirected Graphs Some Graph Applications Undirected graph G = (V,
More informationProximity Prestige using Incremental Iteration in Page Rank Algorithm
Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration
More informationOracle HCM Cloud Common Features
Oracle HCM Cloud Common Features Release 11 Release Content Document December 2015 Revised: January 2017 TABLE OF CONTENTS REVISION HISTORY... 3 OVERVIEW... 5 HCM COMMON FEATURES... 6 HCM SECURITY... 6
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationUAccess ANALYTICS. Fundamentals of Reporting. updated v.1.00
UAccess ANALYTICS Arizona Board of Regents, 2010 THE UNIVERSITY OF ARIZONA updated 07.01.2010 v.1.00 For information and permission to use our PDF manuals, please contact uitsworkshopteam@listserv.com
More informationRank Preserving Clustering Algorithms for Paths in Social Graphs
University of Waterloo Faculty of Engineering Rank Preserving Clustering Algorithms for Paths in Social Graphs LinkedIn Corporation Mountain View, CA 94043 Prepared by Ziyad Mir ID 20333385 2B Department
More informationHANA Performance. Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business
More informationCIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu
CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin Presented by: Suhua Wei Yong Yu Papers: MapReduce: Simplified Data Processing on Large Clusters 1 --Jeffrey Dean
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationEdinburgh Research Explorer
Edinburgh Research Explorer Profiling OGSA-DAI Performance for Common Use Patterns Citation for published version: Dobrzelecki, B, Antonioletti, M, Schopf, JM, Hume, AC, Atkinson, M, Hong, NPC, Jackson,
More informationFIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION
FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION The process of planning and executing SQL Server migrations can be complex and risk-prone. This is a case where the right approach and
More informationEvolution of Database Systems
Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second
More informationProject Overview Distributed Network Traffic Controller
Project Overview Distributed Network Traffic Controller Revision Number: 1.1 Last date of revision: 5/11/05 22c:198 Johnson, Chadwick Hugh 1 Motivation When a limited resource is shared between multiple
More informationUniversity of Maryland. Tuesday, March 2, 2010
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationGraph Algorithms. Imran Rashid. Jan 16, University of Washington
Graph Algorithms Imran Rashid University of Washington Jan 16, 2008 1 / 26 Lecture Outline 1 BFS Bipartite Graphs 2 DAGs & Topological Ordering 3 DFS 2 / 26 Lecture Outline 1 BFS Bipartite Graphs 2 DAGs
More informationBig Data Analysis Using Hadoop and MapReduce
Big Data Analysis Using Hadoop and MapReduce Harrison Carranza, MSIS Marist College, Harrison.Carranza2@marist.edu Mentor: Aparicio Carranza, PhD New York City College of Technology - CUNY, USA, acarranza@citytech.cuny.edu
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationCSE 530A. Query Planning. Washington University Fall 2013
CSE 530A Query Planning Washington University Fall 2013 Scanning When finding data in a relation, we've seen two types of scans Table scan Index scan There is a third common way Bitmap scan Bitmap Scans
More informationEfficient and Scalable Friend Recommendations
Efficient and Scalable Friend Recommendations Comparing Traditional and Graph-Processing Approaches Nicholas Tietz Software Engineer at GraphSQL nicholas@graphsql.com January 13, 2014 1 Introduction 2
More informationDRYAD: DISTRIBUTED DATA- PARALLEL PROGRAMS FROM SEQUENTIAL BUILDING BLOCKS
DRYAD: DISTRIBUTED DATA- PARALLEL PROGRAMS FROM SEQUENTIAL BUILDING BLOCKS Authors: Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly Presenter: Zelin Dai WHAT IS DRYAD Combines computational
More information(Refer Slide Time: 05:25)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering IIT Delhi Lecture 30 Applications of DFS in Directed Graphs Today we are going to look at more applications
More informationArkuda Concert. Audio Network Solutions
Arkuda Concert Audio Network Solutions How could manufacturers add value to their products and services? Companies aspire to provide services that meet their client`s needs. They try to offer a smart solution
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Distributed Machine Learning Week #9
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Distributed Machine Learning Week #9 Today Distributed computing for machine learning Background MapReduce/Hadoop & Spark Theory
More informationCHAPTER 18: CLIENT COMMUNICATION
CHAPTER 18: CLIENT COMMUNICATION Chapter outline When to communicate with clients What modes of communication to use How much to communicate How to benefit from client communication Understanding your
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationWriting Reports with Report Designer and SSRS 2014 Level 1
Writing Reports with Report Designer and SSRS 2014 Level 1 Duration- 2days About this course In this 2-day course, students are introduced to the foundations of report writing with Microsoft SQL Server
More informationComputational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs
Computational Optimization ISE 407 Lecture 16 Dr. Ted Ralphs ISE 407 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms in
More information3.1 Basic Definitions and Applications
Chapter 3 Graphs Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 3.1 Basic Definitions and Applications Undirected Graphs Undirected graph. G = (V, E) V = nodes. E
More informationMERGE SORT SYSTEM IJIRT Volume 1 Issue 7 ISSN:
MERGE SORT SYSTEM Abhishek, Amit Sharma, Nishant Mishra Department Of Electronics And Communication Dronacharya College Of Engineering, Gurgaon Abstract- Given an assortment with n rudiments, we dearth
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationApache Kylin. OLAP on Hadoop
Apache Kylin OLAP on Hadoop Agenda What s Apache Kylin? Tech Highlights Performance Roadmap Q & A http://kylin.io What s Kylin kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite
More informationPerformance and Scalability with Griddable.io
Performance and Scalability with Griddable.io Executive summary Griddable.io is an industry-leading timeline-consistent synchronized data integration grid across a range of source and target data systems.
More informationCrowdPath: A Framework for Next Generation Routing Services using Volunteered Geographic Information
CrowdPath: A Framework for Next Generation Routing Services using Volunteered Geographic Information Abdeltawab M. Hendawi, Eugene Sturm, Dev Oliver, Shashi Shekhar hendawi@cs.umn.edu, sturm049@umn.edu,
More informationCHRIS Introduction Guide
1 Introduction... 3 1.1 The Login screen... 3 1.2 The itrent Home page... 5 1.2.1 Out of Office... 8 1.2.2 Default User Preferences... 9 1.2.3 Bookmarks... 10 1.3 The itrent Screen... 11 The Control Bar...
More information