Data Analytics: From Conceptual Modelling to Logical Representation


Qing Wang and Minjian Liu
Research School of Computer Science, Australian National University, Canberra, Australia

Abstract. In recent years, data analytics has been studied in a broad range of areas, such as health-care, social sciences, and commerce. To accurately capture user requirements and to enhance communication between analysts, domain experts and users, it is increasingly important to conceptualise data analytics tasks at a high level of modelling abstraction. In this paper, we discuss the modelling of data analytics and how a conceptual framework for data analytics applications can be transformed into a logical framework that supports a simple yet expressive query language for specifying data analytics tasks. We have also implemented our modelling method in a unified data analytics platform, which allows analytics algorithms to be incorporated as plug-ins in a flexible and open manner. We present case studies on three real-world data analytics applications and our experimental results on the unified data analytics platform.

Keywords: Data analytics, Conceptual modelling, Logical model, Query language

1 Introduction

Data analytics is rapidly growing in popularity, with a variety of applications in many areas, e.g., health-care, social sciences and commerce. This has led to the recent development of a large number of data analytics tools and systems, most of which are built upon graph models, such as GraphLab [11] and Pregel [12]. Nonetheless, in practice, many data analytics applications are still conducted in an ad-hoc way, due to the lack of general principles for designing, developing and implementing data analytics applications. For example, the choice of data models for data analytics applications often relies on individuals' own expertise, rather than on a systematic consideration of requirements.
This calls for a formal design paradigm that can provide a high level of modelling abstraction to support users in understanding their data analytics requirements. In particular, with the increasing complexity of data analytics applications, there is a pressing need to explicitly represent data analytics requirements in a conceptual model [6].

(Q. Wang and M. Liu contributed equally to this work. © Springer International Publishing AG 2016. I. Comyn-Wattiau et al. (Eds.): ER 2016, LNCS 9974.)

Recently, several methods for conceptually modelling data analytics applications have been reported [2, 15, 16]. A conceptual modelling paradigm for network analytics applications, called the Network Analytics ER model (NAER), was proposed in [15]. In a nutshell, the NAER model extends the concepts of the traditional ER models [4] in three aspects:

(a) the structural aspect - analytical entity and relationship types are added to represent first-class entities and relationships from the data analytics perspective;
(b) the manipulation aspect - topological constructs are added to explicitly represent different topological structures of interest; and
(c) the integrity aspect - constraints are added for governing integrity among different data analytics tasks.

Based on this conceptual modelling paradigm, a set of design guidelines has further been provided in [15], which help users establish a conceptual framework that provides a coherent and comprehensive view of data analytics applications. As depicted in Fig. 1, such a conceptual framework may consist of a core schema, which has basic entity and relationship types to capture data requirements as in traditional ER modelling; topology schemas, which have analytical entity and relationship types to capture query requirements of data analytics applications; and query topics, which describe the structure of queries in query requirements and are associated with both the core schema and the topology schemas.

Fig. 1. A general process of modelling data analytics applications

Nonetheless, how can we transform such a conceptual framework into a logical framework that is well suited to modelling the logical structure of data analytics applications without ambiguities?
Although basic entity and relationship types in the core schema can be easily transformed into relation schemas following the existing rules [14], it is not yet clear: (1) How can analytical entity and relationship types be accurately defined at the logical level? (2) What logical structure can topology schemas be translated into? (3) How can topological constructs be specified using a query language, ideally in a declarative way? These questions are left unanswered in the previous works [15, 16]. This paper aims to answer these questions by exploring the connections between such a conceptual framework for data analytics and its corresponding logical representation.

Contributions. We make the following contributions in this paper:

- We discuss how a conceptual framework for data analytics as introduced in [15] can be effectively transformed into a logical framework.
- We introduce a novel query language for data analytics, which extends SQL with the ability to query topological properties of interest.
- We have implemented our modelling method in a unified data analytics platform, which allows analytics algorithms to be incorporated as plug-ins in a flexible and open manner.
- We present three real-world data analytics applications to illustrate the expressive power and simplicity of our modelling method, together with the experimental results of evaluating the performance of our data analytics platform.

Outline. In the following, Sect. 2 discusses the modelling of data analytics and Sect. 3 introduces our query language for data analytics. We discuss three data analytics applications in Sect. 4, and present our experimental results in Sect. 5. The paper is concluded in Sect. 6.

2 Modelling Data Analytics

In this section, we discuss data analytics from a modelling perspective. In practice, many organisations face the challenge of managing data analytics tasks in a complex environment, and modelling techniques bring several advantages in addressing this challenge, including: enhancing communication among multiple stakeholders, understanding connections among complex analysis requirements, and detecting design flaws early, before any code is implemented.
We first recall the Network Analytics ER (NAER) modelling method [15], then elaborate on the transformation from a conceptual model into the logical representation for data analytics applications.

Generally, the NAER modelling method supports two kinds of entities and relationships [15]: (1) base entities and relationships, which specify first-class entities and relationships that should be stored in a database system from a data management perspective, as in traditional ER modelling; (2) analytical entities and relationships, which specify first-class entities and relationships used for data analytics purposes. In the NAER model, base types are the ground from which analytical types are derived, and the base types that define an analytical type are called the support of the analytical type.

To conceptualise data analytics tasks, not only data requirements (i.e., what kind of data is needed) but also query requirements (i.e., what kinds of queries are used) are considered in the conceptual modelling process. Base entity and relationship types are used to capture data requirements, leading to a core schema, while analytical entity and relationship types are used to capture query requirements, yielding a number of topology schemas. That is, a core schema contains a set of base types, each topology schema contains a set of analytical types, and the support of each analytical type in a topology schema is a subset of the base types in the core schema. In general, each conceptual framework for data analytics applications contains a core schema, which is relatively large, and a number of topology schemas, which are often small. Although small, topology schemas can be flexibly composed into larger schemas if needed [16].

After a conceptual framework has been established as discussed above, the question arises: how can such a conceptual framework be transformed into a logical framework? Although, in principle, it is possible to choose any logical data model, e.g., the relational data model, a graph model or a combination of several data models, data in many real-world applications is stored in relational databases. Moreover, data analytics tasks often require sophisticated analysis of both relational and topological properties of data. For these reasons, we develop the following data model at the logical level:

- Transform basic entities and relationships in the core schema into a set of relations for storage, as in the traditional ER modelling approach [14];
- Transform analytical entities and relationships in the topology schemas into a set of entity-relationship (ER) graphs for analytics [9]. That is, one topology schema corresponds to one type of ER graph, in which each vertex represents an analytical entity and each edge between two vertices represents an analytical relationship between two analytical entities.

Fig. 2. A hybrid data model at the logical level (topology schemas map to graph schemas and the core schema maps to relation schemas; ER graphs are built from relations through graph mappers)

Figure 2 illustrates a hybrid data model, in which a collection of ER graphs is constructed on top of relations through graph mappers, either on the fly or in a materialized manner, as will be formally defined in Sect. 3. Accordingly, the core schema and topology schemas in a conceptual model are transformed into a set of relation schemas and graph schemas in a logical model. In practice, such a hybrid data model can be easily built by applying the above transformation rules to a conceptual model that describes data analytics tasks. Since data analytics tasks often require additional querying capability over graphs (for example, finding paths, detecting communities, clustering and ranking), implementing such a hybrid data model at the logical level requires a query language that supports joint analytics of relations and graphs.

3 A Query Language for Data Analytics

We present a SQL-like query language for data analytics, called RG-SQL, which extends standard SQL with new features to facilitate joint analytics of relations and graphs. More specifically, RG-SQL provides data definition statements that can create graphs from relations in a flexible way, and data manipulation statements to conduct various data analytics operations over relations and graphs. In the following, we explain these new features of RG-SQL in detail.

Creating graphs. RG-SQL can create two types of graphs, undirected graphs and directed graphs, specified by the graph types UNGRAPH and DIGRAPH, respectively. Graphs can be created either on the fly or in a materialized manner with the following syntax:

Graphs on the fly:

    SELECT <attribute list>
    FROM <relations | graphs>
    WHERE <graph name> IS <graph type> AS (graph mapper);

Materialized graphs:

    CREATE <graph type> <graph name> AS (graph mapper);

where <graph type> := UNGRAPH | DIGRAPH, and a graph mapper is a SQL query that extracts an edge list (i.e., a list of the edges of a graph, which is a common data structure for representing a graph) from relations in the underlying databases for graph construction.

Ranking. To assess the importance of vertices within a graph, RG-SQL provides a RANK operator with the following syntax:

    RANK(<graph name>, <measure>)
    <measure> := degree | indegree | outdegree | betweenness | closeness | pagerank

A number of measures are available for determining the importance of vertices [3].
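To make the roles of the graph mapper and the RANK operator concrete, the following is a minimal Python sketch, not the Rogas implementation. It assumes that a graph mapper has already produced an edge list, builds an undirected graph from it, and returns (vertexid, value) rows for the degree measure. The function names `build_graph` and `rank_by_degree` are hypothetical, and sorting the rows by value is a choice made here for readability; the text above does not say whether RANK returns its rows in any particular order.

```python
from collections import defaultdict


def build_graph(edge_list, directed=False):
    """Build an adjacency-set graph from an edge list (the output of a graph mapper)."""
    adj = defaultdict(set)
    for src, dst in edge_list:
        adj[src].add(dst)
        adj[dst]  # touch dst so that every vertex appears as a key
        if not directed:
            adj[dst].add(src)
    return dict(adj)


def rank_by_degree(adj):
    """Return (vertexid, value) rows for the degree measure on an undirected graph.

    Sorting (highest degree first, ties by vertex id) is an assumption of this sketch.
    """
    rows = [(vertex, len(neighbours)) for vertex, neighbours in adj.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical edge list, standing in for the result of a graph mapper query.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
ungraph = build_graph(edges, directed=False)   # analogous to UNGRAPH
digraph = build_graph(edges, directed=True)    # analogous to DIGRAPH
degree_rows = rank_by_degree(ungraph)
print(degree_rows)
```

In this toy graph, vertex 3 has three neighbours and therefore comes first. The other measures listed above (betweenness, closeness, pagerank) would replace only the body of the ranking function; the shape of the result relation stays the same.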
One may choose the most suitable measure for a specific query based on the type of the graph and the desired properties. Each RANK(<graph name>, <measure>) yields a relation with two attributes: vertexid and value.

Clustering. To explore the clustering structure of vertices over a graph, RG-SQL provides a CLUSTER operator with the following syntax:

    CLUSTER(<graph name>, <algorithm>)
    <algorithm> := CC | SCC | GN | CNM | MC

where CC refers to an algorithm for finding connected components, SCC to an algorithm for finding strongly connected components, and GN, CNM and MC to three algorithms for community detection, which respectively correspond to the Girvan-Newman algorithm [7], the Clauset-Newman-Moore algorithm [5] and Peixoto's modified Monte Carlo algorithm [13]. Each CLUSTER(<graph name>, <algorithm>) yields a relation with three attributes: clusterid, size and members.

Path finding. To find paths among two or more vertices, RG-SQL provides a PATH operator with the following syntax:

    PATH(<graph name>, <path expression>)
    <path expression> := . | V | <path expression>/<path expression> | <path expression>//<path expression>

where V is a vertex expression that imposes a certain condition on the vertices of a path, . is a do-not-care symbol indicating that any vertex is allowed in its position, / represents one edge, and // represents any number of edges. A path expression is valid if it contains a vertex expression in the first and last positions. For example, the expression V1//V2 specifies a path between two vertices V1 and V2, regardless of the length of the path. Each PATH(<graph name>, <path expression>) yields a relation with three attributes: pathid, length and path.

3.1 Discussion

We now briefly discuss the expressive power of RG-SQL in comparison with the relational query language SQL and the graph query language Cypher used in Neo4j. Since RG-SQL extends standard SQL with additional operations, such as ranking, clustering and path finding, RG-SQL is strictly more expressive than SQL and has expressive power beyond first-order logic [1]. For example, the recursion in a path finding expression V1//V2 cannot be expressed by SQL but can be expressed by RG-SQL. Cypher is a query language designed to express graph patterns, which can nonetheless be expressed by RG-SQL or its variations through a combination of path finding operations.
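The recursion behind a path expression such as V1//V2 can be illustrated with a short Python sketch. It is only an approximation of what a PATH operator might do, under the assumptions that the graph is given as an adjacency dictionary, that only simple paths (no repeated vertex) are of interest, and that a path stops at the first target vertex it reaches. The function name `find_paths` and the `max_length` parameter are hypothetical and do not appear in RG-SQL.

```python
from collections import deque


def find_paths(adj, sources, targets, max_length=None):
    """Enumerate simple paths from any source to any target, breadth-first.

    Returns (pathid, length, path) rows. Because the search is breadth-first,
    rows are produced in order of non-decreasing length.
    """
    rows = []
    queue = deque([source] for source in sorted(sources) if source in adj)
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last in targets and len(path) > 1:
            rows.append((len(rows) + 1, len(path) - 1, path))
            continue  # assumption: do not extend a path beyond a target
        if max_length is not None and len(path) - 1 >= max_length:
            continue
        for nxt in sorted(adj.get(last, ())):
            if nxt not in path:  # keep the path simple
                queue.append(path + [nxt])
    return rows


# Hypothetical undirected graph as an adjacency dictionary.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
path_rows = find_paths(graph, sources={1}, targets={4})
print(path_rows)
```

The first row is a shortest path, which corresponds to ordering the result by length and keeping one row. The loop that keeps extending partial paths is the part that a fixed number of joins cannot replace, since the number of edges covered by // is not known in advance.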
However, not all operations of RG-SQL can be expressed by Cypher, e.g., ranking operations using betweenness and clustering operations using GN.

4 Data Analytics Applications

In this section, we study data analytics tasks in three real-world applications and explain how data analytics requirements can be conceptualised in our work.

4.1 Digital Library

Digital Library is a bibliographical network containing a collection of articles, authors, and publishers. Each article is written by one or more authors, one article may cite a number of other articles, and articles are included in conference proceedings or journals published by publishers. Figure 3 depicts a conceptual schema for this data analytics application, which includes the topology schemas S_a1 and S_a2 required by the following queries:

Q1: [Collaborative communities] Find the communities that consist of authors who collaborate with each other to publish articles together.
Q2: [Influential articles] Find the top 3 most influential articles.

For Q1, we may use RG-SQL to create a materialized coauthorship graph over S_a1, then find the collaborative communities in the coauthorship graph by applying the MC algorithm in CLUSTER.

    CREATE UNGRAPH coauthorship AS (
        SELECT w1.aid, w2.aid AS coaid
        FROM WRITE AS w1, WRITE AS w2
        WHERE w1.aid != w2.aid AND w1.pid = w2.pid);

    SELECT clusterid, size, members
    FROM CLUSTER(coauthorship, MC);

For Q2, we may create a citation graph over S_a2 on the fly and then find influential articles in the citation graph using the measure betweenness.

    SELECT vertexid, value
    FROM RANK(citation, betweenness)
    WHERE citation IS DIGRAPH AS (SELECT aid, citedaid FROM CITE)
    LIMIT 3;

Fig. 3. A conceptual schema for Digital Library (topology schemas S_a1, S_a2 and S_a3 over the analytical types AUTHOR*, COAUTHORSHIP, ARTICLE*, CITATION, JOURNAL* and COCITATION; core schema with AUTHOR, WRITE, ARTICLE, PUBLISH, PUBLISHED_BY, PUBLISHER, CITE, JOURNAL and PROCEEDINGS)
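The Q1 pipeline above (graph mapper, then clustering) can be imitated outside RG-SQL with Python's sqlite3 module. This is a rough sketch rather than the authors' implementation: the graph mapper is run as plain SQL against a toy WRITE relation with invented rows, and connected components (the CC option) stand in for the MC community detection algorithm, which is considerably more involved. The function names are hypothetical.

```python
import sqlite3


def coauthorship_edges(conn):
    """Run the Q1 graph mapper as plain SQL and return its edge list."""
    sql = """
        SELECT w1.aid, w2.aid AS coaid
        FROM WRITE AS w1, WRITE AS w2
        WHERE w1.aid != w2.aid AND w1.pid = w2.pid
    """
    return conn.execute(sql).fetchall()


def connected_components(edges):
    """Return (clusterid, size, members) rows for an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, rows = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            vertex = stack.pop()
            members.append(vertex)
            for neighbour in adj[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        rows.append((len(rows) + 1, len(members), sorted(members)))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WRITE (aid INTEGER, pid INTEGER)")
# Invented sample data: (author id, article id).
conn.executemany(
    "INSERT INTO WRITE VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 11), (3, 11), (4, 12), (5, 12), (6, 13)],
)
clusters = connected_components(coauthorship_edges(conn))
print(clusters)
```

Author 6 wrote an article alone and therefore never appears in the edge list, so no cluster is reported for that author. Whether isolated vertices should be part of the graph is a modelling decision that a graph built purely from an edge list cannot express.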

4.2 Twitter

Twitter is a social network which enables users to post tweets. Users may follow one another. A tweet can mention one or more users and be labelled by one or more tags. Figure 4 depicts a conceptual schema for data analytics in Twitter. Typical data analytics tasks in Twitter include analysing how users follow each other and finding the most followed people, as described by the following queries:

Q3: [Shortest path] Find the shortest path between Jack and Max.
Q4: [Most followed people] Find the most followed people who have posted at least one tweet about ANU.

First, a graph named following is created over the topology schema S_t1, based on user entities and their following relationships. Then, for Q3, we may find the shortest path between Jack and Max using the following RG-SQL query:

    SELECT * FROM PATH(following, v1//v2)
    WHERE v1 AS (SELECT uid FROM USER WHERE name = 'Jack')
      AND v2 AS (SELECT uid FROM USER WHERE name = 'Max')
    ORDER BY length ASC
    LIMIT 1;

For Q4, we need to find not only the most followed people in the following graph but also, from the relations over the core schema, the people who have posted a tweet tagged ANU, as illustrated by the following RG-SQL query.

    SELECT uid, value
    FROM RANK(following, pagerank) AS p1, POST AS p2, LABELLED_BY AS l
    WHERE p1.vertexid = p2.uid AND p2.twid = l.tid AND l.label = 'ANU'
    ORDER BY value DESC;

4.3 Stack Overflow

Stack Overflow is a collaboratively edited question and answer site for programmers. Users may ask questions or post answers. A question may have zero or more answers and be labelled by tags. For each question, one answer can be accepted as the accepted answer. A conceptual schema for data analytics in Stack Overflow is presented in Fig. 5.

Q5: [Python experts] Find the top 10 Python experts in Stack Overflow (i.e., users who often answer Python questions and whose answers are often accepted).
Q6: [Most influential expert] Find the influential expert in Stack Overflow who is involved in one of the top 3 largest question-answer communities.

Similarly, we first create the getting_answers graph over the topology schema S_s1. Then the RG-SQL query for Q5 is as follows:

    SELECT * FROM RANK(getting_answers, pagerank)
    WHERE vertexid IN (
        SELECT owner_id
        FROM ANSWER AS a, LABELLED_BY AS l, TAG AS t
        WHERE a.parent_qid = l.qid AND l.tid = t.tid AND tag_label = 'python')
    LIMIT 10;

Fig. 4. A conceptual schema for Twitter

For Q6, we have the following RG-SQL query, in which both RANK and CLUSTER operators are applied over two different graphs and their results are combined to support further analytics.

    SELECT r.vertexid
    FROM RANK(getting_answers, pagerank) AS r,
         (SELECT members FROM CLUSTER(co-answering, MC) ORDER BY size DESC LIMIT 3) AS c
    WHERE r.vertexid = ANY(c.members);

5 Experiments

We have implemented our modelling method in a unified data analytics platform, called Rogas, which allows analytics algorithms to be incorporated as plug-ins in a flexible and open manner [10]. To understand how well Rogas performs in comparison with other database systems, we have conducted experiments to compare the expressive power of query languages and the time efficiency of query execution in three different systems: PostgreSQL, Neo4j and Rogas. These experiments were performed on a Dell Optiplex 9020 desktop computer with an Intel(R) Core(TM) CPU (3.6 GHz, 8 cores), 16 GB of memory and a 256 GB disk. Rogas extends the query engine of PostgreSQL 9.4.4, with additional functionalities implemented using Python. The version of Neo4j we used is the community edition.

Fig. 5. A conceptual schema for Stack Overflow

In our experiments, we used the data sets from the data analytics applications discussed in Sect. 4: (1) the Digital Library data set provided by the Digital Library, (2) the Stack Overflow data set from the Stanford Network Analytics Platform (snap-icwsm/), and (3) the Twitter data set provided by Haewoon Kwak (kaist.ac.kr/traces/www2010.html). Table 1 presents more details about these three data sets.

Table 1. Three data sets used in our experiments

Data set | Raw data size | No. of vertices in graphs (Neo4j) | No. of edges in graphs (Neo4j) | No. of records in relations (PostgreSQL)
--- | --- | --- | --- | ---
Digital Library | 14.9 GB (XML) | 1,128,243 | 2,488,849 | publisher 50; journal 128; proceedings 6,421; article 337,006; author 784,638; write 932,400; cite 1,212,894
Stack Overflow | 30.6 GB (XML) | 21,713,109 | 31,747,662 | question 7,990,787; answer 13,684,117; tag 38,205; labelled by 13,466,686
Twitter | 29.7 GB (TXT) | 13,250, | ,368,797 | tweet 10,762,104; tag 210,121; user 2,277,971; follow 259,602,970; mentioned in 3,108,776; labelled by 1,657,051

Table 2 lists the queries used in our experiments, which fall into three categories: (1) Q1-Q3 are relational queries including join, sorting, and aggregate operations; (2) Q4-Q10 are queries about graph properties, including triangle counting, pagerank centrality, path finding and community detection; (3) Q11-Q12 are sophisticated queries that combine several graph properties, e.g., Q11 combines pagerank centrality with finding connected components and Q12 combines pagerank centrality with path finding.

Our first experiment illustrates the expressive power of the three query languages, PostgreSQL, RG-SQL and Cypher, in terms of the queries Q1-Q12. As shown in Table 3, the three languages differ in expressive power. SQL cannot be used to specify Q6-Q12 and Cypher cannot be used to specify Q10-Q12, whereas RG-SQL is expressive enough to specify all these queries.

Our second experiment evaluates the time efficiency of query execution in Rogas, PostgreSQL and Neo4j. As not all queries can be expressed in PostgreSQL and Neo4j, we compared Q1-Q5 over the three systems, and Q6-Q9 only over Rogas and Neo4j. Note that, for Q6 and Q7, Neo4j needs to use an extension, called Neo4j Mazerunner, to run graph analytics algorithms at scale with Hadoop HDFS and Apache Spark, and is thus required to send an HTTP GET request to Neo4j Mazerunner. In such cases, the time of executing queries in Neo4j includes the time for sending and receiving the requests. We ran each query 5 times in each system and took the average time for plotting. Figure 6 presents our experimental results. The key observations are as follows:

- For Q1-Q5, Rogas performed as well as PostgreSQL, and better than Neo4j in all of these queries except Q4. This is because Q4 is a pattern matching query, which requires navigating hyper-connectivity on graphs; Neo4j has been particularly optimised for such queries, whereas we have not yet implemented any query optimisation techniques. For Q5, it is not surprising that Rogas performed better than Neo4j, since Q5 is a triangle counting problem, and the study in [8] has also experimentally verified that relational databases can perform triangle counting very efficiently by expressing it as a three-way self-join.
- For Q6-Q7, as Neo4j needs to use Neo4j Mazerunner, it spends time on sending and receiving the requests, and Rogas thus performed better than Neo4j. However, for Q8-Q9, similar to Q4, the queries need to navigate hyper-connectivity on graphs and Neo4j performed better than Rogas.
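The three-way self-join mentioned for Q5 can be sketched with Python's sqlite3 module. The query used in the experiments is not shown in the text, so what follows is one common formulation and not necessarily the one evaluated. It assumes a hypothetical relation edge(src, dst) without self-loops or duplicate edges, in which every undirected edge is stored once, oriented from the smaller to the larger vertex id, so that each triangle is counted exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")

# Invented undirected edges; each is stored once with src < dst.
undirected_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(min(a, b), max(a, b)) for a, b in undirected_edges],
)

# For a triangle a < b < c, the three aliases match (a,b), (b,c) and (a,c).
TRIANGLE_SQL = """
    SELECT COUNT(*)
    FROM edge AS e1, edge AS e2, edge AS e3
    WHERE e1.dst = e2.src
      AND e1.src = e3.src
      AND e2.dst = e3.dst
"""
triangle_count = conn.execute(TRIANGLE_SQL).fetchone()[0]
print(triangle_count)
```

The toy graph contains the triangles {1, 2, 3} and {2, 3, 4}. Without the orientation convention, the same join over a symmetric edge relation would count every triangle six times, once per ordering of its vertices.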
In addition to Q1-Q12, we have also run several queries about closeness centrality over Twitter using Rogas and Neo4j. Rogas completed the queries and returned the query results, while Neo4j failed and reported an OutOfMemory error. The reason is that the graphs created from the Twitter data set are so large that processing these queries exceeded the memory limit of Neo4j.

Table 2. Queries used in our experiments

Query | Data set | Category | Query description
--- | --- | --- | ---
Q1 | Stack Overflow | Join Operation + Sorting Operation | Show the question id, the owner id and the tag label of the top 10 questions that have the highest view count.
Q2 | Stack Overflow | Join Operation + Sorting Operation + Aggregate Operation | Show the top 5 answerers and their latest reputation scores, in descending order of the number of their answers that were accepted by questions.
Q3 | Digital Library | Join Operation + Sorting Operation + Aggregate Operation | Show the number of articles of each journal and proceedings, along with the journal name and the proceedings title, in descending order.
Q4 | Twitter | Pattern Matching | Recommend 10 Twitter users to Jack whom Jack does not currently follow but who are followed by somebody Jack follows.
Q5 | Digital Library | Triangle Counting | Count the number of triangles in the co-authorship network.
Q6 | Digital Library | PageRank Centrality | Find the top 10 influential authors according to the pagerank centrality in the co-authorship network.
Q7 | Digital Library | Connected Component | Count the number of connected components of the co-authorship network.
Q8 | Digital Library | Path Finding | Find paths with length less than 2 that connect two authors V1 and V2 in the co-authorship network, where author V1 is affiliated with ANU and author V2 is affiliated with UNSW.
Q9 | Digital Library | Shortest Path | Find a shortest path between the two authors Michael Norrish and Kevin Elphinstone in the co-authorship network.
Q10 | Stack Overflow | Community Detection | Find a group of tags that are often used together to label a question.
Q11 | Digital Library | PageRank Centrality + Connected Component | According to the pagerank centrality, find the top 3 authors of the biggest collaborative community in the co-authorship network.
Q12 | Digital Library | PageRank Centrality + Path Finding | According to the pagerank centrality, show how the top 2 authors connect with each other in the co-authorship network.

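Table 3. Comparison of the expressive power of the query languages PostgreSQL, RG-SQL and Cypher over the queries Q1-Q12, where yes and no indicate expressible and not expressible, respectively

Language | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
PostgreSQL | yes | yes | yes | yes | yes | no | no | no | no | no | no | no
RG-SQL | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes
Cypher | yes | yes | yes | yes | yes | yes | yes | yes | yes | no | no | no

Fig. 6. Comparison of the time efficiency of query execution in Rogas, PostgreSQL and Neo4j over the queries Q1-Q9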

6 Conclusions

In this paper, we have discussed how data analytics tasks can be conceptualised by a conceptual model and then transformed into a logical model. We have also proposed a query language for data analytics, and implemented the proposed methods in a data analytics platform that can unify various data analytics tasks and algorithms. This work was based on our case studies on several real-world data analytics applications. In the future, we plan to add query topics to our data analytics platform and investigate the development of a higher-level query language through query topics. We will also study network dynamics and develop techniques to analyse and visualise networks that change dynamically over time.

Acknowledgement. We thank the Digital Library for providing the data set of the bibliographical network.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)
2. Bao, Z., Tay, Y., Zhou, J.: sonSchema: a conceptual schema for social networks. In: Conceptual Modeling (2013)
3. Brandes, U., Erlebach, T.: Network Analysis: Methodological Foundations. Springer Science & Business Media, New York (2005)
4. Chen, P.: The entity-relationship model - toward a unified view of data. TODS 1(1), 9-36 (1976)
5. Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6) (2004)
6. Embley, D.W., Liddle, S.W.: Big data - conceptual modeling to the rescue. In: Ng, W., Storey, V.C., Trujillo, J.C. (eds.) ER 2013. LNCS, vol. 8217. Springer, Heidelberg (2013)
7. Girvan, M., Newman, M.E.: Community structure in social and biological networks. PNAS 99(12) (2002)
8. Jindal, A., Madden, S.: GRAPHiQL: a graph intuitive query language for relational databases. In: IEEE International Conference on Big Data (2014)
9. Kasneci, G., Ramanath, M., Sozio, M., Suchanek, F.M., Weikum, G.: STAR: Steiner-tree approximation in relationship graphs. In: ICDE (2009)
10. Liu, M., Wang, Q.: Rogas: a declarative framework for network analysis. In: VLDB (2016)
11. Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C.E., Hellerstein, J.: GraphLab: a new framework for parallel machine learning. arXiv preprint (2014)
12. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD (2010)
13. Peixoto, T.P.: Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys. Rev. E 89(1) (2014)
14. Thalheim, B.: Entity-Relationship Modeling: Foundations of Database Technology. Springer Science & Business Media, New York (2013)
15. Wang, Q.: Network analytics ER model - towards a conceptual view of network analytics. In: ER (2014)
16. Wang, Q.: A conceptual modeling framework for network analytics. Data Knowl. Eng. 99 (2015)
