Cray Graph Engine / Urika-GX. Dr. Andreas Findling



1 Cray Graph Engine / Urika-GX Dr. Andreas Findling

2 Cray Analytic Platforms
Timeline: 2012, 2014, 2016, 2017+
- Urika-GD: Graph Analytics; XMT2, Seastar
- Urika-XA: Hadoop, Spark; InfiniBand, SSD

3 Cray Analytic Platforms
- Urika-GD: Graph Analytics; XMT2, Seastar
- Urika-XA: Hadoop, Spark; InfiniBand, SSD
- Urika-GX: Hadoop, Spark, Cray Graph Engine; Aries, SSD
- Minerva: Analytics Software Stack available on XC Platforms
Copyright 2017 Cray Inc.

4 Porting the Query Engine
- Data in an RDF database is unstructured
  - Communication of information across the dataset can be highly irregular and may approach all-to-all for tightly connected graphs
  - Maintaining optimal network performance for short remote references, both PUTs and GETs, is essential
- Urika-GD does this very well
  - ~100 Mrefs/s per node for single-word loads and stores
  - But all references are remote!
- Mapping to the XC/Aries architecture
  - Global address space for one-sided communication
  - Leverage the low-level DMAPP library communication layer
  - Non-blocking implicit GETs and PUTs (~50 Mrefs/s per node, single word)
  - Utilize synchronization features and atomic operations available with Aries

5 Selection of Coarray C++ Programming Model
- C++ template library that runs on top of Cray's Partitioned Global Address Space (PGAS) library
  - Provides the performance advantages of the low-level DMAPP communication
  - Provides easy access to Aries synchronization features and atomic operations
- Urika-GD codebase is currently C++
- Coarray provides an easy model for taking advantage of locality when available
  - Internal intermediate data structures
- Carrying forward the Built-in Graph Function (BGF) extensions
  - Custom graph algorithms written in Coarray C++

6 Why is CGE on Urika-GX faster than Urika-GD?
- Urika-GD got its performance from multithreading and huge shared memory with fast random access (Seastar)
- CGE uses a shared memory model called PGAS (Partitioned Global Address Space)
  - Invented and championed by Cray Inc.
  - Depends on the Aries network and its RDMA capability
- Urika-GX nodes are more powerful
  - Multicore processors provide fewer but more powerful threads
  - 8 channels of DDR4 memory per node provide more memory bandwidth and capacity, which is critical for Graph Analytics
- Graph software has been refactored for these hardware differences, but 90% of it is the same
- Aries network is faster than Seastar
  - Bandwidth is the most important commodity


7 LUBM25K: Graph Analytics Benefits from Large Memory and Fast Interconnect
Lehigh University Benchmark (LUBM): Basic Graph Patterns and Inference Test
[Figure: query time and speed-up per query. Left axis: Query Time; right axis: Speed Up (%). Legend: Urika-GD, Athena 32, Speed Up.]
- Average 300% improvement on complex queries
- Highlights graph performance on complex queries over a larger Urika-GD system
- Urika-GD system: lubm25k, 64 nodes, 24 images per node
- Urika-GX system: lubm25k, 32 nodes, 24 images per node

8 Cray Graph Engine Overview

9 Pervasive Speed, Supercomputing Experience
- CGE is an in-memory Semantic Graph Database
- Implemented using HPC technology
  - PGAS and Aries network
- Based on W3C industry standards
  - RDF graph data format (a.k.a. "Triple Store")
  - SPARQL 1.1 query language
- Extended with additional high-performance graph algorithms (BGFs)
  - Community detection, S-T connectivity, betweenness centrality
- Designed to work with other Urika-GX applications to create complex workflows

11 A Graph-Pattern Matching Workload
- Given a pattern of interest, find all instances thereof
- Lehigh University Benchmark

12 A Graph-Theoretic Workload
- What's the shortest route from A to B?
- What is the ranking of the targeted vertex?
- PageRank
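The "shortest route from A to B" question can be illustrated with a breadth-first search. This is a minimal sketch that assumes an unweighted, undirected graph; the vertex names and edge list are hypothetical, and it says nothing about how CGE implements path finding.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one path with the fewest hops from start to goal, or None."""
    # Build an undirected adjacency list from (u, v) pairs.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical toy graph: two routes from A to B, one shorter than the other.
edges = [("A", "C"), ("C", "D"), ("D", "B"), ("A", "E"), ("E", "B")]
path = shortest_path(edges, "A", "B")
print(path)
```

Breadth-first search visits vertices in order of hop count, so the first time the goal is dequeued the recovered path is a shortest one.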

13 RDF Triple Store LUBM 2017

14 Lehigh University Benchmark
Ontology: Univ-Bench
- Represents the meaning of terms (vocabulary) and their interrelationships using OWL
- Entities / Classes (42): University, Department, FullProfessor, UndergraduateStudent, GraduateStudent, Student
- Relationships / Properties / Rules (32): subOrganizationOf, headOf, memberOf, takesCourse, name, telephone
Web Ontology Language (OWL)
- OWL, RDF and SPARQL standards are the building blocks of the Semantic Web
- OWL goes beyond RDF and XML; it is intended to be used when information needs to be processed
- Best-developed ontologies: Gene Ontology (GO)


15 Other Ontologies
Gene Ontology (GO), geneontology.org
- The GO defines concepts/classes used to describe gene function, and relationships between these concepts
- Addresses the need for a consistent description of gene products across databases
- A platform to agree on how and why a specific term is used, and to apply it consistently

16 Lehigh University Benchmark: The Raw Data
Univ-Bench Artificial data generator (UBA)
- UBA generates the requested number of Universities (e.g. LUBM25K has 25,000 Universities)
- In each University, 15~20 Departments are subOrganizationOf the University
- In each Department, 7~10 FullProfessors worksFor the Department
- One of the FullProfessors is headOf the Department
- Every Student is memberOf the Department
- 10~20 ResearchGroups are subOrganizationOf the Department
- undergraduateDegreeFrom, mastersDegreeFrom connect Universities
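The generation rules above can be sketched in a few lines. This is not the UBA tool itself, only a toy generator under stated assumptions: the entity naming scheme and the function name `generate_university` are hypothetical, the ranges are the ones quoted on the slide, and students are left out because the slide gives no counts for them.

```python
import random


def generate_university(uni_id, rng):
    """Yield (subject, predicate, object) triples for one synthetic university."""
    uni = f"University{uni_id}"
    for d in range(rng.randint(15, 20)):          # 15~20 departments
        dept = f"{uni}.Department{d}"
        yield (dept, "subOrganizationOf", uni)

        n_prof = rng.randint(7, 10)               # 7~10 full professors
        for p in range(n_prof):
            yield (f"{dept}.FullProfessor{p}", "worksFor", dept)

        # Exactly one of the full professors heads the department.
        head = rng.randrange(n_prof)
        yield (f"{dept}.FullProfessor{head}", "headOf", dept)

        for g in range(rng.randint(10, 20)):      # 10~20 research groups
            yield (f"{dept}.ResearchGroup{g}", "subOrganizationOf", dept)


rng = random.Random(0)  # fixed seed so the run is repeatable
triples = list(generate_university(0, rng))
departments = [s for s, p, o in triples
               if p == "subOrganizationOf" and o == "University0"]
heads = [(s, o) for s, p, o in triples if p == "headOf"]
print(len(departments), "departments,", len(triples), "triples")
```

Calling the generator once per university and concatenating the output would give a dataset that grows linearly with the requested number of universities.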

17 Resource Description Framework
N-Triples data format
- Subject (resource), Predicate (property name), Object (property value)
- Subject: < Predicate: < Object: <
- Each of those actually represents a resource (URI: Uniform Resource Identifier)
Benchmark: LUBM25K
- 3.3 billion triples
- 1.2 billion inferred (CGE)
- 4.5 billion triples in the inferred dataset
- 626 GB in one RDF file: lubm.25k.nt
- Memory demand: 4 * (size of *.nt file) => 2504 GB, ~10 nodes with 256 GB each (rule of thumb, CGE User Guide)
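The sizing rule of thumb quoted above is simple arithmetic, sketched here for reuse with other dataset sizes. The function name `estimate_nodes` is hypothetical; the factor of 4, the 626 GB file size and the 256 GB node memory are the figures from the slide.

```python
import math


def estimate_nodes(nt_file_gb, node_memory_gb, factor=4):
    """Apply the rule of thumb: memory demand = factor * size of the .nt file."""
    memory_gb = factor * nt_file_gb
    # Round up: a fraction of a node still requires a whole node.
    return memory_gb, math.ceil(memory_gb / node_memory_gb)


memory_gb, nodes = estimate_nodes(626, 256)
print(memory_gb, nodes)  # 2504 10
```

The exact quotient is about 9.8 nodes, which is why the slide rounds to roughly 10.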

18 LUBM Queries
14 queries come with the LUBM benchmark
- Graph pattern matching queries
- Queries testing reasoning and inference capabilities
SPARQL: the query language
- Designed to query data conforming to the RDF data model
- Recursive name: SPARQL Protocol and RDF Query Language
- Together with the RDF and OWL standards, one of the building blocks of the Semantic Web
Keywords
- Typical SPARQL query: "I want these pieces of information from the subset of data that meets these conditions"
- WHERE specifies the data to pull, formulated as a triple pattern
- SELECT picks which data to display
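To make the roles of WHERE and SELECT concrete, here is a minimal in-memory sketch of triple-pattern matching. It is an illustration only, not a SPARQL engine: the helper names `match` and `select` and the toy triples are hypothetical, and terms starting with `?` are treated as variables.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, where, triples):
    """WHERE: join all triple patterns. SELECT: project the chosen variables."""
    bindings = [{}]
    for pattern in where:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return [tuple(b[v] for v in variables) for b in bindings]


# Hypothetical toy data in (subject, predicate, object) form.
triples = [
    ("alice", "rdf:type", "ub:GraduateStudent"),
    ("bob", "rdf:type", "ub:UndergraduateStudent"),
    ("alice", "ub:memberOf", "dept0"),
    ("bob", "ub:memberOf", "dept1"),
]
rows = select(
    ["?X", "?D"],
    [("?X", "rdf:type", "ub:GraduateStudent"), ("?X", "ub:memberOf", "?D")],
    triples,
)
print(rows)
```

Each pattern in the WHERE list narrows the set of variable bindings, and SELECT simply chooses which bound variables appear in the output rows.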

19 LUBM Queries
Graph Pattern: Triangle
Query 2: Print out all GraduateStudents which are memberOf a Department and have an undergraduateDegreeFrom the same University that the Department is a subOrganizationOf
SELECT ?X ?Y ?Z
WHERE {
  ?X rdf:type ub:GraduateStudent .
  ?Y rdf:type ub:University .
  ?Z rdf:type ub:Department .
  ?X ub:memberOf ?Z .
  ?Z ub:subOrganizationOf ?Y .
  ?X ub:undergraduateDegreeFrom ?Y
}
Query 9 has the same triangular pattern of relationships. It is the most compute-intensive query.
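The triangle in Query 2 closes three relationships: student to department, department to university, and student to university. The sketch below evaluates that shape over a handful of hypothetical triples with plain Python sets; the function name `triangle_query` and the data are assumptions for illustration, and the property names use the LUBM spellings.

```python
def triangle_query(triples):
    """Return (student, university, department) tuples that close the triangle."""
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    member_of = [(s, o) for s, p, o in triples if p == "ub:memberOf"]
    sub_org = {(s, o) for s, p, o in triples if p == "ub:subOrganizationOf"}
    degree = {(s, o) for s, p, o in triples if p == "ub:undergraduateDegreeFrom"}

    results = []
    for x, z in member_of:                      # edge 1: ?X memberOf ?Z
        if (x, "ub:GraduateStudent") not in types:
            continue
        if (z, "ub:Department") not in types:
            continue
        for z2, y in sub_org:                   # edge 2: ?Z subOrganizationOf ?Y
            if z2 != z or (y, "ub:University") not in types:
                continue
            if (x, y) in degree:                # edge 3 closes the triangle
                results.append((x, y, z))
    return sorted(results)


# Hypothetical data: s1 closes the triangle, s2 studied elsewhere.
triples = [
    ("s1", "rdf:type", "ub:GraduateStudent"),
    ("s2", "rdf:type", "ub:GraduateStudent"),
    ("univ0", "rdf:type", "ub:University"),
    ("univ1", "rdf:type", "ub:University"),
    ("dept0", "rdf:type", "ub:Department"),
    ("s1", "ub:memberOf", "dept0"),
    ("s2", "ub:memberOf", "dept0"),
    ("dept0", "ub:subOrganizationOf", "univ0"),
    ("s1", "ub:undergraduateDegreeFrom", "univ0"),
    ("s2", "ub:undergraduateDegreeFrom", "univ1"),
]
result = triangle_query(triples)
print(result)
```

The third edge is the expensive part at scale: every candidate pair produced by the first two edges has to be checked against the degree relation, which is why triangular queries are the compute-intensive ones.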

20 LUBM Queries
The basic pattern: nodes
Query 14: Print out the names of all undergraduate students
SELECT ?X WHERE { ?X rdf:type ub:UndergraduateStudent }
- Large input, low selectivity
- No reasoning or inference
Query 6: Print out the names of all students (as defined in the Ontology)
SELECT ?X WHERE { ?X rdf:type ub:Student }
- Large input, low selectivity
- Reasoning with the rules of the Ontology is needed to find that UndergraduateStudent and GraduateStudent are Students (subClassOf relationship)
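The difference between Query 14 and Query 6 is that the second one only returns results once the subClassOf rules have been applied. A minimal sketch of that inference step follows, assuming the two rules named on the slide; the function name `infer_types` and the toy triples are hypothetical, and this is not how CGE performs inferencing internally.

```python
def infer_types(triples, subclass_of):
    """Add the rdf:type triples implied by (transitive) subClassOf rules."""
    inferred = set(triples)
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        seen = set()
        stack = [o]
        while stack:
            cls = stack.pop()
            for parent in subclass_of.get(cls, []):
                if parent not in seen:
                    seen.add(parent)
                    inferred.add((s, "rdf:type", parent))
                    stack.append(parent)
    return inferred


def instances_of(cls, triples):
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


# The two rules named on the slide.
subclass_of = {
    "ub:UndergraduateStudent": ["ub:Student"],
    "ub:GraduateStudent": ["ub:Student"],
}
# Hypothetical raw data: nobody is typed as ub:Student directly.
raw = [
    ("s1", "rdf:type", "ub:UndergraduateStudent"),
    ("s2", "rdf:type", "ub:GraduateStudent"),
]

q14 = instances_of("ub:UndergraduateStudent", raw)       # no inference needed
q6_raw = instances_of("ub:Student", raw)                 # empty without rules
q6 = instances_of("ub:Student", infer_types(raw, subclass_of))
print(q14, q6_raw, q6)
```

Materializing the implied triples up front is one way to explain why the inferred dataset is larger than the raw one.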

21 Pattern Matching Scaling
[Figure: LUBM200K scaling. Y axis: strict query time (seconds); X axis: query. Series: x16, 256x16, 512x.]
- Strong scaling on most queries

22 SPARK GraphX LUBM 2017

23 Spark Framework
Apache Spark
- Fast, general-purpose framework for large-scale data processing
- Potential to keep data in memory
  - Solves the problem of not being able to share data across multiple map and reduce steps
- Choice of languages: Python, Scala, R, Java
- Supports a variety of workloads with the same runtime: Batch, Streaming, Interactive SQL, Machine Learning, GraphX

24 GraphX: The Spark Graph Library
Data Model: Labeled Property Graph
- Nodes, Edges: the simplest way to think of a graph is to name all the nodes and their connections (edges)
- Properties and Labels attached to nodes and edges
- Data format for LUBM: JSON (2 files: nodes.json, edges.json)
Why does Cray CGE do RDF?
- Open standard of the W3C: basis of the Semantic Web
Query Language?
- Spark/GraphX provides an abstraction for graph analysis
- No query language, only the GraphX API, currently available only in Scala
- Writing pattern matching queries requires an understanding of the underlying distributed data processing engine, Spark, and the properties of its data-parallel operations
- GraphX extends the Spark RDD abstraction by introducing a Graph class: Resilient Distributed Property Graph => DISTRIBUTED PARALLEL
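The two data models carry the same information in different shapes. The sketch below flattens a tiny labeled property graph into RDF-style triples. The JSON field names (`id`, `label`, `properties`, `src`, `dst`) are assumptions made for this example; the actual layout of nodes.json and edges.json is not given in the slides.

```python
import json

# Hypothetical file contents, inlined as strings to keep the example self-contained.
nodes_json = """[
  {"id": "s1", "label": "GraduateStudent", "properties": {"name": "Student 1"}},
  {"id": "d0", "label": "Department", "properties": {}}
]"""
edges_json = """[
  {"src": "s1", "dst": "d0", "label": "memberOf"}
]"""


def to_triples(nodes, edges):
    """Flatten a labeled property graph into (subject, predicate, object) triples."""
    triples = []
    for n in nodes:
        # The node label becomes an rdf:type statement.
        triples.append((n["id"], "rdf:type", n["label"]))
        # Each node property becomes one triple with a literal object.
        for key, value in n["properties"].items():
            triples.append((n["id"], key, value))
    for e in edges:
        # Each edge becomes one triple; the edge label is the predicate.
        triples.append((e["src"], e["label"], e["dst"]))
    return triples


triples = to_triples(json.loads(nodes_json), json.loads(edges_json))
for t in triples:
    print(t)
```

In the property graph a label and its properties hang off a node record, whereas in RDF every one of those facts is its own triple, which is what lets a single pattern language query all of them uniformly.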

25 Pattern Matching: Spark Comparison
LUBM25K: CGE vs. Spark GraphX performance, 128 nodes, XC-40
[Figure: strict query time (ms) per query, CGE vs. GraphX.]
- CGE is 1-2 orders of magnitude faster
CUG 2017

26 Built-in Graph Functions SNAP 2017

27 Built-in Graph Functions (BGFs)
- SPARQL is limited in its ability to express graph processing
- CGE augments SPARQL with the capability of calling library graph algorithms
- You can go from SPARQL to a graph algorithm and back to SPARQL for further refinement
- Stanford Network Analysis Project (SNAP): US Patent Citations and two online social networks
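The "pattern, then algorithm, then pattern again" workflow can be mimicked in plain Python. This sketch does not use CGE's BGF syntax, which is not shown in the slides; the predicate `cites`, the type `Patent`, and the toy triples are hypothetical, and PageRank is implemented as an ordinary power iteration with damping factor 0.85.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new = {n: base for n in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Hypothetical toy triples.
triples = [
    ("p1", "rdf:type", "Patent"), ("p2", "rdf:type", "Patent"),
    ("p3", "rdf:type", "Patent"),
    ("p1", "cites", "p3"), ("p2", "cites", "p3"), ("p3", "cites", "p1"),
]

# Step 1 (pattern matching): select the subgraph of interest.
edges = [(s, o) for s, p, o in triples if p == "cites"]
# Step 2 (graph algorithm): rank the vertices of that subgraph.
rank = pagerank(edges)
# Step 3 (pattern matching again): refine the result with another condition.
patents = {s for s, p, o in triples if p == "rdf:type" and o == "Patent"}
top = max((n for n in rank if n in patents), key=rank.get)
print(top, round(rank[top], 3))
```

The point of the combination is that the selection and refinement steps stay declarative while only the middle step needs a whole-graph algorithm.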

28 Applications for Available Algorithms
- Search / neighborhood identification and extraction
  - Pattern matching / subgraph isomorphism (core functionality)
  - Cybersecurity application: context and search, data exfiltration, beaconing, attack identification
- Community detection
  - Modularity: relaxed clique
  - Cybersecurity application: botnet detection and server hierarchy mapping
- Path finding
  - Shortest path, S-T connectivity
  - Cybersecurity application: identify likely paths for information flow between nodes
- Key node / edge identification
  - Betweenness centrality
  - Cybersecurity application: find the vulnerable points in network configurations
- Anomaly identification and clustering
  - BadRank: finds likely worst actors by association with known bad actors, à la PageRank
  - Cybersecurity application: unknown-unknown identification

29 Whole Graph Analysis Scaling
CGE Performance: PageRank (SPARQL w/ BGF extension)
[Figure: strict query time (seconds) per dataset (cit-patents, soc-livejournal1, com-friendster) on 32, 64, 128, 256 and 512 nodes.]
- Strong scaling across SNAP datasets

30 Whole Graph Analysis Scaling
Performance Comparison: CGE vs. Spark GraphX, PageRank
[Figure: time (seconds) by programming model (Spark GraphX, Python+SPARQL, SPARQL+BGF) for livejournal1 at 64p, 128p and 256p.]
- CGE is an order of magnitude faster
- Iterative SPARQL approach is equivalent to Spark

31 Whole Graph Analysis Scaling
Performance Comparison: CGE vs. Spark GraphX, PageRank
[Figure: time (seconds) by programming model (Spark GraphX, Python+SPARQL, SPARQL+BGF) for friendster at 64p, 128p and 256p.]
- CGE is an order of magnitude better than Spark
- Dataset characteristics affect performance

32 Cray Urika-GX Configuration

33 Urika-GX Configuration
- Deep memory / storage hierarchy
- Aries Network: Cray Aries fabric with high I/O throughput and low latency
- 16/48 2-socket Intel Xeon E v4 family processor nodes
- cores, 8-24 TB DRAM, TB PCIe SSDs, TB HDD local storage
- Attach to external POSIX-compliant global storage: Cray Sonexion (Lustre), GPFS, NFS
- HPC Network: optimized PGAS for Cray Graph Engine
- Large Memory
- Node-local PCIe SSDs: tiered HDFS, optimized shuffle operations
- External File Systems (incl. Lustre)
