The Many Faces Of Apache Ignite. David Robinson, Software Engineer May 13, 2016



A Face
In elementary geometry, a face is a two-dimensional polygon on the boundary of a polyhedron. Attribution: Robert Webb's Stella software, http://www.software3d.com/stella.php

Some Faces of Apache Ignite
Data Streaming, SQL, Transactions, Services, File System, Data Grid, Persistence, Clusters, Spark Integration. Attribution: Robert Webb's Stella software, http://www.software3d.com/stella.php

Background
The Market, Apache Ignite, A Use Case

Understanding the In-Memory Eco-System
Fabrics (Apache Ignite), distributed caches (Redis, Memcached), data grids (Hazelcast, Alluxio?), and in-memory databases (dashDB, SAP HANA). The distinctions may be blurring, coming down to performance and scale.

Apache Ignite Forms a Cluster
All those faces potentially running on each node. Source: http://preview.tinyurl.com/hzvq5m6

What Is Genesis Graph?
Running today (short demo): a property graph database built on Apache Ignite. Basic pieces: vertex, edge, vertex properties, edge properties.

Leveraging Capabilities in the Grid
A usage spectrum for the Genesis Graph DB: less usage today, with heavier usage planned.

How the Apache Ignite Grid Is Used For Genesis Graph
Towards a market-leading (open governance/source) graph database store.

Apache Ignite And Building A Big Data Graph Database
Capabilities to construct a graph database: ID generation; data representation and storage; multi-model + analytics integration; data streaming and eventing; transactions; partition awareness. Fringe benefits: keeping all, or large parts, of the graph in memory; notebook integration available for data scientists; real-time graphs with the streaming.

Future ID Generation On The Ignite Grid
A custom AtomicID service runs on the Apache Ignite grid computing framework; the Genesis Graph client gets an id, then writes a vertex. Atomics in Ignite are distributed across the cluster, essentially enabling atomic operations (such as increment-and-get or compare-and-set) on the same globally-visible value.
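In Ignite itself a service like this maps naturally onto IgniteAtomicSequence (obtained via ignite.atomicSequence(...)). As a minimal single-JVM sketch of the increment-and-get idea, with java.util.concurrent.atomic.AtomicLong standing in for the distributed atomic (the class name here is illustrative, not Genesis Graph code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal local sketch of a cluster-wide ID service. In Ignite this role is
// played by IgniteAtomicSequence / IgniteAtomicLong, whose value is globally
// visible across the cluster; AtomicLong stands in for it on a single JVM.
public class AtomicIdSketch {
    private final AtomicLong sequence = new AtomicLong(0);

    // "get an id": the caller then writes the vertex under this id
    public long nextId() {
        return sequence.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicIdSketch ids = new AtomicIdSketch();
        System.out.println(ids.nextId()); // 1
        System.out.println(ids.nextId()); // 2
    }
}
```

The real distributed sequence also reserves ranges of ids per node to avoid a network round trip on every increment, which this sketch does not model.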

Graph Storage On The Ignite Grid
The Apache Ignite grid computing framework holds partition-aware, partitioned caches (with backups?), each with its own Ignite indexes, in off-heap memory, with write- and read-through persistence to disk (H2, Cassandra, HBase). The Genesis Graph client reads and writes through the grid.
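The read-/write-through arrangement on this slide can be sketched with two plain maps standing in for the partitioned cache and the disk store. In Ignite this is done with a CacheStore implementation; the class below is an illustration of the pattern only:

```java
import java.util.HashMap;
import java.util.Map;

// Local sketch of read-/write-through caching: the in-memory cache fronts a
// persistent store (H2, Cassandra, HBase in the slide). Two plain maps stand
// in for the Ignite partitioned cache and the backing CacheStore.
public class WriteThroughSketch {
    private final Map<Long, String> memoryCache = new HashMap<>();
    private final Map<Long, String> diskStore   = new HashMap<>();

    public void put(Long key, String value) {
        memoryCache.put(key, value);
        diskStore.put(key, value); // write-through: persist on every put
    }

    public String get(Long key) {
        // read-through: on a cache miss, fall back to the persistent store
        return memoryCache.computeIfAbsent(key, diskStore::get);
    }
}
```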

The Challenges Of Data Locality
A vertex (e.g. a hotel) and its key/value properties (e.g. name = hyatt) may land on different nodes in the network.

Forcing Data Locality through Affinity Keys
A vertex and its key/value properties are mapped to the same node via the affinity interface: mapKeyToNode(K key), int[] allPartitions(ClusterNode n). Co-location is required to use the Ignite SQL join capability.
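The mechanism can be sketched locally. This is an illustrative stand-in (the class, partition count, and hash are not Ignite's real AffinityFunction internals): the point is that the partition is computed from a shared affinity key rather than the full cache key, so a vertex and its properties land on the same partition and SQL joins stay node-local.

```java
// Illustrative sketch (not Ignite's real AffinityFunction): entries that
// share an affinity key hash to the same partition, hence the same node.
public class AffinityKeySketch {
    static final int PARTITIONS = 1024;

    // map an affinity key to a partition
    static int partitionFor(Object affinityKey) {
        return Math.abs(affinityKey.hashCode()) % PARTITIONS;
    }

    public static void main(String[] args) {
        long vertexId = 42L;
        // the vertex is keyed by its id; each of its property keys carries
        // the vertex id as its affinity key, so both map to one partition
        int vertexPartition   = partitionFor(vertexId);
        int propertyPartition = partitionFor(vertexId);
        System.out.println(vertexPartition == propertyPartition); // true
    }
}
```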

Data Representation And Storage Challenges
The graph will need to implement its own, graph-level indexes; the Ignite hash map data structure is inefficient at large scales.

public class InternalVertex implements Serializable {
    /** vertex id (indexed). */
    @QuerySqlField(index = true)
    public Long id;

    /** ability to query via Ignite */
    @QuerySqlField
    public String label;
    ...

Most efficient for query would be to inject new fields into this class as the user defines a schema.

Data Representation And Storage Challenges (continued)

public class UserVertexIndex implements Serializable {
    /** vertex id (indexed). */
    @QuerySqlField(index = true)
    public String name;

    @QuerySqlField(index = true)
    public Object value;
    ...

Next idea is to auto-generate beans that represent indexes and let Ignite efficiently handle the indexing.

Data Representation And Storage Challenges
Tuning TinkerPop 3.x strategies to match the storage model: custom steps and strategies? The Gremlin traversal

    g.E().has("since", "2005").fill(m);

maps to

    select * from edgestorecache where since = 2005
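A custom TinkerPop strategy along these lines would rewrite a has(key, value) step into SQL over the backing edge cache. A hypothetical helper (toSql is neither TinkerPop nor Ignite API) makes the translation concrete:

```java
// Illustrative sketch of the step-to-SQL mapping: rewrite a single
// has(key, value) filter into the equivalent SQL over the edge cache.
// Method and table names are hypothetical, not TinkerPop or Ignite API.
public class HasStepToSql {
    static String toSql(String table, String key, String value) {
        return "select * from " + table + " where " + key + " = '" + value + "'";
    }

    public static void main(String[] args) {
        // g.E().has("since", "2005") over the edge store cache becomes:
        System.out.println(toSql("edgestorecache", "since", "2005"));
        // select * from edgestorecache where since = '2005'
    }
}
```

A real strategy would also have to handle chained steps and parameterize the query rather than concatenate strings.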

Creating A Cache For the Graph

public void openGraphVertexCache() {
    String namespacedCacheName = getNamespacedCacheName(GGDefinitions.GENESISGRAPH_VERTEXCACHE_PREFIX);
    CacheConfiguration<Long, InternalVertex> cfg = new CacheConfiguration<>(namespacedCacheName);

    // we want to support transactions on all of our caches;
    // this does not rule out atomic updates outside of a transaction
    cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
    cfg.setCacheMode(CacheMode.PARTITIONED);

    // NOTE: the index here must be key/value pairs (in twos)
    // cfg.setIndexedTypes(AffinityKey.class, InternalVertex.class);
    cfg.setIndexedTypes(Long.class, InternalVertex.class);

    // must force-close transactions because we cannot stop caches with open transactions
    IgniteTransactions txContainer = this.igniteClientConnection.getHandleToTxInterface();
    if (txContainer != null) {
        Transaction aTx = txContainer.tx();
        if (aTx != null) {
            if (aTx.state().ordinal() == TransactionState.ACTIVE.ordinal()) {
                aTx.commit();
            }
        }
    }

    IgniteCache<Long, InternalVertex> internalVertexCache =
        this.igniteClientConnection.ignite.getOrCreateCache(cfg);

    // add the new cache into the list of caches to be closed
    this.cachesAllocated.put(namespacedCacheName, internalVertexCache);
}

Multi-Model + Analytic Processing Integration
Spark RDDs, Gremlin graph traversals, SQL property queries, data streaming.

Analytic Processing: Spark Example

scala> import org.apache.tinkerpop.gremlin.ignitegraph.structure.internal._
import org.apache.tinkerpop.gremlin.ignitegraph.structure.internal._

scala> val ic = new IgniteContext[Integer, InternalVertex](sc, () => new IgniteConfiguration())
ic: org.apache.ignite.spark.IgniteContext[Integer,org.apache.tinkerpop.gremlin.ignitegraph.structure.internal.InternalVertex] = org.apache.ignite.spark.IgniteContext@713935c8

scala> val vertices = sharedRDD.collect()
vertices: Array[(Integer, org.apache.tinkerpop.gremlin.ignitegraph.structure.internal.InternalVertex)] = Array((1,InternalVertex [id=1, collocateId=1, label=person, ]), (2,InternalVertex [id=2, collocateId=1, label=person, ]), (3,InternalVertex [id=3, collocateId=1, label=person, ]), (4,InternalVertex [id=4, collocateId=1, label=address, ]), (5,InternalVertex [id=5, collocateId=1, label=phoneNumber, ]))

scala> sharedRDD.foreach(println)

scala> vertices.foreach(println)
(1,InternalVertex [id=1, collocateId=1, label=person, ])
(2,InternalVertex [id=2, collocateId=1, label=person, ])
(3,InternalVertex [id=3, collocateId=1, label=person, ])
(4,InternalVertex [id=4, collocateId=1, label=address, ])
(5,InternalVertex [id=5, collocateId=1, label=phoneNumber, ])

Analytic Processing: SQL Example

private void doWork() {
    String JDBCSTRING = "jdbc:ignite:cfg://cache=ignitegraph1graphvertexcache@file:/users/graphie/downloads/apacheignite/ignite-fabric-1.5.0.final/david/david-ignite.xml";
    try {
        // Register the JDBC driver.
        Class.forName("org.apache.ignite.IgniteJdbcDriver");

        // Open a JDBC connection to the cache named in the URL.
        Connection conn = DriverManager.getConnection(JDBCSTRING);
        Statement stmt1 = conn.createStatement();
        ResultSet rs = stmt1.executeQuery("select * from InternalVertex");
        while (rs.next()) {
            System.out.println("Id " + rs.getLong("id") + " Label " + rs.getString("label"));
        }
        stmt1.close();
        conn.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Output:
Id 3 Label person
Id 1 Label person
Id 2 Label person
Id 4 Label address
Id 5 Label phoneNumber

Apache Ignite And Building A Big Data Graph Database
Capabilities to construct a graph database: ID generation; data representation and storage; multi-model; data streaming and eventing; transactions; partition awareness. Fringe benefits: keeping all, or large parts, of the graph in memory; notebook integration available for data scientists; real-time graphs with the streaming.

Partition Awareness On The Ignite Grid
The vertex cache, vertex property cache, and metaproperty cache sit alongside the Ignite internals in the Apache Ignite JVM (they can also be off-heap rather than in the same JVM). Data location can be controlled via affinity keys in Ignite; compute can also be co-located.

Genesis Graph Visualization
Visualization becomes much easier with all of the possible ways to access the graph data: Gremlin Server integration or other data integration. Example: UK-to-France international air routes. Attribution: Graham Wallis, IBM.

Genesis Graph Visualization
Airports sized by number of routes, via the Gremlin Server interface. Attribution: Graham Wallis, IBM.