The Many Faces Of Apache Ignite. David Robinson, Software Engineer May 13, 2016



A Face
In elementary geometry, a face is a two-dimensional polygon on the boundary of a polyhedron. Attribution: Robert Webb's Stella software, http://www.software3d.com/stella.php

Some Faces of Apache Ignite
Data Streaming, SQL, Transactions, Services, File System, Data Grid, Persistence, Clusters, Spark Integration. Attribution: Robert Webb's Stella software, http://www.software3d.com/stella.php

Background
The Market, Apache Ignite, A Use Case

Understanding the In-Memory Eco-System
Fabrics (Apache Ignite), distributed caches (Redis, Memcached), data grids (Hazelcast, Alluxio?), and in-memory databases (dashDB, SAP HANA). The distinctions may be blurring, coming down to performance and scale.

Apache Ignite Forms a Cluster
All those faces potentially running on each node. Source: http://preview.tinyurl.com/hzvq5m6

What Is Genesis Graph?
Running today (short demo): a property graph database built on Apache Ignite. Basic pieces: vertex, edge, vertex properties, edge properties.

Leveraging Capabilities in the Grid
A usage spectrum for the Genesis Graph DB: less usage today, with heavier usage planned.

How the Apache Ignite Grid Is Used For Genesis Graph
Towards a market-leading (open governance/source) graph database store.

Apache Ignite And Building A Big Data Graph Database
Capabilities to construct a graph database: ID generation; data representation and storage; multi-model + analytics integration; data streaming and eventing; transactions; partition awareness. Fringe benefits: keeping all, or large parts, of the graph in memory; notebook integration available for data scientists; real-time graphs with the streaming.

Future ID Generation On The Ignite Grid
A custom AtomicID service runs on the Apache Ignite grid computing framework; the Genesis Graph client gets an id, then writes a vertex. Atomics in Ignite are distributed across the cluster, essentially enabling atomic operations (such as increment-and-get or compare-and-set) on the same globally-visible value.
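In Ignite itself a service like this maps naturally onto IgniteAtomicSequence (obtained via ignite.atomicSequence(...)). As a minimal single-JVM sketch of the increment-and-get idea, with java.util.concurrent.atomic.AtomicLong standing in for the distributed atomic (the class name here is illustrative, not Genesis Graph code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal local sketch of a cluster-wide ID service. In Ignite this role is
// played by IgniteAtomicSequence / IgniteAtomicLong, whose value is globally
// visible across the cluster; AtomicLong stands in for it on a single JVM.
public class AtomicIdSketch {
    private final AtomicLong sequence = new AtomicLong(0);

    // "get an id": the caller then writes the vertex under this id
    public long nextId() {
        return sequence.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicIdSketch ids = new AtomicIdSketch();
        System.out.println(ids.nextId()); // 1
        System.out.println(ids.nextId()); // 2
    }
}
```

The real distributed sequence also reserves ranges of ids per node to avoid a network round trip on every increment, which this sketch does not model.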

Graph Storage On The Ignite Grid
The Apache Ignite grid computing framework holds partition-aware, partitioned caches (with backups?), each with its own Ignite indexes, in off-heap memory, with write- and read-through persistence to disk (H2, Cassandra, HBase). The Genesis Graph client reads and writes through the grid.
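The read-/write-through arrangement on this slide can be sketched with two plain maps standing in for the partitioned cache and the disk store. In Ignite this is done with a CacheStore implementation; the class below is an illustration of the pattern only:

```java
import java.util.HashMap;
import java.util.Map;

// Local sketch of read-/write-through caching: the in-memory cache fronts a
// persistent store (H2, Cassandra, HBase in the slide). Two plain maps stand
// in for the Ignite partitioned cache and the backing CacheStore.
public class WriteThroughSketch {
    private final Map<Long, String> memoryCache = new HashMap<>();
    private final Map<Long, String> diskStore   = new HashMap<>();

    public void put(Long key, String value) {
        memoryCache.put(key, value);
        diskStore.put(key, value); // write-through: persist on every put
    }

    public String get(Long key) {
        // read-through: on a cache miss, fall back to the persistent store
        return memoryCache.computeIfAbsent(key, diskStore::get);
    }
}
```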

The Challenges Of Data Locality
A vertex (e.g. a hotel) and its key/value properties (e.g. name = hyatt) may land on different nodes in the network.

Forcing Data Locality through Affinity Keys
A vertex and its key/value properties are mapped to the same node via the affinity interface: mapKeyToNode(K key), int[] allPartitions(ClusterNode n). Co-location is required to use the Ignite SQL join capability.
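The mechanism can be sketched locally. This is an illustrative stand-in (the class, partition count, and hash are not Ignite's real AffinityFunction internals): the point is that the partition is computed from a shared affinity key rather than the full cache key, so a vertex and its properties land on the same partition and SQL joins stay node-local.

```java
// Illustrative sketch (not Ignite's real AffinityFunction): entries that
// share an affinity key hash to the same partition, hence the same node.
public class AffinityKeySketch {
    static final int PARTITIONS = 1024;

    // map an affinity key to a partition
    static int partitionFor(Object affinityKey) {
        return Math.abs(affinityKey.hashCode()) % PARTITIONS;
    }

    public static void main(String[] args) {
        long vertexId = 42L;
        // the vertex is keyed by its id; each of its property keys carries
        // the vertex id as its affinity key, so both map to one partition
        int vertexPartition   = partitionFor(vertexId);
        int propertyPartition = partitionFor(vertexId);
        System.out.println(vertexPartition == propertyPartition); // true
    }
}
```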

Data Representation And Storage Challenges
The graph will need to implement its own, graph-level indexes; the Ignite hash map data structure is inefficient at large scales.

public class InternalVertex implements Serializable {
    /** vertex id (indexed). */
    @QuerySqlField(index = true)
    public Long id;

    /** ability to query via Ignite */
    @QuerySqlField
    public String label;
    ...

Most efficient for query would be to inject new fields into this class as the user defines a schema.

Data Representation And Storage Challenges (continued)

public class UserVertexIndex implements Serializable {
    /** vertex id (indexed). */
    @QuerySqlField(index = true)
    public String name;

    @QuerySqlField(index = true)
    public Object value;
    ...

Next idea is to auto-generate beans that represent indexes and let Ignite efficiently handle the indexing.

Data Representation And Storage Challenges
Tuning TinkerPop 3.x strategies to match the storage model: custom steps and strategies? The Gremlin traversal

    g.E().has("since", "2005").fill(m);

maps to

    select * from edgestorecache where since = 2005
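A custom TinkerPop strategy along these lines would rewrite a has(key, value) step into SQL over the backing edge cache. A hypothetical helper (toSql is neither TinkerPop nor Ignite API) makes the translation concrete:

```java
// Illustrative sketch of the step-to-SQL mapping: rewrite a single
// has(key, value) filter into the equivalent SQL over the edge cache.
// Method and table names are hypothetical, not TinkerPop or Ignite API.
public class HasStepToSql {
    static String toSql(String table, String key, String value) {
        return "select * from " + table + " where " + key + " = '" + value + "'";
    }

    public static void main(String[] args) {
        // g.E().has("since", "2005") over the edge store cache becomes:
        System.out.println(toSql("edgestorecache", "since", "2005"));
        // select * from edgestorecache where since = '2005'
    }
}
```

A real strategy would also have to handle chained steps and parameterize the query rather than concatenate strings.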

Creating A Cache For the Graph

public void openGraphVertexCache() {
    String namespacedCacheName = getNamespacedCacheName(GGDefinitions.GENESISGRAPH_VERTEXCACHE_PREFIX);
    CacheConfiguration<Long, InternalVertex> cfg = new CacheConfiguration<>(namespacedCacheName);

    // we want to support transactions on all of our caches;
    // this does not rule out atomic updates outside of a transaction
    cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
    cfg.setCacheMode(CacheMode.PARTITIONED);

    // NOTE: the index here must be key/value pairs (in twos)
    // cfg.setIndexedTypes(AffinityKey.class, InternalVertex.class);
    cfg.setIndexedTypes(Long.class, InternalVertex.class);

    // must force-close transactions because we cannot stop caches with open transactions
    IgniteTransactions txContainer = this.igniteClientConnection.getHandleToTxInterface();
    if (txContainer != null) {
        Transaction aTx = txContainer.tx();
        if (aTx != null) {
            if (aTx.state().ordinal() == TransactionState.ACTIVE.ordinal()) {
                aTx.commit();
            }
        }
    }

    IgniteCache<Long, InternalVertex> internalVertexCache =
        this.igniteClientConnection.ignite.getOrCreateCache(cfg);

    // add the new cache into the list of caches to be closed
    this.cachesAllocated.put(namespacedCacheName, internalVertexCache);
}

Multi-Model + Analytic Processing Integration
Spark RDDs, Gremlin graph traversals, SQL property queries, data streaming.

Analytic Processing: Spark Example

scala> import org.apache.tinkerpop.gremlin.ignitegraph.structure.internal._
import org.apache.tinkerpop.gremlin.ignitegraph.structure.internal._

scala> val ic = new IgniteContext[Integer, InternalVertex](sc, () => new IgniteConfiguration())
ic: org.apache.ignite.spark.IgniteContext[Integer,org.apache.tinkerpop.gremlin.ignitegraph.structure.internal.InternalVertex] = org.apache.ignite.spark.IgniteContext@713935c8

scala> val vertices = sharedRDD.collect()
vertices: Array[(Integer, org.apache.tinkerpop.gremlin.ignitegraph.structure.internal.InternalVertex)] = Array((1,InternalVertex [id=1, collocateId=1, label=person, ]), (2,InternalVertex [id=2, collocateId=1, label=person, ]), (3,InternalVertex [id=3, collocateId=1, label=person, ]), (4,InternalVertex [id=4, collocateId=1, label=address, ]), (5,InternalVertex [id=5, collocateId=1, label=phoneNumber, ]))

scala> sharedRDD.foreach(println)

scala> vertices.foreach(println)
(1,InternalVertex [id=1, collocateId=1, label=person, ])
(2,InternalVertex [id=2, collocateId=1, label=person, ])
(3,InternalVertex [id=3, collocateId=1, label=person, ])
(4,InternalVertex [id=4, collocateId=1, label=address, ])
(5,InternalVertex [id=5, collocateId=1, label=phoneNumber, ])

Analytic Processing: SQL Example

private void doWork() {
    String JDBCSTRING = "jdbc:ignite:cfg://cache=ignitegraph1graphvertexcache@file:/users/graphie/downloads/apacheignite/ignite-fabric-1.5.0.final/david/david-ignite.xml";
    try {
        // Register the JDBC driver.
        Class.forName("org.apache.ignite.IgniteJdbcDriver");

        // Open a JDBC connection to the cache named in the URL.
        Connection conn = DriverManager.getConnection(JDBCSTRING);
        Statement stmt1 = conn.createStatement();
        ResultSet rs = stmt1.executeQuery("select * from InternalVertex");
        while (rs.next()) {
            System.out.println("Id " + rs.getLong("id") + " Label " + rs.getString("label"));
        }
        stmt1.close();
        conn.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Output:
Id 3 Label person
Id 1 Label person
Id 2 Label person
Id 4 Label address
Id 5 Label phoneNumber

Apache Ignite And Building A Big Data Graph Database
Capabilities to construct a graph database: ID generation; data representation and storage; multi-model; data streaming and eventing; transactions; partition awareness. Fringe benefits: keeping all, or large parts, of the graph in memory; notebook integration available for data scientists; real-time graphs with the streaming.

Partition Awareness On The Ignite Grid
The vertex cache, vertex property cache, and metaproperty cache sit alongside the Ignite internals in the Apache Ignite JVM (they can also be off-heap rather than in the same JVM). Data location can be controlled via affinity keys in Ignite; compute can also be co-located.

Genesis Graph Visualization
Visualization becomes much easier with all of the possible ways to access the graph data: Gremlin Server integration or other data integration. Example: UK-to-France international air routes. Attribution: Graham Wallis, IBM.

Genesis Graph Visualization
Airports sized by number of routes, via the Gremlin Server interface. Attribution: Graham Wallis, IBM.