Apply Graph and Deep Learning to Recommendation and Network Intrusion Detection

1 Apply Graph and Deep Learning to Recommendation and Network Intrusion Detection
Zhe Wu, Ph.D., Architect, Oracle Spatial and Graph
June 22, 2017

2 Outline
- Introduction and overview of graph technologies
- Architecture of Oracle's Support
- Use cases: Recommender System, Network Intrusion Detection
- Summary

3 Graph Data Model
What is a graph?
- Data model representing entities as vertices and relationships as edges
- Optionally including attributes
- Also known as linked data
What are typical graphs?
- Social networks: LinkedIn, Facebook, Google+, Twitter, ...
- Physical networks, supplier networks, ...
- Knowledge graphs: Apple Siri, Google Knowledge Graph, ...
[Figure: example graph with vertices A to F]

4 Graph Data Model
Why are graphs popular?
- Easy data modeling: whiteboard friendly
- Flexible data model: no predefined schema, easily extensible; particularly useful for sparse data
- Insight from graphical representation: intuitive visualization
- Enabling new kinds of analysis
[Figure: example graph with vertices A to F]

5 Architecture of Existing Support
- Graph Analytics: Parallel In-Memory Graph Analytics (PGX)
- Data Access Layer: Java APIs, Apache Blueprints & Lucene/SolrCloud, REST/Web Service
- Java, Groovy, Python, ...
- Oracle Spatial and Graph: Oracle Database 12.2 (Java APIs/JDBC/SQL/PLSQL)
- Oracle Big Data Spatial and Graph: Apache HBase, Oracle NoSQL Database
- Property graph formats supported: GraphML, GML, GraphSON, flat files
- RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
- CSV, relational data sources

6 Oracle Differentiators -- Graph
Complete, supported graph solution:
- Storage: NoSQL, HBase, RDBMS back-ends
- Data access: Blueprints, Java, query language (PGQL)
- Rich graph analytics: 40 pre-built, in-memory graph algorithms
- Scalable: analyze a billion-edge graph in memory on a single BDA node; persist extremely large graphs on disk
- Security: Secure NoSQL, Kerberos CDH, Label Security on Oracle Database 12c
- 10-50x faster than graph analysis competitors

7 The Data Model
[Figure: example property graph with person vertices (marko, age 29; vadas, age 27; josh, age 32; peter, age 35) and software vertices (lop, lang java; ripple, lang java), connected by "knows" and "created" edges that carry a weight property]
A set of vertices (or nodes)
- each vertex has a unique identifier.
- each vertex has a set of in/out edges.
- each vertex has a collection of key-value properties.
A set of edges (or links)
- each edge has a unique identifier.
- each edge has a head/tail vertex.
- each edge has a label denoting the type of relationship between two vertices.
- each edge has a collection of key-value properties.
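The vertex and edge rules listed on this slide can be sketched as a small in-memory structure. This is an illustration only, not the BDSG or Blueprints API: the class name `TinyPropertyGraph`, its methods, and the edge IDs and weights in `sample()` are hypothetical, and only the vertex names and ages come from the slide's figure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal in-memory property graph (illustrative only, not the BDSG API). */
final class TinyPropertyGraph {
    static final class Vertex {
        final long id;                                         // unique identifier
        final Map<String, Object> props = new LinkedHashMap<>(); // key-value properties
        final List<Edge> out = new ArrayList<>();              // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();               // edges arriving at this vertex
        Vertex(long id) { this.id = id; }
    }

    static final class Edge {
        final long id;                                         // unique identifier
        final Vertex tail;                                     // vertex the edge starts from
        final Vertex head;                                     // vertex the edge points to
        final String label;                                    // type of relationship
        final Map<String, Object> props = new LinkedHashMap<>(); // key-value properties
        Edge(long id, Vertex tail, Vertex head, String label) {
            this.id = id; this.tail = tail; this.head = head; this.label = label;
        }
    }

    private final Map<Long, Vertex> vertices = new LinkedHashMap<>();

    /** Returns the vertex with this ID, creating it if needed. */
    Vertex addVertex(long id) {
        return vertices.computeIfAbsent(id, k -> new Vertex(k));
    }

    Vertex vertex(long id) { return vertices.get(id); }

    /** Adds a labeled edge and registers it on both endpoints. */
    Edge addEdge(long id, long tailId, long headId, String label) {
        Vertex tail = addVertex(tailId);
        Vertex head = addVertex(headId);
        Edge e = new Edge(id, tail, head, label);
        tail.out.add(e);
        head.in.add(e);
        return e;
    }

    /** Three-vertex fragment; edge IDs and weights are made up for illustration. */
    static TinyPropertyGraph sample() {
        TinyPropertyGraph g = new TinyPropertyGraph();
        Vertex marko = g.addVertex(1);
        marko.props.put("name", "marko");
        marko.props.put("age", 29);
        Vertex vadas = g.addVertex(2);
        vadas.props.put("name", "vadas");
        vadas.props.put("age", 27);
        Vertex lop = g.addVertex(3);
        lop.props.put("name", "lop");
        lop.props.put("lang", "java");
        g.addEdge(7, 1, 2, "knows").props.put("weight", 0.5);
        g.addEdge(9, 1, 3, "created").props.put("weight", 0.4);
        return g;
    }
}
```

Calling `TinyPropertyGraph.sample().vertex(1).out` yields the two edges that leave the marko vertex, each with its own label and weight property.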

8 Categories of Graph Workload
Computational graph analytics
- Compute values on vertices and edges
- Traverse the graph or iterate over it (usually repeatedly)
- Procedural logic
- Examples: Shortest Path, PageRank, Weakly Connected Components, Centrality, ...
Graph pattern matching
- Based on a description of a pattern
- Find all matching sub-graphs
[Figure: example pattern graph. Vertices: :Person{100} (name = Amber, age = 25), :Person{200} (name = Paul, age = 30), :Person{300} (name = Heather, age = 27), :Company{777} (name = Oracle, location = Redwood City). Edges: :friendof{2513} (since = 08/01/2014), :worksat{1831} (startdate = 09/01/2015), :friendof{1173}, :knows{2200}]

9 Examples for Graph Analysis
- Community detection and influencer analysis: churn risk analysis/targeted marketing, HR turnover analysis
- Product recommendation: collaborative filtering, clustering
- Anomaly detection: social network analysis (spam detection), fraud detection in healthcare
- Path analysis and reachability: outage analysis in utilities networks, vulnerability analysis in IP networks, Panama Papers
- Pattern matching: tax fraud detection, data extraction

10 Build Recommender System with Graph Technologies

11 Building a Recommender System -- with Oracle Big Data Spatial and Graph
Environment
- Oracle Big Data Lite VM
- Oracle Big Data Spatial and Graph v
- SolrCloud 4.10.x
A user-item property graph
- Vertices (items, descriptions, and users)
- Edges (linking users and items)
Recommendation: "you may also like"

12 Building a Recommender System -- with Oracle Big Data Spatial and Graph
BDSG offers multiple approaches, and they can be mixed together:
- Content-based filtering: match item description, match user profile, relevancy ranking
- Collaborative filtering: people who liked similar items in the past will like similar items in the future
- Personalized Page Rank: randomly navigate from a user to a product, then back to a user; randomly jump back to the starting point(s)
[Figure: example graph with vertices A, B, C and u, v, w, x]

13 Personalized Page Rank-based Recommender System Random walk with restart Reference:
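Random walk with restart can be sketched as a power iteration: at each step a walker follows an outgoing edge with probability equal to the damping factor and otherwise jumps back to one of the starting vertices. The code below is a plain-Java illustration, not the PGX implementation. The class name, the adjacency-array encoding, and the handling of dead-end vertices (their score is returned to the restart set) are assumptions made for the example.

```java
/** Personalized PageRank by power iteration (illustrative sketch, not PGX). */
final class PersonalizedPageRankSketch {
    /**
     * @param adj      adj[u] lists the out-neighbours of vertex u
     * @param start    vertices the walker restarts from
     * @param damping  probability of following an edge instead of restarting
     * @param maxError stop when the L1 change between iterations drops below this
     * @param maxIter  hard cap on the number of iterations
     */
    static double[] rank(int[][] adj, int[] start, double damping,
                         double maxError, int maxIter) {
        int n = adj.length;
        double[] restart = new double[n];
        for (int s : start) restart[s] = 1.0 / start.length;
        double[] r = restart.clone();
        for (int it = 0; it < maxIter; it++) {
            double[] next = new double[n];
            double dangling = 0.0;  // score sitting on vertices with no out-edges
            for (int u = 0; u < n; u++) {
                if (adj[u].length == 0) { dangling += r[u]; continue; }
                double share = damping * r[u] / adj[u].length;
                for (int v : adj[u]) next[v] += share;
            }
            double diff = 0.0;
            for (int v = 0; v < n; v++) {
                // restart mass plus the dead-end mass, both sent to the start set
                next[v] += (1.0 - damping + damping * dangling) * restart[v];
                diff += Math.abs(next[v] - r[v]);
            }
            r = next;
            if (diff < maxError) break;
        }
        return r;
    }
}
```

For a recommender, one would build `adj` from the user-item graph with each "user likes item" link stored in both directions, restart from the target user, and recommend the highest-scoring items that the user has not already linked to.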

14 Key API for Personalized Page Rank
API:
    ppr = analyst.personalizedPagerank(
        pgxGraph, vertexSet, /*max error*/, 0.85 /*damping factor*/, 1000);
Result: ppr.getTopKValues()
    it = ppr.getTopKValues(9).iterator();
    while (it.hasNext()) {
        entry = it.next();
        vid = entry.getKey().getId();
        System.out.format("ppr=%.4f vertex=%s\n", entry.getValue(), opg.getVertex(vid));
    }
Output:
    ppr= vertex=vertex ID 1 {name:str:john, age:int:10}
    ppr= vertex=vertex ID 11 {type:str:prod, desc:str:kindle Fire}
    ppr= vertex=vertex ID 10 {type:str:prod, desc:str:iphone5, released:dat:sat Jan 21 00:02:00 EST 2012}

15 Recommendation with C.F.
A recursive graph algorithm solves for the taste signatures of both customers and items (matrix factorization).
Graph intuition:
- A customer's taste signature is defined by what he/she likes
- An item's taste signature is (recursively) defined by who likes it
[Figure: customer and item taste-signature vectors]
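The mutual definition of customer and item taste signatures can be illustrated with a small matrix factorization. The slide does not say which solver is used, so the stochastic gradient descent below is a swapped-in, commonly used choice. The class name, the hyperparameters, and the convention that a rating of 0 means "unknown" are assumptions for the sketch.

```java
import java.util.Random;

/** Matrix factorization by SGD: one k-dimensional signature per customer and per item. */
final class TasteSignatureSketch {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int f = 0; f < a.length; f++) s += a[f] * b[f];
        return s;
    }

    /**
     * @param ratings ratings[c][i] is customer c's rating of item i; 0 means unknown
     * @return { customerSignatures, itemSignatures }
     */
    static double[][][] factorize(double[][] ratings, int k, int epochs,
                                  double lr, double reg, long seed) {
        int nc = ratings.length, ni = ratings[0].length;
        Random rnd = new Random(seed);
        double[][] cust = new double[nc][k];
        double[][] item = new double[ni][k];
        for (double[] row : cust) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (double[] row : item) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (int e = 0; e < epochs; e++) {
            for (int c = 0; c < nc; c++) {
                for (int i = 0; i < ni; i++) {
                    if (ratings[c][i] == 0.0) continue;  // skip unknown ratings
                    double err = ratings[c][i] - dot(cust[c], item[i]);
                    for (int f = 0; f < k; f++) {
                        double cf = cust[c][f], itf = item[i][f];
                        // each side moves toward explaining the rating given the other side
                        cust[c][f] += lr * (err * itf - reg * cf);
                        item[i][f] += lr * (err * cf - reg * itf);
                    }
                }
            }
        }
        return new double[][][] { cust, item };
    }

    /** Root-mean-square error over the known ratings only. */
    static double rmse(double[][] ratings, double[][] cust, double[][] item) {
        double sum = 0.0;
        int n = 0;
        for (int c = 0; c < ratings.length; c++) {
            for (int i = 0; i < ratings[c].length; i++) {
                if (ratings[c][i] == 0.0) continue;
                double d = ratings[c][i] - dot(cust[c], item[i]);
                sum += d * d;
                n++;
            }
        }
        return Math.sqrt(sum / n);
    }
}
```

After training, `dot(cust[c], item[i])` gives a predicted score for a pair the customer has not rated, which is what a recommender would rank on.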

16 Demo: Build Recommender System with Graph Technologies

17 Network Intrusion Detection with Deep Learning (Skymind/DL4J) and Big Data Spatial and Graph

18 Network Intrusion Detection (NID)
Goal: determine whether a network activity is legitimate or fraudulent (malicious)
Input data: a sequence of network activity for a machine on a corporate network
NID is similar to some other anomaly detection problems:
- Financial fraud
- Breakdown detection in vehicles, manufacturing equipment, datacenter servers

19 Network Intrusion Detection (NID)
Some characteristics
- Most of the input data is legitimate
- Cannot trust the first line of defense for NID: corporate firewalls
- A corporation may have tens of thousands of machines; it is hard to monitor them all
- Breaches are extremely costly
Two basic approaches to NID
- Signature based: we have a labeled dataset of known attacks (supervised learning)
- Anomaly based: we don't know what attacks look like
Most effective NID systems use a hybrid of the two, as well as rules engines

20 Oracle BDSG and DeepLearning4J Integration Architecture
Graph Database (BDSG and Oracle Spatial and Graph)

21 Dataset selection Data Cleansing & preparation Train Neural Network model Generate Load into BDSG Graph Visualization
Labeled network data set: the UNSW-NB15 data set for network intrusion detection systems
- Created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security
- A mix of real modern normal activities and synthetic contemporary attack behaviors
References:
- Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), IEEE.
- Moustafa, Nour, and Jill Slay. "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set." Information Security Journal: A Global Perspective (2016): 1-14.

22 Understand the data
Features of the UNSW-NB15 data set: 49 original features

23 One round of clean up
Ports should all be integers; however, some values are in hex.
Action: convert them back to decimal
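A minimal sketch of the port clean-up step, assuming the hex values carry a `0x` prefix (the slide does not show the raw format). The class and method names are hypothetical.

```java
/** Normalizes a port field that may be written in decimal or hex. */
final class PortCleaner {
    /** Parses "80" as decimal and "0x000b" as hex; the "0x" prefix is an assumed format. */
    static int parsePort(String raw) {
        String s = raw.trim();
        if (s.startsWith("0x") || s.startsWith("0X")) {
            return Integer.parseInt(s.substring(2), 16);
        }
        return Integer.parseInt(s);
    }
}
```

`Integer.decode` is deliberately avoided here because it would read a decimal value with a leading zero as octal.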

24 Understand the data & define transformations
Categorical to one-hot transformation
Example: service "dns" becomes
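The categorical to one-hot transformation can be sketched as follows. The category list in the usage note is hypothetical; the actual set of service values and their order in the deck's pipeline are not shown on the slide.

```java
import java.util.List;

/** One-hot encoding of a categorical value against a fixed category list. */
final class OneHot {
    /** Returns a vector with a single 1 at the position of value in categories. */
    static int[] encode(List<String> categories, String value) {
        int idx = categories.indexOf(value);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown category: " + value);
        }
        int[] v = new int[categories.size()];
        v[idx] = 1;
        return v;
    }
}
```

With an assumed category list of `http`, `dns`, `ftp`, encoding `dns` gives the vector 0, 1, 0, so one categorical column expands into as many numeric columns as there are categories.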

25 Executed transformations with Scala & Apache Spark using Oracle's Big Data stack
Save the RDD back to CSV format

26 Built a Multi-Layer Perceptron (MLP) Neural Network

27 Tested the quality of the MLP NN
After 800 iterations of training
Accuracy:
- Labeled as non-intrusion, classified as non-intrusion: 46 times
- Labeled as intrusion, classified as non-intrusion: 1 time
- Labeled as intrusion, classified as intrusion: 6 times
((46+6)/(46+6+1) = )
Long Short-Term Memory (LSTM) NN gave a similar accuracy result
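The accuracy arithmetic on this slide can be worked through from the three counts given. The fourth cell (labeled non-intrusion, classified as intrusion) is not listed; the sketch assumes it is 0, which is what the slide's denominator of 46+6+1 implies. With those counts, accuracy is 52/53, about 0.981, and recall on the intrusion class is 6/7, about 0.857. The class name is hypothetical.

```java
/** Accuracy and recall from binary confusion-matrix counts. */
final class ConfusionSummary {
    /** Fraction of all records that were classified correctly. */
    static double accuracy(int tn, int fp, int fn, int tp) {
        return (double) (tn + tp) / (tn + fp + fn + tp);
    }

    /** Fraction of true intrusions that were caught. */
    static double recall(int fn, int tp) {
        return (double) tp / (tp + fn);
    }
}
```

Because most of the traffic is legitimate, recall on the intrusion class is worth reporting next to accuracy: a classifier that labeled everything as non-intrusion would still score 46/53 on accuracy here.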

28 NN Training Performance Improvement: GPU over CPUs
- A single GPU: GTX 970 (1664 CUDA cores, 4GB device RAM)
- 2 quad-core Intel CPUs (Xeon E GHz)
- CUDA
[Chart: NN training performance improvement, GPU over CPUs, for MLP and LSTM; one bar is labeled 7x]

29 Converted CSV to an Oracle-defined flat file format (.opv/.ope)
- Model each IP as a vertex
- Model each record (traffic from a source IP to a destination IP) as an edge
- 60+ features become properties of edges
Utility provided in BDSG:
    OraclePropertyGraphUtilsBase.convertCSV2OPV
    OraclePropertyGraphUtilsBase.convertCSV2OPE
Example CSV file:
    1,John,4.2,30
    2,Mary,4.3,32
    3,"Skywalker, Anakin",5.0,46
    4,"Darth Vader",5.0,46
    5,"Skywalker, Luke",5.0,53
Example output .opv file:
    1,name,1,John,,
    1,score,4,,4.2,
    1,age,2,,30,
    2,name,1,Mary,,
    2,score,4,,4.3,
    2,age,2,,32,
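The mapping from a CSV row to .opv lines can be inferred from the example: each property becomes one line of the form `id,key,type,stringValue,numericValue,` and the type codes seen are 1 for a string, 4 for a double and 2 for an integer. The sketch below reproduces only that example. It is not the `convertCSV2OPV` utility; the class name is hypothetical, the column names are hard-coded from the example output, and quoted fields or values that need escaping (such as the names containing commas) are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns one simple CSV row (id,name,score,age) into .opv-style property lines. */
final class CsvToOpvSketch {
    // Type codes as they appear in the slide's example output.
    static final int TYPE_STRING = 1;
    static final int TYPE_INTEGER = 2;
    static final int TYPE_DOUBLE = 4;

    /** Assumes unquoted fields with no embedded commas. */
    static List<String> toOpv(String csvLine) {
        String[] f = csvLine.split(",");
        String id = f[0];
        List<String> rows = new ArrayList<>();
        // String values go in the fourth field, numeric values in the fifth.
        rows.add(id + ",name," + TYPE_STRING + "," + f[1] + ",,");
        rows.add(id + ",score," + TYPE_DOUBLE + ",," + f[2] + ",");
        rows.add(id + ",age," + TYPE_INTEGER + ",," + f[3] + ",");
        return rows;
    }
}
```

In the intrusion-detection graph the same idea applies per IP address for the vertex file, while each traffic record becomes an edge whose 60+ features are written as edge properties in the .ope file.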

30 Utilized the built-in parallel graph data loader
A single API call to the loadData method:
    OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
    opgdl.loadData(opg,
        "<PATH>/net_intrusion.opv",
        "<PATH>/net_intrusion.ope",
        8 // 8 threads
    );
[Diagram: Oracle Big Data Spatial and Graph on Apache HBase or Oracle NoSQL Database]

31 Network Intrusion Detection
- Blue edges: malicious
- Other edges: normal traffic
- Many attacks originated from to target
Visualization tool: Cytoscape v Big Data Spatial and Graph v2.1

32 Focused on Attacks graph

33 Focused on Attacks graph

34 Focused on Attacks graph
- Applied built-in analytics in BDSG
- Found the top-3 IP addresses with the highest Page Rank value

35 Summary
Graph capabilities in Oracle Big Data Spatial and Graph and Oracle Spatial and Graph
Graph databases are powerful tools, complementing relational databases
- Especially strong for analysis of graph topology and multi-hop relationships
Graph analytics offer new insight
- Especially relationships, dependencies and behavioural patterns
Oracle Big Data Spatial and Graph (BDSG) and Oracle Spatial and Graph (OSG) offer
- Comprehensive analytics through various APIs, integration with relational database
- Scalable, parallel in-memory processing
- Secure and scalable graph storage using Oracle NoSQL, HBase, or Oracle Database
- Runs on commodity hardware or BDA, either on-premises or in the cloud

36 Resources
Oracle Spatial and Graph: oracle.com/technetwork/database/options/spatialandgraph
Oracle Big Data Spatial and Graph: oracle.com/database/big-data-spatial-and-graph/index.html

Build Recommender Systems, Detect Network Intrusion, and Integrate Deep Learning with Graph Technologies

Build Recommender Systems, Detect Network Intrusion, and Integrate Deep Learning with Graph Technologies Build Recommender Systems, Detect Network Intrusion, and Integrate Deep Learning with Graph Technologies Zhe Wu Chris Nicholson Charlie Berger Architect Oracle CEO Skymind Senior Director Oracle BIWA 2017

More information

Deep Learning und Graphenanalyse im Einsatz gegen Hacker

Deep Learning und Graphenanalyse im Einsatz gegen Hacker Deep Learning und Graphenanalyse im Einsatz gegen Hacker Hans Viehmann Product Manager EMEA ORACLE Corporation DOAG Konferenz 2017 @SpatialHannes Safe Harbor Statement The following is intended to outline

More information

Analyzing a social network using Big Data Spatial and Graph Property Graph

Analyzing a social network using Big Data Spatial and Graph Property Graph Analyzing a social network using Big Data Spatial and Graph Property Graph Oskar van Rest Principal Member of Technical Staff Gabriela Montiel-Moreno Principal Member of Technical Staff Safe Harbor Statement

More information

Graph Databases nur ein Hype oder das Ende der relationalen Welt? DOAG 2016

Graph Databases nur ein Hype oder das Ende der relationalen Welt? DOAG 2016 Graph Databases nur ein Hype oder das Ende der relationalen Welt? DOAG 2016 Hans Viehmann Product Manager EMEA 15. November 2016 Safe Harbor Statement The following is intended to outline our general product

More information

Overview of Oracle Big Data Spatial and Graph Property Graph

Overview of Oracle Big Data Spatial and Graph Property Graph Overview of Oracle Big Data Spatial and Graph Property Graph Zhe Wu, Ph.D. Architect Oracle Spatial and Graph Jan, 2016 Copyright 2014 Oracle and/or its affiliates. All rights reserved. The following is

More information

Analysing the Panama Papers with Oracle Big Data Spatial and Graph

Analysing the Panama Papers with Oracle Big Data Spatial and Graph speakerdeck.com/rmoff/ Analysing the Panama Papers with Oracle Big Data Spatial and Graph BIWA Summit 2017 Robin Moffatt, Rittman Mead 1 Robin Moffatt! Head of R&D, Rittman Mead Previously OBIEE/DW developer

More information

Graph Analytics and Machine Learning A Great Combination Mark Hornick

Graph Analytics and Machine Learning A Great Combination Mark Hornick Graph Analytics and Machine Learning A Great Combination Mark Hornick Oracle Advanced Analytics and Machine Learning November 3, 2017 Safe Harbor Statement The following is intended to outline our research

More information

Introduction to Graph Analytics and Oracle Cloud Service

Introduction to Graph Analytics and Oracle Cloud Service Introduction to Graph Analytics and Oracle Cloud Service Hans Viehmann Jean Ihm Korbi Schmid Product Manager EMEA Product Manager US Engineering Manager Oracle Oracle Oracle @SpatialHannes @JeanIhm October

More information

Session 7: Oracle R Enterprise OAAgraph Package

Session 7: Oracle R Enterprise OAAgraph Package Session 7: Oracle R Enterprise 1.5.1 OAAgraph Package Oracle Spatial and Graph PGX Graph Algorithms Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017

More information

Analyzing Blockchain and Bitcoin Transaction Data as Graph

Analyzing Blockchain and Bitcoin Transaction Data as Graph Analyzing Blockchain and Bitcoin Transaction Data as Graph Zhe Wu alan.wu@oracle.com, Ph.D. Architect Oracle Spatial and Graph Feb 2018 Safe Harbor Statement The following is intended to outline our general

More information

Oracle Big Data Spatial and Graph Property Graph: Features and Performance ORACLE TECHNICAL WHITEPAPER DECEMBER 2017

Oracle Big Data Spatial and Graph Property Graph: Features and Performance ORACLE TECHNICAL WHITEPAPER DECEMBER 2017 Oracle Big Data Spatial and Graph Property Graph: Features and Performance ORACLE TECHNICAL WHITEPAPER DECEMBER 2017 Table of Contents INTRODUCTION... 2 ORACLE BIG DATA SPATIAL AND GRAPH PROPERTY GRAPH

More information

Analyzing Blockchain and Bitcoin Transaction Data as Graph

Analyzing Blockchain and Bitcoin Transaction Data as Graph Analyzing Blockchain and Bitcoin Transaction Data as Graph Xavier Lopez Senior Director Zhe Wu Architect Oracle Code Boston April 17th, 2018 Copyright 2015 Oracle and/or its affiliates. All rights reserved.

More information

Collaborative Anomaly Detection Framework for handling Big Data of Cloud Computing

Collaborative Anomaly Detection Framework for handling Big Data of Cloud Computing Collaborative Anomaly Detection Framework for handling Big Data of Cloud Computing School of Engineering and Information Technology University of New South Wales @ Canberra Nour Moustafa, Gideon Creech,

More information

Efficient and Scalable Friend Recommendations

Efficient and Scalable Friend Recommendations Efficient and Scalable Friend Recommendations Comparing Traditional and Graph-Processing Approaches Nicholas Tietz Software Engineer at GraphSQL nicholas@graphsql.com January 13, 2014 1 Introduction 2

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group

More information

Spatial Analytics Built for Big Data Platforms

Spatial Analytics Built for Big Data Platforms Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Using Graphs to Analyze Big Linked Data

Using Graphs to Analyze Big Linked Data Using Graphs to Analyze Big Linked Data Hassan Chafi, Director, Research and Advanced Development Oracle Labs Copyright 2014 Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The

More information

Analyzing Flight Data

Analyzing Flight Data IBM Analytics Analyzing Flight Data Jeff Carlson Rich Tarro July 21, 2016 2016 IBM Corporation Agenda Spark Overview a quick review Introduction to Graph Processing and Spark GraphX GraphX Overview Demo

More information

E6895 Advanced Big Data Analytics Lecture 4:

E6895 Advanced Big Data Analytics Lecture 4: E6895 Advanced Big Data Analytics Lecture 4: Data Store Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Chief Scientist, Graph Computing, IBM Watson Research

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Graph Database and Analytics in a GPU- Accelerated Cloud Offering

Graph Database and Analytics in a GPU- Accelerated Cloud Offering Graph Database and Analytics in a GPU- Accelerated Cloud Offering - Blazegraph GPU @ Cirrascale Cloud Brad Bebee, CEO, Blazegraph Dave Driggers, Chief Executive and Technical Officer, Cirrascale Corporation

More information

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Exploring the Structure of Data at Scale Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Outline Why exploration of large datasets matters Challenges in working with large data

More information

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu G(B)enchmark GraphBench: Towards a Universal Graph Benchmark Khaled Ammar M. Tamer Özsu Bioinformatics Software Engineering Social Network Gene Co-expression Protein Structure Program Flow Big Graphs o

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Combining Graph and Machine Learning Technology using R

Combining Graph and Machine Learning Technology using R Combining Graph and Machine Learning Technology using R Hassan Chafi Oracle Labs Mark Hornick Oracle Advanced Analytics February 2, 2017 Safe Harbor Statement The following is intended to outline our research

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

McAfee Virtual Network Security Platform 8.4 Revision A

McAfee Virtual Network Security Platform 8.4 Revision A 8.4.7.101-8.3.7.18 Manager-Virtual IPS Release Notes McAfee Virtual Network Security Platform 8.4 Revision A Contents About this release New features Enhancements Resolved issues Installation instructions

More information

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016 Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016 Hans Viehmann Product Manager EMEA ORACLE Corporation 12. Mai 2016 Safe Harbor Statement The following

More information

white paper Aster Data ncluster In - database Analytics with R

white paper Aster Data ncluster In - database Analytics with R white paper Aster Data ncluster In - database Analytics with R Contents Introduction to Aster Data ncluster and SQL-MapReduce... 3 R in Aster Data ncluster... 3 Proprietary Scoring using R without In-database

More information

McAfee Network Security Platform 8.3

McAfee Network Security Platform 8.3 8.3.7.28-8.3.7.6 Manager-Virtual IPS Release Notes McAfee Network Security Platform 8.3 Revision B Contents About this release New features Enhancements Resolved issues Installation instructions Known

More information

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018 NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE DACE https://dace.unige.ch Data and Analysis Center for Exoplanets. Facility to store, exchange and analyse data

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

Oracle Big Data Fundamentals Ed 2

Oracle Big Data Fundamentals Ed 2 Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

Warehouse- Scale Computing and the BDAS Stack

Warehouse- Scale Computing and the BDAS Stack Warehouse- Scale Computing and the BDAS Stack Ion Stoica UC Berkeley UC BERKELEY Overview Workloads Hardware trends and implications in modern datacenters BDAS stack What is Big Data used For? Reports,

More information

Twitter data Analytics using Distributed Computing

Twitter data Analytics using Distributed Computing Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE

More information

Webinar Series TMIP VISION

Webinar Series TMIP VISION Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing

More information

McAfee Network Security Platform 8.3

McAfee Network Security Platform 8.3 8.3.7.28-8.3.3.9 Manager-Mxx30-series Release Notes McAfee Network Security Platform 8.3 Revision C Contents About this release New features Enhancements Resolved issues Installation instructions Known

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Apache Spark Graph Performance with Memory1. February Page 1 of 13

Apache Spark Graph Performance with Memory1. February Page 1 of 13 Apache Spark Graph Performance with Memory1 February 2017 Page 1 of 13 Abstract Apache Spark is a powerful open source distributed computing platform focused on high speed, large scale data processing

More information

Spark, Shark and Spark Streaming Introduction

Spark, Shark and Spark Streaming Introduction Spark, Shark and Spark Streaming Introduction Tushar Kale tusharkale@in.ibm.com June 2015 This Talk Introduction to Shark, Spark and Spark Streaming Architecture Deployment Methodology Performance References

More information

Introduction to NoSQL by William McKnight

Introduction to NoSQL by William McKnight Introduction to NoSQL by William McKnight All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their

More information

Detect Cyber Threats with Securonix Proxy Traffic Analyzer

Detect Cyber Threats with Securonix Proxy Traffic Analyzer Detect Cyber Threats with Securonix Proxy Traffic Analyzer Introduction Many organizations encounter an extremely high volume of proxy data on a daily basis. The volume of proxy data can range from 100

More information

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Oracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle NoSQL Database and Cisco- Collaboration that produces results 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. What is Big Data? SOCIAL BLOG SMART METER VOLUME VELOCITY VARIETY

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Understanding the latent value in all content

Understanding the latent value in all content Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence

More information

Socrates: A System for Scalable Graph Analytics C. Savkli, R. Carr, M. Chapman, B. Chee, D. Minch

Socrates: A System for Scalable Graph Analytics C. Savkli, R. Carr, M. Chapman, B. Chee, D. Minch Socrates: A System for Scalable Graph Analytics C. Savkli, R. Carr, M. Chapman, B. Chee, D. Minch September 10, 2014 Cetin Savkli Cetin.Savkli@jhuapl.edu 240 228 0115 Challenges of Big Data & Analytics

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

McAfee Network Security Platform 9.2

McAfee Network Security Platform 9.2 McAfee Network Security Platform 9.2 (9.2.7.9-9.2.7.17 Manager-Virtual IPS Release Notes) Contents About this release New features Enhancements Resolved issues Installation instructions Known issues Product

More information
