DataSToRM: Data Science and Technology Research Environment

Similar documents
An Advanced Graph Processor Prototype

Automated Code Generation for High-Performance, Future-Compatible Graph Libraries

GraphBLAS: A Programming Specification for Graph Analysis

Design and Implementation of the GraphBLAS Template Library (GBTL)

GraphBLAS Mathematics - Provisional Release 1.0 -

When Graph Meets Big Data: Opportunities and Challenges

Secure Multi-Party Computation of Probabilistic Threat Propagation

LLMORE: Mapping and Optimization Framework

Leveraging Data Provenance to Enhance Cyber Resilience

Sampling Large Graphs for Anticipatory Analysis

E6895 Advanced Big Data Analytics Lecture 4:

Social Behavior Prediction Through Reality Mining

Visual Analytics for cyber security and intelligence

ANNUAL REPORT Visit us at project.eu Supported by. Mission

Graph Database and Analytics in a GPU- Accelerated Cloud Offering

Urika: Enabling Real-Time Discovery in Big Data

Dynamic Distributed Dimensional Data Model (D4M) Database and Computation System

The Constellation Project. Andrew W. Nash 14 November 2016

Be Like Water: Applying Analytical Adaptability to Cyber Intelligence

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.

Fast Hardware For AI

Mathematics. 2.1 Introduction: Graphs as Matrices Adjacency Matrix: Undirected Graphs, Directed Graphs, Weighted Graphs CHAPTER 2

Harp-DAAL for High Performance Big Data Computing

Welcome to the Industrial Internet Forum

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

UNCLASSIFIED. FY 2016 Base FY 2016 OCO

Anomaly Detection in Very Large Graphs Modeling and Computational Considerations

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Analysis and Mapping of Sparse Matrix Computations

Active Archive and the State of the Industry

Sampling Operations on Big Data

Windows 10 IoT Overview. Microsoft Corporation

Cisco Digital Network Architecture The Network Enables Digital Business. Rene Andersen Cisco DK

IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems

Interconnect Your Future

Mining Web Data. Lijun Zhang

Overview. A fact sheet from Feb 2015

Modern Database Architectures Demand Modern Data Security Measures

OpenCAPI Technology. Myron Slota Speaker name, Title OpenCAPI Consortium Company/Organization Name. Join the Conversation #OpenPOWERSummit

Building the Most Efficient Machine Learning System

On the Effectiveness of Type-based Control Flow Integrity

Challenges for Data Driven Systems

Toward a Memory-centric Architecture

SIEM Solutions from McAfee

RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures

Rapid growth of massive datasets

Pre-Requisites: CS2510. NU Core Designations: AD

Cluster-based 3D Reconstruction of Aerial Video

Large Scale Graph Solutions: Use-cases And Lessons Learnt

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

Current Threat Environment

Transformation in Technology Barbara Duck Chief Information Officer. Investor Day 2018

Certified Big Data and Hadoop Course Curriculum

Know your neighbours: Machine Learning on Graphs

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Gen-Z Overview. 1. Introduction. 2. Background. 3. A better way to access data. 4. Why a memory-semantic fabric

SMART. Investing in urban innovation

Modernizing Healthcare IT for the Data-driven Cognitive Era Storage and Software-Defined Infrastructure

The Oracle Trust Fabric Securing the Cloud Journey

Copyright 2017 Ford Motor Company, All Rights Reserved

Overcoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper

ETL is No Longer King, Long Live SDD

Digital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU

IBM InfoSphere Information Analyzer

Data to Decisions Terminate, Tolerate, Transfer, or Treat

Big Data Management and NoSQL Databases

Build application-centric data centers to meet modern business user needs

Cray's YarcData claims early success for graph database appliance

SCALABLE COMPLEX GRAPH ANALYSIS WITH THE KNOWLEDGE DISCOVERY TOOLBOX

National Institute of Standards and Technology

Data Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

IBM Watson Content Hub

Introduction of the Industrial Internet Consortium. May 2016

New Approach to Graph Databases

Implementing Executive Order and Presidential Policy Directive 21

UNIFY DATA AT MEMORY SPEED. Haoyuan (HY) Li, Alluxio Inc. VAULT Conference 2017

OpenCAPI and its Roadmap

CHARTING THE FUTURE OF SOFTWARE DEFINED NETWORKING

Convergence of Machine Learning, Big Data, and Supercomputing

EUW - Session 9 Metering and grid data management for grid optimisation and energy transition a raising role for DSOs

Industry Collaboration and Innovation

Oracle Big Data Discovery

FOR FINANCIAL SERVICES ORGANIZATIONS

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

Delivering Integrated Cyber Defense for the Cloud Generation Darren Thomson

Delivering the Digital Institution

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Storage Networking Strategy for the Next Five Years

Data Reorganization Interface

UNCLASSIFIED. R-1 Program Element (Number/Name) PE D8Z / Software Engineering Institute (SEI) Applied Research. Prior Years FY 2013 FY 2014

PULLING OUR SOCS UP VODAFONE GROUP AT RSAC Emma Smith. Andy Talbot. Group Technology Security Director Vodafone Group Plc

AFRL-OSR-VA-TR

BubbleNet: A Cyber Security Dashboard for Visualizing Patterns

Healthcare IT Modernization and the Adoption of Hybrid Cloud

Securing the Internet of Things (IoT) at the U.S. Department of Veterans Affairs

Information, Decision, & Complex Networks AFOSR/RTC Overview

Transforming IT: From Silos To Services

Transcription:

The Future of Advanced (Secure) Computing DataSToRM: Data Science and Technology Research Environment This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering. Distribution Statement A: Approved for public release: distribution unlimited. 2018 Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work. Mr. Vitaliy Gleyzer MIT Lincoln Laboratory 5 March 2018

Advancing the State of Big Data Analytics: Raw Data to Insight How can the insight be presented to improve human cognition? Can different data help improve human understanding? Are there new application areas enabled by new analytics? Big Data Application presentation What new information should be collected? data analytics What new insight can be gained from the data? technologies DataSToRM - 2 How can data be stored/collected more efficiently? How can existing analytics be accelerated? Can analytics be modified to allow efficient implementation?

Large-Scale Graph Applications Today Applications Development Environment Drug Discovery Personalized Healthcare Hardware Platform Graph Analysis Frameworks and Databases Algorithms Visualization Toolkits Fraud Detection Recommender Systems Lincoln Laboratory Supercomputing Center D4M PageRank Centrality Walktrap InfoMap K-Truss Functional Brain Mapping Cyber Data Analysis BFS MST Diverse, quickly evolving ecosystem DataSToRM - 3

Advancing the State of Big Data Analytics: Challenges Technology moves quickly New algorithms and analytic techniques New storage solutions New processing technologies New database technologies and frameworks New applications New framework adoption is a serious investment How to leverage new technologies? How to enable co-design opportunities? How to integrate disparate communities to enable co-design? Keeping up with big data technology is challenging DataSToRM - 4

Example Standardization Efforts Applications Development Environment Drug Discovery Personalized Healthcare Hardware Platform Graph Analysis Frameworks and Databases Algorithms Visualization Toolkits Fraud Detection Recommender Systems Lincoln Laboratory Supercomputing Center GraphBLAS D4M Apache Gremlin PageRank Centrality Walktrap InfoMap K-Truss Functional Brain Mapping Cyber Data Analysis BFS MST Different communities are attempting to unify and standardize interface and languages in the ecosystem DataSToRM - 5

Example Standardization Efforts Drug Discovery Applications Personalized Healthcare Hardware Platform Development Backend Environment interface standardization enables hardware innovation! Graph Analysis Frameworks and Databases Algorithms Visualization Toolkits Fraud Detection Functional Brain Mapping Recommender Systems Cyber Data Analysis Lincoln Laboratory Supercomputing Center GraphBLAS D4M Apache Gremlin PageRank Centrality Walktrap InfoMap K-Truss BFS MST Parallel inmemory database 100 1000 graph edge traversal speed Simple linear algebra API Different communities are attempting to unify and standardize interface and languages in the ecosystem DataSToRM - 6

Vertices Vertices Unifying Principles for Big Data Graphs Graphs capture relationship information between entities Molecular forces Social interactions Semantic concepts Vehicle tracks Graphs can be fully expressed in the language of linear algebra Represented as sparse matrices Enable mathematic foundation for data analysis Leverage existing linear algebra techniques and methods Define a small set of well-defined mathematical operations Graph Representations Vertices Adjacency Matrix (N N) Edges Incidence matrix (N M) DataSToRM - 7

DataSToRM: Data Science and Technology Research Environment Applications Graph Analysis Kernels API Threat Detection Community Detection Sentiment Analysis Classification Recommender Engine Centrality Analysis GraphBLAS (Semi-ring Linear Algebra API) Composed of analytics Implemented on top of a standard API Enables hardware diversity Hardware Hardware acceleration of a small number of well-defined mathematical operations enable an extensive analytic ecosystem DataSToRM - 8

GraphBLAS Overview Five key operations A = NxM (i,j,v) (i,j,v) = A C = A B C = A C C = A B = A. B Can be used to build 12 GraphBLAS standard functions buildmatrix, extracttuples, Transpose, mxm, mxv, vxm, extract, assign, ewiseadd, ewisemult, apply, reduce Can be used to build a variety of graph utility functions Tril(), Triu(), Degreed Filtered BFS, Can be used to build a variety of graph algorithms K-Truss, Jaccard Coefficient, Non-Negative Matrix Factorization, That work on a wide range of graphs Hyper, multi-directed, multi-weighted, multi-partite, multi-edge Unifying interface for backend graph processing DataSToRM - 9

Lincoln Laboratory Technologies Targeting Large-Scale Graph Analytics Dynamic Distributed Dimensional Data Model (D4M) Data analysis framework based on associative array algebra Graph Algorithms Graph Algorithms D4M GraphBLAS Standard API for graph analytics using Sparse Linear Algebra primitives Concise language for complex graph analytics Mathematical closure Linear Algebra underpinning Graph Processor GraphBLAS LLSC Simple hardware agnostic API Graph Processor Novel graph processing architecture Scalable graph processing hardware architecture Unprecedented performance Native linear algebra instruction set Lincoln Laboratory Super Computing Center (LLSC) State-of-the-art super computing environment Heterogeneous processing capabilities Ideal technology integration environment DataSToRM - 10

Graph Processor Matrix Multiply Performance Traversed Edges Per Second 1E+14 10 14 1E+13 10 13 1E+12 10 12 1E+11 10 11 10 10 1E+10 10 1E+9 9 ASIC Graph Processor (Projected) FPGA Graph Processor (Measured) Cray XK7 Titan (Measured) Cray XT4 Franklin (Measured) 2017 1 Chassis System Graph Processor Performance 100x 1 Racks 4 Racks 16 Racks ASIC 64 Racks FPGA Today s State of the Art 10 1E+8 Embedded Data Center Applications Applications 2 Nodes 10 1E+7 1E+1 10 1 101E+2 2 10 3 1E+3 10 4 1E+410 1E+5 10 6 1E+6 10 7 10 1E+7 10 9 1E+8 1E+9 Watts Highly efficient graph processing technology that is 100s to 1000s of times more efficient compared to traditional architectures DataSToRM - 11

Collaboration Opportunities Developing graph algorithms in the language of linear algebra Community detections, subgraph isomorphism, subgraph matching, etc. Developing graph algorithms that can scale to datasets with billions to trillions of vertices Sparsity-aware, distributed memory algorithms Identifying or developing new technologies to leverage the linear algebra abstraction Compilers, optimizer, hardware accelerators, etc. Integrating GraphBLAS backend as part of popular frameworks e.g., Apache TinkerPop, Neo4j, ElasticSearch, etc. DataSToRM - 12