Accelerator Design for Big Data Processing Frameworks

Similar documents
An FPGA NIC Based Hardware Caching for Blockchain

An In-Kernel NOSQL Cache for Range Queries Using FPGA NIC

Accelerating Spark RDD Operations with Local and Remote GPU Devices

Accelerating Blockchain Transfer System Using FPGA-Based NIC

An FPGA-Based In-NIC Cache Approach for Lazy Learning Outlier Filtering

Flash Storage Complementing a Data Lake for Real-Time Insight

Big Data Analytics using Apache Hadoop and Spark with Scala

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Hiroki Matsutani Dept. of ICS, Keio University, Hiyoshi, Kohoku-ku, Yokohama, Japan

Big Data Architect.

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara

Creating a Recommender System. An Elasticsearch & Apache Spark approach

10 Million Smart Meter Data with Apache HBase

Distributed In-GPU Data Cache for Document-Oriented Data Store via PCIe over 10Gbit Ethernet

Accelerating Blockchain Search of Full Nodes Using GPUs

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Data Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens

Fluentd + MongoDB + Spark = Awesome Sauce

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

BIG DATA COURSE CONTENT

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

XPU A Programmable FPGA Accelerator for Diverse Workloads

End-to-End Adaptive Packet Aggregation for High-Throughput I/O Bus Network Using Ethernet


Hadoop. Introduction to BIGDATA and HADOOP

Toward a Memory-centric Architecture

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Processing of big data with Apache Spark

High-Level Data Models on RAMCloud

Exam Questions

Challenges for Data Driven Systems

BigchainDB: A Scalable Blockchain Database. Trent McConaghy

HDInsight > Hadoop. October 12, 2017

Caribou: Intelligent Distributed Storage

Certified Big Data Hadoop and Spark Scala Course Curriculum

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Big Streaming Data Processing. How to Process Big Streaming Data 2016/10/11. Fraud detection in bank transactions. Anomalies in sensor data

COMPARATIVE EVALUATION OF BIG DATA FRAMEWORKS ON BATCH PROCESSING

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

Altera SDK for OpenCL

3D WiNoC Architectures

Designing Hybrid Data Processing Systems for Heterogeneous Servers

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

Hadoop Development Introduction

Integration of Machine Learning Library in Apache Apex

Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL)

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Stages of Data Processing

Agenda. Spark Platform Spark Core Spark Extensions Using Apache Spark

PacketShader: A GPU-Accelerated Software Router

microsoft

Visual Analytics Sandbox: A big data platform for processing network traffic

A Building Block 3D System with Inductive-Coupling Through Chip Interfaces Hiroki Matsutani Keio University, Japan

CERN openlab & IBM Research Workshop Trip Report

Specialist ICT Learning

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Getting to know. by Michelle Darling August 2013

40Gbps+ Full Line Rate, Programmable Network Accelerators for Low Latency Applications SAAHPC 19 th July 2011

Scalability of web applications

Lambda Architecture with Apache Spark

Twitter data Analytics using Distributed Computing

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b

Big data streaming: Choices for high availability and disaster recovery on Microsoft Azure. By Arnab Ganguly DataCAT

CompSci 516: Database Systems

How to Use NoSQL in Enterprise Java Applications

Announcements. Reading Material. Map Reduce. The Map-Reduce Framework 10/3/17. Big Data. CompSci 516: Database Systems

Lightweight KV-based Distributed Store for Datacenters

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

BlueDBM: An Appliance for Big Data Analytics*

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries

The NoSQL Ecosystem. Adam Marcus MIT CSAIL

The Hardware & Software Implications of Microservices and How Big Data Can Help

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

IMPLEMENTING A LAMBDA ARCHITECTURE TO PERFORM REAL-TIME UPDATES

The Evolution of Big Data Platforms and Data Science

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

E6885 Network Science Lecture 10: Graph Database (II)

ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC

Chapter 1, Introduction

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Analytic Cloud with. Shelly Garion. IBM Research -- Haifa IBM Corporation

An Introduction to Apache Spark

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

4 Myths about in-memory databases busted

Database Solution in Cloud Computing

Facilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM. Join the Conversation #OpenPOWERSummit

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Distributed systems for stream processing

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

In-Memory Data processing using Redis Database

Oracle Exadata: Strategy and Roadmap

P51: High Performance Networking

Transcription:

Accelerator Design for Big Data Processing Frameworks Hiroki Matsutani Dept. of ICS, Keio University http://www.arc.ics.keio.ac.jp/~matutani July 5th, 2017 International Forum on MPSoC for Software-Defined Hardware (MPSoC'17) 1

2 processing Input stream data Big data (Surveillance, Network service, SNS, UAV, IoT) Message queuing Data exchange (serialization) Learning Database layer (Polyglot persistence) DB queries Customer analysis Topic prediction chain records Geolocation query processing requires time and thus recent data cannot be reflected to the analysis result Combine batch and stream processing to make up the realtime capability

+ Stream processing Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Nathan Marz, et.al., "Big Data: Principles and best practices of scalable realtime data systems", Manning Publications (2015). 3

+ Stream processing Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) I/O intensive Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Compute intensive Message queuing middleware (Apache Kafka) Serialization (Apache Thrift) Stream processing (Apache Spark Streaming) KVS / Column DB (Redis, HBase) Online machine learning (Classification, Outlier detection, Change point detection, Abnormal behavior detection) Document DB (MongoDB) Graph DB (Neo4j), graph processing processing (Apache Spark) Machine learning (Apache Spark MLlib) Tight integration of I/O and compute FPGA Four 10GbE FPGA Massive parallelism Networked GPU cluster GPUs Switch Host 4

Acceleration for data store Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) I/O intensive Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Compute intensive Message queuing middleware (Apache Kafka) Serialization (Apache Thrift) Stream processing (Apache Spark Streaming) KVS / Column DB (Redis, HBase) Online machine learning (Classification, Outlier detection, Change point detection, Abnormal behavior detection) Document DB (MongoDB) Graph DB (Neo4j), graph processing processing (Apache Spark) Machine learning (Apache Spark MLlib) Tight integration of I/O and compute FPGA Four 10GbE FPGA Massive parallelism Networked GPU cluster GPUs Switch Host 5

Multilevel NOSQL cache: FPGA NIC [IEEE HoTI 16] Normal DB Access Result User Space In-Kernel KVS Cache (L2$) Hit Request NIC Cache (L1$) Hit Reply Add NetFPGA-10G Device Driver In-Kernel KVS (Key-Value Store) Cache 512GB NetFPGA-10G NIC HW Cache Machine Learning DRAM 288MB 10GbE x4 6

Multilevel NOSQL cache: FPGA NIC Multilevel NOSQL cache: User Space L1 NOSQL cache FPGA-based hardware cache L2 NOSQL cache In-kernel software cache Normal DB Access Result Tradeoffs between capacity and speed: L1 In-Kernel NOSQL cache KVS Very fast/efficient but small L2 Cache NOSQL (L2$) cache Hit Fast and large Design space exploration [IEEE HoTI 16] Request NIC Cache (L1$) Hit Reply Add NetFPGA-10G Device Driver In-Kernel KVS (Key-Value Store) Cache 512GB NetFPGA-10G NIC HW Cache Machine Learning DRAM 288MB 10GbE x4 7

FPGA NIC Cache for chain chain A chain of blocks each contains transactions verified and shared by all the parties 1. Bob wants to send money to Alice Bob 2. TX(Bob Alice) is represented as a block 3. The block is broadcasted to all the nodes 4. After verified, the block is added to the blockchain 8

FPGA NIC Cache for chain IoT devices (SPV nodes) Cannot maintain whole the blockchain data (>100GB) due to resource limitation Simple payment verification Ask full node to check whether a transaction of interest has been completed or not 1. Alice wants to verify TX(Bob Alice) Full node Alice 2. Alice contacts a full node to verify it 9

FPGA NIC Cache for chain IoT devices (SPV nodes) The Cannot number maintain of IoT devices whole that the blockchain join blockchain data will (>100GB) increase due to resource limitation To reduce full node accesses from SPV nodes, FPGA Ask NIC full KVS node is to used check as cache whether of a blockchain transaction [HEART 17] of interest has been completed or not Simple payment verification 1. Alice wants to verify TX(Bob Alice) Full node Alice 2. Alice contacts a full node to verify it FPGA NIC Cache 10

Acceleration for data processing Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) I/O intensive Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Compute intensive Message queuing middleware (Apache Kafka) Serialization (Apache Thrift) Stream processing (Apache Spark Streaming) KVS / Column DB (Redis, HBase) Online machine learning (Classification, Outlier detection, Change point detection, Abnormal behavior detection) Document DB (MongoDB) Graph DB (Neo4j), graph processing processing (Apache Spark) Machine learning (Apache Spark MLlib) Tight integration of I/O and compute FPGA Four 10GbE FPGA Massive parallelism Networked GPU cluster GPUs Switch Host 11

Data processing w/ Spark+GPU User Space Array Cache (RDDs) Reduction & transformation of RDDs offloaded to GPU [HEART 16] RDD 1 Array Cache (Converted from Spark RDDs) RDD reduction (count, max, reduce, ) Value RDD 1 RDD 2 RDD 3 RDD 2 RDD-to-RDD transformation (sort, filter, map, ) RDD 3 Data are stored in RDDs (i.e., distributed shared memory) RDDs are converted to array structure & transferred to GPU 12

10G+10G Data processing w/ Spark+GPUs Many GPUs are directly connected to Apache Spark server via NEC ExpEther (20Gbps) GPUs 0G+10G 10/40GbE Switch PCIe card inserted in the server

Data processing w/ Spark+GPUs User Space Array Cache (RDDs) Reduction & transformation of RDDs offloaded to GPUs RDD 4 RDD 1 RDD 2 RDD 3 Array Cache (Converted from Spark RDDs) RDD 1 10GbE Switch RDD 3 RDD 6 RDD 2 RDD 5 [ICPADS 16] 14

Acceleration for data processing Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) I/O intensive Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Compute intensive Message queuing middleware (Apache Kafka) Serialization (Apache Thrift) Stream processing (Apache Spark Streaming) KVS / Column DB (Redis, HBase) Online machine learning (Classification, Outlier detection, Change point detection, Abnormal behavior detection) Document DB (MongoDB) Graph DB (Neo4j), graph processing processing (Apache Spark) Machine learning (Apache Spark MLlib) Tight integration of I/O and compute FPGA Four 10GbE FPGA Massive parallelism Networked GPU cluster GPUs Switch Host 15

10Gbps outlier filtering: FPGA NIC Sensor Data Explosion User Space Outlier NetFPGA-10G Device Driver Only anomalyvalued packets are received Huge Size Small Size Machine learning algorithms Mahalanobis Distance Local Outlier Factor(LOF) K-Nearest Neighbor(KNN) NetFPGA-10G Data Mining 10GbE x4 16

10Gbps outlier filtering: FPGA NIC Density-based Sensor Data Explosion approach to User find Space outliers (e.g., higher LOF value when k neighbors are distant) All reference data needed for density computation Frequently-accessed reference data clusters are cached in FPGA NIC [PDP 17] Outlier NetFPGA-10G Device Driver Only anomalyvalued packets are received Huge Size Small Size Machine learning algorithms Mahalanobis Distance Local Outlier Factor(LOF) K-Nearest Neighbor(KNN) NetFPGA-10G Data Mining 10GbE x4 17

Spark Streaming: FPGA NIC Stream processing One-at-a-time style User Space Micro-batch Micro-batch style batch batch Spark Streaming Micro-batch style for compatibility w/ Spark Large latency (e.g, 1sec) NetFPGA-10G Device Driver Results of one-at-atime processing (e.g., filter) received NetFPGA-10G One-at-a-time Stream processing components which can be executed as one-at-a-time style are offloaded to FPGA NIC [IEEE BigData WS 16] 10GbE x4 18

Summary Input stream data Message queuing Stream Inference Database layer (Polyglot persistence) Realtime DB queries Big data (Surveillance, Network service, SNS, UAV, IoT) I/O intensive Data exchange (serialization) Learning Customer analysis Topic prediction chain records Geolocation query Compute intensive Message queuing middleware (Apache Kafka) Serialization (Apache Thrift) Stream processing (Apache Spark Streaming) KVS / Column DB (Redis, HBase) Online machine learning (Classification, Outlier detection, Change point detection, Abnormal behavior detection) Document DB (MongoDB) Graph DB (Neo4j), graph processing processing (Apache Spark) Machine learning (Apache Spark MLlib) Tight integration of I/O and compute FPGA Four 10GbE FPGA Massive parallelism Networked GPU cluster GPUs Switch Host 19

References (1/2) FPGA-based KVS accelerator Yuta Tokusashi, et.al., "A Multilevel NOSQL Cache Design Combining In-NIC and In- Kernel Caches", IEEE Hot Interconnects 2016. FPGA-based chain cache Yuma Sakakibara, et.al., "An FPGA NIC Based Hardware Caching for chain", HEART 2017. FPGA-based Spark Streaming accelerator Kohei Nakamura, et.al., "An FPGA-Based Low-Latency Network Processing for Spark Streaming", IEEE BigData Workshops 2016. 20

21 References (2/2) FPGA-based machine learning accelerator Ami Hayashi, et.al., "An FPGA-Based In-NIC Cache Approach for Lazy Learning Outlier Filtering", PDP 2017. Ami Hayashi, et.al., "A Line Rate Outlier Filtering FPGA NIC using 10GbE Interface", ACM SIGARCH CAN (2015). GPU-based acceleration of Apache Spark Yasuhiro Ohno, et.al., "Accelerating Spark RDD Operations with Local and Remote GPU Devices", ICPADS 2016.

Thank you! hank you for listening Acknowledgement: This work was supported by MIC SCOPE, NEDO IoT, JSPS Kaken B, and SECOM 22