Fujitsu/Fujitsu Labs Technologies for Big Data in Cloud and Business Opportunities

Fujitsu/Fujitsu Labs Technologies for Big Data in Cloud and Business Opportunities
Satoshi Tsuchiya
Cloud Computing Research Center, Fujitsu Laboratories Ltd.
January 2012

Overview: Fujitsu's Cloud and Big Data
Fujitsu IaaS FGCP/S5: already deployed worldwide
- Public IaaS cloud platform
- Beta started in 2009; now deployed in 5 locations worldwide
- Pay for what you use; elastic and scalable
Fujitsu's PaaS for Big Data: Convergence Services Platform (planned)
- PaaS for Big Data: an integrated environment for event processing, parallel batch (MapReduce), etc.
- Announced in August 2011; early beta service will start in March 2012 (in Japan)
The Cloud Computing Research Center at Fujitsu Labs is working on R&D of key technologies for Fujitsu Cloud Services. My research team focuses on parallel data processing.

Convergence Services Platform (PaaS)
Integrated, easy-to-use data processing functions on the Fujitsu Cloud
Announced August 2011; early beta service will start in March 2012
http://www.fujitsu.com/global/news/pr/archives/month/2011/20110830-01.html
[Diagram of the Convergence Services Platform (CSPF). Labels: sensing; external systems (customer's existing systems); other customers' environments; data collection and detection (real-time processing, logging, current status and trigger); data management and integration (archive of records); data exchange (secure data conversion); data analysis (context extraction, selection, prediction and simulation, batch processing); information application (controls, automatic control, visualization, recommendation, navigation); development support and operational management; user portal site; customer.]
Copyright 2011 FUJITSU LIMITED

New Challenges on Big Data
Gartner: 3 challenges of Big Data (June 2011)
- Volume: storing enormous amounts of data (tens of TB to several PB)
- Variety: transaction logs, sensor records, images, video, etc.
- Velocity: competitiveness depends on the responsiveness of analysis
Not just Volume: Volume and Velocity together
- Advanced users need Velocity at tens of TB
- The report Big Data Analysis (Data Warehousing Institute): many advanced analytics users want results within an hour or less (minutes to seconds), and those users already hold tens of TB
- Shift to quicker response

Big Data is like driving a car in the sea of information
The real world: ever-changing, ever-growing
Enterprise customers want to find insights from the real world.
Existing IT systems only show past results in a small window
- A record of the past, relatively small
New IT systems are expected to show, from enormous, ever-changing, ever-growing data (various sources, enormous amounts):
- Now: visualization of the current situation
- Future: prediction and recommendation

The Technology Map of Big Data Processing
There is no single ring to rule them all.
[Figure: map of Big Data processing technologies. Axes: Volume (K, M, G, T, P, E) and Velocity (hr, min, sec, msec). Plotted: DWH/RDB, Hadoop, XTP, CEP, and a "to be developed" region for ever-growing, ever-changing data. Layers: utilization, base platform, application type, processing pattern, access, distribution, hardware.]
- Real-time: XTP, KVS, CEP; random access, latency-oriented; distribution by record/hash
- Batch jobs: MapReduce (OSS Hadoop); sequential access, throughput-oriented; distribution by block/alphabetical order
- Purpose-built: mix of methods; dynamically re-purposing servers
XTP: Extreme Transaction Processing; CEP: Complex Event Processing; KVS: Key-Value data Store
Two major purpose-built towers: real-time and batch in parallel
- Real-time: latency focused; record-based, short messages, allocation by hash (random access)
- Batch in parallel: throughput focused; big blocks in storage, sequential/sorted allocation
Next step: a variety of purpose-built systems, mixing methods and elements appropriately for the needs of each enterprise

Exhibit: A highly parallel and fast range query function for a distributed data store
A distributed KVS (Key-Value Store) provides storage with scalability and fault tolerance. However, a rich function like a range query cannot be executed efficiently on existing distributed KVS technology.
Example query: "Search Japanese restaurants around here"
[Diagram: multitude of sensors -> data accumulation (24 hours, 365 days) -> distributed KVS for scalability and high availability -> various functional services]
A range query is a technique for extracting data from a data set. It needs additional information and mechanisms for a rapid and efficient response:
- No index (a simple answer): query all possible nodes. Very inefficient.
- Centrally managed range index, e.g. HBase (Hadoop KVS): bad at scale-out operation; it needs careful design.
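To make the "no index" case concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not any product's implementation: the class name `HashPartitionedKVS`, the use of MD5 to pick a node, and the in-memory dictionaries standing in for storage nodes are all hypothetical. Because hashing scatters neighbouring keys across nodes, a range query has to contact every node even when only a few keys match.

```python
import hashlib


class HashPartitionedKVS:
    """Toy hash-partitioned key-value store (dicts stand in for nodes)."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Assumption: MD5 is used only as a convenient, stable hash.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # Point lookups touch exactly one node.
        return self.nodes[self._node_for(key)].get(key)

    def range_query(self, low, high):
        """Return (sorted matches, number of nodes contacted)."""
        hits = []
        contacted = 0
        for node in self.nodes:
            # Without an index, any node may hold keys in [low, high],
            # so every node must be scanned.
            contacted += 1
            hits.extend((k, v) for k, v in node.items() if low <= k <= high)
        return sorted(hits), contacted


store = HashPartitionedKVS(num_nodes=8)
for i in range(100):
    store.put(f"k{i:03d}", i)

matches, contacted = store.range_query("k010", "k019")
print(len(matches), "matches after contacting", contacted, "of", len(store.nodes), "nodes")
```

A point lookup with `get` stays cheap because the hash names the single responsible node; only the range query pays the full fan-out.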

Exhibit: Technology Enablers
A two-layer data partitioning technique that carefully combines key index technology and KVS technology in a distributed manner
- Key -> segment (for efficiency): tree-based allocation
  - Put keys close to each other into the same segment (locality-aware)
  - Dynamically split segments based on the accumulated amount of data (load balancing in terms of volume)
- Segment -> server (for high availability): hash-based allocation
  - Put segments onto servers randomly
  - Preserve the high availability and scalability of the distributed KVS
Dynamic load balance (the key distribution changes dynamically):
- Tree-based partitioning makes the count of keys equal among segments and realizes data locality
- Hash-based partitioning makes the count of segments equal among servers
Copyright 2011 FUJITSU LABORATORIES LIMITED
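The two layers described on this slide can be sketched roughly as follows. This is a simplified, single-process illustration and not Fujitsu's implementation: the class name `TwoLayerStore`, the `max_keys_per_segment` threshold, and the MD5-based placement are hypothetical. A flat sorted list of segment boundaries stands in for the tree index, and it is held in one object here, whereas the slide describes a distributed design.

```python
import bisect
import hashlib


class TwoLayerStore:
    """Toy two-layer partitioning: key -> segment (sorted), segment -> server (hash)."""

    def __init__(self, num_servers, max_keys_per_segment=4):
        self.num_servers = num_servers
        self.max_keys = max_keys_per_segment
        # Segment i covers keys in [boundaries[i], boundaries[i + 1]).
        self.boundaries = [""]
        self.segment_ids = [0]
        self.next_id = 1
        # Each server holds {segment_id: {key: value}}.
        self.servers = [dict() for _ in range(num_servers)]
        self.servers[self._server_for(0)][0] = {}

    def _server_for(self, segment_id):
        # Layer 2 (assumption: MD5 as a stable hash): spread segments over servers.
        digest = hashlib.md5(str(segment_id).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_servers

    def _segment_index(self, key):
        # Layer 1: locate the segment whose key range contains the key.
        return bisect.bisect_right(self.boundaries, key) - 1

    def _segment(self, idx):
        sid = self.segment_ids[idx]
        return self.servers[self._server_for(sid)][sid]

    def put(self, key, value):
        idx = self._segment_index(key)
        seg = self._segment(idx)
        seg[key] = value
        if len(seg) > self.max_keys:
            self._split(idx)

    def _split(self, idx):
        # Split an over-full segment at its median key (volume-based balancing).
        seg = self._segment(idx)
        keys = sorted(seg)
        half = len(keys) // 2
        new_id = self.next_id
        self.next_id += 1
        upper = {k: seg.pop(k) for k in keys[half:]}
        self.servers[self._server_for(new_id)][new_id] = upper
        self.boundaries.insert(idx + 1, keys[half])
        self.segment_ids.insert(idx + 1, new_id)

    def range_query(self, low, high):
        """Return (sorted matches, number of segments touched)."""
        first = self._segment_index(low)
        last = self._segment_index(high)
        hits = []
        for idx in range(first, last + 1):
            seg = self._segment(idx)
            hits.extend((k, v) for k, v in seg.items() if low <= k <= high)
        return sorted(hits), last - first + 1


store = TwoLayerStore(num_servers=8)
for i in range(100):
    store.put(f"k{i:03d}", i)

matches, touched = store.range_query("k010", "k019")
print(len(matches), "matches from", touched, "of", len(store.segment_ids), "segments")
```

Because neighbouring keys share a segment, the range query visits only the segments overlapping the requested range, while hashing the segment identifier keeps segments spread across servers.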

Summary
Big Data is not just about Volume: Volume and Velocity together
Big Data is like driving a car in the sea of information
- Existing IT systems treat relatively small data and just show past trends in a small rear-view window.
- New IT systems are expected to show the future (prediction, recommendation) in a big front window (for rapid, precise decisions).
The next phase is a variety of purpose-built systems to fulfill specific enterprise needs
- Basic data processing functions (event processing, parallel batch) are available
- Mix methods and elements to fulfill the requirements of each enterprise, with an understanding of the elemental technologies and carefully designed combinations
Fujitsu Labs is developing high-level functions on top of basic parallel technologies, aiming at purpose-built Big Data systems in the cloud.
