Big Data and Object Storage

Big Data and Object Storage, or: Where to Store the Cold and Small Data?
Sven Bauernfeind, Computacenter AG & Co. oHG, Consultancy Germany
28.02.2018, Munich

Volume, Variety & Velocity + Analytics
[Figure: data volumes drawn to scale as circles. GIGABYTE: 28 cm diameter; TERABYTE: 290 m diameter; PETABYTE: 300 km diameter]

Volume, Variety & Velocity (cont.)
VOLUME
  Year     Data volume
  2005     0.1 ZB
  2010     1.2 ZB
  2012     2.8 ZB
  2015     8.5 ZB
  2020*    40 ZB
VELOCITY
  By 2020, business transactions will grow to up to 450 billion a day, according to IDC.
Source: *IDC (https://www.emc.com/leadership/digital-universe/2014iview/index.htm)

Hitachi Vantara Forum 2018 Hadoop Basics

How to analyse these data?
Hadoop
- Open-source framework
- Scalable and distributed computing
- Hadoop core and ecosystem: Spark, Hive, Kafka and many more

Hadoop Solutions (from fewer to more features)
- Apache Hadoop: YARN, MapReduce, HDFS, ecosystem
- Hadoop distributions: + packaging, deployment tooling, support
- Big data suite: + tooling/modeling, business analytics, scheduling/integration

Hadoop Core
YARN
- Resource planning and load balancing
- Infrastructure management and enterprise services
HDFS
- Distributed overlay file system
- Redundant storage of large amounts of data
MapReduce
- Programming model for parallel computation
- Fault-tolerant algorithm
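The MapReduce programming model named above can be illustrated with a minimal, single-process sketch in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and a real job would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce one after the other in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["cold data", "small data"]))
```

Because every map call depends only on its own input line and every reduce call only on one key, a framework can rerun a failed task on another node, which is where the fault tolerance comes from.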

HDFS Basics
[Figure, built up over three slides: a logical file is split into blocks 1 to 4; each block is stored as three replicas distributed across the cluster nodes in Rack 1, Rack 2 and Rack 3]
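How a logical file becomes replicated blocks can be sketched as follows. This is a simplified model, not HDFS code: the 128 MiB block size and the placement rule (first replica on one node, second and third on two different nodes of one other rack) follow the commonly documented HDFS defaults, which the slides do not state, and the rack and node names are hypothetical.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: HDFS default block size of 128 MiB


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(racks, rng):
    """Pick three (rack, node) locations for one block.

    Simplified rule: replica 1 on a random node, replicas 2 and 3 on two
    different nodes of one other rack. Every rack needs at least two nodes.
    """
    rack_names = sorted(racks)
    first_rack = rng.choice(rack_names)
    first_node = rng.choice(racks[first_rack])
    remote_rack = rng.choice([r for r in rack_names if r != first_rack])
    second_node, third_node = rng.sample(racks[remote_rack], 2)
    return [
        (first_rack, first_node),
        (remote_rack, second_node),
        (remote_rack, third_node),
    ]


# Hypothetical cluster: three racks with two nodes each.
racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
rng = random.Random(0)
blocks = split_into_blocks(500 * 1024 * 1024)  # a 500 MiB logical file
layout = {number + 1: place_replicas(racks, rng) for number in range(len(blocks))}

if __name__ == "__main__":
    for number, locations in layout.items():
        print(number, locations)
```

With this rule every block survives the loss of a whole rack, while only one of the three copies has to cross a rack boundary when it is written.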

A Storage Problem?
[Figure: two charts. "Cold Data" shows the share of data by age (0-1 days, < 7 days, < 30 days, < 90 days, > 90 days). "Small Files" shows the share of files by size (0 KB - 4 KB, 4 KB - 512 KB, 512 KB - 16 MB, 16 MB - 64 MB, 64 MB - 128 MB, 128 MB - 512 MB, 512 MB - 1 GB)]
Source: HPE 2017

Hadoop Architecture

Hadoop Architecture: Today
- Processing: Hadoop analytics
- Storage: Hadoop file system
- Both layers run on x86 servers

Hadoop Architecture: Tomorrow
- Processing: low-cost nodes, GPU nodes, FPGA nodes, memory nodes
- Storage: Hitachi All-Flash-Array, spinning-disk nodes, object storage

Hadoop Architecture: Tomorrow (cont.)
- Processing: low-cost nodes, GPU nodes, FPGA nodes, memory nodes
- Storage: SSD nodes, spinning-disk nodes, Hitachi HCP

Hadoop Architecture: Tomorrow (cont.)
- Processing: low-cost nodes, GPU nodes, FPGA nodes, memory nodes
- Storage: Hitachi All-Flash-Array, Hitachi HCP

HDFS Storage Tiering and Archives

HDFS Storage Types
- DISK (default storage type): hard disk drives
- SSD: solid state drives; fast, but expensive
- ARCHIVE: archival drives; slow, but cheap; object storage, cloud storage
- RAM_DISK: memory drives; very fast with limited capacity
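On a DataNode, a storage type is assigned by prefixing each directory in the dfs.datanode.data.dir property with a tag such as [ARCHIVE]. The sketch below only builds that property value as a string; the helper name data_dir_value and the directory paths are hypothetical, and the tag syntax should be checked against the Hadoop version in use.

```python
VALID_STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}


def data_dir_value(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, path) pairs.

    Paths must be absolute, so that "file://" + path gives a file:/// URI.
    """
    parts = []
    for storage_type, path in volumes:
        if storage_type not in VALID_STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        if not path.startswith("/"):
            raise ValueError(f"path must be absolute: {path}")
        parts.append(f"[{storage_type}]file://{path}")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical mount points: one regular disk, one archival volume.
    print(data_dir_value([("DISK", "/grid/dn/disk0"), ("ARCHIVE", "/grid/dn/archive0")]))
```

A directory without a tag is treated as DISK, which matches DISK being the default storage type.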

HDFS Storage Policies
Assignment of the individual storage types to storage policies:
- HOT: current data, read and write; all replicas on DISK
- WARM: current and old data, mostly read; replicas on DISK and ARCHIVE
- COLD: only for old data, read-only; all replicas on ARCHIVE
- All_SSD: all replicas on SSD
- One_SSD: one replica on SSD, n-1 on DISK
- Lazy_Persist: one replica on RAM_DISK
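The policy list can be read as a small lookup table: each policy names the storage type of the first replica and the type of the remaining n-1 replicas. The sketch below encodes that reading. The function name replica_placement is made up for this example, and the DISK placement of the remaining Lazy_Persist replicas is taken from the Hadoop documentation rather than from the slide.

```python
# policy name -> (storage type of the first replica, type of the remaining replicas)
POLICIES = {
    "HOT": ("DISK", "DISK"),
    "WARM": ("DISK", "ARCHIVE"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "ALL_SSD": ("SSD", "SSD"),
    "ONE_SSD": ("SSD", "DISK"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),  # remaining replicas: per Hadoop docs
}


def replica_placement(policy, replication=3):
    """Return the storage type of each replica of a block under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    first, rest = POLICIES[policy.upper()]
    return [first] + [rest] * (replication - 1)


if __name__ == "__main__":
    for name in POLICIES:
        print(name, replica_placement(name))
```

For example, replica_placement("WARM") keeps one replica on DISK for fast reads and moves the other two to the cheaper ARCHIVE tier.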

HDFS Policies Without HDFS Storage Tiering
- HOT, WARM, COLD: all replicas on DISK
- Storage: DISK only

HDFS Policies With HDFS Storage Tiering
- HOT: n replicas on DISK
- WARM: 1 replica on DISK, n-1 replicas on ARCHIVE
- COLD: n replicas on ARCHIVE
- Storage: DISK and ARCHIVE
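In practice, tiering a directory takes two steps: assign a policy to the path, then run the mover so that blocks already written are migrated to the matching storage type. The sketch below only assembles the two command lines and does not execute anything; the helper names and the example path are hypothetical, and the flags should be verified against the installed Hadoop release.

```python
def set_policy_cmd(path, policy):
    """Command line that assigns a storage policy to an HDFS path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


def mover_cmd(path):
    """Command line that migrates existing blocks under path to match the policy."""
    return ["hdfs", "mover", "-p", path]


def tiering_plan(path, policy):
    """Both steps in order: set the policy first, then move existing blocks."""
    return [set_policy_cmd(path, policy), mover_cmd(path)]


if __name__ == "__main__":
    # Hypothetical directory holding data older than 90 days.
    for command in tiering_plan("/data/logs/2017", "COLD"):
        print(" ".join(command))
```

Setting the policy alone affects only newly written blocks, which is why the mover step follows it.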

Hadoop Archives: HAR Files
- Layered filesystem on top of HDFS
- Uses MapReduce to create the archive
- Reduces memory consumption on the NameNode
- Each HAR file access reads two index files and one data file
- Nothing changes for a client using the HAR filesystem, apart from the URI scheme (har:// instead of hdfs://)
[Figure: HAR layout with master index, index and data files]
Source: http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
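Creating and addressing an archive can be sketched as below. The code only builds the hadoop archive command line and the har:// URI of a file inside the archive; nothing is executed. The helper names, archive name and paths are hypothetical, and the argument order should be checked against the Hadoop Archives guide for the release in use.

```python
import posixpath


def har_create_cmd(name, parent, sources, dest):
    """Command line that packs sources (relative to parent) into dest/name."""
    if not name.endswith(".har"):
        raise ValueError("archive name must end in .har")
    return ["hadoop", "archive", "-archiveName", name, "-p", parent, *sources, dest]


def har_uri(dest, name, file_in_archive):
    """URI of one file inside the archive, resolved on the default filesystem.

    dest must be an absolute HDFS path, so the result starts with har:///.
    """
    if not dest.startswith("/"):
        raise ValueError("dest must be an absolute path")
    return "har://" + posixpath.join(dest, name, file_in_archive)


if __name__ == "__main__":
    # Hypothetical example: archive two log directories below /data/logs.
    print(" ".join(har_create_cmd("logs-2017.har", "/data/logs", ["app1", "app2"], "/archive")))
    print(har_uri("/archive", "logs-2017.har", "app1/part-0000"))
```

The archive job runs as a MapReduce job, so it needs a working cluster, and the source files stay in place until they are deleted separately.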

Hadoop Archives: HAR Files (cont.)
- HDFS NameNode: metadata of each block stored in HDFS
- Hadoop Archive (HAR) file: metadata of each block stored in the HAR file
- Storage: DISK and ARCHIVE
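Why moving per-file metadata out of the NameNode matters can be estimated with the rule of thumb from the Cloudera post cited on the previous slide: every file, directory and block occupies roughly 150 bytes of NameNode memory. The sketch below applies that rule. The file counts, the 100 KiB average file size, the 128 MiB block size and the number of part files are assumptions chosen for illustration, not measurements.

```python
import math

BYTES_PER_OBJECT = 150          # rule of thumb, not an exact figure
BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MiB HDFS blocks


def namenode_bytes(num_files, num_blocks):
    """Rough NameNode heap needed for num_files files with num_blocks blocks."""
    return (num_files + num_blocks) * BYTES_PER_OBJECT


# Case 1: 10 million small files, each occupying one block of its own.
small_files = 10_000_000
unpacked = namenode_bytes(small_files, small_files)

# Case 2: the same data (assumed 100 KiB per file) packed into a few large
# part files, so the NameNode tracks only those files and their blocks.
total_bytes = small_files * 100 * 1024
part_files = 12  # hypothetical number of files written by the archive job
packed = namenode_bytes(part_files, math.ceil(total_bytes / BLOCK_SIZE))

if __name__ == "__main__":
    print(f"unpacked: {unpacked / 1e9:.1f} GB of NameNode memory")
    print(f"packed:   {packed / 1e6:.1f} MB of NameNode memory")
```

The per-file metadata does not disappear in the packed case: it moves into the archive's index files, which is why each HAR access costs extra index reads.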

Thank You
Sven Bauernfeind
Computacenter AG & Co. oHG, Consultancy Germany
sven.bauernfeind@computacenter.com
+49 173 9158966