Handling Big Data: an overview of mass storage technologies

Handling Big Data: an overview of mass storage technologies. Łukasz Janyst, CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it. GridKA School 2013, Karlsruhe, 26.08.2013

What is Big Data? A buzzword typically used to describe data sets that are too big to be stored and processed by conventional means.

What can we do with it? Analyze anonymized GPS records from 100 million drivers to help home buyers determine optimal property locations. Analyze billions of credit card transactions to protect against fraud. Find trends in stock market moves. Decode the human genome.

What can we do with it? Copy, store, and analyze internet traffic for more or less questionable reasons. (Image: the NSA's data center in Utah, where all the PRISM data is supposedly handled; source: Wikipedia.)

What can we do with it? Process data from over 150 million sensors to find the Higgs boson

How big is it now? CERN alone currently stores over 100 petabytes of data, with the experiments producing around 30 PB annually. The NSA is building a data center capable of handling 12 exabytes of data. Facebook stores around 300 billion photos. Walmart processes 1 million client transactions per hour and holds 2.5 PB of data.

How big is it going to be? International Data Corporation forecasts the digital universe to grow to 40 ZB (40 trillion gigabytes) by 2020, growing by roughly 50% each year, which amounts to about 5200 GB per person in 2020.

What are the challenges? Capture, store, transmit, process. The storage part is the scope of this presentation.

Multitude of solutions

Scaling: systems need to be able to grow with the growing amount of data they handle, either by scaling up (bigger machines) or by scaling out (more machines).

Ideal properties. Ideally, all distributed systems should be: Consistent - commits are atomic across the entire system and all clients see the same data at the same time. (Highly) Available - the system remains operational at all times and requests are always answered (successfully or otherwise). Tolerant to partitions - network failures don't cause inconsistencies, and the system continues to operate correctly despite part of it being unreachable.

Ideal properties - CAP. In reality, however, you can pick only two of the three: Consistency, Availability, Partition tolerance (Brewer's CAP theorem).

Typical components: metadata system, clients, protocol handlers, object store. Caveat: these are not necessarily logically separate - they may be tightly coupled and interleaved.

Object stores. A hashed key (e.g. 10c39527b893c798a93e8997772f65a8) maps to a data blob. Distributed object store: typically, a collection of uncorrelated, flexibly-sized data containers (objects) spread across multiple data servers.
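A minimal sketch of the key-to-blob idea, assuming Python and an in-memory dict standing in for the data servers; the object name and blob are made up. The client derives a fixed-size key from the object name and stores the blob under it:

```python
import hashlib

# Toy object store: a flat namespace of (hashed key -> blob) pairs.
# A real distributed store spreads these pairs across many data servers.
store = {}

def put(name: str, blob: bytes) -> str:
    key = hashlib.md5(name.encode()).hexdigest()   # hashed key, as on the slide
    store[key] = blob
    return key

def get(name: str) -> bytes:
    return store[hashlib.md5(name.encode()).hexdigest()]

key = put("photos/2013/higgs.png", b"\x89PNG...")
print(key, len(get("photos/2013/higgs.png")))
```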

Object-node mapping. Algorithmic: the object location can be computed by the client or the server using the object name (key) and other inputs such as the cluster state (Dynamo, Ceph). Manager/Cache: a manager node asks storage nodes for an object and caches the location for future reference (XRootD). Index: a central entity (database) knows all the objects and their locations (most traditional storage systems).

Amazon Dynamo. The output space of the hash function is treated like a ring. A node is assigned a random value denoting its position on the ring. An object is assigned to a node by hashing its key and walking the ring clockwise to find the first node with a position larger than the key's. Replicas are stored on the subsequent nodes.
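A simplified consistent-hashing sketch of this scheme (no virtual nodes, node names made up, MD5 as the ring hash): nodes are placed on the ring, and an object lands on the first node clockwise from its hashed key, with replicas on the following nodes.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Map a string onto the ring (here: the output space of MD5).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node gets a position on the ring; keep nodes sorted by it.
        self.nodes = sorted(nodes, key=ring_position)
        self.positions = [ring_position(n) for n in self.nodes]

    def nodes_for(self, key: str):
        # Walk clockwise from the key's position to the first node with a
        # larger position; replicas go to the subsequent nodes (wrapping).
        start = bisect.bisect(self.positions, ring_position(key))
        count = min(self.replicas, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.nodes_for("bucket/object-42"))   # primary copy plus two replicas
```

Adding or removing a node only moves the keys that fall between it and its neighbour on the ring, which is what makes the scheme attractive for scaling out.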

Ceph - RADOS. Each object is first mapped to a placement group depending on its key and the replication level. Placement groups are assigned to nodes and disks using a stable, pseudo-random mapping algorithm that depends on the cluster map (CRUSH). The cluster map is managed by monitors and replicated to storage nodes and clients.
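A rough sketch of the two-step placement idea. The PG-to-OSD step below is a plain seeded pseudo-random choice standing in for CRUSH (which additionally respects the cluster map and failure domains); the PG count and OSD names are made up.

```python
import hashlib
import random

NUM_PGS = 128                               # placement groups per pool (illustrative)
OSDS = [f"osd.{i}" for i in range(12)]      # made-up storage daemons
REPLICATION = 3

def object_to_pg(name: str) -> int:
    # Step 1: hash the object name into a placement group.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % NUM_PGS

def pg_to_osds(pg: int):
    # Step 2: map the placement group to a set of OSDs with a stable,
    # pseudo-random function of the PG id (a stand-in for CRUSH).
    rng = random.Random(pg)                 # same PG -> same OSD set, every time
    return rng.sample(OSDS, REPLICATION)

pg = object_to_pg("rbd_data.1234.0000000000000000")
print(pg, pg_to_osds(pg))
```

Because the mapping is deterministic, any client holding the cluster map can compute object locations itself, with no central index to consult.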

Chunks, stripes, replicas. For performance, space and safety reasons, the data may be distributed in many different ways. Replicas: fairly simple, little metadata, good performance; but space issues (a knapsack problem) and expensive for archiving. Chunks: solve the knapsack problem and distribute the load; but still require replication for safety and carry much more metadata. Stripes: relatively cheap archiving; but more metadata and the knapsack problem again.
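To make the chunking case concrete, a tiny sketch assuming fixed-size pieces and a round-robin spread over made-up nodes; replication would additionally copy each chunk to further nodes, and striping would add parity pieces instead (see the erasure-code sketch below).

```python
def chunk(blob: bytes, chunk_size: int):
    # Cut the object into fixed-size chunks (the last one may be shorter).
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

def place_round_robin(chunks, nodes):
    # Spread chunks over nodes to distribute load and avoid the
    # whole-object knapsack problem of replica placement.
    return {i: nodes[i % len(nodes)] for i in range(len(chunks))}

nodes = ["node-a", "node-b", "node-c"]
pieces = chunk(b"x" * 10_000, chunk_size=4096)
print(place_round_robin(pieces, nodes))   # {0: 'node-a', 1: 'node-b', 2: 'node-c'}
```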

RAIN - erasure codes. RAIN: redundant array of inexpensive nodes (a RAID implementation across nodes instead of disks). Used to increase fault tolerance by adding extra stripes that correlate the information contained in the base stripes. Multiple techniques exist: Hamming parity, Reed-Solomon error correction, low-density parity-check codes.
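The simplest instance of "extra stripes correlating the base stripes" is single XOR parity (as in RAID 5, here spread over nodes): one parity stripe lets the group survive the loss of any single base stripe. A minimal sketch with made-up stripe contents:

```python
def xor_parity(stripes):
    # Extra stripe = byte-wise XOR of the base stripes (all equal length).
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_stripes, parity):
    # The single missing base stripe is the XOR of the parity and the rest.
    return xor_parity(list(surviving_stripes) + [parity])

base = [b"AAAA", b"BBBB", b"CCCC"]                 # stripes stored on three nodes
p = xor_parity(base)                               # parity stored on a fourth node
print(recover([base[0], base[2]], p) == base[1])   # True: stripe 1 rebuilt
```

Reed-Solomon and LDPC codes generalize this to several parity stripes, tolerating multiple simultaneous node failures at a fraction of the space cost of full replication.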

System topology. Data placement needs to take the system topology into account. Spread replicas/chunks/stripes across failure domains: different disks, nodes, racks, switches, power supplies, or even entire data centers if possible. There is even some research on reducing heat production by appropriately scheduling disk writes.
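A small sketch of failure-domain-aware placement, with a hypothetical cluster map of node and rack names: replicas are chosen so that no two of them share a rack.

```python
import random

# Hypothetical cluster map: node -> rack (the failure domain in this sketch).
CLUSTER = {
    "node-1": "rack-A", "node-2": "rack-A",
    "node-3": "rack-B", "node-4": "rack-B",
    "node-5": "rack-C", "node-6": "rack-C",
}

def place_replicas(key: str, copies: int = 3):
    # Choose at most one node per rack, so that losing a rack (switch,
    # power supply) can take out at most one replica of the object.
    rng = random.Random(key)                 # deterministic per object key
    by_rack = {}
    for node, rack in CLUSTER.items():
        by_rack.setdefault(rack, []).append(node)
    racks = rng.sample(sorted(by_rack), min(copies, len(by_rack)))
    return [rng.choice(by_rack[r]) for r in racks]

print(place_replicas("bucket/object-42"))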

Data locality. Computation is most efficient when executed close to the data it operates on. This is a core concept of Hadoop, where nodes are typically both storage and computation nodes. HDFS exposes interfaces allowing job schedulers to dispatch jobs close to the data: often on the same node or rack.
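A minimal scheduling sketch of the locality preference, with a hypothetical block-location map and worker names (not the Hadoop API itself): prefer a worker that already stores the block, then a worker in the same rack, then anything.

```python
# Hypothetical metadata: which workers hold a replica of each block,
# and which rack each worker sits in.
BLOCK_LOCATIONS = {"block-17": ["worker-3", "worker-8"]}
RACK_OF = {"worker-1": "rack-A", "worker-3": "rack-A", "worker-8": "rack-B"}

def pick_worker(block: str, idle_workers):
    holders = BLOCK_LOCATIONS.get(block, [])
    # 1) node-local: an idle worker that already stores the block
    for w in idle_workers:
        if w in holders:
            return w, "node-local"
    # 2) rack-local: an idle worker in the same rack as some replica
    holder_racks = {RACK_OF[h] for h in holders}
    for w in idle_workers:
        if RACK_OF.get(w) in holder_racks:
            return w, "rack-local"
    # 3) fall back to any idle worker (the data travels over the network)
    return idle_workers[0], "remote"

print(pick_worker("block-17", ["worker-1", "worker-3"]))  # ('worker-3', 'node-local')
```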

Metadata services. Group and organize objects into human-browsable collections, manage quotas, ownership, group attributes, and so on. POSIX-like trees: familiar and used for decades, but very hard to scale out. Accounts/Containers/Objects: trivially scalable, but legacy software may be hard to adapt.

Ceph Filesystem. Runs on top of RADOS. Maps file and directory hierarchies to RADOS objects. Does dynamic tree partitioning. The metadata cluster may grow or contract - metadata nodes are stateless facades for accessing data in RADOS.

Amazon S3 approach. Proprietary technology; most likely it's Dynamo with an HTTP interface, an accounting system for billing, and user authentication/authorization mechanisms. User accounts consist of buckets, and buckets are sets of files; account-bucket-file tuples are likely used as the keys of Dynamo objects.
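Following the slide's own speculation, a sketch of how an account-bucket-file tuple might be flattened into a single key for a Dynamo-style store; the key format and names are purely illustrative, not Amazon's actual scheme.

```python
import hashlib

def s3_style_key(account: str, bucket: str, name: str) -> str:
    # Purely illustrative: flatten the tuple into one string and hash it,
    # so the underlying key-value store never needs to know about buckets.
    composite = f"{account}/{bucket}/{name}"
    return hashlib.md5(composite.encode()).hexdigest()

print(s3_style_key("acct-123", "holiday-photos", "2013/beach.jpg"))
```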

Backups and archiving. Some data may need to be moved to cheaper or more reliable media. Backup: copy important data to a different kind of media - cheaper, more resilient to some natural phenomena. Archive: move inactive data to a cheaper but safer and possibly less available system. Backups and archives of big data are likely even bigger data!

HSM and tiers. Hierarchical Storage Management (HSM): transparently moves data files between media types depending on how soon and how often they are accessed. Tiering: assigning different categories of data (more/less critical, active/inactive, ...) to different kinds of storage technologies, often manually.
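A toy HSM-style policy sketch; the tier names and idle-time thresholds are made up. Files untouched for a while migrate from fast disk towards tape.

```python
import time

# Made-up tiers and thresholds: hottest data on SSD, cold data on tape.
TIERS = [("ssd", 7), ("disk", 90), ("tape", None)]   # (tier, max idle days)

def pick_tier(last_access_ts: float, now: float = None) -> str:
    idle_days = ((now or time.time()) - last_access_ts) / 86400
    for tier, limit in TIERS:
        if limit is None or idle_days <= limit:
            return tier
    return TIERS[-1][0]

print(pick_tier(time.time() - 3 * 86400))     # 'ssd'
print(pick_tier(time.time() - 200 * 86400))   # 'tape'
```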

Clients. APIs: direct use, or integration into commonly used tools as plug-ins. Mount points: through widespread protocols (NFS, CIFS/Samba, ...) or dedicated drivers (typically FUSE). Command line and GUIs: through widespread software (web browsers) or custom tools.

Access requirements. Access patterns: is put/get enough? Do we need partial reads or vector reads? What about updates? User authentication: is the system exposed to multiple users? X.509, Kerberos, user/password, etc. Transmission encryption: are the channels secure? Is the data sensitive? Symmetric or asymmetric encryption? Filesystem/bucket operations: list, stat, chown, etc.

Efficiency considerations. Latency: support for logical streams and priorities - allow multiple queries at once and provide a way of disambiguating responses. Bandwidth: protocol overhead, compression (of both headers and payload). Server-side CPU intensiveness: do requests need to be decompressed? Does the server need to parse a ton of text/XML?

HTTP. HTTP is the undisputed king of cloud communication protocols, not because it's particularly efficient, but because clients are built into pretty much every computer. There are problems with it, mainly: it does not allow out-of-order or interleaved responses; it offers reasonable performance only for big, one-shot downloads; and protocol overhead - many headers are sent with each request, most of which are redundant.
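To make the overhead point concrete: even a partial read (a Range request) carries a full, mostly repetitive header block. A small sketch using the third-party requests library against a placeholder URL:

```python
import requests

# Placeholder URL; any HTTP server that honours Range requests will do.
resp = requests.get(
    "http://example.com/dataset/blob-0001",
    headers={"Range": "bytes=0-1023"},       # partial read of the first 1 KiB
)
print(resp.status_code)                       # 206 if range requests are supported
print(resp.request.headers)                   # header block sent with every request
print(len(resp.content))
```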

HTTP 2.0 (a.k.a. SPDY). An effort to fix the most important issues with HTTP. SPDY is a kind of virtual transport protocol for HTTP: it does not really change the request or response format, just the way they are transported over the wire. Prioritization and multiplexing: multiple logical streams are introduced within one connection, so responses may interleave. Header dictionary: only changed headers are sent over the wire, and they are compressed.

Outlook. "Prediction is very difficult, especially about the future." - Niels Bohr. The need to handle bigger and bigger data will likely push out POSIX completely in favour of write-once read-many, put/get/remove semantics; the main cost is in moving applications to the new semantics. Metadata services are bottlenecks and will likely be replaced by deterministic data placement.

Questions Thanks for your attention! Questions? Comments?

Notes. Some interesting reading: "Dynamo: Amazon's Highly Available Key-value Store"; "RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters". Most of the artwork in this presentation comes from the Open Icon Library.