Massively Scalable File Storage. Philippe Nicolas, KerStor

Similar documents
FILE STORAGE. Philippe Nicolas, Scality

Find and Select the Right File Storage for your Applications. Philippe Nicolas, KerStor

FILE STORAGE. Philippe Nicolas, Scality

The File Systems Evolution. Christian Bandulet, Sun Microsystems

Tiered File System without Tiers. Laura Shepard, Isilon

Scaling Data Center Application Infrastructure. Gary Orenstein, Gear6

The Evolution of File Systems

A Promise Kept: Understanding the Monetary and Technical Benefits of STaaS Implementation. Mark Kaufman, Iron Mountain

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Storage Virtualization II Effective Use of Virtualization - focusing on block virtualization -

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Co-Chair, SNIA Cloud Storage TWG

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

Interoperable Cloud Storage with the CDMI Standard. Mark Carlson, SNIA TC and Oracle Chair, SNIA Cloud Storage TWG

Application Recovery. Andreas Schwegmann / HP

The Role of WAN Optimization in Cloud Infrastructures. Josh Tseng, Riverbed

ADVANCED DATA REDUCTION CONCEPTS

pnfs, parallel storage for grid and enterprise computing Joshua Konkle, NetApp, Inc.

Multi-Cloud Storage: Addressing the Need for Portability and Interoperability

Storage Virtualization II. - focusing on block virtualization -

Data Deduplication Methods for Achieving Data Efficiency

SRM: Can You Get What You Want? John Webster, Evaluator Group.

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE

Notes & Lessons Learned from a Field Engineer. Robert M. Smith, Microsoft

Deploying Public, Private, and Hybrid. Storage Cloud Environments

LTFS Bulk Transfer Standard PRESENTATION TITLE GOES HERE

Overview and Current Topics in Solid State Storage

SRM: Can You Get What You Want? John Webster Principal IT Advisor, Illuminata

Expert Insights: Expanding the Data Center with FCoE

Everything You Wanted To Know About Storage (But Were Too Proud To Ask) The Basics

Trends in Worldwide Media and Entertainment Storage

Effective Storage Tiering for Databases

Overview and Current Topics in Solid State Storage

High Availability Using Fault Tolerance in the SAN. Wendy Betts, IBM Mark Fleming, IBM

Use Cases for iscsi and FCoE: Where Each Makes Sense

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

in Transition to the Cloud David A. Chapa, CTE EVault, a Seagate Company

Felix Xavier CloudByte Inc.

Storage in combined service/product data infrastructures. Craig Dunwoody CTO, GraphStream Incorporated

Distributed File Systems II

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

Storage Performance Management Overview. Brett Allison, IntelliMagic, Inc.

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

Restoration Technologies

Virtualization Practices: Providing a Complete Virtual Solution in a Box

Felix Xavier CloudByte Inc.

New Fresh Storage Approach for New IT Challenges Laurent Denel Philippe Nicolas OpenIO

IBM Storage Software Strategy

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St.

EVERYTHING YOU WANTED TO KNOW ABOUT STORAGE, BUT WERE TOO PROUD TO ASK Part Cyan Storage Management. September 28, :00 am PT

Virtualization Practices:

OpenStack Manila An Overview of Manila Liberty & Mitaka

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

Planning For Persistent Memory In The Data Center. Sarah Jelinek/Intel Corporation

Adrian Proctor Vice President, Marketing Viking Technology

Cloud Computing CS

Cloud Archive and Long Term Preservation Challenges and Best Practices

An Introduction to Key Management for Secure Storage. Walt Hubis, LSI Corporation

Next Generation Storage for The Software-Defned World

SwiftStack and python-swiftclient

Software Defined Storage for the Evolving Data Center

SCSI Security Nuts and Bolts. Ralph Weber, ENDL Texas

ASPECTS OF DEDUPLICATION. Dominic Kay, Oracle Mark Maybee, Oracle

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

Scale-out Data Deduplication Architecture

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Mobile and Secure Healthcare: Encrypted Objects and Access Control Delegation

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer

Ramin Elahi, Adjunct Faculty UC Santa Cruz Silicon Valley

Rethink Storage: The Next Generation Of Scale- Out NAS

SAS: Today s Fast and Flexible Storage Fabric. Rick Kutcipal President, SCSI Trade Association Product Planning and Architecture, Broadcom Limited

NFSv4.1 Using pnfs PRESENTATION TITLE GOES HERE. Presented by: Alex McDonald CTO Office, NetApp

Market Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010.

Evolution of Fibre Channel. Mark Jones FCIA / Emulex Corporation

AN OVERVIEW OF DISTRIBUTED FILE SYSTEM Aditi Khazanchi, Akshay Kanwar, Lovenish Saluja

Blockchain Beyond Bitcoin. Mark O Connell

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

SUSE Enterprise Storage 3

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

Parallel File Systems. John White Lawrence Berkeley National Lab

Distributed Systems 16. Distributed File Systems II

Distributed Filesystem

Object-based Storage (OSD) Architecture and Systems

CA485 Ray Walshe Google File System

White Paper. A System for Archiving, Recovery, and Storage Optimization. Mimosa NearPoint for Microsoft

The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION

1560: Storage Management & Business Continuity Strategy and Futures

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

Facing an SSS Decision? SNIA Efforts to Evaluate SSS Performance. Ray Lucchesi Silverton Consulting, Inc.

An Introduction to GPFS

IBM ProtecTIER and Netbackup OpenStorage (OST)

Ron Emerick, Oracle Corporation

Realizing the Promise of SANs

Trends in Data Protection CDP and VTL

GlusterFS Architecture & Roadmap

Storage and Virtualization: Five Best Practices to Ensure Success. Mike Matchett, Akorri

Mark Rogov, Dell EMC Chris Conniff, Dell EMC. Feb 14, 2018

NetApp Clustered ONTAP & Symantec Granite Self Service Lab Timothy Isaacs, NetApp Jon Sanchez & Jason Puig, Symantec

Transcription:

Philippe Nicolas, KerStor

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced without modification The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the Author nor the Presenter is an attorney and nothing in this presentation is intended to be nor should be construed as legal advice or opinion. If you need legal advice or legal opinion please contact an attorney. The information presented herein represents the Author's personal opinion and current understanding of the issues involved. The Author, the Presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract Title: Abstract: With the fast evolution of the Internet, the explosion of data volume and growing online services, massively scalable file storage is a reality today. We all use P2P applications which finally use workstation private storage space to enable an almost infinite space. Compliance and Archiving put also a high pressure on file storage and many solutions use innovative approaches developed recently, some research work was done around RAIN architecture. High Performance applications must run and use high performance file storage infrastructure and many proprietary developments were made to address these needs. Some standard work was also done by many vendor to propose some new parallel mode such as pnfs. This session covers only the high-end part of network file storage with high performance, reliability and huge capacity and presents P2P, CAS, Distributed, Parallel and Cloud file storage with important number of servers and clients such as Google FS, Apache project named Hadoop, PVFS, Lustre, pnfs and a few proprietary ones to illustrate industry answers. Learning objectives: This presentation will introduce and review all high-end network file storage technologies to help better understand vendors solutions and standards challenges. It will help understand each design, advantages and drawbacks to align with applications needs and finally participate to the education of the market to simplify technology adoption. Audience: IT & Storage Architect, IT Manager and Buyers. 3

Agenda Needs for Massive Scalability Fundamental drivers Needs and limitations Multiple usages Implementations and Philosophies Conventions & Approaches Industry and offering examples Conclusion 4

Needs for Massive Scalability Fundamental drivers, Needs and limitations Multiple usages

Fundamental drivers Avalanche of data, increase demand for unstructured data storage x6 in 4 years (281 EB in 2007 to 1800 EB in 2011 IDC March 2008) 80% of this volume is unstructured information 80% of file data growth every year, 80% of inactive data after 30 days! (file size grows as well) Too many discrete data servers silos and point products File Server/NAS, Backup & Archiving, Application, Virtual Machines Too low Storage utilization, cost of infrastructure management Limitations to support millions/100s of millions/billions of files, 10s-100s of PBs, thousands of clients and storage servers (IOPS & BW) Poor automation Protection too complex and too long RTO and RPO not aligned to business Too many tape restore errors (30%) Policies enforcement, Load balancing, Migration, Replication, Mirroring, ILM/FLM/Tiering Business Continuity (HA/BC/DR) 100% ubiquitous data access SLA not set, controlled and measured 6

File Storage Needs Needs by numbers and features Clients connections: 10000s Storage Servers: 1000s Storage capacity: 10-100-1000 PBs Throughput: 10s-100sGB/s of aggregated bandwidth & millions of IOPS Full embedded resiliency and redundancy Global, permanent and ubiquity access and presence Notion of global/shared namespace Standard or proprietary file access Security and privacy Always-on, Everywhere for Everyone 7

File Storage limitations Technology Distributed File System e.g NAS/File Servers with NFS/CIFS Cluster File System SAN File System (aka SAN File Sharing System) Bottleneck Server (Meta-Data and Data) Symmetric: DLM/GLM* Asymmetric: Master node Dozens of nodes Meta-Data Server Hundreds of nodes *DLM/GLM: Distributed/Global Lock Manager 8

Multiple usages Common and new file storage usages File services DB on NAS Backup Archiving & Compliance Data Mining (file warehouse) Cloud integration for above topics Towards an Universal, Unified, Ubiquitous File Storage 9

Implementations and Philosophies Conventions & Approaches Industry and offering examples

Conventions In this tutorial No coverage of local file system No commercial product mentioned but family, approach or model explained Only Distributed File Storage Scalability, a 3 dimensions approach Performance: no degradation, aggregated improvement Availability: data and access redundancy Manageability: no compromise and no complexity 11

Implementations and Philosophies Industry standard components & servers (COTS*) Open Software (often open-source) Standards file sharing protocols (+ API/proprietary) Online scalability Self Managing, Self-Healing, Self-Aware, Self-Organizing Data Management Services Scale-out approach with a distributed philosophy based on RAIN, Grid COTS: Commodity Off The Self 12

Distributed implementations Aggregation Union of storage servers Symmetric, Asymmetric or P2P Central Authority aka Master node or Masterless Parallel vs. non-parallel File striped and concurrent access, single node access Shared-nothing model Aggregation of file storage/servers User-mode vs. Kernel/System-mode File-based vs. object-based CIFS/NFS NAS Cluster WAFS NFM/NFV FAN Distributed Parallel Notion of Shared/Private Namespace (storage nodes wide) RAIN, Grid implementation with embedded data protection and resiliency No storage nodes are fully trusted and are expected to fail at any time Extended features Policies enforcement for Automation, Snapshot, CDP, Replication, Mirroring, ILM/FLM/Tiering, Migration, Load balancing pnfs PVFS Own or Proprietary 13

P2P Implementations Interesting implementation with aggregation of other machines storage space with or wo a central server (asym or sym P2P) Ex: P2P File Sharing System such as music, mp3 Napster Gnutella BitTorrent 14

Industry implementations Aggregation of independent storage servers File entirely stored by 1 storage server (no file striping) Symmetric Philosophy Shared Namespace 15

Industry implementations Aggregation of homogeneous storage servers File is striped across storage server Symmetric Philosophy Shared Namespace Data/File striping across servers 16

MogileFS Application-level Distributed FS (no kernel module) Open-source Fully redundant, automatic file replication Flat Namespace Shared-Nothing Local filesystem agnostic But not POSIX compliant www.danga.com/mogilefs 17

pnfs (Parallel NFS with NFS v4.1) pnfs is about scaling NFS and address file server bottleneck Same philosophy as SAN FS (master/slave asymmetric philosophy) and data access in parallel Allow NFSv4.1 client to bypass NFS server No application changes, similar management model pnfs extensions to NFSv4 communicate data location to clients Clients access data via Fibre Channel & iscsi (block), OSD (object) or NFS (file) www.pnfs.com pnfs Client Data Storage Net Storage Servers Host Net NFS Server 18

Lustre Open source object-based storage system Based on NASD study from Carnegie Mellon Univ. University project Asymmetric Philosophy Notion of Object (OST/OSS) www.lustre.org 19

Google File System Internal deployment but used by WW end-users Proprietary approach Asymmetric philosophy Thousands of chunk servers with 64MB chunk size (stripe unite) research.google.com/pubs/papers. html Source Google 20

PVFS (Parallel Virtual File System) Now PVFS2 Project from Clemson University and Argonne National Lab Open source and based on Linux Asymmetric philosophy www.pvfs.org #0 #1 #2 21

Hadoop DFS Apache project Highly fault-tolerant built in Java Large data sets Asymmetric philosophy Files striped across DataNodes hadoop.apache.org 22

Conclusion

Conclusion New IT/Technology and Research projects need Scalable (File) Storage Cost effective approach with COTS Acceptance of thousands of machines (clients, servers) Asymmetric (Master/Slave) seems to be a more scalable philosophy Unified, Global Namespace is fundamental Superiority of File/object on block approach Reliability & Security are key (authorization, authentication, privacy ) Standard is needed for broad adoption and cost reduction (pnfs) 24

SNIA Tutorials references Check out SNIA Tutorial: The File Systems Evolution Check out SNIA Tutorial: Check out SNIA Tutorial: Exploiting Multi-Tier File Storage Effectively Check out SNIA Tutorial: DFS For Not-So-Dummies Check out SNIA Tutorial: pnfs, Parallel Storage for Grid and Enterprise Computing Check out SNIA Tutorial: Check out SNIA Tutorial: Virtualization I - What, Why, Where and How? Cloud Storage: The New Paradigm for Accessing Storage as a Service Find and Select the Right File Storage for your Applications 2008 Storage Networking Industry Association. All Rights Reserved. 25

Q&A / Feedback Please send any questions or comments on this presentation to SNIA trackfilemgmt@snia.org (File Systems & File Management) Philippe Nicolas Many thanks to the following individuals for their contributions to this tutorial. - SNIA Education Committee 26

Philippe Nicolas, KerStor