Kinetic drive. Bingzhe Li

Consumption has changed: it's an object storage world, with unprecedented growth and scale. In total, a complete redefinition of the storage stack. https://www.openstack.org/summit/openstack-summit-atlanta-2014/session-videos/presentation/casestudy-seagate-kinetic-platform-in-action

What is a Kinetic drive? The Kinetic drive is the world's first Ethernet-connected HDD with an open-source object API designed specifically for hyperscale and scale-out environments. The Seagate Kinetic HDD simplifies storage hardware and software architectures to reduce total cost of ownership (TCO) and enable rapid response to growing cloud storage infrastructure. http://www.seagate.com/tech-insights/kinetic-vision-how-seagate-newdeveloper-tools-meets-the-needs-of-cloud-storage-platforms-master-ti/

Kinetic drive features: a new class of Ethernet drive; two Ethernet ports instead of two SAS ports; a key/value interface instead of SCSI; standard form factor; open-source key/value API and libraries. https://www.openstack.org/summit/openstack-summit-atlanta-2014/session-videos/presentation/casestudy-seagate-kinetic-platform-in-action
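
As a rough illustration of what the key/value interface looks like in practice, the sketch below stores, reads, and deletes a key directly on a drive over Ethernet. It loosely follows the shape of Seagate's open-source Python client; the package, class, and method names should be treated as assumptions rather than the exact library API.

from kinetic import Client             # assumed import; the open-source client's layout may differ

drive = Client("192.168.1.100", 8123)  # drive IP and Kinetic service port
drive.connect()

drive.put(b"object/0001", b"hello kinetic")   # store a key/value pair on the drive
entry = drive.get(b"object/0001")             # read it back as an entry (key, value, version)
print(entry.value)

drive.delete(b"object/0001")                  # remove the key from the drive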

HDD Form Factor and Connector. The Kinetic chassis conforms to the dimensions of the standard HDD form factor, and Kinetic drives repurpose the standard SAS HDD connector. The connector is defined by and conforms to the dimensions and specifications of SFF-8482 Rev 2.3, and its placement in the drive form factor is identical to that of SAS drives. https://developers.seagate.com/display/kv/kinetic+open+storage+documentation+wiki

First-Generation Kinetic HDD. The first Kinetic drive is a 4 TB, 5900 rpm, 3.5" hard disk drive (HDD). Compared to its conventional sister drive, the Kinetic drive implements the Kinetic API, which enables key/value object storage, and replaces the Serial Advanced Technology Attachment (SATA) or Serial Attached SCSI (SAS) interface with two 1-Gbps SGMII Ethernet ports, enabling direct network-attached connectivity. The Ethernet interface allows communication between drives and direct communication with the datacenter, eliminating the need for storage servers in datacenter storage racks.

The Seagate Kinetic Open Storage Data Center vs. the Traditional Model. http://www.seagate.com/tech-insights/kinetic-vision-how-seagate-new-developer-tools-meets-the-needs-of-cloud-storageplatforms-master-ti/

Basic architecture http://www.seagate.com/tech-insights/kinetic-vision-how-seagate-new-developer-tools-meets-the-needs-of-cloud-storageplatforms-master-ti/

Advantages of the Kinetic drive: performance, scalability, simplicity, and TCO (total cost of ownership).

Figures from Ali Fenn and James Hughes' presentation at the SNIA DSI Conference, Santa Clara, April 2014.

P2P operation example: copydrive (see src/copydrive.cc). This example uses the P2P push functionality to copy an entire keyspace from one drive to one or more other drives. For example, to run a pipeline that looks like A -> B -> C, you could call it like so: ./copydrive A 8123 B 8123 C 4444
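
The sketch below is purely conceptual, with hypothetical helper names (get_key_range, push_keys are not the real client API). It contrasts a naive client-mediated copy with the drive-to-drive P2P push that copydrive exercises: with P2P push, values flow directly between drives instead of passing through the client.

FIRST_KEY, LAST_KEY = b"\x00", b"\xff" * 32

def naive_copy(src, dst):
    """Client-mediated copy: every value crosses the network twice (drive -> client -> drive)."""
    for key in src.get_key_range(FIRST_KEY, LAST_KEY):   # assumed iteration helper
        dst.put(key, src.get(key).value)

def p2p_copy(src, pipeline):
    """One request: src pushes its keyspace to pipeline[0], which is asked to forward to
    pipeline[1], and so on (the A -> B -> C pipeline in the copydrive invocation above)."""
    src.push_keys((FIRST_KEY, LAST_KEY), targets=pipeline)   # assumed P2P-push helper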

Goals of the Kinetic API: data movement (get/put/delete/getnext/getprevious), multiple masters, cluster-ability, third-party copy, and management.
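
A short sketch of how the getnext verb supports ordered traversal of a drive's keyspace; the method name and its return convention are assumptions based on the open-source clients.

def scan_forward(drive, start_key):
    """Yield key/value entries in key order, starting just after start_key."""
    key = start_key
    while True:
        entry = drive.getNext(key)   # assumed to return None once past the last key
        if entry is None:
            return
        yield entry.key, entry.value
        key = entry.key

# for k, v in scan_forward(drive, b""):
#     print(k, len(v))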

Software applications: OpenStack Swift, Distributed Hash Tables, Hadoop Distributed File System, Basho Riak CS. https://developers.seagate.com/display/kv/kinetic+open+storage+documentation+wiki

OpenStack Kinetic drive https://developers.seagate.com/display/kv/openstack+swift

OpenStack Swift. OpenStack Swift is a highly scalable and durable object storage system. Swift provides the ability to store, retrieve, and delete objects in containers using a RESTful HTTP API. The key terminology used for Swift is:
Proxy Server - scalable API request handler; determines the storage node distribution of objects based on the URL
Partitions - a complete and non-overlapping set of key ranges such that each object, container, and account is a member of exactly one partition according to the value of its key
Ring - maps each partition to a set of devices
Objects - key/value entries in the object store
Containers - groups of objects
Accounts - groups of containers
Object Server - service that provides access to object methods
Container Server - service that provides access to container methods
Account Server - service that provides access to account methods
https://developers.seagate.com/display/kv/openstack+swift
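
The toy sketch below illustrates the partition and ring ideas from the terminology above: an object path is hashed into one of a fixed number of partitions, and the ring maps that partition to a set of devices. Real Swift rings are built and rebalanced by swift-ring-builder; the partition power, device list, and placement rule here are assumptions for illustration only.

import hashlib

PART_POWER = 10                                   # 2**10 partitions (value assumed for the sketch)
REPLICAS = 3
DEVICES = [f"drive-{i}" for i in range(12)]       # e.g. Kinetic drives addressed by IP

def partition_for(path: str) -> int:
    """Map an object path to one of 2**PART_POWER partitions via an md5 hash."""
    digest = hashlib.md5(path.encode()).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

def devices_for(partition: int):
    """Toy placement: pick REPLICAS devices starting at a partition-derived offset."""
    return [DEVICES[(partition + r) % len(DEVICES)] for r in range(REPLICAS)]

part = partition_for("/AUTH_test/photos/cat.jpg")
print(part, devices_for(part))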

Kinetic Implementation. Swift clusters using Kinetic drives allow access to any drive and, thus, any object. In the current Kinetic integration, a fraction of the Object Server commands (the Object Daemon) is embedded within the Proxy Server, acting as a logical construct. https://developers.seagate.com/display/kv/openstack+swift

Heterogeneous Clusters. Keeping Object Server operations within the Proxy Server conservatively integrates the Kinetic ecosystem with the current Swift ecosystem and potentially provides backward compatibility with existing Swift deployments. The additional value of this implementation is that the Proxy Servers can use the existing protocol while reducing the amount of I/O from the Object Servers. A possible heterogeneous cluster with Kinetic and non-Kinetic Zones is shown below. https://developers.seagate.com/display/kv/openstack+swift

Future possibilities. The current Kinetic replication process involves P2P copy, where Kinetic drives can copy objects from other Kinetic drives. A future goal for complete Kinetic integration would be to eliminate the Proxy Server restrictions on Zone access, such that the Proxy Servers can access all drives in any Zone. In this situation, the client can make a single connection to a Proxy Server and obtain access to objects from all Zones. https://developers.seagate.com/display/kv/openstack+swift

From Ali Fenn's presentation at the OpenStack Summit, Paris, November 2014. RaliQuest, York Street Strategy.

Distributed Hash Tables

Distributed Hash Tables (DHTs) provide a lookup service by storing key/value pairs in a decentralized distributed system. The benefits of DHTs are:
Highly scalable: loads are automatically distributed to new nodes
Distributed storage of objects with known names
Self-organizing, with no need for a central server
Robust against node failure, except for bootstrap nodes (a bootstrap node provides initial configuration information to newly joining nodes so that they can join the overlay network)
Data is automatically moved away from failed nodes
The disadvantages of DHTs are:
Searching is based on a hash algorithm, so very similar data values can end up on totally different nodes
Security problems with secure routing and difficulty in verifying data integrity
https://developers.seagate.com/display/kv/distributed+hash+tables

Node failure/removal. In the case of a node failure or removal, the data must be replicated onto the next successor node to survive. https://developers.seagate.com/display/kv/distributed+hash+tables

Node Lookup. Since node 14 does not exist in the ring, the data stored under key 14 will be found on the next successor, which in this case is node 15. https://developers.seagate.com/display/kv/distributed+hash+tables
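
A minimal sketch of the successor rule used in this lookup: on a sorted ring of node IDs, a key is served by the first node at or after its hash, wrapping around at the end of the ring. The node IDs below are made up for illustration.

import bisect

NODE_IDS = sorted([3, 8, 15, 21, 28])      # example node positions on the ring

def successor(key_id, nodes=NODE_IDS):
    """Return the first node clockwise from key_id, wrapping around the ring."""
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]

print(successor(14))   # -> 15, matching the lookup example above
print(successor(30))   # past the largest id, wraps around to node 3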

Kinetic Open Storage Implementation. A DHT can be implemented with Kinetic drives through the integrated Kinetic API, which stores the key/value objects directly on the drives. The use of Kinetic drives eliminates the need for intermediate servers to communicate with the drives. https://developers.seagate.com/display/kv/distributed+hash+tables

Hadoop Distributed File System

https://developers.seagate.com/display/kv/hadoop+distributed+file+system

Kinetic Open Storage Integration. HDFS is integrated with Kinetic drives through a plug-in that lets the HDFS client communicate with the Kinetic drives. The integration of HDFS with Kinetic drives is independent of the Hadoop ecosystem and is designed to let the client choose whether to use the default file system or the Kinetic file system for an HDFS application. With the Kinetic file system plug-in, the HDFS client no longer needs to communicate with a Datanode in order to communicate with the storage devices. When needed, the HDFS client communicates with the Namenode to obtain information about the file system's namespace and the client's access rights before connecting directly to the Kinetic drives. The accompanying figure shows the architecture of the HDFS Kinetic integration. https://developers.seagate.com/display/kv/hadoop+distributed+file+system
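
A purely conceptual sketch of the read path described above, with every helper name hypothetical: the client still asks the Namenode for namespace and access information, then fetches the data as key/value chunks directly from the Kinetic drives, with no Datanode in the data path.

def read_file(namenode, connect_kinetic, path):
    """Conceptual Kinetic read path; namenode and connect_kinetic are stand-ins, not a real API."""
    data = bytearray()
    for block in namenode.get_block_locations(path):   # metadata and permissions: still the Namenode
        drive = connect_kinetic(block.drive_address)   # then connect straight to the Kinetic drive
        for chunk_key in block.chunk_keys:             # block content stored as key/value chunks
            data += drive.get(chunk_key).value         # no Datanode sits in the data path
    return bytes(data)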

Mixed. With the current HDFS architecture and conventional drives, when a Datanode fails, the drives attached to that node also fail and are no longer accessible. With Kinetic drives there are no Datanodes, so failures happen only at the drive level, which reduces the chance of inaccessible data in a large-scale environment. A comparison between conventional and Kinetic drives in HDFS is shown below. https://developers.seagate.com/display/kv/hadoop+distributed+file+system

Basho Riak+Kinetic drive Riak is a distributed NoSQL key-value data store with extra query capabilities that offers high availability, fault tolerance, operational simplicity, and scalability. https://developers.seagate.com/display/kv/basho+riak+cs

http://basho.com/about/customers/

Traditional. Riak CS (Cloud Storage) is a highly scalable, open-source distributed key/value store built on top of Basho Riak, designed for public or private clouds and for applications and services with high reliability demands. The Riak CS API allows RESTful Get, Put, and Delete operations for objects and buckets, along with a MapReduce function. The Riak CS API shards large objects into 1 MB chunks of key/value objects, which are sent to the Riak cluster to be streamed, stored, and replicated in the Ring. https://developers.seagate.com/display/kv/basho+riak+cs
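
The sketch below illustrates the 1 MB sharding step: a large object is cut into fixed-size chunks, each stored under its own key in the ring. The key naming scheme here is invented for illustration and is not the actual Riak CS manifest format.

CHUNK_SIZE = 1024 * 1024   # 1 MB, as described above

def shard_object(bucket, name, payload):
    """Split an object into 1 MB key/value chunks; the key scheme is illustrative only."""
    for offset in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[offset:offset + CHUNK_SIZE]
        key = f"{bucket}/{name}/{offset // CHUNK_SIZE:08d}"
        yield key, chunk

chunks = list(shard_object("photos", "vacation.mp4", b"\x00" * (5 * CHUNK_SIZE + 123)))
print(len(chunks))   # 6 chunks: five full 1 MB pieces plus a 123-byte tail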

Kinetic Implementation. Replacing the drives in the Riak Nodes within the Ring with Kinetic drives allows access to any drive, and thus any object. Integrating Kinetic drives into the Riak CS ecosystem allows the elimination of the Object Servers in the Riak Nodes, so that any Riak CS API Server can access all the drives in any node. In the Riak Ring, each Object Server takes responsibility for a limited number of drives within its Riak Node, which restricts other Object Servers from accessing those drives to service client requests. Eliminating the Object Servers can potentially allow any Riak CS Server Node to communicate with all Riak Nodes, and thus access all the drives. With the Object Servers absent, data replication can potentially be performed using P2P copy, such that Kinetic drives communicate with each other independently of their Riak Node locations. https://developers.seagate.com/display/kv/basho+riak+cs

Evaluating the performance of Seagate Kinetic drive technology and its integration into the CERN EOS storage system. Ivana Pejeva, August 2015. CERN openlab Summer Student Report 2015.

Goals of the project. Goal: performance evaluation of Seagate Kinetic drive technology and its integration into the CERN EOS storage system. CERN openlab is a unique public-private partnership that accelerates the development of cutting-edge solutions for the worldwide LHC community and wider scientific research. EOS [1] is a multi-petabyte disk storage system. [1] "Exabyte Scale Storage at CERN", Andreas J. Peters, Lukasz Janyst.

EOS Storage System. Three components: the management server (MGM), the message queue (MQ), and the file storage services (FST). DAS vs. Ethernet drive technology, in terms of scalability: DAS supports only a limited number of drives, while Ethernet drives scale independently.

Kinetic Integration with EOS: 32+4 encoding and 10+2 encoding. For comparison, a conventional EOS configuration (EOS DEV) with directly attached disks was tested, storing two replicas on two individual FSTs. All FSTs were connected via 10 GbE.
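
For context on the space cost of these layouts, a quick calculation of the raw-to-user storage ratio: two full replicas store twice the user data, while the erasure-coded Kinetic layouts store considerably less raw data per user byte.

def overhead(data_stripes, parity_stripes):
    """Raw bytes stored per byte of user data for a (data+parity) erasure layout."""
    return (data_stripes + parity_stripes) / data_stripes

print(f"2 replicas : {2.0:.3f}x raw per user byte")
print(f"10+2       : {overhead(10, 2):.3f}x")   # 1.200x
print(f"32+4       : {overhead(32, 4):.3f}x")   # 1.125x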

Performance Benchmarks. The graph for the 1 GbE client shows that a plateau in upload rate (~90-100 MB/s) is reached for files larger than 64 MB. There is a systematic few-percent decrease in performance for Kinetic-10:2 versus the conventional disk layout (DEV), and a second few-percent effect when changing from Kinetic-10:2 to Kinetic-32:4. These effects are only visible for file sizes larger than 64 MB.

For a 10 GbE client, a plateau is reached with files larger than 256 MB (~250-350 MB/s). There is no visible difference between the DEV and Kinetic-10:2 configurations, while the Kinetic-32:4 configuration shows a systematic performance loss of around 10% for large files.

Thanks!