Big Data in OpenStack Storage

Similar documents
Storing in the Cloud: What You Need to Know Bret Piatt Rackspace Hosting

Next-Generation Cloud Platform

Distributed Systems. 31. The Cloud: Infrastructure as a Service Paul Krzyzanowski. Rutgers University. Fall 2013

RACKSPACE ONMETAL I/O V2 OUTPERFORMS AMAZON EC2 BY UP TO 2X IN BENCHMARK TESTING

Course 20533B: Implementing Microsoft Azure Infrastructure Solutions

Middle East Technical University. Jeren AKHOUNDI ( ) Ipek Deniz Demirtel ( ) Derya Nur Ulus ( ) CENG553 Database Management Systems

ITRI Cloud OS: An End-to-End OpenStack Solution

SQL Server SQL Server 2008 and 2008 R2. SQL Server SQL Server 2014 Currently supporting all versions July 9, 2019 July 9, 2024

November 7, DAN WILSON Global Operations Architecture, Concur. OpenStack Summit Hong Kong JOE ARNOLD

COP Cloud Computing. Presented by: Sanketh Beerabbi University of Central Florida

Ceph vs Swift Performance Evaluation on a Small Cluster. edupert monthly call Jul 24, 2014

THE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

OpenNebula on VMware: Cloud Reference Architecture

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

IBM Storage Solutions & Software Defined Infrastructure

CLOUD 102 The Long Range Forecast is Cloudy, with a Chance Of Virtualization Ron Clifton, James Kelso & Thomas Crowe III

Building a government cloud Concepts and Solutions

Renovating your storage infrastructure for Cloud era

Quobyte The Data Center File System QUOBYTE INC.

Cloud Computing introduction

Cloud Open Source Innovation on Software Defined Storage

Availability for the modern datacentre Veeam Availability Suite v9.5

Using Cloud Services behind SGI DMF

MidoNet Scalability Report

Kinetic Open Storage Platform: Enabling Break-through Economics in Scale-out Object Storage PRESENTATION TITLE GOES HERE Ali Fenn & James Hughes

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

Transform Your Data Protection Strategy

Enterprise Architectures The Pace Accelerates Camberley Bates Managing Partner & Analyst

An Open Architecture for Hybrid Delivery

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Part2: Let s pick one cloud IaaS middleware: OpenStack. Sergio Maffioletti

Beyond 1001 Dedicated Data Service Instances

DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?

HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS

STATE OF MODERN APPLICATIONS IN THE CLOUD

powered by Cloudian and Veritas

What is. Thomas and Lori Duncan

Performance Is Not a Commodity Nathan Day Chief Scientist and Co - Founder

Modernizing Business Intelligence and Analytics

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack

Modern Data Warehouse The New Approach to Azure BI

Silicon House. Phone: / / / Enquiry: Visit:

HCI: Hyper-Converged Infrastructure

SwiftStack and python-swiftclient

OPENSTACK PRIVATE CLOUD WITH GITHUB

Best Practices for Validating the Performance of Data Center Infrastructure. Henry He Ixia

Hyperconverged Cloud Architecture with OpenNebula and StorPool

TCO REPORT. NAS File Tiering. Economic advantages of enterprise file management

Network Implications of Cloud Computing Presentation to Internet2 Meeting November 4, 2010

Module Day Topic. 1 Definition of Cloud Computing and its Basics

When (and how) to move applications from VMware to Cisco Metacloud

Ordering and deleting Single-node Trial for VMware vcenter Server on IBM Cloud instances

Leveraging Traditional Technologies in Non-Traditional Ways

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

Cloud & container monitoring , Lars Michelsen Check_MK Conference #4

OpenStack Seminar Disruption, Consolidation and Growth. Woodside Capital Partners

Move, manage, and run SAP applications in the cloud. SAP-Certified Infrastructure from IBM Cloud

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

Oracle IaaS, a modern felhő infrastruktúra

Using AWS Data Migration Service with RDS

EMC Solution for VIEVU Body Worn Cameras

Emerging Technologies for HPC Storage

AMD Opteron Processors In the Cloud

Red Hat OpenStack Platform 10 Product Guide

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

PUBLIC AND HYBRID CLOUD: BREAKING DOWN BARRIERS

Transform Your Business To An Open Hybrid Cloud Architecture. Presenter Name Title Date

Faculté Polytechnique

IBM Compose Managed Platform for Multiple Open Source Databases

Amazon. Exam Questions AWS-Certified-Solutions-Architect- Professional. AWS-Certified-Solutions-Architect-Professional.

Decentralized Distributed Storage System for Big Data

Cloud Storage Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Kinetic drive. Bingzhe Li

Data Centers and Cloud Computing

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

WHITE PAPER. RedHat OpenShift Container Platform. Benefits: Abstract. 1.1 Introduction

IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems

File system, 199 file trove-guestagent.conf, 40 flavor-create command, 108 flavor-related APIs list, 280 show details, 281 Flavors, 107

Backup & Recovery on AWS

What is Cloud Computing? Cloud computing is the dynamic delivery of IT resources and capabilities as a Service over the Internet.

SafeNet ProtectApp APPLICATION-LEVEL ENCRYPTION

Data Centers and Cloud Computing. Data Centers

High Availability for Enterprise Clouds: Oracle Solaris Cluster and OpenStack

Safe Harbor Statement

Veeam Backup & Replication on IBM Cloud Solution Architecture

Address new markets with new services

Introducing SUSE Enterprise Storage 5

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

Intel Cluster Ready Allowed Hardware Variances

Getting to Know Apache CloudStack

Dell EMC Surveillance for IndigoVision Body-Worn Cameras

Backup Appliances. Geir Aasarmoen og Kåre Juvkam

HPSS Treefrog Introduction.

Contents Overview of the Compression Server White Paper... 5 Business Problem... 7

MyCloud Computing Business computing in the cloud, ready to go in minutes

EMC Backup and Recovery for Microsoft SQL Server

Introduction To Cloud Computing

Transcription:

Big Data in OpenStack Storage Ivan Tomašić, Aleksandra Rashkovska, Matjaž Depolli, Roman Trobec Department of Communication Systems Jožef Stefan Institute, Ljubljana, Slovenia

Outline Introduction Swift in the CLASS project Experience with installation of OpenStack Storage Hybrid Cloud Storage (OpenStack Storage and AmazonS3) Conclusion

Experience with installation of OpenStack Storage Ivan Tomašić Ivan.Tomasic@ijs.si

Introduction OpenStack Object Storage in the CLASS project OpenStack Object Storage (Swift) installation: Hardware and software platform Current Swift installation on IJS Installation details Work on progress Benchmarks Conclusion

Swift in the CLASS project IJS group is responsible for CLASS/Petabyte storage Until M6 we identified requirements for IaaS needed for CLASS/Petabyte system and recommendations for its users We determined 12 main criteria for comparison of the available systems: scalability, data model, failure handling, compatibility, security, Three types of storage systems were distinguished regarding their data model: relational (SQL databases), NoSQL databases and Systems that store unstructured data objects.

Swift in the CLASS project cont. We analyzed open source systems: WALRUS (Eucaliptus), SWIFT, LUSTRE, TWISTED STORAGE, TASHI, SECTORE/SPHERE, HADOOP, MYSQL CLUSTER, MONGO DB And commercial cloud-based storage: MICROSOFT AZURE, ORACLE, AMAZON S3, GOOGLE We selected SWIFT (Object storage) and HADOOP (DFS) for test implementation and testing. We foresee different application scenarios: A single user with high storage requirements (private) Multiple users of shared data A single user with dynamic amount of data (need for extension of private storage with public storage) Same as above, but for shared data Some other scenarios could appear?

Hardware and software platform Cluster on Jožef Stefan Institute (JSI): 37 nodes: Master node Working nodes Each node is independent machine Nodes are connected with Gigabit Ethernet k1 k37 Master WAN

Hardware specifics and operating system The base is the HP DL160 G6 server: up to 2 processors up to 18 x 16 GB of RAM Processor Intel Xeon 5520 (4 cores and hyperthreading 6 GB RAM (DDR3, ECC, 1060 MHz, 3 channels) 500 GB hard disk (3.5 in, SATA, 7.2 K rotations/min) RAID - only the master node has 4 HDs: RAID 1+0 with 917 GB of storage space altogether All the nodes have 64-bit Ubuntu Server 11.04

Swift installation on IJS Current Swift installation on IJS has the following configuration: Master node is running: Proxy server Authorization server Working nodes k10, k11, k12, k13, k14 are each running: Object services Container service Account services

Current IJS Installation details Number of partitions: 2^9=512 Number of replicas: 3 Number of zones: 5 XFS file system (XATTRS) recommended Authorization: Swauth TempAuth is used at the present time

Work in Progress Extend Swift to the whole cluster Perform additional tests and benchmarks (Hadoop) Multiple data centers installation: Turbo Institute IJS Proxy, Auth WAN k38 k k1 k37

First Benchmark Results Test procedure: Writting nad reading Two types of access: From master machine (local host) From distant machine - 1Gb/s connection Expected values: Speed from master machine > speed from distant machine Writting and reading speeds > 50 MB/s

First Benchmark Results - Reading

First Benchmark Results Writing

Benchmark Results - Comments Reading and writting from distant machine is a bit faster - unexpected We noticed that while read or writte operation one core on the master node is at 100% utilization We expected higher speeds System fine tuning -> reruning benchmarks

Conclusion Difference in bandwidth up to 10x between Swift and Hadoop in reading and writing to the same architecture. We have to find out more details about performance Perform additional tests and benchmarks, also between multiple data centers installations Result system evaluation and prescription for fine tuning

Hybrid Cloud Storage (OpenStack Storage and AmazonS3) Aleksandra Rashkovska Aleksandra.Rashkovska@ijs.si

Outline Cloud Storage: public, private, hybrid. Approaches for migration to public cloud storage: Three routes to implementation. OpenStack Storage and AmazonS3: Implementation examples.

Public vs. Private Cloud Storage Public Cloud Storage Cloud storage option offered by a fast growing list of service providers (Amazon, AT&T, Iron Mountain Inc., Microsoft Corp., Nirvanix Inc., Rackspace Hosting Inc.) Infrastructure: usually low-cost storage nodes with an objectbased storage stack. Data in the cloud is accessed mostly via REST and SOAP. Redundancy is achieved by storing each object on at least two nodes. Usage is charged on a dollar-per-gigabyte-per-month basis. Designed for massive multi-tenancy that enables isolation of data, access and security for each client. Private Cloud Storage Runs on dedicated infrastructure. Usually for a single tenant. Do not scale to the degree public storage clouds can.

Hybrid Cloud Storage The best of both worlds

Hybrid Cloud Storage Hybrid cloud storage when traditional storage systems or private cloud storage are supplemented with public cloud storage. Key requirements : The hybrid cloud storage must behave like homogeneous storage. The hybrid cloud storage should be transparent. Mechanisms to keep active and frequently accessed data on-premise and push inactive data into the cloud (policy engines to define the circumstances when data gets moved into or pulled back from the cloud)

Migration to public cloud storage Three routes to implement hybrid cloud storage Via cloud storage software that straddles onpremise and public cloud storage (Cloud storage software implementation) Via cloud storage gateways (Cloud storage gateways implementation) Through application integration (Application integration implementation for hybrid cloud storage)

Cloud storage software implementation Only possible today if the internal and external storage clouds run the same cloud storage software. Standardization initiatives in progress: Storage Networking Industry Association (SNIA) Cloud Data Management Interface (CDMI) Cloud software vendors: Nirvanix - Nirvanix hnode internal cloud storage complemented with Nirvanix Storage Delivery Network cloud storage EMC Corp. s Atmos - software-based, hardware-agnostic, object-based storage stack. EMC sells Atmos to enterprises and providers, so on-premise Atmos deployments can federate with Atmos services in the cloud. EMC s most prominent customer is AT&T.

Cloud storage gateways implementation Cloud storage gateways sit between on-premise storage and public cloud storage. Cloud gateways perform data migration of data from on-premise storage into public cloud storage and vice versa, usually via policy engines. Cloud storage gateways vendors: Cirtas Systems Bluejet Cloud Storage Controller - blockbased cloud storage gateway appliance; currently integrated with public cloud storage services from Amazon and Iron Mountain. Riverbed Technology Riverbed Whitewater - cloud backup appliance offering inline data deduplication; currently integrated with the AT&T and Amazon storage clouds.

Application integration implementation for hybrid cloud storage All public cloud storage services offer APIs to interact with private cloud storage software and cloud gateways. Cloud storage APIs enable custom in-house and commercial applications to tap into public cloud storage via REST interfaces. Example: Backup application vendors have started to add public cloud storage support to their backup suites.

OpenStack Storage and AmazonS3: Cloud storage gateway implementation example Automating OpenStack Swift backup to Amazon S3 using SMEStorage Cloud Storage gateway Requirements: Personal Lifetime Cloud or an Organisation Cloud File Server SMEStorage Account, or SMEStorage Open Cloud Platform Appliance SMEStorage Cloud Dashboard: 1. Add Cloud Provider 2. Select backup cloud provider 3. Enter Amazon S3 Details

OpenStack Storage and AmazonS3: Cloud storage APIs example OpenStack AmazonS3 Compatible API Swift3 middleware emulates the Amazon S3 REST API on top of Swift. Jclouds BlobStore API Jclouds - open source library for the cloud in java and clojure. BlobStore - Portable means of managing keyvalue storage providers: AmazonS3 and OpenStack supported.

Conclusion Implementing hybrid clouds in data storage environment can be done in three different ways. Choose to implement hybrid clouds via cloud storage software, cloud storage gateways or through application integration - all viable options with several providers and products to choose from. Weigh your options and choose the hybrid cloud approach that best suits your storage environment.

Thank You for your attention.