RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System



Contents

Synopsis
Introduction
Challenges and the Solution
Solution Architecture
Configuration
Technical Characteristics
Business Impact
About RAIDIX

Synopsis

The RAIDIX data storage software allows a system administrator to build high-performance storage clusters from multiple nodes. Integrated with distributed file systems such as HyperFS, Intel Lustre, or GPFS, RAIDIX ensures flexible scale-out of existing infrastructure to as many as 64 nodes and multiple exabytes of capacity. In this document, we'll analyze the technical interoperation of RAIDIX and the GPFS cluster file system, the benefits of the combined solution for servicing resource-intensive applications in M&E and HPC, as well as the architecture, configuration, and other technical parameters of GPFS and RAIDIX.

All rights reserved. RAIDIX, 2017

Introduction

RAIDIX is a high-performance, reliable, high-density data storage system that manages heavy workloads in Video Surveillance, Media & Entertainment, Enterprise, and other data-rich segments. RAIDIX-powered data storage systems deliver record performance when processing hundreds of parallel streams, ensure full integrity of large data volumes, and provide uninterrupted system operation. RAIDIX supports an Active-Active cluster out of the box without any external devices.

Bundled with the GPFS cluster file system (General Parallel File System) developed by IBM, RAIDIX helps the system administrator set up a storage cluster of multiple nodes built on commodity off-the-shelf hardware. GPFS is utilized in Top 500 high-performance supercomputers. GPFS stands out from other cluster file systems by enabling shared high-speed access to files from applications running on multiple cluster nodes under various operating systems, including RAIDIX. Aside from data storage features, GPFS provides cluster management and administration capabilities and allows concurrent access to file systems from remote GPFS clusters.

Below, we review the tasks and scenarios that call for a highly scalable data storage system powered by RAIDIX and GPFS, as well as configuration and architecture considerations.

Challenges and the Solution

Given the snowballing growth of data volumes, film companies, TV channels, CCTV operators, and other users require highly scalable solutions. It is not uncommon for a system to deal with petabytes of data and require a multi-node storage cluster. When faced with this task, the system administrator runs into the limitations of traditional file systems:

- Metadata and data are stored on the same partitions
- Files are scattered across the partition, causing access latencies
- No protection against fragmentation
- Low scalability in capacity, performance, file count, directory depth, etc.
- Lack of native cross-platform support

Aside from high performance and low latency, the list of key features expected from a fully functional scale-out solution includes:

- A single namespace spanning multiple storage clusters
- Concurrent access via a variety of protocols
- File and block access to the same data

All these requirements are fulfilled by GPFS. A GPFS-based architecture delivers high performance and enables shared data access from multiple workstations. Most file systems are designed for single-server environments, so adding yet another file server makes little difference in performance. GPFS provides greater input/output performance by striping the data blocks of individual files across multiple disks and reading these blocks in parallel. Other GPFS features include high availability, support for heterogeneous clusters, disaster recovery, security, DMAPI, HSM, and ILM.

A file written to the file system is split into blocks of a specified size, up to 1 MB each. The blocks are distributed across several nodes of the file system, which yields higher read/write speed for each file (thanks to the high aggregate throughput of multiple physical drives). However, the file system alone does not guarantee full fault tolerance: a single drive failure may cause data loss. To ensure data integrity, the system administrator should employ RAID controllers on the file system nodes: multiple copies of each block are written to physical drives on separate nodes.

GPFS supports the following features:

- Distributed metadata, including the directory tree. The file system has no single directory controller or index server.
- Efficient indexing of directory entries for large-scale directories. Many file systems limit the number of files per directory (as a rule, to 65,536); GPFS has no such limitation.
- Distributed locking, with full support for POSIX file system semantics, including locking for exclusive file access.
- Partition awareness. A network failure may split the file system into two or more groups of nodes, each of which can only see its own members. GPFS detects this via a heartbeat protocol: in the event of a partition, the file system remains functional for the largest partition. As a result, most workstations keep working even when the file system is degraded.
- Online maintenance. Most modifications to the system (adding new drives, rebalancing data) can be performed on the fly, giving the file system a higher level of availability in a supercomputing cluster.

The RAIDIX software is a high-performance platform for deploying GPFS as a standalone storage node or as a storage node combined with an NSD server.
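The block placement described above can be sketched as a simple round-robin mapping. The sketch below is illustrative only; the block size, disk layout, and all names are our own assumptions, not GPFS internals. It splits a file into fixed-size blocks and assigns each block, plus one replica, to distinct drives:

```python
# Illustrative sketch of striped block placement with one replica.
# Block size, disk layout, and all names are assumptions for
# illustration -- not the actual GPFS data structures.

BLOCK_SIZE = 1 << 20  # 1 MB, the upper block size mentioned above

def stripe_blocks(file_size: int, num_disks: int):
    """Map each block index of a file to a (primary, replica) disk pair.

    Blocks are placed round-robin; the replica goes to the next disk
    so the two copies never share a physical drive.
    """
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    placement = {}
    for block in range(num_blocks):
        primary = block % num_disks
        replica = (primary + 1) % num_disks
        placement[block] = (primary, replica)
    return placement

# A 4.5 MB file striped over 6 disks occupies 5 blocks,
# each stored on two distinct disks.
layout = stripe_blocks(int(4.5 * (1 << 20)), num_disks=6)
assert len(layout) == 5
assert all(p != r for p, r in layout.values())
```

Because consecutive blocks land on different drives, a sequential read can fetch several blocks at once, which is the source of the aggregate-throughput gain described above.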
The combination of RAIDIX and NSD stands out as a cost-effective solution that lowers equipment expenses by using the same hardware resources for both the storage system and the GPFS servers.
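The partition-aware behavior described earlier can be illustrated with a toy quorum check. This is a simplified sketch under our own assumptions (the function name and tiebreaker are hypothetical, not GPFS APIs), showing only the rule that the largest partition stays active:

```python
# Toy illustration of the "largest partition stays active" rule.
# Function name and tiebreaker are hypothetical, not GPFS APIs.

def active_partition(partitions):
    """Given the node groups produced by a network split, return the
    group that keeps serving the file system: the largest one.

    Ties are broken here by picking the group containing the
    lowest-numbered node; a real implementation would use a proper
    tiebreaker such as a quorum disk.
    """
    return max(partitions, key=lambda group: (len(group), -min(group)))

# A 7-node cluster splits into groups of 4 and 3 nodes:
split = [{1, 2, 5, 7}, {3, 4, 6}]
assert active_partition(split) == {1, 2, 5, 7}
```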

Solution Architecture

The RAIDIX software can be configured to operate as a dual-controller cluster (Pic. 1).

Pic. 1: The RAIDIX architecture

In this scenario, the system includes two separate nodes with point-to-point connections:

- 1 GbE for HeartBeat
- 4 x SAS 12G for write cache synchronization (CacheSync)

Both nodes are connected to the JBOD enclosures through 4 SAS 12G ports per node. In this architecture, we recommend 100 Gb/s InfiniBand as the host interface.
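The role of the 1 GbE HeartBeat link can be illustrated by a minimal liveness check: each node timestamps the peer's last heartbeat and declares the peer failed once a timeout elapses. All names and the timeout value below are assumptions for illustration, not the actual RAIDIX failover logic:

```python
# Minimal heartbeat liveness check. Names and the timeout value are
# illustrative assumptions, not the actual RAIDIX implementation.

FAILOVER_TIMEOUT = 5.0  # seconds without a heartbeat before failover

class PeerMonitor:
    def __init__(self, timeout: float = FAILOVER_TIMEOUT):
        self.timeout = timeout
        self.last_seen = None  # time of the most recent heartbeat

    def heartbeat(self, now: float):
        """Record a heartbeat received over the HeartBeat link."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """The peer is considered failed once the timeout elapses."""
        return (self.last_seen is not None
                and now - self.last_seen < self.timeout)

monitor = PeerMonitor()
monitor.heartbeat(now=100.0)
assert monitor.peer_alive(now=103.0)       # within the timeout
assert not monitor.peer_alive(now=106.0)   # missed heartbeats: fail over
```

When the surviving node detects a dead peer, it takes over the peer's volumes; the CacheSync links ensure its write cache already mirrors the failed node's, so no acknowledged writes are lost.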

In case the system requires a large number of client connections without any dedicated client software installed, a NAS gateway may be employed for data transmission.

Configuration

RAIDIX is installed on both AH-RM212-SX12 nodes and configured according to the Administrator's Guide. The GPFS packages should also be installed on both nodes running RAIDIX; a step-by-step guide is available in the GPFS technical documentation. RAIDIX makes it possible to pass volumes through from one RAIDIX node to the other via the sync channel and to configure GPFS using mpath devices.

Technical Characteristics

Based on IOzone and fio test results, the overall sequential performance on a SAS back end may reach 12-13 GB/s when writing with 1 MB blocks. These results were obtained with 120 NL-SAS HDDs distributed across 12 RAID 6 (8D+2P) arrays.
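As a sanity check on the figure above, the aggregate sequential rate can be estimated from the data drives alone: 12 arrays with 8 data drives each, at a per-drive streaming rate we assume to be roughly 130 MB/s for NL-SAS HDDs (our assumption, not a measured value):

```python
# Back-of-the-envelope check of the 12-13 GB/s figure.
# The ~130 MB/s per-drive streaming rate is our assumption.

arrays = 12
data_drives_per_array = 8        # the 8D in an 8D+2P RAID 6 layout
drives_total = arrays * (8 + 2)  # 8 data + 2 parity drives per array
per_drive_mb_s = 130             # assumed NL-SAS sequential rate

aggregate_gb_s = arrays * data_drives_per_array * per_drive_mb_s / 1000

assert drives_total == 120         # matches the 120 HDDs above
assert 12 <= aggregate_gb_s <= 13  # ~12.5 GB/s, consistent with the text
```

Parity drives contribute no extra streaming bandwidth on reads of clean data, which is why only the 96 data drives enter the estimate.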

Business Impact

The combined RAIDIX and GPFS solution allows system integrators to utilize a multitude of storage nodes, distributing data dynamically and balancing workloads across the nodes. The solution architecture allows new storage nodes to be added on demand with no need for data migration or system reconfiguration. RAIDIX technology in combination with the GPFS file system meets high performance and fault-tolerance requirements and ensures shared access to video content from multiple workstations. RAIDIX technology lets the user minimize hardware overheads when building a storage cluster by providing effective scale-out of existing infrastructure with no downtime or performance slump.

About RAIDIX

RAIDIX (www.raidix.com) is a leading solution provider and developer of high-performance storage systems. The company's strategic value builds on patented erasure coding methods and innovative technology designed by its in-house research laboratory. Compatible with commodity off-the-shelf server hardware, RAIDIX is committed to resilient storage with high throughput, robust performance, and low latency. The RAIDIX Global Partner Network encompasses system integrators, storage vendors, and IT solution providers, who offer RAIDIX-powered storage products for professional applications. RAIDIX delivers storage solutions to Enterprise, Media & Entertainment, Video Surveillance, High-Performance Computing (HPC), and other data-rich industries.