Cluster Setup and Distributed File System

Cluster Setup and Distributed File System R&D for the Storage R&D Group

People Involved:
Gaetano Capasso, INFN-Naples
Domenico Del Prete, INFN-Naples
Domenico Diacono, INFN-Bari
Giacinto Donvito, INFN-Bari
Armando Fella, INFN-Pisa
Silvio Pardi, INFN-Naples
Vincenzo Spinoso, INFN-Bari
Guido Russo, INFN-Naples
Incoming people are welcome!

Storage R&D
- Investigate usage of WN disks: performance and applicability to apps/frameworks (Hadoop/PROOF/caching).
- Interactivity + usage of personal resources for personal analysis tasks.
- File/replica location: investigate storage systems able to do it natively.
- Investigate a simulation-based approach to define the minimum requirements for storage sites.

Work on Storage R&D. Work in progress: testing both functionality and performance.
- File systems under test: Lustre, Xrootd, HDFS, GlusterFS.
- Testing new file systems with analysis jobs: jobs coming from SuperB itself and from CMS.
- Testing several Linux distributions in order to find a usable NFSv4.1.
- Still no definitive solution found.

COMMON STORAGE ARCHITECTURE
[Diagram: the computing infrastructure (Computing Element and worker nodes) is separated from the data infrastructure (GridFTP and SRM service in front of I/O servers running GPFS on ext3).]

TIER-2 ATLAS IN NAPLES
- 108 nodes for a total of 1,056 cores, of which:
  - 83 worker nodes for a total of 882 cores
  - 19 storage nodes for a total of 144 cores
  - 6 service nodes for a total of 30 cores
- ~14% of the total cores are dedicated to the storage infrastructure
- Plus a complex network infrastructure to support WN-to-SE traffic
- Plus 19 servers and storage systems connected over fiber optic

TIER-2 ALICE-CMS IN BARI
- 135 nodes for a total of 1,350 cores
- 20 disk servers => ~700 TB of storage
- Each server has multiple Gbps connections
- 1 x 10 Gbps storage server dedicated to tests
- The network infrastructure is able to provide 1 Gbps to each WN
- ~12% of the total cores are dedicated to the storage infrastructure

NEW POSSIBLE SCENARIOS?
[Diagram: Computing Element and worker nodes merged with the data infrastructure (GridFTP, SRM) into a single layer.]
Today a 2U server can hold more than 10 TB of disk and 48 cores. We can design a cluster setup that integrates data storage and computing services in the same nodes: the computing facility becomes a farm of boxes with high density in terms of cores and disk.

HYBRID SOLUTION
[Diagram: the worker nodes host a cache area alongside the computing services, forming a combined data-cache-and-computing infrastructure with its own GridFTP/SRM access; a classic data infrastructure (GridFTP, SRM service, GPFS on ext3 I/O servers) remains behind it.]

10 Gbit/s Farm Setup in Naples. DELL PowerEdge R510 servers, dual-processor quad-core Intel(R) Xeon(R), 32 GB RAM, a PERC H200 RAID controller and four local 500 GB disks in RAID 0 configuration. Broadcom NetXtreme II 57711 dual-port SFP+ 10GbE NIC with TOE and iSCSI offload, PCIe x8. OS: SL 5.5. Interested people can use the cluster for tests.

The configuration of each node:
- 2x CPU 2.13 GHz 4-core, 4.80 GT/s QPI
- 8x 4 GB dual-rank 1066 MHz DIMMs (32 GB RAM)
- 8x 500 GB 7200 rpm SATA 3.5" hot-plug HDD
- 2x 10 Gbps (Twinax) and 2x 1 Gbps (UTP) network links
12 nodes in this test, for a total of 96 cores (8 x 12), 384 GB RAM (32 x 12) and 48 TB of disk (4 x 12).

FILE SYSTEMS UNDER EVALUATION. In this presentation we show the first benchmarks obtained with two open-source solutions currently under investigation: the Hadoop Distributed File System (HDFS) and GlusterFS.

HDFS CHARACTERISTICS
- HDFS is the Hadoop Distributed File System.
- HDFS is designed to be highly fault-tolerant.
- An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data.
- Automatic data replication between the cluster nodes, with a replication factor configured by the system administrator.
- The detection of faults and quick, automatic recovery is a core architectural goal of HDFS.
- HDFS does not provide a native POSIX interface, but a FUSE module is available.
- HDFS does not support multi-stream or random writes, only append (illustrated in the sketch below).
- HDFS gives its best performance with the MapReduce framework.
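As a minimal illustration of the write-once/append-only constraint, the sketch below writes a file through a FUSE-mounted HDFS volume and then attempts an in-place update. The mount point /mnt/hdfs and the file path are hypothetical and must be adapted to the local setup; the exact error raised depends on the FUSE client version.

```python
import os

# Hypothetical FUSE mount point for the HDFS volume; adapt to the local setup.
HDFS_MOUNT = "/mnt/hdfs"
path = os.path.join(HDFS_MOUNT, "user/test/sample.dat")

# Sequential (write-once) creation works through the POSIX/FUSE layer.
with open(path, "wb") as f:
    f.write(b"x" * 4 * 1024 * 1024)  # ~4 MB of dummy data

# In-place (random) writes are outside the HDFS model: depending on the FUSE
# client version this is rejected either at open() or at write() time.
try:
    with open(path, "r+b") as f:
        f.seek(1024)
        f.write(b"patched")
except OSError as err:
    print("random write rejected as expected:", err)
```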

GLUSTERFS CHARACTERISTICS
- GlusterFS is an open-source, clustered file system capable of scaling to several petabytes and handling thousands of clients. It aggregates various storage bricks over InfiniBand RDMA or a TCP/IP network.
- GlusterFS is based on a stackable user-space design and can deliver exceptional performance for diverse workloads.
- GlusterFS is a standard POSIX file system (see the sketch below). It works as a layer on top of an existing POSIX file system (e.g. ext3), adding distribution and replication of complete files among the brick nodes.
- GlusterFS has no name node: it uses distributed hashing and POSIX extended attributes to manage file placement and replication between the nodes.
- GlusterFS geo-replication provides a continuous, asynchronous and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs) and across the Internet.
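For contrast with the HDFS sketch above, the same in-place update is an ordinary operation on a GlusterFS mount, because the volume is exposed as a standard POSIX file system. The mount point /mnt/gluster is again a hypothetical placeholder.

```python
import os

# Hypothetical GlusterFS mount point; adapt to the local volume/path.
GLUSTER_MOUNT = "/mnt/gluster"
path = os.path.join(GLUSTER_MOUNT, "test/sample.dat")
os.makedirs(os.path.dirname(path), exist_ok=True)

# Sequential write followed by an in-place update: both are ordinary POSIX
# operations on GlusterFS; replication to the other bricks is handled
# transparently on the client side.
with open(path, "wb") as f:
    f.write(b"x" * 4 * 1024 * 1024)

with open(path, "r+b") as f:
    f.seek(1024)
    f.write(b"patched")  # random write is allowed here

print("size after update:", os.path.getsize(path))
```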

FILE SYSTEMS UNDER EVALUATION

Feature               | Hadoop File System (HDFS)                                | GlusterFS
Open source           | Yes, in Java                                             | Yes, in C
Metadata architecture | Single NameNode for the whole HDFS instance              | No name node
POSIX interface       | Non-native (FUSE)                                        | Native
Data distribution     | Data split into 64 MB blocks distributed over DataNodes  | Three volume types: Distributed, Replicated and Striped
Data replication      | Automatic, at block level                                | Automatic, at file level (or at block level in Striped mode)

DOUBLE FILE SYSTEM CONFIGURATION
CURRENT SETUP: OS SL 5.5 with gLite 3.2 middleware. The nodes are ready to be included in the ATLAS Tier-2 and available for the SuperB FastSim program (planned for next week).
IMPLEMENTATION OF THE HADOOP DISTRIBUTED FILE SYSTEM ON 8 NODES: an HDFS area composed of 8 DataNodes and one NameNode, plus the FUSE module. Each worker node of the cluster mounts and uses the file system through the POSIX interface.
IMPLEMENTATION OF THE GLUSTERFS DISTRIBUTED FILE SYSTEM ON 9 NODES: a GlusterFS area of 4.5 TB raw, configured as a distributed file system with replica factor 3, for a total of 1.5 TB usable.
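The usable-capacity figure follows directly from the raw size and the replica factor; a trivial check:

```python
# Worked arithmetic behind the usable-capacity figure quoted above.
raw_capacity_tb = 4.5   # raw GlusterFS area spread over the 9 bricks
replica_factor = 3      # each file is stored on 3 different nodes

usable_tb = raw_capacity_tb / replica_factor
print(f"usable capacity: {usable_tb:.1f} TB")  # -> 1.5 TB
```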

HDFS-HADOOP IMPLEMENTATION
1 NameNode and 8 worker nodes acting as DataNodes that share their local disks. All the DataNodes mount the HDFS POSIX interface through the FUSE module.
[Diagram: WN1-WN8/DataNode and the NameNode connected to the HDFS file system through a 10 Gigabit switch.]

GLUSTERFS IMPLEMENTATION
GlusterFS is a cluster file system that works without dedicated data nodes; it can work in 3 different configurations: distributed, striped and replicated. It uses the POSIX extensions to guarantee the consistency of the file system.
No name node: 9 worker nodes share their local disks, and every node is both client and server. The GlusterFS volume is configured with replica 3.
[Diagram: WN1-WN9 connected through a 10 Gigabit switch.]

GlusterFS Read/Write Test. Basic benchmarks performed from a single node. We used IOzone for GlusterFS and the benchmark tools provided with HDFS. HDFS shows strong limits when used as a general-purpose FS outside the MapReduce framework.
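The sketch below only illustrates the kind of single-stream measurement involved, driving the two volumes through their POSIX mount points; it is not the actual IOzone or HDFS benchmark behind the numbers in this presentation, and the mount paths are hypothetical.

```python
import os
import time

def sequential_write_read(mount_point, size_mb=1024, block_kb=1024):
    """Rough single-stream write/read throughput through a POSIX mount.

    Note: unless size_mb is larger than the node's RAM, the read phase may
    be served partly from the page cache and look artificially fast.
    """
    path = os.path.join(mount_point, "throughput_test.dat")
    block = b"\0" * block_kb * 1024
    n_blocks = size_mb * 1024 // block_kb

    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # make sure data really hit the FS
    write_mb_s = size_mb / (time.time() - t0)

    t0 = time.time()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_mb_s = size_mb / (time.time() - t0)

    os.remove(path)
    return write_mb_s, read_mb_s

# Hypothetical mount points for the two volumes under test.
for fs, mnt in [("GlusterFS", "/mnt/gluster"), ("HDFS (FUSE)", "/mnt/hdfs")]:
    w, r = sequential_write_read(mnt)
    print(f"{fs}: write {w:.0f} MB/s, read {r:.0f} MB/s")
```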

Job Analysis Benchmark. Performance test executed through a SuperB analysis job provided by Alejandro Perez. It runs an analysis on a set of simulated ROOT files previously replicated into the GlusterFS and Hadoop volumes. The following slides compare the execution time and the bandwidth occupation for a set of different job execution setups.

Jobs on a Single Node
As a benchmark we send test jobs that each analyze 10 files, for a total of about 3 GB (the implied throughputs are worked out below).

File System | 1 Job | 2 Jobs | 4 Jobs | 8 Jobs
GlusterFS   | 115 s | 116 s  | 114 s  | 135 s
HDFS        | 141 s | 145 s  | 148 s  | 198 s
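Assuming each job reads the full ~3 GB data set, the table translates into the following effective aggregate throughputs; this is a rough interpretation of the measured times, not an additional measurement.

```python
# Effective aggregate read throughput implied by the table above,
# assuming each job reads the full ~3 GB data set (3 * 1024 MB).
data_per_job_mb = 3 * 1024
times_s = {
    "GlusterFS": {1: 115, 2: 116, 4: 114, 8: 135},
    "HDFS":      {1: 141, 2: 145, 4: 148, 8: 198},
}

for fs, runs in times_s.items():
    for n_jobs, t in sorted(runs.items()):
        rate = n_jobs * data_per_job_mb / t
        print(f"{fs}: {n_jobs} jobs -> ~{rate:.0f} MB/s aggregate")
# e.g. GlusterFS with 8 jobs: 8 * 3072 / 135 ≈ 182 MB/s on average,
# consistent with the ~333 MB/s peak shown on the next slide.
```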

Network Traffic
- 1 job: max rate 59 MB/s
- 2 jobs: max rate 113 MB/s
- 4 jobs: max rate 197 MB/s
- 8 jobs (max box size): max rate 333 MB/s

7/8 Jobs per Node on 9 Nodes

File System | 7 Jobs | 8 Jobs
GlusterFS   | 135 s  | 155 s
HDFS        | 205 s  | 225 s

[Charts: HDFS vs GlusterFS execution time with 7 jobs per node (12% of cores dedicated to storage) and with 8 jobs per node (no cores dedicated to storage).]

Network Traffic. In this test we sent 8 analysis jobs per node. We achieved an aggregate network occupation of about 10 Gbit/s in the cluster.

Work on Storage R&D
[Comparison chart: GlusterFS vs HDFS rated on resilience to failure, POSIX compliance, performance, community, scalability, metadata performance and quota support.]

Summary
- Bandwidth does not represent a bottleneck, but 10 Gbit/s between nodes is needed: for 48-core nodes we expect a throughput of 1.5 GB/s per node (rough check below).
- The file system choice is a crucial aspect to guarantee scalability.
- GlusterFS is a very interesting product that shows promising features in terms of reliability and performance.
- HDFS as a general-purpose FS does not show exciting performance.
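A back-of-envelope check of the 1.5 GB/s estimate, extrapolating linearly from the 8-job measurements above; the linear scaling to 48 concurrent jobs is an optimistic assumption, not a measurement.

```python
# Rough consistency check of the 1.5 GB/s per-node estimate, extrapolating
# linearly from the 8-job measurements on this cluster.
peak_rate_8_jobs_mb_s = 333              # peak NIC rate with 8 concurrent jobs
avg_rate_8_jobs_mb_s = 8 * 3072 / 135    # ~182 MB/s average (GlusterFS, 8 jobs)

per_job_peak = peak_rate_8_jobs_mb_s / 8   # ~42 MB/s per job
per_job_avg = avg_rate_8_jobs_mb_s / 8     # ~23 MB/s per job

for label, per_job in [("peak", per_job_peak), ("average", per_job_avg)]:
    node_rate_gb_s = 48 * per_job / 1024
    print(f"48 jobs, {label}-based: ~{node_rate_gb_s:.1f} GB/s per node")
# -> roughly 1-2 GB/s, in line with the ~1.5 GB/s quoted above and comparable
#    to or above a single 10 Gbit/s (~1.25 GB/s) link, which motivates the
#    10 Gbit/s requirement between nodes.
```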

To Do
- Test GlusterFS performance with other SuperB code and with other setups.
- Compare our setup with a standard ATLAS-like setup by implementing a Storage Element on a 10 Gbit/s network.
- Test GlusterFS and HDFS for data movement.
- Test HDFS and GlusterFS on a geographical scale across the Naples, Bari and Catania sites.

Interesting Issues
Grid data movement solutions already available: DIRAC, PhEDEx, PD2P. This evaluation will be interesting in order to understand which one could be useful for the SuperB computing model.